├── Ch1
    ├── .ipynb_checkpoints
    │   ├── 1. Analysing the Mean of a dataset-checkpoint.ipynb
    │   ├── 2. Checking the Median of a dataset-checkpoint.ipynb
    │   ├── 3. Identifying the Mode of a dataset-checkpoint.ipynb
    │   ├── 4. Checking the Variance of a dataset-checkpoint.ipynb
    │   ├── 5. Identifying the Standard Deviation of a dataset-checkpoint.ipynb
    │   ├── 6. Generating the Range of a dataset-checkpoint.ipynb
    │   ├── 7. Identifying the Percentiles of a dataset-checkpoint.ipynb
    │   ├── 8. Checking the Quartiles of a dataset-checkpoint.ipynb
    │   ├── 9. Analysing the Interquartile Range (IQR) of a dataset-checkpoint.ipynb
    │   ├── Analysing the Interquartile Range (IQR) of a dataset-checkpoint.ipynb
    │   ├── Analysing the Mean of a dataset-checkpoint.ipynb
    │   ├── Chapter 1 EDA in Python-checkpoint.ipynb
    │   ├── Checking the Median of a dataset-checkpoint.ipynb
    │   ├── Checking the Quartiles of a dataset-Copy1-checkpoint.ipynb
    │   ├── Checking the Quartiles of a dataset-checkpoint.ipynb
    │   ├── Checking the Variance of a dataset-checkpoint.ipynb
    │   ├── Generating the Range of a dataset-checkpoint.ipynb
    │   ├── Identifying the Mode of a dataset-checkpoint.ipynb
    │   ├── Identifying the Percentiles of a dataset-checkpoint.ipynb
    │   └── Identifying the Standard Deviation of a dataset-checkpoint.ipynb
    ├── 1. Analysing the Mean of a dataset.ipynb
    ├── 2. Checking the Median of a dataset.ipynb
    ├── 3. Identifying the Mode of a dataset.ipynb
    ├── 4. Checking the Variance of a dataset.ipynb
    ├── 5. Identifying the Standard Deviation of a dataset.ipynb
    ├── 6. Generating the Range of a dataset.ipynb
    ├── 7. Identifying the Percentiles of a dataset.ipynb
    ├── 8. Checking the Quartiles of a dataset.ipynb
    ├── 9. Analysing the Interquartile Range (IQR) of a dataset.ipynb
    └── Data
    │   └── covid-data.csv
├── Ch2
    ├── .ipynb_checkpoints
    │   ├── 1. Grouping Data-checkpoint.ipynb
    │   ├── 10. Replacing Data-checkpoint.ipynb
    │   ├── 11. Dealing Missing values-checkpoint.ipynb
    │   ├── 2. Appending Data-checkpoint.ipynb
    │   ├── 3. Concatenating Data-checkpoint.ipynb
    │   ├── 4. Merging Data-checkpoint.ipynb
    │   ├── 5. Sorting Data-checkpoint.ipynb
    │   ├── 6. Categorising Data-checkpoint.ipynb
    │   ├── 7. Removing Duplicates-checkpoint.ipynb
    │   ├── 8. Dropping Rows and Columns-checkpoint.ipynb
    │   ├── 9. Changing Data Format-checkpoint.ipynb
    │   ├── Appending Data-checkpoint.ipynb
    │   ├── Categorising Data-checkpoint.ipynb
    │   ├── Changing Data Format-checkpoint.ipynb
    │   ├── Concatenating Data-checkpoint.ipynb
    │   ├── Dropping Rows and Columns-checkpoint.ipynb
    │   ├── Grouping Data-checkpoint.ipynb
    │   ├── Merging Data-checkpoint.ipynb
    │   ├── Removing Duplicates-checkpoint.ipynb
    │   ├── Replacing Data-checkpoint.ipynb
    │   ├── Sorting Data-checkpoint.ipynb
    │   └── Untitled-checkpoint.ipynb
    ├── 1. Grouping Data.ipynb
    ├── 10. Replacing Data.ipynb
    ├── 11. Dealing with Missing values.ipynb
    ├── 2. Appending Data.ipynb
    ├── 3. Concatenating Data.ipynb
    ├── 4. Merging Data.ipynb
    ├── 5. Sorting Data.ipynb
    ├── 6. Categorising Data.ipynb
    ├── 7. Removing Duplicates.ipynb
    ├── 8. Dropping Rows and Columns.ipynb
    ├── 9. Changing Data Format.ipynb
    └── Data
    │   ├── marketing_campaign.csv
    │   ├── marketing_campaign_append1.csv
    │   ├── marketing_campaign_append2.csv
    │   ├── marketing_campaign_concat1.csv
    │   ├── marketing_campaign_concat2.csv
    │   ├── marketing_campaign_merge1.csv
    │   └── marketing_campaign_merge2.csv
├── Ch3
    ├── .ipynb_checkpoints
    │   ├── 1. Grouping Data-checkpoint.ipynb
    │   ├── 1. Preparing for EDA .-checkpoint.ipynb
    │   ├── 10. Replacing Data-checkpoint.ipynb
    │   ├── 2. Appending Data-checkpoint.ipynb
    │   ├── 2. Visualizing data in Matplotlib-checkpoint.ipynb
    │   ├── 3. Concatenating Data-checkpoint.ipynb
    │   ├── 3. Visualizing data in Seaborn-checkpoint.ipynb
    │   ├── 4. Merging Data-checkpoint.ipynb
    │   ├── 4. Visualizing data in GGPLOT-checkpoint.ipynb
    │   ├── 5. Sorting Data-checkpoint.ipynb
    │   ├── 5. Visualizing data in Bokeh-checkpoint.ipynb
    │   ├── 6. Categorising Data-checkpoint.ipynb
    │   ├── 7. Removing Duplicates-checkpoint.ipynb
    │   ├── 8. Dropping Rows and Columns-checkpoint.ipynb
    │   ├── 9. Changing Data Format-checkpoint.ipynb
    │   ├── Appending Data-checkpoint.ipynb
    │   ├── Categorising Data-checkpoint.ipynb
    │   ├── Changing Data Format-checkpoint.ipynb
    │   ├── Concatenating Data-checkpoint.ipynb
    │   ├── Dropping Rows and Columns-checkpoint.ipynb
    │   ├── Grouping Data-checkpoint.ipynb
    │   ├── Merging Data-checkpoint.ipynb
    │   ├── Removing Duplicates-checkpoint.ipynb
    │   ├── Replacing Data-checkpoint.ipynb
    │   ├── Sorting Data-checkpoint.ipynb
    │   └── Untitled-checkpoint.ipynb
    ├── 1. Preparing for EDA ..ipynb
    ├── 2. Visualizing data in Matplotlib.ipynb
    ├── 3. Visualizing data in Seaborn.ipynb
    ├── 4. Visualizing data in GGPLOT.ipynb
    ├── 5. Visualizing data in Bokeh.ipynb
    └── Data
    │   └── HousingPricesData.csv
├── Ch4
    ├── .ipynb_checkpoints
    │   ├── 1. Performing univariate analysis using a Histogram-checkpoint.ipynb
    │   ├── 2. Performing univariate analysis using a Boxplot-checkpoint.ipynb
    │   ├── 3. Performing univariate analysis using a Violinplot-checkpoint.ipynb
    │   ├── 4. Performing univariate analysis using a Summary Table-checkpoint.ipynb
    │   ├── Performing univariate analysis using a Bar Chart-checkpoint.ipynb
    │   ├── Performing univariate analysis using a Boxplot-checkpoint.ipynb
    │   ├── Performing univariate analysis using a Histogram-checkpoint.ipynb
    │   ├── Performing univariate analysis using a Pie Chart-checkpoint.ipynb
    │   ├── Performing univariate analysis using a Table-checkpoint.ipynb
    │   └── Performing univariate analysis using a Violinplot-checkpoint.ipynb
    ├── 1. Performing univariate analysis using a Histogram.ipynb
    ├── 2. Performing univariate analysis using a Boxplot.ipynb
    ├── 3. Performing univariate analysis using a Violinplot.ipynb
    ├── 4. Performing univariate analysis using a Summary Table.ipynb
    ├── 5. Performing univariate analysis using a Bar Chart.ipynb
    ├── 6. Performing univariate analysis using a Pie Chart.ipynb
    └── Data
    │   ├── HousingPricesData.csv
    │   ├── penguins_lter.csv
    │   └── penguins_size.csv
├── Ch5
    ├── .ipynb_checkpoints
    │   ├── 1. Analysing two variables using a Scatter plot-checkpoint.ipynb
    │   ├── 2. Creating CrosstabTwo-way table on bivariate data-checkpoint.ipynb
    │   ├── 3. Analysing two variables using a Pivot table-checkpoint.ipynb
    │   ├── 4. Generating Pairplots on two variables-checkpoint.ipynb
    │   ├── 5. Analysing two variables using a Bar chart-checkpoint.ipynb
    │   ├── 6. Generating Box plots for two variables-checkpoint.ipynb
    │   ├── 7. Creating Histograms on two variables-checkpoint.ipynb
    │   └── 8. Analysing two variables using a Correlation analysis-checkpoint.ipynb
    ├── 1. Analysing two variables using a Scatter plot.ipynb
    ├── 2. Creating CrosstabTwo-way table on bivariate data.ipynb
    ├── 3. Analysing two variables using a Pivot table.ipynb
    ├── 4. Generating Pairplots on two variables.ipynb
    ├── 5. Analysing two variables using a Bar chart.ipynb
    ├── 6. Generating Box plots for two variables.ipynb
    ├── 7. Creating Histograms on two variables.ipynb
    ├── 8. Analysing two variables using a Correlation analysis.ipynb
    └── Data
    │   ├── HousingPricesData.csv
    │   ├── penguins_lter.csv
    │   └── penguins_size.csv
├── Ch6
    ├── .ipynb_checkpoints
    │   ├── 1. Implementing Cluster Analysis on multiple variables using Kmeans-checkpoint.ipynb
    │   ├── 2. Choosing the Optimal number of K clusters in Kmeans-checkpoint.ipynb
    │   ├── 3. Profiling Kmeans Clusters-checkpoint.ipynb
    │   ├── 4. Implementing Principal Component Analysis (PCA) on multiple variables-checkpoint.ipynb
    │   ├── 5. Choosing the number of Principal Components-checkpoint.ipynb
    │   ├── 6. Analysing Principal Components-checkpoint.ipynb
    │   ├── 7. Implementing Factor Analysis on multiple variables-checkpoint.ipynb
    │   ├── 8. Determining the number of factors-Copy1-checkpoint.ipynb
    │   ├── 8. Determining the number of factors-checkpoint.ipynb
    │   └── 9. Analysing the factors-checkpoint.ipynb
    ├── 1. Implementing Cluster Analysis on multiple variables using Kmeans.ipynb
    ├── 2. Choosing the Optimal number of K clusters in Kmeans.ipynb
    ├── 3. Profiling Kmeans Clusters.ipynb
    ├── 4. Implementing Principal Component Analysis (PCA) on multiple variables.ipynb
    ├── 5. Choosing the number of Principal Components.ipynb
    ├── 6. Analysing Principal Components.ipynb
    ├── 7. Implementing Factor Analysis on multiple variables.ipynb
    ├── 8. Determining the number of factors.ipynb
    ├── 9. Analysing the factors.ipynb
    └── Data
    │   ├── marketing_campaign.csv
    │   └── website_survey.csv
├── Ch7
    ├── .ipynb_checkpoints
    │   ├── 1. Using line and boxplots to visualise time series data-checkpoint.ipynb
    │   ├── 2 Spotting patterns in Time series old-checkpoint.ipynb
    │   ├── 2 Spotting patterns in Time series-checkpoint.ipynb
    │   ├── 3 Performing Time series data Decomposition-checkpoint.ipynb
    │   ├── 4 Performing Smoothing - Moving Average-checkpoint.ipynb
    │   ├── 5 Performing Smoothing - Exponential Smoothing-checkpoint.ipynb
    │   ├── 6. Performing Stationarity checks on Time series data-checkpoint.ipynb
    │   ├── 7. Differencing Time series data-checkpoint.ipynb
    │   └── 8. Using Correlation plots to visualise time series data-checkpoint.ipynb
    ├── 1. Using line and boxplots to visualise time series data.ipynb
    ├── 2 Spotting patterns in Time series old.ipynb
    ├── 2 Spotting patterns in Time series.ipynb
    ├── 3 Performing Time series data Decomposition.ipynb
    ├── 4 Performing Smoothing - Moving Average.ipynb
    ├── 5 Performing Smoothing - Exponential Smoothing.ipynb
    ├── 6. Performing Stationarity checks on Time series data.ipynb
    ├── 7. Differencing Time series data.ipynb
    ├── 8. Using Correlation plots to visualise time series data.ipynb
    └── Data
    │   ├── DailyDelhiClimate.csv
    │   ├── DataSF_Data_Dictionary_for_Air_Traffic_Passenger_Statistics.pdf
    │   ├── MTNOY.csv
    │   ├── SF_Air_Traffic_Passenger_Statistics.csv
    │   └── SF_Air_Traffic_Passenger_Statistics_Transformed.csv
├── Ch8
    ├── .ipynb_checkpoints
    │   ├── 1. Preparing Text data-checkpoint.ipynb
    │   ├── 10.Choosing Optimal number of Topics-checkpoint.ipynb
    │   ├── 2. Removing Stop words-checkpoint.ipynb
    │   ├── 3. Analysing Part of Speech-checkpoint.ipynb
    │   ├── 4. Performing Stemming and Lemmatisation-checkpoint.ipynb
    │   ├── 5. Analysing Ngrams-checkpoint.ipynb
    │   ├── 6. Creating Word Clouds-checkpoint.ipynb
    │   ├── 7. Checking Term Frequency-checkpoint.ipynb
    │   ├── 8. Checking Sentiments-checkpoint.ipynb
    │   └── 9. Performing Topic Modelling-checkpoint.ipynb
    ├── 1. Preparing Text data.ipynb
    ├── 10.Choosing Optimal number of Topics.ipynb
    ├── 2. Removing Stop words.ipynb
    ├── 3. Analysing Part of Speech.ipynb
    ├── 4. Performing Stemming and Lemmatisation.ipynb
    ├── 5. Analysing Ngrams.ipynb
    ├── 6. Creating Word Clouds.ipynb
    ├── 7. Checking Term Frequency.ipynb
    ├── 8. Checking Sentiments.ipynb
    ├── 9. Performing Topic Modelling.ipynb
    └── Data
    │   ├── a1_RestaurantReviews_HistoricDump.tsv
    │   ├── cleaned_reviews_data.csv
    │   ├── cleaned_reviews_lemmatized_data.csv
    │   └── cleaned_reviews_no_stopwords_data.csv
├── Data
    ├── Melbourne_housing_EDA.csv
    ├── Melbourne_housing_FULL.csv
    ├── Melbourne_housing_imp.csv
    └── datasets_notes.numbers
├── LICENSE
└── README.md

/Ch2/.ipynb_checkpoints/1. 
Grouping Data-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "id": "a2050fe1", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Load dataset" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 2, 33 | "id": "3a02fac1", 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "id": "e81aa501", 43 | "metadata": {}, 44 | "source": [ 45 | "#### Subset for relevant columns" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "id": "9279d462", 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Education', 'Marital_Status',\n", 56 | " 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer', \n", 57 | " 'Recency','NumStorePurchases', 'NumWebVisitsMonth']]" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "id": "128c9ecd", 63 | "metadata": {}, 64 | "source": [ 65 | "#### Inspect first 2 rows and data types of the dataset" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 4, 71 | "id": "1f556097", 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/html": [ 77 | "
" 158 | ], 159 | "text/plain": [ 160 | " 0 1\n", 161 | "ID 5524 2174\n", 162 | "Year_Birth 1957 1954\n", 163 | "Education Graduation Graduation\n", 164 | "Marital_Status Single Single\n", 165 | "Income 58138.0 46344.0\n", 166 | "Kidhome 0 1\n", 167 | "Teenhome 0 1\n", 168 | "Dt_Customer 04-09-2012 08-03-2014\n", 169 | "Recency 58 38\n", 170 | "NumStorePurchases 4 2\n", 171 | "NumWebVisitsMonth 7 5" 172 | ] 173 | }, 174 | "execution_count": 4, 175 | "metadata": {}, 176 | "output_type": "execute_result" 177 | } 178 | ], 179 | "source": [ 180 | "marketing_data.head(2).T" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 5, 186 | "id": "b3601b62", 187 | "metadata": {}, 188 | "outputs": [ 189 | { 190 | "data": { 191 | "text/plain": [ 192 | "ID int64\n", 193 | "Year_Birth int64\n", 194 | "Education object\n", 195 | "Marital_Status object\n", 196 | "Income float64\n", 197 | "Kidhome int64\n", 198 | "Teenhome int64\n", 199 | "Dt_Customer object\n", 200 | "Recency int64\n", 201 | "NumStorePurchases int64\n", 202 | "NumWebVisitsMonth int64\n", 203 | "dtype: object" 204 | ] 205 | }, 206 | "execution_count": 5, 207 | "metadata": {}, 208 | "output_type": "execute_result" 209 | } 210 | ], 211 | "source": [ 212 | "marketing_data.dtypes" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 6, 218 | "id": "bdf17c4b", 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "data": { 223 | "text/plain": [ 224 | "(2240, 11)" 225 | ] 226 | }, 227 | "execution_count": 6, 228 | "metadata": {}, 229 | "output_type": "execute_result" 230 | } 231 | ], 232 | "source": [ 233 | "marketing_data.shape" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "id": "46f24ec2", 239 | "metadata": {}, 240 | "source": [ 241 | "#### Check the average number of store purchases of customers based on number of kids in the home" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 7, 247 | "id": "fe10f0a2", 248 | "metadata": {}, 
249 | "outputs": [ 250 | { 251 | "data": { 252 | "text/plain": [ 253 | "Kidhome\n", 254 | "0 7.217324\n", 255 | "1 3.863181\n", 256 | "2 3.437500\n", 257 | "Name: NumStorePurchases, dtype: float64" 258 | ] 259 | }, 260 | "execution_count": 7, 261 | "metadata": {}, 262 | "output_type": "execute_result" 263 | } 264 | ], 265 | "source": [ 266 | "marketing_data.groupby('Kidhome')['NumStorePurchases'].mean()" 267 | ] 268 | } 269 | ], 270 | "metadata": { 271 | "kernelspec": { 272 | "display_name": "Python 3", 273 | "language": "python", 274 | "name": "python3" 275 | }, 276 | "language_info": { 277 | "codemirror_mode": { 278 | "name": "ipython", 279 | "version": 3 280 | }, 281 | "file_extension": ".py", 282 | "mimetype": "text/x-python", 283 | "name": "python", 284 | "nbconvert_exporter": "python", 285 | "pygments_lexer": "ipython3", 286 | "version": "3.7.1" 287 | } 288 | }, 289 | "nbformat": 4, 290 | "nbformat_minor": 5 291 | } 292 | -------------------------------------------------------------------------------- /Ch2/.ipynb_checkpoints/10. 
Replacing Data-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "id": "a2050fe1", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Load dataset" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 2, 33 | "id": "3a02fac1", 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "id": "e81aa501", 43 | "metadata": {}, 44 | "source": [ 45 | "#### Subset for relevant columns" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "id": "9279d462", 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Kidhome', 'Teenhome']]" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "id": "128c9ecd", 61 | "metadata": {}, 62 | "source": [ 63 | "#### Inspect first 5 rows and data types of the dataset" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 4, 69 | "id": "1f556097", 70 | "metadata": { 71 | "scrolled": true 72 | }, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/html": [ 77 | "
" 140 | ], 141 | "text/plain": [ 142 | " ID Year_Birth Kidhome Teenhome\n", 143 | "0 5524 1957 0 0\n", 144 | "1 2174 1954 1 1\n", 145 | "2 4141 1965 0 0\n", 146 | "3 6182 1984 1 0\n", 147 | "4 5324 1981 1 0" 148 | ] 149 | }, 150 | "execution_count": 4, 151 | "metadata": {}, 152 | "output_type": "execute_result" 153 | } 154 | ], 155 | "source": [ 156 | "marketing_data.head()" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": 5, 162 | "id": "b3601b62", 163 | "metadata": {}, 164 | "outputs": [ 165 | { 166 | "data": { 167 | "text/plain": [ 168 | "(2240, 4)" 169 | ] 170 | }, 171 | "execution_count": 5, 172 | "metadata": {}, 173 | "output_type": "execute_result" 174 | } 175 | ], 176 | "source": [ 177 | "marketing_data.shape" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "id": "46f24ec2", 183 | "metadata": {}, 184 | "source": [ 185 | "#### Replace the values in Teenhomes with \"has teen\" and \"has no teen\"" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": 6, 191 | "id": "fe10f0a2", 192 | "metadata": {}, 193 | "outputs": [], 194 | "source": [ 195 | "marketing_data['Teenhome_replaced'] = marketing_data['Teenhome'].replace([0,1,2],['has no teen','has teen','has teen'])" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 7, 201 | "id": "998a9da7", 202 | "metadata": {}, 203 | "outputs": [ 204 | { 205 | "data": { 206 | "text/html": [ 207 | "
" 258 | ], 259 | "text/plain": [ 260 | " Teenhome Teenhome_replaced\n", 261 | "0 0 has no teen\n", 262 | "1 1 has teen\n", 263 | "2 0 has no teen\n", 264 | "3 0 has no teen\n", 265 | "4 0 has no teen" 266 | ] 267 | }, 268 | "execution_count": 7, 269 | "metadata": {}, 270 | "output_type": "execute_result" 271 | } 272 | ], 273 | "source": [ 274 | "marketing_data[['Teenhome','Teenhome_replaced']].head()" 275 | ] 276 | } 277 | ], 278 | "metadata": { 279 | "kernelspec": { 280 | "display_name": "Python 3", 281 | "language": "python", 282 | "name": "python3" 283 | }, 284 | "language_info": { 285 | "codemirror_mode": { 286 | "name": "ipython", 287 | "version": 3 288 | }, 289 | "file_extension": ".py", 290 | "mimetype": "text/x-python", 291 | "name": "python", 292 | "nbconvert_exporter": "python", 293 | "pygments_lexer": "ipython3", 294 | "version": "3.7.1" 295 | } 296 | }, 297 | "nbformat": 4, 298 | "nbformat_minor": 5 299 | } 300 | -------------------------------------------------------------------------------- /Ch2/.ipynb_checkpoints/9. 
Changing Data Format-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "id": "a2050fe1", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Load dataset" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 2, 33 | "id": "3a02fac1", 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "id": "e81aa501", 43 | "metadata": {}, 44 | "source": [ 45 | "#### Subset for relevant columns" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "id": "9279d462", 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "marketing_data = marketing_data[['ID', 'Year_Birth','Marital_Status','Income']]" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "id": "128c9ecd", 61 | "metadata": {}, 62 | "source": [ 63 | "#### Inspect first 5 rows and data types of the dataset" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 4, 69 | "id": "1f556097", 70 | "metadata": {}, 71 | "outputs": [ 72 | { 73 | "data": { 74 | "text/html": [ 75 | "
" 138 | ], 139 | "text/plain": [ 140 | " ID Year_Birth Marital_Status Income\n", 141 | "0 5524 1957 Single 58138.0\n", 142 | "1 2174 1954 Single 46344.0\n", 143 | "2 4141 1965 Together 71613.0\n", 144 | "3 6182 1984 Together 26646.0\n", 145 | "4 5324 1981 Married 58293.0" 146 | ] 147 | }, 148 | "execution_count": 4, 149 | "metadata": {}, 150 | "output_type": "execute_result" 151 | } 152 | ], 153 | "source": [ 154 | "marketing_data.head()" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 5, 160 | "id": "b3601b62", 161 | "metadata": {}, 162 | "outputs": [ 163 | { 164 | "data": { 165 | "text/plain": [ 166 | "(2240, 4)" 167 | ] 168 | }, 169 | "execution_count": 5, 170 | "metadata": {}, 171 | "output_type": "execute_result" 172 | } 173 | ], 174 | "source": [ 175 | "marketing_data.shape" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "id": "35c8704d", 181 | "metadata": {}, 182 | "source": [ 183 | "#### Fill NAs in the income column" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 6, 189 | "id": "9e28186b", 190 | "metadata": {}, 191 | "outputs": [], 192 | "source": [ 193 | "marketing_data['Income'] = marketing_data['Income'].fillna(0)" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "id": "46f24ec2", 199 | "metadata": {}, 200 | "source": [ 201 | "#### Change the data type of the Income from float to int" 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": 7, 207 | "id": "fe10f0a2", 208 | "metadata": {}, 209 | "outputs": [], 210 | "source": [ 211 | "marketing_data['Income_changed'] = marketing_data['Income'].astype(int)" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 8, 217 | "id": "5b38b1b8", 218 | "metadata": {}, 219 | "outputs": [ 220 | { 221 | "data": { 222 | "text/html": [ 223 | "
" 274 | ], 275 | "text/plain": [ 276 | " Income Income_changed\n", 277 | "0 58138.0 58138\n", 278 | "1 46344.0 46344\n", 279 | "2 71613.0 71613\n", 280 | "3 26646.0 26646\n", 281 | "4 58293.0 58293" 282 | ] 283 | }, 284 | "execution_count": 8, 285 | "metadata": {}, 286 | "output_type": "execute_result" 287 | } 288 | ], 289 | "source": [ 290 | "marketing_data[['Income','Income_changed']].head()" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": 9, 296 | "id": "998a9da7", 297 | "metadata": {}, 298 | "outputs": [ 299 | { 300 | "data": { 301 | "text/plain": [ 302 | "Income float64\n", 303 | "Income_changed int32\n", 304 | "dtype: object" 305 | ] 306 | }, 307 | "execution_count": 9, 308 | "metadata": {}, 309 | "output_type": "execute_result" 310 | } 311 | ], 312 | "source": [ 313 | "marketing_data[['Income','Income_changed']].dtypes" 314 | ] 315 | } 316 | ], 317 | "metadata": { 318 | "kernelspec": { 319 | "display_name": "Python 3", 320 | "language": "python", 321 | "name": "python3" 322 | }, 323 | "language_info": { 324 | "codemirror_mode": { 325 | "name": "ipython", 326 | "version": 3 327 | }, 328 | "file_extension": ".py", 329 | "mimetype": "text/x-python", 330 | "name": "python", 331 | "nbconvert_exporter": "python", 332 | "pygments_lexer": "ipython3", 333 | "version": "3.7.1" 334 | } 335 | }, 336 | "nbformat": 4, 337 | "nbformat_minor": 5 338 | } 339 | -------------------------------------------------------------------------------- /Ch2/.ipynb_checkpoints/Grouping Data-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd" 20 | ] 21 | }, 22 
| { 23 | "cell_type": "markdown", 24 | "id": "a2050fe1", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Load dataset" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 2, 33 | "id": "3a02fac1", 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "id": "e81aa501", 43 | "metadata": {}, 44 | "source": [ 45 | "#### Subset for relevant columns" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "id": "9279d462", 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Education', 'Marital_Status',\n", 56 | " 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer', \n", 57 | " 'Recency','NumStorePurchases', 'NumWebVisitsMonth']]" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "id": "128c9ecd", 63 | "metadata": {}, 64 | "source": [ 65 | "#### Inspect first 2 rows and data types of the dataset" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 9, 71 | "id": "1f556097", 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/html": [ 77 | "
" 158 | ], 159 | "text/plain": [ 160 | " 0 1\n", 161 | "ID 5524 2174\n", 162 | "Year_Birth 1957 1954\n", 163 | "Education Graduation Graduation\n", 164 | "Marital_Status Single Single\n", 165 | "Income 58138.0 46344.0\n", 166 | "Kidhome 0 1\n", 167 | "Teenhome 0 1\n", 168 | "Dt_Customer 04-09-2012 08-03-2014\n", 169 | "Recency 58 38\n", 170 | "NumStorePurchases 4 2\n", 171 | "NumWebVisitsMonth 7 5" 172 | ] 173 | }, 174 | "execution_count": 9, 175 | "metadata": {}, 176 | "output_type": "execute_result" 177 | } 178 | ], 179 | "source": [ 180 | "marketing_data.head(2).T" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 10, 186 | "id": "20f83686", 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "marketing_data.head(2).T.to_clipboard()" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": 5, 196 | "id": "b3601b62", 197 | "metadata": {}, 198 | "outputs": [ 199 | { 200 | "data": { 201 | "text/plain": [ 202 | "ID int64\n", 203 | "Year_Birth int64\n", 204 | "Education object\n", 205 | "Marital_Status object\n", 206 | "Income float64\n", 207 | "Kidhome int64\n", 208 | "Teenhome int64\n", 209 | "Dt_Customer object\n", 210 | "Recency int64\n", 211 | "NumStorePurchases int64\n", 212 | "NumWebVisitsMonth int64\n", 213 | "dtype: object" 214 | ] 215 | }, 216 | "execution_count": 5, 217 | "metadata": {}, 218 | "output_type": "execute_result" 219 | } 220 | ], 221 | "source": [ 222 | "marketing_data.dtypes" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 11, 228 | "id": "4082ff2c", 229 | "metadata": {}, 230 | "outputs": [], 231 | "source": [ 232 | "marketing_data.dtypes.to_clipboard()" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 12, 238 | "id": "bdf17c4b", 239 | "metadata": {}, 240 | "outputs": [ 241 | { 242 | "data": { 243 | "text/plain": [ 244 | "(2240, 11)" 245 | ] 246 | }, 247 | "execution_count": 12, 248 | "metadata": {}, 249 | "output_type": 
"execute_result" 250 | } 251 | ], 252 | "source": [ 253 | "marketing_data.shape" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "id": "46f24ec2", 259 | "metadata": {}, 260 | "source": [ 261 | "#### Check the average number of store purchases of customers based on number of kids in the home" 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": 6, 267 | "id": "fe10f0a2", 268 | "metadata": {}, 269 | "outputs": [ 270 | { 271 | "data": { 272 | "text/plain": [ 273 | "Kidhome\n", 274 | "0 7.217324\n", 275 | "1 3.863181\n", 276 | "2 3.437500\n", 277 | "Name: NumStorePurchases, dtype: float64" 278 | ] 279 | }, 280 | "execution_count": 6, 281 | "metadata": {}, 282 | "output_type": "execute_result" 283 | } 284 | ], 285 | "source": [ 286 | "marketing_data.groupby('Kidhome')['NumStorePurchases'].mean()" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": 14, 292 | "id": "2c5c4c2f", 293 | "metadata": {}, 294 | "outputs": [], 295 | "source": [ 296 | "marketing_data.groupby('Kidhome')['NumStorePurchases'].mean().to_clipboard()" 297 | ] 298 | } 299 | ], 300 | "metadata": { 301 | "kernelspec": { 302 | "display_name": "Python 3", 303 | "language": "python", 304 | "name": "python3" 305 | }, 306 | "language_info": { 307 | "codemirror_mode": { 308 | "name": "ipython", 309 | "version": 3 310 | }, 311 | "file_extension": ".py", 312 | "mimetype": "text/x-python", 313 | "name": "python", 314 | "nbconvert_exporter": "python", 315 | "pygments_lexer": "ipython3", 316 | "version": "3.7.1" 317 | } 318 | }, 319 | "nbformat": 4, 320 | "nbformat_minor": 5 321 | } 322 | -------------------------------------------------------------------------------- /Ch2/.ipynb_checkpoints/Replacing Data-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | 
] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "id": "a2050fe1", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Load dataset" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 29, 33 | "id": "3a02fac1", 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "id": "e81aa501", 43 | "metadata": {}, 44 | "source": [ 45 | "#### Subset for relevant columns" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 30, 51 | "id": "9279d462", 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Kidhome', 'Teenhome']]" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "id": "128c9ecd", 61 | "metadata": {}, 62 | "source": [ 63 | "#### Inspect first 5 rows and data types of the dataset" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 31, 69 | "id": "1f556097", 70 | "metadata": { 71 | "scrolled": true 72 | }, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/html": [ 77 | "
\n", 78 | "\n", 91 | "\n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | "
     ID  Year_Birth  Kidhome  Teenhome
0  5524        1957        0         0
1  2174        1954        1         1
2  4141        1965        0         0
3  6182        1984        1         0
4  5324        1981        1         0
\n", 139 | "
" 140 | ], 141 | "text/plain": [ 142 | " ID Year_Birth Kidhome Teenhome\n", 143 | "0 5524 1957 0 0\n", 144 | "1 2174 1954 1 1\n", 145 | "2 4141 1965 0 0\n", 146 | "3 6182 1984 1 0\n", 147 | "4 5324 1981 1 0" 148 | ] 149 | }, 150 | "execution_count": 31, 151 | "metadata": {}, 152 | "output_type": "execute_result" 153 | } 154 | ], 155 | "source": [ 156 | "marketing_data.head()" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": 32, 162 | "id": "954b5d8d", 163 | "metadata": {}, 164 | "outputs": [], 165 | "source": [ 166 | "marketing_data.head().to_clipboard()" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": 33, 172 | "id": "b3601b62", 173 | "metadata": {}, 174 | "outputs": [ 175 | { 176 | "data": { 177 | "text/plain": [ 178 | "(2240, 4)" 179 | ] 180 | }, 181 | "execution_count": 33, 182 | "metadata": {}, 183 | "output_type": "execute_result" 184 | } 185 | ], 186 | "source": [ 187 | "marketing_data.shape" 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "id": "46f24ec2", 193 | "metadata": {}, 194 | "source": [ 195 | "#### Replace the values in Teenhomes with \"has teen\" and \"has no teen\"" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 21, 201 | "id": "fe10f0a2", 202 | "metadata": {}, 203 | "outputs": [], 204 | "source": [ 205 | "marketing_data['Teenhome_replaced'] = marketing_data['Teenhome'].replace([0,1,2],['has no teen','has teen','has teen'])" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": 22, 211 | "id": "998a9da7", 212 | "metadata": {}, 213 | "outputs": [ 214 | { 215 | "data": { 216 | "text/html": [ 217 | "
\n", 218 | "\n", 231 | "\n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | "
   Teenhome Teenhome_replaced
0         0       has no teen
1         1          has teen
2         0       has no teen
3         0       has no teen
4         0       has no teen
\n", 267 | "
" 268 | ], 269 | "text/plain": [ 270 | " Teenhome Teenhome_replaced\n", 271 | "0 0 has no teen\n", 272 | "1 1 has teen\n", 273 | "2 0 has no teen\n", 274 | "3 0 has no teen\n", 275 | "4 0 has no teen" 276 | ] 277 | }, 278 | "execution_count": 22, 279 | "metadata": {}, 280 | "output_type": "execute_result" 281 | } 282 | ], 283 | "source": [ 284 | "marketing_data[['Teenhome','Teenhome_replaced']].head()" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 25, 290 | "id": "393c92d6", 291 | "metadata": {}, 292 | "outputs": [], 293 | "source": [ 294 | "marketing_data[['Teenhome','Teenhome_replaced']].head().to_clipboard()" 295 | ] 296 | } 297 | ], 298 | "metadata": { 299 | "kernelspec": { 300 | "display_name": "Python 3", 301 | "language": "python", 302 | "name": "python3" 303 | }, 304 | "language_info": { 305 | "codemirror_mode": { 306 | "name": "ipython", 307 | "version": 3 308 | }, 309 | "file_extension": ".py", 310 | "mimetype": "text/x-python", 311 | "name": "python", 312 | "nbconvert_exporter": "python", 313 | "pygments_lexer": "ipython3", 314 | "version": "3.7.1" 315 | } 316 | }, 317 | "nbformat": 4, 318 | "nbformat_minor": 5 319 | } 320 | -------------------------------------------------------------------------------- /Ch2/.ipynb_checkpoints/Sorting Data-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 2, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "id": "a2050fe1", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Load dataset" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 3, 33 | "id": "3a02fac1", 
34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "id": "e81aa501", 43 | "metadata": {}, 44 | "source": [ 45 | "#### Subset for relevant columns" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 4, 51 | "id": "9279d462", 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Education', 'Marital_Status',\n", 56 | " 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer', \n", 57 | " 'Recency','NumStorePurchases', 'NumWebVisitsMonth']]" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "id": "128c9ecd", 63 | "metadata": {}, 64 | "source": [ 65 | "#### Inspect first 5 rows and data types of the dataset" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 6, 71 | "id": "1f556097", 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/html": [ 77 | "
\n", 78 | "\n", 91 | "\n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | "
     ID  Year_Birth   Education  Marital_Status   Income  Kidhome  Teenhome  Dt_Customer  Recency  NumStorePurchases  NumWebVisitsMonth
0  5524        1957  Graduation          Single  58138.0        0         0   04-09-2012       58                  4                  7
1  2174        1954  Graduation          Single  46344.0        1         1   08-03-2014       38                  2                  5
2  4141        1965  Graduation        Together  71613.0        0         0   21-08-2013       26                 10                  4
3  6182        1984  Graduation        Together  26646.0        1         0   10-02-2014       26                  4                  6
4  5324        1981         PhD         Married  58293.0        1         0   19-01-2014       94                  6                  5
\n", 181 | "
" 182 | ], 183 | "text/plain": [ 184 | " ID Year_Birth Education Marital_Status Income Kidhome Teenhome \\\n", 185 | "0 5524 1957 Graduation Single 58138.0 0 0 \n", 186 | "1 2174 1954 Graduation Single 46344.0 1 1 \n", 187 | "2 4141 1965 Graduation Together 71613.0 0 0 \n", 188 | "3 6182 1984 Graduation Together 26646.0 1 0 \n", 189 | "4 5324 1981 PhD Married 58293.0 1 0 \n", 190 | "\n", 191 | " Dt_Customer Recency NumStorePurchases NumWebVisitsMonth \n", 192 | "0 04-09-2012 58 4 7 \n", 193 | "1 08-03-2014 38 2 5 \n", 194 | "2 21-08-2013 26 10 4 \n", 195 | "3 10-02-2014 26 4 6 \n", 196 | "4 19-01-2014 94 6 5 " 197 | ] 198 | }, 199 | "execution_count": 6, 200 | "metadata": {}, 201 | "output_type": "execute_result" 202 | } 203 | ], 204 | "source": [ 205 | "marketing_data.head()" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": 7, 211 | "id": "b3601b62", 212 | "metadata": {}, 213 | "outputs": [ 214 | { 215 | "data": { 216 | "text/plain": [ 217 | "ID int64\n", 218 | "Year_Birth int64\n", 219 | "Education object\n", 220 | "Marital_Status object\n", 221 | "Income float64\n", 222 | "Kidhome int64\n", 223 | "Teenhome int64\n", 224 | "Dt_Customer object\n", 225 | "Recency int64\n", 226 | "NumStorePurchases int64\n", 227 | "NumWebVisitsMonth int64\n", 228 | "dtype: object" 229 | ] 230 | }, 231 | "execution_count": 7, 232 | "metadata": {}, 233 | "output_type": "execute_result" 234 | } 235 | ], 236 | "source": [ 237 | "marketing_data.dtypes" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "id": "46f24ec2", 243 | "metadata": {}, 244 | "source": [ 245 | "#### Sort customers based on number of Store Purchases in descending order" 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": 13, 251 | "id": "fe10f0a2", 252 | "metadata": {}, 253 | "outputs": [], 254 | "source": [ 255 | "sorted_data = marketing_data.sort_values('NumStorePurchases', ascending=False)" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | 
"execution_count": 20, 261 | "id": "07e2dab3", 262 | "metadata": {}, 263 | "outputs": [], 264 | "source": [ 265 | "sorted_data[['ID','NumStorePurchases']].tail().to_clipboard()" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": 12, 271 | "id": "021d0a5a", 272 | "metadata": {}, 273 | "outputs": [], 274 | "source": [ 275 | "marketing_data.sort_values('NumStorePurchases', ascending=False).head(2).T.to_clipboard()" 276 | ] 277 | } 278 | ], 279 | "metadata": { 280 | "kernelspec": { 281 | "display_name": "Python 3", 282 | "language": "python", 283 | "name": "python3" 284 | }, 285 | "language_info": { 286 | "codemirror_mode": { 287 | "name": "ipython", 288 | "version": 3 289 | }, 290 | "file_extension": ".py", 291 | "mimetype": "text/x-python", 292 | "name": "python", 293 | "nbconvert_exporter": "python", 294 | "pygments_lexer": "ipython3", 295 | "version": "3.7.1" 296 | } 297 | }, 298 | "nbformat": 4, 299 | "nbformat_minor": 5 300 | } 301 | -------------------------------------------------------------------------------- /Ch2/.ipynb_checkpoints/Untitled-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [], 3 | "metadata": {}, 4 | "nbformat": 4, 5 | "nbformat_minor": 5 6 | } 7 | -------------------------------------------------------------------------------- /Ch2/1. 
Grouping Data.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "id": "a2050fe1", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Load dataset" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 2, 33 | "id": "3a02fac1", 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "id": "e81aa501", 43 | "metadata": {}, 44 | "source": [ 45 | "#### Subset for relevant columns" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "id": "9279d462", 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Education', 'Marital_Status',\n", 56 | " 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer', \n", 57 | " 'Recency','NumStorePurchases', 'NumWebVisitsMonth']]" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "id": "128c9ecd", 63 | "metadata": {}, 64 | "source": [ 65 | "#### Inspect first 2 rows and data types of the dataset" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 4, 71 | "id": "1f556097", 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/html": [ 77 | "
\n", 78 | "\n", 91 | "\n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | "
                            0           1
ID                       5524        2174
Year_Birth               1957        1954
Education          Graduation  Graduation
Marital_Status         Single      Single
Income                58138.0     46344.0
Kidhome                     0           1
Teenhome                    0           1
Dt_Customer        04-09-2012  08-03-2014
Recency                    58          38
NumStorePurchases           4           2
NumWebVisitsMonth           7           5
\n", 157 | "
" 158 | ], 159 | "text/plain": [ 160 | " 0 1\n", 161 | "ID 5524 2174\n", 162 | "Year_Birth 1957 1954\n", 163 | "Education Graduation Graduation\n", 164 | "Marital_Status Single Single\n", 165 | "Income 58138.0 46344.0\n", 166 | "Kidhome 0 1\n", 167 | "Teenhome 0 1\n", 168 | "Dt_Customer 04-09-2012 08-03-2014\n", 169 | "Recency 58 38\n", 170 | "NumStorePurchases 4 2\n", 171 | "NumWebVisitsMonth 7 5" 172 | ] 173 | }, 174 | "execution_count": 4, 175 | "metadata": {}, 176 | "output_type": "execute_result" 177 | } 178 | ], 179 | "source": [ 180 | "marketing_data.head(2).T" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 5, 186 | "id": "b3601b62", 187 | "metadata": {}, 188 | "outputs": [ 189 | { 190 | "data": { 191 | "text/plain": [ 192 | "ID int64\n", 193 | "Year_Birth int64\n", 194 | "Education object\n", 195 | "Marital_Status object\n", 196 | "Income float64\n", 197 | "Kidhome int64\n", 198 | "Teenhome int64\n", 199 | "Dt_Customer object\n", 200 | "Recency int64\n", 201 | "NumStorePurchases int64\n", 202 | "NumWebVisitsMonth int64\n", 203 | "dtype: object" 204 | ] 205 | }, 206 | "execution_count": 5, 207 | "metadata": {}, 208 | "output_type": "execute_result" 209 | } 210 | ], 211 | "source": [ 212 | "marketing_data.dtypes" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 6, 218 | "id": "bdf17c4b", 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "data": { 223 | "text/plain": [ 224 | "(2240, 11)" 225 | ] 226 | }, 227 | "execution_count": 6, 228 | "metadata": {}, 229 | "output_type": "execute_result" 230 | } 231 | ], 232 | "source": [ 233 | "marketing_data.shape" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "id": "46f24ec2", 239 | "metadata": {}, 240 | "source": [ 241 | "#### Check the average number of store purchases of customers based on number of kids in the home" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 7, 247 | "id": "fe10f0a2", 248 | "metadata": {}, 
249 | "outputs": [ 250 | { 251 | "data": { 252 | "text/plain": [ 253 | "Kidhome\n", 254 | "0 7.217324\n", 255 | "1 3.863181\n", 256 | "2 3.437500\n", 257 | "Name: NumStorePurchases, dtype: float64" 258 | ] 259 | }, 260 | "execution_count": 7, 261 | "metadata": {}, 262 | "output_type": "execute_result" 263 | } 264 | ], 265 | "source": [ 266 | "marketing_data.groupby('Kidhome')['NumStorePurchases'].mean()" 267 | ] 268 | } 269 | ], 270 | "metadata": { 271 | "kernelspec": { 272 | "display_name": "Python 3", 273 | "language": "python", 274 | "name": "python3" 275 | }, 276 | "language_info": { 277 | "codemirror_mode": { 278 | "name": "ipython", 279 | "version": 3 280 | }, 281 | "file_extension": ".py", 282 | "mimetype": "text/x-python", 283 | "name": "python", 284 | "nbconvert_exporter": "python", 285 | "pygments_lexer": "ipython3", 286 | "version": "3.7.1" 287 | } 288 | }, 289 | "nbformat": 4, 290 | "nbformat_minor": 5 291 | } 292 | -------------------------------------------------------------------------------- /Ch2/10. 
Replacing Data.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "id": "a2050fe1", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Load dataset" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 2, 33 | "id": "3a02fac1", 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "id": "e81aa501", 43 | "metadata": {}, 44 | "source": [ 45 | "#### Subset for relevant columns" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "id": "9279d462", 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Kidhome', 'Teenhome']]" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "id": "128c9ecd", 61 | "metadata": {}, 62 | "source": [ 63 | "#### Inspect first 5 rows and data types of the dataset" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 4, 69 | "id": "1f556097", 70 | "metadata": { 71 | "scrolled": true 72 | }, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/html": [ 77 | "
\n", 78 | "\n", 91 | "\n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | "
     ID  Year_Birth  Kidhome  Teenhome
0  5524        1957        0         0
1  2174        1954        1         1
2  4141        1965        0         0
3  6182        1984        1         0
4  5324        1981        1         0
\n", 139 | "
" 140 | ], 141 | "text/plain": [ 142 | " ID Year_Birth Kidhome Teenhome\n", 143 | "0 5524 1957 0 0\n", 144 | "1 2174 1954 1 1\n", 145 | "2 4141 1965 0 0\n", 146 | "3 6182 1984 1 0\n", 147 | "4 5324 1981 1 0" 148 | ] 149 | }, 150 | "execution_count": 4, 151 | "metadata": {}, 152 | "output_type": "execute_result" 153 | } 154 | ], 155 | "source": [ 156 | "marketing_data.head()" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": 5, 162 | "id": "b3601b62", 163 | "metadata": {}, 164 | "outputs": [ 165 | { 166 | "data": { 167 | "text/plain": [ 168 | "(2240, 4)" 169 | ] 170 | }, 171 | "execution_count": 5, 172 | "metadata": {}, 173 | "output_type": "execute_result" 174 | } 175 | ], 176 | "source": [ 177 | "marketing_data.shape" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "id": "46f24ec2", 183 | "metadata": {}, 184 | "source": [ 185 | "#### Replace the values in Teenhomes with \"has teen\" and \"has no teen\"" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": 6, 191 | "id": "fe10f0a2", 192 | "metadata": {}, 193 | "outputs": [], 194 | "source": [ 195 | "marketing_data['Teenhome_replaced'] = marketing_data['Teenhome'].replace([0,1,2],['has no teen','has teen','has teen'])" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 7, 201 | "id": "998a9da7", 202 | "metadata": {}, 203 | "outputs": [ 204 | { 205 | "data": { 206 | "text/html": [ 207 | "
\n", 208 | "\n", 221 | "\n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | "
   Teenhome Teenhome_replaced
0         0       has no teen
1         1          has teen
2         0       has no teen
3         0       has no teen
4         0       has no teen
\n", 257 | "
" 258 | ], 259 | "text/plain": [ 260 | " Teenhome Teenhome_replaced\n", 261 | "0 0 has no teen\n", 262 | "1 1 has teen\n", 263 | "2 0 has no teen\n", 264 | "3 0 has no teen\n", 265 | "4 0 has no teen" 266 | ] 267 | }, 268 | "execution_count": 7, 269 | "metadata": {}, 270 | "output_type": "execute_result" 271 | } 272 | ], 273 | "source": [ 274 | "marketing_data[['Teenhome','Teenhome_replaced']].head()" 275 | ] 276 | } 277 | ], 278 | "metadata": { 279 | "kernelspec": { 280 | "display_name": "Python 3", 281 | "language": "python", 282 | "name": "python3" 283 | }, 284 | "language_info": { 285 | "codemirror_mode": { 286 | "name": "ipython", 287 | "version": 3 288 | }, 289 | "file_extension": ".py", 290 | "mimetype": "text/x-python", 291 | "name": "python", 292 | "nbconvert_exporter": "python", 293 | "pygments_lexer": "ipython3", 294 | "version": "3.7.1" 295 | } 296 | }, 297 | "nbformat": 4, 298 | "nbformat_minor": 5 299 | } 300 | -------------------------------------------------------------------------------- /Ch2/9. 
Changing Data Format.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "id": "a2050fe1", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Load dataset" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 2, 33 | "id": "3a02fac1", 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "id": "e81aa501", 43 | "metadata": {}, 44 | "source": [ 45 | "#### Subset for relevant columns" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "id": "9279d462", 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "marketing_data = marketing_data[['ID', 'Year_Birth','Marital_Status','Income']]" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "id": "128c9ecd", 61 | "metadata": {}, 62 | "source": [ 63 | "#### Inspect first 5 rows and data types of the dataset" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 4, 69 | "id": "1f556097", 70 | "metadata": {}, 71 | "outputs": [ 72 | { 73 | "data": { 74 | "text/html": [ 75 | "
\n", 76 | "\n", 89 | "\n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | "
     ID  Year_Birth Marital_Status   Income
0  5524        1957         Single  58138.0
1  2174        1954         Single  46344.0
2  4141        1965       Together  71613.0
3  6182        1984       Together  26646.0
4  5324        1981        Married  58293.0
\n", 137 | "
" 138 | ], 139 | "text/plain": [ 140 | " ID Year_Birth Marital_Status Income\n", 141 | "0 5524 1957 Single 58138.0\n", 142 | "1 2174 1954 Single 46344.0\n", 143 | "2 4141 1965 Together 71613.0\n", 144 | "3 6182 1984 Together 26646.0\n", 145 | "4 5324 1981 Married 58293.0" 146 | ] 147 | }, 148 | "execution_count": 4, 149 | "metadata": {}, 150 | "output_type": "execute_result" 151 | } 152 | ], 153 | "source": [ 154 | "marketing_data.head()" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 5, 160 | "id": "b3601b62", 161 | "metadata": {}, 162 | "outputs": [ 163 | { 164 | "data": { 165 | "text/plain": [ 166 | "(2240, 4)" 167 | ] 168 | }, 169 | "execution_count": 5, 170 | "metadata": {}, 171 | "output_type": "execute_result" 172 | } 173 | ], 174 | "source": [ 175 | "marketing_data.shape" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "id": "35c8704d", 181 | "metadata": {}, 182 | "source": [ 183 | "#### Fill NAs in the income column" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 6, 189 | "id": "9e28186b", 190 | "metadata": {}, 191 | "outputs": [], 192 | "source": [ 193 | "marketing_data['Income'] = marketing_data['Income'].fillna(0)" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "id": "46f24ec2", 199 | "metadata": {}, 200 | "source": [ 201 | "#### Change the data type of the Income from float to int" 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": 7, 207 | "id": "fe10f0a2", 208 | "metadata": {}, 209 | "outputs": [], 210 | "source": [ 211 | "marketing_data['Income_changed'] = marketing_data['Income'].astype(int)" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 8, 217 | "id": "5b38b1b8", 218 | "metadata": {}, 219 | "outputs": [ 220 | { 221 | "data": { 222 | "text/html": [ 223 | "
\n", 224 | "\n", 237 | "\n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | "
    Income  Income_changed
0  58138.0           58138
1  46344.0           46344
2  71613.0           71613
3  26646.0           26646
4  58293.0           58293
\n", 273 | "
" 274 | ], 275 | "text/plain": [ 276 | " Income Income_changed\n", 277 | "0 58138.0 58138\n", 278 | "1 46344.0 46344\n", 279 | "2 71613.0 71613\n", 280 | "3 26646.0 26646\n", 281 | "4 58293.0 58293" 282 | ] 283 | }, 284 | "execution_count": 8, 285 | "metadata": {}, 286 | "output_type": "execute_result" 287 | } 288 | ], 289 | "source": [ 290 | "marketing_data[['Income','Income_changed']].head()" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": 9, 296 | "id": "998a9da7", 297 | "metadata": {}, 298 | "outputs": [ 299 | { 300 | "data": { 301 | "text/plain": [ 302 | "Income float64\n", 303 | "Income_changed int32\n", 304 | "dtype: object" 305 | ] 306 | }, 307 | "execution_count": 9, 308 | "metadata": {}, 309 | "output_type": "execute_result" 310 | } 311 | ], 312 | "source": [ 313 | "marketing_data[['Income','Income_changed']].dtypes" 314 | ] 315 | } 316 | ], 317 | "metadata": { 318 | "kernelspec": { 319 | "display_name": "Python 3", 320 | "language": "python", 321 | "name": "python3" 322 | }, 323 | "language_info": { 324 | "codemirror_mode": { 325 | "name": "ipython", 326 | "version": 3 327 | }, 328 | "file_extension": ".py", 329 | "mimetype": "text/x-python", 330 | "name": "python", 331 | "nbconvert_exporter": "python", 332 | "pygments_lexer": "ipython3", 333 | "version": "3.7.1" 334 | } 335 | }, 336 | "nbformat": 4, 337 | "nbformat_minor": 5 338 | } 339 | -------------------------------------------------------------------------------- /Ch3/.ipynb_checkpoints/1. 
Grouping Data-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "id": "a2050fe1", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Load dataset" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 2, 33 | "id": "3a02fac1", 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "id": "e81aa501", 43 | "metadata": {}, 44 | "source": [ 45 | "#### Subset for relevant columns" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "id": "9279d462", 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Education', 'Marital_Status',\n", 56 | " 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer', \n", 57 | " 'Recency','NumStorePurchases', 'NumWebVisitsMonth']]" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "id": "128c9ecd", 63 | "metadata": {}, 64 | "source": [ 65 | "#### Inspect first 2 rows and data types of the dataset" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 4, 71 | "id": "1f556097", 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/html": [ 77 | "
\n", 78 | "\n", 91 | "\n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | "
                            0           1
ID                       5524        2174
Year_Birth               1957        1954
Education          Graduation  Graduation
Marital_Status         Single      Single
Income                58138.0     46344.0
Kidhome                     0           1
Teenhome                    0           1
Dt_Customer        04-09-2012  08-03-2014
Recency                    58          38
NumStorePurchases           4           2
NumWebVisitsMonth           7           5
\n", 157 | "
" 158 | ], 159 | "text/plain": [ 160 | " 0 1\n", 161 | "ID 5524 2174\n", 162 | "Year_Birth 1957 1954\n", 163 | "Education Graduation Graduation\n", 164 | "Marital_Status Single Single\n", 165 | "Income 58138.0 46344.0\n", 166 | "Kidhome 0 1\n", 167 | "Teenhome 0 1\n", 168 | "Dt_Customer 04-09-2012 08-03-2014\n", 169 | "Recency 58 38\n", 170 | "NumStorePurchases 4 2\n", 171 | "NumWebVisitsMonth 7 5" 172 | ] 173 | }, 174 | "execution_count": 4, 175 | "metadata": {}, 176 | "output_type": "execute_result" 177 | } 178 | ], 179 | "source": [ 180 | "marketing_data.head(2).T" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 5, 186 | "id": "b3601b62", 187 | "metadata": {}, 188 | "outputs": [ 189 | { 190 | "data": { 191 | "text/plain": [ 192 | "ID int64\n", 193 | "Year_Birth int64\n", 194 | "Education object\n", 195 | "Marital_Status object\n", 196 | "Income float64\n", 197 | "Kidhome int64\n", 198 | "Teenhome int64\n", 199 | "Dt_Customer object\n", 200 | "Recency int64\n", 201 | "NumStorePurchases int64\n", 202 | "NumWebVisitsMonth int64\n", 203 | "dtype: object" 204 | ] 205 | }, 206 | "execution_count": 5, 207 | "metadata": {}, 208 | "output_type": "execute_result" 209 | } 210 | ], 211 | "source": [ 212 | "marketing_data.dtypes" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 6, 218 | "id": "bdf17c4b", 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "data": { 223 | "text/plain": [ 224 | "(2240, 11)" 225 | ] 226 | }, 227 | "execution_count": 6, 228 | "metadata": {}, 229 | "output_type": "execute_result" 230 | } 231 | ], 232 | "source": [ 233 | "marketing_data.shape" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "id": "46f24ec2", 239 | "metadata": {}, 240 | "source": [ 241 | "#### Check the average number of store purchases of customers based on number of kids in the home" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 7, 247 | "id": "fe10f0a2", 248 | "metadata": {}, 
249 | "outputs": [ 250 | { 251 | "data": { 252 | "text/plain": [ 253 | "Kidhome\n", 254 | "0 7.217324\n", 255 | "1 3.863181\n", 256 | "2 3.437500\n", 257 | "Name: NumStorePurchases, dtype: float64" 258 | ] 259 | }, 260 | "execution_count": 7, 261 | "metadata": {}, 262 | "output_type": "execute_result" 263 | } 264 | ], 265 | "source": [ 266 | "marketing_data.groupby('Kidhome')['NumStorePurchases'].mean()" 267 | ] 268 | } 269 | ], 270 | "metadata": { 271 | "kernelspec": { 272 | "display_name": "Python 3", 273 | "language": "python", 274 | "name": "python3" 275 | }, 276 | "language_info": { 277 | "codemirror_mode": { 278 | "name": "ipython", 279 | "version": 3 280 | }, 281 | "file_extension": ".py", 282 | "mimetype": "text/x-python", 283 | "name": "python", 284 | "nbconvert_exporter": "python", 285 | "pygments_lexer": "ipython3", 286 | "version": "3.7.1" 287 | } 288 | }, 289 | "nbformat": 4, 290 | "nbformat_minor": 5 291 | } 292 | -------------------------------------------------------------------------------- /Ch3/.ipynb_checkpoints/10. 
Replacing Data-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "id": "a2050fe1", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Load dataset" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 2, 33 | "id": "3a02fac1", 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "id": "e81aa501", 43 | "metadata": {}, 44 | "source": [ 45 | "#### Subset for relevant columns" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "id": "9279d462", 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Kidhome', 'Teenhome']]" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "id": "128c9ecd", 61 | "metadata": {}, 62 | "source": [ 63 | "#### Inspect first 5 rows and data types of the dataset" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 4, 69 | "id": "1f556097", 70 | "metadata": { 71 | "scrolled": true 72 | }, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/html": [ 77 | "
\n", 78 | "\n", 91 | "\n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | "
     ID  Year_Birth  Kidhome  Teenhome
0  5524        1957        0         0
1  2174        1954        1         1
2  4141        1965        0         0
3  6182        1984        1         0
4  5324        1981        1         0
\n", 139 | "
" 140 | ], 141 | "text/plain": [ 142 | " ID Year_Birth Kidhome Teenhome\n", 143 | "0 5524 1957 0 0\n", 144 | "1 2174 1954 1 1\n", 145 | "2 4141 1965 0 0\n", 146 | "3 6182 1984 1 0\n", 147 | "4 5324 1981 1 0" 148 | ] 149 | }, 150 | "execution_count": 4, 151 | "metadata": {}, 152 | "output_type": "execute_result" 153 | } 154 | ], 155 | "source": [ 156 | "marketing_data.head()" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": 5, 162 | "id": "b3601b62", 163 | "metadata": {}, 164 | "outputs": [ 165 | { 166 | "data": { 167 | "text/plain": [ 168 | "(2240, 4)" 169 | ] 170 | }, 171 | "execution_count": 5, 172 | "metadata": {}, 173 | "output_type": "execute_result" 174 | } 175 | ], 176 | "source": [ 177 | "marketing_data.shape" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "id": "46f24ec2", 183 | "metadata": {}, 184 | "source": [ 185 | "#### Replace the values in the Teenhome column with \"has teen\" and \"has no teen\"" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": 6, 191 | "id": "fe10f0a2", 192 | "metadata": {}, 193 | "outputs": [], 194 | "source": [ 195 | "marketing_data['Teenhome_replaced'] = marketing_data['Teenhome'].replace([0,1,2],['has no teen','has teen','has teen'])" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 7, 201 | "id": "998a9da7", 202 | "metadata": {}, 203 | "outputs": [ 204 | { 205 | "data": { 206 | "text/html": [ 207 | "
\n", 208 | "\n", 221 | "\n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | "
   Teenhome Teenhome_replaced
0         0       has no teen
1         1          has teen
2         0       has no teen
3         0       has no teen
4         0       has no teen
\n", 257 | "
" 258 | ], 259 | "text/plain": [ 260 | " Teenhome Teenhome_replaced\n", 261 | "0 0 has no teen\n", 262 | "1 1 has teen\n", 263 | "2 0 has no teen\n", 264 | "3 0 has no teen\n", 265 | "4 0 has no teen" 266 | ] 267 | }, 268 | "execution_count": 7, 269 | "metadata": {}, 270 | "output_type": "execute_result" 271 | } 272 | ], 273 | "source": [ 274 | "marketing_data[['Teenhome','Teenhome_replaced']].head()" 275 | ] 276 | } 277 | ], 278 | "metadata": { 279 | "kernelspec": { 280 | "display_name": "Python 3", 281 | "language": "python", 282 | "name": "python3" 283 | }, 284 | "language_info": { 285 | "codemirror_mode": { 286 | "name": "ipython", 287 | "version": 3 288 | }, 289 | "file_extension": ".py", 290 | "mimetype": "text/x-python", 291 | "name": "python", 292 | "nbconvert_exporter": "python", 293 | "pygments_lexer": "ipython3", 294 | "version": "3.7.1" 295 | } 296 | }, 297 | "nbformat": 4, 298 | "nbformat_minor": 5 299 | } 300 | -------------------------------------------------------------------------------- /Ch3/.ipynb_checkpoints/9. 
Changing Data Format-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "id": "a2050fe1", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Load dataset" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 2, 33 | "id": "3a02fac1", 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "id": "e81aa501", 43 | "metadata": {}, 44 | "source": [ 45 | "#### Subset for relevant columns" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "id": "9279d462", 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "marketing_data = marketing_data[['ID', 'Year_Birth','Marital_Status','Income']]" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "id": "128c9ecd", 61 | "metadata": {}, 62 | "source": [ 63 | "#### Inspect first 5 rows and data types of the dataset" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 4, 69 | "id": "1f556097", 70 | "metadata": {}, 71 | "outputs": [ 72 | { 73 | "data": { 74 | "text/html": [ 75 | "
\n", 76 | "\n", 89 | "\n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | "
     ID  Year_Birth Marital_Status   Income
0  5524        1957         Single  58138.0
1  2174        1954         Single  46344.0
2  4141        1965       Together  71613.0
3  6182        1984       Together  26646.0
4  5324        1981        Married  58293.0
\n", 137 | "
" 138 | ], 139 | "text/plain": [ 140 | " ID Year_Birth Marital_Status Income\n", 141 | "0 5524 1957 Single 58138.0\n", 142 | "1 2174 1954 Single 46344.0\n", 143 | "2 4141 1965 Together 71613.0\n", 144 | "3 6182 1984 Together 26646.0\n", 145 | "4 5324 1981 Married 58293.0" 146 | ] 147 | }, 148 | "execution_count": 4, 149 | "metadata": {}, 150 | "output_type": "execute_result" 151 | } 152 | ], 153 | "source": [ 154 | "marketing_data.head()" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 5, 160 | "id": "b3601b62", 161 | "metadata": {}, 162 | "outputs": [ 163 | { 164 | "data": { 165 | "text/plain": [ 166 | "(2240, 4)" 167 | ] 168 | }, 169 | "execution_count": 5, 170 | "metadata": {}, 171 | "output_type": "execute_result" 172 | } 173 | ], 174 | "source": [ 175 | "marketing_data.shape" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "id": "35c8704d", 181 | "metadata": {}, 182 | "source": [ 183 | "#### Fill NAs in the income column" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 6, 189 | "id": "9e28186b", 190 | "metadata": {}, 191 | "outputs": [], 192 | "source": [ 193 | "marketing_data['Income'] = marketing_data['Income'].fillna(0)" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "id": "46f24ec2", 199 | "metadata": {}, 200 | "source": [ 201 | "#### Change the data type of the Income from float to int" 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": 7, 207 | "id": "fe10f0a2", 208 | "metadata": {}, 209 | "outputs": [], 210 | "source": [ 211 | "marketing_data['Income_changed'] = marketing_data['Income'].astype(int)" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 8, 217 | "id": "5b38b1b8", 218 | "metadata": {}, 219 | "outputs": [ 220 | { 221 | "data": { 222 | "text/html": [ 223 | "
\n", 224 | "\n", 237 | "\n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | "
    Income  Income_changed
0  58138.0           58138
1  46344.0           46344
2  71613.0           71613
3  26646.0           26646
4  58293.0           58293
\n", 273 | "
" 274 | ], 275 | "text/plain": [ 276 | " Income Income_changed\n", 277 | "0 58138.0 58138\n", 278 | "1 46344.0 46344\n", 279 | "2 71613.0 71613\n", 280 | "3 26646.0 26646\n", 281 | "4 58293.0 58293" 282 | ] 283 | }, 284 | "execution_count": 8, 285 | "metadata": {}, 286 | "output_type": "execute_result" 287 | } 288 | ], 289 | "source": [ 290 | "marketing_data[['Income','Income_changed']].head()" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": 9, 296 | "id": "998a9da7", 297 | "metadata": {}, 298 | "outputs": [ 299 | { 300 | "data": { 301 | "text/plain": [ 302 | "Income float64\n", 303 | "Income_changed int32\n", 304 | "dtype: object" 305 | ] 306 | }, 307 | "execution_count": 9, 308 | "metadata": {}, 309 | "output_type": "execute_result" 310 | } 311 | ], 312 | "source": [ 313 | "marketing_data[['Income','Income_changed']].dtypes" 314 | ] 315 | } 316 | ], 317 | "metadata": { 318 | "kernelspec": { 319 | "display_name": "Python 3", 320 | "language": "python", 321 | "name": "python3" 322 | }, 323 | "language_info": { 324 | "codemirror_mode": { 325 | "name": "ipython", 326 | "version": 3 327 | }, 328 | "file_extension": ".py", 329 | "mimetype": "text/x-python", 330 | "name": "python", 331 | "nbconvert_exporter": "python", 332 | "pygments_lexer": "ipython3", 333 | "version": "3.7.1" 334 | } 335 | }, 336 | "nbformat": 4, 337 | "nbformat_minor": 5 338 | } 339 | -------------------------------------------------------------------------------- /Ch3/.ipynb_checkpoints/Grouping Data-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd" 20 | ] 21 | }, 22 
| { 23 | "cell_type": "markdown", 24 | "id": "a2050fe1", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Load dataset" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 2, 33 | "id": "3a02fac1", 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "id": "e81aa501", 43 | "metadata": {}, 44 | "source": [ 45 | "#### Subset for relevant columns" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "id": "9279d462", 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Education', 'Marital_Status',\n", 56 | " 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer', \n", 57 | " 'Recency','NumStorePurchases', 'NumWebVisitsMonth']]" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "id": "128c9ecd", 63 | "metadata": {}, 64 | "source": [ 65 | "#### Inspect first 2 rows and data types of the dataset" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 9, 71 | "id": "1f556097", 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/html": [ 77 | "
\n", 78 | "\n", 91 | "\n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | "
                            0           1
ID                       5524        2174
Year_Birth               1957        1954
Education          Graduation  Graduation
Marital_Status         Single      Single
Income                58138.0     46344.0
Kidhome                     0           1
Teenhome                    0           1
Dt_Customer        04-09-2012  08-03-2014
Recency                    58          38
NumStorePurchases           4           2
NumWebVisitsMonth           7           5
\n", 157 | "
" 158 | ], 159 | "text/plain": [ 160 | " 0 1\n", 161 | "ID 5524 2174\n", 162 | "Year_Birth 1957 1954\n", 163 | "Education Graduation Graduation\n", 164 | "Marital_Status Single Single\n", 165 | "Income 58138.0 46344.0\n", 166 | "Kidhome 0 1\n", 167 | "Teenhome 0 1\n", 168 | "Dt_Customer 04-09-2012 08-03-2014\n", 169 | "Recency 58 38\n", 170 | "NumStorePurchases 4 2\n", 171 | "NumWebVisitsMonth 7 5" 172 | ] 173 | }, 174 | "execution_count": 9, 175 | "metadata": {}, 176 | "output_type": "execute_result" 177 | } 178 | ], 179 | "source": [ 180 | "marketing_data.head(2).T" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 10, 186 | "id": "20f83686", 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "marketing_data.head(2).T.to_clipboard()" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": 5, 196 | "id": "b3601b62", 197 | "metadata": {}, 198 | "outputs": [ 199 | { 200 | "data": { 201 | "text/plain": [ 202 | "ID int64\n", 203 | "Year_Birth int64\n", 204 | "Education object\n", 205 | "Marital_Status object\n", 206 | "Income float64\n", 207 | "Kidhome int64\n", 208 | "Teenhome int64\n", 209 | "Dt_Customer object\n", 210 | "Recency int64\n", 211 | "NumStorePurchases int64\n", 212 | "NumWebVisitsMonth int64\n", 213 | "dtype: object" 214 | ] 215 | }, 216 | "execution_count": 5, 217 | "metadata": {}, 218 | "output_type": "execute_result" 219 | } 220 | ], 221 | "source": [ 222 | "marketing_data.dtypes" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 11, 228 | "id": "4082ff2c", 229 | "metadata": {}, 230 | "outputs": [], 231 | "source": [ 232 | "marketing_data.dtypes.to_clipboard()" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 12, 238 | "id": "bdf17c4b", 239 | "metadata": {}, 240 | "outputs": [ 241 | { 242 | "data": { 243 | "text/plain": [ 244 | "(2240, 11)" 245 | ] 246 | }, 247 | "execution_count": 12, 248 | "metadata": {}, 249 | "output_type": 
"execute_result" 250 | } 251 | ], 252 | "source": [ 253 | "marketing_data.shape" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "id": "46f24ec2", 259 | "metadata": {}, 260 | "source": [ 261 | "#### Check the average number of store purchases of customers based on number of kids in the home" 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": 6, 267 | "id": "fe10f0a2", 268 | "metadata": {}, 269 | "outputs": [ 270 | { 271 | "data": { 272 | "text/plain": [ 273 | "Kidhome\n", 274 | "0 7.217324\n", 275 | "1 3.863181\n", 276 | "2 3.437500\n", 277 | "Name: NumStorePurchases, dtype: float64" 278 | ] 279 | }, 280 | "execution_count": 6, 281 | "metadata": {}, 282 | "output_type": "execute_result" 283 | } 284 | ], 285 | "source": [ 286 | "marketing_data.groupby('Kidhome')['NumStorePurchases'].mean()" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": 14, 292 | "id": "2c5c4c2f", 293 | "metadata": {}, 294 | "outputs": [], 295 | "source": [ 296 | "marketing_data.groupby('Kidhome')['NumStorePurchases'].mean().to_clipboard()" 297 | ] 298 | } 299 | ], 300 | "metadata": { 301 | "kernelspec": { 302 | "display_name": "Python 3", 303 | "language": "python", 304 | "name": "python3" 305 | }, 306 | "language_info": { 307 | "codemirror_mode": { 308 | "name": "ipython", 309 | "version": 3 310 | }, 311 | "file_extension": ".py", 312 | "mimetype": "text/x-python", 313 | "name": "python", 314 | "nbconvert_exporter": "python", 315 | "pygments_lexer": "ipython3", 316 | "version": "3.7.1" 317 | } 318 | }, 319 | "nbformat": 4, 320 | "nbformat_minor": 5 321 | } 322 | -------------------------------------------------------------------------------- /Ch3/.ipynb_checkpoints/Replacing Data-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | 
] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "id": "a2050fe1", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Load dataset" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 29, 33 | "id": "3a02fac1", 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "id": "e81aa501", 43 | "metadata": {}, 44 | "source": [ 45 | "#### Subset for relevant columns" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 30, 51 | "id": "9279d462", 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Kidhome', 'Teenhome']]" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "id": "128c9ecd", 61 | "metadata": {}, 62 | "source": [ 63 | "#### Inspect first 5 rows and data types of the dataset" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 31, 69 | "id": "1f556097", 70 | "metadata": { 71 | "scrolled": true 72 | }, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/html": [ 77 | "
\n", 78 | "\n", 91 | "\n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | "
     ID  Year_Birth  Kidhome  Teenhome
0  5524        1957        0         0
1  2174        1954        1         1
2  4141        1965        0         0
3  6182        1984        1         0
4  5324        1981        1         0
\n", 139 | "
" 140 | ], 141 | "text/plain": [ 142 | " ID Year_Birth Kidhome Teenhome\n", 143 | "0 5524 1957 0 0\n", 144 | "1 2174 1954 1 1\n", 145 | "2 4141 1965 0 0\n", 146 | "3 6182 1984 1 0\n", 147 | "4 5324 1981 1 0" 148 | ] 149 | }, 150 | "execution_count": 31, 151 | "metadata": {}, 152 | "output_type": "execute_result" 153 | } 154 | ], 155 | "source": [ 156 | "marketing_data.head()" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": 32, 162 | "id": "954b5d8d", 163 | "metadata": {}, 164 | "outputs": [], 165 | "source": [ 166 | "marketing_data.head().to_clipboard()" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": 33, 172 | "id": "b3601b62", 173 | "metadata": {}, 174 | "outputs": [ 175 | { 176 | "data": { 177 | "text/plain": [ 178 | "(2240, 4)" 179 | ] 180 | }, 181 | "execution_count": 33, 182 | "metadata": {}, 183 | "output_type": "execute_result" 184 | } 185 | ], 186 | "source": [ 187 | "marketing_data.shape" 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "id": "46f24ec2", 193 | "metadata": {}, 194 | "source": [ 195 | "#### Replace the values in the Teenhome column with \"has teen\" and \"has no teen\"" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 21, 201 | "id": "fe10f0a2", 202 | "metadata": {}, 203 | "outputs": [], 204 | "source": [ 205 | "marketing_data['Teenhome_replaced'] = marketing_data['Teenhome'].replace([0,1,2],['has no teen','has teen','has teen'])" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": 22, 211 | "id": "998a9da7", 212 | "metadata": {}, 213 | "outputs": [ 214 | { 215 | "data": { 216 | "text/html": [ 217 | "
\n", 218 | "\n", 231 | "\n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | "
   Teenhome Teenhome_replaced
0         0       has no teen
1         1          has teen
2         0       has no teen
3         0       has no teen
4         0       has no teen
\n", 267 | "
" 268 | ], 269 | "text/plain": [ 270 | " Teenhome Teenhome_replaced\n", 271 | "0 0 has no teen\n", 272 | "1 1 has teen\n", 273 | "2 0 has no teen\n", 274 | "3 0 has no teen\n", 275 | "4 0 has no teen" 276 | ] 277 | }, 278 | "execution_count": 22, 279 | "metadata": {}, 280 | "output_type": "execute_result" 281 | } 282 | ], 283 | "source": [ 284 | "marketing_data[['Teenhome','Teenhome_replaced']].head()" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 25, 290 | "id": "393c92d6", 291 | "metadata": {}, 292 | "outputs": [], 293 | "source": [ 294 | "marketing_data[['Teenhome','Teenhome_replaced']].head().to_clipboard()" 295 | ] 296 | } 297 | ], 298 | "metadata": { 299 | "kernelspec": { 300 | "display_name": "Python 3", 301 | "language": "python", 302 | "name": "python3" 303 | }, 304 | "language_info": { 305 | "codemirror_mode": { 306 | "name": "ipython", 307 | "version": 3 308 | }, 309 | "file_extension": ".py", 310 | "mimetype": "text/x-python", 311 | "name": "python", 312 | "nbconvert_exporter": "python", 313 | "pygments_lexer": "ipython3", 314 | "version": "3.7.1" 315 | } 316 | }, 317 | "nbformat": 4, 318 | "nbformat_minor": 5 319 | } 320 | -------------------------------------------------------------------------------- /Ch3/.ipynb_checkpoints/Sorting Data-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 2, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "id": "a2050fe1", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Load dataset" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 3, 33 | "id": "3a02fac1", 
34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "id": "e81aa501", 43 | "metadata": {}, 44 | "source": [ 45 | "#### Subset for relevant columns" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 4, 51 | "id": "9279d462", 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Education', 'Marital_Status',\n", 56 | " 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer', \n", 57 | " 'Recency','NumStorePurchases', 'NumWebVisitsMonth']]" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "id": "128c9ecd", 63 | "metadata": {}, 64 | "source": [ 65 | "#### Inspect first 5 rows and data types of the dataset" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 6, 71 | "id": "1f556097", 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/html": [ 77 | "
\n", 78 | "\n", 91 | "\n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | "
     ID  Year_Birth   Education Marital_Status   Income  Kidhome  Teenhome Dt_Customer  Recency  NumStorePurchases  NumWebVisitsMonth
0  5524        1957  Graduation         Single  58138.0        0         0  04-09-2012       58                  4                  7
1  2174        1954  Graduation         Single  46344.0        1         1  08-03-2014       38                  2                  5
2  4141        1965  Graduation       Together  71613.0        0         0  21-08-2013       26                 10                  4
3  6182        1984  Graduation       Together  26646.0        1         0  10-02-2014       26                  4                  6
4  5324        1981         PhD        Married  58293.0        1         0  19-01-2014       94                  6                  5
\n", 181 | "
" 182 | ], 183 | "text/plain": [ 184 | " ID Year_Birth Education Marital_Status Income Kidhome Teenhome \\\n", 185 | "0 5524 1957 Graduation Single 58138.0 0 0 \n", 186 | "1 2174 1954 Graduation Single 46344.0 1 1 \n", 187 | "2 4141 1965 Graduation Together 71613.0 0 0 \n", 188 | "3 6182 1984 Graduation Together 26646.0 1 0 \n", 189 | "4 5324 1981 PhD Married 58293.0 1 0 \n", 190 | "\n", 191 | " Dt_Customer Recency NumStorePurchases NumWebVisitsMonth \n", 192 | "0 04-09-2012 58 4 7 \n", 193 | "1 08-03-2014 38 2 5 \n", 194 | "2 21-08-2013 26 10 4 \n", 195 | "3 10-02-2014 26 4 6 \n", 196 | "4 19-01-2014 94 6 5 " 197 | ] 198 | }, 199 | "execution_count": 6, 200 | "metadata": {}, 201 | "output_type": "execute_result" 202 | } 203 | ], 204 | "source": [ 205 | "marketing_data.head()" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": 7, 211 | "id": "b3601b62", 212 | "metadata": {}, 213 | "outputs": [ 214 | { 215 | "data": { 216 | "text/plain": [ 217 | "ID int64\n", 218 | "Year_Birth int64\n", 219 | "Education object\n", 220 | "Marital_Status object\n", 221 | "Income float64\n", 222 | "Kidhome int64\n", 223 | "Teenhome int64\n", 224 | "Dt_Customer object\n", 225 | "Recency int64\n", 226 | "NumStorePurchases int64\n", 227 | "NumWebVisitsMonth int64\n", 228 | "dtype: object" 229 | ] 230 | }, 231 | "execution_count": 7, 232 | "metadata": {}, 233 | "output_type": "execute_result" 234 | } 235 | ], 236 | "source": [ 237 | "marketing_data.dtypes" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "id": "46f24ec2", 243 | "metadata": {}, 244 | "source": [ 245 | "#### Sort customers based on number of Store Purchases in descending order" 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": 13, 251 | "id": "fe10f0a2", 252 | "metadata": {}, 253 | "outputs": [], 254 | "source": [ 255 | "sorted_data = marketing_data.sort_values('NumStorePurchases', ascending=False)" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | 
"execution_count": 20, 261 | "id": "07e2dab3", 262 | "metadata": {}, 263 | "outputs": [], 264 | "source": [ 265 | "sorted_data[['ID','NumStorePurchases']].tail().to_clipboard()" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": 12, 271 | "id": "021d0a5a", 272 | "metadata": {}, 273 | "outputs": [], 274 | "source": [ 275 | "marketing_data.sort_values('NumStorePurchases', ascending=False).head(2).T.to_clipboard()" 276 | ] 277 | } 278 | ], 279 | "metadata": { 280 | "kernelspec": { 281 | "display_name": "Python 3", 282 | "language": "python", 283 | "name": "python3" 284 | }, 285 | "language_info": { 286 | "codemirror_mode": { 287 | "name": "ipython", 288 | "version": 3 289 | }, 290 | "file_extension": ".py", 291 | "mimetype": "text/x-python", 292 | "name": "python", 293 | "nbconvert_exporter": "python", 294 | "pygments_lexer": "ipython3", 295 | "version": "3.7.1" 296 | } 297 | }, 298 | "nbformat": 4, 299 | "nbformat_minor": 5 300 | } 301 | -------------------------------------------------------------------------------- /Ch3/.ipynb_checkpoints/Untitled-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [], 3 | "metadata": {}, 4 | "nbformat": 4, 5 | "nbformat_minor": 5 6 | } 7 | -------------------------------------------------------------------------------- /Ch3/1. 
Preparing for EDA ..ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import pandas as pd" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "id": "a2050fe1", 24 | "metadata": {}, 25 | "source": [ 26 | "#### Load dataset" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 2, 32 | "id": "3a02fac1", 33 | "metadata": {}, 34 | "outputs": [], 35 | "source": [ 36 | "houseprices_data = pd.read_csv(\"data/HousingPricesData.csv\")" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "id": "e81aa501", 42 | "metadata": {}, 43 | "source": [ 44 | "#### Subset for relevant columns" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 3, 50 | "id": "9279d462", 51 | "metadata": {}, 52 | "outputs": [], 53 | "source": [ 54 | "houseprices_data = houseprices_data[['Zip', 'Price', 'Area', 'Room']]" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "id": "128c9ecd", 60 | "metadata": {}, 61 | "source": [ 62 | "#### Inspect first 5 rows and data types of the dataset" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 4, 68 | "id": "1f556097", 69 | "metadata": { 70 | "scrolled": true 71 | }, 72 | "outputs": [ 73 | { 74 | "data": { 75 | "text/html": [ 76 | "
\n", 77 | "\n", 90 | "\n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | "
   Zip      Price     Area  Room
0  1091 CR  685000.0  64    3
1  1059 EL  475000.0  60    3
2  1097 SM  850000.0  109   4
3  1060 TH  580000.0  128   6
4  1036 KN  720000.0  138   5
\n", 138 | "
" 139 | ], 140 | "text/plain": [ 141 | " Zip Price Area Room\n", 142 | "0 1091 CR 685000.0 64 3\n", 143 | "1 1059 EL 475000.0 60 3\n", 144 | "2 1097 SM 850000.0 109 4\n", 145 | "3 1060 TH 580000.0 128 6\n", 146 | "4 1036 KN 720000.0 138 5" 147 | ] 148 | }, 149 | "execution_count": 4, 150 | "metadata": {}, 151 | "output_type": "execute_result" 152 | } 153 | ], 154 | "source": [ 155 | "houseprices_data.head()" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 5, 161 | "id": "bdf17c4b", 162 | "metadata": {}, 163 | "outputs": [ 164 | { 165 | "data": { 166 | "text/plain": [ 167 | "(924, 4)" 168 | ] 169 | }, 170 | "execution_count": 5, 171 | "metadata": {}, 172 | "output_type": "execute_result" 173 | } 174 | ], 175 | "source": [ 176 | "houseprices_data.shape" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": 6, 182 | "id": "b3601b62", 183 | "metadata": {}, 184 | "outputs": [ 185 | { 186 | "data": { 187 | "text/plain": [ 188 | "Zip object\n", 189 | "Price float64\n", 190 | "Area int64\n", 191 | "Room int64\n", 192 | "dtype: object" 193 | ] 194 | }, 195 | "execution_count": 6, 196 | "metadata": {}, 197 | "output_type": "execute_result" 198 | } 199 | ], 200 | "source": [ 201 | "houseprices_data.dtypes" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "id": "09bbdf22", 207 | "metadata": {}, 208 | "source": [ 209 | "#### Create a price per sqm variable based on the price and area variables" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 7, 215 | "id": "60b95d7c", 216 | "metadata": {}, 217 | "outputs": [], 218 | "source": [ 219 | "houseprices_data['PriceperSqm'] = houseprices_data['Price']/houseprices_data['Area']" 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": 8, 225 | "id": "cf89a96b", 226 | "metadata": {}, 227 | "outputs": [ 228 | { 229 | "data": { 230 | "text/html": [ 231 | "
\n", 232 | "\n", 245 | "\n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | "
   Zip      Price     Area  Room  PriceperSqm
0  1091 CR  685000.0  64    3     10703.125000
1  1059 EL  475000.0  60    3     7916.666667
2  1097 SM  850000.0  109   4     7798.165138
3  1060 TH  580000.0  128   6     4531.250000
4  1036 KN  720000.0  138   5     5217.391304
\n", 299 | "
" 300 | ], 301 | "text/plain": [ 302 | " Zip Price Area Room PriceperSqm\n", 303 | "0 1091 CR 685000.0 64 3 10703.125000\n", 304 | "1 1059 EL 475000.0 60 3 7916.666667\n", 305 | "2 1097 SM 850000.0 109 4 7798.165138\n", 306 | "3 1060 TH 580000.0 128 6 4531.250000\n", 307 | "4 1036 KN 720000.0 138 5 5217.391304" 308 | ] 309 | }, 310 | "execution_count": 8, 311 | "metadata": {}, 312 | "output_type": "execute_result" 313 | } 314 | ], 315 | "source": [ 316 | "houseprices_data.head()" 317 | ] 318 | } 319 | ], 320 | "metadata": { 321 | "kernelspec": { 322 | "display_name": "Python 3", 323 | "language": "python", 324 | "name": "python3" 325 | }, 326 | "language_info": { 327 | "codemirror_mode": { 328 | "name": "ipython", 329 | "version": 3 330 | }, 331 | "file_extension": ".py", 332 | "mimetype": "text/x-python", 333 | "name": "python", 334 | "nbconvert_exporter": "python", 335 | "pygments_lexer": "ipython3", 336 | "version": "3.7.1" 337 | } 338 | }, 339 | "nbformat": 4, 340 | "nbformat_minor": 5 341 | } 342 | -------------------------------------------------------------------------------- /Ch4/.ipynb_checkpoints/2. 
Performing univariate analysis using a Boxplot-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": null, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import pandas as pd\n", 19 | "import matplotlib.pyplot as plt\n", 20 | "import seaborn as sns" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "id": "a2050fe1", 26 | "metadata": {}, 27 | "source": [ 28 | "#### Load dataset" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": null, 34 | "id": "3a02fac1", 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "houseprices_data = pd.read_csv(\"Data/HousingPricesData.csv\")" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "id": "e81aa501", 44 | "metadata": {}, 45 | "source": [ 46 | "#### Subset for relevant columns" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "id": "9279d462", 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [ 56 | "houseprices_data = houseprices_data[['Zip','Price','Area','Room']]" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "id": "128c9ecd", 62 | "metadata": {}, 63 | "source": [ 64 | "#### Inspect first 5 rows and data types of the dataset" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "id": "1f556097", 71 | "metadata": { 72 | "scrolled": true 73 | }, 74 | "outputs": [], 75 | "source": [ 76 | "houseprices_data.head()" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": null, 82 | "id": "bdf17c4b", 83 | "metadata": {}, 84 | "outputs": [], 85 | "source": [ 86 | "houseprices_data.shape" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "id": "b3601b62", 93 | 
"metadata": {}, 94 | "outputs": [], 95 | "source": [ 96 | "houseprices_data.dtypes" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "id": "09bbdf22", 102 | "metadata": {}, 103 | "source": [ 104 | "#### Create a Boxplot in Seaborn" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": null, 110 | "id": "aa501b1d", 111 | "metadata": { 112 | "scrolled": true 113 | }, 114 | "outputs": [], 115 | "source": [ 116 | "sns.boxplot(data = houseprices_data, x= houseprices_data[\"Price\"])" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "id": "e7bceadc", 122 | "metadata": {}, 123 | "source": [ 124 | "#### Provide additional chart details" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "id": "8b23246b", 131 | "metadata": {}, 132 | "outputs": [], 133 | "source": [ 134 | "plt.figure(figsize= (12,6))\n", 135 | "\n", 136 | "ax = sns.boxplot(data = houseprices_data, x= houseprices_data[\"Price\"])\n", 137 | "ax.set_xlabel('House Prices in millions',fontsize = 15)\n", 138 | "ax.set_title('Univariate analysis of House Prices', fontsize= 20)\n", 139 | "plt.ticklabel_format(style='plain', axis='x')\n" 140 | ] 141 | } 142 | ], 143 | "metadata": { 144 | "kernelspec": { 145 | "display_name": "Python 3", 146 | "language": "python", 147 | "name": "python3" 148 | }, 149 | "language_info": { 150 | "codemirror_mode": { 151 | "name": "ipython", 152 | "version": 3 153 | }, 154 | "file_extension": ".py", 155 | "mimetype": "text/x-python", 156 | "name": "python", 157 | "nbconvert_exporter": "python", 158 | "pygments_lexer": "ipython3", 159 | "version": "3.7.1" 160 | } 161 | }, 162 | "nbformat": 4, 163 | "nbformat_minor": 5 164 | } 165 | -------------------------------------------------------------------------------- /Ch4/.ipynb_checkpoints/4. 
Performing univariate analysis using a Summary Table-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import pandas as pd" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "id": "a2050fe1", 24 | "metadata": {}, 25 | "source": [ 26 | "#### Load dataset" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 2, 32 | "id": "3a02fac1", 33 | "metadata": {}, 34 | "outputs": [], 35 | "source": [ 36 | "houseprices_data = pd.read_csv(\"Data/HousingPricesData.csv\")" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "id": "e81aa501", 42 | "metadata": {}, 43 | "source": [ 44 | "#### Subset for relevant columns" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 3, 50 | "id": "9279d462", 51 | "metadata": {}, 52 | "outputs": [], 53 | "source": [ 54 | "houseprices_data = houseprices_data[['Zip','Price']]" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "id": "128c9ecd", 60 | "metadata": {}, 61 | "source": [ 62 | "#### Inspect first 5 rows and data types of the dataset" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 4, 68 | "id": "1f556097", 69 | "metadata": { 70 | "scrolled": true 71 | }, 72 | "outputs": [ 73 | { 74 | "data": { 75 | "text/html": [ 76 | "
\n", 77 | "\n", 90 | "\n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | "
   Zip      Price
0  1091 CR  685000.0
1  1059 EL  475000.0
2  1097 SM  850000.0
3  1060 TH  580000.0
4  1036 KN  720000.0
\n", 126 | "
" 127 | ], 128 | "text/plain": [ 129 | " Zip Price\n", 130 | "0 1091 CR 685000.0\n", 131 | "1 1059 EL 475000.0\n", 132 | "2 1097 SM 850000.0\n", 133 | "3 1060 TH 580000.0\n", 134 | "4 1036 KN 720000.0" 135 | ] 136 | }, 137 | "execution_count": 4, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | } 141 | ], 142 | "source": [ 143 | "houseprices_data.head()" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 5, 149 | "id": "bdf17c4b", 150 | "metadata": {}, 151 | "outputs": [ 152 | { 153 | "data": { 154 | "text/plain": [ 155 | "(924, 2)" 156 | ] 157 | }, 158 | "execution_count": 5, 159 | "metadata": {}, 160 | "output_type": "execute_result" 161 | } 162 | ], 163 | "source": [ 164 | "houseprices_data.shape" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 6, 170 | "id": "b3601b62", 171 | "metadata": {}, 172 | "outputs": [ 173 | { 174 | "data": { 175 | "text/plain": [ 176 | "Zip object\n", 177 | "Price float64\n", 178 | "dtype: object" 179 | ] 180 | }, 181 | "execution_count": 6, 182 | "metadata": {}, 183 | "output_type": "execute_result" 184 | } 185 | ], 186 | "source": [ 187 | "houseprices_data.dtypes" 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "id": "09bbdf22", 193 | "metadata": {}, 194 | "source": [ 195 | "#### Create a Summary Table using the describe method in Pandas" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 7, 201 | "id": "aa501b1d", 202 | "metadata": {}, 203 | "outputs": [ 204 | { 205 | "data": { 206 | "text/html": [ 207 | "
\n", 208 | "\n", 221 | "\n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | "
       Price
count  9.200000e+02
mean   6.220654e+05
std    5.389942e+05
min    1.750000e+05
25%    3.500000e+05
50%    4.670000e+05
75%    7.000000e+05
max    5.950000e+06
\n", 263 | "
" 264 | ], 265 | "text/plain": [ 266 | " Price\n", 267 | "count 9.200000e+02\n", 268 | "mean 6.220654e+05\n", 269 | "std 5.389942e+05\n", 270 | "min 1.750000e+05\n", 271 | "25% 3.500000e+05\n", 272 | "50% 4.670000e+05\n", 273 | "75% 7.000000e+05\n", 274 | "max 5.950000e+06" 275 | ] 276 | }, 277 | "execution_count": 7, 278 | "metadata": {}, 279 | "output_type": "execute_result" 280 | } 281 | ], 282 | "source": [ 283 | "houseprices_data.describe()" 284 | ] 285 | } 286 | ], 287 | "metadata": { 288 | "kernelspec": { 289 | "display_name": "Python 3", 290 | "language": "python", 291 | "name": "python3" 292 | }, 293 | "language_info": { 294 | "codemirror_mode": { 295 | "name": "ipython", 296 | "version": 3 297 | }, 298 | "file_extension": ".py", 299 | "mimetype": "text/x-python", 300 | "name": "python", 301 | "nbconvert_exporter": "python", 302 | "pygments_lexer": "ipython3", 303 | "version": "3.7.1" 304 | } 305 | }, 306 | "nbformat": 4, 307 | "nbformat_minor": 5 308 | } 309 | -------------------------------------------------------------------------------- /Ch4/.ipynb_checkpoints/Performing univariate analysis using a Table-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import pandas as pd" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "id": "a2050fe1", 24 | "metadata": {}, 25 | "source": [ 26 | "#### Load dataset" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 2, 32 | "id": "3a02fac1", 33 | "metadata": {}, 34 | "outputs": [], 35 | "source": [ 36 | "Houseprices_data = pd.read_csv(\"Data/HousingPricesData.csv\")" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "id": 
"e81aa501", 42 | "metadata": {}, 43 | "source": [ 44 | "#### Subset for relevant columns" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 3, 50 | "id": "9279d462", 51 | "metadata": {}, 52 | "outputs": [], 53 | "source": [ 54 | "Houseprices_data = Houseprices_data[['Zip','Price']]" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "id": "128c9ecd", 60 | "metadata": {}, 61 | "source": [ 62 | "#### Inspect first 5 rows and data types of the dataset" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 4, 68 | "id": "1f556097", 69 | "metadata": { 70 | "scrolled": true 71 | }, 72 | "outputs": [ 73 | { 74 | "data": { 75 | "text/html": [ 76 | "
\n", 77 | "\n", 90 | "\n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | "
   Zip      Price
0  1091 CR  685000.0
1  1059 EL  475000.0
2  1097 SM  850000.0
3  1060 TH  580000.0
4  1036 KN  720000.0
\n", 126 | "
" 127 | ], 128 | "text/plain": [ 129 | " Zip Price\n", 130 | "0 1091 CR 685000.0\n", 131 | "1 1059 EL 475000.0\n", 132 | "2 1097 SM 850000.0\n", 133 | "3 1060 TH 580000.0\n", 134 | "4 1036 KN 720000.0" 135 | ] 136 | }, 137 | "execution_count": 4, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | } 141 | ], 142 | "source": [ 143 | "Houseprices_data.head()" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 5, 149 | "id": "bdf17c4b", 150 | "metadata": {}, 151 | "outputs": [ 152 | { 153 | "data": { 154 | "text/plain": [ 155 | "(924, 2)" 156 | ] 157 | }, 158 | "execution_count": 5, 159 | "metadata": {}, 160 | "output_type": "execute_result" 161 | } 162 | ], 163 | "source": [ 164 | "Houseprices_data.shape" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 6, 170 | "id": "b3601b62", 171 | "metadata": {}, 172 | "outputs": [ 173 | { 174 | "data": { 175 | "text/plain": [ 176 | "Zip object\n", 177 | "Price float64\n", 178 | "dtype: object" 179 | ] 180 | }, 181 | "execution_count": 6, 182 | "metadata": {}, 183 | "output_type": "execute_result" 184 | } 185 | ], 186 | "source": [ 187 | "Houseprices_data.dtypes" 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "id": "09bbdf22", 193 | "metadata": {}, 194 | "source": [ 195 | "#### Create a Summary Table using the describe method in Pandas" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 7, 201 | "id": "aa501b1d", 202 | "metadata": {}, 203 | "outputs": [ 204 | { 205 | "data": { 206 | "text/html": [ 207 | "
\n", 208 | "\n", 221 | "\n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | "
       Price
count  9.200000e+02
mean   6.220654e+05
std    5.389942e+05
min    1.750000e+05
25%    3.500000e+05
50%    4.670000e+05
75%    7.000000e+05
max    5.950000e+06
\n", 263 | "
" 264 | ], 265 | "text/plain": [ 266 | " Price\n", 267 | "count 9.200000e+02\n", 268 | "mean 6.220654e+05\n", 269 | "std 5.389942e+05\n", 270 | "min 1.750000e+05\n", 271 | "25% 3.500000e+05\n", 272 | "50% 4.670000e+05\n", 273 | "75% 7.000000e+05\n", 274 | "max 5.950000e+06" 275 | ] 276 | }, 277 | "execution_count": 7, 278 | "metadata": {}, 279 | "output_type": "execute_result" 280 | } 281 | ], 282 | "source": [ 283 | "Houseprices_data.describe()" 284 | ] 285 | } 286 | ], 287 | "metadata": { 288 | "kernelspec": { 289 | "display_name": "Python 3", 290 | "language": "python", 291 | "name": "python3" 292 | }, 293 | "language_info": { 294 | "codemirror_mode": { 295 | "name": "ipython", 296 | "version": 3 297 | }, 298 | "file_extension": ".py", 299 | "mimetype": "text/x-python", 300 | "name": "python", 301 | "nbconvert_exporter": "python", 302 | "pygments_lexer": "ipython3", 303 | "version": "3.7.1" 304 | } 305 | }, 306 | "nbformat": 4, 307 | "nbformat_minor": 5 308 | } 309 | -------------------------------------------------------------------------------- /Ch4/4. 
Performing univariate analysis using a Summary Table.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import pandas as pd" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "id": "a2050fe1", 24 | "metadata": {}, 25 | "source": [ 26 | "#### Load dataset" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 2, 32 | "id": "3a02fac1", 33 | "metadata": {}, 34 | "outputs": [], 35 | "source": [ 36 | "houseprices_data = pd.read_csv(\"Data/HousingPricesData.csv\")" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "id": "e81aa501", 42 | "metadata": {}, 43 | "source": [ 44 | "#### Subset for relevant columns" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 3, 50 | "id": "9279d462", 51 | "metadata": {}, 52 | "outputs": [], 53 | "source": [ 54 | "houseprices_data = houseprices_data[['Zip','Price']]" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "id": "128c9ecd", 60 | "metadata": {}, 61 | "source": [ 62 | "#### Inspect first 5 rows and data types of the dataset" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 4, 68 | "id": "1f556097", 69 | "metadata": { 70 | "scrolled": true 71 | }, 72 | "outputs": [ 73 | { 74 | "data": { 75 | "text/html": [ 76 | "
\n", 77 | "\n", 90 | "\n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | "
   Zip      Price
0  1091 CR  685000.0
1  1059 EL  475000.0
2  1097 SM  850000.0
3  1060 TH  580000.0
4  1036 KN  720000.0
\n", 126 | "
" 127 | ], 128 | "text/plain": [ 129 | " Zip Price\n", 130 | "0 1091 CR 685000.0\n", 131 | "1 1059 EL 475000.0\n", 132 | "2 1097 SM 850000.0\n", 133 | "3 1060 TH 580000.0\n", 134 | "4 1036 KN 720000.0" 135 | ] 136 | }, 137 | "execution_count": 4, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | } 141 | ], 142 | "source": [ 143 | "houseprices_data.head()" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 5, 149 | "id": "bdf17c4b", 150 | "metadata": {}, 151 | "outputs": [ 152 | { 153 | "data": { 154 | "text/plain": [ 155 | "(924, 2)" 156 | ] 157 | }, 158 | "execution_count": 5, 159 | "metadata": {}, 160 | "output_type": "execute_result" 161 | } 162 | ], 163 | "source": [ 164 | "houseprices_data.shape" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 6, 170 | "id": "b3601b62", 171 | "metadata": {}, 172 | "outputs": [ 173 | { 174 | "data": { 175 | "text/plain": [ 176 | "Zip object\n", 177 | "Price float64\n", 178 | "dtype: object" 179 | ] 180 | }, 181 | "execution_count": 6, 182 | "metadata": {}, 183 | "output_type": "execute_result" 184 | } 185 | ], 186 | "source": [ 187 | "houseprices_data.dtypes" 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "id": "09bbdf22", 193 | "metadata": {}, 194 | "source": [ 195 | "#### Create a Summary Table using the describe method in Pandas" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 7, 201 | "id": "aa501b1d", 202 | "metadata": {}, 203 | "outputs": [ 204 | { 205 | "data": { 206 | "text/html": [ 207 | "
\n", 208 | "\n", 221 | "\n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | "
       Price
count  9.200000e+02
mean   6.220654e+05
std    5.389942e+05
min    1.750000e+05
25%    3.500000e+05
50%    4.670000e+05
75%    7.000000e+05
max    5.950000e+06
\n", 263 | "
" 264 | ], 265 | "text/plain": [ 266 | " Price\n", 267 | "count 9.200000e+02\n", 268 | "mean 6.220654e+05\n", 269 | "std 5.389942e+05\n", 270 | "min 1.750000e+05\n", 271 | "25% 3.500000e+05\n", 272 | "50% 4.670000e+05\n", 273 | "75% 7.000000e+05\n", 274 | "max 5.950000e+06" 275 | ] 276 | }, 277 | "execution_count": 7, 278 | "metadata": {}, 279 | "output_type": "execute_result" 280 | } 281 | ], 282 | "source": [ 283 | "houseprices_data.describe()" 284 | ] 285 | } 286 | ], 287 | "metadata": { 288 | "kernelspec": { 289 | "display_name": "Python 3", 290 | "language": "python", 291 | "name": "python3" 292 | }, 293 | "language_info": { 294 | "codemirror_mode": { 295 | "name": "ipython", 296 | "version": 3 297 | }, 298 | "file_extension": ".py", 299 | "mimetype": "text/x-python", 300 | "name": "python", 301 | "nbconvert_exporter": "python", 302 | "pygments_lexer": "ipython3", 303 | "version": "3.7.1" 304 | } 305 | }, 306 | "nbformat": 4, 307 | "nbformat_minor": 5 308 | } 309 | -------------------------------------------------------------------------------- /Ch5/.ipynb_checkpoints/2. 
Creating CrosstabTwo-way table on bivariate data-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import pandas as pd\n", 19 | "import matplotlib.pyplot as plt\n", 20 | "import seaborn as sns" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "id": "a2050fe1", 26 | "metadata": {}, 27 | "source": [ 28 | "#### Load dataset" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 2, 34 | "id": "3a02fac1", 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "penguins_data = pd.read_csv(\"data/penguins_size.csv\")" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "id": "e81aa501", 44 | "metadata": {}, 45 | "source": [ 46 | "#### Subset for relevant columns" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": 3, 52 | "id": "9279d462", 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [ 56 | "penguins_data = penguins_data[['species','culmen_length_mm','sex']]" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "id": "128c9ecd", 62 | "metadata": {}, 63 | "source": [ 64 | "#### Inspect first 5 rows and data types of the dataset" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 4, 70 | "id": "1f556097", 71 | "metadata": { 72 | "scrolled": true 73 | }, 74 | "outputs": [ 75 | { 76 | "data": { 77 | "text/html": [ 78 | "
\n", 79 | "\n", 92 | "\n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | "
   species  culmen_length_mm  sex
0  Adelie   39.1              MALE
1  Adelie   39.5              FEMALE
2  Adelie   40.3              FEMALE
3  Adelie   NaN               NaN
4  Adelie   36.7              FEMALE
\n", 134 | "
" 135 | ], 136 | "text/plain": [ 137 | " species culmen_length_mm sex\n", 138 | "0 Adelie 39.1 MALE\n", 139 | "1 Adelie 39.5 FEMALE\n", 140 | "2 Adelie 40.3 FEMALE\n", 141 | "3 Adelie NaN NaN\n", 142 | "4 Adelie 36.7 FEMALE" 143 | ] 144 | }, 145 | "execution_count": 4, 146 | "metadata": {}, 147 | "output_type": "execute_result" 148 | } 149 | ], 150 | "source": [ 151 | "penguins_data.head()" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 5, 157 | "id": "bdf17c4b", 158 | "metadata": {}, 159 | "outputs": [ 160 | { 161 | "data": { 162 | "text/plain": [ 163 | "(344, 3)" 164 | ] 165 | }, 166 | "execution_count": 5, 167 | "metadata": {}, 168 | "output_type": "execute_result" 169 | } 170 | ], 171 | "source": [ 172 | "penguins_data.shape" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": 6, 178 | "id": "b3601b62", 179 | "metadata": {}, 180 | "outputs": [ 181 | { 182 | "data": { 183 | "text/plain": [ 184 | "species object\n", 185 | "culmen_length_mm float64\n", 186 | "sex object\n", 187 | "dtype: object" 188 | ] 189 | }, 190 | "execution_count": 6, 191 | "metadata": {}, 192 | "output_type": "execute_result" 193 | } 194 | ], 195 | "source": [ 196 | "penguins_data.dtypes" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "id": "09bbdf22", 202 | "metadata": {}, 203 | "source": [ 204 | "#### Create a Crosstab in Pandas" 205 | ] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "execution_count": 7, 210 | "id": "60b95d7c", 211 | "metadata": { 212 | "scrolled": true 213 | }, 214 | "outputs": [ 215 | { 216 | "data": { 217 | "text/html": [ 218 | "
\n", 219 | "\n", 232 | "\n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | "
sex        FEMALE  MALE
species
Adelie     73      73
Chinstrap  34      34
Gentoo     58      62
\n", 263 | "
" 264 | ], 265 | "text/plain": [ 266 | "sex FEMALE MALE\n", 267 | "species \n", 268 | "Adelie 73 73\n", 269 | "Chinstrap 34 34\n", 270 | "Gentoo 58 62" 271 | ] 272 | }, 273 | "execution_count": 7, 274 | "metadata": {}, 275 | "output_type": "execute_result" 276 | } 277 | ], 278 | "source": [ 279 | "pd.crosstab(index= penguins_data['species'], columns= penguins_data['sex'])" 280 | ] 281 | } 282 | ], 283 | "metadata": { 284 | "kernelspec": { 285 | "display_name": "Python 3", 286 | "language": "python", 287 | "name": "python3" 288 | }, 289 | "language_info": { 290 | "codemirror_mode": { 291 | "name": "ipython", 292 | "version": 3 293 | }, 294 | "file_extension": ".py", 295 | "mimetype": "text/x-python", 296 | "name": "python", 297 | "nbconvert_exporter": "python", 298 | "pygments_lexer": "ipython3", 299 | "version": "3.7.1" 300 | } 301 | }, 302 | "nbformat": 4, 303 | "nbformat_minor": 5 304 | } 305 | -------------------------------------------------------------------------------- /Ch5/2. Creating CrosstabTwo-way table on bivariate data.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import pandas as pd\n", 19 | "import matplotlib.pyplot as plt\n", 20 | "import seaborn as sns" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "id": "a2050fe1", 26 | "metadata": {}, 27 | "source": [ 28 | "#### Load dataset" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 2, 34 | "id": "3a02fac1", 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "penguins_data = pd.read_csv(\"data/penguins_size.csv\")" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "id": "e81aa501", 44 | "metadata": {}, 45 
| "source": [ 46 | "#### Subset for relevant columns" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": 3, 52 | "id": "9279d462", 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [ 56 | "penguins_data = penguins_data[['species','culmen_length_mm','sex']]" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "id": "128c9ecd", 62 | "metadata": {}, 63 | "source": [ 64 | "#### Inspect first 5 rows and data types of the dataset" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 4, 70 | "id": "1f556097", 71 | "metadata": { 72 | "scrolled": true 73 | }, 74 | "outputs": [ 75 | { 76 | "data": { 77 | "text/html": [ 78 | "
\n", 79 | "\n", 92 | "\n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | "
   species  culmen_length_mm  sex
0  Adelie   39.1              MALE
1  Adelie   39.5              FEMALE
2  Adelie   40.3              FEMALE
3  Adelie   NaN               NaN
4  Adelie   36.7              FEMALE
\n", 134 | "
" 135 | ], 136 | "text/plain": [ 137 | " species culmen_length_mm sex\n", 138 | "0 Adelie 39.1 MALE\n", 139 | "1 Adelie 39.5 FEMALE\n", 140 | "2 Adelie 40.3 FEMALE\n", 141 | "3 Adelie NaN NaN\n", 142 | "4 Adelie 36.7 FEMALE" 143 | ] 144 | }, 145 | "execution_count": 4, 146 | "metadata": {}, 147 | "output_type": "execute_result" 148 | } 149 | ], 150 | "source": [ 151 | "penguins_data.head()" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 5, 157 | "id": "bdf17c4b", 158 | "metadata": {}, 159 | "outputs": [ 160 | { 161 | "data": { 162 | "text/plain": [ 163 | "(344, 3)" 164 | ] 165 | }, 166 | "execution_count": 5, 167 | "metadata": {}, 168 | "output_type": "execute_result" 169 | } 170 | ], 171 | "source": [ 172 | "penguins_data.shape" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": 6, 178 | "id": "b3601b62", 179 | "metadata": {}, 180 | "outputs": [ 181 | { 182 | "data": { 183 | "text/plain": [ 184 | "species object\n", 185 | "culmen_length_mm float64\n", 186 | "sex object\n", 187 | "dtype: object" 188 | ] 189 | }, 190 | "execution_count": 6, 191 | "metadata": {}, 192 | "output_type": "execute_result" 193 | } 194 | ], 195 | "source": [ 196 | "penguins_data.dtypes" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "id": "09bbdf22", 202 | "metadata": {}, 203 | "source": [ 204 | "#### Create a Crosstab in Pandas" 205 | ] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "execution_count": 7, 210 | "id": "60b95d7c", 211 | "metadata": { 212 | "scrolled": true 213 | }, 214 | "outputs": [ 215 | { 216 | "data": { 217 | "text/html": [ 218 | "
\n", 219 | "\n", 232 | "\n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | "
sex        FEMALE  MALE
species
Adelie     73      73
Chinstrap  34      34
Gentoo     58      62
\n", 263 | "
" 264 | ], 265 | "text/plain": [ 266 | "sex FEMALE MALE\n", 267 | "species \n", 268 | "Adelie 73 73\n", 269 | "Chinstrap 34 34\n", 270 | "Gentoo 58 62" 271 | ] 272 | }, 273 | "execution_count": 7, 274 | "metadata": {}, 275 | "output_type": "execute_result" 276 | } 277 | ], 278 | "source": [ 279 | "pd.crosstab(index= penguins_data['species'], columns= penguins_data['sex'])" 280 | ] 281 | } 282 | ], 283 | "metadata": { 284 | "kernelspec": { 285 | "display_name": "Python 3", 286 | "language": "python", 287 | "name": "python3" 288 | }, 289 | "language_info": { 290 | "codemirror_mode": { 291 | "name": "ipython", 292 | "version": 3 293 | }, 294 | "file_extension": ".py", 295 | "mimetype": "text/x-python", 296 | "name": "python", 297 | "nbconvert_exporter": "python", 298 | "pygments_lexer": "ipython3", 299 | "version": "3.7.1" 300 | } 301 | }, 302 | "nbformat": 4, 303 | "nbformat_minor": 5 304 | } 305 | -------------------------------------------------------------------------------- /Ch6/Data/website_survey.csv: -------------------------------------------------------------------------------- 1 | user_id,language,platform,gender,age,q1,q2,q3,q4,q5,q6,q7,q8,q9,q10,q11,q12,q13,q14,q15,q16,q17,q18,q19,q20,q21,q22,q23,q24,q25,q26 2 | 080c468b-27c0-455c-aa63-b8f807f2e3d7,en,Desktop,male,34,9,7,6,6,7,7,6,6,5,5,5,3,6,4,5,4,8,4,6,5,6,6,5,2,5,3 3 | 0b0379c7-04db-4c85-84bd-a2bd55329e29,en,Mobile,female,19,10,10,10,9,10,10,10,10,9,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,9,8 4 | 0e623280-b28b-4d4a-8eea-0732f09ed497,en,Mobile,female,19,10,10,10,10,10,10,10,10,10,10,10,10,6,9,7,9,9,8,10,10,10,10,9,9,8,8 5 | 045dc0f3-a730-4d03-a615-f51814e5b04f,en,Mobile,male,21,5,8,5,5,5,5,5,6,6,8,7,8,10,9,5,7,7,9,10,8,8,10,10,8,10,6 6 | 092f2ee7-5281-4a09-9bce-e5523b95b53b,en,null,female,53,9,10,9,10,9,7,8,5,7,8,7,8,10,9,8,7,7,8,8,8,9,9,10,10,10,10 7 | 19c0851c-cbb1-45b5-9e78-884b761802ac,en,Mobile,female,53,10,9,9,9,5,6,8,7,4,5,5,5,4,4,8,5,8,7,7,7,8,6,10,10,10,10 8 | 
1a386158-d08c-40b1-9365-e758927d3352,en,Desktop,male,18,8,7,7,6,6,6,7,7,7,6,6,6,5,6,7,6,7,7,7,6,8,7,5,6,8,7 9 | 78e234a7-c15a-49e5-b38c-7fb47b294ca8,en,Mobile,female,20,9,5,10,6,8,10,9,10,7,6,10,10,9,10,10,9,10,9,9,9,10,10,2,8,8,7 10 | 1331586c-ee72-4fc1-b622-812f54a49f6c,en,Mobile,female,19,8,8,10,9,9,8,9,9,9,5,7,7,8,8,9,7,9,9,9,9,9,9,9,9,9,9 11 | 35913db6-2e1e-4028-a631-473241be215c,en,Mobile,male,19,7,8,4,7,8,9,8,9,7,9,8,8,7,8,8,8,7,8,6,6,7,6,7,6,6,5 12 | 31d15d10-ae4f-4527-9aba-c0f93d329cb2,en,Mobile,female,13,9,9,10,8,10,9,8,9,10,6,9,9,9,9,10,8,10,9,9,9,9,8,9,10,9,8 13 | 1e41998a-cda9-4b44-aa34-8bcde6752d0a,en,Mobile,male,18,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10 14 | 7c390c81-d248-4099-9abc-7b787be98b47,en,Mobile,female,19,10,10,9,8,8,8,7,9,9,9,9,9,9,9,10,9,10,9,9,8,9,8,10,8,7,10 15 | 5f294260-b679-434f-8818-e09ecd02ae70,en,Mobile,female,34,10,8,10,6,10,8,10,9,10,7,6,5,8,9,10,6,10,8,7,8,9,9,10,10,9,9 16 | 76dad81c-1a4a-4153-9932-f03188b1ae25,en,Desktop,female,29,10,10,5,8,9,9,10,9,7,3,10,5,7,10,10,6,8,8,9,9,10,10,10,10,10,10 17 | 3cd5d4f2-97f1-418c-bc49-ff0c2a58c855,en,Mobile,male,19,10,10,10,10,10,3,10,2,7,3,8,5,8,8,8,9,9,10,8,8,7,8,9,8,8,8 18 | 10e9b324-156e-4eb5-aec5-04d8a4d7124c,en,null,male,46,4,4,4,4,4,3,3,3,3,3,4,3,6,4,2,2,2,3,1,1,3,1,4,4,1,1 19 | 20ba7dbd-f51c-49ff-9bf2-a777cd6d13f8,es,Mobile,female,44,9,7,7,7,5,8,9,8,6,6,8,7,7,6,8,7,7,5,6,7,8,8,7,7,7,7 20 | 6167cd49-a021-4f3f-b29b-a7ed48b9d94f,en,Desktop,female,21,8,9,9,9,9,8,7,8,8,7,8,8,7,9,8,9,9,9,9,9,9,9,7,8,8,8 21 | 824afe21-5c5b-4e61-a556-75bef43196e9,en,Mobile,male,20,7,7,8,8,7,7,10,7,7,7,8,8,8,8,8,8,7,9,10,7,8,8,7,7,10,7 22 | 8330a755-ac13-444b-90ca-7ddc4f4f3992,en,Mobile,male,20,9,9,10,9,8,9,8,10,9,7,8,9,10,9,8,9,9,8,8,10,10,9,8,8,10,10 23 | 6ee3efea-f12c-4740-937a-62f34a53e2ec,en,Mobile,female,20,10,9,8,7,8,10,9,9,7,6,10,7,9,10,8,9,10,7,9,8,9,8,8,10,9,9 24 | 
3931df86-6c91-45c8-ae55-dd3a1036647f,en,Mobile,male,18,3,6,7,9,7,3,1,8,8,8,8,8,6,8,8,8,9,8,10,8,8,8,9,8,8,8 25 | 60b7d252-ffba-432f-a243-8b0d5f81331d,en,Mobile,female,2,9,9,9,9,8,9,8,9,9,9,9,8,8,8,9,9,9,8,9,9,9,9,9,9,8,8 26 | 75b63c62-9473-4ddc-979f-66b8de0a3683,en,Mobile,male,21,8,7,9,5,5,4,5,5,4,5,4,5,6,5,5,5,6,4,3,3,2,1,1,2,3,4 27 | 518a2693-4f58-463e-ac2c-c5ce06d3c9fb,en,Mobile,male,27,6,6,7,6,8,7,7,9,8,8,7,8,8,7,8,8,8,7,10,7,8,8,10,8,5,8 28 | 8ebeb6f9-b48c-4067-a307-1da81fde67da,en,Mobile,female,18,10,9,10,8,8,9,10,10,10,8,10,6,7,9,10,10,10,9,7,7,10,10,7,7,8,6 29 | 7e80110b-c26c-43b1-8bc2-2515642420ad,en,Mobile,female,45,8,8,8,8,7,8,7,8,7,8,7,7,7,7,8,7,8,7,9,9,8,8,8,8,8,8 30 | 702da1cd-9691-4f0b-9ab9-020f9454d86b,en,Mobile,female,51,9,8,8,8,9,7,6,7,5,7,6,7,8,9,8,8,8,8,8,8,8,8,8,8,8,7 31 | 25602500-ca31-4404-b155-3e070652c2c0,en,null,other,1,7,4,7,6,6,7,6,5,5,8,5,6,6,7,6,5,3,4,5,6,4,5,6,6,6,5 32 | 11cbc3a6-1c92-4f10-abe6-27def200d91d,en,null,male,25,10,10,10,10,10,10,9,9,3,2,7,7,5,5,10,5,9,5,5,5,8,6,6,4,4,5 33 | 803144dd-8bb5-435e-94af-99bace5baf35,en,Mobile,male,21,8,7,9,4,7,9,7,9,9,7,7,8,9,7,5,9,8,8,5,8,5,9,9,9,8,8 34 | 780a59cc-70e6-4148-94c9-47bd1744cd5c,en,Mobile,male,18,6,8,8,4,7,9,9,8,10,10,6,9,10,7,8,4,10,10,7,9,9,9,7,7,6,6 35 | 3cd027c4-c9cf-475c-a678-769abe7e816a,en,Mobile,male,21,9,10,9,9,8,9,9,10,9,9,8,9,10,9,10,10,10,10,8,8,10,10,10,10,9,9 36 | 902d976c-f400-44ab-a0ac-198d94ba035a,en,Mobile,male,37,6,8,6,7,8,7,8,8,8,7,8,9,7,7,5,5,8,8,7,6,6,7,8,6,7,9 37 | 3d551ae7-9275-41b2-8270-260d730a6222,en,Desktop,male,38,5,5,4,7,6,7,6,6,6,3,5,3,6,5,5,5,5,7,5,5,3,5,3,3,3,1 38 | 42798e5c-9aa2-4efd-a370-5635018d4d81,en,Mobile,male,18,10,7,6,8,8,8,8,7,9,7,8,7,7,7,7,7,6,10,8,7,8,9,6,8,10,9 39 | c65d2c31-227d-4750-af1a-1bfd6902d881,en,Mobile,female,45,7,8,10,6,6,6,6,7,10,7,8,6,6,10,8,8,9,7,8,9,8,9,7,7,6,6 40 | 4b737514-74b7-433a-9dde-ea224a6deeb9,en,Mobile,male,21,8,7,2,3,8,10,10,5,5,9,10,10,10,7,10,10,10,5,10,10,1,10,10,10,10,10 41 | 
40d5e94f-2468-4f73-a0a0-e5a2c0443f06,en,null,other,1,6,9,6,6,6,5,6,6,6,5,7,5,6,7,4,6,7,5,6,6,6,7,7,6,6,8 42 | 95897683-0e62-4f31-8575-77136b4ebebe,en,Mobile,male,20,9,9,9,9,7,2,2,3,3,3,1,1,1,7,1,1,1,8,1,1,1,1,1,7,3,3 43 | 6cd79452-c315-4dbe-93af-8c2077a67a56,en,Mobile,female,26,9,9,5,10,9,9,7,9,5,8,8,10,10,10,10,8,10,7,10,9,8,10,10,8,8,8 44 | 334df000-99c4-4ccd-b5b5-4b8cabdad476,en,Desktop,male,18,8,6,5,4,5,8,8,5,3,6,5,5,9,9,9,5,8,4,6,4,4,7,9,8,3,8 45 | 1537bb2d-2d89-4d66-b6c7-f3659b85403f,en,Mobile,male,20,8,9,9,7,8,6,10,8,9,7,8,7,10,9,8,9,9,10,10,10,8,9,10,9,10,10 46 | b2a12ed0-6436-4cbc-b70d-d0a7ba0e0981,en,Desktop,male,19,10,10,10,10,7,8,8,10,10,10,8,10,10,10,10,10,10,10,10,10,10,10,10,10,8,8 47 | e03a0573-ee71-4c91-a8ca-0b1eb82c1300,en,Desktop,male,37,9,8,9,9,9,7,7,8,2,4,5,5,7,6,7,6,7,6,7,5,1,3,7,7,7,3 48 | ba818a91-0bf1-47d4-bc93-e5befe1c26e6,en,Mobile,male,22,8,9,9,7,8,9,9,8,8,5,8,8,8,8,7,9,8,7,9,9,9,8,8,8,8,8 49 | d4a66c13-cd50-4171-8849-d52f0898a156,en,Mobile,female,20,7,7,8,7,7,7,7,10,7,6,6,6,6,6,6,6,6,10,6,7,7,7,7,7,8,8 50 | ca6bae45-d4f7-4d52-bd43-e4d3542a6f93,en,Mobile,male,21,8,8,9,8,8,3,8,5,5,3,7,6,6,6,6,6,9,7,7,7,7,8,5,6,7,5 51 | e5eccf62-17fb-4353-b886-e547a9699dd4,en,null,male,45,8,8,9,10,5,8,8,8,1,5,10,1,10,9,1,8,10,9,9,9,9,10,8,10,10,9 52 | f98ab946-62e2-4cb1-97a4-3301ccccba1b,en,Mobile,female,19,1,3,2,3,2,3,1,2,8,2,3,8,1,1,2,9,1,1,2,2,1,1,2,2,3,3 53 | cb553655-ef6a-4f28-8271-131cdc9fc2ec,en,Mobile,male,40,9,10,7,7,8,8,6,8,7,7,9,9,6,7,9,3,5,5,1,8,8,8,5,1,9,9 54 | ea3518ef-6655-4a33-bf76-337e21de5689,en,Mobile,female,46,10,9,9,9,9,9,6,8,7,4,5,4,7,7,7,6,8,5,8,8,8,8,9,8,8,7 55 | f9cc1fc5-b439-4f4b-8257-0e5b2f47eb02,es,Desktop,male,55,8,9,8,7,8,9,9,7,7,5,5,3,5,5,9,5,10,5,6,6,8,7,6,9,8,8 56 | ca55018f-21c4-41c2-b38f-b966fc3a8075,en,Mobile,male,19,9,8,9,9,9,9,9,7,8,6,8,10,7,9,8,7,7,9,7,7,10,8,9,8,8,10 57 | a663330e-6e40-4813-b6f7-232aba1299eb,en,Mobile,male,19,6,8,9,10,9,10,7,8,9,8,7,10,10,10,10,10,10,7,8,9,8,10,10,7,10,8 58 | 
be2f133c-eed1-4582-b8b8-d02aa1e594ad,en,Mobile,male,21,8,8,6,6,6,6,8,9,8,5,7,8,8,9,9,7,7,9,9,9,9,9,8,8,8,8 59 | c9301569-631c-4d4d-aa30-2a65a5873029,en,Mobile,male,45,9,8,8,8,7,6,8,7,7,4,6,6,7,7,8,6,8,7,8,9,9,8,6,6,7,7 60 | bac1cde8-4780-4edb-8062-46b1286d3f47,en,Mobile,male,19,10,10,8,8,8,9,8,7,8,10,7,9,7,9,7,9,7,10,7,7,9,8,10,7,9,9 61 | ba4eba66-ab7c-4704-84e8-4218f019f196,en,null,male,23,10,10,10,10,6,8,7,7,6,5,7,5,8,8,8,6,8,5,8,8,9,9,8,8,8,6 62 | d0d86600-3649-4760-bb23-89f353e7d20b,en,Mobile,female,19,4,2,5,3,6,8,8,7,5,5,4,7,9,9,5,6,9,9,3,3,5,8,10,10,10,9 63 | 4f914c97-7283-4ed0-94f7-43bcc99e9e28,en,Mobile,male,21,7,6,3,5,7,8,6,6,6,3,4,4,5,5,6,5,7,6,4,4,9,7,5,5,4,3 64 | 9d832240-0310-4657-90c0-20525d24d766,en,Mobile,male,21,9,8,6,6,8,9,8,7,9,9,7,8,8,7,8,7,9,9,10,9,10,7,9,7,9,8 65 | f6ae26b4-f319-4305-a6ca-9cdd8528695d,es,Desktop,male,18,9,9,9,9,10,10,9,10,8,4,4,4,9,7,8,4,10,4,8,8,7,9,9,9,9,9 66 | f038ad7e-db1f-430f-8974-12c83a20eb8e,en,Mobile,male,20,10,9,9,9,8,9,9,9,10,10,10,9,9,10,9,9,10,10,10,10,10,10,10,10,10,10 67 | f3b05d57-f280-41fc-a32d-d94647bade9a,en,Mobile,female,18,1,5,8,6,7,7,8,8,7,10,6,8,8,10,8,8,8,5,7,4,7,6,5,8,9,7 68 | 13e305ec-6c39-4b60-9cf1-5ddeae54b677,en,Mobile,male,20,7,8,8,9,7,9,8,8,7,6,7,6,8,8,7,7,8,7,8,9,7,8,7,6,5,9 69 | e0ffda50-97e7-4872-a467-cc143cbac8fc,en,Mobile,female,12,10,6,10,6,10,10,9,8,9,6,9,10,8,8,9,9,10,8,8,9,9,9,10,10,10,8 70 | cb71d11a-4e8d-4ab8-80be-ac1f46f5b2d1,en,null,male,1,9,9,9,9,7,7,8,8,3,5,6,5,9,8,6,6,9,6,7,8,8,9,5,5,7,5 71 | cd3dad39-da3f-4ada-9e09-a9832ca7bdd7,en,Mobile,male,21,8,9,10,9,8,8,7,9,7,9,8,7,8,5,9,6,10,8,10,9,9,8,6,7,8,6 72 | f3b94dcb-2dde-48d7-968d-606fcddd692d,en,Desktop,male,46,10,6,10,6,6,6,6,6,2,1,1,1,5,7,7,1,5,8,6,5,8,6,1,1,1,1 73 | ddbc5616-edb8-4052-9186-6e105e4491f6,en,Mobile,male,20,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10 74 | a7788435-7465-4f4d-9f4f-d412db68b295,en,Mobile,female,18,1,1,1,2,2,9,9,7,10,6,6,3,6,6,8,7,9,9,8,7,5,7,10,10,6,10 75 | 
-------------------------------------------------------------------------------- /Ch7/.ipynb_checkpoints/6. Performing Stationarity checks on Time series data-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd\n", 20 | "import matplotlib.pyplot as plt\n", 21 | "import seaborn as sns\n", 22 | "from statsmodels.tsa.stattools import adfuller" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "id": "a2050fe1", 28 | "metadata": {}, 29 | "source": [ 30 | "#### Load dataset" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 2, 36 | "id": "3a02fac1", 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "air_traffic_data = pd.read_csv(\"data/SF_Air_Traffic_Passenger_Statistics_Transformed.csv\")" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "id": "128c9ecd", 46 | "metadata": {}, 47 | "source": [ 48 | "#### Inspect first 5 rows and data types of the dataset" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 3, 54 | "id": "1f556097", 55 | "metadata": { 56 | "scrolled": true 57 | }, 58 | "outputs": [ 59 | { 60 | "data": { 61 | "text/html": [ 62 | "
\n", 63 | "\n", 76 | "\n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | "
   Date    Total Passenger Count
0  200601  2448889
1  200602  2223024
2  200603  2708778
3  200604  2773293
4  200605  2829000
\n", 112 | "
" 113 | ], 114 | "text/plain": [ 115 | " Date Total Passenger Count\n", 116 | "0 200601 2448889\n", 117 | "1 200602 2223024\n", 118 | "2 200603 2708778\n", 119 | "3 200604 2773293\n", 120 | "4 200605 2829000" 121 | ] 122 | }, 123 | "execution_count": 3, 124 | "metadata": {}, 125 | "output_type": "execute_result" 126 | } 127 | ], 128 | "source": [ 129 | "air_traffic_data.head()" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 4, 135 | "id": "bdf17c4b", 136 | "metadata": {}, 137 | "outputs": [ 138 | { 139 | "data": { 140 | "text/plain": [ 141 | "(132, 2)" 142 | ] 143 | }, 144 | "execution_count": 4, 145 | "metadata": {}, 146 | "output_type": "execute_result" 147 | } 148 | ], 149 | "source": [ 150 | "air_traffic_data.shape" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 5, 156 | "id": "b3601b62", 157 | "metadata": {}, 158 | "outputs": [ 159 | { 160 | "data": { 161 | "text/plain": [ 162 | "Date int64\n", 163 | "Total Passenger Count int64\n", 164 | "dtype: object" 165 | ] 166 | }, 167 | "execution_count": 5, 168 | "metadata": {}, 169 | "output_type": "execute_result" 170 | } 171 | ], 172 | "source": [ 173 | "air_traffic_data.dtypes" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "id": "32043140", 179 | "metadata": {}, 180 | "source": [ 181 | "#### Transform date int to date" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 6, 187 | "id": "592fcb98", 188 | "metadata": {}, 189 | "outputs": [], 190 | "source": [ 191 | "air_traffic_data['Date']= pd.to_datetime(air_traffic_data['Date'], format = \"%Y%m\")" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": 7, 197 | "id": "0a562145", 198 | "metadata": {}, 199 | "outputs": [ 200 | { 201 | "data": { 202 | "text/plain": [ 203 | "Date datetime64[ns]\n", 204 | "Total Passenger Count int64\n", 205 | "dtype: object" 206 | ] 207 | }, 208 | "execution_count": 7, 209 | "metadata": {}, 210 | "output_type": 
"execute_result" 211 | } 212 | ], 213 | "source": [ 214 | "air_traffic_data.dtypes" 215 | ] 216 | }, 217 | { 218 | "cell_type": "markdown", 219 | "id": "f525a931", 220 | "metadata": {}, 221 | "source": [ 222 | "#### Set date as index" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 8, 228 | "id": "8cdc6f00", 229 | "metadata": {}, 230 | "outputs": [ 231 | { 232 | "data": { 233 | "text/plain": [ 234 | "(132, 1)" 235 | ] 236 | }, 237 | "execution_count": 8, 238 | "metadata": {}, 239 | "output_type": "execute_result" 240 | } 241 | ], 242 | "source": [ 243 | "air_traffic_data.set_index('Date',inplace = True)\n", 244 | "air_traffic_data.shape" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "id": "60de4cbf", 250 | "metadata": {}, 251 | "source": [ 252 | "#### Check Stationarity" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": 9, 258 | "id": "d0be68f5", 259 | "metadata": {}, 260 | "outputs": [ 261 | { 262 | "data": { 263 | "text/plain": [ 264 | "(0.7015289287377346,\n", 265 | " 0.9898683326442054,\n", 266 | " 13,\n", 267 | " 118,\n", 268 | " {'1%': -3.4870216863700767,\n", 269 | " '5%': -2.8863625166643136,\n", 270 | " '10%': -2.580009026141913},\n", 271 | " 3039.0876643475)" 272 | ] 273 | }, 274 | "execution_count": 9, 275 | "metadata": {}, 276 | "output_type": "execute_result" 277 | } 278 | ], 279 | "source": [ 280 | "adf_result = adfuller(air_traffic_data)\n", 281 | "adf_result" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": 10, 287 | "id": "46f1e971", 288 | "metadata": {}, 289 | "outputs": [ 290 | { 291 | "name": "stdout", 292 | "output_type": "stream", 293 | "text": [ 294 | "ADF Test Statistic: 0.701529\n", 295 | "p-value: 0.989868\n", 296 | "Critical Values:\n", 297 | "{'1%': -3.4870216863700767, '5%': -2.8863625166643136, '10%': -2.580009026141913}\n", 298 | "Failed to Reject Null Hypothesis - Time Series is Non-Stationary\n" 299 | ] 300 | } 301 | ], 302 | 
"source": [ 303 | "print('ADF Test Statistic: %f' % adf_result[0])\n", 304 | "\n", 305 | "print('p-value: %f' % adf_result[1])\n", 306 | "\n", 307 | "print('Critical Values:')\n", 308 | "\n", 309 | "print(adf_result[4])\n", 310 | "\n", 311 | "if adf_result[0] < adf_result[4][\"5%\"]:\n", 312 | " print (\"Reject Null Hypothesis - Time Series is Stationary\")\n", 313 | "else:\n", 314 | " print (\"Failed to Reject Null Hypothesis - Time Series is Non-Stationary\")" 315 | ] 316 | } 317 | ], 318 | "metadata": { 319 | "kernelspec": { 320 | "display_name": "Python 3", 321 | "language": "python", 322 | "name": "python3" 323 | }, 324 | "language_info": { 325 | "codemirror_mode": { 326 | "name": "ipython", 327 | "version": 3 328 | }, 329 | "file_extension": ".py", 330 | "mimetype": "text/x-python", 331 | "name": "python", 332 | "nbconvert_exporter": "python", 333 | "pygments_lexer": "ipython3", 334 | "version": "3.7.1" 335 | } 336 | }, 337 | "nbformat": 4, 338 | "nbformat_minor": 5 339 | } 340 | -------------------------------------------------------------------------------- /Ch7/6. 
Performing Stationarity checks on Time series data.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "2dbc33b1", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Import relevant libraries" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "0ae67507", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import numpy as np\n", 19 | "import pandas as pd\n", 20 | "import matplotlib.pyplot as plt\n", 21 | "import seaborn as sns\n", 22 | "from statsmodels.tsa.stattools import adfuller" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "id": "a2050fe1", 28 | "metadata": {}, 29 | "source": [ 30 | "#### Load dataset" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 2, 36 | "id": "3a02fac1", 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "air_traffic_data = pd.read_csv(\"data/SF_Air_Traffic_Passenger_Statistics_Transformed.csv\")" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "id": "128c9ecd", 46 | "metadata": {}, 47 | "source": [ 48 | "#### Inspect first 5 rows and data types of the dataset" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 3, 54 | "id": "1f556097", 55 | "metadata": { 56 | "scrolled": true 57 | }, 58 | "outputs": [ 59 | { 60 | "data": { 61 | "text/html": [ 62 | "
\n", 63 | "\n", 76 | "\n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | "
   Date    Total Passenger Count
0  200601  2448889
1  200602  2223024
2  200603  2708778
3  200604  2773293
4  200605  2829000
\n", 112 | "
" 113 | ], 114 | "text/plain": [ 115 | " Date Total Passenger Count\n", 116 | "0 200601 2448889\n", 117 | "1 200602 2223024\n", 118 | "2 200603 2708778\n", 119 | "3 200604 2773293\n", 120 | "4 200605 2829000" 121 | ] 122 | }, 123 | "execution_count": 3, 124 | "metadata": {}, 125 | "output_type": "execute_result" 126 | } 127 | ], 128 | "source": [ 129 | "air_traffic_data.head()" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 4, 135 | "id": "bdf17c4b", 136 | "metadata": {}, 137 | "outputs": [ 138 | { 139 | "data": { 140 | "text/plain": [ 141 | "(132, 2)" 142 | ] 143 | }, 144 | "execution_count": 4, 145 | "metadata": {}, 146 | "output_type": "execute_result" 147 | } 148 | ], 149 | "source": [ 150 | "air_traffic_data.shape" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 5, 156 | "id": "b3601b62", 157 | "metadata": {}, 158 | "outputs": [ 159 | { 160 | "data": { 161 | "text/plain": [ 162 | "Date int64\n", 163 | "Total Passenger Count int64\n", 164 | "dtype: object" 165 | ] 166 | }, 167 | "execution_count": 5, 168 | "metadata": {}, 169 | "output_type": "execute_result" 170 | } 171 | ], 172 | "source": [ 173 | "air_traffic_data.dtypes" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "id": "32043140", 179 | "metadata": {}, 180 | "source": [ 181 | "#### Transform date int to date" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 6, 187 | "id": "592fcb98", 188 | "metadata": {}, 189 | "outputs": [], 190 | "source": [ 191 | "air_traffic_data['Date']= pd.to_datetime(air_traffic_data['Date'], format = \"%Y%m\")" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": 7, 197 | "id": "0a562145", 198 | "metadata": {}, 199 | "outputs": [ 200 | { 201 | "data": { 202 | "text/plain": [ 203 | "Date datetime64[ns]\n", 204 | "Total Passenger Count int64\n", 205 | "dtype: object" 206 | ] 207 | }, 208 | "execution_count": 7, 209 | "metadata": {}, 210 | "output_type": 
"execute_result" 211 | } 212 | ], 213 | "source": [ 214 | "air_traffic_data.dtypes" 215 | ] 216 | }, 217 | { 218 | "cell_type": "markdown", 219 | "id": "f525a931", 220 | "metadata": {}, 221 | "source": [ 222 | "#### Set date as index" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 8, 228 | "id": "8cdc6f00", 229 | "metadata": {}, 230 | "outputs": [ 231 | { 232 | "data": { 233 | "text/plain": [ 234 | "(132, 1)" 235 | ] 236 | }, 237 | "execution_count": 8, 238 | "metadata": {}, 239 | "output_type": "execute_result" 240 | } 241 | ], 242 | "source": [ 243 | "air_traffic_data.set_index('Date',inplace = True)\n", 244 | "air_traffic_data.shape" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "id": "60de4cbf", 250 | "metadata": {}, 251 | "source": [ 252 | "#### Check Stationarity" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": 9, 258 | "id": "d0be68f5", 259 | "metadata": {}, 260 | "outputs": [ 261 | { 262 | "data": { 263 | "text/plain": [ 264 | "(0.7015289287377346,\n", 265 | " 0.9898683326442054,\n", 266 | " 13,\n", 267 | " 118,\n", 268 | " {'1%': -3.4870216863700767,\n", 269 | " '5%': -2.8863625166643136,\n", 270 | " '10%': -2.580009026141913},\n", 271 | " 3039.0876643475)" 272 | ] 273 | }, 274 | "execution_count": 9, 275 | "metadata": {}, 276 | "output_type": "execute_result" 277 | } 278 | ], 279 | "source": [ 280 | "adf_result = adfuller(air_traffic_data)\n", 281 | "adf_result" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": 10, 287 | "id": "46f1e971", 288 | "metadata": {}, 289 | "outputs": [ 290 | { 291 | "name": "stdout", 292 | "output_type": "stream", 293 | "text": [ 294 | "ADF Test Statistic: 0.701529\n", 295 | "p-value: 0.989868\n", 296 | "Critical Values:\n", 297 | "{'1%': -3.4870216863700767, '5%': -2.8863625166643136, '10%': -2.580009026141913}\n", 298 | "Failed to Reject Null Hypothesis - Time Series is Non-Stationary\n" 299 | ] 300 | } 301 | ], 302 | 
"source": [ 303 | "print('ADF Test Statistic: %f' % adf_result[0])\n", 304 | "\n", 305 | "print('p-value: %f' % adf_result[1])\n", 306 | "\n", 307 | "print('Critical Values:')\n", 308 | "\n", 309 | "print(adf_result[4])\n", 310 | "\n", 311 | "if adf_result[0] < adf_result[4][\"5%\"]:\n", 312 | " print (\"Reject Null Hypothesis - Time Series is Stationary\")\n", 313 | "else:\n", 314 | " print (\"Failed to Reject Null Hypothesis - Time Series is Non-Stationary\")" 315 | ] 316 | } 317 | ], 318 | "metadata": { 319 | "kernelspec": { 320 | "display_name": "Python 3", 321 | "language": "python", 322 | "name": "python3" 323 | }, 324 | "language_info": { 325 | "codemirror_mode": { 326 | "name": "ipython", 327 | "version": 3 328 | }, 329 | "file_extension": ".py", 330 | "mimetype": "text/x-python", 331 | "name": "python", 332 | "nbconvert_exporter": "python", 333 | "pygments_lexer": "ipython3", 334 | "version": "3.7.1" 335 | } 336 | }, 337 | "nbformat": 4, 338 | "nbformat_minor": 5 339 | } 340 | -------------------------------------------------------------------------------- /Ch7/Data/DataSF_Data_Dictionary_for_Air_Traffic_Passenger_Statistics.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PacktPublishing/Exploratory-Data-Analysis-with-Python-Cookbook/2ab376175c0cf8bc368c986484e862fa3a5ba319/Ch7/Data/DataSF_Data_Dictionary_for_Air_Traffic_Passenger_Statistics.pdf -------------------------------------------------------------------------------- /Ch7/Data/SF_Air_Traffic_Passenger_Statistics_Transformed.csv: -------------------------------------------------------------------------------- 1 | Date,Total Passenger Count 2 | 200601,2448889 3 | 200602,2223024 4 | 200603,2708778 5 | 200604,2773293 6 | 200605,2829000 7 | 200606,3071396 8 | 200607,3227605 9 | 200608,3143839 10 | 200609,2720100 11 | 200610,2834959 12 | 200611,2653887 13 | 200612,2698200 14 | 200701,2507430 15 | 200702,2304990 16 | 200703,2820085 17 
| 200704,2869247 18 | 200705,3056934 19 | 200706,3263621 20 | 200707,3382382 21 | 200708,3436417 22 | 200709,2957530 23 | 200710,3129309 24 | 200711,2922500 25 | 200712,2903637 26 | 200801,2670053 27 | 200802,2595676 28 | 200803,3127387 29 | 200804,3029021 30 | 200805,3305954 31 | 200806,3453751 32 | 200807,3603946 33 | 200808,3612297 34 | 200809,3004720 35 | 200810,3124451 36 | 200811,2744485 37 | 200812,2962937 38 | 200901,2644539 39 | 200902,2359800 40 | 200903,2925918 41 | 200904,3024973 42 | 200905,3177100 43 | 200906,3419595 44 | 200907,3649702 45 | 200908,3650668 46 | 200909,3191526 47 | 200910,3249428 48 | 200911,2971484 49 | 200912,3074209 50 | 201001,2785466 51 | 201002,2515361 52 | 201003,3105958 53 | 201004,3139059 54 | 201005,3380355 55 | 201006,3612886 56 | 201007,3765824 57 | 201008,3771842 58 | 201009,3356365 59 | 201010,3490100 60 | 201011,3163659 61 | 201012,3167124 62 | 201101,2883810 63 | 201102,2610667 64 | 201103,3129205 65 | 201104,3200527 66 | 201105,3547804 67 | 201106,3766323 68 | 201107,3935589 69 | 201108,3917884 70 | 201109,3564970 71 | 201110,3602455 72 | 201111,3326859 73 | 201112,3441693 74 | 201201,3211600 75 | 201202,2998119 76 | 201203,3472440 77 | 201204,3563007 78 | 201205,3820570 79 | 201206,4107195 80 | 201207,4284443 81 | 201208,4356216 82 | 201209,3819379 83 | 201210,3844987 84 | 201211,3478890 85 | 201212,3443039 86 | 201301,3204637 87 | 201302,2966477 88 | 201303,3593364 89 | 201304,3604104 90 | 201305,3933016 91 | 201306,4146797 92 | 201307,4176486 93 | 201308,4347059 94 | 201309,3781168 95 | 201310,3910790 96 | 201311,3466878 97 | 201312,3814984 98 | 201401,3432625 99 | 201402,3078405 100 | 201403,3765504 101 | 201404,3881893 102 | 201405,4147096 103 | 201406,4321833 104 | 201407,4499221 105 | 201408,4524918 106 | 201409,3919072 107 | 201410,4059443 108 | 201411,3628786 109 | 201412,3855835 110 | 201501,3550084 111 | 201502,3248144 112 | 201503,4001521 113 | 201504,4021677 114 | 201505,4361140 115 | 201506,4558511 116 | 
201507,4801148 117 | 201508,4796653 118 | 201509,4201394 119 | 201510,4374749 120 | 201511,4013814 121 | 201512,4129052 122 | 201601,3748529 123 | 201602,3543639 124 | 201603,4137679 125 | 201604,4172512 126 | 201605,4573996 127 | 201606,4922125 128 | 201607,5168724 129 | 201608,5110638 130 | 201609,4543759 131 | 201610,4571997 132 | 201611,4266481 133 | 201612,4343369 134 | -------------------------------------------------------------------------------- /Data/datasets_notes.numbers: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PacktPublishing/Exploratory-Data-Analysis-with-Python-Cookbook/2ab376175c0cf8bc368c986484e862fa3a5ba319/Data/datasets_notes.numbers -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Packt 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |


## Machine Learning Summit 2025
**Bridging Theory and Practice: ML Solutions for Today’s Challenges**

3 days, 20+ experts, and 25+ tech sessions and talks covering critical aspects of:
- **Agentic and Generative AI**
- **Applied Machine Learning in the Real World**
- **ML Engineering and Optimization**

👉 [Book your ticket now >>](https://packt.link/mlsumgh)

---

## Join Our Newsletters 📬

### DataPro
*The future of AI is unfolding. Don’t fall behind.*


Stay ahead with [**DataPro**](https://landing.packtpub.com/subscribe-datapronewsletter/?link_from_packtlink=yes), the free weekly newsletter for data scientists, AI/ML researchers, and data engineers.
From trending tools like **PyTorch**, **scikit-learn**, **XGBoost**, and **BentoML** to hands-on insights on **database optimization** and real-world **ML workflows**, you’ll get what matters, fast.

> Stay sharp with [DataPro](https://landing.packtpub.com/subscribe-datapronewsletter/?link_from_packtlink=yes). Join **115K+ data professionals** who never miss a beat.

---

### BIPro
*Business runs on data. Make sure yours tells the right story.*


[**BIPro**](https://landing.packtpub.com/subscribe-bipro-newsletter/?link_from_packtlink=yes) is your free weekly newsletter for BI professionals, analysts, and data leaders.
Get practical tips on **dashboarding**, **data visualization**, and **analytics strategy** with tools like **Power BI**, **Tableau**, **Looker**, **SQL**, and **dbt**.

> Get smarter with [BIPro](https://landing.packtpub.com/subscribe-bipro-newsletter/?link_from_packtlink=yes). Trusted by **35K+ BI professionals**, see what you’re missing.

### [Packt Conference: Put Generative AI to work on Oct 11-13 (Virtual)](https://packt.link/JGIEY)

[![Packt Conference](https://hub.packtpub.com/wp-content/uploads/2023/08/put-generative-ai-to-work-packt.png)](https://packt.link/JGIEY)

3 Days, 20+ AI Experts, 25+ Workshops and Power Talks

Code: USD75OFF

# Exploratory Data Analysis with Python Cookbook

This is the code repository for [Exploratory Data Analysis with Python Cookbook](https://www.packtpub.com/product/exploratory-data-analysis-with-python-cookbook/9781803231105?utm_source=github&utm_medium=repository&utm_campaign=9781803231105), published by Packt.

**Over 50 recipes to analyze, visualize, and extract insights from structured and unstructured data**

## What is this book about?
Exploratory data analysis (EDA) is a crucial step in data analysis and machine learning projects as it helps in uncovering relationships and patterns and provides insights into structured and unstructured datasets. With various techniques and libraries available for performing EDA, choosing the right approach can sometimes be challenging. This hands-on guide provides you with practical steps and ready-to-use code for conducting exploratory analysis on tabular, time series, and textual data.

This book covers the following exciting features:
* Perform EDA with leading Python data visualization libraries
* Execute univariate, bivariate, and multivariate analyses on tabular data
* Uncover patterns and relationships within time series data
* Identify hidden patterns within textual data
* Discover different techniques to prepare data for analysis
* Overcome the challenge of outliers and missing values during data analysis
* Leverage automated EDA for fast and efficient analysis

If you feel this book is for you, get your [copy](https://www.amazon.com/dp/B09NC5XJ6D) today!

## Instructions and Navigations
All of the code is organized into folders.

The code will look like the following:
```
import numpy as np
import pandas as pd
import seaborn as sns
```

**Following is what you need for this book:**
If you are a data analyst interested in the practical application of exploratory data analysis in Python, then this book is for you. This book will also benefit data scientists, researchers, and statisticians who are looking for hands-on instructions on how to apply EDA techniques using Python libraries. Basic knowledge of Python programming and a basic understanding of fundamental statistical concepts are prerequisites.

With the following software and hardware list you can run all code files present in the book (Chapters 1-10).

### Software and Hardware List

Basic knowledge of Python and statistical concepts is all that is needed to get the best out of this book.
System requirements are mentioned in the following table:

| Software/Hardware                               | Operating System requirements      |
| ----------------------------------------------- | ---------------------------------- |
| Python 3.6+                                     | Windows, Mac OS X, and Linux (Any) |
| 512GB, 8GB RAM, i5 processor (preferred specs)  | Windows, Mac OS X, and Linux (Any) |

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. [Click here to download it](https://packt.link/npXws).

### Related products
* Python Data Cleaning Cookbook [[Packt]](https://www.packtpub.com/product/python-data-cleaning-cookbook/9781800565661) [[Amazon]](https://www.amazon.com/dp/1800565666)

* Hands-On Data Preprocessing in Python [[Packt]](https://www.packtpub.com/product/hands-on-data-preprocessing-in-python/9781801072137) [[Amazon]](https://www.amazon.com/dp/1801072132)

## Get to Know the Author
**Ayodele Oluleye**
is a certified data professional with a rich cross-functional background that spans strategy, data management, analytics, and data science. He currently leads a team of data professionals that spearheads data science and analytics initiatives across a leading African non-banking financial services group. Prior to this role, he spent over 8 years at a Big Four consulting firm working on strategy, data science, and automation projects for clients across various industries. In that capacity, he was a key member of the data science and automation team which developed a proprietary big data fraud detection solution used by many Nigerian financial institutions today. To learn more about him, visit his LinkedIn profile.

--------------------------------------------------------------------------------