├── piggy_bank.jpg
├── ReadME.md
├── .ipynb_checkpoints
    ├── notebook-checkpoint.ipynb
    └── Designing a Bank Marketing Database-checkpoint.ipynb
└── Designing a Bank Marketing Database.ipynb

/piggy_bank.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Chisomnwa/Designing-a-Bank-Marketing-Database/main/piggy_bank.jpg
--------------------------------------------------------------------------------
/ReadME.md:
--------------------------------------------------------------------------------
# Designing a Bank Marketing Database

![piggy_bank](piggy_bank.jpg)
---

*This **Data Engineering** project involves creating a comprehensive database to store and manage customer information for a bank's marketing campaigns.*
## Project Description

Personal loans are a lucrative revenue stream for banks. The typical interest rate of a two-year loan in the United Kingdom is [around 10%](https://www.experian.com/blogs/ask-experian/whats-a-good-interest-rate-for-a-personal-loan/). This might not sound like a lot, but in September 2022 alone UK consumers borrowed [around £1.5 billion](https://www.ukfinance.org.uk/system/files/2022-12/Household%20Finance%20Review%202022%20Q3-%20Final.pdf). At 10% a year, that is roughly £300 million in interest for banks over two years (£1.5 billion × 10% × 2, ignoring any repayment of principal).

You have been asked to work with a bank to clean and store the data they collected as part of a recent marketing campaign, which aimed to get customers to take out a personal loan. They plan to run more marketing campaigns in the future, so they would like you to set up a PostgreSQL database to store this campaign's data, designing the schema so that data from future campaigns can be imported easily.

They have supplied you with a CSV file called `bank_marketing.csv`, which you will need to clean, reformat, and split in order to save a separate file for each of the tables you will create. Using `pandas` is recommended for these tasks.

Lastly, you will write the SQL code that the bank can execute to create the tables and populate them with the data from the CSV files. As the bank are quite strict about security, you'll save the SQL code as multiline string variables that they can then run to create the database on their end.
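The idea of handing over SQL as a multiline string variable can be sketched as follows. This is a minimal example, assuming the `client` schema from the table below and psql's `\copy` meta-command to load a `client.csv` file (the file name used in the notebook's recorded code); it only builds and prints the string, so nothing is executed against a database.

```python
# A minimal sketch: the DDL plus the psql load command for the client table,
# stored as a raw multiline string so the backslash in \copy is kept literally.
client_table = r"""CREATE TABLE client
(
    id SERIAL PRIMARY KEY,
    age INTEGER,
    job TEXT,
    marital TEXT,
    education TEXT,
    credit_default BOOLEAN,
    housing BOOLEAN,
    loan BOOLEAN
);
\copy client FROM 'client.csv' DELIMITER ',' CSV HEADER
"""

# The bank can paste or pipe this text into psql on their side.
print(client_table)
```

Using a raw string (`r"""..."""`) avoids Python treating `\c` as an escape sequence, which newer Python versions flag with a warning.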
You have been asked to design a database that will have three tables:

## client

| column | data type | description | original column in dataset |
|--------|-----------|-------------|----------------------------|
| `id` | `serial` | Client ID - primary key | `client_id` |
| `age` | `integer` | Client's age in years | `age` |
| `job` | `text` | Client's type of job | `job` |
| `marital` | `text` | Client's marital status | `marital` |
| `education` | `text` | Client's level of education | `education` |
| `credit_default` | `boolean` | Whether the client's credit is in default | `credit_default` |
| `housing` | `boolean` | Whether the client has an existing housing loan (mortgage) | `housing` |
| `loan` | `boolean` | Whether the client has an existing personal loan | `loan` |
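The `boolean` columns above start life as `yes`/`no` text in the raw data, and the notebook's data preview shows that `credit_default` can also be `unknown`. One possible conversion, sketched on a few hypothetical rows copied from that preview, maps `yes`/`no` to `True`/`False` and leaves anything else as a missing value. Treating `unknown` as missing is an assumption here, not necessarily what the notebook does.

```python
import pandas as pd

# Hypothetical sample rows shaped like bank_marketing.csv (values from the preview).
marketing = pd.DataFrame({
    "client_id": [0, 1, 2],
    "credit_default": ["no", "unknown", "no"],
    "housing": ["no", "no", "yes"],
    "loan": ["no", "no", "no"],
})

# rename() returns a new DataFrame, so later column assignments are safe.
client = marketing.rename(columns={"client_id": "id"})

# Map "yes"/"no" to True/False; values not in the dict (e.g. "unknown") become NaN.
for col in ["credit_default", "housing", "loan"]:
    client[col] = client[col].map({"yes": True, "no": False})

print(client)
```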
## campaign

| column | data type | description | original column in dataset |
|--------|-----------|-------------|----------------------------|
| `campaign_id` | `serial` | Campaign ID - primary key | N/A - new column |
| `client_id` | `serial` | Client ID - references `id` in the `client` table | `client_id` |
| `number_contacts` | `integer` | Number of contact attempts to the client in the current campaign | `campaign` |
| `contact_duration` | `integer` | Last contact duration in seconds | `duration` |
| `pdays` | `integer` | Number of days since contact in previous campaign (`999` = not previously contacted) | `pdays` |
| `previous_campaign_contacts` | `integer` | Number of contact attempts to the client in the previous campaign | `previous` |
| `previous_outcome` | `boolean` | Outcome of the previous campaign | `poutcome` |
| `campaign_outcome` | `boolean` | Outcome of the current campaign | `y` |
| `last_contact_date` | `date` | Last date the client was contacted | A combination of `day`, `month`, and the newly created `year` |
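`last_contact_date` has to be assembled from the separate `day` and `month` columns plus a year that is not in the dataset. A minimal pandas sketch, assuming the year 2022 (the value used in the notebook's recorded code) and the lowercase month abbreviations seen in the data preview, could look like this:

```python
import pandas as pd

# Hypothetical sample of the raw campaign columns: month is lowercase ("may")
# and day is an integer, as in the notebook's preview of bank_marketing.csv.
campaign = pd.DataFrame({"client_id": [0, 4], "month": ["may", "may"], "day": [13, 3]})

# Assumption: the campaign ran in 2022, so a constant year column is created.
campaign["year"] = "2022"

# Build "2022-May-13"-style strings and parse them with an explicit format.
campaign["last_contact_date"] = pd.to_datetime(
    campaign["year"] + "-" + campaign["month"].str.capitalize() + "-" + campaign["day"].astype(str),
    format="%Y-%b-%d",
)

# The helper columns are no longer needed once the date exists.
campaign = campaign.drop(columns=["month", "day", "year"])
print(campaign)
```

Passing an explicit `format` makes `pd.to_datetime` raise an error on any value that doesn't match, rather than silently guessing the layout.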
## economics

| column | data type | description | original column in dataset |
|--------|-----------|-------------|----------------------------|
| `client_id` | `serial` | Client ID - references `id` in the `client` table | `client_id` |
| `emp_var_rate` | `float` | Employment variation rate (quarterly indicator) | `emp_var_rate` |
| `cons_price_idx` | `float` | Consumer price index (monthly indicator) | `cons_price_idx` |
| `euribor_three_months` | `float` | Euro Interbank Offered Rate (euribor) three-month rate (daily indicator) | `euribor3m` |
| `number_employed` | `float` | Number of employees (quarterly indicator) | `nr_employed` |
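Both `campaign.client_id` and `economics.client_id` reference `client.id`, so PostgreSQL will abort a `\copy` if any row points at a client that does not exist. A quick pre-load check in pandas might look like the sketch below; `find_orphan_keys` is a hypothetical helper and the frames are toy data, not the real dataset.

```python
import pandas as pd


def find_orphan_keys(child: pd.DataFrame, fk: str, parent: pd.DataFrame, pk: str) -> pd.Series:
    """Return the values in child[fk] that have no match in parent[pk]."""
    return child.loc[~child[fk].isin(parent[pk]), fk]


# Toy frames shaped like the client and economics tables (hypothetical data).
client = pd.DataFrame({"id": [0, 1, 2]})
economics = pd.DataFrame({
    "client_id": [0, 1, 2, 7],
    "euribor_three_months": [4.857, 4.857, 4.857, 4.857],
})

orphans = find_orphan_keys(economics, "client_id", client, "id")
print(f"{len(orphans)} economics row(s) reference a missing client: {orphans.tolist()}")
```

The same helper can be pointed at the `campaign` frame before writing its CSV.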
## Project Tasks

View the [notebook](https://github.com/Chisomnwa/Designing-a-Bank-Marketing-Database/blob/main/Designing%20a%20Bank%20Marketing%20Database.ipynb) to see the project tasks, the expected output for each task, and how the cleaned data is finally loaded into a PostgreSQL database.

If GitHub doesn't render the notebook within about 5 seconds of clicking the link, you can view it [here](https://nbviewer.org/github/Chisomnwa/Designing-a-Bank-Marketing-Database/blob/main/Designing%20a%20Bank%20Marketing%20Database.ipynb) instead.
---

You might also enjoy reading my blog post on this project [here](https://medium.com/@chisompromise/designing-a-bank-marketing-database-a033f3eee479).
--------------------------------------------------------------------------------
/Designing a Bank Marketing Database.ipynb:
--------------------------------------------------------------------------------
1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "02077ee3-e1e4-4fc5-8de1-16e987afa5fb", 6 | "metadata": {}, 7 | "source": [ 8 | "![piggy_bank](piggy_bank.jpg)\n", 9 | "\n", 10 | "
\n", 11 | "\n", 12 | "Personal loans are a lucrative revenue stream for banks. The typical interest rate of a two-year loan in the United Kingdom is [around 10%](https://www.experian.com/blogs/ask-experian/whats-a-good-interest-rate-for-a-personal-loan/). This might not sound like a lot, but in September 2022 alone UK consumers borrowed [around £1.5 billion](https://www.ukfinance.org.uk/system/files/2022-12/Household%20Finance%20Review%202022%20Q3-%20Final.pdf). At 10% a year, that is roughly £300 million in interest for banks over two years (£1.5 billion × 10% × 2, ignoring any repayment of principal).\n", 13 | "\n", 14 | "You have been asked to work with a bank to clean and store the data they collected as part of a recent marketing campaign, which aimed to get customers to take out a personal loan. They plan to run more marketing campaigns in the future, so they would like you to set up a PostgreSQL database to store this campaign's data, designing the schema so that data from future campaigns can be imported easily.\n", 15 | "\n", 16 | "They have supplied you with a CSV file called `bank_marketing.csv`, which you will need to clean, reformat, and split in order to save a separate file for each of the tables you will create. Using `pandas` is recommended for these tasks.\n", 17 | "\n", 18 | "Lastly, you will write the SQL code that the bank can execute to create the tables and populate them with the data from the CSV files. As the bank are quite strict about security, you'll save the SQL code as multiline string variables that they can then run to create the database on their end. 
\n", 19 | "\n", 20 | "You have been asked to design a database that will have three tables:\n", 21 | "\n", 22 | "## client\n", 23 | "\n", 24 | "| column | data type | description | original column in dataset |\n", 25 | "|--------|-----------|-------------|----------------------------|\n", 26 | "| `id` | `serial` | Client ID - primary key | `client_id` |\n", 27 | "| `age` | `integer` | Client's age in years | `age` |\n", 28 | "| `job` | `text` | Client's type of job | `job` |\n", 29 | "| `marital` | `text` | Client's marital status | `marital` | \n", 30 | "| `education` | `text` | Client's level of education | `education` |\n", 31 | "| `credit_default` | `boolean` | Whether the client's credit is in default | `credit_default` |\n", 32 | "| `housing` | `boolean` | Whether the client has an existing housing loan (mortgage) | `housing` | \n", 33 | "| `loan` | `boolean` | Whether the client has an existing personal loan | `loan` |\n", 34 | "\n", 35 | "
\n", 36 | "\n", 37 | "## campaign\n", 38 | "\n", 39 | "| column | data type | description | original column in dataset |\n", 40 | "|--------|-----------|-------------|----------------------------|\n", 41 | "| `campaign_id` | `serial` | Campaign ID - primary key | N/A - new column |\n", 42 | "| `client_id` | `serial` | Client ID - references `id` in the `client` table | `client_id` |\n", 43 | "| `number_contacts` | `integer` | Number of contact attempts to the client in the current campaign | `campaign` |\n", 44 | "| `contact_duration` | `integer` | Last contact duration in seconds | `duration` |\n", 45 | "| `pdays` | `integer` | Number of days since contact in previous campaign (`999` = not previously contacted) | `pdays` |\n", 46 | "| `previous_campaign_contacts` | `integer` | Number of contact attempts to the client in the previous campaign | `previous` |\n", 47 | "| `previous_outcome` | `boolean` | Outcome of the previous campaign | `poutcome` |\n", 48 | "| `campaign_outcome` | `boolean` | Outcome of the current campaign | `y` |\n", 49 | "| `last_contact_date` | `date` | Last date the client was contacted | A combination of `day`, `month`, and the newly created `year` |\n", 50 | "\n", 51 | "
\n", 52 | "\n", 53 | "## economics\n", 54 | "\n", 55 | "| column | data type | description | original column in dataset |\n", 56 | "|--------|-----------|-------------|----------------------------|\n", 57 | "| `client_id` | `serial` | Client ID - references `id` in the `client` table | `client_id` |\n", 58 | "| `emp_var_rate` | `float` | Employment variation rate (quarterly indicator) | `emp_var_rate` |\n", 59 | "| `cons_price_idx` | `float` | Consumer price index (monthly indicator) | `cons_price_idx` |\n", 60 | "| `euribor_three_months` | `float` | Euro Interbank Offered Rate (euribor) three month rate (daily indicator) | `euribor3m` |\n", 61 | "| `number_employed` | `float` | Number of employees (quarterly indicator)| `nr_employed` |" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 1, 67 | "id": "e2edad3c-8286-4983-b5b7-35d94fd78023", 68 | "metadata": { 69 | "executionCancelledAt": null, 70 | "executionTime": 1057, 71 | "lastExecutedAt": 1686069923599, 72 | "lastScheduledRunId": null, 73 | "lastSuccessfullyExecutedCode": "# Start coding...\nimport pandas as pd\nimport numpy as np\n\n# Read in csv\nmarketing = pd.read_csv(\"bank_marketing.csv\")\n\n# Split into the three tables\nclient = marketing[[\"client_id\", \"age\", \"job\", \"marital\", \"education\", \n \"credit_default\", \"housing\", \"loan\"]]\ncampaign = marketing[[\"client_id\", \"campaign\", \"month\", \"day\", \n \"duration\", \"pdays\", \"previous\", \"poutcome\", \"y\"]]\neconomics = marketing[[\"client_id\", \"emp_var_rate\", \"cons_price_idx\", \n \"euribor3m\", \"nr_employed\"]]\n\n# Rename client_id in the client table\nclient.rename(columns={\"client_id\": \"id\"}, inplace=True)\n\n# Rename duration, y, and campaign columns\ncampaign.rename(columns={\"duration\": \"contact_duration\", \n \"y\": \"campaign_outcome\", \n \"campaign\": \"number_contacts\",\n \"previous\": \"previous_campaign_contacts\",\n \"poutcome\": \"previous_outcome\"}, \n inplace=True)\n\n# Rename 
euribor3m and nr_employed\neconomics.rename(columns={\"euribor3m\": \"euribor_three_months\", \n \"nr_employed\": \"number_employed\"}, \n inplace=True)\n\n# Clean education column\nclient[\"education\"] = client[\"education\"].str.replace(\".\", \"_\")\nclient[\"education\"] = client[\"education\"].replace(\"unknown\", np.NaN)\n\n# Clean job column\nclient[\"job\"] = client[\"job\"].str.replace(\".\", \"\")\n\n# Change campaign_outcome to binary values\ncampaign[\"campaign_outcome\"] = campaign[\"campaign_outcome\"].map({\"yes\": 1, \n \"no\": 0})\n\n# Convert poutcome to binary values\ncampaign[\"previous_outcome\"] = campaign[\"previous_outcome\"].replace(\"nonexistent\", \n np.NaN)\ncampaign[\"previous_outcome\"] = campaign[\"previous_outcome\"].map({\"success\": 1, \n \"failure\": 0})\n\n# Add campaign_id column\ncampaign[\"campaign_id\"] = 1\n\n# Capitalize month and day columns\ncampaign[\"month\"] = campaign[\"month\"].str.capitalize()\n\n# Add year column\ncampaign[\"year\"] = \"2022\"\n\n# Convert day to string\ncampaign[\"day\"] = campaign[\"day\"].astype(str)\n\n# Add last_contact_date column\ncampaign[\"last_contact_date\"] = campaign[\"year\"] + \"-\" + campaign[\"month\"] + \"-\" + campaign[\"day\"]\n\n# Convert to datetime\ncampaign[\"last_contact_date\"] = pd.to_datetime(campaign[\"last_contact_date\"], \n format=\"%Y-%b-%d\")\n\n# Drop unneccessary columns\ncampaign.drop(columns=[\"month\", \"day\", \"year\"], inplace=True)\n\n# Save tables to individual csv files\nclient.to_csv(\"client.csv\", index=False)\ncampaign.to_csv(\"campaign.csv\", index=False)\neconomics.to_csv(\"economics.csv\", index=False)\n\n# Store and print database_design\nclient_table = \"\"\"CREATE TABLE client\n(\n id SERIAL PRIMARY KEY,\n age INTEGER,\n job TEXT,\n marital TEXT,\n education TEXT,\n credit_default BOOLEAN,\n housing BOOLEAN,\n loan BOOLEAN\n);\n\\copy client from 'client.csv' DELIMITER ',' CSV HEADER\n\"\"\"\n\ncampaign_table = \"\"\"CREATE TABLE campaign\n(\n 
campaign_id SERIAL PRIMARY KEY,\n client_id SERIAL references client (id),\n number_contacts INTEGER,\n contact_duration INTEGER,\n pdays INTEGER,\n previous_campaign_contacts INTEGER,\n previous_outcome BOOLEAN,\n campaign_outcome BOOLEAN,\n last_contact_date DATE \n);\n\\copy campaign from 'campaign.csv' DELIMITER ',' CSV HEADER\n\"\"\"\n\neconomics_table = \"\"\"CREATE TABLE economics\n(\n client_id SERIAL references client (id),\n emp_var_rate FLOAT,\n cons_price_idx FLOAT,\n euribor_three_months FLOAT,\n number_employed FLOAT\n);\n\\copy economics from 'economics.csv' DELIMITER ',' CSV HEADER\n\"\"\"" 74 | }, 75 | "outputs": [], 76 | "source": [ 77 | "# Import libraries\n", 78 | "import pandas as pd\n", 79 | "import numpy as np\n", 80 | "\n", 81 | "# suppress warnings from final output\n", 82 | "import warnings\n", 83 | "warnings.simplefilter(\"ignore\")\n", 84 | "\n", 85 | "# Start coding here..." 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "id": "9d94086e", 91 | "metadata": {}, 92 | "source": [ 93 | "#### Instruction 1: Read in bank_marketing.csv as a pandas DataFrame." 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 2, 99 | "id": "ac57dc04", 100 | "metadata": {}, 101 | "outputs": [ 102 | { 103 | "data": { 104 | "text/html": [ 105 | "

5 rows × 22 columns
" 271 | ], 272 | "text/plain": [ 273 | " client_id age job marital education credit_default housing \\\n", 274 | "0 0 56 housemaid married basic.4y no no \n", 275 | "1 1 57 services married high.school unknown no \n", 276 | "2 2 37 services married high.school no yes \n", 277 | "3 3 40 admin. married basic.6y no no \n", 278 | "4 4 56 services married high.school no no \n", 279 | "\n", 280 | " loan contact month ... campaign pdays previous poutcome \\\n", 281 | "0 no telephone may ... 1 999 0 nonexistent \n", 282 | "1 no telephone may ... 1 999 0 nonexistent \n", 283 | "2 no telephone may ... 1 999 0 nonexistent \n", 284 | "3 no telephone may ... 1 999 0 nonexistent \n", 285 | "4 yes telephone may ... 1 999 0 nonexistent \n", 286 | "\n", 287 | " emp_var_rate cons_price_idx cons_conf_idx euribor3m nr_employed y \n", 288 | "0 1.1 93.994 -36.4 4.857 5191.0 no \n", 289 | "1 1.1 93.994 -36.4 4.857 5191.0 no \n", 290 | "2 1.1 93.994 -36.4 4.857 5191.0 no \n", 291 | "3 1.1 93.994 -36.4 4.857 5191.0 no \n", 292 | "4 1.1 93.994 -36.4 4.857 5191.0 no \n", 293 | "\n", 294 | "[5 rows x 22 columns]" 295 | ] 296 | }, 297 | "execution_count": 2, 298 | "metadata": {}, 299 | "output_type": "execute_result" 300 | } 301 | ], 302 | "source": [ 303 | "# Read in bank_marketing.csv as a pandas DataFrame.\n", 304 | "bank_marketing_data = pd.read_csv(\"bank_marketing.csv\")\n", 305 | "bank_marketing_data.head()" 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "id": "11195a84", 311 | "metadata": {}, 312 | "source": [ 313 | "#### Instruction 2:\n", 314 | "\n", 315 | "Split the data into three DataFrames using information provided about the desired tables as your guide: \n", 316 | "\n", 317 | "* one with information about the client, \n", 318 | "* another containing campaign data, and \n", 319 | "* a third to store information about economics at the time of the campaign." 
320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 3, 325 | "id": "6707773f", 326 | "metadata": {}, 327 | "outputs": [ 328 | { 329 | "name": "stdout", 330 | "output_type": "stream", 331 | "text": [ 332 | "Client Data:\n", 333 | " client_id age job marital education credit_default housing loan\n", 334 | "0 0 56 housemaid married basic.4y no no no\n", 335 | "1 1 57 services married high.school unknown no no\n", 336 | "2 2 37 services married high.school no yes no\n", 337 | "3 3 40 admin. married basic.6y no no no\n", 338 | "4 4 56 services married high.school no no yes\n", 339 | "\n", 340 | "Campaign Data:\n", 341 | " client_id campaign duration pdays previous poutcome y month day\n", 342 | "0 0 1 261 999 0 nonexistent no may 13\n", 343 | "1 1 1 149 999 0 nonexistent no may 19\n", 344 | "2 2 1 226 999 0 nonexistent no may 23\n", 345 | "3 3 1 151 999 0 nonexistent no may 27\n", 346 | "4 4 1 307 999 0 nonexistent no may 3\n", 347 | "\n", 348 | "Economic Data:\n", 349 | " client_id emp_var_rate cons_price_idx euribor3m nr_employed\n", 350 | "0 0 1.1 93.994 4.857 5191.0\n", 351 | "1 1 1.1 93.994 4.857 5191.0\n", 352 | "2 2 1.1 93.994 4.857 5191.0\n", 353 | "3 3 1.1 93.994 4.857 5191.0\n", 354 | "4 4 1.1 93.994 4.857 5191.0\n" 355 | ] 356 | } 357 | ], 358 | "source": [ 359 | "# Define the columns for each table\n", 360 | "client_columns = ['client_id', 'age', 'job', 'marital', 'education', 'credit_default', 'housing', 'loan']\n", 361 | "campaign_columns = ['client_id', 'campaign', 'duration', 'pdays', 'previous', 'poutcome', 'y', 'month', 'day']\n", 362 | "economic_columns = ['client_id', 'emp_var_rate', 'cons_price_idx', 'euribor3m', 'nr_employed']\n", 363 | "\n", 364 | "# Create an independent DataFrame copy for each table, so later in-place edits don't touch bank_marketing_data\n", 365 | "client = bank_marketing_data[client_columns].copy()\n", 366 | "campaign = bank_marketing_data[campaign_columns].copy()\n", 367 | "economics = bank_marketing_data[economic_columns].copy()\n", 368 | "\n", 369 | "# Print the first few rows of each DataFrame 
to verify the split\n", 370 | "print(\"Client Data:\")\n", 371 | "print(client.head())\n", 372 | "\n", 373 | "print(\"\\nCampaign Data:\")\n", 374 | "print(campaign.head())\n", 375 | "\n", 376 | "print(\"\\nEconomic Data:\")\n", 377 | "print(economics.head())" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "id": "a77f1836", 383 | "metadata": {}, 384 | "source": [ 385 | "#### Instruction 3:\n", 386 | "\n", 387 | "Rename the column \"client_id\" to \"id\" in client (leave as-is in the other subsets); \"duration\" to \"contact_duration\",\n", 388 | "\"previous\" to \"previous_campaign_contacts\", \"y\" to \"campaign_outcome\", \"poutcome\" to \"previous_outcome\", and \n", 389 | "\"campaign\" to \"number_contacts\" in campaign; and \"euribor3m\" to \"euribor_three_months\" and \"nr_employed\" to\n", 390 | "\"number_employed\" in economics." 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": 4, 396 | "id": "6f49b4d5", 397 | "metadata": {}, 398 | "outputs": [ 399 | { 400 | "name": "stdout", 401 | "output_type": "stream", 402 | "text": [ 403 | "Client Data:\n", 404 | " id age job marital education credit_default housing loan\n", 405 | "0 0 56 housemaid married basic.4y no no no\n", 406 | "1 1 57 services married high.school unknown no no\n", 407 | "2 2 37 services married high.school no yes no\n", 408 | "3 3 40 admin. 
married basic.6y no no no\n", 409 | "4 4 56 services married high.school no no yes\n", 410 | "\n", 411 | "Campaign Data:\n", 412 | " client_id number_contacts contact_duration pdays \\\n", 413 | "0 0 1 261 999 \n", 414 | "1 1 1 149 999 \n", 415 | "2 2 1 226 999 \n", 416 | "3 3 1 151 999 \n", 417 | "4 4 1 307 999 \n", 418 | "\n", 419 | " previous_campaign_contacts previous_outcome campaign_outcome month day \n", 420 | "0 0 nonexistent no may 13 \n", 421 | "1 0 nonexistent no may 19 \n", 422 | "2 0 nonexistent no may 23 \n", 423 | "3 0 nonexistent no may 27 \n", 424 | "4 0 nonexistent no may 3 \n", 425 | "\n", 426 | "Economic Data:\n", 427 | " client_id emp_var_rate cons_price_idx euribor_three_months \\\n", 428 | "0 0 1.1 93.994 4.857 \n", 429 | "1 1 1.1 93.994 4.857 \n", 430 | "2 2 1.1 93.994 4.857 \n", 431 | "3 3 1.1 93.994 4.857 \n", 432 | "4 4 1.1 93.994 4.857 \n", 433 | "\n", 434 | " number_employed \n", 435 | "0 5191.0 \n", 436 | "1 5191.0 \n", 437 | "2 5191.0 \n", 438 | "3 5191.0 \n", 439 | "4 5191.0 \n" 440 | ] 441 | } 442 | ], 443 | "source": [ 444 | "# Rename columns in the client DataFrame\n", 445 | "client.rename(columns={'client_id': 'id'}, inplace=True)\n", 446 | "\n", 447 | "# Rename columns in the campaign DataFrame\n", 448 | "campaign.rename(columns={'duration': 'contact_duration',\n", 449 | " 'previous': 'previous_campaign_contacts',\n", 450 | " 'y': 'campaign_outcome',\n", 451 | " 'poutcome': 'previous_outcome',\n", 452 | " 'campaign': 'number_contacts'}, inplace=True)\n", 453 | "\n", 454 | "# Rename columns in the economic DataFrame\n", 455 | "economics.rename(columns={'euribor3m': 'euribor_three_months',\n", 456 | " 'nr_employed': 'number_employed'}, inplace=True)\n", 457 | "\n", 458 | "# Print the first few rows of each DataFrame to verify the renaming of columns\n", 459 | "print(\"Client Data:\")\n", 460 | "print(client.head())\n", 461 | "\n", 462 | "print(\"\\nCampaign Data:\")\n", 463 | "print(campaign.head())\n", 464 | "\n", 465 | 
"print(\"\\nEconomic Data:\")\n", 466 | "print(economics.head())" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "id": "08f13310", 472 | "metadata": {}, 473 | "source": [ 474 | "#### Instruction 4:\n", 475 | "\n", 476 | "Clean the \"education\" column, changing \".\" to \"_\" and \"unknown\" to NumPy's null values." 477 | ] 478 | }, 479 | { 480 | "cell_type": "code", 481 | "execution_count": 5, 482 | "id": "f9a9ddd1", 483 | "metadata": {}, 484 | "outputs": [], 485 | "source": [ 486 | "# Replace \".\" with \"_\" (regex=False so \".\" is a literal period, not \"any character\")\n", 487 | "client['education'] = client['education'].str.replace('.', '_', regex=False)\n", 488 | "\n", 489 | "# Replace \"unknown\" with NumPy's NaN (reassign rather than a chained inplace call)\n", 490 | "client['education'] = client['education'].replace('unknown', np.nan)" 491 | ] 492 | }, 493 | { 494 | "cell_type": "code", 495 | "execution_count": 6, 496 | "id": "61f72247", 497 | "metadata": {}, 498 | "outputs": [ 499 | { 500 | "name": "stdout", 501 | "output_type": "stream", 502 | "text": [ 503 | "The 'education' column has been successfully cleaned.\n" 504 | ] 505 | } 506 | ], 507 | "source": [ 508 | "# Check for \".\" in the \"education\" column (literal match; NaN counts as no match)\n", 509 | "dot_count = client['education'].str.contains('.', regex=False, na=False).sum()\n", 510 | "\n", 511 | "# Check for \"unknown\" in the \"education\" column\n", 512 | "unknown_count = (client['education'] == 'unknown').sum()\n", 513 | "\n", 514 | "if dot_count == 0 and unknown_count == 0:\n", 515 | " print(\"The 'education' column has been successfully cleaned.\")\n", 516 | "else:\n", 517 | " print(f\"The 'education' column still contains {dot_count} '.' and {unknown_count} 'unknown' values.\")" 518 | ] 519 | }, 520 | { 521 | "cell_type": "markdown", 522 | "id": "7606b988", 523 | "metadata": {}, 524 | "source": [ 525 | "#### Instruction 5:\n", 526 | "\n", 527 | "Remove periods from the \"job\" column."
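A small sketch of why the period check needs a substring test: `'.' in series.values` compares whole values, so a job such as `admin.` is never caught by it. The job values below are hypothetical toy data, not rows from `bank_marketing.csv`.

```python
import pandas as pd

# Hypothetical job values, one with a trailing period
jobs = pd.Series(["admin.", "services", "blue-collar", "housemaid"])

# Whole-value membership: True only if some element is exactly "."
print("." in jobs.values)  # False, even though "admin." contains a period

# Substring test; regex=False treats "." literally instead of as "any character"
has_period = jobs.str.contains(".", regex=False)
print(has_period.any())  # True

# Remove periods literally, then confirm none remain
cleaned = jobs.str.replace(".", "", regex=False)
print(cleaned.tolist())  # ['admin', 'services', 'blue-collar', 'housemaid']
print(cleaned.str.contains(".", regex=False).any())  # False
```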
528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": 7, 533 | "id": "196ea3e0", 534 | "metadata": {}, 535 | "outputs": [ 536 | { 537 | "name": "stdout", 538 | "output_type": "stream", 539 | "text": [ 540 | "Periods have been successfully removed from the 'job' column.\n" 541 | ] 542 | } 543 | ], 544 | "source": [ 545 | "# Remove \".\" from the \"job\" column (literal match, not a regex)\n", 546 | "client['job'] = client['job'].str.replace('.', '', regex=False)\n", 547 | "\n", 548 | "# Check that no value still contains a period (substring test, not whole-value equality)\n", 549 | "if not client['job'].str.contains('.', regex=False, na=False).any():\n", 550 | " print(\"Periods have been successfully removed from the 'job' column.\")\n", 551 | "else:\n", 552 | " print(\"Periods still exist in the job column.\")" 553 | ] 554 | }, 555 | { 556 | "cell_type": "markdown", 557 | "id": "9a0eb339", 558 | "metadata": {}, 559 | "source": [ 560 | "#### Instruction 6:\n", 561 | "\n", 562 | "Convert \"success\" and \"failure\" in \"previous_outcome\", and \"yes\" and \"no\" in \"campaign_outcome\", to binary (1 or 0), \n", 563 | "along with changing \"nonexistent\" to NumPy's null values in \"previous_outcome\"."
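A minimal sketch of the order of operations for this step, on hypothetical toy rows: replace `"nonexistent"` first, then `map()`, which returns NaN for any value missing from the dictionary. The raw `y` column holds `"yes"`/`"no"`, so those are the keys to map for `campaign_outcome`. The nullable `Int8` cast at the end is an optional extra, not something the instructions require.

```python
import numpy as np
import pandas as pd

# Hypothetical rows mirroring the raw "poutcome" and "y" values
outcomes = pd.DataFrame({
    "previous_outcome": ["nonexistent", "failure", "success", "nonexistent"],
    "campaign_outcome": ["no", "no", "yes", "no"],
})

# "nonexistent" means there was no previous campaign, so store it as missing
outcomes["previous_outcome"] = outcomes["previous_outcome"].replace("nonexistent", np.nan)

# map() sends anything not in the dict (here NaN) to NaN, so missing stays missing
outcomes["previous_outcome"] = outcomes["previous_outcome"].map({"success": 1, "failure": 0})

# The raw "y" column uses yes/no, so those are the keys to map
outcomes["campaign_outcome"] = outcomes["campaign_outcome"].map({"yes": 1, "no": 0})

# Optional: a nullable integer dtype keeps 0/1 as integers alongside <NA>
outcomes["previous_outcome"] = outcomes["previous_outcome"].astype("Int8")

print(outcomes)
```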
564 | ] 565 | }, 566 | { 567 | "cell_type": "code", 568 | "execution_count": 8, 569 | "id": "fe0ace15", 570 | "metadata": {}, 571 | "outputs": [ 572 | { 573 | "name": "stdout", 574 | "output_type": "stream", 575 | "text": [ 576 | "Conversions and changes in 'previous_outcome' and 'campaign_outcome' columns were successfully applied.\n" 577 | ] 578 | } 579 | ], 580 | "source": [ 581 | "# Change \"nonexistent\" to NumPy's NaN in \"previous_outcome\", then map \"success\"/\"failure\" to 1/0\n", 582 | "campaign['previous_outcome'] = campaign['previous_outcome'].replace('nonexistent', np.nan)\n", 583 | "campaign['previous_outcome'] = campaign['previous_outcome'].map({'success': 1, 'failure': 0})\n", 584 | "\n", 585 | "# \"campaign_outcome\" (the original \"y\" column) holds \"yes\"/\"no\", not \"success\"/\"failure\"\n", 586 | "campaign['campaign_outcome'] = campaign['campaign_outcome'].map({'yes': 1, 'no': 0})\n", 587 | "\n", 588 | "# Check that only 0 and 1 remain (plus NaN for clients with no previous campaign)\n", 589 | "previous_ok = campaign['previous_outcome'].dropna().isin([0, 1]).all()\n", 590 | "campaign_ok = campaign['campaign_outcome'].isin([0, 1]).all()\n", 591 | "\n", 592 | "if previous_ok and campaign_ok:\n", 593 | " print(\"Conversions and changes in 'previous_outcome' and 'campaign_outcome' columns were successfully applied.\")\n", 594 | "else:\n", 595 | " print(\"Conversions and changes were not fully applied in 'previous_outcome' and 'campaign_outcome' columns.\")\n", 596 | " print(campaign['campaign_outcome'].value_counts(dropna=False))" 597 | ] 598 | }, 599 | { 600 | "cell_type": "markdown", 601 | "id": "fa0e6239", 602 | "metadata": {}, 603 | "source": [ 604 | "#### Instruction 7:\n", 605 | "\n", 606 | "Add a column called campaign_id in campaign, where all rows have a value of 1."
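A minimal sketch of this step together with the date column the notebook builds next, on hypothetical toy rows: a scalar assignment broadcasts `campaign_id = 1` to every row, and parsing with an explicit `format="%Y-%b-%d"` (an assumption; the notebook itself uses `errors='coerce'` without a format) makes a malformed value raise instead of silently becoming `NaT`.

```python
import pandas as pd

# Hypothetical rows with the raw lowercase month abbreviations and integer days
contacts = pd.DataFrame({"month": ["may", "nov", "sep"], "day": [13, 30, 3]})

# A scalar assignment broadcasts the same campaign_id to every row
contacts["campaign_id"] = 1

# Build "2022-may-13"-style strings; %b matches month abbreviations case-insensitively
date_strings = "2022-" + contacts["month"] + "-" + contacts["day"].astype(str)
contacts["last_contact_date"] = pd.to_datetime(date_strings, format="%Y-%b-%d")

print(contacts)
```

Unlike `errors='coerce'`, this raises a `ValueError` if a month abbreviation or day cannot be parsed, which is usually preferable before loading data into a database.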
607 | ] 608 | }, 609 | { 610 | "cell_type": "code", 611 | "execution_count": 9, 612 | "id": "6b4ba272", 613 | "metadata": { 614 | "scrolled": true 615 | }, 616 | "outputs": [ 617 | { 618 | "data": { 619 | "text/html": [ 620 | "
\n", 621 | "\n", 634 | "\n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | " \n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | " \n", 660 | " \n", 661 | " \n", 662 | " \n", 663 | " \n", 664 | " \n", 665 | " \n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 679 | " \n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | " \n", 686 | " \n", 687 | " \n", 688 | " \n", 689 | " \n", 690 | " \n", 691 | " \n", 692 | " \n", 693 | " \n", 694 | " \n", 695 | " \n", 696 | " \n", 697 | " \n", 698 | " \n", 699 | " \n", 700 | " \n", 701 | " \n", 702 | " \n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | "
\n", 718 | "
" 719 | ], 720 | "text/plain": [ 721 | " client_id number_contacts contact_duration pdays \\\n", 722 | "0 0 1 261 999 \n", 723 | "1 1 1 149 999 \n", 724 | "2 2 1 226 999 \n", 725 | "3 3 1 151 999 \n", 726 | "4 4 1 307 999 \n", 727 | "\n", 728 | " previous_campaign_contacts previous_outcome campaign_outcome month day \\\n", 729 | "0 0 NaN NaN may 13 \n", 730 | "1 0 NaN NaN may 19 \n", 731 | "2 0 NaN NaN may 23 \n", 732 | "3 0 NaN NaN may 27 \n", 733 | "4 0 NaN NaN may 3 \n", 734 | "\n", 735 | " campaign_id \n", 736 | "0 1 \n", 737 | "1 1 \n", 738 | "2 1 \n", 739 | "3 1 \n", 740 | "4 1 " 741 | ] 742 | }, 743 | "execution_count": 9, 744 | "metadata": {}, 745 | "output_type": "execute_result" 746 | } 747 | ], 748 | "source": [ 749 | "# Add a new column 'campaign_id' with all values set to 1\n", 750 | "campaign['campaign_id'] = 1\n", 751 | "\n", 752 | "# Check that the column was successfully created\n", 753 | "campaign.head()" 754 | ] 755 | }, 756 | { 757 | "cell_type": "code", 758 | "execution_count": null, 759 | "id": "a15240cc", 760 | "metadata": {}, 761 | "outputs": [], 762 | "source": [] 763 | }, 764 | { 765 | "cell_type": "markdown", 766 | "id": "c91d964e", 767 | "metadata": {}, 768 | "source": [ 769 | "#### Instruction 8:\n", 770 | "\n", 771 | "Create a datetime column called last_contact_date, in the format of \"year-month-day\", where the year is 2022, and the month and day values are taken from the \"month\" and \"day\" columns." 772 | ] 773 | }, 774 | { 775 | "cell_type": "code", 776 | "execution_count": 10, 777 | "id": "f473a21f", 778 | "metadata": {}, 779 | "outputs": [ 780 | { 781 | "name": "stdout", 782 | "output_type": "stream", 783 | "text": [ 784 | "The 'last_contact_date' column was successfully created.\n" 785 | ] 786 | }, 787 | { 788 | "data": { 789 | "text/html": [ 790 | "
\n", 791 | "\n", 804 | "\n", 805 | " \n", 806 | " \n", 807 | " \n", 808 | " \n", 809 | " \n", 810 | " \n", 811 | " \n", 812 | " \n", 813 | " \n", 814 | " \n", 815 | " \n", 816 | " \n", 817 | " \n", 818 | " \n", 819 | " \n", 820 | " \n", 821 | " \n", 822 | " \n", 823 | " \n", 824 | " \n", 825 | " \n", 826 | " \n", 827 | " \n", 828 | " \n", 829 | " \n", 830 | " \n", 831 | " \n", 832 | " \n", 833 | " \n", 834 | " \n", 835 | " \n", 836 | " \n", 837 | " \n", 838 | " \n", 839 | " \n", 840 | " \n", 841 | " \n", 842 | " \n", 843 | " \n", 844 | " \n", 845 | " \n", 846 | " \n", 847 | " \n", 848 | " \n", 849 | " \n", 850 | " \n", 851 | " \n", 852 | " \n", 853 | " \n", 854 | " \n", 855 | " \n", 856 | " \n", 857 | " \n", 858 | " \n", 859 | " \n", 860 | " \n", 861 | " \n", 862 | " \n", 863 | " \n", 864 | " \n", 865 | " \n", 866 | " \n", 867 | " \n", 868 | " \n", 869 | " \n", 870 | " \n", 871 | " \n", 872 | " \n", 873 | " \n", 874 | " \n", 875 | " \n", 876 | " \n", 877 | " \n", 878 | " \n", 879 | " \n", 880 | " \n", 881 | " \n", 882 | " \n", 883 | " \n", 884 | " \n", 885 | " \n", 886 | " \n", 887 | " \n", 888 | " \n", 889 | " \n", 890 | " \n", 891 | " \n", 892 | " \n", 893 | " \n", 894 | " \n", 895 | " \n", 896 | " \n", 897 | " \n", 898 | " \n", 899 | "
\n", 900 | "
" 901 | ], 902 | "text/plain": [ 903 | " client_id number_contacts contact_duration pdays \\\n", 904 | "0 0 1 261 999 \n", 905 | "1 1 1 149 999 \n", 906 | "2 2 1 226 999 \n", 907 | "3 3 1 151 999 \n", 908 | "4 4 1 307 999 \n", 909 | "\n", 910 | " previous_campaign_contacts previous_outcome campaign_outcome month day \\\n", 911 | "0 0 NaN NaN may 13 \n", 912 | "1 0 NaN NaN may 19 \n", 913 | "2 0 NaN NaN may 23 \n", 914 | "3 0 NaN NaN may 27 \n", 915 | "4 0 NaN NaN may 3 \n", 916 | "\n", 917 | " campaign_id year last_contact_date \n", 918 | "0 1 2022 2022-05-13 \n", 919 | "1 1 2022 2022-05-19 \n", 920 | "2 1 2022 2022-05-23 \n", 921 | "3 1 2022 2022-05-27 \n", 922 | "4 1 2022 2022-05-03 " 923 | ] 924 | }, 925 | "execution_count": 10, 926 | "metadata": {}, 927 | "output_type": "execute_result" 928 | } 929 | ], 930 | "source": [ 931 | "# Add the \"year\" column with the value 2022 to the campaign DataFrame\n", 932 | "campaign['year'] = 2022\n", 933 | "\n", 934 | "# Create a datetime column \"last_contact_date\"\n", 935 | "campaign['last_contact_date'] = pd.to_datetime(\n", 936 | " campaign['year'].astype(str) + '-' +\n", 937 | " campaign['month'].astype(str) + '-' +\n", 938 | " campaign['day'].astype(str),\n", 939 | " format='%Y-%b-%d'\n", 940 | ")\n", 941 | "\n", 942 | "# Check if the \"last_contact_date\" column was successfully created.\n", 943 | "if 'last_contact_date' in campaign.columns:\n", 944 | " print(\"The 'last_contact_date' column was successfully created.\")\n", 945 | "else:\n", 946 | " print(\"The 'last_contact_date' column was not created.\")\n", 947 | "\n", 948 | "# Print the first few rows of the campaign DataFrame to verify the creation of the date column\n", 949 | "campaign.head()" 950 | ] 951 | }, 952 | { 953 | "cell_type": "code", 954 | "execution_count": 11, 955 | "id": "d7ef5f26", 956 | "metadata": {}, 957 | "outputs": [ 958 | { 959 | "data": { 960 | "text/plain": [ 961 | "dtype('\n", 1307 | "\n", 1320 | "\n", 1321 | " \n", 1322 | " \n",
1323 | " \n", 1324 | " \n", 1325 | " \n", 1326 | " \n", 1327 | " \n", 1328 | " \n", 1329 | " \n", 1330 | " \n", 1331 | " \n", 1332 | " \n", 1333 | " \n", 1334 | " \n", 1335 | " \n", 1336 | " \n", 1337 | " \n", 1338 | " \n", 1339 | " \n", 1340 | " \n", 1341 | " \n", 1342 | " \n", 1343 | " \n", 1344 | " \n", 1345 | " \n", 1346 | " \n", 1347 | " \n", 1348 | " \n", 1349 | " \n", 1350 | " \n", 1351 | " \n", 1352 | " \n", 1353 | " \n", 1354 | " \n", 1355 | " \n", 1356 | " \n", 1357 | " \n", 1358 | " \n", 1359 | " \n", 1360 | " \n", 1361 | " \n", 1362 | " \n", 1363 | " \n", 1364 | " \n", 1365 | " \n", 1366 | " \n", 1367 | " \n", 1368 | " \n", 1369 | " \n", 1370 | " \n", 1371 | " \n", 1372 | " \n", 1373 | " \n", 1374 | " \n", 1375 | " \n", 1376 | " \n", 1377 | " \n", 1378 | " \n", 1379 | " \n", 1380 | " \n", 1381 | " \n", 1382 | " \n", 1383 | " \n", 1384 | " \n", 1385 | " \n", 1386 | " \n", 1387 | " \n", 1388 | " \n", 1389 | " \n", 1390 | " \n", 1391 | " \n", 1392 | " \n", 1393 | " \n", 1394 | " \n", 1395 | " \n", 1396 | " \n", 1397 | " \n", 1398 | " \n", 1399 | " \n", 1400 | " \n", 1401 | " \n", 1402 | " \n", 1403 | " \n", 1404 | " \n", 1405 | " \n", 1406 | " \n", 1407 | " \n", 1408 | " \n", 1409 | " \n", 1410 | " \n", 1411 | " \n", 1412 | " \n", 1413 | " \n", 1414 | " \n", 1415 | " \n", 1416 | " \n", 1417 | " \n", 1418 | " \n", 1419 | " \n", 1420 | " \n", 1421 | " \n", 1422 | " \n", 1423 | " \n", 1424 | " \n", 1425 | " \n", 1426 | " \n", 1427 | " \n", 1428 | " \n", 1429 | " \n", 1430 | " \n", 1431 | " \n", 1432 | " \n", 1433 | " \n", 1434 | " \n", 1435 | " \n", 1436 | " \n", 1437 | " \n", 1438 | " \n", 1439 | " \n", 1440 | " \n", 1441 | " \n", 1442 | " \n", 1443 | " \n", 1444 | " \n", 1445 | " \n", 1446 | " \n", 1447 | " \n", 1448 | " \n", 1449 | " \n", 1450 | " \n", 1451 | " \n", 1452 | " \n", 1453 | " \n", 1454 | " \n", 1455 | " \n", 1456 | " \n", 1457 | "
\n", 1458 | "
\n", 1459 | "" 1460 | ], 1461 | "text/plain": [ 1462 | " id age job marital education credit_default \\\n", 1463 | "0 0 56 housemaid married basic_4y no \n", 1464 | "1 1 57 services married high_school unknown \n", 1465 | "2 2 37 services married high_school no \n", 1466 | "3 3 40 admin married basic_6y no \n", 1467 | "4 4 56 services married high_school no \n", 1468 | "... ... ... ... ... ... ... \n", 1469 | "41183 41183 73 retired married professional_course no \n", 1470 | "41184 41184 46 blue-collar married professional_course no \n", 1471 | "41185 41185 56 retired married university_degree no \n", 1472 | "41186 41186 44 technician married professional_course no \n", 1473 | "41187 41187 74 retired married professional_course no \n", 1474 | "\n", 1475 | " housing loan \n", 1476 | "0 no no \n", 1477 | "1 no no \n", 1478 | "2 yes no \n", 1479 | "3 no no \n", 1480 | "4 no yes \n", 1481 | "... ... ... \n", 1482 | "41183 yes no \n", 1483 | "41184 no no \n", 1484 | "41185 yes no \n", 1485 | "41186 no no \n", 1486 | "41187 yes no \n", 1487 | "\n", 1488 | "[41188 rows x 8 columns]" 1489 | ] 1490 | }, 1491 | "execution_count": 21, 1492 | "metadata": {}, 1493 | "output_type": "execute_result" 1494 | } 1495 | ], 1496 | "source": [ 1497 | "# Confirm if all three tables were successfully created \n", 1498 | "# and if all the data was successfully inserted into the three different tables\n", 1499 | "\n", 1500 | "# Read client_data from database to pandas dataframe\n", 1501 | "client_df = pd.read_sql_query('SELECT * FROM client_data', engine)\n", 1502 | "client_df" 1503 | ] 1504 | }, 1505 | { 1506 | "cell_type": "code", 1507 | "execution_count": 22, 1508 | "id": "3f83cdb5", 1509 | "metadata": {}, 1510 | "outputs": [ 1511 | { 1512 | "data": { 1513 | "text/html": [ 1514 | "
\n", 1515 | "\n", 1528 | "\n", 1529 | " \n", 1530 | " \n", 1531 | " \n", 1532 | " \n", 1533 | " \n", 1534 | " \n", 1535 | " \n", 1536 | " \n", 1537 | " \n", 1538 | " \n", 1539 | " \n", 1540 | " \n", 1541 | " \n", 1542 | " \n", 1543 | " \n", 1544 | " \n", 1545 | " \n", 1546 | " \n", 1547 | " \n", 1548 | " \n", 1549 | " \n", 1550 | " \n", 1551 | " \n", 1552 | " \n", 1553 | " \n", 1554 | " \n", 1555 | " \n", 1556 | " \n", 1557 | " \n", 1558 | " \n", 1559 | " \n", 1560 | " \n", 1561 | " \n", 1562 | " \n", 1563 | " \n", 1564 | " \n", 1565 | " \n", 1566 | " \n", 1567 | " \n", 1568 | " \n", 1569 | " \n", 1570 | " \n", 1571 | " \n", 1572 | " \n", 1573 | " \n", 1574 | " \n", 1575 | " \n", 1576 | " \n", 1577 | " \n", 1578 | " \n", 1579 | " \n", 1580 | " \n", 1581 | " \n", 1582 | " \n", 1583 | " \n", 1584 | " \n", 1585 | " \n", 1586 | " \n", 1587 | " \n", 1588 | " \n", 1589 | " \n", 1590 | " \n", 1591 | " \n", 1592 | " \n", 1593 | " \n", 1594 | " \n", 1595 | " \n", 1596 | " \n", 1597 | " \n", 1598 | " \n", 1599 | " \n", 1600 | " \n", 1601 | " \n", 1602 | " \n", 1603 | " \n", 1604 | " \n", 1605 | " \n", 1606 | " \n", 1607 | " \n", 1608 | " \n", 1609 | " \n", 1610 | " \n", 1611 | " \n", 1612 | " \n", 1613 | " \n", 1614 | " \n", 1615 | " \n", 1616 | " \n", 1617 | " \n", 1618 | " \n", 1619 | " \n", 1620 | " \n", 1621 | " \n", 1622 | " \n", 1623 | " \n", 1624 | " \n", 1625 | " \n", 1626 | " \n", 1627 | " \n", 1628 | " \n", 1629 | " \n", 1630 | " \n", 1631 | " \n", 1632 | " \n", 1633 | " \n", 1634 | " \n", 1635 | " \n", 1636 | " \n", 1637 | " \n", 1638 | " \n", 1639 | " \n", 1640 | " \n", 1641 | " \n", 1642 | " \n", 1643 | " \n", 1644 | " \n", 1645 | " \n", 1646 | " \n", 1647 | " \n", 1648 | " \n", 1649 | " \n", 1650 | " \n", 1651 | " \n", 1652 | " \n", 1653 | " \n", 1654 | " \n", 1655 | " \n", 1656 | " \n", 1657 | " \n", 1658 | " \n", 1659 | " \n", 1660 | " \n", 1661 | " \n", 1662 | " \n", 1663 | " \n", 1664 | " \n", 1665 | " \n", 1666 | " \n", 1667 | " \n", 1668 | " \n", 1669 | " 
\n", 1670 | " \n", 1671 | " \n", 1672 | " \n", 1673 | " \n", 1674 | " \n", 1675 | " \n", 1676 | " \n", 1677 | "
\n", 1678 | "
\n", 1679 | "
" 1680 | ], 1681 | "text/plain": [ 1682 | " client_id number_contacts contact_duration pdays \\\n", 1683 | "0 0 1 261 999 \n", 1684 | "1 1 1 149 999 \n", 1685 | "2 2 1 226 999 \n", 1686 | "3 3 1 151 999 \n", 1687 | "4 4 1 307 999 \n", 1688 | "... ... ... ... ... \n", 1689 | "41183 41183 1 334 999 \n", 1690 | "41184 41184 1 383 999 \n", 1691 | "41185 41185 2 189 999 \n", 1692 | "41186 41186 1 442 999 \n", 1693 | "41187 41187 3 239 999 \n", 1694 | "\n", 1695 | " previous_campaign_contacts previous_outcome campaign_outcome \\\n", 1696 | "0 0 NaN None \n", 1697 | "1 0 NaN None \n", 1698 | "2 0 NaN None \n", 1699 | "3 0 NaN None \n", 1700 | "4 0 NaN None \n", 1701 | "... ... ... ... \n", 1702 | "41183 0 NaN None \n", 1703 | "41184 0 NaN None \n", 1704 | "41185 0 NaN None \n", 1705 | "41186 0 NaN None \n", 1706 | "41187 1 0.0 None \n", 1707 | "\n", 1708 | " campaign_id last_contact_date \n", 1709 | "0 1 2022-05-13 \n", 1710 | "1 1 2022-05-19 \n", 1711 | "2 1 2022-05-23 \n", 1712 | "3 1 2022-05-27 \n", 1713 | "4 1 2022-05-03 \n", 1714 | "... ... ... \n", 1715 | "41183 1 2022-11-30 \n", 1716 | "41184 1 2022-11-06 \n", 1717 | "41185 1 2022-11-24 \n", 1718 | "41186 1 2022-11-17 \n", 1719 | "41187 1 2022-11-23 \n", 1720 | "\n", 1721 | "[41188 rows x 9 columns]" 1722 | ] 1723 | }, 1724 | "execution_count": 22, 1725 | "metadata": {}, 1726 | "output_type": "execute_result" 1727 | } 1728 | ], 1729 | "source": [ 1730 | "# Read campaign_data from database to pandas dataframe\n", 1731 | "campaign_df = pd.read_sql_query('SELECT * FROM campaign_data', engine)\n", 1732 | "campaign_df" 1733 | ] 1734 | }, 1735 | { 1736 | "cell_type": "code", 1737 | "execution_count": 23, 1738 | "id": "25ebcd41", 1739 | "metadata": {}, 1740 | "outputs": [ 1741 | { 1742 | "data": { 1743 | "text/html": [ 1744 | "
\n", 1745 | "\n", 1758 | "\n", 1759 | " \n", 1760 | " \n", 1761 | " \n", 1762 | " \n", 1763 | " \n", 1764 | " \n", 1765 | " \n", 1766 | " \n", 1767 | " \n", 1768 | " \n", 1769 | " \n", 1770 | " \n", 1771 | " \n", 1772 | " \n", 1773 | " \n", 1774 | " \n", 1775 | " \n", 1776 | " \n", 1777 | " \n", 1778 | " \n", 1779 | " \n", 1780 | " \n", 1781 | " \n", 1782 | " \n", 1783 | " \n", 1784 | " \n", 1785 | " \n", 1786 | " \n", 1787 | " \n", 1788 | " \n", 1789 | " \n", 1790 | " \n", 1791 | " \n", 1792 | " \n", 1793 | " \n", 1794 | " \n", 1795 | " \n", 1796 | " \n", 1797 | " \n", 1798 | " \n", 1799 | " \n", 1800 | " \n", 1801 | " \n", 1802 | " \n", 1803 | " \n", 1804 | " \n", 1805 | " \n", 1806 | " \n", 1807 | " \n", 1808 | " \n", 1809 | " \n", 1810 | " \n", 1811 | " \n", 1812 | " \n", 1813 | " \n", 1814 | " \n", 1815 | " \n", 1816 | " \n", 1817 | " \n", 1818 | " \n", 1819 | " \n", 1820 | " \n", 1821 | " \n", 1822 | " \n", 1823 | " \n", 1824 | " \n", 1825 | " \n", 1826 | " \n", 1827 | " \n", 1828 | " \n", 1829 | " \n", 1830 | " \n", 1831 | " \n", 1832 | " \n", 1833 | " \n", 1834 | " \n", 1835 | " \n", 1836 | " \n", 1837 | " \n", 1838 | " \n", 1839 | " \n", 1840 | " \n", 1841 | " \n", 1842 | " \n", 1843 | " \n", 1844 | " \n", 1845 | " \n", 1846 | " \n", 1847 | " \n", 1848 | " \n", 1849 | " \n", 1850 | " \n", 1851 | " \n", 1852 | " \n", 1853 | " \n", 1854 | " \n", 1855 | " \n", 1856 | " \n", 1857 | " \n", 1858 | " \n", 1859 | "
\n", 1860 | "
\n", 1861 | "
" 1862 | ], 1863 | "text/plain": [ 1864 | " client_id emp_var_rate cons_price_idx euribor_three_months \\\n", 1865 | "0 0 1.1 93.994 4.857 \n", 1866 | "1 1 1.1 93.994 4.857 \n", 1867 | "2 2 1.1 93.994 4.857 \n", 1868 | "3 3 1.1 93.994 4.857 \n", 1869 | "4 4 1.1 93.994 4.857 \n", 1870 | "... ... ... ... ... \n", 1871 | "41183 41183 -1.1 94.767 1.028 \n", 1872 | "41184 41184 -1.1 94.767 1.028 \n", 1873 | "41185 41185 -1.1 94.767 1.028 \n", 1874 | "41186 41186 -1.1 94.767 1.028 \n", 1875 | "41187 41187 -1.1 94.767 1.028 \n", 1876 | "\n", 1877 | " number_employed \n", 1878 | "0 5191.0 \n", 1879 | "1 5191.0 \n", 1880 | "2 5191.0 \n", 1881 | "3 5191.0 \n", 1882 | "4 5191.0 \n", 1883 | "... ... \n", 1884 | "41183 4963.6 \n", 1885 | "41184 4963.6 \n", 1886 | "41185 4963.6 \n", 1887 | "41186 4963.6 \n", 1888 | "41187 4963.6 \n", 1889 | "\n", 1890 | "[41188 rows x 5 columns]" 1891 | ] 1892 | }, 1893 | "execution_count": 23, 1894 | "metadata": {}, 1895 | "output_type": "execute_result" 1896 | } 1897 | ], 1898 | "source": [ 1899 | "# Read economics_data from database to pandas dataframe\n", 1900 | "economics_df = pd.read_sql_query('SELECT * FROM economics_data', engine)\n", 1901 | "economics_df" 1902 | ] 1903 | }, 1904 | { 1905 | "cell_type": "markdown", 1906 | "id": "c2aa7e6a", 1907 | "metadata": {}, 1908 | "source": [ 1909 | "***The end!***" 1910 | ] 1911 | }, 1912 | { 1913 | "cell_type": "code", 1914 | "execution_count": null, 1915 | "id": "37c76c77", 1916 | "metadata": {}, 1917 | "outputs": [], 1918 | "source": [] 1919 | }, 1920 | { 1921 | "cell_type": "code", 1922 | "execution_count": null, 1923 | "id": "554ab333", 1924 | "metadata": {}, 1925 | "outputs": [], 1926 | "source": [] 1927 | } 1928 | ], 1929 | "metadata": { 1930 | "editor": "DataCamp Workspace", 1931 | "kernelspec": { 1932 | "display_name": "Python 3 (ipykernel)", 1933 | "language": "python", 1934 | "name": "python3" 1935 | }, 1936 | "language_info": { 1937 | "codemirror_mode": { 1938 | "name": "ipython", 1939 
| "version": 3 1940 | }, 1941 | "file_extension": ".py", 1942 | "mimetype": "text/x-python", 1943 | "name": "python", 1944 | "nbconvert_exporter": "python", 1945 | "pygments_lexer": "ipython3", 1946 | "version": "3.11.5" 1947 | } 1948 | }, 1949 | "nbformat": 4, 1950 | "nbformat_minor": 5 1951 | } 1952 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/Designing a Bank Marketing Database-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "02077ee3-e1e4-4fc5-8de1-16e987afa5fb", 6 | "metadata": {}, 7 | "source": [ 8 | "![piggy_bank](piggy_bank.jpg)\n", 9 | "\n", 10 | "
\n", 11 | "\n", 12 | "Personal loans are a lucrative revenue stream for banks. The typical interest rate of a two-year loan in the United Kingdom is [around 10%](https://www.experian.com/blogs/ask-experian/whats-a-good-interest-rate-for-a-personal-loan/). This might not sound like a lot, but in September 2022 alone UK consumers borrowed [around £1.5 billion](https://www.ukfinance.org.uk/system/files/2022-12/Household%20Finance%20Review%202022%20Q3-%20Final.pdf), which would mean approximately £300 million in interest generated by banks over two years!\n", 13 | "\n", 14 | "You have been asked to work with a bank to clean and store the data they collected as part of a recent marketing campaign, which aimed to get customers to take out a personal loan. They plan to conduct more marketing campaigns going forward, so they would like you to set up a PostgreSQL database to store this campaign's data, designing the schema in a way that would allow data from future campaigns to be easily imported. \n", 15 | "\n", 16 | "They have supplied you with a CSV file called `\"bank_marketing.csv\"`, which you will need to clean, reformat, and split, in order to save separate files based on the tables you will create. It is recommended to use `pandas` for these tasks.\n", 17 | "\n", 18 | "Lastly, you will write the SQL code that the bank can execute to create the tables and populate them with the data from the CSV files. As the bank are quite strict about their security, you'll save SQL files as multiline string variables that they can then use to create the database on their end. 
\n", 19 | "\n", 20 | "You have been asked to design a database that will have three tables:\n", 21 | "\n", 22 | "## client\n", 23 | "\n", 24 | "| column | data type | description | original column in dataset |\n", 25 | "|--------|-----------|-------------|----------------------------|\n", 26 | "| `id` | `serial` | Client ID - primary key | `client_id` |\n", 27 | "| `age` | `integer` | Client's age in years | `age` |\n", 28 | "| `job` | `text` | Client's type of job | `job` |\n", 29 | "| `marital` | `text` | Client's marital status | `marital` | \n", 30 | "| `education` | `text` | Client's level of education | `education` |\n", 31 | "| `credit_default` | `boolean` | Whether the client's credit is in default | `credit_default` |\n", 32 | "| `housing` | `boolean` | Whether the client has an existing housing loan (mortgage) | `housing` | \n", 33 | "| `loan` | `boolean` | Whether the client has an existing personal loan | `loan` |\n", 34 | "\n", 35 | "
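The `client` table declares `credit_default`, `housing`, and `loan` as `boolean`, but the raw values are `"yes"`, `"no"`, and (as the notebook output shows for `credit_default`) `"unknown"`. One possible conversion is sketched below on hypothetical toy rows; treating `"unknown"` as missing rather than `False` is an assumption, since the table does not say how it should be stored.

```python
import pandas as pd

# Hypothetical client rows using the raw yes/no/unknown strings
clients = pd.DataFrame({
    "credit_default": ["no", "unknown", "no"],
    "housing": ["no", "no", "yes"],
    "loan": ["no", "no", "yes"],
})

# Assumption: "unknown" becomes a missing value, not False
to_bool = {"yes": True, "no": False}
for col in ["credit_default", "housing", "loan"]:
    # map() turns anything not in to_bool (here "unknown") into NaN;
    # the nullable "boolean" dtype then stores it as <NA>
    clients[col] = clients[col].map(to_bool).astype("boolean")

print(clients.dtypes)
```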
\n", 36 | "\n", 37 | "## campaign\n", 38 | "\n", 39 | "| column | data type | description | original column in dataset |\n", 40 | "|--------|-----------|-------------|----------------------------|\n", 41 | "| `campaign_id` | `serial` | Campaign ID - primary key | N/A - new column |\n", 42 | "| `client_id` | `serial` | Client ID - references `id` in the `client` table | `client_id` |\n", 43 | "| `number_contacts` | `integer` | Number of contact attempts to the client in the current campaign | `campaign` |\n", 44 | "| `contact_duration` | `integer` | Last contact duration in seconds | `duration` |\n", 45 | "| `pdays` | `integer` | Number of days since contact in previous campaign (`999` = not previously contacted) | `pdays` |\n", 46 | "| `previous_campaign_contacts` | `integer` | Number of contact attempts to the client in the previous campaign | `previous` |\n", 47 | "| `previous_outcome` | `boolean` | Outcome of the previous campaign | `poutcome` |\n", 48 | "| `campaign_outcome` | `boolean` | Outcome of the current campaign | `y` |\n", 49 | "| `last_contact_date` | `date` | Last date the client was contacted | A combination of `day`, `month`, and the newly created `year` |\n", 50 | "\n", 51 | "
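The project asks for the SQL to be delivered as multiline string variables. A minimal sketch for the first two tables is below; the variable names are hypothetical. It departs from the tables above in two labeled ways: `client_id` is declared `INTEGER` (a foreign key should copy `client.id`, not generate its own values as `serial` would), and the primary key is the pair `(campaign_id, client_id)`, because every row of this campaign has `campaign_id = 1`, so a key on `campaign_id` alone would reject the second row.

```python
# Hypothetical variable names; PostgreSQL DDL stored as multiline strings
client_table = """CREATE TABLE client
(
    id SERIAL PRIMARY KEY,
    age INTEGER,
    job TEXT,
    marital TEXT,
    education TEXT,
    credit_default BOOLEAN,
    housing BOOLEAN,
    loan BOOLEAN
);
"""

# Assumptions (departing from the schema table):
# - client_id is INTEGER and references client (id) instead of being SERIAL
# - the key is (campaign_id, client_id), since all rows here share campaign_id = 1
campaign_table = """CREATE TABLE campaign
(
    campaign_id INTEGER NOT NULL,
    client_id INTEGER NOT NULL REFERENCES client (id),
    number_contacts INTEGER,
    contact_duration INTEGER,
    pdays INTEGER,
    previous_campaign_contacts INTEGER,
    previous_outcome BOOLEAN,
    campaign_outcome BOOLEAN,
    last_contact_date DATE,
    PRIMARY KEY (campaign_id, client_id)
);
"""

print(client_table)
print(campaign_table)
```

With a composite key, a future campaign can reuse the same `client_id` values under `campaign_id = 2` without conflicting with the rows loaded now.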
\n", 52 | "\n", 53 | "## economics\n", 54 | "\n", 55 | "| column | data type | description | original column in dataset |\n", 56 | "|--------|-----------|-------------|----------------------------|\n", 57 | "| `client_id` | `serial` | Client ID - references `id` in the `client` table | `client_id` |\n", 58 | "| `emp_var_rate` | `float` | Employment variation rate (quarterly indicator) | `emp_var_rate` |\n", 59 | "| `cons_price_idx` | `float` | Consumer price index (monthly indicator) | `cons_price_idx` |\n", 60 | "| `euribor_three_months` | `float` | Euro Interbank Offered Rate (euribor) three month rate (daily indicator) | `euribor3m` |\n", 61 | "| `number_employed` | `float` | Number of employees (quarterly indicator)| `nr_employed` |" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 1, 67 | "id": "e2edad3c-8286-4983-b5b7-35d94fd78023", 68 | "metadata": { 69 | "executionCancelledAt": null, 70 | "executionTime": 1057, 71 | "lastExecutedAt": 1686069923599, 72 | "lastScheduledRunId": null, 73 | "lastSuccessfullyExecutedCode": "# Start coding...\nimport pandas as pd\nimport numpy as np\n\n# Read in csv\nmarketing = pd.read_csv(\"bank_marketing.csv\")\n\n# Split into the three tables\nclient = marketing[[\"client_id\", \"age\", \"job\", \"marital\", \"education\", \n \"credit_default\", \"housing\", \"loan\"]]\ncampaign = marketing[[\"client_id\", \"campaign\", \"month\", \"day\", \n \"duration\", \"pdays\", \"previous\", \"poutcome\", \"y\"]]\neconomics = marketing[[\"client_id\", \"emp_var_rate\", \"cons_price_idx\", \n \"euribor3m\", \"nr_employed\"]]\n\n# Rename client_id in the client table\nclient.rename(columns={\"client_id\": \"id\"}, inplace=True)\n\n# Rename duration, y, and campaign columns\ncampaign.rename(columns={\"duration\": \"contact_duration\", \n \"y\": \"campaign_outcome\", \n \"campaign\": \"number_contacts\",\n \"previous\": \"previous_campaign_contacts\",\n \"poutcome\": \"previous_outcome\"}, \n inplace=True)\n\n# Rename 
euribor3m and nr_employed\neconomics.rename(columns={\"euribor3m\": \"euribor_three_months\", \n \"nr_employed\": \"number_employed\"}, \n inplace=True)\n\n# Clean education column\nclient[\"education\"] = client[\"education\"].str.replace(\".\", \"_\")\nclient[\"education\"] = client[\"education\"].replace(\"unknown\", np.NaN)\n\n# Clean job column\nclient[\"job\"] = client[\"job\"].str.replace(\".\", \"\")\n\n# Change campaign_outcome to binary values\ncampaign[\"campaign_outcome\"] = campaign[\"campaign_outcome\"].map({\"yes\": 1, \n \"no\": 0})\n\n# Convert poutcome to binary values\ncampaign[\"previous_outcome\"] = campaign[\"previous_outcome\"].replace(\"nonexistent\", \n np.NaN)\ncampaign[\"previous_outcome\"] = campaign[\"previous_outcome\"].map({\"success\": 1, \n \"failure\": 0})\n\n# Add campaign_id column\ncampaign[\"campaign_id\"] = 1\n\n# Capitalize month and day columns\ncampaign[\"month\"] = campaign[\"month\"].str.capitalize()\n\n# Add year column\ncampaign[\"year\"] = \"2022\"\n\n# Convert day to string\ncampaign[\"day\"] = campaign[\"day\"].astype(str)\n\n# Add last_contact_date column\ncampaign[\"last_contact_date\"] = campaign[\"year\"] + \"-\" + campaign[\"month\"] + \"-\" + campaign[\"day\"]\n\n# Convert to datetime\ncampaign[\"last_contact_date\"] = pd.to_datetime(campaign[\"last_contact_date\"], \n format=\"%Y-%b-%d\")\n\n# Drop unneccessary columns\ncampaign.drop(columns=[\"month\", \"day\", \"year\"], inplace=True)\n\n# Save tables to individual csv files\nclient.to_csv(\"client.csv\", index=False)\ncampaign.to_csv(\"campaign.csv\", index=False)\neconomics.to_csv(\"economics.csv\", index=False)\n\n# Store and print database_design\nclient_table = \"\"\"CREATE TABLE client\n(\n id SERIAL PRIMARY KEY,\n age INTEGER,\n job TEXT,\n marital TEXT,\n education TEXT,\n credit_default BOOLEAN,\n housing BOOLEAN,\n loan BOOLEAN\n);\n\\copy client from 'client.csv' DELIMITER ',' CSV HEADER\n\"\"\"\n\ncampaign_table = \"\"\"CREATE TABLE campaign\n(\n 
campaign_id SERIAL PRIMARY KEY,\n client_id SERIAL references client (id),\n number_contacts INTEGER,\n contact_duration INTEGER,\n pdays INTEGER,\n previous_campaign_contacts INTEGER,\n previous_outcome BOOLEAN,\n campaign_outcome BOOLEAN,\n last_contact_date DATE \n);\n\\copy campaign from 'campaign.csv' DELIMITER ',' CSV HEADER\n\"\"\"\n\neconomics_table = \"\"\"CREATE TABLE economics\n(\n client_id SERIAL references client (id),\n emp_var_rate FLOAT,\n cons_price_idx FLOAT,\n euribor_three_months FLOAT,\n number_employed FLOAT\n);\n\\copy economics from 'economics.csv' DELIMITER ',' CSV HEADER\n\"\"\"" 74 | }, 75 | "outputs": [], 76 | "source": [ 77 | "# Import libraries\n", 78 | "import pandas as pd\n", 79 | "import numpy as np\n", 80 | "\n", 81 | "# suppress warnings from final output\n", 82 | "import warnings\n", 83 | "warnings.simplefilter(\"ignore\")\n", 84 | "\n", 85 | "# Start coding here..." 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "id": "9d94086e", 91 | "metadata": {}, 92 | "source": [ 93 | "#### Instruction 1: Read in bank_marketing.csv as a pandas DataFrame." 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 2, 99 | "id": "ac57dc04", 100 | "metadata": {}, 101 | "outputs": [ 102 | { 103 | "data": { 104 | "text/html": [ 105 | "
\n", 106 | "\n", 119 | "\n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | "
\n", 269 | "
\n", 270 | "
" 271 | ], 272 | "text/plain": [ 273 | " client_id age job marital education credit_default housing \\\n", 274 | "0 0 56 housemaid married basic.4y no no \n", 275 | "1 1 57 services married high.school unknown no \n", 276 | "2 2 37 services married high.school no yes \n", 277 | "3 3 40 admin. married basic.6y no no \n", 278 | "4 4 56 services married high.school no no \n", 279 | "\n", 280 | " loan contact month ... campaign pdays previous poutcome \\\n", 281 | "0 no telephone may ... 1 999 0 nonexistent \n", 282 | "1 no telephone may ... 1 999 0 nonexistent \n", 283 | "2 no telephone may ... 1 999 0 nonexistent \n", 284 | "3 no telephone may ... 1 999 0 nonexistent \n", 285 | "4 yes telephone may ... 1 999 0 nonexistent \n", 286 | "\n", 287 | " emp_var_rate cons_price_idx cons_conf_idx euribor3m nr_employed y \n", 288 | "0 1.1 93.994 -36.4 4.857 5191.0 no \n", 289 | "1 1.1 93.994 -36.4 4.857 5191.0 no \n", 290 | "2 1.1 93.994 -36.4 4.857 5191.0 no \n", 291 | "3 1.1 93.994 -36.4 4.857 5191.0 no \n", 292 | "4 1.1 93.994 -36.4 4.857 5191.0 no \n", 293 | "\n", 294 | "[5 rows x 22 columns]" 295 | ] 296 | }, 297 | "execution_count": 2, 298 | "metadata": {}, 299 | "output_type": "execute_result" 300 | } 301 | ], 302 | "source": [ 303 | "# Read in bank_marketing.csv as a pandas DataFrame.\n", 304 | "bank_marketing_data = pd.read_csv(\"bank_marketing.csv\")\n", 305 | "bank_marketing_data.head()" 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "id": "11195a84", 311 | "metadata": {}, 312 | "source": [ 313 | "#### Instruction 2:\n", 314 | "\n", 315 | "Split the data into three DataFrames using information provided about the desired tables as your guide: \n", 316 | "\n", 317 | "* one with information about the client, \n", 318 | "* another containing campaign data, and \n", 319 | "* a third to store information about economics at the time of the campaign." 
320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 3, 325 | "id": "6707773f", 326 | "metadata": {}, 327 | "outputs": [ 328 | { 329 | "name": "stdout", 330 | "output_type": "stream", 331 | "text": [ 332 | "Client Data:\n", 333 | " client_id age job marital education credit_default housing loan\n", 334 | "0 0 56 housemaid married basic.4y no no no\n", 335 | "1 1 57 services married high.school unknown no no\n", 336 | "2 2 37 services married high.school no yes no\n", 337 | "3 3 40 admin. married basic.6y no no no\n", 338 | "4 4 56 services married high.school no no yes\n", 339 | "\n", 340 | "Campaign Data:\n", 341 | " client_id campaign duration pdays previous poutcome y month day\n", 342 | "0 0 1 261 999 0 nonexistent no may 13\n", 343 | "1 1 1 149 999 0 nonexistent no may 19\n", 344 | "2 2 1 226 999 0 nonexistent no may 23\n", 345 | "3 3 1 151 999 0 nonexistent no may 27\n", 346 | "4 4 1 307 999 0 nonexistent no may 3\n", 347 | "\n", 348 | "Economic Data:\n", 349 | " client_id emp_var_rate cons_price_idx euribor3m nr_employed\n", 350 | "0 0 1.1 93.994 4.857 5191.0\n", 351 | "1 1 1.1 93.994 4.857 5191.0\n", 352 | "2 2 1.1 93.994 4.857 5191.0\n", 353 | "3 3 1.1 93.994 4.857 5191.0\n", 354 | "4 4 1.1 93.994 4.857 5191.0\n" 355 | ] 356 | } 357 | ], 358 | "source": [ 359 | "# Define the columns for each table\n", 360 | "client_columns = ['client_id', 'age', 'job', 'marital', 'education', 'credit_default', 'housing', 'loan']\n", 361 | "campaign_columns = ['client_id', 'campaign', 'duration', 'pdays', 'previous', 'poutcome', 'y', 'month', 'day']\n", 362 | "economic_columns = ['client_id', 'emp_var_rate', 'cons_price_idx', 'euribor3m', 'nr_employed']\n", 363 | "\n", 364 | "# Create an independent DataFrame (a copy) for each table\n", 365 | "client = bank_marketing_data[client_columns].copy()\n", 366 | "campaign = bank_marketing_data[campaign_columns].copy()\n", 367 | "economics = bank_marketing_data[economic_columns].copy()\n", 368 | "\n", 369 | "# Print the first few rows of each DataFrame 
to verify the split\n", 370 | "print(\"Client Data:\")\n", 371 | "print(client.head())\n", 372 | "\n", 373 | "print(\"\\nCampaign Data:\")\n", 374 | "print(campaign.head())\n", 375 | "\n", 376 | "print(\"\\nEconomic Data:\")\n", 377 | "print(economics.head())" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "id": "a77f1836", 383 | "metadata": {}, 384 | "source": [ 385 | "#### Instruction 3:\n", 386 | "\n", 387 | "Rename the column \"client_id\" to \"id\" in client (leave as-is in the other subsets); \"duration\" to \"contact_duration\",\n", 388 | "\"previous\" to \"previous_campaign_contacts\", \"y\" to \"campaign_outcome\", \"poutcome\" to \"previous_outcome\", and \n", 389 | "\"campaign\" to \"number_contacts\" in campaign; and \"euribor3m\" to \"euribor_three_months\" and \"nr_employed\" to\n", 390 | "\"number_employed\" in economics." 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": 4, 396 | "id": "6f49b4d5", 397 | "metadata": {}, 398 | "outputs": [ 399 | { 400 | "name": "stdout", 401 | "output_type": "stream", 402 | "text": [ 403 | "Client Data:\n", 404 | " id age job marital education credit_default housing loan\n", 405 | "0 0 56 housemaid married basic.4y no no no\n", 406 | "1 1 57 services married high.school unknown no no\n", 407 | "2 2 37 services married high.school no yes no\n", 408 | "3 3 40 admin. 
married basic.6y no no no\n", 409 | "4 4 56 services married high.school no no yes\n", 410 | "\n", 411 | "Campaign Data:\n", 412 | " client_id number_contacts contact_duration pdays \\\n", 413 | "0 0 1 261 999 \n", 414 | "1 1 1 149 999 \n", 415 | "2 2 1 226 999 \n", 416 | "3 3 1 151 999 \n", 417 | "4 4 1 307 999 \n", 418 | "\n", 419 | " previous_campaign_contacts previous_outcome campaign_outcome month day \n", 420 | "0 0 nonexistent no may 13 \n", 421 | "1 0 nonexistent no may 19 \n", 422 | "2 0 nonexistent no may 23 \n", 423 | "3 0 nonexistent no may 27 \n", 424 | "4 0 nonexistent no may 3 \n", 425 | "\n", 426 | "Economic Data:\n", 427 | " client_id emp_var_rate cons_price_idx euribor_three_months \\\n", 428 | "0 0 1.1 93.994 4.857 \n", 429 | "1 1 1.1 93.994 4.857 \n", 430 | "2 2 1.1 93.994 4.857 \n", 431 | "3 3 1.1 93.994 4.857 \n", 432 | "4 4 1.1 93.994 4.857 \n", 433 | "\n", 434 | " number_employed \n", 435 | "0 5191.0 \n", 436 | "1 5191.0 \n", 437 | "2 5191.0 \n", 438 | "3 5191.0 \n", 439 | "4 5191.0 \n" 440 | ] 441 | } 442 | ], 443 | "source": [ 444 | "# Rename columns in the client DataFrame\n", 445 | "client.rename(columns={'client_id': 'id'}, inplace=True)\n", 446 | "\n", 447 | "# Rename columns in the campaign DataFrame\n", 448 | "campaign.rename(columns={'duration': 'contact_duration',\n", 449 | " 'previous': 'previous_campaign_contacts',\n", 450 | " 'y': 'campaign_outcome',\n", 451 | " 'poutcome': 'previous_outcome',\n", 452 | " 'campaign': 'number_contacts'}, inplace=True)\n", 453 | "\n", 454 | "# Rename columns in the economic DataFrame\n", 455 | "economics.rename(columns={'euribor3m': 'euribor_three_months',\n", 456 | " 'nr_employed': 'number_employed'}, inplace=True)\n", 457 | "\n", 458 | "# Print the first few rows of each DataFrame to verify the renaming of columns\n", 459 | "print(\"Client Data:\")\n", 460 | "print(client.head())\n", 461 | "\n", 462 | "print(\"\\nCampaign Data:\")\n", 463 | "print(campaign.head())\n", 464 | "\n", 465 | 
"print(\"\\nEconomic Data:\")\n", 466 | "print(economics.head())" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "id": "08f13310", 472 | "metadata": {}, 473 | "source": [ 474 | "#### Instruction 4:\n", 475 | "\n", 476 | "Clean the \"education\" column, changing \".\" to \"_\" and \"unknown\" to NumPy's null values." 477 | ] 478 | }, 479 | { 480 | "cell_type": "code", 481 | "execution_count": 5, 482 | "id": "f9a9ddd1", 483 | "metadata": {}, 484 | "outputs": [], 485 | "source": [ 486 | "# Replace \".\" with \"_\" (regex=False treats \".\" as a literal period)\n", 487 | "client['education'] = client['education'].str.replace('.', '_', regex=False)\n", 488 | "\n", 489 | "# Replace \"unknown\" with NumPy's NaN\n", 490 | "client['education'] = client['education'].replace('unknown', np.nan)" 491 | ] 492 | }, 493 | { 494 | "cell_type": "code", 495 | "execution_count": 6, 496 | "id": "61f72247", 497 | "metadata": {}, 498 | "outputs": [ 499 | { 500 | "name": "stdout", 501 | "output_type": "stream", 502 | "text": [ 503 | "The 'education' column has been successfully cleaned.\n" 504 | ] 505 | } 506 | ], 507 | "source": [ 508 | "# Check for \".\" in the \"education\" column (NaN values count as no match)\n", 509 | "dot_count = client['education'].str.contains('.', regex=False, na=False).sum()\n", 510 | "\n", 511 | "# Check for \"unknown\" in the \"education\" column\n", 512 | "unknown_count = (client['education'] == 'unknown').sum()\n", 513 | "\n", 514 | "if dot_count == 0 and unknown_count == 0:\n", 515 | " print(\"The 'education' column has been successfully cleaned.\")\n", 516 | "else:\n", 517 | " print(f\"The 'education' column still contains {dot_count} '.' and {unknown_count} 'unknown' values.\")" 518 | ] 519 | }, 520 | { 521 | "cell_type": "markdown", 522 | "id": "7606b988", 523 | "metadata": {}, 524 | "source": [ 525 | "#### Instruction 5:\n", 526 | "\n", 527 | "Remove periods from the \"job\" column."
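Both string-cleaning steps (the "education" column in Instruction 4 and the "job" column here) depend on "." being treated as a literal period. A minimal sketch on made-up sample values: with `regex=False`, `str.replace` swaps only actual periods, whereas as a regular expression "." matches any character. The default for `regex` changed from True to False in pandas 2.0, so passing it explicitly avoids version-dependent behaviour.

```python
import numpy as np
import pandas as pd

# Made-up values in the style of the education and job columns
education = pd.Series(["basic.4y", "high.school", "unknown"])
job = pd.Series(["admin.", "services", "blue-collar"])

# regex=False treats "." as a literal period
education_clean = education.str.replace(".", "_", regex=False).replace("unknown", np.nan)
job_clean = job.str.replace(".", "", regex=False)

print(education_clean.tolist())  # ['basic_4y', 'high_school', nan]
print(job_clean.tolist())        # ['admin', 'services', 'blue-collar']

# As a regular expression, "." matches any character, so everything is removed
print(pd.Series(["admin."]).str.replace(".", "", regex=True).tolist())  # ['']
```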
528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": 7, 533 | "id": "196ea3e0", 534 | "metadata": {}, 535 | "outputs": [ 536 | { 537 | "name": "stdout", 538 | "output_type": "stream", 539 | "text": [ 540 | "Periods have been successfully removed from the 'job' column.\n" 541 | ] 542 | } 543 | ], 544 | "source": [ 545 | "# Remove \".\" from the \"job\" column (regex=False treats \".\" as a literal period)\n", 546 | "client['job'] = client['job'].str.replace('.', '', regex=False)\n", 547 | "\n", 548 | "# Check that no value still contains a period\n", 549 | "if not client['job'].str.contains('.', regex=False).any():\n", 550 | " print(\"Periods have been successfully removed from the 'job' column.\")\n", 551 | "else:\n", 552 | " print(\"Periods still exist in the 'job' column.\")" 553 | ] 554 | }, 555 | { 556 | "cell_type": "markdown", 557 | "id": "9a0eb339", 558 | "metadata": {}, 559 | "source": [ 560 | "#### Instruction 6:\n", 561 | "\n", 562 | "Convert \"success\" and \"failure\" in \"previous_outcome\" (and \"yes\" and \"no\" in \"campaign_outcome\") to binary (1 or 0), \n", 563 | "along with changing \"nonexistent\" to NumPy's null values in \"previous_outcome\"."
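The mapping step can be sketched on a few made-up rows. `Series.map` with a dict returns NaN for every value that is not one of its keys, so the keys must match the column's actual values: "success"/"failure" for `previous_outcome`, but "yes"/"no" for `campaign_outcome` (the original `y` column). Mapping `campaign_outcome` with success/failure keys would silently turn every row into NaN.

```python
import numpy as np
import pandas as pd

# Made-up rows after renaming ("poutcome" -> previous_outcome, "y" -> campaign_outcome)
campaign = pd.DataFrame({
    "previous_outcome": ["nonexistent", "failure", "success"],
    "campaign_outcome": ["no", "no", "yes"],
})

# "nonexistent" means there was no previous campaign, so it becomes NaN, not 0;
# map() then turns "success"/"failure" into 1/0 and leaves NaN as NaN
campaign["previous_outcome"] = (
    campaign["previous_outcome"].replace("nonexistent", np.nan).map({"success": 1, "failure": 0})
)

# campaign_outcome holds "yes"/"no", so those must be the mapping keys;
# any value missing from the dict would come back as NaN
campaign["campaign_outcome"] = campaign["campaign_outcome"].map({"yes": 1, "no": 0})

print(campaign)
```

`previous_outcome` ends up as a float column because it contains NaN, which is why the database read-back later shows values such as `0.0`.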
564 | ] 565 | }, 566 | { 567 | "cell_type": "code", 568 | "execution_count": 8, 569 | "id": "fe0ace15", 570 | "metadata": {}, 571 | "outputs": [ 572 | { 573 | "name": "stdout", 574 | "output_type": "stream", 575 | "text": [ 576 | "Conversions and changes in 'previous_outcome' and 'campaign_outcome' columns were successfully applied.\n" 577 | ] 578 | } 579 | ], 580 | "source": [ 581 | "# Change \"nonexistent\" to NumPy's NaN, then convert \"success\"/\"failure\" to 1/0 in \"previous_outcome\"\n", 582 | "campaign['previous_outcome'] = campaign['previous_outcome'].replace('nonexistent', np.nan)\n", 583 | "campaign['previous_outcome'] = campaign['previous_outcome'].map({'success': 1, 'failure': 0})\n", 584 | "\n", 585 | "# \"campaign_outcome\" (originally \"y\") holds \"yes\"/\"no\", so map those values to 1/0\n", 586 | "campaign['campaign_outcome'] = campaign['campaign_outcome'].map({'yes': 1, 'no': 0})\n", 587 | "\n", 588 | "# Check if the conversions and changes are applied\n", 589 | "if ('success' not in campaign['previous_outcome'].values and\n", 590 | " 'failure' not in campaign['previous_outcome'].values and\n", 591 | " 'nonexistent' not in campaign['previous_outcome'].values and\n", 592 | " 'yes' not in campaign['campaign_outcome'].values and\n", 593 | " 'no' not in campaign['campaign_outcome'].values):\n", 594 | " print(\"Conversions and changes in 'previous_outcome' and 'campaign_outcome' columns were successfully applied.\")\n", 595 | "else:\n", 596 | " print(\"Conversions and changes were not fully applied in 'previous_outcome' and 'campaign_outcome' columns.\")" 597 | ] 598 | }, 599 | { 600 | "cell_type": "markdown", 601 | "id": "fa0e6239", 602 | "metadata": {}, 603 | "source": [ 604 | "#### Instruction 7:\n", 605 | "\n", 606 | "Add a column called campaign_id in campaign, where all rows have a value of 1."
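A constant `campaign_id` tags every row with the campaign it came from, which is what would let rows from future campaigns be appended to the same `campaign` table. A minimal sketch, assuming a hypothetical helper `tag_campaign` that also builds `last_contact_date` (the next step) with the explicit `"%Y-%b-%d"` format string that appears in the workspace's recorded solution code; the sample rows are made up.

```python
import pandas as pd


def tag_campaign(df: pd.DataFrame, campaign_id: int, year: int) -> pd.DataFrame:
    """Hypothetical helper: tag rows with a campaign id and build last_contact_date."""
    out = df.copy()
    out["campaign_id"] = campaign_id
    # Build strings like "2022-May-13" and parse them with an explicit format;
    # unparseable values raise an error instead of silently becoming NaT
    date_strings = (
        str(year) + "-" + out["month"].str.capitalize() + "-" + out["day"].astype(str)
    )
    out["last_contact_date"] = pd.to_datetime(date_strings, format="%Y-%b-%d")
    # month and day are now redundant with last_contact_date
    return out.drop(columns=["month", "day"])


# Made-up sample rows in the shape of the campaign table
current = pd.DataFrame({"client_id": [0, 1], "month": ["may", "nov"], "day": [13, 3]})
tagged = tag_campaign(current, campaign_id=1, year=2022)
print(tagged)
```

Passing `campaign_id` and `year` as arguments, rather than hard-coding 1 and 2022, is one way to reuse the same cleaning step for a later campaign.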
607 | ] 608 | }, 609 | { 610 | "cell_type": "code", 611 | "execution_count": 9, 612 | "id": "6b4ba272", 613 | "metadata": { 614 | "scrolled": true 615 | }, 616 | "outputs": [ 617 | { 618 | "data": { 619 | "text/html": [ 620 | "
\n", 621 | "\n", 634 | "\n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | " \n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | " \n", 660 | " \n", 661 | " \n", 662 | " \n", 663 | " \n", 664 | " \n", 665 | " \n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 679 | " \n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | " \n", 686 | " \n", 687 | " \n", 688 | " \n", 689 | " \n", 690 | " \n", 691 | " \n", 692 | " \n", 693 | " \n", 694 | " \n", 695 | " \n", 696 | " \n", 697 | " \n", 698 | " \n", 699 | " \n", 700 | " \n", 701 | " \n", 702 | " \n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | "
\n", 718 | "
" 719 | ], 720 | "text/plain": [ 721 | " client_id number_contacts contact_duration pdays \\\n", 722 | "0 0 1 261 999 \n", 723 | "1 1 1 149 999 \n", 724 | "2 2 1 226 999 \n", 725 | "3 3 1 151 999 \n", 726 | "4 4 1 307 999 \n", 727 | "\n", 728 | " previous_campaign_contacts previous_outcome campaign_outcome month day \\\n", 729 | "0 0 NaN NaN may 13 \n", 730 | "1 0 NaN NaN may 19 \n", 731 | "2 0 NaN NaN may 23 \n", 732 | "3 0 NaN NaN may 27 \n", 733 | "4 0 NaN NaN may 3 \n", 734 | "\n", 735 | " campaign_id \n", 736 | "0 1 \n", 737 | "1 1 \n", 738 | "2 1 \n", 739 | "3 1 \n", 740 | "4 1 " 741 | ] 742 | }, 743 | "execution_count": 9, 744 | "metadata": {}, 745 | "output_type": "execute_result" 746 | } 747 | ], 748 | "source": [ 749 | "# Add a new column 'campaign_id' with all values set to 1\n", 750 | "campaign['campaign_id'] = 1\n", 751 | "\n", 752 | "# Check that the column was successfully created\n", 753 | "campaign.head()" 754 | ] 755 | }, 756 | { 757 | "cell_type": "code", 758 | "execution_count": null, 759 | "id": "a15240cc", 760 | "metadata": {}, 761 | "outputs": [], 762 | "source": [] 763 | }, 764 | { 765 | "cell_type": "markdown", 766 | "id": "c91d964e", 767 | "metadata": {}, 768 | "source": [ 769 | "#### Instruction 8:\n", 770 | "\n", 771 | "Create a datetime column called last_contact_date, in the format of \"year-month-day\", where the year is 2022, and the month and day values are taken from the \"month\" and \"day\" columns." 772 | ] 773 | }, 774 | { 775 | "cell_type": "code", 776 | "execution_count": 10, 777 | "id": "f473a21f", 778 | "metadata": {}, 779 | "outputs": [ 780 | { 781 | "name": "stdout", 782 | "output_type": "stream", 783 | "text": [ 784 | "The 'last_contact_date' column was successfully created.\n" 785 | ] 786 | }, 787 | { 788 | "data": { 789 | "text/html": [ 790 | "
\n", 791 | "\n", 804 | "\n", 805 | " \n", 806 | " \n", 807 | " \n", 808 | " \n", 809 | " \n", 810 | " \n", 811 | " \n", 812 | " \n", 813 | " \n", 814 | " \n", 815 | " \n", 816 | " \n", 817 | " \n", 818 | " \n", 819 | " \n", 820 | " \n", 821 | " \n", 822 | " \n", 823 | " \n", 824 | " \n", 825 | " \n", 826 | " \n", 827 | " \n", 828 | " \n", 829 | " \n", 830 | " \n", 831 | " \n", 832 | " \n", 833 | " \n", 834 | " \n", 835 | " \n", 836 | " \n", 837 | " \n", 838 | " \n", 839 | " \n", 840 | " \n", 841 | " \n", 842 | " \n", 843 | " \n", 844 | " \n", 845 | " \n", 846 | " \n", 847 | " \n", 848 | " \n", 849 | " \n", 850 | " \n", 851 | " \n", 852 | " \n", 853 | " \n", 854 | " \n", 855 | " \n", 856 | " \n", 857 | " \n", 858 | " \n", 859 | " \n", 860 | " \n", 861 | " \n", 862 | " \n", 863 | " \n", 864 | " \n", 865 | " \n", 866 | " \n", 867 | " \n", 868 | " \n", 869 | " \n", 870 | " \n", 871 | " \n", 872 | " \n", 873 | " \n", 874 | " \n", 875 | " \n", 876 | " \n", 877 | " \n", 878 | " \n", 879 | " \n", 880 | " \n", 881 | " \n", 882 | " \n", 883 | " \n", 884 | " \n", 885 | " \n", 886 | " \n", 887 | " \n", 888 | " \n", 889 | " \n", 890 | " \n", 891 | " \n", 892 | " \n", 893 | " \n", 894 | " \n", 895 | " \n", 896 | " \n", 897 | " \n", 898 | " \n", 899 | "
\n", 900 | "
" 901 | ], 902 | "text/plain": [ 903 | " client_id number_contacts contact_duration pdays \\\n", 904 | "0 0 1 261 999 \n", 905 | "1 1 1 149 999 \n", 906 | "2 2 1 226 999 \n", 907 | "3 3 1 151 999 \n", 908 | "4 4 1 307 999 \n", 909 | "\n", 910 | " previous_campaign_contacts previous_outcome campaign_outcome month day \\\n", 911 | "0 0 NaN NaN may 13 \n", 912 | "1 0 NaN NaN may 19 \n", 913 | "2 0 NaN NaN may 23 \n", 914 | "3 0 NaN NaN may 27 \n", 915 | "4 0 NaN NaN may 3 \n", 916 | "\n", 917 | " campaign_id year last_contact_date \n", 918 | "0 1 2022 2022-05-13 \n", 919 | "1 1 2022 2022-05-19 \n", 920 | "2 1 2022 2022-05-23 \n", 921 | "3 1 2022 2022-05-27 \n", 922 | "4 1 2022 2022-05-03 " 923 | ] 924 | }, 925 | "execution_count": 10, 926 | "metadata": {}, 927 | "output_type": "execute_result" 928 | } 929 | ], 930 | "source": [ 931 | "# Add a \"year\" column with the value 2022 to the 'campaign' DataFrame\n", 932 | "campaign['year'] = 2022\n", 933 | "\n", 934 | "# Create a datetime column \"last_contact_date\" from strings such as \"2022-may-13\"\n", 935 | "campaign['last_contact_date'] = pd.to_datetime(\n", 936 | " campaign['year'].astype(str) + '-' +\n", 937 | " campaign['month'].astype(str) + '-' +\n", 938 | " campaign['day'].astype(str),\n", 939 | " format='%Y-%b-%d'\n", 940 | ")\n", 941 | "\n", 942 | "# Check if the \"last_contact_date\" column was successfully created.\n", 943 | "if 'last_contact_date' in campaign.columns:\n", 944 | " print(\"The 'last_contact_date' column was successfully created.\")\n", 945 | "else:\n", 946 | " print(\"The 'last_contact_date' column was not created.\")\n", 947 | "\n", 948 | "# Print the first few rows of 'campaign' to verify the new date column\n", 949 | "campaign.head()" 950 | ] 951 | }, 952 | { 953 | "cell_type": "code", 954 | "execution_count": 11, 955 | "id": "d7ef5f26", 956 | "metadata": {}, 957 | "outputs": [ 958 | { 959 | "data": { 960 | "text/plain": [ 961 | "dtype('\n", 1307 | "\n", 1320 | "\n", 1321 | " \n", 1322 | " \n",
1323 | " \n", 1324 | " \n", 1325 | " \n", 1326 | " \n", 1327 | " \n", 1328 | " \n", 1329 | " \n", 1330 | " \n", 1331 | " \n", 1332 | " \n", 1333 | " \n", 1334 | " \n", 1335 | " \n", 1336 | " \n", 1337 | " \n", 1338 | " \n", 1339 | " \n", 1340 | " \n", 1341 | " \n", 1342 | " \n", 1343 | " \n", 1344 | " \n", 1345 | " \n", 1346 | " \n", 1347 | " \n", 1348 | " \n", 1349 | " \n", 1350 | " \n", 1351 | " \n", 1352 | " \n", 1353 | " \n", 1354 | " \n", 1355 | " \n", 1356 | " \n", 1357 | " \n", 1358 | " \n", 1359 | " \n", 1360 | " \n", 1361 | " \n", 1362 | " \n", 1363 | " \n", 1364 | " \n", 1365 | " \n", 1366 | " \n", 1367 | " \n", 1368 | " \n", 1369 | " \n", 1370 | " \n", 1371 | " \n", 1372 | " \n", 1373 | " \n", 1374 | " \n", 1375 | " \n", 1376 | " \n", 1377 | " \n", 1378 | " \n", 1379 | " \n", 1380 | " \n", 1381 | " \n", 1382 | " \n", 1383 | " \n", 1384 | " \n", 1385 | " \n", 1386 | " \n", 1387 | " \n", 1388 | " \n", 1389 | " \n", 1390 | " \n", 1391 | " \n", 1392 | " \n", 1393 | " \n", 1394 | " \n", 1395 | " \n", 1396 | " \n", 1397 | " \n", 1398 | " \n", 1399 | " \n", 1400 | " \n", 1401 | " \n", 1402 | " \n", 1403 | " \n", 1404 | " \n", 1405 | " \n", 1406 | " \n", 1407 | " \n", 1408 | " \n", 1409 | " \n", 1410 | " \n", 1411 | " \n", 1412 | " \n", 1413 | " \n", 1414 | " \n", 1415 | " \n", 1416 | " \n", 1417 | " \n", 1418 | " \n", 1419 | " \n", 1420 | " \n", 1421 | " \n", 1422 | " \n", 1423 | " \n", 1424 | " \n", 1425 | " \n", 1426 | " \n", 1427 | " \n", 1428 | " \n", 1429 | " \n", 1430 | " \n", 1431 | " \n", 1432 | " \n", 1433 | " \n", 1434 | " \n", 1435 | " \n", 1436 | " \n", 1437 | " \n", 1438 | " \n", 1439 | " \n", 1440 | " \n", 1441 | " \n", 1442 | " \n", 1443 | " \n", 1444 | " \n", 1445 | " \n", 1446 | " \n", 1447 | " \n", 1448 | " \n", 1449 | " \n", 1450 | " \n", 1451 | " \n", 1452 | " \n", 1453 | " \n", 1454 | " \n", 1455 | " \n", 1456 | " \n", 1457 | "
\n", 1458 | "
\n", 1459 | "" 1460 | ], 1461 | "text/plain": [ 1462 | " id age job marital education credit_default \\\n", 1463 | "0 0 56 housemaid married basic_4y no \n", 1464 | "1 1 57 services married high_school unknown \n", 1465 | "2 2 37 services married high_school no \n", 1466 | "3 3 40 admin married basic_6y no \n", 1467 | "4 4 56 services married high_school no \n", 1468 | "... ... ... ... ... ... ... \n", 1469 | "41183 41183 73 retired married professional_course no \n", 1470 | "41184 41184 46 blue-collar married professional_course no \n", 1471 | "41185 41185 56 retired married university_degree no \n", 1472 | "41186 41186 44 technician married professional_course no \n", 1473 | "41187 41187 74 retired married professional_course no \n", 1474 | "\n", 1475 | " housing loan \n", 1476 | "0 no no \n", 1477 | "1 no no \n", 1478 | "2 yes no \n", 1479 | "3 no no \n", 1480 | "4 no yes \n", 1481 | "... ... ... \n", 1482 | "41183 yes no \n", 1483 | "41184 no no \n", 1484 | "41185 yes no \n", 1485 | "41186 no no \n", 1486 | "41187 yes no \n", 1487 | "\n", 1488 | "[41188 rows x 8 columns]" 1489 | ] 1490 | }, 1491 | "execution_count": 21, 1492 | "metadata": {}, 1493 | "output_type": "execute_result" 1494 | } 1495 | ], 1496 | "source": [ 1497 | "# Confirm if all three tables were successfully created \n", 1498 | "# and if all the data was successfully inserted into the three different tables\n", 1499 | "\n", 1500 | "# Read client_data from database to pandas dataframe\n", 1501 | "client_df = pd.read_sql_query('SELECT * FROM client_data', engine)\n", 1502 | "client_df" 1503 | ] 1504 | }, 1505 | { 1506 | "cell_type": "code", 1507 | "execution_count": 22, 1508 | "id": "3f83cdb5", 1509 | "metadata": {}, 1510 | "outputs": [ 1511 | { 1512 | "data": { 1513 | "text/html": [ 1514 | "
\n", 1515 | "\n", 1528 | "\n", 1529 | " \n", 1530 | " \n", 1531 | " \n", 1532 | " \n", 1533 | " \n", 1534 | " \n", 1535 | " \n", 1536 | " \n", 1537 | " \n", 1538 | " \n", 1539 | " \n", 1540 | " \n", 1541 | " \n", 1542 | " \n", 1543 | " \n", 1544 | " \n", 1545 | " \n", 1546 | " \n", 1547 | " \n", 1548 | " \n", 1549 | " \n", 1550 | " \n", 1551 | " \n", 1552 | " \n", 1553 | " \n", 1554 | " \n", 1555 | " \n", 1556 | " \n", 1557 | " \n", 1558 | " \n", 1559 | " \n", 1560 | " \n", 1561 | " \n", 1562 | " \n", 1563 | " \n", 1564 | " \n", 1565 | " \n", 1566 | " \n", 1567 | " \n", 1568 | " \n", 1569 | " \n", 1570 | " \n", 1571 | " \n", 1572 | " \n", 1573 | " \n", 1574 | " \n", 1575 | " \n", 1576 | " \n", 1577 | " \n", 1578 | " \n", 1579 | " \n", 1580 | " \n", 1581 | " \n", 1582 | " \n", 1583 | " \n", 1584 | " \n", 1585 | " \n", 1586 | " \n", 1587 | " \n", 1588 | " \n", 1589 | " \n", 1590 | " \n", 1591 | " \n", 1592 | " \n", 1593 | " \n", 1594 | " \n", 1595 | " \n", 1596 | " \n", 1597 | " \n", 1598 | " \n", 1599 | " \n", 1600 | " \n", 1601 | " \n", 1602 | " \n", 1603 | " \n", 1604 | " \n", 1605 | " \n", 1606 | " \n", 1607 | " \n", 1608 | " \n", 1609 | " \n", 1610 | " \n", 1611 | " \n", 1612 | " \n", 1613 | " \n", 1614 | " \n", 1615 | " \n", 1616 | " \n", 1617 | " \n", 1618 | " \n", 1619 | " \n", 1620 | " \n", 1621 | " \n", 1622 | " \n", 1623 | " \n", 1624 | " \n", 1625 | " \n", 1626 | " \n", 1627 | " \n", 1628 | " \n", 1629 | " \n", 1630 | " \n", 1631 | " \n", 1632 | " \n", 1633 | " \n", 1634 | " \n", 1635 | " \n", 1636 | " \n", 1637 | " \n", 1638 | " \n", 1639 | " \n", 1640 | " \n", 1641 | " \n", 1642 | " \n", 1643 | " \n", 1644 | " \n", 1645 | " \n", 1646 | " \n", 1647 | " \n", 1648 | " \n", 1649 | " \n", 1650 | " \n", 1651 | " \n", 1652 | " \n", 1653 | " \n", 1654 | " \n", 1655 | " \n", 1656 | " \n", 1657 | " \n", 1658 | " \n", 1659 | " \n", 1660 | " \n", 1661 | " \n", 1662 | " \n", 1663 | " \n", 1664 | " \n", 1665 | " \n", 1666 | " \n", 1667 | " \n", 1668 | " \n", 1669 | " 
\n", 1670 | " \n", 1671 | " \n", 1672 | " \n", 1673 | " \n", 1674 | " \n", 1675 | " \n", 1676 | " \n", 1677 | "
\n", 1678 | "
\n", 1679 | "
" 1680 | ], 1681 | "text/plain": [ 1682 | " client_id number_contacts contact_duration pdays \\\n", 1683 | "0 0 1 261 999 \n", 1684 | "1 1 1 149 999 \n", 1685 | "2 2 1 226 999 \n", 1686 | "3 3 1 151 999 \n", 1687 | "4 4 1 307 999 \n", 1688 | "... ... ... ... ... \n", 1689 | "41183 41183 1 334 999 \n", 1690 | "41184 41184 1 383 999 \n", 1691 | "41185 41185 2 189 999 \n", 1692 | "41186 41186 1 442 999 \n", 1693 | "41187 41187 3 239 999 \n", 1694 | "\n", 1695 | " previous_campaign_contacts previous_outcome campaign_outcome \\\n", 1696 | "0 0 NaN None \n", 1697 | "1 0 NaN None \n", 1698 | "2 0 NaN None \n", 1699 | "3 0 NaN None \n", 1700 | "4 0 NaN None \n", 1701 | "... ... ... ... \n", 1702 | "41183 0 NaN None \n", 1703 | "41184 0 NaN None \n", 1704 | "41185 0 NaN None \n", 1705 | "41186 0 NaN None \n", 1706 | "41187 1 0.0 None \n", 1707 | "\n", 1708 | " campaign_id last_contact_date \n", 1709 | "0 1 2022-05-13 \n", 1710 | "1 1 2022-05-19 \n", 1711 | "2 1 2022-05-23 \n", 1712 | "3 1 2022-05-27 \n", 1713 | "4 1 2022-05-03 \n", 1714 | "... ... ... \n", 1715 | "41183 1 2022-11-30 \n", 1716 | "41184 1 2022-11-06 \n", 1717 | "41185 1 2022-11-24 \n", 1718 | "41186 1 2022-11-17 \n", 1719 | "41187 1 2022-11-23 \n", 1720 | "\n", 1721 | "[41188 rows x 9 columns]" 1722 | ] 1723 | }, 1724 | "execution_count": 22, 1725 | "metadata": {}, 1726 | "output_type": "execute_result" 1727 | } 1728 | ], 1729 | "source": [ 1730 | "# Read campaign_data from database to pandas dataframe\n", 1731 | "campaign_df = pd.read_sql_query('SELECT * FROM campaign_data', engine)\n", 1732 | "campaign_df" 1733 | ] 1734 | }, 1735 | { 1736 | "cell_type": "code", 1737 | "execution_count": 23, 1738 | "id": "25ebcd41", 1739 | "metadata": {}, 1740 | "outputs": [ 1741 | { 1742 | "data": { 1743 | "text/html": [ 1744 | "
       client_id  emp_var_rate  cons_price_idx  euribor_three_months  number_employed
    0          0           1.1          93.994                 4.857           5191.0
    1          1           1.1          93.994                 4.857           5191.0
    2          2           1.1          93.994                 4.857           5191.0
    3          3           1.1          93.994                 4.857           5191.0
    4          4           1.1          93.994                 4.857           5191.0
  ...        ...           ...             ...                   ...              ...
41183      41183          -1.1          94.767                 1.028           4963.6
41184      41184          -1.1          94.767                 1.028           4963.6
41185      41185          -1.1          94.767                 1.028           4963.6
41186      41186          -1.1          94.767                 1.028           4963.6
41187      41187          -1.1          94.767                 1.028           4963.6

41188 rows × 5 columns
      ],
      "text/plain": [
       " client_id emp_var_rate cons_price_idx euribor_three_months \\\n",
       "0 0 1.1 93.994 4.857 \n",
       "1 1 1.1 93.994 4.857 \n",
       "2 2 1.1 93.994 4.857 \n",
       "3 3 1.1 93.994 4.857 \n",
       "4 4 1.1 93.994 4.857 \n",
       "... ... ... ... ... \n",
       "41183 41183 -1.1 94.767 1.028 \n",
       "41184 41184 -1.1 94.767 1.028 \n",
       "41185 41185 -1.1 94.767 1.028 \n",
       "41186 41186 -1.1 94.767 1.028 \n",
       "41187 41187 -1.1 94.767 1.028 \n",
       "\n",
       " number_employed \n",
       "0 5191.0 \n",
       "1 5191.0 \n",
       "2 5191.0 \n",
       "3 5191.0 \n",
       "4 5191.0 \n",
       "... ... \n",
       "41183 4963.6 \n",
       "41184 4963.6 \n",
       "41185 4963.6 \n",
       "41186 4963.6 \n",
       "41187 4963.6 \n",
       "\n",
       "[41188 rows x 5 columns]"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Read economics_data from database to pandas dataframe\n",
    "economics_df = pd.read_sql_query('SELECT * FROM economics_data', engine)\n",
    "economics_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2aa7e6a",
   "metadata": {},
   "source": [
    "***The end!***"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "37c76c77",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "554ab333",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "editor": "DataCamp Workspace",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
--------------------------------------------------------------------------------
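The notebook's final cells read `campaign_data` and `economics_data` back with `pd.read_sql_query` and confirm the load by eye: both outputs report 41188 rows. The sketch below turns that visual check into explicit checks. It makes several assumptions. An in-memory SQLite database stands in for the bank's PostgreSQL database. The helper name `check_round_trip` is hypothetical. The orphan check assumes every `economics_data.client_id` should also appear in `campaign_data`. Against the notebook's SQLAlchemy `engine`, the same `SELECT COUNT(*)` queries could instead be passed to `pd.read_sql_query`.

```python
import sqlite3


def check_round_trip(conn, expected_rows):
    """Hypothetical helper: sanity-check tables after loading them.

    conn: a sqlite3 connection (stand-in for the PostgreSQL engine).
    expected_rows: dict mapping table name -> expected row count.
    Returns a list of problem descriptions; an empty list means all checks passed.
    """
    problems = []
    for table, expected in expected_rows.items():
        # Table names cannot be bound as SQL parameters, so they must come
        # from a trusted dict like this one, never from user input.
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        if count != expected:
            problems.append(f"{table}: expected {expected} rows, found {count}")
    # Assumption: each economics row refers to a client present in campaign_data.
    (orphans,) = conn.execute(
        "SELECT COUNT(*) FROM economics_data e "
        "LEFT JOIN campaign_data c ON e.client_id = c.client_id "
        "WHERE c.client_id IS NULL"
    ).fetchone()
    if orphans:
        problems.append(f"economics_data: {orphans} row(s) with unknown client_id")
    return problems


if __name__ == "__main__":
    # Tiny illustrative tables; the real ones have many more columns.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE campaign_data (client_id INTEGER PRIMARY KEY, campaign_id INTEGER)")
    conn.execute("CREATE TABLE economics_data (client_id INTEGER, emp_var_rate REAL)")
    conn.executemany("INSERT INTO campaign_data VALUES (?, 1)", [(i,) for i in range(3)])
    conn.executemany("INSERT INTO economics_data VALUES (?, 1.1)", [(i,) for i in range(3)])
    print(check_round_trip(conn, {"campaign_data": 3, "economics_data": 3}))  # prints []
```

For the bank's data, the expected counts passed in would be 41188 for both tables, matching the row counts shown in the notebook outputs.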