├── CaseStudy#1 - Danny's Diner
│   ├── CaseStudy1_schema.sql
│   ├── CaseStudy1_solution.ipynb
│   └── README.md
├── CaseStudy#2 - Pizza Runner
│   ├── CaseStudy2_schema.sql
│   ├── CaseStudy2_solutions.ipynb
│   ├── CaseStudy2_solutions.md
│   └── README.md
├── CaseStudy#3 - Foodie-Fi
│   ├── CaseStudy3_schema.sql
│   ├── CaseStudy3_solutions.ipynb
│   ├── CaseStudy3_solutions.md
│   ├── README.md
│   └── images
│       ├── question10.png
│       ├── question2.png
│       ├── question3.png
│       ├── question6.png
│       └── question7.png
├── CaseStudy#4 - Data Bank
│   ├── CaseStudy4_schema.sql
│   ├── CaseStudy4_solutions.ipynb
│   ├── CaseStudy4_solutions.md
│   └── README.md
├── CaseStudy#5 - Data Mart
│   ├── CaseStudy5_schema.sql
│   ├── CaseStudy5_solutions.ipynb
│   ├── CaseStudy5_solutions.md
│   ├── README.md
│   └── image
│       └── plot.png
├── CaseStudy#6 - Clique Bait
│   ├── CaseStudy6_schema.sql
│   ├── CaseStudy6_solutions.ipynb
│   ├── CaseStudy6_solutions.md
│   ├── README.md
│   └── images
│       └── CliqueBait_ERD.png
├── CaseStudy#7 - Balanced Tree
│   ├── CaseStudy7_schema.sql
│   ├── CaseStudy7_solutions.ipynb
│   ├── CaseStudy7_solutions.md
│   └── README.md
├── CaseStudy#8 - Fresh Segments
│   ├── CaseStudy8_schema.sql
│   ├── CaseStudy8_solutions.ipynb
│   ├── CaseStudy8_solutions.md
│   └── README.md
├── LICENSE
└── README.md
/CaseStudy#1 - Danny's Diner/CaseStudy1_schema.sql:
--------------------------------------------------------------------------------
1 | ##########################################################
2 | # Case Study #1: Danny's Diner
3 | # This SQL script is a DDL file for creating tables.
4 | # Source: https://8weeksqlchallenge.com/case-study-1/
5 | ##########################################################
6 |
7 |
8 | -- Create Schema
9 | CREATE SCHEMA IF NOT EXISTS dannys_diner;
10 |
11 |
12 | -- Create and Populate "sales" table
13 | CREATE TABLE IF NOT EXISTS dannys_diner.sales (
14 | customer_id VARCHAR(1),
15 | order_date DATE,
16 | product_id INTEGER
17 | );
18 |
19 | INSERT INTO dannys_diner.sales
20 | (customer_id, order_date, product_id)
21 | VALUES
22 | ('A', '2021-01-01', '1'),
23 | ('A', '2021-01-01', '2'),
24 | ('A', '2021-01-07', '2'),
25 | ('A', '2021-01-10', '3'),
26 | ('A', '2021-01-11', '3'),
27 | ('A', '2021-01-11', '3'),
28 | ('B', '2021-01-01', '2'),
29 | ('B', '2021-01-02', '2'),
30 | ('B', '2021-01-04', '1'),
31 | ('B', '2021-01-11', '1'),
32 | ('B', '2021-01-16', '3'),
33 | ('B', '2021-02-01', '3'),
34 | ('C', '2021-01-01', '3'),
35 | ('C', '2021-01-01', '3'),
36 | ('C', '2021-01-07', '3');
37 |
38 |
39 | -- Create and Populate "menu" table
40 | CREATE TABLE IF NOT EXISTS dannys_diner.menu (
41 | product_id INTEGER,
42 | product_name VARCHAR(5),
43 | price INTEGER
44 | );
45 |
46 | INSERT INTO dannys_diner.menu
47 | (product_id, product_name, price)
48 | VALUES
49 | ('1', 'sushi', '10'),
50 | ('2', 'curry', '15'),
51 | ('3', 'ramen', '12');
52 |
53 |
54 | -- Create and Populate "members" table
55 | CREATE TABLE IF NOT EXISTS dannys_diner.members (
56 | customer_id VARCHAR(1),
57 | join_date DATE
58 | );
59 |
60 | INSERT INTO dannys_diner.members
61 | (customer_id, join_date)
62 | VALUES
63 | ('A', '2021-01-07'),
64 | ('B', '2021-01-09');
--------------------------------------------------------------------------------
/CaseStudy#1 - Danny's Diner/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #1: Danny's Diner 🍥
2 | [8WeekSQLChallenge Repository](https://github.com/chanronnie/8WeekSQLChallenge)
3 | [Case Study #1 on 8weeksqlchallenge.com](https://8weeksqlchallenge.com/case-study-1/)
4 | 
5 |
6 | The case study presented here is part of the **8 Week SQL Challenge**.\
7 | It is kindly brought to us by [**Data With Danny**](https://8weeksqlchallenge.com).\
8 | I use `MySQL queries` in `Jupyter Notebook` to quickly view results.
9 |
10 |
11 |
12 | ## Table of Contents
13 | * [Entity Relationship Diagram](#entity-relationship-diagram)
14 | * [Datasets](#datasets)
15 | * [Case Study Questions](#case-study-questions)
16 | * [Solutions](#solutions)
17 |
18 |
19 | ## Entity Relationship Diagram
20 | 
21 |
22 | ## Datasets
23 | Case Study #1 contains 3 tables:
24 | - **sales**: This table captures all the order information (`order_date` and `product_id`) of each customer (`customer_id`).
25 | - **menu**: This table lists the IDs, names and prices of each menu item.
26 | - **members**: This table captures the dates (`join_date`) when each customer joined the member program.
27 |
28 | ## Case Study Questions
29 | 1. What is the total amount each customer spent at the restaurant?
30 | 2. How many days has each customer visited the restaurant?
31 | 3. What was the first item from the menu purchased by each customer?
32 | 4. What is the most purchased item on the menu and how many times was it purchased by all customers?
33 | 5. Which item was the most popular for each customer?
34 | 6. Which item was purchased first by the customer after they became a member?
35 | 7. Which item was purchased just before the customer became a member?
36 | 8. What is the total items and amount spent for each member before they became a member?
37 | 9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
38 | 10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?
39 |
40 |
41 | ## Solutions
42 | - View `dannys_diner` database: [**here**](CaseStudy1_schema.sql)
43 | - View Solution: [**here**](CaseStudy1_solution.ipynb)
44 |
45 |
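46 | For a quick flavour of the approach, here is a minimal sketch for question 9 (each $1 spent earns 10 points, and sushi earns a 2x multiplier), assuming the `dannys_diner` schema linked above; the full, tested queries are in the linked notebook.
47 | 
48 | ```sql
49 | -- Sketch only: points per customer, with sushi earning double points
50 | SELECT
51 |     s.customer_id,
52 |     SUM(CASE WHEN m.product_name = 'sushi' THEN m.price * 20
53 |              ELSE m.price * 10 END) AS total_points
54 | FROM dannys_diner.sales s
55 | JOIN dannys_diner.menu m ON s.product_id = m.product_id
56 | GROUP BY s.customer_id;
57 | ```
58 | 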
--------------------------------------------------------------------------------
/CaseStudy#2 - Pizza Runner/CaseStudy2_schema.sql:
--------------------------------------------------------------------------------
1 | ##########################################################
2 | # Case Study #2: Pizza Runner
3 | # This SQL script is a DDL file for creating tables.
4 | # Source: https://8weeksqlchallenge.com/case-study-2/
5 | ##########################################################
6 |
7 |
8 | -- Create Schema
9 | CREATE SCHEMA IF NOT EXISTS pizza_runner;
10 | USE pizza_runner;
11 |
12 |
13 | -- Create and Populate "runners" table
14 | DROP TABLE IF EXISTS runners;
15 | CREATE TABLE runners (
16 | runner_id INTEGER,
17 | registration_date DATE
18 | );
19 | INSERT INTO runners
20 | (runner_id, registration_date)
21 | VALUES
22 | (1, '2021-01-01'),
23 | (2, '2021-01-03'),
24 | (3, '2021-01-08'),
25 | (4, '2021-01-15');
26 |
27 |
28 | -- Create and Populate "customer_orders" table
29 | DROP TABLE IF EXISTS customer_orders;
30 | CREATE TABLE customer_orders (
31 | order_id INTEGER,
32 | customer_id INTEGER,
33 | pizza_id INTEGER,
34 | exclusions VARCHAR(4),
35 | extras VARCHAR(4),
36 | order_time TIMESTAMP
37 | );
38 |
39 | INSERT INTO customer_orders
40 | (order_id, customer_id, pizza_id, exclusions, extras, order_time)
41 | VALUES
42 | ('1', '101', '1', '', '', '2020-01-01 18:05:02'),
43 | ('2', '101', '1', '', '', '2020-01-01 19:00:52'),
44 | ('3', '102', '1', '', '', '2020-01-02 23:51:23'),
45 | ('3', '102', '2', '', NULL, '2020-01-02 23:51:23'),
46 | ('4', '103', '1', '4', '', '2020-01-04 13:23:46'),
47 | ('4', '103', '1', '4', '', '2020-01-04 13:23:46'),
48 | ('4', '103', '2', '4', '', '2020-01-04 13:23:46'),
49 | ('5', '104', '1', 'null', '1', '2020-01-08 21:00:29'),
50 | ('6', '101', '2', 'null', 'null', '2020-01-08 21:03:13'),
51 | ('7', '105', '2', 'null', '1', '2020-01-08 21:20:29'),
52 | ('8', '102', '1', 'null', 'null', '2020-01-09 23:54:33'),
53 | ('9', '103', '1', '4', '1, 5', '2020-01-10 11:22:59'),
54 | ('10', '104', '1', 'null', 'null', '2020-01-11 18:34:49'),
55 | ('10', '104', '1', '2, 6', '1, 4', '2020-01-11 18:34:49');
56 |
57 |
58 | -- Create and Populate "runner_orders" table
59 | DROP TABLE IF EXISTS runner_orders;
60 | CREATE TABLE runner_orders (
61 | order_id INTEGER,
62 | runner_id INTEGER,
63 | pickup_time VARCHAR(19),
64 | distance VARCHAR(7),
65 | duration VARCHAR(10),
66 | cancellation VARCHAR(23)
67 | );
68 |
69 | INSERT INTO runner_orders
70 | (order_id, runner_id, pickup_time, distance, duration, cancellation)
71 | VALUES
72 | ('1', '1', '2020-01-01 18:15:34', '20km', '32 minutes', ''),
73 | ('2', '1', '2020-01-01 19:10:54', '20km', '27 minutes', ''),
74 | ('3', '1', '2020-01-03 00:12:37', '13.4km', '20 mins', NULL),
75 | ('4', '2', '2020-01-04 13:53:03', '23.4', '40', NULL),
76 | ('5', '3', '2020-01-08 21:10:57', '10', '15', NULL),
77 | ('6', '3', 'null', 'null', 'null', 'Restaurant Cancellation'),
78 | ('7', '2', '2020-01-08 21:30:45', '25km', '25mins', 'null'),
79 | ('8', '2', '2020-01-10 00:15:02', '23.4 km', '15 minute', 'null'),
80 | ('9', '2', 'null', 'null', 'null', 'Customer Cancellation'),
81 | ('10', '1', '2020-01-11 18:50:20', '10km', '10minutes', 'null');
82 |
83 |
84 | -- Create and Populate "pizza_names" table
85 | DROP TABLE IF EXISTS pizza_names;
86 | CREATE TABLE pizza_names (
87 | pizza_id INTEGER,
88 | pizza_name TEXT
89 | );
90 | INSERT INTO pizza_names
91 | (pizza_id, pizza_name)
92 | VALUES
93 | (1, 'Meatlovers'),
94 | (2, 'Vegetarian');
95 |
96 |
97 | -- Create and Populate "pizza_recipes" table
98 | DROP TABLE IF EXISTS pizza_recipes;
99 | CREATE TABLE pizza_recipes (
100 | pizza_id INTEGER,
101 | toppings TEXT
102 | );
103 | INSERT INTO pizza_recipes
104 | (pizza_id, toppings)
105 | VALUES
106 | (1, '1, 2, 3, 4, 5, 6, 8, 10'),
107 | (2, '4, 6, 7, 9, 11, 12');
108 |
109 |
110 | -- Create and Populate "pizza_toppings" table
111 | DROP TABLE IF EXISTS pizza_toppings;
112 | CREATE TABLE pizza_toppings (
113 | topping_id INTEGER,
114 | topping_name TEXT
115 | );
116 | INSERT INTO pizza_toppings
117 | (topping_id, topping_name)
118 | VALUES
119 | (1, 'Bacon'),
120 | (2, 'BBQ Sauce'),
121 | (3, 'Beef'),
122 | (4, 'Cheese'),
123 | (5, 'Chicken'),
124 | (6, 'Mushrooms'),
125 | (7, 'Onions'),
126 | (8, 'Pepperoni'),
127 | (9, 'Peppers'),
128 | (10, 'Salami'),
129 | (11, 'Tomatoes'),
130 | (12, 'Tomato Sauce');
--------------------------------------------------------------------------------
/CaseStudy#2 - Pizza Runner/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #2: Pizza Runner 🍕
2 | [8WeekSQLChallenge Repository](https://github.com/chanronnie/8WeekSQLChallenge)
3 | [Case Study #2 on 8weeksqlchallenge.com](https://8weeksqlchallenge.com/case-study-2/)
4 | 
5 |
6 |
7 | The case study presented here is part of the **8 Week SQL Challenge**.\
8 | It is kindly brought to us by [**Data With Danny**](https://8weeksqlchallenge.com).\
9 | I use `MySQL queries` in `Jupyter Notebook` to quickly view results.
10 |
11 |
12 | ## Table of Contents
13 | * [Entity Relationship Diagram](#entity-relationship-diagram)
14 | * [Datasets](#datasets)
15 | * [Case Study Questions](#case-study-questions)
16 | * [Solutions](#solutions)
17 | * [MySQL Topics Covered](#mysql-topics-covered)
18 |
19 | ## Entity Relationship Diagram
20 | 
21 |
22 |
23 | ## Datasets
24 | Case Study #2 contains 6 tables:
25 | - **customer_orders**: This table captures all the pizza order information (pizza, exclusions, extras and order time) for each customer (`customer_id`).
26 | - **runner_orders**: This table lists the delivery and runner information for each order.
27 | - **runners**: This table lists the runner IDs and their registration dates.
28 | - **pizza_names**: This table maps each pizza_id to its corresponding pizza name.
29 | - **pizza_recipes**: This table maps the toppings used for each pizza at Pizza Runner.
30 | - **pizza_toppings**: This table lists the toppings used at Pizza Runner.
31 |
32 | ## Case Study Questions
33 | Case Study #2 is categorized into five question groups.\
34 | To view the specific section, please open the link in a *`new tab`* or *`window`*.\
35 | [A. Pizza Metrics](CaseStudy2_solutions.md#A)\
36 | [B. Runner and Customer Experience](CaseStudy2_solutions.md#B)\
37 | [C. Ingredient Optimisation](CaseStudy2_solutions.md#C)\
38 | [D. Pricing and Ratings](CaseStudy2_solutions.md#D)\
39 | [E. Bonus Questions](CaseStudy2_solutions.md#E)
40 |
41 |
42 | ## Solutions
43 | - View `pizza_runner` database: [here](https://github.com/chanronnie/8WeekSQLChallenge/blob/main/CaseStudy%232%20-%20Pizza%20Runner/CaseStudy2_schema.sql)
44 | - View Solution:
45 | - [Markdown File](CaseStudy2_solutions.md): offers a more fluid and responsive viewing experience
46 | - [Jupyter Notebook](CaseStudy2_solutions.ipynb): contains the original code
47 |
48 | ## MySQL Topics Covered
49 | - Data Cleaning
50 | - Common Table Expressions (CTE)
51 | - Temporary Tables
52 | - Window Functions
53 | - Subqueries
54 | - JOIN, UNION ALL
55 | - String and Time Data Manipulation
56 |
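57 | As an illustration of the Data Cleaning topic above: in the raw `customer_orders` table (see the schema linked above), the `exclusions` and `extras` columns mix empty strings, the literal string 'null' and true NULLs. Here is a sketch of one possible clean-up using a temporary table; the notebook's actual cleaning query may differ.
58 | 
59 | ```sql
60 | -- Sketch only: standardise '', 'null' and NULL into real NULLs
61 | CREATE TEMPORARY TABLE customer_orders_clean AS
62 | SELECT
63 |     order_id,
64 |     customer_id,
65 |     pizza_id,
66 |     CASE WHEN exclusions IS NULL OR exclusions IN ('', 'null') THEN NULL ELSE exclusions END AS exclusions,
67 |     CASE WHEN extras IS NULL OR extras IN ('', 'null') THEN NULL ELSE extras END AS extras,
68 |     order_time
69 | FROM pizza_runner.customer_orders;
70 | ```
71 | 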
--------------------------------------------------------------------------------
/CaseStudy#3 - Foodie-Fi/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #3: Foodie-Fi 🥑
2 | [8WeekSQLChallenge Repository](https://github.com/chanronnie/8WeekSQLChallenge)
3 | [Case Study #3 on 8weeksqlchallenge.com](https://8weeksqlchallenge.com/case-study-3/)
4 | 
5 |
6 |
7 |
8 | The case study presented here is part of the **8 Week SQL Challenge**.\
9 | It is kindly brought to us by [**Data With Danny**](https://8weeksqlchallenge.com).\
10 | I use `MySQL queries` in `Jupyter Notebook` to quickly view results.
11 |
12 |
13 | ## Table of Contents
14 | * [Entity Relationship Diagram](#entity-relationship-diagram)
15 | * [Datasets](#datasets)
16 | * [Case Study Questions](#case-study-questions)
17 | * [Solutions](#solutions)
18 | * [MySQL Topics Covered](#mysql-topics-covered)
19 |
20 | ## Entity Relationship Diagram
21 | 
22 |
23 |
24 |
25 | ## Datasets
26 | Case Study #3 contains 2 tables:
27 | - **subscriptions**: This table captures all the plan subscription information of each customer at Foodie-Fi.
28 | - **plans**: This table lists the available plans and their prices at Foodie-Fi.
29 |
30 | ## Case Study Questions
31 | Case Study #3 is categorized into 3 question groups.\
32 | To view the specific section, please open the link in a *`new tab`* or *`window`*.\
33 | [A. Customer Journey](CaseStudy3_solutions.md#A)\
34 | [B. Data Analysis Questions](CaseStudy3_solutions.md#B)\
35 | [C. Challenge Payment Question](CaseStudy3_solutions.md#C)
36 |
37 | ## Solutions
38 | - View `foodie_fi` database: [**here**](CaseStudy3_schema.sql)
39 | - View Solution:
40 | - [**Markdown File**](CaseStudy3_solutions.md): offers a more fluid and responsive viewing experience
41 | - [**Jupyter Notebook**](CaseStudy3_solutions.ipynb): contains the original code
42 |
43 | ## MySQL Topics Covered
44 | - Common Table Expressions (CTE)
45 | - Temporary Tables
46 | - Window Functions
47 | - Subqueries
48 | - JOIN, UNION ALL
49 |
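50 | As a small illustration of the window-function work listed above: pairing each subscription with the customer's next plan is a typical step when tracing the customer journey. A sketch, assuming the `subscriptions` table has `customer_id`, `plan_id` and `start_date` columns and `plans` maps `plan_id` to `plan_name` (see the schema linked above):
51 | 
52 | ```sql
53 | -- Sketch only: each plan a customer takes, together with the plan they switch to next
54 | SELECT
55 |     s.customer_id,
56 |     p.plan_name,
57 |     s.start_date,
58 |     LEAD(p.plan_name) OVER (PARTITION BY s.customer_id ORDER BY s.start_date) AS next_plan
59 | FROM foodie_fi.subscriptions s
60 | JOIN foodie_fi.plans p ON s.plan_id = p.plan_id;
61 | ```
62 | 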
--------------------------------------------------------------------------------
/CaseStudy#3 - Foodie-Fi/images/question10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chanronnie/8WeekSQLChallenge/4b36f745d833a82a61a9ce70ab177462eb7dbfd0/CaseStudy#3 - Foodie-Fi/images/question10.png
--------------------------------------------------------------------------------
/CaseStudy#3 - Foodie-Fi/images/question2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chanronnie/8WeekSQLChallenge/4b36f745d833a82a61a9ce70ab177462eb7dbfd0/CaseStudy#3 - Foodie-Fi/images/question2.png
--------------------------------------------------------------------------------
/CaseStudy#3 - Foodie-Fi/images/question3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chanronnie/8WeekSQLChallenge/4b36f745d833a82a61a9ce70ab177462eb7dbfd0/CaseStudy#3 - Foodie-Fi/images/question3.png
--------------------------------------------------------------------------------
/CaseStudy#3 - Foodie-Fi/images/question6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chanronnie/8WeekSQLChallenge/4b36f745d833a82a61a9ce70ab177462eb7dbfd0/CaseStudy#3 - Foodie-Fi/images/question6.png
--------------------------------------------------------------------------------
/CaseStudy#3 - Foodie-Fi/images/question7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chanronnie/8WeekSQLChallenge/4b36f745d833a82a61a9ce70ab177462eb7dbfd0/CaseStudy#3 - Foodie-Fi/images/question7.png
--------------------------------------------------------------------------------
/CaseStudy#4 - Data Bank/CaseStudy4_solutions.md:
--------------------------------------------------------------------------------
1 | # Case Study #4: Data Bank
2 | The case study questions presented here are created by [**Data With Danny**](https://linktr.ee/datawithdanny). They are part of the [**8 Week SQL Challenge**](https://8weeksqlchallenge.com/).
3 |
4 | My SQL queries are written in the `PostgreSQL 15` dialect, integrated into `Jupyter Notebook`, which allows us to instantly view the query results and document the queries.
5 |
6 | For more details about the **Case Study #4**, click [**here**](https://8weeksqlchallenge.com/case-study-4/).
7 |
8 | ## Table of Contents
9 |
10 | ### [1. Importing Libraries](#Import)
11 |
12 | ### [2. Tables of the Database](#Tables)
13 |
14 | ### [3. Case Study Questions](#CaseStudyQuestions)
15 |
16 | - [A. Customer Nodes Exploration](#A)
17 | - [B. Customer Transactions](#B)
18 | - [C. Data Allocation Challenge](#C)
19 |
20 |
21 |
22 | ## 1. Importing required Libraries
23 |
24 |
25 | ```python
26 | import psycopg2 as pg2
27 | import pandas as pd
28 | import os
29 | import warnings
30 |
31 | warnings.filterwarnings('ignore')
32 | ```
33 |
34 |
35 |
36 | ## 2. Tables of the Database
37 |
38 | ### Connecting to the PostgreSQL database through Jupyter Notebook
39 |
40 |
41 | ```python
42 | # Get the PostgreSQL password
43 | mypassword = os.getenv("POSTGRESQL_PASSWORD")
44 |
45 | # Connect SQL database
46 | conn = pg2.connect(user = 'postgres', password = mypassword, database = 'data_bank')
47 | cursor = conn.cursor()
48 | ```
49 |
50 | Now, let's list the table names of the `data_bank` database.
51 |
52 |
53 | ```python
54 | query_ShowTables = """
55 | SELECT
56 | table_schema,
57 | table_name
58 | FROM information_schema.tables
59 | WHERE table_schema = 'data_bank';
60 | """
61 | cursor.execute(query_ShowTables)
62 |
63 | print('--- Tables within "data_bank" database --- ')
64 | for table in cursor:
65 | print(table[1])
66 | ```
67 |
68 | --- Tables within "data_bank" database ---
69 | regions
70 | customer_nodes
71 | customer_transactions
72 |
73 |
74 | The following are the 3 tables within the `data_bank` database. Please click [**here**](https://8weeksqlchallenge.com/case-study-4/) to get more insights about the tables.
75 |
76 |
77 | ```python
78 | cursor.execute(query_ShowTables)
79 | for table in cursor:
80 | print("Table: ", table[1])
81 | query = "SELECT * FROM " + table[0] + '.' + table[1]
82 | df = pd.read_sql(query, conn)
83 | display(df)
84 | ```
85 |
86 | Table: regions
87 | 
88 | |   | region_id | region_name |
89 | |---|---|---|
90 | | 0 | 1 | Australia |
91 | | 1 | 2 | America |
92 | | 2 | 3 | Africa |
93 | | 3 | 4 | Asia |
94 | | 4 | 5 | Europe |
95 | 
127 | Table: customer_nodes
128 | 
129 | |   | customer_id | region_id | node_id | start_date | end_date |
130 | |---|---|---|---|---|---|
131 | | 0 | 1 | 3 | 4 | 2020-01-02 | 2020-01-03 |
132 | | 1 | 2 | 3 | 5 | 2020-01-03 | 2020-01-17 |
133 | | 2 | 3 | 5 | 4 | 2020-01-27 | 2020-02-18 |
134 | | 3 | 4 | 5 | 4 | 2020-01-07 | 2020-01-19 |
135 | | 4 | 5 | 3 | 3 | 2020-01-15 | 2020-01-23 |
136 | | ... | ... | ... | ... | ... | ... |
137 | | 3495 | 496 | 3 | 4 | 2020-02-25 | 9999-12-31 |
138 | | 3496 | 497 | 5 | 4 | 2020-05-27 | 9999-12-31 |
139 | | 3497 | 498 | 1 | 2 | 2020-04-05 | 9999-12-31 |
140 | | 3498 | 499 | 5 | 1 | 2020-02-03 | 9999-12-31 |
141 | | 3499 | 500 | 2 | 2 | 2020-04-15 | 9999-12-31 |
142 | 
231 | 3500 rows × 5 columns
232 |
233 |
234 |
235 | Table: customer_transactions
236 | 
237 | |   | customer_id | txn_date | txn_type | txn_amount |
238 | |---|---|---|---|---|
239 | | 0 | 429 | 2020-01-21 | deposit | 82 |
240 | | 1 | 155 | 2020-01-10 | deposit | 712 |
241 | | 2 | 398 | 2020-01-01 | deposit | 196 |
242 | | 3 | 255 | 2020-01-14 | deposit | 563 |
243 | | 4 | 185 | 2020-01-29 | deposit | 626 |
244 | | ... | ... | ... | ... | ... |
245 | | 5863 | 189 | 2020-02-03 | withdrawal | 870 |
246 | | 5864 | 189 | 2020-03-22 | purchase | 718 |
247 | | 5865 | 189 | 2020-02-06 | purchase | 393 |
248 | | 5866 | 189 | 2020-01-22 | deposit | 302 |
249 | | 5867 | 189 | 2020-01-27 | withdrawal | 861 |
250 | 
327 | 5868 rows × 4 columns
328 |
329 |
330 |
331 |
332 |
333 | ## Case Study Questions
334 |
335 |
336 | ## A. Customer Nodes Exploration
337 |
338 | #### 1. How many unique nodes are there on the Data Bank system?
339 |
340 |
341 | ```python
342 | pd.read_sql("""
343 | SELECT COUNT(DISTINCT node_id) AS nodes_count
344 | FROM data_bank.customer_nodes
345 | """, conn)
346 | ```
347 |
348 | 
349 | |   | nodes_count |
350 | |---|---|
351 | | 0 | 5 |
352 | 
366 | **Result (Attempt #1)**\
367 | In the Data Bank system, there are 5 unique nodes to which customers will be randomly reallocated.
368 |
369 | However, if we interpret the question differently, we may understand that nodes are unique for each region, and customers are randomly distributed across the nodes *according to their region*. The following query shows the possible unique nodes in each region.
370 |
371 | **Here is the attempt #2**
372 |
373 |
374 | ```python
375 | pd.read_sql("""
376 | SELECT
377 | cn.region_id,
378 | r.region_name,
379 | STRING_AGG(DISTINCT cn.node_id::VARCHAR(1), ', ') AS nodes
380 | FROM data_bank.customer_nodes cn
381 | JOIN data_bank.regions r ON cn.region_id = r.region_id
382 | GROUP BY cn.region_id, r.region_name
383 | """, conn)
384 | ```
385 |
386 | 
387 | |   | region_id | region_name | nodes |
388 | |---|---|---|---|
389 | | 0 | 1 | Australia | 1, 2, 3, 4, 5 |
390 | | 1 | 2 | America | 1, 2, 3, 4, 5 |
391 | | 2 | 3 | Africa | 1, 2, 3, 4, 5 |
392 | | 3 | 4 | Asia | 1, 2, 3, 4, 5 |
393 | | 4 | 5 | Europe | 1, 2, 3, 4, 5 |
394 | 
431 |
432 |
433 | ```python
434 | pd.read_sql("""
435 | SELECT SUM(nb_nodes)::INTEGER AS total_nodes
436 | FROM
437 | (
438 | -- Find the number of unique nodes per region
439 |
440 | SELECT region_id, COUNT(DISTINCT node_id) AS nb_nodes
441 | FROM data_bank.customer_nodes
442 | GROUP BY region_id
443 | ) n
444 | """, conn)
445 | ```
446 |
447 | 
448 | |   | total_nodes |
449 | |---|---|
450 | | 0 | 25 |
451 | 
465 | **Result (Attempt #2)**\
466 | Hence, there are 25 unique nodes in the Data Bank system across the world.
467 |
468 | ___
469 | #### 2. What is the number of nodes per region?
470 |
471 |
472 | ```python
473 | pd.read_sql("""
474 | SELECT
475 | r.region_name,
476 | COUNT(DISTINCT node_id) AS nb_nodes
477 | FROM data_bank.customer_nodes cn
478 | JOIN data_bank.regions r ON cn.region_id = r.region_id
479 | GROUP BY r.region_name;
480 | """, conn)
481 | ```
482 |
483 | 
484 | |   | region_name | nb_nodes |
485 | |---|---|---|
486 | | 0 | Africa | 5 |
487 | | 1 | America | 5 |
488 | | 2 | Asia | 5 |
489 | | 3 | Australia | 5 |
490 | | 4 | Europe | 5 |
491 | 
522 |
523 | **Result**\
524 | There are 5 nodes per region.
525 |
526 | ___
527 | #### 3. How many customers are allocated to each region?
528 |
529 |
530 | ```python
531 | pd.read_sql("""
532 | SELECT
533 | r.region_name AS region,
534 | COUNT(DISTINCT customer_id) AS nb_customers
535 | FROM data_bank.customer_nodes cn
536 | JOIN data_bank.regions r ON cn.region_id = r.region_id
537 | GROUP BY r.region_name
538 | ORDER BY nb_customers DESC;
539 | """, conn)
540 | ```
541 |
542 | 
543 | |   | region | nb_customers |
544 | |---|---|---|
545 | | 0 | Australia | 110 |
546 | | 1 | America | 105 |
547 | | 2 | Africa | 102 |
548 | | 3 | Asia | 95 |
549 | | 4 | Europe | 88 |
550 | 
581 |
582 | **Result**
583 | - The majority of Data Bank customers are from Australia, totaling 110 customers.
584 | - Following Australia, America and Africa are the second and third regions with the highest number of Data Bank customers.
585 | - Asia has a total of 95 Data Bank customers.
586 | - Data Bank does not seem to attract many customers from Europe, with only 88 clients.
587 |
588 | ___
589 | #### 4. How many days on average are customers reallocated to a different node?
590 |
591 |
592 | ```python
593 | pd.read_sql("""
594 | SELECT ROUND(AVG(end_date::date - start_date::date), 1) AS avg_reallocation_days
595 | FROM data_bank.customer_nodes
596 | WHERE end_date != '9999-12-31'
597 | """, conn)
598 | ```
599 |
600 | 
601 | |   | avg_reallocation_days |
602 | |---|---|
603 | | 0 | 14.6 |
604 | 
618 | **Result (Attempt #1)**\
619 | Customers are reallocated to a different node every 14.6 days on average.
620 |
621 | However, if we examine `customer_id` 7, we can observe two instances where the customer was reallocated to the same node (nodes 4 and 2). For customer 7, it took
622 | - 21 days to be reallocated from node 4 to node 2
623 | - 34 days to be reallocated from node 2 to node 4
624 |
625 |
626 | ```python
627 | pd.read_sql("""
628 | SELECT *
629 | FROM data_bank.customer_nodes
630 | WHERE customer_id = 7
631 | """, conn)
632 | ```
633 |
634 | 
635 | |   | customer_id | region_id | node_id | start_date | end_date |
636 | |---|---|---|---|---|---|
637 | | 0 | 7 | 2 | 5 | 2020-01-20 | 2020-02-04 |
638 | | 1 | 7 | 2 | 4 | 2020-02-05 | 2020-02-20 |
639 | | 2 | 7 | 2 | 4 | 2020-02-21 | 2020-02-26 |
640 | | 3 | 7 | 2 | 2 | 2020-02-27 | 2020-03-05 |
641 | | 4 | 7 | 2 | 2 | 2020-03-06 | 2020-04-01 |
642 | | 5 | 7 | 2 | 4 | 2020-04-02 | 2020-04-07 |
643 | | 6 | 7 | 2 | 5 | 2020-04-08 | 9999-12-31 |
644 | 
708 | **Table 1**: Customer 7 reallocated to node no 4 twice
709 |
710 | node_id | start_date | end_date | nb_reallocation_days
711 | --- | --- | --- | ---
712 | 4 | 2020-02-05 | 2020-02-20 | --
713 | 4 | 2020-02-21 | 2020-02-26 | 21
714 |
715 | **Table 2**: Customer 7 reallocated to node no 2 twice
716 |
717 | node_id | start_date | end_date | nb_reallocation_days
718 | --- | --- | --- | ---
719 | 2 | 2020-02-27 | 2020-03-05 | --
720 | 2 | 2020-03-06 | 2020-04-01 | 34
721 |
722 | Hence, here is the query for the `reallocation_cte` that will be used later to answer the question. It counts the number of days before each customer is reallocated to a different node, treating consecutive stints on the same node as a single allocation period.
723 |
724 |
725 | ```python
726 | # Let's see for customer ID 7
727 |
728 | pd.read_sql("""
729 | SELECT *,
730 | CASE
731 | WHEN LEAD(node_id) OVER w = node_id THEN NULL
732 | WHEN LAG(node_id) OVER w = node_id THEN end_date::date - LAG(start_date) OVER w::date
733 | ELSE end_date::date - start_date::date
734 | END AS nb_days
735 | FROM data_bank.customer_nodes
736 | WHERE end_date != '9999-12-31' AND customer_id = 7
737 | WINDOW w AS (PARTITION BY customer_id ORDER BY start_date)
738 | """, conn)
739 | ```
740 |
741 | 
742 | |   | customer_id | region_id | node_id | start_date | end_date | nb_days |
743 | |---|---|---|---|---|---|---|
744 | | 0 | 7 | 2 | 5 | 2020-01-20 | 2020-02-04 | 15.0 |
745 | | 1 | 7 | 2 | 4 | 2020-02-05 | 2020-02-20 | NaN |
746 | | 2 | 7 | 2 | 4 | 2020-02-21 | 2020-02-26 | 21.0 |
747 | | 3 | 7 | 2 | 2 | 2020-02-27 | 2020-03-05 | NaN |
748 | | 4 | 7 | 2 | 2 | 2020-03-06 | 2020-04-01 | 34.0 |
749 | | 5 | 7 | 2 | 4 | 2020-04-02 | 2020-04-07 | 5.0 |
750 | 
814 | Now, let's utilize the aforementioned query as the `reallocation_cte`, to provide a comprehensive answer.
815 |
816 |
817 | ```python
818 | pd.read_sql("""
819 | WITH reallocation_cte AS
820 | (
821 | -- Find the number of reallocation days
822 | SELECT *,
823 | CASE
824 | WHEN LEAD(node_id) OVER w = node_id THEN NULL
825 | WHEN LAG(node_id) OVER w = node_id THEN end_date::date - LAG(start_date) OVER w::date
826 | ELSE end_date::date - start_date::date
827 | END AS nb_reallocation_days
828 | FROM data_bank.customer_nodes
829 | WHERE end_date != '9999-12-31'
830 | WINDOW w AS (PARTITION BY customer_id ORDER BY start_date)
831 | )
832 | SELECT ROUND(AVG(nb_reallocation_days),1) AS avg_reallocation_days
833 | FROM reallocation_cte;
834 | """, conn)
835 | ```
836 |
837 | 
838 | |   | avg_reallocation_days |
839 | |---|---|
840 | | 0 | 17.3 |
841 | 
855 | **Result (Attempt #2)**\
856 | Customers are reallocated to a different node every 17.3 days on average.
857 |
858 | ___
859 | #### 5. What is the median, 80th and 95th percentile for this same reallocation days metric for each region?
860 |
861 | Since I have previously addressed the reallocation days problem (see question #4 above) using two different approaches, I will also answer this question in two attempts. Please note that questions 4 and 5 are rather vague.
862 |
863 | **Result (Attempt #1)**
864 |
865 |
866 | ```python
867 | pd.read_sql("""
868 | SELECT
869 | region_name,
870 | PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY reallocation_days) as median,
871 | PERCENTILE_DISC(0.8) WITHIN GROUP(ORDER BY reallocation_days) as percentile_80th,
872 | PERCENTILE_DISC(0.95) WITHIN GROUP(ORDER BY reallocation_days) as percentile_95th
873 | FROM
874 | (
875 | SELECT
876 | n.node_id,
877 | n.start_date,
878 | n.end_date,
879 | r.region_name,
880 | n.end_date::date - n.start_date::date AS reallocation_days
881 | FROM data_bank.customer_nodes n
882 | JOIN data_bank.regions r ON n.region_id = r.region_id
883 | WHERE n.end_date != '9999-12-31'
884 | ) re
885 | GROUP BY region_name;
886 | """, conn)
887 | ```
888 |
889 | 
890 | |   | region_name | median | percentile_80th | percentile_95th |
891 | |---|---|---|---|---|
892 | | 0 | Africa | 15 | 24 | 28 |
893 | | 1 | America | 15 | 23 | 28 |
894 | | 2 | Asia | 15 | 23 | 28 |
895 | | 3 | Australia | 15 | 23 | 28 |
896 | | 4 | Europe | 15 | 24 | 28 |
897 | 
941 | **Result (Attempt #2)**
942 |
943 |
944 | ```python
945 | pd.read_sql("""
946 | WITH reallocation_cte AS
947 | (
948 | SELECT *,
949 | CASE
950 | WHEN LEAD(node_id) OVER w = node_id THEN NULL
951 | WHEN LAG(node_id) OVER w = node_id THEN end_date::date - LAG(start_date) OVER w::date
952 | ELSE end_date::date - start_date::date
953 | END AS nb_reallocation_days
954 | FROM data_bank.customer_nodes
955 | WHERE end_date != '9999-12-31'
956 | WINDOW w AS (PARTITION BY customer_id ORDER BY start_date)
957 | )
958 | SELECT
959 | r.region_name,
960 | PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY cte.nb_reallocation_days) as median,
961 | PERCENTILE_DISC(0.8) WITHIN GROUP(ORDER BY cte.nb_reallocation_days) as percentile_80th,
962 | PERCENTILE_DISC(0.95) WITHIN GROUP(ORDER BY cte.nb_reallocation_days) as percentile_95th
963 | FROM reallocation_cte cte
964 | JOIN data_bank.regions r ON cte.region_id = r.region_id
965 | GROUP BY r.region_name;
966 | """, conn)
967 | ```
968 |
969 | 
970 | |   | region_name | median | percentile_80th | percentile_95th |
971 | |---|---|---|---|---|
972 | | 0 | Africa | 17 | 27 | 36 |
973 | | 1 | America | 17 | 26 | 36 |
974 | | 2 | Asia | 17 | 25 | 34 |
975 | | 3 | Australia | 17 | 26 | 36 |
976 | | 4 | Europe | 18 | 27 | 37 |
977 | 
1021 | ___
1022 |
1023 | ## B. Customer Transactions
1024 | #### 1. What is the unique count and total amount for each transaction type?
1025 |
1026 |
1027 | ```python
1028 | pd.read_sql("""
1029 | SELECT
1030 | txn_type AS transaction_type,
1031 | to_char(COUNT(txn_type), 'FM 999,999') AS count,
1032 | to_char(SUM(txn_amount), 'FM$ 999,999,999.99') AS total_amount
1033 | FROM data_bank.customer_transactions
1034 | GROUP BY txn_type
1035 | """, conn)
1036 | ```
1037 |
1038 | 
1039 | |   | transaction_type | count | total_amount |
1040 | |---|---|---|---|
1041 | | 0 | purchase | 1,617 | $ 806,537. |
1042 | | 1 | withdrawal | 1,580 | $ 793,003. |
1043 | | 2 | deposit | 2,671 | $ 1,359,168. |
1044 | 
1071 |
1072 | **Result**
1073 | - The most frequently used transaction type at Data Bank is **deposit**, with a total of 2,671 deposits amounting to $1,359,168.
1074 | - There is a total of 1,617 **purchases** made at Data Bank, totalling $806,537.
1075 | - There is a total of 1,580 **withdrawals** made at Data Bank, totalling $793,003.
1076 |
1077 | ___
1078 | #### 2. What is the average total historical deposit counts and amounts for all customers?
1079 |
1080 |
1081 | ```python
1082 | pd.read_sql("""
1083 | SELECT
1084 | AVG(deposit_count)::INTEGER As nb_deposit,
1085 | to_char(AVG(avg_deposit_amount), 'FM$ 999,999.99') AS avg_deposit_amount
1086 | FROM
1087 | (
1088 |     -- Find the average count and amount of deposits made by each customer
1089 | SELECT
1090 | customer_id,
1091 | COUNT(txn_type) as deposit_count,
1092 | AVG(txn_amount) AS avg_deposit_amount
1093 | FROM data_bank.customer_transactions
1094 | WHERE txn_type = 'deposit'
1095 | GROUP BY customer_id
1096 | ORDER BY customer_id
1097 | ) d
1098 | """, conn)
1099 | ```
1100 |
1101 | 
1102 | |   | nb_deposit | avg_deposit_amount |
1103 | |---|---|---|
1104 | | 0 | 5 | $ 508.61 |
1105 | 
1120 |
1121 | **Result**\
1122 | The Data Bank customers made an average of 5 deposits, with an average amount of $ 508.61.
1123 |
1124 | ___
1125 | #### 3. For each month - how many Data Bank customers make more than 1 deposit and either 1 purchase or 1 withdrawal in a single month?
1126 |
1127 |
1128 | ```python
1129 | pd.read_sql("""
1130 | WITH counting_transactions_cte AS
1131 | (
1132 | SELECT
1133 | customer_id,
1134 | EXTRACT(MONTH FROM txn_date)::INTEGER AS month,
1135 | TO_CHAR(txn_date, 'MONTH') AS month_name,
1136 | SUM(CASE WHEN txn_type = 'deposit' THEN 1 ELSE 0 END) AS deposit,
1137 | SUM(CASE WHEN txn_type = 'purchase' THEN 1 ELSE 0 END) AS purchase,
1138 | SUM(CASE WHEN txn_type = 'withdrawal' THEN 1 ELSE 0 END) AS withdrawal
1139 | FROM data_bank.customer_transactions
1140 | GROUP BY customer_id, month, month_name
1141 | ORDER BY customer_id
1142 | )
1143 | SELECT
1144 | month,
1145 | month_name,
1146 | COUNT(DISTINCT customer_id) AS nb_customers
1147 | FROM counting_transactions_cte
1148 | WHERE deposit > 1 AND (purchase > 0 OR withdrawal > 0)
1149 | GROUP BY month, month_name
1150 | ORDER BY month;
1151 | """, conn)
1152 | ```
1153 |
1154 | 
1155 | |   | month | month_name | nb_customers |
1156 | |---|---|---|---|
1157 | | 0 | 1 | JANUARY | 168 |
1158 | | 1 | 2 | FEBRUARY | 181 |
1159 | | 2 | 3 | MARCH | 192 |
1160 | | 3 | 4 | APRIL | 70 |
1161 | 
1193 |
1194 | **Result**
1195 | - **March** is the month when Data Bank customers make the most transactions (more than 1 deposit and either 1 purchase or 1 withdrawal in a single month), with a total of 192 customers.
1196 | - Following March, **February** is the second month with the highest number of customers engaging in the given transactions.
1197 | - In **January**, a total of 168 customers made more than 1 deposit and either 1 purchase or 1 withdrawal.
1198 | - The month of **April** has the fewest number of customers conducting these types of transactions, with only 70 clients.
1199 |
1200 | ___
1201 | #### 4. What is the closing balance for each customer at the end of the month?
1202 |
1203 | Here are the results showing the first 5 customers.
1204 |
1205 |
1206 | ```python
1207 | pd.read_sql("""
1208 | SELECT
1209 | customer_id,
1210 | EXTRACT(MONTH FROM txn_date)::INTEGER AS month,
1211 | TO_CHAR(txn_date, 'MONTH') AS month_name,
1212 | SUM(CASE WHEN txn_type = 'deposit' THEN txn_amount ELSE - txn_amount END) AS closing_balance
1213 | FROM data_bank.customer_transactions
1214 | WHERE customer_id <= 5
1215 | GROUP BY customer_id, month, month_name
1216 | ORDER BY customer_id, month;
1217 | """, conn)
1218 | ```
1219 |
1220 | 
1221 | |   | customer_id | month | month_name | closing_balance |
1222 | |---|---|---|---|---|
1223 | | 0 | 1 | 1 | JANUARY | 312 |
1224 | | 1 | 1 | 3 | MARCH | -952 |
1225 | | 2 | 2 | 1 | JANUARY | 549 |
1226 | | 3 | 2 | 3 | MARCH | 61 |
1227 | | 4 | 3 | 1 | JANUARY | 144 |
1228 | | 5 | 3 | 2 | FEBRUARY | -965 |
1229 | | 6 | 3 | 3 | MARCH | -401 |
1230 | | 7 | 3 | 4 | APRIL | 493 |
1231 | | 8 | 4 | 1 | JANUARY | 848 |
1232 | | 9 | 4 | 3 | MARCH | -193 |
1233 | | 10 | 5 | 1 | JANUARY | 954 |
1234 | | 11 | 5 | 3 | MARCH | -2877 |
1235 | | 12 | 5 | 4 | APRIL | -490 |
1236 | 
1327 |
1328 | ___
1329 | #### 5. What is the percentage of customers who increase their closing balance by more than 5%?
1330 |
1331 |
1332 | ```python
1333 | pd.read_sql("""
1334 | WITH monthly_balance_cte AS
1335 | (
1336 | SELECT
1337 | customer_id,
1338 | EXTRACT(MONTH FROM txn_date)::INTEGER AS month,
1339 | SUM(CASE WHEN txn_type = 'deposit' THEN txn_amount ELSE - txn_amount END) AS closing_balance
1340 | FROM data_bank.customer_transactions
1341 | GROUP BY customer_id, month
1342 | ORDER BY customer_id, month
1343 | ),
1344 | balance_greaterthan5_cte AS
1345 | (
1346 | SELECT COUNT(DISTINCT customer_id) AS nb_customers
1347 | FROM
1348 | (
1349 | SELECT
1350 | customer_id,
1351 | (LEAD(closing_balance) OVER (PARTITION BY customer_id ORDER BY month) - closing_balance)/ closing_balance::numeric*100 AS percent_change
1352 | FROM monthly_balance_cte
1353 | ) pc
1354 | WHERE percent_change > 5
1355 | )
1356 | SELECT
1357 | MAX(nb_customers) AS nb_customers,
1358 | COUNT(DISTINCT ct.customer_id) AS total_customers,
1359 | CONCAT(ROUND(MAX(nb_customers)/COUNT(DISTINCT ct.customer_id)::numeric * 100,1), ' %') AS percentage_customers
1360 | FROM balance_greaterthan5_cte b, data_bank.customer_transactions ct
1361 | """, conn)
1362 | ```
1363 |
1364 | 
1365 | |   | nb_customers | total_customers | percentage_customers |
1366 | |---|---|---|---|
1367 | | 0 | 269 | 500 | 53.8 % |
1368 | 
1385 |
1386 | **Result**\
1387 | There are 53.8 % of the Data Bank customers who increase their closing balance by more than 5%.
1388 |
1389 | ___
1390 |
1391 | ## C. Data Allocation Challenge
1392 |
1393 | To test out a few different hypotheses - the Data Bank team wants to run an experiment where different groups of customers would be allocated data using 3 different options:
1394 |
1395 | - `Option 1`: data is allocated based off the amount of money at the end of the previous month
1396 | - `Option 2`: data is allocated on the average amount of money kept in the account in the previous 30 days
1397 | - `Option 3`: data is updated real-time
1398 |
1399 | For this multi-part challenge question - you have been requested to generate the following data elements to help the Data Bank team estimate how much data will need to be provisioned for each option:
1400 | - running customer balance column that includes the impact each transaction
1401 | - customer balance at the end of each month
1402 | - minimum, average and maximum values of the running balance for each customer
1403 |
1404 | Using all of the data available - how much data would have been required for each option on a monthly basis?
1405 |
1406 | ### Running Balance
1407 |
1408 |
1409 | ```python
1410 | pd.read_sql("""
1411 | SELECT *,
1412 | SUM(
1413 | CASE WHEN txn_type = 'deposit' THEN txn_amount ELSE -txn_amount END
1414 | ) OVER (PARTITION BY customer_id ORDER BY txn_date) AS running_balance
1415 | FROM data_bank.customer_transactions
1416 | WHERE customer_id <= 5;
1417 | """, conn)
1418 | ```
1419 |
1420 | 
1421 | |   | customer_id | txn_date | txn_type | txn_amount | running_balance |
1422 | |---|---|---|---|---|---|
1423 | | 0 | 1 | 2020-01-02 | deposit | 312 | 312 |
1424 | | 1 | 1 | 2020-03-05 | purchase | 612 | -300 |
1425 | | 2 | 1 | 2020-03-17 | deposit | 324 | 24 |
1426 | | 3 | 1 | 2020-03-19 | purchase | 664 | -640 |
1427 | | 4 | 2 | 2020-01-03 | deposit | 549 | 549 |
1428 | | 5 | 2 | 2020-03-24 | deposit | 61 | 610 |
1429 | | 6 | 3 | 2020-01-27 | deposit | 144 | 144 |
1430 | | 7 | 3 | 2020-02-22 | purchase | 965 | -821 |
1431 | | 8 | 3 | 2020-03-05 | withdrawal | 213 | -1034 |
1432 | | 9 | 3 | 2020-03-19 | withdrawal | 188 | -1222 |
1433 | | 10 | 3 | 2020-04-12 | deposit | 493 | -729 |
1434 | | 11 | 4 | 2020-01-07 | deposit | 458 | 458 |
1435 | | 12 | 4 | 2020-01-21 | deposit | 390 | 848 |
1436 | | 13 | 4 | 2020-03-25 | purchase | 193 | 655 |
1437 | | 14 | 5 | 2020-01-15 | deposit | 974 | 974 |
1438 | | 15 | 5 | 2020-01-25 | deposit | 806 | 1780 |
1439 | | 16 | 5 | 2020-01-31 | withdrawal | 826 | 954 |
1440 | | 17 | 5 | 2020-03-02 | purchase | 886 | 68 |
1441 | | 18 | 5 | 2020-03-19 | deposit | 718 | 786 |
1442 | | 19 | 5 | 2020-03-26 | withdrawal | 786 | 0 |
1443 | | 20 | 5 | 2020-03-27 | deposit | 412 | -288 |
1444 | | 21 | 5 | 2020-03-27 | withdrawal | 700 | -288 |
1445 | | 22 | 5 | 2020-03-29 | purchase | 852 | -1140 |
1446 | | 23 | 5 | 2020-03-31 | purchase | 783 | -1923 |
1447 | | 24 | 5 | 2020-04-02 | withdrawal | 490 | -2413 |
1448 | 
1637 |
1638 | ### Monthly Balance
1639 |
1640 |
1641 | ```python
1642 | pd.read_sql("""
1643 | SELECT
1644 | customer_id,
1645 | EXTRACT(MONTH FROM txn_date)::INTEGER AS month,
1646 | TO_CHAR(txn_date, 'Month') AS month_name,
1647 | SUM(CASE WHEN txn_type = 'deposit' THEN txn_amount ELSE -txn_amount END) AS closing_balance
1648 | FROM data_bank.customer_transactions
1649 | WHERE customer_id <= 5
1650 | GROUP BY customer_id, month, month_name
1651 | ORDER BY customer_id, month;
1652 | """, conn)
1653 | ```
1654 | 
1655 | |   | customer_id | month | month_name | closing_balance |
1656 | |---|---|---|---|---|
1657 | | 0 | 1 | 1 | January | 312 |
1658 | | 1 | 1 | 3 | March | -952 |
1659 | | 2 | 2 | 1 | January | 549 |
1660 | | 3 | 2 | 3 | March | 61 |
1661 | | 4 | 3 | 1 | January | 144 |
1662 | | 5 | 3 | 2 | February | -965 |
1663 | | 6 | 3 | 3 | March | -401 |
1664 | | 7 | 3 | 4 | April | 493 |
1665 | | 8 | 4 | 1 | January | 848 |
1666 | | 9 | 4 | 3 | March | -193 |
1667 | | 10 | 5 | 1 | January | 954 |
1668 | | 11 | 5 | 3 | March | -2877 |
1669 | | 12 | 5 | 4 | April | -490 |
1670 | 
1762 |
1763 | ### Min, Average, Max Transaction
1764 |
1765 |
1766 | ```python
1767 | pd.read_sql("""
1768 | SELECT
1769 | customer_id,
1770 | MIN(running_balance) AS min_transaction,
1771 | MAX(running_balance) AS max_transaction,
1772 | ROUND(AVG(running_balance),2) AS avg_transaction
1773 | FROM
1774 | (
1775 | SELECT *,
1776 | SUM(CASE WHEN txn_type = 'deposit' THEN txn_amount ELSE -txn_amount END) OVER (PARTITION BY customer_id ORDER BY txn_date) AS running_balance
1777 | FROM data_bank.customer_transactions
1778 | ) running_balance
1779 | WHERE customer_id <= 10
1780 | GROUP BY customer_id
1781 | ORDER BY customer_id
1782 | """, conn)
1783 | ```
1784 |
1785 | 
1786 | |   | customer_id | min_transaction | max_transaction | avg_transaction |
1787 | |---|---|---|---|---|
1788 | | 0 | 1 | -640 | 312 | -151.00 |
1789 | | 1 | 2 | 549 | 610 | 579.50 |
1790 | | 2 | 3 | -1222 | 144 | -732.40 |
1791 | | 3 | 4 | 458 | 848 | 653.67 |
1792 | | 4 | 5 | -2413 | 1780 | -135.45 |
1793 | | 5 | 6 | -552 | 2197 | 624.00 |
1794 | | 6 | 7 | 887 | 3539 | 2268.69 |
1795 | | 7 | 8 | -1029 | 1363 | 173.70 |
1796 | | 8 | 9 | -91 | 2030 | 1021.70 |
1797 | | 9 | 10 | -5090 | 556 | -2229.83 |
1798 | 
1871 |
1872 |
1873 | ```python
1874 | conn.close()
1875 | ```
1876 |
--------------------------------------------------------------------------------
/CaseStudy#4 - Data Bank/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #4: Data Bank 💱
2 | [8WeekSQLChallenge Repository](https://github.com/chanronnie/8WeekSQLChallenge)
3 | [Case Study #4 on 8weeksqlchallenge.com](https://8weeksqlchallenge.com/case-study-4/)
4 | 
5 |
6 |
7 | The case study presented here is part of the **8 Week SQL Challenge**.\
8 | It is kindly brought to us by [**Data With Danny**](https://8weeksqlchallenge.com).
9 |
10 | This time, I am using `PostgreSQL queries` (instead of MySQL) in `Jupyter Notebook` to quickly view results, which provides me with an opportunity:
11 | - to learn PostgreSQL
12 | - to utilize handy mathematical and string functions.
13 |
14 |
15 | ## Table of Contents
16 | * [Entity Relationship Diagram](#entity-relationship-diagram)
17 | * [Datasets](#datasets)
18 | * [Case Study Questions](#case-study-questions)
19 | * [Solutions](#solutions)
20 | * [PostgreSQL Topics Covered](#postgresql-topics-covered)
21 |
22 | ## Entity Relationship Diagram
23 | 
24 |
25 |
26 |
27 |
28 | ## Datasets
29 | Case Study #4 contains 3 tables:
30 | - **regions**: This table maps each region_id to its respective region_name value.
31 | - **customer_nodes**: This table lists the region and node reallocation information of all Data Bank customers.
32 | - **customer_transactions**: This table lists all the transaction information of all Data Bank customers.
33 |
34 | ## Case Study Questions
35 | Case Study #4 is categorized into 3 question groups.\
36 | To view the specific section, please open the link in a *`new tab`* or *`window`*.\
37 | [A. Customer Nodes Exploration](CaseStudy4_solutions.md#A)\
38 | [B. Customer Transactions](CaseStudy4_solutions.md#B)\
39 | [C. Data Allocation Challenge](CaseStudy4_solutions.md#C)
40 |
41 | ## Solutions
42 | - View `data_bank` database: [**here**](CaseStudy4_schema.sql)
43 | - View Solution:
44 | - [**Markdown File**](CaseStudy4_solutions.md): offers a more fluid and responsive viewing experience
45 | - [**Jupyter Notebook**](CaseStudy4_solutions.ipynb): contains the original code
46 |
47 | ## PostgreSQL Topics Covered
48 | - Common Table Expressions (CTE)
49 | - Window Functions
50 | - Subqueries
51 | - JOIN, UNION ALL
52 | - Median and percentiles (PERCENTILE_DISC)
53 |
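54 | As a small taste of the "handy mathematical and string functions" mentioned above, the solutions lean on PostgreSQL's `TO_CHAR` to format query output directly in SQL; a minimal sketch (the real queries are in the solutions file):
55 | 
56 | ```sql
57 | -- Sketch only: format a raw integer amount as a readable dollar figure
58 | SELECT TO_CHAR(1359168, 'FM$ 999,999,999') AS formatted_total;
59 | ```
60 | 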
--------------------------------------------------------------------------------
/CaseStudy#5 - Data Mart/CaseStudy5_solutions.md:
--------------------------------------------------------------------------------
1 | # Case Study #5: Data Mart
2 | The case study questions presented here are created by [**Data With Danny**](https://linktr.ee/datawithdanny). They are part of the [**8 Week SQL Challenge**](https://8weeksqlchallenge.com/).
3 |
4 | My SQL queries are written in the `PostgreSQL 15` dialect, integrated into `Jupyter Notebook`, which allows us to instantly view the query results and document the queries.
5 |
6 | For more details about the **Case Study #5**, click [**here**](https://8weeksqlchallenge.com/case-study-5/).
7 |
8 | ## Table of Contents
9 |
10 | ### [1. Importing Libraries](#Import)
11 |
12 | ### [2. Tables of the Database](#Tables)
13 |
14 | ### [3. Case Study Questions](#CaseStudyQuestions)
15 |
16 | - [A. Data Cleansing](#A)
17 | - [B. Data Exploration](#B)
18 | - [C. Before & After Analysis](#C)
19 | - [D. Bonus Question](#D)
20 |
21 |
22 | ## 1. Importing Required Libraries
23 |
24 |
25 | ```python
26 | import psycopg2 as pg2
27 | import pandas as pd
28 | from datetime import datetime
29 | import seaborn as sns
30 | import matplotlib.pyplot as plt
31 | import matplotlib.ticker as ticker
32 | import os
33 | import warnings
34 |
35 | warnings.filterwarnings('ignore')
36 | ```
37 |
38 | ### Connecting to the PostgreSQL database from Jupyter Notebook
39 |
40 |
41 | ```python
42 | # Get PostgreSQL password
43 | mypassword = os.getenv('POSTGRESQL_PASSWORD')
44 |
45 | # Connecting to database
46 | conn = pg2.connect(user = 'postgres', password = mypassword, database = 'data_mart')
47 | cursor = conn.cursor()
48 | ```
49 |
50 |
51 |
52 | ## 2. Tables of the Database
53 |
54 |
55 | ```python
56 | cursor.execute("""
57 | SELECT table_schema, table_name
58 | FROM information_schema.tables
59 | WHERE table_schema = 'data_mart'
60 | """
61 | )
62 |
63 | print('--- Tables within "data_mart" database --- ')
64 | for table in cursor:
65 | print(table[1])
66 | ```
67 |
68 | --- Tables within "data_mart" database ---
69 | weekly_sales
70 |
71 |
72 | Here is the `weekly_sales` table, containing 17,117 rows of data since March 26, 2018.
73 |
74 |
75 | ```python
76 | pd.read_sql("""
77 | SELECT *
78 | FROM data_mart.weekly_sales;
79 | """, conn)
80 | ```
81 |
82 | 
83 | |   | week_date | region | platform | segment | customer_type | transactions | sales |
84 | |---|---|---|---|---|---|---|---|
85 | | 0 | 31/8/20 | ASIA | Retail | C3 | New | 120631 | 3656163 |
86 | | 1 | 31/8/20 | ASIA | Retail | F1 | New | 31574 | 996575 |
87 | | 2 | 31/8/20 | USA | Retail | null | Guest | 529151 | 16509610 |
88 | | 3 | 31/8/20 | EUROPE | Retail | C1 | New | 4517 | 141942 |
89 | | 4 | 31/8/20 | AFRICA | Retail | C2 | New | 58046 | 1758388 |
90 | | ... | ... | ... | ... | ... | ... | ... | ... |
91 | | 17112 | 26/3/18 | AFRICA | Retail | C3 | New | 98342 | 3706066 |
92 | | 17113 | 26/3/18 | USA | Shopify | C4 | New | 16 | 2784 |
93 | | 17114 | 26/3/18 | USA | Retail | F2 | New | 25665 | 1064172 |
94 | | 17115 | 26/3/18 | EUROPE | Retail | C4 | New | 883 | 33523 |
95 | | 17116 | 26/3/18 | AFRICA | Retail | C3 | Existing | 218516 | 12083475 |
96 | 
208 | 17117 rows × 7 columns
209 |
210 |
211 |
212 |
213 |
214 |
215 | ## 3. Case Study Questions
216 |
217 |
218 | ## A. Data Cleansing
219 | In a single query, perform the following operations and generate a *new table* in the data_mart schema named `clean_weekly_sales`:
220 |
221 | - Convert the `week_date` to a DATE format
222 |
223 | - Add a `week_number` as the second column for each `week_date` value, for example any value from the 1st of January to 7th of January will be 1, 8th to 14th will be 2 etc
224 |
225 | - Add a `month_number` with the calendar month for each `week_date` value as the 3rd column
226 |
227 | - Add a `calendar_year` column as the 4th column containing either 2018, 2019 or 2020 values
228 |
229 | - Add a new column called `age_band` after the original `segment` column using the following mapping on the number inside the segment value
230 |
231 |
232 |
233 | segment | age_band
234 | --- | ---
235 | 1 | Young Adults
236 | 2 | Middle Aged
237 | 3 or 4 | Retirees
238 |
239 |
240 |
241 | - Add a new `demographic` column using the following mapping for the first letter in the segment values:
242 |
243 |
244 |
245 | segment | demographic
246 | --- | ---
247 | C | Couples
248 | F | Families
249 |
250 |
251 |
252 | - Ensure all null string values are replaced with an "unknown" string value in the original `segment` column as well as the new `age_band` and `demographic` columns
253 |
254 | - Generate a new `avg_transaction` column as the sales value divided by transactions rounded to 2 decimal places for each record
255 |
256 | ___
257 | Creating an empty table is the first step: it will store the processed data from the original `weekly_sales` dataset in the desired column order. A single query with the `INSERT INTO` statement can then perform all of the data cleaning steps inside its `SELECT` block.
258 |
259 |
260 | ```python
261 | # Creating table
262 | cursor.execute("DROP TABLE IF EXISTS data_mart.clean_weekly_sales;")
263 | cursor.execute("""
264 | CREATE TABLE data_mart.clean_weekly_sales
265 | (
266 | "week_date" DATE,
267 | "week_number" INTEGER,
268 | "month_number" INTEGER,
269 | "calendar_year" INTEGER,
270 | "region" VARCHAR(13),
271 | "platform" VARCHAR(7),
272 | "segment" VARCHAR(10),
273 | "age_band" VARCHAR(50),
274 | "demographic" VARCHAR(10),
275 | "customer_type" VARCHAR(8),
276 | "transactions" INTEGER,
277 | "sales" INTEGER,
278 | "avg_transaction" DECIMAL
279 | );
280 | """)
281 |
282 |
283 | # Inserting required and processed data into the newly created table
284 | cursor.execute("""
285 | INSERT INTO data_mart.clean_weekly_sales
286 | SELECT
287 | TO_DATE(week_date, 'dd/mm/yy') AS week_date,
288 | DATE_PART('week', TO_DATE(week_date, 'dd/mm/yy'))::INTEGER as week_number,
289 | DATE_PART('month', TO_DATE(week_date, 'dd/mm/yy'))::INTEGER as month_number,
290 | DATE_PART('year', TO_DATE(week_date, 'dd/mm/yy'))::INTEGER as calendar_year,
291 | region,
292 | platform,
293 | CASE WHEN segment = 'null' THEN 'unknown' ELSE segment END as segment,
294 | CASE
295 | WHEN segment LIKE '%1' THEN 'Young Adults'
296 | WHEN segment LIKE '%2' THEN 'Middle Aged'
297 | WHEN REGEXP_LIKE(segment, '3|4') THEN 'Retirees'
298 | ELSE 'unknown'
299 | END AS age_band,
300 |
301 | CASE
302 | WHEN segment LIKE 'C%' THEN 'Couples'
303 | WHEN segment LIKE 'F%' THEN 'Families'
304 | ELSE 'unknown'
305 | END AS demographic,
306 | customer_type,
307 | transactions,
308 | sales,
309 | ROUND(sales/transactions::numeric, 2) AS avg_transaction
310 | FROM data_mart.weekly_sales;
311 | """)
312 |
313 | # Saving updates
314 | conn.commit()
315 | ```
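
Before previewing the result, a quick sanity check (a sketch using the column names defined above) can confirm that the new table has the same row count as the source and that no literal `'null'` strings remain in `segment`:

```python
# Sanity check (sketch): row counts should match, and the literal string 'null'
# should no longer appear in the cleaned segment column.
pd.read_sql("""
SELECT
    (SELECT COUNT(*) FROM data_mart.weekly_sales)       AS raw_rows,
    (SELECT COUNT(*) FROM data_mart.clean_weekly_sales) AS clean_rows,
    (SELECT COUNT(*) FROM data_mart.clean_weekly_sales
     WHERE segment = 'null')                            AS remaining_null_strings;
""", conn)
```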
316 |
317 | **Result**
318 |
319 |
320 | ```python
321 | pd.read_sql("""
322 | SELECT *
323 | FROM data_mart.clean_weekly_sales;
324 | """, conn)
325 | ```
326 |
327 |
328 |
329 |
330 | |
331 | week_date |
332 | week_number |
333 | month_number |
334 | calendar_year |
335 | region |
336 | platform |
337 | segment |
338 | age_band |
339 | demographic |
340 | customer_type |
341 | transactions |
342 | sales |
343 | avg_transaction |
344 |
345 |
346 |
347 |
348 | 0 |
349 | 2020-08-31 |
350 | 36 |
351 | 8 |
352 | 2020 |
353 | ASIA |
354 | Retail |
355 | C3 |
356 | Retirees |
357 | Couples |
358 | New |
359 | 120631 |
360 | 3656163 |
361 | 30.31 |
362 |
363 |
364 | 1 |
365 | 2020-08-31 |
366 | 36 |
367 | 8 |
368 | 2020 |
369 | ASIA |
370 | Retail |
371 | F1 |
372 | Young Adults |
373 | Families |
374 | New |
375 | 31574 |
376 | 996575 |
377 | 31.56 |
378 |
379 |
380 | 2 |
381 | 2020-08-31 |
382 | 36 |
383 | 8 |
384 | 2020 |
385 | USA |
386 | Retail |
387 | unknown |
388 | unknown |
389 | unknown |
390 | Guest |
391 | 529151 |
392 | 16509610 |
393 | 31.20 |
394 |
395 |
396 | 3 |
397 | 2020-08-31 |
398 | 36 |
399 | 8 |
400 | 2020 |
401 | EUROPE |
402 | Retail |
403 | C1 |
404 | Young Adults |
405 | Couples |
406 | New |
407 | 4517 |
408 | 141942 |
409 | 31.42 |
410 |
411 |
412 | 4 |
413 | 2020-08-31 |
414 | 36 |
415 | 8 |
416 | 2020 |
417 | AFRICA |
418 | Retail |
419 | C2 |
420 | Middle Aged |
421 | Couples |
422 | New |
423 | 58046 |
424 | 1758388 |
425 | 30.29 |
426 |
427 |
428 | ... |
429 | ... |
430 | ... |
431 | ... |
432 | ... |
433 | ... |
434 | ... |
435 | ... |
436 | ... |
437 | ... |
438 | ... |
439 | ... |
440 | ... |
441 | ... |
442 |
443 |
444 | 17112 |
445 | 2018-03-26 |
446 | 13 |
447 | 3 |
448 | 2018 |
449 | AFRICA |
450 | Retail |
451 | C3 |
452 | Retirees |
453 | Couples |
454 | New |
455 | 98342 |
456 | 3706066 |
457 | 37.69 |
458 |
459 |
460 | 17113 |
461 | 2018-03-26 |
462 | 13 |
463 | 3 |
464 | 2018 |
465 | USA |
466 | Shopify |
467 | C4 |
468 | Retirees |
469 | Couples |
470 | New |
471 | 16 |
472 | 2784 |
473 | 174.00 |
474 |
475 |
476 | 17114 |
477 | 2018-03-26 |
478 | 13 |
479 | 3 |
480 | 2018 |
481 | USA |
482 | Retail |
483 | F2 |
484 | Middle Aged |
485 | Families |
486 | New |
487 | 25665 |
488 | 1064172 |
489 | 41.46 |
490 |
491 |
492 | 17115 |
493 | 2018-03-26 |
494 | 13 |
495 | 3 |
496 | 2018 |
497 | EUROPE |
498 | Retail |
499 | C4 |
500 | Retirees |
501 | Couples |
502 | New |
503 | 883 |
504 | 33523 |
505 | 37.96 |
506 |
507 |
508 | 17116 |
509 | 2018-03-26 |
510 | 13 |
511 | 3 |
512 | 2018 |
513 | AFRICA |
514 | Retail |
515 | C3 |
516 | Retirees |
517 | Couples |
518 | Existing |
519 | 218516 |
520 | 12083475 |
521 | 55.30 |
522 |
523 |
524 |
525 | 17117 rows × 13 columns
526 |
527 |
528 |
529 |
530 |
531 |
532 | ## B. Data Exploration
533 |
534 | #### 1. What day of the week is used for each `week_date` value?
535 |
536 |
537 | ```python
538 | pd.read_sql("""
539 | SELECT DISTINCT TO_CHAR(week_date, 'Day') AS day_of_week
540 | FROM data_mart.clean_weekly_sales;
541 | """, conn)
542 | ```
543 |
544 |
545 |
546 |
547 | |
548 | day_of_week |
549 |
550 |
551 |
552 |
553 | 0 |
554 | Monday |
555 |
556 |
557 |
558 |
559 |
560 |
561 |
562 | **Result**\
563 | Monday is the designated day of the week for each `week_date` value.
564 |
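An equivalent numeric check (a sketch) uses `EXTRACT(ISODOW ...)`, which returns 1 for Monday through 7 for Sunday:

```python
# Equivalent numeric check (sketch): ISODOW is 1 for Monday ... 7 for Sunday.
pd.read_sql("""
SELECT DISTINCT EXTRACT(ISODOW FROM week_date) AS iso_day_of_week
FROM data_mart.clean_weekly_sales;
""", conn)
```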
565 | ___
566 | #### 2. What range of week numbers are missing from the dataset?
567 |
568 | Given that there are 52 weeks in a year, we can use the `generate_series` function to generate a series of numbers from 1 to 52. A **LEFT JOIN** between this number series and the `clean_weekly_sales` table then reveals all the week numbers that are not present in the dataset.
569 |
570 |
571 | ```python
572 | pd.read_sql("""
573 | SELECT gs.week_number AS missing_week_nb
574 | FROM generate_series(1,52,1) gs(week_number)
575 | LEFT JOIN data_mart.clean_weekly_sales d ON gs.week_number = d.week_number
576 | WHERE d.week_number IS NULL
577 | """, conn)
578 | ```
579 |
580 |
581 |
582 |
583 | |
584 | missing_week_nb |
585 |
586 |
587 |
588 |
589 | 0 |
590 | 1 |
591 |
592 |
593 | 1 |
594 | 2 |
595 |
596 |
597 | 2 |
598 | 3 |
599 |
600 |
601 | 3 |
602 | 4 |
603 |
604 |
605 | 4 |
606 | 5 |
607 |
608 |
609 | 5 |
610 | 6 |
611 |
612 |
613 | 6 |
614 | 7 |
615 |
616 |
617 | 7 |
618 | 8 |
619 |
620 |
621 | 8 |
622 | 9 |
623 |
624 |
625 | 9 |
626 | 10 |
627 |
628 |
629 | 10 |
630 | 11 |
631 |
632 |
633 | 11 |
634 | 12 |
635 |
636 |
637 | 12 |
638 | 37 |
639 |
640 |
641 | 13 |
642 | 38 |
643 |
644 |
645 | 14 |
646 | 39 |
647 |
648 |
649 | 15 |
650 | 40 |
651 |
652 |
653 | 16 |
654 | 41 |
655 |
656 |
657 | 17 |
658 | 42 |
659 |
660 |
661 | 18 |
662 | 43 |
663 |
664 |
665 | 19 |
666 | 44 |
667 |
668 |
669 | 20 |
670 | 45 |
671 |
672 |
673 | 21 |
674 | 46 |
675 |
676 |
677 | 22 |
678 | 47 |
679 |
680 |
681 | 23 |
682 | 48 |
683 |
684 |
685 | 24 |
686 | 49 |
687 |
688 |
689 | 25 |
690 | 50 |
691 |
692 |
693 | 26 |
694 | 51 |
695 |
696 |
697 | 27 |
698 | 52 |
699 |
700 |
701 |
702 |
703 |
704 |
705 |
706 | **Result**
707 | - The first range of missing week numbers is week 1 to week 12.
708 | - The second range of missing week numbers is week 37 to week 52 (the sketch after this list collapses the missing weeks into these ranges directly).
709 |
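As referenced above, this sketch collapses the missing week numbers into contiguous ranges directly in SQL, using the classic `week_number - ROW_NUMBER()` gaps-and-islands trick:

```python
# Sketch: group consecutive missing week numbers into ranges.
# week_number - ROW_NUMBER() is constant within each run of consecutive values.
pd.read_sql("""
WITH missing_weeks AS
(
    SELECT gs.week_number
    FROM generate_series(1, 52, 1) gs(week_number)
    LEFT JOIN data_mart.clean_weekly_sales d ON gs.week_number = d.week_number
    WHERE d.week_number IS NULL
)
SELECT
    MIN(week_number) AS range_start,
    MAX(week_number) AS range_end
FROM
(
    SELECT week_number,
           week_number - ROW_NUMBER() OVER (ORDER BY week_number) AS grp
    FROM missing_weeks
) g
GROUP BY grp
ORDER BY range_start;
""", conn)
```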
710 | ___
711 | #### 3. How many total transactions were there for each year in the dataset?
712 |
713 |
714 | ```python
715 | pd.read_sql("""
716 | SELECT
717 | calendar_year AS year,
718 | COUNT(transactions) AS nb_transactions
719 | FROM data_mart.clean_weekly_sales
720 | GROUP BY year;
721 | """, conn)
722 | ```
723 |
724 |
725 |
726 |
727 | |
728 | year |
729 | nb_transactions |
730 |
731 |
732 |
733 |
734 | 0 |
735 | 2018 |
736 | 5698 |
737 |
738 |
739 | 1 |
740 | 2020 |
741 | 5711 |
742 |
743 |
744 | 2 |
745 | 2019 |
746 | 5708 |
747 |
748 |
749 |
750 |
751 |
752 |
753 |
754 | ___
755 | #### 4. What is the total sales for each region for each month?
756 |
757 |
758 | ```python
759 | df4 = pd.read_sql("""
760 | SELECT
761 | region,
762 | calendar_year AS year,
763 | TO_CHAR(week_date, 'Month') AS month_name,
764 | SUM(sales) AS total_sales
765 | FROM data_mart.clean_weekly_sales
766 | GROUP BY region, year, month_number, month_name
767 | ORDER BY region, year, month_number
768 | """, conn)
769 |
770 | df4
771 | ```
772 |
773 |
774 |
775 |
776 |
777 | |
778 | region |
779 | year |
780 | month_name |
781 | total_sales |
782 |
783 |
784 |
785 |
786 | 0 |
787 | AFRICA |
788 | 2018 |
789 | March |
790 | 130542213 |
791 |
792 |
793 | 1 |
794 | AFRICA |
795 | 2018 |
796 | April |
797 | 650194751 |
798 |
799 |
800 | 2 |
801 | AFRICA |
802 | 2018 |
803 | May |
804 | 522814997 |
805 |
806 |
807 | 3 |
808 | AFRICA |
809 | 2018 |
810 | June |
811 | 519127094 |
812 |
813 |
814 | 4 |
815 | AFRICA |
816 | 2018 |
817 | July |
818 | 674135866 |
819 |
820 |
821 | ... |
822 | ... |
823 | ... |
824 | ... |
825 | ... |
826 |
827 |
828 | 135 |
829 | USA |
830 | 2020 |
831 | April |
832 | 221952003 |
833 |
834 |
835 | 136 |
836 | USA |
837 | 2020 |
838 | May |
839 | 225545881 |
840 |
841 |
842 | 137 |
843 | USA |
844 | 2020 |
845 | June |
846 | 277763625 |
847 |
848 |
849 | 138 |
850 | USA |
851 | 2020 |
852 | July |
853 | 223735311 |
854 |
855 |
856 | 139 |
857 | USA |
858 | 2020 |
859 | August |
860 | 277361606 |
861 |
862 |
863 |
864 | 140 rows × 4 columns
865 |
866 |
867 |
868 |
869 | **Result**\
870 | With 140 rows, the output table above makes it hard to see Data Mart's sales performance at a glance. Let's instead examine a time-series visualization of total sales by region, using seaborn's `hue` parameter to split the lines by region.
871 |
872 |
873 | ```python
874 | # Insert a new column for datetime using calendar_year and month_name values
875 | df4['date'] = df4.apply(lambda row: datetime(row['year'], datetime.strptime(row['month_name'].strip(), '%B').month, 1), axis=1)
876 |
877 |
878 | # plot the total_sales by region
879 | sns.set_style("darkgrid")
880 | sns.lineplot(data=df4, x='date', y='total_sales', hue='region')
881 | plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
882 |
883 | # customize the format, labels and title
884 | plt.xticks(rotation=45, ha='right')
885 | plt.ylabel("total sales")
886 | formatter = ticker.StrMethodFormatter('${x:,.0f}')
887 | plt.gca().yaxis.set_major_formatter(formatter)
888 | plt.title('Total Sales at Data Mart between 2018 and 2020');
889 | ```
890 |
891 |
892 |
893 | 
894 |
895 |
896 |
897 | **Insights**
898 | - Data Mart has achieved the highest sales in Oceania since 2018.
899 | - Following Oceania, Africa and Asia are the second and third most lucrative regions for Data Mart sales.
900 | - South America and Europe recorded the lowest sales figures from 2018 to 2020 (the quick aggregation below confirms this ranking).
901 |
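As noted in the last insight, a quick pandas aggregation over `df4` confirms the ranking of total sales per region across the whole 2018–2020 period:

```python
# Quick check: total sales per region over the full period, highest first.
df4.groupby('region')['total_sales'].sum().sort_values(ascending=False)
```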
902 | ___
903 | #### 5. What is the total count of transactions for each platform?
904 |
905 |
906 | ```python
907 | pd.read_sql("""
908 | SELECT
909 | platform,
910 | COUNT(transactions) AS nb_transactions
911 | FROM data_mart.clean_weekly_sales
912 | GROUP BY platform;
913 | """, conn)
914 | ```
915 |
916 |
917 |
918 |
919 | |
920 | platform |
921 | nb_transactions |
922 |
923 |
924 |
925 |
926 | 0 |
927 | Shopify |
928 | 8549 |
929 |
930 |
931 | 1 |
932 | Retail |
933 | 8568 |
934 |
935 |
936 |
937 |
938 |
939 |
940 |
941 | ___
942 | #### 6. What is the percentage of sales for Retail vs Shopify for each month?
943 |
944 |
945 | ```python
946 | pd.read_sql("""
947 | WITH platform_sales_cte AS
948 | (
949 | SELECT
950 | calendar_year,
951 | month_number,
952 | TO_CHAR(week_date, 'Month') AS month,
953 | SUM(sales) AS monthly_sales,
954 | SUM(CASE WHEN platform = 'Retail' THEN sales ELSE 0 END) AS retail_sales,
955 | SUM(CASE WHEN platform = 'Shopify' THEN sales ELSE 0 END) AS shopify_sales
956 | FROM data_mart.clean_weekly_sales
957 | GROUP BY calendar_year, month_number, month
958 | ORDER BY calendar_year, month
959 | )
960 | SELECT
961 | calendar_year AS year,
962 | month,
963 | ROUND(retail_sales/monthly_sales::numeric*100,1) AS retail_percent_sales,
964 | ROUND(shopify_sales/monthly_sales::numeric*100,1) AS shopify_percent_sales
965 | FROM platform_sales_cte;
966 | """, conn)
967 | ```
968 |
969 |
970 |
971 |
972 | |
973 | year |
974 | month |
975 | retail_percent_sales |
976 | shopify_percent_sales |
977 |
978 |
979 |
980 |
981 | 0 |
982 | 2018 |
983 | April |
984 | 97.9 |
985 | 2.1 |
986 |
987 |
988 | 1 |
989 | 2018 |
990 | August |
991 | 97.7 |
992 | 2.3 |
993 |
994 |
995 | 2 |
996 | 2018 |
997 | July |
998 | 97.8 |
999 | 2.2 |
1000 |
1001 |
1002 | 3 |
1003 | 2018 |
1004 | June |
1005 | 97.8 |
1006 | 2.2 |
1007 |
1008 |
1009 | 4 |
1010 | 2018 |
1011 | March |
1012 | 97.9 |
1013 | 2.1 |
1014 |
1015 |
1016 | 5 |
1017 | 2018 |
1018 | May |
1019 | 97.7 |
1020 | 2.3 |
1021 |
1022 |
1023 | 6 |
1024 | 2018 |
1025 | September |
1026 | 97.7 |
1027 | 2.3 |
1028 |
1029 |
1030 | 7 |
1031 | 2019 |
1032 | April |
1033 | 97.8 |
1034 | 2.2 |
1035 |
1036 |
1037 | 8 |
1038 | 2019 |
1039 | August |
1040 | 97.2 |
1041 | 2.8 |
1042 |
1043 |
1044 | 9 |
1045 | 2019 |
1046 | July |
1047 | 97.4 |
1048 | 2.6 |
1049 |
1050 |
1051 | 10 |
1052 | 2019 |
1053 | June |
1054 | 97.4 |
1055 | 2.6 |
1056 |
1057 |
1058 | 11 |
1059 | 2019 |
1060 | March |
1061 | 97.7 |
1062 | 2.3 |
1063 |
1064 |
1065 | 12 |
1066 | 2019 |
1067 | May |
1068 | 97.5 |
1069 | 2.5 |
1070 |
1071 |
1072 | 13 |
1073 | 2019 |
1074 | September |
1075 | 97.1 |
1076 | 2.9 |
1077 |
1078 |
1079 | 14 |
1080 | 2020 |
1081 | April |
1082 | 97.0 |
1083 | 3.0 |
1084 |
1085 |
1086 | 15 |
1087 | 2020 |
1088 | August |
1089 | 96.5 |
1090 | 3.5 |
1091 |
1092 |
1093 | 16 |
1094 | 2020 |
1095 | July |
1096 | 96.7 |
1097 | 3.3 |
1098 |
1099 |
1100 | 17 |
1101 | 2020 |
1102 | June |
1103 | 96.8 |
1104 | 3.2 |
1105 |
1106 |
1107 | 18 |
1108 | 2020 |
1109 | March |
1110 | 97.3 |
1111 | 2.7 |
1112 |
1113 |
1114 | 19 |
1115 | 2020 |
1116 | May |
1117 | 96.7 |
1118 | 3.3 |
1119 |
1120 |
1121 |
1122 |
1123 |
1124 |
1125 |
1126 | **Result**\
1127 | Retail generates the vast majority of sales, accounting for roughly 97% of each month's total, while Shopify accounts for only about 2% to 3.5%, a share that trends upward over time (the overall split is computed in the sketch below).
1128 |
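To put a single figure on that overall split, here is a sketch that aggregates over the entire dataset rather than per month:

```python
# Overall Retail vs Shopify share of sales across the entire dataset (sketch).
pd.read_sql("""
SELECT
    platform,
    ROUND(SUM(sales) / (SELECT SUM(sales) FROM data_mart.clean_weekly_sales)::NUMERIC * 100, 1) AS percent_of_total_sales
FROM data_mart.clean_weekly_sales
GROUP BY platform;
""", conn)
```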
1129 | ___
1130 | #### 7. What is the percentage of sales by demographic for each year in the dataset?
1131 |
1132 |
1133 | ```python
1134 | pd.read_sql("""
1135 | WITH demographic_sales_cte AS
1136 | (
1137 | SELECT
1138 | calendar_year AS year,
1139 | SUM(sales) AS total_sales,
1140 | SUM(CASE WHEN demographic = 'Families' THEN sales END) AS families_sales,
1141 | SUM(CASE WHEN demographic = 'Couples' THEN sales END) AS couples_sales,
1142 | SUM(CASE WHEN demographic = 'unknown' THEN sales END) AS unknown_sales
1143 | FROM data_mart.clean_weekly_sales
1144 | GROUP BY year
1145 | )
1146 | SELECT
1147 | year,
1148 | ROUND(families_sales/total_sales::numeric*100,1) AS families_sales_percent,
1149 | ROUND(couples_sales/total_sales::numeric*100,1) AS couples_sales_percent,
1150 | ROUND(unknown_sales/total_sales::numeric*100,1) AS unknown_sales_percent
1151 | FROM demographic_sales_cte;
1152 | """, conn)
1153 | ```
1154 |
1155 |
1156 |
1157 |
1158 | |
1159 | year |
1160 | families_sales_percent |
1161 | couples_sales_percent |
1162 | unknown_sales_percent |
1163 |
1164 |
1165 |
1166 |
1167 | 0 |
1168 | 2018 |
1169 | 32.0 |
1170 | 26.4 |
1171 | 41.6 |
1172 |
1173 |
1174 | 1 |
1175 | 2020 |
1176 | 32.7 |
1177 | 28.7 |
1178 | 38.6 |
1179 |
1180 |
1181 | 2 |
1182 | 2019 |
1183 | 32.5 |
1184 | 27.3 |
1185 | 40.3 |
1186 |
1187 |
1188 |
1189 |
1190 |
1191 |
1192 |
1193 | **Result**\
1194 | Families contribute slightly more to Data Mart's sales than Couples each year, by roughly 4 to 6 percentage points, while close to 40% of sales come from customers with an unknown demographic.
1195 |
1196 | ___
1197 | #### 8. Which `age_band` and `demographic` values contribute the most to Retail sales?
1198 |
1199 |
1200 | ```python
1201 | query8 = """
1202 | SELECT
1203 | age_band,
1204 | demographic,
1205 | ROUND(SUM(sales)/(SELECT SUM(sales) FROM data_mart.clean_weekly_sales WHERE platform = 'Retail')::NUMERIC * 100,1) AS contribution_percent
1206 | FROM data_mart.clean_weekly_sales
1207 | WHERE platform = 'Retail'
1208 | GROUP BY age_band, demographic
1209 | ORDER BY contribution_percent DESC
1210 | """
1211 |
1212 | pd.read_sql(query8, conn)
1213 | ```
1214 |
1215 |
1216 |
1217 |
1218 | |
1219 | age_band |
1220 | demographic |
1221 | contribution_percent |
1222 |
1223 |
1224 |
1225 |
1226 | 0 |
1227 | unknown |
1228 | unknown |
1229 | 40.5 |
1230 |
1231 |
1232 | 1 |
1233 | Retirees |
1234 | Families |
1235 | 16.7 |
1236 |
1237 |
1238 | 2 |
1239 | Retirees |
1240 | Couples |
1241 | 16.1 |
1242 |
1243 |
1244 | 3 |
1245 | Middle Aged |
1246 | Families |
1247 | 11.0 |
1248 |
1249 |
1250 | 4 |
1251 | Young Adults |
1252 | Couples |
1253 | 6.6 |
1254 |
1255 |
1256 | 5 |
1257 | Middle Aged |
1258 | Couples |
1259 | 4.7 |
1260 |
1261 |
1262 | 6 |
1263 | Young Adults |
1264 | Families |
1265 | 4.5 |
1266 |
1267 |
1268 |
1269 |
1270 |
1271 |
1272 |
1273 | To isolate the top contributors, let's keep only the two rows with the highest contribution percentage.
1274 |
1275 |
1276 | ```python
1277 | pd.read_sql(query8 + "LIMIT 2", conn)
1278 | ```
1279 |
1280 |
1281 |
1282 |
1283 | |
1284 | age_band |
1285 | demographic |
1286 | contribution_percent |
1287 |
1288 |
1289 |
1290 |
1291 | 0 |
1292 | unknown |
1293 | unknown |
1294 | 40.5 |
1295 |
1296 |
1297 | 1 |
1298 | Retirees |
1299 | Families |
1300 | 16.7 |
1301 |
1302 |
1303 |
1304 |
1305 |
1306 |
1307 |
1308 | **Result**\
1309 | Setting aside the unknown segment (which alone accounts for 40.5% of Retail sales), **Retirees** in the **Families** demographic contribute the most to Retail sales, at 16.7%, closely followed by Retiree Couples at 16.1% (the sketch below re-ranks the known segments only).
1310 |
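As mentioned above, this sketch re-runs the same breakdown with the unknown segment filtered out, so the contribution percentages are relative to Retail sales from known segments only:

```python
# Same breakdown as query8, but excluding rows with an unknown segment (sketch).
# Percentages are relative to Retail sales from known segments only.
pd.read_sql("""
SELECT
    age_band,
    demographic,
    ROUND(SUM(sales) / (SELECT SUM(sales)
                        FROM data_mart.clean_weekly_sales
                        WHERE platform = 'Retail' AND age_band != 'unknown')::NUMERIC * 100, 1) AS contribution_percent
FROM data_mart.clean_weekly_sales
WHERE platform = 'Retail' AND age_band != 'unknown'
GROUP BY age_band, demographic
ORDER BY contribution_percent DESC;
""", conn)
```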
1311 | ___
1312 | #### 9. Can we use the `avg_transaction` column to find the average transaction size for each year for Retail vs Shopify? If not - how would you calculate it instead?
1313 |
1314 | **Method 1:** Not directly: averaging the pre-computed `avg_transaction` values (`AVG(avg_transaction)`) is an unweighted average of weekly ratios, which differs from the true average transaction size, `SUM(sales) / SUM(transactions)`. The query below computes both figures side by side for each year and platform.
1315 |
1316 |
1317 | ```python
1318 | pd.read_sql("""
1319 | SELECT
1320 | calendar_year AS year,
1321 | platform,
1322 | ROUND(AVG(avg_transaction),2) AS avg_transaction1,
1323 | ROUND(SUM(sales)::NUMERIC/SUM(transactions)::NUMERIC,2) AS avg_transaction2
1324 | FROM data_mart.clean_weekly_sales
1325 | GROUP BY year, platform
1326 | ORDER BY platform, year
1327 | """, conn)
1328 | ```
1329 |
1330 |
1331 |
1332 |
1333 | |
1334 | year |
1335 | platform |
1336 | avg_transaction1 |
1337 | avg_transaction2 |
1338 |
1339 |
1340 |
1341 |
1342 | 0 |
1343 | 2018 |
1344 | Retail |
1345 | 42.91 |
1346 | 36.56 |
1347 |
1348 |
1349 | 1 |
1350 | 2019 |
1351 | Retail |
1352 | 41.97 |
1353 | 36.83 |
1354 |
1355 |
1356 | 2 |
1357 | 2020 |
1358 | Retail |
1359 | 40.64 |
1360 | 36.56 |
1361 |
1362 |
1363 | 3 |
1364 | 2018 |
1365 | Shopify |
1366 | 188.28 |
1367 | 192.48 |
1368 |
1369 |
1370 | 4 |
1371 | 2019 |
1372 | Shopify |
1373 | 177.56 |
1374 | 183.36 |
1375 |
1376 |
1377 | 5 |
1378 | 2020 |
1379 | Shopify |
1380 | 174.87 |
1381 | 179.03 |
1382 |
1383 |
1384 |
1385 |
1386 |
1387 |
1388 |
1389 | **Method 2:** With the following method, we obtain the same results as averaging the `avg_transaction` values, since `AVG(sales/transactions)` computes the same unweighted row-level average.
1390 |
1391 |
1392 | ```python
1393 | pd.read_sql("""
1394 | SELECT
1395 | calendar_year AS year,
1396 | ROUND(AVG(CASE WHEN platform = 'Retail' THEN avg_transaction END),2) AS retail_avg_txn1,
1397 | ROUND(AVG(CASE WHEN platform = 'Retail' THEN sales::numeric/transactions::numeric END),2) AS retail_avg_txn2,
1398 | ROUND(AVG(CASE WHEN platform = 'Shopify' THEN avg_transaction END),2) AS shopify_avg_txn1,
1399 | ROUND(AVG(CASE WHEN platform = 'Shopify' THEN sales::numeric/transactions::numeric END),2) AS shopify_avg_txn2
1400 | FROM data_mart.clean_weekly_sales
1401 | GROUP BY year
1402 | """, conn)
1403 | ```
1404 |
1405 |
1406 |
1407 |
1408 | |
1409 | year |
1410 | retail_avg_txn1 |
1411 | retail_avg_txn2 |
1412 | shopify_avg_txn1 |
1413 | shopify_avg_txn2 |
1414 |
1415 |
1416 |
1417 |
1418 | 0 |
1419 | 2018 |
1420 | 42.91 |
1421 | 42.91 |
1422 | 188.28 |
1423 | 188.28 |
1424 |
1425 |
1426 | 1 |
1427 | 2020 |
1428 | 40.64 |
1429 | 40.64 |
1430 | 174.87 |
1431 | 174.87 |
1432 |
1433 |
1434 | 2 |
1435 | 2019 |
1436 | 41.97 |
1437 | 41.97 |
1438 | 177.56 |
1439 | 177.56 |
1440 |
1441 |
1442 |
1443 |
1444 |
1445 |
1446 |
1447 |
1448 |
1449 | ## C. Before & After Analysis
1450 | This technique is typically used when we want to assess the impact of an important event by comparing the periods before and after a certain point in time.
1451 |
1452 | Taking the `week_date` value of **2020-06-15** as the baseline week where the Data Mart sustainable packaging changes came into effect.
1453 |
1454 | We would include all `week_date` values from **2020-06-15** onward as the period after the change, and the previous `week_date` values as the period before.
1455 |
1456 | Using this analysis approach - answer the following questions:
1457 |
1458 | > 1. What is the total sales for the 4 weeks before and after 2020-06-15? What is the growth or reduction rate in actual values and percentage of sales?
1459 | > 2. What about the entire 12 weeks before and after?
1460 | > 3. How do the sale metrics for these 2 periods before and after compare with the previous years in 2018 and 2019?
1461 |
1462 | #### 1. What is the total sales for the 4 weeks before and after 2020-06-15? What is the growth or reduction rate in actual values and percentage of sales?
1463 |
1464 | The week number of the date **2020-06-15** is 25. Therefore,
1465 | - the four weeks before this date are `week 21, 22, 23, and 24`, and
1466 | - the four weeks after are `week 25, 26, 27, and 28`.
1467 |
1468 |
1469 | ```python
1470 | pd.read_sql("""
1471 | SELECT DISTINCT week_number
1472 | FROM data_mart.clean_weekly_sales
1473 | WHERE week_date = '2020-06-15'
1474 | """, conn)
1475 | ```
1476 |
1477 |
1478 |
1479 |
1480 | |
1481 | week_number |
1482 |
1483 |
1484 |
1485 |
1486 | 0 |
1487 | 25 |
1488 |
1489 |
1490 |
1491 |
1492 |
1493 |
1494 |
1495 | **Result**\
1496 | The implementation of sustainable packaging changes appears to have had a **negative impact** on sales at Data Mart, with a 1.15% reduction in sales over the four weeks after the change, as the query below shows.
1497 |
1498 |
1499 | ```python
1500 | pd.read_sql("""
1501 | SELECT
1502 | TO_CHAR(total_sales_before, 'FM$ 999,999,999,999') AS total_sales_before,
1503 | TO_CHAR(total_sales_after, 'FM$ 999,999,999,999') AS total_sales_after,
1504 | TO_CHAR(total_sales_after - total_sales_before, 'FM$ 999,999,999,999') AS difference,
1505 | ROUND((total_sales_after - total_sales_before)/total_sales_before::NUMERIC * 100,2) AS sales_change_percent
1506 | FROM
1507 | (
1508 | SELECT
1509 | SUM(CASE WHEN week_number BETWEEN 21 AND 24 THEN sales ELSE 0 END) AS total_sales_before,
1510 | SUM(CASE WHEN week_number BETWEEN 25 AND 28 THEN sales ELSE 0 END) AS total_sales_after
1511 | FROM data_mart.clean_weekly_sales
1512 | WHERE calendar_year = 2020
1513 | ) sales;
1514 | """, conn)
1515 | ```
1516 |
1517 |
1518 |
1519 |
1520 | |
1521 | total_sales_before |
1522 | total_sales_after |
1523 | difference |
1524 | sales_change_percent |
1525 |
1526 |
1527 |
1528 |
1529 | 0 |
1530 | $ 2,345,878,357 |
1531 | $ 2,318,994,169 |
1532 | $ -26,884,188 |
1533 | -1.15 |
1534 |
1535 |
1536 |
1537 |
1538 |
1539 |
1540 |
1541 | ___
1542 | #### 2. What about the entire 12 weeks before and after?
1543 |
1544 |
1545 | ```python
1546 | pd.read_sql("""
1547 | SELECT
1548 | TO_CHAR(total_sales_before, 'FM$ 999,999,999,999') AS total_sales_before,
1549 | TO_CHAR(total_sales_after, 'FM$ 999,999,999,999') AS total_sales_after,
1550 | TO_CHAR(total_sales_after - total_sales_before, 'FM$ 999,999,999,999') AS difference,
1551 | ROUND((total_sales_after - total_sales_before)/total_sales_before::NUMERIC * 100, 2) AS sales_change_percent
1552 | FROM
1553 | (
1554 | SELECT
1555 | SUM(CASE WHEN week_number BETWEEN 13 AND 24 THEN sales ELSE 0 END) AS total_sales_before,
1556 | SUM(CASE WHEN week_number BETWEEN 25 AND 37 THEN sales ELSE 0 END) AS total_sales_after
1557 | FROM data_mart.clean_weekly_sales
1558 | WHERE calendar_year = 2020
1559 | ) sales
1560 | """, conn)
1561 | ```
1562 |
1563 |
1564 |
1565 |
1566 | |
1567 | total_sales_before |
1568 | total_sales_after |
1569 | difference |
1570 | sales_change_percent |
1571 |
1572 |
1573 |
1574 |
1575 | 0 |
1576 | $ 7,126,273,147 |
1577 | $ 6,973,947,753 |
1578 | $ -152,325,394 |
1579 | -2.14 |
1580 |
1581 |
1582 |
1583 |
1584 |
1585 |
1586 |
1587 | **Result**\
1588 | The implementation of sustainable packaging changes appears to have had a **negative impact** on sales at Data Mart, with a 2.14% reduction in sales over the twelve weeks after the change.
1589 |
1590 | ___
1591 | #### 3. How do the sale metrics for these 2 periods before and after compare with the previous years in 2018 and 2019?
1592 |
1593 |
1594 | ```python
1595 | pd.read_sql("""
1596 | SELECT
1597 | year,
1598 | TO_CHAR(total_sales_before, 'FM$ 999,999,999,999') AS total_sales_before,
1599 | TO_CHAR(total_sales_after, 'FM$ 999,999,999,999') AS total_sales_after,
1600 | TO_CHAR(total_sales_after - total_sales_before, 'FM$ 999,999,999,999') AS difference,
1601 | ROUND((total_sales_after - total_sales_before)/total_sales_before::NUMERIC * 100, 2) AS sales_change_percent
1602 | FROM
1603 | (
1604 | SELECT
1605 | calendar_year AS year,
1606 | SUM(CASE WHEN week_number BETWEEN 13 AND 24 THEN sales ELSE 0 END) AS total_sales_before,
1607 | SUM(CASE WHEN week_number BETWEEN 25 AND 37 THEN sales ELSE 0 END) AS total_sales_after
1608 | FROM data_mart.clean_weekly_sales
1609 | GROUP BY year
1610 | ) sales
1611 | ORDER BY year
1612 | """, conn)
1613 | ```
1614 |
1615 |
1616 |
1617 |
1618 | |
1619 | year |
1620 | total_sales_before |
1621 | total_sales_after |
1622 | difference |
1623 | sales_change_percent |
1624 |
1625 |
1626 |
1627 |
1628 | 0 |
1629 | 2018 |
1630 | $ 6,396,562,317 |
1631 | $ 6,500,818,510 |
1632 | $ 104,256,193 |
1633 | 1.63 |
1634 |
1635 |
1636 | 1 |
1637 | 2019 |
1638 | $ 6,883,386,397 |
1639 | $ 6,862,646,103 |
1640 | $ -20,740,294 |
1641 | -0.30 |
1642 |
1643 |
1644 | 2 |
1645 | 2020 |
1646 | $ 7,126,273,147 |
1647 | $ 6,973,947,753 |
1648 | $ -152,325,394 |
1649 | -2.14 |
1650 |
1651 |
1652 |
1653 |
1654 |
1655 |
1656 |
1657 |
1658 |
1659 | ## D. Bonus Question
1660 | Which areas of the business have the highest negative impact in sales metrics performance in 2020 for the 12 week before and after period?
1661 |
1662 | - region
1663 | - platform
1664 | - age_band
1665 | - demographic
1666 | - customer_type
1667 |
1668 | Do you have any further recommendations for Danny’s team at Data Mart or any interesting insights based off this analysis?
1669 |
1670 | ### Overview
1671 | The following query output displays the areas of the business that experienced the highest negative impact on sales performance in 2020. The `WHERE sales_before > sales_after` condition keeps only the combinations with a negative percentage change, and the 10 worst ones are displayed.
1672 |
1673 |
1674 | ```python
1675 | pd.read_sql("""
1676 | WITH overview_sales_cte AS
1677 | (
1678 | SELECT
1679 | region,
1680 | platform,
1681 | age_band,
1682 | demographic,
1683 | customer_type,
1684 | SUM(CASE WHEN week_number BETWEEN 13 AND 24 THEN sales ELSE 0 END) AS sales_before,
1685 | SUM(CASE WHEN week_number BETWEEN 25 AND 37 THEN sales ELSE 0 END) AS sales_after
1686 | FROM data_mart.clean_weekly_sales
1687 | WHERE calendar_year = 2020
1688 | GROUP BY region, platform, age_band, demographic, customer_type
1689 | )
1690 | SELECT
1691 | region,
1692 | platform,
1693 | age_band,
1694 | demographic,
1695 | customer_type,
1696 | ROUND((sales_after - sales_before)::NUMERIC/sales_before::NUMERIC * 100, 2) AS sales_change_percent
1697 | FROM overview_sales_cte
1698 | WHERE sales_before > sales_after
1699 | ORDER BY sales_change_percent ASC
1700 | LIMIT 10;
1701 | """, conn)
1702 | ```
1703 |
1704 |
1705 |
1706 |
1707 | |
1708 | region |
1709 | platform |
1710 | age_band |
1711 | demographic |
1712 | customer_type |
1713 | sales_change_percent |
1714 |
1715 |
1716 |
1717 |
1718 | 0 |
1719 | SOUTH AMERICA |
1720 | Shopify |
1721 | unknown |
1722 | unknown |
1723 | Existing |
1724 | -42.23 |
1725 |
1726 |
1727 | 1 |
1728 | EUROPE |
1729 | Shopify |
1730 | Retirees |
1731 | Families |
1732 | New |
1733 | -33.71 |
1734 |
1735 |
1736 | 2 |
1737 | EUROPE |
1738 | Shopify |
1739 | Young Adults |
1740 | Families |
1741 | New |
1742 | -27.97 |
1743 |
1744 |
1745 | 3 |
1746 | SOUTH AMERICA |
1747 | Retail |
1748 | unknown |
1749 | unknown |
1750 | Existing |
1751 | -23.20 |
1752 |
1753 |
1754 | 4 |
1755 | SOUTH AMERICA |
1756 | Retail |
1757 | Retirees |
1758 | Families |
1759 | New |
1760 | -21.28 |
1761 |
1762 |
1763 | 5 |
1764 | SOUTH AMERICA |
1765 | Shopify |
1766 | Middle Aged |
1767 | Families |
1768 | New |
1769 | -19.73 |
1770 |
1771 |
1772 | 6 |
1773 | SOUTH AMERICA |
1774 | Shopify |
1775 | Retirees |
1776 | Families |
1777 | New |
1778 | -19.04 |
1779 |
1780 |
1781 | 7 |
1782 | SOUTH AMERICA |
1783 | Shopify |
1784 | Retirees |
1785 | Couples |
1786 | New |
1787 | -18.39 |
1788 |
1789 |
1790 | 8 |
1791 | SOUTH AMERICA |
1792 | Retail |
1793 | Retirees |
1794 | Couples |
1795 | Existing |
1796 | -16.80 |
1797 |
1798 |
1799 | 9 |
1800 | SOUTH AMERICA |
1801 | Retail |
1802 | Retirees |
1803 | Couples |
1804 | New |
1805 | -15.86 |
1806 |
1807 |
1808 |
1809 |
1810 |
1811 |
1812 |
1813 | ___
1814 | In the following sections, I will analyze each area individually to determine which category has been impacted the most.
1815 |
1816 | ### Sales By Region
1817 | The **Asia** region has been the most negatively impacted by the implementation of the new system at Data Mart.
1818 |
1819 |
1820 | ```python
1821 | pd.read_sql("""
1822 | WITH region_sales_cte AS
1823 | (
1824 | SELECT
1825 | region,
1826 | SUM(CASE WHEN week_number BETWEEN 13 AND 24 THEN sales END) AS sales_before,
1827 | SUM(CASE WHEN week_number BETWEEN 25 AND 37 THEN sales END) AS sales_after
1828 | FROM data_mart.clean_weekly_sales
1829 | WHERE calendar_year = 2020
1830 | GROUP BY region
1831 | )
1832 | SELECT
1833 | region,
1834 | ROUND((sales_after - sales_before)/sales_before::NUMERIC * 100,2) AS sales_change_percent
1835 | FROM region_sales_cte
1836 | ORDER BY sales_change_percent ASC;
1837 | """, conn)
1838 | ```
1839 |
1840 |
1841 |
1842 |
1843 | |
1844 | region |
1845 | sales_change_percent |
1846 |
1847 |
1848 |
1849 |
1850 | 0 |
1851 | ASIA |
1852 | -3.26 |
1853 |
1854 |
1855 | 1 |
1856 | OCEANIA |
1857 | -3.03 |
1858 |
1859 |
1860 | 2 |
1861 | SOUTH AMERICA |
1862 | -2.15 |
1863 |
1864 |
1865 | 3 |
1866 | CANADA |
1867 | -1.92 |
1868 |
1869 |
1870 | 4 |
1871 | USA |
1872 | -1.60 |
1873 |
1874 |
1875 | 5 |
1876 | AFRICA |
1877 | -0.54 |
1878 |
1879 |
1880 | 6 |
1881 | EUROPE |
1882 | 4.73 |
1883 |
1884 |
1885 |
1886 |
1887 |
1888 |
1889 |
1890 | ### Sales By Platform
1891 |
1892 | The **Retail** platform has been the most negatively impacted by the implementation of the new system at Data Mart.
1893 |
1894 |
1895 | ```python
1896 | pd.read_sql("""
1897 | WITH platform_sales_cte AS
1898 | (
1899 | SELECT
1900 | platform,
1901 | SUM(CASE WHEN week_number BETWEEN 13 AND 24 THEN sales END) AS sales_before,
1902 | SUM(CASE WHEN week_number BETWEEN 25 AND 37 THEN sales END) AS sales_after
1903 | FROM data_mart.clean_weekly_sales
1904 | WHERE calendar_year = 2020
1905 | GROUP BY platform
1906 | )
1907 | SELECT
1908 | platform,
1909 | ROUND((sales_after - sales_before)/sales_before::NUMERIC * 100,2) AS sales_change_percent
1910 | FROM platform_sales_cte
1911 | ORDER BY sales_change_percent ASC;
1912 | """, conn)
1913 | ```
1914 |
1915 |
1916 |
1917 |
1918 | |
1919 | platform |
1920 | sales_change_percent |
1921 |
1922 |
1923 |
1924 |
1925 | 0 |
1926 | Retail |
1927 | -2.43 |
1928 |
1929 |
1930 | 1 |
1931 | Shopify |
1932 | 7.18 |
1933 |
1934 |
1935 |
1936 |
1937 |
1938 |
1939 |
1940 | ### Sales By Age_Band
1941 | Excluding the unknown segment, **Middle Aged** customers have been the most negatively impacted by the implementation of the new system at Data Mart.
1942 |
1943 |
1944 | ```python
1945 | pd.read_sql("""
1946 | WITH age_band_sales_cte AS
1947 | (
1948 | SELECT
1949 | age_band,
1950 | SUM(CASE WHEN week_number BETWEEN 13 AND 24 THEN sales END) AS sales_before,
1951 | SUM(CASE WHEN week_number BETWEEN 25 AND 37 THEN sales END) AS sales_after
1952 | FROM data_mart.clean_weekly_sales
1953 | WHERE calendar_year = 2020
1954 | GROUP BY age_band
1955 | )
1956 | SELECT
1957 | age_band,
1958 | ROUND((sales_after - sales_before)/sales_before::NUMERIC * 100,2) AS sales_change_percent
1959 | FROM age_band_sales_cte
1960 | ORDER BY sales_change_percent ASC;
1961 | """, conn)
1962 | ```
1963 |
1964 |
1965 |
1966 |
1967 | |
1968 | age_band |
1969 | sales_change_percent |
1970 |
1971 |
1972 |
1973 |
1974 | 0 |
1975 | unknown |
1976 | -3.34 |
1977 |
1978 |
1979 | 1 |
1980 | Middle Aged |
1981 | -1.97 |
1982 |
1983 |
1984 | 2 |
1985 | Retirees |
1986 | -1.23 |
1987 |
1988 |
1989 | 3 |
1990 | Young Adults |
1991 | -0.92 |
1992 |
1993 |
1994 |
1995 |
1996 |
1997 |
1998 |
1999 | ### Sales By Demographic
2000 | Excluding the unknown segment, the **Families** demographic has been the most negatively impacted by the implementation of the new system at Data Mart.
2001 |
2002 |
2003 | ```python
2004 | pd.read_sql("""
2005 | WITH demographic_sales_cte AS
2006 | (
2007 | SELECT
2008 | demographic,
2009 | SUM(CASE WHEN week_number BETWEEN 13 AND 24 THEN sales END) AS sales_before,
2010 | SUM(CASE WHEN week_number BETWEEN 25 AND 37 THEN sales END) AS sales_after
2011 | FROM data_mart.clean_weekly_sales
2012 | WHERE calendar_year = 2020
2013 | GROUP BY demographic
2014 | )
2015 | SELECT
2016 | demographic,
2017 | ROUND((sales_after - sales_before)/sales_before::NUMERIC * 100,2) AS sales_change_percent
2018 | FROM demographic_sales_cte
2019 | ORDER BY sales_change_percent ASC;
2020 | """, conn)
2021 | ```
2022 |
2023 |
2024 |
2025 |
2026 | |
2027 | demographic |
2028 | sales_change_percent |
2029 |
2030 |
2031 |
2032 |
2033 | 0 |
2034 | unknown |
2035 | -3.34 |
2036 |
2037 |
2038 | 1 |
2039 | Families |
2040 | -1.82 |
2041 |
2042 |
2043 | 2 |
2044 | Couples |
2045 | -0.87 |
2046 |
2047 |
2048 |
2049 |
2050 |
2051 |
2052 |
2053 | ### Sales By Customer_Type
2054 |
2055 | The **Guest** customer type has been the most negatively impacted by the implementation of the new system at Data Mart.
2056 |
2057 |
2058 | ```python
2059 | pd.read_sql("""
2060 | WITH customer_type_sales_cte AS
2061 | (
2062 | SELECT
2063 | customer_type,
2064 | SUM(CASE WHEN week_number BETWEEN 13 AND 24 THEN sales END) AS sales_before,
2065 | SUM(CASE WHEN week_number BETWEEN 25 AND 37 THEN sales END) AS sales_after
2066 | FROM data_mart.clean_weekly_sales
2067 | WHERE calendar_year = 2020
2068 | GROUP BY customer_type
2069 | )
2070 | SELECT
2071 | customer_type,
2072 | ROUND((sales_after - sales_before)/sales_before::NUMERIC * 100,2) AS sales_change_percent
2073 | FROM customer_type_sales_cte
2074 | ORDER BY sales_change_percent ASC;
2075 | """, conn)
2076 | ```
2077 |
2078 |
2079 |
2080 |
2081 | |
2082 | customer_type |
2083 | sales_change_percent |
2084 |
2085 |
2086 |
2087 |
2088 | 0 |
2089 | Guest |
2090 | -3.00 |
2091 |
2092 |
2093 | 1 |
2094 | Existing |
2095 | -2.27 |
2096 |
2097 |
2098 | 2 |
2099 | New |
2100 | 1.01 |
2101 |
2102 |
2103 |
2104 |
2105 |
2106 |
2107 |
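For a more compact overview, the per-dimension breakdowns above can be stacked into a single result with `UNION ALL`. This is only a sketch showing two of the five dimensions (the `dimension`/`value` column names are mine); the remaining dimensions follow the same pattern:

```python
# Sketch: stack per-dimension before/after comparisons into one result with UNION ALL.
# Only region and platform are shown; age_band, demographic and customer_type
# can be added with additional UNION ALL branches of the same shape.
pd.read_sql("""
WITH base AS
(
    SELECT
        region,
        platform,
        SUM(CASE WHEN week_number BETWEEN 13 AND 24 THEN sales END) AS sales_before,
        SUM(CASE WHEN week_number BETWEEN 25 AND 37 THEN sales END) AS sales_after
    FROM data_mart.clean_weekly_sales
    WHERE calendar_year = 2020
    GROUP BY region, platform
)
SELECT 'region' AS dimension, region AS value,
       ROUND((SUM(sales_after) - SUM(sales_before)) / SUM(sales_before)::NUMERIC * 100, 2) AS sales_change_percent
FROM base
GROUP BY region
UNION ALL
SELECT 'platform', platform,
       ROUND((SUM(sales_after) - SUM(sales_before)) / SUM(sales_before)::NUMERIC * 100, 2)
FROM base
GROUP BY platform
ORDER BY dimension, sales_change_percent;
""", conn)
```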
2108 |
2109 | ```python
2110 | conn.close()
2111 | ```
2112 |
--------------------------------------------------------------------------------
/CaseStudy#5 - Data Mart/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #5: Data Mart 🧺
2 | [](https://github.com/chanronnie/8WeekSQLChallenge)
3 | [](https://8weeksqlchallenge.com/case-study-5/)
4 | 
5 |
6 |
7 | The case study presented here is part of the **8 Week SQL Challenge**.\
8 | It is kindly brought to us by [**Data With Danny**](https://8weeksqlchallenge.com).
9 |
10 | This time, I am using `PostgreSQL queries` (instead of MySQL) in `Jupyter Notebook` to quickly view results, which provides me with an opportunity
11 | - to learn PostgreSQL
12 | - to utilize handy mathematical and string functions.
13 |
14 |
15 |
16 | ## Table of Contents
17 | * [Entity Relationship Diagram](#entity-relationship-diagram)
18 | * [Datasets](#datasets)
19 | * [Case Study Questions](#case-study-questions)
20 | * [Solutions](#solutions)
21 | * [PostgreSQL Topics Covered](#postgresql-topics-covered)
22 |
23 |
24 | ## Entity Relationship Diagram
25 | 
26 |
27 |
28 | ## Datasets
29 | Case Study #5 contains 1 dataset:
30 | **`weekly_sales`**: This table shows all the sales records made at Data Mart.
31 |
32 |
33 | View dataset
34 |
35 | Here are 10 random rows from the `weekly_sales` dataset:
36 |
37 | week_date | region | platform | segment | customer_type | transactions | sales
38 | --- | --- | --- | --- | --- | --- | ---
39 | 9/9/20 | OCEANIA | Shopify | C3 | New | 610 | 110033.89
40 | 29/7/20 | AFRICA | Retail | C1 | New | 110692 | 3053771.19
41 | 22/7/20 | EUROPE | Shopify | C4 | Existing | 24 | 8101.54
42 | 13/5/20 | AFRICA | Shopify | null | Guest | 5287 | 1003301.37
43 | 24/7/19 | ASIA | Retail | C1 | New | 127342 | 3151780.41
44 | 10/7/19 | CANADA | Shopify | F3 | New | 51 | 8844.93
45 | 26/6/19 | OCEANIA | Retail | C3 | New | 152921 | 5551385.36
46 | 29/5/19 | SOUTH AMERICA | Shopify | null | New | 53 | 10056.2
47 | 22/8/18 | AFRICA | Retail | null | Existing | 31721 | 1718863.58
48 | 25/7/18 | SOUTH AMERICA | Retail | null | New | 2136 | 81757.91
49 |
50 |
51 |
52 | ## Case Study Questions
53 | Case Study #5 is categorized into 4 question groups\
54 | To view the specific section, please open the link in a *`new tab`* or *`window`*.\
55 | [A. Data Cleansing](CaseStudy5_solutions.md#A)\
56 | [B. Data Exploration](CaseStudy5_solutions.md#B)\
57 | [C. Before & After Analysis](CaseStudy5_solutions.md#C)\
58 | [D. Bonus Question](CaseStudy5_solutions.md#D)
59 |
60 | ## Solutions
61 | - View `data_mart` database: [**here**](CaseStudy5_schema.sql)
62 | - View Solution:
63 | - [**Markdown File**](CaseStudy5_solutions.md): offers a more fluid and responsive viewing experience
64 | - [**Jupyter Notebook**](CaseStudy5_solutions.ipynb): contains the original code
65 |
66 |
67 | ## PostgreSQL Topics Covered
68 | - Data Cleaning
69 | - Pivot Table using CASE WHEN
70 | - DATE Type Manipulation
71 | - Common Table Expressions (CTE)
72 | - Window Functions
73 | - Subqueries
74 | - JOIN, UNION ALL
75 |
--------------------------------------------------------------------------------
/CaseStudy#5 - Data Mart/image/plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chanronnie/8WeekSQLChallenge/4b36f745d833a82a61a9ce70ab177462eb7dbfd0/CaseStudy#5 - Data Mart/image/plot.png
--------------------------------------------------------------------------------
/CaseStudy#6 - Clique Bait/CaseStudy6_solutions.md:
--------------------------------------------------------------------------------
1 | # Case Study #6: Clique Bait
2 | The case study questions presented here are created by [**Data With Danny**](https://linktr.ee/datawithdanny). They are part of the [**8 Week SQL Challenge**](https://8weeksqlchallenge.com/).
3 |
4 | My SQL queries are written in the `PostgreSQL 15` dialect, integrated into `Jupyter Notebook`, which allows us to instantly view the query results and document the queries.
5 |
6 | For more details about the **Case Study #6**, click [**here**](https://8weeksqlchallenge.com/case-study-6/).
7 |
8 | ## Table of Contents
9 |
10 | ### [1. Importing Libraries](#Import)
11 |
12 | ### [2. Tables of the Database](#Tables)
13 |
14 | ### [3. Case Study Questions](#CaseStudyQuestions)
15 |
16 | - [A. Enterprise Relationship Diagram](#A)
17 | - [B. Digital Analysis](#B)
18 | - [C. Product Funnel Analysis](#C)
19 | - [D. Campaigns Analysis](#D)
20 |
21 |
22 |
23 | ## 1. Importing Libraries
24 |
25 |
26 | ```python
27 | import psycopg2 as pg2
28 | import pandas as pd
29 | import os
30 | import warnings
31 |
32 | warnings.filterwarnings("ignore")
33 | ```
34 |
35 | ### Connecting PostgreSQL database from Jupyter Notebook
36 |
37 |
38 | ```python
39 | # Get my PostgreSQL password
40 | mypassword = os.getenv("POSTGRESQL_PASSWORD")
41 |
42 | # Connecting to database
43 | conn = pg2.connect(user = 'postgres', password = mypassword, database = 'clique_bait')
44 | cursor = conn.cursor()
45 | ```
46 |
47 | ___
48 |
49 | ## 2. Tables of the Database
50 |
51 | First, let's verify that the connected database contains the 5 expected tables.
52 |
53 |
54 | ```python
55 | cursor.execute("""
56 | SELECT table_schema, table_name
57 | FROM information_schema.tables
58 | WHERE table_schema = 'clique_bait'
59 | """)
60 |
61 | table_names = []
62 | print('--- Tables within "clique_bait" database --- ')
63 | for table in cursor:
64 | print(table[1])
65 | table_names.append(table[1])
66 | ```
67 |
68 |     --- Tables within "clique_bait" database --- 
69 | event_identifier
70 | campaign_identifier
71 | page_hierarchy
72 | users
73 | events
74 |
75 |
76 | #### Here are the 5 datasets of the "clique_bait" database. For more details about each dataset, please click [here](https://8weeksqlchallenge.com/case-study-6/).
77 |
78 |
79 | ```python
80 | for table in table_names:
81 | print("\nTable: ", table)
82 | display(pd.read_sql("SELECT * FROM clique_bait." + table, conn))
83 | ```
84 |
85 |
86 | Table: event_identifier
87 |
88 |
89 |
90 |
91 | |
92 | event_type |
93 | event_name |
94 |
95 |
96 |
97 |
98 | 0 |
99 | 1 |
100 | Page View |
101 |
102 |
103 | 1 |
104 | 2 |
105 | Add to Cart |
106 |
107 |
108 | 2 |
109 | 3 |
110 | Purchase |
111 |
112 |
113 | 3 |
114 | 4 |
115 | Ad Impression |
116 |
117 |
118 | 4 |
119 | 5 |
120 | Ad Click |
121 |
122 |
123 |
124 |
125 |
126 |
127 |
128 | Table: campaign_identifier
129 |
130 |
131 |
132 |
133 | |
134 | campaign_id |
135 | products |
136 | campaign_name |
137 | start_date |
138 | end_date |
139 |
140 |
141 |
142 |
143 | 0 |
144 | 1 |
145 | 1-3 |
146 | BOGOF - Fishing For Compliments |
147 | 2020-01-01 |
148 | 2020-01-14 |
149 |
150 |
151 | 1 |
152 | 2 |
153 | 4-5 |
154 | 25% Off - Living The Lux Life |
155 | 2020-01-15 |
156 | 2020-01-28 |
157 |
158 |
159 | 2 |
160 | 3 |
161 | 6-8 |
162 | Half Off - Treat Your Shellf(ish) |
163 | 2020-02-01 |
164 | 2020-03-31 |
165 |
166 |
167 |
168 |
169 |
170 |
171 |
172 | Table: page_hierarchy
173 |
174 |
175 |
176 |
177 | |
178 | page_id |
179 | page_name |
180 | product_category |
181 | product_id |
182 |
183 |
184 |
185 |
186 | 0 |
187 | 1 |
188 | Home Page |
189 | None |
190 | NaN |
191 |
192 |
193 | 1 |
194 | 2 |
195 | All Products |
196 | None |
197 | NaN |
198 |
199 |
200 | 2 |
201 | 3 |
202 | Salmon |
203 | Fish |
204 | 1.0 |
205 |
206 |
207 | 3 |
208 | 4 |
209 | Kingfish |
210 | Fish |
211 | 2.0 |
212 |
213 |
214 | 4 |
215 | 5 |
216 | Tuna |
217 | Fish |
218 | 3.0 |
219 |
220 |
221 | 5 |
222 | 6 |
223 | Russian Caviar |
224 | Luxury |
225 | 4.0 |
226 |
227 |
228 | 6 |
229 | 7 |
230 | Black Truffle |
231 | Luxury |
232 | 5.0 |
233 |
234 |
235 | 7 |
236 | 8 |
237 | Abalone |
238 | Shellfish |
239 | 6.0 |
240 |
241 |
242 | 8 |
243 | 9 |
244 | Lobster |
245 | Shellfish |
246 | 7.0 |
247 |
248 |
249 | 9 |
250 | 10 |
251 | Crab |
252 | Shellfish |
253 | 8.0 |
254 |
255 |
256 | 10 |
257 | 11 |
258 | Oyster |
259 | Shellfish |
260 | 9.0 |
261 |
262 |
263 | 11 |
264 | 12 |
265 | Checkout |
266 | None |
267 | NaN |
268 |
269 |
270 | 12 |
271 | 13 |
272 | Confirmation |
273 | None |
274 | NaN |
275 |
276 |
277 |
278 |
279 |
280 |
281 |
282 | Table: users
283 |
284 |
285 |
286 |
287 | |
288 | user_id |
289 | cookie_id |
290 | start_date |
291 |
292 |
293 |
294 |
295 | 0 |
296 | 1 |
297 | c4ca42 |
298 | 2020-02-04 |
299 |
300 |
301 | 1 |
302 | 2 |
303 | c81e72 |
304 | 2020-01-18 |
305 |
306 |
307 | 2 |
308 | 3 |
309 | eccbc8 |
310 | 2020-02-21 |
311 |
312 |
313 | 3 |
314 | 4 |
315 | a87ff6 |
316 | 2020-02-22 |
317 |
318 |
319 | 4 |
320 | 5 |
321 | e4da3b |
322 | 2020-02-01 |
323 |
324 |
325 | ... |
326 | ... |
327 | ... |
328 | ... |
329 |
330 |
331 | 1777 |
332 | 25 |
333 | 46dd2f |
334 | 2020-03-29 |
335 |
336 |
337 | 1778 |
338 | 94 |
339 | 59511b |
340 | 2020-03-22 |
341 |
342 |
343 | 1779 |
344 | 49 |
345 | d345a8 |
346 | 2020-02-23 |
347 |
348 |
349 | 1780 |
350 | 211 |
351 | a26e03 |
352 | 2020-02-20 |
353 |
354 |
355 | 1781 |
356 | 64 |
357 | 87a4ba |
358 | 2020-03-18 |
359 |
360 |
361 |
362 | 1782 rows × 3 columns
363 |
364 |
365 |
366 |
367 | Table: events
368 |
369 |
370 |
371 |
372 | |
373 | visit_id |
374 | cookie_id |
375 | page_id |
376 | event_type |
377 | sequence_number |
378 | event_time |
379 |
380 |
381 |
382 |
383 | 0 |
384 | ccf365 |
385 | c4ca42 |
386 | 1 |
387 | 1 |
388 | 1 |
389 | 2020-02-04 19:16:09.182546 |
390 |
391 |
392 | 1 |
393 | ccf365 |
394 | c4ca42 |
395 | 2 |
396 | 1 |
397 | 2 |
398 | 2020-02-04 19:16:17.358191 |
399 |
400 |
401 | 2 |
402 | ccf365 |
403 | c4ca42 |
404 | 6 |
405 | 1 |
406 | 3 |
407 | 2020-02-04 19:16:58.454669 |
408 |
409 |
410 | 3 |
411 | ccf365 |
412 | c4ca42 |
413 | 9 |
414 | 1 |
415 | 4 |
416 | 2020-02-04 19:16:58.609142 |
417 |
418 |
419 | 4 |
420 | ccf365 |
421 | c4ca42 |
422 | 9 |
423 | 2 |
424 | 5 |
425 | 2020-02-04 19:17:51.729420 |
426 |
427 |
428 | ... |
429 | ... |
430 | ... |
431 | ... |
432 | ... |
433 | ... |
434 | ... |
435 |
436 |
437 | 32729 |
438 | 355a6a |
439 | 87a4ba |
440 | 10 |
441 | 1 |
442 | 15 |
443 | 2020-03-18 22:44:16.541396 |
444 |
445 |
446 | 32730 |
447 | 355a6a |
448 | 87a4ba |
449 | 11 |
450 | 1 |
451 | 16 |
452 | 2020-03-18 22:44:18.900830 |
453 |
454 |
455 | 32731 |
456 | 355a6a |
457 | 87a4ba |
458 | 11 |
459 | 2 |
460 | 17 |
461 | 2020-03-18 22:45:12.670472 |
462 |
463 |
464 | 32732 |
465 | 355a6a |
466 | 87a4ba |
467 | 12 |
468 | 1 |
469 | 18 |
470 | 2020-03-18 22:45:54.081818 |
471 |
472 |
473 | 32733 |
474 | 355a6a |
475 | 87a4ba |
476 | 13 |
477 | 3 |
478 | 19 |
479 | 2020-03-18 22:45:54.984666 |
480 |
481 |
482 |
483 | 32734 rows × 6 columns
484 |
485 |
486 |
487 |
488 |
489 | ## 3. Case Study Questions
490 |
491 |
492 | ## A. Enterprise Relationship Diagram
493 | Using the following DDL schema details to create an ERD for all the Clique Bait datasets.
494 | [Click here](https://dbdiagram.io/) to access the DB Diagram tool to create the ERD.
495 |
496 |
497 |
498 | **ANSWER**
499 | Here is the code that I used for creating the **ERD** for all the Clique Bait datasets on [DB Diagram tool](https://dbdiagram.io/).
500 |
501 | ```
502 | TABLE event_identifier {
503 | "event_type" INTEGER
504 | "event_name" VARCHAR(13)
505 | }
506 |
507 | TABLE campaign_identifier {
508 | "campaign_id" INTEGER
509 | "products" VARCHAR(3)
510 | "campaign_name" VARCHAR(33)
511 | "start_date" TIMESTAMP
512 | "end_date" TIMESTAMP
513 | }
514 |
515 | TABLE page_hierarchy {
516 | "page_id" INTEGER
517 | "page_name" VARCHAR(14)
518 | "product_category" VARCHAR(9)
519 | "product_id" INTEGER
520 | }
521 |
522 | TABLE users {
523 | "user_id" INTEGER
524 | "cookie_id" VARCHAR(6)
525 | "start_date" TIMESTAMP
526 | }
527 |
528 | TABLE events {
529 | "visit_id" VARCHAR(6)
530 | "cookie_id" VARCHAR(6)
531 | "page_id" INTEGER
532 | "event_type" INTEGER
533 | "sequence_number" INTEGER
534 | "event_time" TIMESTAMP
535 | }
536 |
537 | // Establish connections or references between datasets
538 | Ref: "events"."event_type" > "event_identifier"."event_type"
539 | Ref: "events"."page_id" > "page_hierarchy"."page_id"
540 | Ref: "events"."cookie_id" > "users"."cookie_id"
541 | ```
542 |
543 | **Result**
544 | 
545 |
546 |
547 |
548 | ## B. Digital Analysis
549 |
550 | #### 1. How many users are there?
551 |
552 |
553 | ```python
554 | pd.read_sql("""
555 | SELECT COUNT(DISTINCT user_id) AS nb_users
556 | FROM clique_bait.users;
557 | """, conn)
558 | ```
559 |
560 |
561 |
562 |
563 | |
564 | nb_users |
565 |
566 |
567 |
568 |
569 | 0 |
570 | 500 |
571 |
572 |
573 |
574 |
575 |
576 |
577 |
578 | **Result**\
579 | There are 500 users.
580 |
581 | ___
582 | #### 2. How many cookies does each user have on average?
583 |
584 |
585 | ```python
586 | pd.read_sql("""
587 | SELECT ROUND(AVG(nb_cookie_ids))::INTEGER AS avg_cookies_per_user
588 | FROM
589 | (
590 | SELECT DISTINCT user_id, COUNT(cookie_id) AS nb_cookie_ids
591 | FROM clique_bait.users
592 | GROUP BY user_id
593 | ) nb_cookies_per_user;
594 | """, conn)
595 | ```
596 |
597 |
598 |
599 |
600 | |
601 | avg_cookies_per_user |
602 |
603 |
604 |
605 |
606 | 0 |
607 | 4 |
608 |
609 |
610 |
611 |
612 |
613 |
614 |
615 | **Result**\
616 | Each user has an average of 4 cookies.
617 |
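The same figure can be obtained without a subquery (a sketch): divide the total number of cookies by the number of distinct users.

```python
# Sketch: average number of cookies per user, without a subquery.
pd.read_sql("""
SELECT ROUND(COUNT(cookie_id)::NUMERIC / COUNT(DISTINCT user_id)) AS avg_cookies_per_user
FROM clique_bait.users;
""", conn)
```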
618 | ___
619 | #### 3. What is the unique number of visits by all users per month?
620 |
621 |
622 | ```python
623 | pd.read_sql("""
624 | SELECT
625 | DATE_PART('month', u.start_date)::INTEGER AS month,
626 | TO_CHAR(u.start_date,'Month') AS month_name,
627 | COUNT(DISTINCT e.visit_id) AS nb_visits
628 | FROM clique_bait.users u
629 | JOIN clique_bait.events e ON u.cookie_id = e.cookie_id
630 | GROUP BY month, month_name
631 | ORDER BY month
632 | """, conn)
633 | ```
634 |
635 |
636 |
637 |
638 | |
639 | month |
640 | month_name |
641 | nb_visits |
642 |
643 |
644 |
645 |
646 | 0 |
647 | 1 |
648 | January |
649 | 876 |
650 |
651 |
652 | 1 |
653 | 2 |
654 | February |
655 | 1488 |
656 |
657 |
658 | 2 |
659 | 3 |
660 | March |
661 | 916 |
662 |
663 |
664 | 3 |
665 | 4 |
666 | April |
667 | 248 |
668 |
669 |
670 | 4 |
671 | 5 |
672 | May |
673 | 36 |
674 |
675 |
676 |
677 |
678 |
679 |
680 |
681 | **Result**
682 | - The month of **February** has the highest number of visits on the Clique Bait website, with a total of 1,488 visits.
683 | - Following February, **March** and **January** are the second and third most visited months, accounting for 916 visits and 876 visits, respectively.
684 | - The month of **May** has the lowest number of visits, with only 36.
685 |
686 | ___
687 | #### 4. What is the number of events for each event type?
688 |
689 |
690 | ```python
691 | pd.read_sql("""
692 | SELECT
693 | ei.event_name,
694 | COUNT(*) AS nb_events
695 | FROM clique_bait.events e
696 |     JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
697 | GROUP BY ei.event_name
698 | ORDER BY nb_events DESC
699 | """, conn)
700 | ```
701 |
702 |
703 |
704 |
705 | |
706 | event_name |
707 | nb_events |
708 |
709 |
710 |
711 |
712 | 0 |
713 | Page View |
714 | 20928 |
715 |
716 |
717 | 1 |
718 | Add to Cart |
719 | 8451 |
720 |
721 |
722 | 2 |
723 | Purchase |
724 | 1777 |
725 |
726 |
727 | 3 |
728 | Ad Impression |
729 | 876 |
730 |
731 |
732 | 4 |
733 | Ad Click |
734 | 702 |
735 |
736 |
737 |
738 |
739 |
740 |
741 |
742 | **Result**
743 | - **Page View** is the most frequent event, with a total of 20,928 occurrences.
744 | - **Add to Cart** and **Purchase** are the second and third most frequent events, with 8,451 and 1,777 occurrences, respectively.
745 | - Since not every user receives a campaign impression, and not every impression is clicked, the **Ad Impression** event occurred only 876 times and **Ad Click** only 702 times (the click-through rate is computed in the sketch below).
746 |
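As referenced in the last point, a small sketch puts a number on that drop-off: the click-through rate on ad impressions works out to roughly 80% (702 clicks out of 876 impressions).

```python
# Click-through rate on ad impressions (sketch): Ad Click events / Ad Impression events.
pd.read_sql("""
SELECT
    ROUND(SUM(CASE WHEN ei.event_name = 'Ad Click' THEN 1 ELSE 0 END)::NUMERIC
          / SUM(CASE WHEN ei.event_name = 'Ad Impression' THEN 1 ELSE 0 END) * 100, 1) AS ad_click_through_rate_percent
FROM clique_bait.events e
JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type;
""", conn)
```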
747 | ___
748 | #### 5. What is the percentage of visits which have a purchase event?
749 |
750 |
751 | ```python
752 | pd.read_sql("""
753 | SELECT
754 | COUNT(*) AS nb_purchase_event,
755 | (SELECT COUNT(DISTINCT visit_id) FROM clique_bait.events) AS total_nb_visits,
756 | CONCAT(ROUND(COUNT(*)/(SELECT COUNT(DISTINCT visit_id) FROM clique_bait.events)::NUMERIC * 100, 1), ' %') AS purchase_percent
757 | FROM clique_bait.events e
758 | JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
759 | WHERE ei.event_name = 'Purchase'
760 | """, conn)
761 | ```
762 |
763 |
764 |
765 |
766 | |
767 | nb_purchase_event |
768 | total_nb_visits |
769 | purchase_percent |
770 |
771 |
772 |
773 |
774 | 0 |
775 | 1777 |
776 | 3564 |
777 | 49.9 % |
778 |
779 |
780 |
781 |
782 |
783 |
784 |
785 | **Result**\
786 | 49.9% of the visits resulted in a purchase event.
787 |
788 | ___
789 | #### 6. What is the percentage of visits which view the checkout page but do not have a purchase event?
790 |
791 |
792 | ```python
793 | pd.read_sql("""
794 | SELECT
795 | (nb_checkouts - nb_purchases) AS nb_checkouts_without_purchase,
796 | (SELECT COUNT(DISTINCT visit_id) FROM clique_bait.events) AS total_visits,
797 | ROUND((nb_checkouts - nb_purchases)/(SELECT COUNT(DISTINCT visit_id) FROM clique_bait.events)::NUMERIC * 100,2) AS percent
798 | FROM
799 | (
800 | SELECT
801 | SUM(CASE WHEN ph.page_name = 'Checkout' AND ei.event_name = 'Page View' THEN 1 ELSE 0 END) AS nb_checkouts,
802 | SUM(CASE WHEN ei.event_name = 'Purchase' THEN 1 ELSE 0 END) AS nb_purchases
803 | FROM clique_bait.events e
804 | JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
805 | JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
806 | ) c
807 | """, conn)
808 | ```
809 |
810 |
811 |
812 |
813 | |
814 | nb_checkouts_without_purchase |
815 | total_visits |
816 | percent |
817 |
818 |
819 |
820 |
821 | 0 |
822 | 326 |
823 | 3564 |
824 | 9.15 |
825 |
826 |
827 |
828 |
829 |
830 |
831 |
832 | **Result**\
833 | Only 9.15% of the visits resulted in viewing the checkout page without a purchase event.
834 |
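A complementary sketch expresses the same finding as a rough conversion rate: purchase events as a share of checkout page views.

```python
# Complementary sketch: rough checkout-to-purchase conversion rate
# (purchase events as a share of checkout page views).
pd.read_sql("""
SELECT
    ROUND(nb_purchases / nb_checkouts::NUMERIC * 100, 2) AS checkout_to_purchase_percent
FROM
(
    SELECT
        SUM(CASE WHEN ph.page_name = 'Checkout' AND ei.event_name = 'Page View' THEN 1 ELSE 0 END) AS nb_checkouts,
        SUM(CASE WHEN ei.event_name = 'Purchase' THEN 1 ELSE 0 END) AS nb_purchases
    FROM clique_bait.events e
    JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
    JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
) c;
""", conn)
```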
835 | ___
836 | #### 7. What are the top 3 pages by number of views?
837 |
838 |
839 | ```python
840 | pd.read_sql("""
841 | SELECT
842 | ph.page_name,
843 | COUNT(*) AS nb_views
844 | FROM clique_bait.events e
845 | JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
846 | JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
847 | WHERE ei.event_name = 'Page View'
848 | GROUP BY ph.page_name
849 | ORDER BY nb_views DESC
850 | LIMIT 3
851 | """, conn)
852 | ```
853 |
854 |
855 |
856 |
857 | |
858 | page_name |
859 | nb_views |
860 |
861 |
862 |
863 |
864 | 0 |
865 | All Products |
866 | 3174 |
867 |
868 |
869 | 1 |
870 | Checkout |
871 | 2103 |
872 |
873 |
874 | 2 |
875 | Home Page |
876 | 1782 |
877 |
878 |
879 |
880 |
881 |
882 |
883 |
884 | ___
885 | #### 8. What is the number of views and cart adds for each product category?
886 |
887 |
888 | ```python
889 | pd.read_sql("""
890 | SELECT
891 | ph.product_category,
892 | SUM(CASE WHEN event_name = 'Page View' THEN 1 ELSE 0 END) AS nb_views,
893 | SUM(CASE WHEN event_name = 'Add to Cart' THEN 1 ELSE 0 END) AS nb_card_adds
894 | FROM clique_bait.events e
895 | JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
896 | JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
897 | WHERE ph.product_category IS NOT NULL
898 | GROUP BY ph.product_category
899 | ORDER BY nb_views DESC
900 | """, conn)
901 | ```
902 |
903 |
904 |
905 |
906 | |
907 | product_category |
908 | nb_views |
909 | nb_card_adds |
910 |
911 |
912 |
913 |
914 | 0 |
915 | Shellfish |
916 | 6204 |
917 | 3792 |
918 |
919 |
920 | 1 |
921 | Fish |
922 | 4633 |
923 | 2789 |
924 |
925 |
926 | 2 |
927 | Luxury |
928 | 3032 |
929 | 1870 |
930 |
931 |
932 |
933 |
934 |
935 |
936 |
937 | ___
938 | #### 9. What are the top 3 products by purchases?
939 |
940 |
941 | ```python
942 | pd.read_sql("""
943 | WITH visitID_with_purchases_cte AS
944 | (
945 | -- Retrieve visit IDS that have made purchases
946 |
947 | SELECT e.visit_id
948 | FROM clique_bait.events e
949 | JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
950 | WHERE ei.event_name = 'Purchase'
951 | )
952 | SELECT
953 | ph.page_name as product,
954 | COUNT(*) AS nb_purchases
955 | FROM visitID_with_purchases_cte cte
956 | JOIN clique_bait.events e ON cte.visit_id = e.visit_id
957 | JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
958 | JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
959 | WHERE ph.product_category IS NOT NULL
960 | AND ei.event_name = 'Add to Cart'
961 | GROUP BY ph.page_name
962 | ORDER BY nb_purchases DESC
963 | LIMIT 3
964 | """, conn)
965 | ```
966 |
967 |
968 |
969 |
970 | |
971 | product |
972 | nb_purchases |
973 |
974 |
975 |
976 |
977 | 0 |
978 | Lobster |
979 | 754 |
980 |
981 |
982 | 1 |
983 | Oyster |
984 | 726 |
985 |
986 |
987 | 2 |
988 | Crab |
989 | 719 |
990 |
991 |
992 |
993 |
994 |
995 |
996 |
997 | Alternatively, we can use the window function `LAST_VALUE() OVER (PARTITION BY ... ORDER BY ...)` to extract the last event recorded during each visit to the Clique Bait website. Since the **Purchase** event is typically the final action a user takes in a visit, applying `LAST_VALUE()` in the `add_last_event_cte` lets us identify the products that were actually purchased with the filter `WHERE event_name = 'Add to Cart' AND last_event = 'Purchase'`.
998 |
999 |
1000 | ```python
1001 | pd.read_sql("""
1002 | WITH add_last_event_cte AS
1003 | (
1004 | SELECT
1005 | e.visit_id,
1006 | e.sequence_number,
1007 | ph.page_name,
1008 | ph.product_category,
1009 | ei.event_name,
1010 | LAST_VALUE(ei.event_name) OVER (PARTITION BY e.visit_id ORDER BY e.sequence_number ROWS BETWEEN UNBOUNDED preceding AND UNBOUNDED following) AS last_event
1011 | FROM clique_bait.events e
1012 | JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
1013 | JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
1014 | )
1015 | SELECT
1016 | page_name,
1017 | COUNT(event_name) AS nb_purchases
1018 | FROM add_last_event_cte
1019 | WHERE product_category IS NOT NULL AND event_name = 'Add to Cart' AND last_event = 'Purchase'
1020 | GROUP BY page_name
1021 | ORDER BY nb_purchases DESC
1022 | """, conn)
1023 | ```
1024 |
1025 |
1026 | page_name | nb_purchases
1027 | --- | ---
1028 | Lobster | 754
1029 | Oyster | 726
1030 | Crab | 719
1031 | Salmon | 711
1032 | Black Truffle | 707
1033 | Kingfish | 707
1034 | Abalone | 699
1035 | Russian Caviar | 697
1036 | Tuna | 697
1037 |
1085 |
1086 | ## C. Product Funnel Analysis
1087 | Using a single SQL query - create a new output table which has the following details:
1088 |
1089 | - How many times was each product viewed?
1090 | - How many times was each product added to cart?
1091 | - How many times was each product added to a cart but not purchased (abandoned)?
1092 | - How many times was each product purchased?
1093 |
1094 | Additionally, create another table which further aggregates the data for the above points but this time **for each product category** instead of individual products.
1095 |
1096 | ### Table 1: Aggregate the data by products
1097 |
1098 |
1099 | ```python
1100 | # Create table
1101 | cursor.execute("DROP TABLE IF EXISTS clique_bait.products;")
1102 | cursor.execute("""
1103 | CREATE TABLE clique_bait.products
1104 | (
1105 | "product" VARCHAR(255),
1106 | "nb_views" INTEGER,
1107 | "nb_cart_adds" INTEGER,
1108 | "nb_abandoned" INTEGER,
1109 | "nb_purchases" INTEGER
1110 | );
1111 | """)
1112 |
1113 |
1114 | # Populate the table
1115 | cursor.execute("""
1116 | INSERT INTO clique_bait.products
1117 | WITH add_last_event_cte AS
1118 | (
1119 | SELECT
1120 | e.visit_id,
1121 | e.sequence_number,
1122 | ph.page_name,
1123 | ph.product_category,
1124 | ei.event_name,
1125 | LAST_VALUE(ei.event_name) OVER (PARTITION BY e.visit_id ORDER BY e.sequence_number ROWS BETWEEN UNBOUNDED preceding AND UNBOUNDED following) AS last_event
1126 | FROM clique_bait.events e
1127 | JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
1128 | JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
1129 | )
1130 | SELECT
1131 | page_name AS product,
1132 | SUM(CASE WHEN event_name = 'Page View' THEN 1 ELSE 0 END) AS nb_views,
1133 | SUM(CASE WHEN event_name = 'Add to Cart' THEN 1 ELSE 0 END) AS nb_cart_adds,
1134 | SUM(CASE WHEN event_name = 'Add to Cart' AND last_event != 'Purchase' THEN 1 ELSE 0 END) AS nb_abandoned,
1135 | SUM(CASE WHEN event_name = 'Add to Cart' AND last_event = 'Purchase' THEN 1 ELSE 0 END) AS nb_purchases
1136 | FROM add_last_event_cte
1137 | WHERE product_category IS NOT NULL
1138 | GROUP BY page_name
1139 | """)
1140 |
1141 |
1142 | # Saving
1143 | conn.commit()
1144 | ```
1145 |
1146 | **Result**
1147 |
1148 |
1149 | ```python
1150 | pd.read_sql("""SELECT * FROM clique_bait.products""", conn)
1151 | ```
1152 |
1153 |
1154 | product | nb_views | nb_cart_adds | nb_abandoned | nb_purchases
1155 | --- | --- | --- | --- | ---
1156 | Abalone | 1525 | 932 | 233 | 699
1157 | Oyster | 1568 | 943 | 217 | 726
1158 | Salmon | 1559 | 938 | 227 | 711
1159 | Crab | 1564 | 949 | 230 | 719
1160 | Tuna | 1515 | 931 | 234 | 697
1161 | Lobster | 1547 | 968 | 214 | 754
1162 | Kingfish | 1559 | 920 | 213 | 707
1163 | Russian Caviar | 1563 | 946 | 249 | 697
1164 | Black Truffle | 1469 | 924 | 217 | 707
1165 |
1242 |
1243 | ### Table 2: Aggregate the data by product categories
1244 |
1245 |
1246 | ```python
1247 | # Create the table
1248 | cursor.execute("DROP TABLE IF EXISTS clique_bait.product_category;")
1249 | cursor.execute("""
1250 | CREATE TABLE clique_bait.product_category
1251 | (
1252 | "product" VARCHAR(255),
1253 | "nb_views" INTEGER,
1254 | "nb_cart_adds" INTEGER,
1255 | "nb_abandoned" INTEGER,
1256 | "nb_purchases" INTEGER
1257 | );
1258 | """)
1259 |
1260 |
1261 | # Populate the table
1262 | cursor.execute("""
1263 | INSERT INTO clique_bait.product_category
1264 | WITH add_last_event_cte AS
1265 | (
1266 | SELECT
1267 | e.visit_id,
1268 | e.sequence_number,
1269 | ph.product_category,
1270 | ei.event_name,
1271 | LAST_VALUE(ei.event_name) OVER (PARTITION BY e.visit_id ORDER BY e.sequence_number ROWS BETWEEN UNBOUNDED preceding AND UNBOUNDED following) AS last_event
1272 | FROM clique_bait.events e
1273 | JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
1274 | JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
1275 | )
1276 | SELECT
1277 | product_category,
1278 | SUM(CASE WHEN event_name = 'Page View' THEN 1 ELSE 0 END) AS nb_views,
1279 | SUM(CASE WHEN event_name = 'Add to Cart' THEN 1 ELSE 0 END) AS nb_cart_adds,
1280 | SUM(CASE WHEN event_name = 'Add to Cart' AND last_event != 'Purchase' THEN 1 ELSE 0 END) AS nb_abandoned,
1281 | SUM(CASE WHEN event_name = 'Add to Cart' AND last_event = 'Purchase' THEN 1 ELSE 0 END) AS nb_purchases
1282 | FROM add_last_event_cte
1283 | WHERE product_category IS NOT NULL
1284 | GROUP BY product_category
1285 |
1286 | """)
1287 |
1288 | # Saving
1289 | conn.commit()
1290 | ```
1291 |
1292 | **Result**
1293 |
1294 |
1295 | ```python
1296 | pd.read_sql("SELECT * FROM clique_bait.product_category", conn)
1297 | ```
1298 |
1299 |
1300 | product | nb_views | nb_cart_adds | nb_abandoned | nb_purchases
1301 | --- | --- | --- | --- | ---
1302 | Luxury | 3032 | 1870 | 466 | 1404
1303 | Shellfish | 6204 | 3792 | 894 | 2898
1304 | Fish | 4633 | 2789 | 674 | 2115
1305 |
1340 |
1341 | ___
1342 | ### Use your 2 new output tables - answer the following questions:
1343 |
1344 | #### 1. Which product had the most views, cart adds and purchases?
1345 |
1346 |
1347 | ```python
1348 | pd.read_sql("""
1349 | SELECT
1350 | product,
1351 | nb_views,
1352 | nb_cart_adds,
1353 | nb_purchases
1354 | FROM clique_bait.products
1355 | ORDER BY nb_views DESC, nb_cart_adds DESC, nb_purchases DESC
1356 | LIMIT 1;
1357 | """, conn)
1358 | ```
1359 |
1360 |
1361 | product | nb_views | nb_cart_adds | nb_purchases
1362 | --- | --- | --- | ---
1363 | Oyster | 1568 | 943 | 726
1364 |
1383 |
1384 | The query above ranks the products by views (then cart adds and purchases) and returns **Oyster** as the top result.
1385 | However, breaking the data down metric by metric reveals that:
1386 |
1387 | - **Oyster** is the most viewed product.
1388 | - **Lobster** is the product most often added to cart and the most purchased product.
1389 |
1390 |
1391 | ```python
1392 | pd.read_sql("""
1393 | WITH max_cte AS
1394 | (
1395 | SELECT
1396 | *,
1397 | CASE WHEN MAX(nb_views) OVER (ORDER BY nb_views DESC) = nb_views THEN product ELSE '' END AS most_viewed,
1398 | CASE WHEN MAX(nb_cart_adds) OVER (ORDER BY nb_cart_adds DESC) = nb_cart_adds THEN product ELSE '' END AS most_cart_added,
1399 | CASE WHEN MAX(nb_purchases) OVER (ORDER BY nb_purchases DESC) = nb_purchases THEN product ELSE '' END AS most_purchased
1400 | FROM clique_bait.products
1401 | )
1402 | SELECT most_viewed, most_cart_added, most_purchased
1403 | FROM max_cte
1404 | WHERE most_viewed != '' OR most_cart_added !='' OR most_purchased !=''
1405 | """, conn)
1406 | ```
1407 |
1408 |
1409 | most_viewed | most_cart_added | most_purchased
1410 | --- | --- | ---
1411 | Oyster | |
1412 | | Lobster | Lobster
1413 |
1435 |
1436 | ___
1437 | #### 2. Which product was most likely to be abandoned?
1438 |
1439 |
1440 | ```python
1441 | pd.read_sql("""
1442 | SELECT product, nb_abandoned
1443 | FROM clique_bait.products
1444 | ORDER BY nb_abandoned DESC
1445 | LIMIT 1;
1446 | """, conn)
1447 | ```
1448 |
1449 |
1450 | product | nb_abandoned
1451 | --- | ---
1452 | Russian Caviar | 249
1453 |
1468 |
1469 | ___
1470 | #### 3. Which product had the highest view to purchase percentage?
1471 |
1472 |
1473 | ```python
1474 | pd.read_sql("""
1475 | SELECT
1476 | product,
1477 | nb_views,
1478 | nb_purchases,
1479 | CONCAT(ROUND(nb_purchases/nb_views::NUMERIC * 100,2), ' %') AS percent
1480 | FROM clique_bait.products
1481 | ORDER BY nb_purchases/nb_views::NUMERIC DESC
1482 | LIMIT 1;
1483 | """, conn)
1484 | ```
1485 |
1486 |
1487 | product | nb_views | nb_purchases | percent
1488 | --- | --- | --- | ---
1489 | Lobster | 1547 | 754 | 48.74 %
1490 |
1509 |
1510 | ___
1511 | #### 4. What is the average conversion rate from view to cart add?
1512 |
1513 |
1514 | ```python
1515 | pd.read_sql("""
1516 | SELECT
1517 | CONCAT(ROUND(AVG(nb_cart_adds/nb_views::NUMERIC * 100),2), ' %') AS conversion_rate
1518 | FROM clique_bait.products
1519 | """, conn)
1520 | ```
1521 |
1522 |
1523 | conversion_rate
1524 | ---
1525 | 60.95 %
1526 |
1539 |
1540 | ___
1541 | #### 5. What is the average conversion rate from cart add to purchase?
1542 |
1543 |
1544 | ```python
1545 | pd.read_sql("""
1546 | SELECT
1547 | CONCAT(ROUND(AVG(nb_purchases/nb_cart_adds::NUMERIC * 100),2), ' %') AS conversion_rate
1548 | FROM clique_bait.products
1549 | """, conn)
1550 | ```
1551 |
1552 |
1553 | conversion_rate
1554 | ---
1555 | 75.93 %
1556 |
1570 |
1571 | ## D. Campaigns Analysis
1572 | Generate a table that has 1 single row for every unique `visit_id` record and has the following columns:
1573 |
1574 | - `user_id`
1575 | - `visit_id`
1576 | - `visit_start_time`: the earliest event_time for each visit
1577 | - `page_views`: count of page views for each visit
1578 | - `cart_adds`: count of product cart add events for each visit
1579 | - `purchase`: 1/0 flag if a purchase event exists for each visit
1580 | - `campaign_name`: map the visit to a campaign if the visit_start_time falls between the start_date and end_date
1581 | - `impression`: count of ad impressions for each visit
1582 | - `click`: count of ad clicks for each visit
1583 | - `cart_products`**(Optional column)**: a comma separated text value with products added to the cart sorted by the order they were added to the cart (hint: use the `sequence_number`)
1584 |
1585 | Use the subsequent dataset to generate at least 5 insights for the Clique Bait team - bonus: prepare a single A4 infographic that the team can use for their management reporting sessions, be sure to emphasise the most important points from your findings.
1586 |
1587 | Some ideas you might want to investigate further include:
1588 |
1589 | - Identifying users who have received impressions during each campaign period and comparing each metric with other users who did not have an impression event
1590 | - Does clicking on an impression lead to higher purchase rates?
1591 | - What is the uplift in purchase rate when comparing users who click on a campaign impression versus users who do not receive an impression? What if we compare them with users who just receive an impression but do not click?
1592 | - What metrics can you use to quantify the success or failure of each campaign compared to each other?
1593 |
1594 |
1595 | ```python
1596 | # Create table
1597 | cursor.execute("DROP TABLE IF EXISTS clique_bait.campaign_analysis;")
1598 | cursor.execute("""
1599 | CREATE TABLE clique_bait.campaign_analysis
1600 | (
1601 | "user_id" INTEGER,
1602 | "visit_id" VARCHAR(6),
1603 | "visit_start_time" TIMESTAMP,
1604 | "page_views" INTEGER,
1605 | "cart_adds" INTEGER,
1606 | "purchases" INTEGER,
1607 | "impressions" INTEGER,
1608 | "clicks" INTEGER,
1609 | "campaign_name" VARCHAR(33),
1610 | "cart_products" VARCHAR(255)
1611 | );
1612 | """)
1613 |
1614 |
1615 | # Populate the table
1616 | cursor.execute("""
1617 | INSERT INTO clique_bait.campaign_analysis
1618 | WITH cart_products_cte AS
1619 | (
1620 | -- Generate a sequence of products added to the cart
1621 |
1622 | SELECT
1623 | u.user_id,
1624 | e.visit_id,
1625 | STRING_AGG(ph.page_name, ', ' ORDER BY sequence_number) AS cart_products
1626 | FROM clique_bait.users u
1627 | JOIN clique_bait.events e ON u.cookie_id = e.cookie_id
1628 | JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
1629 | JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
1630 | WHERE ei.event_name = 'Add to Cart'
1631 | GROUP BY u.user_id, e.visit_id
1632 | )
1633 |
1634 | SELECT
1635 | u.user_id,
1636 | e.visit_id,
1637 | MIN(e.event_time) AS visit_start_time,
1638 | SUM(CASE WHEN ei.event_name = 'Page View' THEN 1 ELSE 0 END) AS page_views,
1639 | SUM(CASE WHEN ei.event_name = 'Add to Cart' THEN 1 ELSE 0 END) AS cart_adds,
1640 | SUM(CASE WHEN ei.event_name = 'Purchase' THEN 1 ELSE 0 END) AS purchases,
1641 | SUM(CASE WHEN ei.event_name = 'Ad Impression' THEN 1 ELSE 0 END) AS impressions,
1642 | SUM(CASE WHEN ei.event_name = 'Ad Click' THEN 1 ELSE 0 END) AS clicks,
1643 | CASE WHEN MIN(e.event_time) BETWEEN ci.start_date AND ci.end_date THEN ci.campaign_name ELSE '' END AS campaign_name,
1644 | CASE WHEN cp.cart_products IS NULL THEN '' ELSE cp.cart_products END AS cart_products
1645 |
1646 | FROM clique_bait.users u
1647 | JOIN clique_bait.events e ON u.cookie_id = e.cookie_id
1648 | JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
1649 | JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
1650 | JOIN clique_bait.campaign_identifier ci ON e.event_time BETWEEN ci.start_date AND ci.end_date
1651 | LEFT JOIN cart_products_cte cp ON cp.user_id = u.user_id AND cp.visit_id = e.visit_id
1652 | GROUP BY u.user_id, e.visit_id, ci.start_date, ci.end_date, ci.campaign_name, cp.cart_products
1653 | ORDER BY u.user_id, e.visit_id;
1654 | """)
1655 |
1656 |
1657 | # Saving the updates
1658 | conn.commit()
1659 | ```
1660 |
1661 | **Result**
1662 | Here is the table listing the first 5 users only.
1663 |
1664 |
1665 | ```python
1666 | pd.read_sql("""
1667 | SELECT *
1668 | FROM clique_bait.campaign_analysis
1669 | WHERE user_id < 6
1670 | """, conn)
1671 | ```
1672 |
1673 |
1674 | user_id | visit_id | visit_start_time | page_views | cart_adds | purchases | impressions | clicks | campaign_name | cart_products
1675 | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1676 | 1 | 02a5d5 | 2020-02-26 16:57:26.260871 | 4 | 0 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) |
1677 | 1 | 0826dc | 2020-02-26 05:58:37.918618 | 1 | 0 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) |
1678 | 1 | 0fc437 | 2020-02-04 17:49:49.602976 | 10 | 6 | 1 | 1 | 1 | Half Off - Treat Your Shellf(ish) | Tuna, Russian Caviar, Black Truffle, Abalone, ...
1679 | 1 | 30b94d | 2020-03-15 13:12:54.023936 | 9 | 7 | 1 | 1 | 1 | Half Off - Treat Your Shellf(ish) | Salmon, Kingfish, Tuna, Russian Caviar, Abalon...
1680 | 1 | 41355d | 2020-03-25 00:11:17.860655 | 6 | 1 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) | Lobster
1681 | 1 | ccf365 | 2020-02-04 19:16:09.182546 | 7 | 3 | 1 | 0 | 0 | Half Off - Treat Your Shellf(ish) | Lobster, Crab, Oyster
1682 | 1 | eaffde | 2020-03-25 20:06:32.342989 | 10 | 8 | 1 | 1 | 1 | Half Off - Treat Your Shellf(ish) | Salmon, Tuna, Russian Caviar, Black Truffle, A...
1683 | 1 | f7c798 | 2020-03-15 02:23:26.312543 | 9 | 3 | 1 | 0 | 0 | Half Off - Treat Your Shellf(ish) | Russian Caviar, Crab, Oyster
1684 | 2 | 0635fb | 2020-02-16 06:42:42.735730 | 9 | 4 | 1 | 0 | 0 | Half Off - Treat Your Shellf(ish) | Salmon, Kingfish, Abalone, Crab
1685 | 2 | 1f1198 | 2020-02-01 21:51:55.078775 | 1 | 0 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) |
1686 | 2 | 3b5871 | 2020-01-18 10:16:32.158475 | 9 | 6 | 1 | 1 | 1 | 25% Off - Living The Lux Life | Salmon, Kingfish, Russian Caviar, Black Truffl...
1687 | 2 | 49d73d | 2020-02-16 06:21:27.138532 | 11 | 9 | 1 | 1 | 1 | Half Off - Treat Your Shellf(ish) | Salmon, Kingfish, Tuna, Russian Caviar, Black ...
1688 | 2 | 910d9a | 2020-02-01 10:40:46.875968 | 8 | 1 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) | Abalone
1689 | 2 | c5c0ee | 2020-01-18 10:35:22.765382 | 1 | 0 | 0 | 0 | 0 | 25% Off - Living The Lux Life |
1690 | 2 | d58cbd | 2020-01-18 23:40:54.761906 | 8 | 4 | 0 | 0 | 0 | 25% Off - Living The Lux Life | Kingfish, Tuna, Abalone, Crab
1691 | 2 | e26a84 | 2020-01-18 16:06:40.907280 | 6 | 2 | 1 | 0 | 0 | 25% Off - Living The Lux Life | Salmon, Oyster
1692 | 3 | 25502e | 2020-02-21 11:26:15.353389 | 1 | 0 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) |
1693 | 3 | 9a2f24 | 2020-02-21 03:19:10.032455 | 6 | 2 | 1 | 0 | 0 | Half Off - Treat Your Shellf(ish) | Kingfish, Black Truffle
1694 | 3 | bf200a | 2020-03-11 04:10:26.708385 | 7 | 2 | 1 | 0 | 0 | Half Off - Treat Your Shellf(ish) | Salmon, Crab
1695 | 3 | eb13cd | 2020-03-11 21:36:37.222763 | 1 | 0 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) |
1696 | 4 | 07a950 | 2020-03-19 17:56:24.610445 | 6 | 0 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) |
1697 | 4 | 4c0ce3 | 2020-02-22 19:42:45.498271 | 1 | 0 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) |
1698 | 4 | 7caba5 | 2020-02-22 17:49:37.646174 | 5 | 2 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) | Tuna, Lobster
1699 | 4 | b90e25 | 2020-03-19 11:01:58.182947 | 9 | 4 | 1 | 1 | 1 | Half Off - Treat Your Shellf(ish) | Tuna, Black Truffle, Lobster, Crab
1700 | 5 | 05c52a | 2020-02-11 12:30:33.479052 | 9 | 8 | 0 | 1 | 1 | Half Off - Treat Your Shellf(ish) | Salmon, Kingfish, Tuna, Russian Caviar, Black ...
1701 | 5 | 4bffe1 | 2020-02-26 16:03:10.377881 | 8 | 1 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) | Russian Caviar
1702 | 5 | 580bf6 | 2020-02-11 04:05:42.307991 | 8 | 6 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) | Salmon, Tuna, Russian Caviar, Black Truffle, A...
1703 | 5 | b45feb | 2020-02-01 07:47:45.025247 | 1 | 0 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) |
1704 | 5 | f61ed7 | 2020-02-01 06:30:39.766168 | 8 | 2 | 1 | 0 | 0 | Half Off - Treat Your Shellf(ish) | Lobster, Crab
1705 | 5 | fa70cb | 2020-02-26 11:12:20.361638 | 1 | 0 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) |
1706 |
2085 |
2086 | ### Further investigation
2087 | ### Does clicking on an impression lead to higher purchase rates?
2088 |
2089 |
2090 | Among visits that received an ad impression, the purchase rate is significantly higher when the user clicked on the advertisement than when they did not: 71.89% vs. 13.12%.
2091 |
2092 |
2093 | ```python
2094 | pd.read_sql("""
2095 | SELECT
2096 | ROUND(SUM(CASE WHEN clicks = 1 THEN purchases ELSE 0 END)/COUNT(visit_id)::NUMERIC * 100, 2) AS clicked_purchase_rate,
2097 | ROUND(SUM(CASE WHEN clicks = 0 THEN purchases ELSE 0 END)/COUNT(visit_id)::NUMERIC * 100, 2) AS no_clicked_purchase_rate
2098 | FROM clique_bait.campaign_analysis
2099 | WHERE impressions = 1
2100 | """, conn)
2101 | ```
2102 |
2103 |
2104 |
2105 | clicked_purchase_rate | no_clicked_purchase_rate
2106 | --- | ---
2107 | 71.89 | 13.12
2108 |
2122 |
2123 | ___
2124 | ### What is the uplift in purchase rate when comparing users who click on a campaign impression versus users who do not receive an impression? What if we compare them with users who just receive an impression but do not click?
2125 |
2126 |
2127 | Considering all visits to the Clique Bait website (each figure below is a share of **all** visits):
2128 | - 28.6% of all visits ended in a purchase without the user receiving any ad impression.
2129 | - 17.6% of all visits ended in a purchase after the user clicked on a campaign impression.
2130 | - Only 3.2% of all visits ended in a purchase when the user received an impression but did not click on it.
2131 |
2132 |
2133 | ```python
2134 | pd.read_sql("""
2135 | SELECT
2136 | ROUND(SUM(CASE WHEN impressions = 1 AND clicks = 1 THEN purchases ELSE 0 END)/COUNT(visit_id)::NUMERIC * 100, 1) AS clicked_purchase_rate,
2137 | ROUND(SUM(CASE WHEN impressions = 1 AND clicks = 0 THEN purchases ELSE 0 END)/COUNT(visit_id)::NUMERIC * 100, 1) AS no_clicked_purchase_rate,
2138 | ROUND(SUM(CASE WHEN impressions = 0 AND clicks = 0 THEN purchases ELSE 0 END)/COUNT(visit_id)::NUMERIC * 100, 1) AS no_impression_purchase_rate
2139 | FROM clique_bait.campaign_analysis
2140 | """, conn)
2141 | ```
2142 |
2143 |
2144 |
2145 | clicked_purchase_rate | no_clicked_purchase_rate | no_impression_purchase_rate
2146 | --- | --- | ---
2147 | 17.6 | 3.2 | 28.6
2148 |
2164 |
2165 |
2166 | ```python
2167 | conn.close()
2168 | ```
2169 |
--------------------------------------------------------------------------------
/CaseStudy#6 - Clique Bait/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #6: Clique Bait 🪝
2 | [](https://github.com/chanronnie/8WeekSQLChallenge)
3 | [](https://8weeksqlchallenge.com/case-study-6/)
4 | 
5 |
6 |
7 | The case study presented here is part of the **8 Week SQL Challenge**.\
8 | It is kindly brought to us by [**Data With Danny**](https://8weeksqlchallenge.com).
9 |
10 | This time, I am using `PostgreSQL queries` (instead of MySQL) in `Jupyter Notebook` to quickly view results, which provides me with an opportunity
11 | - to learn PostgreSQL
12 | - to utilize handy mathematical and string functions.
13 |
14 |
15 |
16 | ## Table of Contents
17 | * [Entity Relationship Diagram](#entity-relationship-diagram)
18 | * [Datasets](#datasets)
19 | * [Case Study Questions](#case-study-questions)
20 | * [Solutions](#solutions)
21 | * [PostgreSQL Topics Covered](#postgresql-topics-covered)
22 |
23 | ## Entity Relationship Diagram
24 | Question A will require creating an Entity-Relationship Diagram (ERD).
25 |
26 |
27 | ## Datasets
28 | Case Study #6 contains 5 datasets:
29 |
30 |
31 | ### `users`
32 |
33 |
34 | View table
35 |
36 |
37 | `users` : This table shows all the user_id along with their unique cookie_id (PK).
38 |
39 | user_id | cookie_id | start_date
40 | --- | --- | ---
41 | 397 | 759ff | 2020-03-30 00:00:00
42 | 215 | 863329 | 2020-01-26 00:00:00
43 | 191 | eefca9 | 2020-03-15 00:00:00
44 | 89 | 764796 | 2020-01-07 00:00:00
45 | 127 | 17ccc5 | 2020-01-22 00:00:00
46 | 81 | b0b666 | 2020-03-01 00:00:00
47 | 260 | a4f236 | 2020-01-08 00:00:00
48 | 203 | d1182f | 2020-04-18 00:00:00
49 | 23 | 12dbc8 | 2020-01-18 00:00:00
50 | 375 | f61d69 | 2020-01-03 00:00:00
51 |
52 |
53 |
54 |
55 | ### `events`
56 |
57 |
58 | View table
59 |
60 |
61 | `events` : The table lists all the website interactions by customers (cookie_id, webpage visited, etc.).
62 |
63 | visit_id | cookie_id | page_id | event_type | sequence_number | event_time
64 | --- | --- | --- | --- | --- | ---
65 | 719fd3 | 3d83d3 | 5 | 1 | 4 | 2020-03-02 00:29:09.975502
66 | fb1eb1 | c5ff25 | 5 | 2 | 8 | 2020-01-22 07:59:16.761931
67 | 23fe81 | 1e8c2d | 10 | 1 | 9 | 2020-03-21 13:14:11.745667
68 | ad91aa | 648115 | 6 | 1 | 3 | 2020-04-27 16:28:09.824606
69 | 5576d7 | ac418c | 6 | 1 | 4 | 2020-01-18 04:55:10.149236
70 | 48308b | c686c1 | 8 | 1 | 5 | 2020-01-29 06:10:38.702163
71 | 46b17d | 78f9b3 | 7 | 1 | 12 | 2020-02-16 09:45:31.926407
72 | 9fd196 | ccf057 | 4 | 1 | 5 | 2020-02-14 08:29:12.922164
73 | edf853 | f85454 | 1 | 1 | 1 | 2020-02-22 12:59:07.652207
74 | 3c6716 | 02e74f | 3 | 2 | 5 | 2020-01-31 17:56:20.777383
75 |
76 |
77 |
78 |
79 | ### `event_identifier`
80 |
81 |
82 | View table
83 |
84 |
85 | `event_identifier` : The table maps the event_type to its corresponding event_name.
86 |
87 | event_type | event_name
88 | --- | ---
89 | 1 | Page View
90 | 2 | Add to Cart
91 | 3 | Purchase
92 | 4 | Ad Impression
93 | 5 | Ad Click
94 |
95 |
96 |
97 |
98 | ### `campaign_identifier`
99 |
100 |
101 | View table
102 |
103 |
104 | `campaign_identifier`: The table lists the information of the three campaigns that ran on the Clique Bait website.
105 |
106 | campaign_id | products | campaign_name | start_date | end_date
107 | --- | --- | --- | --- | ---
108 | 1 | 1-3 | BOGOF - Fishing For Compliments | 2020-01-01 00:00:00 | 2020-01-14 00:00:00
109 | 2 | 4-5 | 25% Off - Living The Lux Life | 2020-01-15 00:00:00 | 2020-01-28 00:00:00
110 | 3 | 6-8 | Half Off - Treat Your Shellf(ish) | 2020-02-01 00:00:00 | 2020-03-31 00:00:00
111 |
112 |
113 |
114 |
115 | ### `page_hierarchy`
116 |
117 |
118 | View table
119 |
120 |
121 | `page_hierarchy`: The table maps the page_id to its page_name.
122 |
123 | page_id | page_name | product_category | product_id
124 | --- | --- | --- | ---
125 | 1 | Home Page | null | null
126 | 2 | All Products | null | null
127 | 3 | Salmon | Fish | 1
128 | 4 | Kingfish | Fish | 2
129 | 5 | Tuna | Fish | 3
130 | 6 | Russian Caviar | Luxury | 4
131 | 7 | Black Truffle | Luxury | 5
132 | 8 | Abalone | Shellfish | 6
133 | 9 | Lobster | Shellfish | 7
134 | 10 | Crab | Shellfish | 8
135 | 11 | Oyster | Shellfish | 9
136 | 12 | Checkout | null | null
137 | 13 | Confirmation | null | null
138 |
139 |
140 |
141 |
142 | ## Case Study Questions
143 | Case Study #6 is categorized into 4 question groups\
144 | To view the specific section, please open the link in a *`new tab`* or *`window`*.\
145 | [A. Enterprise Relationship Diagram](CaseStudy6_solutions.md#A)\
146 | [B. Digital Analysis](CaseStudy6_solutions.md#B)\
147 | [C. Product Funnel Analysis](CaseStudy6_solutions.md#C)\
148 | [D. Campaigns Analysis](CaseStudy6_solutions.md#D)
149 |
150 | ## Solutions
151 | - View `clique_bait` database: [**CaseStudy6_schema.sql**](https://raw.githubusercontent.com/chanronnie/8WeekSQLChallenge/main/CaseStudy%236%20-%20Clique%20Bait/CaseStudy6_schema.sql)
152 | - View Solution:
153 | - [**Markdown File**](CaseStudy6_solutions.md): offers a more fluid and responsive viewing experience
154 | - [**Jupyter Notebook**](CaseStudy6_solutions.ipynb): contains the original code
155 |
156 | ## PostgreSQL Topics Covered
157 | - Establish relationships between datasets to create an Entity-Relationship Diagram (ERD)
158 | - Data Cleaning
159 | - Create Tables
160 | - Pivot Table using CASE WHEN (see the example below)
161 | - DATE and STRING Type Manipulation
162 | - Common Table Expressions (CTE)
163 | - Window Functions
164 | - Subqueries
165 | - JOIN, UNION ALL
166 |
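167 | For illustration, here is a minimal sketch (not part of the original notebook) of the `CASE WHEN` pivot pattern listed above, which the solutions use to turn event rows into one count column per event type. The connection parameters are placeholders for a local PostgreSQL instance that already has the `clique_bait` schema loaded.
168 |
169 | ```python
170 | import pandas as pd
171 | import psycopg2
172 |
173 | # Placeholder credentials for a local PostgreSQL instance with the clique_bait schema loaded
174 | conn = psycopg2.connect(host="localhost", dbname="postgres", user="postgres", password="postgres")
175 |
176 | # Pivot event rows into one count column per event type using CASE WHEN
177 | query = """
178 | SELECT
179 |     ph.page_name,
180 |     SUM(CASE WHEN ei.event_name = 'Page View' THEN 1 ELSE 0 END) AS nb_views,
181 |     SUM(CASE WHEN ei.event_name = 'Add to Cart' THEN 1 ELSE 0 END) AS nb_cart_adds,
182 |     SUM(CASE WHEN ei.event_name = 'Purchase' THEN 1 ELSE 0 END) AS nb_purchases
183 | FROM clique_bait.events e
184 | JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
185 | JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
186 | GROUP BY ph.page_name
187 | ORDER BY nb_views DESC;
188 | """
189 |
190 | print(pd.read_sql(query, conn))
191 | conn.close()
192 | ```
193 |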
--------------------------------------------------------------------------------
/CaseStudy#6 - Clique Bait/images/CliqueBait_ERD.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chanronnie/8WeekSQLChallenge/4b36f745d833a82a61a9ce70ab177462eb7dbfd0/CaseStudy#6 - Clique Bait/images/CliqueBait_ERD.png
--------------------------------------------------------------------------------
/CaseStudy#7 - Balanced Tree/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #7: Balanced Tree Clothing Co.🌲
2 | [](https://github.com/chanronnie/8WeekSQLChallenge)
3 | [](https://8weeksqlchallenge.com/case-study-7/)
4 | 
5 |
6 |
7 |
8 | The case study presented here is part of the **8 Week SQL Challenge**.\
9 | It is kindly brought to us by [**Data With Danny**](https://8weeksqlchallenge.com).
10 |
11 | This time, I am using `PostgreSQL queries` (instead of MySQL) in `Jupyter Notebook` to quickly view results, which provides me with an opportunity
12 | - to learn PostgreSQL
13 | - to utilize handy mathematical and string functions.
14 |
15 |
16 |
17 | ## Table of Contents
18 | * [Entity Relationship Diagram](#entity-relationship-diagram)
19 | * [Datasets](#datasets)
20 | * [Case Study Questions](#case-study-questions)
21 | * [Solutions](#solutions)
22 | * [PostgreSQL Topics Covered](#postgresql-topics-covered)
23 |
24 | ## Entity Relationship Diagram
25 | Again, I used the [DB Diagram tool](https://dbdiagram.io/home) to create the following ERD.
26 | 
27 |
28 |
29 |
30 |
31 | View code to create ERD
32 |
33 | Here is the code that I used to create the ERD for all the Balanced Tree datasets on the DB Diagram tool.
34 |
35 | ```markdown
36 | TABLE product_hierarchy {
37 | "id" INTEGER
38 | "parent_id" INTEGER
39 | "level_text" VARCHAR(19)
40 | "level_name" VARCHAR(8)
41 | }
42 |
43 | TABLE product_prices {
44 | "id" INTEGER
45 | "product_id" VARCHAR(6)
46 | "price" INTEGER
47 | }
48 |
49 | TABLE product_details {
50 | "product_id" VARCHAR(6)
51 | "price" INTEGER
52 | "product_name" VARCHAR(32)
53 | "category_id" INTEGER
54 | "segment_id" INTEGER
55 | "style_id" INTEGER
56 | "category_name" VARCHAR(6)
57 | "segment_name" VARCHAR(6)
58 | "style_name" VARCHAR(19)
59 | }
60 |
61 | TABLE sales {
62 | "prod_id" VARCHAR(6)
63 | "qty" INTEGER
64 | "price" INTEGER
65 | "discount" INTEGER
66 | "member" BOOLEAN
67 | "txn_id" VARCHAR(6)
68 | "start_txn_time" TIMESTAMP
69 | }
70 |
71 | # Establish relationships between datasets
72 | REF: sales.prod_id > product_details.product_id
73 | REF: product_hierarchy.id - product_prices.id
74 | ```
75 |
76 |
77 |
78 |
79 | ## Datasets
80 | Case Study #7 contains 4 datasets:
81 |
82 |
83 | View Table: product_details
84 |
85 | The `product_details` dataset lists the product information.
86 |
87 | product_id | price | product_name | category_id | segment_id | style_id | category_name | segment_name | style_name
88 | --- | --- | --- | --- | --- | --- | --- | --- | ---
89 | c4a632 | 13 | Navy Oversized Jeans - Womens | 1 | 3 | 7 | Womens | Jeans | Navy Oversized
90 | e83aa3 | 32 | Black Straight Jeans - Womens | 1 | 3 | 8 | Womens | Jeans | Black Straight
91 | e31d39 | 10 | Cream Relaxed Jeans - Womens | 1 | 3 | 9 | Womens | Jeans | Cream Relaxed
92 | d5e9a6 | 23 | Khaki Suit Jacket - Womens | 1 | 4 | 10 | Womens | Jacket | Khaki Suit
93 | 72f5d4 | 19 | Indigo Rain Jacket - Womens | 1 | 4 | 11 | Womens | Jacket | Indigo Rain
94 | 9ec847 | 54 | Grey Fashion Jacket - Womens | 1 | 4 | 12 | Womens | Jacket | Grey Fashion
95 | 5d267b | 40 | White Tee Shirt - Mens | 2 | 5 | 13 | Mens | Shirt | White Tee
96 | c8d436 | 10 | Teal Button Up Shirt - Mens | 2 | 5 | 14 | Mens | Shirt | Teal Button Up
97 | 2a2353 | 57 | Blue Polo Shirt - Mens | 2 | 5 | 15 | Mens | Shirt | Blue Polo
98 | f084eb | 36 | Navy Solid Socks - Mens | 2 | 6 | 16 | Mens | Socks | Navy Solid
99 | b9a74d | 17 | White Striped Socks - Mens | 2 | 6 | 17 | Mens | Socks | White Striped
100 | 2feb6b | 29 | Pink Fluro Polkadot Socks - Mens | 2 | 6 | 18 | Mens | Socks | Pink Fluro Polkadot
101 |
102 |
103 |
104 |
105 |
106 | View Table: sales
107 |
108 | The `sales` dataset lists product-level transaction information for Balanced Tree Clothing Co.
109 |
110 | prod_id | qty | price | discount | member | txn_id | start_txn_time
111 | --- | --- | --- | --- | --- | --- | ---
112 | c4a632 | 4 | 13 | 17 | t | 54f307 | 2021-02-13 01:59:43.296
113 | 5d267b | 4 | 40 | 17 | t | 54f307 | 2021-02-13 01:59:43.296
114 | b9a74d | 4 | 17 | 17 | t | 54f307 | 2021-02-13 01:59:43.296
115 | 2feb6b | 2 | 29 | 17 | t | 54f307 | 2021-02-13 01:59:43.296
116 | c4a632 | 5 | 13 | 21 | t | 26cc98 | 2021-01-19 01:39:00.3456
117 | e31d39 | 2 | 10 | 21 | t | 26cc98 | 2021-01-19 01:39:00.3456
118 | 72f5d4 | 3 | 19 | 21 | t | 26cc98 | 2021-01-19 01:39:00.3456
119 | 2a2353 | 3 | 57 | 21 | t | 26cc98 | 2021-01-19 01:39:00.3456
120 | f084eb | 3 | 36 | 21 | t | 26cc98 | 2021-01-19 01:39:00.3456
121 | c4a632 | 1 | 13 | 21 | f | ef648d | 2021-01-27 02:18:17.1648
122 |
123 |
124 |
125 |
126 |
127 | View Table: product_hierarchy
128 |
129 | The `product_hierarchy` dataset will be used to answer the **Bonus Question** to recreate the `product_details` dataset.
130 |
131 | id | parent_id | level_text | level_name
132 | --- | --- | --- | ---
133 | 1 | | Womens | Category
134 | 2 | | Mens | Category
135 | 3 | 1 | Jeans | Segment
136 | 4 | 1 | Jacket | Segment
137 | 5 | 2 | Shirt | Segment
138 | 6 | 2 | Socks | Segment
139 | 7 | 3 | Navy Oversized | Style
140 | 8 | 3 | Black Straight | Style
141 | 9 | 3 | Cream Relaxed | Style
142 | 10 | 4 | Khaki Suit | Style
143 | 11 | 4 | Indigo Rain | Style
144 | 12 | 4 | Grey Fashion | Style
145 | 13 | 5 | White Tee | Style
146 | 14 | 5 | Teal Button Up | Style
147 | 15 | 5 | Blue Polo | Style
148 | 16 | 6 | Navy Solid | Style
149 | 17 | 6 | White Striped | Style
150 | 18 | 6 | Pink Fluro Polkadot | Style
151 |
152 |
153 |
154 |
155 | View Table: product_prices
156 |
157 | The `product_prices` dataset will be used to answer the **Bonus Question** to recreate the `product_details` dataset.
158 |
159 | id | product_id | price
160 | --- | --- | ---
161 | 7 | c4a632 | 13
162 | 8 | e83aa3 | 32
163 | 9 | e31d39 | 10
164 | 10 | d5e9a6 | 23
165 | 11 | 72f5d4 | 19
166 | 12 | 9ec847 | 54
167 | 13 | 5d267b | 40
168 | 14 | c8d436 | 10
169 | 15 | 2a2353 | 57
170 | 16 | f084eb | 36
171 | 17 | b9a74d | 17
172 | 18 | 2feb6b | 29
173 |
174 |
175 |
176 |
177 |
178 |
179 | ## Case Study Questions
180 | Case Study #7 is categorized into 4 question groups\
181 | To view the specific section, please open the link in a *`new tab`* or *`window`*.\
182 | [A. High Level Sales Analysis](CaseStudy7_solutions.md#A)\
183 | [B. Transaction Analysis](CaseStudy7_solutions.md#B)\
184 | [C. Product Analysis](CaseStudy7_solutions.md#C)\
185 | [D. Bonus Challenge](CaseStudy7_solutions.md#D)
186 |
187 |
188 | ## Solutions
189 | - View `balanced_tree` database: [**here**](CaseStudy7_schema.sql)
190 | - View Solution:
191 | - [**Markdown File**](CaseStudy7_solutions.md): offers a more fluid and responsive viewing experience
192 | - [**Jupyter Notebook**](CaseStudy7_solutions.ipynb): contains the original code
193 |
194 |
195 | ## PostgreSQL Topics Covered
196 | - Establish relationships between datasets to create an Entity-Relationship Diagram (ERD)
197 | - Common Table Expressions (CTE)
198 | - Window Functions
199 | - Subqueries
200 | - JOIN, SELF JOIN
201 |
--------------------------------------------------------------------------------
/CaseStudy#8 - Fresh Segments/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #8: Fresh Segments
2 | [](https://github.com/chanronnie/8WeekSQLChallenge)
3 | [](https://8weeksqlchallenge.com/case-study-8/)
4 | 
5 |
6 |
7 | The case study presented here is part of the **8 Week SQL Challenge**.\
8 | It is kindly brought to us by [**Data With Danny**](https://8weeksqlchallenge.com).\
9 | This time, I am using `PostgreSQL queries` (instead of MySQL) in `Jupyter Notebook`.
10 |
11 |
12 | ## Table of Contents
13 | * [Entity Relationship Diagram](#entity-relationship-diagram)
14 | * [Datasets](#datasets)
15 | * [Case Study Questions](#case-study-questions)
16 | * [Solutions](#solutions)
17 | * [PostgreSQL Topics Covered](#postgresql-topics-covered)
18 |
19 |
20 | ## Entity Relationship Diagram
21 | Again, I used the [DB Diagram tool](https://dbdiagram.io/home) to create the following ERD.
22 | 
23 |
24 |
25 |
26 | View code to create ERD
27 |
28 | Here is the code that I used to create the ERD for the Fresh Segments datasets on the DB Diagram tool.
29 |
30 | ```markdown
31 | TABLE interest_metrics {
32 | "_month" VARCHAR(4)
33 | "_year" VARCHAR(4)
34 | "month_year" VARCHAR(7)
35 | "interest_id" VARCHAR(5)
36 | "composition" FLOAT
37 | "index_value" FLOAT
38 | "ranking" INTEGER
39 | "percentile_ranking" FLOAT
40 | }
41 |
42 | TABLE interest_map {
43 | "id" INTEGER
44 | "interest_name" TEXT
45 | "interest_summary" TEXT
46 | "created_at" TIMESTAMP
47 | "last_modified" TIMESTAMP
48 | }
49 |
50 |
51 | # Establish relationships between datasets
52 | REF: interest_metrics.interest_id > interest_map.id
53 | ```
54 |
55 |
56 |
57 |
58 | ## Datasets
59 | Case Study #8 contains 2 datasets:
60 |
61 | ### `interest_metrics`
62 |
63 |
64 | View Table: interest_metrics
65 |
66 | The `interest_metrics` table presents aggregated interest metrics for a major client of Fresh Segments, reflecting the performance of interest_id based on clicks and interactions with targeted advertising content among their customer base.
67 |
68 | _month | _year | month_year | interest_id | composition | index_value | ranking | percentile_ranking
69 | --- | --- | --- | --- | --- | --- | --- | ---
70 | 7 | 2018 | 07-2018 | 32486 | 11.89 | 6.19 | 1 | 99.86
71 | 7 | 2018 | 07-2018 | 6106 | 9.93 | 5.31 | 2 | 99.73
72 | 7 | 2018 | 07-2018 | 18923 | 10.85 | 5.29 | 3 | 99.59
73 | 7 | 2018 | 07-2018 | 6344 | 10.32 | 5.1 | 4 | 99.45
74 | 7 | 2018 | 07-2018 | 100 | 10.77 | 5.04 | 5 | 99.31
75 | 7 | 2018 | 07-2018 | 69 | 10.82 | 5.03 | 6 | 99.18
76 | 7 | 2018 | 07-2018 | 79 | 11.21 | 4.97 | 7 | 99.04
77 | 7 | 2018 | 07-2018 | 6111 | 10.71 | 4.83 | 8 | 98.9
78 | 7 | 2018 | 07-2018 | 6214 | 9.71 | 4.83 | 8 | 98.9
79 | 7 | 2018 | 07-2018 | 19422 | 10.11 | 4.81 | 10 | 98.63
80 |
81 |
82 |
83 | ### `interest_map`
84 |
85 |
86 | View Table: interest_map
87 |
88 | The **`interest_map`** links the interest_id to the corresponding interest_name.
89 |
90 | id | interest_name | interest_summary | created_at | last_modified
91 | --- | --- | --- | --- | ---
92 | 1 | Fitness Enthusiasts | Consumers using fitness tracking apps and websites. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12
93 | 2 | Gamers | Consumers researching game reviews and cheat codes. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12
94 | 3 | Car Enthusiasts | Readers of automotive news and car reviews. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12
95 | 4 | Luxury Retail Researchers | Consumers researching luxury product reviews and gift ideas. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12
96 | 5 | Brides & Wedding Planners | People researching wedding ideas and vendors. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12
97 | 6 | Vacation Planners | Consumers reading reviews of vacation destinations and accommodations. | 2016-05-26 14:57:59 | 2018-05-23 11:30:13
98 | 7 | Motorcycle Enthusiasts | Readers of motorcycle news and reviews. | 2016-05-26 14:57:59 | 2018-05-23 11:30:13
99 | 8 | Business News Readers | Readers of online business news content. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12
100 | 12 | Thrift Store Shoppers | Consumers shopping online for clothing at thrift stores and researching locations. | 2016-05-26 14:57:59 | 2018-03-16 13:14:00
101 | 13 | Advertising Professionals | People who read advertising industry news. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12
102 |
103 |
104 |
105 |
106 |
107 | ## Case Study Questions
108 | Case Study #8 is categorized into 4 question groups\
109 | To view the specific section, please open the link in a *`new tab`* or *`window`*.\
110 | [A. Data Exploration and Cleansing](CaseStudy8_solutions.md#A)\
111 | [B. Interest Analysis](CaseStudy8_solutions.md#B)\
112 | [C. Segment Analysis](CaseStudy8_solutions.md#C)\
113 | [D. Index Analysis](CaseStudy8_solutions.md#D)
114 |
115 |
116 | ## Solutions
117 | - View `fresh_segments` database: [**here**](CaseStudy8_schema.sql)
118 | - View Solution:
119 | - [**Markdown File**](CaseStudy8_solutions.md): offers a more fluid and responsive viewing experience
120 | - [**Jupyter Notebook**](CaseStudy8_solutions.ipynb): contains the original code
121 |
122 |
123 | ## PostgreSQL Topics Covered
124 | - Data Cleaning
125 | - Common Table Expressions (CTE)
126 | - Window Functions (rolling average, sketched below)
127 | - Subqueries
128 | - JOIN
129 |
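130 | As a short sketch of the rolling-average window function pattern listed above (not taken from the original notebook), the query below computes a 3-month rolling average of `composition` per `interest_id`, converting the `MM-YYYY` text in `month_year` to a date for chronological ordering. The connection parameters are placeholders for a local PostgreSQL instance with the `fresh_segments` schema loaded.
131 |
132 | ```python
133 | import pandas as pd
134 | import psycopg2
135 |
136 | # Placeholder credentials for a local PostgreSQL instance with the fresh_segments schema loaded
137 | conn = psycopg2.connect(host="localhost", dbname="postgres", user="postgres", password="postgres")
138 |
139 | # 3-month rolling average of composition per interest_id
140 | query = """
141 | SELECT
142 |     interest_id,
143 |     month_year,
144 |     composition,
145 |     ROUND(AVG(composition) OVER (
146 |         PARTITION BY interest_id
147 |         ORDER BY TO_DATE(month_year, 'MM-YYYY')
148 |         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
149 |     )::NUMERIC, 2) AS rolling_3_month_avg
150 | FROM fresh_segments.interest_metrics
151 | ORDER BY interest_id, TO_DATE(month_year, 'MM-YYYY');
152 | """
153 |
154 | print(pd.read_sql(query, conn))
155 | conn.close()
156 | ```
157 |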
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 Ronnie Chan
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # 8 Week SQL Challenge
2 | 
3 |
4 |
5 | In my journey to switch careers into the field of Data, I am excited to undertake the [**#8WeekSQLChallenge**](https://8weeksqlchallenge.com/), kindly created by [Data with Danny](https://linktr.ee/datawithdanny) in order to enhance my SQL skills.
6 | This repository will store all the solutions for the case studies included in this challenge.
7 |
8 | ## Case Studies
9 | Case Study | Topic | SQL | Status|
10 | | --- | --- | --- | --- |
11 | |📄 [**#1 Danny's Diner**](https://github.com/chanronnie/8WeekSQLChallenge/tree/main/CaseStudy%231%20-%20Danny's%20Diner) |Customer Analytics | MySQL | **Completed** ✔️|
12 | |📄 [**#2 Pizza Runner**](https://github.com/chanronnie/8WeekSQLChallenge/tree/main/CaseStudy%232%20-%20Pizza%20Runner) | Delivery Operations Analytics | MySQL| **Completed** ✔️|
13 | |📄 [**#3 Foodie-Fi**](https://github.com/chanronnie/8WeekSQLChallenge/tree/main/CaseStudy%233%20-%20Foodie-Fi)| Subscription Analytics | MySQL| **Completed** ✔️|
14 | |📄 [**#4 Data Bank**](https://github.com/chanronnie/8WeekSQLChallenge/tree/main/CaseStudy%234%20-%20Data%20Bank) | Financial Data Storage and Usage Analytics | PostgreSQL| **Completed** ✔️ |
15 | |📄 [**#5 Data Mart**](https://github.com/chanronnie/8WeekSQLChallenge/tree/main/CaseStudy%235%20-%20Data%20Mart) | Sales Performance Analytics | PostgreSQL | **Completed** ✔️ |
16 | |📄 [**#6 Clique Bait**](https://github.com/chanronnie/8WeekSQLChallenge/tree/main/CaseStudy%236%20-%20Clique%20Bait) | Digital Analytics| PostgreSQL | **Completed** ✔️ |
17 | |📄 [**#7 Balanced Tree**](https://github.com/chanronnie/8WeekSQLChallenge/tree/main/CaseStudy%237%20-%20Balanced%20Tree) | Sales Performance Analytics | PostgreSQL | **Completed** ✔️ |
18 | |📄 [**#8 Fresh Segments**](https://github.com/chanronnie/8WeekSQLChallenge/tree/main/CaseStudy%238%20-%20Fresh%20Segments) | Digital Marketing Analytics | PostgreSQL | **Completed** ✔️|
19 |
20 | ## Technologies
21 | 
22 | 
23 | 
24 |
25 | ## Installation
26 |
27 | For writing **MySQL** queries in Jupyter Notebook, we will need to install the `pymysql` library
28 | ```
29 | pip install pymysql
30 | ```
31 |
32 | For writing **PostgreSQL** queries in Jupyter Notebook, we will need to install the `psycopg2` library
33 | ```
34 | pip install psycopg2
35 | ```
36 |
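37 | As a quick usage sketch (not part of the original notebooks), here is how these libraries are typically paired with `pandas` to run queries inside Jupyter. The credentials and database names below are placeholders; adjust them to your local setup.
38 |
39 | ```python
40 | import pandas as pd
41 | import pymysql
42 | import psycopg2
43 |
44 | # MySQL connection (placeholder credentials) - used for Case Studies #1 to #3
45 | mysql_conn = pymysql.connect(host="localhost", user="root", password="password", database="your_database")
46 | print(pd.read_sql("SELECT VERSION();", mysql_conn))
47 | mysql_conn.close()
48 |
49 | # PostgreSQL connection (placeholder credentials) - used for Case Studies #4 to #8
50 | pg_conn = psycopg2.connect(host="localhost", dbname="postgres", user="postgres", password="password")
51 | print(pd.read_sql("SELECT * FROM clique_bait.event_identifier;", pg_conn))
52 | pg_conn.close()
53 | ```
54 |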
--------------------------------------------------------------------------------