├── Case Study #1 - Danny's Diner
│   ├── README.md
│   ├── SQL
│   │   ├── dannys_diner.sql
│   │   └── solution_1.sql
│   └── Solution.md
├── Case Study #2 - Pizza Runner
│   ├── README.md
│   ├── SQL
│   │   ├── pizza_runner.sql
│   │   └── solution_2.sql
│   └── Solution.md
├── Case Study #3 - Foodie-Fi
│   ├── README.md
│   ├── SQL
│   │   ├── foodie_fi.sql
│   │   └── solution_3.sql
│   └── Solution.md
├── Case Study #4 - Data Bank
│   └── Solution.md
├── Case Study #5 - Data Mart
│   ├── README.md
│   ├── SQL
│   │   ├── data_mart.sql
│   │   └── solution_5.sql
│   └── Solution.md
├── Case Study #6 - Clique Bait
│   ├── README.md
│   ├── SQL
│   │   ├── clique_bait.sql
│   │   └── solution_6.sql
│   └── Solution.md
├── Case Study #7 - Balanced Tree Clothing Co.
│   ├── README.md
│   ├── SQL
│   │   ├── balanced_tree.sql
│   │   └── solution_7.sql
│   └── Solution.md
├── Case Study #8 - Fresh Segments
│   ├── README.md
│   ├── SQL
│   │   ├── fresh_segments.sql
│   │   └── solution_8.sql
│   └── Solution.md
└── README.md
/Case Study #1 - Danny's Diner/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #1 - Danny's Diner :ramen:
2 |
3 |
4 |
5 | Danny’s Diner is in need of your assistance to help the restaurant stay afloat - the restaurant has captured some very basic data from its few months of operation but has no idea how to use it to help run the business. Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they’ve spent and also which menu items are their favourite.
6 |
7 | ## Available Data
8 |
9 | Danny has shared with you 3 key datasets for this case study:
10 | - `sales`;
11 | - `menu`;
12 | - `members`.
13 |
14 | ### Table 1: `sales`
15 |
16 | The `sales` table captures all `customer_id` level purchases with a corresponding `order_date` and `product_id` showing when and which menu items were ordered.
17 |
18 | | customer_id | order_date | product_id |
19 | |-------------|------------|------------|
20 | | A | 2021-01-01 | 1 |
21 | | A | 2021-01-01 | 2 |
22 | | A | 2021-01-07 | 2 |
23 | | A | 2021-01-10 | 3 |
24 | | A | 2021-01-11 | 3 |
25 | | A | 2021-01-11 | 3 |
26 | | B | 2021-01-01 | 2 |
27 | | B | 2021-01-02 | 2 |
28 | | B | 2021-01-04 | 1 |
29 | | B | 2021-01-11 | 1 |
30 | | B | 2021-01-16 | 3 |
31 | | B | 2021-02-01 | 3 |
32 | | C | 2021-01-01 | 3 |
33 | | C | 2021-01-01 | 3 |
34 | | C | 2021-01-07 | 3 |
35 |
36 | ### Table 2: `menu`
37 |
38 | The `menu` table maps the `product_id` to the actual `product_name` and price of each menu item.
39 |
40 | | product_id | product_name | price |
41 | |------------|--------------|-------|
42 | | 1 | sushi | 10 |
43 | | 2 | curry | 15 |
44 | | 3 | ramen | 12 |
45 |
46 | ### Table 3: `members`
47 |
48 | The final `members` table captures the `join_date` when a `customer_id` joined the beta version of the Danny’s Diner loyalty program.
49 |
50 | | customer_id | join_date |
51 | |-------------|------------|
52 | | A | 2021-01-07 |
53 | | B | 2021-01-09 |
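
The three tables join on `product_id` (`sales` to `menu`) and `customer_id` (`sales` to `members`) - a minimal sketch of how they fit together:

````sql
SET search_path = dannys_diner;

-- One row per purchase, with menu details and the (possibly NULL) join_date.
SELECT
  s.customer_id,
  s.order_date,
  m.product_name,
  m.price,
  mm.join_date
FROM
  sales AS s
  JOIN menu AS m ON s.product_id = m.product_id
  LEFT JOIN members AS mm ON s.customer_id = mm.customer_id;
````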
54 |
55 | ## Entity Relationship Diagram
56 |
57 | 
58 |
59 |
60 | ## Table of Contents
61 |
62 | [Problem Statement](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#problem-statement)
63 |
64 | [Case Study Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#case-study-questions)
65 |
66 | [1. What is the total amount each customer spent at the restaurant?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#1-what-is-the-total-amount-each-customer-spent-at-the-restaurant)
67 |
68 | [2. How many days has each customer visited the restaurant?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#2-how-many-days-has-each-customer-visited-the-restaurant)
69 |
70 | [3. What was the first item from the menu purchased by each customer?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#3-what-was-the-first-item-from-the-menu-purchased-by-each-customer)
71 |
72 | [4. What is the most purchased item on the menu and how many times was it purchased by all customers?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#4-what-is-the-most-purchased-item-on-the-menu-and-how-many-times-was-it-purchased-by-all-customers)
73 |
74 | [5. Which item was the most popular for each customer?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#5-which-item-was-the-most-popular-for-each-customer)
75 |
76 | [6. Which item was purchased first by the customer after they became a member?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#6-which-item-was-purchased-first-by-the-customer-after-they-became-a-member)
77 |
78 | [7. Which item was purchased just before the customer became a member?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#7-which-item-was-purchased-just-before-the-customer-became-a-member)
79 |
80 | [8. What is the total items and amount spent for each member before they became a member?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#8-what-is-the-total-items-and-amount-spent-for-each-member-before-they-became-a-member)
81 |
82 | [9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#9-if-each-1-spent-equates-to-10-points-and-sushi-has-a-2x-points-multiplier---how-many-points-would-each-customer-have)
83 |
84 | [10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#10-in-the-first-week-after-a-customer-joins-the-program-including-their-join-date-they-earn-2x-points-on-all-items-not-just-sushi---how-many-points-do-customer-a-and-b-have-at-the-end-of-january)
85 |
86 | [Bonus Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#bonus-questions)
87 |
88 | [Join All The Things](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#join-all-the-things)
89 |
90 | [Rank All The Things](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#rank-all-the-things)
91 |
--------------------------------------------------------------------------------
/Case Study #1 - Danny's Diner/SQL/dannys_diner.sql:
--------------------------------------------------------------------------------
1 | CREATE SCHEMA dannys_diner;
2 | SET search_path = dannys_diner;
3 |
4 | CREATE TABLE sales (
5 | "customer_id" VARCHAR(1),
6 | "order_date" DATE,
7 | "product_id" INTEGER
8 | );
9 |
10 | INSERT INTO sales
11 | ("customer_id", "order_date", "product_id")
12 | VALUES
13 | ('A', '2021-01-01', '1'),
14 | ('A', '2021-01-01', '2'),
15 | ('A', '2021-01-07', '2'),
16 | ('A', '2021-01-10', '3'),
17 | ('A', '2021-01-11', '3'),
18 | ('A', '2021-01-11', '3'),
19 | ('B', '2021-01-01', '2'),
20 | ('B', '2021-01-02', '2'),
21 | ('B', '2021-01-04', '1'),
22 | ('B', '2021-01-11', '1'),
23 | ('B', '2021-01-16', '3'),
24 | ('B', '2021-02-01', '3'),
25 | ('C', '2021-01-01', '3'),
26 | ('C', '2021-01-01', '3'),
27 | ('C', '2021-01-07', '3');
28 |
29 |
30 | CREATE TABLE menu (
31 | "product_id" INTEGER,
32 | "product_name" VARCHAR(5),
33 | "price" INTEGER
34 | );
35 |
36 | INSERT INTO menu
37 | ("product_id", "product_name", "price")
38 | VALUES
39 | ('1', 'sushi', '10'),
40 | ('2', 'curry', '15'),
41 | ('3', 'ramen', '12');
42 |
43 |
44 | CREATE TABLE members (
45 | "customer_id" VARCHAR(1),
46 | "join_date" DATE
47 | );
48 |
49 | INSERT INTO members
50 | ("customer_id", "join_date")
51 | VALUES
52 | ('A', '2021-01-07'),
53 | ('B', '2021-01-09');
--------------------------------------------------------------------------------
/Case Study #1 - Danny's Diner/SQL/solution_1.sql:
--------------------------------------------------------------------------------
1 | -- Solved on PostgreSQL 13.4 by Yulia Murtazina, January 4, 2022
2 | -- Fixed on January 31, 2022
3 |
4 | /* --------------------
5 | Case Study Questions
6 | --------------------*/
7 |
8 | SET search_path = dannys_diner;
9 |
10 | -- 1. What is the total amount each customer spent at the restaurant?
11 |
12 | SELECT
13 | customer_id,
14 | SUM(price) as total_spent
15 | FROM
16 | sales as s
17 | JOIN menu as m ON s.product_id = m.product_id
18 | GROUP BY
19 | 1
20 | ORDER BY
21 | 1;
22 |
23 | -- 2. How many days has each customer visited the restaurant?
24 |
25 | SELECT
26 | customer_id,
27 | COUNT(DISTINCT order_date) AS days_of_visiting
28 | FROM
29 | sales
30 | GROUP BY
31 | 1
32 | ORDER BY
33 | 1;
34 |
35 | -- 3. What was the first item from the menu purchased by each customer?
36 |
37 | WITH ranked AS (
38 | SELECT
39 | customer_id,
40 | product_name,
41 | order_date,
42 | row_number() OVER (
43 | PARTITION BY customer_id
44 | ORDER BY
45 | order_date,
46 | s.product_id
47 | ) AS rank
48 | FROM
49 | sales AS s
50 | JOIN menu AS m ON s.product_id = m.product_id
51 | )
52 | SELECT
53 | customer_id,
54 | product_name,
55 | order_date::varchar
56 | FROM
57 | ranked
58 | WHERE
59 | rank = 1;
60 |
61 | -- 4. What is the most purchased item on the menu and how many times was it purchased by all customers?
62 |
63 | WITH totals AS (
64 | SELECT
65 | product_name,
66 | COUNT(product_name) AS total_purchase_quantity,
67 | row_number() OVER (ORDER BY COUNT(product_name) DESC) AS rank
68 | FROM
69 | sales AS s
70 | JOIN menu AS m ON s.product_id = m.product_id
71 | GROUP BY
72 | 1
73 | )
74 | SELECT
75 | product_name,
76 | total_purchase_quantity
77 | FROM
78 | totals
79 | WHERE
80 | rank = 1;
81 |
82 | -- 5. Which item was the most popular for each customer?
83 |
84 | WITH ranked AS (
85 | SELECT
86 | customer_id,
87 | product_name,
88 | COUNT(product_name) AS total_purchase_quantity,
89 | rank() OVER (
90 | PARTITION BY customer_id
91 | ORDER BY
92 | COUNT(product_name) desc
93 | ) AS rank
94 | FROM
95 | sales AS s
96 | JOIN menu AS m ON s.product_id = m.product_id
97 | GROUP BY
98 | customer_id,
99 | product_name
100 | )
101 | SELECT
102 | customer_id,
103 | product_name,
104 | total_purchase_quantity
105 | FROM
106 | ranked
107 | WHERE
108 | rank = 1;
109 |
110 | -- 6. Which item was purchased first by the customer after they became a member?
111 |
112 | WITH ranked AS (
113 | SELECT
114 | s.customer_id,
115 | order_date,
116 | join_date,
117 | product_name,
118 | row_number() OVER (
119 | PARTITION BY s.customer_id
120 | ORDER BY
121 | order_date
122 | ) AS rank
123 | FROM
124 | sales AS s
125 | JOIN members AS mm ON s.customer_id = mm.customer_id
126 | JOIN menu AS m ON s.product_id = m.product_id
127 | WHERE
128 | order_date >= join_date
129 | )
130 | SELECT
131 | customer_id,
132 | join_date::varchar,
133 | order_date::varchar,
134 | product_name
135 | FROM
136 | ranked AS r
137 | WHERE
138 | rank = 1
139 | ORDER BY
140 | 1;
141 |
142 | -- 7. Which item was purchased just before the customer became a member?
143 |
144 | WITH ranked AS (
145 | SELECT
146 | s.customer_id,
147 | order_date,
148 | join_date,
149 | product_name,
150 | rank() OVER (
151 | PARTITION BY s.customer_id
152 | ORDER BY
153 | order_date DESC
154 | ) AS rank
155 | FROM
156 | sales AS s
157 | JOIN members AS mm ON s.customer_id = mm.customer_id
158 | JOIN menu AS m ON s.product_id = m.product_id
159 | WHERE
160 | order_date < join_date
161 | )
162 | SELECT
163 | customer_id,
164 | join_date::varchar,
165 | order_date::varchar,
166 | product_name
167 | FROM
168 | ranked AS r
169 | WHERE
170 | rank = 1
171 | ORDER BY
172 | 1;
173 |
174 | -- 8. What is the total items and amount spent for each member before they became a member?
175 |
176 | SELECT
177 | s.customer_id,
178 | COUNT(product_name) AS total_number_of_items,
179 | SUM(price) AS total_purchase_amount
180 | FROM
181 | sales AS s
182 | JOIN members AS mm ON s.customer_id = mm.customer_id
183 | JOIN menu AS m ON s.product_id = m.product_id
184 | WHERE
185 | order_date < join_date
186 | GROUP BY
187 | 1
188 | ORDER BY
189 | 1;
190 |
191 | -- 9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
192 |
193 | SELECT
194 | customer_id,
195 | SUM(point) AS points
196 | FROM
197 | sales AS s
198 | JOIN (
199 | SELECT
200 | product_id,
201 | CASE
202 | WHEN product_id = 1 THEN price * 20
203 | ELSE price * 10
204 | END AS point
205 | FROM
206 | menu
207 | ) AS p ON s.product_id = p.product_id
208 | GROUP BY
209 | 1
210 | ORDER BY
211 | 1;
212 |
213 | -- 10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?
214 |
215 | WITH count_points AS (
216 | SELECT
217 | s.customer_id,
218 | order_date,
219 | join_date,
220 | product_name,
221 | SUM(point) AS point
222 | FROM
223 | sales AS s
224 | JOIN (
225 | SELECT
226 | product_id,
227 | product_name,
228 | CASE
229 | WHEN product_name = 'sushi' THEN price * 20
230 | ELSE price * 10
231 | END AS point
232 | FROM
233 | menu AS m
234 | ) AS p ON s.product_id = p.product_id
235 | JOIN members AS mm ON s.customer_id = mm.customer_id
236 | GROUP BY
237 | s.customer_id,
238 | order_date,
239 | join_date,
240 | product_name,
241 | point
242 | )
243 | SELECT
244 | customer_id,
245 | SUM(
246 | CASE
247 | WHEN order_date >= join_date
248 | AND order_date < join_date + (7 * INTERVAL '1 day')
249 | AND product_name != 'sushi' THEN point * 2
250 | ELSE point
251 | END
252 | ) AS new_points
253 | FROM
254 | count_points
255 | WHERE
256 | DATE_PART('month', order_date) = 1
257 | GROUP BY
258 | 1
259 | ORDER BY
260 | 1;
261 |
262 | /* --------------------
263 | Bonus Questions
264 | --------------------*/
265 |
266 | -- Join All The Things
267 |
268 | WITH members AS (
269 | SELECT
270 | s.customer_id,
271 | order_date,
272 | product_name,
273 | price,
274 | join_date
275 | FROM
276 | sales AS s
277 | JOIN menu AS m ON s.product_id = m.product_id
278 | LEFT JOIN members AS mm ON s.customer_id = mm.customer_id
279 | )
280 | SELECT
281 | customer_id,
282 | order_date::varchar,
283 | product_name,
284 | price,
285 | CASE
286 | WHEN order_date >= join_date THEN 'Y'
287 | ELSE 'N'
288 | END AS member
289 | FROM
290 | members
291 | ORDER BY
292 | 1,
293 | 2,
294 | 3;
295 |
296 | -- Rank All The Things
297 |
298 | WITH members AS (
299 | SELECT
300 | s.customer_id,
301 | order_date::varchar,
302 | product_name,
303 | price,
304 | CASE
305 | WHEN order_date >= join_date THEN 'Y'
306 | ELSE 'N'
307 | END AS member
308 | FROM
309 | sales AS s
310 | JOIN menu AS m ON s.product_id = m.product_id
311 | LEFT JOIN members AS mm ON s.customer_id = mm.customer_id
312 | )
313 | SELECT
314 | *,
315 | CASE
316 | WHEN member = 'Y' THEN rank() OVER (
317 | PARTITION BY customer_id,
318 | member
319 | ORDER BY
320 | order_date
321 | )
322 | END AS ranking
323 | FROM
324 | members
325 | ORDER BY
326 | customer_id,
327 | order_date,
328 | product_name;
329 |
--------------------------------------------------------------------------------
/Case Study #1 - Danny's Diner/Solution.md:
--------------------------------------------------------------------------------
1 | # Case Study #1 - Danny's Diner :ramen:
2 |
3 | ## Problem Statement
4 |
5 | Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they’ve spent and also which menu items are their favourite. Having this deeper connection with his customers will help him deliver a better and more personalised experience for his loyal customers.
6 |
7 | Full description: [Case Study #1 - Danny's Diner](https://8weeksqlchallenge.com/case-study-1/)
8 |
9 | ## Case Study Questions
10 |
11 | Each of the following case study questions can be answered using a single SQL statement. For some questions I'll also show a second, exploratory query for convenience.
12 |
13 | #### 1. What is the total amount each customer spent at the restaurant?
14 |
15 | ````sql
16 | SET
17 | search_path = dannys_diner;
18 | SELECT
19 | customer_id,
20 | SUM(price) as total_spent
21 | FROM
22 | sales as s
23 | JOIN menu as m ON s.product_id = m.product_id
24 | GROUP BY
25 | 1
26 | ORDER BY
27 | 1
28 | ````
29 |
30 | | customer_id | total_spent |
31 | | ----------- | ----------- |
32 | | A | 76 |
33 | | B | 74 |
34 | | C | 36 |
35 |
36 | ---
37 | #### 2. How many days has each customer visited the restaurant?
38 |
39 | ````sql
40 | SET
41 | search_path = dannys_diner;
42 | SELECT
43 | customer_id,
44 | COUNT(DISTINCT order_date) AS days_of_visiting
45 | FROM
46 | sales
47 | GROUP BY
48 | 1
49 | ORDER BY
50 | 1
51 | ````
52 |
53 | | customer_id | days_of_visiting |
54 | | ----------- | ---------------- |
55 | | A | 4 |
56 | | B | 6 |
57 | | C | 2 |
58 |
59 | ---
60 |
61 | #### 3. What was the first item from the menu purchased by each customer?
62 |
63 | To get the first item, we need to rank the items ordered by each customer in a temporary result set using a `WITH` statement.
64 |
65 | After we have those ranks, we can select the rows with rank = 1. As customer A made two orders on their first day, we need to `ORDER BY` two criteria in the window function: `order_date` and `product_id`.
66 |
67 | In the final query I cast the date as `varchar` to remove the time part and show the date only.
68 |
69 | ````sql
70 | SET
71 | search_path = dannys_diner;
72 | WITH ranked AS (
73 | SELECT
74 | customer_id,
75 | product_name,
76 | order_date,
77 | row_number() OVER (
78 | PARTITION BY customer_id
79 | ORDER BY
80 | order_date,
81 | s.product_id
82 | ) AS rank
83 | FROM
84 | sales AS s
85 | JOIN menu AS m ON s.product_id = m.product_id
86 | )
87 | SELECT
88 | customer_id,
89 | product_name,
90 | order_date::varchar
91 | FROM
92 | ranked
93 | WHERE
94 | rank = 1
95 | ````
96 |
97 | | customer_id | product_name | order_date |
98 | | ----------- | ------------ | ---------- |
99 | | A | sushi | 2021-01-01 |
100 | | B | curry | 2021-01-01 |
101 | | C | ramen | 2021-01-01 |
102 |
103 | ---
104 |
105 | An alternative query without window functions returns two rows for customer A, one row for customer B and two rows for customer C - every item bought on the earliest order date:
106 |
107 | ````sql
108 | SELECT
109 | customer_id,
110 | product_name,
111 | order_date::varchar
112 | FROM
113 | dannys_diner.sales
114 | JOIN dannys_diner.menu ON dannys_diner.sales.product_id = dannys_diner.menu.product_id
115 | WHERE
116 | order_date = (
117 | SELECT
118 | MIN(order_date)
119 | FROM
120 | dannys_diner.sales
121 | )
124 | ORDER BY
125 | 1
126 | ````
127 |
128 | | customer_id | product_name | order_date |
129 | | ----------- | ------------ | ---------- |
130 | | A | sushi | 2021-01-01 |
131 | | A | curry | 2021-01-01 |
132 | | B | curry | 2021-01-01 |
133 | | C | ramen | 2021-01-01 |
134 | | C | ramen | 2021-01-01 |
135 |
136 | ---
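
For comparison, PostgreSQL also offers `DISTINCT ON` as a compact way to keep one "first" row per customer - a minimal sketch with the same tie-breaking order as the window-function version:

````sql
SET
  search_path = dannys_diner;
-- DISTINCT ON keeps the first row per customer_id according to ORDER BY,
-- so ordering by order_date and product_id reproduces the ranking above.
SELECT DISTINCT ON (customer_id)
  customer_id,
  product_name,
  order_date::varchar
FROM
  sales AS s
  JOIN menu AS m ON s.product_id = m.product_id
ORDER BY
  customer_id,
  order_date,
  s.product_id;
````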
137 |
138 | ***The first purchase for customer A was :sushi:***
139 |
140 | ***The first purchase for customer B was :curry:***
141 |
142 | ***The first (and the only) purchase for customer C was :ramen:***
143 |
144 | #### 4. What is the most purchased item on the menu and how many times was it purchased by all customers?
145 |
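Ranking the grouped totals needs an explicit `ORDER BY` inside the window function - without it, `row_number()` numbers the groups in an undefined order and `rank = 1` could pick an arbitrary product.
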
146 | ````sql
147 | SET
148 | search_path = dannys_diner;
149 | WITH totals AS (
150 | SELECT
151 | product_name,
152 | COUNT(product_name) AS total_purchase_quantity,
153 | row_number() OVER (ORDER BY COUNT(product_name) DESC) AS rank
154 | FROM
155 | sales AS s
156 | JOIN menu AS m ON s.product_id = m.product_id
157 | GROUP BY
158 | 1
159 | )
160 | SELECT
161 | product_name,
162 | total_purchase_quantity
163 | FROM
164 | totals
165 | WHERE
166 | rank = 1
167 | ````
168 | | product_name | total_purchase_quantity |
169 | | ------------ | ----------------------- |
170 | | ramen | 8 |
171 |
172 | ---
173 |
174 | ***The most purchased item on the menu was :ramen: - it was purchased 8 times in total.***
175 |
176 | #### 5. Which item was the most popular for each customer?
177 |
178 | Let's look at all the results sorted by purchase frequency:
179 |
180 | ````sql
181 | SET
182 | search_path = dannys_diner;
183 | SELECT
184 | customer_id,
185 | product_name,
186 | COUNT(product_name) AS total_purchase_quantity
187 | FROM
188 | sales AS s
189 | INNER JOIN menu AS m ON s.product_id = m.product_id
190 | GROUP BY
191 | customer_id,
192 | product_name
193 | ORDER BY
194 | total_purchase_quantity DESC
195 | ````
196 |
197 | | customer_id | product_name | total_purchase_quantity |
198 | | ----------- | ------------ | ----------------------- |
199 | | C | ramen | 3 |
200 | | A | ramen | 3 |
201 | | B | curry | 2 |
202 | | B | sushi | 2 |
203 | | B | ramen | 2 |
204 | | A | curry | 2 |
205 | | A | sushi | 1 |
206 |
207 | ---
208 |
209 | Now we can select the most popular products for each customer using the `rank()` window function:
210 |
211 | ````sql
212 | SET
213 | search_path = dannys_diner;
214 | WITH ranked AS (
215 | SELECT
216 | customer_id,
217 | product_name,
218 | COUNT(product_name) AS total_purchase_quantity,
219 | rank() OVER (
220 | PARTITION BY customer_id
221 | ORDER BY
222 | COUNT(product_name) desc
223 | ) AS rank
224 | FROM
225 | sales AS s
226 | JOIN menu AS m ON s.product_id = m.product_id
227 | GROUP BY
228 | customer_id,
229 | product_name
230 | )
231 | SELECT
232 | customer_id,
233 | product_name,
234 | total_purchase_quantity
235 | FROM
236 | ranked
237 | WHERE
238 | rank = 1
239 | ````
240 |
241 | | customer_id | product_name | total_purchase_quantity |
242 | | ----------- | ------------ | ----------------------- |
243 | | A | ramen | 3 |
244 | | B | ramen | 2 |
245 | | B | curry | 2 |
246 | | B | sushi | 2 |
247 | | C | ramen | 3 |
248 |
249 | ---
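
Note that `rank()` - unlike `row_number()` - assigns the same rank to tied counts, which is why all three of customer B's equally popular items survive the `rank = 1` filter.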
250 |
251 | ***The most popular item for customer A was :ramen:, they purchased it 3 times.***
252 |
253 | ***The most popular items for customer B were :curry:, :ramen: and :sushi: - they purchased each dish 2 times.***
254 |
255 | ***The most popular item for customer C was :ramen:, they purchased it 3 times.***
256 |
257 | #### 6. Which item was purchased first by the customer after they became a member?
258 |
259 | Let's consider that if the purchase date matches the membership date, then the purchase made on that date was the customer's first purchase as a member.
260 | It means that we need to include this date in the `WHERE` clause.
261 |
262 | ````sql
263 | SET
264 | search_path = dannys_diner;
265 | WITH ranked AS (
266 | SELECT
267 | s.customer_id,
268 | order_date,
269 | join_date,
270 | product_name,
271 | row_number() OVER (
272 | PARTITION BY s.customer_id
273 | ORDER BY
274 | order_date
275 | ) AS rank
276 | FROM
277 | sales AS s
278 | JOIN members AS mm ON s.customer_id = mm.customer_id
279 | JOIN menu AS m ON s.product_id = m.product_id
280 | WHERE
281 | order_date >= join_date
282 | )
283 | SELECT
284 | customer_id,
285 | join_date::varchar,
286 | order_date::varchar,
287 | product_name
288 | FROM
289 | ranked AS r
290 | WHERE
291 | rank = 1
292 | ORDER BY
293 | 1
294 | ````
295 |
296 | | customer_id | join_date | order_date | product_name |
297 | | ----------- | ---------- | ---------- | ------------ |
298 | | A | 2021-01-07 | 2021-01-07 | curry |
299 | | B | 2021-01-09 | 2021-01-11 | sushi |
300 |
301 | ---
302 |
303 | #### 7. Which item was purchased just before the customer became a member?
304 |
305 | Customer A became a member on January 7 - and they placed an order that day.
306 | We do not have the time of day and therefore cannot say exactly whether this purchase was made before or after they became a member.
307 | Let's consider that if the purchase date matches the membership date, then the purchase made on that date was the customer's first purchase as a member.
308 | It means that we need to exclude this date in the `WHERE` clause.
309 |
310 | ````sql
311 | SET
312 | search_path = dannys_diner;
313 | WITH ranked AS (
314 | SELECT
315 | s.customer_id,
316 | order_date,
317 | join_date,
318 | product_name,
319 | rank() OVER (
320 | PARTITION BY s.customer_id
321 | ORDER BY
322 | order_date DESC
323 | ) AS rank
324 | FROM
325 | sales AS s
326 | JOIN members AS mm ON s.customer_id = mm.customer_id
327 | JOIN menu AS m ON s.product_id = m.product_id
328 | WHERE
329 | order_date < join_date
330 | )
331 | SELECT
332 | customer_id,
333 | join_date::varchar,
334 | order_date::varchar,
335 | product_name
336 | FROM
337 | ranked AS r
338 | WHERE
339 | rank = 1
340 | ORDER BY
341 | 1
342 | ````
343 |
344 | | customer_id | join_date | order_date | product_name |
345 | | ----------- | ---------- | ---------- | ------------ |
346 | | A | 2021-01-07 | 2021-01-01 | sushi |
347 | | A | 2021-01-07 | 2021-01-01 | curry |
348 | | B | 2021-01-09 | 2021-01-04 | sushi |
349 |
350 | ---
351 |
352 | Customer A purchased two items on January 1 - the day before they became a member.
353 | We need more information - an order number or a purchase time - to tell exactly which item was purchased last before they became a member. I am keeping both items in the list for now.
354 |
355 | ***Customer A purchased :curry: and :sushi: on 2021-01-01***
356 |
357 | ***Customer B purchased :sushi: on 2021-01-04***
358 |
359 | #### 8. What is the total items and amount spent for each member before they became a member?
360 |
361 | Let's consider that if the purchase date matches the membership date, then the purchase made on that date was the customer's first purchase as a member.
362 | It means that we need to exclude this date in the `WHERE` clause.
363 |
364 | ````sql
365 | SET
366 | search_path = dannys_diner;
367 | SELECT
368 | s.customer_id,
369 | COUNT(product_name) AS total_number_of_items,
370 | SUM(price) AS total_purchase_amount
371 | FROM
372 | sales AS s
373 | JOIN members AS mm ON s.customer_id = mm.customer_id
374 | JOIN menu AS m ON s.product_id = m.product_id
375 | WHERE
376 | order_date < join_date
377 | GROUP BY
378 | 1
379 | ORDER BY
380 | 1
381 | ````
382 |
383 | | customer_id | total_number_of_items | total_purchase_amount |
384 | | ----------- | --------------------- | --------------------- |
385 | | A | 2 | 25 |
386 | | B | 3 | 40 |
387 |
388 | ---
389 |
390 | #### 9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
391 |
392 | ````sql
393 | SET
394 | search_path = dannys_diner;
395 | SELECT
396 | customer_id,
397 | SUM(point) AS points
398 | FROM
399 | sales AS s
400 | JOIN (
401 | SELECT
402 | product_id,
403 | CASE
404 | WHEN product_id = 1 THEN price * 20
405 | ELSE price * 10
406 | END AS point
407 | FROM
408 | menu
409 | ) AS p ON s.product_id = p.product_id
410 | GROUP BY
411 | 1
412 | ORDER BY
413 | 1
414 | ````
415 |
416 | | customer_id | points |
417 | | ----------- | ------ |
418 | | A | 860 |
419 | | B | 940 |
420 | | C | 360 |
421 |
422 | ---
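
As a quick hand check for customer A: sushi 1 × $10 × 20 + curry 2 × $15 × 10 + ramen 3 × $12 × 10 = 200 + 300 + 360 = 860 points.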
423 |
424 | #### 10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?
425 |
426 | First we need to count points as usual: 10 points for each dollar spent on :curry: and :ramen: and 20 points for each dollar spent on :sushi:.
427 | We add this calculation to a CTE using a `WITH` statement. Next we use this CTE to double the points - an extra 10 points per dollar - for all purchases of :curry: and :ramen: made during the first week of membership, and return the sum of the new points. The points for :sushi: remain the same - 20 points per dollar.
428 |
429 | ````sql
430 | SET
431 | search_path = dannys_diner;
432 | WITH count_points AS (
433 | SELECT
434 | s.customer_id,
435 | order_date,
436 | join_date,
437 | product_name,
438 | SUM(point) AS point
439 | FROM
440 | sales AS s
441 | JOIN (
442 | SELECT
443 | product_id,
444 | product_name,
445 | CASE
446 | WHEN product_name = 'sushi' THEN price * 20
447 | ELSE price * 10
448 | END AS point
449 | FROM
450 | menu AS m
451 | ) AS p ON s.product_id = p.product_id
452 | JOIN members AS mm ON s.customer_id = mm.customer_id
453 | GROUP BY
454 | s.customer_id,
455 | order_date,
456 | join_date,
457 | product_name,
458 | point
459 | )
460 | SELECT
461 | customer_id,
462 | SUM(
463 | CASE
464 | WHEN order_date >= join_date
465 | AND order_date < join_date + (7 * INTERVAL '1 day')
466 | AND product_name != 'sushi' THEN point * 2
467 | ELSE point
468 | END
469 | ) AS new_points
470 | FROM
471 | count_points
472 | WHERE
473 | DATE_PART('month', order_date) = 1
474 | GROUP BY
475 | 1
476 | ORDER BY
477 | 1
478 | ````
479 |
480 | | customer_id | new_points |
481 | | ----------- | ---------- |
482 | | A | 1370 |
483 | | B | 820 |
484 |
485 | ---
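
A quick hand check for customer A: the two pre-membership items on January 1 earn 200 (:sushi:) + 150 (:curry:) = 350 points; the first-week purchases earn double - :curry: 15 × 10 × 2 = 300 and three :ramen: 3 × 12 × 10 × 2 = 720 - giving 350 + 300 + 720 = 1370 points.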
486 |
487 | ***Customer A at the end of January would have 1370 points***
488 |
489 | ***Customer B at the end of January would have 820 points*** - with no extra benefit from their first membership week, as their only purchase in that week was :sushi:, which already earns double points
490 |
491 | ## Bonus Questions
492 |
493 | ### Join All The Things
494 |
495 | ````sql
496 | SET
497 | search_path = dannys_diner;
498 | WITH members AS (
499 | SELECT
500 | s.customer_id,
501 | order_date,
502 | product_name,
503 | price,
504 | join_date
505 | FROM
506 | sales AS s
507 | JOIN menu AS m ON s.product_id = m.product_id
508 | LEFT JOIN members AS mm ON s.customer_id = mm.customer_id
509 | )
510 | SELECT
511 | customer_id,
512 | order_date::varchar,
513 | product_name,
514 | price,
515 | CASE
516 | WHEN order_date >= join_date THEN 'Y'
517 | ELSE 'N'
518 | END AS member
519 | FROM
520 | members
521 | ORDER BY
522 | 1,
523 | 2,
524 | 3
525 | ````
526 |
527 | | customer_id | order_date | product_name | price | member |
528 | | ----------- | ---------- | ------------ | ----- | ------ |
529 | | A | 2021-01-01 | curry | 15 | N |
530 | | A | 2021-01-01 | sushi | 10 | N |
531 | | A | 2021-01-07 | curry | 15 | Y |
532 | | A | 2021-01-10 | ramen | 12 | Y |
533 | | A | 2021-01-11 | ramen | 12 | Y |
534 | | A | 2021-01-11 | ramen | 12 | Y |
535 | | B | 2021-01-01 | curry | 15 | N |
536 | | B | 2021-01-02 | curry | 15 | N |
537 | | B | 2021-01-04 | sushi | 10 | N |
538 | | B | 2021-01-11 | sushi | 10 | Y |
539 | | B | 2021-01-16 | ramen | 12 | Y |
540 | | B | 2021-02-01 | ramen | 12 | Y |
541 | | C | 2021-01-01 | ramen | 12 | N |
542 | | C | 2021-01-01 | ramen | 12 | N |
543 | | C | 2021-01-07 | ramen | 12 | N |
544 |
545 | ---
546 |
547 | ### Rank All The Things
548 |
549 | First we need to select all the necessary columns from the `sales`, `menu` and `members` tables - we do that with a CTE using a `WITH` statement.
550 | Next we can rank the orders from this table, partitioning by the `customer_id` and `member` columns.
551 |
552 | ````sql
553 | SET
554 | search_path = dannys_diner;
555 | WITH members AS (
556 | SELECT
557 | s.customer_id,
558 | order_date::varchar,
559 | product_name,
560 | price,
561 | CASE
562 | WHEN order_date >= join_date THEN 'Y'
563 | ELSE 'N'
564 | END AS member
565 | FROM
566 | sales AS s
567 | JOIN menu AS m ON s.product_id = m.product_id
568 | LEFT JOIN members AS mm ON s.customer_id = mm.customer_id
569 | )
570 | SELECT
571 | *,
572 | CASE
573 | WHEN member = 'Y' THEN rank() OVER (
574 | PARTITION BY customer_id,
575 | member
576 | ORDER BY
577 | order_date
578 | )
579 | END AS ranking
580 | FROM
581 | members
582 | ORDER BY
583 | customer_id,
584 | order_date,
585 | product_name
586 | ````
587 |
588 | | customer_id | order_date | product_name | price | member | ranking |
589 | | ----------- | ---------- | ------------ | ----- | ------ | ------- |
590 | | A | 2021-01-01 | curry | 15 | N | |
591 | | A | 2021-01-01 | sushi | 10 | N | |
592 | | A | 2021-01-07 | curry | 15 | Y | 1 |
593 | | A | 2021-01-10 | ramen | 12 | Y | 2 |
594 | | A | 2021-01-11 | ramen | 12 | Y | 3 |
595 | | A | 2021-01-11 | ramen | 12 | Y | 3 |
596 | | B | 2021-01-01 | curry | 15 | N | |
597 | | B | 2021-01-02 | curry | 15 | N | |
598 | | B | 2021-01-04 | sushi | 10 | N | |
599 | | B | 2021-01-11 | sushi | 10 | Y | 1 |
600 | | B | 2021-01-16 | ramen | 12 | Y | 2 |
601 | | B | 2021-02-01 | ramen | 12 | Y | 3 |
602 | | C | 2021-01-01 | ramen | 12 | N | |
603 | | C | 2021-01-01 | ramen | 12 | N | |
604 | | C | 2021-01-07 | ramen | 12 | N | |
605 |
606 | ---
607 |
--------------------------------------------------------------------------------
/Case Study #2 - Pizza Runner/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #2 - Pizza Runner :pizza:
2 |
3 |
4 |
5 | Danny requires further assistance to clean his data and apply some basic calculations so he can better direct his runners and optimise Pizza Runner’s operations.
6 |
7 | All datasets exist within the `pizza_runner` database schema.
8 |
9 | ## Available Data
10 |
11 | ### Table 1: `runners`
12 |
13 | The `runners` table shows the `registration_date` for each new runner
14 |
15 | | runner_id | registration_date |
16 | |-----------|-------------------|
17 | | 1 | 2021-01-01 |
18 | | 2 | 2021-01-03 |
19 | | 3 | 2021-01-08 |
20 | | 4 | 2021-01-15 |
21 |
22 | ### Table 2: `customer_orders`
23 |
24 | Customer pizza orders are captured in the `customer_orders` table with 1 row for each individual pizza that is part of the order.
25 |
26 | The `pizza_id` relates to the type of pizza which was ordered whilst the `exclusions` are the `ingredient_id` values which should be removed from the pizza and the `extras` are the `ingredient_id` values which need to be added to the pizza.
27 |
28 | Note that customers can order multiple pizzas in a single order with varying exclusions and extras values even if the pizza is the same type!
29 |
30 | The `exclusions` and `extras` columns will need to be cleaned up before using them in your queries.
31 |
32 | | order_id | customer_id | pizza_id | exclusions | extras | order_time |
33 | |----------|-------------|----------|------------|--------|---------------------|
34 | | 1 | 101 | 1 | | | 2021-01-01 18:05:02 |
35 | | 2 | 101 | 1 | | | 2021-01-01 19:00:52 |
36 | | 3 | 102 | 1 | | | 2021-01-02 23:51:23 |
37 | | 3 | 102 | 2 | | NaN | 2021-01-02 23:51:23 |
38 | | 4 | 103 | 1 | 4 | | 2021-01-04 13:23:46 |
39 | | 4 | 103 | 1 | 4 | | 2021-01-04 13:23:46 |
40 | | 4 | 103 | 2 | 4 | | 2021-01-04 13:23:46 |
41 | | 5 | 104 | 1 | null | 1 | 2021-01-08 21:00:29 |
42 | | 6 | 101 | 2 | null | null | 2021-01-08 21:03:13 |
43 | | 7 | 105 | 2 | null | 1 | 2021-01-08 21:20:29 |
44 | | 8 | 102 | 1 | null | null | 2021-01-09 23:54:33 |
45 | | 9 | 103 | 1 | 4 | 1, 5 | 2021-01-10 11:22:59 |
46 | | 10 | 104 | 1 | null | null | 2021-01-11 18:34:49 |
47 | | 10 | 104 | 1 | 2, 6 | 1, 4 | 2021-01-11 18:34:49 |
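
One possible way to do the cleanup mentioned above - a sketch only, with `customer_orders_clean` as an illustrative name:

````sql
SET search_path = pizza_runner;

-- Normalise both '' and the literal string 'null' to real NULLs so the
-- exclusions and extras columns can be compared and split reliably.
CREATE TEMP TABLE customer_orders_clean AS
SELECT
  order_id,
  customer_id,
  pizza_id,
  NULLIF(NULLIF(exclusions, 'null'), '') AS exclusions,
  NULLIF(NULLIF(extras, 'null'), '') AS extras,
  order_time
FROM
  customer_orders;
````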
48 |
49 | ### Table 3: `runner_orders`
50 |
51 | After each order is received through the system - it is assigned to a runner - however not all orders are fully completed and they can be cancelled by the restaurant or the customer.
52 |
53 | The `pickup_time` is the timestamp at which the runner arrives at the Pizza Runner headquarters to pick up the freshly cooked pizzas. The `distance` and `duration` fields are related to how far and long the runner had to travel to deliver the order to the respective customer.
54 |
55 | There are some known data issues with this table so be careful when using this in your queries - make sure to check the data types for each column in the schema SQL!
56 |
57 | | order_id | runner_id | pickup_time | distance | duration | cancellation |
58 | |----------|-----------|---------------------|----------|------------|-------------------------|
59 | | 1 | 1 | 2021-01-01 18:15:34 | 20km | 32 minutes | |
60 | | 2 | 1 | 2021-01-01 19:10:54 | 20km | 27 minutes | |
61 | | 3 | 1 | 2021-01-03 00:12:37 | 13.4km | 20 mins | NaN |
62 | | 4 | 2 | 2021-01-04 13:53:03 | 23.4 | 40 | NaN |
63 | | 5 | 3 | 2021-01-08 21:10:57 | 10 | 15 | NaN |
64 | | 6 | 3 | null | null | null | Restaurant Cancellation |
65 | | 7 | 2 | 2020-01-08 21:30:45 | 25km | 25mins | null |
66 | | 8 | 2 | 2020-01-10 00:15:02 | 23.4 km | 15 minute | null |
67 | | 9 | 2 | null | null | null | Customer Cancellation |
68 | | 10 | 1 | 2020-01-11 18:50:20 | 10km | 10minutes | null |
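
Given those data issues, one possible cleanup - again a sketch, with `runner_orders_clean` as an illustrative name:

````sql
SET search_path = pizza_runner;

-- Normalise 'null' strings and '' to real NULLs, and strip unit suffixes
-- so distance and duration become numeric columns.
CREATE TEMP TABLE runner_orders_clean AS
SELECT
  order_id,
  runner_id,
  NULLIF(NULLIF(pickup_time, 'null'), '')::timestamp AS pickup_time,
  NULLIF(regexp_replace(distance, '[^0-9.]', '', 'g'), '')::numeric AS distance_km,
  NULLIF(regexp_replace(duration, '[^0-9]', '', 'g'), '')::integer AS duration_mins,
  NULLIF(NULLIF(cancellation, 'null'), '') AS cancellation
FROM
  runner_orders;
````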
69 |
70 | ### Table 4: `pizza_names`
71 |
72 | At the moment - Pizza Runner only has 2 pizzas available - the Meat Lovers or Vegetarian!
73 |
74 | | pizza_id | pizza_name |
75 | |----------|-------------|
76 | | 1 | Meat Lovers |
77 | | 2 | Vegetarian |
78 |
79 | ### Table 5: `pizza_recipes`
80 |
81 | Each `pizza_id` has a standard set of toppings which are used as part of the pizza recipe.
82 |
83 | | pizza_id | toppings |
84 | |----------|------------------------ |
85 | | 1 | 1, 2, 3, 4, 5, 6, 8, 10 |
86 | | 2 | 4, 6, 7, 9, 11, 12 |
87 |
88 | ### Table 6: `pizza_toppings`
89 |
90 | This table contains all of the `topping_name` values with their corresponding `topping_id` value
91 |
92 | | topping_id | topping_name |
93 | |------------|--------------|
94 | | 1 | Bacon |
95 | | 2 | BBQ Sauce |
96 | | 3 | Beef |
97 | | 4 | Cheese |
98 | | 5 | Chicken |
99 | | 6 | Mushrooms |
100 | | 7 | Onions |
101 | | 8 | Pepperoni |
102 | | 9 | Peppers |
103 | | 10 | Salami |
104 | | 11 | Tomatoes |
105 | | 12 | Tomato Sauce |
106 |
107 | ## Entity Relationship Diagram
108 |
109 | 
110 |
111 | ## Table of Contents
112 |
113 | [Introduction](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#introduction)
114 |
115 | [Case Study Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#case-study-questions)
116 |
117 | [A. Pizza Metrics](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#a-pizza-metrics)
118 |
119 | [1. How many pizzas were ordered?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#1-how-many-pizzas-were-ordered)
120 |
121 | [2. How many unique customer orders were made?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#2-how-many-unique-customer-orders-were-made)
122 |
123 | [3. How many successful orders were delivered by each runner?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#3-how-many-successful-orders-were-delivered-by-each-runner)
124 |
125 | [4. How many of each type of pizza was delivered?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#4-how-many-of-each-type-of-pizza-was-delivered)
126 |
127 | [5. How many Vegetarian and Meatlovers were ordered by each customer?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#5-how-many-vegetarian-and-meatlovers-were-ordered-by-each-customer)
128 |
129 | [6. What was the maximum number of pizzas delivered in a single order?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#6-what-was-the-maximum-number-of-pizzas-delivered-in-a-single-order)
130 |
131 | [7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#7-for-each-customer-how-many-delivered-pizzas-had-at-least-1-change-and-how-many-had-no-changes)
132 |
133 | [8. How many pizzas were delivered that had both exclusions and extras?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#8-how-many-pizzas-were-delivered-that-had-both-exclusions-and-extras)
134 |
135 | [9. What was the total volume of pizzas ordered for each hour of the day?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#9-what-was-the-total-volume-of-pizzas-ordered-for-each-hour-of-the-day)
136 |
137 | [10. What was the volume of orders for each day of the week?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#10-what-was-the-volume-of-orders-for-each-day-of-the-week)
138 |
139 | [B. Runner and Customer Experience](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#b-runner-and-customer-experience)
140 |
141 | [1. How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#1-how-many-runners-signed-up-for-each-1-week-period-ie-week-starts-2021-01-01)
142 |
143 | [2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#2-what-was-the-average-time-in-minutes-it-took-for-each-runner-to-arrive-at-the-pizza-runner-hq-to-pickup-the-order)
144 |
145 | [3. Is there any relationship between the number of pizzas and how long the order takes to prepare?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#3-is-there-any-relationship-between-the-number-of-pizzas-and-how-long-the-order-takes-to-prepare)
146 |
147 | [4. What was the average distance travelled for each customer?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#4-what-was-the-average-distance-travelled-for-each-customer)
148 |
149 | [5. What was the difference between the longest and shortest delivery times for all orders?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#5-what-was-the-difference-between-the-longest-and-shortest-delivery-times-for-all-orders)
150 |
151 | [6. What was the average speed for each runner for each delivery and do you notice any trend for these values?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#6-what-was-the-average-speed-for-each-runner-for-each-delivery-and-do-you-notice-any-trend-for-these-values)
152 |
153 | [7. What is the successful delivery percentage for each runner?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#7-what-is-the-successful-delivery-percentage-for-each-runner)
154 |
155 | [C. Ingredient Optimisation](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#c-ingredient-optimisation)
156 |
157 | [1. What are the standard ingredients for each pizza?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#1-what-are-the-standard-ingredients-for-each-pizza)
158 |
159 | [2. What was the most commonly added extra?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#2-what-was-the-most-commonly-added-extra)
160 |
161 | [3. What was the most common exclusion?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#3-what-was-the-most-common-exclusion)
162 |
163 | [4. Generate an order item for each record in the customers_orders table in the format of one of the following:](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#4-generate-an-order-item-for-each-record-in-the-customers_orders-table-in-the-format-of-one-of-the-following)
164 |
165 | [5. Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a 2x in front of any relevant ingredients](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#5-generate-an-alphabetically-ordered-comma-separated-ingredient-list-for-each-pizza-order-from-the-customer_orders-table-and-add-a-2x-in-front-of-any-relevant-ingredients)
166 |
167 | [6. What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#6-what-is-the-total-quantity-of-each-ingredient-used-in-all-delivered-pizzas-sorted-by-most-frequent-first)
168 |
169 | [D. Pricing and Ratings](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#d-pricing-and-ratings)
170 |
171 | [1. If a Meat Lovers pizza costs $12 and Vegetarian costs $10 and there were no charges for changes - how much money has Pizza Runner made so far if there are no delivery fees?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#1-if-a-meat-lovers-pizza-costs-12-and-vegetarian-costs-10-and-there-were-no-charges-for-changes---how-much-money-has-pizza-runner-made-so-far-if-there-are-no-delivery-fees)
172 |
173 | [2. What if there was an additional $1 charge for any pizza extras?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#2-what-if-there-was-an-additional-1-charge-for-any-pizza-extras)
174 |
175 | [- Add cheese is $1 extra](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#--add-cheese-is-1-extra)
176 |
177 | [3. The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset - generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5.](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#3-the-pizza-runner-team-now-wants-to-add-an-additional-ratings-system-that-allows-customers-to-rate-their-runner-how-would-you-design-an-additional-table-for-this-new-dataset---generate-a-schema-for-this-new-table-and-insert-your-own-data-for-ratings-for-each-successful-customer-order-between-1-to-5)
178 |
179 | [4. Using your newly generated table - can you join all of the information together to form a table which has the following information for successful deliveries?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#4-using-your-newly-generated-table---can-you-join-all-of-the-information-together-to-form-a-table-which-has-the-following-information-for-successful-deliveries)
180 |
181 | [5. If a Meat Lovers pizza was $12 and Vegetarian $10 fixed prices with no cost for extras and each runner is paid $0.30 per kilometre traveled - how much money does Pizza Runner have left over after these deliveries?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#5-if-a-meat-lovers-pizza-was-12-and-vegetarian-10-fixed-prices-with-no-cost-for-extras-and-each-runner-is-paid-030-per-kilometre-traveled---how-much-money-does-pizza-runner-have-left-over-after-these-deliveries)
182 |
183 | [E. Bonus Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#e-bonus-questions)
184 |
185 | [If Danny wants to expand his range of pizzas - how would this impact the existing data design? Write an INSERT statement to demonstrate what would happen if a new Supreme pizza with all the toppings was added to the Pizza Runner menu?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#if-danny-wants-to-expand-his-range-of-pizzas---how-would-this-impact-the-existing-data-design-write-an-insert-statement-to-demonstrate-what-would-happen-if-a-new-supreme-pizza-with-all-the-toppings-was-added-to-the-pizza-runner-menu)
186 |
--------------------------------------------------------------------------------
/Case Study #2 - Pizza Runner/SQL/pizza_runner.sql:
--------------------------------------------------------------------------------
1 | CREATE SCHEMA pizza_runner;
2 | SET search_path = pizza_runner;
3 |
4 | DROP TABLE IF EXISTS runners;
5 | CREATE TABLE runners (
6 | "runner_id" INTEGER,
7 | "registration_date" DATE
8 | );
9 | INSERT INTO runners
10 | ("runner_id", "registration_date")
11 | VALUES
12 | (1, '2021-01-01'),
13 | (2, '2021-01-03'),
14 | (3, '2021-01-08'),
15 | (4, '2021-01-15');
16 |
17 |
18 | DROP TABLE IF EXISTS customer_orders;
19 | CREATE TABLE customer_orders (
20 | "order_id" INTEGER,
21 | "customer_id" INTEGER,
22 | "pizza_id" INTEGER,
23 | "exclusions" VARCHAR(4),
24 | "extras" VARCHAR(4),
25 | "order_time" TIMESTAMP
26 | );
27 |
28 | INSERT INTO customer_orders
29 | ("order_id", "customer_id", "pizza_id", "exclusions", "extras", "order_time")
30 | VALUES
31 | ('1', '101', '1', '', '', '2020-01-01 18:05:02'),
32 | ('2', '101', '1', '', '', '2020-01-01 19:00:52'),
33 | ('3', '102', '1', '', '', '2020-01-02 23:51:23'),
34 | ('3', '102', '2', '', NULL, '2020-01-02 23:51:23'),
35 | ('4', '103', '1', '4', '', '2020-01-04 13:23:46'),
36 | ('4', '103', '1', '4', '', '2020-01-04 13:23:46'),
37 | ('4', '103', '2', '4', '', '2020-01-04 13:23:46'),
38 | ('5', '104', '1', 'null', '1', '2020-01-08 21:00:29'),
39 | ('6', '101', '2', 'null', 'null', '2020-01-08 21:03:13'),
40 | ('7', '105', '2', 'null', '1', '2020-01-08 21:20:29'),
41 | ('8', '102', '1', 'null', 'null', '2020-01-09 23:54:33'),
42 | ('9', '103', '1', '4', '1, 5', '2020-01-10 11:22:59'),
43 | ('10', '104', '1', 'null', 'null', '2020-01-11 18:34:49'),
44 | ('10', '104', '1', '2, 6', '1, 4', '2020-01-11 18:34:49');
45 |
46 |
47 | DROP TABLE IF EXISTS runner_orders;
48 | CREATE TABLE runner_orders (
49 | "order_id" INTEGER,
50 | "runner_id" INTEGER,
51 | "pickup_time" VARCHAR(19),
52 | "distance" VARCHAR(7),
53 | "duration" VARCHAR(10),
54 | "cancellation" VARCHAR(23)
55 | );
56 |
57 | INSERT INTO runner_orders
58 | ("order_id", "runner_id", "pickup_time", "distance", "duration", "cancellation")
59 | VALUES
60 | ('1', '1', '2020-01-01 18:15:34', '20km', '32 minutes', ''),
61 | ('2', '1', '2020-01-01 19:10:54', '20km', '27 minutes', ''),
62 | ('3', '1', '2020-01-03 00:12:37', '13.4km', '20 mins', NULL),
63 | ('4', '2', '2020-01-04 13:53:03', '23.4', '40', NULL),
64 | ('5', '3', '2020-01-08 21:10:57', '10', '15', NULL),
65 | ('6', '3', 'null', 'null', 'null', 'Restaurant Cancellation'),
66 | ('7', '2', '2020-01-08 21:30:45', '25km', '25mins', 'null'),
67 | ('8', '2', '2020-01-10 00:15:02', '23.4 km', '15 minute', 'null'),
68 | ('9', '2', 'null', 'null', 'null', 'Customer Cancellation'),
69 | ('10', '1', '2020-01-11 18:50:20', '10km', '10minutes', 'null');
70 |
71 |
72 | DROP TABLE IF EXISTS pizza_names;
73 | CREATE TABLE pizza_names (
74 | "pizza_id" INTEGER,
75 | "pizza_name" TEXT
76 | );
77 | INSERT INTO pizza_names
78 | ("pizza_id", "pizza_name")
79 | VALUES
80 | (1, 'Meatlovers'),
81 | (2, 'Vegetarian');
82 |
83 |
84 | DROP TABLE IF EXISTS pizza_recipes;
85 | CREATE TABLE pizza_recipes (
86 | "pizza_id" INTEGER,
87 | "toppings" TEXT
88 | );
89 | INSERT INTO pizza_recipes
90 | ("pizza_id", "toppings")
91 | VALUES
92 | (1, '1, 2, 3, 4, 5, 6, 8, 10'),
93 | (2, '4, 6, 7, 9, 11, 12');
94 |
95 |
96 | DROP TABLE IF EXISTS pizza_toppings;
97 | CREATE TABLE pizza_toppings (
98 | "topping_id" INTEGER,
99 | "topping_name" TEXT
100 | );
101 | INSERT INTO pizza_toppings
102 | ("topping_id", "topping_name")
103 | VALUES
104 | (1, 'Bacon'),
105 | (2, 'BBQ Sauce'),
106 | (3, 'Beef'),
107 | (4, 'Cheese'),
108 | (5, 'Chicken'),
109 | (6, 'Mushrooms'),
110 | (7, 'Onions'),
111 | (8, 'Pepperoni'),
112 | (9, 'Peppers'),
113 | (10, 'Salami'),
114 | (11, 'Tomatoes'),
115 | (12, 'Tomato Sauce');
--------------------------------------------------------------------------------
/Case Study #3 - Foodie-Fi/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #3 - Foodie-Fi :avocado:
2 |
3 |
4 |
5 | Danny has shared the data design for Foodie-Fi and also short descriptions on each of the database tables - our case study focuses on only 2 tables but there will be a challenge to create a new table for the Foodie-Fi team.
6 |
7 | All datasets exist within the `foodie_fi` database schema.
8 |
9 | ## Available Data
10 |
11 | ### Table 1: `plans`
12 |
13 | Customers can choose which plan to join Foodie-Fi with when they first sign up.
14 |
15 | Basic plan customers have limited access and can only stream their videos; the basic plan is only available monthly at $9.90.
16 |
17 | Pro plan customers have no watch time limits and are able to download videos for offline viewing. Pro plans start at $19.90 a month or $199 for an annual subscription.
18 |
19 | Customers can sign up for an initial 7 day free trial, which will automatically continue with the pro monthly subscription plan unless they cancel, downgrade to basic or upgrade to an annual pro plan at any point during the trial.
20 |
21 | When customers cancel their Foodie-Fi service - they will have a churn plan record with a null price but their plan will continue until the end of the billing period.
22 |
23 | | plan_id | plan_name | price |
24 | |---------|---------------|-------|
25 | | 0 | trial | 0 |
26 | | 1 | basic monthly | 9.90 |
27 | | 2 | pro monthly | 19.90 |
28 | | 3 | pro annual | 199 |
29 | | 4 | churn | null |
30 |
31 | ### Table 2: `subscriptions`
32 |
33 | Customer subscriptions show the exact date when their specific `plan_id` starts.
34 |
35 | If customers downgrade from a pro plan or cancel their subscription - the higher plan will remain in place until the period is over - the `start_date` in the `subscriptions` table will reflect the date that the actual plan changes.
36 |
37 | When customers upgrade their account from a basic plan to a pro or annual pro plan - the higher plan will take effect straightaway.
38 |
39 | When customers churn - they will keep their access until the end of their current billing period but the `start_date` will technically be the day they decided to cancel their service.
40 |
41 | | customer_id | plan_id | start_date |
42 | |-------------|---------|------------|
43 | | 1 | 0 | 2020-08-01 |
44 | | 1 | 1 | 2020-08-08 |
45 | | 2 | 0 | 2020-09-20 |
46 | | 2 | 3 | 2020-09-27 |
47 | | 11 | 0 | 2020-11-19 |
48 | | 11 | 4 | 2020-11-26 |
49 | | 13 | 0 | 2020-12-15 |
50 | | 13 | 1 | 2020-12-22 |
51 | | 13 | 2 | 2021-03-29 |
52 | | 15 | 0 | 2020-03-17 |
53 | | 15 | 2 | 2020-03-24 |
54 | | 15 | 4 | 2020-04-29 |
55 | | 16 | 0 | 2020-05-31 |
56 | | 16 | 1 | 2020-06-07 |
57 | | 16 | 3 | 2020-10-21 |
58 | | 18 | 0 | 2020-07-06 |
59 | | 18 | 2 | 2020-07-13 |
60 | | 19 | 0 | 2020-06-22 |
61 | | 19 | 2 | 2020-06-29 |
62 | | 19 | 3 | 2020-08-29 |
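
Reading a customer's journey is easiest with the plan names attached - a minimal sketch joining the two tables:

````sql
SET search_path = foodie_fi;

-- List each customer's plan changes in chronological order.
SELECT
  s.customer_id,
  p.plan_name,
  s.start_date
FROM
  subscriptions AS s
  JOIN plans AS p ON s.plan_id = p.plan_id
ORDER BY
  s.customer_id,
  s.start_date;
````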
63 |
64 | ## Entity Relationship Diagram
65 |
66 | 
67 |
68 | ## Table of Contents
69 |
70 | [Introduction](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#introduction)
71 |
72 | [Case Study Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#case-study-questions)
73 |
74 | [A. Customer Journey](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#a-customer-journey)
75 |
76 | [1. Based off the 8 sample customers provided in the sample from the subscriptions table, write a brief description about each customer’s onboarding journey.](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#1-based-off-the-8-sample-customers-provided-in-the-sample-from-the-subscriptions-table-write-a-brief-description-about-each-customers-onboarding-journey)
77 |
78 | [Try to keep it as short as possible - you may also want to run some sort of join to make your explanations a bit easier!](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#try-to-keep-it-as-short-as-possible---you-may-also-want-to-run-some-sort-of-join-to-make-your-explanations-a-bit-easier)
79 |
80 | [B. Data Analysis Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#b-data-analysis-questions)
81 |
82 | [1. How many customers has Foodie-Fi ever had?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#1-how-many-customers-has-foodie-fi-ever-had)
83 |
84 | [2. What is the monthly distribution of trial plan start_date values for our dataset - use the start of the month as the group by value](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#2-what-is-the-monthly-distribution-of-trial-plan-start_date-values-for-our-dataset---use-the-start-of-the-month-as-the-group-by-value)
85 |
86 | [3. What plan start_date values occur after the year 2020 for our dataset? Show the breakdown by count of events for each plan_name](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#3-what-plan-start_date-values-occur-after-the-year-2020-for-our-dataset-show-the-breakdown-by-count-of-events-for-each-plan_name)
87 |
88 | [4. What is the customer count and percentage of customers who have churned rounded to 1 decimal place?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#4-what-is-the-customer-count-and-percentage-of-customers-who-have-churned-rounded-to-1-decimal-place)
89 |
90 | [5. How many customers have churned straight after their initial free trial - what percentage is this rounded to the nearest whole number?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#5-how-many-customers-have-churned-straight-after-their-initial-free-trial---what-percentage-is-this-rounded-to-the-nearest-whole-number)
91 |
92 | [6. What is the number and percentage of customer plans after their initial free trial?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#6-what-is-the-number-and-percentage-of-customer-plans-after-their-initial-free-trial)
93 |
94 | [7. What is the customer count and percentage breakdown of all 5 plan_name values at 2020-12-31?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#7-what-is-the-customer-count-and-percentage-breakdown-of-all-5-plan_name-values-at-2020-12-31)
95 |
96 | [8. How many customers have upgraded to an annual plan in 2020?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#8-how-many-customers-have-upgraded-to-an-annual-plan-in-2020)
97 |
98 | [9. How many days on average does it take for a customer to upgrade to an annual plan from the day they join Foodie-Fi?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#9-how-many-days-on-average-does-it-take-for-a-customer-to-an-annual-plan-from-the-day-they-join-foodie-fi)
99 |
100 | [10. Can you further break down this average value into 30 day periods (i.e. 0-30 days, 31-60 days etc)](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#10-can-you-further-breakdown-this-average-value-into-30-day-periods-ie-0-30-days-31-60-days-etc)
101 |
102 | [11. How many customers downgraded from a pro monthly to a basic monthly plan in 2020?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#11-how-many-customers-downgraded-from-a-pro-monthly-to-a-basic-monthly-plan-in-2020)
103 |
104 | [C. Challenge Payment Question](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#c-challenge-payment-question)
105 |
106 | [The Foodie-Fi team wants you to create a new payments table for the year 2020 that includes amounts paid by each customer in the subscriptions table with the following requirements:](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#the-foodie-fi-team-wants-you-to-create-a-new-payments-table-for-the-year-2020-that-includes-amounts-paid-by-each-customer-in-the-subscriptions-table-with-the-following-requirements)
107 |
108 | [D. Outside The Box Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#d-outside-the-box-questions)
109 |
110 | [1. How would you calculate the rate of growth for Foodie-Fi?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#1-how-would-you-calculate-the-rate-of-growth-for-foodie-fi)
111 |
112 | [2. What key metrics would you recommend Foodie-Fi management to track over time to assess performance of their overall business?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#2-what-key-metrics-would-you-recommend-foodie-fi-management-to-track-over-time-to-assess-performance-of-their-overall-business)
113 |
114 | [3. What are some key customer journeys or experiences that you would analyse further to improve customer retention?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#3-what-are-some-key-customer-journeys-or-experiences-that-you-would-analyse-further-to-improve-customer-retention)
115 |
116 | [4. If the Foodie-Fi team were to create an exit survey shown to customers who wish to cancel their subscription, what questions would you include in the survey?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#4-if-the-foodie-fi-team-were-to-create-an-exit-survey-shown-to-customers-who-wish-to-cancel-their-subscription-what-questions-would-you-include-in-the-survey)
117 |
118 | [5. What business levers could the Foodie-Fi team use to reduce the customer churn rate? How would you validate the effectiveness of your ideas?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#5-what-business-levers-could-the-foodie-fi-team-use-to-reduce-the-customer-churn-rate-how-would-you-validate-the-effectiveness-of-your-ideas)
119 |
--------------------------------------------------------------------------------
/Case Study #3 - Foodie-Fi/SQL/solution_3.sql:
--------------------------------------------------------------------------------
1 | -- Solved on PostgreSQL 13.4 by Yulia Murtazina, January 18, 2022
2 | -- Fixed on February 4, 2022
3 |
4 | /* --------------------
5 | Case Study Questions: Case Study #3 - Foodie-Fi
6 | --------------------*/
7 | SET
8 | search_path = foodie_fi;
9 |
10 | -- A. Customer Journey
11 |
12 | -- 1. Based off the 8 sample customers provided in the sample from the `subscriptions` table, write a brief description about each customer’s onboarding journey.
13 | -- Try to keep it as short as possible - you may also want to run some sort of join to make your explanations a bit easier!
14 | -- Non-SQL question, answered in Solution.md
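   |
   | -- A minimal sketch of such a join (an editorial addition; the written answers live in Solution.md):
   | -- listing each sample customer's plans in start-date order makes the journeys easy to narrate.
   | SELECT s.customer_id, p.plan_name, s.start_date
   | FROM subscriptions AS s
   | JOIN plans AS p ON s.plan_id = p.plan_id
   | WHERE s.customer_id IN (1, 2, 11, 13, 15, 16, 18, 19)
   | ORDER BY s.customer_id, s.start_date;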
15 |
16 | -- B. Data Analysis Questions
17 | -- 1. How many customers has Foodie-Fi ever had?
18 |
19 | SELECT
20 | COUNT(distinct customer_id) AS total_number_of_customers
21 | FROM
22 | subscriptions;
23 |
24 | -- 2. What is the monthly distribution of trial plan `start_date` values for our dataset - use the start of the month as the group by value
25 |
26 | SELECT
27 | start_date::varchar,
28 | plan_name,
29 | SUM(number_of_customers) AS number_of_customers
30 | FROM
31 | (
32 | SELECT
33 | DATE_TRUNC('month', start_date)::date AS start_date,
34 | plan_name,
35 | COUNT(customer_id) AS number_of_customers
36 | FROM
37 | subscriptions AS s
38 | JOIN plans AS p ON s.plan_id = p.plan_id
39 | WHERE
40 | plan_name = 'trial'
41 | GROUP BY
42 | start_date,
43 | plan_name
44 | ) AS count_customers
45 | GROUP BY
46 | start_date,
47 | plan_name
48 | ORDER BY
49 | start_date;
50 |
51 | -- 3. What plan `start_date` values occur after the year 2020 for our dataset? Show the breakdown by count of events for each `plan_name`
52 |
53 | SELECT
54 | plan_name,
55 | COUNT(customer_id) AS number_of_events
56 | FROM
57 | subscriptions as s
58 | JOIN plans AS p ON s.plan_id = p.plan_id
59 | WHERE
60 | start_date > '2020-12-31' :: date
61 | GROUP BY
62 | plan_name
63 | ORDER BY
64 | plan_name;
65 |
66 | -- 4. What is the customer count and percentage of customers who have churned rounded to 1 decimal place?
67 |
68 | SELECT
69 | SUM(churned_customers) AS churned_customers,
70 | ROUND(
71 | (SUM(churned_customers) / SUM(total_customers)) * 100,
72 | 1
73 | ) AS churn_percentage
74 | FROM
75 | (
76 | SELECT
77 | CASE
78 | WHEN plan_name = 'churn' THEN COUNT(distinct customer_id)
79 | ELSE 0
80 | END AS churned_customers,
81 | CASE
82 | WHEN plan_name = 'trial' THEN COUNT(distinct customer_id)
83 | ELSE 0
84 | END AS total_customers
85 | FROM
86 | subscriptions AS s
87 | JOIN plans AS p ON s.plan_id = p.plan_id
88 | GROUP BY
89 | plan_name
90 | ) AS count_churn;
91 |
92 | -- 5. How many customers have churned straight after their initial free trial - what percentage is this rounded to the nearest whole number?
93 |
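   | -- Approach note: the two LATERAL subqueries supply constant denominators (all customers and
   | -- all churned customers), while the join on trial_ended keeps only customers whose churn
   | -- record starts exactly 7 days after their trial began.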
94 | SELECT
95 | plan_name,
96 | COUNT(plan_name) AS number_of_churned_customers,
97 | ROUND(
98 | COUNT(plan_name) / customers :: numeric * 100
99 | ) AS percentage_of_churned_after_trial,
100 | ROUND(
101 | COUNT(plan_name) / churned_customers :: numeric * 100
102 | ) AS churned_after_trial_to_all_churned
103 | FROM
104 | (
105 | SELECT
106 | s.customer_id,
107 | trial_ended,
108 | plan_name
109 | FROM
110 | subscriptions AS s
111 | JOIN plans AS p ON s.plan_id = p.plan_id
112 | JOIN (
113 | SELECT
114 | customer_id,
115 | (start_date + interval '7' day) AS trial_ended
116 | FROM
117 | subscriptions
118 | WHERE
119 | plan_id = 0
120 | ) AS t ON s.customer_id = t.customer_id
121 | WHERE
122 | start_date = trial_ended
123 | AND plan_name = 'churn'
124 | GROUP BY
125 | start_date,
126 | s.customer_id,
127 | trial_ended,
128 | plan_name
129 | ) AS count_plans,
130 | LATERAL(
131 | SELECT
132 | COUNT(distinct customer_id) AS customers
133 | FROM
134 | subscriptions
135 | ) p,
136 | LATERAL(
137 | SELECT
138 | COUNT(distinct customer_id) as churned_customers
139 | FROM
140 | subscriptions
141 | WHERE
142 | plan_id = 4
143 | ) p1
144 | GROUP BY
145 | plan_name,
146 | customers,
147 | churned_customers;
148 |
149 | -- 6. What is the number and percentage of customer plans after their initial free trial?
150 |
151 | SELECT plan_name, COUNT(plan_name) AS number_of_plans_after_trial,
152 | ROUND(
153 | (
154 | COUNT(plan_name) / (
155 | SELECT
156 | COUNT(distinct customer_id)
157 | FROM
158 | subscriptions
159 | ) ::numeric * 100
160 | ), 1
161 | ) AS percentage_of_total_customers
162 | FROM
163 | (SELECT
164 | s.customer_id,
165 | trial_ended,
166 | plan_name
167 | FROM
168 | subscriptions AS s
169 | JOIN plans AS p ON s.plan_id = p.plan_id
170 | JOIN (
171 | SELECT
172 | customer_id,
173 | (start_date + interval '7' day) AS trial_ended
174 | FROM
175 | subscriptions
176 | WHERE
177 | plan_id = 0
178 | ) AS t ON s.customer_id = t.customer_id
179 | WHERE
180 | start_date = trial_ended
181 | GROUP BY
182 | start_date,
183 | s.customer_id,
184 | trial_ended,
185 | plan_name) AS count_plans
186 | GROUP BY plan_name;
187 |
188 | -- 7. What is the customer count and percentage breakdown of all 5 plan_name values at 2020-12-31?
189 |
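   | -- Approach note: RANK() over start_date DESC within each customer marks the most recent plan
   | -- on or before 2020-12-31; filtering on rank = 1 yields the plan mix active at that date.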
190 | WITH ranked AS (
191 | SELECT
192 | plan_name,
193 | RANK() OVER(
194 | PARTITION BY customer_id
195 | ORDER BY
196 | start_date DESC
197 | ) AS RANK
198 | FROM
199 | subscriptions AS s
200 | JOIN plans AS p ON s.plan_id = p.plan_id
201 | WHERE
202 | start_date <= '2020-12-31' :: date
203 | )
204 | SELECT
205 | plan_name,
206 | COUNT(plan_name) as number_of_plans,
207 | ROUND(
208 | (
209 | COUNT(plan_name) / customers :: numeric
210 | ) * 100,
211 | 1
212 | ) AS percentage_of_plans
213 | FROM
214 | ranked,
215 | LATERAL(
216 | SELECT
217 | COUNT(distinct customer_id) AS customers
218 | FROM
219 | subscriptions
220 | ) c
221 | WHERE
222 | rank = 1
223 | GROUP BY
224 | plan_name,
225 | customers
226 | ORDER BY
227 | 1;
228 |
229 | -- 8. How many customers have upgraded to an annual plan in 2020?
230 |
231 | SELECT
232 | plan_name,
233 | count(plan_name) AS number_of_customers
234 | FROM
235 | subscriptions AS s
236 | JOIN plans AS p ON s.plan_id = p.plan_id
237 | WHERE
238 | start_date BETWEEN '2020-01-01' :: date
239 | AND '2020-12-31' :: date
240 | AND plan_name = 'pro annual'
241 | GROUP BY
242 | plan_name;
243 |
244 | -- 9. How many days on average does it take for a customer to upgrade to an annual plan from the day they join Foodie-Fi?
245 |
246 | SELECT
247 | plan_name,
248 | ROUND(AVG(s.start_date - t.start_date)) AS average_days_to_upgrade
249 | FROM
250 | subscriptions AS s
251 | JOIN plans AS p ON s.plan_id = p.plan_id
252 | JOIN (
253 | SELECT
254 | customer_id,
255 | start_date
256 | FROM
257 | subscriptions
258 | WHERE
259 | plan_id = 0
260 | ) AS t ON s.customer_id = t.customer_id
261 | WHERE
262 | plan_name = 'pro annual'
263 | GROUP BY
264 | plan_name;
265 |
266 | -- 10. Can you further break down this average value into 30 day periods (i.e. 0-30 days, 31-60 days etc)
267 |
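   | -- Approach note: the CASE expression buckets days-to-upgrade into 30-day bands; the final
   | -- ORDER BY CASE sorts the text labels in their natural numeric order.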
268 | SELECT
269 | *
270 | FROM
271 | (
272 | SELECT
273 | plan_name,
274 | CASE
275 | WHEN s.start_date - t.start_date < 31 THEN '0-30 days'
276 | WHEN s.start_date - t.start_date BETWEEN 31
277 | AND 60 THEN '31-60 days'
278 | WHEN s.start_date - t.start_date BETWEEN 61
279 | AND 90 THEN '61-90 days'
280 | WHEN s.start_date - t.start_date BETWEEN 91
281 | AND 120 THEN '91-120 days'
282 | WHEN s.start_date - t.start_date BETWEEN 121
283 | AND 150 THEN '121-150 days'
284 | WHEN s.start_date - t.start_date BETWEEN 151
285 | AND 180 THEN '151-180 days'
286 | WHEN s.start_date - t.start_date BETWEEN 181
287 | AND 210 THEN '181-210 days'
288 | WHEN s.start_date - t.start_date BETWEEN 211
289 | AND 240 THEN '211-240 days'
290 | WHEN s.start_date - t.start_date BETWEEN 241
291 | AND 270 THEN '241-270 days'
292 | WHEN s.start_date - t.start_date BETWEEN 271
293 | AND 300 THEN '271-300 days'
294 | WHEN s.start_date - t.start_date BETWEEN 301
295 | AND 330 THEN '301-330 days'
296 | WHEN s.start_date - t.start_date BETWEEN 331
297 | AND 360 THEN '331-360 days'
298 | WHEN s.start_date - t.start_date > 360 THEN '360+ days'
299 | END AS group_by_days_to_upgrade,
300 | COUNT(s.start_date - t.start_date) AS number_of_customers,
301 | ROUND(AVG(s.start_date - t.start_date)) AS average_days_to_upgrade
302 | FROM
303 | subscriptions AS s
304 | JOIN plans AS p ON s.plan_id = p.plan_id
305 | JOIN (
306 | SELECT
307 | customer_id,
308 | start_date
309 | FROM
310 | subscriptions
311 | WHERE
312 | plan_id = 0
313 | ) AS t ON s.customer_id = t.customer_id
314 | WHERE
315 | plan_name = 'pro annual'
316 | GROUP BY
317 | plan_name,
318 | group_by_days_to_upgrade
319 | ) AS count_groups
320 | GROUP BY
321 | plan_name,
322 | group_by_days_to_upgrade,
323 | number_of_customers,
324 | average_days_to_upgrade
325 | ORDER BY
326 | CASE
327 | WHEN group_by_days_to_upgrade = '0-30 days' THEN 1
328 | WHEN group_by_days_to_upgrade = '31-60 days' THEN 2
329 | WHEN group_by_days_to_upgrade = '61-90 days' THEN 3
330 | WHEN group_by_days_to_upgrade = '91-120 days' THEN 4
331 | WHEN group_by_days_to_upgrade = '121-150 days' THEN 5
332 | WHEN group_by_days_to_upgrade = '151-180 days' THEN 6
333 | WHEN group_by_days_to_upgrade = '181-210 days' THEN 7
334 | WHEN group_by_days_to_upgrade = '211-240 days' THEN 8
335 | WHEN group_by_days_to_upgrade = '241-270 days' THEN 9
336 | WHEN group_by_days_to_upgrade = '271-300 days' THEN 10
337 | WHEN group_by_days_to_upgrade = '301-330 days' THEN 11
338 | WHEN group_by_days_to_upgrade = '331-360 days' THEN 12
339 | WHEN group_by_days_to_upgrade = '360+ days' THEN 13
340 | END;
341 |
342 | -- 11. How many customers downgraded from a pro monthly to a basic monthly plan in 2020?
343 |
344 | SELECT
345 | s.customer_id,
346 | t.start_date::varchar AS basic_plan_started_on,
347 | s.start_date::varchar AS pro_plan_started_on,
348 | (t.start_date - s.start_date) AS days_between_basic_and_pro
349 | FROM
350 | subscriptions AS s
351 | JOIN plans AS p ON s.plan_id = p.plan_id
352 | JOIN (
353 | SELECT
354 | customer_id,
355 | start_date
356 | FROM
357 | subscriptions
358 | WHERE
359 | plan_id = 1
360 | ) AS t ON s.customer_id = t.customer_id
361 | WHERE
362 | plan_name = 'pro monthly'
363 | AND s.start_date >= '2020-01-01' :: date
364 | AND s.start_date <= '2020-12-31' :: date
365 | GROUP BY
366 | s.start_date,
367 | t.start_date,
368 | s.customer_id,
369 | plan_name
370 | ORDER BY
371 | days_between_basic_and_pro DESC
372 | LIMIT
373 | 5;
374 | -- C. Challenge Payment Question
375 |
376 | /* --------------------
377 | The Foodie-Fi team wants you to create a new `payments` table for the year 2020 that includes amounts paid by each customer in the `subscriptions` table with the following requirements:
378 |
379 | - monthly payments always occur on the same day of month as the original `start_date` of any monthly paid plan
380 |
381 | - upgrades from basic to monthly or pro plans are reduced by the current paid amount in that month and start immediately
382 |
383 | - upgrades from pro monthly to pro annual are paid at the end of the current billing period and also start at the end of the month period
384 |
385 | - once a customer churns they will no longer make payments
386 | --------------------*/
387 |
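   | -- Approach note: the inner query expands each plan into one row per monthly payment date with
   | -- generate_series (pro annual pays once, churn rows generate nothing); the outer CASE with LAG
   | -- discounts an upgrade payment made less than 30 days after the previous payment.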
388 | SELECT
389 | customer_id,
390 | plan_id,
391 | plan_name,
392 | payment_date ::date :: varchar,
393 | CASE
394 | WHEN LAG(plan_id) OVER (
395 | PARTITION BY customer_id
396 | ORDER BY
397 | plan_id
398 | ) != plan_id
399 | AND DATE_PART(
400 | 'day',
401 | payment_date - LAG(payment_date) OVER (
402 | PARTITION BY customer_id
403 | ORDER BY
404 | plan_id
405 | )
406 | ) < 30 THEN amount - LAG(amount) OVER (
407 | PARTITION BY customer_id
408 | ORDER BY
409 | plan_id
410 | )
411 | ELSE amount
412 | END AS amount,
413 | RANK() OVER(
414 | PARTITION BY customer_id
415 | ORDER BY
416 | payment_date
417 | ) AS payment_order
418 |
419 | INTO TEMP TABLE payments
420 | FROM
421 | (
422 | SELECT
423 | customer_id,
424 | s.plan_id,
425 | plan_name,
426 | generate_series(
427 | start_date,
428 | CASE
429 | WHEN s.plan_id = 3 THEN start_date
430 | WHEN s.plan_id = 4 THEN NULL
431 | WHEN LEAD(start_date) OVER (
432 | PARTITION BY customer_id
433 | ORDER BY
434 | start_date
435 | ) IS NOT NULL THEN LEAD(start_date) OVER (
436 | PARTITION BY customer_id
437 | ORDER BY
438 | start_date
439 | )
440 | ELSE '2020-12-31' :: date
441 | END,
442 | '1 month' + '1 second' :: interval
443 | ) AS payment_date,
444 | price AS amount
445 | FROM
446 | subscriptions AS s
447 | JOIN plans AS p ON s.plan_id = p.plan_id
448 | WHERE
449 | s.plan_id != 0
450 | AND start_date < '2021-01-01' :: date
451 | GROUP BY
452 | customer_id,
453 | s.plan_id,
454 | plan_name,
455 | start_date,
456 | price
457 | ) AS t
458 | ORDER BY
459 | customer_id;
460 |
461 | -- D. Outside The Box Questions
462 | -- The following are open ended questions which might be asked during a technical interview for this case study - there are no right or wrong answers, but answers that make sense from both a technical and a business perspective make an amazing impression!
463 |
464 | -- 1. How would you calculate the rate of growth for Foodie-Fi?
465 |
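   | -- Approach note: LAG() compares each month's count of new paid-plan starts with the previous
   | -- month's; trial and churn rows are excluded because they carry no revenue.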
466 | SELECT
467 | DATE_TRUNC('month', start_date) AS month,
468 | COUNT(customer_id) AS current_number_of_customers,
469 | LAG(COUNT(customer_id), 1) over (
470 | ORDER BY
471 | DATE_TRUNC('month', start_date)
472 | ) AS past_number_of_customers,
473 | (
474 | 100 * (
475 | COUNT(customer_id) - LAG(COUNT(customer_id), 1) over (
476 | ORDER BY
477 | DATE_TRUNC('month', start_date)
478 | )
479 | ) / LAG(COUNT(customer_id), 1) over (
480 | ORDER BY
481 | DATE_TRUNC('month', start_date)
482 | )
483 | ) || '%' AS growth
484 | FROM
485 | subscriptions AS s
486 | JOIN plans AS p ON s.plan_id = p.plan_id
487 | WHERE
488 | plan_name != 'trial'
489 | AND plan_name != 'churn'
490 | GROUP BY
491 | month
492 | ORDER BY
493 | month;
494 |
495 | -- 2. What key metrics would you recommend Foodie-Fi management to track over time to assess performance of their overall business?
496 |
497 | -- Non-SQL question
498 |
499 | -- 3. What are some key customer journeys or experiences that you would analyse further to improve customer retention?
500 |
501 | -- Non-SQL question
502 |
503 | -- 4. If the Foodie-Fi team were to create an exit survey shown to customers who wish to cancel their subscription, what questions would you include in the survey?
504 |
505 | -- Non-SQL question
506 |
507 | -- 5. What business levers could the Foodie-Fi team use to reduce the customer churn rate? How would you validate the effectiveness of your ideas?
508 |
509 | -- Non-SQL question
510 |
--------------------------------------------------------------------------------
/Case Study #4 - Data Bank/Solution.md:
--------------------------------------------------------------------------------
1 | [WORK IN PROGRESS]
2 |
--------------------------------------------------------------------------------
/Case Study #5 - Data Mart/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #5 - Data Mart :shopping:
2 |
3 |
4 |
5 | Data Mart is Danny’s latest venture and after running international operations for his online supermarket that specialises in fresh produce - Danny is asking for your support to analyse his sales performance.
6 |
7 | ## Available Data
8 |
9 | For this case study there is only a single table: `data_mart.weekly_sales`.
10 |
11 | The columns are pretty self-explanatory based on the column names but here are some further details about the dataset:
12 | - Data Mart has international operations using a multi-region strategy
13 | - Data Mart has both a retail and an online platform in the form of a Shopify store front to serve their customers
14 | - Customer `segment` and `customer_type` data relates to personal age and demographics information that is shared with Data Mart
15 | - `transactions` is the count of unique purchases made through Data Mart and `sales` is the actual dollar amount of purchases
16 |
17 | Each record in the dataset is related to a specific aggregated slice of the underlying sales data rolled up into a `week_date` value which represents the start of the sales week.
18 |
19 | 10 random rows are shown in the table output below from `data_mart.weekly_sales`:
20 |
21 | | week_date | region | platform | segment | customer_type | transactions | sales |
22 | |-----------|---------------|----------|---------|---------------|--------------|------------|
23 | | 9/9/20 | OCEANIA | Shopify | C3 | New | 610 | 110033.89 |
24 | | 29/7/20 | AFRICA | Retail | C1 | New | 110692 | 3053771.19 |
25 | | 22/7/20 | EUROPE | Shopify | C4 | Existing | 24 | 8101.54 |
26 | | 13/5/20 | AFRICA | Shopify | null | Guest | 5287 | 1003301.37 |
27 | | 24/7/19 | ASIA | Retail | C1 | New | 127342 | 3151780.41 |
28 | | 10/7/19 | CANADA | Shopify | F3 | New | 51 | 8844.93 |
29 | | 26/6/19 | OCEANIA | Retail | C3 | New | 152921 | 5551385.36 |
30 | | 29/5/19 | SOUTH AMERICA | Shopify | null | New | 53 | 10056.2 |
31 | | 22/8/18 | AFRICA | Retail | null | Existing | 31721 | 1718863.58 |
32 | | 25/7/18 | SOUTH AMERICA | Retail | null | New | 2136 | 81757.91 |
33 |
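   | Note that `week_date` is stored as a `DD/MM/YY` text value, so it must be parsed before any date arithmetic. For example (an illustrative snippet, mirroring the cleansing step in `solution_5.sql`):
   |
   | ```sql
   | -- Illustrative only: parse the text week_date into a proper DATE
   | SELECT TO_DATE(week_date, 'DD/MM/YY') AS week_date, region, sales
   | FROM data_mart.weekly_sales
   | LIMIT 5;
   | ```
   |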
34 | ## Entity Relationship Diagram
35 |
36 | 
37 |
38 | ## Table of Contents
39 |
40 | [Introduction](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#introduction)
41 |
42 | [Case Study Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#case-study-questions)
43 |
44 | [1. Data Cleansing Steps](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#1-data-cleansing-steps)
45 |
46 | [2. Data Exploration](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#2-data-exploration)
47 |
48 | [1. What day of the week is used for each week_date value?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#1-what-day-of-the-week-is-used-for-each-week_date-value)
49 |
50 | [2. What range of week numbers are missing from the dataset?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#2-what-range-of-week-numbers-are-missing-from-the-dataset)
51 |
52 | [3. How many total transactions were there for each year in the dataset?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#3-how-many-total-transactions-were-there-for-each-year-in-the-dataset)
53 |
54 | [4. What is the total sales for each region for each month?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#4-what-is-the-total-sales-for-each-region-for-each-month)
55 |
56 | [5. What is the total count of transactions for each platform?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#5-what-is-the-total-count-of-transactions-for-each-platform)
57 |
58 | [6. What is the percentage of sales for Retail vs Shopify for each month?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#6-what-is-the-percentage-of-sales-for-retail-vs-shopify-for-each-month)
59 |
60 | [7. What is the percentage of sales by demographic for each year in the dataset?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#7-what-is-the-percentage-of-sales-by-demographic-for-each-year-in-the-dataset)
61 |
62 | [8. Which age_band and demographic values contribute the most to Retail sales?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#8-which-age_band-and-demographic-values-contribute-the-most-to-retail-sales)
63 |
64 | [9. Can we use the avg_transaction column to find the average transaction size for each year for Retail vs Shopify? If not - how would you calculate it instead?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#9-can-we-use-the-avg_transaction-column-to-find-the-average-transaction-size-for-each-year-for-retail-vs-shopify-if-not---how-would-you-calculate-it-instead)
65 |
66 | [3. Before & After Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#3-before--after-analysis)
67 |
68 | [1. What is the total sales for the 4 weeks before and after 2020-06-15? What is the growth or reduction rate in actual values and percentage of sales?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#1-what-is-the-total-sales-for-the-4-weeks-before-and-after-2020-06-15-what-is-the-growth-or-reduction-rate-in-actual-values-and-percentage-of-sales)
69 |
70 | [2. What about the entire 12 weeks before and after?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#2-what-about-the-entire-12-weeks-before-and-after)
71 |
72 | [3. How do the sale metrics for these 2 periods before and after compare with the previous years in 2018 and 2019?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#3-how-do-the-sale-metrics-for-these-2-periods-before-and-after-compare-with-the-previous-years-in-2018-and-2019)
73 |
74 | [4. Bonus Question](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#4-bonus-question)
75 |
76 | [Which areas of the business have the highest negative impact in sales metrics performance in 2020 for the 12 week before and after period?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#which-areas-of-the-business-have-the-highest-negative-impact-in-sales-metrics-performance-in-2020-for-the-12-week-before-and-after-period)
77 |
78 | [Do you have any further recommendations for Danny’s team at Data Mart or any interesting insights based off this analysis?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#do-you-have-any-further-recommendations-for-dannys-team-at-data-mart-or-any-interesting-insights-based-off-this-analysis)
79 |
--------------------------------------------------------------------------------
/Case Study #5 - Data Mart/SQL/solution_5.sql:
--------------------------------------------------------------------------------
1 | -- Solved on PostgreSQL 13.4 by Yulia Murtazina, January 25, 2022
2 | -- Fixed on February 13, 2022
3 |
4 |
5 | -- Case Study #5 - Data Mart
6 |
7 |
8 | SET
9 | search_path = data_mart;
10 |
11 | /* --------------------
12 | Case Study Questions
13 |
14 | 1. Data Cleansing Steps
15 |
16 | In a single query, perform the following operations and generate a new table in the `data_mart` schema named `clean_weekly_sales`:
17 |
18 | - Convert the `week_date` to a `DATE` format
19 | - Add a `week_number` as the second column for each `week_date` value, for example any value from the 1st of January to 7th of January will be 1, 8th to 14th will be 2 etc
20 | - Add a `month_number` with the calendar month for each week_date value as the 3rd column
21 | - Add a `calendar_year` column as the 4th column containing either 2018, 2019 or 2020 values
22 | - Add a new column called `age_band` after the original segment column using the following mapping on the number inside the segment value
23 | - Add a new `demographic` column using the following mapping for the first letter in the segment values
24 | - Replace all `null` string values with an "unknown" string value in the original segment column as well as the new `age_band` and `demographic` columns
25 | - Generate a new `avg_transaction` column as the sales value divided by transactions rounded to 2 decimal places for each record
26 | --------------------*/
27 |
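   | -- Approach note: SELECT ... INTO materialises clean_weekly_sales in one pass; the LATERAL
   | -- subqueries compute the parsed date (TO_DATE, DD/MM/YY) and avg_transaction once so they can
   | -- be referenced by name in the main SELECT list.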
28 | SELECT
29 | date AS week_date,
30 | EXTRACT(
31 | WEEK
32 | FROM
33 | date
34 | ) :: int AS week_number,
35 | EXTRACT(
36 | MONTH
37 | FROM
38 | date
39 | ) :: int AS month_number,
40 | EXTRACT(
41 | YEAR
42 | FROM
43 | date
44 | ) :: int AS calendar_year,
45 | region,
46 | platform,
47 | CASE
48 | WHEN segment = 'null' then 'unknown'
49 | ELSE segment
50 | END AS segment,
51 | CASE
52 | WHEN substr(segment, 2, 1) = '1' then 'Young Adults'
53 | WHEN substr(segment, 2, 1) = '2' then 'Middle Aged'
54 | WHEN substr(segment, 2, 1) in ('3', '4') then 'Retirees'
55 | ELSE 'unknown'
56 | END AS age_band,
57 | CASE
58 | WHEN substr(segment, 1, 1) = 'C' then 'Couples'
59 | WHEN substr(segment, 1, 1) = 'F' then 'Families'
60 | ELSE 'unknown'
61 | END AS demographic,
62 | customer_type,
63 | transactions,
64 | sales,
65 | avg_transaction INTO clean_weekly_sales
66 | FROM
67 | weekly_sales,
68 | LATERAL (
69 | SELECT
70 | TO_DATE(week_date, 'DD/MM/YY') AS date
71 | ) date,
72 | LATERAL (
73 | SELECT
74 | ROUND((sales :: numeric / transactions), 2) AS avg_transaction
75 | ) avt
76 | ORDER BY
77 | calendar_year;
78 |
79 | /* --------------------
80 | 2. Data Exploration
81 |
82 | 1. What day of the week is used for each week_date value?
83 | --------------------*/
84 |
85 | SELECT
86 | EXTRACT(
87 | ISODOW
88 | FROM
89 | week_date
90 | ) AS day_of_week
91 | FROM
92 | clean_weekly_sales
93 | GROUP BY
94 | 1;
95 |
96 | -- 2. What range of week numbers are missing from the dataset?
97 |
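   | -- Approach note: generate the full 1-52 week series, then anti-join (NOT IN) against the week
   | -- numbers that are actually present in clean_weekly_sales.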
98 | WITH series as (
99 | SELECT
100 | GENERATE_SERIES(1, 52) as missing_weeks
101 | FROM
102 | clean_weekly_sales
103 | )
104 | SELECT
105 | missing_weeks
106 | FROM
107 | series
108 | WHERE
109 | missing_weeks NOT IN(
110 | SELECT
111 | week_number
112 | FROM
113 | clean_weekly_sales
114 | )
115 | GROUP BY
116 | 1
117 | ORDER BY
118 | 1;
119 |
120 | -- 3. How many total transactions were there for each year in the dataset?
121 |
122 | SELECT
123 | calendar_year,
124 | SUM(transactions) AS total_number_of_transactions
125 | FROM
126 | clean_weekly_sales
127 | GROUP BY
128 | 1
129 | ORDER BY
130 | 1;
131 |
132 | -- 4. What is the total sales for each region for each month?
133 |
134 | SELECT
135 | region,
136 | month_number,
137 | SUM(sales) AS total_sales
138 | FROM
139 | clean_weekly_sales
140 | GROUP BY
141 | 1,
142 | 2
143 | ORDER BY
144 | 1,
145 | 2;
146 |
147 | -- 5. What is the total count of transactions for each platform?
148 |
149 | SELECT
150 | platform,
151 | SUM(transactions) as total_transactions
152 | FROM
153 | clean_weekly_sales
154 | GROUP BY
155 | 1
156 | ORDER BY
157 | 1;
158 |
159 | -- 6. What is the percentage of sales for Retail vs Shopify for each month?
160 |
161 | SELECT
162 | sr.month_number,
163 | sr.calendar_year,
164 | ROUND(100 *(SUM(sales) :: numeric / total_sales_s), 1) AS percentage_of_sales_retail,
165 | 100 - ROUND(100 *(SUM(sales) :: numeric / total_sales_s), 1) AS percentage_of_sales_shopify
166 | FROM
167 | clean_weekly_sales AS sr
168 | JOIN (
169 | SELECT
170 | ss.month_number,
171 | ss.calendar_year,
172 | SUM(sales) AS total_sales_s
173 | FROM
174 | clean_weekly_sales AS ss
175 | GROUP BY
176 | ss.month_number,
177 | ss.calendar_year
178 | ) ss ON sr.month_number = ss.month_number
179 | and ss.calendar_year = sr.calendar_year
180 | WHERE
181 | sr.platform = 'Retail'
182 | GROUP BY
183 | sr.month_number,
184 | sr.calendar_year,
185 | total_sales_s
186 | ORDER BY
187 | 2,
188 | 1;
189 |
190 | -- 7. What is the percentage of sales by demographic for each year in the dataset?
191 |
192 | SELECT
193 | s.calendar_year,
194 | demographic,
195 | ROUND(100 *(SUM(sales) :: numeric / total_sales), 1) AS percentage
196 | FROM
197 | clean_weekly_sales AS s
198 | JOIN (
199 | SELECT
200 | calendar_year,
201 | SUM(sales) AS total_sales
202 | FROM
203 | clean_weekly_sales AS ts
204 | GROUP BY
205 | calendar_year
206 | ) ts ON s.calendar_year = ts.calendar_year
207 | GROUP BY
208 | 1,
209 | 2,
210 | total_sales
211 | ORDER BY
212 | 1,
213 | 2;
214 |
215 | -- 8. Which age_band and demographic values contribute the most to Retail sales?
216 |
217 | SELECT
218 | age_band,
219 | demographic,
220 | SUM(sales) AS total_sales
221 | FROM
222 | clean_weekly_sales
223 | WHERE
224 | platform = 'Retail'
225 | GROUP BY
226 | 1,
227 | 2
228 | ORDER BY
229 | 3 DESC,
230 | 1,
231 | 2;
232 |
233 | SELECT
234 | age_band,
235 | total_sales,
236 | percentage_of_sales
237 | FROM
238 | (
239 | SELECT
240 | age_band,
241 | SUM(sales) AS total_sales,
242 | round(100 * (SUM(sales) :: numeric / all_sales), 1) AS percentage_of_sales,
243 | row_number() over (
244 | ORDER BY
245 | SUM(sales) DESC
246 | )
247 | FROM
248 | clean_weekly_sales,
249 | LATERAL (
250 | SELECT
251 | SUM(sales) AS all_sales
252 | FROM
253 | clean_weekly_sales
254 | ) s
255 | WHERE
256 | platform = 'Retail'
257 | AND demographic != 'unknown'
258 | GROUP BY
259 | 1,
260 | all_sales
261 | ) ts
262 | WHERE
263 | row_number = 1;
264 |
265 | SELECT
266 | demographic,
267 | total_sales,
268 | percentage_of_sales
269 | FROM
270 | (
271 | SELECT
272 | demographic,
273 | SUM(sales) AS total_sales,
274 | round(100 * (SUM(sales) :: numeric / all_sales), 1) AS percentage_of_sales,
275 | row_number() over (
276 | ORDER BY
277 | SUM(sales) DESC
278 | )
279 | FROM
280 | clean_weekly_sales,
281 | LATERAL (
282 | SELECT
283 | SUM(sales) AS all_sales
284 | FROM
285 | clean_weekly_sales
286 | ) s
287 | WHERE
288 | platform = 'Retail'
289 | AND demographic != 'unknown'
290 | GROUP BY
291 | 1,
292 | all_sales
293 | ) ts
294 | WHERE
295 | row_number = 1;
296 |
297 | SELECT
298 | demographic,
299 | age_band,
300 | total_sales,
301 | percentage_of_sales
302 | FROM
303 | (
304 | SELECT
305 | demographic,
306 | age_band,
307 | SUM(sales) AS total_sales,
308 | round(100 * (SUM(sales) :: numeric / all_sales), 1) AS percentage_of_sales,
309 | row_number() over (
310 | ORDER BY
311 | SUM(sales) DESC
312 | )
313 | FROM
314 | clean_weekly_sales,
315 | LATERAL (
316 | SELECT
317 | SUM(sales) AS all_sales
318 | FROM
319 | clean_weekly_sales
320 | ) s
321 | WHERE
322 | platform = 'Retail'
323 | AND demographic != 'unknown'
324 | GROUP BY
325 | 1,
326 | 2,
327 | all_sales
328 | ) ts
329 | WHERE
330 | row_number = 1;
331 |
332 | -- 9. Can we use the avg_transaction column to find the average transaction size for each year for Retail vs Shopify? If not - how would you calculate it instead?
333 |
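   | -- Approach note: AVG(avg_transaction) is an unweighted average of weekly averages, so it is
   | -- the "incorrect" figure; the correct average transaction size is SUM(sales) / SUM(transactions).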
334 | SELECT
335 | calendar_year,
336 | platform,
337 | ROUND(SUM(sales) :: numeric / SUM(transactions), 1) AS correct_avg,
338 | ROUND(AVG(avg_transaction), 1) AS incorrect_avg
339 | FROM
340 | clean_weekly_sales
341 | GROUP BY
342 | 1,
343 | 2
344 | ORDER BY
345 | 1,
346 | 2;
347 |
348 | /* --------------------
349 | 3. Before & After Analysis
350 |
351 | This technique is usually used when we want to inspect an important event and measure the impact before and after a certain point in time.
352 |
353 | Take the `week_date` value of 2020-06-15 as the baseline week when the Data Mart sustainable packaging changes came into effect.
354 | We would include all `week_date` values from 2020-06-15 onwards as the period after the change, and all previous `week_date` values as the period before.
355 |
356 | Using this analysis approach - answer the following questions:
357 |
358 | 1. What is the total sales for the 4 weeks before and after 2020-06-15? What is the growth or reduction rate in actual values and percentage of sales?
359 | --------------------*/
360 |
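   | -- Approach note: base_week is the ISO week number of 2020-06-15; "before" spans weeks
   | -- base_week - 4 .. base_week - 1 and "after" spans base_week .. base_week + 3.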
361 | WITH sales_before AS (
362 | SELECT
363 | SUM(sales) AS total_sales_before
364 | FROM
365 | clean_weekly_sales,
366 | LATERAL(
367 | SELECT
368 | EXTRACT(
369 | WEEK
370 | FROM
371 | '2020-06-15' :: date
372 | ) AS base_week
373 | ) bw
374 | WHERE
375 | calendar_year = 2020
376 | AND week_number between (base_week - 4)
377 | AND (base_week - 1)
378 | ),
379 | sales_after AS (
380 | SELECT
381 | SUM(sales) AS total_sales_after
382 | FROM
383 | clean_weekly_sales,
384 | LATERAL(
385 | SELECT
386 | EXTRACT(
387 | WEEK
388 | FROM
389 | '2020-06-15' :: date
390 | ) AS base_week
391 | ) bw
392 | WHERE
393 | calendar_year = 2020
394 | AND week_number between (base_week)
395 | AND (base_week + 3)
396 | )
397 | SELECT
398 | total_sales_before,
399 | total_sales_after,
400 | total_sales_after - total_sales_before AS change_in_sales,
401 | ROUND(
402 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before,
403 | 2
404 | ) AS percentage_of_change
405 | FROM
406 | sales_before,
407 | sales_after;
408 |
409 | -- 2. What about the entire 12 weeks before and after?
410 |
411 | WITH sales_before AS (
412 | SELECT
413 | SUM(sales) AS total_sales_before
414 | FROM
415 | clean_weekly_sales,
416 | LATERAL(
417 | SELECT
418 | EXTRACT(
419 | WEEK
420 | FROM
421 | '2020-06-15' :: date
422 | ) AS base_week
423 | ) bw
424 | WHERE
425 | calendar_year = 2020
426 | AND week_number between (base_week - 12)
427 | AND (base_week - 1)
428 | ),
429 | sales_after AS (
430 | SELECT
431 | SUM(sales) AS total_sales_after
432 | FROM
433 | clean_weekly_sales,
434 | LATERAL(
435 | SELECT
436 | EXTRACT(
437 | WEEK
438 | FROM
439 | '2020-06-15' :: date
440 | ) AS base_week
441 | ) bw
442 | WHERE
443 | calendar_year = 2020
444 | AND week_number between (base_week)
445 | AND (base_week + 11)
446 | )
447 | SELECT
448 | total_sales_before,
449 | total_sales_after,
450 | total_sales_after - total_sales_before AS change_in_sales,
451 | ROUND(
452 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before,
453 | 2
454 | ) AS percentage_of_change
455 | FROM
456 | sales_before,
457 | sales_after;
458 |
459 | -- 3. How do the sale metrics for these 2 periods before and after compare with the previous years in 2018 and 2019?
460 |
461 | WITH sales_before AS (
462 | SELECT
463 | calendar_year,
464 | SUM(sales) AS total_sales_before
465 | FROM
466 | clean_weekly_sales,
467 | LATERAL(
468 | SELECT
469 | EXTRACT(
470 | WEEK
471 | FROM
472 | '2020-06-15' :: date
473 | ) AS base_week
474 | ) bw
475 | WHERE
476 | week_number between (base_week - 4)
477 | AND (base_week - 1)
478 | group by
479 | 1
480 | ),
481 | sales_after AS (
482 | SELECT
483 | calendar_year,
484 | SUM(sales) AS total_sales_after
485 | FROM
486 | clean_weekly_sales,
487 | LATERAL(
488 | SELECT
489 | EXTRACT(
490 | WEEK
491 | FROM
492 | '2020-06-15' :: date
493 | ) AS base_week
494 | ) bw
495 | WHERE
496 | week_number between (base_week)
497 | AND (base_week + 3)
498 | group by
499 | 1
500 | )
501 | SELECT
502 | sb.calendar_year,
503 | total_sales_before,
504 | total_sales_after,
505 | total_sales_after - total_sales_before AS change_in_sales,
506 | ROUND(
507 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before,
508 | 2
509 | ) AS percentage_of_change
510 | FROM
511 | sales_before AS sb
512 | JOIN sales_after AS sa ON sb.calendar_year = sa.calendar_year
513 | group by
514 | 1,
515 | 2,
516 | 3;
517 |
518 | WITH sales_before AS (
519 | SELECT
520 | calendar_year,
521 | SUM(sales) AS total_sales_before
522 | FROM
523 | clean_weekly_sales,
524 | LATERAL(
525 | SELECT
526 | EXTRACT(
527 | WEEK
528 | FROM
529 | '2020-06-15' :: date
530 | ) AS base_week
531 | ) bw
532 | WHERE
533 | week_number between (base_week - 12)
534 | AND (base_week - 1)
535 | group by
536 | 1
537 | ),
538 | sales_after AS (
539 | SELECT
540 | calendar_year,
541 | SUM(sales) AS total_sales_after
542 | FROM
543 | clean_weekly_sales,
544 | LATERAL(
545 | SELECT
546 | EXTRACT(
547 | WEEK
548 | FROM
549 | '2020-06-15' :: date
550 | ) AS base_week
551 | ) bw
552 | WHERE
553 | week_number between (base_week)
554 | AND (base_week + 11)
555 | group by
556 | 1
557 | )
558 | SELECT
559 | sb.calendar_year,
560 | total_sales_before,
561 | total_sales_after,
562 | total_sales_after - total_sales_before AS change_in_sales,
563 | ROUND(
564 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before,
565 | 2
566 | ) AS percentage_of_change
567 | FROM
568 | sales_before AS sb
569 | JOIN sales_after AS sa ON sb.calendar_year = sa.calendar_year
570 | group by
571 | 1,
572 | 2,
573 | 3;
574 |
575 | /* --------------------
576 | 4. Bonus Question
577 |
578 | Which areas of the business have the highest negative impact in sales metrics performance in 2020 for the 12 week before and after period?
579 | --------------------*/
580 |
581 | -- region:
582 |
583 | WITH sales_before AS (
584 | SELECT
585 | region,
586 | SUM(sales) AS total_sales_before
587 | FROM
588 | clean_weekly_sales,
589 | LATERAL(
590 | SELECT
591 | EXTRACT(
592 | WEEK
593 | FROM
594 | '2020-06-15' :: date
595 | ) AS base_week
596 | ) bw
597 | WHERE
598 | calendar_year = 2020
599 | AND week_number between (base_week - 12)
600 | AND (base_week - 1)
601 | GROUP BY
602 | 1
603 | ),
604 | sales_after AS (
605 | SELECT
606 | region,
607 | SUM(sales) AS total_sales_after
608 | FROM
609 | clean_weekly_sales,
610 | LATERAL(
611 | SELECT
612 | EXTRACT(
613 | WEEK
614 | FROM
615 | '2020-06-15' :: date
616 | ) AS base_week
617 | ) bw
618 | WHERE
619 | calendar_year = 2020
620 | AND week_number between (base_week)
621 | AND (base_week + 11)
622 | GROUP BY
623 | 1
624 | )
625 | SELECT
626 | sb.region,
627 | total_sales_before,
628 | total_sales_after,
629 | total_sales_after - total_sales_before AS change_in_sales,
630 | ROUND(
631 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before,
632 | 2
633 | ) AS percentage_of_change
634 | FROM
635 | sales_before AS sb
636 | JOIN sales_after AS sa ON sb.region = sa.region
637 | GROUP BY
638 | 1,
639 | 2,
640 | 3,
641 | 4
642 | ORDER BY
643 | 5;
644 |
645 | -- platform:
646 |
647 | WITH sales_before AS (
648 | SELECT
649 | platform,
650 | SUM(sales) AS total_sales_before
651 | FROM
652 | clean_weekly_sales,
653 | LATERAL(
654 | SELECT
655 | EXTRACT(
656 | WEEK
657 | FROM
658 | '2020-06-15' :: date
659 | ) AS base_week
660 | ) bw
661 | WHERE
662 | calendar_year = 2020
663 | AND week_number between (base_week - 12)
664 | AND (base_week - 1)
665 | GROUP BY
666 | 1
667 | ),
668 | sales_after AS (
669 | SELECT
670 | platform,
671 | SUM(sales) AS total_sales_after
672 | FROM
673 | clean_weekly_sales,
674 | LATERAL(
675 | SELECT
676 | EXTRACT(
677 | WEEK
678 | FROM
679 | '2020-06-15' :: date
680 | ) AS base_week
681 | ) bw
682 | WHERE
683 | calendar_year = 2020
684 | AND week_number between (base_week)
685 | AND (base_week + 11)
686 | GROUP BY
687 | 1
688 | )
689 | SELECT
690 | sb.platform,
691 | total_sales_before,
692 | total_sales_after,
693 | total_sales_after - total_sales_before AS change_in_sales,
694 | ROUND(
695 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before,
696 | 2
697 | ) AS percentage_of_change
698 | FROM
699 | sales_before AS sb
700 | JOIN sales_after AS sa ON sb.platform = sa.platform
701 | GROUP BY
702 | 1,
703 | 2,
704 | 3,
705 | 4
706 | ORDER BY
707 | 5;
708 |
709 | -- age_band:
710 |
711 | WITH sales_before AS (
712 | SELECT
713 | age_band,
714 | SUM(sales) AS total_sales_before
715 | FROM
716 | clean_weekly_sales,
717 | LATERAL(
718 | SELECT
719 | EXTRACT(
720 | WEEK
721 | FROM
722 | '2020-06-15' :: date
723 | ) AS base_week
724 | ) bw
725 | WHERE
726 | calendar_year = 2020
727 | AND week_number between (base_week - 12)
728 | AND (base_week - 1)
729 | GROUP BY
730 | 1
731 | ),
732 | sales_after AS (
733 | SELECT
734 | age_band,
735 | SUM(sales) AS total_sales_after
736 | FROM
737 | clean_weekly_sales,
738 | LATERAL(
739 | SELECT
740 | EXTRACT(
741 | WEEK
742 | FROM
743 | '2020-06-15' :: date
744 | ) AS base_week
745 | ) bw
746 | WHERE
747 | calendar_year = 2020
748 | AND week_number between (base_week)
749 | AND (base_week + 11)
750 | GROUP BY
751 | 1
752 | )
753 | SELECT
754 | sb.age_band,
755 | total_sales_before,
756 | total_sales_after,
757 | total_sales_after - total_sales_before AS change_in_sales,
758 | ROUND(
759 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before,
760 | 2
761 | ) AS percentage_of_change
762 | FROM
763 | sales_before AS sb
764 | JOIN sales_after AS sa ON sb.age_band = sa.age_band
765 | GROUP BY
766 | 1,
767 | 2,
768 | 3,
769 | 4
770 | ORDER BY
771 | 5;
772 |
773 | -- demographic:
774 |
775 | WITH sales_before AS (
776 | SELECT
777 | demographic,
778 | SUM(sales) AS total_sales_before
779 | FROM
780 | clean_weekly_sales,
781 | LATERAL(
782 | SELECT
783 | EXTRACT(
784 | WEEK
785 | FROM
786 | '2020-06-15' :: date
787 | ) AS base_week
788 | ) bw
789 | WHERE
790 | calendar_year = 2020
791 | AND week_number between (base_week - 12)
792 | AND (base_week - 1)
793 | GROUP BY
794 | 1
795 | ),
796 | sales_after AS (
797 | SELECT
798 | demographic,
799 | SUM(sales) AS total_sales_after
800 | FROM
801 | clean_weekly_sales,
802 | LATERAL(
803 | SELECT
804 | EXTRACT(
805 | WEEK
806 | FROM
807 | '2020-06-15' :: date
808 | ) AS base_week
809 | ) bw
810 | WHERE
811 | calendar_year = 2020
812 | AND week_number between (base_week)
813 | AND (base_week + 11)
814 | GROUP BY
815 | 1
816 | )
817 | SELECT
818 | sb.demographic,
819 | total_sales_before,
820 | total_sales_after,
821 | total_sales_after - total_sales_before AS change_in_sales,
822 | ROUND(
823 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before,
824 | 2
825 | ) AS percentage_of_change
826 | FROM
827 | sales_before AS sb
828 | JOIN sales_after AS sa ON sb.demographic = sa.demographic
829 | GROUP BY
830 | 1,
831 | 2,
832 | 3,
833 | 4
834 | ORDER BY
835 | 5;
836 |
837 | -- customer_type:
838 |
839 | WITH sales_before AS (
840 | SELECT
841 | customer_type,
842 | SUM(sales) AS total_sales_before
843 | FROM
844 | clean_weekly_sales,
845 | LATERAL(
846 | SELECT
847 | EXTRACT(
848 | WEEK
849 | FROM
850 | '2020-06-15' :: date
851 | ) AS base_week
852 | ) bw
853 | WHERE
854 | calendar_year = 2020
855 | AND week_number between (base_week - 12)
856 | AND (base_week - 1)
857 | GROUP BY
858 | 1
859 | ),
860 | sales_after AS (
861 | SELECT
862 | customer_type,
863 | SUM(sales) AS total_sales_after
864 | FROM
865 | clean_weekly_sales,
866 | LATERAL(
867 | SELECT
868 | EXTRACT(
869 | WEEK
870 | FROM
871 | '2020-06-15' :: date
872 | ) AS base_week
873 | ) bw
874 | WHERE
875 | calendar_year = 2020
876 | AND week_number between (base_week)
877 | AND (base_week + 11)
878 | GROUP BY
879 | 1
880 | )
881 | SELECT
882 | sb.customer_type,
883 | total_sales_before,
884 | total_sales_after,
885 | total_sales_after - total_sales_before AS change_in_sales,
886 | ROUND(
887 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before,
888 | 2
889 | ) AS percentage_of_change
890 | FROM
891 | sales_before AS sb
892 | JOIN sales_after AS sa ON sb.customer_type = sa.customer_type
893 | GROUP BY
894 | 1,
895 | 2,
896 | 3,
897 | 4
898 | ORDER BY
899 | 5;
900 |
--------------------------------------------------------------------------------
/Case Study #6 - Clique Bait/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #6 - Clique Bait :hook:
2 |
3 |
4 |
5 | In this case study - you are required to support Danny’s vision by analysing his dataset and coming up with creative solutions to calculate funnel fallout rates for the Clique Bait online store.
6 |
7 | ## Available Data
8 |
9 | For this case study there is a total of 5 datasets which you will need to combine to solve all of the questions.
10 |
11 | ### Users
12 |
13 | Customers who visit the Clique Bait website are tagged via their `cookie_id`.
14 |
15 | | user_id | cookie_id | start_date |
16 | |---------|-----------|---------------------|
17 | | 397 | 3759ff | 2020-03-30 00:00:00 |
18 | | 215 | 863329 | 2020-01-26 00:00:00 |
19 | | 191 | eefca9 | 2020-03-15 00:00:00 |
20 | | 89 | 764796 | 2020-01-07 00:00:00 |
21 | | 127 | 17ccc5 | 2020-01-22 00:00:00 |
22 | | 81 | b0b666 | 2020-03-01 00:00:00 |
23 | | 260 | a4f236 | 2020-01-08 00:00:00 |
24 | | 203 | d1182f | 2020-04-18 00:00:00 |
25 | | 23 | 12dbc8 | 2020-01-18 00:00:00 |
26 | | 375 | f61d69 | 2020-01-03 00:00:00 |
27 |
28 | ### Events
29 |
30 | Customer visits are logged in this events table at a `cookie_id` level and the `event_type` and `page_id` values can be used to join onto relevant satellite tables to obtain further information about each event.
31 |
32 | The `sequence_number` is used to order the events within each visit.
   |
33 | | visit_id | cookie_id | page_id | event_type | sequence_number | event_time |
34 | |----------|-----------|---------|------------|-----------------|----------------------------|
35 | | 719fd3 | 3d83d3 | 5 | 1 | 4 | 2020-03-02 00:29:09.975502 |
36 | | fb1eb1 | c5ff25 | 5 | 2 | 8 | 2020-01-22 07:59:16.761931 |
37 | | 23fe81 | 1e8c2d | 10 | 1 | 9 | 2020-03-21 13:14:11.745667 |
38 | | ad91aa | 648115 | 6 | 1 | 3 | 2020-04-27 16:28:09.824606 |
39 | | 5576d7 | ac418c | 6 | 1 | 4 | 2020-01-18 04:55:10.149236 |
40 | | 48308b | c686c1 | 8 | 1 | 5 | 2020-01-29 06:10:38.702163 |
41 | | 46b17d | 78f9b3 | 7 | 1 | 12 | 2020-02-16 09:45:31.926407 |
42 | | 9fd196 | ccf057 | 4 | 1 | 5 | 2020-02-14 08:29:12.922164 |
43 | | edf853 | f85454 | 1 | 1 | 1 | 2020-02-22 12:59:07.652207 |
44 | | 3c6716 | 02e74f | 3 | 2 | 5 | 2020-01-31 17:56:20.777383 |
45 |
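   | Because `sequence_number` orders events within a visit, a single visit can be replayed with a simple join. For example (an illustrative snippet assuming the `clique_bait` schema used by the SQL solutions):
   |
   | ```sql
   | -- Illustrative only: replay visit 719fd3 from the sample rows above, in event order
   | SELECT e.sequence_number, ei.event_name, e.page_id, e.event_time
   | FROM clique_bait.events AS e
   | JOIN clique_bait.event_identifier AS ei ON e.event_type = ei.event_type
   | WHERE e.visit_id = '719fd3'
   | ORDER BY e.sequence_number;
   | ```
   |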
46 | ### Event Identifier
47 |
48 | The `event_identifier` table shows the types of events which are captured by Clique Bait’s digital data systems.
49 |
50 | | event_type | event_name |
51 | |------------|---------------|
52 | | 1 | Page View |
53 | | 2 | Add to Cart |
54 | | 3 | Purchase |
55 | | 4 | Ad Impression |
56 | | 5 | Ad Click |
57 |
58 | ### Campaign Identifier
59 |
60 | This table shows information for the 3 campaigns that Clique Bait has run on their website so far in 2020.
61 |
62 | | campaign_id | products | campaign_name | start_date | end_date |
63 | |-------------|----------|-----------------------------------|---------------------|---------------------|
64 | | 1 | 1-3 | BOGOF - Fishing For Compliments | 2020-01-01 00:00:00 | 2020-01-14 00:00:00 |
65 | | 2 | 4-5 | 25% Off - Living The Lux Life | 2020-01-15 00:00:00 | 2020-01-28 00:00:00 |
66 | | 3 | 6-8 | Half Off - Treat Your Shellf(ish) | 2020-02-01 00:00:00 | 2020-03-31 00:00:00 |
67 |
68 | ### Page Hierarchy
69 |
70 | This table lists all of the pages on the Clique Bait website which are tagged and have data passing through from user interaction events.
71 |
72 | | page_id | page_name | product_category | product_id |
73 | |---------|----------------|------------------|------------|
74 | | 1 | Home Page | null | null |
75 | | 2 | All Products | null | null |
76 | | 3 | Salmon | Fish | 1 |
77 | | 4 | Kingfish | Fish | 2 |
78 | | 5 | Tuna | Fish | 3 |
79 | | 6 | Russian Caviar | Luxury | 4 |
80 | | 7 | Black Truffle | Luxury | 5 |
81 | | 8 | Abalone | Shellfish | 6 |
82 | | 9 | Lobster | Shellfish | 7 |
83 | | 10 | Crab | Shellfish | 8 |
84 | | 11 | Oyster | Shellfish | 9 |
85 | | 12 | Checkout | null | null |
86 | | 13 | Confirmation | null | null |
87 |
88 | ## Entity Relationship Diagram
89 |
90 | [1. Enterprise Relationship Diagram](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#1-enterprise-relationship-diagram)
91 |
92 | ## Table of Contents
93 |
94 | [Introduction](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#introduction)
95 |
96 | [Case Study Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#case-study-questions)
97 |
98 | [1. Enterprise Relationship Diagram](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#1-enterprise-relationship-diagram)
99 |
100 | [2. Digital Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#2-digital-analysis)
101 |
102 | [1. How many users are there?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#1-how-many-users-are-there)
103 |
104 | [2. How many cookies does each user have on average?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#2-how-many-cookies-does-each-user-have-on-average)
105 |
106 | [3. What is the unique number of visits by all users per month?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#3-what-is-the-unique-number-of-visits-by-all-users-per-month)
107 |
108 | [4. What is the number of events for each event type?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#4-what-is-the-number-of-events-for-each-event-type)
109 |
110 | [5. What is the percentage of visits which have a purchase event?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#5-what-is-the-percentage-of-visits-which-have-a-purchase-event)
111 |
112 | [6. What is the percentage of visits which view the checkout page but do not have a purchase event?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#6-what-is-the-percentage-of-visits-which-view-the-checkout-page-but-do-not-have-a-purchase-event)
113 |
114 | [7. What are the top 3 pages by number of views?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#7-what-are-the-top-3-pages-by-number-of-views)
115 |
116 | [8. What is the number of views and cart adds for each product category?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#8-what-is-the-number-of-views-and-cart-adds-for-each-product-category)
117 |
118 | [9. What are the top 3 products by purchases?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#9-what-are-the-top-3-products-by-purchases)
119 |
120 | [3. Product Funnel Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#3-product-funnel-analysis)
121 |
122 | [1. Which product had the most views, cart adds and purchases?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#1-which-product-had-the-most-views-cart-adds-and-purchases)
123 |
124 | [2. Which product was most likely to be abandoned?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#2-which-product-was-most-likely-to-be-abandoned)
125 |
126 | [3. Which product had the highest view to purchase percentage?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#3-which-product-had-the-highest-view-to-purchase-percentage)
127 |
128 | [4. What is the average conversion rate from view to cart add?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#4-what-is-the-average-conversion-rate-from-view-to-cart-add)
129 |
130 | [5. What is the average conversion rate from cart add to purchase?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#5-what-is-the-average-conversion-rate-from-cart-add-to-purchase)
131 |
132 | [4. Campaigns Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#3-campaigns-analysis)
133 |
--------------------------------------------------------------------------------
/Case Study #6 - Clique Bait/SQL/solution_6.sql:
--------------------------------------------------------------------------------
1 | -- Solved on PostgreSQL 13.4 by Yulia Murtazina, January 25, 2022
2 | -- Fixed on February 14, 2022
3 |
4 | -- Case Study #6 - Clique Bait
5 |
6 | SET
7 | SEARCH_PATH = clique_bait;
8 |
9 | /* --------------------
10 | Case Study Questions
11 |
12 | -- 1. Enterprise Relationship Diagram
13 | --------------------*/
14 |
15 | -- Not a coding question
16 |
17 | /* --------------------
18 | 2. Digital Analysis
19 |
20 | Using the available datasets - answer the following questions using a single query for each one:
21 | --------------------*/
22 |
23 | -- 1. How many users are there?
24 |
25 | SELECT
26 | COUNT(distinct user_id) AS number_of_users
27 | FROM
28 | users;
29 |
30 | -- 2. How many cookies does each user have on average?
31 |
32 | SELECT
33 | ROUND(
34 | COUNT(cookie_id) / COUNT(distinct user_id) :: numeric,
35 | 2
36 | ) AS avg_cookies_per_user
37 | FROM
38 | users;
39 |
40 | -- 3. What is the unique number of visits by all users per month?
41 |
42 | SELECT
43 | TO_CHAR(event_time, 'Month') AS month,
44 | count(distinct visit_id) AS number_of_visits
45 | FROM
46 | events
47 | GROUP BY
48 | 1
49 | ORDER BY
50 | MIN(event_time);
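-- Ordering by MIN(event_time) keeps the month names in calendar order; ordering
-- by the text month column itself would sort them alphabetically.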
51 |
52 | -- 4. What is the number of events for each event type?
53 |
54 | SELECT
55 | event_name,
56 | COUNT(e.event_type) AS number_of_events
57 | FROM
58 | events AS e
59 | JOIN event_identifier AS ei ON e.event_type = ei.event_type
60 | GROUP BY
61 | 1
62 | ORDER BY
63 | 2 DESC;
64 |
65 | -- 5. What is the percentage of visits which have a purchase event?
66 |
67 | SELECT
68 | event_name,
69 | ROUND(
70 | 100 *(number_of_purchases :: numeric / number_of_visits),
71 | 1
72 | ) AS percentage_from_all_visits
73 | FROM
74 | events AS e
75 |   JOIN event_identifier AS ei ON e.event_type = ei.event_type,
76 | LATERAL(
77 | SELECT
78 | COUNT(distinct visit_id) AS number_of_visits
79 | FROM
80 | events
81 | ) AS nv,
82 | LATERAL(
83 | SELECT
84 | COUNT(distinct visit_id) AS number_of_purchases
85 | FROM
86 | events
87 | WHERE
88 | event_type = 3
89 | ) np
90 | WHERE
91 | event_name = 'Purchase'
92 | GROUP BY
93 | event_name,
94 | number_of_visits,
95 | number_of_purchases;
96 |
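-- A more compact alternative (a sketch, not part of the original solution): a
-- FILTER clause can replace both LATERAL subqueries, since event_type = 3 marks
-- purchase events.

SELECT
  ROUND(
    100 * COUNT(distinct visit_id) FILTER (
      WHERE
        event_type = 3
    ) :: numeric / COUNT(distinct visit_id),
    1
  ) AS percentage_from_all_visits
FROM
  events;
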
97 | -- 6. What is the percentage of visits which view the checkout page but do not have a purchase event?
98 |
99 | SELECT
100 | page_name,
101 | COUNT(page_name) AS number_of_visits,
102 | ROUND(
103 | 100 *(COUNT(page_name) :: numeric / checkout_visits),
104 | 1
105 | ) AS percentage_from_checkout_page_visits,
106 | ROUND(
107 | 100 *(COUNT(page_name) :: numeric / total_visits),
108 | 1
109 | ) AS percentage_from_all_visits
110 | FROM
111 | events AS e
112 | JOIN page_hierarchy AS pe ON e.page_id = pe.page_id,
113 | LATERAL(
114 | SELECT
115 | COUNT(distinct visit_id) AS total_visits
116 | FROM
117 | events
118 | ) AS tv,
119 | LATERAL(
120 | SELECT
121 | COUNT(distinct visit_id) AS checkout_visits
122 | FROM
123 | events
124 | WHERE
125 | page_id = 12
126 | ) AS cv
127 | WHERE
128 | page_name = 'Checkout'
129 | AND visit_id NOT IN (
130 | SELECT
131 | distinct visit_id
132 | FROM
133 | events AS ee
134 | WHERE
135 | event_type = 3
136 | )
137 | GROUP BY
138 | page_name,
139 | total_visits,
140 | checkout_visits;
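-- The NOT IN subquery strips out visits that went on to purchase, so the counts
-- above cover only checkout page views that never converted.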
141 |
142 | -- 7. What are the top 3 pages by number of views?
143 |
144 | WITH ordered_rows AS(
145 | SELECT
146 | page_name,
147 | event_name,
148 | COUNT(event_name) AS number_of_views,
149 | ROW_NUMBER() OVER (
150 | ORDER BY
151 | COUNT(page_name) DESC
152 | ) AS row
153 | FROM
154 | events AS e
155 |   JOIN page_hierarchy AS pe ON e.page_id = pe.page_id
156 |   JOIN event_identifier AS ei ON e.event_type = ei.event_type
157 | WHERE
158 | event_name = 'Page View'
159 | GROUP BY
160 | 1,
161 | 2
162 | )
163 | SELECT
164 | page_name,
165 | number_of_views
166 | FROM
167 | ordered_rows
168 | WHERE
169 | row in (1, 2, 3);
170 |
171 | -- 8. What is the number of views and cart adds for each product category?
172 |
173 | SELECT
174 | pe.product_category,
175 | SUM(
176 | CASE
177 | WHEN event_name = 'Page View' THEN 1
178 | ELSE 0
179 | END
180 | ) AS number_of_page_views,
181 | SUM(
182 | CASE
183 | WHEN event_name = 'Add to Cart' THEN 1
184 | ELSE 0
185 | END
186 | ) AS number_of_add_to_cart_events
187 | FROM
188 | events AS e
189 | JOIN page_hierarchy AS pe ON e.page_id = pe.page_id
190 | JOIN event_identifier AS ei ON e.event_type = ei.event_type
191 | WHERE
192 | product_id > 0
193 | GROUP BY
194 | 1
195 | ORDER BY
196 | 1;
197 |
198 | -- 9. What are the top 3 products by purchases?
199 |
200 | WITH ordered_rows AS(
201 | SELECT
202 | page_name,
203 | event_name,
204 | COUNT(event_name) AS number_of_purchases,
205 | ROW_NUMBER() OVER (
206 | ORDER BY
207 | COUNT(event_name) DESC
208 | ) AS row
209 | FROM
210 | events AS e
211 | JOIN page_hierarchy AS pe ON e.page_id = pe.page_id
212 | JOIN event_identifier AS ei ON e.event_type = ei.event_type
213 | WHERE
214 | visit_id in (
215 | SELECT
216 | distinct visit_id
217 | FROM
218 | events AS ee
219 | WHERE
220 | event_type = 3
221 | )
222 | AND product_id > 0
223 | AND event_name = 'Add to Cart'
224 | GROUP BY
225 | 1,
226 | 2
227 | )
228 | SELECT
229 | page_name,
230 | number_of_purchases
231 | FROM
232 | ordered_rows
233 | WHERE
234 | row in (1, 2, 3);
235 |
236 | /* --------------------
237 | 3. Product Funnel Analysis
238 |
239 | Using a single SQL query - create a new output table which has the following details:
240 |
241 | How many times was each product viewed?
242 | How many times was each product added to cart?
243 | How many times was each product added to a cart but not purchased (abandoned)?
244 | How many times was each product purchased?
245 | --------------------*/
246 |
247 | WITH joined_tables AS(
248 | SELECT
249 | visit_id,
250 | page_name,
251 | event_name
252 | FROM
253 | events AS e
254 | JOIN page_hierarchy AS pe ON e.page_id = pe.page_id
255 | JOIN event_identifier AS ei ON e.event_type = ei.event_type
256 | GROUP BY
257 | 1,
258 | 2,
259 | 3
260 | )
261 | SELECT
262 | jt.page_name,
263 | COUNT(event_name) AS number_of_views,
264 | number_of_added_to_cart,
265 | number_of_abandoned_carts,
266 | number_of_purchases
267 |
268 | -- INTO TABLE product_stats
269 | FROM
270 | joined_tables AS jt
271 | JOIN (
272 | SELECT
273 | page_name,
274 | COUNT(event_name) AS number_of_added_to_cart
275 | FROM
276 | joined_tables
277 | WHERE
278 | event_name = 'Add to Cart'
279 | GROUP BY
280 | 1
281 | ) jt1 ON jt.page_name = jt1.page_name
282 | JOIN (
283 | SELECT
284 | page_name,
285 | COUNT(event_name) AS number_of_abandoned_carts
286 | FROM
287 | joined_tables
288 | WHERE
289 | event_name = 'Add to Cart'
290 | AND visit_id NOT IN (
291 | SELECT
292 | distinct visit_id
293 | FROM
294 | events AS ee
295 | WHERE
296 | event_type = 3
297 | )
298 | GROUP BY
299 | 1
300 | ) jt2 ON jt.page_name = jt2.page_name
301 | JOIN (
302 | SELECT
303 | page_name,
304 | COUNT(event_name) AS number_of_purchases
305 | FROM
306 | joined_tables
307 | WHERE
308 | event_name = 'Add to Cart'
309 | AND visit_id IN (
310 | SELECT
311 | distinct visit_id
312 | FROM
313 | events AS ee
314 | WHERE
315 | event_type = 3
316 | )
317 | GROUP BY
318 | 1
319 | ) jt3 ON jt.page_name = jt3.page_name
320 | WHERE
321 | event_name = 'Page View'
322 | GROUP BY
323 | jt.page_name,
324 | number_of_added_to_cart,
325 | number_of_purchases,
326 | number_of_abandoned_carts
327 | ORDER BY
328 | 1;
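-- Note: the commented-out INTO clause above materializes this output as
-- product_stats; it (and product_category_stats below) must be created once
-- before running the follow-up questions that query those tables.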
329 |
330 | -- Additionally, create another table which further aggregates the data for the above points but this time for each product category instead of individual products.
331 |
332 | SELECT
333 | product_category,
334 | SUM(number_of_views) AS number_of_views,
335 | SUM(number_of_added_to_cart) AS number_of_added_to_cart,
336 | SUM(number_of_abandoned_carts) AS number_of_abandoned_carts,
337 | SUM(number_of_purchases) AS number_of_purchases
338 |
339 | -- INTO TABLE product_category_stats
340 | FROM
341 | product_stats AS ps
342 | JOIN page_hierarchy AS pe ON ps.page_name = pe.page_name
343 | GROUP BY
344 | product_category
345 | ORDER BY
346 | 1;
347 |
348 | -- Use your 2 new output tables - answer the following questions:
349 | -- 1. Which product had the most views, cart adds and purchases?
350 |
351 | WITH ordered_rows AS (
352 | SELECT
353 | *,
354 | ROW_NUMBER() OVER (
355 | ORDER BY
356 | number_of_views DESC
357 | ) AS views,
358 | ROW_NUMBER() OVER (
359 | ORDER BY
360 | number_of_added_to_cart DESC
361 | ) AS carts,
362 | ROW_NUMBER() OVER (
363 | ORDER BY
364 | number_of_purchases DESC
365 | ) AS purchases
366 | FROM
367 | product_stats
368 | GROUP BY
369 | 1,
370 | 2,
371 | 3,
372 | 4,
373 | 5
374 | )
375 | SELECT
376 | page_name,
377 | number_of_views,
378 | number_of_added_to_cart,
379 | number_of_purchases
380 | FROM
381 | ordered_rows
382 | WHERE
383 | views = 1
384 | OR carts = 1
385 | OR purchases = 1;
386 |
387 | -- 2. Which product was most likely to be abandoned?
388 |
389 | WITH ordered_rows AS (
390 | SELECT
391 | *,
392 | ROW_NUMBER() OVER (
393 | ORDER BY
394 | number_of_abandoned_carts DESC
395 | ) AS row
396 | FROM
397 | product_stats
398 | GROUP BY
399 | 1,
400 | 2,
401 | 3,
402 | 4,
403 | 5
404 | )
405 | SELECT
406 | page_name,
407 | number_of_abandoned_carts
408 | FROM
409 | ordered_rows
410 | WHERE
411 | row = 1;
412 |
413 | -- 3. Which product had the highest view to purchase percentage?
414 |
415 | WITH ordered_rows AS (
416 | SELECT
417 | page_name,
418 | view_to_purchase_percentage,
419 | ROW_NUMBER() OVER (
420 | ORDER BY
421 | view_to_purchase_percentage DESC
422 | ) AS row
423 | FROM
424 | product_stats,
425 | LATERAL(
426 | SELECT
427 | ROUND(
428 | 100 *(number_of_purchases :: numeric / number_of_views),
429 | 1
430 | ) AS view_to_purchase_percentage
431 | ) vpp
432 | GROUP BY
433 | 1,
434 | 2
435 | )
436 | SELECT
437 | page_name,
438 | view_to_purchase_percentage
439 | FROM
440 | ordered_rows
441 | WHERE
442 | row = 1;
443 |
444 | -- 4. What is the average conversion rate from view to cart add?
445 |
446 | SELECT
447 | ROUND(
448 | 100 *(
449 | SUM(number_of_added_to_cart) / SUM(number_of_views)
450 | ),
451 | 1
452 | ) AS avg_view_to_cart_conversion
453 | FROM
454 | product_category_stats;
455 |
456 | -- 5. What is the average conversion rate from cart add to purchase?
457 |
458 | SELECT
459 | ROUND(
460 | 100 *(
461 | SUM(number_of_purchases) / SUM(number_of_added_to_cart)
462 | ),
463 | 1
464 | ) AS avg_cart_to_purchase_conversion
465 | FROM
466 | product_category_stats;
467 |
468 | /* --------------------
469 | 4. Campaigns Analysis
470 |
471 | Generate a table that has 1 single row for every unique visit_id record and has the following columns:
472 |
473 | user_id
474 | visit_id
475 | visit_start_time: the earliest event_time for each visit
476 | page_views: count of page views for each visit
477 | cart_adds: count of product cart add events for each visit
478 | purchase: 1/0 flag if a purchase event exists for each visit
479 | campaign_name: map the visit to a campaign if the visit_start_time falls between the start_date and end_date
480 | impression: count of ad impressions for each visit
481 | click: count of ad clicks for each visit
482 | (Optional column) cart_products: a comma separated text value with products added to the cart sorted by the order they were added to the cart (hint: use the sequence_number)
483 | --------------------*/
484 |
485 | WITH joined_table AS (
486 | SELECT
487 | user_id,
488 | visit_id,
489 | event_time AS visit_start_time,
490 | page_name,
491 | event_name,
492 | sequence_number,
493 | product_id
494 | FROM
495 | users AS u
496 | JOIN events AS e ON u.cookie_id = e.cookie_id
497 | JOIN event_identifier AS ei ON e.event_type = ei.event_type
498 | JOIN page_hierarchy AS pe ON e.page_id = pe.page_id
499 | GROUP BY
500 | user_id,
501 | visit_id,
502 | event_name,
503 | page_name,
504 | event_time,
505 | sequence_number,
506 | product_id
507 | )
508 | SELECT
509 | user_id,
510 | jt.visit_id,
511 | visit_start_time,
512 | page_views,
513 | cart_adds,
514 | purchase,
515 | campaign_name,
516 | impression,
517 | click,
518 | cart_products
519 | FROM
520 | joined_table AS jt
521 | LEFT JOIN(
522 | SELECT
523 | visit_id,
524 | COUNT(page_name) AS page_views
525 | FROM
526 | joined_table
527 | WHERE
528 | event_name = 'Page View'
529 | GROUP BY
530 | 1
531 | ) AS jt1 ON jt.visit_id = jt1.visit_id
532 | LEFT JOIN(
533 | SELECT
534 | visit_id,
535 | COUNT(page_name) AS cart_adds
536 | FROM
537 | joined_table
538 | WHERE
539 | event_name = 'Add to Cart'
540 | GROUP BY
541 | 1
542 | ) AS jt2 ON jt.visit_id = jt2.visit_id
543 | LEFT JOIN(
544 | SELECT
545 | visit_id,
546 | CASE
547 | WHEN visit_id IN (
548 | SELECT
549 | distinct visit_id
550 | FROM
551 | events AS ee
552 | WHERE
553 | event_type = 3
554 | ) THEN 1
555 | ELSE 0
556 | END AS purchase
557 | FROM
558 | joined_table
559 | GROUP BY
560 | 1
561 | ) AS jt3 ON jt.visit_id = jt3.visit_id
562 | LEFT JOIN(
563 | SELECT
564 | visit_id,
565 | COUNT(page_name) AS impression
566 | FROM
567 | joined_table
568 | WHERE
569 | event_name = 'Ad Impression'
570 | GROUP BY
571 | 1
572 | ) AS jt4 ON jt.visit_id = jt4.visit_id
573 | LEFT JOIN(
574 | SELECT
575 | visit_id,
576 | COUNT(page_name) AS click
577 | FROM
578 | joined_table
579 | WHERE
580 | event_name = 'Ad Click'
581 | GROUP BY
582 | 1
583 | ) AS jt5 ON jt.visit_id = jt5.visit_id
584 | LEFT JOIN campaign_identifier AS ci ON jt.visit_start_time between ci.start_date
585 | AND ci.end_date
586 | LEFT JOIN(
587 | SELECT
588 | visit_id,
589 | STRING_AGG(
590 | page_name,
591 | ', '
592 | ORDER BY
593 | sequence_number
594 | ) AS cart_products
595 | FROM
596 | joined_table
597 | WHERE
598 | product_id > 0
599 | AND event_name = 'Add to Cart'
600 | GROUP BY
601 | 1
602 | ) AS jt6 ON jt.visit_id = jt6.visit_id
603 | WHERE
604 | sequence_number = 1
605 | GROUP BY
606 | page_name,
607 | page_views,
608 | cart_adds,
609 | user_id,
610 | jt.visit_id,
611 | purchase,
612 | impression,
613 | click,
614 | visit_start_time,
615 | campaign_name,
616 | cart_products
617 | ORDER BY
618 | 1,
619 | 3;
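-- Notes on the query above: the outer WHERE sequence_number = 1 keeps a single
-- row per visit (its first event), so event_time can serve as visit_start_time;
-- the campaign join then matches that start time against each campaign window.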
620 |
621 | /* --------------------
622 | Use the subsequent dataset to generate at least 5 insights for the Clique Bait team - bonus: prepare a single A4 infographic that the team can use for their management reporting sessions; be sure to emphasise the most important points from your findings.
623 |
624 | Some ideas you might want to investigate further include:
625 |
626 | Identifying users who have received impressions during each campaign period and comparing each metric with other users who did not have an impression event
627 | Does clicking on an impression lead to higher purchase rates?
628 | What is the uplift in purchase rate when comparing users who click on a campaign impression versus users who do not receive an impression? What if we compare them with users who have just an impression but do not click?
629 | What metrics can you use to quantify the success or failure of each campaign compared to each other?
630 | --------------------*/
631 |
--------------------------------------------------------------------------------
/Case Study #7 - Balanced Tree Clothing Co./README.md:
--------------------------------------------------------------------------------
1 | # Case Study #7 - Balanced Tree Clothing Co. :mountain_snow:
2 |
3 |
4 |
5 | Danny, the CEO of this trendy fashion company, has asked you to help the merchandising team analyse their sales performance and generate a basic financial report to share with the wider business.
6 |
7 | ## Available Data
8 |
9 | For this case study there is a total of 4 datasets - however you will only need to utilise 2 main tables to solve all of the regular questions, and the additional 2 tables are used only for the bonus challenge question!
10 |
11 | ### Product Details
12 |
13 | `balanced_tree.product_details` includes all information about the entire range that Balanced Tree Clothing Co. sells in their store.
14 |
15 | | product_id | price | product_name | category_id | segment_id | style_id | category_name | segment_name | style_name |
16 | |------------|-------|----------------------------------|-------------|------------|----------|---------------|--------------|---------------------|
17 | | c4a632 | 13 | Navy Oversized Jeans - Womens | 1 | 3 | 7 | Womens | Jeans | Navy Oversized |
18 | | e83aa3 | 32 | Black Straight Jeans - Womens | 1 | 3 | 8 | Womens | Jeans | Black Straight |
19 | | e31d39 | 10 | Cream Relaxed Jeans - Womens | 1 | 3 | 9 | Womens | Jeans | Cream Relaxed |
20 | | d5e9a6 | 23 | Khaki Suit Jacket - Womens | 1 | 4 | 10 | Womens | Jacket | Khaki Suit |
21 | | 72f5d4 | 19 | Indigo Rain Jacket - Womens | 1 | 4 | 11 | Womens | Jacket | Indigo Rain |
22 | | 9ec847 | 54 | Grey Fashion Jacket - Womens | 1 | 4 | 12 | Womens | Jacket | Grey Fashion |
23 | | 5d267b | 40 | White Tee Shirt - Mens | 2 | 5 | 13 | Mens | Shirt | White Tee |
24 | | c8d436 | 10 | Teal Button Up Shirt - Mens | 2 | 5 | 14 | Mens | Shirt | Teal Button Up |
25 | | 2a2353 | 57 | Blue Polo Shirt - Mens | 2 | 5 | 15 | Mens | Shirt | Blue Polo |
26 | | f084eb | 36 | Navy Solid Socks - Mens | 2 | 6 | 16 | Mens | Socks | Navy Solid |
27 | | b9a74d | 17 | White Striped Socks - Mens | 2 | 6 | 17 | Mens | Socks | White Striped |
28 | | 2feb6b | 29 | Pink Fluro Polkadot Socks - Mens | 2 | 6 | 18 | Mens | Socks | Pink Fluro Polkadot |
29 |
30 | ### Product Sales
31 |
32 | `balanced_tree.sales` contains product level information for all the transactions made for Balanced Tree including quantity, price, percentage discount, member status, a transaction ID and also the transaction timestamp.
33 |
34 | | prod_id | qty | price | discount | member | txn_id | start_txn_time |
35 | |---------|-----|-------|----------|--------|--------|--------------------------|
36 | | c4a632 | 4 | 13 | 17 | t | 54f307 | 2021-02-13 01:59:43.296 |
37 | | 5d267b | 4 | 40 | 17 | t | 54f307 | 2021-02-13 01:59:43.296 |
38 | | b9a74d | 4 | 17 | 17 | t | 54f307 | 2021-02-13 01:59:43.296 |
39 | | 2feb6b | 2 | 29 | 17 | t | 54f307 | 2021-02-13 01:59:43.296 |
40 | | c4a632 | 5 | 13 | 21 | t | 26cc98 | 2021-01-19 01:39:00.3456 |
41 | | e31d39 | 2 | 10 | 21 | t | 26cc98 | 2021-01-19 01:39:00.3456 |
42 | | 72f5d4 | 3 | 19 | 21 | t | 26cc98 | 2021-01-19 01:39:00.3456 |
43 | | 2a2353 | 3 | 57 | 21 | t | 26cc98 | 2021-01-19 01:39:00.3456 |
44 | | f084eb | 3 | 36 | 21 | t | 26cc98 | 2021-01-19 01:39:00.3456 |
45 | | c4a632 | 1 | 13 | 21 | f | ef648d | 2021-01-27 02:18:17.1648 |
46 |
47 | ### Product Hierarchy & Product Prices
48 |
49 | These tables are used only for the bonus question where we will use them to recreate the `balanced_tree.product_details` table.
50 |
51 | `balanced_tree.product_hierarchy`
52 |
53 | | id | parent_id | level_text | level_name |
54 | |----|-----------|---------------------|------------|
55 | | 1  |           | Womens              | Category   |
56 | | 2  |           | Mens                | Category   |
57 | | 3 | 1 | Jeans | Segment |
58 | | 4 | 1 | Jacket | Segment |
59 | | 5 | 2 | Shirt | Segment |
60 | | 6 | 2 | Socks | Segment |
61 | | 7 | 3 | Navy Oversized | Style |
62 | | 8 | 3 | Black Straight | Style |
63 | | 9 | 3 | Cream Relaxed | Style |
64 | | 10 | 4 | Khaki Suit | Style |
65 | | 11 | 4 | Indigo Rain | Style |
66 | | 12 | 4 | Grey Fashion | Style |
67 | | 13 | 5 | White Tee | Style |
68 | | 14 | 5 | Teal Button Up | Style |
69 | | 15 | 5 | Blue Polo | Style |
70 | | 16 | 6 | Navy Solid | Style |
71 | | 17 | 6 | White Striped | Style |
72 | | 18 | 6 | Pink Fluro Polkadot | Style |
73 |
74 | `balanced_tree.product_prices`
75 |
76 | | id | product_id | price |
77 | |----|------------|-------|
78 | | 7 | c4a632 | 13 |
79 | | 8 | e83aa3 | 32 |
80 | | 9 | e31d39 | 10 |
81 | | 10 | d5e9a6 | 23 |
82 | | 11 | 72f5d4 | 19 |
83 | | 12 | 9ec847 | 54 |
84 | | 13 | 5d267b | 40 |
85 | | 14 | c8d436 | 10 |
86 | | 15 | 2a2353 | 57 |
87 | | 16 | f084eb | 36 |
88 | | 17 | b9a74d | 17 |
89 | | 18 | 2feb6b | 29 |
90 |
91 | ## Table of Contents
92 |
93 | [Introduction](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#introduction)
94 |
95 | [Case Study Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#case-study-questions)
96 |
97 | [High Level Sales Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#high-level-sales-analysis)
98 |
99 | [1. What was the total quantity sold for all products?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#1-what-was-the-total-quantity-sold-for-all-products)
100 |
101 | [2. What is the total generated revenue for all products before discounts?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#2-what-is-the-total-generated-revenue-for-all-products-before-discounts)
102 |
103 | [3. What was the total discount amount for all products?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#3-what-was-the-total-discount-amount-for-all-products)
104 |
105 | [Transaction Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#transaction-analysis)
106 |
107 | [1. How many unique transactions were there?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#1-how-many-unique-transactions-were-there)
108 |
109 | [2. What is the average unique products purchased in each transaction?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#2-what-is-the-average-unique-products-purchased-in-each-transaction)
110 |
111 | [3. What are the 25th, 50th and 75th percentile values for the revenue per transaction?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#3-what-are-the-25th-50th-and-75th-percentile-values-for-the-revenue-per-transaction)
112 |
113 | [4. What is the average discount value per transaction?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#4-what-is-the-average-discount-value-per-transaction)
114 |
115 | [5. What is the percentage split of all transactions for members vs non-members?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#5-what-is-the-percentage-split-of-all-transactions-for-members-vs-non-members)
116 |
117 | [6. What is the average revenue for member transactions and non-member transactions?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#6-what-is-the-average-revenue-for-member-transactions-and-non-member-transactions)
118 |
119 | [Product Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#product-analysis)
120 |
121 | [1. What are the top 3 products by total revenue before discount?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#1-what-are-the-top-3-products-by-total-revenue-before-discount)
122 |
123 | [2. What is the total quantity, revenue and discount for each segment?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#2-what-is-the-total-quantity-revenue-and-discount-for-each-segment)
124 |
125 | [3. What is the top selling product for each segment?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#3-what-is-the-top-selling-product-for-each-segment)
126 |
127 | [4. What is the total quantity, revenue and discount for each category?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#4-what-is-the-total-quantity-revenue-and-discount-for-each-category)
128 |
129 | [5. What is the top selling product for each category?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#5-what-is-the-top-selling-product-for-each-category)
130 |
131 | [6. What is the percentage split of revenue by product for each segment?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#6-what-is-the-percentage-split-of-revenue-by-product-for-each-segment)
132 |
133 | [7. What is the percentage split of revenue by segment for each category?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#7-what-is-the-percentage-split-of-revenue-by-segment-for-each-category)
134 |
135 | [8. What is the percentage split of total revenue by category?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#8-what-is-the-percentage-split-of-total-revenue-by-category)
136 |
137 | [9. What is the total transaction “penetration” for each product? (hint: penetration = number of transactions where at least 1 quantity of a product was purchased divided by total number of transactions)](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#9-what-is-the-total-transaction-penetration-for-each-product-hint-penetration--number-of-transactions-where-at-least-1-quantity-of-a-product-was-purchased-divided-by-total-number-of-transactions)
138 |
139 | [10. What is the most common combination of at least 1 quantity of any 3 products in a 1 single transaction?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#10-what-is-the-most-common-combination-of-at-least-1-quantity-of-any-3-products-in-a-1-single-transaction)
140 |
141 | [Reporting Challenge](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#reporting-challenge)
142 |
143 | [Bonus Challenge](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#bonus-challenge)
144 |
--------------------------------------------------------------------------------
/Case Study #7 - Balanced Tree Clothing Co./SQL/solution_7.sql:
--------------------------------------------------------------------------------
1 | -- Solved on PostgreSQL 13.4 by Yulia Murtazina, January 28, 2022
2 | -- Fixed on February 15, 2022
3 |
4 | /* --------------------
5 | Case Study #7 - Balanced Tree Clothing Co.
6 | --------------------*/
7 |
8 | SET
9 | SEARCH_PATH = balanced_tree;
10 |
11 | /* --------------------
12 | Case Study Questions
13 | --------------------*/
14 |
15 | -- 1. What was the total quantity sold for all products?
16 |
17 | SELECT
18 | SUM(qty) AS total_qty_sold
19 | FROM
20 | sales;
21 |
22 | -- 2. What is the total generated revenue for all products before discounts?
23 |
24 | SELECT
25 | SUM(qty * price) AS total_sales
26 | FROM
27 | sales;
28 |
29 | -- 3. What was the total discount amount for all products?
30 |
31 | SELECT
32 | ROUND(SUM(qty * price * discount :: numeric / 100), 2) AS total_discount
33 | FROM
34 | sales;
35 |
36 | /* --------------------
37 | Transaction Analysis
38 | --------------------*/
39 |
40 | -- 1. How many unique transactions were there?
41 |
42 | SELECT
43 | COUNT(distinct txn_id) AS number_of_transactions
44 | FROM
45 | sales;
46 |
47 | -- 2. What is the average unique products purchased in each transaction?
48 |
49 | SELECT
50 |   ROUND(COUNT(prod_id) / COUNT(distinct txn_id) :: numeric, 2) AS avg_number_of_product_in_order
51 | FROM
52 | sales;
53 |
54 | -- 3. What are the 25th, 50th and 75th percentile values for the revenue per transaction?
55 |
56 | WITH revenue AS(
57 | SELECT
58 | txn_id,
59 | SUM(qty * price) AS revenue
60 | FROM
61 | sales
62 | GROUP BY
63 | txn_id
64 | )
65 | SELECT
66 | PERCENTILE_CONT(0.25) WITHIN GROUP (
67 | ORDER BY
68 | revenue
69 | ) AS percentile_25,
70 | PERCENTILE_CONT(0.50) WITHIN GROUP (
71 | ORDER BY
72 | revenue
73 | ) AS percentile_50,
74 | PERCENTILE_CONT(0.75) WITHIN GROUP (
75 | ORDER BY
76 | revenue
77 | ) AS percentile_75
78 | FROM
79 | revenue;
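-- PERCENTILE_CONT interpolates between neighbouring revenue values;
-- PERCENTILE_DISC would instead return actual revenue values from the data.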
80 |
81 | -- 4. What is the average discount value per transaction?
82 |
83 | WITH revenue AS(
84 | SELECT
85 | txn_id,
86 | SUM(qty * price * discount :: numeric / 100) AS order_discount
87 | FROM
88 | sales
89 | GROUP BY
90 | txn_id
91 | )
92 | SELECT
93 | ROUND(AVG(order_discount), 2) AS avg_order_discount
94 | FROM
95 | revenue;
96 |
97 | -- 5. What is the percentage split of all transactions for members vs non-members?
98 |
99 | WITH members AS (
100 | SELECT
101 | DISTINCT sales.txn_id,
102 | COUNT(distinct member) AS total_members,
103 | CASE
104 | WHEN member = TRUE THEN 1
105 | ELSE 0
106 | END AS number_of_members
107 | FROM
108 | sales
109 | GROUP BY
110 | txn_id,
111 | number_of_members
112 | )
113 | SELECT
114 | distinct percentage_of_members,
115 | 100 - percentage_of_members AS percentage_of_guests
116 | FROM
117 | members,
118 | LATERAL(
119 | SELECT
120 | ROUND(
121 | 100 *(SUM(number_of_members) / SUM(total_members)),
122 | 1
123 | ) AS percentage_of_members
124 | FROM
125 | members
126 | ) pm;
127 |
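-- A shorter equivalent (a sketch): the member flag is constant within a
-- transaction, so averaging the flag over one row per transaction gives the
-- member share directly.

SELECT
  ROUND(100 * AVG(member :: int), 1) AS percentage_of_members,
  ROUND(100 - 100 * AVG(member :: int), 1) AS percentage_of_guests
FROM
  (
    SELECT
      distinct txn_id,
      member
    FROM
      sales
  ) AS t;
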
128 | -- 6. What is the average revenue for member transactions and non-member transactions?
129 |
130 | WITH members AS (
131 | SELECT
132 | sales.txn_id,
133 | CASE
134 | WHEN member = TRUE THEN SUM(qty * price)
135 | END AS members_revenue,
136 | CASE
137 | WHEN member = FALSE THEN SUM(qty * price)
138 | END AS guests_revenue
139 | FROM
140 | sales
141 | GROUP BY
142 | txn_id,
143 | member
144 | )
145 | SELECT
146 | ROUND(AVG(members_revenue), 2) AS avg_members_revenue,
147 | ROUND(AVG(guests_revenue), 2) AS avg_guests_revenue
148 | FROM
149 | members;
150 |
151 | /* --------------------
152 | Product Analysis
153 | --------------------*/
154 |
155 | -- 1. What are the top 3 products by total revenue before discount?
156 |
157 | WITH revenue AS(
158 | SELECT
159 | product_name,
160 | SUM(qty * s.price) AS total_revenue,
161 | ROW_NUMBER() OVER(
162 | ORDER BY
163 | SUM(qty * s.price) DESC
164 | ) AS row
165 | FROM
166 | sales AS s
167 | JOIN product_details AS pd ON s.prod_id = pd.product_id
168 | GROUP BY
169 | product_name
170 | )
171 | SELECT
172 | product_name,
173 | total_revenue
174 | FROM
175 | revenue
176 | WHERE
177 | row in (1, 2, 3);
178 |
179 | -- 2. What is the total quantity, revenue and discount for each segment?
180 |
181 |
182 | SELECT
183 | segment_name,
184 | SUM(qty) AS total_quantity,
185 | SUM(qty * s.price) AS total_revenue,
186 |   ROUND(
187 | SUM(qty * s.price * discount :: numeric / 100),
188 | 2
189 | ) AS total_discount
190 | FROM
191 | sales AS s
192 | JOIN product_details AS pd ON s.prod_id = pd.product_id
193 | GROUP BY
194 | segment_name
195 | ORDER BY
196 | 1;
197 |
198 | -- 3. What is the top selling product for each segment?
199 |
200 | WITH revenue AS (
201 | SELECT
202 | segment_name,
203 | product_name,
204 | SUM(qty) AS total_quantity,
205 | SUM(qty * s.price) AS total_revenue,
206 | ROW_NUMBER() OVER(
207 | PARTITION BY segment_name
208 | ORDER BY
209 | SUM(qty * s.price) DESC
210 | ) AS revenue_rank,
211 | ROW_NUMBER() OVER(
212 | PARTITION BY segment_name
213 | ORDER BY
214 | SUM(qty) DESC
215 | ) AS qty_rank
216 | FROM
217 | sales AS s
218 | JOIN product_details AS pd ON s.prod_id = pd.product_id
219 | GROUP BY
220 | segment_name,
221 | product_name
222 | )
223 | SELECT
224 | segment_name,
225 | product_name,
226 | total_quantity,
227 | total_revenue
228 | FROM
229 | revenue
230 | WHERE
231 | revenue_rank = 1
232 | OR qty_rank = 1;
233 |
234 | -- 4. What is the total quantity, revenue and discount for each category?
235 |
236 | SELECT
237 | category_name,
238 | SUM(qty) AS total_quantity,
239 | SUM(qty * s.price) AS total_revenue,
240 |   ROUND(
241 | SUM(qty * s.price * discount :: numeric / 100),
242 | 2
243 | ) AS total_discount
244 | FROM
245 | sales AS s
246 | JOIN product_details AS pd ON s.prod_id = pd.product_id
247 | GROUP BY
248 | category_name
249 | ORDER BY
250 | 1;
251 |
252 | -- 5. What is the top selling product for each category?
253 |
254 | WITH revenue AS (
255 | SELECT
256 | category_name,
257 | product_name,
258 | SUM(qty) AS total_quantity,
259 | SUM(qty * s.price) AS total_revenue,
260 | ROW_NUMBER() OVER(
261 | PARTITION BY category_name
262 | ORDER BY
263 | SUM(qty * s.price) DESC
264 | ) AS revenue_rank,
265 | ROW_NUMBER() OVER(
266 | PARTITION BY category_name
267 | ORDER BY
268 | SUM(qty) DESC
269 | ) AS qty_rank
270 | FROM
271 | sales AS s
272 | JOIN product_details AS pd ON s.prod_id = pd.product_id
273 | GROUP BY
274 | category_name,
275 | product_name
276 | )
277 | SELECT
278 | category_name,
279 | product_name,
280 | total_quantity,
281 | total_revenue
282 | FROM
283 | revenue
284 | WHERE
285 | revenue_rank = 1
286 | OR qty_rank = 1;
287 |
288 | -- 6. What is the percentage split of revenue by product for each segment?
289 |
290 | SELECT
291 | segment_name,
292 | product_name,
293 | ROUND(
294 | 100 *(
295 | SUM(qty * s.price) :: numeric / SUM(SUM(qty * s.price)) OVER(PARTITION BY segment_name)
296 | ),
297 | 1
298 | ) AS percent_of_revenue
299 | FROM
300 | sales AS s
301 | JOIN product_details AS pd ON s.prod_id = pd.product_id
302 | GROUP BY
303 | segment_name,
304 | product_name
305 | ORDER BY
306 | 1, 3 DESC;
307 |
308 | -- 7. What is the percentage split of revenue by segment for each category?
309 |
310 | SELECT
311 | segment_name,
312 | category_name,
313 | ROUND(
314 | 100 *(
315 | SUM(qty * s.price) :: numeric / SUM(SUM(qty * s.price)) OVER(PARTITION BY category_name)
316 | ),
317 | 1
318 | ) AS percent_of_revenue
319 | FROM
320 | sales AS s
321 | JOIN product_details AS pd ON s.prod_id = pd.product_id
322 | GROUP BY
323 | segment_name,
324 | category_name
325 | ORDER BY
326 | 1;
327 |
328 | -- 8. What is the percentage split of total revenue by category?
329 |
330 | SELECT
331 | category_name,
332 | ROUND(
333 | 100 *(
334 | SUM(qty * s.price) :: numeric / SUM(SUM(qty * s.price)) OVER()
335 | ),
336 | 1
337 | ) AS percent_of_revenue
338 | FROM
339 | sales AS s
340 | JOIN product_details AS pd ON s.prod_id = pd.product_id
341 | GROUP BY
342 | category_name
343 | ORDER BY
344 | 1;
345 |
346 | -- 9. What is the total transaction “penetration” for each product? (hint: penetration = number of transactions where at least 1 quantity of a product was purchased divided by total number of transactions)
347 |
348 | SELECT
349 | product_name,
350 | ROUND(
351 | 100 *(COUNT(product_name) :: numeric / number_of_txn),
352 | 2
353 | ) AS percent_of_penetration
354 | FROM
355 | sales AS s
356 | JOIN product_details AS pd ON s.prod_id = pd.product_id,
357 | LATERAL(
358 | SELECT
359 | COUNT(distinct txn_id) AS number_of_txn
360 | FROM
361 | sales
362 | ) ss
363 | GROUP BY
364 | product_name,
365 | number_of_txn
366 | ORDER BY
367 | 2 DESC;
368 |
369 | -- 10. What is the most common combination of at least 1 quantity of any 3 products in a 1 single transaction?
370 |
371 | SELECT
372 | product_1,
373 | product_2,
374 | product_3,
375 | times_bought_together
376 | FROM
377 | (
378 | with products AS(
379 | SELECT
380 | txn_id,
381 | product_name
382 | FROM
383 | sales AS s
384 | JOIN product_details AS pd ON s.prod_id = pd.product_id
385 | )
386 | SELECT
387 | p.product_name AS product_1,
388 | p1.product_name AS product_2,
389 | p2.product_name AS product_3,
390 | COUNT(*) AS times_bought_together,
391 | ROW_NUMBER() OVER(
392 | ORDER BY
393 | COUNT(*) DESC
394 | ) AS rank
395 | FROM
396 | products AS p
397 | JOIN products AS p1 ON p.txn_id = p1.txn_id
398 | AND p.product_name != p1.product_name
399 | AND p.product_name < p1.product_name
400 | JOIN products AS p2 ON p.txn_id = p2.txn_id
401 | AND p.product_name != p2.product_name
402 | AND p1.product_name != p2.product_name
403 | AND p.product_name < p2.product_name
404 | AND p1.product_name < p2.product_name
405 | GROUP BY
406 | p.product_name,
407 | p1.product_name,
408 | p2.product_name
409 | ) pp
410 | WHERE
411 | rank = 1;
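-- The < conditions in the self-joins keep each unordered trio of products
-- exactly once (in alphabetical order), so COUNT(*) is the number of
-- transactions containing that combination.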
412 |
413 | /* --------------------
414 | Reporting Challenge
415 |
416 | Write a single SQL script that combines all of the previous questions into a scheduled report that the Balanced Tree team can run at the beginning of each month to calculate the previous month’s values.
417 |
418 | Imagine that the Chief Financial Officer (which is also Danny) has asked for all of these questions at the end of every month.
419 |
420 | He first wants you to generate the data for January only - but then he also wants you to demonstrate that you can easily run the same analysis for February without many changes (if at all).
421 |
422 | Feel free to split up your final outputs into as many tables as you need - but be sure to explicitly reference which table outputs relate to which question for full marks :)
423 | --------------------*/
424 |
425 | -- [WORK IN PROGRESS]
426 |
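-- One possible starting point (a sketch, not a finished report): keep the
-- reporting month in one small temp table, then cross join it into each
-- question's FROM clause so only the date filter ever changes between runs.

DROP TABLE IF EXISTS report_period;

CREATE TEMP TABLE report_period AS
SELECT
  DATE '2021-01-01' AS period_start, -- set to the month being reported
  DATE '2021-01-01' + INTERVAL '1 month' AS period_end;

-- Example: High Level Sales Analysis question 1, restricted to that month.

SELECT
  SUM(qty) AS total_qty_sold
FROM
  sales,
  report_period
WHERE
  start_txn_time >= period_start
  AND start_txn_time < period_end;
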
427 | /* --------------------
428 | Bonus Challenge
429 |
430 | Use a single SQL query to transform the product_hierarchy and product_prices datasets to the product_details table.
431 |
432 | Hint: you may want to consider using a recursive CTE to solve this problem!
433 | --------------------*/
434 |
435 | SELECT
436 | product_id,
437 | price,
438 | CONCAT(
439 | ph.level_text,
440 | ' ',
441 | ph1.level_text,
442 | ' - ',
443 | ph2.level_text
444 | ) AS product_name,
445 | ph2.id AS category_id,
446 | ph1.id AS segment_id,
447 | ph.id AS style_id,
448 | ph2.level_text AS category_name,
449 | ph1.level_text AS segment_name,
450 | ph.level_text AS style_name
451 | FROM
452 | product_hierarchy AS ph
453 |   JOIN product_hierarchy AS ph1 ON ph.parent_id = ph1.id
454 |   JOIN product_hierarchy AS ph2 ON ph1.parent_id = ph2.id
455 |   JOIN product_prices AS pp ON ph.id = pp.id;
456 |
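-- As the hint suggests, the same table can also be built with a recursive CTE
-- that walks the hierarchy from each category down to its styles. A sketch,
-- assuming the same product_hierarchy / product_prices schema:

WITH RECURSIVE walk AS (
  SELECT
    id,
    level_text,
    level_name,
    id AS category_id,
    level_text AS category_name,
    NULL :: integer AS segment_id,
    NULL :: varchar AS segment_name
  FROM
    product_hierarchy
  WHERE
    parent_id IS NULL
  UNION ALL
  SELECT
    ph.id,
    ph.level_text,
    ph.level_name,
    w.category_id,
    w.category_name,
    CASE WHEN ph.level_name = 'Segment' THEN ph.id ELSE w.segment_id END,
    CASE WHEN ph.level_name = 'Segment' THEN ph.level_text ELSE w.segment_name END
  FROM
    product_hierarchy AS ph
    JOIN walk AS w ON ph.parent_id = w.id
)
SELECT
  pp.product_id,
  pp.price,
  CONCAT(w.level_text, ' ', w.segment_name, ' - ', w.category_name) AS product_name,
  w.category_id,
  w.segment_id,
  w.id AS style_id,
  w.category_name,
  w.segment_name,
  w.level_text AS style_name
FROM
  walk AS w
  JOIN product_prices AS pp ON w.id = pp.id
WHERE
  w.level_name = 'Style';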
--------------------------------------------------------------------------------
/Case Study #8 - Fresh Segments/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #8 - Fresh Segments :diamond_shape_with_a_dot_inside:
2 |
3 |
4 |
5 | Clients share their customer lists with the Fresh Segments team, who then aggregate interest metrics and generate a single dataset of metrics for further analysis.
6 |
7 | Danny has asked for your assistance to analyse aggregated metrics for an example client and provide some high level insights about the customer list and their interests.
8 |
9 | ## Available Data
10 |
11 | For this case study there is a total of 2 datasets which you will need to use to solve the questions.
12 |
13 | ### Interest Metrics
14 |
15 | This table contains information about aggregated interest metrics for a specific major client of Fresh Segments which makes up a large proportion of their customer base.
16 |
17 | Each record in this table represents the performance of a specific interest_id based on the client’s customer base interest measured through clicks and interactions with specific targeted advertising content.
18 |
19 | | \_month | \_year | month_year | interest_id | composition | index_value | ranking | percentile_ranking |
20 | |---------|--------|------------|-------------|-------------|-------------|---------|--------------------|
21 | | 7 | 2018 | 07-2018 | 32486 | 11.89 | 6.19 | 1 | 99.86 |
22 | | 7 | 2018 | 07-2018 | 6106 | 9.93 | 5.31 | 2 | 99.73 |
23 | | 7 | 2018 | 07-2018 | 18923 | 10.85 | 5.29 | 3 | 99.59 |
24 | | 7 | 2018 | 07-2018 | 6344 | 10.32 | 5.1 | 4 | 99.45 |
25 | | 7 | 2018 | 07-2018 | 100 | 10.77 | 5.04 | 5 | 99.31 |
26 | | 7 | 2018 | 07-2018 | 69 | 10.82 | 5.03 | 6 | 99.18 |
27 | | 7 | 2018 | 07-2018 | 79 | 11.21 | 4.97 | 7 | 99.04 |
28 | | 7 | 2018 | 07-2018 | 6111 | 10.71 | 4.83 | 8 | 98.9 |
29 | | 7 | 2018 | 07-2018 | 6214 | 9.71 | 4.83 | 8 | 98.9 |
30 | | 7 | 2018 | 07-2018 | 19422 | 10.11 | 4.81 | 10 | 98.63 |
31 |
32 |
33 | For example - let’s interpret the first row of the interest_metrics table together:
34 |
35 | | \_month | \_year | month_year | interest_id | composition | index_value | ranking | percentile_ranking |
36 | |---------|--------|------------|-------------|-------------|-------------|---------|--------------------|
37 | | 7 | 2018 | 07-2018 | 32486 | 11.89 | 6.19 | 1 | 99.86 |
38 |
39 | In July 2018, the composition metric is 11.89, meaning that 11.89% of the client’s customer list interacted with the interest `interest_id` = 32486 - we can link interest_id to a separate mapping table to find the segment name called “Vacation Rental Accommodation Researchers”.
40 |
41 | The `index_value` of 6.19 means that the composition value is 6.19x the average composition value across all Fresh Segments clients’ customers for this particular interest in the month of July 2018.
42 |
43 | The `ranking` and `percentile_ranking` columns relate to the order of `index_value` records in each month_year.
44 |
45 | ### Interest Map
46 |
47 | This mapping table links the `interest_id` with their relevant interest information. You will need to join this table onto the previous `interest_metrics` table to obtain the `interest_name` as well as any details about the summary information; a join sketch follows the sample rows below.
48 |
49 | | id | interest_name | interest_summary | created_at | last_modified |
50 | |----|---------------------------|------------------------------------------------------------------------------------|---------------------|---------------------|
51 | | 1 | Fitness Enthusiasts | Consumers using fitness tracking apps and websites. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12 |
52 | | 2 | Gamers | Consumers researching game reviews and cheat codes. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12 |
53 | | 3 | Car Enthusiasts | Readers of automotive news and car reviews. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12 |
54 | | 4 | Luxury Retail Researchers | Consumers researching luxury product reviews and gift ideas. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12 |
55 | | 5 | Brides & Wedding Planners | People researching wedding ideas and vendors. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12 |
56 | | 6 | Vacation Planners | Consumers reading reviews of vacation destinations and accommodations. | 2016-05-26 14:57:59 | 2018-05-23 11:30:13 |
57 | | 7 | Motorcycle Enthusiasts | Readers of motorcycle news and reviews. | 2016-05-26 14:57:59 | 2018-05-23 11:30:13 |
58 | | 8 | Business News Readers | Readers of online business news content. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12 |
59 | | 12 | Thrift Store Shoppers | Consumers shopping online for clothing at thrift stores and researching locations. | 2016-05-26 14:57:59 | 2018-03-16 13:14:00 |
60 | | 13 | Advertising Professionals | People who read advertising industry news. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12 |
61 |
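For example, the segment name behind the first sample row of `interest_metrics` can be recovered with a quick join (a sketch, written against the raw schema where `interest_id` and `month_year` are still stored as text):

```sql
SELECT
  map.interest_name,
  im.composition,
  im.index_value
FROM
  fresh_segments.interest_metrics AS im
  JOIN fresh_segments.interest_map AS map ON map.id = im.interest_id :: int
WHERE
  im.interest_id = '32486'
  AND im.month_year = '07-2018';
```
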
62 | ## Table of Contents
63 |
64 | [Introduction](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#introduction)
65 |
66 | [Case Study Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#case-study-questions)
67 |
68 | [Data Exploration and Cleansing](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#data-exploration-and-cleansing)
69 |
70 | [1. Update the fresh_segments.interest_metrics table by modifying the month_year column to be a date data type with the start of the month](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#1-update-the-fresh_segmentsinterest_metrics-table-by-modifying-the-month_year-column-to-be-a-date-data-type-with-the-start-of-the-month)
71 |
72 | [2. What is count of records in the fresh_segments.interest_metrics for each month_year value sorted in chronological order (earliest to latest) with the null values appearing first?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#2-what-is-count-of-records-in-the-fresh_segmentsinterest_metrics-for-each-month_year-value-sorted-in-chronological-order-earliest-to-latest-with-the-null-values-appearing-first)
73 |
74 | [3. What do you think we should do with these null values in the fresh_segments.interest_metrics?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#3-what-do-you-think-we-should-do-with-these-null-values-in-the-fresh_segmentsinterest_metrics)
75 |
76 | [4. How many interest_id values exist in the fresh_segments.interest_metrics table but not in the fresh_segments.interest_map table? What about the other way around?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#4-how-many-interest_id-values-exist-in-the-fresh_segmentsinterest_metrics-table-but-not-in-the-fresh_segmentsinterest_map-table-what-about-the-other-way-around)
77 |
78 | [5. Summarise the id values in the fresh_segments.interest_map by its total record count in this table](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#5-summarise-the-id-values-in-the-fresh_segmentsinterest_map-by-its-total-record-count-in-this-table)
79 |
80 | [6. What sort of table join should we perform for our analysis and why? Check your logic by checking the rows where interest_id = 21246 in your joined output and include all columns from fresh_segments.interest_metrics and all columns from fresh_segments.interest_map except from the id column](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#6-what-sort-of-table-join-should-we-perform-for-our-analysis-and-why-check-your-logic-by-checking-the-rows-where-interest_id--21246-in-your-joined-output-and-include-all-columns-from-fresh_segmentsinterest_metrics-and-all-columns-from-fresh_segmentsinterest_map-except-from-the-id-column)
81 |
82 | [7. Are there any records in your joined table where the month_year value is before the created_at value from the fresh_segments.interest_map table? Do you think these values are valid and why?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#7-are-there-any-records-in-your-joined-table-where-the-month_year-value-is-before-the-created_at-value-from-the-fresh_segmentsinterest_map-table-do-you-think-these-values-are-valid-and-why)
83 |
84 | [Interest Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#interest-analysis)
85 |
86 | [1. Which interests have been present in all month_year dates in our dataset?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#1-which-interests-have-been-present-in-all-month_year-dates-in-our-dataset)
87 |
88 | [2. Using this same total_months measure - calculate the cumulative percentage of all records starting at 14 months - which total_months value passes the 90% cumulative percentage value?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#2-using-this-same-total_months-measure---calculate-the-cumulative-percentage-of-all-records-starting-at-14-months---which-total_months-value-passes-the-90-cumulative-percentage-value)
89 |
90 | [3. If we were to remove all interest_id values which are lower than the total_months value we found in the previous question - how many total data points would we be removing?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#3-if-we-were-to-remove-all-interest_id-values-which-are-lower-than-the-total_months-value-we-found-in-the-previous-question---how-many-total-data-points-would-we-be-removing)
91 |
92 | [4. Does this decision make sense to remove these data points from a business perspective? Use an example where there are all 14 months present to a removed interest example for your arguments - think about what it means to have less months present from a segment perspective.](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#4-does-this-decision-make-sense-to-remove-these-data-points-from-a-business-perspective-use-an-example-where-there-are-all-14-months-present-to-a-removed-interest-example-for-your-arguments---think-about-what-it-means-to-have-less-months-present-from-a-segment-perspective)
93 |
94 | [5. After removing these interests - how many unique interests are there for each month?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#5-after-removing-these-interests---how-many-unique-interests-are-there-for-each-month)
95 |
96 | [Segment Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#segment-analysis)
97 |
98 | [1. Using our filtered dataset by removing the interests with less than 6 months worth of data, which are the top 10 and bottom 10 interests which have the largest composition values in any month_year? Only use the maximum composition value for each interest but you must keep the corresponding month_year](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#1-using-our-filtered-dataset-by-removing-the-interests-with-less-than-6-months-worth-of-data-which-are-the-top-10-and-bottom-10-interests-which-have-the-largest-composition-values-in-any-month_year-only-use-the-maximum-composition-value-for-each-interest-but-you-must-keep-the-corresponding-month_year)
99 |
100 | [2. Which 5 interests had the lowest average ranking value?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#2-which-5-interests-had-the-lowest-average-ranking-value)
101 |
102 | [3. Which 5 interests had the largest standard deviation in their percentile_ranking value?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#3-which-5-interests-had-the-largest-standard-deviation-in-their-percentile_ranking-value)
103 |
104 | [4. For the 5 interests found in the previous question - what was minimum and maximum percentile_ranking values for each interest and its corresponding year_month value? Can you describe what is happening for these 5 interests?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#4-for-the-5-interests-found-in-the-previous-question---what-was-minimum-and-maximum-percentile_ranking-values-for-each-interest-and-its-corresponding-year_month-value-can-you-describe-what-is-happening-for-these-5-interests)
105 |
106 | [5. How would you describe our customers in this segment based off their composition and ranking values? What sort of products or services should we show to these customers and what should we avoid?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#5-how-would-you-describe-our-customers-in-this-segment-based-off-their-composition-and-ranking-values-what-sort-of-products-or-services-should-we-show-to-these-customers-and-what-should-we-avoid)
107 |
108 | [Index Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#index-analysis)
109 |
110 | [1. What is the top 10 interests by the average composition for each month?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#1-what-is-the-top-10-interests-by-the-average-composition-for-each-month)
111 |
112 | [2. For all of these top 10 interests - which interest appears the most often?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#2-for-all-of-these-top-10-interests---which-interest-appears-the-most-often)
113 |
114 | [3. What is the average of the average composition for the top 10 interests for each month?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#3-what-is-the-average-of-the-average-composition-for-the-top-10-interests-for-each-month)
115 |
116 | [4. What is the 3 month rolling average of the max average composition value from September 2018 to August 2019 and include the previous top ranking interests in the same output shown below](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#4-what-is-the-3-month-rolling-average-of-the-max-average-composition-value-from-september-2018-to-august-2019-and-include-the-previous-top-ranking-interests-in-the-same-output-shown-below)
117 |
118 | [5. Provide a possible reason why the max average composition might change from month to month? Could it signal something is not quite right with the overall business model for Fresh Segments?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#5-provide-a-possible-reason-why-the-max-average-composition-might-change-from-month-to-month-could-it-signal-something-is-not-quite-right-with-the-overall-business-model-for-fresh-segments)
119 |
--------------------------------------------------------------------------------
/Case Study #8 - Fresh Segments/SQL/solution_8.sql:
--------------------------------------------------------------------------------
1 | -- Solved on PostgreSQL 13.4 by Yulia Murtazina, January 30, 2022
2 | -- Fixed on February 19, 2022
3 |
4 | -- Case Study #8 - Fresh Segments
5 |
6 | SET
7 | SEARCH_PATH = fresh_segments;
8 |
9 | /* --------------------
10 | Data Exploration and Cleansing
11 | 1. Update the fresh_segments.interest_metrics table by modifying the month_year column to be a date data type with the start of the month
12 | --------------------*/
13 |
14 | ALTER TABLE
15 | interest_metrics
16 | ALTER COLUMN
17 | month_year TYPE DATE USING TO_DATE(month_year, 'MM-YYYY');
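-- TO_DATE with an 'MM-YYYY' mask leaves the day unspecified, so it defaults to
-- the 1st, giving the required start-of-month date.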
18 |
19 | -- 2. What is count of records in the fresh_segments.interest_metrics for each month_year value sorted in chronological order (earliest to latest) with the null values appearing first?
20 |
21 | SELECT
22 | DATE_TRUNC('month', month_year) AS date,
23 | COUNT(*) AS number_of_records
24 | FROM
25 | interest_metrics
26 | GROUP BY
27 | month_year
28 | ORDER BY
29 | month_year NULLS FIRST;
30 |
31 | -- 3. What do you think we should do with these null values in the fresh_segments.interest_metrics?
32 | -- Non-coding question
33 |
34 | -- 4. How many interest_id values exist in the fresh_segments.interest_metrics table but not in the fresh_segments.interest_map table? What about the other way around?
35 |
36 | SELECT
37 |   COUNT(DISTINCT interest_id) AS interest_ids_not_in_map
38 | FROM
39 | interest_metrics
40 | WHERE
41 | interest_id :: int NOT IN (
42 | SELECT
43 | id
44 | FROM
45 | interest_map
46 | );
47 |
48 | SELECT
49 |   COUNT(id) AS ids_not_in_metrics
50 | FROM
51 | interest_map
52 | WHERE
53 | id NOT IN (
54 | SELECT
55 |       DISTINCT interest_id :: int
56 | FROM
57 | interest_metrics
58 | WHERE
59 | interest_id IS NOT NULL
60 | );
61 |
62 | -- 5. Summarise the id values in the fresh_segments.interest_map by its total record count in this table
63 |
64 | SELECT
65 |   COUNT(DISTINCT id) AS total_count
66 | FROM
67 | interest_map AS m;
68 |
69 | -- 6. What sort of table join should we perform for our analysis and why? Check your logic by checking the rows where interest_id = 21246 in your joined output and include all columns from fresh_segments.interest_metrics and all columns from fresh_segments.interest_map except from the id column
70 |
71 | SELECT
72 |   DISTINCT interest_id :: int,
73 | interest_name,
74 | interest_summary,
75 | created_at,
76 | last_modified,
77 | _month,
78 | _year,
79 | month_year,
80 | composition,
81 | index_value,
82 | ranking,
83 | percentile_ranking
84 | FROM
85 | interest_map AS m
86 | LEFT JOIN interest_metrics AS im ON m.id = im.interest_id :: int
87 | WHERE
88 | interest_id = '21246'
89 | GROUP BY
90 | interest_name,
91 | id,
92 | interest_summary,
93 | created_at,
94 | last_modified,
95 | _month,
96 | _year,
97 | month_year,
98 | interest_id,
99 | composition,
100 | index_value,
101 | ranking,
102 | percentile_ranking
103 | ORDER BY
104 | _month NULLS FIRST;
105 |
106 | -- 7. Are there any records in your joined table where the month_year value is before the created_at value from the fresh_segments.interest_map table? Do you think these values are valid and why?
107 |
108 | WITH joined_table AS (
109 | SELECT
110 |     DISTINCT interest_id :: int,
111 | interest_name,
112 | interest_summary,
113 | created_at,
114 | last_modified,
115 | _month,
116 | _year,
117 | month_year,
118 | composition,
119 | index_value,
120 | ranking,
121 | percentile_ranking
122 | FROM
123 | interest_map AS m
124 | LEFT JOIN interest_metrics AS im ON m.id = im.interest_id :: int
125 | GROUP BY
126 | interest_name,
127 | id,
128 | interest_summary,
129 | created_at,
130 | last_modified,
131 | _month,
132 | _year,
133 | month_year,
134 | interest_id,
135 | composition,
136 | index_value,
137 | ranking,
138 | percentile_ranking
139 | )
140 | SELECT
141 | COUNT(*)
142 | FROM
143 | joined_table
144 | WHERE
145 | created_at > month_year
146 | ORDER BY
147 | 1;
148 |
149 | /* --------------------
150 | Interest Analysis
151 | 1. Which interests have been present in all month_year dates in our dataset?
152 | --------------------*/
153 |
154 | WITH interests AS (
155 | SELECT
156 | id,
157 | interest_name
158 | FROM
159 | interest_map AS m
160 | LEFT JOIN interest_metrics AS im ON m.id = im.interest_id :: int
161 | GROUP BY
162 | 1,
163 | 2
164 | HAVING
165 | COUNT(interest_id) = 14
166 | )
167 | SELECT
168 | interest_name
169 | FROM
170 | interests
171 | ORDER BY
172 | 1;
173 |
174 | -- 2. Using this same total_months measure - calculate the cumulative percentage of all records starting at 14 months - which total_months value passes the 90% cumulative percentage value?
175 |
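-- Note on the approach below: ROW_NUMBER() partitioned by COUNT(interest_id)
-- numbers the interests inside each total_months bucket, so MAX(rank) in the
-- outer query equals the number of interests per bucket.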
176 | WITH counted_months AS (
177 | SELECT
178 | interest_id,
179 | COUNT(interest_id) total_months,
180 | ROW_NUMBER() OVER(
181 | PARTITION BY COUNT(interest_id)
182 | ORDER BY
183 | COUNT(interest_id)
184 | ) AS rank
185 | FROM
186 | interest_metrics AS im
187 | GROUP BY
188 | 1
189 | HAVING
190 | COUNT(interest_id) > 0
191 | )
192 | SELECT
193 | total_months,
194 | MAX(rank) AS number_of_interests,
195 | CAST(
196 | 100 * SUM(MAX(rank)) OVER (
197 | ORDER BY
198 | total_months
199 | ) / SUM(MAX(rank)) OVER () AS numeric(10, 2)
200 | ) cum_top,
201 | CAST(
202 | 100 - 100 * SUM(MAX(rank)) OVER (
203 | ORDER BY
204 | total_months
205 | ) / SUM(MAX(rank)) OVER () AS numeric(10, 2)
206 | ) cum_top_reversed
207 | FROM
208 | counted_months
209 | GROUP BY
210 | total_months
211 | ORDER BY
212 | 1;
213 |
214 | -- 3. If we were to remove all interest_id values which are lower than the total_months value we found in the previous question - how many total data points would we be removing?
215 |
216 | WITH interests AS (
217 | SELECT
218 | interest_id
219 | FROM
220 | interest_metrics AS im
221 | GROUP BY
222 | 1
223 | HAVING
224 | COUNT(interest_id) < 6
225 | )
226 | SELECT
227 |   COUNT(*) AS number_of_data_points
228 | FROM
229 |   interest_metrics
230 | WHERE
231 |   interest_id IN (SELECT interest_id FROM interests);
232 |
233 | -- 4. Does this decision make sense to remove these data points from a business perspective? Use an example where there are all 14 months present to a removed interest example for your arguments - think about what it means to have less months present from a segment perspective.
234 |
235 | SELECT
236 | im.month_year,
237 | COUNT(interest_id) AS number_of_excluded_interests,
238 | number_of_included_interests,
239 | ROUND(
240 | 100 *(
241 | COUNT(interest_id) / number_of_included_interests :: numeric
242 | ),
243 | 1
244 | ) AS percent_of_excluded
245 | FROM
246 | interest_metrics AS im
247 | JOIN (
248 | SELECT
249 | month_year,
250 | COUNT(interest_id) AS number_of_included_interests
251 | FROM
252 | interest_metrics AS im
253 | WHERE
254 | month_year IS NOT NULL
255 | AND interest_id :: int IN (
256 | SELECT
257 | interest_id :: int
258 | FROM
259 | interest_metrics
260 | GROUP BY
261 | 1
262 | HAVING
263 | COUNT(interest_id) > 5
264 | )
265 | GROUP BY
266 | 1
267 | ) i ON im.month_year = i.month_year
268 | WHERE
269 | im.month_year IS NOT NULL
270 | AND interest_id :: int IN (
271 | SELECT
272 | interest_id :: int
273 | FROM
274 | interest_metrics
275 | GROUP BY
276 | 1
277 |       HAVING
278 | COUNT(interest_id) < 6
279 | )
280 | GROUP BY
281 | 1,
282 | 3
283 | ORDER BY
284 | 1;
285 |
286 | -- 5. After removing these interests - how many unique interests are there for each month?
287 |
288 | SELECT
289 | month_year,
290 | COUNT(interest_id) AS number_of_interests
291 | FROM
292 | interest_metrics AS im
293 | WHERE
294 | month_year IS NOT NULL
295 | AND interest_id :: int IN (
296 | SELECT
297 | interest_id :: int
298 | FROM
299 | interest_metrics
300 | GROUP BY
301 | 1
302 | HAVING
303 | COUNT(interest_id) > 5
304 | )
305 | GROUP BY
306 | 1
307 | ORDER BY
308 | 1;
309 |
310 | /* --------------------
311 | Segment Analysis
312 | 1. Using our filtered dataset by removing the interests with less than 6 months worth of data, which are the top 10 and bottom 10 interests which have the largest composition values in any month_year? Only use the maximum composition value for each interest but you must keep the corresponding month_year
313 | --------------------*/
314 |
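-- Structure of the query below: the first UNION branch returns the 10
-- interests with the largest per-interest maximum composition, the second the
-- 10 with the smallest per-interest minimum composition; the outer join then
-- attaches each interest's overall maximum composition and its month.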
315 | SELECT
316 | interests.month_year,
317 | interests.interest_name,
318 | interests.composition,
319 | i_max_new.composition AS max_composition,
320 | i_max_new.month_year AS max_composition_month
321 | FROM
322 | (
323 | (
324 | WITH max_interests AS (
325 | SELECT
326 | month_year,
327 | interest_name,
328 | composition,
329 | RANK() OVER (
330 | PARTITION BY interest_name
331 | ORDER BY
332 | composition DESC
333 | ) AS max_rank
334 | FROM
335 | interest_metrics AS im
336 | JOIN interest_map AS m ON m.id = im.interest_id :: int
337 | WHERE
338 | month_year IS NOT NULL
339 |       AND interest_id :: int IN (
340 | SELECT
341 | interest_id :: int
342 | FROM
343 | interest_metrics
344 | GROUP BY
345 | 1
346 | HAVING
347 | COUNT(interest_id) > 5
348 | )
349 | GROUP BY
350 | 1,
351 | 2,
352 | 3
353 | )
354 | SELECT
355 | month_year,
356 | interest_name,
357 | composition
358 | FROM
359 | max_interests
360 | WHERE
361 | max_rank = 1
362 | GROUP BY
363 | 1,
364 | 2,
365 | 3
366 | ORDER BY
367 | 3 DESC
368 | LIMIT
369 | 10
370 | )
371 | UNION
372 | (
373 | WITH min_interests AS (
374 | SELECT
375 | month_year,
376 | interest_name,
377 | composition,
378 | RANK() OVER (
379 | PARTITION BY interest_name
380 | ORDER BY
381 | composition
382 | ) AS min_rank
383 | FROM
384 | interest_metrics AS im
385 | JOIN interest_map AS m ON m.id = im.interest_id :: int
386 | WHERE
387 | month_year IS NOT NULL
388 |       AND interest_id :: int IN (
389 | SELECT
390 | interest_id :: int
391 | FROM
392 | interest_metrics
393 | GROUP BY
394 | 1
395 | HAVING
396 | COUNT(interest_id) > 5
397 | )
398 | GROUP BY
399 | 1,
400 | 2,
401 | 3
402 | )
403 | SELECT
404 | month_year,
405 | interest_name,
406 | composition
407 | FROM
408 | min_interests
409 | WHERE
410 | min_rank = 1
411 | GROUP BY
412 | 1,
413 | 2,
414 | 3
415 | ORDER BY
416 | 3
417 | LIMIT
418 | 10
419 | )
420 | ) AS interests
421 | JOIN (
422 | WITH max_interests AS (
423 | SELECT
424 | month_year,
425 | interest_name,
426 | composition,
427 | RANK() OVER (
428 | PARTITION BY interest_name
429 | ORDER BY
430 | composition DESC
431 | ) AS max_rank
432 | FROM
433 | interest_metrics AS im
434 | JOIN interest_map AS m ON m.id = im.interest_id :: int
435 | WHERE
436 | month_year IS NOT NULL
437 |       AND interest_id :: int IN (
438 | SELECT
439 | interest_id :: int
440 | FROM
441 | interest_metrics
442 | GROUP BY
443 | 1
444 | HAVING
445 | COUNT(interest_id) > 5
446 | )
447 | GROUP BY
448 | 1,
449 | 2,
450 | 3
451 | )
452 | SELECT
453 | month_year,
454 | interest_name,
455 | composition
456 | FROM
457 | max_interests
458 | WHERE
459 | max_rank = 1
460 | GROUP BY
461 | 1,
462 | 2,
463 | 3
464 | ORDER BY
465 | 3 DESC
466 | ) i_max_new ON interests.interest_name = i_max_new.interest_name
467 | ORDER BY
468 | 3 DESC;
469 |
470 | -- 2. Which 5 interests had the lowest average ranking value?
471 |
472 | WITH ranking AS (
473 | SELECT
474 | interest_name,
475 | AVG(ranking) :: numeric(10, 2) AS avg_ranking,
476 | RANK() OVER (
477 | ORDER BY
478 |       AVG(ranking)
479 | ) AS rank
480 | FROM
481 | interest_metrics AS im
482 | JOIN interest_map AS m ON m.id = im.interest_id :: int
483 | WHERE
484 | month_year IS NOT NULL
485 | AND interest_id :: int IN (
486 | SELECT
487 | interest_id :: int
488 | FROM
489 | interest_metrics
490 | GROUP BY
491 | 1
492 | HAVING
493 | COUNT(interest_id) > 5
494 | )
495 | GROUP BY
496 | 1
497 | )
498 | SELECT
499 | interest_name,
500 | avg_ranking
501 | FROM
502 | ranking
503 | WHERE
504 |   rank BETWEEN 1
505 |   AND 5;
506 |
507 | -- 3. Which 5 interests had the largest standard deviation in their percentile_ranking value?
508 |
509 | WITH ranking AS (
510 | SELECT
511 | id,
512 | interest_name,
513 | STDDEV(percentile_ranking) :: numeric(10, 2) AS standard_deviation,
514 | RANK() OVER (
515 | ORDER BY
516 | STDDEV(percentile_ranking) DESC
517 | ) AS rank
518 | FROM
519 | interest_metrics AS im
520 | JOIN interest_map AS m ON m.id = im.interest_id :: int
521 | WHERE
522 | month_year IS NOT NULL
523 | AND interest_id :: int IN (
524 | SELECT
525 | interest_id :: int
526 | FROM
527 | interest_metrics
528 | GROUP BY
529 | 1
530 |       HAVING
531 |         COUNT(interest_id) > 5
532 | )
533 | GROUP BY
534 | 1,
535 | 2
536 | )
537 | SELECT
538 | interest_name,
539 | standard_deviation
540 | FROM
541 | ranking
542 | WHERE
543 |   rank BETWEEN 1
544 |   AND 5;
545 |
546 | -- 4. For the 5 interests found in the previous question - what was minimum and maximum percentile_ranking values for each interest and its corresponding year_month value? Can you describe what is happening for these 5 interests?
547 |
548 | WITH ranking AS (
549 | SELECT
550 | month_year,
551 | id,
552 | interest_name,
553 | percentile_ranking,
554 | RANK() OVER (
555 | PARTITION BY id
556 | ORDER BY
557 | percentile_ranking
558 | ) AS min_rank,
559 | RANK() OVER (
560 | PARTITION BY id
561 | ORDER BY
562 | percentile_ranking DESC
563 | ) AS max_rank
564 | FROM
565 | interest_metrics AS im
566 | JOIN interest_map AS m ON m.id = im.interest_id :: int
567 | WHERE
568 | month_year IS NOT NULL
569 | AND interest_id :: int IN (
570 | SELECT
571 | interest_id :: int
572 | FROM
573 | interest_metrics
574 | GROUP BY
575 | 1
576 | HAVING
577 | COUNT(interest_id) > 5
578 | )
579 | AND id IN (
580 | WITH ranking AS (
581 | SELECT
582 | id,
583 | interest_name,
584 | STDDEV(percentile_ranking) :: numeric(10, 2) AS standard_deviation,
585 | RANK() OVER (
586 | ORDER BY
587 | STDDEV(percentile_ranking) DESC
588 | ) AS rank
589 | FROM
590 | interest_metrics AS im
591 | JOIN interest_map AS m ON m.id = im.interest_id :: int
592 | WHERE
593 | month_year IS NOT NULL
594 | AND interest_id :: int IN (
595 | SELECT
596 | interest_id :: int
597 | FROM
598 | interest_metrics
599 | GROUP BY
600 | 1
601 |               HAVING
602 |                 COUNT(interest_id) > 5
603 | )
604 | GROUP BY
605 | 1,
606 | 2
607 | )
608 | SELECT
609 | id
610 | FROM
611 | ranking
612 | WHERE
613 |       rank BETWEEN 1
614 |       AND 5
615 | )
616 | GROUP BY
617 | 1,
618 | 2,
619 | 3,
620 | 4
621 | )
622 | SELECT
623 | month_year,
624 | interest_name,
625 | percentile_ranking
626 | FROM
627 | ranking
628 | WHERE
629 | min_rank = 1
630 |   OR max_rank = 1
631 | GROUP BY
632 | 1,
633 | 2,
634 | 3
635 | ORDER BY
636 | 2,
637 | 3 DESC;
638 |
639 | -- 5. How would you describe our customers in this segment based off their composition and ranking values? What sort of products or services should we show to these customers and what should we avoid?
640 | -- Non-coding question
641 |
642 | /* --------------------
643 | Index Analysis
644 | The index_value is a measure which can be used to reverse calculate the average composition for Fresh Segments’ clients.
645 | Average composition can be calculated by dividing the composition column by the index_value column rounded to 2 decimal places.
646 | 1. What is the top 10 interests by the average composition for each month?
647 | --------------------*/
648 |
649 | WITH ranking AS (
650 | SELECT
651 | month_year,
652 | id,
653 | interest_name,
654 | avg_composition,
655 | RANK() OVER (
656 | PARTITION BY month_year
657 | ORDER BY
658 | avg_composition DESC
659 | ) AS max_rank
660 | FROM
661 | interest_metrics AS im
662 | JOIN interest_map AS m ON m.id = im.interest_id :: int,
663 | LATERAL(
664 | SELECT
665 | (composition / index_value) :: numeric(10, 2) AS avg_composition
666 | ) ac
667 | WHERE
668 | month_year IS NOT NULL
669 | AND interest_id :: int IN (
670 | SELECT
671 | interest_id :: int
672 | FROM
673 | interest_metrics
674 | GROUP BY
675 | 1
676 | HAVING
677 | COUNT(interest_id) > 5
678 | )
679 | GROUP BY
680 | 1,
681 | 2,
682 | 3,
683 | 4
684 | )
685 | SELECT
686 | month_year,
687 | interest_name,
688 | avg_composition
689 | FROM
690 | ranking
691 | WHERE
692 |   max_rank BETWEEN 1
693 | AND 10
694 | ORDER BY
695 | 1,
696 | 3 DESC;
697 |
698 | -- 2. For all of these top 10 interests - which interest appears the most often?
699 |
700 | WITH ranking AS (
701 | SELECT
702 | month_year,
703 | id,
704 | interest_name,
705 | avg_composition,
706 | RANK() OVER (
707 | PARTITION BY month_year
708 | ORDER BY
709 | avg_composition DESC
710 | ) AS max_rank
711 | FROM
712 | interest_metrics AS im
713 |     JOIN interest_map AS m ON m.id = im.interest_id :: int,
714 | LATERAL(
715 | SELECT
716 | (composition / index_value) :: numeric(10, 2) AS avg_composition
717 | ) ac
718 | WHERE
719 | month_year IS NOT NULL
720 | AND interest_id :: int IN (
721 | SELECT
722 | interest_id :: int
723 | FROM
724 | interest_metrics
725 | GROUP BY
726 | 1
727 | HAVING
728 | COUNT(interest_id) > 5
729 | )
730 | GROUP BY
731 | 1,
732 | 2,
733 | 3,
734 | 4
735 | )
736 | SELECT
737 | interest_name,
738 |   COUNT(interest_name) AS months_in_top_10
739 | FROM
740 | ranking
741 | WHERE
742 |   max_rank BETWEEN 1 AND 10
743 | GROUP BY
744 | 1
745 | ORDER BY
746 | 2 DESC;
747 |
748 | -- 3. What is the average of the average composition for the top 10 interests for each month?
749 |
750 | SELECT
751 | month_year,
752 |   AVG(avg_composition) :: numeric(10, 2) AS avg_of_avg_composition
753 | FROM
754 | (
755 | WITH ranking AS (
756 | SELECT
757 | month_year,
758 | id,
759 | interest_name,
760 | avg_composition,
761 | RANK() OVER (
762 | PARTITION BY month_year
763 | ORDER BY
764 | avg_composition DESC
765 | ) AS max_rank
766 | FROM
767 | interest_metrics AS im
768 | JOIN interest_map AS m ON m.id = im.interest_id :: int,
769 | LATERAL(
770 | SELECT
771 | (composition / index_value) :: numeric(10, 2) AS avg_composition
772 | ) ac
773 | WHERE
774 | month_year IS NOT NULL
775 | AND interest_id :: int IN (
776 | SELECT
777 | interest_id :: int
778 | FROM
779 | interest_metrics
780 | GROUP BY
781 | 1
782 | HAVING
783 | COUNT(interest_id) > 5
784 | )
785 | GROUP BY
786 | 1,
787 | 2,
788 | 3,
789 | 4
790 | )
791 | SELECT
792 | month_year,
793 | interest_name,
794 | avg_composition
795 | FROM
796 | ranking
797 | WHERE
798 |       max_rank BETWEEN 1
799 | AND 10
800 | ) r
801 | GROUP BY
802 | 1
803 | ORDER BY
804 | 1;
805 |
806 | -- 4. What is the 3 month rolling average of the max average composition value from September 2018 to August 2019 and include the previous top ranking interests in the same output shown below
807 |
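-- Note: the date filter is applied outside the windowed subquery below, so
-- the September 2018 moving average can still see the July and August 2018
-- rows.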
808 | SELECT
809 | *
810 | FROM
811 | (
812 | WITH ranking AS (
813 | SELECT
814 | month_year,
815 | id,
816 | interest_name,
817 | avg_composition,
818 | RANK() OVER (
819 | PARTITION BY month_year
820 | ORDER BY
821 | avg_composition DESC
822 | ) AS max_rank
823 | FROM
824 | interest_metrics AS im
825 | JOIN interest_map AS m ON m.id = im.interest_id :: int,
826 | LATERAL(
827 | SELECT
828 | (composition / index_value) :: numeric(10, 2) AS avg_composition
829 | ) ac
830 | WHERE
831 | month_year IS NOT NULL
832 | AND interest_id :: int IN (
833 | SELECT
834 | interest_id :: int
835 | FROM
836 | interest_metrics
837 | GROUP BY
838 | 1
839 | HAVING
840 | COUNT(interest_id) > 5
841 | )
842 | GROUP BY
843 | 1,
844 | 2,
845 | 3,
846 | 4
847 | )
848 | SELECT
849 | month_year,
850 | interest_name,
851 | avg_composition AS max_index_composition,
852 | (
853 | AVG(avg_composition) OVER(
854 | ORDER BY
855 | month_year ROWS BETWEEN 2 PRECEDING
856 | AND CURRENT ROW
857 | )
858 | ) :: numeric(10, 2) AS _3_month_moving_avg,
859 | CONCAT(
860 | LAG(interest_name) OVER (
861 | ORDER BY
862 | month_year
863 | ),
864 | ': ',
865 | LAG(avg_composition) OVER (
866 | ORDER BY
867 | month_year
868 | )
869 | ) AS _1_month_ago,
870 | CONCAT(
871 | LAG(interest_name, 2) OVER (
872 | ORDER BY
873 | month_year
874 | ),
875 | ': ',
876 | LAG(avg_composition, 2) OVER (
877 | ORDER BY
878 | month_year
879 | )
880 | ) AS _2_month_ago
881 | FROM
882 | ranking
883 | WHERE
884 | max_rank = 1
885 | ) r
886 | WHERE
887 | month_year > '2018-08-01'
888 | ORDER BY
889 | 1;
890 |
891 | -- 5. Provide a possible reason why the max average composition might change from month to month? Could it signal something is not quite right with the overall business model for Fresh Segments?
892 | -- Non-coding question
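
-- A supporting sketch, not part of the original solution set: comparing each
-- month's maximum average composition with the previous month shows how
-- volatile this metric is, which feeds the discussion above.
WITH max_avg AS (
  SELECT
    month_year,
    MAX((composition / index_value) :: numeric(10, 2)) AS max_avg_composition
  FROM
    interest_metrics
  WHERE
    month_year IS NOT NULL
  GROUP BY
    1
)
SELECT
  month_year,
  max_avg_composition,
  max_avg_composition - LAG(max_avg_composition) OVER (
    ORDER BY
      month_year
  ) AS change_vs_previous_month
FROM
  max_avg
ORDER BY
  1;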
893 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # 8-Week-SQL-Challenge
2 | Solutions for the #8WeekSQLChallenge SQL case studies: https://8weeksqlchallenge.com
3 |
4 | | Name | Description | Status |
5 | |--------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|
6 | | **[Case Study #1 - Danny's Diner](https://github.com/muryulia/8-Week-SQL-Challenge/tree/main/Case%20Study%20%231%20-%20Danny's%20Diner)** | Danny’s Diner is in need of your assistance to help the restaurant stay afloat. Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they’ve spent and also which menu items are their favourite. | Completed |
7 | | **[Case Study #2 - Pizza Runner](https://github.com/muryulia/8-Week-SQL-Challenge/tree/main/Case%20Study%20%232%20-%20Pizza%20Runner)** | Danny requires further assistance to clean his data and apply some basic calculations so he can better direct his runners and optimise Pizza Runner’s operations. | Completed |
8 | | **[Case Study #3 - Foodie-Fi](https://github.com/muryulia/8-Week-SQL-Challenge/tree/main/Case%20Study%20%233%20-%20Foodie-Fi)** | This case study focuses on using subscription style digital data to answer important business questions. | Completed |
9 | | **[Case Study #4 - Data Bank](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%234%20-%20Data%20Bank/Solution.md)** | Data Bank runs just like any other digital bank - but it isn’t only for banking activities, they also have the world’s most secure distributed data storage platform! The management team at Data Bank want to increase their total customer base - but also need some help tracking just how much data storage their customers will need. | In Progress |
10 | | **[Case Study #5 - Data Mart](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md)** | Data Mart is Danny’s latest venture and after running international operations for his online supermarket that specialises in fresh produce - Danny is asking for your support to analyse his sales performance. | Completed |
11 | | **[Case Study #6 - Clique Bait](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md)** | In this case study - you are required to support Danny’s vision and analyse his dataset and come up with creative solutions to calculate funnel fallout rates for the Clique Bait online store. | Completed Partially |
12 | | **[Case Study #7 - Balanced Tree Clothing Co.](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md)** | Danny, the CEO of this trendy fashion company, has asked you to help the merchandising team analyse their sales performance and generate a basic financial report to share with the wider business. | Completed Partially |
13 | | **[Case Study #8 - Fresh Segments](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md)** | Danny created Fresh Segments, a digital marketing agency that helps other businesses analyse trends in online ad click behaviour for their unique customer base. Danny has asked for your assistance to analyse aggregated metrics for an example client and provide some high level insights about the customer list and their interests. | Completed |
14 |
15 | Solved on PostgreSQL using a range of SQL techniques: conditional `WHERE` clauses, aggregation with `GROUP BY`, sorting with `ORDER BY`, `CASE WHEN` statements, string transformations, regular expressions, datetime manipulation, common table expressions (CTEs), subqueries and nested queries, complex table joins (inner, outer, left, right and lateral; cross joins were possible but not needed) and window functions (`ROW_NUMBER()`, `RANK()`, `DENSE_RANK()`, `LAG()`, `LEAD()`).
16 |
17 | Some case studies also include pre-processing of duplicate records, exploratory data analysis and summary statistics.
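
As a small illustration of a recurring pattern in these solutions (the `orders` table and its columns here are hypothetical, not taken from any case study), a `LATERAL` subquery can name a computed expression once so that aggregates and window functions can reuse it:

```sql
-- Illustrative sketch only: "orders", "price" and "quantity" are hypothetical.
SELECT
  customer_id,
  order_date,
  net_amount,
  -- The window function reuses the column computed in the LATERAL subquery.
  SUM(net_amount) OVER (
    PARTITION BY customer_id
    ORDER BY order_date
  ) AS running_spend
FROM
  orders,
  LATERAL (
    SELECT (price * quantity) :: numeric(10, 2) AS net_amount
  ) AS n;
```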
18 |
--------------------------------------------------------------------------------