├── Case Study #1 - Danny's Diner
│   └── README.md
├── Case Study #2 - Pizza Runner
│   ├── A. Pizza Metrics.md
│   ├── B. Runner and Customer Experience.md
│   ├── C. Ingredient Optimisation.md
│   ├── Data Cleaning and Transformation.md
│   └── README.md
├── Case Study #3 - Foodie-Fi
│   └── README.md
├── Case Study #4 - Data Bank
│   └── README.md
├── Case Study #5 - Data Mart
│   └── README.md
├── Case Study #6 - Clique Bait
│   └── README.md
├── Case Study #7 - Balanced Tree Clothing Co.
│   └── README.md
├── Case Study #8: Fresh Segments
│   ├── A. Data Exploration and Cleansing.md
│   ├── B. Interest Analysis.md
│   └── README.md
└── README.md
/Case Study #1 - Danny's Diner/README.md:
--------------------------------------------------------------------------------
1 | # 🍜 Case Study #1: Danny's Diner
2 |
3 |
4 | ## 📚 Table of Contents
5 | - [Business Task](#business-task)
6 | - [Entity Relationship Diagram](#entity-relationship-diagram)
7 | - [Question and Solution](#question-and-solution)
8 |
9 | Please note that all the information regarding the case study has been sourced from the following link: [here](https://8weeksqlchallenge.com/case-study-1/).
10 |
11 | ***
12 |
13 | ## Business Task
14 | Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they’ve spent and also which menu items are their favourite.
15 |
16 | ***
17 |
18 | ## Entity Relationship Diagram
19 |
20 | 
21 |
22 | ***
23 |
24 | ## Question and Solution
25 |
26 | Please join me in executing the queries using PostgreSQL on [DB Fiddle](https://www.db-fiddle.com/f/2rM8RAnq7h5LLDTzZiRWcd/138). It would be great to work together on the questions!
27 |
28 | Additionally, I have also published this case study on [Medium](https://katiehuangx.medium.com/8-week-sql-challenge-case-study-week-1-dannys-diner-2ba026c897ab?source=friends_link&sk=ed355696f5a70ff8b3d5a1b905e5dabe).
29 |
30 | If you have any questions, reach out to me on [LinkedIn](https://www.linkedin.com/in/katiehuangx/).
31 |
32 | **1. What is the total amount each customer spent at the restaurant?**
33 |
34 | ````sql
35 | SELECT
36 | sales.customer_id,
37 | SUM(menu.price) AS total_sales
38 | FROM dannys_diner.sales
39 | INNER JOIN dannys_diner.menu
40 | ON sales.product_id = menu.product_id
41 | GROUP BY sales.customer_id
42 | ORDER BY sales.customer_id ASC;
43 | ````
44 |
45 | #### Steps:
46 | - Use **JOIN** to merge the `dannys_diner.sales` and `dannys_diner.menu` tables, since `customer_id` comes from the `sales` table and `price` comes from the `menu` table.
47 | - Use **SUM** to calculate the total sales contributed by each customer.
48 | - Group the aggregated results by `sales.customer_id`.
49 |
50 | #### Answer:
51 | | customer_id | total_sales |
52 | | ----------- | ----------- |
53 | | A | 76 |
54 | | B | 74 |
55 | | C | 36 |
56 |
57 | - Customer A spent $76.
58 | - Customer B spent $74.
59 | - Customer C spent $36.
60 |
61 | ***
62 |
63 | **2. How many days has each customer visited the restaurant?**
64 |
65 | ````sql
66 | SELECT
67 | customer_id,
68 | COUNT(DISTINCT order_date) AS visit_count
69 | FROM dannys_diner.sales
70 | GROUP BY customer_id;
71 | ````
72 |
73 | #### Steps:
74 | - To determine the unique number of visits for each customer, utilize **COUNT(DISTINCT `order_date`)**.
75 | - It's important to apply the **DISTINCT** keyword while calculating the visit count to avoid duplicate counting of days. For instance, if Customer A visited the restaurant twice on '2021-01-07', counting without **DISTINCT** would result in 2 days instead of the accurate count of 1 day, as illustrated in the sketch below.
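
A quick comparison query (illustrative only, not part of the solution) shows both aggregates side by side:

```sql
-- Illustrative comparison: COUNT() counts every sale row,
-- while COUNT(DISTINCT ...) counts each visit day only once.
SELECT
  customer_id,
  COUNT(order_date) AS rows_counted,
  COUNT(DISTINCT order_date) AS visit_count
FROM dannys_diner.sales
GROUP BY customer_id;
```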
76 |
77 | #### Answer:
78 | | customer_id | visit_count |
79 | | ----------- | ----------- |
80 | | A | 4 |
81 | | B | 6 |
82 | | C | 2 |
83 |
84 | - Customer A visited 4 times.
85 | - Customer B visited 6 times.
86 | - Customer C visited 2 times.
87 |
88 | ***
89 |
90 | **3. What was the first item from the menu purchased by each customer?**
91 |
92 | ````sql
93 | WITH ordered_sales AS (
94 | SELECT
95 | sales.customer_id,
96 | sales.order_date,
97 | menu.product_name,
98 | DENSE_RANK() OVER (
99 | PARTITION BY sales.customer_id
100 | ORDER BY sales.order_date) AS rank
101 | FROM dannys_diner.sales
102 | INNER JOIN dannys_diner.menu
103 | ON sales.product_id = menu.product_id
104 | )
105 |
106 | SELECT
107 | customer_id,
108 | product_name
109 | FROM ordered_sales
110 | WHERE rank = 1
111 | GROUP BY customer_id, product_name;
112 | ````
113 |
114 | #### Steps:
115 | - Create a Common Table Expression (CTE) named `ordered_sales`. Within the CTE, create a new column `rank` and assign a rank using the **DENSE_RANK()** window function. The **PARTITION BY** clause divides the data by `customer_id`, and the **ORDER BY** clause orders the rows within each partition by `order_date`.
116 | - In the outer query, select the appropriate columns and apply a filter in the **WHERE** clause to retrieve only the rows where the rank column equals 1, which represents the earliest `order_date` within each `customer_id` partition.
117 | - Use the GROUP BY clause to group the result by `customer_id` and `product_name`.
118 |
119 | #### Answer:
120 | | customer_id | product_name |
121 | | ----------- | ----------- |
122 | | A | curry |
123 | | A | sushi |
124 | | B | curry |
125 | | C | ramen |
126 |
127 | - Customer A placed an order for both curry and sushi simultaneously, making them the first items in the order.
128 | - Customer B's first order is curry.
129 | - Customer C's first order is ramen.
130 |
131 | I have received feedback suggesting the use of `ROW_NUMBER()` instead of `DENSE_RANK()` for determining the "first order" in this question.
132 |
133 | However, since the `order_date` does not have a timestamp, it is impossible to determine the exact sequence of items ordered by the customer.
134 |
135 | Therefore, it would be inaccurate to conclude that curry is the customer's first order purely based on the alphabetical order of the product names. For this reason, I maintain my solution of using `DENSE_RANK()` and consider both curry and sushi as Customer A's first order.
136 |
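For comparison, a `ROW_NUMBER()` version might look like the sketch below (illustrative only); note that the tie-break column is arbitrary because `order_date` carries no timestamp:

```sql
-- Illustrative ROW_NUMBER() variant: forces a single "first" row per customer,
-- breaking Customer A's 2021-01-01 tie arbitrarily (here by product_id).
WITH ordered_sales AS (
  SELECT
    sales.customer_id,
    menu.product_name,
    ROW_NUMBER() OVER (
      PARTITION BY sales.customer_id
      ORDER BY sales.order_date, sales.product_id) AS row_num
  FROM dannys_diner.sales
  INNER JOIN dannys_diner.menu
    ON sales.product_id = menu.product_id
)
SELECT customer_id, product_name
FROM ordered_sales
WHERE row_num = 1;
```
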
137 | ***
138 |
139 | **4. What is the most purchased item on the menu and how many times was it purchased by all customers?**
140 |
141 | ````sql
142 | SELECT
143 | menu.product_name,
144 | COUNT(sales.product_id) AS most_purchased_item
145 | FROM dannys_diner.sales
146 | INNER JOIN dannys_diner.menu
147 | ON sales.product_id = menu.product_id
148 | GROUP BY menu.product_name
149 | ORDER BY most_purchased_item DESC
150 | LIMIT 1;
151 | ````
152 |
153 | #### Steps:
154 | - Perform a **COUNT** aggregation on the `product_id` column and **ORDER BY** the result in descending order using the `most_purchased_item` field.
155 | - Apply the **LIMIT** 1 clause to filter and retrieve the highest number of purchased items.
156 |
157 | #### Answer:
158 | | product_name | most_purchased_item |
159 | | ----------- | ----------- |
160 | | ramen | 8 |
161 |
162 |
163 | - The most purchased item on the menu is ramen, which was purchased 8 times. Yummy!
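
One caveat (an optional refinement, not part of the original solution): `LIMIT 1` returns a single row, so if two items ever tied for the highest purchase count, one of them would be silently dropped. A tie-aware sketch could rank the counts instead:

```sql
-- Illustrative tie-aware variant: keep every item sharing the top purchase count.
WITH item_counts AS (
  SELECT
    menu.product_name,
    COUNT(sales.product_id) AS purchase_count,
    DENSE_RANK() OVER (ORDER BY COUNT(sales.product_id) DESC) AS rank
  FROM dannys_diner.sales
  INNER JOIN dannys_diner.menu
    ON sales.product_id = menu.product_id
  GROUP BY menu.product_name
)
SELECT product_name, purchase_count
FROM item_counts
WHERE rank = 1;
```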
164 |
165 | ***
166 |
167 | **5. Which item was the most popular for each customer?**
168 |
169 | ````sql
170 | WITH most_popular AS (
171 | SELECT
172 | sales.customer_id,
173 | menu.product_name,
174 | COUNT(menu.product_id) AS order_count,
175 | DENSE_RANK() OVER (
176 | PARTITION BY sales.customer_id
177 | ORDER BY COUNT(sales.customer_id) DESC) AS rank
178 | FROM dannys_diner.menu
179 | INNER JOIN dannys_diner.sales
180 | ON menu.product_id = sales.product_id
181 | GROUP BY sales.customer_id, menu.product_name
182 | )
183 |
184 | SELECT
185 | customer_id,
186 | product_name,
187 | order_count
188 | FROM most_popular
189 | WHERE rank = 1;
190 | ````
191 |
192 | *Each user may have more than 1 favourite item.*
193 |
194 | #### Steps:
195 | - Create a CTE named `most_popular` and within the CTE, join the `menu` table and `sales` table using the `product_id` column.
196 | - Group results by `sales.customer_id` and `menu.product_name` and calculate the count of `menu.product_id` occurrences for each group.
197 | - Utilize the **DENSE_RANK()** window function to calculate the ranking of each `sales.customer_id` partition based on the count of orders **COUNT(`sales.customer_id`)** in descending order.
198 | - In the outer query, select the appropriate columns and apply a filter in the **WHERE** clause to retrieve only the rows where the rank column equals 1, representing the rows with the highest order count for each customer.
199 |
200 | #### Answer:
201 | | customer_id | product_name | order_count |
202 | | ----------- | ---------- |------------ |
203 | | A | ramen | 3 |
204 | | B | sushi | 2 |
205 | | B | curry | 2 |
206 | | B | ramen | 2 |
207 | | C | ramen | 3 |
208 |
209 | - Customer A and C's favourite item is ramen.
210 | - Customer B enjoys all items on the menu. He/she is a true foodie, sounds like me.
211 |
212 | ***
213 |
214 | **6. Which item was purchased first by the customer after they became a member?**
215 |
216 | ```sql
217 | WITH joined_as_member AS (
218 | SELECT
219 | members.customer_id,
220 | sales.product_id,
221 | ROW_NUMBER() OVER (
222 | PARTITION BY members.customer_id
223 | ORDER BY sales.order_date) AS row_num
224 | FROM dannys_diner.members
225 | INNER JOIN dannys_diner.sales
226 | ON members.customer_id = sales.customer_id
227 | AND sales.order_date > members.join_date
228 | )
229 |
230 | SELECT
231 | customer_id,
232 | product_name
233 | FROM joined_as_member
234 | INNER JOIN dannys_diner.menu
235 | ON joined_as_member.product_id = menu.product_id
236 | WHERE row_num = 1
237 | ORDER BY customer_id ASC;
238 | ```
239 |
240 | #### Steps:
241 | - Create a CTE named `joined_as_member` and within the CTE, select the appropriate columns and calculate the row number using the **ROW_NUMBER()** window function. The **PARTITION BY** clause divides the data by `members.customer_id` and the **ORDER BY** clause orders the rows within each `members.customer_id` partition by `sales.order_date`.
242 | - Join tables `dannys_diner.members` and `dannys_diner.sales` on `customer_id` column. Additionally, apply a condition to only include sales that occurred *after* the member's `join_date` (`sales.order_date > members.join_date`).
243 | - In the outer query, join the `joined_as_member` CTE with the `dannys_diner.menu` on the `product_id` column.
244 | - In the **WHERE** clause, filter to retrieve only the rows where the row_num column equals 1, representing the first row within each `customer_id` partition.
245 | - Order result by `customer_id` in ascending order.
246 |
247 | #### Answer:
248 | | customer_id | product_name |
249 | | ----------- | ---------- |
250 | | A | ramen |
251 | | B | sushi |
252 |
253 | - Customer A's first order as a member is ramen.
254 | - Customer B's first order as a member is sushi.
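
Note that the join condition uses a strict inequality (`sales.order_date > members.join_date`), so a purchase made on the join date itself is not counted as a member purchase. If join-date purchases should count, the condition would be `>=` instead; in that case Customer A's first member order would be the curry bought on their join date (2021-01-07) rather than ramen.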
255 |
256 | ***
257 |
258 | **7. Which item was purchased just before the customer became a member?**
259 |
260 | ````sql
261 | WITH purchased_prior_member AS (
262 | SELECT
263 | members.customer_id,
264 | sales.product_id,
265 | ROW_NUMBER() OVER (
266 | PARTITION BY members.customer_id
267 | ORDER BY sales.order_date DESC) AS rank
268 | FROM dannys_diner.members
269 | INNER JOIN dannys_diner.sales
270 | ON members.customer_id = sales.customer_id
271 | AND sales.order_date < members.join_date
272 | )
273 |
274 | SELECT
275 | p_member.customer_id,
276 | menu.product_name
277 | FROM purchased_prior_member AS p_member
278 | INNER JOIN dannys_diner.menu
279 | ON p_member.product_id = menu.product_id
280 | WHERE rank = 1
281 | ORDER BY p_member.customer_id ASC;
282 | ````
283 |
284 | #### Steps:
285 | - Create a CTE called `purchased_prior_member`.
286 | - In the CTE, select the appropriate columns and calculate the rank using the **ROW_NUMBER()** window function. The rank is determined based on the order dates of the sales in descending order within each customer's group.
287 | - Join `dannys_diner.members` table with `dannys_diner.sales` table based on the `customer_id` column, only including sales that occurred *before* the customer joined as a member (`sales.order_date < members.join_date`).
288 | - Join `purchased_prior_member` CTE with `dannys_diner.menu` table based on `product_id` column.
289 | - Filter the result set to include only the rows where the rank is 1, representing the most recent purchase made by each customer before they became a member.
290 | - Sort the result by `customer_id` in ascending order.
291 |
292 | #### Answer:
293 | | customer_id | product_name |
294 | | ----------- | ---------- |
295 | | A | sushi |
296 | | B | sushi |
297 |
298 | - Both customers' last order before becoming members was sushi.
299 |
300 | ***
301 |
302 | **8. What is the total items and amount spent for each member before they became a member?**
303 |
304 | ```sql
305 | SELECT
306 | sales.customer_id,
307 | COUNT(sales.product_id) AS total_items,
308 | SUM(menu.price) AS total_sales
309 | FROM dannys_diner.sales
310 | INNER JOIN dannys_diner.members
311 | ON sales.customer_id = members.customer_id
312 | AND sales.order_date < members.join_date
313 | INNER JOIN dannys_diner.menu
314 | ON sales.product_id = menu.product_id
315 | GROUP BY sales.customer_id
316 | ORDER BY sales.customer_id;
317 | ```
318 |
319 | #### Steps:
320 | - Select `sales.customer_id`, calculate the count of `sales.product_id` as `total_items` for each customer, and the sum of `menu.price` as `total_sales`.
321 | - From `dannys_diner.sales` table, join `dannys_diner.members` table on `customer_id` column, ensuring that `sales.order_date` is earlier than `members.join_date` (`sales.order_date < members.join_date`).
322 | - Then, join `dannys_diner.menu` table to `dannys_diner.sales` table on `product_id` column.
323 | - Group the results by `sales.customer_id`.
324 | - Order the result by `sales.customer_id` in ascending order.
325 |
326 | #### Answer:
327 | | customer_id | total_items | total_sales |
328 | | ----------- | ---------- |---------- |
329 | | A | 2 | 25 |
330 | | B | 3 | 40 |
331 |
332 | Before becoming members,
333 | - Customer A spent $25 on 2 items.
334 | - Customer B spent $40 on 3 items.
335 |
336 | ***
337 |
338 | **9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier — how many points would each customer have?**
339 |
340 | ```sql
341 | WITH points_cte AS (
342 | SELECT
343 | menu.product_id,
344 | CASE
345 | WHEN product_id = 1 THEN price * 20
346 | ELSE price * 10 END AS points
347 | FROM dannys_diner.menu
348 | )
349 |
350 | SELECT
351 | sales.customer_id,
352 | SUM(points_cte.points) AS total_points
353 | FROM dannys_diner.sales
354 | INNER JOIN points_cte
355 | ON sales.product_id = points_cte.product_id
356 | GROUP BY sales.customer_id
357 | ORDER BY sales.customer_id;
358 | ```
359 |
360 | #### Steps:
361 | Let's break down the question to understand the point calculation for each customer's purchases.
362 | - Each $1 spent = 10 points. However, sushi (`product_id` 1) gets 2x points, so each $1 spent on sushi = 20 points.
363 | - Here's how the calculation is performed using a conditional CASE statement:
364 | - If product_id = 1, multiply every $1 by 20 points.
365 | - Otherwise, multiply $1 by 10 points.
366 | - Then, calculate the total points for each customer.
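
As a quick sanity check of the steps above (illustrative only), Customer A's points can be broken down by product:

```sql
-- Customer A ordered 1 sushi ($10), 2 curry ($15 each) and 3 ramen ($12 each):
-- sushi: 10 * 20 = 200, curry: 2 * 15 * 10 = 300, ramen: 3 * 12 * 10 = 360 -> 860 points.
SELECT
  menu.product_name,
  SUM(CASE
    WHEN menu.product_name = 'sushi' THEN menu.price * 20
    ELSE menu.price * 10 END) AS points
FROM dannys_diner.sales
INNER JOIN dannys_diner.menu
  ON sales.product_id = menu.product_id
WHERE sales.customer_id = 'A'
GROUP BY menu.product_name;
```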
367 |
368 | #### Answer:
369 | | customer_id | total_points |
370 | | ----------- | ---------- |
371 | | A | 860 |
372 | | B | 940 |
373 | | C | 360 |
374 |
375 | - Total points for Customer A is 860.
376 | - Total points for Customer B is 940.
377 | - Total points for Customer C is 360.
378 |
379 | ***
380 |
381 | **10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi — how many points do customer A and B have at the end of January?**
382 |
383 | ```sql
384 | WITH dates_cte AS (
385 | SELECT
386 | customer_id,
387 | join_date,
388 | join_date + 6 AS valid_date,
389 | DATE_TRUNC(
390 | 'month', '2021-01-31'::DATE)
391 | + interval '1 month'
392 | - interval '1 day' AS last_date
393 | FROM dannys_diner.members
394 | )
395 |
396 | SELECT
397 | sales.customer_id,
398 | SUM(CASE
399 | WHEN menu.product_name = 'sushi' THEN 2 * 10 * menu.price
400 | WHEN sales.order_date BETWEEN dates.join_date AND dates.valid_date THEN 2 * 10 * menu.price
401 | ELSE 10 * menu.price END) AS points
402 | FROM dannys_diner.sales
403 | INNER JOIN dates_cte AS dates
404 | ON sales.customer_id = dates.customer_id
405 | AND dates.join_date <= sales.order_date
406 | AND sales.order_date <= dates.last_date
407 | INNER JOIN dannys_diner.menu
408 | ON sales.product_id = menu.product_id
409 | GROUP BY sales.customer_id;
410 | ```
411 |
412 | #### Assumptions:
413 | - Before Day 1 (the day a customer becomes a member), each $1 spent earns 10 points; however, sushi earns 20 points per $1 spent.
414 | - From Day 1 to Day 7 (the first week of membership), each $1 spent on any item earns 20 points.
415 | - From Day 8 to the last day of January 2021, each $1 spent earns 10 points; however, sushi continues to earn double at 20 points per $1 spent.
416 |
417 | #### Steps:
418 | - Create a CTE called `dates_cte`.
419 | - In `dates_cte`, calculate the `valid_date` by adding 6 days to the `join_date`, and determine the `last_date` of January 2021 by truncating the date to the start of the month, adding one month, and subtracting one day.
420 | - From `dannys_diner.sales` table, join `dates_cte` on `customer_id` column, ensuring that the `order_date` of the sale is after the `join_date` (`dates.join_date <= sales.order_date`) and not later than the `last_date` (`sales.order_date <= dates.last_date`).
421 | - Then, join `dannys_diner.menu` table based on the `product_id` column.
422 | - In the outer query, calculate the points by using a `CASE` statement to determine the points based on our assumptions above.
423 | - If the `product_name` is 'sushi', multiply the price by 2 and then by 10. For orders placed between `join_date` and `valid_date`, also multiply the price by 2 and then by 10.
424 | - For all other products, multiply the price by 10.
425 | - Calculate the sum of points for each customer.
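
As a worked example of these steps, Customer A joined on 2021-01-07, so every eligible purchase (the curry on 2021-01-07 and the three ramen on 2021-01-10 and 2021-01-11) falls within the first week and earns 2x points: (15 × 20) + (3 × 12 × 20) = 300 + 720 = 1,020 points.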
426 |
427 | #### Answer:
428 | | customer_id | total_points |
429 | | ----------- | ---------- |
430 | | A | 1020 |
431 | | B | 320 |
432 |
433 | - Total points for Customer A is 1,020.
434 | - Total points for Customer B is 320.
435 |
436 | ***
437 |
438 | ## BONUS QUESTIONS
439 |
440 | **Join All The Things**
441 |
442 | **Recreate the table with: customer_id, order_date, product_name, price, member (Y/N)**
443 |
444 | ```sql
445 | SELECT
446 | sales.customer_id,
447 | sales.order_date,
448 | menu.product_name,
449 | menu.price,
450 | CASE
451 | WHEN members.join_date > sales.order_date THEN 'N'
452 | WHEN members.join_date <= sales.order_date THEN 'Y'
453 | ELSE 'N' END AS member_status
454 | FROM dannys_diner.sales
455 | LEFT JOIN dannys_diner.members
456 | ON sales.customer_id = members.customer_id
457 | INNER JOIN dannys_diner.menu
458 | ON sales.product_id = menu.product_id
459 | ORDER BY sales.customer_id, sales.order_date;
460 | ```
461 |
462 | #### Answer:
463 | | customer_id | order_date | product_name | price | member |
464 | | ----------- | ---------- | -------------| ----- | ------ |
465 | | A | 2021-01-01 | sushi | 10 | N |
466 | | A | 2021-01-01 | curry | 15 | N |
467 | | A | 2021-01-07 | curry | 15 | Y |
468 | | A | 2021-01-10 | ramen | 12 | Y |
469 | | A | 2021-01-11 | ramen | 12 | Y |
470 | | A | 2021-01-11 | ramen | 12 | Y |
471 | | B | 2021-01-01 | curry | 15 | N |
472 | | B | 2021-01-02 | curry | 15 | N |
473 | | B | 2021-01-04 | sushi | 10 | N |
474 | | B | 2021-01-11 | sushi | 10 | Y |
475 | | B | 2021-01-16 | ramen | 12 | Y |
476 | | B | 2021-02-01 | ramen | 12 | Y |
477 | | C | 2021-01-01 | ramen | 12 | N |
478 | | C | 2021-01-01 | ramen | 12 | N |
479 | | C | 2021-01-07 | ramen | 12 | N |
480 |
481 | ***
482 |
483 | **Rank All The Things**
484 |
485 | **Danny also requires further information about the ```ranking``` of customer products, but he purposely does not need the ranking for non-member purchases so he expects null ```ranking``` values for the records when customers are not yet part of the loyalty program.**
486 |
487 | ```sql
488 | WITH customers_data AS (
489 | SELECT
490 | sales.customer_id,
491 | sales.order_date,
492 | menu.product_name,
493 | menu.price,
494 | CASE
495 | WHEN members.join_date > sales.order_date THEN 'N'
496 | WHEN members.join_date <= sales.order_date THEN 'Y'
497 | ELSE 'N' END AS member_status
498 | FROM dannys_diner.sales
499 | LEFT JOIN dannys_diner.members
500 | ON sales.customer_id = members.customer_id
501 | INNER JOIN dannys_diner.menu
502 | ON sales.product_id = menu.product_id
503 | )
504 |
505 | SELECT
506 | *,
507 | CASE
508 | WHEN member_status = 'N' then NULL
509 | ELSE RANK () OVER (
510 | PARTITION BY customer_id, member_status
511 | ORDER BY order_date
512 | ) END AS ranking
513 | FROM customers_data;
514 | ```
515 |
516 | #### Answer:
517 | | customer_id | order_date | product_name | price | member | ranking |
518 | | ----------- | ---------- | -------------| ----- | ------ |-------- |
519 | | A | 2021-01-01 | sushi | 10 | N | NULL
520 | | A | 2021-01-01 | curry | 15 | N | NULL
521 | | A | 2021-01-07 | curry | 15 | Y | 1
522 | | A | 2021-01-10 | ramen | 12 | Y | 2
523 | | A | 2021-01-11 | ramen | 12 | Y | 3
524 | | A | 2021-01-11 | ramen | 12 | Y | 3
525 | | B | 2021-01-01 | curry | 15 | N | NULL
526 | | B | 2021-01-02 | curry | 15 | N | NULL
527 | | B | 2021-01-04 | sushi | 10 | N | NULL
528 | | B | 2021-01-11 | sushi | 10 | Y | 1
529 | | B | 2021-01-16 | ramen | 12 | Y | 2
530 | | B | 2021-02-01 | ramen | 12 | Y | 3
531 | | C | 2021-01-01 | ramen | 12 | N | NULL
532 | | C | 2021-01-01 | ramen | 12 | N | NULL
533 | | C | 2021-01-07 | ramen | 12 | N | NULL
534 |
535 | ***
536 |
--------------------------------------------------------------------------------
/Case Study #2 - Pizza Runner/A. Pizza Metrics.md:
--------------------------------------------------------------------------------
1 | # 🍕 Case Study #2 - Pizza Runner
2 |
3 | ## 🍝 Solution - A. Pizza Metrics
4 |
5 | ### 1. How many pizzas were ordered?
6 |
7 | ````sql
8 | SELECT COUNT(*) AS pizza_order_count
9 | FROM #customer_orders;
10 | ````
11 |
12 | **Answer:**
13 |
14 | 
15 |
16 | - A total of 14 pizzas were ordered.
17 |
18 | ### 2. How many unique customer orders were made?
19 |
20 | ````sql
21 | SELECT
22 | COUNT(DISTINCT order_id) AS unique_order_count
23 | FROM #customer_orders;
24 | ````
25 |
26 | **Answer:**
27 |
28 | 
29 |
30 | - There are 10 unique customer orders.
31 |
32 | ### 3. How many successful orders were delivered by each runner?
33 |
34 | ````sql
35 | SELECT
36 | runner_id,
37 | COUNT(order_id) AS successful_orders
38 | FROM #runner_orders
39 | WHERE distance != 0
40 | GROUP BY runner_id;
41 | ````
42 |
43 | **Answer:**
44 |
45 | 
46 |
47 | - Runner 1 has 4 successful delivered orders.
48 | - Runner 2 has 3 successful delivered orders.
49 | - Runner 3 has 1 successful delivered order.
50 |
51 | ### 4. How many of each type of pizza was delivered?
52 |
53 | ````sql
54 | SELECT
55 | p.pizza_name,
56 | COUNT(c.pizza_id) AS delivered_pizza_count
57 | FROM #customer_orders AS c
58 | JOIN #runner_orders AS r
59 | ON c.order_id = r.order_id
60 | JOIN pizza_names AS p
61 | ON c.pizza_id = p.pizza_id
62 | WHERE r.distance != 0
63 | GROUP BY p.pizza_name;
64 | ````
65 |
66 | **Answer:**
67 |
68 | 
69 |
70 | - There are 9 delivered Meatlovers pizzas and 3 Vegetarian pizzas.
71 |
72 | ### 5. How many Vegetarian and Meatlovers were ordered by each customer?
73 |
74 | ````sql
75 | SELECT
76 | c.customer_id,
77 | p.pizza_name,
78 | COUNT(p.pizza_name) AS order_count
79 | FROM #customer_orders AS c
80 | JOIN pizza_names AS p
81 | ON c.pizza_id= p.pizza_id
82 | GROUP BY c.customer_id, p.pizza_name
83 | ORDER BY c.customer_id;
84 | ````
85 |
86 | **Answer:**
87 |
88 | 
89 |
90 | - Customer 101 ordered 2 Meatlovers pizzas and 1 Vegetarian pizza.
91 | - Customer 102 ordered 2 Meatlovers pizzas and 2 Vegetarian pizzas.
92 | - Customer 103 ordered 3 Meatlovers pizzas and 1 Vegetarian pizza.
93 | - Customer 104 ordered 1 Meatlovers pizza.
94 | - Customer 105 ordered 1 Vegetarian pizza.
95 |
96 | ### 6. What was the maximum number of pizzas delivered in a single order?
97 |
98 | ````sql
99 | WITH pizza_count_cte AS
100 | (
101 | SELECT
102 | c.order_id,
103 | COUNT(c.pizza_id) AS pizza_per_order
104 | FROM #customer_orders AS c
105 | JOIN #runner_orders AS r
106 | ON c.order_id = r.order_id
107 | WHERE r.distance != 0
108 | GROUP BY c.order_id
109 | )
110 |
111 | SELECT
112 | MAX(pizza_per_order) AS pizza_count
113 | FROM pizza_count_cte;
114 | ````
115 |
116 | **Answer:**
117 |
118 | 
119 |
120 | - The maximum number of pizzas delivered in a single order is 3.
121 |
122 | ### 7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
123 |
124 | ````sql
125 | SELECT
126 | c.customer_id,
127 | SUM(
128 | CASE WHEN c.exclusions <> ' ' OR c.extras <> ' ' THEN 1
129 | ELSE 0
130 | END) AS at_least_1_change,
131 | SUM(
132 | CASE WHEN c.exclusions = ' ' AND c.extras = ' ' THEN 1
133 | ELSE 0
134 | END) AS no_change
135 | FROM #customer_orders AS c
136 | JOIN #runner_orders AS r
137 | ON c.order_id = r.order_id
138 | WHERE r.distance != 0
139 | GROUP BY c.customer_id
140 | ORDER BY c.customer_id;
141 | ````
142 |
143 | **Answer:**
144 |
145 | 
146 |
147 | - Customers 101 and 102 like their pizzas per the original recipe.
148 | - Customer 103, 104 and 105 have their own preference for pizza topping and requested at least 1 change (extra or exclusion topping) on their pizza.
149 |
150 | ### 8. How many pizzas were delivered that had both exclusions and extras?
151 |
152 | ````sql
153 | SELECT
154 | SUM(
155 | CASE WHEN exclusions IS NOT NULL AND extras IS NOT NULL THEN 1
156 | ELSE 0
157 | END) AS pizza_count_w_exclusions_extras
158 | FROM #customer_orders AS c
159 | JOIN #runner_orders AS r
160 | ON c.order_id = r.order_id
161 | WHERE r.distance >= 1
162 | AND exclusions <> ' '
163 | AND extras <> ' ';
164 | ````
165 |
166 | **Answer:**
167 |
168 | 
169 |
170 | - Only 1 delivered pizza had both extra and exclusion toppings. That’s one fussy customer!
171 |
172 | ### 9. What was the total volume of pizzas ordered for each hour of the day?
173 |
174 | ````sql
175 | SELECT
176 | DATEPART(HOUR, [order_time]) AS hour_of_day,
177 | COUNT(order_id) AS pizza_count
178 | FROM #customer_orders
179 | GROUP BY DATEPART(HOUR, [order_time]);
180 | ````
181 |
182 | **Answer:**
183 |
184 | 
185 |
186 | - Highest volume of pizza ordered is at 13 (1:00 pm), 18 (6:00 pm) and 21 (9:00 pm).
187 | - Lowest volume of pizza ordered is at 11 (11:00 am), 19 (7:00 pm) and 23 (11:00 pm).
188 |
189 | ### 10. What was the volume of orders for each day of the week?
190 |
191 | ````sql
192 | SELECT
193 | FORMAT(DATEADD(DAY, 2, order_time),'dddd') AS day_of_week, -- add 2 to adjust 1st day of the week as Monday
194 | COUNT(order_id) AS total_pizzas_ordered
195 | FROM #customer_orders
196 | GROUP BY FORMAT(DATEADD(DAY, 2, order_time),'dddd');
197 | ````
198 |
199 | **Answer:**
200 |
201 | 
202 |
203 | - There are 5 pizzas ordered on Friday and Monday.
204 | - There are 3 pizzas ordered on Saturday.
205 | - There is 1 pizza ordered on Sunday.
206 |
207 | ***Click [here](https://github.com/katiehuangx/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/B.%20Runner%20and%20Customer%20Experience.md) for solution for B. Runner and Customer Experience!***
208 |
209 |
--------------------------------------------------------------------------------
/Case Study #2 - Pizza Runner/B. Runner and Customer Experience.md:
--------------------------------------------------------------------------------
1 | # 🍕 Case Study #2 Pizza Runner
2 |
3 | ## Solution - B. Runner and Customer Experience
4 |
5 | ### 1. How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)
6 |
7 | ````sql
8 | SELECT
9 | DATEPART(WEEK, registration_date) AS registration_week,
10 | COUNT(runner_id) AS runner_signup
11 | FROM runners
12 | GROUP BY DATEPART(WEEK, registration_date);
13 | ````
14 |
15 | **Answer:**
16 |
17 | 
18 |
19 | - On Week 1 of Jan 2021, 2 new runners signed up.
20 | - On Week 2 and 3 of Jan 2021, 1 new runner signed up.
21 |
22 | ### 2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?
23 |
24 | ````sql
25 | WITH time_taken_cte AS
26 | (
27 | SELECT
28 | c.order_id,
29 | c.order_time,
30 | r.pickup_time,
31 | DATEDIFF(MINUTE, c.order_time, r.pickup_time) AS pickup_minutes
32 | FROM #customer_orders AS c
33 | JOIN #runner_orders AS r
34 | ON c.order_id = r.order_id
35 | WHERE r.distance != 0
36 | GROUP BY c.order_id, c.order_time, r.pickup_time
37 | )
38 |
39 | SELECT
40 | AVG(pickup_minutes) AS avg_pickup_minutes
41 | FROM time_taken_cte
42 | WHERE pickup_minutes > 1;
43 | ````
44 |
45 | **Answer:**
46 |
47 | 
48 |
49 | - The average time taken by runners to arrive at Pizza Runner HQ to pick up an order is 15 minutes.
50 |
51 | ### 3. Is there any relationship between the number of pizzas and how long the order takes to prepare?
52 |
53 | ````sql
54 | WITH prep_time_cte AS
55 | (
56 | SELECT
57 | c.order_id,
58 | COUNT(c.order_id) AS pizza_order,
59 | c.order_time,
60 | r.pickup_time,
61 | DATEDIFF(MINUTE, c.order_time, r.pickup_time) AS prep_time_minutes
62 | FROM #customer_orders AS c
63 | JOIN #runner_orders AS r
64 | ON c.order_id = r.order_id
65 | WHERE r.distance != 0
66 | GROUP BY c.order_id, c.order_time, r.pickup_time
67 | )
68 |
69 | SELECT
70 | pizza_order,
71 | AVG(prep_time_minutes) AS avg_prep_time_minutes
72 | FROM prep_time_cte
73 | WHERE prep_time_minutes > 1
74 | GROUP BY pizza_order;
75 | ````
76 |
77 | **Answer:**
78 |
79 | 
80 |
81 | - On average, a single pizza order takes 12 minutes to prepare.
82 | - An order with 3 pizzas takes 30 minutes at an average of 10 minutes per pizza.
83 | - It takes 16 minutes to prepare an order with 2 pizzas (8 minutes per pizza), making a 2-pizza order the most efficient per pizza.
84 |
85 | ### 4. What was the average distance travelled for each customer?
86 |
87 | ````sql
88 | SELECT
89 | c.customer_id,
90 | AVG(r.distance) AS avg_distance
91 | FROM #customer_orders AS c
92 | JOIN #runner_orders AS r
93 | ON c.order_id = r.order_id
94 | WHERE r.duration != 0
95 | GROUP BY c.customer_id;
96 | ````
97 |
98 | **Answer:**
99 |
100 | 
101 |
102 | _(Assuming that distance is calculated from Pizza Runner HQ to customer’s place)_
103 |
104 | - Customer 104 stays the nearest to Pizza Runner HQ at an average distance of 10 km, whereas Customer 105 stays the furthest at 25 km.
105 |
106 | ### 5. What was the difference between the longest and shortest delivery times for all orders?
107 |
108 | _Edit 08/10/21: Thanks to my reader, Ankush Taneja on Medium who caught my mistake. I've amended to the correct solution. Also, I was doing this case study using SQL Server few months ago, but I'm using PostgreSQL on SQLpad now so there could be a slight difference to the syntax._
109 |
110 | Firstly, I'm going to filter results with non-null duration first just to have a feel. You can skip this step and go straight to the answer.
111 |
112 | ````sql
113 | SELECT
114 | order_id, duration
115 | FROM #runner_orders
116 | WHERE duration not like ' ';
117 | ````
118 |
119 |
120 |
121 | ```sql
122 | SELECT MAX(duration::NUMERIC) - MIN(duration::NUMERIC) AS delivery_time_difference
123 | FROM runner_orders2
124 | where duration not like ' ';
125 | ```
126 |
127 | **Answer:**
128 |
129 |
130 |
131 | - The difference between longest (40 minutes) and shortest (10 minutes) delivery time for all orders is 30 minutes.
132 |
133 | ### 6. What was the average speed for each runner for each delivery and do you notice any trend for these values?
134 |
135 | ````sql
136 | SELECT
137 | r.runner_id,
138 | c.customer_id,
139 | c.order_id,
140 | COUNT(c.order_id) AS pizza_count,
141 | r.distance, (r.duration / 60) AS duration_hr ,
142 | ROUND((r.distance/r.duration * 60), 2) AS avg_speed
143 | FROM #runner_orders AS r
144 | JOIN #customer_orders AS c
145 | ON r.order_id = c.order_id
146 | WHERE distance != 0
147 | GROUP BY r.runner_id, c.customer_id, c.order_id, r.distance, r.duration
148 | ORDER BY c.order_id;
149 | ````
150 |
151 | **Answer:**
152 |
153 | 
154 |
155 | _(Average speed = Distance in km / Duration in hour)_
156 | - Runner 1’s average speed ranges from 37.5 km/h to 60 km/h.
157 | - Runner 2’s average speed ranges from 35.1 km/h to 93.6 km/h. Danny should investigate Runner 2 as the average speed varies by almost 3x across deliveries!
158 | - Runner 3’s average speed is 40 km/h.
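
As a quick check of the formula above: for example, a 20 km delivery completed in 32 minutes works out to 20 / (32 / 60) ≈ 37.5 km/h, which matches the lower end of Runner 1’s range.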
159 |
160 | ### 7. What is the successful delivery percentage for each runner?
161 |
162 | ````sql
163 | SELECT
164 | runner_id,
165 | ROUND(100 * SUM(
166 | CASE WHEN distance = 0 THEN 0
167 | ELSE 1 END) / COUNT(*), 0) AS success_perc
168 | FROM #runner_orders
169 | GROUP BY runner_id;
170 | ````
171 |
172 | **Answer:**
173 |
174 | 
175 |
176 | - Runner 1 has a 100% successful delivery rate.
177 | - Runner 2 has a 75% successful delivery rate.
178 | - Runner 3 has a 50% successful delivery rate.
179 |
180 | _(It’s not right to attribute successful delivery to runners as order cancellations are out of the runner’s control.)_
181 |
182 | ***
183 |
--------------------------------------------------------------------------------
/Case Study #2 - Pizza Runner/C. Ingredient Optimisation.md:
--------------------------------------------------------------------------------
1 | # 🍕 Case Study #2 Pizza Runner
2 |
3 | ## Solution - C. Ingredient Optimisation
4 |
5 | ### 1. What are the standard ingredients for each pizza?
6 |
7 | ### 2. What was the most commonly added extra?
8 |
9 | ```sql
10 | WITH toppings_cte AS (
11 | SELECT
12 | pizza_id,
13 | REGEXP_SPLIT_TO_TABLE(toppings, '[,\s]+')::INTEGER AS topping_id
14 | FROM pizza_runner.pizza_recipes)
15 |
16 | SELECT
17 | t.topping_id, pt.topping_name,
18 | COUNT(t.topping_id) AS topping_count
19 | FROM toppings_cte t
20 | INNER JOIN pizza_runner.pizza_toppings pt
21 | ON t.topping_id = pt.topping_id
22 | GROUP BY t.topping_id, pt.topping_name
23 | ORDER BY topping_count DESC;
24 | ```
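
`REGEXP_SPLIT_TO_TABLE` expands each comma-separated `toppings` string into one row per topping before the cast to `INTEGER`. A minimal sketch of the behaviour (with a made-up toppings string):

```sql
-- Splits the string on commas/whitespace and returns one row per value: 1, 2, 3.
SELECT REGEXP_SPLIT_TO_TABLE('1, 2, 3', '[,\s]+')::INTEGER AS topping_id;
```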
25 |
26 | **Solution**
27 |
28 |
29 |
30 | ### 3. What was the most common exclusion?
31 |
32 | ### 4. Generate an order item for each record in the customers_orders table in the format of one of the following:
33 | - Meat Lovers
34 | - Meat Lovers - Exclude Beef
35 | - Meat Lovers - Extra Bacon
36 | - Meat Lovers - Exclude Cheese, Bacon - Extra Mushroom, Peppers
37 |
38 | ### 5. Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a 2x in front of any relevant ingredients
39 |
40 | For example: "Meat Lovers: 2xBacon, Beef, ... , Salami"
41 |
42 | ### 6. What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?
43 |
--------------------------------------------------------------------------------
/Case Study #2 - Pizza Runner/Data Cleaning and Transformation.md:
--------------------------------------------------------------------------------
1 | # 🍕 Case Study #2 - Pizza Runner
2 |
3 | ## 🧼 Data Cleaning & Transformation
4 |
5 | ### 🔨 Table: customer_orders
6 |
7 | Looking at the `customer_orders` table below, we can see that:
8 | - In the `exclusions` column, there are missing/blank spaces ' ' and null values.
9 | - In the `extras` column, there are missing/blank spaces ' ' and null values.
10 |
11 |
12 |
13 | Our course of action to clean the table:
14 | - Create a temporary table with all the columns
15 | - Remove null values in the `exclusions` and `extras` columns and replace them with a blank space ' '.
16 |
17 | ````sql
18 | CREATE TEMP TABLE customer_orders_temp AS
19 | SELECT
20 | order_id,
21 | customer_id,
22 | pizza_id,
23 | CASE
24 | WHEN exclusions IS null OR exclusions LIKE 'null' THEN ' '
25 | ELSE exclusions
26 | END AS exclusions,
27 | CASE
28 | WHEN extras IS NULL or extras LIKE 'null' THEN ' '
29 | ELSE extras
30 | END AS extras,
31 | order_time
32 | FROM pizza_runner.customer_orders;
33 | ````
34 |
35 | This is how the clean `customer_orders_temp` table looks, and we will use this table to run all our queries.
36 |
37 |
38 |
39 | ***
40 |
41 | ### 🔨 Table: runner_orders
42 |
43 | Looking at the `runner_orders` table below, we can see that:
44 | - In the `pickup_time`, `distance` and `duration` columns, there are missing values and 'null' text values, and the `distance` and `duration` values include units ("km", "mins", "minutes") that need to be removed.
45 | - In the `cancellation` column, there are blank values, NULL values and 'null' text values.
46 |
47 |
48 |
49 | Our course of action to clean the table:
50 | - In `pickup_time` column, remove nulls and replace with blank space ' '.
51 | - In `distance` column, remove "km" and nulls and replace with blank space ' '.
52 | - In `duration` column, remove "minutes", "minute" and nulls and replace with blank space ' '.
53 | - In `cancellation` column, remove NULL values and 'null' text values and replace them with a blank space ' '.
54 |
55 | ````sql
56 | CREATE TEMP TABLE runner_orders_temp AS
57 | SELECT
58 | order_id,
59 | runner_id,
60 | CASE
61 | WHEN pickup_time LIKE 'null' THEN ' '
62 | ELSE pickup_time
63 | END AS pickup_time,
64 | CASE
65 | WHEN distance LIKE 'null' THEN ' '
66 | WHEN distance LIKE '%km' THEN TRIM('km' from distance)
67 | ELSE distance
68 | END AS distance,
69 | CASE
70 | WHEN duration LIKE 'null' THEN ' '
71 | WHEN duration LIKE '%mins' THEN TRIM('mins' from duration)
72 | WHEN duration LIKE '%minute' THEN TRIM('minute' from duration)
73 | WHEN duration LIKE '%minutes' THEN TRIM('minutes' from duration)
74 | ELSE duration
75 | END AS duration,
76 | CASE
77 | WHEN cancellation IS NULL or cancellation LIKE 'null' THEN ' '
78 | ELSE cancellation
79 | END AS cancellation
80 | FROM pizza_runner.runner_orders;
81 | ````
82 |
83 | Then, we alter the `pickup_time`, `distance` and `duration` columns to the correct data types, casting the blank placeholders back to `NULL` so the casts succeed.
84 |
85 | ````sql
86 | ALTER TABLE runner_orders_temp
87 | ALTER COLUMN pickup_time TYPE TIMESTAMP USING NULLIF(TRIM(pickup_time), '')::TIMESTAMP,
88 | ALTER COLUMN distance TYPE FLOAT USING NULLIF(TRIM(distance), '')::FLOAT,
89 | ALTER COLUMN duration TYPE INT USING NULLIF(TRIM(duration), '')::INT;
90 | ````
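
To confirm the columns now have the intended types, a quick check against the information schema (a sketch, assuming PostgreSQL) can be run:

```sql
-- Lists the data type of each column in the cleaned temp table.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'runner_orders_temp';
```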
91 |
92 | This is how the clean `runner_orders_temp` table looks, and we will use this table to run all our queries.
93 |
94 |
95 |
96 | ***
97 |
98 | Click here for [solution](https://github.com/katiehuangx/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/A.%20Pizza%20Metrics.md) to **A. Pizza Metrics**!
99 |
--------------------------------------------------------------------------------
/Case Study #2 - Pizza Runner/README.md:
--------------------------------------------------------------------------------
1 | # 🍕 Case Study #2 Pizza Runner
2 |
3 |
4 |
5 | ## 📚 Table of Contents
6 | - [Business Task](#business-task)
7 | - [Entity Relationship Diagram](#entity-relationship-diagram)
8 | - Solution
9 | - [Data Cleaning and Transformation](#-data-cleaning--transformation)
10 | - [A. Pizza Metrics](#a-pizza-metrics)
11 | - [B. Runner and Customer Experience](#b-runner-and-customer-experience)
12 | - [C. Ingredient Optimisation](#c-ingredient-optimisation)
13 | - [D. Pricing and Ratings](#d-pricing-and-ratings)
14 |
15 | Please note that all the information regarding the case study has been sourced from the following link: [here](https://8weeksqlchallenge.com/case-study-2/).
16 |
17 | I also published this on [Medium](https://medium.com/analytics-vidhya/8-week-sql-challenge-case-study-2-pizza-runner-ba32f0a6f9fb?source=friends_link&sk=5463dad7c9b0b1ba83d570f09e1fce90)!
18 | ***
19 |
20 | ## Business Task
21 | Danny is expanding his new Pizza Empire and at the same time, he wants to Uberize it, so Pizza Runner was launched!
22 |
23 | Danny started by recruiting “runners” to deliver fresh pizza from Pizza Runner Headquarters (otherwise known as Danny’s house) and also maxed out his credit card to pay freelance developers to build a mobile app to accept orders from customers.
24 |
25 | ## Entity Relationship Diagram
26 |
27 | 
28 |
29 | ## 🧼 Data Cleaning & Transformation
30 |
31 | ### 🔨 Table: customer_orders
32 |
33 | Looking at the `customer_orders` table below, we can see that:
34 | - In the `exclusions` column, there are missing/blank spaces ' ' and null values.
35 | - In the `extras` column, there are missing/blank spaces ' ' and null values.
36 |
37 |
38 |
39 | Our course of action to clean the table:
40 | - Create a temporary table with all the columns
41 | - Remove null values in the `exclusions` and `extras` columns and replace them with a blank space ' '.
42 |
43 | ````sql
44 | CREATE TEMP TABLE customer_orders_temp AS
45 | SELECT
46 | order_id,
47 | customer_id,
48 | pizza_id,
49 | CASE
50 | WHEN exclusions IS null OR exclusions LIKE 'null' THEN ' '
51 | ELSE exclusions
52 | END AS exclusions,
53 | CASE
54 | WHEN extras IS NULL or extras LIKE 'null' THEN ' '
55 | ELSE extras
56 | END AS extras,
57 | order_time
58 | FROM pizza_runner.customer_orders;
59 | ````
60 |
61 | This is how the clean `customer_orders_temp` table looks, and we will use this table to run all our queries.
62 |
63 |
64 |
65 | ***
66 |
67 | ### 🔨 Table: runner_orders
68 |
69 | Looking at the `runner_orders` table below, we can see that:
70 | - In the `pickup_time`, `distance` and `duration` columns, there are missing values and 'null' text values, and the `distance` and `duration` values include units ("km", "mins", "minutes") that need to be removed.
71 | - In the `cancellation` column, there are blank values, NULL values and 'null' text values.
72 |
73 |
74 |
75 | Our course of action to clean the table:
76 | - In `pickup_time` column, remove nulls and replace with blank space ' '.
77 | - In `distance` column, remove "km" and nulls and replace with blank space ' '.
78 | - In `duration` column, remove "minutes", "minute" and nulls and replace with blank space ' '.
79 | - In `cancellation` column, remove NULL values and 'null' text values and replace them with a blank space ' '.
80 |
81 | ````sql
82 | CREATE TEMP TABLE runner_orders_temp AS
83 | SELECT
84 | order_id,
85 | runner_id,
86 | CASE
87 | WHEN pickup_time LIKE 'null' THEN ' '
88 | ELSE pickup_time
89 | END AS pickup_time,
90 | CASE
91 | WHEN distance LIKE 'null' THEN ' '
92 | WHEN distance LIKE '%km' THEN TRIM('km' from distance)
93 | ELSE distance
94 | END AS distance,
95 | CASE
96 | WHEN duration LIKE 'null' THEN ' '
97 | WHEN duration LIKE '%mins' THEN TRIM('mins' from duration)
98 | WHEN duration LIKE '%minute' THEN TRIM('minute' from duration)
99 | WHEN duration LIKE '%minutes' THEN TRIM('minutes' from duration)
100 | ELSE duration
101 | END AS duration,
102 | CASE
103 | WHEN cancellation IS NULL or cancellation LIKE 'null' THEN ' '
104 | ELSE cancellation
105 | END AS cancellation
106 | FROM pizza_runner.runner_orders;
107 | ````
108 |
109 | Then, we alter the `pickup_time`, `distance` and `duration` columns to the correct data types, casting the blank placeholders back to `NULL` so the casts succeed.
110 |
111 | ````sql
112 | ALTER TABLE runner_orders_temp
113 | ALTER COLUMN pickup_time TYPE TIMESTAMP USING NULLIF(TRIM(pickup_time), '')::TIMESTAMP,
114 | ALTER COLUMN distance TYPE FLOAT USING NULLIF(TRIM(distance), '')::FLOAT,
115 | ALTER COLUMN duration TYPE INT USING NULLIF(TRIM(duration), '')::INT;
116 | ````
117 |
118 | This is how the clean `runner_orders_temp` table looks, and we will use this table to run all our queries.
119 |
120 |
121 |
122 | ***
123 |
124 | ## Solution
125 |
126 | ## A. Pizza Metrics
127 |
128 | ### 1. How many pizzas were ordered?
129 |
130 | ````sql
131 | SELECT COUNT(*) AS pizza_order_count
132 | FROM customer_orders_temp;
133 | ````
134 |
135 | **Answer:**
136 |
137 | 
138 |
139 | - A total of 14 pizzas were ordered.
140 |
141 | ### 2. How many unique customer orders were made?
142 |
143 | ````sql
144 | SELECT COUNT(DISTINCT order_id) AS unique_order_count
145 | FROM customer_orders_temp;
146 | ````
147 |
148 | **Answer:**
149 |
150 | 
151 |
152 | - There are 10 unique customer orders.
153 |
154 | ### 3. How many successful orders were delivered by each runner?
155 |
156 | ````sql
157 | SELECT
158 | runner_id,
159 | COUNT(order_id) AS successful_orders
160 | FROM #runner_orders
161 | WHERE distance != 0
162 | GROUP BY runner_id;
163 | ````
164 |
165 | **Answer:**
166 |
167 | 
168 |
169 | - Runner 1 has 4 successful delivered orders.
170 | - Runner 2 has 3 successful delivered orders.
171 | - Runner 3 has 1 successful delivered order.
172 |
173 | ### 4. How many of each type of pizza was delivered?
174 |
175 | ````sql
176 | SELECT
177 | p.pizza_name,
178 | COUNT(c.pizza_id) AS delivered_pizza_count
179 | FROM #customer_orders AS c
180 | JOIN #runner_orders AS r
181 | ON c.order_id = r.order_id
182 | JOIN pizza_names AS p
183 | ON c.pizza_id = p.pizza_id
184 | WHERE r.distance != 0
185 | GROUP BY p.pizza_name;
186 | ````
187 |
188 | **Answer:**
189 |
190 | 
191 |
192 | - There are 9 delivered Meatlovers pizzas and 3 Vegetarian pizzas.
193 |
194 | ### 5. How many Vegetarian and Meatlovers were ordered by each customer?
195 |
196 | ````sql
197 | SELECT
198 | c.customer_id,
199 | p.pizza_name,
200 | COUNT(p.pizza_name) AS order_count
201 | FROM #customer_orders AS c
202 | JOIN pizza_names AS p
203 | ON c.pizza_id= p.pizza_id
204 | GROUP BY c.customer_id, p.pizza_name
205 | ORDER BY c.customer_id;
206 | ````
207 |
208 | **Answer:**
209 |
210 | 
211 |
212 | - Customer 101 ordered 2 Meatlovers pizzas and 1 Vegetarian pizza.
213 | - Customer 102 ordered 2 Meatlovers pizzas and 2 Vegetarian pizzas.
214 | - Customer 103 ordered 3 Meatlovers pizzas and 1 Vegetarian pizza.
215 | - Customer 104 ordered 1 Meatlovers pizza.
216 | - Customer 105 ordered 1 Vegetarian pizza.
217 |
218 | ### 6. What was the maximum number of pizzas delivered in a single order?
219 |
220 | ````sql
221 | WITH pizza_count_cte AS
222 | (
223 | SELECT
224 | c.order_id,
225 | COUNT(c.pizza_id) AS pizza_per_order
226 | FROM #customer_orders AS c
227 | JOIN #runner_orders AS r
228 | ON c.order_id = r.order_id
229 | WHERE r.distance != 0
230 | GROUP BY c.order_id
231 | )
232 |
233 | SELECT
234 | MAX(pizza_per_order) AS pizza_count
235 | FROM pizza_count_cte;
236 | ````
237 |
238 | **Answer:**
239 |
240 | 
241 |
242 | - The maximum number of pizzas delivered in a single order is 3.
243 |
244 | ### 7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
245 |
246 | ````sql
247 | SELECT
248 | c.customer_id,
249 | SUM(
250 | CASE WHEN c.exclusions <> ' ' OR c.extras <> ' ' THEN 1
251 | ELSE 0
252 | END) AS at_least_1_change,
253 | SUM(
254 | CASE WHEN c.exclusions = ' ' AND c.extras = ' ' THEN 1
255 | ELSE 0
256 | END) AS no_change
257 | FROM #customer_orders AS c
258 | JOIN #runner_orders AS r
259 | ON c.order_id = r.order_id
260 | WHERE r.distance != 0
261 | GROUP BY c.customer_id
262 | ORDER BY c.customer_id;
263 | ````
264 |
265 | **Answer:**
266 |
267 | 
268 |
269 | - Customers 101 and 102 like their pizzas per the original recipe.
270 | - Customer 103, 104 and 105 have their own preference for pizza topping and requested at least 1 change (extra or exclusion topping) on their pizza.
271 |
272 | ### 8. How many pizzas were delivered that had both exclusions and extras?
273 |
274 | ````sql
275 | SELECT
276 | SUM(
277 | CASE WHEN exclusions IS NOT NULL AND extras IS NOT NULL THEN 1
278 | ELSE 0
279 | END) AS pizza_count_w_exclusions_extras
280 | FROM #customer_orders AS c
281 | JOIN #runner_orders AS r
282 | ON c.order_id = r.order_id
283 | WHERE r.distance >= 1
284 | AND exclusions <> ' '
285 | AND extras <> ' ';
286 | ````
287 |
288 | **Answer:**
289 |
290 | 
291 |
292 | - Only 1 delivered pizza had both extra and exclusion toppings. That’s one fussy customer!
293 |
294 | ### 9. What was the total volume of pizzas ordered for each hour of the day?
295 |
296 | ````sql
297 | SELECT
298 | DATEPART(HOUR, [order_time]) AS hour_of_day,
299 | COUNT(order_id) AS pizza_count
300 | FROM #customer_orders
301 | GROUP BY DATEPART(HOUR, [order_time]);
302 | ````
303 |
304 | **Answer:**
305 |
306 | 
307 |
308 | - Highest volume of pizza ordered is at 13 (1:00 pm), 18 (6:00 pm) and 21 (9:00 pm).
309 | - Lowest volume of pizza ordered is at 11 (11:00 am), 19 (7:00 pm) and 23 (11:00 pm).
310 |
311 | ### 10. What was the volume of orders for each day of the week?
312 |
313 | ````sql
314 | SELECT
315 | FORMAT(DATEADD(DAY, 2, order_time),'dddd') AS day_of_week, -- add 2 to adjust 1st day of the week as Monday
316 | COUNT(order_id) AS total_pizzas_ordered
317 | FROM #customer_orders
318 | GROUP BY FORMAT(DATEADD(DAY, 2, order_time),'dddd');
319 | ````
320 |
321 | **Answer:**
322 |
323 | 
324 |
325 | - There are 5 pizzas ordered on Friday and Monday.
326 | - There are 3 pizzas ordered on Saturday.
327 | - There is 1 pizza ordered on Sunday.
328 |
329 | ***
330 |
331 | ## B. Runner and Customer Experience
332 |
333 | ### 1. How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)
334 |
335 | ````sql
336 | SELECT
337 | DATEPART(WEEK, registration_date) AS registration_week,
338 | COUNT(runner_id) AS runner_signup
339 | FROM runners
340 | GROUP BY DATEPART(WEEK, registration_date);
341 | ````
342 |
343 | **Answer:**
344 |
345 | 
346 |
347 | - On Week 1 of Jan 2021, 2 new runners signed up.
348 | - On Week 2 and 3 of Jan 2021, 1 new runner signed up.
349 |
350 | ### 2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?
351 |
352 | ````sql
353 | WITH time_taken_cte AS
354 | (
355 | SELECT
356 | c.order_id,
357 | c.order_time,
358 | r.pickup_time,
359 | DATEDIFF(MINUTE, c.order_time, r.pickup_time) AS pickup_minutes
360 | FROM #customer_orders AS c
361 | JOIN #runner_orders AS r
362 | ON c.order_id = r.order_id
363 | WHERE r.distance != 0
364 | GROUP BY c.order_id, c.order_time, r.pickup_time
365 | )
366 |
367 | SELECT
368 | AVG(pickup_minutes) AS avg_pickup_minutes
369 | FROM time_taken_cte
370 | WHERE pickup_minutes > 1;
371 | ````
372 |
373 | **Answer:**
374 |
375 | 
376 |
377 | - The average time taken by runners to arrive at Pizza Runner HQ to pick up an order is 15 minutes.
378 |
379 | ### 3. Is there any relationship between the number of pizzas and how long the order takes to prepare?
380 |
381 | ````sql
382 | WITH prep_time_cte AS
383 | (
384 | SELECT
385 | c.order_id,
386 | COUNT(c.order_id) AS pizza_order,
387 | c.order_time,
388 | r.pickup_time,
389 | DATEDIFF(MINUTE, c.order_time, r.pickup_time) AS prep_time_minutes
390 | FROM #customer_orders AS c
391 | JOIN #runner_orders AS r
392 | ON c.order_id = r.order_id
393 | WHERE r.distance != 0
394 | GROUP BY c.order_id, c.order_time, r.pickup_time
395 | )
396 |
397 | SELECT
398 | pizza_order,
399 | AVG(prep_time_minutes) AS avg_prep_time_minutes
400 | FROM prep_time_cte
401 | WHERE prep_time_minutes > 1
402 | GROUP BY pizza_order;
403 | ````
404 |
405 | **Answer:**
406 |
407 | 
408 |
409 | - On average, a single pizza order takes 12 minutes to prepare.
410 | - An order with 3 pizzas takes 30 minutes at an average of 10 minutes per pizza.
411 | - It takes 16 minutes to prepare an order with 2 pizzas (8 minutes per pizza), making a 2-pizza order the most efficient per pizza.
412 |
413 | ### 4. What was the average distance travelled for each customer?
414 |
415 | ````sql
416 | SELECT
417 | c.customer_id,
418 | AVG(r.distance) AS avg_distance
419 | FROM #customer_orders AS c
420 | JOIN #runner_orders AS r
421 | ON c.order_id = r.order_id
422 | WHERE r.duration != 0
423 | GROUP BY c.customer_id;
424 | ````
425 |
426 | **Answer:**
427 |
428 | 
429 |
430 | _(Assuming that distance is calculated from Pizza Runner HQ to customer’s place)_
431 |
432 | - Customer 104 stays the nearest to Pizza Runner HQ at an average distance of 10 km, whereas Customer 105 stays the furthest at 25 km.
433 |
434 | ### 5. What was the difference between the longest and shortest delivery times for all orders?
435 |
436 | _Edit 08/10/21: Thanks to my reader, Ankush Taneja on Medium who caught my mistake. I've amended to the correct solution. Also, I was doing this case study using SQL Server few months ago, but I'm using PostgreSQL on SQLpad now so there could be a slight difference to the syntax._
437 |
438 | Firstly, I'm going to filter results with non-null duration first just to have a feel. You can skip this step and go straight to the answer.
439 |
440 | ````sql
441 | SELECT
442 | order_id, duration
443 | FROM #runner_orders
444 | WHERE duration not like ' ';
445 | ````
446 |
447 |
448 |
449 | ```sql
450 | SELECT MAX(duration::NUMERIC) - MIN(duration::NUMERIC) AS delivery_time_difference
451 | FROM runner_orders2
452 | where duration not like ' ';
453 | ```
454 |
455 | **Answer:**
456 |
457 |
458 |
459 | - The difference between longest (40 minutes) and shortest (10 minutes) delivery time for all orders is 30 minutes.
460 |
461 | ### 6. What was the average speed for each runner for each delivery and do you notice any trend for these values?
462 |
463 | ````sql
464 | SELECT
465 | r.runner_id,
466 | c.customer_id,
467 | c.order_id,
468 | COUNT(c.order_id) AS pizza_count,
469 | r.distance, (r.duration / 60) AS duration_hr ,
470 | ROUND((r.distance/r.duration * 60), 2) AS avg_speed
471 | FROM #runner_orders AS r
472 | JOIN #customer_orders AS c
473 | ON r.order_id = c.order_id
474 | WHERE distance != 0
475 | GROUP BY r.runner_id, c.customer_id, c.order_id, r.distance, r.duration
476 | ORDER BY c.order_id;
477 | ````
478 |
479 | **Answer:**
480 |
481 | 
482 |
483 | _(Average speed = Distance in km / Duration in hour)_
484 | - Runner 1’s average speed ranges from 37.5 km/h to 60 km/h.
485 | - Runner 2’s average speed ranges from 35.1 km/h to 93.6 km/h. Danny should investigate Runner 2 as the average speed varies by almost 3x across deliveries!
486 | - Runner 3’s average speed is 40 km/h.
487 |
488 | ### 7. What is the successful delivery percentage for each runner?
489 |
490 | ````sql
491 | SELECT
492 | runner_id,
493 | ROUND(100 * SUM(
494 | CASE WHEN distance = 0 THEN 0
495 | ELSE 1 END) / COUNT(*), 0) AS success_perc
496 | FROM #runner_orders
497 | GROUP BY runner_id;
498 | ````
499 |
500 | **Answer:**
501 |
502 | 
503 |
504 | - Runner 1 has a 100% successful delivery rate.
505 | - Runner 2 has a 75% successful delivery rate.
506 | - Runner 3 has a 50% successful delivery rate.
507 |
508 | _(It’s not right to attribute successful delivery to runners as order cancellations are out of the runner’s control.)_
509 |
510 | ***
511 |
512 | ## C. Ingredient Optimisation
513 |
514 | ### 1. What are the standard ingredients for each pizza?
515 |
516 | ### 2. What was the most commonly added extra?
517 |
518 | ```sql
519 | WITH toppings_cte AS (
520 | SELECT
521 | pizza_id,
522 | REGEXP_SPLIT_TO_TABLE(toppings, '[,\s]+')::INTEGER AS topping_id
523 | FROM pizza_runner.pizza_recipes)
524 |
525 | SELECT
526 | t.topping_id, pt.topping_name,
527 | COUNT(t.topping_id) AS topping_count
528 | FROM toppings_cte t
529 | INNER JOIN pizza_runner.pizza_toppings pt
530 | ON t.topping_id = pt.topping_id
531 | GROUP BY t.topping_id, pt.topping_name
532 | ORDER BY topping_count DESC;
533 | ```
534 |
535 | **Solution**
536 |
537 |
538 |
539 | ### 3. What was the most common exclusion?
540 |
541 | ### 4. Generate an order item for each record in the customers_orders table in the format of one of the following:
542 | - Meat Lovers
543 | - Meat Lovers - Exclude Beef
544 | - Meat Lovers - Extra Bacon
545 | - Meat Lovers - Exclude Cheese, Bacon - Extra Mushroom, Peppers
546 |
547 | ### 5. Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a 2x in front of any relevant ingredients
548 |
549 | For example: "Meat Lovers: 2xBacon, Beef, ... , Salami"
550 |
551 | ### 6. What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?
552 |
553 | ***
554 |
555 | ## D. Pricing and Ratings
556 |
557 | 1. If a Meat Lovers pizza costs $12 and Vegetarian costs $10 and there were no charges for changes - how much money has Pizza Runner made so far if there are no delivery fees?
558 | 2. What if there was an additional $1 charge for any pizza extras?
559 | - Add cheese is $1 extra
560 | 3. The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5.
561 | 4. Using your newly generated table - can you join all of the information together to form a table which has the following information for successful deliveries?
562 | - customer_id
563 | - order_id
564 | - runner_id
565 | - rating
566 | - order_time
567 | - pickup_time
568 | - Time between order and pickup
569 | - Delivery duration
570 | - Average speed
571 | - Total number of pizzas
572 | 5. If a Meat Lovers pizza was $12 and Vegetarian $10 fixed prices with no cost for extras and each runner is paid $0.30 per kilometre traveled - how much money does Pizza Runner have left over after these deliveries?
573 |
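For question 1, a minimal sketch of the revenue calculation, assuming cleaned versions of the `customer_orders` and `runner_orders` tables (as prepared in the data cleaning step) where cancelled orders carry a non-null `cancellation` value, and that `pizza_id = 1` is Meat Lovers:

````sql
SELECT
	SUM(CASE WHEN c.pizza_id = 1 THEN 12 ELSE 10 END) AS total_revenue
FROM customer_orders c
INNER JOIN runner_orders r
	ON c.order_id = r.order_id
WHERE r.cancellation IS NULL;
````
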
574 | ***
575 |
--------------------------------------------------------------------------------
/Case Study #3 - Foodie-Fi/README.md:
--------------------------------------------------------------------------------
1 | # 🥑 Case Study #3: Foodie-Fi
2 |
3 |
4 |
5 | ## 📚 Table of Contents
6 | - [Business Task](#business-task)
7 | - [Entity Relationship Diagram](#entity-relationship-diagram)
8 | - [Question and Solution](#question-and-solution)
9 |
10 | Please note that all the information regarding the case study has been sourced from the following link: [here](https://8weeksqlchallenge.com/case-study-3/).
11 |
12 | ***
13 |
14 | ## Business Task
15 | Danny and his friends launched a new startup Foodie-Fi and started selling monthly and annual subscriptions, giving their customers unlimited on-demand access to exclusive food videos from around the world.
16 |
17 | This case study focuses on using subscription style digital data to answer important business questions on customer journey, payments, and business performances.
18 |
19 | ## Entity Relationship Diagram
20 |
21 | 
22 |
23 | **Table 1: `plans`**
24 |
25 |
26 |
27 | There are 5 customer plans.
28 |
29 | - Trial — Customers sign up for an initial 7-day free trial and will automatically continue with the pro monthly subscription plan unless they cancel, downgrade to basic, or upgrade to an annual pro plan at any point during the trial.
30 | - Basic plan — Customers have limited access and can only stream their videos; the plan is only available monthly at $9.90.
31 | - Pro plan — Customers have no watch time limits and are able to download videos for offline viewing. Pro plans start at $19.90 a month or $199 for an annual subscription.
32 |
33 | When customers cancel their Foodie-Fi service — they will have a Churn plan record with a null price, but their plan will continue until the end of the billing period.
34 |
35 | **Table 2: `subscriptions`**
36 |
37 |
38 |
39 | Customer subscriptions show the **exact date** where their specific `plan_id` starts.
40 |
41 | If customers downgrade from a pro plan or cancel their subscription — the higher plan will remain in place until the period is over — the `start_date` in the subscriptions table will reflect the date that the actual plan changes.
42 |
43 | When customers upgrade their account from a basic plan to a pro or annual pro plan — the higher plan will take effect straightaway.
44 |
45 | When customers churn, they will keep their access until the end of their current billing period, but the `start_date` will technically be the day they decided to cancel their service.
46 |
47 | ***
48 |
49 | ## Question and Solution
50 |
51 | Please join me in executing the queries using PostgreSQL on [DB Fiddle](https://www.db-fiddle.com/f/rHJhRrXy5hbVBNJ6F6b9gJ/16). It would be great to work together on the questions!
52 |
53 | Additionally, I have also published this case study on [Medium](https://medium.com/analytics-vidhya/8-week-sql-challenge-case-study-3-foodie-fi-3d8497376ea9?sk=579afc01c30aa6149d85050f8a46ddef).
54 |
55 | If you have any questions, reach out to me on [LinkedIn](https://www.linkedin.com/in/katiehuangx/).
56 |
57 | ## 🎞️ A. Customer Journey
58 |
59 | Based off the 8 sample customers provided in the sample subscriptions table below, write a brief description about each customer’s onboarding journey.
60 |
61 | **Table: Sample of `subscriptions` table**
62 |
63 |
64 |
65 | **Answer:**
66 |
67 | ```sql
68 | SELECT
69 | sub.customer_id,
70 | plans.plan_id,
71 | plans.plan_name,
72 | sub.start_date
73 | FROM foodie_fi.plans
74 | JOIN foodie_fi.subscriptions AS sub
75 | ON plans.plan_id = sub.plan_id
76 | WHERE sub.customer_id IN (1,2,11,13,15,16,18,19);
77 | ```
78 |
79 |
80 |
81 | Based on the results above, I have selected three customers to focus on and will now share their onboarding journey.
82 |
83 | _(Refer to the table below)_
84 |
85 | Customer 1: This customer initiated their journey by starting the free trial on 1 Aug 2020. After the trial period ended, on 8 Aug 2020, they subscribed to the basic monthly plan.
86 |
87 |
88 |
89 | Customer 13: The onboarding journey for this customer began with a free trial on 15 Dec 2020. Following the trial period, on 22 Dec 2020, they subscribed to the basic monthly plan. After three months, on 29 Mar 2021, they upgraded to the pro monthly plan.
90 |
91 |
92 |
93 | Customer 15: Initially, this customer commenced their onboarding journey with a free trial on 17 Mar 2020. Once the trial ended, on 24 Mar 2020, they upgraded to the pro monthly plan. However, the following month, on 29 Apr 2020, the customer decided to cancel their subscription and churned, retaining access until the end of their paid period.
94 |
95 |
96 |
97 | ***
98 |
99 | ## B. Data Analysis Questions
100 |
101 | ### 1. How many customers has Foodie-Fi ever had?
102 |
103 | To determine the count of unique customers for Foodie-Fi, I use `COUNT()` combined with `DISTINCT` on the `customer_id` column.
104 |
105 | ```sql
106 | SELECT COUNT(DISTINCT customer_id) AS num_of_customers
107 | FROM foodie_fi.subscriptions;
108 | ```
109 |
110 | **Answer:**
111 |
112 |
113 |
114 | - Foodie-Fi has 1,000 unique customers.
115 |
116 | ### 2. What is the monthly distribution of trial plan start_date values for our dataset - use the start of the month as the group by value
117 |
118 | In other words, the question is asking for the monthly count of users on the trial plan subscription.
119 | - To start, extract the numerical value of the month from the `start_date` column using the `DATE_PART()` function, specifying the 'month' part of the date.
120 | - Filter the results to retrieve only users with trial plan subscriptions (`plan_id = 0`).
121 |
122 | ```sql
123 | SELECT
124 | DATE_PART('month', start_date) AS month_date, -- Cast start_date as month in numerical format
125 | COUNT(sub.customer_id) AS trial_plan_subscriptions
126 | FROM foodie_fi.subscriptions AS sub
127 | JOIN foodie_fi.plans p
128 |   ON sub.plan_id = p.plan_id
129 | WHERE sub.plan_id = 0 -- Trial plan ID is 0
130 | GROUP BY DATE_PART('month',start_date)
131 | ORDER BY month_date;
132 | ```
133 |
134 | **Answer:**
135 |
136 |
137 |
138 | Among all the months, March has the highest number of trial plans, while February has the lowest number of trial plans.
139 |
140 | ### 3. What plan start_date values occur after the year 2020 for our dataset? Show the breakdown by count of events for each plan_name.
141 |
142 | To put it simply, we have to determine the count of plans with start dates on or after 1 January 2021 grouped by plan names.
143 | 1. Filter plans based on their start dates by including only the plans occurring on or after January 1, 2021.
144 | 2. Calculate the number of customers as the number of events.
145 | 3. Group results based on the plan names. For better readability, order results in ascending order of the plan ID.
146 |
147 | ````sql
148 | SELECT
149 | plans.plan_id,
150 | plans.plan_name,
151 | COUNT(sub.customer_id) AS num_of_events
152 | FROM foodie_fi.subscriptions AS sub
153 | JOIN foodie_fi.plans
154 | ON sub.plan_id = plans.plan_id
155 | WHERE sub.start_date >= '2021-01-01'
156 | GROUP BY plans.plan_id, plans.plan_name
157 | ORDER BY plans.plan_id;
158 | ````
159 |
160 | **Answer:**
161 |
162 | | plan_id | plan_name | num_of_events |
163 | | ------- | ------------- | ------------- |
164 | | 1 | basic monthly | 8 |
165 | | 2 | pro monthly | 60 |
166 | | 3 | pro annual | 63 |
167 | | 4 | churn | 71 |
168 |
169 | ### 4. What is the customer count and percentage of customers who have churned rounded to 1 decimal place?
170 |
171 | Let's analyze the question:
172 | - First, we need to determine
173 | - The number of customers who have churned, meaning those who have discontinued their subscription.
174 | - The total number of customers, including both active and churned ones.
175 |
176 | - To calculate the churn rate, we divide the number of churned customers by the total number of customers. The result should be rounded to one decimal place.
177 |
178 | ```sql
179 | SELECT
180 | COUNT(DISTINCT sub.customer_id) AS churned_customers,
181 | ROUND(100.0 * COUNT(sub.customer_id)
182 | / (SELECT COUNT(DISTINCT customer_id)
183 | FROM foodie_fi.subscriptions)
184 | ,1) AS churn_percentage
185 | FROM foodie_fi.subscriptions AS sub
186 | JOIN foodie_fi.plans
187 | ON sub.plan_id = plans.plan_id
188 | WHERE plans.plan_id = 4; -- Filter results to customers with churn plan only
189 | ```
190 |
191 | **Answer:**
192 |
193 |
194 |
195 | - Out of the total customer base of Foodie-Fi, 307 customers have churned. This represents approximately 30.7% of the overall customer count.
196 |
197 | ### 5. How many customers have churned straight after their initial free trial - what percentage is this rounded to the nearest whole number?
198 |
199 | Within a CTE called `ranked_cte`, determine which customers churned immediately after the trial plan by utilizing the `ROW_NUMBER()` function to assign a ranking to each of the customer's plans.
200 |
201 | In this scenario, if a customer churned right after the trial plan, the plan rankings would appear as follows:
202 | - Trial Plan - Rank 1
203 | - Churned - Rank 2
204 |
205 | In the outer query:
206 | - Apply 2 conditions in the WHERE clause:
207 | - Filter `plan_id = 4`.
208 | - Filter for customers who have churned immediately after their trial with `row_num = 2`.
209 | - Count the number of customers who have churned immediately after their trial period using a `CASE` statement by checking if the row number is 2 (`row_num = 2`) and the plan is the churn plan (`plan_id = 4`).
210 | - Calculate the churn percentage by dividing the `churned_customers` count by the total count of distinct customer IDs in the `subscriptions` table. Round percentage to a whole number.
211 |
212 | ```sql
213 | WITH ranked_cte AS (
214 | SELECT
215 | sub.customer_id,
216 | plans.plan_id,
217 | ROW_NUMBER() OVER (
218 | PARTITION BY sub.customer_id
219 | ORDER BY sub.start_date) AS row_num
220 | FROM foodie_fi.subscriptions AS sub
221 | JOIN foodie_fi.plans
222 | ON sub.plan_id = plans.plan_id
223 | )
224 |
225 | SELECT
226 | COUNT(CASE
227 |     WHEN row_num = 2 AND plan_id = 4 THEN 1
228 | ELSE 0 END) AS churned_customers,
229 | ROUND(100.0 * COUNT(
230 | CASE
231 |       WHEN row_num = 2 AND plan_id = 4 THEN 1
232 | ELSE 0 END)
233 | / (SELECT COUNT(DISTINCT customer_id)
234 | FROM foodie_fi.subscriptions)
235 | ) AS churn_percentage
236 | FROM ranked_cte
237 | WHERE plan_id = 4 -- Filter to churn plan.
238 | AND row_num = 2; -- Customers who have churned immediately after trial have churn plan ranked as 2.
239 | ```
240 |
241 | Here's another solution using the `LEAD()` window function:
242 | ```sql
243 | WITH ranked_cte AS (
244 | SELECT
245 | sub.customer_id,
246 | plans.plan_name,
247 | LEAD(plans.plan_name) OVER (
248 | PARTITION BY sub.customer_id
249 | ORDER BY sub.start_date) AS next_plan
250 | FROM foodie_fi.subscriptions AS sub
251 | JOIN foodie_fi.plans
252 | ON sub.plan_id = plans.plan_id
253 | )
254 |
255 | SELECT
256 | COUNT(customer_id) AS churned_customers,
257 | ROUND(100.0 *
258 | COUNT(customer_id)
259 | / (SELECT COUNT(DISTINCT customer_id)
260 | FROM foodie_fi.subscriptions)
261 | ) AS churn_percentage
262 | FROM ranked_cte
263 | WHERE plan_name = 'trial'
264 |   AND next_plan = 'churn';
265 | ```
266 |
267 | **Answer:**
268 |
269 |
270 |
271 | - A total of 92 customers churned immediately after the initial free trial period, representing approximately 9% of the entire customer base.
272 |
273 | ### 6. What is the number and percentage of customer plans after their initial free trial?
274 |
275 | ```sql
276 | WITH next_plans AS (
277 | SELECT
278 | customer_id,
279 | plan_id,
280 | LEAD(plan_id) OVER(
281 | PARTITION BY customer_id
282 | ORDER BY plan_id) as next_plan_id
283 | FROM foodie_fi.subscriptions
284 | )
285 |
286 | SELECT
287 | next_plan_id AS plan_id,
288 | COUNT(customer_id) AS converted_customers,
289 | ROUND(100 *
290 | COUNT(customer_id)::NUMERIC
291 | / (SELECT COUNT(DISTINCT customer_id)
292 | FROM foodie_fi.subscriptions)
293 | ,1) AS conversion_percentage
294 | FROM next_plans
295 | WHERE next_plan_id IS NOT NULL
296 | AND plan_id = 0
297 | GROUP BY next_plan_id
298 | ORDER BY next_plan_id;
299 | ```
300 |
301 | **Answer:**
302 |
303 | | plan_id | converted_customers | conversion_percentage |
304 | | ------- | ------------------- | --------------------- |
305 | | 1 | 546 | 54.6 |
306 | | 2 | 325 | 32.5 |
307 | | 3 | 37 | 3.7 |
308 | | 4 | 92 | 9.2 |
309 |
310 | - More than 80% of Foodie-Fi's customers are on paid plans with a majority opting for Plans 1 and 2.
311 | - There is potential for improvement in customer acquisition for Plan 3 as only a small percentage of customers are choosing this higher-priced plan.
312 |
313 | ### 7. What is the customer count and percentage breakdown of all 5 plan_name values at 2020-12-31?
314 |
315 | In the CTE called `next_dates`, we begin by filtering the results to include only the plans with start dates on or before '2020-12-31'. To identify the next start date for each plan, we utilize the `LEAD()` window function.
316 |
317 | In the outer query, we filter the results where the `next_date` is NULL. This step helps us identify the most recent plan that each customer subscribed to as of '2020-12-31'.
318 |
319 | Lastly, we perform calculations to determine the total count of customers and the percentage of customers associated with each plan as of '2020-12-31'.
320 |
321 | ```sql
322 | WITH next_dates AS (
323 | SELECT
324 | customer_id,
325 | plan_id,
326 | start_date,
327 | LEAD(start_date) OVER (
328 | PARTITION BY customer_id
329 | ORDER BY start_date
330 | ) AS next_date
331 | FROM foodie_fi.subscriptions
332 | WHERE start_date <= '2020-12-31'
333 | )
334 |
335 | SELECT
336 | plan_id,
337 | COUNT(DISTINCT customer_id) AS customers,
338 | ROUND(100.0 *
339 | COUNT(DISTINCT customer_id)
340 | / (SELECT COUNT(DISTINCT customer_id)
341 | FROM foodie_fi.subscriptions)
342 | ,1) AS percentage
343 | FROM next_dates
344 | WHERE next_date IS NULL
345 | GROUP BY plan_id;
346 | ```
347 |
348 | **Answer:**
349 |
350 |
351 |
352 | ### 8. How many customers have upgraded to an annual plan in 2020?
353 |
354 | ```sql
355 | SELECT COUNT(DISTINCT customer_id) AS num_of_customers
356 | FROM foodie_fi.subscriptions
357 | WHERE plan_id = 3
358 |   AND DATE_PART('year', start_date) = 2020;
359 | ```
360 |
361 | **Answer:**
362 |
363 |
364 |
365 | - 196 customers have upgraded to an annual plan in 2020.
366 |
367 | ### 9. How many days on average does it take for a customer to upgrade to an annual plan from the day they join Foodie-Fi?
368 |
369 | This question is straightforward and the query provided is self-explanatory.
370 |
371 | ````sql
372 | WITH trial_plan AS (
373 | -- trial_plan CTE: Filter results to include only the customers subscribed to the trial plan.
374 | SELECT
375 | customer_id,
376 | start_date AS trial_date
377 | FROM foodie_fi.subscriptions
378 | WHERE plan_id = 0
379 | ), annual_plan AS (
380 | -- annual_plan CTE: Filter results to only include the customers subscribed to the pro annual plan.
381 | SELECT
382 | customer_id,
383 | start_date AS annual_date
384 | FROM foodie_fi.subscriptions
385 | WHERE plan_id = 3
386 | )
387 | -- Find the average of the differences between the start date of a trial plan and a pro annual plan.
388 | SELECT
389 | ROUND(
390 | AVG(
391 | annual.annual_date - trial.trial_date)
392 | ,0) AS avg_days_to_upgrade
393 | FROM trial_plan AS trial
394 | JOIN annual_plan AS annual
395 | ON trial.customer_id = annual.customer_id;
396 | ````
397 |
398 | **Answer:**
399 |
400 |
401 |
402 | - On average, customers take approximately 105 days from the day they join Foodie-Fi to upgrade to an annual plan.
403 |
404 | ### 10. Can you further breakdown this average value into 30 day periods (i.e. 0-30 days, 31-60 days etc)
405 |
406 | To understand how the `WIDTH_BUCKET()` function works in creating buckets of 30-day periods, you can refer to this [StackOverflow](https://stackoverflow.com/questions/50518548/creating-a-bin-column-in-postgres-to-check-an-integer-and-return-a-string) answer.
407 |
408 | ```sql
409 | WITH trial_plan AS (
410 | -- trial_plan CTE: Filter results to include only the customers subscribed to the trial plan.
411 | SELECT
412 | customer_id,
413 | start_date AS trial_date
414 | FROM foodie_fi.subscriptions
415 | WHERE plan_id = 0
416 | ), annual_plan AS (
417 | -- annual_plan CTE: Filter results to only include the customers subscribed to the pro annual plan.
418 | SELECT
419 | customer_id,
420 | start_date AS annual_date
421 | FROM foodie_fi.subscriptions
422 | WHERE plan_id = 3
423 | ), bins AS (
424 |   -- bins CTE: Put each customer into a 30-day bucket based on the number of days they took to upgrade to a pro annual plan.
425 | SELECT
426 | WIDTH_BUCKET(annual.annual_date - trial.trial_date, 0, 365, 12) AS avg_days_to_upgrade
427 | FROM trial_plan AS trial
428 | JOIN annual_plan AS annual
429 | ON trial.customer_id = annual.customer_id
430 | )
431 |
432 | SELECT
433 | ((avg_days_to_upgrade - 1) * 30 || ' - ' || avg_days_to_upgrade * 30 || ' days') AS bucket,
434 | COUNT(*) AS num_of_customers
435 | FROM bins
436 | GROUP BY avg_days_to_upgrade
437 | ORDER BY avg_days_to_upgrade;
438 | ```
439 |
440 | **Answer:**
441 |
442 | | bucket | num_of_customers |
443 | | -------------- | ---------------- |
444 | | 0 - 30 days | 49 |
445 | | 30 - 60 days | 24 |
446 | | 60 - 90 days | 35 |
447 | | 90 - 120 days | 35 |
448 | | 120 - 150 days | 43 |
449 | | 150 - 180 days | 37 |
450 | | 180 - 210 days | 24 |
451 | | 210 - 240 days | 4 |
452 | | 240 - 270 days | 4 |
453 | | 270 - 300 days | 1 |
454 | | 300 - 330 days | 1 |
455 | | 330 - 360 days | 1 |
456 |
457 | ### 11. How many customers downgraded from a pro monthly to a basic monthly plan in 2020?
458 |
459 | ```sql
460 | WITH ranked_cte AS (
461 | SELECT
462 | sub.customer_id,
463 | plans.plan_id,
464 | plans.plan_name,
465 | LEAD(plans.plan_id) OVER (
466 | PARTITION BY sub.customer_id
467 | ORDER BY sub.start_date) AS next_plan_id
468 | FROM foodie_fi.subscriptions AS sub
469 | JOIN foodie_fi.plans
470 | ON sub.plan_id = plans.plan_id
471 | WHERE DATE_PART('year', start_date) = 2020
472 | )
473 |
474 | SELECT
475 |   COUNT(customer_id) AS downgraded_customers
476 | FROM ranked_cte
477 | WHERE plan_id = 2
478 | AND next_plan_id = 1;
479 | ```
480 |
481 | **Answer:**
482 |
483 | In 2020, there were no instances where customers downgraded from a pro monthly plan to a basic monthly plan.
484 |
485 | ***
486 |
--------------------------------------------------------------------------------
/Case Study #4 - Data Bank/README.md:
--------------------------------------------------------------------------------
1 | ## Case Study #4: Data Bank
2 |
3 |
4 |
5 | ## 📚 Table of Contents
6 | - [Business Task](#business-task)
7 | - [Entity Relationship Diagram](#entity-relationship-diagram)
8 | - [Question and Solution](#question-and-solution)
9 |
10 | Please note that all the information regarding the case study has been sourced from the following link: [here](https://8weeksqlchallenge.com/case-study-4/).
11 |
12 | ***
13 |
14 | ## Business Task
15 | Danny launched a new initiative, Data Bank which runs **banking activities** and also acts as the world’s most secure distributed **data storage platform**!
16 |
17 | Customers are allocated cloud data storage limits which are directly linked to how much money they have in their accounts.
18 |
19 | The management team at Data Bank want to increase their total customer base - but also need some help tracking just how much data storage their customers will need.
20 |
21 | This case study is all about calculating metrics, growth and helping the business analyse their data in a smart way to better forecast and plan for their future developments!
22 |
23 | ## Entity Relationship Diagram
24 |
25 |
26 |
27 | **Table 1: `regions`**
28 |
29 | This regions table contains the `region_id` and their respective `region_name` values.
30 |
31 |
32 |
33 | **Table 2: `customer_nodes`**
34 |
35 | Customers are randomly distributed across the nodes according to their region. This random distribution changes frequently to reduce the risk of hackers getting into Data Bank’s system and stealing customer’s money and data!
36 |
37 |
38 |
39 | **Table 3: Customer Transactions**
40 |
41 | This table stores all customer deposits, withdrawals and purchases made using their Data Bank debit card.
42 |
43 |
44 |
45 | ***
46 |
47 | ## Question and Solution
48 |
49 | Please join me in executing the queries using PostgreSQL on [DB Fiddle](https://www.db-fiddle.com/f/2GtQz4wZtuNNu7zXH5HtV4/3). It would be great to work together on the questions!
50 |
51 | If you have any questions, reach out to me on [LinkedIn](https://www.linkedin.com/in/katiehuangx/).
52 |
53 | ## 🏦 A. Customer Nodes Exploration
54 |
55 | **1. How many unique nodes are there on the Data Bank system?**
56 |
57 | ````sql
58 | SELECT COUNT(DISTINCT node_id) AS unique_nodes
59 | FROM data_bank.customer_nodes;
60 | ````
61 |
62 | **Answer:**
63 |
64 | |unique_nodes|
65 | |:----|
66 | |5|
67 |
68 | - There are 5 unique nodes on the Data Bank system.
69 |
70 | ***
71 |
72 | **2. What is the number of nodes per region?**
73 |
74 | ````sql
75 | SELECT
76 | regions.region_name,
77 | COUNT(DISTINCT customers.node_id) AS node_count
78 | FROM data_bank.regions
79 | JOIN data_bank.customer_nodes AS customers
80 | ON regions.region_id = customers.region_id
81 | GROUP BY regions.region_name;
82 | ````
83 |
84 | **Answer:**
85 |
86 | |region_name|node_count|
87 | |:----|:----|
88 | |Africa|5|
89 | |America|5|
90 | |Asia|5|
91 | |Australia|5|
92 | |Europe|5|
93 |
94 | ***
95 |
96 | **3. How many customers are allocated to each region?**
97 |
98 | ````sql
99 | SELECT
100 | region_id,
101 | COUNT(customer_id) AS customer_count
102 | FROM data_bank.customer_nodes
103 | GROUP BY region_id
104 | ORDER BY region_id;
105 | ````
106 |
107 | **Answer:**
108 |
109 | |region_id|customer_count|
110 | |:----|:----|
111 | |1|770|
112 | |2|735|
113 | |3|714|
114 | |4|665|
115 | |5|616|
116 |
117 | ***
118 |
119 | **4. How many days on average are customers reallocated to a different node?**
120 |
121 | ````sql
122 | WITH node_days AS (
123 | SELECT
124 | customer_id,
125 | node_id,
126 | end_date - start_date AS days_in_node
127 | FROM data_bank.customer_nodes
128 | WHERE end_date != '9999-12-31'
129 | GROUP BY customer_id, node_id, start_date, end_date
130 | )
131 | , total_node_days AS (
132 | SELECT
133 | customer_id,
134 | node_id,
135 | SUM(days_in_node) AS total_days_in_node
136 | FROM node_days
137 | GROUP BY customer_id, node_id
138 | )
139 |
140 | SELECT ROUND(AVG(total_days_in_node)) AS avg_node_reallocation_days
141 | FROM total_node_days;
142 | ````
143 |
144 | **Answer:**
145 |
146 | |avg_node_reallocation_days|
147 | |:----|
148 | |24|
149 |
150 | - On average, customers are reallocated to a different node every 24 days.
151 |
152 | ***
153 |
154 | **5. What is the median, 80th and 95th percentile for this same reallocation days metric for each region?**
155 |
156 |
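A minimal sketch of one way to approach this, using `PERCENTILE_CONT()` over each record's days spent in a node and excluding the `'9999-12-31'` placeholder end dates:

````sql
SELECT
	r.region_name,
	PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY n.end_date - n.start_date) AS median_days,
	PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY n.end_date - n.start_date) AS percentile_80_days,
	PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY n.end_date - n.start_date) AS percentile_95_days
FROM data_bank.customer_nodes AS n
INNER JOIN data_bank.regions AS r
	ON n.region_id = r.region_id
WHERE n.end_date != '9999-12-31'
GROUP BY r.region_name;
````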
157 |
158 | ***
159 |
160 | ## 🏦 B. Customer Transactions
161 |
162 | **1. What is the unique count and total amount for each transaction type?**
163 |
164 | ````sql
165 | SELECT
166 | txn_type,
167 | COUNT(customer_id) AS transaction_count,
168 | SUM(txn_amount) AS total_amount
169 | FROM data_bank.customer_transactions
170 | GROUP BY txn_type;
171 | ````
172 |
173 | **Answer:**
174 |
175 | |txn_type|transaction_count|total_amount|
176 | |:----|:----|:----|
177 | |purchase|1617|806537|
178 | |deposit|2671|1359168|
179 | |withdrawal|1580|793003|
180 |
181 | ***
182 |
183 | **2. What is the average total historical deposit counts and amounts for all customers?**
184 |
185 | ````sql
186 | WITH deposits AS (
187 | SELECT
188 | customer_id,
189 | COUNT(customer_id) AS txn_count,
190 | AVG(txn_amount) AS avg_amount
191 | FROM data_bank.customer_transactions
192 | WHERE txn_type = 'deposit'
193 | GROUP BY customer_id
194 | )
195 |
196 | SELECT
197 | ROUND(AVG(txn_count)) AS avg_deposit_count,
198 | ROUND(AVG(avg_amount)) AS avg_deposit_amt
199 | FROM deposits;
200 | ````
201 | **Answer:**
202 |
203 | |avg_deposit_count|avg_deposit_amt|
204 | |:----|:----|
205 | |5|509|
206 |
207 | - The average historical deposit count is 5 and the average historical deposit amount is $509.
208 |
209 | ***
210 |
211 | **3. For each month - how many Data Bank customers make more than 1 deposit and either 1 purchase or 1 withdrawal in a single month?**
212 |
213 | First, create a CTE called `monthly_transactions` to determine the count of deposit, purchase and withdrawal for each customer categorised by month using `CASE` statement and `SUM()`.
214 |
215 | In the main query, select the `mth` column and count the number of unique customers where:
216 | - `deposit_count` is greater than 1, indicating more than one deposit (`deposit_count > 1`).
217 | - Either `purchase_count` is greater than or equal to 1 (`purchase_count >= 1`) OR `withdrawal_count` is greater than or equal to 1 (`withdrawal_count >= 1`).
218 |
219 | ````sql
220 | WITH monthly_transactions AS (
221 | SELECT
222 | customer_id,
223 | DATE_PART('month', txn_date) AS mth,
224 |     SUM(CASE WHEN txn_type = 'deposit' THEN 1 ELSE 0 END) AS deposit_count,
225 |     SUM(CASE WHEN txn_type = 'purchase' THEN 1 ELSE 0 END) AS purchase_count,
226 | SUM(CASE WHEN txn_type = 'withdrawal' THEN 1 ELSE 0 END) AS withdrawal_count
227 | FROM data_bank.customer_transactions
228 | GROUP BY customer_id, DATE_PART('month', txn_date)
229 | )
230 |
231 | SELECT
232 | mth,
233 | COUNT(DISTINCT customer_id) AS customer_count
234 | FROM monthly_transactions
235 | WHERE deposit_count > 1
236 | AND (purchase_count >= 1 OR withdrawal_count >= 1)
237 | GROUP BY mth
238 | ORDER BY mth;
239 | ````
240 |
241 | **Answer:**
242 |
243 | |month|customer_count|
244 | |:----|:----|
245 | |1|170|
246 | |2|277|
247 | |3|292|
248 | |4|103|
249 |
250 | ***
251 |
252 | **4. What is the closing balance for each customer at the end of the month? Also show the change in balance each month in the same table output.**
253 |
254 | Update Jun 2, 2023: Even after 2 years, I continue to find this question incredibly challenging. I have cleaned up the code and provided additional explanations.
255 |
256 | The key aspect to understanding the solution is to build up the table and run the CTEs cumulatively (run CTE 1 first, then run CTE 1 & 2, and so on). This approach allows for a better understanding of why specific columns were created or how the information in the tables progressed.
257 |
258 | ```sql
259 | -- CTE 1 - To identify transaction amount as an inflow (+) or outflow (-)
260 | WITH monthly_balances_cte AS (
261 | SELECT
262 | customer_id,
263 | (DATE_TRUNC('month', txn_date) + INTERVAL '1 MONTH - 1 DAY') AS closing_month,
264 | SUM(CASE
265 | WHEN txn_type = 'withdrawal' OR txn_type = 'purchase' THEN -txn_amount
266 | ELSE txn_amount END) AS transaction_balance
267 | FROM data_bank.customer_transactions
268 | GROUP BY
269 | customer_id, txn_date
270 | )
271 |
272 | -- CTE 2 - Use GENERATE_SERIES() to generate as a series of last day of the month for each customer.
273 | , monthend_series_cte AS (
274 | SELECT
275 | DISTINCT customer_id,
276 | ('2020-01-31'::DATE + GENERATE_SERIES(0,3) * INTERVAL '1 MONTH') AS ending_month
277 | FROM data_bank.customer_transactions
278 | )
279 |
280 | -- CTE 3 - Calculate total monthly change and ending balance for each month using window function SUM()
281 | , monthly_changes_cte AS (
282 | SELECT
283 | monthend_series_cte.customer_id,
284 | monthend_series_cte.ending_month,
285 | SUM(monthly_balances_cte.transaction_balance) OVER (
286 | PARTITION BY monthend_series_cte.customer_id, monthend_series_cte.ending_month
287 | ORDER BY monthend_series_cte.ending_month
288 | ) AS total_monthly_change,
289 | SUM(monthly_balances_cte.transaction_balance) OVER (
290 | PARTITION BY monthend_series_cte.customer_id
291 | ORDER BY monthend_series_cte.ending_month
292 | ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
293 | ) AS ending_balance
294 | FROM monthend_series_cte
295 | LEFT JOIN monthly_balances_cte
296 | ON monthend_series_cte.ending_month = monthly_balances_cte.closing_month
297 | AND monthend_series_cte.customer_id = monthly_balances_cte.customer_id
298 | )
299 |
300 | -- Final query: Display the output of customer monthly statement with the ending balances.
301 | SELECT
302 | customer_id,
303 | ending_month,
304 | COALESCE(total_monthly_change, 0) AS total_monthly_change,
305 | MIN(ending_balance) AS ending_balance
306 | FROM monthly_changes_cte
307 | GROUP BY
308 | customer_id, ending_month, total_monthly_change
309 | ORDER BY
310 | customer_id, ending_month;
311 | ```
312 |
313 | **Answer:**
314 |
315 | Showing results for customers ID 1, 2 and 3 only:
316 | |customer_id|ending_month|total_monthly_change|ending_balance|
317 | |:----|:----|:----|:----|
318 | |1|2020-01-31T00:00:00.000Z|312|312|
319 | |1|2020-02-29T00:00:00.000Z|0|312|
320 | |1|2020-03-31T00:00:00.000Z|-952|-964|
321 | |1|2020-04-30T00:00:00.000Z|0|-640|
322 | |2|2020-01-31T00:00:00.000Z|549|549|
323 | |2|2020-02-29T00:00:00.000Z|0|549|
324 | |2|2020-03-31T00:00:00.000Z|61|610|
325 | |2|2020-04-30T00:00:00.000Z|0|610|
326 | |3|2020-01-31T00:00:00.000Z|144|144|
327 | |3|2020-02-29T00:00:00.000Z|-965|-821|
328 | |3|2020-03-31T00:00:00.000Z|-401|-1222|
329 | |3|2020-04-30T00:00:00.000Z|493|-729|
330 |
331 | ***
332 |
333 | **5. Comparing the closing balance of a customer’s first month and the closing balance from their second month, what percentage of customers:**
334 |
335 | For this question, I have created 2 temporary tables to solve the questions below:
336 | - Create temp table #1 `customer_monthly_balances` by copying and pasting the code from the solution to Question 4.
337 | - Create temp table #2 `ranked_monthly_balances` from temp table #1 by applying the `ROW_NUMBER()` function.
338 |
339 | ```sql
340 | -- Temp table #1: Create a temp table using Question 4 solution
341 | CREATE TEMP TABLE customer_monthly_balances AS (
342 | WITH monthly_balances_cte AS (
343 | SELECT
344 | customer_id,
345 | (DATE_TRUNC('month', txn_date) + INTERVAL '1 MONTH - 1 DAY') AS closing_month,
346 | SUM(CASE
347 | WHEN txn_type = 'withdrawal' OR txn_type = 'purchase' THEN -txn_amount
348 | ELSE txn_amount END) AS transaction_balance
349 | FROM data_bank.customer_transactions
350 | GROUP BY
351 | customer_id, txn_date
352 | ), monthend_series_cte AS (
353 | SELECT
354 | DISTINCT customer_id,
355 | ('2020-01-31'::DATE + GENERATE_SERIES(0,3) * INTERVAL '1 MONTH') AS ending_month
356 | FROM data_bank.customer_transactions
357 | ), monthly_changes_cte AS (
358 | SELECT
359 | monthend_series_cte.customer_id,
360 | monthend_series_cte.ending_month,
361 | SUM(monthly_balances_cte.transaction_balance) OVER (
362 | PARTITION BY monthend_series_cte.customer_id, monthend_series_cte.ending_month
363 | ORDER BY monthend_series_cte.ending_month
364 | ) AS total_monthly_change,
365 | SUM(monthly_balances_cte.transaction_balance) OVER (
366 | PARTITION BY monthend_series_cte.customer_id
367 | ORDER BY monthend_series_cte.ending_month
368 | ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
369 | ) AS ending_balance
370 | FROM monthend_series_cte
371 | LEFT JOIN monthly_balances_cte
372 | ON monthend_series_cte.ending_month = monthly_balances_cte.closing_month
373 | AND monthend_series_cte.customer_id = monthly_balances_cte.customer_id
374 | )
375 |
376 | SELECT
377 | customer_id,
378 | ending_month,
379 | COALESCE(total_monthly_change, 0) AS total_monthly_change,
380 | MIN(ending_balance) AS ending_balance
381 | FROM monthly_changes_cte
382 | GROUP BY
383 | customer_id, ending_month, total_monthly_change
384 | ORDER BY
385 |     customer_id, ending_month
386 | );
387 |
388 | -- Temp table #2: Create a temp table using temp table #1 `customer_monthly_balances`
389 | CREATE TEMP TABLE ranked_monthly_balances AS (
390 | SELECT
391 | customer_id,
392 | ending_month,
393 | ending_balance,
394 | ROW_NUMBER() OVER (
395 | PARTITION BY customer_id
396 |       ORDER BY ending_month) AS ranked_row
397 | FROM customer_monthly_balances
398 | );
399 | ```
400 |
401 | **- What percentage of customers have a negative first month balance? What percentage of customers have a positive first month balance?**
402 |
403 | To address both questions, I'm using one solution since the two questions are opposite ends of the same spectrum.
404 |
405 | ````sql
406 | -- Method 1
407 | SELECT
408 | ROUND(100.0 *
409 | SUM(CASE
410 | WHEN ending_balance::TEXT LIKE '-%' THEN 1 ELSE 0 END)
411 | /(SELECT COUNT(DISTINCT customer_id)
412 | FROM customer_monthly_balances),1) AS negative_first_month_percentage,
413 | ROUND(100.0 *
414 | SUM(CASE
415 | WHEN ending_balance::TEXT NOT LIKE '-%' THEN 1 ELSE 0 END)
416 | /(SELECT COUNT(DISTINCT customer_id)
417 | FROM customer_monthly_balances),1) AS positive_first_month_percentage
418 | FROM ranked_monthly_balances
419 | WHERE ranked_row = 1;
420 | ````
421 |
422 | A cheeky solution would be to simply calculate one of the percentages requested and then deduct it from 100%.
423 | ```sql
424 | -- Method 2
425 | SELECT
426 | ROUND(100.0 *
427 | COUNT(customer_id)
428 | /(SELECT COUNT(DISTINCT customer_id)
429 | FROM customer_monthly_balances),1) AS negative_first_month_percentage,
430 | 100 - ROUND(100.0 * COUNT(customer_id)
431 | /(SELECT COUNT(DISTINCT customer_id)
432 | FROM customer_monthly_balances),1) AS positive_first_month_percentage
433 | FROM ranked_monthly_balances
434 | WHERE ranked_row = 1
435 | AND ending_balance::TEXT LIKE '-%';
436 | ```
437 |
438 | **Answer:**
439 |
440 | |negative_first_month_percentage|positive_first_month_percentage|
441 | |:----|:----|
442 | |44.8|55.2|
443 |
444 | **- What percentage of customers increase their opening month’s positive closing balance by more than 5% in the following month?**
445 |
446 | I'm using `LEAD()` window function to query the balances for the following month and then, filtering the results to select only the records with balances for the 1st and 2nd month.
447 |
448 | Important assumptions:
449 | - Negative balances in the `following_balance` field have been excluded from the results. This is because a higher negative balance in the following month does not represent a true increase in balances.
450 | - Including negative balances could lead to a misrepresentation of the answer as the percentage of variance would still appear as a positive percentage.
451 |
452 | ````sql
453 | WITH following_month_cte AS (
454 | SELECT
455 | customer_id,
456 | ending_month,
457 | ending_balance,
458 | LEAD(ending_balance) OVER (
459 | PARTITION BY customer_id
460 | ORDER BY ending_month) AS following_balance
461 | FROM ranked_monthly_balances
462 | )
463 | , variance_cte AS (
464 | SELECT
465 | customer_id,
466 | ending_month,
467 | ROUND(100.0 *
468 | (following_balance - ending_balance) / ending_balance,1) AS variance
469 | FROM following_month_cte
470 | WHERE ending_month = '2020-01-31'
471 | AND following_balance::TEXT NOT LIKE '-%'
472 | GROUP BY
473 | customer_id, ending_month, ending_balance, following_balance
474 | HAVING ROUND(100.0 * (following_balance - ending_balance) / ending_balance,1) > 5.0
475 | )
476 |
477 | SELECT
478 | ROUND(100.0 *
479 | COUNT(customer_id)
480 | / (SELECT COUNT(DISTINCT customer_id)
481 | FROM ranked_monthly_balances),1) AS increase_5_percentage
482 | FROM variance_cte;
483 | ````
484 |
485 | **Answer:**
486 |
487 | |increase_5_percentage|
488 | |:----|
489 | |20.0|
490 |
491 | - Among the customers, 20% experience a growth of more than 5% in their positive closing balance from the opening month to the following month.
492 |
493 | **- What percentage of customers reduce their opening month’s positive closing balance by more than 5% in the following month?**
494 |
495 | ````sql
496 | WITH following_month_cte AS (
497 | SELECT
498 | customer_id,
499 | ending_month,
500 | ending_balance,
501 | LEAD(ending_balance) OVER (
502 | PARTITION BY customer_id
503 | ORDER BY ending_month) AS following_balance
504 | FROM ranked_monthly_balances
505 | )
506 | , variance_cte AS (
507 | SELECT
508 | customer_id,
509 | ending_month,
510 |     ROUND(100.0 *
511 |       (following_balance - ending_balance) / ending_balance,1) AS variance
512 | FROM following_month_cte
513 | WHERE ending_month = '2020-01-31'
514 | AND following_balance::TEXT NOT LIKE '-%'
515 | GROUP BY
516 | customer_id, ending_month, ending_balance, following_balance
517 | HAVING ROUND((100.0 * (following_balance - ending_balance)) / ending_balance,2) < 5.0
518 | )
519 |
520 | SELECT
521 | ROUND(100.0 *
522 | COUNT(customer_id)
523 | / (SELECT COUNT(DISTINCT customer_id)
524 | FROM ranked_monthly_balances),1) AS reduce_5_percentage
525 | FROM variance_cte;
526 | ````
527 |
528 | **Answer:**
529 |
530 | |reduce_5_percentage|
531 | |:----|
532 | |25.6|
533 |
534 | - Among the customers, 25.6% experience a drop of more than 5% in their positive closing balance from the opening month to the following month.
535 |
536 | **- What percentage of customers move from a positive balance in the first month to a negative balance in the second month?**
537 |
538 | ````sql
539 | WITH following_month_cte AS (
540 | SELECT
541 | customer_id,
542 | ending_month,
543 | ending_balance,
544 | LEAD(ending_balance) OVER (
545 | PARTITION BY customer_id
546 | ORDER BY ending_month) AS following_balance
547 | FROM ranked_monthly_balances
548 | )
549 | , variance_cte AS (
550 | SELECT *
551 | FROM following_month_cte
552 | WHERE ending_month = '2020-01-31'
553 | AND ending_balance::TEXT NOT LIKE '-%'
554 | AND following_balance::TEXT LIKE '-%'
555 | )
556 |
557 | SELECT
558 | ROUND(100.0 *
559 | COUNT(customer_id)
560 | / (SELECT COUNT(DISTINCT customer_id)
561 | FROM ranked_monthly_balances),1) AS positive_to_negative_percentage
562 | FROM variance_cte;
563 | ````
564 |
565 | **Answer:**
566 |
567 | |positive_to_negative_percentage|
568 | |:----|
569 | |20.2|
570 |
571 | - Among the customers, 20.2% transitioned from having a positive balance (`ending_balance`) in the first month to having a negative balance (`following_balance`) in the following month.
572 |
573 | ***
574 |
575 | Do give me a 🌟 if you like what you're reading. Thank you! 🙆🏻♀️
576 |
--------------------------------------------------------------------------------
/Case Study #5 - Data Mart/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #5: Data Mart
2 |
3 |
4 |
5 | ## 📚 Table of Contents
6 | - [Business Task](#business-task)
7 | - [Entity Relationship Diagram](#entity-relationship-diagram)
8 | - [Question and Solution](#question-and-solution)
9 |
10 | Please note that all the information regarding the case study has been sourced from the following link: [here](https://8weeksqlchallenge.com/case-study-5/).
11 |
12 | ***
13 |
14 | ## Business Task
15 | Data Mart is an online supermarket that specialises in fresh produce.
16 |
17 | In June 2020, large scale supply changes were made at Data Mart. All Data Mart products now use sustainable packaging methods in every single step from the farm all the way to the customer.
18 |
19 | Danny needs your help to analyse and quantify the impact of this change on the sales performance for Data Mart and its separate business areas.
20 |
21 | The key business questions to answer are the following:
22 | - What was the quantifiable impact of the changes introduced in June 2020?
23 | - Which platform, region, segment and customer types were the most impacted by this change?
24 | - What can we do about future introduction of similar sustainability updates to the business to minimise impact on sales?
25 |
26 | ## Entity Relationship Diagram
27 |
28 | For this case study there is only a single table: `data_mart.weekly_sales`
29 |
30 |
31 |
32 | Here are some further details about the dataset:
33 |
34 | 1. Data Mart has international operations using a multi-region strategy.
35 | 2. Data Mart has both a retail and an online `platform`, in the form of a Shopify storefront, to serve their customers.
36 | 3. Customer `segment` and `customer_type` data relates to personal age and demographics information that is shared with Data Mart.
37 | 4. `transactions` is the count of unique purchases made through Data Mart and `sales` is the actual dollar amount of purchases.
38 |
39 | Each record in the dataset is related to a specific aggregated slice of the underlying sales data rolled up into a `week_date` value which represents the start of the sales week.
40 |
41 | 10 random rows are shown in the table output below from `data_mart.weekly_sales`.
42 |
43 |
44 |
45 | ***
46 |
47 | ## Question and Solution
48 |
49 | Please join me in executing the queries using PostgreSQL on [DB Fiddle](https://www.db-fiddle.com/f/jmnwogTsUE8hGqkZv9H7E8/8). It would be great to work together on the questions!
50 |
51 | If you have any questions, reach out to me on [LinkedIn](https://www.linkedin.com/in/katiehuangx/).
52 |
53 | ## 🧼 A. Data Cleansing Steps
54 |
55 | In a single query, perform the following operations and generate a new table in the `data_mart` schema named `clean_weekly_sales`:
56 | - Convert the `week_date` to a `DATE` format
57 | - Add a `week_number` as the second column for each `week_date` value, for example any value from the 1st of January to 7th of January will be 1, 8th to 14th will be 2 etc
58 | - Add a `month_number` with the calendar month for each `week_date` value as the 3rd column
59 | - Add a `calendar_year` column as the 4th column containing either 2018, 2019 or 2020 values
60 | - Add a new column called `age_band` after the original segment column using the following mapping on the number inside the segment value
61 |
62 |
63 |
64 | - Add a new `demographic` column using the following mapping for the first letter in the `segment` values:
65 |
66 | | segment | demographic |
67 | | ------- | ----------- |
68 | | C | Couples |
69 | | F | Families |
70 |
71 | - Ensure all `null` string values are filled with an "unknown" string value in the original `segment` column as well as the new `age_band` and `demographic` columns
72 | - Generate a new `avg_transaction` column as the sales value divided by transactions rounded to 2 decimal places for each record
73 |
74 | **Answer:**
75 |
76 | Let's construct the structure of `clean_weekly_sales` table and lay out the actions to be taken.
77 |
78 | _`*` represent new columns_
79 |
80 | | Column Name | Action to take |
81 | | ------- | --------------- |
82 | | week_date | Convert to `DATE` using `TO_DATE`
83 | | week_number* | Extract number of week using `DATE_PART`
84 | | month_number* | Extract number of month using `DATE_PART`
85 | | calendar_year* | Extract year using `DATE_PART`
86 | | region | No changes
87 | | platform | No changes
88 | | segment | No changes
89 | | age_band* | Use `CASE` statement and apply conditional logic on `segment` with 1 = `Young Adults`, 2 = `Middle Aged`, 3/4 = `Retirees` and null = `Unknown`
90 | | demographic* | Use `CASE WHEN` and apply conditional logic on based on `segment`, C = `Couples` and F = `Families` and null = `Unknown`
91 | | transactions | No changes
92 | | avg_transaction* | Divide `sales` by `transactions` and round to 2 decimal places
93 | | sales | No changes
94 |
95 | ````sql
96 | DROP TABLE IF EXISTS clean_weekly_sales;
97 | CREATE TEMP TABLE clean_weekly_sales AS (
98 | SELECT
99 | TO_DATE(week_date, 'DD/MM/YY') AS week_date,
100 | DATE_PART('week', TO_DATE(week_date, 'DD/MM/YY')) AS week_number,
101 | DATE_PART('month', TO_DATE(week_date, 'DD/MM/YY')) AS month_number,
102 | DATE_PART('year', TO_DATE(week_date, 'DD/MM/YY')) AS calendar_year,
103 | region,
104 | platform,
105 | segment,
106 | CASE
107 | WHEN RIGHT(segment,1) = '1' THEN 'Young Adults'
108 | WHEN RIGHT(segment,1) = '2' THEN 'Middle Aged'
109 | WHEN RIGHT(segment,1) in ('3','4') THEN 'Retirees'
110 | ELSE 'unknown' END AS age_band,
111 | CASE
112 | WHEN LEFT(segment,1) = 'C' THEN 'Couples'
113 | WHEN LEFT(segment,1) = 'F' THEN 'Families'
114 | ELSE 'unknown' END AS demographic,
115 | transactions,
116 | ROUND((sales::NUMERIC/transactions),2) AS avg_transaction,
117 | sales
118 | FROM data_mart.weekly_sales
119 | );
120 | ````
121 |
122 |
123 |
124 | ***
125 |
126 | ## 🛍 B. Data Exploration
127 |
128 | **1. What day of the week is used for each week_date value?**
129 |
130 | ````sql
131 | SELECT DISTINCT(TO_CHAR(week_date, 'day')) AS week_day
132 | FROM clean_weekly_sales;
133 | ````
134 |
135 | **Answer:**
136 |
137 | |week_day|
138 | |:----|
139 | |monday|
140 |
141 | - Monday is used for the `week_date` value.
142 |
143 | **2. What range of week numbers are missing from the dataset?**
144 | - First, generate a range of week numbers for the entire year from 1st week to the 52nd week using the `GENERATE_SERIES()` function.
145 | - Then, perform a `LEFT JOIN` with the `clean_weekly_sales`. Ensure that the join sequence is the CTE followed by the `clean_weekly_sales` as reversing the sequence would result in null results (unless you opt for a `RIGHT JOIN` instead!).
146 |
147 | ````sql
148 | WITH week_number_cte AS (
149 | SELECT GENERATE_SERIES(1,52) AS week_number
150 | )
151 |
152 | SELECT DISTINCT week_no.week_number
153 | FROM week_number_cte AS week_no
154 | LEFT JOIN clean_weekly_sales AS sales
155 | ON week_no.week_number = sales.week_number
156 | WHERE sales.week_number IS NULL; -- Filter to identify the missing week numbers where the values are `NULL`.
157 | ````
158 |
159 | **Answer:**
160 |
161 | _I'm posting only the results of 5 rows here. Ensure that you have retrieved 28 rows!_
162 |
163 | |week_number|
164 | |:----|
165 | |1|
166 | |2|
167 | |3|
168 | |37|
169 | |41|
170 |
171 | - The dataset is missing a total of 28 `week_number` records.
172 |
173 | **3. How many total transactions were there for each year in the dataset?**
174 |
175 | ````sql
176 | SELECT
177 | calendar_year,
178 | SUM(transactions) AS total_transactions
179 | FROM clean_weekly_sales
180 | GROUP BY calendar_year
181 | ORDER BY calendar_year;
182 | ````
183 |
184 | **Answer:**
185 |
186 | |calendar_year|total_transactions|
187 | |:----|:----|
188 | |2018|346406460|
189 | |2019|365639285|
190 | |2020|375813651|
191 |
192 | **4. What is the total sales for each region for each month?**
193 |
194 | ````sql
195 | SELECT
196 | month_number,
197 | region,
198 | SUM(sales) AS total_sales
199 | FROM clean_weekly_sales
200 | GROUP BY month_number, region
201 | ORDER BY month_number, region;
202 | ````
203 |
204 | **Answer:**
205 |
206 | I'm only showing the results for the month of March.
207 |
208 | |month_number|region|total_sales|
209 | |:----|:----|:----|
210 | |3|AFRICA|567767480|
211 | |3|ASIA|529770793|
212 | |3|CANADA|144634329|
213 | |3|EUROPE|35337093|
214 | |3|OCEANIA|783282888|
215 | |3|SOUTH AMERICA|71023109|
216 | |3|USA|225353043|
217 |
218 | **5. What is the total count of transactions for each platform?**
219 |
220 | ````sql
221 | SELECT
222 | platform,
223 | SUM(transactions) AS total_transactions
224 | FROM clean_weekly_sales
225 | GROUP BY platform;
226 | ````
227 |
228 | **Answer:**
229 |
230 | |platform|total_transactions|
231 | |:----|:----|
232 | |Retail|1081934227|
233 | |Shopify|5925169|
234 |
235 | **6. What is the percentage of sales for Retail vs Shopify for each month?**
236 |
237 | ````sql
238 | WITH monthly_transactions AS (
239 | SELECT
240 | calendar_year,
241 | month_number,
242 | platform,
243 | SUM(sales) AS monthly_sales
244 | FROM clean_weekly_sales
245 | GROUP BY calendar_year, month_number, platform
246 | )
247 |
248 | SELECT
249 | calendar_year,
250 | month_number,
251 | ROUND(100 * MAX
252 | (CASE
253 | WHEN platform = 'Retail' THEN monthly_sales ELSE NULL END)
254 | / SUM(monthly_sales),2) AS retail_percentage,
255 | ROUND(100 * MAX
256 | (CASE
257 | WHEN platform = 'Shopify' THEN monthly_sales ELSE NULL END)
258 | / SUM(monthly_sales),2) AS shopify_percentage
259 | FROM monthly_transactions
260 | GROUP BY calendar_year, month_number
261 | ORDER BY calendar_year, month_number;
262 | ````
263 |
264 | **Answer:**
265 |
266 | _Although I am only displaying the rows for the year 2018, please note that the overall results consist of 20 rows._
267 |
268 | |calendar_year|month_number|retail_percentage|shopify_percentage|
269 | |:----|:----|:----|:----|
270 | |2018|3|97.92|2.08|
271 | |2018|4|97.93|2.07|
272 | |2018|5|97.73|2.27|
273 | |2018|6|97.76|2.24|
274 | |2018|7|97.75|2.25|
275 | |2018|8|97.71|2.29|
276 | |2018|9|97.68|2.32|
277 |
278 | **7. What is the percentage of sales by demographic for each year in the dataset?**
279 |
280 | ````sql
281 | WITH demographic_sales AS (
282 | SELECT
283 | calendar_year,
284 | demographic,
285 | SUM(sales) AS yearly_sales
286 | FROM clean_weekly_sales
287 | GROUP BY calendar_year, demographic
288 | )
289 |
290 | SELECT
291 | calendar_year,
292 | ROUND(100 * MAX
293 | (CASE
294 | WHEN demographic = 'Couples' THEN yearly_sales ELSE NULL END)
295 | / SUM(yearly_sales),2) AS couples_percentage,
296 | ROUND(100 * MAX
297 | (CASE
298 | WHEN demographic = 'Families' THEN yearly_sales ELSE NULL END)
299 | / SUM(yearly_sales),2) AS families_percentage,
300 | ROUND(100 * MAX
301 | (CASE
302 | WHEN demographic = 'unknown' THEN yearly_sales ELSE NULL END)
303 | / SUM(yearly_sales),2) AS unknown_percentage
304 | FROM demographic_sales
305 | GROUP BY calendar_year;
306 | ````
307 |
308 | **Answer:**
309 |
310 | |calendar_year|couples_percentage|families_percentage|unknown_percentage|
311 | |:----|:----|:----|:----|
312 | |2019|27.28|32.47|40.25|
313 | |2018|26.38|31.99|41.63|
314 | |2020|28.72|32.73|38.55|
315 |
316 | **8. Which age_band and demographic values contribute the most to Retail sales?**
317 |
318 | ````sql
319 | SELECT
320 | age_band,
321 | demographic,
322 | SUM(sales) AS retail_sales,
323 | ROUND(100 *
324 | SUM(sales)::NUMERIC
325 | / SUM(SUM(sales)) OVER (),
326 | 1) AS contribution_percentage
327 | FROM clean_weekly_sales
328 | WHERE platform = 'Retail'
329 | GROUP BY age_band, demographic
330 | ORDER BY retail_sales DESC;
331 | ````
332 |
333 | **Answer:**
334 |
335 | |age_band|demographic|retail_sales|contribution_percentage|
336 | |:----|:----|:----|:----|
337 | |unknown|unknown|16067285533|40.5|
338 | |Retirees|Families|6634686916|16.7|
339 | |Retirees|Couples|6370580014|16.1|
340 | |Middle Aged|Families|4354091554|11.0|
341 | |Young Adults|Couples|2602922797|6.6|
342 | |Middle Aged|Couples|1854160330|4.7|
343 | |Young Adults|Families|1770889293|4.5|
344 |
345 | The largest share of Retail sales, 40.5%, is contributed by records with unknown `age_band` and `demographic` values. This is followed by retired families at 16.7% and retired couples at 16.1%.
346 |
347 | **9. Can we use the `avg_transaction` column to find the average transaction size for each year for Retail vs Shopify? If not - how would you calculate it instead?**
348 |
349 | ````sql
350 | SELECT
351 | calendar_year,
352 | platform,
353 | ROUND(AVG(avg_transaction),0) AS avg_transaction_row,
354 | SUM(sales) / sum(transactions) AS avg_transaction_group
355 | FROM clean_weekly_sales
356 | GROUP BY calendar_year, platform
357 | ORDER BY calendar_year, platform;
358 | ````
359 |
360 | **Answer:**
361 |
362 | |calendar_year|platform|avg_transaction_row|avg_transaction_group|
363 | |:----|:----|:----|:----|
364 | |2018|Retail|43|36|
365 | |2018|Shopify|188|192|
366 | |2019|Retail|42|36|
367 | |2019|Shopify|178|183|
368 | |2020|Retail|41|36|
369 | |2020|Shopify|175|179|
370 |
371 | The difference between `avg_transaction_row` and `avg_transaction_group` is as follows:
372 | - `avg_transaction_row` calculates the average transaction size by dividing the sales of each row by the number of transactions in that row.
373 | - On the other hand, `avg_transaction_group` calculates the average transaction size by dividing the total sales for the entire dataset by the total number of transactions.
374 |
375 | To find the average transaction size for each year by platform accurately, use `avg_transaction_group`.
376 |
377 | ***
378 |
379 | ## 🧼 C. Before & After Analysis
380 |
381 | This technique is usually used when we inspect an important event and want to inspect the impact before and after a certain point in time.
382 |
383 | We take the `week_date` value of `2020-06-15` as the baseline week, when the Data Mart sustainable packaging changes came into effect. All `week_date` values from `2020-06-15` onwards form the period after the change, and the previous `week_date` values form the period before.
384 |
385 | Using this analysis approach - answer the following questions:
386 |
387 | **1. What is the total sales for the 4 weeks before and after `2020-06-15`? What is the growth or reduction rate in actual values and percentage of sales?**
388 |
389 | Before we proceed, we determine the `week_number` corresponding to '2020-06-15' to use it as a filter in our analysis.
390 |
391 | ````sql
392 | SELECT DISTINCT week_number
393 | FROM clean_weekly_sales
394 | WHERE week_date = '2020-06-15'
395 | AND calendar_year = '2020';
396 | ````
397 |
398 | |week_number|
399 | |:----|
400 | |25|
401 |
402 | The `week_number` is 25. I created 2 CTEs:
403 | - `packaging_sales` CTE: Filter the dataset for 4 weeks before and after `2020-06-15` and calculate the sum of sales within the period.
404 | - `before_after_changes` CTE: Utilize a `CASE` statement to capture the sales for 4 weeks before and after `2020-06-15` and then calculate the total sales for the specified period.
405 |
406 | ````sql
407 | WITH packaging_sales AS (
408 | SELECT
409 | week_date,
410 | week_number,
411 | SUM(sales) AS total_sales
412 | FROM clean_weekly_sales
413 | WHERE (week_number BETWEEN 21 AND 28)
414 | AND (calendar_year = 2020)
415 | GROUP BY week_date, week_number
416 | )
417 | , before_after_changes AS (
418 | SELECT
419 | SUM(CASE
420 | WHEN week_number BETWEEN 21 AND 24 THEN total_sales END) AS before_packaging_sales,
421 | SUM(CASE
422 | WHEN week_number BETWEEN 25 AND 28 THEN total_sales END) AS after_packaging_sales
423 | FROM packaging_sales
424 | )
425 |
426 | SELECT
427 | after_packaging_sales - before_packaging_sales AS sales_variance,
428 | ROUND(100 *
429 | (after_packaging_sales - before_packaging_sales)
430 | / before_packaging_sales,2) AS variance_percentage
431 | FROM before_after_changes;
432 | ````
433 |
434 | **Answer:**
435 |
436 | |sales_variance|variance_percentage|
437 | |:----|:----|
438 | |-26884188|-1.15|
439 |
440 | Since the implementation of the new sustainable packaging, there has been a decrease in sales amounting to $26,884,188, reflecting a negative change of 1.15%. Introducing new packaging does not always guarantee positive results, as customers may not readily recognise the product on the shelves due to the change in packaging.
441 |
442 | ***
443 |
444 | **2. What about the entire 12 weeks before and after?**
445 |
446 | We can apply a similar approach and solution to address this question.
447 |
448 | ````sql
449 | WITH packaging_sales AS (
450 | SELECT
451 | week_date,
452 | week_number,
453 | SUM(sales) AS total_sales
454 | FROM clean_weekly_sales
455 | WHERE (week_number BETWEEN 13 AND 37)
456 | AND (calendar_year = 2020)
457 | GROUP BY week_date, week_number
458 | )
459 | , before_after_changes AS (
460 | SELECT
461 | SUM(CASE
462 | WHEN week_number BETWEEN 13 AND 24 THEN total_sales END) AS before_packaging_sales,
463 | SUM(CASE
464 | WHEN week_number BETWEEN 25 AND 37 THEN total_sales END) AS after_packaging_sales
465 | FROM packaging_sales
466 | )
467 |
468 | SELECT
469 | after_packaging_sales - before_packaging_sales AS sales_variance,
470 | ROUND(100 *
471 | (after_packaging_sales - before_packaging_sales) / before_packaging_sales,2) AS variance_percentage
472 | FROM before_after_changes;
473 | ````
474 |
475 | **Answer:**
476 |
477 | |sales_variance|variance_percentage|
478 | |:----|:----|
479 | |-152325394|-2.14|
480 |
481 | Looks like the sales have experienced a further decline, now at a negative 2.14%! If I'm Danny's boss, I wouldn't be too happy with the results.
482 |
483 | ***
484 |
485 | **3. How do the sale metrics for these 2 periods before and after compare with the previous years in 2018 and 2019?**
486 |
487 | I'm breaking down this question to 2 parts.
488 |
489 | **Part 1: How do the sale metrics for 4 weeks before and after compare with the previous years in 2018 and 2019?**
490 | - Basically, the question is asking us to find the sales variance between 4 weeks before and after `'2020-06-15'` for years 2018, 2019 and 2020. Perhaps we can find a pattern here.
491 | - We can apply the same solution as above and add `calendar_year` into the syntax.
492 |
493 | ````sql
494 | WITH changes AS (
495 | SELECT
496 | calendar_year,
497 | week_number,
498 | SUM(sales) AS total_sales
499 | FROM clean_weekly_sales
500 | WHERE week_number BETWEEN 21 AND 28
501 | GROUP BY calendar_year, week_number
502 | )
503 | , before_after_changes AS (
504 | SELECT
505 | calendar_year,
506 | SUM(CASE
507 |       WHEN week_number BETWEEN 21 AND 24 THEN total_sales END) AS before_packaging_sales,
508 | SUM(CASE
509 | WHEN week_number BETWEEN 25 AND 28 THEN total_sales END) AS after_packaging_sales
510 | FROM changes
511 | GROUP BY calendar_year
512 | )
513 |
514 | SELECT
515 | calendar_year,
516 | after_packaging_sales - before_packaging_sales AS sales_variance,
517 | ROUND(100 *
518 | (after_packaging_sales - before_packaging_sales)
519 | / before_packaging_sales,2) AS variance_percentage
520 | FROM before_after_changes;
521 | ````
522 |
523 | **Answer:**
524 |
525 | |calendar_year|sales_variance|variance_percentage|
526 | |:----|:----|:----|
527 | |2018|4102105|0.19|
528 | |2019|2336594|0.10|
529 | |2020|-26884188|-1.15|
530 |
531 | In 2018, there was a sales variance of $4,102,105, indicating a positive change of 0.19% compared to the period before the packaging change.
532 |
533 | Similarly, in 2019, there was a sales variance of $2,336,594, corresponding to a positive change of 0.10% when comparing the period before and after the packaging change.
534 |
535 | However, in 2020, there was a substantial decrease in sales following the packaging change. The sales variance was -$26,884,188, a negative change of 1.15% and a considerable drop compared to the previous years.
536 |
537 | **Part 2: How do the sale metrics for 12 weeks before and after compare with the previous years in 2018 and 2019?**
538 | - Use the same solution as above, changing the before period to weeks 13 to 24 and the after period to weeks 25 to 37.
539 |
540 | ````sql
541 | WITH changes AS (
542 | SELECT
543 | calendar_year,
544 | week_number,
545 | SUM(sales) AS total_sales
546 | FROM clean_weekly_sales
547 | WHERE week_number BETWEEN 13 AND 37
548 | GROUP BY calendar_year, week_number
549 | )
550 | , before_after_changes AS (
551 | SELECT
552 | calendar_year,
553 | SUM(CASE
554 | WHEN week_number BETWEEN 13 AND 24 THEN total_sales END) AS before_packaging_sales,
555 | SUM(CASE
556 | WHEN week_number BETWEEN 25 AND 37 THEN total_sales END) AS after_packaging_sales
557 | FROM changes
558 | GROUP BY calendar_year
559 | )
560 |
561 | SELECT
562 | calendar_year,
563 | after_packaging_sales - before_packaging_sales AS sales_variance,
564 | ROUND(100 *
565 | (after_packaging_sales - before_packaging_sales)
566 | / before_packaging_sales,2) AS variance_percentage
567 | FROM before_after_changes;
568 | ````
569 |
570 | **Answer:**
571 |
572 | |calendar_year|sales_variance|variance_percentage|
573 | |:----|:----|:----|
574 | |2018|104256193|1.63|
575 | |2019|-20740294|-0.30|
576 | |2020|-152325394|-2.14|
577 |
578 | There were noticeable variations in the percentage differences across all three years. The most significant contrast emerges when comparing the worst-performing year, 2020, with the best-performing year, 2018: the gap in variance percentage widens to 3.77% (1.63% + 2.14%).
581 |
582 | ***
583 |
584 | ## D. Bonus Question
585 |
586 | Which areas of the business have the highest negative impact in sales metrics performance in 2020 for the 12 week before and after period?
587 | - `region`
588 | - `platform`
589 | - `age_band`
590 | - `demographic`
591 | - `customer_type`
592 |
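As a starting point, the 12-week before-and-after pattern from the previous questions can be reused with a `GROUP BY` on each business area. A minimal sketch for the `region` dimension is below; swapping `region` for `platform`, `age_band`, `demographic` or `customer_type` covers the other areas.

````sql
WITH region_sales AS (
  SELECT
    region,
    week_number,
    SUM(sales) AS total_sales
  FROM clean_weekly_sales
  WHERE (week_number BETWEEN 13 AND 37)
    AND (calendar_year = 2020)
  GROUP BY region, week_number
)
, before_after_changes AS (
  SELECT
    region,
    SUM(CASE
      WHEN week_number BETWEEN 13 AND 24 THEN total_sales END) AS before_packaging_sales,
    SUM(CASE
      WHEN week_number BETWEEN 25 AND 37 THEN total_sales END) AS after_packaging_sales
  FROM region_sales
  GROUP BY region
)

SELECT
  region,
  after_packaging_sales - before_packaging_sales AS sales_variance,
  ROUND(100 *
    (after_packaging_sales - before_packaging_sales)
    / before_packaging_sales,2) AS variance_percentage
FROM before_after_changes
ORDER BY variance_percentage;
````
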
593 | Do you have any further recommendations for Danny’s team at Data Mart or any interesting insights based off this analysis?
594 |
595 | ***
596 |
--------------------------------------------------------------------------------
/Case Study #6 - Clique Bait/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #6: Clique Bait
2 |
3 |
4 |
5 | ## 📚 Table of Contents
6 | - [Business Task](#business-task)
7 | - [Entity Relationship Diagram](#entity-relationship-diagram)
8 | - [Question and Solution](#question-and-solution)
9 |
10 | Please note that all the information regarding the case study has been sourced from the following link: [here](https://8weeksqlchallenge.com/case-study-6/).
11 |
12 | ***
13 |
14 | ## Business Task
15 | Clique Bait is an online seafood store.
16 |
17 | In this case study - you are required to support the founder and CEO Danny’s vision and analyse his dataset and come up with creative solutions to calculate funnel fallout rates for the Clique Bait online store.
18 |
19 | ## Entity Relationship Diagram
20 |
21 |
22 |
23 | **Table: `users`**
24 |
25 |
26 |
27 | **Table: `events`**
28 |
29 |
30 |
31 | **Table: `event_identifier`**
32 |
33 |
34 |
35 | **Table: `page_hierarchy`**
36 |
37 |
38 |
39 | **Table: `campaign_identifier`**
40 |
41 |
42 |
43 | ***
44 |
45 | ## Question and Solution
46 |
47 | Please join me in executing the queries using PostgreSQL on [DB Fiddle](https://www.db-fiddle.com/f/jmnwogTsUE8hGqkZv9H7E8/17). It would be great to work together on the questions!
48 |
49 | If you have any questions, reach out to me on [LinkedIn](https://www.linkedin.com/in/katiehuangx/).
50 |
51 | ## 👩🏻💻 A. Digital Analysis
52 |
53 | **1. How many users are there?**
54 |
55 | ````sql
56 | SELECT
57 | COUNT(DISTINCT user_id) AS user_count
58 | FROM clique_bait.users;
59 | ````
60 |
61 |
62 |
63 | **2. How many cookies does each user have on average?**
64 |
65 | This was one of those tricky questions that seems easy, but the solution is not as straightforward as it looks.
66 |
67 | - The question asks for the number of cookies each user has on average. That calls for a `DISTINCT` or a `GROUP BY` so that the count of cookies belonging to each user is unique.
68 | - Next, round the average cookie count to 0 decimal places, as a fractional cookie would not make sense.
69 |
70 | ````sql
71 | WITH cookie AS (
72 | SELECT
73 | user_id,
74 | COUNT(cookie_id) AS cookie_id_count
75 | FROM clique_bait.users
76 | GROUP BY user_id)
77 |
78 | SELECT
79 | ROUND(AVG(cookie_id_count),0) AS avg_cookie_id
80 | FROM cookie;
81 | ````
82 |
83 |
84 |
85 | **3. What is the unique number of visits by all users per month?**
86 | - First, extract numerical month from `event_time` so that we can group the data by month.
87 | - The word "unique" is a hint to use `DISTINCT`.
88 |
89 | ````sql
90 | SELECT
91 | EXTRACT(MONTH FROM event_time) as month,
92 | COUNT(DISTINCT visit_id) AS unique_visit_count
93 | FROM clique_bait.events
94 | GROUP BY EXTRACT(MONTH FROM event_time);
95 | ````
96 |
97 |
98 |
99 | **4. What is the number of events for each event type?**
100 |
101 | ````sql
102 | SELECT
103 | event_type,
104 | COUNT(*) AS event_count
105 | FROM clique_bait.events
106 | GROUP BY event_type
107 | ORDER BY event_type;
108 | ````
109 |
110 |
111 |
112 | **5. What is the percentage of visits which have a purchase event?**
113 | - Join the `events` and `event_identifier` tables and filter by the `Purchase` event only.
114 | - As the data is now filtered to `Purchase` events only, counting the distinct visit IDs gives the number of visits with a purchase event.
115 | - Then, divide the number of purchase visits by a subquery of the total number of distinct visits from the `events` table.
116 |
117 | ````sql
118 | SELECT
119 |   ROUND(100 * COUNT(DISTINCT e.visit_id)::numeric/
120 |   (SELECT COUNT(DISTINCT visit_id) FROM clique_bait.events), 2) AS percentage_purchase
121 | FROM clique_bait.events AS e
122 | JOIN clique_bait.event_identifier AS ei
123 | ON e.event_type = ei.event_type
124 | WHERE ei.event_name = 'Purchase';
125 | ````
126 |
127 |
128 |
129 | **6. What is the percentage of visits which view the checkout page but do not have a purchase event?**
130 | The strategy to answer this question is to break the question down into 2 parts.
131 |
132 | Part 1: Create a `CTE` and, using `CASE` statements, find the `MAX()` of:
133 | - `event_type` = 1 (Page View) and `page_id` = 12 (Checkout), and assign "1" to these events. These are the events where a user viewed the checkout page.
134 | - `event_type` = 3 (Purchase) and assign "1" to these events. These events signify users who made a purchase.
135 |
136 | We use `MAX()` because the results are grouped by `visit_id` only; taking the maximum of each flag tells us whether the visit had at least one checkout page view or purchase, without having to group by `event_type` and `page_id` as well.
137 |
138 | Part 2: Using the table we have created, find the percentage of visits which view checkout page.
139 |
140 | ````sql
141 | WITH checkout_purchase AS (
142 | SELECT
143 | visit_id,
144 | MAX(CASE WHEN event_type = 1 AND page_id = 12 THEN 1 ELSE 0 END) AS checkout,
145 | MAX(CASE WHEN event_type = 3 THEN 1 ELSE 0 END) AS purchase
146 | FROM clique_bait.events
147 | GROUP BY visit_id)
148 |
149 | SELECT
150 | ROUND(100 * (1-(SUM(purchase)::numeric/SUM(checkout))),2) AS percentage_checkout_view_with_no_purchase
151 | FROM checkout_purchase
152 | ````
153 |
154 |
155 |
156 | **7. What are the top 3 pages by number of views?**
157 |
158 | ````sql
159 | SELECT
160 | ph.page_name,
161 | COUNT(*) AS page_views
162 | FROM clique_bait.events AS e
163 | JOIN clique_bait.page_hierarchy AS ph
164 | ON e.page_id = ph.page_id
165 | WHERE e.event_type = 1 -- "Page View"
166 | GROUP BY ph.page_name
167 | ORDER BY page_views DESC -- Order by descending to retrieve highest to lowest number of views
168 | LIMIT 3; -- Limit results to 3 to find the top 3
169 | ````
170 |
171 | **8. What is the number of views and cart adds for each product category?**
172 |
173 | ````sql
174 | SELECT
175 | ph.product_category,
176 | SUM(CASE WHEN e.event_type = 1 THEN 1 ELSE 0 END) AS page_views,
177 | SUM(CASE WHEN e.event_type = 2 THEN 1 ELSE 0 END) AS cart_adds
178 | FROM clique_bait.events AS e
179 | JOIN clique_bait.page_hierarchy AS ph
180 | ON e.page_id = ph.page_id
181 | WHERE ph.product_category IS NOT NULL
182 | GROUP BY ph.product_category
183 | ORDER BY page_views DESC;
184 | ````
185 |
186 |
187 |
188 | **9. What are the top 3 products by purchases?**
189 |
190 |
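One possible sketch, assuming a product counts as purchased when it was added to the cart during a visit that also contains a purchase event:

````sql
SELECT
  ph.page_name AS product_name,
  COUNT(*) AS purchase_count
FROM clique_bait.events AS e
JOIN clique_bait.page_hierarchy AS ph
  ON e.page_id = ph.page_id
WHERE e.event_type = 2 -- Add to Cart
  AND e.visit_id IN (
    SELECT visit_id
    FROM clique_bait.events
    WHERE event_type = 3) -- Purchase
GROUP BY ph.page_name
ORDER BY purchase_count DESC
LIMIT 3;
````
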
191 | ***
192 |
193 | ## 👩🏻💻 B. Product Funnel Analysis
194 |
195 | Using a single SQL query - create a new output table which has the following details:
196 |
197 | 1. How many times was each product viewed?
198 | 2. How many times was each product added to cart?
199 | 3. How many times was each product added to a cart but not purchased (abandoned)?
200 | 4. How many times was each product purchased?
201 |
202 | ## Planning Our Strategy
203 |
204 | Let us visualize the output table.
205 |
206 | | Column | Description |
207 | | ------- | ----------- |
208 | | product | Name of the product |
209 | | views | Number of views for each product |
210 | | cart_adds | Number of cart adds for each product |
211 | | abandoned | Number of times product was added to a cart, but not purchased |
212 | | purchased | Number of times product was purchased |
213 |
214 | This information comes from these 2 tables:
215 | - `events` table - visit_id, page_id, event_type
216 | - `page_hierarchy` table - page_id, product_category
217 |
218 | **Solution**
219 |
220 | - Note 1 - In `product_page_events` CTE, find page views and cart adds for individual visit ids by wrapping `SUM` around `CASE statements` so that we do not have to group the results by `event_type` as well.
221 | - Note 2 - In `purchase_events` CTE, get only visit ids that have made purchases.
222 | - Note 3 - In `combined_table` CTE, merge `product_page_events` and `purchase_events` using a `LEFT JOIN` (take note of the table sequence). A `CASE statement` then flags purchases: when the visit id from `purchase_events` is not null, that visit included a purchase.
223 |
224 | ```sql
225 | WITH product_page_events AS ( -- Note 1
226 | SELECT
227 | e.visit_id,
228 | ph.product_id,
229 | ph.page_name AS product_name,
230 | ph.product_category,
231 | SUM(CASE WHEN e.event_type = 1 THEN 1 ELSE 0 END) AS page_view, -- 1 for Page View
232 | SUM(CASE WHEN e.event_type = 2 THEN 1 ELSE 0 END) AS cart_add -- 2 for Add Cart
233 | FROM clique_bait.events AS e
234 | JOIN clique_bait.page_hierarchy AS ph
235 | ON e.page_id = ph.page_id
236 | WHERE product_id IS NOT NULL
237 | GROUP BY e.visit_id, ph.product_id, ph.page_name, ph.product_category
238 | ),
239 | purchase_events AS ( -- Note 2
240 | SELECT
241 | DISTINCT visit_id
242 | FROM clique_bait.events
243 | WHERE event_type = 3 -- 3 for Purchase
244 | ),
245 | combined_table AS ( -- Note 3
246 | SELECT
247 | ppe.visit_id,
248 | ppe.product_id,
249 | ppe.product_name,
250 | ppe.product_category,
251 | ppe.page_view,
252 | ppe.cart_add,
253 | CASE WHEN pe.visit_id IS NOT NULL THEN 1 ELSE 0 END AS purchase
254 | FROM product_page_events AS ppe
255 | LEFT JOIN purchase_events AS pe
256 | ON ppe.visit_id = pe.visit_id
257 | ),
258 | product_info AS (
259 | SELECT
260 |     product_id, product_name,
261 | product_category,
262 | SUM(page_view) AS views,
263 | SUM(cart_add) AS cart_adds,
264 | SUM(CASE WHEN cart_add = 1 AND purchase = 0 THEN 1 ELSE 0 END) AS abandoned,
265 | SUM(CASE WHEN cart_add = 1 AND purchase = 1 THEN 1 ELSE 0 END) AS purchases
266 | FROM combined_table
267 | GROUP BY product_id, product_name, product_category)
268 |
269 | SELECT *
270 | FROM product_info
271 | ORDER BY product_id;
272 | ```
273 |
274 | The logic behind the `abandoned` column: `cart_add = 1` means the customer added the item to the cart, while `purchase = 0` means the customer did not complete the purchase, hence the cart was abandoned.
275 |
276 |
277 |
278 | ***
279 |
280 | Additionally, create another table which further aggregates the data for the above points but this time for each product category instead of individual products.
281 |
282 | **Solution**
283 |
284 | ```sql
285 | WITH product_page_events AS ( -- Note 1
286 | SELECT
287 | e.visit_id,
288 | ph.product_id,
289 | ph.page_name AS product_name,
290 | ph.product_category,
291 | SUM(CASE WHEN e.event_type = 1 THEN 1 ELSE 0 END) AS page_view, -- 1 for Page View
292 | SUM(CASE WHEN e.event_type = 2 THEN 1 ELSE 0 END) AS cart_add -- 2 for Add Cart
293 | FROM clique_bait.events AS e
294 | JOIN clique_bait.page_hierarchy AS ph
295 | ON e.page_id = ph.page_id
296 | WHERE product_id IS NOT NULL
297 | GROUP BY e.visit_id, ph.product_id, ph.page_name, ph.product_category
298 | ),
299 | purchase_events AS ( -- Note 2
300 | SELECT
301 | DISTINCT visit_id
302 | FROM clique_bait.events
303 | WHERE event_type = 3 -- 3 for Purchase
304 | ),
305 | combined_table AS ( -- Note 3
306 | SELECT
307 | ppe.visit_id,
308 | ppe.product_id,
309 | ppe.product_name,
310 | ppe.product_category,
311 | ppe.page_view,
312 | ppe.cart_add,
313 | CASE WHEN pe.visit_id IS NOT NULL THEN 1 ELSE 0 END AS purchase
314 | FROM product_page_events AS ppe
315 | LEFT JOIN purchase_events AS pe
316 | ON ppe.visit_id = pe.visit_id
317 | ),
318 | product_category AS (
319 | SELECT
320 | product_category,
321 | SUM(page_view) AS views,
322 | SUM(cart_add) AS cart_adds,
323 | SUM(CASE WHEN cart_add = 1 AND purchase = 0 THEN 1 ELSE 0 END) AS abandoned,
324 | SUM(CASE WHEN cart_add = 1 AND purchase = 1 THEN 1 ELSE 0 END) AS purchases
325 | FROM combined_table
326 | GROUP BY product_category)
327 |
328 | SELECT *
329 | FROM product_category
330 | ```
331 |
332 |
333 |
334 | ***
335 |
336 | Use your 2 new output tables - answer the following questions:
337 |
338 | **1. Which product had the most views, cart adds and purchases?**
339 |
340 | **2. Which product was most likely to be abandoned?**
341 |
342 |
343 |
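A quick sketch for both questions, assuming the `product_info` output above has been saved as a table:

```sql
-- Question 1: rank products by views, cart adds and purchases.
SELECT product_name, views, cart_adds, purchases
FROM product_info
ORDER BY views DESC;

-- Question 2: abandonment rate = abandoned carts / cart adds.
SELECT
  product_name,
  ROUND(100 * abandoned::numeric / cart_adds, 2) AS abandoned_percentage
FROM product_info
ORDER BY abandoned_percentage DESC;
```
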
344 | - Oyster has the most views.
345 | - Lobster has the most cart adds and purchases.
346 | - Russian Caviar is most likely to be abandoned.
347 |
348 | **3. Which product had the highest view to purchase percentage?**
349 |
350 | ```sql
351 | SELECT
352 | product_name,
353 | product_category,
354 |   ROUND(100 * purchases::numeric/views,2) AS purchase_per_view_percentage
355 | FROM product_info
356 | ORDER BY purchase_per_view_percentage DESC
357 | ```
358 |
359 |
360 |
361 | - Lobster has the highest view to purchase percentage at 48.74%.
362 |
363 | **4. What is the average conversion rate from view to cart add?**
364 |
365 | **5. What is the average conversion rate from cart add to purchase?**
366 |
367 | ```sql
368 | SELECT
369 |   ROUND(100*AVG(cart_adds::numeric/views),2) AS avg_view_to_cart_add_conversion,
370 |   ROUND(100*AVG(purchases::numeric/cart_adds),2) AS avg_cart_add_to_purchases_conversion_rate
371 | FROM product_info
372 | ```
373 |
374 |
375 |
376 | - Average views to cart adds rate is 60.95% and average cart adds to purchases rate is 75.93%.
377 | - Although the view-to-cart-add rate is lower, the cart-add-to-purchase conversion further down the funnel is at least 15% higher.
378 |
379 | ***
380 |
381 | ## 👩🏻💻 C. Campaigns Analysis
382 |
383 | Generate a table that has 1 single row for every unique visit_id record and has the following columns:
384 | - `user_id`
385 | - `visit_id`
386 | - `visit_start_time`: the earliest event_time for each visit
387 | - `page_views`: count of page views for each visit
388 | - `cart_adds`: count of product cart add events for each visit
389 | - `purchase`: 1/0 flag if a purchase event exists for each visit
390 | - `campaign_name`: map the visit to a campaign if the visit_start_time falls between the start_date and end_date
391 | - `impression`: count of ad impressions for each visit
392 | - `click`: count of ad clicks for each visit
393 | - (Optional column) `cart_products`: a comma separated text value with products added to the cart sorted by the order they were added to the cart (hint: use the sequence_number)
394 |
395 | **Solution**
396 |
397 | Steps:
398 | - We will merge multiple tables:
399 |   - Use an INNER JOIN for the `users` and `events` tables.
400 |   - Join the `campaign_identifier` table using a LEFT JOIN so that every visit is kept, with a campaign mapped only when `event_time` falls between `start_date` and `end_date`.
401 |   - Join the `page_hierarchy` table using a LEFT JOIN so that every event row is kept, even when it does not correspond to a product page.
402 | - To generate the earliest `visit_start_time` for each unique `visit_id`, use `MIN()` on `event_time`.
403 | - Wrap `CASE` statements with `SUM()` to count the `page_views`, `cart_adds`, `purchase`, ad `impression` and ad `click` events.
404 | - To get a list of products added into cart sorted by sequence,
405 | - Firstly, use a CASE statement to only get cart add events.
406 | - Then, use `STRING_AGG()` to separate products by comma `,` and sort the sequence using `sequence_number`.
407 |
408 | ```sql
409 | SELECT
410 | u.user_id, e.visit_id,
411 | MIN(e.event_time) AS visit_start_time,
412 | SUM(CASE WHEN e.event_type = 1 THEN 1 ELSE 0 END) AS page_views,
413 | SUM(CASE WHEN e.event_type = 2 THEN 1 ELSE 0 END) AS cart_adds,
414 | SUM(CASE WHEN e.event_type = 3 THEN 1 ELSE 0 END) AS purchase,
415 | c.campaign_name,
416 | SUM(CASE WHEN e.event_type = 4 THEN 1 ELSE 0 END) AS impression,
417 | SUM(CASE WHEN e.event_type = 5 THEN 1 ELSE 0 END) AS click,
418 | STRING_AGG(CASE WHEN p.product_id IS NOT NULL AND e.event_type = 2 THEN p.page_name ELSE NULL END,
419 | ', ' ORDER BY e.sequence_number) AS cart_products
420 | FROM clique_bait.users AS u
421 | INNER JOIN clique_bait.events AS e
422 | ON u.cookie_id = e.cookie_id
423 | LEFT JOIN clique_bait.campaign_identifier AS c
424 | ON e.event_time BETWEEN c.start_date AND c.end_date
425 | LEFT JOIN clique_bait.page_hierarchy AS p
426 | ON e.page_id = p.page_id
427 | GROUP BY u.user_id, e.visit_id, c.campaign_name;
428 | ```
429 |
430 | | user_id | visit_id | visit_start_time | page_views | cart_adds | purchase | campaign_name | impression | click | cart_products |
431 | |---------|----------|--------------------------|------------|-----------|----------|-----------------------------------|------------|-------|----------------------------------------------------------------|
432 | | 1 | 02a5d5 | 2020-02-26T16:57:26.261Z | 4 | 0 | 0 | Half Off - Treat Your Shellf(ish) | 0 | 0 | |
433 | | 1 | 0826dc | 2020-02-26T05:58:37.919Z | 1 | 0 | 0 | Half Off - Treat Your Shellf(ish) | 0 | 0 | |
434 | | 1 | 0fc437 | 2020-02-04T17:49:49.603Z | 10 | 6 | 1 | Half Off - Treat Your Shellf(ish) | 1 | 1 | Tuna, Russian Caviar, Black Truffle, Abalone, Crab, Oyster |
435 | | 1 | 30b94d | 2020-03-15T13:12:54.024Z | 9 | 7 | 1 | Half Off - Treat Your Shellf(ish) | 1 | 1 | Salmon, Kingfish, Tuna, Russian Caviar, Abalone, Lobster, Crab |
436 | | 1 | 41355d | 2020-03-25T00:11:17.861Z | 6 | 1 | 0 | Half Off - Treat Your Shellf(ish) | 0 | 0 | Lobster |
437 |
438 | ***
439 |
440 | Use the subsequent dataset to generate at least 5 insights for the Clique Bait team - bonus: prepare a single A4 infographic that the team can use for their management reporting sessions, be sure to emphasise the most important points from your findings.
441 |
442 | Some ideas you might want to investigate further include:
443 |
444 | - Identifying users who have received impressions during each campaign period and comparing each metric with other users who did not have an impression event
445 | - Does clicking on an impression lead to higher purchase rates?
446 | - What is the uplift in purchase rate when comparing users who click on a campaign impression versus users who do not receive an impression? What if we compare them with users who receive an impression but do not click?
447 | - What metrics can you use to quantify the success or failure of each campaign compared to each other?
448 |
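As a starting point for these investigations, a sketch comparing the purchase rate of visits that received an ad impression against visits that did not:

```sql
WITH visit_flags AS (
  SELECT
    visit_id,
    MAX(CASE WHEN event_type = 4 THEN 1 ELSE 0 END) AS impression, -- 4 for Ad Impression
    MAX(CASE WHEN event_type = 3 THEN 1 ELSE 0 END) AS purchase    -- 3 for Purchase
  FROM clique_bait.events
  GROUP BY visit_id
)

SELECT
  impression,
  ROUND(100 * AVG(purchase::numeric), 2) AS purchase_rate_percentage
FROM visit_flags
GROUP BY impression
ORDER BY impression;
```
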
449 |
--------------------------------------------------------------------------------
/Case Study #7 - Balanced Tree Clothing Co./README.md:
--------------------------------------------------------------------------------
1 | ## 🌲 Case Study #7: Balanced Tree
2 |
3 |
4 |
5 | ## 📚 Table of Contents
6 | - [Business Task](#business-task)
7 | - [Entity Relationship Diagram](#entity-relationship-diagram)
8 | - [Question and Solution](#question-and-solution)
9 |
10 | Please note that all the information regarding the case study has been sourced from the following link: [here](https://8weeksqlchallenge.com/case-study-7/).
11 |
12 | ***
13 |
14 | ## Business Task
15 |
16 | Balanced Tree Clothing Company prides themselves on providing an optimised range of clothing and lifestyle wear for the modern adventurer!
17 |
18 | Danny, the CEO of this trendy fashion company has asked you to assist the team’s merchandising teams **analyse their sales performance and generate a basic financial report** to share with the wider business.
19 |
20 | ## Entity Relationship Diagram
21 |
22 |
23 |
24 | **Table 1: `product_details`**
25 |
26 | |product_id|price|product_name|category_id|segment_id|style_id|category_name|segment_name|style_name|
27 | |:----|:----|:----|:----|:----|:----|:----|:----|:----|
28 | |c4a632|13|Navy Oversized Jeans - Womens|1|3|7|Womens|Jeans|Navy Oversized|
29 | |e83aa3|32|Black Straight Jeans - Womens|1|3|8|Womens|Jeans|Black Straight|
30 | |e31d39|10|Cream Relaxed Jeans - Womens|1|3|9|Womens|Jeans|Cream Relaxed|
31 | |d5e9a6|23|Khaki Suit Jacket - Womens|1|4|10|Womens|Jacket|Khaki Suit|
32 | |72f5d4|19|Indigo Rain Jacket - Womens|1|4|11|Womens|Jacket|Indigo Rain|
33 | |9ec847|54|Grey Fashion Jacket - Womens|1|4|12|Womens|Jacket|Grey Fashion|
34 | |5d267b|40|White Tee Shirt - Mens|2|5|13|Mens|Shirt|White Tee|
35 | |c8d436|10|Teal Button Up Shirt - Mens|2|5|14|Mens|Shirt|Teal Button Up|
36 | |2a2353|57|Blue Polo Shirt - Mens|2|5|15|Mens|Shirt|Blue Polo|
37 | |f084eb|36|Navy Solid Socks - Mens|2|6|16|Mens|Socks|Navy Solid|
38 |
39 |
40 | **Table 2: `sales`**
41 |
42 | |prod_id|qty|price|discount|member|txn_id|start_txn_time|
43 | |:----|:----|:----|:----|:----|:----|:----|
44 | |c4a632|4|13|17|true|54f307|2021-02-13T01:59:43.296Z|
45 | |5d267b|4|40|17|true|54f307|2021-02-13T01:59:43.296Z|
46 | |b9a74d|4|17|17|true|54f307|2021-02-13T01:59:43.296Z|
47 | |2feb6b|2|29|17|true|54f307|2021-02-13T01:59:43.296Z|
48 | |c4a632|5|13|21|true|26cc98|2021-01-19T01:39:00.345Z|
49 | |e31d39|2|10|21|true|26cc98|2021-01-19T01:39:00.345Z|
50 | |72f5d4|3|19|21|true|26cc98|2021-01-19T01:39:00.345Z|
51 | |2a2353|3|57|21|true|26cc98|2021-01-19T01:39:00.345Z|
52 | |f084eb|3|36|21|true|26cc98|2021-01-19T01:39:00.345Z|
53 | |c4a632|1|13|21|false|ef648d|2021-01-27T02:18:17.164Z|
54 |
55 | **Table 3: `product_hierarchy`**
56 |
57 | |id|parent_id|level_text|level_name|
58 | |:----|:----|:----|:----|
59 | |1|null|Womens|Category|
60 | |2|null|Mens|Category|
61 | |3|1|Jeans|Segment|
62 | |4|1|Jacket|Segment|
63 | |5|2|Shirt|Segment|
64 | |6|2|Socks|Segment|
65 | |7|3|Navy Oversized|Style|
66 | |8|3|Black Straight|Style|
67 | |9|3|Cream Relaxed|Style|
68 | |10|4|Khaki Suit|Style|
69 |
70 | **Table 4: `product_prices`**
71 |
72 | |id|product_id|price|
73 | |:----|:----|:----|
74 | |7|c4a632|13|
75 | |8|e83aa3|32|
76 | |9|e31d39|10|
77 | |10|d5e9a6|23|
78 | |11|72f5d4|19|
79 | |12|9ec847|54|
80 | |13|5d267b|40|
81 | |14|c8d436|10|
82 | |15|2a2353|57|
83 | |16|f084eb|36|
84 |
85 | ***
86 |
87 | ## Question and Solution
88 |
89 | Please join me in executing the queries using PostgreSQL on [DB Fiddle](https://www.db-fiddle.com/f/dkhULDEjGib3K58MvDjYJr/8). It would be great to work together on the questions!
90 |
91 | If you have any questions, reach out to me on [LinkedIn](https://www.linkedin.com/in/katiehuangx/).
92 |
93 | ## 📈 A. High Level Sales Analysis
94 |
95 | **1. What was the total quantity sold for all products?**
96 |
97 | ```sql
98 | SELECT
99 | product.product_name,
100 | SUM(sales.qty) AS total_quantity
101 | FROM balanced_tree.sales
102 | INNER JOIN balanced_tree.product_details AS product
103 | ON sales.prod_id = product.product_id
104 | GROUP BY product.product_name;
105 | ```
106 |
107 | **Answer:**
108 |
109 | |product_name|total_quantity|
110 | |:----|:----|
111 | |White Tee Shirt - Mens|3800|
112 | |Navy Solid Socks - Mens|3792|
113 | |Grey Fashion Jacket - Womens|3876|
114 | |Navy Oversized Jeans - Womens|3856|
115 | |Pink Fluro Polkadot Socks - Mens|3770|
116 | |Khaki Suit Jacket - Womens|3752|
117 | |Black Straight Jeans - Womens|3786|
118 | |White Striped Socks - Mens|3655|
119 | |Blue Polo Shirt - Mens|3819|
120 | |Indigo Rain Jacket - Womens|3757|
121 | |Cream Relaxed Jeans - Womens|3707|
122 | |Teal Button Up Shirt - Mens|3646|
123 |
124 | ***
125 |
126 | **2. What is the total generated revenue for all products before discounts?**
127 |
128 | ```sql
129 | SELECT
130 | product.product_name,
131 | SUM(sales.qty) * SUM(sales.price) AS total_revenue
132 | FROM balanced_tree.sales
133 | INNER JOIN balanced_tree.product_details AS product
134 | ON sales.prod_id = product.product_id
135 | GROUP BY product.product_name;
136 | ```
137 |
138 | **Answer:**
139 |
140 | |product_name|total_revenue|
141 | |:----|:----|
142 | |White Tee Shirt - Mens|192736000|
143 | |Navy Solid Socks - Mens|174871872|
144 | |Grey Fashion Jacket - Womens|266862600|
145 | |Navy Oversized Jeans - Womens|63863072|
146 | |Pink Fluro Polkadot Socks - Mens|137537140|
147 | |Khaki Suit Jacket - Womens|107611112|
148 | |Black Straight Jeans - Womens|150955392|
149 | |White Striped Socks - Mens|77233805|
150 | |Blue Polo Shirt - Mens|276022044|
151 | |Indigo Rain Jacket - Womens|89228750|
152 | |Cream Relaxed Jeans - Womens|46078010|
153 | |Teal Button Up Shirt - Mens|45283320|
154 |
155 | ***
156 |
157 | **3. What was the total discount amount for all products?**
158 |
159 | ```sql
160 | SELECT
161 | product.product_name,
162 | SUM(sales.qty * sales.price * sales.discount/100) AS total_discount
163 | FROM balanced_tree.sales
164 | INNER JOIN balanced_tree.product_details AS product
165 | ON sales.prod_id = product.product_id
166 | GROUP BY product.product_name;
167 | ```
168 |
169 | **Answer:**
170 |
171 | |product_name|total_discount|
172 | |:----|:----|
173 | |White Tee Shirt - Mens|17968|
174 | |Navy Solid Socks - Mens|16059|
175 | |Grey Fashion Jacket - Womens|24781|
176 | |Navy Oversized Jeans - Womens|5538|
177 | |Pink Fluro Polkadot Socks - Mens|12344|
178 | |Khaki Suit Jacket - Womens|9660|
179 | |Black Straight Jeans - Womens|14156|
180 | |White Striped Socks - Mens|6877|
181 | |Blue Polo Shirt - Mens|26189|
182 | |Indigo Rain Jacket - Womens|8010|
183 | |Cream Relaxed Jeans - Womens|3979|
184 | |Teal Button Up Shirt - Mens|3925|
185 |
186 | ***
187 |
188 | ## 🧾 B. Transaction Analysis
189 |
190 | **1. How many unique transactions were there?**
191 |
192 | ```sql
193 | SELECT COUNT(DISTINCT txn_id) AS transaction_count
194 | FROM balanced_tree.sales;
195 | ```
196 |
197 | **Answer:**
198 |
199 | |transaction_count|
200 | |:----|
201 | |2500|
202 |
203 | ***
204 |
205 | **2. What is the average unique products purchased in each transaction?**
206 |
207 | ```sql
208 | SELECT ROUND(AVG(total_quantity)) AS avg_unique_products
209 | FROM (
210 | SELECT
211 | txn_id,
212 | SUM(qty) AS total_quantity
213 | FROM balanced_tree.sales
214 | GROUP BY txn_id
215 | ) AS total_quantities;
216 | ```
217 |
218 | **Answer:**
219 |
220 | |avg_unique_products|
221 | |:----|
222 | |18|
223 |
224 | ***
225 |
226 | **3. What are the 25th, 50th and 75th percentile values for the revenue per transaction?**
227 |
228 | ```sql
229 | WITH revenue_cte AS (
230 | SELECT
231 | txn_id,
232 | SUM(price * qty) AS revenue
233 | FROM balanced_tree.sales
234 | GROUP BY txn_id
235 | )
236 |
237 | SELECT
238 | PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY revenue) AS median_25th,
239 | PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY revenue) AS median_50th,
240 | PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY revenue) AS median_75th
241 | FROM revenue_cte;
242 | ```
243 |
244 | **Answer:**
245 |
246 | |median_25th|median_50th|median_75th|
247 | |:----|:----|:----|
248 | |375.75|509.5|647|
249 |
250 | ***
251 |
252 | **4. What is the average discount value per transaction?**
253 |
254 | ```sql
255 | SELECT ROUND(AVG(discount_amt)) AS avg_discount
256 | FROM (
257 | SELECT
258 | txn_id,
259 | SUM(qty * price * discount/100) AS discount_amt
260 | FROM balanced_tree.sales
261 | GROUP BY txn_id
262 | ) AS discounted_value
263 | ```
264 |
265 | **Answer:**
266 |
267 | |avg_discount|
268 | |:----|
269 | |60|
270 |
271 | **5. What is the percentage split of all transactions for members vs non-members?**
272 |
273 | ```sql
274 | WITH transactions_cte AS (
275 | SELECT
276 | member,
277 | COUNT(DISTINCT txn_id) AS transactions
278 | FROM balanced_tree.sales
279 | GROUP BY member
280 | )
281 |
282 | SELECT
283 | member,
284 | transactions,
285 | ROUND(100 * transactions
286 | /(SELECT SUM(transactions)
287 | FROM transactions_cte)) AS percentage
288 | FROM transactions_cte
289 | GROUP BY member, transactions;
290 | ```
291 |
292 | **Answer:**
293 |
294 | Members account for 60% of transactions, compared to non-members at only 40%.
295 |
296 | |member|transactions|percentage|
297 | |:----|:----|:----|
298 | |false|995|40|
299 | |true|1505|60|
300 |
301 | ***
302 |
303 | **6. What is the average revenue for member transactions and non-member transactions?**
304 |
305 | ```sql
306 | WITH revenue_cte AS (
307 | SELECT
308 | member,
309 | txn_id,
310 | SUM(price * qty) AS revenue
311 | FROM balanced_tree.sales
312 | GROUP BY member, txn_id
313 | )
314 |
315 | SELECT
316 | member,
317 | ROUND(AVG(revenue),2) AS avg_revenue
318 | FROM revenue_cte
319 | GROUP BY member;
320 | ```
321 |
322 | **Answer:**
323 |
324 | The average revenue per transaction for members is only $1.23 higher compared to non-members.
325 |
326 | |member|avg_revenue|
327 | |:----|:----|
328 | |false|515.04|
329 | |true|516.27|
330 |
331 | ***
332 |
333 | ## 👚 C. Product Analysis
334 |
335 | **1. What are the top 3 products by total revenue before discount?**
336 |
337 | ```sql
338 | SELECT
339 | product.product_id,
340 | product.product_name,
341 | SUM(sales.qty) * SUM(sales.price) AS total_revenue
342 | FROM balanced_tree.sales
343 | INNER JOIN balanced_tree.product_details AS product
344 | ON sales.prod_id = product.product_id
345 | GROUP BY product.product_id, product.product_name
346 | ORDER BY total_revenue DESC
347 | LIMIT 3;
348 | ```
349 |
350 | **Answer:**
351 |
352 | |product_id|product_name|total_revenue|
353 | |:----|:----|:----|
354 | |2a2353|Blue Polo Shirt - Mens|276022044|
355 | |9ec847|Grey Fashion Jacket - Womens|266862600|
356 | |5d267b|White Tee Shirt - Mens|192736000|
357 |
358 | ***
359 |
360 | **2. What is the total quantity, revenue and discount for each segment?**
361 |
362 | ```sql
363 | SELECT
364 | product.segment_id,
365 | product.segment_name,
366 | SUM(sales.qty) AS total_quantity,
367 | SUM(sales.qty * sales.price) AS total_revenue,
368 | SUM((sales.qty * sales.price) * sales.discount/100) AS total_discount
369 | FROM balanced_tree.sales
370 | INNER JOIN balanced_tree.product_details AS product
371 | ON sales.prod_id = product.product_id
372 | GROUP BY product.segment_id, product.segment_name;
373 | ```
374 |
375 | **Answer:**
376 |
377 | |segment_id|segment_name|total_quantity|total_revenue|total_discount|
378 | |:----|:----|:----|:----|:----|
379 | |4|Jacket|11385|366983|42451|
380 | |6|Socks|11217|307977|35280|
381 | |5|Shirt|11265|406143|48082|
382 | |3|Jeans|11349|208350|23673|
383 |
384 | ***
385 |
386 | **3. What is the top selling product for each segment?**
387 |
388 | ```sql
389 | WITH top_selling_cte AS (
390 | SELECT
391 | product.segment_id,
392 | product.segment_name,
393 | product.product_id,
394 | product.product_name,
395 | SUM(sales.qty) AS total_quantity,
396 | RANK() OVER (
397 | PARTITION BY segment_id
398 | ORDER BY SUM(sales.qty) DESC) AS ranking
399 | FROM balanced_tree.sales
400 | INNER JOIN balanced_tree.product_details AS product
401 | ON sales.prod_id = product.product_id
402 | GROUP BY
403 | product.segment_id, product.segment_name, product.product_id, product.product_name
404 | )
405 |
406 | SELECT
407 | segment_id,
408 | segment_name,
409 | product_id,
410 | product_name,
411 | total_quantity
412 | FROM top_selling_cte
413 | WHERE ranking = 1;
414 | ```
415 |
416 | **Answer:**
417 |
418 | |segment_id|segment_name|product_id|product_name|total_quantity|
419 | |:----|:----|:----|:----|:----|
420 | |3|Jeans|c4a632|Navy Oversized Jeans - Womens|3856|
421 | |4|Jacket|9ec847|Grey Fashion Jacket - Womens|3876|
422 | |5|Shirt|2a2353|Blue Polo Shirt - Mens|3819|
423 | |6|Socks|f084eb|Navy Solid Socks - Mens|3792|
424 |
425 | ***
426 |
427 | **4. What is the total quantity, revenue and discount for each category?**
428 |
429 | ```sql
430 | SELECT
431 | product.category_id,
432 | product.category_name,
433 | SUM(sales.qty) AS total_quantity,
434 | SUM(sales.qty * sales.price) AS total_revenue,
435 | SUM((sales.qty * sales.price) * sales.discount/100) AS total_discount
436 | FROM balanced_tree.sales
437 | INNER JOIN balanced_tree.product_details AS product
438 | ON sales.prod_id = product.product_id
439 | GROUP BY product.category_id, product.category_name
440 | ORDER BY product.category_id;
441 | ```
442 |
443 | **Answer:**
444 |
445 | |category_id|category_name|total_quantity|total_revenue|total_discount|
446 | |:----|:----|:----|:----|:----|
447 | |1|Womens|22734|575333|66124|
448 | |2|Mens|22482|714120|83362|
449 |
450 | ***
451 |
452 | **5. What is the top selling product for each category?**
453 |
454 | ```sql
455 | WITH top_selling_cte AS (
456 | SELECT
457 | product.category_id,
458 | product.category_name,
459 | product.product_id,
460 | product.product_name,
461 | SUM(sales.qty) AS total_quantity,
462 | RANK() OVER (
463 | PARTITION BY product.category_id
464 | ORDER BY SUM(sales.qty) DESC) AS ranking
465 | FROM balanced_tree.sales
466 | INNER JOIN balanced_tree.product_details AS product
467 | ON sales.prod_id = product.product_id
468 | GROUP BY
469 | product.category_id, product.category_name, product.product_id, product.product_name
470 | )
471 |
472 | SELECT
473 | category_id,
474 | category_name,
475 | product_id,
476 | product_name,
477 | total_quantity
478 | FROM top_selling_cte
479 | WHERE ranking = 1;
480 | ```
481 |
482 | **Answer:**
483 |
484 | |category_id|category_name|product_id|product_name|total_quantity|
485 | |:----|:----|:----|:----|:----|
486 | |1|Womens|9ec847|Grey Fashion Jacket - Womens|3876|
487 | |2|Mens|2a2353|Blue Polo Shirt - Mens|3819|
488 |
489 | ***
490 |
491 | **6. What is the percentage split of revenue by product for each segment?**
492 |
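A possible sketch, using a window function to express each product's revenue as a share of its segment's revenue:

```sql
WITH product_revenue AS (
  SELECT
    product.segment_name,
    product.product_name,
    SUM(sales.qty * sales.price) AS total_revenue
  FROM balanced_tree.sales
  INNER JOIN balanced_tree.product_details AS product
    ON sales.prod_id = product.product_id
  GROUP BY product.segment_name, product.product_name
)

SELECT
  segment_name,
  product_name,
  ROUND(100 * total_revenue
    / SUM(total_revenue) OVER (PARTITION BY segment_name), 2) AS segment_revenue_percentage
FROM product_revenue
ORDER BY segment_name, segment_revenue_percentage DESC;
```
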
493 | **Answer:**
494 |
495 | ***
496 |
497 | **7. What is the percentage split of revenue by segment for each category?**
498 |
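The same window-function pattern can roll segment revenue up to its category; a sketch:

```sql
WITH segment_revenue AS (
  SELECT
    product.category_name,
    product.segment_name,
    SUM(sales.qty * sales.price) AS total_revenue
  FROM balanced_tree.sales
  INNER JOIN balanced_tree.product_details AS product
    ON sales.prod_id = product.product_id
  GROUP BY product.category_name, product.segment_name
)

SELECT
  category_name,
  segment_name,
  ROUND(100 * total_revenue
    / SUM(total_revenue) OVER (PARTITION BY category_name), 2) AS category_revenue_percentage
FROM segment_revenue
ORDER BY category_name, category_revenue_percentage DESC;
```
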
499 | **Answer:**
500 |
501 | ***
502 |
503 | **8. What is the percentage split of total revenue by category?**
504 |
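A sketch that divides each category's revenue by the overall revenue using a window total:

```sql
SELECT
  product.category_name,
  ROUND(100 * SUM(sales.qty * sales.price)
    / SUM(SUM(sales.qty * sales.price)) OVER (), 2) AS revenue_percentage
FROM balanced_tree.sales
INNER JOIN balanced_tree.product_details AS product
  ON sales.prod_id = product.product_id
GROUP BY product.category_name;
```
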
505 | **Answer:**
506 |
507 | ***
508 |
509 | **9. What is the total transaction “penetration” for each product? (hint: penetration = number of transactions where at least 1 quantity of a product was purchased divided by total number of transactions)**
510 |
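Following the hint, a sketch that divides each product's distinct transaction count by the total number of transactions:

```sql
SELECT
  product.product_id,
  product.product_name,
  ROUND(100 * COUNT(DISTINCT sales.txn_id)::numeric
    / (SELECT COUNT(DISTINCT txn_id) FROM balanced_tree.sales), 2) AS penetration_percentage
FROM balanced_tree.sales
INNER JOIN balanced_tree.product_details AS product
  ON sales.prod_id = product.product_id
GROUP BY product.product_id, product.product_name
ORDER BY penetration_percentage DESC;
```
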
511 | **Answer:**
512 |
513 | ***
514 |
515 | **10. What is the most common combination of at least 1 quantity of any 3 products in a 1 single transaction?**
516 |
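One possible sketch: self-join the distinct products per transaction three times, ordering the product ids so that each combination is only counted once:

```sql
WITH products_per_txn AS (
  SELECT DISTINCT
    txn_id,
    prod_id
  FROM balanced_tree.sales
)

SELECT
  p1.prod_id AS product_1,
  p2.prod_id AS product_2,
  p3.prod_id AS product_3,
  COUNT(*) AS combination_count
FROM products_per_txn AS p1
JOIN products_per_txn AS p2
  ON p1.txn_id = p2.txn_id AND p1.prod_id < p2.prod_id
JOIN products_per_txn AS p3
  ON p2.txn_id = p3.txn_id AND p2.prod_id < p3.prod_id
GROUP BY p1.prod_id, p2.prod_id, p3.prod_id
ORDER BY combination_count DESC
LIMIT 1;
```
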
517 | **Answer:**
518 |
519 | ***
520 |
521 | ## 📝 Reporting Challenge
522 |
523 | Write a single SQL script that combines all of the previous questions into a scheduled report that the Balanced Tree team can run at the beginning of each month to calculate the previous month’s values.
524 |
525 | Imagine that the Chief Financial Officer (which is also Danny) has asked for all of these questions at the end of every month.
526 |
527 | He first wants you to generate the data for January only - but then he also wants you to demonstrate that you can easily run the same analysis for February without many changes (if at all).
528 |
529 | Feel free to split up your final outputs into as many tables as you need - but be sure to explicitly reference which table outputs relate to which question for full marks :)
530 |
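One lightweight way to make the script month-agnostic is to stage the month's transactions once and point every report query at the staging table. A sketch:

```sql
-- Stage the January 2021 transactions once; every report query then reads from temp_sales.
DROP TABLE IF EXISTS temp_sales;
CREATE TEMP TABLE temp_sales AS
SELECT *
FROM balanced_tree.sales
WHERE DATE_TRUNC('month', start_txn_time) = '2021-01-01';
-- Re-running the report for February only requires changing the date above to '2021-02-01'.
```
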
531 | ***
532 |
533 | ## 💡 Bonus Challenge
534 |
535 | Use a single SQL query to transform the `product_hierarchy` and `product_prices` datasets to the `product_details` table.
536 |
537 | Hint: you may want to consider using a recursive CTE to solve this problem!
538 |
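The hint suggests a recursive CTE, but since the hierarchy is only three levels deep (category → segment → style), a non-recursive sketch with two self-joins also reproduces the `product_details` table:

```sql
SELECT
  pp.product_id,
  pp.price,
  CONCAT(style.level_text, ' ', segment.level_text, ' - ', category.level_text) AS product_name,
  category.id AS category_id,
  segment.id AS segment_id,
  style.id AS style_id,
  category.level_text AS category_name,
  segment.level_text AS segment_name,
  style.level_text AS style_name
FROM balanced_tree.product_hierarchy AS style
JOIN balanced_tree.product_hierarchy AS segment
  ON style.parent_id = segment.id
JOIN balanced_tree.product_hierarchy AS category
  ON segment.parent_id = category.id
JOIN balanced_tree.product_prices AS pp
  ON style.id = pp.id;
```
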
539 | ***
540 |
--------------------------------------------------------------------------------
/Case Study #8: Fresh Segments/A. Data Exploration and Cleansing.md:
--------------------------------------------------------------------------------
1 | # 🍅 Case Study #8 - Fresh Segments
2 |
3 | ## 🧼 Solution - A. Data Exploration and Cleansing
4 |
5 | ### 1. Update the `fresh_segments.interest_metrics` table by modifying the `month_year` column to be a `date` data type with the start of the month
6 |
7 | ```sql
8 | ALTER TABLE fresh_segments.interest_metrics
9 | ALTER month_year TYPE DATE USING month_year::DATE;
10 | ```
11 |
12 |
13 |
14 | ### 2. What is count of records in the `fresh_segments.interest_metrics` for each `month_year` value sorted in chronological order (earliest to latest) with the `null` values appearing first?
15 |
16 | ```sql
17 | SELECT
18 | month_year, COUNT(*)
19 | FROM fresh_segments.interest_metrics
20 | GROUP BY month_year
21 | ORDER BY month_year NULLS FIRST;
22 | ```
23 |
24 |
25 |
26 | ### 3. What do you think we should do with these `null` values in the `fresh_segments.interest_metrics`?
27 |
28 | The `null` values appear in `_month`, `_year`, `month_year`, and `interest_id`. The corresponding values in `composition`, `index_value`, `ranking`, and `percentile_ranking` fields are not meaningful without the specific information on `interest_id` and dates.
29 |
30 | Before dropping the values, it would be useful to find out the percentage of `null` values.
31 |
32 | ```sql
33 | SELECT
34 | ROUND(100 * (SUM(CASE WHEN interest_id IS NULL THEN 1 END) * 1.0 /
35 | COUNT(*)),2) AS null_perc
36 | FROM fresh_segments.interest_metrics
37 | ```
38 |
39 |
40 |
41 | The percentage of null values is 8.36%, which is less than 10%, hence I would suggest dropping all the `null` values.
42 |
43 | ```sql
44 | DELETE FROM fresh_segments.interest_metrics
45 | WHERE interest_id IS NULL;
46 |
47 | -- Run again to confirm that there are no null values.
48 | SELECT
49 | ROUND(100 * (SUM(CASE WHEN interest_id IS NULL THEN 1 END) * 1.0 /
50 | COUNT(*)),2) AS null_perc
51 | FROM fresh_segments.interest_metrics
52 | ```
53 |
54 |
55 |
56 | Confirmed that there are no `null` values in `fresh_segments.interest_metrics`.
57 |
58 | ### 4. How many `interest_id` values exist in the `fresh_segments.interest_metrics` table but not in the `fresh_segments.interest_map` table? What about the other way around?
59 |
60 | ```sql
61 | SELECT
62 | COUNT(DISTINCT map.id) AS map_id_count,
63 | COUNT(DISTINCT metrics.interest_id) AS metrics_id_count,
64 | SUM(CASE WHEN map.id is NULL THEN 1 END) AS not_in_metric,
65 | SUM(CASE WHEN metrics.interest_id is NULL THEN 1 END) AS not_in_map
66 | FROM fresh_segments.interest_map map
67 | FULL OUTER JOIN fresh_segments.interest_metrics metrics
68 | ON metrics.interest_id = map.id;
69 | ```
70 |
71 |
72 |
73 | - There are 1,209 unique `id`s in `interest_map`.
74 | - There are 1,202 unique `interest_id`s in `interest_metrics`.
75 | - There are no `interest_id`s in `interest_metrics` that do not appear in `interest_map`; all 1,202 `interest_id`s are present in the `interest_map` table.
76 | - There are 7 `id`s that did not appear in `interest_metrics`.
77 |
78 | ### 5. Summarise the id values in the `fresh_segments.interest_map` by its total record count in this table
79 |
80 | I found the solution for this question to be strange - hence I came up with another summary of the id values too.
81 |
82 | Original solution:
83 |
84 | ```sql
85 | SELECT COUNT(*)
86 | FROM fresh_segments.interest_map
87 | ```
88 |
89 |
90 |
91 | My solution:
92 |
93 | ```sql
94 | SELECT
95 | id,
96 | interest_name,
97 | COUNT(*)
98 | FROM fresh_segments.interest_map map
99 | JOIN fresh_segments.interest_metrics metrics
100 | ON map.id = metrics.interest_id
101 | GROUP BY id, interest_name
102 | ORDER BY count DESC, id;
103 | ```
104 |
105 |
106 |
107 | ### 6. What sort of table join should we perform for our analysis and why? Check your logic by checking the rows where 'interest_id = 21246' in your joined output and include all columns from `fresh_segments.interest_metrics` and all columns from `fresh_segments.interest_map` except from the id column.
108 |
109 | We should be using `INNER JOIN` to perform our analysis.
110 |
111 | ```sql
112 | SELECT *
113 | FROM fresh_segments.interest_map map
114 | INNER JOIN fresh_segments.interest_metrics metrics
115 | ON map.id = metrics.interest_id
116 | WHERE metrics.interest_id = 21246
117 | AND metrics._month IS NOT NULL; -- There were instances when interest_id is available, however the date values were not - hence filter them out.
118 | ```
119 |
120 |
121 |
122 | The results should come up to 10 rows only.
123 |
124 | ### 7. Are there any records in your joined table where the `month_year` value is before the `created_at` value from the `fresh_segments.interest_map` table? Do you think these values are valid and why?
125 |
126 | ```sql
127 | SELECT
128 | COUNT(*)
129 | FROM fresh_segments.interest_map map
130 | INNER JOIN fresh_segments.interest_metrics metrics
131 | ON map.id = metrics.interest_id
132 | WHERE metrics.month_year < map.created_at::DATE;
133 | ```
134 |
135 |
136 |
137 | There are 188 records where the `month_year` date is before the `created_at` date.
138 |
139 | However, it looks like these records are created in the same month as `month_year`. Do you remember that the `month_year` column's date is set to default on 1st day of the month?
140 |
141 |
142 |
143 | Running another test to see whether date in `month_year` and `created_at` are in the same month.
144 |
145 | ```sql
146 | SELECT
147 | COUNT(*)
148 | FROM fresh_segments.interest_map map
149 | INNER JOIN fresh_segments.interest_metrics metrics
150 | ON map.id = metrics.interest_id
151 | WHERE metrics.month_year < DATE_TRUNC('mon', map.created_at::DATE);
152 | ```
153 |
154 |
155 |
156 | Seems like all the records' dates are in the same month, hence we will consider the records as valid.
157 |
158 | ***
159 |
--------------------------------------------------------------------------------
/Case Study #8: Fresh Segments/B. Interest Analysis.md:
--------------------------------------------------------------------------------
1 | # 🍅 Case Study #8 - Fresh Segments
2 |
3 | ## 📚 Solution - B. Interest Analysis
4 |
5 | ### 1. Which interests have been present in all `month_year` dates in our dataset?
6 |
7 | First, find out how many unique `month_year` dates there are in the dataset.
8 |
9 | ```sql
10 | SELECT
11 | COUNT(DISTINCT month_year) AS unique_month_year_count,
12 | COUNT(DISTINCT interest_id) AS unique_interest_id_count
13 | FROM fresh_segments.interest_metrics;
14 | ```
15 |
16 |
17 |
18 | There are 14 distinct `month_year` dates and 1202 distinct `interest_id`s.
19 |
20 | ```sql
21 | WITH interest_cte AS (
22 | SELECT
23 | interest_id,
24 | COUNT(DISTINCT month_year) AS total_months
25 | FROM fresh_segments.interest_metrics
26 | WHERE month_year IS NOT NULL
27 | GROUP BY interest_id
28 | )
29 |
30 | SELECT
31 | c.total_months,
32 | COUNT(DISTINCT c.interest_id)
33 | FROM interest_cte c
34 | WHERE total_months = 14
35 | GROUP BY c.total_months
36 | ORDER BY count DESC;
37 | ```
38 |
39 |
40 |
41 | 480 interests out of 1202 interests are present in all the `month_year` dates.
42 |
43 | ### 2. Using this same total_months measure - calculate the cumulative percentage of all records starting at 14 months - which total_months value passes the 90% cumulative percentage value?
44 |
45 | Find the point at which interests that are present in only a few months stop performing well. For example, if interest id 101 appeared in only 6 months due to few or no clicks and interactions, we could consider cutting that interest off.
46 |
47 | ```sql
48 | WITH cte_interest_months AS (
49 | SELECT
50 | interest_id,
51 |   COUNT(DISTINCT month_year) AS total_months
52 | FROM fresh_segments.interest_metrics
53 | WHERE interest_id IS NOT NULL
54 | GROUP BY interest_id
55 | ),
56 | cte_interest_counts AS (
57 | SELECT
58 | total_months,
59 | COUNT(DISTINCT interest_id) AS interest_count
60 | FROM cte_interest_months
61 | GROUP BY total_months
62 | )
63 |
64 | SELECT
65 | total_months,
66 | interest_count,
67 | ROUND(100 * SUM(interest_count) OVER (ORDER BY total_months DESC) / -- Create running total field using cumulative values of interest count
68 | (SUM(INTEREST_COUNT) OVER ()),2) AS cumulative_percentage
69 | FROM cte_interest_counts;
70 | ```
71 |
72 |
73 |
74 | Interests that are present for 6 or more of the total months account for more than 90% of the cumulative records. Interests below this cut-off should be investigated to improve their clicks and customer interactions.
75 |
76 | ### 3. If we were to remove all `interest_id` values which are lower than the `total_months` value we found in the previous question - how many total data points would we be removing?
77 |
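A sketch, reusing the `total_months` logic and the 6-month cut-off identified above, to count how many rows would be removed:

```sql
WITH interest_months AS (
  SELECT
    interest_id,
    COUNT(DISTINCT month_year) AS total_months
  FROM fresh_segments.interest_metrics
  WHERE interest_id IS NOT NULL
  GROUP BY interest_id
)

SELECT COUNT(*) AS removed_data_points
FROM fresh_segments.interest_metrics metrics
JOIN interest_months im
  ON metrics.interest_id = im.interest_id
WHERE im.total_months < 6;
```
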
78 | ### 4. Does this decision make sense to remove these data points from a business perspective? Use an example where there are all 14 months present to a removed interest example for your arguments - think about what it means to have less months present from a segment perspective.
79 |
80 | ### 5. If we include all of our interests regardless of their counts - how many unique interests are there for each month?
81 |
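A sketch: count the distinct interests per `month_year` without any filtering on `total_months`:

```sql
SELECT
  month_year,
  COUNT(DISTINCT interest_id) AS unique_interest_count
FROM fresh_segments.interest_metrics
WHERE month_year IS NOT NULL
GROUP BY month_year
ORDER BY month_year;
```
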
--------------------------------------------------------------------------------
/Case Study #8: Fresh Segments/README.md:
--------------------------------------------------------------------------------
1 | # Case Study #8: Fresh Segments
2 |
3 |
4 |
5 | ## 📚 Table of Contents
6 | - [Business Task](#business-task)
7 | - [Entity Relationship Diagram](#entity-relationship-diagram)
8 | - [Question and Solution](#question-and-solution)
9 |
10 | Please note that all the information regarding the case study has been sourced from the following link: [here](https://8weeksqlchallenge.com/case-study-8/).
11 |
12 | ***
13 |
14 | ## Business Task
15 |
16 | Fresh Segments is a digital marketing agency that helps other businesses analyse trends in online ad click behaviour for their unique customer base.
17 |
18 | Clients share their customer lists with the Fresh Segments team who then aggregate interest metrics and generate a single dataset worth of metrics for further analysis.
19 |
20 | In particular - the composition and rankings for different interests are provided for each client showing the proportion of their customer list who interacted with online assets related to each interest for each month.
21 |
22 | Danny has asked for your assistance to analyse aggregated metrics for an example client and provide some high level insights about the customer list and their interests.
23 |
24 | ***
25 |
26 | ## Entity Relationship Diagram
27 |
28 | **Table: `interest_map`**
29 |
30 | | id | interest_name | interest_summary | created_at | last_modified |
31 | |----|---------------------------|------------------------------------------------------------------------------------|-------------------------|-------------------------|
32 | | 1 | Fitness Enthusiasts | Consumers using fitness tracking apps and websites. | 2016-05-26T14:57:59.000 | 2018-05-23T11:30:12.000 |
33 | | 2 | Gamers | Consumers researching game reviews and cheat codes. | 2016-05-26T14:57:59.000 | 2018-05-23T11:30:12.000 |
34 | | 3 | Car Enthusiasts | Readers of automotive news and car reviews. | 2016-05-26T14:57:59.000 | 2018-05-23T11:30:12.000 |
35 | | 4 | Luxury Retail Researchers | Consumers researching luxury product reviews and gift ideas. | 2016-05-26T14:57:59.000 | 2018-05-23T11:30:12.000 |
36 | | 5 | Brides & Wedding Planners | People researching wedding ideas and vendors. | 2016-05-26T14:57:59.000 | 2018-05-23T11:30:12.000 |
37 | | 6 | Vacation Planners | Consumers reading reviews of vacation destinations and accommodations. | 2016-05-26T14:57:59.000 | 2018-05-23T11:30:13.000 |
38 | | 7 | Motorcycle Enthusiasts | Readers of motorcycle news and reviews. | 2016-05-26T14:57:59.000 | 2018-05-23T11:30:13.000 |
39 | | 8 | Business News Readers | Readers of online business news content. | 2016-05-26T14:57:59.000 | 2018-05-23T11:30:12.000 |
40 | | 12 | Thrift Store Shoppers | Consumers shopping online for clothing at thrift stores and researching locations. | 2016-05-26T14:57:59.000 | 2018-03-16T13:14:00.000 |
41 | | 13 | Advertising Professionals | People who read advertising industry news. | 2016-05-26T14:57:59.000 | 2018-05-23T11:30:12.000 |
42 |
43 | **Table: `interest_metrics`**
44 |
45 | | month | year | month_year | interest_id | composition | index_value | ranking | percentile_ranking |
46 | |-------|------|------------|-------------|-------------|-------------|---------|--------------------|
47 | | 7 | 2018 | Jul-18 | 32486 | 11.89 | 6.19 | 1 | 99.86 |
48 | | 7 | 2018 | Jul-18 | 6106 | 9.93 | 5.31 | 2 | 99.73 |
49 | | 7 | 2018 | Jul-18 | 18923 | 10.85 | 5.29 | 3 | 99.59 |
50 | | 7 | 2018 | Jul-18 | 6344 | 10.32 | 5.1 | 4 | 99.45 |
51 | | 7 | 2018 | Jul-18 | 100 | 10.77 | 5.04 | 5 | 99.31 |
52 | | 7 | 2018 | Jul-18 | 69 | 10.82 | 5.03 | 6 | 99.18 |
53 | | 7 | 2018 | Jul-18 | 79 | 11.21 | 4.97 | 7 | 99.04 |
54 | | 7 | 2018 | Jul-18 | 6111 | 10.71 | 4.83 | 8 | 98.9 |
55 | | 7 | 2018 | Jul-18 | 6214 | 9.71 | 4.83 | 8 | 98.9 |
56 | | 7 | 2018 | Jul-18 | 19422 | 10.11 | 4.81 | 10 | 98.63 |
57 |
58 | ***
59 |
60 | ## Question and Solution
61 |
62 | Please join me in executing the queries using PostgreSQL on [DB Fiddle](https://www.db-fiddle.com/f/jmnwogTsUE8hGqkZv9H7E8/17). It would be great to work together on the questions!
63 |
64 | If you have any questions, reach out to me on [LinkedIn](https://www.linkedin.com/in/katiehuangx/).
65 |
66 | ## 🧼 A. Data Exploration and Cleansing
67 |
68 | **1. Update the `fresh_segments.interest_metrics` table by modifying the `month_year` column to be a `date` data type with the start of the month**
69 |
70 | ```sql
71 | ALTER TABLE fresh_segments.interest_metrics
72 | ALTER month_year TYPE DATE USING month_year::DATE;
73 | ```
74 |
75 |
76 |
77 | ***
78 |
79 | **2. What is count of records in the `fresh_segments.interest_metrics` for each `month_year` value sorted in chronological order (earliest to latest) with the `null` values appearing first?**
80 |
81 | ```sql
82 | SELECT
83 | month_year, COUNT(*)
84 | FROM fresh_segments.interest_metrics
85 | GROUP BY month_year
86 | ORDER BY month_year NULLS FIRST;
87 | ```
88 |
89 |
90 |
91 | **3. What do you think we should do with these `null` values in the `fresh_segments.interest_metrics`?**
92 |
93 | The `null` values appear in `_month`, `_year`, `month_year`, and `interest_id`. The corresponding values in `composition`, `index_value`, `ranking`, and `percentile_ranking` fields are not meaningful without the specific information on `interest_id` and dates.
94 |
95 | Before dropping the values, it would be useful to find out the percentage of `null` values.
96 |
97 | ```sql
98 | SELECT
99 | ROUND(100 * (SUM(CASE WHEN interest_id IS NULL THEN 1 END) * 1.0 /
100 | COUNT(*)),2) AS null_perc
101 | FROM fresh_segments.interest_metrics
102 | ```
103 |
104 |
105 |
106 | The percentage of null values is 8.36%, which is less than 10%, hence I would suggest dropping all the `null` values.
107 |
108 | ```sql
109 | DELETE FROM fresh_segments.interest_metrics
110 | WHERE interest_id IS NULL;
111 |
112 | -- Run again to confirm that there are no null values.
113 | SELECT
114 | ROUND(100 * (SUM(CASE WHEN interest_id IS NULL THEN 1 END) * 1.0 /
115 | COUNT(*)),2) AS null_perc
116 | FROM fresh_segments.interest_metrics
117 | ```
118 |
119 |
120 |
121 | Confirmed that there are no `null` values in `fresh_segments.interest_metrics`.
122 |
123 | ***
124 |
125 | **4. How many `interest_id` values exist in the `fresh_segments.interest_metrics` table but not in the `fresh_segments.interest_map` table? What about the other way around?**
126 |
127 | ```sql
128 | SELECT
129 | COUNT(DISTINCT map.id) AS map_id_count,
130 | COUNT(DISTINCT metrics.interest_id) AS metrics_id_count,
131 | SUM(CASE WHEN map.id is NULL THEN 1 END) AS not_in_metric,
132 | SUM(CASE WHEN metrics.interest_id is NULL THEN 1 END) AS not_in_map
133 | FROM fresh_segments.interest_map map
134 | FULL OUTER JOIN fresh_segments.interest_metrics metrics
135 | ON metrics.interest_id = map.id;
136 | ```
137 |
138 |
139 |
140 | - There are 1,209 unique `id`s in `interest_map`.
141 | - There are 1,202 unique `interest_id`s in `interest_metrics`.
142 | - There are no `interest_id`s in `interest_metrics` that do not appear in `interest_map`; all 1,202 `interest_id`s are present in the `interest_map` table.
143 | - There are 7 `id`s that did not appear in `interest_metrics`.
144 |
145 | ***
146 |
147 | **5. Summarise the id values in the `fresh_segments.interest_map` by its total record count in this table.**
148 |
149 | I found the solution for this question to be strange - hence I came up with another summary of the id values too.
150 |
151 | Original solution:
152 |
153 | ```sql
154 | SELECT COUNT(*)
155 | FROM fresh_segments.interest_map
156 | ```
157 |
158 |
159 |
160 | My solution:
161 |
162 | ```sql
163 | SELECT
164 | id,
165 | interest_name,
166 | COUNT(*)
167 | FROM fresh_segments.interest_map map
168 | JOIN fresh_segments.interest_metrics metrics
169 | ON map.id = metrics.interest_id
170 | GROUP BY id, interest_name
171 | ORDER BY count DESC, id;
172 | ```
173 |
174 |
175 |
176 | ***
177 |
178 | **6. What sort of table join should we perform for our analysis and why? Check your logic by checking the rows where 'interest_id = 21246' in your joined output and include all columns from `fresh_segments.interest_metrics` and all columns from `fresh_segments.interest_map` except from the id column.**
179 |
180 | We should be using `INNER JOIN` to perform our analysis.
181 |
182 | ```sql
183 | SELECT *
184 | FROM fresh_segments.interest_map map
185 | INNER JOIN fresh_segments.interest_metrics metrics
186 | ON map.id = metrics.interest_id
187 | WHERE metrics.interest_id = 21246
188 |   AND metrics._month IS NOT NULL; -- Some rows have an interest_id but no date values, so filter them out.
189 | ```
190 |
191 |
192 |
193 | The output should contain only 10 rows.
194 |
195 | ***
196 |
197 | **7. Are there any records in your joined table where the `month_year` value is before the `created_at` value from the `fresh_segments.interest_map` table? Do you think these values are valid and why?**
198 |
199 | ```sql
200 | SELECT
201 | COUNT(*)
202 | FROM fresh_segments.interest_map map
203 | INNER JOIN fresh_segments.interest_metrics metrics
204 | ON map.id = metrics.interest_id
205 | WHERE metrics.month_year < map.created_at::DATE;
206 | ```
207 |
208 |
209 |
210 | There are 188 records where the `month_year` date is before the `created_at` date.
211 |
212 | However, these records appear to have been created in the same month as `month_year`. Recall that the `month_year` column was defaulted to the 1st day of the month, so it can legitimately fall before a `created_at` timestamp later in that same month.
213 |
214 |
215 |
216 | Let's run another check to see whether the `month_year` and `created_at` dates fall within the same month.
217 |
218 | ```sql
219 | SELECT
220 | COUNT(*)
221 | FROM fresh_segments.interest_map map
222 | INNER JOIN fresh_segments.interest_metrics metrics
223 | ON map.id = metrics.interest_id
224 | WHERE metrics.month_year < DATE_TRUNC('month', map.created_at::DATE);
225 | ```
226 |
227 |
228 |
229 | The count is 0, so every record's `month_year` and `created_at` dates fall within the same month, and we will treat these records as valid.
230 |
231 | ***
232 |
233 | ## 📚 B. Interest Analysis
234 |
235 | **1. Which interests have been present in all `month_year` dates in our dataset?**
236 |
237 | First, find out how many unique `month_year` dates there are in the dataset.
238 |
239 | ```sql
240 | SELECT
241 | COUNT(DISTINCT month_year) AS unique_month_year_count,
242 | COUNT(DISTINCT interest_id) AS unique_interest_id_count
243 | FROM fresh_segments.interest_metrics;
244 | ```
245 |
246 |
247 |
248 | There are 14 distinct `month_year` dates and 1202 distinct `interest_id`s.
249 |
250 | ```sql
251 | WITH interest_cte AS (
252 | SELECT
253 | interest_id,
254 | COUNT(DISTINCT month_year) AS total_months
255 | FROM fresh_segments.interest_metrics
256 | WHERE month_year IS NOT NULL
257 | GROUP BY interest_id
258 | )
259 |
260 | SELECT
261 | c.total_months,
262 |   COUNT(DISTINCT c.interest_id) AS interest_count
263 | FROM interest_cte c
264 | WHERE total_months = 14
265 | GROUP BY c.total_months
266 | ORDER BY interest_count DESC;
267 | ```
268 |
269 |
270 |
271 | 480 out of the 1,202 interests are present in all 14 `month_year` dates.
272 |
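To see which interests these are (together with their names), a possible follow-up query, reusing the same counting logic and joining to `interest_map`, is:

```sql
-- Interests that appear in every one of the 14 month_year dates
SELECT
  metrics.interest_id,
  map.interest_name
FROM fresh_segments.interest_metrics metrics
JOIN fresh_segments.interest_map map
  ON metrics.interest_id = map.id
WHERE metrics.month_year IS NOT NULL
GROUP BY metrics.interest_id, map.interest_name
HAVING COUNT(DISTINCT metrics.month_year) = 14
ORDER BY metrics.interest_id;
```
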
273 | ***
274 |
275 | **2. Using this same total_months measure - calculate the cumulative percentage of all records starting at 14 months - which total_months value passes the 90% cumulative percentage value?**
276 |
277 | The goal is to find the cut-off point below which interests that are present in only a few months are not performing well. For example, if an interest (say, id 101) appeared in only 6 months because it received few or no clicks and interactions, we could consider cutting that interest off.
278 |
279 | ```sql
280 | WITH cte_interest_months AS (
281 | SELECT
282 | interest_id,
283 |   COUNT(DISTINCT month_year) AS total_months
284 | FROM fresh_segments.interest_metrics
285 | WHERE interest_id IS NOT NULL
286 | GROUP BY interest_id
287 | ),
288 | cte_interest_counts AS (
289 | SELECT
290 | total_months,
291 | COUNT(DISTINCT interest_id) AS interest_count
292 | FROM cte_interest_months
293 | GROUP BY total_months
294 | )
295 |
296 | SELECT
297 | total_months,
298 | interest_count,
299 | ROUND(100 * SUM(interest_count) OVER (ORDER BY total_months DESC) / -- Create running total field using cumulative values of interest count
300 |     (SUM(interest_count) OVER ()),2) AS cumulative_percentage
301 | FROM cte_interest_counts;
302 | ```
303 |
304 |
305 |
306 | The cumulative percentage passes 90% at `total_months` = 6, meaning interests present in 6 or more months account for over 90% of the records. Interests below this cut-off should be investigated to improve their clicks and customer interactions.
307 | ***
308 |
309 | **3. If we were to remove all `interest_id` values which are lower than the `total_months` value we found in the previous question - how many total data points would we be removing?**
310 |
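The write-up does not include a worked solution here, but a possible approach (a sketch that reuses the `total_months` idea from the previous question and the cut-off of 6 months found above) is to count the metric rows belonging to interests that appear in fewer than 6 months:

```sql
-- Data points that would be removed for interests present in fewer than 6 month_year dates
WITH interest_months AS (
  SELECT
    interest_id,
    COUNT(DISTINCT month_year) AS total_months
  FROM fresh_segments.interest_metrics
  GROUP BY interest_id
)

SELECT COUNT(*) AS removed_data_points
FROM fresh_segments.interest_metrics metrics
JOIN interest_months im
  ON metrics.interest_id = im.interest_id
WHERE im.total_months < 6;
```
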
311 | ***
312 |
313 | **4. Does this decision make sense to remove these data points from a business perspective? Use an example where there are all 14 months present to a removed interest example for your arguments - think about what it means to have less months present from a segment perspective.**
314 |
315 | ***
316 |
317 | **5. If we include all of our interests regardless of their counts - how many unique interests are there for each month?**
318 |
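This question is also left unanswered in the write-up; a straightforward sketch is a per-month distinct count:

```sql
-- Unique interests per month, keeping every interest regardless of its count
SELECT
  month_year,
  COUNT(DISTINCT interest_id) AS unique_interest_count
FROM fresh_segments.interest_metrics
WHERE month_year IS NOT NULL
GROUP BY month_year
ORDER BY month_year;
```
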
319 | ***
320 |
321 | ## 🧩 C. Segment Analysis
322 |
323 | 1. Using the complete dataset - which are the top 10 and bottom 10 interests which have the largest composition values in any month_year? Only use the maximum composition value for each interest but you must keep the corresponding month_year
324 | 2. Which 5 interests had the lowest average ranking value?
325 | 3. Which 5 interests had the largest standard deviation in their percentile_ranking value?
326 | 4. For the 5 interests found in the previous question - what was minimum and maximum percentile_ranking values for each interest and its corresponding year_month value? Can you describe what is happening for these 5 interests?
327 | 5. How would you describe our customers in this segment based off their composition and ranking values? What sort of products or services should we show to these customers and what should we avoid?
328 |
329 | ***
330 |
331 | ## 👆🏻 D. Index Analysis
332 |
333 | The `index_value` is a measure which can be used to reverse-calculate the average composition for Fresh Segments’ clients. The average composition can be calculated by dividing the `composition` column by the `index_value` column, rounded to 2 decimal places.
334 |
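In SQL, that calculation is simply a derived column (a sketch; the `avg_composition` alias is my own, and the cast to `NUMERIC` is there because `ROUND` with a precision argument expects a numeric value):

```sql
-- Reverse-calculate the average composition from composition and index_value
SELECT
  month_year,
  interest_id,
  ROUND((composition / index_value)::NUMERIC, 2) AS avg_composition
FROM fresh_segments.interest_metrics;
```
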
335 | 1. What is the top 10 interests by the average composition for each month?
336 | 2. For all of these top 10 interests - which interest appears the most often?
337 | 3. What is the average of the average composition for the top 10 interests for each month?
338 | 4. What is the 3 month rolling average of the max average composition value from September 2018 to August 2019 and include the previous top ranking interests in the same output shown below.
339 | 5. Provide a possible reason why the max average composition might change from month to month? Could it signal something is not quite right with the overall business model for Fresh Segments?
340 |
341 | ***
342 |
343 | Do give me a 🌟 if you like what you're reading. Thank you! 🙆🏻♀️
344 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # 8-Week SQL Challenges
2 |
3 | This repository serves as the solution for the 8 case studies from the **[#8WeekSQLChallenge](https://8weeksqlchallenge.com)**.
4 |
5 | It showcases my ability to tackle various SQL challenges and demonstrates my proficiency in SQL query writing and problem-solving skills.
6 |
7 | A special thanks to [Data with Danny](https://www.linkedin.com/company/datawithdanny/) for creating these insightful and engaging SQL case studies! 👋🏻
8 |
9 | ## 📚 Table of Contents
10 |
11 | Please find the solution links for the case studies below. Simply click on the links to access each solution.
12 | - [Case Study #1: Danny's Diner](https://github.com/katiehuangx/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/README.md)
13 | - [Case Study #2: Pizza Runner](https://github.com/katiehuangx/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/README.md)
14 | - [Case Study #3: Foodie-Fi](https://github.com/katiehuangx/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/README.md)
15 | - [Case Study #4: Data Bank](https://github.com/katiehuangx/8-Week-SQL-Challenge/blob/main/Case%20Study%20%234%20-%20Data%20Bank/README.md)
16 | - [Case Study #5: Data Mart](https://github.com/katiehuangx/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/README.md)
17 | - [Case Study #6: Clique Bait](https://github.com/katiehuangx/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/README.md)
18 | - [Case Study #7: Balanced Tree](https://github.com/katiehuangx/8-Week-SQL-Challenge/tree/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co.)
19 | - [Case Study #8: Fresh Segments](https://github.com/katiehuangx/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%3A%20Fresh%20Segments/README.md)
20 |
21 | ## Important Note Regarding My Solutions
22 |
23 | Hello visitors,
24 |
25 | I wanted to address a concerning issue. It has come to my attention that some of the SQL case study solutions on this repository have been copied without permission. I want to emphasize that this behaviour goes against the principles of academic integrity and intellectual honesty.
26 |
27 | Creating these solutions has been a labour of love, involving substantial time and effort. Therefore, I kindly ask that you avoid copying them word for word. Instead, I encourage you to utilize them as a reference and derive inspiration from the underlying concepts, applying them in your own unique way.
28 |
29 | If you have any questions or need assistance, please feel free to reach out to me. I'm here to support your learning journey and provide guidance.
30 |
31 | Thank you for your understanding and for being a part of this community.
32 |
33 | Best wishes,
34 |
35 | Katie
36 |
--------------------------------------------------------------------------------