├── .gitattributes ├── Case Study #1 - Danny's Diner ├── README.md ├── query-sql.sql └── schema-sql.sql ├── Case Study #2 - Pizza Runner ├── README.md ├── query-sql.sql └── schema-sql.sql ├── Case Study #3 - Foodie-Fi ├── README.md ├── query-sql.sql └── schema.sql ├── Case Study #4 - Data Bank ├── README.md └── query-sql.sql ├── Case Study #7 - Balanced Tree Clothing Co ├── README.md ├── query-sql.sql └── schema-sql.sql ├── IMG ├── 1.png ├── 2.png ├── 3.png ├── 4.png ├── org-1.png ├── org-2.png ├── org-3.png ├── org-4.png ├── org-7.png └── timeline.png └── README.md /.gitattributes: -------------------------------------------------------------------------------- 1 | *.sql linguist-detectable=true 2 | *.sql linguist-language=sql 3 | *.sql text -------------------------------------------------------------------------------- /Case Study #1 - Danny's Diner/README.md: -------------------------------------------------------------------------------- 1 | # [8-Week SQL Challenge](https://github.com/ndleah/8-Week-SQL-Challenge) 2 | ![Star Badge](https://img.shields.io/static/v1?label=%F0%9F%8C%9F&message=If%20Useful&style=flat&color=BC4E99) 3 | [![View Main Folder](https://img.shields.io/badge/View-Main_Folder-971901?)](https://github.com/ndleah/8-Week-SQL-Challenge) 4 | [![View Repositories](https://img.shields.io/badge/View-My_Repositories-blue?logo=GitHub)](https://github.com/ndleah?tab=repositories) 5 | [![View My Profile](https://img.shields.io/badge/View-My_Profile-green?logo=GitHub)](https://github.com/ndleah) 6 | 7 | # 🍜 Case Study #1 - Danny's Diner 8 |

9 | 10 | 11 | ## 📕 Table Of Contents 12 | * 🛠️ [Problem Statement](#problem-statement) 13 | * 📂 [Dataset](#dataset) 14 | * 🧙‍♂️ [Case Study Questions](#case-study-questions) 15 | * 🚀 [Solutions](#solutions) 16 | * 🐋 [Limitations](#limitations) 17 | 18 | --- 19 | 20 | ## 🛠️ Problem Statement 21 | 22 | > Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they’ve spent and also which menu items are their favourite. Having this deeper connection with his customers will help him deliver a better and more personalised experience for his loyal customers. 23 | 24 |
25 | 26 | --- 27 | 28 | ## 📂 Dataset 29 | Danny has shared with you 3 key datasets for this case study: 30 | 31 | ### **```sales```** 32 | 33 |

34 | 35 | View table 36 | 37 | 38 | The sales table captures all ```customer_id```-level purchases with a corresponding ```order_date``` and ```product_id``` showing when and what menu items were ordered. 39 | 40 | |customer_id|order_date|product_id| 41 | |-----------|----------|----------| 42 | |A |2021-01-01|1 | 43 | |A |2021-01-01|2 | 44 | |A |2021-01-07|2 | 45 | |A |2021-01-10|3 | 46 | |A |2021-01-11|3 | 47 | |A |2021-01-11|3 | 48 | |B |2021-01-01|2 | 49 | |B |2021-01-02|2 | 50 | |B |2021-01-04|1 | 51 | |B |2021-01-11|1 | 52 | |B |2021-01-16|3 | 53 | |B |2021-02-01|3 | 54 | |C |2021-01-01|3 | 55 | |C |2021-01-01|3 | 56 | |C |2021-01-07|3 | 57 | 58 |
59 | 60 | ### **```menu```** 61 | 62 |
63 | 64 | View table 65 | 66 | 67 | The menu table maps the ```product_id``` to the actual ```product_name``` and price of each menu item. 68 | 69 | |product_id |product_name|price | 70 | |-----------|------------|----------| 71 | |1 |sushi |10 | 72 | |2 |curry |15 | 73 | |3 |ramen |12 | 74 | 75 |
76 | 77 | ### **```members```** 78 | 79 |
80 | 81 | View table 82 | 83 | 84 | The final members table captures the ```join_date``` when a ```customer_id``` joined the beta version of the Danny’s Diner loyalty program. 85 | 86 | |customer_id|join_date | 87 | |-----------|----------| 88 | |A |2021-01-07| 89 | |B |2021-01-09| 90 | 91 |
92 | 93 | ## 🧙‍♂️ Case Study Questions 94 |

95 | 96 | 97 | 1. What is the total amount each customer spent at the restaurant? 98 | 2. How many days has each customer visited the restaurant? 99 | 3. What was the first item from the menu purchased by each customer? 100 | 4. What is the most purchased item on the menu and how many times was it purchased by all customers? 101 | 5. Which item was the most popular for each customer? 102 | 6. Which item was purchased first by the customer after they became a member? 103 | 7. Which item was purchased just before the customer became a member? 104 | 8. What is the total items and amount spent for each member before they became a member? 105 | 9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have? 106 | 10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January? 107 | 108 |
109 | 110 | ## 🚀 Solutions 111 | 112 | ### **Q1. What is the total amount each customer spent at the restaurant?** 113 | ```sql 114 | SELECT 115 | sales.customer_id, 116 | SUM(menu.price) AS total_spent 117 | FROM dannys_diner.sales 118 | JOIN dannys_diner.menu 119 | ON sales.product_id = menu.product_id 120 | GROUP BY customer_id 121 | ORDER BY customer_id; 122 | ``` 123 | 124 | | customer_id | total_spent | 125 | | ----------- | ----------- | 126 | | A | 76 | 127 | | B | 74 | 128 | | C | 36 | 129 | 130 | --- 131 | 132 | ### **Q2. How many days has each customer visited the restaurant?** 133 | ```sql 134 | SELECT 135 | customer_id, 136 | COUNT (DISTINCT order_date) AS visited_days 137 | FROM dannys_diner.sales 138 | GROUP BY customer_id; 139 | ``` 140 | 141 | |customer_id|visited_days| 142 | |-----------|------------| 143 | |A |4 | 144 | |B |6 | 145 | |C |2 | 146 | 147 | 148 | --- 149 | 150 | ### **Q3. What was the first item from the menu purchased by each customer?** 151 | > ⚠️ Access [**here**](#️-question-3-what-was-the-first-item-from-the-menu-purchased-by-each-customer) to view the limitations of this question 152 | 153 | ```sql 154 | WITH cte_order AS ( 155 | SELECT 156 | sales.customer_id, 157 | menu.product_name, 158 | ROW_NUMBER() OVER( 159 | PARTITION BY sales.customer_id 160 | ORDER BY 161 | sales.order_date, 162 | sales.product_id 163 | ) AS item_order 164 | FROM dannys_diner.sales 165 | JOIN dannys_diner.menu 166 | ON sales.product_id = menu.product_id 167 | ) 168 | SELECT * FROM cte_order 169 | WHERE item_order = 1; 170 | ``` 171 | 172 | **Result:** 173 | | customer_id | product_name | item_order | 174 | | ----------- | ------------ | ---------- | 175 | | A | sushi | 1 | 176 | | B | curry | 1 | 177 | | C | ramen | 1 | 178 | 179 | --- 180 | 181 | ### **Q4. 
What is the most purchased item on the menu and how many times was it purchased by all customers?** 182 | ```sql 183 | SELECT 184 | menu.product_name, 185 | COUNT(sales.product_id) AS order_count 186 | FROM dannys_diner.sales 187 | INNER JOIN dannys_diner.menu 188 | ON sales.product_id = menu.product_id 189 | GROUP BY 190 | menu.product_name 191 | ORDER BY order_count DESC 192 | LIMIT 1; 193 | ``` 194 | 195 | |product_name|order_count| 196 | |------------|-----------| 197 | |ramen |8 | 198 | 199 | --- 200 | 201 | ### **Q5. Which item was the most popular for each customer?** 202 | > ⚠️ Access [**here**](#️-question-5-which-item-was-the-most-popular-for-each-customer) to view the limitations of this question 203 | ```sql 204 | WITH cte_order_count AS ( 205 | SELECT 206 | sales.customer_id, 207 | menu.product_name, 208 | COUNT(*) AS order_count 209 | FROM dannys_diner.sales 210 | JOIN dannys_diner.menu 211 | ON sales.product_id = menu.product_id 212 | GROUP BY 213 | customer_id, 214 | product_name 215 | ORDER BY 216 | customer_id, 217 | order_count DESC 218 | ), 219 | cte_popular_rank AS ( 220 | SELECT 221 | *, 222 | RANK() OVER(PARTITION BY customer_id ORDER BY order_count DESC) AS rank 223 | FROM cte_order_count 224 | ) 225 | SELECT * FROM cte_popular_rank 226 | WHERE rank = 1; 227 | ``` 228 | 229 | --- 230 | 231 | **Note:** Before answering **questions 6-10**, I created a **```membership_validation```** table to keep only those customers who joined the membership program: 232 | ```sql 233 | DROP TABLE IF EXISTS membership_validation; 234 | CREATE TEMP TABLE membership_validation AS 235 | SELECT 236 | sales.customer_id, 237 | sales.order_date, 238 | menu.product_name, 239 | menu.price, 240 | members.join_date, 241 | CASE WHEN sales.order_date >= members.join_date 242 | THEN 'X' 243 | ELSE '' 244 | END AS membership 245 | FROM dannys_diner.sales 246 | INNER JOIN dannys_diner.menu 247 | ON sales.product_id = menu.product_id 248 | LEFT JOIN 
dannys_diner.members 249 | ON sales.customer_id = members.customer_id 250 | -- the WHERE clause on the join_date column excludes customers who haven't joined the membership program (no join date = not a member) 251 | WHERE join_date IS NOT NULL 252 | ORDER BY 253 | customer_id, 254 | order_date; 255 | ``` 256 | 257 | --- 258 | 259 | ### **Q6. Which item was purchased first by the customer after they became a member?** 260 | > ⚠️ Access [**here**](#️-question-6-which-item-was-purchased-first-by-the-customer-after-they-became-a-member) to view the limitations of this question 261 | 262 | **Note:** In this question, orders made on the join date count as post-membership orders, so they are eligible to be the first order 263 | 264 | ```sql 265 | WITH cte_first_after_mem AS ( 266 | SELECT 267 | customer_id, 268 | product_name, 269 | order_date, 270 | RANK() OVER( 271 | PARTITION BY customer_id 272 | ORDER BY order_date) AS purchase_order 273 | FROM membership_validation 274 | WHERE membership = 'X' 275 | ) 276 | SELECT * FROM cte_first_after_mem 277 | WHERE purchase_order = 1; 278 | ``` 279 | 280 | | customer_id | product_name | order_date | purchase_order | 281 | | ----------- | ------------ | ------------------------ | -------------- | 282 | | A | curry | 2021-01-07T00:00:00.000Z | 1 | 283 | | B | sushi | 2021-01-11T00:00:00.000Z | 1 | 284 | 285 | --- 286 | 287 | ### **Q7. 
Which item was purchased just before the customer became a member?** 288 | > ⚠️ Access [**here**](#️-question-7-which-item-was-purchased-just-before-the-customer-became-a-member) to view the limitations of this question 289 | 290 | ```sql 291 | WITH cte_last_before_mem AS ( 292 | SELECT 293 | customer_id, 294 | product_name, 295 | order_date, 296 | RANK() OVER( 297 | PARTITION BY customer_id 298 | ORDER BY order_date DESC) AS purchase_order 299 | FROM membership_validation 300 | WHERE membership = '' 301 | ) 302 | SELECT * FROM cte_last_before_mem 303 | --since we used ORDER BY order_date DESC in the window above, rank 1 marks the last order date before the customer joined the membership 304 | WHERE purchase_order = 1; 305 | ``` 306 | 307 | | customer_id | product_name | order_date | purchase_order | 308 | | ----------- | ------------ | ------------------------ | -------------- | 309 | | A | sushi | 2021-01-01T00:00:00.000Z | 1 | 310 | | A | curry | 2021-01-01T00:00:00.000Z | 1 | 311 | | B | sushi | 2021-01-04T00:00:00.000Z | 1 | 312 | 313 | --- 314 | 315 | ### **Q8. What is the total items and amount spent for each member before they became a member?** 316 | ```sql 317 | WITH cte_spent_before_mem AS ( 318 | SELECT 319 | customer_id, 320 | product_name, 321 | price 322 | FROM membership_validation 323 | WHERE membership = '' 324 | ) 325 | SELECT 326 | customer_id, 327 | SUM(price) AS total_spent, 328 | COUNT(*) AS total_items 329 | FROM cte_spent_before_mem 330 | GROUP BY customer_id 331 | ORDER BY customer_id; 332 | ``` 333 | 334 | | customer_id | total_spent | total_items | 335 | | ----------- | ----------- | ----------- | 336 | | A | 25 | 2 | 337 | | B | 40 | 3 | 338 | 339 | 340 | --- 341 | 342 | ### **Q9. 
If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?** 343 | ```sql 344 | SELECT 345 | customer_id, 346 | SUM( 347 | CASE WHEN product_name = 'sushi' 348 | THEN (price * 20) 349 | ELSE (price * 10) 350 | END 351 | ) AS total_points 352 | FROM membership_validation 353 | GROUP BY customer_id 354 | ORDER BY customer_id; 355 | ``` 356 | 357 | | customer_id | total_points | 358 | | ----------- | ------------ | 359 | | A | 860 | 360 | | B | 940 | 361 | 362 | --- 363 | 364 | ### **Q10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?** 365 | 366 | > ⚠️ Access [**here**](#️-question-10-in-the-first-week-after-a-customer-joins-the-program-including-their-join-date-they-earn-2x-points-on-all-items-not-just-sushi---how-many-points-do-customer-a-and-b-have-at-the-end-of-january) to view the limitations of this question 367 | 368 | If we combine the condition from [**question 9**](#q9-if-each-1-spent-equates-to-10-points-and-sushi-has-a-2x-points-multiplier---how-many-points-would-each-customer-have) with the condition in this question, we have 2 point calculation cases: 369 | - **Normal condition:** **```product_name = 'sushi'```** earns 2x points, everything **```else```** earns 1x points 370 | - **First-week membership condition:** all menu items earn 2x points 371 | 372 | I have created a timeline to illustrate where each condition applies: 373 |

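The two conditions above can also be expressed as a tiny points function. The following is a minimal Python sketch (an illustration, not part of the original SQL solution), assuming the $1 = 10 points base rate from question 9:

```python
def order_points(price: int, product: str, in_first_week: bool) -> int:
    """Points for one order: $1 spent = 10 points, doubled for sushi
    or for any item bought within the first week of membership."""
    multiplier = 2 if (product == "sushi" or in_first_week) else 1
    return price * 10 * multiplier

# A $10 sushi order always earns double points:
print(order_points(10, "sushi", False))   # → 200
# A $12 ramen order earns double only inside the first week:
print(order_points(12, "ramen", True))    # → 240
print(order_points(12, "ramen", False))   # → 120
```

The SQL below applies exactly this branching, just written as `CASE WHEN` expressions over grouped rows.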
374 | 375 | 376 | ### **Step 1:** 377 | 378 | As the timeline shows, the first step is to separate the timeframe where **condition 1** (the normal condition) applies from the one where **condition 2** (the first-week membership condition) applies. Thus, I will create a temp table flagging the days inside and outside the first week: 379 | 380 | > Note that I will reuse the **```membership_validation```** table from the previous question, which already excludes customers who did not join the membership program 381 | ```sql 382 | --create temp table for days validation within the first week of membership 383 | DROP TABLE IF EXISTS membership_first_week_validation; 384 | CREATE TEMP TABLE membership_first_week_validation AS 385 | WITH cte_valid AS ( 386 | SELECT 387 | customer_id, 388 | order_date, 389 | product_name, 390 | price, 391 | /* since we use an aggregate function for the condition clause, the query requires a GROUP BY clause that also includes order_date and product_name. 392 | 393 | This can collapse orders of the same product on the same date into one row, which can lead to errors in our final sum calculation. 
394 | 395 | Thus, I count the orders per group (order_count) to avoid this mistake */ 396 | COUNT(*) AS order_count, 397 | CASE WHEN order_date BETWEEN join_date AND (join_date + 6) 398 | THEN 'X' 399 | ELSE '' 400 | END AS within_first_week 401 | FROM membership_validation 402 | GROUP BY 403 | customer_id, 404 | order_date, 405 | product_name, 406 | price, 407 | join_date 408 | ORDER BY 409 | customer_id, 410 | order_date 411 | ) 412 | SELECT * FROM cte_valid 413 | --Since we only calculate total points through the end of January, I set the condition order_date < '2021-02-01' to exclude unnecessary orders after January 414 | WHERE order_date < '2021-02-01'; 415 | --inspect the table result 416 | SELECT * FROM membership_first_week_validation; 417 | ``` 418 | 419 | | customer_id | order_date | product_name | price | order_count | within_first_week | 420 | | ----------- | ------------------------ | ------------ | ----- | ----------- | ----------------- | 421 | | A | 2021-01-01T00:00:00.000Z | curry | 15 | 1 | | 422 | | A | 2021-01-01T00:00:00.000Z | sushi | 10 | 1 | | 423 | | A | 2021-01-07T00:00:00.000Z | curry | 15 | 1 | X | 424 | | A | 2021-01-10T00:00:00.000Z | ramen | 12 | 1 | X | 425 | | A | 2021-01-11T00:00:00.000Z | ramen | 12 | 2 | X | 426 | | B | 2021-01-01T00:00:00.000Z | curry | 15 | 1 | | 427 | | B | 2021-01-02T00:00:00.000Z | curry | 15 | 1 | | 428 | | B | 2021-01-04T00:00:00.000Z | sushi | 10 | 1 | | 429 | | B | 2021-01-11T00:00:00.000Z | sushi | 10 | 1 | X | 430 | | B | 2021-01-16T00:00:00.000Z | ramen | 12 | 1 | | 431 | 432 | After the table is generated, do you notice how the **```order_count```** for the **ramen order** made by **customer A** on ```'2021-01-11'``` is displayed? If I hadn't calculated the **```order_count```**, our calculation would miss 1 order from ```'2021-01-11'```, so it is crucial to include this information in the calculation! 
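The duplicate-order pitfall described above is easy to check outside SQL. Here is a quick Python illustration (not part of the solution); the `orders_a` list is simply customer A's rows from the sales table:

```python
from collections import Counter

# Customer A's January orders as (order_date, product_name) pairs
orders_a = [
    ("2021-01-01", "curry"), ("2021-01-01", "sushi"),
    ("2021-01-07", "curry"), ("2021-01-10", "ramen"),
    ("2021-01-11", "ramen"), ("2021-01-11", "ramen"),
]

# Grouping by (date, product) alone would collapse the two
# 2021-01-11 ramen orders into one row; counting keeps both.
order_count = Counter(orders_a)
print(order_count[("2021-01-11", "ramen")])  # → 2
print(sum(order_count.values()))             # → 6 (all orders preserved)
```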
433 | 434 | ### **Step 2:** 435 | Now that we have the data split between condition 1 and condition 2, the next step is to create 2 tables based on the previous **```membership_first_week_validation```** table: 436 | ```sql 437 | --create temp table for points calculation only in the first week of membership 438 | DROP TABLE IF EXISTS membership_first_week_points; 439 | CREATE TEMP TABLE membership_first_week_points AS 440 | WITH cte_first_week_count AS ( 441 | SELECT * FROM membership_first_week_validation 442 | WHERE within_first_week = 'X' 443 | ) 444 | SELECT 445 | customer_id, 446 | SUM( 447 | CASE WHEN within_first_week = 'X' 448 | THEN (price * order_count * 20) 449 | ELSE (price * order_count * 10) 450 | END 451 | ) AS total_points 452 | FROM cte_first_week_count 453 | GROUP BY customer_id; 454 | --inspect table results 455 | SELECT * FROM membership_first_week_points; 456 | ``` 457 | 458 | | customer_id | total_points | 459 | | ----------- | ------------ | 460 | | A | 1020 | 461 | | B | 200 | 462 | 463 | 464 | ```sql 465 | --create temp table for points calculation excluding the first week of membership (before membership + after the first week) 466 | DROP TABLE IF EXISTS membership_non_first_week_points; 467 | CREATE TEMP TABLE membership_non_first_week_points AS 468 | WITH cte_first_week_count AS ( 469 | SELECT * FROM membership_first_week_validation 470 | WHERE within_first_week = '' 471 | ) 472 | SELECT 473 | customer_id, 474 | SUM( 475 | CASE WHEN product_name = 'sushi' 476 | THEN (price * order_count * 20) 477 | ELSE (price * order_count * 10) 478 | END 479 | ) AS total_points 480 | FROM cte_first_week_count 481 | GROUP BY customer_id; 482 | --inspect table results 483 | SELECT * FROM membership_non_first_week_points; 484 | ``` 485 | 486 | **Result:** 487 | | customer_id | total_points | 488 | | ----------- | ------------ | 489 | | A | 350 | 490 | | B | 620 | 491 | 492 | **Finding**: 493 | > Total points excluding the first week of 
the membership program: 494 | > * **Customer A**: **350** 495 | > * **Customer B**: **620** 496 | 497 | ### **Step 3:** 498 | 499 | Now that we have total points for both conditions, let's add the point values together to get the final result! 500 | 501 | ```sql 502 | --perform a table union to combine the point values from both point calculation tables, then use the SUM aggregate function to get our result 503 | WITH cte_union AS ( 504 | SELECT * FROM membership_first_week_points 505 | UNION 506 | SELECT * FROM membership_non_first_week_points 507 | ) 508 | SELECT 509 | customer_id, 510 | SUM(total_points) 511 | FROM cte_union 512 | GROUP BY customer_id 513 | ORDER BY customer_id; 514 | ``` 515 | 516 | | customer_id | SUM | 517 | | ----------- | ------------ | 518 | | A | 1370 | 519 | | B | 820 | 520 | 521 | ## 🐋 Limitations 522 | 523 | > This section covers the limitations, based on my understanding of each question and the limited data available, for questions 3, 5, 6, 7 and 10: 524 | 525 | ### ⚠️ **Question 3: What was the first item from the menu purchased by each customer?** 526 | [View solution](#q3-what-was-the-first-item-from-the-menu-purchased-by-each-customer) 527 | 528 | The limitations of this question include: 529 | 530 | * Since the **```order_date```** information does not include the purchase time (hours, minutes, seconds, etc.), orders made **on the same day** are sorted by **```product_id```** rather than by a time element, so it is impossible to know which product was actually purchased first on the same day. 
531 | 532 | That's why, in this question, I sort the first purchase order by **```product_id```** 533 | 534 | --- 535 | 536 | ### ⚠️ **Question 5: Which item was the most popular for each customer?** 537 | [View solution](#q5-which-item-was-the-most-popular-for-each-customer) 538 | 539 | The limitations of this question include: 540 | * Since there is **no extra information** to break ties when **ranking popular items** for each customer, all products that share the highest purchase count are considered equally popular 541 | 542 | --- 543 | 544 | ### ⚠️ **Question 6: Which item was purchased first by the customer after they became a member?** 545 | [View solution](#q6-which-item-was-purchased-first-by-the-customer-after-they-became-a-member) 546 | 547 | The limitations of this question include: 548 | 549 | * Since it is unclear whether orders made on the **join_date** happened **before** or **after** the customer joined the membership program (neither **```order_date```** nor **```join_date```** includes a time component), I assume these orders were made after the customer had already joined the program. 550 | --- 551 | 552 | ### ⚠️ **Question 7: Which item was purchased just before the customer became a member?** 553 | [View solution](#q7-which-item-was-purchased-just-before-the-customer-became-a-member) 554 | 555 | The limitations of this question include: 556 | * Since the **```order_date```** information does not include the purchase time (hours, minutes, seconds, etc.), orders made **on the same day** are sorted by **```product_id```** rather than by a time element, so it is impossible to know which product was purchased last before the customer joined the membership program. 
557 | 558 | Therefore, the result can be any of the orders made on the last day before the **```join_date```** 559 | 560 | --- 561 | 562 | ### ⚠️ **Question 10: In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?** 563 | [View solution](#q10-in-the-first-week-after-a-customer-joins-the-program-including-their-join-date-they-earn-2x-points-on-all-items-not-just-sushi---how-many-points-do-customer-a-and-b-have-at-the-end-of-january) 564 | 565 | The limitations of this question include: 566 | * Since it is unclear whether the points in this question should be calculated only **after the customer joins the membership program**, I also include the points earned before the **```join_date```**. 567 | 568 | --- 569 |
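As a final sanity check on the Step 3 numbers, the whole question-10 calculation can be replayed over the sample data in a few lines of Python. This is a sketch under the same assumptions stated above (pre-membership points are included, and only orders before February count), not part of the SQL solution:

```python
from datetime import date, timedelta

price = {"sushi": 10, "curry": 15, "ramen": 12}
join = {"A": date(2021, 1, 7), "B": date(2021, 1, 9)}
sales = [  # (customer, order_date, product) rows for members A and B
    ("A", date(2021, 1, 1), "sushi"), ("A", date(2021, 1, 1), "curry"),
    ("A", date(2021, 1, 7), "curry"), ("A", date(2021, 1, 10), "ramen"),
    ("A", date(2021, 1, 11), "ramen"), ("A", date(2021, 1, 11), "ramen"),
    ("B", date(2021, 1, 1), "curry"), ("B", date(2021, 1, 2), "curry"),
    ("B", date(2021, 1, 4), "sushi"), ("B", date(2021, 1, 11), "sushi"),
    ("B", date(2021, 1, 16), "ramen"), ("B", date(2021, 2, 1), "ramen"),
]

points = {"A": 0, "B": 0}
for customer, day, product in sales:
    if day >= date(2021, 2, 1):  # only count January orders
        continue
    first_week = join[customer] <= day <= join[customer] + timedelta(days=6)
    double = (product == "sushi") or first_week
    points[customer] += price[product] * 10 * (2 if double else 1)

print(points)  # → {'A': 1370, 'B': 820}
```

The totals match the Step 3 result, which suggests the temp-table pipeline neither dropped nor double-counted any orders.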

© 2021 Leah Nguyen

570 | 571 | 572 | -------------------------------------------------------------------------------- /Case Study #1 - Danny's Diner/query-sql.sql: -------------------------------------------------------------------------------- 1 | /* -------------------- 2 | Case Study Questions 3 | --------------------*/ 4 | 5 | -- 1. What is the total amount each customer spent at the restaurant? 6 | -- 2. How many days has each customer visited the restaurant? 7 | -- 3. What was the first item from the menu purchased by each customer? 8 | -- 4. What is the most purchased item on the menu and how many times was it purchased by all customers? 9 | -- 5. Which item was the most popular for each customer? 10 | -- 6. Which item was purchased first by the customer after they became a member? 11 | -- 7. Which item was purchased just before the customer became a member? 12 | -- 8. What is the total items and amount spent for each member before they became a member? 13 | -- 9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have? 14 | -- 10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January? 15 | 16 | -- 1. What is the total amount each customer spent at the restaurant? 17 | SELECT 18 | sales.customer_id, 19 | SUM(menu.price) AS total_spent 20 | FROM dannys_diner.sales 21 | JOIN dannys_diner.menu 22 | ON sales.product_id = menu.product_id 23 | GROUP BY customer_id 24 | ORDER BY customer_id; 25 | 26 | --Result: 27 | +──────────────+──────────────+ 28 | | customer_id | total_spent | 29 | +──────────────+──────────────+ 30 | | A | 76 | 31 | | B | 74 | 32 | | C | 36 | 33 | +──────────────+──────────────+ 34 | 35 | -- 2. How many days has each customer visited the restaurant? 
36 | SELECT 37 | customer_id, 38 | COUNT (DISTINCT order_date) AS visited_days 39 | FROM dannys_diner.sales 40 | GROUP BY customer_id; 41 | 42 | --Result: 43 | +──────────────+───────────────+ 44 | | customer_id | visited_days | 45 | +──────────────+───────────────+ 46 | | A | 4 | 47 | | B | 6 | 48 | | C | 2 | 49 | +──────────────+───────────────+ 50 | 51 | -- 3. What was the first item from the menu purchased by each customer? 52 | WITH cte_order AS ( 53 | SELECT 54 | sales.customer_id, 55 | menu.product_name, 56 | ROW_NUMBER() OVER( 57 | PARTITION BY sales.customer_id 58 | ORDER BY 59 | sales.order_date, 60 | sales.product_id 61 | ) AS item_order 62 | FROM dannys_diner.sales 63 | JOIN dannys_diner.menu 64 | ON sales.product_id = menu.product_id 65 | ) 66 | SELECT * FROM cte_order 67 | WHERE item_order = 1; 68 | 69 | --Result: 70 | +──────────────+───────────────+─────────────+ 71 | | customer_id | product_name | item_order | 72 | +──────────────+───────────────+─────────────+ 73 | | A | sushi | 1 | 74 | | B | curry | 1 | 75 | | C | ramen | 1 | 76 | +──────────────+───────────────+─────────────+ 77 | 78 | -- 4. What is the most purchased item on the menu and how many times was it purchased by all customers? 79 | SELECT 80 | menu.product_name, 81 | COUNT(sales.product_id) AS order_count 82 | FROM dannys_diner.sales 83 | INNER JOIN dannys_diner.menu 84 | ON sales.product_id = menu.product_id 85 | GROUP BY 86 | menu.product_name 87 | ORDER BY order_count DESC 88 | LIMIT 1; 89 | 90 | --Result: 91 | +─────────────+───────────────+──────────────+ 92 | | product_id | product_name | order_count | 93 | +─────────────+───────────────+──────────────+ 94 | | 3 | ramen | 8 | 95 | +─────────────+───────────────+──────────────+ 96 | 97 | -- 5. Which item was the most popular for each customer? 
98 | WITH cte_order_count AS ( 99 | SELECT 100 | sales.customer_id, 101 | menu.product_name, 102 | COUNT(*) as order_count 103 | FROM dannys_diner.sales 104 | JOIN dannys_diner.menu 105 | ON sales.product_id = menu.product_id 106 | GROUP BY 107 | customer_id, 108 | product_name 109 | ORDER BY 110 | customer_id, 111 | order_count DESC 112 | ), 113 | cte_popular_rank AS ( 114 | SELECT 115 | *, 116 | RANK() OVER(PARTITION BY customer_id ORDER BY order_count DESC) AS rank 117 | FROM cte_order_count 118 | ) 119 | SELECT * FROM cte_popular_rank 120 | WHERE rank = 1; 121 | 122 | --Result: 123 | +──────────────+───────────────+──────────────+───────+ 124 | | customer_id | product_name | order_count | rank | 125 | +──────────────+───────────────+──────────────+───────+ 126 | | A | ramen | 3 | 1 | 127 | | B | ramen | 2 | 1 | 128 | | B | curry | 2 | 1 | 129 | | B | sushi | 2 | 1 | 130 | | C | ramen | 3 | 1 | 131 | +──────────────+───────────────+──────────────+───────+ 132 | 133 | 134 | -- Before answering question 6-10, I created a membership_validation table to validate only those customers joining in the membership program: 135 | DROP TABLE IF EXISTS membership_validation; 136 | CREATE TEMP TABLE membership_validation AS 137 | SELECT 138 | sales.customer_id, 139 | sales.order_date, 140 | menu.product_name, 141 | menu.price, 142 | members.join_date, 143 | CASE WHEN sales.order_date >= members.join_date 144 | THEN 'X' 145 | ELSE '' 146 | END AS membership 147 | FROM dannys_diner.sales 148 | INNER JOIN dannys_diner.menu 149 | ON sales.product_id = menu.product_id 150 | LEFT JOIN dannys_diner.members 151 | ON sales.customer_id = members.customer_id 152 | WHERE join_date IS NOT NULL 153 | ORDER BY 154 | customer_id, 155 | order_date; 156 | 157 | -- 6. Which item was purchased first by the customer after they became a member? 
158 | --Note: In this question, the orders made during the join date are counted within the first order as well 159 | WITH cte_first_after_mem AS ( 160 | SELECT 161 | customer_id, 162 | product_name, 163 | order_date, 164 | RANK() OVER( 165 | PARTITION BY customer_id 166 | ORDER BY order_date) AS purchase_order 167 | FROM membership_validation 168 | WHERE membership = 'X' 169 | ) 170 | SELECT * FROM cte_first_after_mem 171 | WHERE purchase_order = 1; 172 | 173 | --Result: 174 | +──────────────+───────────────+───────────────────────────+─────────────────+ 175 | | customer_id | product_name | order_date | purchase_order | 176 | +──────────────+───────────────+───────────────────────────+─────────────────+ 177 | | A | curry | 2021-01-07T00:00:00.000Z | 1 | 178 | | B | sushi | 2021-01-11T00:00:00.000Z | 1 | 179 | +──────────────+───────────────+───────────────────────────+─────────────────+ 180 | 181 | -- 7. Which item was purchased just before the customer became a member? 182 | WITH cte_last_before_mem AS ( 183 | SELECT 184 | customer_id, 185 | product_name, 186 | order_date, 187 | RANK() OVER( 188 | PARTITION BY customer_id 189 | ORDER BY order_date DESC) AS purchase_order 190 | FROM membership_validation 191 | WHERE membership = '' 192 | ) 193 | SELECT * FROM cte_last_before_mem 194 | --since we used the ORDER BY DESC in the query above, the order 1 would mean the last date before the customer join in the membership 195 | WHERE purchase_order = 1; 196 | 197 | --Result: 198 | +──────────────+───────────────+───────────────────────────+─────────────────+ 199 | | customer_id | product_name | order_date | purchase_order | 200 | +──────────────+───────────────+───────────────────────────+─────────────────+ 201 | | A | sushi | 2021-01-01T00:00:00.000Z | 1 | 202 | | A | curry | 2021-01-01T00:00:00.000Z | 1 | 203 | | B | sushi | 2021-01-04T00:00:00.000Z | 1 | 204 | +──────────────+───────────────+───────────────────────────+─────────────────+ 205 | 206 | -- 8. 
What is the total items and amount spent for each member before they became a member? 207 | WITH cte_spent_before_mem AS ( 208 | SELECT 209 | customer_id, 210 | product_name, 211 | price 212 | FROM membership_validation 213 | WHERE membership = '' 214 | ) 215 | SELECT 216 | customer_id, 217 | SUM(price) AS total_spent, 218 | COUNT(*) AS total_items 219 | FROM cte_spent_before_mem 220 | GROUP BY customer_id 221 | ORDER BY customer_id; 222 | 223 | --Result: 224 | +──────────────+──────────────+──────────────+ 225 | | customer_id | total_spent | total_items | 226 | +──────────────+──────────────+──────────────+ 227 | | A | 25 | 2 | 228 | | B | 40 | 3 | 229 | +──────────────+──────────────+──────────────+ 230 | 231 | -- 9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have? 232 | SELECT 233 | customer_id, 234 | SUM( 235 | CASE WHEN product_name = 'sushi' 236 | THEN (price * 20) 237 | ELSE (price * 10) 238 | END 239 | ) AS total_points 240 | FROM membership_validation 241 | GROUP BY customer_id 242 | ORDER BY customer_id; 243 | 244 | --Result: 245 | +──────────────+───────────────+ 246 | | customer_id | total_points | 247 | +──────────────+───────────────+ 248 | | A | 860 | 249 | | B | 940 | 250 | +──────────────+───────────────+ 251 | 252 | -- 10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January? 
253 | --create temp table for days validation within the first week membership 254 | DROP TABLE IF EXISTS membership_first_week_validation; 255 | CREATE TEMP TABLE membership_first_week_validation AS 256 | WITH cte_valid AS ( 257 | SELECT 258 | customer_id, 259 | order_date, 260 | product_name, 261 | price, 262 | COUNT(*) AS order_count, 263 | CASE WHEN order_date BETWEEN join_date AND (join_date + 6) 264 | THEN 'X' 265 | ELSE '' 266 | END AS within_first_week 267 | FROM membership_validation 268 | GROUP BY 269 | customer_id, 270 | order_date, 271 | product_name, 272 | price, 273 | join_date 274 | ORDER BY 275 | customer_id, 276 | order_date 277 | ) 278 | SELECT * FROM cte_valid 279 | WHERE order_date < '2021-02-01'; 280 | --inspect the table result 281 | SELECT * FROM membership_first_week_validation; 282 | 283 | --Result: 284 | +──────────────+───────────────────────────+───────────────+────────+──────────────+────────────────────+ 285 | | customer_id | order_date | product_name | price | order_count | within_first_week | 286 | +──────────────+───────────────────────────+───────────────+────────+──────────────+────────────────────+ 287 | | A | 2021-01-01T00:00:00.000Z | curry | 15 | 1 | | 288 | | A | 2021-01-01T00:00:00.000Z | sushi | 10 | 1 | | 289 | | A | 2021-01-07T00:00:00.000Z | curry | 15 | 1 | X | 290 | | A | 2021-01-10T00:00:00.000Z | ramen | 12 | 1 | X | 291 | | A | 2021-01-11T00:00:00.000Z | ramen | 12 | 2 | X | 292 | | B | 2021-01-01T00:00:00.000Z | curry | 15 | 1 | | 293 | | B | 2021-01-02T00:00:00.000Z | curry | 15 | 1 | | 294 | | B | 2021-01-04T00:00:00.000Z | sushi | 10 | 1 | | 295 | | B | 2021-01-11T00:00:00.000Z | sushi | 10 | 1 | X | 296 | | B | 2021-01-16T00:00:00.000Z | ramen | 12 | 1 | | 297 | +──────────────+───────────────────────────+───────────────+────────+──────────────+────────────────────+ 298 | 299 | --create temp table for points calculation only in the first week of membership 300 | DROP TABLE IF EXISTS 
membership_first_week_points; 301 | CREATE TEMP TABLE membership_first_week_points AS 302 | WITH cte_first_week_count AS ( 303 | SELECT * FROM membership_first_week_validation 304 | WHERE within_first_week = 'X' 305 | ) 306 | SELECT 307 | customer_id, 308 | SUM( 309 | CASE WHEN within_first_week = 'X' 310 | THEN (price * order_count * 20) 311 | ELSE (price * order_count * 10) 312 | END 313 | ) AS total_points 314 | FROM cte_first_week_count 315 | GROUP BY customer_id; 316 | --inspect table results 317 | SELECT * FROM membership_first_week_points; 318 | 319 | --Result: 320 | +──────────────+───────────────+ 321 | | customer_id | total_points | 322 | +──────────────+───────────────+ 323 | | A | 1020 | 324 | | B | 200 | 325 | +──────────────+───────────────+ 326 | 327 | --create temp table for points calculation excluded the first week membership (before membership + after the first week membership) 328 | DROP TABLE IF EXISTS membership_non_first_week_points; 329 | CREATE TEMP TABLE membership_non_first_week_points AS 330 | WITH cte_first_week_count AS ( 331 | SELECT * FROM membership_first_week_validation 332 | WHERE within_first_week = '' 333 | ) 334 | SELECT 335 | customer_id, 336 | SUM( 337 | CASE WHEN product_name = 'sushi' 338 | THEN (price * order_count * 20) 339 | ELSE (price * order_count * 10) 340 | END 341 | ) AS total_points 342 | FROM cte_first_week_count 343 | GROUP BY customer_id; 344 | --inspect table results 345 | SELECT * FROM membership_non_first_week_points; 346 | 347 | --Result: 348 | +──────────────+───────────────+ 349 | | customer_id | total_points | 350 | +──────────────+───────────────+ 351 | | A | 350 | 352 | | B | 620 | 353 | +──────────────+───────────────+ 354 | 355 | 356 | --perform table union to aggregate our points value from both point calculation tables, then use SUM aggregate function to get our result 357 | WITH cte_union AS ( 358 | SELECT * FROM membership_first_week_points 359 | UNION 360 | SELECT * FROM 
membership_non_first_week_points 361 | ) 362 | SELECT 363 | customer_id, 364 | SUM(total_points) AS total_points 365 | FROM cte_union 366 | GROUP BY customer_id 367 | ORDER BY customer_id; 368 | 369 | --Result: 370 | +──────────────+───────────────+ 371 | | customer_id | total_points | 372 | +──────────────+───────────────+ 373 | | A | 1370 | 374 | | B | 820 | 375 | +──────────────+───────────────+ 376 | -------------------------------------------------------------------------------- /Case Study #1 - Danny's Diner/schema-sql.sql: -------------------------------------------------------------------------------- 1 | CREATE SCHEMA dannys_diner; 2 | SET search_path = dannys_diner; 3 | 4 | CREATE TABLE sales ( 5 | "customer_id" VARCHAR(1), 6 | "order_date" DATE, 7 | "product_id" INTEGER 8 | ); 9 | 10 | INSERT INTO sales 11 | ("customer_id", "order_date", "product_id") 12 | VALUES 13 | ('A', '2021-01-01', '1'), 14 | ('A', '2021-01-01', '2'), 15 | ('A', '2021-01-07', '2'), 16 | ('A', '2021-01-10', '3'), 17 | ('A', '2021-01-11', '3'), 18 | ('A', '2021-01-11', '3'), 19 | ('B', '2021-01-01', '2'), 20 | ('B', '2021-01-02', '2'), 21 | ('B', '2021-01-04', '1'), 22 | ('B', '2021-01-11', '1'), 23 | ('B', '2021-01-16', '3'), 24 | ('B', '2021-02-01', '3'), 25 | ('C', '2021-01-01', '3'), 26 | ('C', '2021-01-01', '3'), 27 | ('C', '2021-01-07', '3'); 28 | 29 | 30 | CREATE TABLE menu ( 31 | "product_id" INTEGER, 32 | "product_name" VARCHAR(5), 33 | "price" INTEGER 34 | ); 35 | 36 | INSERT INTO menu 37 | ("product_id", "product_name", "price") 38 | VALUES 39 | ('1', 'sushi', '10'), 40 | ('2', 'curry', '15'), 41 | ('3', 'ramen', '12'); 42 | 43 | 44 | CREATE TABLE members ( 45 | "customer_id" VARCHAR(1), 46 | "join_date" DATE 47 | ); 48 | 49 | INSERT INTO members 50 | ("customer_id", "join_date") 51 | VALUES 52 | ('A', '2021-01-07'), 53 | ('B', '2021-01-09'); -------------------------------------------------------------------------------- /Case Study #2 - Pizza Runner/README.md:
-------------------------------------------------------------------------------- 1 | # [8-Week SQL Challenge](https://github.com/ndleah/8-Week-SQL-Challenge) 2 | ![Star Badge](https://img.shields.io/static/v1?label=%F0%9F%8C%9F&message=If%20Useful&style=flat&color=BC4E99) 3 | [![View Main Folder](https://img.shields.io/badge/View-Main_Folder-971901?)](https://github.com/ndleah/8-Week-SQL-Challenge) 4 | [![View Repositories](https://img.shields.io/badge/View-My_Repositories-blue?logo=GitHub)](https://github.com/ndleah?tab=repositories) 5 | [![View My Profile](https://img.shields.io/badge/View-My_Profile-green?logo=GitHub)](https://github.com/ndleah) 6 | 7 | 8 | # 🍕 Case Study #2 - Pizza Runner 9 |

10 | 11 | 12 | ## 📕 Table Of Contents 13 | - 🛠️ [Problem Statement](#problem-statement) 14 | - 📂 [Dataset](#dataset) 15 | - ♻️ [Data Preprocessing](#️-data-preprocessing) 16 | - 🚀 [Solutions](#-solutions) 17 | 18 | --- 19 | 20 | ## 🛠️ Problem Statement 21 | 22 | > Danny was scrolling through his Instagram feed when something really caught his eye - “80s Retro Styling and Pizza Is The Future!” 23 | > 24 | > Danny was sold on the idea, but he knew that pizza alone was not going to help him get seed funding to expand his new Pizza Empire - so he had one more genius idea to combine with it - he was going to Uberize it - and so **Pizza Runner** was launched! 25 | > 26 | > Danny started by recruiting “runners” to deliver fresh pizza from Pizza Runner Headquarters (otherwise known as Danny’s house) and also maxed out his credit card to pay freelance developers to build a mobile app to accept orders from customers. 27 | 28 | --- 29 | 30 | ## 📂 Dataset 31 | Danny has shared with you 6 key datasets for this case study: 32 | 33 | ### **```runners```** 34 |

35 | 36 | View table 37 | 38 | 39 | The runners table shows the **```registration_date```** for each new runner. 40 | 41 | 42 | |runner_id|registration_date| 43 | |---------|-----------------| 44 | |1 |1/1/2021 | 45 | |2 |1/3/2021 | 46 | |3 |1/8/2021 | 47 | |4 |1/15/2021 | 48 | 49 |
50 | 51 | 52 | ### **```customer_orders```** 53 | 54 |
55 | 56 | View table 57 | 58 | 59 | Customer pizza orders are captured in the **```customer_orders```** table with 1 row for each individual pizza that is part of the order. 60 | 61 | |order_id|customer_id|pizza_id|exclusions|extras|order_time | 62 | |--------|---------|--------|----------|------|------------------| 63 | |1 |101 |1 | | |44197.75349537037 | 64 | |2 |101 |1 | | |44197.79226851852 | 65 | |3 |102 |1 | | |44198.9940162037 | 66 | |3 |102 |2 | |*null* |44198.9940162037 | 67 | |4 |103 |1 |4 | |44200.558171296296| 68 | |4 |103 |1 |4 | |44200.558171296296| 69 | |4 |103 |2 |4 | |44200.558171296296| 70 | |5 |104 |1 |null |1 |44204.87533564815 | 71 | |6 |101 |2 |null |null |44204.877233796295| 72 | |7 |105 |2 |null |1 |44204.88922453704 | 73 | |8 |102 |1 |null |null |44205.99621527778 | 74 | |9 |103 |1 |4 |1, 5 |44206.47429398148 | 75 | |10 |104 |1 |null |null |44207.77417824074 | 76 | |10 |104 |1 |2, 6 |1, 4 |44207.77417824074 | 77 | 78 |
79 | 80 | ### **```runner_orders```** 81 | 82 |
83 | 84 | View table 85 | 86 | 87 | After each order is received through the system it is assigned to a runner - however not all orders are fully completed and can be cancelled by the restaurant or the customer. 88 | 89 | The **```pickup_time```** is the timestamp at which the runner arrives at the Pizza Runner headquarters to pick up the freshly cooked pizzas. 90 | 91 | The **```distance```** and **```duration```** fields are related to how far and how long the runner had to travel to deliver the order to the respective customer. 92 | 93 | 94 | 95 | |order_id|runner_id|pickup_time|distance |duration|cancellation | 96 | |--------|---------|-----------|----------|--------|------------------| 97 | |1 |1 |1/1/2021 18:15|20km |32 minutes| | 98 | |2 |1 |1/1/2021 19:10|20km |27 minutes| | 99 | |3 |1 |1/3/2021 0:12|13.4km |20 mins |*null* | 100 | |4 |2 |1/4/2021 13:53|23.4 |40 |*null* | 101 | |5 |3 |1/8/2021 21:10|10 |15 |*null* | 102 | |6 |3 |null |null |null |Restaurant Cancellation| 103 | |7 |2 |1/8/2020 21:30|25km |25mins |null | 104 | |8 |2 |1/10/2020 0:15|23.4 km |15 minute|null | 105 | |9 |2 |null |null |null |Customer Cancellation| 106 | |10 |1 |1/11/2020 18:50|10km |10minutes|null | 107 | 108 |
109 | 110 | ### **```pizza_names```** 111 | 112 |
113 | 114 | View table 115 | 116 | 117 | |pizza_id|pizza_name| 118 | |--------|----------| 119 | |1 |Meat Lovers| 120 | |2 |Vegetarian| 121 | 122 |
123 | 124 | ### **```pizza_recipes```** 125 | 126 |
127 | 128 | View table 129 | 130 | 131 | Each **```pizza_id```** has a standard set of **```toppings```** which are used as part of the pizza recipe. 132 | 133 | 134 | |pizza_id|toppings | 135 | |--------|---------| 136 | |1 |1, 2, 3, 4, 5, 6, 8, 10| 137 | |2 |4, 6, 7, 9, 11, 12| 138 | 139 |
140 | 141 | ### **```pizza_toppings```** 142 | 143 |
144 | 145 | View table 146 | 147 | 148 | This table contains all of the **```topping_name```** values with their corresponding **```topping_id```** value. 149 | 150 | 151 | |topping_id|topping_name| 152 | |----------|------------| 153 | |1 |Bacon | 154 | |2 |BBQ Sauce | 155 | |3 |Beef | 156 | |4 |Cheese | 157 | |5 |Chicken | 158 | |6 |Mushrooms | 159 | |7 |Onions | 160 | |8 |Pepperoni | 161 | |9 |Peppers | 162 | |10 |Salami | 163 | |11 |Tomatoes | 164 | |12 |Tomato Sauce| 165 | 166 |
167 | 168 | --- 169 | 170 | ## ♻️ Data Preprocessing 171 | 172 | ### **Data Issues** 173 | 174 | Data issues in the existing schema include: 175 | 176 | * **```customer_orders``` table** 177 | - ```null``` values entered as text 178 | - using both ```NaN``` and ```null``` values 179 | * **```runner_orders``` table** 180 | - ```null``` values entered as text 181 | - using both ```NaN``` and ```null``` values 182 | - units manually entered in ```distance``` and ```duration``` columns 183 | 184 | ### **Data Cleaning** 185 | 186 | **```customer_orders```** 187 | - Converting ```null``` and ```NaN``` values into blanks ```''``` in ```exclusions``` and ```extras``` 188 | - Blanks indicate that the customer requested no extras/exclusions for the pizza, whereas ```null``` values would be ambiguous. 189 | - Saving the transformations in a temporary table 190 | - We want to avoid permanently changing the raw data via ```UPDATE``` commands if possible. 191 | 192 | **```runner_orders```** 193 | 194 | - Converting ```'null'``` text values into null values for ```pickup_time```, ```distance``` and ```duration``` 195 | - Extracting only numbers and decimal spaces for the distance and duration columns 196 | - Use regular expressions and ```NULLIF``` to convert non-numeric entries to null values 197 | - Converting blanks, ```'null'``` and ```NaN``` into null values for cancellation 198 | - Saving the transformations in a temporary table 199 | 200 | > ⚠️ Access [here](https://github.com/ndleah/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/table-transform.sql) to view full solution. 201 | 202 | **Result:** 203 | 204 |
205 | 206 | updated_customer_orders 207 | 208 | 209 | |order_id|customer_id|pizza_id|exclusions|extras|order_time | 210 | |--------|-----------|--------|----------|------|------------------------| 211 | |1 |101 |1 | | |2020-01-01T18:05:02.000Z| 212 | |2 |101 |1 | | |2020-01-01T19:00:52.000Z| 213 | |3 |102 |1 | | |2020-01-02T12:51:23.000Z| 214 | |3 |102 |2 | | |2020-01-02T12:51:23.000Z| 215 | |4 |103 |1 |4 | |2020-01-04T13:23:46.000Z| 216 | |4 |103 |1 |4 | |2020-01-04T13:23:46.000Z| 217 | |4 |103 |2 |4 | |2020-01-04T13:23:46.000Z| 218 | |5 |104 |1 | |1 |2020-01-08T21:00:29.000Z| 219 | |6 |101 |2 | | |2020-01-08T21:03:13.000Z| 220 | |7 |105 |2 | |1 |2020-01-08T21:20:29.000Z| 221 | |8 |102 |1 | | |2020-01-09T23:54:33.000Z| 222 | |9 |103 |1 |4 |1, 5 |2020-01-10T11:22:59.000Z| 223 | |10 |104 |1 | | |2020-01-11T18:34:49.000Z| 224 | |10 |104 |1 |2, 6 |1, 4 |2020-01-11T18:34:49.000Z| 225 | 226 |
227 | 228 |
229 | 230 | updated_runner_orders 231 | 232 | 233 | | order_id | runner_id | pickup_time | distance | duration | cancellation | 234 | |----------|-----------|---------------------|----------|----------|-------------------------| 235 | | 1 | 1 | 2020-01-01 18:15:34 | 20 | 32 | | 236 | | 2 | 1 | 2020-01-01 19:10:54 | 20 | 27 | | 237 | | 3 | 1 | 2020-01-02 00:12:37 | 13.4 | 20 | | 238 | | 4 | 2 | 2020-01-04 13:53:03 | 23.4 | 40 | | 239 | | 5 | 3 | 2020-01-08 21:10:57 | 10 | 15 | | 240 | | 6 | 3 | | | | Restaurant Cancellation | 241 | | 7 | 2 | 2020-01-08 21:30:45 | 25 | 25 | | 242 | | 8 | 2 | 2020-01-10 00:15:02 | 23.4 | 15 | | 243 | | 9 | 2 | | | | Customer Cancellation | 244 | | 10 | 1 | 2020-01-11 18:50:20 | 10 | 10 | | 245 | 246 |
247 | 248 | --- 249 | 250 | ## 🚀 Solutions 251 | 252 |
253 | 254 | Pizza Metrics 255 | 256 | 257 | ### **Q1. How many pizzas were ordered?** 258 | ```sql 259 | SELECT COUNT(*) AS pizza_count 260 | FROM updated_customer_orders; 261 | ``` 262 | |pizza_count| 263 | |-----------| 264 | |14 | 265 | 266 | ### **Q2. How many unique customer orders were made?** 267 | ```sql 268 | SELECT COUNT (DISTINCT order_id) AS order_count 269 | FROM updated_customer_orders; 270 | ``` 271 | |order_count| 272 | |-----------| 273 | |10 | 274 | 275 | 276 | ### **Q3. How many successful orders were delivered by each runner?** 277 | ```sql 278 | SELECT 279 | runner_id, 280 | COUNT(order_id) AS successful_orders 281 | FROM updated_runner_orders 282 | WHERE cancellation IS NULL 283 | OR cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation') 284 | GROUP BY runner_id 285 | ORDER BY successful_orders DESC; 286 | ``` 287 | 288 | | runner_id | successful_orders | 289 | |-----------|-------------------| 290 | | 1 | 4 | 291 | | 2 | 3 | 292 | | 3 | 1 | 293 | 294 | 295 | ### **Q4. 
How many of each type of pizza was delivered?** 296 | ```SQL 297 | SELECT 298 | pn.pizza_name, 299 | COUNT(co.*) AS pizza_type_count 300 | FROM updated_customer_orders AS co 301 | INNER JOIN pizza_runner.pizza_names AS pn 302 | ON co.pizza_id = pn.pizza_id 303 | INNER JOIN updated_runner_orders AS ro 304 | ON co.order_id = ro.order_id 305 | WHERE ro.cancellation IS NULL 306 | OR ro.cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation') 307 | GROUP BY pn.pizza_name 308 | ORDER BY pn.pizza_name; 309 | ``` 310 | 311 | OR 312 | 313 | ```SQL 314 | SELECT 315 | pn.pizza_name, 316 | COUNT(co.*) AS pizza_type_count 317 | FROM updated_customer_orders AS co 318 | INNER JOIN pizza_runner.pizza_names AS pn 319 | ON co.pizza_id = pn.pizza_id 320 | WHERE EXISTS ( 321 | SELECT 1 FROM updated_runner_orders AS ro 322 | WHERE ro.order_id = co.order_id 323 | AND ( 324 | ro.cancellation IS NULL 325 | OR ro.cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation') 326 | ) 327 | ) 328 | GROUP BY pn.pizza_name 329 | ORDER BY pn.pizza_name; 330 | ``` 331 | | pizza_name | pizza_type_count | 332 | |------------|------------------| 333 | | Meatlovers | 9 | 334 | | Vegetarian | 3 | 335 | 336 | 337 | ### **Q5. How many Vegetarian and Meatlovers were ordered by each customer?** 338 | ```SQL 339 | SELECT 340 | customer_id, 341 | SUM(CASE WHEN pizza_id = 1 THEN 1 ELSE 0 END) AS meat_lovers, 342 | SUM(CASE WHEN pizza_id = 2 THEN 1 ELSE 0 END) AS vegetarian 343 | FROM updated_customer_orders 344 | GROUP BY customer_id; 345 | ``` 346 | 347 | | customer_id | meat_lovers | vegetarian | 348 | |-------------|-------------|------------| 349 | | 101 | 2 | 1 | 350 | | 103 | 3 | 1 | 351 | | 104 | 3 | 0 | 352 | | 105 | 0 | 1 | 353 | | 102 | 2 | 1 | 354 | 355 | ### **Q6.
What was the maximum number of pizzas delivered in a single order?** 356 | ```SQL 357 | SELECT MAX(pizza_count) AS max_count 358 | FROM ( 359 | SELECT 360 | co.order_id, 361 | COUNT(co.pizza_id) AS pizza_count 362 | FROM updated_customer_orders AS co 363 | INNER JOIN updated_runner_orders AS ro 364 | ON co.order_id = ro.order_id 365 | WHERE 366 | ro.cancellation IS NULL 367 | OR ro.cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation') 368 | GROUP BY co.order_id) AS mycount; 369 | ``` 370 | 371 | | max_count | 372 | |-----------| 373 | | 3 | 374 | 375 | 376 | ### **Q7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?** 377 | ```SQL 378 | SELECT 379 | co.customer_id, 380 | SUM (CASE WHEN NULLIF(co.exclusions, '') IS NOT NULL OR NULLIF(co.extras, '') IS NOT NULL THEN 1 ELSE 0 END) AS changes, 381 | SUM (CASE WHEN NULLIF(co.exclusions, '') IS NULL AND NULLIF(co.extras, '') IS NULL THEN 1 ELSE 0 END) AS no_change 382 | FROM updated_customer_orders AS co 383 | INNER JOIN updated_runner_orders AS ro 384 | ON co.order_id = ro.order_id 385 | WHERE ro.cancellation IS NULL 386 | OR ro.cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation') 387 | GROUP BY co.customer_id 388 | ORDER BY co.customer_id; 389 | ``` 390 | 391 | | customer_id | changes | no_change | 392 | |-------------|---------|-----------| 393 | | 101 | 0 | 2 | 394 | | 102 | 0 | 3 | 395 | | 103 | 3 | 0 | 396 | | 104 | 2 | 1 | 397 | | 105 | 1 | 0 | 398 | 399 | 400 | ### **Q8.
How many pizzas were delivered that had both exclusions and extras?** 401 | ```SQL 402 | SELECT 403 | SUM(CASE WHEN NULLIF(co.exclusions, '') IS NOT NULL AND NULLIF(co.extras, '') IS NOT NULL THEN 1 ELSE 0 END) AS pizza_count 404 | FROM updated_customer_orders AS co 405 | INNER JOIN updated_runner_orders AS ro 406 | ON co.order_id = ro.order_id 407 | WHERE ro.cancellation IS NULL 408 | OR ro.cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation'); 409 | ``` 410 | 411 | | pizza_count | 412 | |-------------| 413 | | 1 | 414 | 415 | 416 | ### **Q9. What was the total volume of pizzas ordered for each hour of the day?** 417 | ```SQL 418 | SELECT 419 | DATE_PART('hour', order_time::TIMESTAMP) AS hour_of_day, 420 | COUNT(*) AS pizza_count 421 | FROM updated_customer_orders 422 | WHERE order_time IS NOT NULL 423 | GROUP BY hour_of_day 424 | ORDER BY hour_of_day; 425 | ``` 426 | 427 | | hour_of_day | pizza_count | 428 | |-------------|-------------| 429 | | 11 | 1 | 430 | | 12 | 2 | 431 | | 13 | 3 | 432 | | 18 | 3 | 433 | | 19 | 1 | 434 | | 21 | 3 | 435 | | 23 | 1 | 436 | 437 | ### **Q10. What was the volume of orders for each day of the week?** 438 | ```SQL 439 | SELECT 440 | TO_CHAR(order_time, 'Day') AS day_of_week, 441 | COUNT(*) AS pizza_count 442 | FROM updated_customer_orders 443 | GROUP BY 444 | day_of_week, 445 | DATE_PART('dow', order_time) 446 | ORDER BY day_of_week; 447 | ``` 448 | 449 | | day_of_week | pizza_count | 450 | |-------------|-------------| 451 | | Friday | 1 | 452 | | Saturday | 5 | 453 | | Thursday | 3 | 454 | | Wednesday | 5 | 455 | 456 |
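A quirk worth noting in Q10: `TO_CHAR(order_time, 'Day')` returns a blank-padded day name, so `ORDER BY day_of_week` sorts the rows alphabetically, which is why the result above runs Friday, Saturday, Thursday, Wednesday. If chronological order is preferred, the same query can order by the numeric day of week instead (Sunday first in Postgres):

```sql
SELECT
  TO_CHAR(order_time, 'Day') AS day_of_week,
  COUNT(*) AS pizza_count
FROM updated_customer_orders
GROUP BY
  day_of_week,
  DATE_PART('dow', order_time)
-- 'dow' is 0 (Sunday) through 6 (Saturday), giving a calendar ordering
ORDER BY DATE_PART('dow', order_time);
```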
457 | 458 |
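A note on the filter used throughout the queries above: `NOT IN` never evaluates to `TRUE` when its left-hand operand is `NULL`, so `cancellation NOT IN (...)` on its own would silently drop the delivered orders, whose `cancellation` is `NULL` after cleaning. The `cancellation IS NULL` branch is what keeps them. A minimal illustration:

```sql
-- A NULL operand makes NOT IN yield NULL, which WHERE treats as false:
SELECT NULL NOT IN ('Restaurant Cancellation', 'Customer Cancellation') AS keeps_row;
-- keeps_row is NULL, so the IS NULL branch is required to retain delivered orders
```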
459 | 460 | Runner and Customer Experience 461 | 462 | 463 | ### **Q1. How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)** 464 | ```SQL 465 | WITH runner_signups AS ( 466 | SELECT 467 | runner_id, 468 | registration_date, 469 | registration_date - ((registration_date - '2021-01-01') % 7) AS start_of_week 470 | FROM pizza_runner.runners 471 | ) 472 | SELECT 473 | start_of_week, 474 | COUNT(runner_id) AS signups 475 | FROM runner_signups 476 | GROUP BY start_of_week 477 | ORDER BY start_of_week; 478 | ``` 479 | 480 | | start_of_week | signups | 481 | |--------------------------|---------| 482 | | 2021-01-01T00:00:00.000Z | 2 | 483 | | 2021-01-08T00:00:00.000Z | 1 | 484 | | 2021-01-15T00:00:00.000Z | 1 | 485 | 486 | ### **Q2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?** 487 | ```SQL 488 | WITH runner_pickups AS ( 489 | SELECT 490 | ro.runner_id, 491 | ro.order_id, 492 | co.order_time, 493 | ro.pickup_time, 494 | (pickup_time - order_time) AS time_to_pickup 495 | FROM updated_runner_orders AS ro 496 | INNER JOIN updated_customer_orders AS co 497 | ON ro.order_id = co.order_id 498 | ) 499 | SELECT 500 | runner_id, 501 | date_part('minutes', AVG(time_to_pickup)) AS avg_arrival_minutes 502 | FROM runner_pickups 503 | GROUP BY runner_id 504 | ORDER BY runner_id; 505 | ``` 506 | | runner_id | avg_arrival_minutes | 507 | |-----------|---------------------| 508 | | 1 | -4 | 509 | | 2 | 23 | 510 | | 3 | 10 | 511 | 512 | ### **Q3. 
Is there any relationship between the number of pizzas and how long the order takes to prepare?** 513 | ```SQL 514 | WITH order_count AS ( 515 | SELECT 516 | order_id, 517 | order_time, 518 | COUNT(pizza_id) AS pizzas_order_count 519 | FROM updated_customer_orders 520 | GROUP BY order_id, order_time 521 | ), 522 | prepare_time AS ( 523 | SELECT 524 | ro.order_id, 525 | co.order_time, 526 | ro.pickup_time, 527 | co.pizzas_order_count, 528 | (pickup_time - order_time) AS time_to_pickup 529 | FROM updated_runner_orders AS ro 530 | INNER JOIN order_count AS co 531 | ON ro.order_id = co.order_id 532 | WHERE pickup_time IS NOT NULL 533 | ) 534 | SELECT 535 | pizzas_order_count, 536 | AVG(time_to_pickup) AS avg_time 537 | FROM prepare_time 538 | GROUP BY pizzas_order_count 539 | ORDER BY pizzas_order_count; 540 | ``` 541 | 542 | | pizzas_order_count | avg_time | 543 | |--------------------|-----------------| 544 | | 1 | 12 | 545 | | 2 | -6 | 546 | | 3 | 29 | 547 | 548 | ### **Q4. What was the average distance travelled for each runner?** 549 | ```SQL 550 | SELECT 551 | runner_id, 552 | ROUND(AVG(distance), 2) AS avg_distance 553 | FROM updated_runner_orders 554 | GROUP BY runner_id 555 | ORDER BY runner_id; 556 | ``` 557 | 558 | | runner_id | avg_distance | 559 | |-----------|--------------| 560 | | 1 | 15.85 | 561 | | 2 | 23.93 | 562 | | 3 | 10.00 | 563 | 564 | ### **Q5. What was the difference between the longest and shortest delivery times for all orders?** 565 | ```SQL 566 | SELECT 567 | MAX(duration) - MIN(duration) AS difference 568 | FROM updated_runner_orders; 569 | ``` 570 | 571 | | difference | 572 | |------------| 573 | | 30 | 574 | 575 | ### **Q6. 
What was the average speed for each runner for each delivery and do you notice any trend for these values?** 576 | ```SQL 577 | WITH order_count AS ( 578 | SELECT 579 | order_id, 580 | order_time, 581 | COUNT(pizza_id) AS pizzas_count 582 | FROM updated_customer_orders 583 | GROUP BY 584 | order_id, 585 | order_time 586 | ) 587 | SELECT 588 | ro.order_id, 589 | ro.runner_id, 590 | co.pizzas_count, 591 | ro.distance, 592 | ro.duration, 593 | ROUND(60 * ro.distance / ro.duration, 2) AS speed 594 | FROM updated_runner_orders AS ro 595 | INNER JOIN order_count AS co 596 | ON ro.order_id = co.order_id 597 | WHERE pickup_time IS NOT NULL 598 | ORDER BY speed DESC 599 | ``` 600 | 601 | | order_id | runner_id | pizzas_count | distance | duration | speed | 602 | |----------|-----------|--------------|----------|----------|-------| 603 | | 8 | 2 | 1 | 23.4 | 15 | 93.60 | 604 | | 7 | 2 | 1 | 25 | 25 | 60.00 | 605 | | 10 | 1 | 2 | 10 | 10 | 60.00 | 606 | | 2 | 1 | 1 | 20 | 27 | 44.44 | 607 | | 3 | 1 | 2 | 13.4 | 20 | 40.20 | 608 | | 5 | 3 | 1 | 10 | 15 | 40.00 | 609 | | 1 | 1 | 1 | 20 | 32 | 37.50 | 610 | | 4 | 2 | 3 | 23.4 | 40 | 35.10 | 611 | 612 | **Finding:** 613 | - **Orders shown in decreasing order of average speed:** 614 | > *While the fastest order only carried 1 pizza and the slowest order carried 3 pizzas, there is no clear trend that more pizzas slow down the delivery speed of an order.* 615 | 616 | ### **Q7. What is the successful delivery percentage for each runner?** 617 | ```sql 618 | SELECT 619 | runner_id, 620 | COUNT(pickup_time) as delivered, 621 | COUNT(order_id) AS total, 622 | ROUND(100 * COUNT(pickup_time) / COUNT(order_id)) AS delivery_percent 623 | FROM updated_runner_orders 624 | GROUP BY runner_id 625 | ORDER BY runner_id; 626 | ``` 627 | 628 | | runner_id | delivered | total | delivery_percent | 629 | |-----------|-----------|-------|------------------| 630 | | 1 | 4 | 4 | 100 | 631 | | 2 | 3 | 4 | 75 | 632 | | 3 | 1 | 2 | 50 | 633 | 634 | 635 |
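A closing note on Q7: `100 * COUNT(pickup_time) / COUNT(order_id)` uses integer division, which happens to land on exact percentages here (100, 75, 50). For counts that do not divide evenly, cast to numeric before dividing, for example:

```sql
-- Integer division truncates; using 100.0 forces numeric division
SELECT
  100 * 1 / 3 AS int_pct,
  ROUND(100.0 * 1 / 3, 2) AS num_pct;
```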
636 | 637 | [![View Data Exploration Folder](https://img.shields.io/badge/View-Solution-971901?style=for-the-badge&logo=GITHUB)](https://github.com/ndleah/8-Week-SQL-Challenge/tree/main/Case%20Study%20%232%20-%20Pizza%20Runner/2.%20Runner%20and%20Customer%20Experience) 638 | 639 | --- 640 |

© 2021 Leah Nguyen

641 | -------------------------------------------------------------------------------- /Case Study #2 - Pizza Runner/query-sql.sql: -------------------------------------------------------------------------------- 1 | /* -------------------- 2 | Table Transformation 3 | --------------------*/ 4 | 5 | -- data type check 6 | --customer_orders 7 | SELECT 8 | table_name, 9 | column_name, 10 | data_type 11 | FROM information_schema.columns 12 | WHERE table_name = 'customer_orders'; 13 | 14 | --Result: 15 | +──────────────────+──────────────+──────────────────────────────+ 16 | | table_name | column_name | data_type | 17 | +──────────────────+──────────────+──────────────────────────────+ 18 | | customer_orders | order_id | integer | 19 | | customer_orders | customer_id | integer | 20 | | customer_orders | pizza_id | integer | 21 | | customer_orders | exclusions | character varying | 22 | | customer_orders | extras | character varying | 23 | | customer_orders | order_time | timestamp without time zone | 24 | +──────────────────+──────────────+──────────────────────────────+ 25 | 35 | --runner_orders 36 | SELECT 37 | table_name, 38 | column_name, 39 | data_type 40 | FROM information_schema.columns 41 | WHERE table_name = 'runner_orders'; 42 | 43 | --Result: 44 | +────────────────+──────────────+────────────────────+ 45 | | table_name | column_name | data_type | 46 | +────────────────+──────────────+────────────────────+ 47 | | runner_orders | order_id | integer | 48 | | runner_orders | runner_id | integer | 49 | | runner_orders | pickup_time | character varying | 50 | | runner_orders | distance | character varying | 51 | | runner_orders | duration | character varying | 52 | | runner_orders | cancellation | character varying | 53 | +────────────────+──────────────+────────────────────+ 54 | 65 | --Update tables 66 | 67 | --1. customer_orders 68 | /* 69 | Cleaning customer_orders 70 | - Identify records with null or 'null' values 71 | - updating null or 'null' values to '' 72 | - blanks '' are not null because it indicates the customer asked for no extras or exclusions 73 | */ 74 | --Blanks indicate that the customer requested no extras/exclusions for the pizza, whereas null values would be ambiguous on this.
75 | 76 | DROP TABLE IF EXISTS updated_customer_orders; 77 | CREATE TEMP TABLE updated_customer_orders AS ( 78 | SELECT 79 | order_id, 80 | customer_id, 81 | pizza_id, 82 | CASE 83 | WHEN exclusions IS NULL 84 | OR exclusions LIKE 'null' THEN '' 85 | ELSE exclusions 86 | END AS exclusions, 87 | CASE 88 | WHEN extras IS NULL 89 | OR extras LIKE 'null' THEN '' 90 | ELSE extras 91 | END AS extras, 92 | order_time 93 | FROM pizza_runner.customer_orders 94 | ); 95 | SELECT * FROM updated_customer_orders; 96 | 97 | --Result: 116 | +───────────+──────────────+───────────+─────────────+─────────+───────────────────────────+ 117 | | order_id | customer_id | pizza_id | exclusions | extras | order_time | 118 | +───────────+──────────────+───────────+─────────────+─────────+───────────────────────────+ 119 | | 1 | 101 | 1 | | | 2020-01-01T18:05:02.000Z | 120 | | 2 | 101 | 1 | | | 2020-01-01T19:00:52.000Z | 121 | | 3 | 102 | 1 | | | 2020-01-02T12:51:23.000Z | 122 | | 3 | 102 | 2 | | | 2020-01-02T12:51:23.000Z | 123 | | 4 | 103 | 1 | 4 | | 2020-01-04T13:23:46.000Z | 124 | | 4 | 103 | 1 | 4 | | 2020-01-04T13:23:46.000Z | 125 | | 4 | 103 | 2 | 4 | | 2020-01-04T13:23:46.000Z | 126 | | 5 | 104 | 1 | | 1 | 2020-01-08T21:00:29.000Z | 127 | | 6 | 101 | 2 | | | 2020-01-08T21:03:13.000Z | 128 | | 7 | 105 | 2 | | 1 | 2020-01-08T21:20:29.000Z | 129 | | 8 | 102 | 1 | | | 2020-01-09T23:54:33.000Z | 130 | | 9 | 103 | 1 | 4 | 1, 5 | 2020-01-10T11:22:59.000Z | 131 | | 10 | 104 | 1 | | | 2020-01-11T18:34:49.000Z | 132 | | 10 | 104 | 1 | 2, 6 | 1, 4 | 2020-01-11T18:34:49.000Z | 133 | +───────────+──────────────+───────────+─────────────+─────────+───────────────────────────+ 134 | 135 | --2. runner_orders 136 | /* 137 | - pickup time, distance, duration is of the wrong type 138 | - records have nulls in these columns when the orders are cancelled 139 | - convert text 'null' to null values 140 | - units (km, minutes) need to be removed from distance and duration 141 | */ 142 | DROP TABLE IF EXISTS updated_runner_orders; 143 | CREATE TEMP TABLE updated_runner_orders AS ( 144 | SELECT 145 | order_id, 146 | runner_id, 147 | CASE WHEN pickup_time LIKE 'null' THEN null ELSE pickup_time END::timestamp AS pickup_time, 148 | NULLIF(regexp_replace(distance, '[^0-9.]','','g'), '')::numeric AS distance, 149 | NULLIF(regexp_replace(duration, '[^0-9.]','','g'), '')::numeric AS duration, 150 | CASE WHEN cancellation IN ('null', 'NaN', '') THEN null ELSE cancellation END AS cancellation 151 | FROM pizza_runner.runner_orders); 152 | SELECT * FROM updated_runner_orders; 153 | 154 | --Result: 169 | +───────────+────────────+──────────────────────+───────────+───────────+──────────────────────────+ 170 | | order_id | runner_id | pickup_time | distance | duration | cancellation | 171 | +───────────+────────────+──────────────────────+───────────+───────────+──────────────────────────+ 172 | | 1 | 1 | 2020-01-01 18:15:34 | 20 | 32 | | 173 | | 2 | 1 | 2020-01-01 19:10:54 | 20 | 27 | | 174 | | 3 | 1 | 2020-01-02 00:12:37 | 13.4 | 20 | | 175 | | 4 | 2 | 2020-01-04 13:53:03 | 23.4 | 40 | | 176 | | 5 | 3 | 2020-01-08 21:10:57 | 10 | 15 | | 177 | | 6 | 3 | | | | Restaurant Cancellation | 178 | | 7 | 2 | 2020-01-08 21:30:45 | 25 | 25 | | 179 | | 8 | 2 | 2020-01-10 00:15:02 | 23.4 | 15 | | 180 | | 9 | 2 | | | | Customer Cancellation | 181 | | 10 | 1 | 2020-01-11 18:50:20 | 10 | 10 | | 182 | +───────────+────────────+──────────────────────+───────────+───────────+──────────────────────────+ 183 | 184 | -- data type check 185 | --updated_customer_orders 186 | SELECT 187 | table_name, 188 | column_name, 189 | data_type 190 | FROM information_schema.columns 191 | WHERE table_name = 'updated_customer_orders'; 192 | 193 | --Result: 194 | +──────────────────────────+──────────────+──────────────────────────────+ 195 | | table_name | column_name | data_type | 196 | +──────────────────────────+──────────────+──────────────────────────────+ 197 | | updated_customer_orders | order_id | integer | 198 | | updated_customer_orders | customer_id | integer | 199 | | updated_customer_orders | pizza_id | integer | 200 | | updated_customer_orders | exclusions | character varying | 201 | | updated_customer_orders | extras | character varying | 202 | | updated_customer_orders | order_time | timestamp without time zone | 203 | +──────────────────────────+──────────────+──────────────────────────────+ 204 | 215 | --updated_runner_orders 216 | SELECT 217 | table_name, 218 | column_name, 219 | data_type 220 | FROM information_schema.columns 221 | WHERE table_name = 'updated_runner_orders'; 222 | 223 | --Result: 224 | +────────────────────────+──────────────+──────────────────────────────+ 225 | | table_name | column_name | data_type | 226 | +────────────────────────+──────────────+──────────────────────────────+ 227 | | updated_runner_orders | order_id | integer | 228 | | updated_runner_orders | runner_id | integer | 229 | | updated_runner_orders | pickup_time | timestamp without time zone | 230 | | updated_runner_orders | distance | numeric | 231 | | updated_runner_orders | duration | numeric | 232 | | updated_runner_orders | cancellation | character varying | 233 | +────────────────────────+──────────────+──────────────────────────────+ 234 | 243 | 244 | 245 | /* -------------------- 246 | Case Study Questions: 247 | Pizza Metrics 248 | --------------------*/ 249 | 250 | 251 | -- 1. How many pizzas were ordered?
252 | SELECT COUNT(*) AS pizza_count 253 | FROM updated_customer_orders; 254 | 255 | --Result: 256 | +──────────────+ 257 | | pizza_count | 258 | +──────────────+ 259 | | 14 | 260 | +──────────────+ 261 | 262 | -- 2. How many unique customer orders were made? 263 | SELECT COUNT (DISTINCT order_id) AS order_count 264 | FROM updated_customer_orders; 265 | 266 | --Result: 267 | +──────────────+ 268 | | order_count | 269 | +──────────────+ 270 | | 10 | 271 | +──────────────+ 272 | 273 | -- 3. How many successful orders were delivered by each runner? 274 | SELECT 275 | runner_id, 276 | COUNT(order_id) AS successful_orders 277 | FROM updated_runner_orders 278 | WHERE cancellation IS NULL 279 | OR cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation') 280 | GROUP BY runner_id 281 | ORDER BY successful_orders DESC; 282 | 283 | --Result: 284 | +────────────+────────────────────+ 285 | | runner_id | successful_orders | 286 | +────────────+────────────────────+ 287 | | 1 | 4 | 288 | | 2 | 3 | 289 | | 3 | 1 | 290 | +────────────+────────────────────+ 291 | 292 | -- 4. How many of each type of pizza was delivered? 
293 | SELECT
294 | pn.pizza_name,
295 | COUNT(co.*) AS pizza_type_count
296 | FROM updated_customer_orders AS co
297 | INNER JOIN pizza_runner.pizza_names AS pn
298 | ON co.pizza_id = pn.pizza_id
299 | INNER JOIN updated_runner_orders AS ro
300 | ON co.order_id = ro.order_id
301 | WHERE ro.cancellation IS NULL
302 | OR ro.cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation')
303 | GROUP BY pn.pizza_name
304 | ORDER BY pn.pizza_name;
305 | 
306 | --OR
307 | SELECT
308 | pn.pizza_name,
309 | COUNT(co.*) AS pizza_type_count
310 | FROM updated_customer_orders AS co
311 | INNER JOIN pizza_runner.pizza_names AS pn
312 | ON co.pizza_id = pn.pizza_id
313 | WHERE EXISTS (
314 | SELECT 1 FROM updated_runner_orders AS ro
315 | WHERE ro.order_id = co.order_id
316 | AND (
317 | ro.cancellation IS NULL
318 | OR ro.cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation')
319 | )
320 | )
321 | GROUP BY pn.pizza_name
322 | ORDER BY pn.pizza_name;
323 | 
324 | --Result:
325 | +─────────────+───────────────────+
326 | | pizza_name | pizza_type_count |
327 | +─────────────+───────────────────+
328 | | Meatlovers | 9 |
329 | | Vegetarian | 3 |
330 | +─────────────+───────────────────+
331 | 
332 | -- 5. How many Vegetarian and Meatlovers were ordered by each customer?
333 | SELECT
334 | customer_id,
335 | SUM(CASE WHEN pizza_id = 1 THEN 1 ELSE 0 END) AS meat_lovers,
336 | SUM(CASE WHEN pizza_id = 2 THEN 1 ELSE 0 END) AS vegetarian
337 | FROM updated_customer_orders
338 | GROUP BY customer_id;
339 | 
340 | --Result:
341 | +──────────────+──────────────+─────────────+
342 | | customer_id | meat_lovers | vegetarian |
343 | +──────────────+──────────────+─────────────+
344 | | 101 | 2 | 1 |
345 | | 103 | 3 | 1 |
346 | | 104 | 3 | 0 |
347 | | 105 | 0 | 1 |
348 | | 102 | 2 | 1 |
349 | +──────────────+──────────────+─────────────+
350 | 
351 | -- 6. What was the maximum number of pizzas delivered in a single order?
352 | SELECT MAX(pizza_count) AS max_count
353 | FROM (
354 | SELECT
355 | co.order_id,
356 | COUNT(co.pizza_id) AS pizza_count
357 | FROM updated_customer_orders AS co
358 | INNER JOIN updated_runner_orders AS ro
359 | ON co.order_id = ro.order_id
360 | WHERE
361 | ro.cancellation IS NULL
362 | OR ro.cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation')
363 | GROUP BY co.order_id) AS mycount;
364 | 
365 | --Result:
366 | +────────────+
367 | | max_count |
368 | +────────────+
369 | | 3 |
370 | +────────────+
371 | 
372 | -- 7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
373 | -- a pizza counts as unchanged only when BOTH exclusions and extras are NULL
374 | SELECT
375 | co.customer_id,
376 | SUM(CASE WHEN co.exclusions IS NOT NULL OR co.extras IS NOT NULL THEN 1 ELSE 0 END) AS changes,
377 | SUM(CASE WHEN co.exclusions IS NULL AND co.extras IS NULL THEN 1 ELSE 0 END) AS no_change
378 | FROM updated_customer_orders AS co
379 | INNER JOIN updated_runner_orders AS ro
380 | ON co.order_id = ro.order_id
381 | WHERE ro.cancellation IS NULL
382 | OR ro.cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation')
383 | GROUP BY co.customer_id
384 | ORDER BY co.customer_id;
385 | 
386 | --Result:
387 | +──────────────+──────────+────────────+
388 | | customer_id | changes | no_change |
389 | +──────────────+──────────+────────────+
390 | | 101 | 0 | 2 |
391 | | 102 | 0 | 3 |
392 | | 103 | 3 | 0 |
393 | | 104 | 2 | 1 |
394 | | 105 | 1 | 0 |
395 | +──────────────+──────────+────────────+
396 | 
397 | -- 8. How many pizzas were delivered that had both exclusions and extras?
397 | SELECT 398 | SUM(CASE WHEN co.exclusions IS NOT NULL AND co.extras IS NOT NULL THEN 1 ELSE 0 END) as pizza_count 399 | FROM updated_customer_orders AS co 400 | INNER JOIN updated_runner_orders AS ro 401 | ON co.order_id = ro.order_id 402 | WHERE ro.cancellation IS NULL 403 | OR ro.cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation') 404 | 405 | --Result: 406 | +──────────────+ 407 | | pizza_count | 408 | +──────────────+ 409 | | 1 | 410 | +──────────────+ 411 | 412 | -- 9. What was the total volume of pizzas ordered for each hour of the day? 413 | SELECT 414 | DATE_PART('hour', order_time::TIMESTAMP) AS hour_of_day, 415 | COUNT(*) AS pizza_count 416 | FROM updated_customer_orders 417 | WHERE order_time IS NOT NULL 418 | GROUP BY hour_of_day 419 | ORDER BY hour_of_day; 420 | 421 | --Result: 422 | +──────────────+──────────────+ 423 | | hour_of_day | pizza_count | 424 | +──────────────+──────────────+ 425 | | 11 | 1 | 426 | | 12 | 2 | 427 | | 13 | 3 | 428 | | 18 | 3 | 429 | | 19 | 1 | 430 | | 21 | 3 | 431 | | 23 | 1 | 432 | +──────────────+──────────────+ 433 | 434 | -- 10. What was the volume of orders for each day of the week? 435 | SELECT 436 | TO_CHAR(order_time, 'Day') AS day_of_week, 437 | COUNT(*) AS pizza_count 438 | FROM updated_customer_orders 439 | GROUP BY 440 | day_of_week, 441 | DATE_PART('dow', order_time) 442 | ORDER BY day_of_week; 443 | 444 | --Result: 445 | +──────────────+──────────────+ 446 | | day_of_week | pizza_count | 447 | +──────────────+──────────────+ 448 | | Friday | 1 | 449 | | Saturday | 5 | 450 | | Thursday | 3 | 451 | | Wednesday | 5 | 452 | +──────────────+──────────────+ 453 | 454 | /* -------------------- 455 | Case Study Questions: 456 | Runner and Customer Experience 457 | --------------------*/ 458 | 459 | -- How many runners signed up for each 1 week period? (i.e. 
week starts 2021-01-01)
460 | WITH runner_signups AS (
461 | SELECT
462 | runner_id,
463 | registration_date,
464 | registration_date - ((registration_date - '2021-01-01') % 7) AS start_of_week
465 | FROM pizza_runner.runners
466 | )
467 | SELECT
468 | start_of_week,
469 | COUNT(runner_id) AS signups
470 | FROM runner_signups
471 | GROUP BY start_of_week
472 | ORDER BY start_of_week;
473 | 
474 | --Result:
475 | +───────────────────────────+──────────+
476 | | start_of_week | signups |
477 | +───────────────────────────+──────────+
478 | | 2021-01-01T00:00:00.000Z | 2 |
479 | | 2021-01-08T00:00:00.000Z | 1 |
480 | | 2021-01-15T00:00:00.000Z | 1 |
481 | +───────────────────────────+──────────+
482 | 
489 | 
490 | -- What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?
491 | WITH runner_pickups AS (
492 | SELECT
493 | ro.runner_id,
494 | ro.order_id,
495 | co.order_time,
496 | ro.pickup_time,
497 | (pickup_time - order_time) AS time_to_pickup
498 | FROM updated_runner_orders AS ro
499 | INNER JOIN updated_customer_orders AS co
500 | ON ro.order_id = co.order_id
501 | )
502 | SELECT
503 | runner_id,
504 | date_part('minutes', AVG(time_to_pickup)) AS avg_arrival_minutes
505 | FROM runner_pickups
506 | GROUP BY runner_id
507 | ORDER BY runner_id;
508 | 
509 | --Result:
510 | +────────────+──────────────────────+
511 | | runner_id | avg_arrival_minutes |
512 | +────────────+──────────────────────+
513 | | 1 | -4 |
514 | | 2 | 23 |
515 | | 3 | 10 |
516 | +────────────+──────────────────────+
517 | 
523 | 
524 | -- Is there any relationship between the number of pizzas and how long the order takes to prepare?
525 | WITH order_count AS (
526 | SELECT
527 | order_id,
528 | order_time,
529 | COUNT(pizza_id) AS pizzas_order_count
530 | FROM updated_customer_orders
531 | GROUP BY order_id, order_time
532 | ),
533 | prepare_time AS (
534 | SELECT
535 | ro.order_id,
536 | co.order_time,
537 | ro.pickup_time,
538 | co.pizzas_order_count,
539 | (pickup_time - order_time) AS time_to_pickup
540 | FROM updated_runner_orders AS ro
541 | INNER JOIN order_count AS co
542 | ON ro.order_id = co.order_id
543 | WHERE pickup_time IS NOT NULL
544 | )
545 | 
546 | SELECT
547 | pizzas_order_count,
548 | AVG(time_to_pickup) AS avg_time
549 | FROM prepare_time
550 | GROUP BY pizzas_order_count
551 | ORDER BY pizzas_order_count;
552 | 
553 | --Result:
554 | +─────────────────────+──────────────────+
555 | | pizzas_order_count | avg_time |
556 | +─────────────────────+──────────────────+
557 | | 1 | 12 |
558 | | 2 | -6 |
559 | | 3 | 29 |
560 | +─────────────────────+──────────────────+
561 | 
567 | 
568 | 
569 | -- What was the average distance travelled for each runner?
570 | SELECT
571 | runner_id,
572 | ROUND(AVG(distance), 2) AS avg_distance
573 | FROM updated_runner_orders
574 | GROUP BY runner_id
575 | ORDER BY runner_id;
576 | 
577 | --Result:
578 | +────────────+───────────────+
579 | | runner_id | avg_distance |
580 | +────────────+───────────────+
581 | | 1 | 15.85 |
582 | | 2 | 23.93 |
583 | | 3 | 10.00 |
584 | +────────────+───────────────+
585 | 
591 | 
592 | -- What was the difference between the longest and shortest delivery times for all orders?
593 | SELECT
594 | MAX(duration) - MIN(duration) AS difference
595 | FROM updated_runner_orders;
596 | 
597 | --Result:
598 | +─────────────+
599 | | difference |
600 | +─────────────+
601 | | 30 |
602 | +─────────────+
603 | 
607 | 
608 | -- What was the average speed for each runner for each delivery and do you notice any trend for these values?
609 | 
610 | WITH order_count AS (
611 | SELECT
612 | order_id,
613 | order_time,
614 | COUNT(pizza_id) AS pizzas_count
615 | FROM updated_customer_orders
616 | GROUP BY
617 | order_id,
618 | order_time
619 | )
620 | SELECT
621 | ro.order_id,
622 | ro.runner_id,
623 | co.pizzas_count,
624 | ro.distance,
625 | ro.duration,
626 | ROUND(60 * ro.distance / ro.duration, 2) AS speed
627 | FROM updated_runner_orders AS ro
628 | INNER JOIN order_count AS co
629 | ON ro.order_id = co.order_id
630 | WHERE pickup_time IS NOT NULL
631 | ORDER BY speed DESC;
632 | --Result:
633 | +───────────+────────────+───────────────+───────────+───────────+────────+
634 | | order_id | runner_id | pizzas_count | distance | duration | speed |
635 | +───────────+────────────+───────────────+───────────+───────────+────────+
636 | | 8 | 2 | 1 | 23.4 | 15 | 93.60 |
637 | | 7 | 2 | 1 | 25 | 25 | 60.00 |
638 | | 10 | 1 | 2 | 10 | 10 | 60.00 |
639 | | 2 | 1 | 1 | 20 | 27 | 44.44 |
640 | | 3 | 1 | 2 | 13.4 | 20 | 40.20 |
641 | | 5 | 3 | 1 | 10 | 15 | 40.00 |
642 | | 1 | 1 | 1 | 20 | 32 | 37.50 |
643 | | 4 | 2 | 3 | 23.4 | 40 | 35.10 |
644 | +───────────+────────────+───────────────+───────────+───────────+────────+
645 | 
656 | 
657 | /*Finding:
658 | Orders shown in decreasing order of average speed:
659 | While the fastest order only carried 1 pizza and the slowest order carried 3 pizzas,
660 | there is no clear trend that more pizzas slow down the delivery speed of an order.
661 | */
662 | 
663 | -- What is the successful delivery percentage for each runner?
664 | SELECT
665 | runner_id,
666 | COUNT(pickup_time) as delivered,
667 | COUNT(order_id) AS total,
668 | ROUND(100 * COUNT(pickup_time) / COUNT(order_id)) AS delivery_percent
669 | FROM updated_runner_orders
670 | GROUP BY runner_id
671 | ORDER BY runner_id;
672 | 
673 | --Result:
674 | +────────────+────────────+────────+─────────────────+
675 | | runner_id | delivered | total | delivery_percent|
676 | +────────────+────────────+────────+─────────────────+
677 | | 1 | 4 | 4 | 100 |
678 | | 2 | 3 | 4 | 75 |
679 | | 3 | 1 | 2 | 50 |
680 | +────────────+────────────+────────+─────────────────+
--------------------------------------------------------------------------------
/Case Study #2 - Pizza Runner/schema-sql.sql:
--------------------------------------------------------------------------------
1 | CREATE SCHEMA pizza_runner;
2 | SET search_path = pizza_runner;
3 | 
4 | DROP TABLE IF EXISTS runners;
5 | CREATE TABLE runners (
6 | "runner_id" INTEGER,
7 | "registration_date" DATE
8 | );
9 | INSERT INTO runners
10 | ("runner_id", "registration_date")
11 | VALUES
12 | (1, '2021-01-01'),
13 | (2, '2021-01-03'),
14 | (3, '2021-01-08'),
15 | (4, '2021-01-15');
16 | 
17 | 
18 | DROP TABLE IF EXISTS customer_orders;
19 | CREATE TABLE customer_orders (
20 | "order_id" INTEGER,
21 | "customer_id" INTEGER,
22 | "pizza_id" INTEGER,
23 | "exclusions" VARCHAR(4),
24 | "extras" VARCHAR(4),
25 | "order_time" TIMESTAMP
26 | );
27 | 
28 | INSERT INTO customer_orders
29 | ("order_id", "customer_id", "pizza_id", "exclusions", "extras", "order_time")
30 | VALUES
31 | ('1', '101', '1', '', '', '2020-01-01 18:05:02'),
32 | ('2', '101', '1', '', '', '2020-01-01 19:00:52'),
33 | ('3', '102', '1', '', '', 
'2020-01-02 12:51:23'), 34 | ('3', '102', '2', '', NULL, '2020-01-02 12:51:23'), 35 | ('4', '103', '1', '4', '', '2020-01-04 13:23:46'), 36 | ('4', '103', '1', '4', '', '2020-01-04 13:23:46'), 37 | ('4', '103', '2', '4', '', '2020-01-04 13:23:46'), 38 | ('5', '104', '1', 'null', '1', '2020-01-08 21:00:29'), 39 | ('6', '101', '2', 'null', 'null', '2020-01-08 21:03:13'), 40 | ('7', '105', '2', 'null', '1', '2020-01-08 21:20:29'), 41 | ('8', '102', '1', 'null', 'null', '2020-01-09 23:54:33'), 42 | ('9', '103', '1', '4', '1, 5', '2020-01-10 11:22:59'), 43 | ('10', '104', '1', 'null', 'null', '2020-01-11 18:34:49'), 44 | ('10', '104', '1', '2, 6', '1, 4', '2020-01-11 18:34:49'); 45 | 46 | 47 | DROP TABLE IF EXISTS runner_orders; 48 | CREATE TABLE runner_orders ( 49 | "order_id" INTEGER, 50 | "runner_id" INTEGER, 51 | "pickup_time" VARCHAR(19), 52 | "distance" VARCHAR(7), 53 | "duration" VARCHAR(10), 54 | "cancellation" VARCHAR(23) 55 | ); 56 | 57 | INSERT INTO runner_orders 58 | ("order_id", "runner_id", "pickup_time", "distance", "duration", "cancellation") 59 | VALUES 60 | ('1', '1', '2020-01-01 18:15:34', '20km', '32 minutes', ''), 61 | ('2', '1', '2020-01-01 19:10:54', '20km', '27 minutes', ''), 62 | ('3', '1', '2020-01-02 00:12:37', '13.4km', '20 mins', NULL), 63 | ('4', '2', '2020-01-04 13:53:03', '23.4', '40', NULL), 64 | ('5', '3', '2020-01-08 21:10:57', '10', '15', NULL), 65 | ('6', '3', 'null', 'null', 'null', 'Restaurant Cancellation'), 66 | ('7', '2', '2020-01-08 21:30:45', '25km', '25mins', 'null'), 67 | ('8', '2', '2020-01-10 00:15:02', '23.4 km', '15 minute', 'null'), 68 | ('9', '2', 'null', 'null', 'null', 'Customer Cancellation'), 69 | ('10', '1', '2020-01-11 18:50:20', '10km', '10minutes', 'null'); 70 | 71 | 72 | DROP TABLE IF EXISTS pizza_names; 73 | CREATE TABLE pizza_names ( 74 | "pizza_id" INTEGER, 75 | "pizza_name" TEXT 76 | ); 77 | INSERT INTO pizza_names 78 | ("pizza_id", "pizza_name") 79 | VALUES 80 | (1, 'Meatlovers'), 81 | (2, 'Vegetarian'); 
82 | 83 | 84 | DROP TABLE IF EXISTS pizza_recipes; 85 | CREATE TABLE pizza_recipes ( 86 | "pizza_id" INTEGER, 87 | "toppings" TEXT 88 | ); 89 | INSERT INTO pizza_recipes 90 | ("pizza_id", "toppings") 91 | VALUES 92 | (1, '1, 2, 3, 4, 5, 6, 8, 10'), 93 | (2, '4, 6, 7, 9, 11, 12'); 94 | 95 | 96 | DROP TABLE IF EXISTS pizza_toppings; 97 | CREATE TABLE pizza_toppings ( 98 | "topping_id" INTEGER, 99 | "topping_name" TEXT 100 | ); 101 | INSERT INTO pizza_toppings 102 | ("topping_id", "topping_name") 103 | VALUES 104 | (1, 'Bacon'), 105 | (2, 'BBQ Sauce'), 106 | (3, 'Beef'), 107 | (4, 'Cheese'), 108 | (5, 'Chicken'), 109 | (6, 'Mushrooms'), 110 | (7, 'Onions'), 111 | (8, 'Pepperoni'), 112 | (9, 'Peppers'), 113 | (10, 'Salami'), 114 | (11, 'Tomatoes'), 115 | (12, 'Tomato Sauce'); -------------------------------------------------------------------------------- /Case Study #3 - Foodie-Fi/README.md: -------------------------------------------------------------------------------- 1 | # [8-Week SQL Challenge](https://github.com/ndleah/8-Week-SQL-Challenge) 2 | ![Star Badge](https://img.shields.io/static/v1?label=%F0%9F%8C%9F&message=If%20Useful&style=style=flat&color=BC4E99) 3 | [![View Main Folder](https://img.shields.io/badge/View-Main_Folder-971901?)](https://github.com/ndleah/8-Week-SQL-Challenge) 4 | [![View Repositories](https://img.shields.io/badge/View-My_Repositories-blue?logo=GitHub)](https://github.com/ndleah?tab=repositories) 5 | [![View My Profile](https://img.shields.io/badge/View-My_Profile-green?logo=GitHub)](https://github.com/ndleah) 6 | 7 | 8 | # 🥑 Case Study #3 - Foodie-Fi 9 |

10 | 11 | 12 | ## 📕 Table Of Contents 13 | - 🛠️ [Problem Statement](#problem-statement) 14 | - 📂 [Dataset](#dataset) 15 | - 🧙‍♂️ [Case Study Questions](#case-study-questions) 16 | - 🚀 [Solutions](#-solutions) 17 | 18 | ## 🛠️ Problem Statement 19 | 20 | Danny finds a few smart friends to launch his new startup Foodie-Fi in 2020 and started selling monthly and annual subscriptions, giving their customers unlimited on-demand access to exclusive food videos from around the world! 21 | 22 | Danny created Foodie-Fi with a data driven mindset and wanted to ensure all future investment decisions and new features were decided using data. This case study focuses on using subscription style digital data to answer important business questions. 23 | 24 | ## 📂 Dataset 25 | Danny has shared with you 2 key datasets for this case study: 26 | 27 | ### **```plan```** 28 | 29 |

30 | 
31 | View table
32 | 
33 | 
34 | The plan table shows the plans customers can choose from when they first sign up to Foodie-Fi.
35 | 
36 | * **Trial:** customers can sign up for an initial 7-day free trial, which will automatically continue as the pro monthly subscription plan unless they cancel
37 | 
38 | * **Basic plan:** limited access and can only stream videos
39 | * **Pro plan:** no watch time limits and videos are downloadable, with 2 subscription options: **monthly** and **annual**
40 | 
41 | 
42 | | "plan_id" | "plan_name" | "price" |
43 | |-----------|-----------------|---------|
44 | | 0 | "trial" | 0.00 |
45 | | 1 | "basic monthly" | 9.90 |
46 | | 2 | "pro monthly" | 19.90 |
47 | | 3 | "pro annual" | 199.00 |
48 | | 4 | "churn" | NULL |
49 | 
50 | 
51 | 
52 | 53 | 54 | ### **```subscriptions```** 55 | 56 | 57 |
58 | 59 | View table 60 | 61 | 62 | Customer subscriptions show the exact date where their specific ```plan_id``` starts. 63 | 64 | If customers downgrade from a pro plan or cancel their subscription - the higher plan will remain in place until the period is over - the ```start_date``` in the ```subscriptions``` table will reflect the date that the actual plan changes. 65 | 66 | In this part, I will display the first 20 rows of this dataset since the original one is super long: 67 | 68 | 69 | | "customer_id" | "plan_id" | "start_date" | 70 | |---------------|-----------|--------------| 71 | | 1 | 0 | "2020-08-01" | 72 | | 1 | 1 | "2020-08-08" | 73 | | 2 | 0 | "2020-09-20" | 74 | | 2 | 3 | "2020-09-27" | 75 | | 3 | 0 | "2020-01-13" | 76 | | 3 | 1 | "2020-01-20" | 77 | | 4 | 0 | "2020-01-17" | 78 | | 4 | 1 | "2020-01-24" | 79 | | 4 | 4 | "2020-04-21" | 80 | | 5 | 0 | "2020-08-03" | 81 | | 5 | 1 | "2020-08-10" | 82 | | 6 | 0 | "2020-12-23" | 83 | | 6 | 1 | "2020-12-30" | 84 | | 6 | 4 | "2021-02-26" | 85 | | 7 | 0 | "2020-02-05" | 86 | | 7 | 1 | "2020-02-12" | 87 | | 7 | 2 | "2020-05-22" | 88 | | 8 | 0 | "2020-06-11" | 89 | | 8 | 1 | "2020-06-18" | 90 | | 8 | 2 | "2020-08-03" | 91 | 92 | 93 |
94 | 
95 | 
96 | ## 🧙‍♂️ Case Study Questions
97 | 
98 | 1. How many customers has Foodie-Fi ever had?
99 | 2. What is the monthly distribution of **```trial```** plan **```start_date```** values for our dataset - use the start of the month as the group by value
100 | 3. What plan **```start_date```** values occur after the year 2020 for our dataset? Show the breakdown by count of events for each **```plan_name```**
101 | 4. What is the customer count and percentage of customers who have churned rounded to 1 decimal place?
102 | 5. How many customers have churned straight after their initial free trial - what percentage is this rounded to the nearest whole number?
103 | 6. What is the number and percentage of customer plans after their initial free trial?
104 | 7. What is the customer count and percentage breakdown of all 5 **```plan_name```** values at **```2020-12-31```**?
105 | 8. How many customers have upgraded to an annual plan in 2020?
106 | 9. How many days on average does it take for a customer to upgrade to an annual plan from the day they join Foodie-Fi?
107 | 10. Can you further break down this average value into 30 day periods (i.e. 0-30 days, 31-60 days etc)
108 | 11. How many customers downgraded from a pro monthly to a basic monthly plan in 2020?
109 | 
110 | ## 🚀 Solutions
111 | 
112 | **Q1. How many customers has Foodie-Fi ever had?**
113 | ```SQL
114 | SELECT COUNT(DISTINCT customer_id) AS total_customers
115 | FROM foodie_fi.subscriptions;
116 | ```
117 | 
118 | 
119 | | total_customers |
120 | |-------------------|
121 | | 1000 |
122 | 
123 | 
124 | **Q2. What is the monthly distribution of ```trial``` plan ```start_date``` values for our dataset - use the start of the month as the group by value**
125 | 
126 | ```SQL
127 | SELECT EXTRACT(MONTH FROM start_date) AS months, COUNT(*)
128 | FROM foodie_fi.subscriptions
129 | WHERE plan_id = 0
130 | GROUP BY months
131 | ORDER BY months;
132 | ```
133 | 
134 | months | count
135 | -------|-------
136 | 1 | 88
137 | 2 | 68
138 | 3 | 94
139 | 4 | 81
140 | 5 | 88
141 | 6 | 79
142 | 7 | 89
143 | 8 | 88
144 | 9 | 87
145 | 10 | 79
146 | 11 | 75
147 | 12 | 84
148 | 
149 | 
150 | 
151 | **Q3. What plan start_date values occur after the year 2020 for our dataset? Show the breakdown by count of events for each plan_name**
152 | ```SQL
153 | SELECT plan_id, COUNT(*)
154 | FROM foodie_fi.subscriptions
155 | WHERE start_date > '2020-01-01'::DATE
156 | GROUP BY plan_id
157 | ORDER BY plan_id;
158 | ```
159 | 
160 | plan_id | count
161 | --------|-------
162 | 0 | 997
163 | 1 | 546
164 | 2 | 539
165 | 3 | 258
166 | 4 | 307
167 | 
168 | ---
169 | 
170 | **Q4. What is the customer count and percentage of customers who have churned rounded to 1 decimal place?**
171 | ```SQL
172 | DROP TABLE IF EXISTS total_count;
173 | CREATE TEMP TABLE total_count AS (
174 | SELECT COUNT(DISTINCT customer_id) AS num
175 | FROM foodie_fi.subscriptions
176 | );
177 | 
178 | WITH churn_count AS (
179 | SELECT COUNT(DISTINCT customer_id) AS num
180 | FROM foodie_fi.subscriptions
181 | WHERE plan_id = 4
182 | )
183 | SELECT churn_count.num AS num_churned,
184 | churn_count.num::FLOAT / total_count.num::FLOAT *100 AS percent_churned
185 | FROM churn_count, total_count;
186 | ```
187 | 
188 | num_churned | percent_churned
189 | -------------|-----------------
190 | 307 | 30.7
191 | 
192 | 
193 | 
194 | **Q5. 
How many customers have churned straight after their initial free trial - what percentage is this rounded to the nearest whole number?**
195 | ```SQL
196 | DROP TABLE IF EXISTS next_plan_cte;
197 | CREATE TEMP TABLE next_plan_cte AS (
198 | SELECT *,
199 | LEAD(plan_id, 1)
200 | OVER(PARTITION BY customer_id ORDER BY start_date) as next_plan
201 | FROM foodie_fi.subscriptions
202 | );
203 | 
204 | WITH direct_churner_cte AS (
205 | SELECT COUNT(DISTINCT customer_id) AS direct_churner
206 | FROM next_plan_cte
207 | WHERE plan_id = 0 AND next_plan = 4
208 | )
209 | 
210 | SELECT direct_churner, direct_churner::FLOAT/num::FLOAT * 100 AS percent_churned
211 | FROM direct_churner_cte, total_count;
212 | ```
213 | 
214 | 
215 | direct_churner | percent_churned
216 | ----------------|-----------------
217 | 92 | 9.2
218 | 
219 | 
220 | **Q6. What is the number and percentage of customer plans after their initial free trial?**
221 | ```SQL
222 | DROP TABLE IF EXISTS current_plan_count;
223 | CREATE TEMP TABLE current_plan_count AS (
224 | SELECT plan_id, COUNT(DISTINCT customer_id) AS num
225 | FROM foodie_fi.subscriptions
226 | GROUP BY plan_id
227 | );
228 | 
229 | WITH conversions AS (
230 | SELECT next_plan, COUNT(*) AS total_conversions
231 | FROM next_plan_cte
232 | WHERE next_plan IS NOT NULL AND plan_id = 0
233 | GROUP BY next_plan
234 | ORDER BY next_plan
235 | )
236 | 
237 | SELECT current_plan_count.plan_id, total_conversions, num,
238 | ROUND(CAST(total_conversions::FLOAT / num::FLOAT * 100 AS NUMERIC), 2) AS percent_directly_converted
239 | FROM current_plan_count JOIN conversions
240 | ON current_plan_count.plan_id = conversions.next_plan;
241 | ```
242 | 
243 | plan_id | total_conversions | num | percent_directly_converted
244 | -------------|-------------------|-----|----------------------------
245 | 1 | 546 | 546 | 100.00
246 | 2 | 325 | 539 | 60.30
247 | 3 | 37 | 258 | 14.34
248 | 4 | 92 | 307 | 29.97
249 | 
250 | 
251 | **Q7. 
What is the customer count and percentage breakdown of all 5 plan_name values at 2020-12-31?** 252 | ```SQL 253 | WITH next_date_cte AS ( 254 | SELECT *, 255 | LEAD (start_date, 1) OVER (PARTITION BY customer_id ORDER BY start_date) AS next_date 256 | FROM foodie_fi.subscriptions 257 | ), 258 | customers_on_date_cte AS ( 259 | SELECT plan_id, COUNT(DISTINCT customer_id) AS customers 260 | FROM next_date_cte 261 | WHERE (next_date IS NOT NULL AND ('2020-12-31'::DATE > start_date AND '2020-12-31'::DATE < next_date)) 262 | OR (next_date IS NULL AND '2020-12-31'::DATE > start_date) 263 | GROUP BY plan_id 264 | ) 265 | 266 | SELECT plan_id, customers, ROUND(CAST(customers::FLOAT / num::FLOAT * 100 AS NUMERIC), 2) AS percent 267 | FROM customers_on_date_cte, total_count; 268 | ``` 269 | 270 | plan_id | customers | percent 271 | ---------|-----------|--------- 272 | 0 | 19 | 1.90 273 | 1 | 224 | 22.40 274 | 2 | 326 | 32.60 275 | 3 | 195 | 19.50 276 | 4 | 235 | 23.50 277 | 278 | 279 | **Q8. How many customers have upgraded to an annual plan in 2020?** 280 | ```SQL 281 | SELECT COUNT(DISTINCT customer_id) 282 | FROM next_plan_cte 283 | WHERE next_plan=3 AND EXTRACT(YEAR FROM start_date) = '2020'; 284 | ``` 285 | 286 | |count | 287 | |------| 288 | |253 | 289 | 290 | **Q9. 
How many days on average does it take for a customer to upgrade to an annual plan from the day they join Foodie-Fi?**
291 | ```SQL
292 | -- This will only find the average of people who have upgraded to annual plan
293 | WITH join_date AS (
294 | SELECT customer_id, start_date
295 | FROM foodie_fi.subscriptions
296 | WHERE plan_id = 0
297 | ),
298 | pro_date AS (
299 | SELECT customer_id, start_date AS upgrade_date
300 | FROM foodie_fi.subscriptions
301 | WHERE plan_id = 3
302 | )
303 | 
304 | SELECT ROUND(AVG(upgrade_date - start_date), 2) AS avg_days_to_upgrade
305 | FROM join_date JOIN pro_date
306 | ON join_date.customer_id = pro_date.customer_id;
307 | ```
308 | 
309 | |avg_days_to_upgrade|
310 | |-------------------|
311 | |104.62 |
312 | 
313 | 
314 | **Q10. Can you further break down this average value into 30 day periods (i.e. 0-30 days, 31-60 days etc)**
315 | ```SQL
316 | WITH join_date AS (
317 | SELECT customer_id, start_date
318 | FROM foodie_fi.subscriptions
319 | WHERE plan_id = 0
320 | ),
321 | pro_date AS (
322 | SELECT customer_id, start_date AS upgrade_date
323 | FROM foodie_fi.subscriptions
324 | WHERE plan_id = 3
325 | ),
326 | bins AS (
327 | SELECT WIDTH_BUCKET(upgrade_date - start_date, 0, 360, 12) AS avg_days_to_upgrade
328 | FROM join_date JOIN pro_date
329 | ON join_date.customer_id = pro_date.customer_id
330 | )
331 | 
332 | SELECT ((avg_days_to_upgrade - 1)*30 || '-' || (avg_days_to_upgrade)*30) AS "30-day-range", COUNT(*)
333 | FROM bins
334 | GROUP BY avg_days_to_upgrade
335 | ORDER BY avg_days_to_upgrade;
336 | ```
337 | 
338 | | 30-day-range | count |
339 | |----------------|---------|
340 | | 0-30 | 48 |
341 | | 30-60 | 25 |
342 | | 60-90 | 33 |
343 | | 90-120 | 35 |
344 | | 120-150 | 43 |
345 | | 150-180 | 35 |
346 | | 180-210 | 27 |
347 | | 210-240 | 4 |
348 | | 240-270 | 5 |
349 | | 270-300 | 1 |
350 | | 300-330 | 1 |
351 | | 330-360 | 1 |
352 | 
353 | 
354 | 
355 | **Q11. 
How many customers downgraded from a pro monthly to a basic monthly plan in 2020?** 356 | 357 | ```SQL 358 | SELECT COUNT(*) AS customers_downgraded 359 | FROM next_plan_cte 360 | WHERE plan_id=2 AND next_plan=1; 361 | ``` 362 | |customers_downgraded| 363 | |--------------------| 364 | | 0 | 365 | 366 | --- 367 |
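Note that the Q11 query above counts every pro monthly to basic monthly downgrade in the dataset, not only those in 2020 (it happens to return 0 either way here). A minimal sketch of an explicit 2020 filter, assuming the downgrade takes effect on the following row's ```start_date``` (exposed below as a hypothetical ```next_start_date``` column, which is not part of ```next_plan_cte```):

```SQL
SELECT COUNT(*) AS customers_downgraded
FROM (
  SELECT
    plan_id,
    LEAD(plan_id) OVER (PARTITION BY customer_id ORDER BY start_date) AS next_plan,
    -- date on which the following plan starts, i.e. when the downgrade takes effect
    LEAD(start_date) OVER (PARTITION BY customer_id ORDER BY start_date) AS next_start_date
  FROM foodie_fi.subscriptions
) AS plan_changes
WHERE plan_id = 2
  AND next_plan = 1
  AND EXTRACT(YEAR FROM next_start_date) = 2020;
```

Since no customer moved from pro monthly to basic monthly at all in this dataset, the count remains 0 with or without the year filter.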

© 2021 Leah Nguyen

368 | -------------------------------------------------------------------------------- /Case Study #3 - Foodie-Fi/query-sql.sql: -------------------------------------------------------------------------------- 1 | -- 1. How many customers has Foodie-Fi ever had? 2 | SELECT COUNT(DISTINCT customer_id) AS total_customers 3 | FROM foodie_fi.subscriptions; 4 | 5 | -- Query Results 6 | 7 | -- total_customers 8 | -- ----------------- 9 | -- 1000 10 | 11 | 12 | -- 2. What is the monthly distribution of trial plan start_date values for our 13 | -- dataset - use the start of the month as the group by value 14 | 15 | SELECT EXTRACT(MONTH FROM start_date) AS months, COUNT(*) -- month number suffices here: every trial starts in 2020 16 | FROM foodie_fi.subscriptions 17 | WHERE plan_id = 0 18 | GROUP BY months 19 | ORDER BY months; 20 | 21 | -- Query Results 22 | 23 | -- months | count 24 | -- --------+------- 25 | -- 1 | 88 26 | -- 2 | 68 27 | -- 3 | 94 28 | -- 4 | 81 29 | -- 5 | 88 30 | -- 6 | 79 31 | -- 7 | 89 32 | -- 8 | 88 33 | -- 9 | 87 34 | -- 10 | 79 35 | -- 11 | 75 36 | -- 12 | 84 37 | 38 | 39 | -- 3. What plan start_date values occur after the year 2020 for our dataset? Show the 40 | -- breakdown by count of events for each plan_name 41 | 42 | SELECT plan_id, COUNT(*) 43 | FROM foodie_fi.subscriptions 44 | WHERE start_date > '2020-01-01'::DATE -- keeps all events after 2020-01-01; for events strictly after the year 2020, use > '2020-12-31'::DATE 45 | GROUP BY plan_id 46 | ORDER BY plan_id; 47 | 48 | -- Query Results 49 | 50 | -- plan_id | count 51 | -- ---------+------- 52 | -- 0 | 997 53 | -- 1 | 546 54 | -- 2 | 539 55 | -- 3 | 258 56 | -- 4 | 307 57 | 58 | 59 | -- 4. What is the customer count and percentage of customers who have churned rounded 60 | -- to 1 decimal place? 
61 | 62 | DROP TABLE IF EXISTS total_count; 63 | CREATE TEMP TABLE total_count AS ( 64 | SELECT COUNT(DISTINCT customer_id) AS num 65 | FROM foodie_fi.subscriptions 66 | ); 67 | 68 | WITH churn_count AS ( 69 | SELECT COUNT(DISTINCT customer_id) AS num 70 | FROM foodie_fi.subscriptions 71 | WHERE plan_id = 4 72 | ) 73 | SELECT churn_count.num AS num_churned, 74 | churn_count.num::FLOAT / total_count.num::FLOAT *100 AS percent_churned 75 | FROM churn_count, total_count; 76 | 77 | -- Query Results 78 | 79 | -- num_churned | percent_churned 80 | -- -------------+----------------- 81 | -- 307 | 30.7 82 | 83 | 84 | -- 5. How many customers have churned straight after their initial free trial 85 | -- - what percentage is this rounded to the nearest whole number? 86 | 87 | DROP TABLE IF EXISTS next_plan_cte; 88 | CREATE TEMP TABLE next_plan_cte AS( 89 | SELECT *, 90 | LEAD(plan_id, 1) 91 | OVER(PARTITION BY customer_id ORDER BY start_date) as next_plan 92 | FROM foodie_fi.subscriptions 93 | ); 94 | 95 | WITH direct_churner_cte AS ( 96 | SELECT COUNT(DISTINCT customer_id) AS direct_churner 97 | FROM next_plan_cte 98 | WHERE plan_id = 0 AND next_plan = 4 99 | ) 100 | 101 | SELECT direct_churner, direct_churner::FLOAT/num::FLOAT * 100 AS percent_churned 102 | FROM direct_churner_cte, total_count; 103 | 104 | -- Query Results 105 | 106 | -- direct_churner | percent_churned 107 | -- ----------------+----------------- 108 | -- 92 | 9.2 109 | 110 | 111 | -- 6. What is the number and percentage of customer plans after their initial free trial? 
112 | 113 | DROP TABLE IF EXISTS current_plan_count; 114 | CREATE TEMP TABLE current_plan_count AS ( 115 | SELECT plan_id, COUNT(DISTINCT customer_id) AS num 116 | FROM foodie_fi.subscriptions 117 | GROUP BY plan_id 118 | ); 119 | 120 | WITH conversions AS ( 121 | SELECT next_plan, COUNT(*) AS total_conversions 122 | FROM next_plan_cte 123 | WHERE next_plan IS NOT NULL AND plan_id = 0 124 | GROUP BY next_plan 125 | ORDER BY next_plan 126 | ) 127 | 128 | SELECT current_plan_count.plan_id, total_conversions, num, 129 | ROUND(CAST(total_conversions::FLOAT / num::FLOAT * 100 AS NUMERIC), 2) AS percent_directly_converted 130 | FROM current_plan_count JOIN conversions 131 | ON current_plan_count.plan_id = conversions.next_plan; 132 | 133 | -- Query Results 134 | 135 | -- plan_id | total_conversions | num | percent_directly_converted 136 | -- ---------+-------------------+-----+---------------------------- 137 | -- 1 | 546 | 546 | 100.00 138 | -- 2 | 325 | 539 | 60.30 139 | -- 3 | 37 | 258 | 14.34 140 | -- 4 | 92 | 307 | 29.97 141 | 142 | 143 | -- 7. What is the customer count and percentage breakdown of all 5 plan_name values at 2020-12-31? 
144 | 145 | WITH next_date_cte AS ( 146 | SELECT *, 147 | LEAD (start_date, 1) OVER (PARTITION BY customer_id ORDER BY start_date) AS next_date 148 | FROM foodie_fi.subscriptions 149 | ), 150 | customers_on_date_cte AS ( 151 | SELECT plan_id, COUNT(DISTINCT customer_id) AS customers 152 | FROM next_date_cte 153 | WHERE (next_date IS NOT NULL AND ('2020-12-31'::DATE > start_date AND '2020-12-31'::DATE < next_date)) 154 | OR (next_date IS NULL AND '2020-12-31'::DATE > start_date) 155 | GROUP BY plan_id 156 | ) 157 | 158 | SELECT plan_id, customers, ROUND(CAST(customers::FLOAT / num::FLOAT * 100 AS NUMERIC), 2) AS percent 159 | FROM customers_on_date_cte, total_count; 160 | 161 | -- Query Results 162 | 163 | -- The number of customers on 2020-12-31 164 | 165 | -- plan_id | customers | percent 166 | -- ---------+-----------+--------- 167 | -- 0 | 19 | 1.90 168 | -- 1 | 224 | 22.40 169 | -- 2 | 326 | 32.60 170 | -- 3 | 195 | 19.50 171 | -- 4 | 235 | 23.50 172 | 173 | 174 | -- 8. How many customers have upgraded to an annual plan in 2020? 175 | 176 | SELECT COUNT(DISTINCT customer_id) 177 | FROM next_plan_cte 178 | WHERE next_plan=3 AND EXTRACT(YEAR FROM start_date) = '2020'; 179 | 180 | -- Query Results 181 | 182 | -- count 183 | -- ------- 184 | -- 253 185 | 186 | 187 | -- 9. How many days on average does it take for a customer to an annual plan from the day they join Foodie-Fi? 
188 | 189 | -- This will only find the average of people who have upgraded to annual plan 190 | WITH join_date AS ( 191 | SELECT customer_id, start_date 192 | FROM foodie_fi.subscriptions 193 | WHERE plan_id = 0 194 | ), 195 | pro_date AS ( 196 | SELECT customer_id, start_date AS upgrade_date 197 | FROM foodie_fi.subscriptions 198 | WHERE plan_id = 3 199 | ) 200 | 201 | SELECT ROUND(AVG(upgrade_date - start_date), 2) AS avg_days_to_upgrade 202 | FROM join_date JOIN pro_date 203 | ON join_date.customer_id = pro_date.customer_id; 204 | 205 | -- Query Results 206 | 207 | -- avg_days_to_upgrade 208 | -- --------------------- 209 | -- 104.62 210 | 211 | 212 | -- 10. Can you further breakdown this average value into 30 day periods (i.e. 0-30 days, 31-60 days etc) 213 | 214 | WITH join_date AS ( 215 | SELECT customer_id, start_date 216 | FROM foodie_fi.subscriptions 217 | WHERE plan_id = 0 218 | ), 219 | pro_date AS ( 220 | SELECT customer_id, start_date AS upgrade_date 221 | FROM foodie_fi.subscriptions 222 | WHERE plan_id = 3 223 | ), 224 | bins AS ( 225 | SELECT WIDTH_BUCKET(upgrade_date - start_date, 0, 360, 12) AS avg_days_to_upgrade 226 | FROM join_date JOIN pro_date 227 | ON join_date.customer_id = pro_date.customer_id 228 | ) 229 | 230 | 231 | SELECT ((avg_days_to_upgrade - 1)*30 || '-' || (avg_days_to_upgrade)*30) AS "30-day-range", COUNT(*) 232 | FROM bins 233 | GROUP BY avg_days_to_upgrade 234 | ORDER BY avg_days_to_upgrade; 235 | 236 | -- Query Results 237 | 238 | -- 30-day-range | count 239 | -- --------------+------- 240 | -- 0-30 | 48 241 | -- 30-60 | 25 242 | -- 60-90 | 33 243 | -- 90-120 | 35 244 | -- 120-150 | 43 245 | -- 150-180 | 35 246 | -- 180-210 | 27 247 | -- 210-240 | 4 248 | -- 240-270 | 5 249 | -- 270-300 | 1 250 | -- 300-330 | 1 251 | -- 330-360 | 1 252 | 253 | 254 | -- 11. How many customers downgraded from a pro monthly to a basic monthly plan in 2020? 
255 | 256 | SELECT COUNT(*) AS customers_downgraded 257 | FROM next_plan_cte 258 | WHERE plan_id=2 AND next_plan=1; 259 | 260 | -- Query Results 261 | 262 | -- customers_downgraded 263 | -- ---------------------- 264 | -- 0 -------------------------------------------------------------------------------- /Case Study #4 - Data Bank/README.md: -------------------------------------------------------------------------------- 1 | # [8-Week SQL Challenge](https://github.com/ndleah/8-Week-SQL-Challenge) 2 | ![Star Badge](https://img.shields.io/static/v1?label=%F0%9F%8C%9F&message=If%20Useful&style=style=flat&color=BC4E99) 3 | [![View Main Folder](https://img.shields.io/badge/View-Main_Folder-971901?)](https://github.com/ndleah/8-Week-SQL-Challenge) 4 | [![View Repositories](https://img.shields.io/badge/View-My_Repositories-blue?logo=GitHub)](https://github.com/ndleah?tab=repositories) 5 | [![View My Profile](https://img.shields.io/badge/View-My_Profile-green?logo=GitHub)](https://github.com/ndleah) 6 | # 🪙 Case Study #4 - Data Bank 7 |

8 | 9 | 10 | 11 | ## 📕 Table Of Contents 12 | - 🛠️ [Problem Statement](#problem-statement) 13 | - 📂 [Dataset](#dataset) 14 | - 🧙‍♂️ [Case Study Questions](#case-study-questions) 15 | ## 🛠️ Problem Statement 16 | > Danny thought that there should be some sort of intersection between these new age banks, cryptocurrency and the data world…so he decides to launch a new initiative - **Data Bank**! 17 | > 18 | > The management team at Data Bank want to increase their total customer base - but also need some help tracking just how much data storage their customers will need. 19 | > 20 | >This case study is all about calculating metrics, growth and helping the business analyse their data in a smart way to better forecast and plan for their future developments! 21 | 22 | ## 📂 Dataset 23 | Danny has shared with you 2 key datasets for this case study: 24 | ### **```region```** 25 | 26 |

27 | 28 | View table 29 | 30 | 31 | This ```regions``` table contains the ```region_id``` and their respective ```region_name``` values 32 | 33 | | "region_id" | "region_name" | 34 | |-------------|---------------| 35 | | 1 | "Australia" | 36 | | 2 | "America" | 37 | | 3 | "Africa" | 38 | | 4 | "Asia" | 39 | | 5 | "Europe" | 40 |
41 | 42 | ### **```Customer Nodes```** 43 | 44 |
45 | 46 | View table 47 | 48 | 49 | Customers are randomly distributed across the nodes according to their region - this also specifies exactly which node contains both their cash and data. 50 | This random distribution changes frequently to reduce the risk of hackers getting into Data Bank’s system and stealing customer’s money and data! 51 | Below is a sample of the top 10 rows of the ```data_bank.customer_nodes``` 52 | 53 | | "customer_id" | "region_id" | "node_id" | "start_date" | "end_date" | 54 | |---------------|-------------|-----------|--------------|--------------| 55 | | 1 | 3 | 4 | "2020-01-02" | "2020-01-03" | 56 | | 2 | 3 | 5 | "2020-01-03" | "2020-01-17" | 57 | | 3 | 5 | 4 | "2020-01-27" | "2020-02-18" | 58 | | 4 | 5 | 4 | "2020-01-07" | "2020-01-19" | 59 | | 5 | 3 | 3 | "2020-01-15" | "2020-01-23" | 60 | | 6 | 1 | 1 | "2020-01-11" | "2020-02-06" | 61 | | 7 | 2 | 5 | "2020-01-20" | "2020-02-04" | 62 | | 8 | 1 | 2 | "2020-01-15" | "2020-01-28" | 63 | | 9 | 4 | 5 | "2020-01-21" | "2020-01-25" | 64 | | 10 | 3 | 4 | "2020-01-13" | "2020-01-14" | 65 |
66 | 67 | ### **```Customer Transactions```** 68 | 69 |
70 | 71 | View table 72 | 73 | 74 | This table stores all customer deposits, withdrawals and purchases made using their Data Bank debit card. 75 | 76 | | "customer_id" | "txn_date" | "txn_type" | "txn_amount" | 77 | |---------------|--------------|------------|--------------| 78 | | 429 | "2020-01-21" | "deposit" | 82 | 79 | | 155 | "2020-01-10" | "deposit" | 712 | 80 | | 398 | "2020-01-01" | "deposit" | 196 | 81 | | 255 | "2020-01-14" | "deposit" | 563 | 82 | | 185 | "2020-01-29" | "deposit" | 626 | 83 | | 309 | "2020-01-13" | "deposit" | 995 | 84 | | 312 | "2020-01-20" | "deposit" | 485 | 85 | | 376 | "2020-01-03" | "deposit" | 706 | 86 | | 188 | "2020-01-13" | "deposit" | 601 | 87 | | 138 | "2020-01-11" | "deposit" | 520 | 88 |
89 | 90 | ## 🧙‍♂️ Case Study Questions 91 |

92 | 93 | 94 | ### **A. Customer Nodes Exploration** 95 | 1. How many unique nodes are there on the Data Bank system? 96 | 2. What is the number of nodes per region? 97 | 3. How many customers are allocated to each region? 98 | 4. How many days on average are customers reallocated to a different node? 99 | 5. What is the median, 80th and 95th percentile for this same reallocation days metric for each region? 100 | 101 | [![View Data Exploration Folder](https://img.shields.io/badge/View-Solution-971901?style=for-the-badge&logo=GITHUB)](https://github.com/ndleah/8-Week-SQL-Challenge/tree/main/Case%20Study%20%232%20-%20Pizza%20Runner/1.%20Pizza%20Metrics) 102 | 103 | ### **B. Customer Transactions** 104 | 105 | 1. What is the unique count and total amount for each transaction type? 106 | 2. What is the average total historical deposit counts and amounts for all customers? 107 | 3. For each month - how many Data Bank customers make more than 1 deposit and either 1 purchase or 1 withdrawal in a single month? 108 | 4. What is the closing balance for each customer at the end of the month? 109 | 5. What is the percentage of customers who increase their closing balance by more than 5%? 110 | 111 | [![View Data Exploration Folder](https://img.shields.io/badge/View-Solution-971901?style=for-the-badge&logo=GITHUB)](https://github.com/ndleah/8-Week-SQL-Challenge/tree/main/Case%20Study%20%232%20-%20Pizza%20Runner/2.%20Runner%20and%20Customer%20Experience) 112 | 113 | ## 🚀 Solutions 114 | ### **A. Customer Nodes Exploration** 115 | 116 |

117 | 118 | View solutions 119 | 120 | 121 | ### **Q1. How many unique nodes are there on the Data Bank system?** 122 | 123 | ```sql 124 | SELECT COUNT(DISTINCT node_id) AS node_counts 125 | FROM data_bank.customer_nodes; 126 | ``` 127 | 128 | | "node_count" | 129 | |--------------| 130 | | 5 | 131 | 132 | ### **Q2. What is the number of nodes per region?** 133 | 134 | ```sql 135 | SELECT 136 | regions.region_name, 137 | COUNT(DISTINCT customer_nodes.node_id) AS node_counts 138 | FROM data_bank.regions 139 | INNER JOIN data_bank.customer_nodes 140 | ON regions.region_id = customer_nodes.region_id 141 | GROUP BY regions.region_name; 142 | ``` 143 | 144 | | "region_name" | "node_counts" | 145 | |---------------|---------------| 146 | | "Africa" | 5 | 147 | | "America" | 5 | 148 | | "Asia" | 5 | 149 | | "Australia" | 5 | 150 | | "Europe" | 5 | 151 | 152 | ### **Q3. How many customers are allocated to each region?** 153 | 154 | ```sql 155 | SELECT 156 | regions.region_name, 157 | COUNT(DISTINCT customer_nodes.customer_id) AS customer_counts 158 | FROM data_bank.regions 159 | INNER JOIN data_bank.customer_nodes 160 | ON regions.region_id = customer_nodes.region_id 161 | GROUP BY regions.region_name; 162 | ``` 163 | 164 | | "region_name" | "customer_counts" | 165 | |---------------|-------------------| 166 | | "Africa" | 102 | 167 | | "America" | 105 | 168 | | "Asia" | 95 | 169 | | "Australia" | 110 | 170 | | "Europe" | 88 | 171 | 172 |
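
### **Q4. How many days on average are customers reallocated to a different node?**

A possible sketch (not validated against the dataset): average the length of each allocation window, `end_date - start_date`. If the table marks a customer's current node with a far-future placeholder such as `9999-12-31`, those open allocations should be excluded first — check the schema before relying on this.

```sql
-- average allocation length in days, skipping open allocations
SELECT ROUND(AVG(end_date - start_date), 2) AS avg_reallocation_days
FROM data_bank.customer_nodes
WHERE end_date != '9999-12-31';
```

### **Q5. What is the median, 80th and 95th percentile for this same reallocation days metric for each region?**

The same allocation-window metric can be fed into `PERCENTILE_CONT`, grouped by region (same placeholder caveat as above):

```sql
SELECT
  regions.region_name,
  PERCENTILE_CONT(0.5)  WITHIN GROUP (ORDER BY end_date - start_date) AS median_days,
  PERCENTILE_CONT(0.8)  WITHIN GROUP (ORDER BY end_date - start_date) AS pct_80_days,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY end_date - start_date) AS pct_95_days
FROM data_bank.customer_nodes
INNER JOIN data_bank.regions
  ON customer_nodes.region_id = regions.region_id
WHERE end_date != '9999-12-31'
GROUP BY regions.region_name;
```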
173 | 174 | 175 | ### **B. Customer Transactions** 176 | 177 |
178 | 179 | View solutions 180 | 181 | 182 | ### **Q1. 1. What is the unique count and total amount for each transaction type?** 183 | 184 | ```sql 185 | SELECT 186 | txn_type, 187 | COUNT(txn_type) AS unique_count, 188 | SUM(txn_amount) AS total_amount 189 | FROM data_bank.customer_transactions 190 | GROUP BY txn_type; 191 | ``` 192 | 193 | | "txn_type" | "unique_count" | "total_amount" | 194 | |--------------|----------------|----------------| 195 | | "purchase" | 1617 | 806537 | 196 | | "withdrawal" | 1580 | 793003 | 197 | | "deposit" | 2671 | 1359168 | 198 | 199 | 200 | ### **Q2. What is the average total historical deposit counts and amounts for all customers?** 201 | 202 | ```sql 203 | WITH cte_deposit AS ( 204 | SELECT 205 | customer_id, 206 | COUNT(txn_type) AS deposit_count, 207 | SUM(txn_amount) AS deposit_amount 208 | FROM data_bank.customer_transactions 209 | WHERE txn_type = 'deposit' 210 | GROUP BY customer_id 211 | ) 212 | SELECT 213 | AVG(deposit_count) AS avg_deposit_count, 214 | AVG(deposit_amount) AS avg_deposit_amount 215 | FROM cte_deposit; 216 | ``` 217 | 218 | | "avg_deposit_count" | "avg_deposit_amount" | 219 | |---------------------|-----------------------| 220 | | 5.3420000000000000 | 2718.3360000000000000 | 221 | 222 | 223 | ### **Q3. 
For each month - how many Data Bank customers make more than 1 deposit and either 1 purchase or 1 withdrawal in a single month?** 224 | 225 | ```SQL 226 | WITH cte_customer AS ( 227 | SELECT 228 | EXTRACT(MONTH FROM txn_date) AS month_part, 229 | TO_CHAR(txn_date, 'Month') AS month, 230 | customer_id, 231 | SUM(CASE WHEN txn_type = 'deposit' THEN 1 ELSE 0 END) AS deposit_count, 232 | SUM(CASE WHEN txn_type = 'purchase' THEN 1 ELSE 0 END) AS purchase_count, 233 | SUM(CASE WHEN txn_type = 'withdrawal' THEN 1 ELSE 0 END) AS withdrawal_count 234 | FROM data_bank.customer_transactions 235 | GROUP BY 236 | EXTRACT(MONTH FROM txn_date), 237 | TO_CHAR(txn_date, 'Month'), 238 | customer_id 239 | ) 240 | SELECT 241 | month, 242 | COUNT(customer_id) AS customer_count 243 | FROM cte_customer 244 | WHERE deposit_count > 1 AND (purchase_count >= 1 OR withdrawal_count >= 1) 245 | GROUP BY 246 | month_part, 247 | month 248 | ORDER BY month_part; 249 | ``` 250 | 251 | | "month" | "customer_count" | 252 | |-------------|------------------| 253 | | "January " | 168 | 254 | | "February " | 181 | 255 | | "March " | 192 | 256 | | "April " | 70 | 257 | 258 | 259 |
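
### **Q4. What is the closing balance for each customer at the end of the month?**

One way to sketch this (assumptions: deposits are inflows, purchases and withdrawals are outflows, and months with no activity are not carried forward — treat this as an outline of the approach rather than a validated answer):

```sql
WITH monthly_change AS (
  -- net movement per customer per calendar month
  SELECT
    customer_id,
    DATE_TRUNC('month', txn_date) AS month,
    SUM(CASE WHEN txn_type = 'deposit' THEN txn_amount ELSE -txn_amount END) AS net_amount
  FROM data_bank.customer_transactions
  GROUP BY customer_id, DATE_TRUNC('month', txn_date)
)
SELECT
  customer_id,
  month,
  -- running total of monthly net movements = month-end balance
  SUM(net_amount) OVER (
    PARTITION BY customer_id
    ORDER BY month
  ) AS closing_balance
FROM monthly_change
ORDER BY customer_id, month;
```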
260 | 261 | --- 262 |

© 2021 Leah Nguyen

-------------------------------------------------------------------------------- /Case Study #4 - Data Bank/query-sql.sql: -------------------------------------------------------------------------------- 1 | /************************** 2 | CASE STUDY #4 - Data Bank 3 | ***************************/ 4 | 5 | /************************** 6 | A. Customer Nodes Exploration 7 | **************************/ 8 | 9 | --1. How many unique nodes are there on the Data Bank system? 10 | SELECT COUNT(DISTINCT node_id) AS node_counts 11 | FROM data_bank.customer_nodes; 12 | 13 | --2. What is the number of nodes per region? 14 | SELECT 15 | regions.region_name, 16 | COUNT(DISTINCT customer_nodes.node_id) AS node_counts 17 | FROM data_bank.regions 18 | INNER JOIN data_bank.customer_nodes 19 | ON regions.region_id = customer_nodes.region_id 20 | GROUP BY regions.region_name; 21 | 22 | 23 | --3. How many customers are allocated to each region? 24 | SELECT 25 | regions.region_name, 26 | COUNT(DISTINCT customer_nodes.customer_id) AS customer_counts 27 | FROM data_bank.regions 28 | INNER JOIN data_bank.customer_nodes 29 | ON regions.region_id = customer_nodes.region_id 30 | GROUP BY regions.region_name; 31 | 32 | 33 | --4. How many days on average are customers reallocated to a different node? 34 | 35 | 36 | --5. What is the median, 80th and 95th percentile for this same reallocation days metric for each region? 37 | 38 | /************************** 39 | B. Customer Transactions 40 | **************************/ 41 | 42 | /************************** 43 | B. Customer Transactions 44 | **************************/ 45 | 46 | --1. What is the unique count and total amount for each transaction type? 47 | SELECT 48 | txn_type, 49 | COUNT(txn_type) AS unique_count, 50 | SUM(txn_amount) AS total_amount 51 | FROM data_bank.customer_transactions 52 | GROUP BY txn_type; 53 | 54 | --2. What is the average total historical deposit counts and amounts for all customers? 
55 | WITH cte_deposit AS ( 56 | SELECT 57 | customer_id, 58 | COUNT(txn_type) AS deposit_count, 59 | SUM(txn_amount) AS deposit_amount 60 | FROM data_bank.customer_transactions 61 | WHERE txn_type = 'deposit' 62 | GROUP BY customer_id 63 | ) 64 | SELECT 65 | AVG(deposit_count) AS avg_deposit_count, 66 | AVG(deposit_amount) AS avg_deposit_amount 67 | FROM cte_deposit; 68 | 69 | --3. For each month - how many Data Bank customers make more than 1 deposit and either 1 purchase or 1 withdrawal in a single month? 70 | WITH cte_customer AS ( 71 | SELECT 72 | EXTRACT(MONTH FROM txn_date) AS month_part, 73 | TO_CHAR(txn_date, 'Month') AS month, 74 | customer_id, 75 | SUM(CASE WHEN txn_type = 'deposit' THEN 1 ELSE 0 END) AS deposit_count, 76 | SUM(CASE WHEN txn_type = 'purchase' THEN 1 ELSE 0 END) AS purchase_count, 77 | SUM(CASE WHEN txn_type = 'withdrawal' THEN 1 ELSE 0 END) AS withdrawal_count 78 | FROM data_bank.customer_transactions 79 | GROUP BY 80 | EXTRACT(MONTH FROM txn_date), 81 | TO_CHAR(txn_date, 'Month'), 82 | customer_id 83 | ) 84 | SELECT 85 | month, 86 | COUNT(customer_id) AS customer_count 87 | FROM cte_customer 88 | WHERE deposit_count > 1 AND (purchase_count >= 1 OR withdrawal_count >= 1) 89 | GROUP BY 90 | month_part, 91 | month 92 | ORDER BY month_part; 93 | 94 | 95 | --4. What is the closing balance for each customer at the end of the month? 96 | --5. What is the percentage of customers who increase their closing balance by more than 5%? 
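
-- A possible sketch for questions 4 and 5 (not validated against the dataset).
-- Assumptions: deposits are inflows, purchases and withdrawals are outflows,
-- and months with no transactions are not carried forward.
WITH monthly_change AS (
  SELECT
    customer_id,
    DATE_TRUNC('month', txn_date) AS month,
    SUM(CASE WHEN txn_type = 'deposit' THEN txn_amount ELSE -txn_amount END) AS net_amount
  FROM data_bank.customer_transactions
  GROUP BY customer_id, DATE_TRUNC('month', txn_date)
),
closing_balance AS (
  -- running total of monthly net movements = month-end balance (question 4)
  SELECT
    customer_id,
    month,
    SUM(net_amount) OVER (PARTITION BY customer_id ORDER BY month) AS balance
  FROM monthly_change
),
balance_change AS (
  SELECT
    customer_id,
    month,
    balance,
    LAG(balance) OVER (PARTITION BY customer_id ORDER BY month) AS prev_balance
  FROM closing_balance
)
-- One reading of question 5: customers whose month-end balance grows by more
-- than 5% over the previous month at least once.
SELECT ROUND(
  100.0 * COUNT(DISTINCT customer_id) FILTER (
    WHERE prev_balance > 0 AND balance > prev_balance * 1.05
  ) / COUNT(DISTINCT customer_id)
, 2) AS pct_customers_over_5pct_growth
FROM balance_change;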
-------------------------------------------------------------------------------- /Case Study #7 - Balanced Tree Clothing Co/README.md: -------------------------------------------------------------------------------- 1 | # [8-Week SQL Challenge](https://github.com/ndleah/8-Week-SQL-Challenge) 2 | ![Star Badge](https://img.shields.io/static/v1?label=%F0%9F%8C%9F&message=If%20Useful&style=style=flat&color=BC4E99) 3 | [![View Main Folder](https://img.shields.io/badge/View-Main_Folder-971901?)](https://github.com/ndleah/8-Week-SQL-Challenge) 4 | [![View Repositories](https://img.shields.io/badge/View-My_Repositories-blue?logo=GitHub)](https://github.com/ndleah?tab=repositories) 5 | [![View My Profile](https://img.shields.io/badge/View-My_Profile-green?logo=GitHub)](https://github.com/ndleah) 6 | 7 | 8 | # 🌲 Case Study #7 - Balanced Tree Clothing Co. 9 |

10 | 11 | 12 | ## 📕 Table Of Contents 13 | - 🛠️ [Problem Statement](#problem-statement) 14 | - 📂 [Dataset](#dataset) 15 | - 🧙‍♂️ [Case Study Questions](#case-study-questions) 16 | - 🚀 [Solutions](#-solutions) 17 | 18 | ## 🛠️ Problem Statement 19 | 20 | > Balanced Tree Clothing Company prides themselves on providing an optimised range of clothing and lifestyle wear for the modern adventurer! 21 | > 22 | > Danny, the CEO of this trendy fashion company has asked you to assist the team’s merchandising teams analyse their sales performance and generate a basic financial report to share with the wider business. 23 | 24 | ## 📂 Dataset 25 | For this case study there is a total of 4 datasets for this case study. However you will only need to utilise 2 main tables to solve all of the regular questions: 26 | 27 | ### **```Product Details```** 28 | 29 |

30 | 31 | View table 32 | 33 | 34 | `balanced_tree.product_details` includes all information about the entire range that Balanced Clothing sells in their store. 35 | 36 | | "product_id" | "price" | "product_name" | "category_id" | "segment_id" | "style_id" | "category_name" | "segment_name" | "style_name" | 37 | |--------------|---------|------------------------------------|---------------|--------------|------------|-----------------|----------------|-----------------------| 38 | | "c4a632" | 13 | "Navy Oversized Jeans - Womens" | 1 | 3 | 7 | "Womens" | "Jeans" | "Navy Oversized" | 39 | | "e83aa3" | 32 | "Black Straight Jeans - Womens" | 1 | 3 | 8 | "Womens" | "Jeans" | "Black Straight" | 40 | | "e31d39" | 10 | "Cream Relaxed Jeans - Womens" | 1 | 3 | 9 | "Womens" | "Jeans" | "Cream Relaxed" | 41 | | "d5e9a6" | 23 | "Khaki Suit Jacket - Womens" | 1 | 4 | 10 | "Womens" | "Jacket" | "Khaki Suit" | 42 | | "72f5d4" | 19 | "Indigo Rain Jacket - Womens" | 1 | 4 | 11 | "Womens" | "Jacket" | "Indigo Rain" | 43 | | "9ec847" | 54 | "Grey Fashion Jacket - Womens" | 1 | 4 | 12 | "Womens" | "Jacket" | "Grey Fashion" | 44 | | "5d267b" | 40 | "White Tee Shirt - Mens" | 2 | 5 | 13 | "Mens" | "Shirt" | "White Tee" | 45 | | "c8d436" | 10 | "Teal Button Up Shirt - Mens" | 2 | 5 | 14 | "Mens" | "Shirt" | "Teal Button Up" | 46 | | "2a2353" | 57 | "Blue Polo Shirt - Mens" | 2 | 5 | 15 | "Mens" | "Shirt" | "Blue Polo" | 47 | | "f084eb" | 36 | "Navy Solid Socks - Mens" | 2 | 6 | 16 | "Mens" | "Socks" | "Navy Solid" | 48 | | "b9a74d" | 17 | "White Striped Socks - Mens" | 2 | 6 | 17 | "Mens" | "Socks" | "White Striped" | 49 | | "2feb6b" | 29 | "Pink Fluro Polkadot Socks - Mens" | 2 | 6 | 18 | "Mens" | "Socks" | "Pink Fluro Polkadot" | 50 | 51 |
52 | 53 | ### **```Product Sales```** 54 | 55 |
56 | 57 | View table 58 | 59 | 60 | `balanced_tree.sales` contains product level information for all the transactions made for Balanced Tree including quantity, price, percentage discount, member status, a transaction ID and also the transaction timestamp. 61 | 62 | Below is the display of the first 10 rows in this dataset: 63 | 64 | 65 | | "prod_id" | "qty" | "price" | "discount" | "member" | "txn_id" | "start_txn_time" | 66 | |-----------|-------|---------|------------|----------|----------|----------------------------| 67 | | "c4a632" | 4 | 13 | 17 | True | "54f307" | "2021-02-13 01:59:43.296" | 68 | | "5d267b" | 4 | 40 | 17 | True | "54f307" | "2021-02-13 01:59:43.296" | 69 | | "b9a74d" | 4 | 17 | 17 | True | "54f307" | "2021-02-13 01:59:43.296" | 70 | | "2feb6b" | 2 | 29 | 17 | True | "54f307" | "2021-02-13 01:59:43.296" | 71 | | "c4a632" | 5 | 13 | 21 | True | "26cc98" | "2021-01-19 01:39:00.3456" | 72 | | "e31d39" | 2 | 10 | 21 | True | "26cc98" | "2021-01-19 01:39:00.3456" | 73 | | "72f5d4" | 3 | 19 | 21 | True | "26cc98" | "2021-01-19 01:39:00.3456" | 74 | | "2a2353" | 3 | 57 | 21 | True | "26cc98" | "2021-01-19 01:39:00.3456" | 75 | | "f084eb" | 3 | 36 | 21 | True | "26cc98" | "2021-01-19 01:39:00.3456" | 76 | | "c4a632" | 1 | 13 | 21 | False | "ef648d" | "2021-01-27 02:18:17.1648" | 77 | 78 |
79 | 80 | 81 | ## 🧙‍♂️ Case Study Questions 82 |

83 | 84 | 85 | ### **A. High Level Sales Analysis** 86 | 87 | 1. What was the total quantity sold for all products? 88 | 2. What is the total generated revenue for all products before discounts? 89 | 3. What was the total discount amount for all products? 90 | 91 | 92 | ### **B. Transaction Analysis** 93 | 94 | 1. How many unique transactions were there? 95 | 2. What is the average unique products purchased in each transaction? 96 | 3. What are the 25th, 50th and 75th percentile values for the revenue per transaction? 97 | 4. What is the average discount value per transaction? 98 | 5. What is the percentage split of all transactions for members vs non-members? 99 | 6. What is the average revenue for member transactions and non-member transactions? 100 | 101 | ### **C. Product Analysis** 102 | 103 | 1. What are the top 3 products by total revenue before discount? 104 | 2. What is the total quantity, revenue and discount for each segment? 105 | 3. What is the top selling product for each segment? 106 | 4. What is the total quantity, revenue and discount for each category? 107 | 5. What is the top selling product for each category? 108 | 6. What is the percentage split of revenue by product for each segment? 109 | 7. What is the percentage split of revenue by segment for each category? 110 | 8. What is the percentage split of total revenue by category? 111 | 9. What is the total transaction “penetration” for each product? 112 | 10. What is the most common combination of at least 1 quantity of any 3 products in a 1 single transaction? 113 | 114 | 115 | ## 🚀 Solutions 116 | ### **A. High Level Sales Analysis** 117 | 118 |

119 | 120 | View solutions 121 | 122 | 123 | **Q1. What was the total quantity sold for all products?** 124 | ```sql 125 | --for all products in total 126 | SELECT 127 | SUM(qty) AS sale_counts 128 | FROM balanced_tree.sales AS sales; 129 | ``` 130 | 131 | **Result:** 132 | | "sale_counts" | 133 | |---------------| 134 | | 45216 | 135 | 136 | 137 | ```sql 138 | --for each product category 139 | SELECT 140 | details.product_name, 141 | SUM(sales.qty) AS sale_counts 142 | FROM balanced_tree.sales AS sales 143 | INNER JOIN balanced_tree.product_details AS details 144 | ON sales.prod_id = details.product_id 145 | GROUP BY details.product_name 146 | ORDER BY sale_counts DESC; 147 | ``` 148 | 149 | **Result:** 150 | | "product_name" | "sale_counts" | 151 | |---------------------------------|---------------| 152 | | "Grey Fashion Jacket - Womens" | 3876 | 153 | | "Navy Oversized Jeans - Womens" | 3856 | 154 | | "Blue Polo Shirt - Mens" | 3819 | 155 | | "White Tee Shirt - Mens" | 3800 | 156 | | "Navy Solid Socks - Mens" | 3792 | 157 | 158 | 159 | **Q2. 
What is the total generated revenue for all products before discounts?** 160 | ```sql 161 | --for all products in total 162 | SELECT 163 | SUM(price * qty) AS nodis_revenue 164 | FROM balanced_tree.sales AS sales; 165 | ``` 166 | 167 | **Result:** 168 | | "nodis_revenue" | 169 | |-----------------| 170 | | 1289453 | 171 | 172 | 173 | 174 | 175 | ```sql 176 | --for each product category 177 | SELECT 178 | details.product_name, 179 | SUM(sales.qty * sales.price) AS nodis_revenue 180 | FROM balanced_tree.sales AS sales 181 | INNER JOIN balanced_tree.product_details AS details 182 | ON sales.prod_id = details.product_id 183 | GROUP BY details.product_name 184 | ORDER BY nodis_revenue DESC; 185 | ``` 186 | 187 | **Result:** 188 | | "product_name" | "nodis_revenue" | 189 | |---------------------------------|-----------------| 190 | | "Blue Polo Shirt - Mens" | 217683 | 191 | | "Grey Fashion Jacket - Womens" | 209304 | 192 | | "White Tee Shirt - Mens" | 152000 | 193 | | "Navy Solid Socks - Mens" | 136512 | 194 | | "Black Straight Jeans - Womens" | 121152 | 195 | 196 | 197 | 198 | **Q3. What was the total discount amount for all products?** 199 | ```sql 200 | SELECT 201 | SUM(price * qty * discount)/100 AS total_discount 202 | FROM balanced_tree.sales; 203 | ``` 204 | 205 | **Result:** 206 | | "total_discount" | 207 | |------------------| 208 | | 156229 | 209 | 210 | 211 |
212 | 213 | --- 214 | 215 | ### **B. Transaction Analysis** 216 | 217 |
218 | 219 | View solutions 220 | 221 | 222 | **Q1. How many unique transactions were there?** 223 | 224 | ```sql 225 | SELECT 226 | COUNT (DISTINCT txn_id) AS unique_txn 227 | FROM balanced_tree.sales; 228 | ``` 229 | 230 | 231 | **Result:** 232 | | "unique_txn" | 233 | |--------------| 234 | | 2500 | 235 | 236 | 237 | 238 | 239 | 240 | **Q2. What is the average unique products purchased in each transaction?** 241 | ```sql 242 | WITH cte_transaction_products AS ( 243 | SELECT 244 | txn_id, 245 | COUNT (DISTINCT prod_id) AS product_count 246 | FROM balanced_tree.sales 247 | GROUP BY txn_id 248 | ) 249 | SELECT 250 | ROUND(AVG(product_count)) AS avg_unique_products 251 | FROM cte_transaction_products; 252 | ``` 253 | 254 | **Result:** 255 | 256 | | "avg_unique_products" | 257 | |-----------------------| 258 | | 6 | 259 | 260 | 261 | 262 | **Q3. What are the 25th, 50th and 75th percentile values for the revenue per transaction?** 263 | ```sql 264 | WITH cte_transaction_revenue AS ( 265 | SELECT 266 | txn_id, 267 | SUM(qty * price) AS revenue 268 | FROM balanced_tree.sales 269 | GROUP BY txn_id 270 | ) 271 | SELECT 272 | PERCENTILE_CONT(0.25) WITHIN GROUP(ORDER BY revenue) AS pct_25, 273 | PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY revenue) AS pct_50, 274 | PERCENTILE_CONT(0.75) WITHIN GROUP(ORDER BY revenue) AS pct_75 275 | FROM cte_transaction_revenue; 276 | ``` 277 | 278 | **Result:** 279 | | "pct_25" | "pct_50" | "pct_75" | 280 | |----------|----------|----------| 281 | | 375.75 | 509.5 | 647 | 282 | 283 | 284 | **Q4. 
What is the average discount value per transaction?** 285 | ```sql 286 | WITH cte_transaction_discounts AS ( 287 | SELECT 288 | txn_id, 289 | SUM(price * qty * discount)/100 AS total_discount 290 | FROM balanced_tree.sales 291 | GROUP BY txn_id 292 | ) 293 | SELECT 294 | ROUND(AVG(total_discount)) AS avg_discount 295 | FROM cte_transaction_discounts; 296 | ``` 297 | 298 | **Result:** 299 | | "avg_discount" | 300 | |----------------| 301 | | 62 | 302 | 303 | 304 | **Q5. What is the percentage split of all transactions for members vs non-members?** 305 | ```sql 306 | SELECT 307 | ROUND(100 * 308 | COUNT(DISTINCT CASE WHEN member = true THEN txn_id END) / 309 | COUNT(DISTINCT txn_id) 310 | , 2) AS member_transaction, 311 | (100 - ROUND(100 * 312 | COUNT(DISTINCT CASE WHEN member = true THEN txn_id END) / 313 | COUNT(DISTINCT txn_id) 314 | , 2) 315 | ) AS non_member_transaction 316 | FROM balanced_tree.sales; 317 | ``` 318 | 319 | **Result:** 320 | | "member_transaction" | "non_member_transaction" | 321 | |----------------------|--------------------------| 322 | | 60.00 | 40.00 | 323 | 324 | 325 | 326 | 327 | **Q6. What is the average revenue for member transactions and non-member transactions?** 328 | ```sql 329 | WITH cte_member_revenue AS ( 330 | SELECT 331 | member, 332 | txn_id, 333 | SUM(price * qty) AS revenue 334 | FROM balanced_tree.sales 335 | GROUP BY 336 | member, 337 | txn_id 338 | ) 339 | SELECT 340 | member, 341 | ROUND(AVG(revenue), 2) AS avg_revenue 342 | FROM cte_member_revenue 343 | GROUP BY member; 344 | ``` 345 | 346 | **Result:** 347 | | "member" | "avg_revenue" | 348 | |----------|---------------| 349 | | False | 515.04 | 350 | | True | 516.27 | 351 | 352 | 353 | 354 |
355 | 356 | --- 357 | 358 | ### **C. Product Analysis** 359 | 360 |
 361 |
362 | View solutions
363 |
364 |
365 | **Q1. What are the top 3 products by total revenue before discount?**
366 | ```sql
367 | SELECT
368 | details.product_name,
369 | SUM(sales.qty * sales.price) AS nodis_revenue
370 | FROM balanced_tree.sales AS sales
371 | INNER JOIN balanced_tree.product_details AS details
372 | ON sales.prod_id = details.product_id
373 | GROUP BY details.product_name
374 | ORDER BY nodis_revenue DESC
375 | LIMIT 3;
376 | ```
377 |
378 | **Result:**
379 | | "product_name" | "nodis_revenue" |
380 | |--------------------------------|-----------------|
381 | | "Blue Polo Shirt - Mens" | 217683 |
382 | | "Grey Fashion Jacket - Womens" | 209304 |
383 | | "White Tee Shirt - Mens" | 152000 |
384 |
385 |
386 |
387 |
388 | **Q2. What is the total quantity, revenue and discount for each segment?**
389 | ```sql
390 | SELECT
391 | details.segment_id,
392 | details.segment_name,
393 | SUM(sales.qty) AS total_quantity,
394 | SUM(sales.qty * sales.price) AS total_revenue,
395 | SUM(sales.qty * sales.price * sales.discount)/100 AS total_discount
396 | FROM balanced_tree.sales AS sales
397 | INNER JOIN balanced_tree.product_details AS details
398 | ON sales.prod_id = details.product_id
399 | GROUP BY
400 | details.segment_id,
401 | details.segment_name
402 | ORDER BY total_revenue DESC;
403 | ```
404 |
405 | **Result:**
406 | | "segment_id" | "segment_name" | "total_quantity" | "total_revenue" | "total_discount" |
407 | |--------------|----------------|------------------|-----------------|------------------|
408 | | 5 | "Shirt" | 11265 | 406143 | 49594 |
409 | | 4 | "Jacket" | 11385 | 366983 | 44277 |
410 | | 6 | "Socks" | 11217 | 307977 | 37013 |
411 | | 3 | "Jeans" | 11349 | 208350 | 25343 |
412 |
413 |
414 |
415 | **Q3. 
What is the top selling product for each segment?** 416 | ```sql 417 | SELECT 418 | details.segment_id, 419 | details.segment_name, 420 | details.product_id, 421 | details.product_name, 422 | SUM(sales.qty) AS product_quantity 423 | FROM balanced_tree.sales AS sales 424 | INNER JOIN balanced_tree.product_details AS details 425 | ON sales.prod_id = details.product_id 426 | GROUP BY 427 | details.segment_id, 428 | details.segment_name, 429 | details.product_id, 430 | details.product_name 431 | ORDER BY product_quantity DESC 432 | --Limit to the top 5 best selling products 433 | LIMIT 5; 434 | ``` 435 | 436 | **Result:** 437 | | "segment_id" | "segment_name" | "product_id" | "product_name" | "product_quantity" | 438 | |--------------|----------------|--------------|---------------------------------|--------------------| 439 | | 4 | "Jacket" | "9ec847" | "Grey Fashion Jacket - Womens" | 3876 | 440 | | 3 | "Jeans" | "c4a632" | "Navy Oversized Jeans - Womens" | 3856 | 441 | | 5 | "Shirt" | "2a2353" | "Blue Polo Shirt - Mens" | 3819 | 442 | | 5 | "Shirt" | "5d267b" | "White Tee Shirt - Mens" | 3800 | 443 | | 6 | "Socks" | "f084eb" | "Navy Solid Socks - Mens" | 3792 | 444 | 445 | 446 | 447 | **Q4. 
What is the total quantity, revenue and discount for each category?** 448 | ```sql 449 | SELECT 450 | details.category_id, 451 | details.category_name, 452 | SUM(sales.qty) AS total_quantity, 453 | SUM(sales.qty * sales.price) AS total_revenue, 454 | SUM(sales.qty * sales.price * sales.discount)/100 AS total_discount 455 | FROM balanced_tree.sales AS sales 456 | INNER JOIN balanced_tree.product_details AS details 457 | ON sales.prod_id = details.product_id 458 | GROUP BY 459 | details.category_id, 460 | details.category_name 461 | ORDER BY total_revenue DESC; 462 | ``` 463 | 464 | **Result:** 465 | | "category_id" | "category_name" | "total_quantity" | "total_revenue" | "total_discount" | 466 | |---------------|-----------------|------------------|-----------------|------------------| 467 | | 2 | "Mens" | 22482 | 714120 | 86607 | 468 | | 1 | "Womens" | 22734 | 575333 | 69621 | 469 | 470 | 471 | 472 | **Q5. What is the top selling product for each category?** 473 | ```sql 474 | SELECT 475 | details.category_id, 476 | details.category_name, 477 | details.product_id, 478 | details.product_name, 479 | SUM(sales.qty) AS product_quantity 480 | FROM balanced_tree.sales AS sales 481 | INNER JOIN balanced_tree.product_details AS details 482 | ON sales.prod_id = details.product_id 483 | GROUP BY 484 | details.category_id, 485 | details.category_name, 486 | details.product_id, 487 | details.product_name 488 | ORDER BY product_quantity DESC 489 | --Limit to the top 5 best selling products 490 | LIMIT 5; 491 | ``` 492 | 493 | **Result:** 494 | | "category_id" | "category_name" | "product_id" | "product_name" | "product_quantity" | 495 | |---------------|-----------------|--------------|---------------------------------|--------------------| 496 | | 1 | "Womens" | "9ec847" | "Grey Fashion Jacket - Womens" | 3876 | 497 | | 1 | "Womens" | "c4a632" | "Navy Oversized Jeans - Womens" | 3856 | 498 | | 2 | "Mens" | "2a2353" | "Blue Polo Shirt - Mens" | 3819 | 499 | | 2 | "Mens" | 
"5d267b" | "White Tee Shirt - Mens" | 3800 | 500 | | 2 | "Mens" | "f084eb" | "Navy Solid Socks - Mens" | 3792 | 501 | 502 | 503 | 504 | **Q6. What is the percentage split of revenue by product for each segment?** 505 | ```sql 506 | WITH cte_product_revenue AS ( 507 | SELECT 508 | product_details.segment_id, 509 | product_details.segment_name, 510 | product_details.product_id, 511 | product_details.product_name, 512 | SUM(sales.qty * sales.price) AS product_revenue 513 | FROM balanced_tree.sales 514 | INNER JOIN balanced_tree.product_details 515 | ON sales.prod_id = product_details.product_id 516 | GROUP BY 517 | product_details.segment_id, 518 | product_details.segment_name, 519 | product_details.product_id, 520 | product_details.product_name 521 | ) 522 | SELECT 523 | segment_name, 524 | product_name, 525 | ROUND( 526 | 100 * product_revenue / 527 | SUM(product_revenue) OVER ( 528 | PARTITION BY segment_id), 529 | 2) AS segment_product_percentage 530 | FROM cte_product_revenue 531 | ORDER BY 532 | segment_id, 533 | segment_product_percentage DESC; 534 | ``` 535 | 536 | **Result:** 537 | 538 | | "segment_name" | "product_name" | "segment_product_percentage" | 539 | |----------------|------------------------------------|------------------------------| 540 | | "Jeans" | "Black Straight Jeans - Womens" | 58.15 | 541 | | "Jeans" | "Navy Oversized Jeans - Womens" | 24.06 | 542 | | "Jeans" | "Cream Relaxed Jeans - Womens" | 17.79 | 543 | | "Jacket" | "Grey Fashion Jacket - Womens" | 57.03 | 544 | | "Jacket" | "Khaki Suit Jacket - Womens" | 23.51 | 545 | | "Jacket" | "Indigo Rain Jacket - Womens" | 19.45 | 546 | | "Shirt" | "Blue Polo Shirt - Mens" | 53.60 | 547 | | "Shirt" | "White Tee Shirt - Mens" | 37.43 | 548 | | "Shirt" | "Teal Button Up Shirt - Mens" | 8.98 | 549 | | "Socks" | "Navy Solid Socks - Mens" | 44.33 | 550 | | "Socks" | "Pink Fluro Polkadot Socks - Mens" | 35.50 | 551 | | "Socks" | "White Striped Socks - Mens" | 20.18 | 552 | 553 | 554 | 555 | **Q7. 
What is the percentage split of revenue by segment for each category?** 556 | ```sql
557 | WITH cte_product_revenue AS (
558 | SELECT
559 | product_details.segment_id,
560 | product_details.segment_name,
561 | product_details.category_id,
562 | product_details.category_name,
563 | SUM(sales.qty * sales.price) AS product_revenue
564 | FROM balanced_tree.sales
565 | INNER JOIN balanced_tree.product_details
566 | ON sales.prod_id = product_details.product_id
567 | GROUP BY
568 | product_details.segment_id,
569 | product_details.segment_name,
570 | product_details.category_id,
571 | product_details.category_name
572 | )
573 | SELECT
574 | category_name,
575 | segment_name,
576 | ROUND(
577 | 100 * product_revenue /
578 | SUM(product_revenue) OVER (
579 | PARTITION BY category_id),
580 | 2) AS category_segment_percentage
581 | FROM cte_product_revenue
582 | ORDER BY
583 | category_id,
584 | category_segment_percentage DESC;
585 | ```
586 |
587 | **Result:**
588 | | "category_name" | "segment_name" | "category_segment_percentage" |
589 | |-----------------|----------------|-------------------------------|
590 | | "Womens" | "Jacket" | 63.79 |
591 | | "Womens" | "Jeans" | 36.21 |
592 | | "Mens" | "Shirt" | 56.87 |
593 | | "Mens" | "Socks" | 43.13 |
594 |
595 |
596 | **Q8. What is the percentage split of total revenue by category?**
597 | ```sql
598 | SELECT
599 | ROUND(100 * SUM(CASE WHEN details.category_id = 1 THEN (sales.qty * sales.price) END)::NUMERIC /
600 | SUM(sales.qty * sales.price),
601 | 2) AS category_1,
602 | (100 - ROUND(100 * SUM(CASE WHEN details.category_id = 1 THEN (sales.qty * sales.price) END)::NUMERIC /
603 | SUM(sales.qty * sales.price),
604 | 2)
605 | ) AS category_2
606 | FROM balanced_tree.sales AS sales
607 | INNER JOIN balanced_tree.product_details AS details
608 | ON sales.prod_id = details.product_id;
609 | ```
610 |
611 | **Result:**
612 | | "category_1" | "category_2" |
613 | |--------------|--------------|
614 | | 44.62 | 55.38 |
615 |
616 |
617 |
618 | **Q9. 
What is the total transaction “penetration” for each product?** 619 | ```sql 620 | WITH product_transactions AS ( 621 | SELECT 622 | DISTINCT prod_id, 623 | COUNT(DISTINCT txn_id) AS product_transactions 624 | FROM balanced_tree.sales 625 | GROUP BY prod_id 626 | ), 627 | total_transactions AS ( 628 | SELECT 629 | COUNT(DISTINCT txn_id) AS total_transaction_count 630 | FROM balanced_tree.sales 631 | ) 632 | SELECT 633 | product_details.product_id, 634 | product_details.product_name, 635 | ROUND( 636 | 100 * product_transactions.product_transactions::NUMERIC 637 | / total_transactions.total_transaction_count, 638 | 2 639 | ) AS penetration_percentage 640 | FROM product_transactions 641 | CROSS JOIN total_transactions 642 | INNER JOIN balanced_tree.product_details 643 | ON product_transactions.prod_id = product_details.product_id 644 | ORDER BY penetration_percentage DESC; 645 | ``` 646 | 647 | **Result:** 648 | | "product_id" | "product_name" | "penetration_percentage" | 649 | |--------------|------------------------------------|--------------------------| 650 | | "f084eb" | "Navy Solid Socks - Mens" | 51.24 | 651 | | "9ec847" | "Grey Fashion Jacket - Womens" | 51.00 | 652 | | "c4a632" | "Navy Oversized Jeans - Womens" | 50.96 | 653 | | "5d267b" | "White Tee Shirt - Mens" | 50.72 | 654 | | "2a2353" | "Blue Polo Shirt - Mens" | 50.72 | 655 | | "2feb6b" | "Pink Fluro Polkadot Socks - Mens" | 50.32 | 656 | | "72f5d4" | "Indigo Rain Jacket - Womens" | 50.00 | 657 | | "d5e9a6" | "Khaki Suit Jacket - Womens" | 49.88 | 658 | | "e83aa3" | "Black Straight Jeans - Womens" | 49.84 | 659 | | "e31d39" | "Cream Relaxed Jeans - Womens" | 49.72 | 660 | | "b9a74d" | "White Striped Socks - Mens" | 49.72 | 661 | | "c8d436" | "Teal Button Up Shirt - Mens" | 49.68 | 662 | 663 | 664 | 665 | **Q10. What is the most common combination of at least 1 quantity of any 3 products in a 1 single transaction?** 666 | ```sql 667 | -- step 1: check the product_counter... 
668 | DROP TABLE IF EXISTS temp_product_combos;
669 | CREATE TEMP TABLE temp_product_combos AS
670 | WITH RECURSIVE input(product) AS (
671 | SELECT product_id::TEXT FROM balanced_tree.product_details
672 | ),
673 | output_table AS (
674 | SELECT
675 | ARRAY[product] AS combo,
676 | product,
677 | 1 AS product_counter
678 | FROM input
679 |
680 | UNION ALL -- UNION ALL is required in a recursive CTE; the ordered join below prevents duplicate combos
681 |
682 | SELECT
683 | ARRAY_APPEND(output_table.combo, input.product),
684 | input.product,
685 | product_counter + 1
686 | FROM output_table
687 | INNER JOIN input ON input.product > output_table.product
688 | WHERE output_table.product_counter <= 2
689 | )
690 | SELECT * FROM output_table
691 | WHERE product_counter = 2;
692 |
693 | -- step 2
694 | WITH cte_transaction_products AS (
695 | SELECT
696 | txn_id,
697 | ARRAY_AGG(prod_id::TEXT ORDER BY prod_id) AS products
698 | FROM balanced_tree.sales
699 | GROUP BY txn_id
700 | ),
701 | -- step 3
702 | cte_combo_transactions AS (
703 | SELECT
704 | txn_id,
705 | combo,
706 | products
707 | FROM cte_transaction_products
708 | CROSS JOIN temp_product_combos -- previously created temp table above! 
709 | WHERE combo <@ products -- combo is contained in products
710 | ),
711 | -- step 4
712 | cte_ranked_combos AS (
713 | SELECT
714 | combo,
715 | COUNT(DISTINCT txn_id) AS transaction_count,
716 | RANK() OVER (ORDER BY COUNT(DISTINCT txn_id) DESC) AS combo_rank,
717 | ROW_NUMBER() OVER (ORDER BY COUNT(DISTINCT txn_id) DESC) AS combo_id
718 | FROM cte_combo_transactions
719 | GROUP BY combo
720 | ),
721 | -- step 5
722 | cte_most_common_combo_product_transactions AS (
723 | SELECT
724 | cte_combo_transactions.txn_id,
725 | cte_ranked_combos.combo_id,
726 | UNNEST(cte_ranked_combos.combo) AS prod_id
727 | FROM cte_combo_transactions
728 | INNER JOIN cte_ranked_combos
729 | ON cte_combo_transactions.combo = cte_ranked_combos.combo
730 | WHERE cte_ranked_combos.combo_rank = 1
731 | )
732 | -- step 6
733 | SELECT
734 | product_details.product_id,
735 | product_details.product_name,
736 | COUNT(DISTINCT sales.txn_id) AS combo_transaction_count,
737 | SUM(sales.qty) AS quantity,
738 | SUM(sales.qty * sales.price) AS revenue,
739 | ROUND(
740 | SUM(sales.qty * sales.price * sales.discount / 100),
741 | 2
742 | ) AS discount,
743 | ROUND(
744 | SUM(sales.qty * sales.price) - SUM(sales.qty * sales.price * sales.discount / 100),
745 | 2
746 | ) AS net_revenue
747 | FROM balanced_tree.sales
748 | INNER JOIN cte_most_common_combo_product_transactions AS top_combo
749 | ON sales.txn_id = top_combo.txn_id
750 | AND sales.prod_id = top_combo.prod_id
751 | INNER JOIN balanced_tree.product_details
752 | ON sales.prod_id = product_details.product_id
753 | GROUP BY
754 | product_details.product_id,
755 | product_details.product_name;
756 | ```
757 |
758 | **Result:**
759 | | "product_id" | "product_name" | "combo_transaction_count" | "quantity" | "revenue" | "discount" | "net_revenue" |
760 | |--------------|--------------------------------|---------------------------|------------|-----------|------------|---------------|
761 | | "2a2353" | "Blue Polo Shirt - Mens" | 670 | 2011 | 114627 | 13618.00 | 101009.00 |
762 | | "9ec847" | "Grey Fashion Jacket - Womens" | 670 | 2047 | 110538 | 13025.00 | 97513.00 |
763 |
764 |
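The recursive CTE above is essentially a combination generator: step 1 enumerates ordered product combos, and step 3 keeps a combo whenever it is contained in a transaction's product array (the `<@` operator). The same counting idea can be sketched in a few lines of Python, using the stated 3-product target from the question (toy product IDs, not the Balanced Tree tables):

```python
from collections import Counter
from itertools import combinations

# Toy sketch of the combo-counting idea: take the distinct product set of
# each transaction, enumerate its 3-product subsets (the analogue of the
# recursive CTE's ordered combos plus the containment check), and count
# how often each combination occurs across transactions.
transactions = {
    "t1": {"a", "b", "c", "d"},
    "t2": {"a", "b", "c"},
    "t3": {"a", "b", "e"},
}
combo_counts = Counter()
for products in transactions.values():
    # sorted() mirrors the CTE's "product > previous product" ordering,
    # which guarantees each combination is generated exactly once.
    for combo in combinations(sorted(products), 3):
        combo_counts[combo] += 1
print(combo_counts.most_common(1))  # [(('a', 'b', 'c'), 2)]
```

Generating ordered subsets (either via `combinations` or the CTE's `>` join condition) is what keeps `{a, b, c}` and `{c, b, a}` from being counted as two different combos.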
765 | 766 | --- 767 |

© 2021 Leah Nguyen

768 | -------------------------------------------------------------------------------- /Case Study #7 - Balanced Tree Clothing Co/query-sql.sql: -------------------------------------------------------------------------------- 1 | /************************** 2 | CASE STUDY #7 - Balanced Tree Clothing Co. 3 | ***************************/ 4 | 5 | 6 | /************************** 7 | A. High Level Sales Analysis 8 | ***************************/ 9 | 10 | --1. What was the total quantity sold for all products? 11 | --for all products in total 12 | SELECT 13 | SUM(qty) AS sale_counts 14 | FROM balanced_tree.sales AS sales; 15 | --for each product category 16 | SELECT 17 | details.product_name, 18 | SUM(sales.qty) AS sale_counts 19 | FROM balanced_tree.sales AS sales 20 | INNER JOIN balanced_tree.product_details AS details 21 | ON sales.prod_id = details.product_id 22 | GROUP BY details.product_name 23 | ORDER BY sale_counts DESC; 24 | 25 | --2. What is the total generated revenue for all products before discounts? 26 | --for all products in total 27 | SELECT 28 | SUM(price * qty) AS nodis_revenue 29 | FROM balanced_tree.sales AS sales; 30 | --for each product category 31 | SELECT 32 | details.product_name, 33 | SUM(sales.qty * sales.price) AS nodis_revenue 34 | FROM balanced_tree.sales AS sales 35 | INNER JOIN balanced_tree.product_details AS details 36 | ON sales.prod_id = details.product_id 37 | GROUP BY details.product_name 38 | ORDER BY nodis_revenue DESC; 39 | 40 | --3. What was the total discount amount for all products? 41 | SELECT 42 | SUM(price * qty * discount)/100 AS total_discount 43 | FROM balanced_tree.sales; 44 | 45 | /************************** 46 | --B. Transaction Analysis 47 | ***************************/ 48 | 49 | --1. How many unique transactions were there? 50 | SELECT 51 | COUNT (DISTINCT txn_id) AS unique_txn 52 | FROM balanced_tree.sales; 53 | 54 | --2. What is the average unique products purchased in each transaction? 
55 | WITH cte_transaction_products AS (
56 | SELECT
57 | txn_id,
58 | COUNT (DISTINCT prod_id) AS product_count
59 | FROM balanced_tree.sales
60 | GROUP BY txn_id
61 | )
62 | SELECT
63 | ROUND(AVG(product_count)) AS avg_unique_products
64 | FROM cte_transaction_products;
65 |
66 | --3. What are the 25th, 50th and 75th percentile values for the revenue per transaction?
67 | WITH cte_transaction_revenue AS (
68 | SELECT
69 | txn_id,
70 | SUM(qty * price) AS revenue
71 | FROM balanced_tree.sales
72 | GROUP BY txn_id
73 | )
74 | SELECT
75 | PERCENTILE_CONT(0.25) WITHIN GROUP(ORDER BY revenue) AS pct_25,
76 | PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY revenue) AS pct_50,
77 | PERCENTILE_CONT(0.75) WITHIN GROUP(ORDER BY revenue) AS pct_75
78 | FROM cte_transaction_revenue;
79 |
80 | --4. What is the average discount value per transaction?
81 | WITH cte_transaction_discounts AS (
82 | SELECT
83 | txn_id,
84 | SUM(price * qty * discount)/100 AS total_discount
85 | FROM balanced_tree.sales
86 | GROUP BY txn_id
87 | )
88 | SELECT
89 | ROUND(AVG(total_discount)) AS avg_discount
90 | FROM cte_transaction_discounts;
91 |
92 | --5. What is the percentage split of all transactions for members vs non-members?
93 | SELECT
94 | ROUND(100 *
95 | COUNT(DISTINCT CASE WHEN member = true THEN txn_id END)::NUMERIC /
96 | COUNT(DISTINCT txn_id)
97 | , 2) AS member_transaction,
98 | (100 - ROUND(100 *
99 | COUNT(DISTINCT CASE WHEN member = true THEN txn_id END)::NUMERIC /
100 | COUNT(DISTINCT txn_id)
101 | , 2)
102 | ) AS non_member_transaction
103 | FROM balanced_tree.sales;
104 |
105 | --6. What is the average revenue for member transactions and non-member transactions? 
106 | WITH cte_member_revenue AS ( 107 | SELECT 108 | member, 109 | txn_id, 110 | SUM(price * qty) AS revenue 111 | FROM balanced_tree.sales 112 | GROUP BY 113 | member, 114 | txn_id 115 | ) 116 | SELECT 117 | member, 118 | ROUND(AVG(revenue), 2) AS avg_revenue 119 | FROM cte_member_revenue 120 | GROUP BY member; 121 | 122 | /************************** 123 | c. Product Analysis 124 | ***************************/ 125 | 126 | --1. What are the top 3 products by total revenue before discount? 127 | SELECT 128 | details.product_name, 129 | SUM(sales.qty * sales.price) AS nodis_revenue 130 | FROM balanced_tree.sales AS sales 131 | INNER JOIN balanced_tree.product_details AS details 132 | ON sales.prod_id = details.product_id 133 | GROUP BY details.product_name 134 | ORDER BY nodis_revenue DESC 135 | LIMIT 3; 136 | 137 | --2. What is the total quantity, revenue and discount for each segment? 138 | SELECT 139 | details.segment_id, 140 | details.segment_name, 141 | SUM(sales.qty) AS total_quantity, 142 | SUM(sales.qty * sales.price) AS total_revenue, 143 | SUM(sales.qty * sales.price * sales.discount)/100 AS total_discount 144 | FROM balanced_tree.sales AS sales 145 | INNER JOIN balanced_tree.product_details AS details 146 | ON sales.prod_id = details.product_id 147 | GROUP BY 148 | details.segment_id, 149 | details.segment_name 150 | ORDER BY total_revenue DESC; 151 | 152 | --3. What is the top selling product for each segment? 153 | SELECT 154 | details.segment_id, 155 | details.segment_name, 156 | details.product_id, 157 | details.product_name, 158 | SUM(sales.qty) AS product_quantity 159 | FROM balanced_tree.sales AS sales 160 | INNER JOIN balanced_tree.product_details AS details 161 | ON sales.prod_id = details.product_id 162 | GROUP BY 163 | details.segment_id, 164 | details.segment_name, 165 | details.product_id, 166 | details.product_name 167 | ORDER BY product_quantity DESC 168 | --Limit to the top 5 best selling products 169 | LIMIT 5; 170 | 171 | --4. 
What is the total quantity, revenue and discount for each category? 172 | SELECT 173 | details.category_id, 174 | details.category_name, 175 | SUM(sales.qty) AS total_quantity, 176 | SUM(sales.qty * sales.price) AS total_revenue, 177 | SUM(sales.qty * sales.price * sales.discount)/100 AS total_discount 178 | FROM balanced_tree.sales AS sales 179 | INNER JOIN balanced_tree.product_details AS details 180 | ON sales.prod_id = details.product_id 181 | GROUP BY 182 | details.category_id, 183 | details.category_name 184 | ORDER BY total_revenue DESC; 185 | 186 | --5. What is the top selling product for each category? 187 | SELECT 188 | details.category_id, 189 | details.category_name, 190 | details.product_id, 191 | details.product_name, 192 | SUM(sales.qty) AS product_quantity 193 | FROM balanced_tree.sales AS sales 194 | INNER JOIN balanced_tree.product_details AS details 195 | ON sales.prod_id = details.product_id 196 | GROUP BY 197 | details.category_id, 198 | details.category_name, 199 | details.product_id, 200 | details.product_name 201 | ORDER BY product_quantity DESC 202 | --Limit to the top 5 best selling products 203 | LIMIT 5; 204 | 205 | --6. What is the percentage split of revenue by product for each segment? 
206 | WITH cte_product_revenue AS ( 207 | SELECT 208 | product_details.segment_id, 209 | product_details.segment_name, 210 | product_details.product_id, 211 | product_details.product_name, 212 | SUM(sales.qty * sales.price) AS product_revenue 213 | FROM balanced_tree.sales 214 | INNER JOIN balanced_tree.product_details 215 | ON sales.prod_id = product_details.product_id 216 | GROUP BY 217 | product_details.segment_id, 218 | product_details.segment_name, 219 | product_details.product_id, 220 | product_details.product_name 221 | ) 222 | SELECT 223 | segment_name, 224 | product_name, 225 | ROUND( 226 | 100 * product_revenue / 227 | SUM(product_revenue) OVER ( 228 | PARTITION BY segment_id), 229 | 2) AS segment_product_percentage 230 | FROM cte_product_revenue 231 | ORDER BY 232 | segment_id, 233 | segment_product_percentage DESC; 234 | 235 | --7. What is the percentage split of revenue by segment for each category? 236 | WITH cte_product_revenue AS ( 237 | SELECT 238 | product_details.segment_id, 239 | product_details.segment_name, 240 | product_details.category_id, 241 | product_details.category_name, 242 | SUM(sales.qty * sales.price) AS product_revenue 243 | FROM balanced_tree.sales 244 | INNER JOIN balanced_tree.product_details 245 | ON sales.prod_id = product_details.product_id 246 | GROUP BY 247 | product_details.segment_id, 248 | product_details.segment_name, 249 | product_details.category_id, 250 | product_details.category_name 251 | ) 252 | SELECT 253 | category_name, 254 | segment_name, 255 | ROUND( 256 | 100 * product_revenue / 257 | SUM(product_revenue) OVER ( 258 | PARTITION BY category_id), 259 | 2) AS category_segment_percentage 260 | FROM cte_product_revenue 261 | ORDER BY 262 | category_id, 263 | category_segment_percentage DESC; 264 | 265 | --8. What is the percentage split of total revenue by category? 
266 | SELECT
267 | ROUND(100 * SUM(CASE WHEN details.category_id = 1 THEN (sales.qty * sales.price) END)::NUMERIC /
268 | SUM(sales.qty * sales.price),
269 | 2) AS category_1,
270 | (100 - ROUND(100 * SUM(CASE WHEN details.category_id = 1 THEN (sales.qty * sales.price) END)::NUMERIC /
271 | SUM(sales.qty * sales.price),
272 | 2)
273 | ) AS category_2
274 | FROM balanced_tree.sales AS sales
275 | INNER JOIN balanced_tree.product_details AS details
276 | ON sales.prod_id = details.product_id;
277 |
278 | --9. What is the total transaction “penetration” for each product?
279 | /* penetration = number of transactions where at least 1 quantity
280 | of a product was purchased divided by total number of transactions */
281 | WITH product_transactions AS (
282 | SELECT
283 | DISTINCT prod_id,
284 | COUNT(DISTINCT txn_id) AS product_transactions
285 | FROM balanced_tree.sales
286 | GROUP BY prod_id
287 | ),
288 | total_transactions AS (
289 | SELECT
290 | COUNT(DISTINCT txn_id) AS total_transaction_count
291 | FROM balanced_tree.sales
292 | )
293 | SELECT
294 | product_details.product_id,
295 | product_details.product_name,
296 | ROUND(
297 | 100 * product_transactions.product_transactions::NUMERIC
298 | / total_transactions.total_transaction_count,
299 | 2
300 | ) AS penetration_percentage
301 | FROM product_transactions
302 | CROSS JOIN total_transactions
303 | INNER JOIN balanced_tree.product_details
304 | ON product_transactions.prod_id = product_details.product_id
305 | ORDER BY penetration_percentage DESC;
306 |
307 | --10. What is the most common combination of at least 1 quantity of any 3 products in a 1 single transaction?
308 | -- step 1: check the product_counter... 
309 | DROP TABLE IF EXISTS temp_product_combos;
310 | CREATE TEMP TABLE temp_product_combos AS
311 | WITH RECURSIVE input(product) AS (
312 | SELECT product_id::TEXT FROM balanced_tree.product_details
313 | ),
314 | output_table AS (
315 | SELECT
316 | ARRAY[product] AS combo,
317 | product,
318 | 1 AS product_counter
319 | FROM input
320 |
321 | UNION ALL -- UNION ALL is required in a recursive CTE; the ordered join below prevents duplicate combos
322 |
323 | SELECT
324 | ARRAY_APPEND(output_table.combo, input.product),
325 | input.product,
326 | product_counter + 1
327 | FROM output_table
328 | INNER JOIN input ON input.product > output_table.product
329 | WHERE output_table.product_counter <= 2
330 | )
331 | SELECT * FROM output_table
332 | WHERE product_counter = 2;
333 |
334 | -- step 2
335 | WITH cte_transaction_products AS (
336 | SELECT
337 | txn_id,
338 | ARRAY_AGG(prod_id::TEXT ORDER BY prod_id) AS products
339 | FROM balanced_tree.sales
340 | GROUP BY txn_id
341 | ),
342 | -- step 3
343 | cte_combo_transactions AS (
344 | SELECT
345 | txn_id,
346 | combo,
347 | products
348 | FROM cte_transaction_products
349 | CROSS JOIN temp_product_combos -- previously created temp table above! 
350 | WHERE combo <@ products -- combo is contained in products
351 | ),
352 | -- step 4
353 | cte_ranked_combos AS (
354 | SELECT
355 | combo,
356 | COUNT(DISTINCT txn_id) AS transaction_count,
357 | RANK() OVER (ORDER BY COUNT(DISTINCT txn_id) DESC) AS combo_rank,
358 | ROW_NUMBER() OVER (ORDER BY COUNT(DISTINCT txn_id) DESC) AS combo_id
359 | FROM cte_combo_transactions
360 | GROUP BY combo
361 | ),
362 | -- step 5
363 | cte_most_common_combo_product_transactions AS (
364 | SELECT
365 | cte_combo_transactions.txn_id,
366 | cte_ranked_combos.combo_id,
367 | UNNEST(cte_ranked_combos.combo) AS prod_id
368 | FROM cte_combo_transactions
369 | INNER JOIN cte_ranked_combos
370 | ON cte_combo_transactions.combo = cte_ranked_combos.combo
371 | WHERE cte_ranked_combos.combo_rank = 1
372 | )
373 | -- step 6
374 | SELECT
375 | product_details.product_id,
376 | product_details.product_name,
377 | COUNT(DISTINCT sales.txn_id) AS combo_transaction_count,
378 | SUM(sales.qty) AS quantity,
379 | SUM(sales.qty * sales.price) AS revenue,
380 | ROUND(
381 | SUM(sales.qty * sales.price * sales.discount / 100),
382 | 2
383 | ) AS discount,
384 | ROUND(
385 | SUM(sales.qty * sales.price) - SUM(sales.qty * sales.price * sales.discount / 100),
386 | 2
387 | ) AS net_revenue
388 | FROM balanced_tree.sales
389 | INNER JOIN cte_most_common_combo_product_transactions AS top_combo
390 | ON sales.txn_id = top_combo.txn_id
391 | AND sales.prod_id = top_combo.prod_id
392 | INNER JOIN balanced_tree.product_details
393 | ON sales.prod_id = product_details.product_id
394 | GROUP BY
395 | product_details.product_id,
396 | product_details.product_name;
397 | -------------------------------------------------------------------------------- /IMG/2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ndleah/8-Week-SQL-Challenge/dca1b3419fc9d84e1b3a389d5079b4b48f216b5f/IMG/2.png -------------------------------------------------------------------------------- 
/IMG/2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ndleah/8-Week-SQL-Challenge/dca1b3419fc9d84e1b3a389d5079b4b48f216b5f/IMG/2.png -------------------------------------------------------------------------------- /IMG/3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ndleah/8-Week-SQL-Challenge/dca1b3419fc9d84e1b3a389d5079b4b48f216b5f/IMG/3.png -------------------------------------------------------------------------------- /IMG/4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ndleah/8-Week-SQL-Challenge/dca1b3419fc9d84e1b3a389d5079b4b48f216b5f/IMG/4.png -------------------------------------------------------------------------------- /IMG/org-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ndleah/8-Week-SQL-Challenge/dca1b3419fc9d84e1b3a389d5079b4b48f216b5f/IMG/org-1.png -------------------------------------------------------------------------------- /IMG/org-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ndleah/8-Week-SQL-Challenge/dca1b3419fc9d84e1b3a389d5079b4b48f216b5f/IMG/org-2.png -------------------------------------------------------------------------------- /IMG/org-3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ndleah/8-Week-SQL-Challenge/dca1b3419fc9d84e1b3a389d5079b4b48f216b5f/IMG/org-3.png -------------------------------------------------------------------------------- /IMG/org-4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ndleah/8-Week-SQL-Challenge/dca1b3419fc9d84e1b3a389d5079b4b48f216b5f/IMG/org-4.png 
-------------------------------------------------------------------------------- /IMG/org-7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ndleah/8-Week-SQL-Challenge/dca1b3419fc9d84e1b3a389d5079b4b48f216b5f/IMG/org-7.png -------------------------------------------------------------------------------- /IMG/timeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ndleah/8-Week-SQL-Challenge/dca1b3419fc9d84e1b3a389d5079b4b48f216b5f/IMG/timeline.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![Star Badge](https://img.shields.io/static/v1?label=%F0%9F%8C%9F&message=If%20Useful&style=style=flat&color=BC4E99) 2 | ![Open Source Love](https://badges.frapsoft.com/os/v1/open-source.svg?v=103) 3 | [![View My Profile](https://img.shields.io/badge/View-My_Profile-green?logo=GitHub)](https://github.com/ndleah) 4 | [![View Repositories](https://img.shields.io/badge/View-My_Repositories-blue?logo=GitHub)](https://github.com/ndleah?tab=repositories) 5 | 6 | # [8-Week SQL Challenge](https://8weeksqlchallenge.com) 7 | 8 | > This repository contains all of my code submissions for the **#8WeekSQLChallenge**! 9 | > 10 | > All case study's requirements and resources are available on the challenge [website](https://8weeksqlchallenge.com). 11 | 12 |

13 | 14 | 15 |

16 | 17 |

18 | 19 |

20 | 21 | ## 📕 Table Of Contents 22 | * [🍜 Case Study #1 - Danny's Diner](#-case-study-1---dannys-diner) 23 | * [🍕 Case Study #2 - Pizza Runner](#-case-study-2---pizza-runner) 24 | * [🥑 Case Study #3 - Foodie-Fi](#-case-study-3---foodie-fi) 25 | * [🪙 Case Study #4 - Data Bank](#-case-study-4---data-bank) 26 | 27 | --- 28 | 29 | ## 🍜 Case Study #1 - Danny's Diner 30 |

31 | 32 | 33 | Danny seriously loves Japanese food so in the beginning of 2021, he decides to embark upon a risky venture and opens up a cute little restaurant that sells his 3 favourite foods: sushi, curry and ramen. 34 | 35 | Danny’s Diner is in need of your assistance to help the restaurant stay afloat - the restaurant has captured some very basic data from their few months of operation but have no idea how to use their data to help them run the business. 36 | 37 | ### View full case study introduction [here](https://8weeksqlchallenge.com/case-study-1/). 38 | 39 | 40 | [![View Data Exploration Folder](https://img.shields.io/badge/View-Solution_Case_Study_1-971901?style=for-the-badge&logo=GITHUB)](/Case%20Study%20%231%20-%20Danny's%20Diner) 41 | 42 | --- 43 | 44 | ## 🍕 Case Study #2 - Pizza Runner 45 |

46 | 47 | 48 | Danny was scrolling through his Instagram feed when something really caught his eye - “80s Retro Styling and Pizza Is The Future!” 49 | 50 | Danny was sold on the idea, but he knew that pizza alone was not going to help him get seed funding to expand his new Pizza Empire - so he had one more genius idea to combine with it - he was going to Uberize it - and so Pizza Runner was launched! 51 | 52 | Danny started by recruiting “runners” to deliver fresh pizza from Pizza Runner Headquarters (otherwise known as Danny’s house) and also maxed out his credit card to pay freelance developers to build a mobile app to accept orders from customers. 53 | 54 | ### View full case study introduction [here](https://8weeksqlchallenge.com/case-study-2/). 55 | 56 | 57 | [![View Data Exploration Folder](https://img.shields.io/badge/View-Solution_Case_Study_2-971901?style=for-the-badge&logo=GITHUB)](/Case%20Study%20%232%20-%20Pizza%20Runner) 58 | 59 | --- 60 | 61 | ## 🥑 Case Study #3 - Foodie-Fi 62 |

63 | 64 | 65 | Subscription-based businesses are super popular, and Danny realised that there was a large gap in the market - he wanted to create a new streaming service that only had food-related content - something like Netflix but with only cooking shows! 66 | 67 | Danny found a few smart friends to launch his new startup Foodie-Fi in 2020 and started selling monthly and annual subscriptions, giving their customers unlimited on-demand access to exclusive food videos from around the world! 68 | 69 | Danny created Foodie-Fi with a data-driven mindset and wanted to ensure all future investment decisions and new features were decided using data. This case study focuses on using subscription-style digital data to answer important business questions. 70 | 71 | ### View full case study introduction [here](https://8weeksqlchallenge.com/case-study-3/). 72 | 73 | [![View Data Exploration Folder](https://img.shields.io/badge/View-Solution_Case_Study_3-971901?style=for-the-badge&logo=GITHUB)](/Case%20Study%20%233%20-%20Foodie-Fi) 74 | 75 | --- 76 | 77 | ## 🪙 Case Study #4 - Data Bank 78 |

79 | 80 | 81 | There is a new innovation in the financial industry called Neo-Banks: new-age, digital-only banks without physical branches. 82 | 83 | Danny thought that there should be some sort of intersection between these new-age banks, cryptocurrency and the data world…so he decided to launch a new initiative - Data Bank! 84 | 85 | ... 86 | 87 | The management team at Data Bank want to increase their total customer base - but also need some help tracking just how much data storage their customers will need. 88 | 89 | This case study is all about calculating metrics and growth, and helping the business analyse its data in a smart way to better forecast and plan for future developments! 90 | 91 | ### View full case study introduction [here](https://8weeksqlchallenge.com/case-study-4/). 92 | 93 | [![View Data Exploration Folder](https://img.shields.io/badge/View-Solution_Case_Study_4-971901?style=for-the-badge&logo=GITHUB)](/Case%20Study%20%234%20-%20Data%20Bank) 94 | 95 | --- 96 | 97 | ## ✨ Contribution 98 | 99 | Contributions, issues, and feature requests are welcome! 100 | 101 | To contribute to this project, see the GitHub documentation on **[creating a pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request)**. 102 | 103 | --- 104 | 105 | ## 👏 Support 106 | 107 | Give a ⭐️ if you like this project! 108 | ___________________________________ 109 | 110 |

© 2021 Leah Nguyen

--------------------------------------------------------------------------------