├── Case Study #1 - Danny's Diner
│   ├── README.md
│   ├── SQL
│   │   ├── dannys_diner.sql
│   │   └── solution_1.sql
│   └── Solution.md
├── Case Study #2 - Pizza Runner
│   ├── README.md
│   ├── SQL
│   │   ├── pizza_runner.sql
│   │   └── solution_2.sql
│   └── Solution.md
├── Case Study #3 - Foodie-Fi
│   ├── README.md
│   ├── SQL
│   │   ├── foodie_fi.sql
│   │   └── solution_3.sql
│   └── Solution.md
├── Case Study #4 - Data Bank
│   └── Solution.md
├── Case Study #5 - Data Mart
│   ├── README.md
│   ├── SQL
│   │   ├── data_mart.sql
│   │   └── solution_5.sql
│   └── Solution.md
├── Case Study #6 - Clique Bait
│   ├── README.md
│   ├── SQL
│   │   ├── clique_bait.sql
│   │   └── solution_6.sql
│   └── Solution.md
├── Case Study #7 - Balanced Tree Clothing Co.
│   ├── README.md
│   ├── SQL
│   │   ├── balanced_tree.sql
│   │   └── solution_7.sql
│   └── Solution.md
├── Case Study #8 - Fresh Segments
│   ├── README.md
│   ├── SQL
│   │   ├── fresh_segments.sql
│   │   └── solution_8.sql
│   └── Solution.md
└── README.md

--------------------------------------------------------------------------------
/Case Study #1 - Danny's Diner/README.md:
--------------------------------------------------------------------------------
# Case Study #1 - Danny's Diner :ramen:

Danny's Diner is in need of your assistance to help the restaurant stay afloat - the restaurant has captured some very basic data from their few months of operation but has no idea how to use their data to help them run the business. Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they've spent and also which menu items are their favourite.

## Available Data

Danny has shared with you 3 key datasets for this case study:
- `sales`;
- `menu`;
- `members`.

### Table 1: `sales`

The `sales` table captures all `customer_id` level purchases with a corresponding `order_date` and `product_id`, showing when and what menu items were ordered.

| customer_id | order_date | product_id |
|-------------|------------|------------|
| A | 2021-01-01 | 1 |
| A | 2021-01-01 | 2 |
| A | 2021-01-07 | 2 |
| A | 2021-01-10 | 3 |
| A | 2021-01-11 | 3 |
| A | 2021-01-11 | 3 |
| B | 2021-01-01 | 2 |
| B | 2021-01-02 | 2 |
| B | 2021-01-04 | 1 |
| B | 2021-01-11 | 1 |
| B | 2021-01-16 | 3 |
| B | 2021-02-01 | 3 |
| C | 2021-01-01 | 3 |
| C | 2021-01-01 | 3 |
| C | 2021-01-07 | 3 |

### Table 2: `menu`

The `menu` table maps the `product_id` to the actual `product_name` and `price` of each menu item.

| product_id | product_name | price |
|------------|--------------|-------|
| 1 | sushi | 10 |
| 2 | curry | 15 |
| 3 | ramen | 12 |

### Table 3: `members`

The final `members` table captures the `join_date` when a `customer_id` joined the beta version of the Danny's Diner loyalty program.

| customer_id | join_date |
|-------------|------------|
| A | 2021-01-07 |
| B | 2021-01-09 |

## Entity Relationship Diagram

![image](https://user-images.githubusercontent.com/98699089/156034410-8775d5d2-eda5-4453-9e33-54bfef253084.png)

## Table of Contents

[Problem Statement](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#problem-statement)

[Case Study Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#case-study-questions)

[1. What is the total amount each customer spent at the restaurant?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#1-what-is-the-total-amount-each-customer-spent-at-the-restaurant)

[2. How many days has each customer visited the restaurant?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#2-how-many-days-has-each-customer-visited-the-restaurant)

[3. What was the first item from the menu purchased by each customer?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#3-what-was-the-first-item-from-the-menu-purchased-by-each-customer)

[4. What is the most purchased item on the menu and how many times was it purchased by all customers?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#4-what-is-the-most-purchased-item-on-the-menu-and-how-many-times-was-it-purchased-by-all-customers)

[5. Which item was the most popular for each customer?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#5-which-item-was-the-most-popular-for-each-customer)

[6. Which item was purchased first by the customer after they became a member?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#6-which-item-was-purchased-first-by-the-customer-after-they-became-a-member)

[7. Which item was purchased just before the customer became a member?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#7-which-item-was-purchased-just-before-the-customer-became-a-member)

[8. What is the total items and amount spent for each member before they became a member?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#8-what-is-the-total-items-and-amount-spent-for-each-member-before-they-became-a-member)

[9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#9-if-each-1-spent-equates-to-10-points-and-sushi-has-a-2x-points-multiplier---how-many-points-would-each-customer-have)

[10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#10-in-the-first-week-after-a-customer-joins-the-program-including-their-join-date-they-earn-2x-points-on-all-items-not-just-sushi---how-many-points-do-customer-a-and-b-have-at-the-end-of-january)

[Bonus Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#bonus-questions)

[Join All The Things](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#join-all-the-things)

[Rank All The Things](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%231%20-%20Danny's%20Diner/Solution.md/#rank-all-the-things)

--------------------------------------------------------------------------------
/Case Study #1 - Danny's Diner/SQL/dannys_diner.sql:
--------------------------------------------------------------------------------
CREATE SCHEMA dannys_diner;
SET search_path = dannys_diner;

CREATE TABLE sales (
  "customer_id" VARCHAR(1),
  "order_date" DATE,
  "product_id" INTEGER
);

INSERT INTO sales
  ("customer_id", "order_date", "product_id")
VALUES
  ('A', '2021-01-01', '1'),
  ('A', '2021-01-01', '2'),
  ('A', '2021-01-07', '2'),
  ('A', '2021-01-10', '3'),
  ('A', '2021-01-11', '3'),
  ('A', '2021-01-11', '3'),
  ('B', '2021-01-01', '2'),
  ('B', '2021-01-02', '2'),
  ('B', '2021-01-04', '1'),
  ('B', '2021-01-11', '1'),
  ('B', '2021-01-16', '3'),
  ('B', '2021-02-01', '3'),
  ('C', '2021-01-01', '3'),
  ('C', '2021-01-01', '3'),
  ('C', '2021-01-07', '3');


CREATE TABLE menu (
  "product_id" INTEGER,
  "product_name" VARCHAR(5),
  "price" INTEGER
);

INSERT INTO menu
  ("product_id", "product_name", "price")
VALUES
  ('1', 'sushi', '10'),
  ('2', 'curry', '15'),
  ('3', 'ramen', '12');


CREATE TABLE members (
  "customer_id" VARCHAR(1),
  "join_date" DATE
);

INSERT INTO members
  ("customer_id", "join_date")
VALUES
  ('A', '2021-01-07'),
  ('B', '2021-01-09');

--------------------------------------------------------------------------------
/Case Study #1 - Danny's Diner/SQL/solution_1.sql:
--------------------------------------------------------------------------------
-- Solved on PostgreSQL 13.4 by Yulia Murtazina, January 4, 2022
-- Fixed on January 31, 2022

/* --------------------
Case Study Questions
--------------------*/

SET search_path = dannys_diner;

-- 1. What is the total amount each customer spent at the restaurant?

SELECT
  customer_id,
  SUM(price) as total_spent
FROM
  sales as s
  JOIN menu as m ON s.product_id = m.product_id
GROUP BY
  1
ORDER BY
  1;

-- 2. How many days has each customer visited the restaurant?

SELECT
  customer_id,
  COUNT(DISTINCT order_date) AS days_of_visiting
FROM
  sales
GROUP BY
  1
ORDER BY
  1;

-- 3. What was the first item from the menu purchased by each customer?
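-- Approach (as described in Solution.md): rank each customer's orders with
-- row_number(), ordered by order_date and by product_id to break the tie
-- between customer A's two first-day orders, then keep the rows with rank = 1.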

WITH ranked AS (
  SELECT
    customer_id,
    product_name,
    order_date,
    row_number() OVER (
      PARTITION BY customer_id
      ORDER BY
        order_date,
        s.product_id
    ) AS rank
  FROM
    sales AS s
    JOIN menu AS m ON s.product_id = m.product_id
)
SELECT
  customer_id,
  product_name,
  order_date::varchar
FROM
  ranked
WHERE
  rank = 1;

-- 4. What is the most purchased item on the menu and how many times was it purchased by all customers?

WITH totals AS (
  SELECT
    product_name,
    COUNT(product_name) AS total_purchase_quantity,
    row_number() OVER (
      ORDER BY
        COUNT(product_name) DESC
    ) AS rank
  FROM
    sales AS s
    JOIN menu AS m ON s.product_id = m.product_id
  GROUP BY
    1
)
SELECT
  product_name,
  total_purchase_quantity
FROM
  totals
WHERE
  rank = 1;

-- 5. Which item was the most popular for each customer?

WITH ranked AS (
  SELECT
    customer_id,
    product_name,
    COUNT(product_name) AS total_purchase_quantity,
    rank() OVER (
      PARTITION BY customer_id
      ORDER BY
        COUNT(product_name) desc
    ) AS rank
  FROM
    sales AS s
    JOIN menu AS m ON s.product_id = m.product_id
  GROUP BY
    customer_id,
    product_name
)
SELECT
  customer_id,
  product_name,
  total_purchase_quantity
FROM
  ranked
WHERE
  rank = 1;

-- 6. Which item was purchased first by the customer after they became a member?

WITH ranked AS (
  SELECT
    s.customer_id,
    order_date,
    join_date,
    product_name,
    row_number() OVER (
      PARTITION BY s.customer_id
      ORDER BY
        order_date
    ) AS rank
  FROM
    sales AS s
    JOIN members AS mm ON s.customer_id = mm.customer_id
    JOIN menu AS m ON s.product_id = m.product_id
  WHERE
    order_date >= join_date
)
SELECT
  customer_id,
  join_date::varchar,
  order_date::varchar,
  product_name
FROM
  ranked AS r
WHERE
  rank = 1
ORDER BY
  1;

-- 7. Which item was purchased just before the customer became a member?

WITH ranked AS (
  SELECT
    s.customer_id,
    order_date,
    join_date,
    product_name,
    rank() OVER (
      PARTITION BY s.customer_id
      ORDER BY
        order_date DESC
    ) AS rank
  FROM
    sales AS s
    JOIN members AS mm ON s.customer_id = mm.customer_id
    JOIN menu AS m ON s.product_id = m.product_id
  WHERE
    order_date < join_date
)
SELECT
  customer_id,
  join_date::varchar,
  order_date::varchar,
  product_name
FROM
  ranked AS r
WHERE
  rank = 1
ORDER BY
  1;

-- 8. What is the total items and amount spent for each member before they became a member?

SELECT
  s.customer_id,
  COUNT(product_name) AS total_number_of_items,
  SUM(price) AS total_purchase_amount
FROM
  sales AS s
  JOIN members AS mm ON s.customer_id = mm.customer_id
  JOIN menu AS m ON s.product_id = m.product_id
WHERE
  order_date < join_date
GROUP BY
  1
ORDER BY
  1;

-- 9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
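-- Points mapping, taken from the menu data (product_id 1 is sushi): sushi
-- earns price * 20, all other items earn price * 10. Sanity check for
-- customer A: sushi $10 -> 200, curry $30 -> 300, ramen $36 -> 360 = 860.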

SELECT
  customer_id,
  SUM(point) AS points
FROM
  sales AS s
  JOIN (
    SELECT
      product_id,
      CASE
        WHEN product_id = 1 THEN price * 20
        ELSE price * 10
      END AS point
    FROM
      menu
  ) AS p ON s.product_id = p.product_id
GROUP BY
  1
ORDER BY
  1;

-- 10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?

WITH count_points AS (
  SELECT
    s.customer_id,
    order_date,
    join_date,
    product_name,
    SUM(point) AS point
  FROM
    sales AS s
    JOIN (
      SELECT
        product_id,
        product_name,
        CASE
          WHEN product_name = 'sushi' THEN price * 20
          ELSE price * 10
        END AS point
      FROM
        menu AS m
    ) AS p ON s.product_id = p.product_id
    JOIN members AS mm ON s.customer_id = mm.customer_id
  GROUP BY
    s.customer_id,
    order_date,
    join_date,
    product_name,
    point
)
SELECT
  customer_id,
  SUM(
    CASE
      WHEN order_date >= join_date
      AND order_date < join_date + (7 * INTERVAL '1 day')
      AND product_name != 'sushi' THEN point * 2
      ELSE point
    END
  ) AS new_points
FROM
  count_points
WHERE
  DATE_PART('month', order_date) = 1
GROUP BY
  1
ORDER BY
  1;

/* --------------------
Bonus Questions
--------------------*/

-- Join All The Things

WITH members AS (
  SELECT
    s.customer_id,
    order_date,
    product_name,
    price,
    join_date
  FROM
    sales AS s
    JOIN menu AS m ON s.product_id = m.product_id
    LEFT JOIN members AS mm ON s.customer_id = mm.customer_id
)
SELECT
  customer_id,
  order_date::varchar,
  product_name,
  price,
  CASE
    WHEN order_date >= join_date THEN 'Y'
    ELSE 'N'
  END AS member
FROM
  members
ORDER BY
  1,
  2,
  3;

-- Rank All The Things

WITH members AS (
  SELECT
    s.customer_id,
    order_date::varchar,
    product_name,
    price,
    CASE
      WHEN order_date >= join_date THEN 'Y'
      ELSE 'N'
    END AS member
  FROM
    sales AS s
    JOIN menu AS m ON s.product_id = m.product_id
    LEFT JOIN members AS mm ON s.customer_id = mm.customer_id
)
SELECT
  *,
  CASE
    WHEN member = 'Y' THEN rank() OVER (
      PARTITION BY customer_id,
      member
      ORDER BY
        order_date
    )
  END AS ranking
FROM
  members
ORDER BY
  customer_id,
  order_date,
  product_name;

--------------------------------------------------------------------------------
/Case Study #1 - Danny's Diner/Solution.md:
--------------------------------------------------------------------------------
# Case Study #1 - Danny's Diner :ramen:

## Problem Statement

Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they've spent and also which menu items are their favourite. Having this deeper connection with his customers will help him deliver a better and more personalised experience for his loyal customers.

Full description: [Case Study #1 - Danny's Diner](https://8weeksqlchallenge.com/case-study-1/)

## Case Study Questions

Each of the following case study questions can be answered using a single SQL statement. For convenience, I'll mostly show two statements per answer - a `SET search_path` command followed by the query itself.

#### 1. What is the total amount each customer spent at the restaurant?

````sql
SET
  search_path = dannys_diner;
SELECT
  customer_id,
  SUM(price) as total_spent
FROM
  sales as s
  JOIN menu as m ON s.product_id = m.product_id
GROUP BY
  1
ORDER BY
  1
````

| customer_id | total_spent |
| ----------- | ----------- |
| A | 76 |
| B | 74 |
| C | 36 |

---

#### 2. How many days has each customer visited the restaurant?

````sql
SET
  search_path = dannys_diner;
SELECT
  customer_id,
  COUNT(DISTINCT order_date) AS days_of_visiting
FROM
  sales
GROUP BY
  1
ORDER BY
  1
````

| customer_id | days_of_visiting |
| ----------- | ---------------- |
| A | 4 |
| B | 6 |
| C | 2 |

---

#### 3. What was the first item from the menu purchased by each customer?

To get the first item, we need to rank the items ordered by each customer in a CTE using a `WITH` statement.

After we have those ranks, we can select the rows with rank = 1. As customer A made two orders on the first day, we need to `ORDER BY` two criteria in the window function: `order_date` and `product_id`.

In the final query I cast the date as `varchar` to remove the time and show the date only.

````sql
SET
  search_path = dannys_diner;
WITH ranked AS (
  SELECT
    customer_id,
    product_name,
    order_date,
    row_number() OVER (
      PARTITION BY customer_id
      ORDER BY
        order_date,
        s.product_id
    ) AS rank
  FROM
    sales AS s
    JOIN menu AS m ON s.product_id = m.product_id
)
SELECT
  customer_id,
  product_name,
  order_date::varchar
FROM
  ranked
WHERE
  rank = 1
````

| customer_id | product_name | order_date |
| ----------- | ------------ | ---------- |
| A | sushi | 2021-01-01 |
| B | curry | 2021-01-01 |
| C | ramen | 2021-01-01 |

---

A query without window functions returns two results for customer A, one result for customer B and two results for customer C:

````sql
SELECT
  customer_id,
  product_name,
  order_date::varchar
FROM
  dannys_diner.sales
  JOIN dannys_diner.menu ON dannys_diner.sales.product_id = dannys_diner.menu.product_id
WHERE
  order_date = (
    SELECT
      MIN(order_date)
    FROM
      dannys_diner.sales
  )
ORDER BY
  1
````

| customer_id | product_name | order_date |
| ----------- | ------------ | ---------- |
| A | sushi | 2021-01-01 |
| A | curry | 2021-01-01 |
| B | curry | 2021-01-01 |
| C | ramen | 2021-01-01 |
| C | ramen | 2021-01-01 |

---

***The first purchase for customer A was :sushi:***

***The first purchase for customer B was :curry:***

***The first (and the only) purchase for customer C was :ramen:***

#### 4. What is the most purchased item on the menu and how many times was it purchased by all customers?

````sql
SET
  search_path = dannys_diner;
WITH totals AS (
  SELECT
    product_name,
    COUNT(product_name) AS total_purchase_quantity,
    row_number() OVER (
      ORDER BY
        COUNT(product_name) DESC
    ) AS rank
  FROM
    sales AS s
    JOIN menu AS m ON s.product_id = m.product_id
  GROUP BY
    1
)
SELECT
  product_name,
  total_purchase_quantity
FROM
  totals
WHERE
  rank = 1
````

| product_name | total_purchase_quantity |
| ------------ | ----------------------- |
| ramen | 8 |

---

***The most purchased item on the menu was :ramen:, it was purchased 8 times in total.***

#### 5. Which item was the most popular for each customer?

Let's look at all the results sorted by purchase frequency:

````sql
SET
  search_path = dannys_diner;
SELECT
  customer_id,
  product_name,
  COUNT(product_name) AS total_purchase_quantity
FROM
  sales AS s
  INNER JOIN menu AS m ON s.product_id = m.product_id
GROUP BY
  customer_id,
  product_name
ORDER BY
  total_purchase_quantity DESC
````

| customer_id | product_name | total_purchase_quantity |
| ----------- | ------------ | ----------------------- |
| C | ramen | 3 |
| A | ramen | 3 |
| B | curry | 2 |
| B | sushi | 2 |
| B | ramen | 2 |
| A | curry | 2 |
| A | sushi | 1 |

---

Now we can select the most popular products for each customer using the `rank` window function:

````sql
SET
  search_path = dannys_diner;
WITH ranked AS (
  SELECT
    customer_id,
    product_name,
    COUNT(product_name) AS total_purchase_quantity,
    rank() OVER (
      PARTITION BY customer_id
      ORDER BY
        COUNT(product_name) desc
    ) AS rank
  FROM
    sales AS s
    JOIN menu AS m ON s.product_id = m.product_id
  GROUP BY
    customer_id,
    product_name
)
SELECT
  customer_id,
  product_name,
  total_purchase_quantity
FROM
  ranked
WHERE
  rank = 1
````

| customer_id | product_name | total_purchase_quantity |
| ----------- | ------------ | ----------------------- |
| A | ramen | 3 |
| B | ramen | 2 |
| B | curry | 2 |
| B | sushi | 2 |
| C | ramen | 3 |

---

***The most popular item for customer A was :ramen:, they purchased it 3 times.***

***The most popular items for customer B were :curry:, :ramen: and :sushi: - they purchased each dish 2 times.***

***The most popular item for customer C was :ramen:, they purchased it 3 times.***

#### 6. Which item was purchased first by the customer after they became a member?

Let's consider that if the purchase date matches the membership date, then the purchase made on this date was the customer's first purchase as a member.
It means that we need to include this date in the `WHERE` statement.

````sql
SET
  search_path = dannys_diner;
WITH ranked AS (
  SELECT
    s.customer_id,
    order_date,
    join_date,
    product_name,
    row_number() OVER (
      PARTITION BY s.customer_id
      ORDER BY
        order_date
    ) AS rank
  FROM
    sales AS s
    JOIN members AS mm ON s.customer_id = mm.customer_id
    JOIN menu AS m ON s.product_id = m.product_id
  WHERE
    order_date >= join_date
)
SELECT
  customer_id,
  join_date::varchar,
  order_date::varchar,
  product_name
FROM
  ranked AS r
WHERE
  rank = 1
ORDER BY
  1
````

| customer_id | join_date | order_date | product_name |
| ----------- | ---------- | ---------- | ------------ |
| A | 2021-01-07 | 2021-01-07 | curry |
| B | 2021-01-09 | 2021-01-11 | sushi |

---

#### 7. Which item was purchased just before the customer became a member?

Customer A became a member on January 7 - and they placed an order that day.
We do not have the order time and therefore cannot say exactly whether this purchase was made before or after they became a member.
Let's consider that if the purchase date matches the membership date, then the purchase made on this date was the customer's first purchase as a member.
It means that we need to exclude this date in the `WHERE` statement.

````sql
SET
  search_path = dannys_diner;
WITH ranked AS (
  SELECT
    s.customer_id,
    order_date,
    join_date,
    product_name,
    rank() OVER (
      PARTITION BY s.customer_id
      ORDER BY
        order_date DESC
    ) AS rank
  FROM
    sales AS s
    JOIN members AS mm ON s.customer_id = mm.customer_id
    JOIN menu AS m ON s.product_id = m.product_id
  WHERE
    order_date < join_date
)
SELECT
  customer_id,
  join_date::varchar,
  order_date::varchar,
  product_name
FROM
  ranked AS r
WHERE
  rank = 1
ORDER BY
  1
````

| customer_id | join_date | order_date | product_name |
| ----------- | ---------- | ---------- | ------------ |
| A | 2021-01-07 | 2021-01-01 | sushi |
| A | 2021-01-07 | 2021-01-01 | curry |
| B | 2021-01-09 | 2021-01-04 | sushi |

---

Customer A purchased two items on January 1 - their last order date before they became a member.
We would need more information - an order number or a purchase time - to tell exactly which item was purchased last before they became a member. I am keeping both items in the list for now.

***Customer A purchased :curry: and :sushi: on 2021-01-01***

***Customer B purchased :sushi: on 2021-01-04***

#### 8. What is the total items and amount spent for each member before they became a member?

Let's consider that if the purchase date matches the membership date, then the purchase made on this date was the customer's first purchase as a member.
It means that we need to exclude this date in the `WHERE` statement.

````sql
SET
  search_path = dannys_diner;
SELECT
  s.customer_id,
  COUNT(product_name) AS total_number_of_items,
  SUM(price) AS total_purchase_amount
FROM
  sales AS s
  JOIN members AS mm ON s.customer_id = mm.customer_id
  JOIN menu AS m ON s.product_id = m.product_id
WHERE
  order_date < join_date
GROUP BY
  1
ORDER BY
  1
````

| customer_id | total_number_of_items | total_purchase_amount |
| ----------- | --------------------- | --------------------- |
| A | 2 | 25 |
| B | 3 | 40 |

---

#### 9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?

````sql
SET
  search_path = dannys_diner;
SELECT
  customer_id,
  SUM(point) AS points
FROM
  sales AS s
  JOIN (
    SELECT
      product_id,
      CASE
        WHEN product_id = 1 THEN price * 20
        ELSE price * 10
      END AS point
    FROM
      menu
  ) AS p ON s.product_id = p.product_id
GROUP BY
  1
ORDER BY
  1
````

| customer_id | points |
| ----------- | ------ |
| A | 860 |
| B | 940 |
| C | 360 |

---

#### 10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?

First we need to count points as usual: 10 points for each dollar spent on :curry: and :ramen: and 20 points for each dollar spent on :sushi:.
We add this calculation to a CTE using a `WITH` statement. Next we use this CTE to double the points for all purchases of :curry: and :ramen: made by customers in the first week of their membership and return the sum of the new points. The points for :sushi: remain the same - 20 points per dollar.
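As a sanity check against the data above, customer A's total can be worked out by hand:

- before joining (2021-01-01): :sushi: $10 × 20 = 200 points and :curry: $15 × 10 = 150 points;
- during the first membership week (2021-01-07 to 2021-01-13), doubled: :curry: 150 × 2 = 300, :ramen: 120 × 2 = 240 and two :ramen: 240 × 2 = 480.

That gives 350 + 1020 = 1370 points, matching the query result below.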

````sql
SET
  search_path = dannys_diner;
WITH count_points AS (
  SELECT
    s.customer_id,
    order_date,
    join_date,
    product_name,
    SUM(point) AS point
  FROM
    sales AS s
    JOIN (
      SELECT
        product_id,
        product_name,
        CASE
          WHEN product_name = 'sushi' THEN price * 20
          ELSE price * 10
        END AS point
      FROM
        menu AS m
    ) AS p ON s.product_id = p.product_id
    JOIN members AS mm ON s.customer_id = mm.customer_id
  GROUP BY
    s.customer_id,
    order_date,
    join_date,
    product_name,
    point
)
SELECT
  customer_id,
  SUM(
    CASE
      WHEN order_date >= join_date
      AND order_date < join_date + (7 * INTERVAL '1 day')
      AND product_name != 'sushi' THEN point * 2
      ELSE point
    END
  ) AS new_points
FROM
  count_points
WHERE
  DATE_PART('month', order_date) = 1
GROUP BY
  1
ORDER BY
  1
````

| customer_id | new_points |
| ----------- | ---------- |
| A | 1370 |
| B | 820 |

---

***Customer A at the end of January would have 1370 points***

***Customer B at the end of January would have 820 points*** - with no bonus from their first membership week, as their only purchase in that window was :sushi:

## Bonus Questions

### Join All The Things

````sql
SET
  search_path = dannys_diner;
WITH members AS (
  SELECT
    s.customer_id,
    order_date,
    product_name,
    price,
    join_date
  FROM
    sales AS s
    JOIN menu AS m ON s.product_id = m.product_id
    LEFT JOIN members AS mm ON s.customer_id = mm.customer_id
)
SELECT
  customer_id,
  order_date::varchar,
  product_name,
  price,
  CASE
    WHEN order_date >= join_date THEN 'Y'
    ELSE 'N'
  END AS member
FROM
  members
ORDER BY
  1,
  2,
  3
````

| customer_id | order_date | product_name | price | member |
| ----------- | ---------- | ------------ | ----- | ------ |
| A | 2021-01-01 | curry | 15 | N |
| A | 2021-01-01 | sushi | 10 | N |
| A | 2021-01-07 | curry | 15 | Y |
| A | 2021-01-10 | ramen | 12 | Y |
| A | 2021-01-11 | ramen | 12 | Y |
| A | 2021-01-11 | ramen | 12 | Y |
| B | 2021-01-01 | curry | 15 | N |
| B | 2021-01-02 | curry | 15 | N |
| B | 2021-01-04 | sushi | 10 | N |
| B | 2021-01-11 | sushi | 10 | Y |
| B | 2021-01-16 | ramen | 12 | Y |
| B | 2021-02-01 | ramen | 12 | Y |
| C | 2021-01-01 | ramen | 12 | N |
| C | 2021-01-01 | ramen | 12 | N |
| C | 2021-01-07 | ramen | 12 | N |

---

### Rank All The Things

First we need to select all the necessary columns from the `sales`, `menu` and `members` tables - we do that in a CTE using a `WITH` statement.
Next we can rank the orders from this table, partitioning by the `customer_id` and `member` columns.

````sql
SET
  search_path = dannys_diner;
WITH members AS (
  SELECT
    s.customer_id,
    order_date::varchar,
    product_name,
    price,
    CASE
      WHEN order_date >= join_date THEN 'Y'
      ELSE 'N'
    END AS member
  FROM
    sales AS s
    JOIN menu AS m ON s.product_id = m.product_id
    LEFT JOIN members AS mm ON s.customer_id = mm.customer_id
)
SELECT
  *,
  CASE
    WHEN member = 'Y' THEN rank() OVER (
      PARTITION BY customer_id,
      member
      ORDER BY
        order_date
    )
  END AS ranking
FROM
  members
ORDER BY
  customer_id,
  order_date,
  product_name
````

| customer_id | order_date | product_name | price | member | ranking |
| ----------- | ---------- | ------------ | ----- | ------ | ------- |
| A | 2021-01-01 | curry | 15 | N | |
| A | 2021-01-01 | sushi | 10 | N | |
| A | 2021-01-07 | curry | 15 | Y | 1 |
| A | 2021-01-10 | ramen | 12 | Y | 2 |
| A | 2021-01-11 | ramen | 12 | Y | 3 |
| A | 2021-01-11 | ramen | 12 | Y | 3 |
| B | 2021-01-01 | curry | 15 | N | |
| B | 2021-01-02 | curry | 15 | N | |
| B | 2021-01-04 | sushi | 10 | N | |
| B | 2021-01-11 | sushi | 10 | Y | 1 |
| B | 2021-01-16 | ramen | 12 | Y | 2 |
| B | 2021-02-01 | ramen | 12 | Y | 3 |
| C | 2021-01-01 | ramen | 12 | N | |
| C | 2021-01-01 | ramen | 12 | N | |
| C | 2021-01-07 | ramen | 12 | N | |

---

--------------------------------------------------------------------------------
/Case Study #2 - Pizza Runner/README.md:
--------------------------------------------------------------------------------
# Case Study #2 - Pizza Runner :pizza:

Danny requires further assistance to clean his data and apply some basic calculations so he can better direct his runners and optimise Pizza Runner's operations.

All datasets exist within the `pizza_runner` database schema.

## Available Data

### Table 1: `runners`

The `runners` table shows the `registration_date` for each new runner.

| runner_id | registration_date |
|-----------|-------------------|
| 1 | 2021-01-01 |
| 2 | 2021-01-03 |
| 3 | 2021-01-08 |
| 4 | 2021-01-15 |

### Table 2: `customer_orders`

Customer pizza orders are captured in the `customer_orders` table with 1 row for each individual pizza that is part of the order.

The `pizza_id` relates to the type of pizza which was ordered whilst the `exclusions` are the `ingredient_id` values which should be removed from the pizza and the `extras` are the `ingredient_id` values which need to be added to the pizza.

Note that customers can order multiple pizzas in a single order with varying exclusions and extras values even if the pizza is the same type!

The `exclusions` and `extras` columns will need to be cleaned up before using them in your queries.
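For example, the text value `'null'` and empty strings could be normalised to real `NULL`s with something along these lines - a minimal sketch in PostgreSQL, not the full cleaning used in the solutions:

````sql
SELECT
  order_id,
  -- turn '' and the literal string 'null' into real NULLs
  NULLIF(NULLIF(TRIM(exclusions), ''), 'null') AS exclusions,
  NULLIF(NULLIF(TRIM(extras), ''), 'null') AS extras
FROM
  customer_orders
````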

| order_id | customer_id | pizza_id | exclusions | extras | order_time |
|----------|-------------|----------|------------|--------|---------------------|
| 1 | 101 | 1 | | | 2021-01-01 18:05:02 |
| 2 | 101 | 1 | | | 2021-01-01 19:00:52 |
| 3 | 102 | 1 | | | 2021-01-02 23:51:23 |
| 3 | 102 | 2 | | NaN | 2021-01-02 23:51:23 |
| 4 | 103 | 1 | 4 | | 2021-01-04 13:23:46 |
| 4 | 103 | 1 | 4 | | 2021-01-04 13:23:46 |
| 4 | 103 | 2 | 4 | | 2021-01-04 13:23:46 |
| 5 | 104 | 1 | null | 1 | 2021-01-08 21:00:29 |
| 6 | 101 | 2 | null | null | 2021-01-08 21:03:13 |
| 7 | 105 | 2 | null | 1 | 2021-01-08 21:20:29 |
| 8 | 102 | 1 | null | null | 2021-01-09 23:54:33 |
| 9 | 103 | 1 | 4 | 1, 5 | 2021-01-10 11:22:59 |
| 10 | 104 | 1 | null | null | 2021-01-11 18:34:49 |
| 10 | 104 | 1 | 2, 6 | 1, 4 | 2021-01-11 18:34:49 |

### Table 3: `runner_orders`

After each order is received through the system it is assigned to a runner - however not all orders are fully completed and some are cancelled by the restaurant or the customer.

The `pickup_time` is the timestamp at which the runner arrives at the Pizza Runner headquarters to pick up the freshly cooked pizzas. The `distance` and `duration` fields are related to how far and long the runner had to travel to deliver the order to the respective customer.

There are some known data issues with this table so be careful when using this in your queries - make sure to check the data types for each column in the schema SQL!

| order_id | runner_id | pickup_time | distance | duration | cancellation |
|----------|-----------|---------------------|----------|------------|-------------------------|
| 1 | 1 | 2021-01-01 18:15:34 | 20km | 32 minutes | |
| 2 | 1 | 2021-01-01 19:10:54 | 20km | 27 minutes | |
| 3 | 1 | 2021-01-03 00:12:37 | 13.4km | 20 mins | NaN |
| 4 | 2 | 2021-01-04 13:53:03 | 23.4 | 40 | NaN |
| 5 | 3 | 2021-01-08 21:10:57 | 10 | 15 | NaN |
| 6 | 3 | null | null | null | Restaurant Cancellation |
| 7 | 2 | 2020-01-08 21:30:45 | 25km | 25mins | null |
| 8 | 2 | 2020-01-10 00:15:02 | 23.4 km | 15 minute | null |
| 9 | 2 | null | null | null | Customer Cancellation |
| 10 | 1 | 2020-01-11 18:50:20 | 10km | 10minutes | null |

### Table 4: `pizza_names`

At the moment - Pizza Runner only has 2 pizzas available - the Meat Lovers or Vegetarian!

| pizza_id | pizza_name |
|----------|-------------|
| 1 | Meat Lovers |
| 2 | Vegetarian |

### Table 5: `pizza_recipes`

Each `pizza_id` has a standard set of toppings which are used as part of the pizza recipe.

| pizza_id | toppings |
|----------|-------------------------|
| 1 | 1, 2, 3, 4, 5, 6, 8, 10 |
| 2 | 4, 6, 7, 9, 11, 12 |

### Table 6: `pizza_toppings`

This table contains all of the `topping_name` values with their corresponding `topping_id` value.

| topping_id | topping_name |
|------------|--------------|
| 1 | Bacon |
| 2 | BBQ Sauce |
| 3 | Beef |
| 4 | Cheese |
| 5 | Chicken |
| 6 | Mushrooms |
| 7 | Onions |
| 8 | Pepperoni |
| 9 | Peppers |
| 10 | Salami |
| 11 | Tomatoes |
| 12 | Tomato Sauce |

## Entity Relationship Diagram

![image](https://user-images.githubusercontent.com/98699089/156217477-8e823838-4a49-4936-9206-afa3cdeda699.png)

## Table of Contents

[Introduction](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#introduction)

[Case Study Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#case-study-questions)

[A. Pizza Metrics](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#a-pizza-metrics)

[1. How many pizzas were ordered?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#1-how-many-pizzas-were-ordered)

[2. How many unique customer orders were made?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#2-how-many-unique-customer-orders-were-made)

[3. How many successful orders were delivered by each runner?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#3-how-many-successful-orders-were-delivered-by-each-runner)

[4. How many of each type of pizza was delivered?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#4-how-many-of-each-type-of-pizza-was-delivered)

[5. How many Vegetarian and Meatlovers were ordered by each customer?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#5-how-many-vegetarian-and-meatlovers-were-ordered-by-each-customer)

[6. What was the maximum number of pizzas delivered in a single order?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#6-what-was-the-maximum-number-of-pizzas-delivered-in-a-single-order)

[7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#7-for-each-customer-how-many-delivered-pizzas-had-at-least-1-change-and-how-many-had-no-changes)

[8. How many pizzas were delivered that had both exclusions and extras?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#8-how-many-pizzas-were-delivered-that-had-both-exclusions-and-extras)

[9. What was the total volume of pizzas ordered for each hour of the day?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#9-what-was-the-total-volume-of-pizzas-ordered-for-each-hour-of-the-day)

[10. What was the volume of orders for each day of the week?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#10-what-was-the-volume-of-orders-for-each-day-of-the-week)

[B. Runner and Customer Experience](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#b-runner-and-customer-experience)

[1. How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#1-how-many-runners-signed-up-for-each-1-week-period-ie-week-starts-2021-01-01)

[2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#2-what-was-the-average-time-in-minutes-it-took-for-each-runner-to-arrive-at-the-pizza-runner-hq-to-pickup-the-order)

[3. Is there any relationship between the number of pizzas and how long the order takes to prepare?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#3-is-there-any-relationship-between-the-number-of-pizzas-and-how-long-the-order-takes-to-prepare)

[4. What was the average distance travelled for each customer?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#4-what-was-the-average-distance-travelled-for-each-customer)

[5. What was the difference between the longest and shortest delivery times for all orders?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#5-what-was-the-difference-between-the-longest-and-shortest-delivery-times-for-all-orders)

[6. What was the average speed for each runner for each delivery and do you notice any trend for these values?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#6-what-was-the-average-speed-for-each-runner-for-each-delivery-and-do-you-notice-any-trend-for-these-values)

[7. What is the successful delivery percentage for each runner?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#7-what-is-the-successful-delivery-percentage-for-each-runner)

[C. Ingredient Optimisation](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#c-ingredient-optimisation)

[1. What are the standard ingredients for each pizza?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#1-what-are-the-standard-ingredients-for-each-pizza)

[2. What was the most commonly added extra?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#2-what-was-the-most-commonly-added-extra)

[3. What was the most common exclusion?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#3-what-was-the-most-common-exclusion)

[4. Generate an order item for each record in the customers_orders table in the format of one of the following:](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#4-generate-an-order-item-for-each-record-in-the-customers_orders-table-in-the-format-of-one-of-the-following)

[5. Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a 2x in front of any relevant ingredients](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#5-generate-an-alphabetically-ordered-comma-separated-ingredient-list-for-each-pizza-order-from-the-customer_orders-table-and-add-a-2x-in-front-of-any-relevant-ingredients)

[6. What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#6-what-is-the-total-quantity-of-each-ingredient-used-in-all-delivered-pizzas-sorted-by-most-frequent-first)

[D. Pricing and Ratings](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#d-pricing-and-ratings)

[1. If a Meat Lovers pizza costs $12 and Vegetarian costs $10 and there were no charges for changes - how much money has Pizza Runner made so far if there are no delivery fees?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#1-if-a-meat-lovers-pizza-costs-12-and-vegetarian-costs-10-and-there-were-no-charges-for-changes---how-much-money-has-pizza-runner-made-so-far-if-there-are-no-delivery-fees)

[2. What if there was an additional $1 charge for any pizza extras?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#2-what-if-there-was-an-additional-1-charge-for-any-pizza-extras)

[- Add cheese is $1 extra](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#--add-cheese-is-1-extra)

[3. The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset - generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5.](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#3-the-pizza-runner-team-now-wants-to-add-an-additional-ratings-system-that-allows-customers-to-rate-their-runner-how-would-you-design-an-additional-table-for-this-new-dataset---generate-a-schema-for-this-new-table-and-insert-your-own-data-for-ratings-for-each-successful-customer-order-between-1-to-5)

[4. Using your newly generated table - can you join all of the information together to form a table which has the following information for successful deliveries?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#4-using-your-newly-generated-table---can-you-join-all-of-the-information-together-to-form-a-table-which-has-the-following-information-for-successful-deliveries)

[5. If a Meat Lovers pizza was $12 and Vegetarian $10 fixed prices with no cost for extras and each runner is paid $0.30 per kilometre traveled - how much money does Pizza Runner have left over after these deliveries?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#5-if-a-meat-lovers-pizza-was-12-and-vegetarian-10-fixed-prices-with-no-cost-for-extras-and-each-runner-is-paid-030-per-kilometre-traveled---how-much-money-does-pizza-runner-have-left-over-after-these-deliveries)

[E. Bonus Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#e-bonus-questions)

[If Danny wants to expand his range of pizzas - how would this impact the existing data design? Write an INSERT statement to demonstrate what would happen if a new Supreme pizza with all the toppings was added to the Pizza Runner menu?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%232%20-%20Pizza%20Runner/Solution.md/#if-danny-wants-to-expand-his-range-of-pizzas---how-would-this-impact-the-existing-data-design-write-an-insert-statement-to-demonstrate-what-would-happen-if-a-new-supreme-pizza-with-all-the-toppings-was-added-to-the-pizza-runner-menu)

--------------------------------------------------------------------------------
/Case Study #2 - Pizza Runner/SQL/pizza_runner.sql:
--------------------------------------------------------------------------------
CREATE SCHEMA pizza_runner;
SET search_path = pizza_runner;

DROP TABLE IF EXISTS runners;
CREATE TABLE runners (
  "runner_id" INTEGER,
  "registration_date" DATE
);
INSERT INTO runners
  ("runner_id", "registration_date")
VALUES
  (1, '2021-01-01'),
  (2, '2021-01-03'),
  (3, '2021-01-08'),
  (4, '2021-01-15');


DROP TABLE IF EXISTS customer_orders;
CREATE TABLE customer_orders (
  "order_id" INTEGER,
  "customer_id" INTEGER,
  "pizza_id" INTEGER,
  "exclusions" VARCHAR(4),
  "extras" VARCHAR(4),
  "order_time" TIMESTAMP
);

INSERT INTO customer_orders
  ("order_id", "customer_id", "pizza_id", "exclusions", "extras", "order_time")
VALUES
  ('1', '101', '1', '', '', '2020-01-01 18:05:02'),
  ('2', '101', '1', '', '', '2020-01-01 19:00:52'),
  ('3', '102', '1', '', '', '2020-01-02 23:51:23'),
  ('3', '102', '2', '', NULL, '2020-01-02 23:51:23'),
  ('4', '103', '1', '4', '', '2020-01-04 13:23:46'),
  ('4', '103', '1', '4', '', '2020-01-04 13:23:46'),
  ('4', '103', '2', '4', '', '2020-01-04 13:23:46'),
  ('5', '104', '1', 'null', '1', '2020-01-08 21:00:29'),
  ('6', '101', '2', 'null', 'null', '2020-01-08 21:03:13'),
  ('7', '105', '2', 'null', '1', '2020-01-08 21:20:29'),
  ('8', '102', '1', 'null', 'null', '2020-01-09 23:54:33'),
  ('9', '103', '1', '4', '1, 5', '2020-01-10 11:22:59'),
  ('10', '104', '1', 'null', 'null', '2020-01-11 18:34:49'),
  ('10', '104', '1', '2, 6', '1, 4', '2020-01-11 18:34:49');
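-- Note the deliberately messy values above: real NULLs, empty strings and the
-- literal string 'null' all appear in exclusions/extras - these are the data
-- issues the case study asks you to clean up.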
45 | 46 | 47 | DROP TABLE IF EXISTS runner_orders; 48 | CREATE TABLE runner_orders ( 49 | "order_id" INTEGER, 50 | "runner_id" INTEGER, 51 | "pickup_time" VARCHAR(19), 52 | "distance" VARCHAR(7), 53 | "duration" VARCHAR(10), 54 | "cancellation" VARCHAR(23) 55 | ); 56 | 57 | INSERT INTO runner_orders 58 | ("order_id", "runner_id", "pickup_time", "distance", "duration", "cancellation") 59 | VALUES 60 | ('1', '1', '2020-01-01 18:15:34', '20km', '32 minutes', ''), 61 | ('2', '1', '2020-01-01 19:10:54', '20km', '27 minutes', ''), 62 | ('3', '1', '2020-01-03 00:12:37', '13.4km', '20 mins', NULL), 63 | ('4', '2', '2020-01-04 13:53:03', '23.4', '40', NULL), 64 | ('5', '3', '2020-01-08 21:10:57', '10', '15', NULL), 65 | ('6', '3', 'null', 'null', 'null', 'Restaurant Cancellation'), 66 | ('7', '2', '2020-01-08 21:30:45', '25km', '25mins', 'null'), 67 | ('8', '2', '2020-01-10 00:15:02', '23.4 km', '15 minute', 'null'), 68 | ('9', '2', 'null', 'null', 'null', 'Customer Cancellation'), 69 | ('10', '1', '2020-01-11 18:50:20', '10km', '10minutes', 'null'); 70 | 71 | 72 | DROP TABLE IF EXISTS pizza_names; 73 | CREATE TABLE pizza_names ( 74 | "pizza_id" INTEGER, 75 | "pizza_name" TEXT 76 | ); 77 | INSERT INTO pizza_names 78 | ("pizza_id", "pizza_name") 79 | VALUES 80 | (1, 'Meatlovers'), 81 | (2, 'Vegetarian'); 82 | 83 | 84 | DROP TABLE IF EXISTS pizza_recipes; 85 | CREATE TABLE pizza_recipes ( 86 | "pizza_id" INTEGER, 87 | "toppings" TEXT 88 | ); 89 | INSERT INTO pizza_recipes 90 | ("pizza_id", "toppings") 91 | VALUES 92 | (1, '1, 2, 3, 4, 5, 6, 8, 10'), 93 | (2, '4, 6, 7, 9, 11, 12'); 94 | 95 | 96 | DROP TABLE IF EXISTS pizza_toppings; 97 | CREATE TABLE pizza_toppings ( 98 | "topping_id" INTEGER, 99 | "topping_name" TEXT 100 | ); 101 | INSERT INTO pizza_toppings 102 | ("topping_id", "topping_name") 103 | VALUES 104 | (1, 'Bacon'), 105 | (2, 'BBQ Sauce'), 106 | (3, 'Beef'), 107 | (4, 'Cheese'), 108 | (5, 'Chicken'), 109 | (6, 'Mushrooms'), 110 | (7, 'Onions'), 111 | (8, 'Pepperoni'), 112 | (9, 'Peppers'), 113 | (10, 'Salami'), 114 | (11, 'Tomatoes'), 115 | (12, 'Tomato Sauce'); -------------------------------------------------------------------------------- /Case Study #3 - Foodie-Fi/README.md: -------------------------------------------------------------------------------- 1 | # Case Study #3 - Foodie-Fi :avocado: 2 | 3 | 4 | 5 | Danny has shared the data design for Foodie-Fi and also short descriptions on each of the database tables - our case study focuses on only 2 tables but there will be a challenge to create a new table for the Foodie-Fi team. 6 | 7 | All datasets exist within the `foodie_fi` database schema. 8 | 9 | ## Available Data 10 | 11 | ### Table 1: `plans` 12 | 13 | Customers can choose which plans to join Foodie-Fi when they first sign up. 14 | 15 | Basic plan customers have limited access and can only stream their videos and is only available monthly at $9.90 16 | 17 | Pro plan customers have no watch time limits and are able to download videos for offline viewing. Pro plans start at $19.90 a month or $199 for an annual subscription. 18 | 19 | Customers can sign up to an initial 7 day free trial will automatically continue with the pro monthly subscription plan unless they cancel, downgrade to basic or upgrade to an annual pro plan at any point during the trial. 20 | 21 | When customers cancel their Foodie-Fi service - they will have a churn plan record with a null price but their plan will continue until the end of the billing period. 
22 | 23 | | plan_id | plan_name | price | 24 | |---------|---------------|-------| 25 | | 0 | trial | 0 | 26 | | 1 | basic monthly | 9.90 | 27 | | 2 | pro monthly | 19.90 | 28 | | 3 | pro annual | 199 | 29 | | 4 | churn | null | 30 | 31 | ### Table 2: `subscriptions` 32 | 33 | Customer subscriptions show the exact date where their specific `plan_id` starts. 34 | 35 | If customers downgrade from a pro plan or cancel their subscription - the higher plan will remain in place until the period is over - the start_date in the subscriptions table will reflect the date that the actual plan changes. 36 | 37 | When customers upgrade their account from a basic plan to a pro or annual pro plan - the higher plan will take effect straightaway. 38 | 39 | When customers churn - they will keep their access until the end of their current billing period but the `start_date` will be technically the day they decided to cancel their service. 40 | 41 | | customer_id | plan_id | start_date | 42 | |-------------|---------|------------| 43 | | 1 | 0 | 2020-08-01 | 44 | | 1 | 1 | 2020-08-08 | 45 | | 2 | 0 | 2020-09-20 | 46 | | 2 | 3 | 2020-09-27 | 47 | | 11 | 0 | 2020-11-19 | 48 | | 11 | 4 | 2020-11-26 | 49 | | 13 | 0 | 2020-12-15 | 50 | | 13 | 1 | 2020-12-22 | 51 | | 13 | 2 | 2021-03-29 | 52 | | 15 | 0 | 2020-03-17 | 53 | | 15 | 2 | 2020-03-24 | 54 | | 15 | 4 | 2020-04-29 | 55 | | 16 | 0 | 2020-05-31 | 56 | | 16 | 1 | 2020-06-07 | 57 | | 16 | 3 | 2020-10-21 | 58 | | 18 | 0 | 2020-07-06 | 59 | | 18 | 2 | 2020-07-13 | 60 | | 19 | 0 | 2020-06-22 | 61 | | 19 | 2 | 2020-06-29 | 62 | | 19 | 3 | 2020-08-29 | 63 | 64 | ## Entity Relationship Diagram 65 | 66 | ![изображение](https://user-images.githubusercontent.com/98699089/156618670-0540e629-2726-497f-b54f-f882df7a72c5.png) 67 | 68 | ## Table of Contents 69 | 70 | [Introduction](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#introduction) 71 | 72 | [Case Study Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#case-study-questions) 73 | 74 | [A. Customer Journey](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#a-customer-journey) 75 | 76 | [1. Based off the 8 sample customers provided in the sample from the subscriptions table, write a brief description about each customer’s onboarding journey.](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#1-based-off-the-8-sample-customers-provided-in-the-sample-from-the-subscriptions-table-write-a-brief-description-about-each-customers-onboarding-journey) 77 | 78 | [Try to keep it as short as possible - you may also want to run some sort of join to make your explanations a bit easier!](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#try-to-keep-it-as-short-as-possible---you-may-also-want-to-run-some-sort-of-join-to-make-your-explanations-a-bit-easier) 79 | 80 | [B. Data Analysis Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#b-data-analysis-questions) 81 | 82 | [1. How many customers has Foodie-Fi ever had?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#1-how-many-customers-has-foodie-fi-ever-had) 83 | 84 | [2. 
What is the monthly distribution of trial plan start_date values for our dataset - use the start of the month as the group by value](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#2-what-is-the-monthly-distribution-of-trial-plan-start_date-values-for-our-dataset---use-the-start-of-the-month-as-the-group-by-value) 85 | 86 | [3. What plan start_date values occur after the year 2020 for our dataset? Show the breakdown by count of events for each plan_name](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#3-what-plan-start_date-values-occur-after-the-year-2020-for-our-dataset-show-the-breakdown-by-count-of-events-for-each-plan_name) 87 | 88 | [4. What is the customer count and percentage of customers who have churned rounded to 1 decimal place?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#4-what-is-the-customer-count-and-percentage-of-customers-who-have-churned-rounded-to-1-decimal-place) 89 | 90 | [5. How many customers have churned straight after their initial free trial - what percentage is this rounded to the nearest whole number?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#5-how-many-customers-have-churned-straight-after-their-initial-free-trial---what-percentage-is-this-rounded-to-the-nearest-whole-number) 91 | 92 | [6. What is the number and percentage of customer plans after their initial free trial?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#6-what-is-the-number-and-percentage-of-customer-plans-after-their-initial-free-trial) 93 | 94 | [7. What is the customer count and percentage breakdown of all 5 plan_name values at 2020-12-31?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#7-what-is-the-customer-count-and-percentage-breakdown-of-all-5-plan_name-values-at-2020-12-31) 95 | 96 | [8. How many customers have upgraded to an annual plan in 2020?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#8-how-many-customers-have-upgraded-to-an-annual-plan-in-2020) 97 | 98 | [9. How many days on average does it take for a customer to upgrade to an annual plan from the day they join Foodie-Fi?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#9-how-many-days-on-average-does-it-take-for-a-customer-to-an-annual-plan-from-the-day-they-join-foodie-fi) 99 | 100 | [10. Can you further break down this average value into 30 day periods (i.e. 0-30 days, 31-60 days etc)](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#10-can-you-further-breakdown-this-average-value-into-30-day-periods-ie-0-30-days-31-60-days-etc) 101 | 102 | [11. How many customers downgraded from a pro monthly to a basic monthly plan in 2020?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#11-how-many-customers-downgraded-from-a-pro-monthly-to-a-basic-monthly-plan-in-2020) 103 | 104 | [C. 
Challenge Payment Question](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#c-challenge-payment-question) 105 | 106 | [The Foodie-Fi team wants you to create a new payments table for the year 2020 that includes amounts paid by each customer in the subscriptions table with the following requirements:](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#the-foodie-fi-team-wants-you-to-create-a-new-payments-table-for-the-year-2020-that-includes-amounts-paid-by-each-customer-in-the-subscriptions-table-with-the-following-requirements) 107 | 108 | [D. Outside The Box Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#d-outside-the-box-questions) 109 | 110 | [1. How would you calculate the rate of growth for Foodie-Fi?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#1-how-would-you-calculate-the-rate-of-growth-for-foodie-fi) 111 | 112 | [2. What key metrics would you recommend Foodie-Fi management to track over time to assess performance of their overall business?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#2-what-key-metrics-would-you-recommend-foodie-fi-management-to-track-over-time-to-assess-performance-of-their-overall-business) 113 | 114 | [3. What are some key customer journeys or experiences that you would analyse further to improve customer retention?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#3-what-are-some-key-customer-journeys-or-experiences-that-you-would-analyse-further-to-improve-customer-retention) 115 | 116 | [4. If the Foodie-Fi team were to create an exit survey shown to customers who wish to cancel their subscription, what questions would you include in the survey?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#4-if-the-foodie-fi-team-were-to-create-an-exit-survey-shown-to-customers-who-wish-to-cancel-their-subscription-what-questions-would-you-include-in-the-survey) 117 | 118 | [5. What business levers could the Foodie-Fi team use to reduce the customer churn rate? How would you validate the effectiveness of your ideas?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%233%20-%20Foodie-Fi/Solution.md/#5-what-business-levers-could-the-foodie-fi-team-use-to-reduce-the-customer-churn-rate-how-would-you-validate-the-effectiveness-of-your-ideas) 119 | -------------------------------------------------------------------------------- /Case Study #3 - Foodie-Fi/SQL/solution_3.sql: -------------------------------------------------------------------------------- 1 | -- Solved on PostgreSQL 13.4 by Yulia Murtazina, January 18, 2022 2 | -- Fixed on February 4, 2022 3 | 4 | /* -------------------- 5 | Case Study Questions: Case Study #3 - Foodie-Fi 6 | --------------------*/ 7 | SET 8 | search_path = foodie_fi; 9 | 10 | -- A. Customer Journey 11 | 12 | -- 1. Based off the 8 sample customers provided in the sample from the `subscriptions` table, write a brief description about each customer’s onboarding journey. 13 | -- Try to keep it as short as possible - you may also want to run some sort of join to make your explanations a bit easier! 14 | -- Non-SQL question, answered in Solution.md 15 | 16 | -- B. 
Data Analysis Questions 17 | -- 1. How many customers has Foodie-Fi ever had? 18 | 19 | SELECT 20 | COUNT(distinct customer_id) AS total_number_of_customers 21 | FROM 22 | subscriptions; 23 | 24 | -- 2. What is the monthly distribution of trial plan `start_date` values for our dataset - use the start of the month as the group by value 25 | 26 | SELECT 27 | start_date::varchar, 28 | plan_name, 29 | SUM(number_of_customers) AS number_of_customers 30 | FROM 31 | ( 32 | SELECT 33 | DATE_TRUNC('month', start_date)::date AS start_date, 34 | plan_name, 35 | COUNT(customer_id) AS number_of_customers 36 | FROM 37 | subscriptions AS s 38 | JOIN plans AS p ON s.plan_id = p.plan_id 39 | WHERE 40 | plan_name = 'trial' 41 | GROUP BY 42 | start_date, 43 | plan_name 44 | ) AS count_customers 45 | GROUP BY 46 | start_date, 47 | plan_name 48 | ORDER BY 49 | start_date; 50 | 51 | -- 3. What plan `start_date` values occur after the year 2020 for our dataset? Show the breakdown by count of events for each `plan_name` 52 | 53 | SELECT 54 | plan_name, 55 | COUNT(customer_id) AS number_of_events 56 | FROM 57 | subscriptions as s 58 | JOIN plans AS p ON s.plan_id = p.plan_id 59 | WHERE 60 | start_date > '2020-12-31' :: date 61 | GROUP BY 62 | plan_name 63 | ORDER BY 64 | plan_name; 65 | 66 | -- 4. What is the customer count and percentage of customers who have churned rounded to 1 decimal place? 67 | 68 | SELECT 69 | SUM(churned_customers) AS churned_customers, 70 | ROUND( 71 | (SUM(churned_customers) / SUM(total_customers)) * 100, 72 | 1 73 | ) AS churn_percentage 74 | FROM 75 | ( 76 | SELECT 77 | CASE 78 | WHEN plan_name = 'churn' THEN COUNT(distinct customer_id) 79 | ELSE 0 80 | END AS churned_customers, 81 | CASE 82 | WHEN plan_name = 'trial' THEN COUNT(distinct customer_id) 83 | ELSE 0 84 | END AS total_customers 85 | FROM 86 | subscriptions AS s 87 | JOIN plans AS p ON s.plan_id = p.plan_id 88 | GROUP BY 89 | plan_name 90 | ) AS count_churn; 91 | 92 | -- 5. How many customers have churned straight after their initial free trial - what percentage is this rounded to the nearest whole number? 93 | 94 | SELECT 95 | plan_name, 96 | COUNT(plan_name) AS number_of_churned_customers, 97 | ROUND( 98 | COUNT(plan_name) / customers :: numeric * 100 99 | ) AS percentage_of_churned_after_trial, 100 | ROUND( 101 | COUNT(plan_name) / churned_customers :: numeric * 100 102 | ) AS churned_after_trial_to_all_churned 103 | FROM 104 | ( 105 | SELECT 106 | s.customer_id, 107 | trial_ended, 108 | plan_name 109 | FROM 110 | subscriptions AS s 111 | JOIN plans AS p ON s.plan_id = p.plan_id 112 | JOIN ( 113 | SELECT 114 | customer_id, 115 | (start_date + interval '7' day) AS trial_ended 116 | FROM 117 | subscriptions 118 | WHERE 119 | plan_id = 0 120 | ) AS t ON s.customer_id = t.customer_id 121 | WHERE 122 | start_date = trial_ended 123 | AND plan_name = 'churn' 124 | GROUP BY 125 | start_date, 126 | s.customer_id, 127 | trial_ended, 128 | plan_name 129 | ) AS count_plans, 130 | LATERAL( 131 | SELECT 132 | COUNT(distinct customer_id) AS customers 133 | FROM 134 | subscriptions 135 | ) p, 136 | LATERAL( 137 | SELECT 138 | COUNT(distinct customer_id) as churned_customers 139 | FROM 140 | subscriptions 141 | WHERE 142 | plan_id = 4 143 | ) p1 144 | GROUP BY 145 | plan_name, 146 | customers, 147 | churned_customers; 148 | 149 | -- 6. What is the number and percentage of customer plans after their initial free trial? 
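-- Note: every trial lasts exactly 7 days, so the subquery below computes each customer's trial end date (trial start_date + 7 days) and keeps only the subscription rows whose start_date matches it, i.e. the plan chosen straight after the free trial.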
150 | 151 | SELECT plan_name, COUNT(plan_name) AS number_of_plans_after_trial, 152 | ROUND( 153 | ( 154 | COUNT(plan_name) / ( 155 | SELECT 156 | COUNT(distinct customer_id) 157 | FROM 158 | subscriptions 159 | ) ::numeric * 100 160 | ), 1 161 | ) AS percentage_of_total_customers 162 | FROM 163 | (SELECT 164 | s.customer_id, 165 | trial_ended, 166 | plan_name 167 | FROM 168 | subscriptions AS s 169 | JOIN plans AS p ON s.plan_id = p.plan_id 170 | JOIN ( 171 | SELECT 172 | customer_id, 173 | (start_date + interval '7' day) AS trial_ended 174 | FROM 175 | subscriptions 176 | WHERE 177 | plan_id = 0 178 | ) AS t ON s.customer_id = t.customer_id 179 | WHERE 180 | start_date = trial_ended 181 | GROUP BY 182 | start_date, 183 | s.customer_id, 184 | trial_ended, 185 | plan_name) AS count_plans 186 | GROUP BY plan_name; 187 | 188 | -- 7. What is the customer count and percentage breakdown of all 5 plan_name values at 2020-12-31? 189 | 190 | WITH ranked AS ( 191 | SELECT 192 | plan_name, 193 | RANK() OVER( 194 | PARTITION BY customer_id 195 | ORDER BY 196 | start_date DESC 197 | ) AS RANK 198 | FROM 199 | subscriptions AS s 200 | JOIN plans AS p ON s.plan_id = p.plan_id 201 | WHERE 202 | start_date <= '2020-12-31' :: date 203 | ) 204 | SELECT 205 | plan_name, 206 | COUNT(plan_name) as number_of_plans, 207 | ROUND( 208 | ( 209 | COUNT(plan_name) / customers :: numeric 210 | ) * 100, 211 | 1 212 | ) AS percentage_of_plans 213 | FROM 214 | ranked, 215 | LATERAL( 216 | SELECT 217 | COUNT(distinct customer_id) AS customers 218 | FROM 219 | subscriptions 220 | ) c 221 | WHERE 222 | rank = 1 223 | GROUP BY 224 | plan_name, 225 | customers 226 | ORDER BY 227 | 1; 228 | 229 | -- 8. How many customers have upgraded to an annual plan in 2020? 230 | 231 | SELECT 232 | plan_name, 233 | count(plan_name) AS number_of_customers 234 | FROM 235 | subscriptions AS s 236 | JOIN plans AS p ON s.plan_id = p.plan_id 237 | WHERE 238 | start_date BETWEEN '2020-01-01' :: date 239 | AND '2020-12-31' :: date 240 | AND plan_name = 'pro annual' 241 | GROUP BY 242 | plan_name; 243 | 244 | -- 9. How many days on average does it take for a customer to upgrade to an annual plan from the day they join Foodie-Fi? 245 | 246 | SELECT 247 | plan_name, 248 | ROUND(AVG(s.start_date - t.start_date)) AS average_days_to_upgrade 249 | FROM 250 | subscriptions AS s 251 | JOIN plans AS p ON s.plan_id = p.plan_id 252 | JOIN ( 253 | SELECT 254 | customer_id, 255 | start_date 256 | FROM 257 | subscriptions 258 | WHERE 259 | plan_id = 0 260 | ) AS t ON s.customer_id = t.customer_id 261 | WHERE 262 | plan_name = 'pro annual' 263 | GROUP BY 264 | plan_name; 265 | 266 | -- 10. Can you further break down this average value into 30 day periods (i.e. 
0-30 days, 31-60 days etc) 267 | 268 | SELECT 269 | * 270 | FROM 271 | ( 272 | SELECT 273 | plan_name, 274 | CASE 275 | WHEN s.start_date - t.start_date < 31 THEN '0-30 days' 276 | WHEN s.start_date - t.start_date BETWEEN 31 277 | AND 60 THEN '31-60 days' 278 | WHEN s.start_date - t.start_date BETWEEN 61 279 | AND 90 THEN '61-90 days' 280 | WHEN s.start_date - t.start_date BETWEEN 91 281 | AND 120 THEN '91-120 days' 282 | WHEN s.start_date - t.start_date BETWEEN 121 283 | AND 150 THEN '121-150 days' 284 | WHEN s.start_date - t.start_date BETWEEN 151 285 | AND 180 THEN '151-180 days' 286 | WHEN s.start_date - t.start_date BETWEEN 181 287 | AND 210 THEN '181-210 days' 288 | WHEN s.start_date - t.start_date BETWEEN 211 289 | AND 240 THEN '211-240 days' 290 | WHEN s.start_date - t.start_date BETWEEN 241 291 | AND 270 THEN '241-270 days' 292 | WHEN s.start_date - t.start_date BETWEEN 271 293 | AND 300 THEN '271-300 days' 294 | WHEN s.start_date - t.start_date BETWEEN 301 295 | AND 330 THEN '301-330 days' 296 | WHEN s.start_date - t.start_date BETWEEN 331 297 | AND 360 THEN '331-360 days' 298 | WHEN s.start_date - t.start_date > 360 THEN '360+ days' 299 | END AS group_by_days_to_upgrade, 300 | COUNT(s.start_date - t.start_date) AS number_of_customers, 301 | ROUND(AVG(s.start_date - t.start_date)) AS average_days_to_upgrade 302 | FROM 303 | subscriptions AS s 304 | JOIN plans AS p ON s.plan_id = p.plan_id 305 | JOIN ( 306 | SELECT 307 | customer_id, 308 | start_date 309 | FROM 310 | subscriptions 311 | WHERE 312 | plan_id = 0 313 | ) AS t ON s.customer_id = t.customer_id 314 | WHERE 315 | plan_name = 'pro annual' 316 | GROUP BY 317 | plan_name, 318 | group_by_days_to_upgrade 319 | ) AS count_groups 320 | GROUP BY 321 | plan_name, 322 | group_by_days_to_upgrade, 323 | number_of_customers, 324 | average_days_to_upgrade 325 | ORDER BY 326 | CASE 327 | WHEN group_by_days_to_upgrade = '0-30 days' THEN 1 328 | WHEN group_by_days_to_upgrade = '31-60 days' THEN 2 329 | WHEN group_by_days_to_upgrade = '61-90 days' THEN 3 330 | WHEN group_by_days_to_upgrade = '91-120 days' THEN 4 331 | WHEN group_by_days_to_upgrade = '121-150 days' THEN 5 332 | WHEN group_by_days_to_upgrade = '151-180 days' THEN 6 333 | WHEN group_by_days_to_upgrade = '181-210 days' THEN 7 334 | WHEN group_by_days_to_upgrade = '211-240 days' THEN 8 335 | WHEN group_by_days_to_upgrade = '241-270 days' THEN 9 336 | WHEN group_by_days_to_upgrade = '271-300 days' THEN 10 337 | WHEN group_by_days_to_upgrade = '301-330 days' THEN 11 338 | WHEN group_by_days_to_upgrade = '331-360 days' THEN 12 339 | WHEN group_by_days_to_upgrade = '360+ days' THEN 13 340 | END; 341 | 342 | -- 11. How many customers downgraded from a pro monthly to a basic monthly plan in 2020? 343 | 344 | SELECT 345 | s.customer_id, 346 | t.start_date::varchar AS basic_plan_started_on, 347 | s.start_date::varchar AS pro_plan_started_on, 348 | (t.start_date - s.start_date) AS days_between_basic_and_pro 349 | FROM 350 | subscriptions AS s 351 | JOIN plans AS p ON s.plan_id = p.plan_id 352 | JOIN ( 353 | SELECT 354 | customer_id, 355 | start_date 356 | FROM 357 | subscriptions 358 | WHERE 359 | plan_id = 1 360 | ) AS t ON s.customer_id = t.customer_id 361 | WHERE 362 | plan_name = 'pro monthly' 363 | AND s.start_date >= '2020-01-01' :: date 364 | AND s.start_date <= '2020-12-31' :: date 365 | GROUP BY 366 | s.start_date, 367 | t.start_date, 368 | s.customer_id, 369 | plan_name 370 | ORDER BY 371 | days_between_basic_and_pro DESC 372 | LIMIT 373 | 5; 374 | -- C. 
Challenge Payment Question 375 | 376 | /* -------------------- 377 | The Foodie-Fi team wants you to create a new `payments` table for the year 2020 that includes amounts paid by each customer in the `subscriptions` table with the following requirements: 378 | 379 | - monthly payments always occur on the same day of month as the original `start_date` of any monthly paid plan 380 | 381 | - upgrades from basic to monthly or pro plans are reduced by the current paid amount in that month and start immediately 382 | 383 | - upgrades from pro monthly to pro annual are paid at the end of the current billing period and also start at the end of the month period 384 | 385 | - once a customer churns they will no longer make payments 386 | --------------------*/ 387 | 388 | SELECT 389 | customer_id, 390 | plan_id, 391 | plan_name, 392 | payment_date ::date :: varchar, 393 | CASE 394 | WHEN LAG(plan_id) OVER ( 395 | PARTITION BY customer_id 396 | ORDER BY 397 | plan_id 398 | ) != plan_id 399 | AND DATE_PART( 400 | 'day', 401 | payment_date - LAG(payment_date) OVER ( 402 | PARTITION BY customer_id 403 | ORDER BY 404 | plan_id 405 | ) 406 | ) < 30 THEN amount - LAG(amount) OVER ( 407 | PARTITION BY customer_id 408 | ORDER BY 409 | plan_id 410 | ) 411 | ELSE amount 412 | END AS amount, 413 | RANK() OVER( 414 | PARTITION BY customer_id 415 | ORDER BY 416 | payment_date 417 | ) AS payment_order 418 | 419 | INTO TEMP TABLE payments 420 | FROM 421 | ( 422 | SELECT 423 | customer_id, 424 | s.plan_id, 425 | plan_name, 426 | generate_series( 427 | start_date, 428 | CASE 429 | WHEN s.plan_id = 3 THEN start_date 430 | WHEN s.plan_id = 4 THEN NULL 431 | WHEN LEAD(start_date) OVER ( 432 | PARTITION BY customer_id 433 | ORDER BY 434 | start_date 435 | ) IS NOT NULL THEN LEAD(start_date) OVER ( 436 | PARTITION BY customer_id 437 | ORDER BY 438 | start_date 439 | ) 440 | ELSE '2020-12-31' :: date 441 | END, 442 | '1 month' + '1 second' :: interval 443 | ) AS payment_date, 444 | price AS amount 445 | FROM 446 | subscriptions AS s 447 | JOIN plans AS p ON s.plan_id = p.plan_id 448 | WHERE 449 | s.plan_id != 0 450 | AND start_date < '2021-01-01' :: date 451 | GROUP BY 452 | customer_id, 453 | s.plan_id, 454 | plan_name, 455 | start_date, 456 | price 457 | ) AS t 458 | ORDER BY 459 | customer_id; 460 | 461 | -- D. Outside The Box Questions 462 | -- The following are open ended questions which might be asked during a technical interview for this case study - there are no right or wrong answers, but answers that make sense from both a technical and a business perspective make an amazing impression! 463 | 464 | -- 1. How would you calculate the rate of growth for Foodie-Fi? 465 | 466 | SELECT 467 | DATE_TRUNC('month', start_date) AS month, 468 | COUNT(customer_id) AS current_number_of_customers, 469 | LAG(COUNT(customer_id), 1) over ( 470 | ORDER BY 471 | DATE_TRUNC('month', start_date) 472 | ) AS past_number_of_customers, 473 | ( 474 | 100 * ( 475 | COUNT(customer_id) - LAG(COUNT(customer_id), 1) over ( 476 | ORDER BY 477 | DATE_TRUNC('month', start_date) 478 | ) 479 | ) / LAG(COUNT(customer_id), 1) over ( 480 | ORDER BY 481 | DATE_TRUNC('month', start_date) 482 | ) 483 | ) || '%' AS growth 484 | FROM 485 | subscriptions AS s 486 | JOIN plans AS p ON s.plan_id = p.plan_id 487 | WHERE 488 | plan_name != 'trial' 489 | AND plan_name != 'churn' 490 | GROUP BY 491 | month 492 | ORDER BY 493 | month; 494 | 495 | -- 2. 
What key metrics would you recommend Foodie-Fi management to track over time to assess performance of their overall business? 496 | 497 | -- Non-SQL question 498 | 499 | -- 3. What are some key customer journeys or experiences that you would analyse further to improve customer retention? 500 | 501 | -- Non-SQL question 502 | 503 | -- 4. If the Foodie-Fi team were to create an exit survey shown to customers who wish to cancel their subscription, what questions would you include in the survey? 504 | 505 | -- Non-SQL question 506 | 507 | -- 5. What business levers could the Foodie-Fi team use to reduce the customer churn rate? How would you validate the effectiveness of your ideas? 508 | 509 | -- Non-SQL question 510 | -------------------------------------------------------------------------------- /Case Study #4 - Data Bank/Solution.md: -------------------------------------------------------------------------------- 1 | [WORK IN PROGRESS] 2 | -------------------------------------------------------------------------------- /Case Study #5 - Data Mart/README.md: -------------------------------------------------------------------------------- 1 | # Case Study #5 - Data Mart :shopping: 2 | 3 | 4 | 5 | Data Mart is Danny’s latest venture and after running international operations for his online supermarket that specialises in fresh produce - Danny is asking for your support to analyse his sales performance. 6 | 7 | ## Available Data 8 | 9 | For this case study there is only a single table: `data_mart.weekly_sales`. 10 | 11 | The columns are pretty self-explanatory based on the column names but here are some further details about the dataset: 12 | - Data Mart has international operations using a multi-region strategy 13 | - Data Mart has both a retail platform and an online platform in the form of a Shopify store front to serve their customers 14 | - Customer `segment` and `customer_type` data relates to personal age and demographics information that is shared with Data Mart 15 | - `transactions` is the count of unique purchases made through Data Mart and `sales` is the actual dollar amount of purchases 16 | 17 | Each record in the dataset is related to a specific aggregated slice of the underlying sales data rolled up into a `week_date` value which represents the start of the sales week. 
18 | 19 | 10 random rows are shown in the table output below from `data_mart.weekly_sales`: 20 | 21 | | week_date | region | platform | segment | customer_type | transactions | sales | 22 | |-----------|---------------|----------|---------|---------------|--------------|------------| 23 | | 9/9/20 | OCEANIA | Shopify | C3 | New | 610 | 110033.89 | 24 | | 29/7/20 | AFRICA | Retail | C1 | New | 110692 | 3053771.19 | 25 | | 22/7/20 | EUROPE | Shopify | C4 | Existing | 24 | 8101.54 | 26 | | 13/5/20 | AFRICA | Shopify | null | Guest | 5287 | 1003301.37 | 27 | | 24/7/19 | ASIA | Retail | C1 | New | 127342 | 3151780.41 | 28 | | 10/7/19 | CANADA | Shopify | F3 | New | 51 | 8844.93 | 29 | | 26/6/19 | OCEANIA | Retail | C3 | New | 152921 | 5551385.36 | 30 | | 29/5/19 | SOUTH AMERICA | Shopify | null | New | 53 | 10056.2 | 31 | | 22/8/18 | AFRICA | Retail | null | Existing | 31721 | 1718863.58 | 32 | | 25/7/18 | SOUTH AMERICA | Retail | null | New | 2136 | 81757.91 | 33 | 34 | ## Entity Relationship Diagram 35 | 36 | ![image](https://user-images.githubusercontent.com/98699089/156622725-8f9981cb-a9f5-4c18-bcd8-27472fbb55bb.png) 37 | 38 | ## Table of Contents 39 | 40 | [Introduction](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#introduction) 41 | 42 | [Case Study Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#case-study-questions) 43 | 44 | [1. Data Cleansing Steps](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#1-data-cleansing-steps) 45 | 46 | [2. Data Exploration](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#2-data-exploration) 47 | 48 | [1. What day of the week is used for each week_date value?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#1-what-day-of-the-week-is-used-for-each-week_date-value) 49 | 50 | [2. What range of week numbers are missing from the dataset?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#2-what-range-of-week-numbers-are-missing-from-the-dataset) 51 | 52 | [3. How many total transactions were there for each year in the dataset?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#3-how-many-total-transactions-were-there-for-each-year-in-the-dataset) 53 | 54 | [4. What is the total sales for each region for each month?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#4-what-is-the-total-sales-for-each-region-for-each-month) 55 | 56 | [5. What is the total count of transactions for each platform?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#5-what-is-the-total-count-of-transactions-for-each-platform) 57 | 58 | [6. What is the percentage of sales for Retail vs Shopify for each month?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#6-what-is-the-percentage-of-sales-for-retail-vs-shopify-for-each-month) 59 | 60 | [7. 
What is the percentage of sales by demographic for each year in the dataset?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#7-what-is-the-percentage-of-sales-by-demographic-for-each-year-in-the-dataset) 61 | 62 | [8. Which age_band and demographic values contribute the most to Retail sales?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#8-which-age_band-and-demographic-values-contribute-the-most-to-retail-sales) 63 | 64 | [9. Can we use the avg_transaction column to find the average transaction size for each year for Retail vs Shopify? If not - how would you calculate it instead?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#9-can-we-use-the-avg_transaction-column-to-find-the-average-transaction-size-for-each-year-for-retail-vs-shopify-if-not---how-would-you-calculate-it-instead) 65 | 66 | [3. Before & After Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#3-before--after-analysis) 67 | 68 | [1. What is the total sales for the 4 weeks before and after 2020-06-15? What is the growth or reduction rate in actual values and percentage of sales?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#1-what-is-the-total-sales-for-the-4-weeks-before-and-after-2020-06-15-what-is-the-growth-or-reduction-rate-in-actual-values-and-percentage-of-sales) 69 | 70 | [2. What about the entire 12 weeks before and after?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#2-what-about-the-entire-12-weeks-before-and-after) 71 | 72 | [3. How do the sale metrics for these 2 periods before and after compare with the previous years in 2018 and 2019?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#3-how-do-the-sale-metrics-for-these-2-periods-before-and-after-compare-with-the-previous-years-in-2018-and-2019) 73 | 74 | [4. 
Bonus Question](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#4-bonus-question) 75 | 76 | [Which areas of the business have the highest negative impact in sales metrics performance in 2020 for the 12 week before and after period?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#which-areas-of-the-business-have-the-highest-negative-impact-in-sales-metrics-performance-in-2020-for-the-12-week-before-and-after-period) 77 | 78 | [Do you have any further recommendations for Danny’s team at Data Mart or any interesting insights based off this analysis?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md/#do-you-have-any-further-recommendations-for-dannys-team-at-data-mart-or-any-interesting-insights-based-off-this-analysis) 79 | -------------------------------------------------------------------------------- /Case Study #5 - Data Mart/SQL/solution_5.sql: -------------------------------------------------------------------------------- 1 | -- Solved on PostgreSQL 13.4 by Yulia Murtazina, January 25, 2022 2 | -- Fixed on February 13, 2022 3 | 4 | 5 | -- Case Study #5 - Data Mart 6 | 7 | 8 | SET 9 | search_path = data_mart; 10 | 11 | /* -------------------- 12 | Case Study Questions 13 | 14 | 1. Data Cleansing Steps 15 | 16 | In a single query, perform the following operations and generate a new table in the `data_mart` schema named `clean_weekly_sales`: 17 | 18 | - Convert the `week_date` to a `DATE` format 19 | - Add a `week_number` as the second column for each `week_date` value, for example any value from the 1st of January to 7th of January will be 1, 8th to 14th will be 2 etc 20 | - Add a `month_number` with the calendar month for each week_date value as the 3rd column 21 | - Add a `calendar_year` column as the 4th column containing either 2018, 2019 or 2020 values 22 | - Add a new column called `age_band` after the original segment column using the following mapping on the number inside the segment value 23 | - Add a new `demographic` column using the following mapping for the first letter in the segment values 24 | - Ensure all `null` string values are replaced with an "unknown" string value in the original segment column as well as the new `age_band` and `demographic` columns 25 | - Generate a new `avg_transaction` column as the sales value divided by transactions rounded to 2 decimal places for each record 26 | --------------------*/ 27 | 28 | SELECT 29 | date AS week_date, 30 | EXTRACT( 31 | WEEK 32 | FROM 33 | date 34 | ) :: int AS week_number, 35 | EXTRACT( 36 | MONTH 37 | FROM 38 | date 39 | ) :: int AS month_number, 40 | EXTRACT( 41 | YEAR 42 | FROM 43 | date 44 | ) :: int AS calendar_year, 45 | region, 46 | platform, 47 | CASE 48 | WHEN segment = 'null' then 'unknown' 49 | ELSE segment 50 | END AS segment, 51 | CASE 52 | WHEN substr(segment, 2, 1) = '1' then 'Young Adults' 53 | WHEN substr(segment, 2, 1) = '2' then 'Middle Aged' 54 | WHEN substr(segment, 2, 1) in ('3', '4') then 'Retirees' 55 | ELSE 'unknown' 56 | END AS age_band, 57 | CASE 58 | WHEN substr(segment, 1, 1) = 'C' then 'Couples' 59 | WHEN substr(segment, 1, 1) = 'F' then 'Families' 60 | ELSE 'unknown' 61 | END AS demographic, 62 | customer_type, 63 | transactions, 64 | sales, 65 | avg_transaction INTO clean_weekly_sales 66 | FROM 67 | weekly_sales, 68 | LATERAL ( 69 | SELECT 70 | TO_DATE(week_date, 'DD/MM/YY') AS date 71 | ) date, 72 | LATERAL ( 73 | 
SELECT 74 | ROUND((sales :: numeric / transactions), 2) AS avg_transaction 75 | ) avt 76 | ORDER BY 77 | calendar_year; 78 | 79 | /* -------------------- 80 | 2. Data Exploration 81 | 82 | 1. What day of the week is used for each week_date value? 83 | --------------------*/ 84 | 85 | SELECT 86 | EXTRACT( 87 | ISODOW 88 | FROM 89 | week_date 90 | ) AS day_of_week 91 | FROM 92 | clean_weekly_sales 93 | GROUP BY 94 | 1; 95 | 96 | -- 2. What range of week numbers are missing from the dataset? 97 | 98 | WITH series as ( 99 | SELECT 100 | GENERATE_SERIES(1, 52) as missing_weeks 101 | FROM 102 | clean_weekly_sales 103 | ) 104 | SELECT 105 | missing_weeks 106 | FROM 107 | series 108 | WHERE 109 | missing_weeks NOT IN( 110 | SELECT 111 | week_number 112 | FROM 113 | clean_weekly_sales 114 | ) 115 | GROUP BY 116 | 1 117 | ORDER BY 118 | 1; 119 | 120 | -- 3. How many total transactions were there for each year in the dataset? 121 | 122 | SELECT 123 | calendar_year, 124 | SUM(transactions) AS total_number_of_transactions 125 | FROM 126 | clean_weekly_sales 127 | GROUP BY 128 | 1 129 | ORDER BY 130 | 1; 131 | 132 | -- 4. What is the total sales for each region for each month? 133 | 134 | SELECT 135 | region, 136 | month_number, 137 | SUM(sales) AS total_sales 138 | FROM 139 | clean_weekly_sales 140 | GROUP BY 141 | 1, 142 | 2 143 | ORDER BY 144 | 1, 145 | 2; 146 | 147 | -- 5. What is the total count of transactions for each platform? 148 | 149 | SELECT 150 | platform, 151 | SUM(transactions) as total_transactions 152 | FROM 153 | clean_weekly_sales 154 | GROUP BY 155 | 1 156 | ORDER BY 157 | 1; 158 | 159 | -- 6. What is the percentage of sales for Retail vs Shopify for each month? 160 | 161 | SELECT 162 | sr.month_number, 163 | sr.calendar_year, 164 | ROUND(100 *(SUM(sales) :: numeric / total_sales_s), 1) AS percentage_of_sales_retail, 165 | 100 - ROUND(100 *(SUM(sales) :: numeric / total_sales_s), 1) AS percentage_of_sales_shopify 166 | FROM 167 | clean_weekly_sales AS sr 168 | JOIN ( 169 | SELECT 170 | ss.month_number, 171 | ss.calendar_year, 172 | SUM(sales) AS total_sales_s 173 | FROM 174 | clean_weekly_sales AS ss 175 | GROUP BY 176 | ss.month_number, 177 | ss.calendar_year 178 | ) ss ON sr.month_number = ss.month_number 179 | and ss.calendar_year = sr.calendar_year 180 | WHERE 181 | sr.platform = 'Retail' 182 | GROUP BY 183 | sr.month_number, 184 | sr.calendar_year, 185 | total_sales_s 186 | ORDER BY 187 | 2, 188 | 1; 189 | 190 | -- 7. What is the percentage of sales by demographic for each year in the dataset? 191 | 192 | SELECT 193 | s.calendar_year, 194 | demographic, 195 | ROUND(100 *(SUM(sales) :: numeric / total_sales), 1) AS percentage 196 | FROM 197 | clean_weekly_sales AS s 198 | JOIN ( 199 | SELECT 200 | calendar_year, 201 | SUM(sales) AS total_sales 202 | FROM 203 | clean_weekly_sales AS ts 204 | GROUP BY 205 | calendar_year 206 | ) ts ON s.calendar_year = ts.calendar_year 207 | GROUP BY 208 | 1, 209 | 2, 210 | total_sales 211 | ORDER BY 212 | 1, 213 | 2; 214 | 215 | -- 8. Which age_band and demographic values contribute the most to Retail sales? 
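-- Note: the first query below shows the full age_band x demographic breakdown for Retail; the next three isolate the top age_band, the top demographic and the top combination of both (excluding 'unknown' demographics), each using ROW_NUMBER() ordered by total sales to keep only the leading row.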
216 | 217 | SELECT 218 | age_band, 219 | demographic, 220 | SUM(sales) AS total_sales 221 | FROM 222 | clean_weekly_sales 223 | WHERE 224 | platform = 'Retail' 225 | GROUP BY 226 | 1, 227 | 2 228 | ORDER BY 229 | 3 DESC, 230 | 1, 231 | 2; 232 | 233 | SELECT 234 | age_band, 235 | total_sales, 236 | percentage_of_sales 237 | FROM 238 | ( 239 | SELECT 240 | age_band, 241 | SUM(sales) AS total_sales, 242 | round(100 * (SUM(sales) :: numeric / all_sales), 1) AS percentage_of_sales, 243 | row_number() over ( 244 | ORDER BY 245 | SUM(sales) DESC 246 | ) 247 | FROM 248 | clean_weekly_sales, 249 | LATERAL ( 250 | SELECT 251 | SUM(sales) AS all_sales 252 | FROM 253 | clean_weekly_sales 254 | ) s 255 | WHERE 256 | platform = 'Retail' 257 | AND demographic != 'unknown' 258 | GROUP BY 259 | 1, 260 | all_sales 261 | ) ts 262 | WHERE 263 | row_number = 1; 264 | 265 | SELECT 266 | demographic, 267 | total_sales, 268 | percentage_of_sales 269 | FROM 270 | ( 271 | SELECT 272 | demographic, 273 | SUM(sales) AS total_sales, 274 | round(100 * (SUM(sales) :: numeric / all_sales), 1) AS percentage_of_sales, 275 | row_number() over ( 276 | ORDER BY 277 | SUM(sales) DESC 278 | ) 279 | FROM 280 | clean_weekly_sales, 281 | LATERAL ( 282 | SELECT 283 | SUM(sales) AS all_sales 284 | FROM 285 | clean_weekly_sales 286 | ) s 287 | WHERE 288 | platform = 'Retail' 289 | AND demographic != 'unknown' 290 | GROUP BY 291 | 1, 292 | all_sales 293 | ) ts 294 | WHERE 295 | row_number = 1; 296 | 297 | SELECT 298 | demographic, 299 | age_band, 300 | total_sales, 301 | percentage_of_sales 302 | FROM 303 | ( 304 | SELECT 305 | demographic, 306 | age_band, 307 | SUM(sales) AS total_sales, 308 | round(100 * (SUM(sales) :: numeric / all_sales), 1) AS percentage_of_sales, 309 | row_number() over ( 310 | ORDER BY 311 | SUM(sales) DESC 312 | ) 313 | FROM 314 | clean_weekly_sales, 315 | LATERAL ( 316 | SELECT 317 | SUM(sales) AS all_sales 318 | FROM 319 | clean_weekly_sales 320 | ) s 321 | WHERE 322 | platform = 'Retail' 323 | AND demographic != 'unknown' 324 | GROUP BY 325 | 1, 326 | 2, 327 | all_sales 328 | ) ts 329 | WHERE 330 | row_number = 1; 331 | 332 | -- 9. Can we use the avg_transaction column to find the average transaction size for each year for Retail vs Shopify? If not - how would you calculate it instead? 333 | 334 | SELECT 335 | calendar_year, 336 | platform, 337 | ROUND(SUM(sales) :: numeric / SUM(transactions), 1) AS correct_avg, 338 | ROUND(AVG(avg_transaction), 1) AS incorrect_avg 339 | FROM 340 | clean_weekly_sales 341 | GROUP BY 342 | 1, 343 | 2 344 | ORDER BY 345 | 1, 346 | 2; 347 | 348 | /* -------------------- 349 | 3. Before & After Analysis 350 | 351 | This technique is usually used when we inspect an important event and want to inspect the impact before and after a certain point in time. 352 | 353 | Taking the `week_date` value of 2020-06-15 as the baseline week where the Data Mart sustainable packaging changes came into effect. 354 | We would include all `week_date` values for 2020-06-15 as the start of the period after the change and the previous `week_date` values would be before. 355 | 356 | Using this analysis approach - answer the following questions: 357 | 358 | 1. What is the total sales for the 4 weeks before and after 2020-06-15? What is the growth or reduction rate in actual values and percentage of sales? 
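Implementation note: the queries below derive the baseline week number with EXTRACT(WEEK FROM '2020-06-15'), then express the before and after windows as week_number ranges relative to it (base_week - 4 .. base_week - 1 and base_week .. base_week + 3 for the 4-week version).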
359 | --------------------*/ 360 | 361 | WITH sales_before AS ( 362 | SELECT 363 | SUM(sales) AS total_sales_before 364 | FROM 365 | clean_weekly_sales, 366 | LATERAL( 367 | SELECT 368 | EXTRACT( 369 | WEEK 370 | FROM 371 | '2020-06-15' :: date 372 | ) AS base_week 373 | ) bw 374 | WHERE 375 | calendar_year = 2020 376 | AND week_number between (base_week - 4) 377 | AND (base_week - 1) 378 | ), 379 | sales_after AS ( 380 | SELECT 381 | SUM(sales) AS total_sales_after 382 | FROM 383 | clean_weekly_sales, 384 | LATERAL( 385 | SELECT 386 | EXTRACT( 387 | WEEK 388 | FROM 389 | '2020-06-15' :: date 390 | ) AS base_week 391 | ) bw 392 | WHERE 393 | calendar_year = 2020 394 | AND week_number between (base_week) 395 | AND (base_week + 3) 396 | ) 397 | SELECT 398 | total_sales_before, 399 | total_sales_after, 400 | total_sales_after - total_sales_before AS change_in_sales, 401 | ROUND( 402 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before, 403 | 2 404 | ) AS percentage_of_change 405 | FROM 406 | sales_before, 407 | sales_after; 408 | 409 | -- 2. What about the entire 12 weeks before and after? 410 | 411 | WITH sales_before AS ( 412 | SELECT 413 | SUM(sales) AS total_sales_before 414 | FROM 415 | clean_weekly_sales, 416 | LATERAL( 417 | SELECT 418 | EXTRACT( 419 | WEEK 420 | FROM 421 | '2020-06-15' :: date 422 | ) AS base_week 423 | ) bw 424 | WHERE 425 | calendar_year = 2020 426 | AND week_number between (base_week - 12) 427 | AND (base_week - 1) 428 | ), 429 | sales_after AS ( 430 | SELECT 431 | SUM(sales) AS total_sales_after 432 | FROM 433 | clean_weekly_sales, 434 | LATERAL( 435 | SELECT 436 | EXTRACT( 437 | WEEK 438 | FROM 439 | '2020-06-15' :: date 440 | ) AS base_week 441 | ) bw 442 | WHERE 443 | calendar_year = 2020 444 | AND week_number between (base_week) 445 | AND (base_week + 11) 446 | ) 447 | SELECT 448 | total_sales_before, 449 | total_sales_after, 450 | total_sales_after - total_sales_before AS change_in_sales, 451 | ROUND( 452 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before, 453 | 2 454 | ) AS percentage_of_change 455 | FROM 456 | sales_before, 457 | sales_after; 458 | 459 | -- 3. How do the sale metrics for these 2 periods before and after compare with the previous years in 2018 and 2019? 
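-- Note: the two queries below reuse the same 4-week and 12-week windows around the week of 2020-06-15, but drop the calendar_year = 2020 filter and group both CTEs by calendar_year, so the 2018, 2019 and 2020 periods can be compared side by side.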
460 | 461 | WITH sales_before AS ( 462 | SELECT 463 | calendar_year, 464 | SUM(sales) AS total_sales_before 465 | FROM 466 | clean_weekly_sales, 467 | LATERAL( 468 | SELECT 469 | EXTRACT( 470 | WEEK 471 | FROM 472 | '2020-06-15' :: date 473 | ) AS base_week 474 | ) bw 475 | WHERE 476 | week_number between (base_week - 4) 477 | AND (base_week - 1) 478 | group by 479 | 1 480 | ), 481 | sales_after AS ( 482 | SELECT 483 | calendar_year, 484 | SUM(sales) AS total_sales_after 485 | FROM 486 | clean_weekly_sales, 487 | LATERAL( 488 | SELECT 489 | EXTRACT( 490 | WEEK 491 | FROM 492 | '2020-06-15' :: date 493 | ) AS base_week 494 | ) bw 495 | WHERE 496 | week_number between (base_week) 497 | AND (base_week + 3) 498 | group by 499 | 1 500 | ) 501 | SELECT 502 | sb.calendar_year, 503 | total_sales_before, 504 | total_sales_after, 505 | total_sales_after - total_sales_before AS change_in_sales, 506 | ROUND( 507 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before, 508 | 2 509 | ) AS percentage_of_change 510 | FROM 511 | sales_before AS sb 512 | JOIN sales_after AS sa ON sb.calendar_year = sa.calendar_year 513 | group by 514 | 1, 515 | 2, 516 | 3; 517 | 518 | WITH sales_before AS ( 519 | SELECT 520 | calendar_year, 521 | SUM(sales) AS total_sales_before 522 | FROM 523 | clean_weekly_sales, 524 | LATERAL( 525 | SELECT 526 | EXTRACT( 527 | WEEK 528 | FROM 529 | '2020-06-15' :: date 530 | ) AS base_week 531 | ) bw 532 | WHERE 533 | week_number between (base_week - 12) 534 | AND (base_week - 1) 535 | group by 536 | 1 537 | ), 538 | sales_after AS ( 539 | SELECT 540 | calendar_year, 541 | SUM(sales) AS total_sales_after 542 | FROM 543 | clean_weekly_sales, 544 | LATERAL( 545 | SELECT 546 | EXTRACT( 547 | WEEK 548 | FROM 549 | '2020-06-15' :: date 550 | ) AS base_week 551 | ) bw 552 | WHERE 553 | week_number between (base_week) 554 | AND (base_week + 11) 555 | group by 556 | 1 557 | ) 558 | SELECT 559 | sb.calendar_year, 560 | total_sales_before, 561 | total_sales_after, 562 | total_sales_after - total_sales_before AS change_in_sales, 563 | ROUND( 564 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before, 565 | 2 566 | ) AS percentage_of_change 567 | FROM 568 | sales_before AS sb 569 | JOIN sales_after AS sa ON sb.calendar_year = sa.calendar_year 570 | group by 571 | 1, 572 | 2, 573 | 3; 574 | 575 | /* -------------------- 576 | 4. Bonus Question 577 | 578 | Which areas of the business have the highest negative impact in sales metrics performance in 2020 for the 12 week before and after period? 
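Implementation note: the queries below repeat the 12-week before/after comparison once per business dimension - region, platform, age_band, demographic and customer_type - changing only the grouping column, and order by percentage_of_change so the hardest-hit area comes first.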
579 | --------------------*/ 580 | 581 | -- region: 582 | 583 | WITH sales_before AS ( 584 | SELECT 585 | region, 586 | SUM(sales) AS total_sales_before 587 | FROM 588 | clean_weekly_sales, 589 | LATERAL( 590 | SELECT 591 | EXTRACT( 592 | WEEK 593 | FROM 594 | '2020-06-15' :: date 595 | ) AS base_week 596 | ) bw 597 | WHERE 598 | calendar_year = 2020 599 | AND week_number between (base_week - 12) 600 | AND (base_week - 1) 601 | GROUP BY 602 | 1 603 | ), 604 | sales_after AS ( 605 | SELECT 606 | region, 607 | SUM(sales) AS total_sales_after 608 | FROM 609 | clean_weekly_sales, 610 | LATERAL( 611 | SELECT 612 | EXTRACT( 613 | WEEK 614 | FROM 615 | '2020-06-15' :: date 616 | ) AS base_week 617 | ) bw 618 | WHERE 619 | calendar_year = 2020 620 | AND week_number between (base_week) 621 | AND (base_week + 11) 622 | GROUP BY 623 | 1 624 | ) 625 | SELECT 626 | sb.region, 627 | total_sales_before, 628 | total_sales_after, 629 | total_sales_after - total_sales_before AS change_in_sales, 630 | ROUND( 631 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before, 632 | 2 633 | ) AS percentage_of_change 634 | FROM 635 | sales_before AS sb 636 | JOIN sales_after AS sa ON sb.region = sa.region 637 | GROUP BY 638 | 1, 639 | 2, 640 | 3, 641 | 4 642 | ORDER BY 643 | 5; 644 | 645 | -- platform: 646 | 647 | WITH sales_before AS ( 648 | SELECT 649 | platform, 650 | SUM(sales) AS total_sales_before 651 | FROM 652 | clean_weekly_sales, 653 | LATERAL( 654 | SELECT 655 | EXTRACT( 656 | WEEK 657 | FROM 658 | '2020-06-15' :: date 659 | ) AS base_week 660 | ) bw 661 | WHERE 662 | calendar_year = 2020 663 | AND week_number between (base_week - 12) 664 | AND (base_week - 1) 665 | GROUP BY 666 | 1 667 | ), 668 | sales_after AS ( 669 | SELECT 670 | platform, 671 | SUM(sales) AS total_sales_after 672 | FROM 673 | clean_weekly_sales, 674 | LATERAL( 675 | SELECT 676 | EXTRACT( 677 | WEEK 678 | FROM 679 | '2020-06-15' :: date 680 | ) AS base_week 681 | ) bw 682 | WHERE 683 | calendar_year = 2020 684 | AND week_number between (base_week) 685 | AND (base_week + 11) 686 | GROUP BY 687 | 1 688 | ) 689 | SELECT 690 | sb.platform, 691 | total_sales_before, 692 | total_sales_after, 693 | total_sales_after - total_sales_before AS change_in_sales, 694 | ROUND( 695 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before, 696 | 2 697 | ) AS percentage_of_change 698 | FROM 699 | sales_before AS sb 700 | JOIN sales_after AS sa ON sb.platform = sa.platform 701 | GROUP BY 702 | 1, 703 | 2, 704 | 3, 705 | 4 706 | ORDER BY 707 | 5; 708 | 709 | -- age_band: 710 | 711 | WITH sales_before AS ( 712 | SELECT 713 | age_band, 714 | SUM(sales) AS total_sales_before 715 | FROM 716 | clean_weekly_sales, 717 | LATERAL( 718 | SELECT 719 | EXTRACT( 720 | WEEK 721 | FROM 722 | '2020-06-15' :: date 723 | ) AS base_week 724 | ) bw 725 | WHERE 726 | calendar_year = 2020 727 | AND week_number between (base_week - 12) 728 | AND (base_week - 1) 729 | GROUP BY 730 | 1 731 | ), 732 | sales_after AS ( 733 | SELECT 734 | age_band, 735 | SUM(sales) AS total_sales_after 736 | FROM 737 | clean_weekly_sales, 738 | LATERAL( 739 | SELECT 740 | EXTRACT( 741 | WEEK 742 | FROM 743 | '2020-06-15' :: date 744 | ) AS base_week 745 | ) bw 746 | WHERE 747 | calendar_year = 2020 748 | AND week_number between (base_week) 749 | AND (base_week + 11) 750 | GROUP BY 751 | 1 752 | ) 753 | SELECT 754 | sb.age_band, 755 | total_sales_before, 756 | total_sales_after, 757 | total_sales_after - total_sales_before AS change_in_sales, 758 | 
ROUND( 759 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before, 760 | 2 761 | ) AS percentage_of_change 762 | FROM 763 | sales_before AS sb 764 | JOIN sales_after AS sa ON sb.age_band = sa.age_band 765 | GROUP BY 766 | 1, 767 | 2, 768 | 3, 769 | 4 770 | ORDER BY 771 | 5; 772 | 773 | -- demographic: 774 | 775 | WITH sales_before AS ( 776 | SELECT 777 | demographic, 778 | SUM(sales) AS total_sales_before 779 | FROM 780 | clean_weekly_sales, 781 | LATERAL( 782 | SELECT 783 | EXTRACT( 784 | WEEK 785 | FROM 786 | '2020-06-15' :: date 787 | ) AS base_week 788 | ) bw 789 | WHERE 790 | calendar_year = 2020 791 | AND week_number between (base_week - 12) 792 | AND (base_week - 1) 793 | GROUP BY 794 | 1 795 | ), 796 | sales_after AS ( 797 | SELECT 798 | demographic, 799 | SUM(sales) AS total_sales_after 800 | FROM 801 | clean_weekly_sales, 802 | LATERAL( 803 | SELECT 804 | EXTRACT( 805 | WEEK 806 | FROM 807 | '2020-06-15' :: date 808 | ) AS base_week 809 | ) bw 810 | WHERE 811 | calendar_year = 2020 812 | AND week_number between (base_week) 813 | AND (base_week + 11) 814 | GROUP BY 815 | 1 816 | ) 817 | SELECT 818 | sb.demographic, 819 | total_sales_before, 820 | total_sales_after, 821 | total_sales_after - total_sales_before AS change_in_sales, 822 | ROUND( 823 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before, 824 | 2 825 | ) AS percentage_of_change 826 | FROM 827 | sales_before AS sb 828 | JOIN sales_after AS sa ON sb.demographic = sa.demographic 829 | GROUP BY 830 | 1, 831 | 2, 832 | 3, 833 | 4 834 | ORDER BY 835 | 5; 836 | 837 | -- customer_type: 838 | 839 | WITH sales_before AS ( 840 | SELECT 841 | customer_type, 842 | SUM(sales) AS total_sales_before 843 | FROM 844 | clean_weekly_sales, 845 | LATERAL( 846 | SELECT 847 | EXTRACT( 848 | WEEK 849 | FROM 850 | '2020-06-15' :: date 851 | ) AS base_week 852 | ) bw 853 | WHERE 854 | calendar_year = 2020 855 | AND week_number between (base_week - 12) 856 | AND (base_week - 1) 857 | GROUP BY 858 | 1 859 | ), 860 | sales_after AS ( 861 | SELECT 862 | customer_type, 863 | SUM(sales) AS total_sales_after 864 | FROM 865 | clean_weekly_sales, 866 | LATERAL( 867 | SELECT 868 | EXTRACT( 869 | WEEK 870 | FROM 871 | '2020-06-15' :: date 872 | ) AS base_week 873 | ) bw 874 | WHERE 875 | calendar_year = 2020 876 | AND week_number between (base_week) 877 | AND (base_week + 11) 878 | GROUP BY 879 | 1 880 | ) 881 | SELECT 882 | sb.customer_type, 883 | total_sales_before, 884 | total_sales_after, 885 | total_sales_after - total_sales_before AS change_in_sales, 886 | ROUND( 887 | 100 * (total_sales_after - total_sales_before) :: numeric / total_sales_before, 888 | 2 889 | ) AS percentage_of_change 890 | FROM 891 | sales_before AS sb 892 | JOIN sales_after AS sa ON sb.customer_type = sa.customer_type 893 | GROUP BY 894 | 1, 895 | 2, 896 | 3, 897 | 4 898 | ORDER BY 899 | 5; 900 | -------------------------------------------------------------------------------- /Case Study #6 - Clique Bait/README.md: -------------------------------------------------------------------------------- 1 | # Case Study #6 - Clique Bait :hook: 2 | 3 | 4 | 5 | In this case study - you are required to support Danny’s vision and analyse his dataset and come up with creative solutions to calculate funnel fallout rates for the Clique Bait online store. 6 | 7 | ## Available Data 8 | 9 | For this case study there is a total of 5 datasets which you will need to combine to solve all of the questions. 
10 | 11 | ### Users 12 | 13 | Customers who visit the Clique Bait website are tagged via their `cookie_id`. 14 | 15 | | user_id | cookie_id | start_date | 16 | |---------|-----------|---------------------| 17 | | 397 | 3759ff | 2020-03-30 00:00:00 | 18 | | 215 | 863329 | 2020-01-26 00:00:00 | 19 | | 191 | eefca9 | 2020-03-15 00:00:00 | 20 | | 89 | 764796 | 2020-01-07 00:00:00 | 21 | | 127 | 17ccc5 | 2020-01-22 00:00:00 | 22 | | 81 | b0b666 | 2020-03-01 00:00:00 | 23 | | 260 | a4f236 | 2020-01-08 00:00:00 | 24 | | 203 | d1182f | 2020-04-18 00:00:00 | 25 | | 23 | 12dbc8 | 2020-01-18 00:00:00 | 26 | | 375 | f61d69 | 2020-01-03 00:00:00 | 27 | 28 | ### Events 29 | 30 | Customer visits are logged in this events table at a `cookie_id` level and the `event_type` and `page_id` values can be used to join onto relevant satellite tables to obtain further information about each event. 31 | 32 | The `sequence_number` is used to order the events within each visit. 33 | | visit_id | cookie_id | page_id | event_type | sequence_number | event_time | 34 | |----------|-----------|---------|------------|-----------------|----------------------------| 35 | | 719fd3 | 3d83d3 | 5 | 1 | 4 | 2020-03-02 00:29:09.975502 | 36 | | fb1eb1 | c5ff25 | 5 | 2 | 8 | 2020-01-22 07:59:16.761931 | 37 | | 23fe81 | 1e8c2d | 10 | 1 | 9 | 2020-03-21 13:14:11.745667 | 38 | | ad91aa | 648115 | 6 | 1 | 3 | 2020-04-27 16:28:09.824606 | 39 | | 5576d7 | ac418c | 6 | 1 | 4 | 2020-01-18 04:55:10.149236 | 40 | | 48308b | c686c1 | 8 | 1 | 5 | 2020-01-29 06:10:38.702163 | 41 | | 46b17d | 78f9b3 | 7 | 1 | 12 | 2020-02-16 09:45:31.926407 | 42 | | 9fd196 | ccf057 | 4 | 1 | 5 | 2020-02-14 08:29:12.922164 | 43 | | edf853 | f85454 | 1 | 1 | 1 | 2020-02-22 12:59:07.652207 | 44 | | 3c6716 | 02e74f | 3 | 2 | 5 | 2020-01-31 17:56:20.777383 | 45 | 46 | ### Event Identifier 47 | 48 | The `event_identifier` table shows the types of events which are captured by Clique Bait’s digital data systems. 49 | 50 | | event_type | event_name | 51 | |------------|---------------| 52 | | 1 | Page View | 53 | | 2 | Add to Cart | 54 | | 3 | Purchase | 55 | | 4 | Ad Impression | 56 | | 5 | Ad Click | 57 | 58 | ### Campaign Identifier 59 | 60 | This table shows information for the 3 campaigns that Clique Bait has run on their website so far in 2020. 61 | 62 | | campaign_id | products | campaign_name | start_date | end_date | 63 | |-------------|----------|-----------------------------------|---------------------|---------------------| 64 | | 1 | 1-3 | BOGOF - Fishing For Compliments | 2020-01-01 00:00:00 | 2020-01-14 00:00:00 | 65 | | 2 | 4-5 | 25% Off - Living The Lux Life | 2020-01-15 00:00:00 | 2020-01-28 00:00:00 | 66 | | 3 | 6-8 | Half Off - Treat Your Shellf(ish) | 2020-02-01 00:00:00 | 2020-03-31 00:00:00 | 67 | 68 | ### Page Hierarchy 69 | 70 | This table lists all of the pages on the Clique Bait website which are tagged and have data passing through from user interaction events. 
71 | 72 | | page_id | page_name | product_category | product_id | 73 | |---------|----------------|------------------|------------| 74 | | 1 | Home Page | null | null | 75 | | 2 | All Products | null | null | 76 | | 3 | Salmon | Fish | 1 | 77 | | 4 | Kingfish | Fish | 2 | 78 | | 5 | Tuna | Fish | 3 | 79 | | 6 | Russian Caviar | Luxury | 4 | 80 | | 7 | Black Truffle | Luxury | 5 | 81 | | 8 | Abalone | Shellfish | 6 | 82 | | 9 | Lobster | Shellfish | 7 | 83 | | 10 | Crab | Shellfish | 8 | 84 | | 11 | Oyster | Shellfish | 9 | 85 | | 12 | Checkout | null | null | 86 | | 13 | Confirmation | null | null | 87 | 88 | ## Entity Relationship Diagram 89 | 90 | [1. Enterprise Relationship Diagram](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#1-enterprise-relationship-diagram) 91 | 92 | ## Table of Contents 93 | 94 | [Introduction](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#introduction) 95 | 96 | [Case Study Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#case-study-questions) 97 | 98 | [1. Enterprise Relationship Diagram](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#1-enterprise-relationship-diagram) 99 | 100 | [2. Digital Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#2-digital-analysis) 101 | 102 | [1. How many users are there?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#1-how-many-users-are-there) 103 | 104 | [2. How many cookies does each user have on average?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#2-how-many-cookies-does-each-user-have-on-average) 105 | 106 | [3. What is the unique number of visits by all users per month?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#3-what-is-the-unique-number-of-visits-by-all-users-per-month) 107 | 108 | [4. What is the number of events for each event type?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#4-what-is-the-number-of-events-for-each-event-type) 109 | 110 | [5. What is the percentage of visits which have a purchase event?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#5-what-is-the-percentage-of-visits-which-have-a-purchase-event) 111 | 112 | [6. What is the percentage of visits which view the checkout page but do not have a purchase event?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#6-what-is-the-percentage-of-visits-which-view-the-checkout-page-but-do-not-have-a-purchase-event) 113 | 114 | [7. What are the top 3 pages by number of views?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#7-what-are-the-top-3-pages-by-number-of-views) 115 | 116 | [8. What is the number of views and cart adds for each product category?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#8-what-is-the-number-of-views-and-cart-adds-for-each-product-category) 117 | 118 | [9. 
What are the top 3 products by purchases?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#9-what-are-the-top-3-products-by-purchases) 119 | 120 | [3. Product Funnel Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#3-product-funnel-analysis) 121 | 122 | [1. Which product had the most views, cart adds and purchases?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#1-which-product-had-the-most-views-cart-adds-and-purchases) 123 | 124 | [2. Which product was most likely to be abandoned?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#2-which-product-was-most-likely-to-be-abandoned) 125 | 126 | [3. Which product had the highest view to purchase percentage?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#3-which-product-had-the-highest-view-to-purchase-percentage) 127 | 128 | [4. What is the average conversion rate from view to cart add?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#4-what-is-the-average-conversion-rate-from-view-to-cart-add) 129 | 130 | [5. What is the average conversion rate from cart add to purchase?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#5-what-is-the-average-conversion-rate-from-cart-add-to-purchase) 131 | 132 | [3. Campaigns Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md/#3-campaigns-analysis) 133 | -------------------------------------------------------------------------------- /Case Study #6 - Clique Bait/SQL/solution_6.sql: -------------------------------------------------------------------------------- 1 | -- Solved on PostgreSQL 13.4 by Yulia Murtazina, January 25, 2022 2 | -- Fixed on February 14, 2022 3 | 4 | -- Case Study #6 - Clique Bait 5 | 6 | SET 7 | SEARCH_PATH = clique_bait; 8 | 9 | /* -------------------- 10 | Case Study Questions 11 | 12 | -- 1. Enterprise Relationship Diagram 13 | --------------------*/ 14 | 15 | -- Not a coding question 16 | 17 | /* -------------------- 18 | 2. Digital Analysis 19 | 20 | Using the available datasets - answer the following questions using a single query for each one: 21 | --------------------*/ 22 | 23 | -- 1. How many users are there? 24 | 25 | SELECT 26 | COUNT(distinct user_id) AS number_of_users 27 | FROM 28 | users; 29 | 30 | -- 2. How many cookies does each user have on average? 31 | 32 | SELECT 33 | ROUND( 34 | COUNT(cookie_id) / COUNT(distinct user_id) :: numeric, 35 | 2 36 | ) AS avg_cookies_per_user 37 | FROM 38 | users; 39 | 40 | -- 3. What is the unique number of visits by all users per month? 41 | 42 | SELECT 43 | TO_CHAR(event_time, 'Month') AS month, 44 | count(distinct visit_id) AS number_of_visits 45 | FROM 46 | events 47 | GROUP BY 48 | 1 49 | ORDER BY 50 | MIN(event_time); 51 | 52 | -- 4. What is the number of events for each event type? 53 | 54 | SELECT 55 | event_name, 56 | COUNT(e.event_type) AS number_of_events 57 | from 58 | events AS e 59 | JOIN event_identifier AS ei ON e.event_type = ei.event_type 60 | GROUP BY 61 | 1 62 | ORDER BY 63 | 2 DESC; 64 | 65 | -- 5. What is the percentage of visits which have a purchase event? 
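-- Note: the two LATERAL subqueries below count all distinct visits and the distinct visits containing a purchase event (event_type = 3); the answer is the second count as a rounded percentage of the first.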
66 | 67 | SELECT 68 | event_name, 69 | ROUND( 70 | 100 *(number_of_purchases :: numeric / number_of_visits), 71 | 1 72 | ) AS percentage_from_all_visits 73 | FROM 74 | events AS e 75 | join event_identifier AS ei on e.event_type = ei.event_type, 76 | LATERAL( 77 | SELECT 78 | COUNT(distinct visit_id) AS number_of_visits 79 | FROM 80 | events 81 | ) AS nv, 82 | LATERAL( 83 | SELECT 84 | COUNT(distinct visit_id) AS number_of_purchases 85 | FROM 86 | events 87 | WHERE 88 | event_type = 3 89 | ) np 90 | WHERE 91 | event_name = 'Purchase' 92 | GROUP BY 93 | event_name, 94 | number_of_visits, 95 | number_of_purchases; 96 | 97 | -- 6. What is the percentage of visits which view the checkout page but do not have a purchase event? 98 | 99 | SELECT 100 | page_name, 101 | COUNT(page_name) AS number_of_visits, 102 | ROUND( 103 | 100 *(COUNT(page_name) :: numeric / checkout_visits), 104 | 1 105 | ) AS percentage_from_checkout_page_visits, 106 | ROUND( 107 | 100 *(COUNT(page_name) :: numeric / total_visits), 108 | 1 109 | ) AS percentage_from_all_visits 110 | FROM 111 | events AS e 112 | JOIN page_hierarchy AS pe ON e.page_id = pe.page_id, 113 | LATERAL( 114 | SELECT 115 | COUNT(distinct visit_id) AS total_visits 116 | FROM 117 | events 118 | ) AS tv, 119 | LATERAL( 120 | SELECT 121 | COUNT(distinct visit_id) AS checkout_visits 122 | FROM 123 | events 124 | WHERE 125 | page_id = 12 126 | ) AS cv 127 | WHERE 128 | page_name = 'Checkout' 129 | AND visit_id NOT IN ( 130 | SELECT 131 | distinct visit_id 132 | FROM 133 | events AS ee 134 | WHERE 135 | event_type = 3 136 | ) 137 | GROUP BY 138 | page_name, 139 | total_visits, 140 | checkout_visits; 141 | 142 | -- 7. What are the top 3 pages by number of views? 143 | 144 | WITH ordered_rows AS( 145 | SELECT 146 | page_name, 147 | event_name, 148 | COUNT(event_name) AS number_of_views, 149 | ROW_NUMBER() OVER ( 150 | ORDER BY 151 | COUNT(page_name) DESC 152 | ) AS row 153 | FROM 154 | events AS e 155 | JOIN page_hierarchy AS pe on e.page_id = pe.page_id 156 | JOIN event_identifier AS ei on e.event_type = ei.event_type 157 | WHERE 158 | event_name = 'Page View' 159 | GROUP BY 160 | 1, 161 | 2 162 | ) 163 | SELECT 164 | page_name, 165 | number_of_views 166 | FROM 167 | ordered_rows 168 | WHERE 169 | row in (1, 2, 3); 170 | 171 | -- 8. What is the number of views and cart adds for each product category? 172 | 173 | SELECT 174 | pe.product_category, 175 | SUM( 176 | CASE 177 | WHEN event_name = 'Page View' THEN 1 178 | ELSE 0 179 | END 180 | ) AS number_of_page_views, 181 | SUM( 182 | CASE 183 | WHEN event_name = 'Add to Cart' THEN 1 184 | ELSE 0 185 | END 186 | ) AS number_of_add_to_cart_events 187 | FROM 188 | events AS e 189 | JOIN page_hierarchy AS pe ON e.page_id = pe.page_id 190 | JOIN event_identifier AS ei ON e.event_type = ei.event_type 191 | WHERE 192 | product_id > 0 193 | GROUP BY 194 | 1 195 | ORDER BY 196 | 1; 197 | 198 | -- 9. What are the top 3 products by purchases? 
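-- Note: the events table has no per-product purchase event, so a product
-- "purchase" is counted as an 'Add to Cart' event inside a visit that also
-- fired the purchase event (event_type = 3) - the query below applies exactly
-- that definition. A shorter sketch of the same idea, assuming event_type = 2
-- is 'Add to Cart', using FETCH FIRST ... WITH TIES (PostgreSQL 13+) instead
-- of ROW_NUMBER:

SELECT
  page_name,
  COUNT(*) AS number_of_purchases
FROM
  events AS e
  JOIN page_hierarchy AS pe ON e.page_id = pe.page_id
WHERE
  e.event_type = 2
  AND product_id > 0
  AND visit_id IN (
    SELECT
      visit_id
    FROM
      events
    WHERE
      event_type = 3
  )
GROUP BY
  page_name
ORDER BY
  COUNT(*) DESC
FETCH FIRST
  3 ROWS WITH TIES;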
199 | 200 | WITH ordered_rows AS( 201 | SELECT 202 | page_name, 203 | event_name, 204 | COUNT(event_name) AS number_of_purchases, 205 | ROW_NUMBER() OVER ( 206 | ORDER BY 207 | COUNT(event_name) DESC 208 | ) AS row 209 | FROM 210 | events AS e 211 | JOIN page_hierarchy AS pe ON e.page_id = pe.page_id 212 | JOIN event_identifier AS ei ON e.event_type = ei.event_type 213 | WHERE 214 | visit_id in ( 215 | SELECT 216 | distinct visit_id 217 | FROM 218 | events AS ee 219 | WHERE 220 | event_type = 3 221 | ) 222 | AND product_id > 0 223 | AND event_name = 'Add to Cart' 224 | GROUP BY 225 | 1, 226 | 2 227 | ) 228 | SELECT 229 | page_name, 230 | number_of_purchases 231 | FROM 232 | ordered_rows 233 | WHERE 234 | row in (1, 2, 3); 235 | 236 | /* -------------------- 237 | 3. Product Funnel Analysis 238 | 239 | Using a single SQL query - create a new output table which has the following details: 240 | 241 | How many times was each product viewed? 242 | How many times was each product added to cart? 243 | How many times was each product added to a cart but not purchased (abandoned)? 244 | How many times was each product purchased? 245 | --------------------*/ 246 | 247 | WITH joined_tables AS( 248 | SELECT 249 | visit_id, 250 | page_name, 251 | event_name 252 | FROM 253 | events AS e 254 | JOIN page_hierarchy AS pe ON e.page_id = pe.page_id 255 | JOIN event_identifier AS ei ON e.event_type = ei.event_type 256 | GROUP BY 257 | 1, 258 | 2, 259 | 3 260 | ) 261 | SELECT 262 | jt.page_name, 263 | COUNT(event_name) AS number_of_views, 264 | number_of_added_to_cart, 265 | number_of_abandoned_carts, 266 | number_of_purchases 267 | 268 | -- INTO TABLE product_stats 269 | FROM 270 | joined_tables AS jt 271 | JOIN ( 272 | SELECT 273 | page_name, 274 | COUNT(event_name) AS number_of_added_to_cart 275 | FROM 276 | joined_tables 277 | WHERE 278 | event_name = 'Add to Cart' 279 | GROUP BY 280 | 1 281 | ) jt1 ON jt.page_name = jt1.page_name 282 | JOIN ( 283 | SELECT 284 | page_name, 285 | COUNT(event_name) AS number_of_abandoned_carts 286 | FROM 287 | joined_tables 288 | WHERE 289 | event_name = 'Add to Cart' 290 | AND visit_id NOT IN ( 291 | SELECT 292 | distinct visit_id 293 | FROM 294 | events AS ee 295 | WHERE 296 | event_type = 3 297 | ) 298 | GROUP BY 299 | 1 300 | ) jt2 ON jt.page_name = jt2.page_name 301 | JOIN ( 302 | SELECT 303 | page_name, 304 | COUNT(event_name) AS number_of_purchases 305 | FROM 306 | joined_tables 307 | WHERE 308 | event_name = 'Add to Cart' 309 | AND visit_id IN ( 310 | SELECT 311 | distinct visit_id 312 | FROM 313 | events AS ee 314 | WHERE 315 | event_type = 3 316 | ) 317 | GROUP BY 318 | 1 319 | ) jt3 ON jt.page_name = jt3.page_name 320 | WHERE 321 | event_name = 'Page View' 322 | GROUP BY 323 | jt.page_name, 324 | number_of_added_to_cart, 325 | number_of_purchases, 326 | number_of_abandoned_carts 327 | ORDER BY 328 | 1; 329 | 330 | -- Additionally, create another table which further aggregates the data for the above points but this time for each product category instead of individual products. 
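-- Note: the category-level aggregation below reads from product_stats, so the
-- previous query has to be materialized first - uncomment its "INTO TABLE
-- product_stats" line (and the "INTO TABLE product_category_stats" line below)
-- before running the follow-up questions that reference these two tables.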
331 | 332 | SELECT 333 | product_category, 334 | SUM(number_of_views) AS number_of_views, 335 | SUM(number_of_added_to_cart) AS number_of_added_to_cart, 336 | SUM(number_of_abandoned_carts) AS number_of_abandoned_carts, 337 | SUM(number_of_purchases) AS number_of_purchases 338 | 339 | -- INTO TABLE product_category_stats 340 | FROM 341 | product_stats AS ps 342 | JOIN page_hierarchy AS pe ON ps.page_name = pe.page_name 343 | GROUP BY 344 | product_category 345 | ORDER BY 346 | 1; 347 | 348 | -- Use your 2 new output tables - answer the following questions: 349 | -- 1. Which product had the most views, cart adds and purchases? 350 | 351 | WITH ordered_rows AS ( 352 | SELECT 353 | *, 354 | ROW_NUMBER() OVER ( 355 | ORDER BY 356 | number_of_views DESC 357 | ) AS views, 358 | ROW_NUMBER() OVER ( 359 | ORDER BY 360 | number_of_added_to_cart DESC 361 | ) AS carts, 362 | ROW_NUMBER() OVER ( 363 | ORDER BY 364 | number_of_purchases DESC 365 | ) AS purchases 366 | FROM 367 | product_stats 368 | GROUP BY 369 | 1, 370 | 2, 371 | 3, 372 | 4, 373 | 5 374 | ) 375 | SELECT 376 | page_name, 377 | number_of_views, 378 | number_of_added_to_cart, 379 | number_of_purchases 380 | FROM 381 | ordered_rows 382 | WHERE 383 | views = 1 384 | OR carts = 1 385 | OR purchases = 1; 386 | 387 | -- 2. Which product was most likely to be abandoned? 388 | 389 | WITH ordered_rows AS ( 390 | SELECT 391 | *, 392 | ROW_NUMBER() OVER ( 393 | ORDER BY 394 | number_of_abandoned_carts DESC 395 | ) AS row 396 | FROM 397 | product_stats 398 | GROUP BY 399 | 1, 400 | 2, 401 | 3, 402 | 4, 403 | 5 404 | ) 405 | SELECT 406 | page_name, 407 | number_of_abandoned_carts 408 | FROM 409 | ordered_rows 410 | WHERE 411 | row = 1; 412 | 413 | -- 3. Which product had the highest view to purchase percentage? 414 | 415 | WITH ordered_rows AS ( 416 | SELECT 417 | page_name, 418 | view_to_purchase_percentage, 419 | ROW_NUMBER() OVER ( 420 | ORDER BY 421 | view_to_purchase_percentage DESC 422 | ) AS row 423 | FROM 424 | product_stats, 425 | LATERAL( 426 | SELECT 427 | ROUND( 428 | 100 *(number_of_purchases :: numeric / number_of_views), 429 | 1 430 | ) AS view_to_purchase_percentage 431 | ) vpp 432 | GROUP BY 433 | 1, 434 | 2 435 | ) 436 | SELECT 437 | page_name, 438 | view_to_purchase_percentage 439 | FROM 440 | ordered_rows 441 | WHERE 442 | row = 1; 443 | 444 | -- 4. What is the average conversion rate from view to cart add? 445 | 446 | SELECT 447 | ROUND( 448 | 100 *( 449 | SUM(number_of_added_to_cart) / SUM(number_of_views) 450 | ), 451 | 1 452 | ) AS avg_view_to_cart_conversion 453 | FROM 454 | product_category_stats; 455 | 456 | -- 5. What is the average conversion rate from cart add to purchase? 457 | 458 | SELECT 459 | ROUND( 460 | 100 *( 461 | SUM(number_of_purchases) / SUM(number_of_added_to_cart) 462 | ), 463 | 1 464 | ) AS avg_cart_to_purchase_conversion 465 | FROM 466 | product_category_stats; 467 | 468 | /* -------------------- 469 | 3. 
Campaigns Analysis 470 | 471 | Generate a table that has 1 single row for every unique visit_id record and has the following columns: 472 | 473 | user_id 474 | visit_id 475 | visit_start_time: the earliest event_time for each visit 476 | page_views: count of page views for each visit 477 | cart_adds: count of product cart add events for each visit 478 | purchase: 1/0 flag if a purchase event exists for each visit 479 | campaign_name: map the visit to a campaign if the visit_start_time falls between the start_date and end_date 480 | impression: count of ad impressions for each visit 481 | click: count of ad clicks for each visit 482 | (Optional column) cart_products: a comma separated text value with products added to the cart sorted by the order they were added to the cart (hint: use the sequence_number) 483 | --------------------*/ 484 | 485 | WITH joined_table AS ( 486 | SELECT 487 | user_id, 488 | visit_id, 489 | event_time AS visit_start_time, 490 | page_name, 491 | event_name, 492 | sequence_number, 493 | product_id 494 | FROM 495 | users AS u 496 | JOIN events AS e ON u.cookie_id = e.cookie_id 497 | JOIN event_identifier AS ei ON e.event_type = ei.event_type 498 | JOIN page_hierarchy AS pe ON e.page_id = pe.page_id 499 | GROUP BY 500 | user_id, 501 | visit_id, 502 | event_name, 503 | page_name, 504 | event_time, 505 | sequence_number, 506 | product_id 507 | ) 508 | SELECT 509 | user_id, 510 | jt.visit_id, 511 | visit_start_time, 512 | page_views, 513 | cart_adds, 514 | purchase, 515 | campaign_name, 516 | impression, 517 | click, 518 | cart_products 519 | FROM 520 | joined_table AS jt 521 | LEFT JOIN( 522 | SELECT 523 | visit_id, 524 | COUNT(page_name) AS page_views 525 | FROM 526 | joined_table 527 | WHERE 528 | event_name = 'Page View' 529 | GROUP BY 530 | 1 531 | ) AS jt1 ON jt.visit_id = jt1.visit_id 532 | LEFT JOIN( 533 | SELECT 534 | visit_id, 535 | COUNT(page_name) AS cart_adds 536 | FROM 537 | joined_table 538 | WHERE 539 | event_name = 'Add to Cart' 540 | GROUP BY 541 | 1 542 | ) AS jt2 ON jt.visit_id = jt2.visit_id 543 | LEFT JOIN( 544 | SELECT 545 | visit_id, 546 | CASE 547 | WHEN visit_id IN ( 548 | SELECT 549 | distinct visit_id 550 | FROM 551 | events AS ee 552 | WHERE 553 | event_type = 3 554 | ) THEN 1 555 | ELSE 0 556 | END AS purchase 557 | FROM 558 | joined_table 559 | GROUP BY 560 | 1 561 | ) AS jt3 ON jt.visit_id = jt3.visit_id 562 | LEFT JOIN( 563 | SELECT 564 | visit_id, 565 | COUNT(page_name) AS impression 566 | FROM 567 | joined_table 568 | WHERE 569 | event_name = 'Ad Impression' 570 | GROUP BY 571 | 1 572 | ) AS jt4 ON jt.visit_id = jt4.visit_id 573 | LEFT JOIN( 574 | SELECT 575 | visit_id, 576 | COUNT(page_name) AS click 577 | FROM 578 | joined_table 579 | WHERE 580 | event_name = 'Ad Click' 581 | GROUP BY 582 | 1 583 | ) AS jt5 ON jt.visit_id = jt5.visit_id 584 | LEFT JOIN campaign_identifier AS ci ON jt.visit_start_time between ci.start_date 585 | AND ci.end_date 586 | LEFT JOIN( 587 | SELECT 588 | visit_id, 589 | STRING_AGG( 590 | page_name, 591 | ', ' 592 | ORDER BY 593 | sequence_number 594 | ) AS cart_products 595 | FROM 596 | joined_table 597 | WHERE 598 | product_id > 0 599 | AND event_name = 'Add to Cart' 600 | GROUP BY 601 | 1 602 | ) AS jt6 ON jt.visit_id = jt6.visit_id 603 | WHERE 604 | sequence_number = 1 605 | GROUP BY 606 | page_name, 607 | page_views, 608 | cart_adds, 609 | user_id, 610 | jt.visit_id, 611 | purchase, 612 | impression, 613 | click, 614 | visit_start_time, 615 | campaign_name, 616 | cart_products 617 | ORDER BY 618 | 1, 619 | 
3; 620 | 621 | /* -------------------- 622 | Use the subsequent dataset to generate at least 5 insights for the Clique Bait team - bonus: prepare a single A4 infographic that the team can use for their management reporting sessions, be sure to emphasise the most important points from your findings. 623 | 624 | Some ideas you might want to investigate further include: 625 | 626 | Identifying users who have received impressions during each campaign period and comparing each metric with other users who did not have an impression event 627 | Does clicking on an impression lead to higher purchase rates? 628 | What is the uplift in purchase rate when comparing users who click on a campaign impression versus users who do not receive an impression? What if we compare them with users who just see an impression but do not click? 629 | What metrics can you use to quantify the success or failure of each campaign compared to each other? 630 | --------------------*/ 631 | -------------------------------------------------------------------------------- /Case Study #7 - Balanced Tree Clothing Co./README.md: -------------------------------------------------------------------------------- 1 | # Case Study #7 - Balanced Tree Clothing Co. :mountain_snow: 2 | 3 | 4 | 5 | Danny, the CEO of this trendy fashion company, has asked you to assist the merchandising team in analysing their sales performance and generating a basic financial report to share with the wider business. 6 | 7 | ## Available Data 8 | 9 | There is a total of 4 datasets for this case study - however you will only need to utilise 2 main tables to solve all of the regular questions, and the additional 2 tables are used only for the bonus challenge question! 10 | 11 | ### Product Details 12 | 13 | `balanced_tree.product_details` includes all information about the entire range that Balanced Tree Clothing sells in their store. 
14 | 15 | | product_id | price | product_name | category_id | segment_id | style_id | category_name | segment_name | style_name | 16 | |------------|-------|----------------------------------|-------------|------------|----------|---------------|--------------|---------------------| 17 | | c4a632 | 13 | Navy Oversized Jeans - Womens | 1 | 3 | 7 | Womens | Jeans | Navy Oversized | 18 | | e83aa3 | 32 | Black Straight Jeans - Womens | 1 | 3 | 8 | Womens | Jeans | Black Straight | 19 | | e31d39 | 10 | Cream Relaxed Jeans - Womens | 1 | 3 | 9 | Womens | Jeans | Cream Relaxed | 20 | | d5e9a6 | 23 | Khaki Suit Jacket - Womens | 1 | 4 | 10 | Womens | Jacket | Khaki Suit | 21 | | 72f5d4 | 19 | Indigo Rain Jacket - Womens | 1 | 4 | 11 | Womens | Jacket | Indigo Rain | 22 | | 9ec847 | 54 | Grey Fashion Jacket - Womens | 1 | 4 | 12 | Womens | Jacket | Grey Fashion | 23 | | 5d267b | 40 | White Tee Shirt - Mens | 2 | 5 | 13 | Mens | Shirt | White Tee | 24 | | c8d436 | 10 | Teal Button Up Shirt - Mens | 2 | 5 | 14 | Mens | Shirt | Teal Button Up | 25 | | 2a2353 | 57 | Blue Polo Shirt - Mens | 2 | 5 | 15 | Mens | Shirt | Blue Polo | 26 | | f084eb | 36 | Navy Solid Socks - Mens | 2 | 6 | 16 | Mens | Socks | Navy Solid | 27 | | b9a74d | 17 | White Striped Socks - Mens | 2 | 6 | 17 | Mens | Socks | White Striped | 28 | | 2feb6b | 29 | Pink Fluro Polkadot Socks - Mens | 2 | 6 | 18 | Mens | Socks | Pink Fluro Polkadot | 29 | 30 | ### Product Sales 31 | 32 | `balanced_tree.sales` contains product level information for all the transactions made for Balanced Tree including quantity, price, percentage discount, member status, a transaction ID and also the transaction timestamp. 33 | 34 | | prod_id | qty | price | discount | member | txn_id | start_txn_time | 35 | |---------|-----|-------|----------|--------|--------|--------------------------| 36 | | c4a632 | 4 | 13 | 17 | t | 54f307 | 2021-02-13 01:59:43.296 | 37 | | 5d267b | 4 | 40 | 17 | t | 54f307 | 2021-02-13 01:59:43.296 | 38 | | b9a74d | 4 | 17 | 17 | t | 54f307 | 2021-02-13 01:59:43.296 | 39 | | 2feb6b | 2 | 29 | 17 | t | 54f307 | 2021-02-13 01:59:43.296 | 40 | | c4a632 | 5 | 13 | 21 | t | 26cc98 | 2021-01-19 01:39:00.3456 | 41 | | e31d39 | 2 | 10 | 21 | t | 26cc98 | 2021-01-19 01:39:00.3456 | 42 | | 72f5d4 | 3 | 19 | 21 | t | 26cc98 | 2021-01-19 01:39:00.3456 | 43 | | 2a2353 | 3 | 57 | 21 | t | 26cc98 | 2021-01-19 01:39:00.3456 | 44 | | f084eb | 3 | 36 | 21 | t | 26cc98 | 2021-01-19 01:39:00.3456 | 45 | | c4a632 | 1 | 13 | 21 | f | ef648d | 2021-01-27 02:18:17.1648 | 46 | 47 | ### Product Hierarchy & Product Prices 48 | 49 | These tables are used only for the bonus question where we will use them to recreate the `balanced_tree.product_details` table. 
50 | 51 | `balanced_tree.product_hierarchy` 52 | 53 | | id | parent_id | level_text | level_name | 54 | |----|-----------|---------------------|------------| 55 | | 1 | | Womens | Category | 56 | | 2 | | Mens | Category | 57 | | 3 | 1 | Jeans | Segment | 58 | | 4 | 1 | Jacket | Segment | 59 | | 5 | 2 | Shirt | Segment | 60 | | 6 | 2 | Socks | Segment | 61 | | 7 | 3 | Navy Oversized | Style | 62 | | 8 | 3 | Black Straight | Style | 63 | | 9 | 3 | Cream Relaxed | Style | 64 | | 10 | 4 | Khaki Suit | Style | 65 | | 11 | 4 | Indigo Rain | Style | 66 | | 12 | 4 | Grey Fashion | Style | 67 | | 13 | 5 | White Tee | Style | 68 | | 14 | 5 | Teal Button Up | Style | 69 | | 15 | 5 | Blue Polo | Style | 70 | | 16 | 6 | Navy Solid | Style | 71 | | 17 | 6 | White Striped | Style | 72 | | 18 | 6 | Pink Fluro Polkadot | Style | 73 | 74 | `balanced_tree.product_prices` 75 | 76 | | id | product_id | price | 77 | |----|------------|-------| 78 | | 7 | c4a632 | 13 | 79 | | 8 | e83aa3 | 32 | 80 | | 9 | e31d39 | 10 | 81 | | 10 | d5e9a6 | 23 | 82 | | 11 | 72f5d4 | 19 | 83 | | 12 | 9ec847 | 54 | 84 | | 13 | 5d267b | 40 | 85 | | 14 | c8d436 | 10 | 86 | | 15 | 2a2353 | 57 | 87 | | 16 | f084eb | 36 | 88 | | 17 | b9a74d | 17 | 89 | | 18 | 2feb6b | 29 | 90 | 91 | ## Table of Contents 92 | 93 | [Introduction](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#introduction) 94 | 95 | [Case Study Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#case-study-questions) 96 | 97 | [High Level Sales Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#high-level-sales-analysis) 98 | 99 | [1. What was the total quantity sold for all products?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#1-what-was-the-total-quantity-sold-for-all-products) 100 | 101 | [2. What is the total generated revenue for all products before discounts?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#2-what-is-the-total-generated-revenue-for-all-products-before-discounts) 102 | 103 | [3. What was the total discount amount for all products?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#3-what-was-the-total-discount-amount-for-all-products) 104 | 105 | [Transaction Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#transaction-analysis) 106 | 107 | [1. How many unique transactions were there?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#1-how-many-unique-transactions-were-there) 108 | 109 | [2. What is the average unique products purchased in each transaction?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#2-what-is-the-average-unique-products-purchased-in-each-transaction) 110 | 111 | [3. 
What are the 25th, 50th and 75th percentile values for the revenue per transaction?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#3-what-are-the-25th-50th-and-75th-percentile-values-for-the-revenue-per-transaction) 112 | 113 | [4. What is the average discount value per transaction?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#4-what-is-the-average-discount-value-per-transaction) 114 | 115 | [5. What is the percentage split of all transactions for members vs non-members?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#5-what-is-the-percentage-split-of-all-transactions-for-members-vs-non-members) 116 | 117 | [6. What is the average revenue for member transactions and non-member transactions?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#6-what-is-the-average-revenue-for-member-transactions-and-non-member-transactions) 118 | 119 | [Product Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#product-analysis) 120 | 121 | [1. What are the top 3 products by total revenue before discount?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#1-what-are-the-top-3-products-by-total-revenue-before-discount) 122 | 123 | [2. What is the total quantity, revenue and discount for each segment?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#2-what-is-the-total-quantity-revenue-and-discount-for-each-segment) 124 | 125 | [3. What is the top selling product for each segment?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#3-what-is-the-top-selling-product-for-each-segment) 126 | 127 | [4. What is the total quantity, revenue and discount for each category?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#4-what-is-the-total-quantity-revenue-and-discount-for-each-category) 128 | 129 | [5. What is the top selling product for each category?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#5-what-is-the-top-selling-product-for-each-category) 130 | 131 | [6. What is the percentage split of revenue by product for each segment?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#6-what-is-the-percentage-split-of-revenue-by-product-for-each-segment) 132 | 133 | [7. What is the percentage split of revenue by segment for each category?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#7-what-is-the-percentage-split-of-revenue-by-segment-for-each-category) 134 | 135 | [8. What is the percentage split of total revenue by category?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#8-what-is-the-percentage-split-of-total-revenue-by-category) 136 | 137 | [9. 
What is the total transaction “penetration” for each product? (hint: penetration = number of transactions where at least 1 quantity of a product was purchased divided by total number of transactions)](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#9-what-is-the-total-transaction-penetration-for-each-product-hint-penetration--number-of-transactions-where-at-least-1-quantity-of-a-product-was-purchased-divided-by-total-number-of-transactions) 138 | 139 | [10. What is the most common combination of at least 1 quantity of any 3 products in a 1 single transaction?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#10-what-is-the-most-common-combination-of-at-least-1-quantity-of-any-3-products-in-a-1-single-transaction) 140 | 141 | [Reporting Challenge](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#reporting-challenge) 142 | 143 | [Bonus Challenge](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md#bonus-challenge) 144 | -------------------------------------------------------------------------------- /Case Study #7 - Balanced Tree Clothing Co./SQL/solution_7.sql: -------------------------------------------------------------------------------- 1 | -- Solved on PostgreSQL 13.4 by Yulia Murtazina, January 28, 2022 2 | -- Fixed on February 15, 2022 3 | 4 | /* -------------------- 5 | Case Study #7 - Balanced Tree Clothing Co. 6 | --------------------*/ 7 | 8 | SET 9 | SEARCH_PATH = balanced_tree; 10 | 11 | /* -------------------- 12 | Case Study Questions 13 | --------------------*/ 14 | 15 | -- 1. What was the total quantity sold for all products? 16 | 17 | SELECT 18 | SUM(qty) AS total_qty_sold 19 | FROM 20 | sales; 21 | 22 | -- 2. What is the total generated revenue for all products before discounts? 23 | 24 | SELECT 25 | SUM(qty * price) AS total_sales 26 | FROM 27 | sales; 28 | 29 | -- 3. What was the total discount amount for all products? 30 | 31 | SELECT 32 | ROUND(SUM(qty * price * discount :: numeric / 100), 2) AS total_discount 33 | FROM 34 | sales; 35 | 36 | /* -------------------- 37 | Transaction Analysis 38 | --------------------*/ 39 | 40 | -- 1. How many unique transactions were there? 41 | 42 | SELECT 43 | COUNT(distinct txn_id) AS number_of_transactions 44 | FROM 45 | sales; 46 | 47 | -- 2. What is the average unique products purchased in each transaction? 48 | 49 | SELECT 50 | COUNT(prod_id) / COUNT(distinct txn_id) AS avg_number_of_product_in_order 51 | FROM 52 | sales; 53 | 54 | -- 3. What are the 25th, 50th and 75th percentile values for the revenue per transaction? 55 | 56 | WITH revenue AS( 57 | SELECT 58 | txn_id, 59 | SUM(qty * price) AS revenue 60 | FROM 61 | sales 62 | GROUP BY 63 | txn_id 64 | ) 65 | SELECT 66 | PERCENTILE_CONT(0.25) WITHIN GROUP ( 67 | ORDER BY 68 | revenue 69 | ) AS percentile_25, 70 | PERCENTILE_CONT(0.50) WITHIN GROUP ( 71 | ORDER BY 72 | revenue 73 | ) AS percentile_50, 74 | PERCENTILE_CONT(0.75) WITHIN GROUP ( 75 | ORDER BY 76 | revenue 77 | ) AS percentile_75 78 | FROM 79 | revenue; 80 | 81 | -- 4. What is the average discount value per transaction? 
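-- Note: discount is stored as an integer percentage (e.g. 17 means 17%), hence
-- the :: numeric cast and the division by 100 to turn it into a discount
-- amount - on the first sample row, 4 * 13 * 17 / 100 = 8.84.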
82 | 83 | WITH revenue AS( 84 | SELECT 85 | txn_id, 86 | SUM(qty * price * discount :: numeric / 100) AS order_discount 87 | FROM 88 | sales 89 | GROUP BY 90 | txn_id 91 | ) 92 | SELECT 93 | ROUND(AVG(order_discount), 2) AS avg_order_discount 94 | FROM 95 | revenue; 96 | 97 | -- 5. What is the percentage split of all transactions for members vs non-members? 98 | 99 | WITH members AS ( 100 | SELECT 101 | DISTINCT sales.txn_id, 102 | COUNT(distinct member) AS total_members, 103 | CASE 104 | WHEN member = TRUE THEN 1 105 | ELSE 0 106 | END AS number_of_members 107 | FROM 108 | sales 109 | GROUP BY 110 | txn_id, 111 | number_of_members 112 | ) 113 | SELECT 114 | distinct percentage_of_members, 115 | 100 - percentage_of_members AS percentage_of_guests 116 | FROM 117 | members, 118 | LATERAL( 119 | SELECT 120 | ROUND( 121 | 100 *(SUM(number_of_members) / SUM(total_members)), 122 | 1 123 | ) AS percentage_of_members 124 | FROM 125 | members 126 | ) pm; 127 | 128 | -- 6. What is the average revenue for member transactions and non-member transactions? 129 | 130 | WITH members AS ( 131 | SELECT 132 | sales.txn_id, 133 | CASE 134 | WHEN member = TRUE THEN SUM(qty * price) 135 | END AS members_revenue, 136 | CASE 137 | WHEN member = FALSE THEN SUM(qty * price) 138 | END AS guests_revenue 139 | FROM 140 | sales 141 | GROUP BY 142 | txn_id, 143 | member 144 | ) 145 | SELECT 146 | ROUND(AVG(members_revenue), 2) AS avg_members_revenue, 147 | ROUND(AVG(guests_revenue), 2) AS avg_guests_revenue 148 | FROM 149 | members; 150 | 151 | /* -------------------- 152 | Product Analysis 153 | --------------------*/ 154 | 155 | -- 1. What are the top 3 products by total revenue before discount? 156 | 157 | WITH revenue AS( 158 | SELECT 159 | product_name, 160 | SUM(qty * s.price) AS total_revenue, 161 | ROW_NUMBER() OVER( 162 | ORDER BY 163 | SUM(qty * s.price) DESC 164 | ) AS row 165 | FROM 166 | sales AS s 167 | JOIN product_details AS pd ON s.prod_id = pd.product_id 168 | GROUP BY 169 | product_name 170 | ) 171 | SELECT 172 | product_name, 173 | total_revenue 174 | FROM 175 | revenue 176 | WHERE 177 | row in (1, 2, 3); 178 | 179 | -- 2. What is the total quantity, revenue and discount for each segment? 180 | 181 | 182 | SELECT 183 | segment_name, 184 | SUM(qty) AS total_quantity, 185 | SUM(qty * s.price) AS total_revenue, 186 | round( 187 | SUM(qty * s.price * discount :: numeric / 100), 188 | 2 189 | ) AS total_discount 190 | FROM 191 | sales AS s 192 | JOIN product_details AS pd ON s.prod_id = pd.product_id 193 | GROUP BY 194 | segment_name 195 | ORDER BY 196 | 1; 197 | 198 | -- 3. What is the top selling product for each segment? 199 | 200 | WITH revenue AS ( 201 | SELECT 202 | segment_name, 203 | product_name, 204 | SUM(qty) AS total_quantity, 205 | SUM(qty * s.price) AS total_revenue, 206 | ROW_NUMBER() OVER( 207 | PARTITION BY segment_name 208 | ORDER BY 209 | SUM(qty * s.price) DESC 210 | ) AS revenue_rank, 211 | ROW_NUMBER() OVER( 212 | PARTITION BY segment_name 213 | ORDER BY 214 | SUM(qty) DESC 215 | ) AS qty_rank 216 | FROM 217 | sales AS s 218 | JOIN product_details AS pd ON s.prod_id = pd.product_id 219 | GROUP BY 220 | segment_name, 221 | product_name 222 | ) 223 | SELECT 224 | segment_name, 225 | product_name, 226 | total_quantity, 227 | total_revenue 228 | FROM 229 | revenue 230 | WHERE 231 | revenue_rank = 1 232 | OR qty_rank = 1; 233 | 234 | -- 4. What is the total quantity, revenue and discount for each category? 
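-- The per-category query below mirrors the per-segment query above with
-- category_name swapped in. A single-pass sketch that computes both levels at
-- once with GROUPING SETS (same tables and revenue definitions as above):

SELECT
  category_name,
  segment_name,
  SUM(qty) AS total_quantity,
  SUM(qty * s.price) AS total_revenue,
  ROUND(
    SUM(qty * s.price * discount :: numeric / 100),
    2
  ) AS total_discount
FROM
  sales AS s
  JOIN product_details AS pd ON s.prod_id = pd.product_id
GROUP BY
  GROUPING SETS ((category_name), (segment_name))
ORDER BY
  1,
  2;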
235 | 236 | SELECT 237 | category_name, 238 | SUM(qty) AS total_quantity, 239 | SUM(qty * s.price) AS total_revenue, 240 | round( 241 | SUM(qty * s.price * discount :: numeric / 100), 242 | 2 243 | ) AS total_discount 244 | FROM 245 | sales AS s 246 | JOIN product_details AS pd ON s.prod_id = pd.product_id 247 | GROUP BY 248 | category_name 249 | ORDER BY 250 | 1; 251 | 252 | -- 5. What is the top selling product for each category? 253 | 254 | WITH revenue AS ( 255 | SELECT 256 | category_name, 257 | product_name, 258 | SUM(qty) AS total_quantity, 259 | SUM(qty * s.price) AS total_revenue, 260 | ROW_NUMBER() OVER( 261 | PARTITION BY category_name 262 | ORDER BY 263 | SUM(qty * s.price) DESC 264 | ) AS revenue_rank, 265 | ROW_NUMBER() OVER( 266 | PARTITION BY category_name 267 | ORDER BY 268 | SUM(qty) DESC 269 | ) AS qty_rank 270 | FROM 271 | sales AS s 272 | JOIN product_details AS pd ON s.prod_id = pd.product_id 273 | GROUP BY 274 | category_name, 275 | product_name 276 | ) 277 | SELECT 278 | category_name, 279 | product_name, 280 | total_quantity, 281 | total_revenue 282 | FROM 283 | revenue 284 | WHERE 285 | revenue_rank = 1 286 | OR qty_rank = 1; 287 | 288 | -- 6. What is the percentage split of revenue by product for each segment? 289 | 290 | SELECT 291 | segment_name, 292 | product_name, 293 | ROUND( 294 | 100 *( 295 | SUM(qty * s.price) :: numeric / SUM(SUM(qty * s.price)) OVER(PARTITION BY segment_name) 296 | ), 297 | 1 298 | ) AS percent_of_revenue 299 | FROM 300 | sales AS s 301 | JOIN product_details AS pd ON s.prod_id = pd.product_id 302 | GROUP BY 303 | segment_name, 304 | product_name 305 | ORDER BY 306 | 1, 3 DESC; 307 | 308 | -- 7. What is the percentage split of revenue by segment for each category? 309 | 310 | SELECT 311 | segment_name, 312 | category_name, 313 | ROUND( 314 | 100 *( 315 | SUM(qty * s.price) :: numeric / SUM(SUM(qty * s.price)) OVER(PARTITION BY category_name) 316 | ), 317 | 1 318 | ) AS percent_of_revenue 319 | FROM 320 | sales AS s 321 | JOIN product_details AS pd ON s.prod_id = pd.product_id 322 | GROUP BY 323 | segment_name, 324 | category_name 325 | ORDER BY 326 | 1; 327 | 328 | -- 8. What is the percentage split of total revenue by category? 329 | 330 | SELECT 331 | category_name, 332 | ROUND( 333 | 100 *( 334 | SUM(qty * s.price) :: numeric / SUM(SUM(qty * s.price)) OVER() 335 | ), 336 | 1 337 | ) AS percent_of_revenue 338 | FROM 339 | sales AS s 340 | JOIN product_details AS pd ON s.prod_id = pd.product_id 341 | GROUP BY 342 | category_name 343 | ORDER BY 344 | 1; 345 | 346 | -- 9. What is the total transaction “penetration” for each product? (hint: penetration = number of transactions where at least 1 quantity of a product was purchased divided by total number of transactions) 347 | 348 | SELECT 349 | product_name, 350 | ROUND( 351 | 100 *(COUNT(product_name) :: numeric / number_of_txn), 352 | 2 353 | ) AS percent_of_penetration 354 | FROM 355 | sales AS s 356 | JOIN product_details AS pd ON s.prod_id = pd.product_id, 357 | LATERAL( 358 | SELECT 359 | COUNT(distinct txn_id) AS number_of_txn 360 | FROM 361 | sales 362 | ) ss 363 | GROUP BY 364 | product_name, 365 | number_of_txn 366 | ORDER BY 367 | 2 DESC; 368 | 369 | -- 10. What is the most common combination of at least 1 quantity of any 3 products in a 1 single transaction? 
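-- Note: the triple self-join below enumerates each 3-product basket exactly
-- once by forcing an alphabetical order across the aliases
-- (p.product_name < p1.product_name < p2.product_name), so a combination like
-- {A, B, C} is not recounted as {B, A, C} or {C, B, A}.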
370 | 371 | SELECT 372 | product_1, 373 | product_2, 374 | product_3, 375 | times_bought_together 376 | FROM 377 | ( 378 | with products AS( 379 | SELECT 380 | txn_id, 381 | product_name 382 | FROM 383 | sales AS s 384 | JOIN product_details AS pd ON s.prod_id = pd.product_id 385 | ) 386 | SELECT 387 | p.product_name AS product_1, 388 | p1.product_name AS product_2, 389 | p2.product_name AS product_3, 390 | COUNT(*) AS times_bought_together, 391 | ROW_NUMBER() OVER( 392 | ORDER BY 393 | COUNT(*) DESC 394 | ) AS rank 395 | FROM 396 | products AS p 397 | JOIN products AS p1 ON p.txn_id = p1.txn_id 398 | AND p.product_name != p1.product_name 399 | AND p.product_name < p1.product_name 400 | JOIN products AS p2 ON p.txn_id = p2.txn_id 401 | AND p.product_name != p2.product_name 402 | AND p1.product_name != p2.product_name 403 | AND p.product_name < p2.product_name 404 | AND p1.product_name < p2.product_name 405 | GROUP BY 406 | p.product_name, 407 | p1.product_name, 408 | p2.product_name 409 | ) pp 410 | WHERE 411 | rank = 1; 412 | 413 | /* -------------------- 414 | Reporting Challenge 415 | 416 | Write a single SQL script that combines all of the previous questions into a scheduled report that the Balanced Tree team can run at the beginning of each month to calculate the previous month’s values. 417 | 418 | Imagine that the Chief Financial Officer (which is also Danny) has asked for all of these questions at the end of every month. 419 | 420 | He first wants you to generate the data for January only - but then he also wants you to demonstrate that you can easily run the same analysis for February without many changes (if at all). 421 | 422 | Feel free to split up your final outputs into as many tables as you need - but be sure to explicitly reference which table outputs relate to which question for full marks :) 423 | --------------------*/ 424 | 425 | -- [WORK IN PROGRESS] 426 | 427 | /* -------------------- 428 | Bonus Challenge 429 | 430 | Use a single SQL query to transform the product_hierarchy and product_prices datasets to the product_details table. 431 | 432 | Hint: you may want to consider using a recursive CTE to solve this problem! 433 | --------------------*/ 434 | 435 | SELECT 436 | product_id, 437 | price, 438 | CONCAT( 439 | ph.level_text, 440 | ' ', 441 | ph1.level_text, 442 | ' - ', 443 | ph2.level_text 444 | ) AS product_name, 445 | ph2.id AS category_id, 446 | ph1.id AS segment_id, 447 | ph.id AS style_id, 448 | ph2.level_text AS category_name, 449 | ph1.level_text AS segment_name, 450 | ph.level_text AS style_name 451 | FROM 452 | product_hierarchy AS ph 453 | JOIN product_hierarchy AS ph1 on ph.parent_id = ph1.id 454 | JOIN product_hierarchy AS ph2 on ph1.parent_id = ph2.id 455 | JOIN product_prices AS pp on ph.id = pp.id 456 | -------------------------------------------------------------------------------- /Case Study #8 - Fresh Segments/README.md: -------------------------------------------------------------------------------- 1 | # Case Study #8 - Fresh Segments :diamond_shape_with_a_dot_inside: 2 | 3 | 4 | 5 | Clients share their customer lists with the Fresh Segments team who then aggregate interest metrics and generate a single dataset worth of metrics for further analysis. 6 | 7 | Danny has asked for your assistance to analyse aggregated metrics for an example client and provide some high level insights about the customer list and their interests. 
8 | 9 | ## Available Data 10 | 11 | For this case study there is a total of 2 datasets which you will need to use to solve the questions. 12 | 13 | ### Interest Metrics 14 | 15 | This table contains information about aggregated interest metrics for a specific major client of Fresh Segments which makes up a large proportion of their customer base. 16 | 17 | Each record in this table represents the performance of a specific interest_id based on the client’s customer base interest measured through clicks and interactions with specific targeted advertising content. 18 | 19 | | \_month | \_year | month_year | interest_id | composition | index_value | ranking | percentile_ranking | 20 | |---------|--------|------------|-------------|-------------|-------------|---------|--------------------| 21 | | 7 | 2018 | 07-2018 | 32486 | 11.89 | 6.19 | 1 | 99.86 | 22 | | 7 | 2018 | 07-2018 | 6106 | 9.93 | 5.31 | 2 | 99.73 | 23 | | 7 | 2018 | 07-2018 | 18923 | 10.85 | 5.29 | 3 | 99.59 | 24 | | 7 | 2018 | 07-2018 | 6344 | 10.32 | 5.1 | 4 | 99.45 | 25 | | 7 | 2018 | 07-2018 | 100 | 10.77 | 5.04 | 5 | 99.31 | 26 | | 7 | 2018 | 07-2018 | 69 | 10.82 | 5.03 | 6 | 99.18 | 27 | | 7 | 2018 | 07-2018 | 79 | 11.21 | 4.97 | 7 | 99.04 | 28 | | 7 | 2018 | 07-2018 | 6111 | 10.71 | 4.83 | 8 | 98.9 | 29 | | 7 | 2018 | 07-2018 | 6214 | 9.71 | 4.83 | 8 | 98.9 | 30 | | 7 | 2018 | 07-2018 | 19422 | 10.11 | 4.81 | 10 | 98.63 | 31 | 32 | 33 | For example - let’s interpret the first row of the interest_metrics table together: 34 | 35 | | \_month | \_year | month_year | interest_id | composition | index_value | ranking | percentile_ranking | 36 | |---------|--------|------------|-------------|-------------|-------------|---------|--------------------| 37 | | 7 | 2018 | 07-2018 | 32486 | 11.89 | 6.19 | 1 | 99.86 | 38 | 39 | In July 2018, the composition metric is 11.89, meaning that 11.89% of the client’s customer list interacted with the interest `interest_id` = 32486 - we can link interest_id to a separate mapping table to find the segment name called “Vacation Rental Accommodation Researchers”. 40 | 41 | The `index_value` of 6.19 means that the composition value is 6.19x the average composition value for all Fresh Segments clients’ customers for this particular interest in the month of July 2018. 42 | 43 | The `ranking` and `percentile_ranking` columns relate to the order of `index_value` records in each month year. 44 | 45 | ### Interest Map 46 | 47 | This mapping table links each `interest_id` with its relevant interest information. You will need to join this table onto the previous `interest_metrics` table to obtain the interest_name as well as any details about the summary information. 48 | 49 | | id | interest_name | interest_summary | created_at | last_modified | 50 | |----|---------------------------|------------------------------------------------------------------------------------|---------------------|---------------------| 51 | | 1 | Fitness Enthusiasts | Consumers using fitness tracking apps and websites. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12 | 52 | | 2 | Gamers | Consumers researching game reviews and cheat codes. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12 | 53 | | 3 | Car Enthusiasts | Readers of automotive news and car reviews. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12 | 54 | | 4 | Luxury Retail Researchers | Consumers researching luxury product reviews and gift ideas. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12 | 55 | | 5 | Brides & Wedding Planners | People researching wedding ideas and vendors. 
| 2016-05-26 14:57:59 | 2018-05-23 11:30:12 | 56 | | 6 | Vacation Planners | Consumers reading reviews of vacation destinations and accommodations. | 2016-05-26 14:57:59 | 2018-05-23 11:30:13 | 57 | | 7 | Motorcycle Enthusiasts | Readers of motorcycle news and reviews. | 2016-05-26 14:57:59 | 2018-05-23 11:30:13 | 58 | | 8 | Business News Readers | Readers of online business news content. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12 | 59 | | 12 | Thrift Store Shoppers | Consumers shopping online for clothing at thrift stores and researching locations. | 2016-05-26 14:57:59 | 2018-03-16 13:14:00 | 60 | | 13 | Advertising Professionals | People who read advertising industry news. | 2016-05-26 14:57:59 | 2018-05-23 11:30:12 | 61 | 62 | ## Table of Contents 63 | 64 | [Introduction](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#introduction) 65 | 66 | [Case Study Questions](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#case-study-questions) 67 | 68 | [Data Exploration and Cleansing](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#data-exploration-and-cleansing) 69 | 70 | [1. Update the fresh_segments.interest_metrics table by modifying the month_year column to be a date data type with the start of the month](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#1-update-the-fresh_segmentsinterest_metrics-table-by-modifying-the-month_year-column-to-be-a-date-data-type-with-the-start-of-the-month) 71 | 72 | [2. What is count of records in the fresh_segments.interest_metrics for each month_year value sorted in chronological order (earliest to latest) with the null values appearing first?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#2-what-is-count-of-records-in-the-fresh_segmentsinterest_metrics-for-each-month_year-value-sorted-in-chronological-order-earliest-to-latest-with-the-null-values-appearing-first) 73 | 74 | [3. What do you think we should do with these null values in the fresh_segments.interest_metrics?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#3-what-do-you-think-we-should-do-with-these-null-values-in-the-fresh_segmentsinterest_metrics) 75 | 76 | [4. How many interest_id values exist in the fresh_segments.interest_metrics table but not in the fresh_segments.interest_map table? What about the other way around?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#4-how-many-interest_id-values-exist-in-the-fresh_segmentsinterest_metrics-table-but-not-in-the-fresh_segmentsinterest_map-table-what-about-the-other-way-around) 77 | 78 | [5. Summarise the id values in the fresh_segments.interest_map by its total record count in this table](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#5-summarise-the-id-values-in-the-fresh_segmentsinterest_map-by-its-total-record-count-in-this-table) 79 | 80 | [6. What sort of table join should we perform for our analysis and why? 
Check your logic by checking the rows where interest_id = 21246 in your joined output and include all columns from fresh_segments.interest_metrics and all columns from fresh_segments.interest_map except from the id column](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#6-what-sort-of-table-join-should-we-perform-for-our-analysis-and-why-check-your-logic-by-checking-the-rows-where-interest_id--21246-in-your-joined-output-and-include-all-columns-from-fresh_segmentsinterest_metrics-and-all-columns-from-fresh_segmentsinterest_map-except-from-the-id-column) 81 | 82 | [7. Are there any records in your joined table where the month_year value is before the created_at value from the fresh_segments.interest_map table? Do you think these values are valid and why?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#7-are-there-any-records-in-your-joined-table-where-the-month_year-value-is-before-the-created_at-value-from-the-fresh_segmentsinterest_map-table-do-you-think-these-values-are-valid-and-why) 83 | 84 | [Interest Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#interest-analysis) 85 | 86 | [1. Which interests have been present in all month_year dates in our dataset?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#1-which-interests-have-been-present-in-all-month_year-dates-in-our-dataset) 87 | 88 | [2. Using this same total_months measure - calculate the cumulative percentage of all records starting at 14 months - which total_months value passes the 90% cumulative percentage value?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#2-using-this-same-total_months-measure---calculate-the-cumulative-percentage-of-all-records-starting-at-14-months---which-total_months-value-passes-the-90-cumulative-percentage-value) 89 | 90 | [3. If we were to remove all interest_id values which are lower than the total_months value we found in the previous question - how many total data points would we be removing?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#3-if-we-were-to-remove-all-interest_id-values-which-are-lower-than-the-total_months-value-we-found-in-the-previous-question---how-many-total-data-points-would-we-be-removing) 91 | 92 | [4. Does this decision make sense to remove these data points from a business perspective? Use an example where there are all 14 months present to a removed interest example for your arguments - think about what it means to have less months present from a segment perspective.](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#4-does-this-decision-make-sense-to-remove-these-data-points-from-a-business-perspective-use-an-example-where-there-are-all-14-months-present-to-a-removed-interest-example-for-your-arguments---think-about-what-it-means-to-have-less-months-present-from-a-segment-perspective) 93 | 94 | [5. 
After removing these interests - how many unique interests are there for each month?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#5-after-removing-these-interests---how-many-unique-interests-are-there-for-each-month) 95 | 96 | [Segment Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#segment-analysis) 97 | 98 | [1. Using our filtered dataset by removing the interests with less than 6 months worth of data, which are the top 10 and bottom 10 interests which have the largest composition values in any month_year? Only use the maximum composition value for each interest but you must keep the corresponding month_year](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#1-using-our-filtered-dataset-by-removing-the-interests-with-less-than-6-months-worth-of-data-which-are-the-top-10-and-bottom-10-interests-which-have-the-largest-composition-values-in-any-month_year-only-use-the-maximum-composition-value-for-each-interest-but-you-must-keep-the-corresponding-month_year) 99 | 100 | [2. Which 5 interests had the lowest average ranking value?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#2-which-5-interests-had-the-lowest-average-ranking-value) 101 | 102 | [3. Which 5 interests had the largest standard deviation in their percentile_ranking value?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#3-which-5-interests-had-the-largest-standard-deviation-in-their-percentile_ranking-value) 103 | 104 | [4. For the 5 interests found in the previous question - what was minimum and maximum percentile_ranking values for each interest and its corresponding year_month value? Can you describe what is happening for these 5 interests?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#4-for-the-5-interests-found-in-the-previous-question---what-was-minimum-and-maximum-percentile_ranking-values-for-each-interest-and-its-corresponding-year_month-value-can-you-describe-what-is-happening-for-these-5-interests) 105 | 106 | [5. How would you describe our customers in this segment based off their composition and ranking values? What sort of products or services should we show to these customers and what should we avoid?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#5-how-would-you-describe-our-customers-in-this-segment-based-off-their-composition-and-ranking-values-what-sort-of-products-or-services-should-we-show-to-these-customers-and-what-should-we-avoid) 107 | 108 | [Index Analysis](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#index-analysis) 109 | 110 | [1. What is the top 10 interests by the average composition for each month?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#1-what-is-the-top-10-interests-by-the-average-composition-for-each-month) 111 | 112 | [2. 
For all of these top 10 interests - which interest appears the most often?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#2-for-all-of-these-top-10-interests---which-interest-appears-the-most-often) 113 | 114 | [3. What is the average of the average composition for the top 10 interests for each month?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#3-what-is-the-average-of-the-average-composition-for-the-top-10-interests-for-each-month) 115 | 116 | [4. What is the 3 month rolling average of the max average composition value from September 2018 to August 2019 and include the previous top ranking interests in the same output shown below](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#4-what-is-the-3-month-rolling-average-of-the-max-average-composition-value-from-september-2018-to-august-2019-and-include-the-previous-top-ranking-interests-in-the-same-output-shown-below) 117 | 118 | [5. Provide a possible reason why the max average composition might change from month to month? Could it signal something is not quite right with the overall business model for Fresh Segments?](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md#5-provide-a-possible-reason-why-the-max-average-composition-might-change-from-month-to-month-could-it-signal-something-is-not-quite-right-with-the-overall-business-model-for-fresh-segments) 119 | -------------------------------------------------------------------------------- /Case Study #8 - Fresh Segments/SQL/solution_8.sql: -------------------------------------------------------------------------------- 1 | -- Solved on PostgreSQL 13.4 by Yulia Murtazina, January 30, 2022 2 | -- Fixed on February 19, 2022 3 | 4 | -- Case Study #8 - Fresh Segments 5 | 6 | SET 7 | SEARCH_PATH = fresh_segments; 8 | 9 | /* -------------------- 10 | Data Exploration and Cleansing 11 | 1. Update the fresh_segments.interest_metrics table by modifying the month_year column to be a date data type with the start of the month 12 | --------------------*/ 13 | 14 | ALTER TABLE 15 | interest_metrics 16 | ALTER COLUMN 17 | month_year TYPE DATE USING TO_DATE(month_year, 'MM-YYYY'); 18 | 19 | -- 2. What is count of records in the fresh_segments.interest_metrics for each month_year value sorted in chronological order (earliest to latest) with the null values appearing first? 20 | 21 | SELECT 22 | DATE_TRUNC('month', month_year) AS date, 23 | COUNT(*) AS number_of_records 24 | FROM 25 | interest_metrics 26 | GROUP BY 27 | month_year 28 | ORDER BY 29 | month_year NULLS FIRST; 30 | 31 | -- 3. What do you think we should do with these null values in the fresh_segments.interest_metrics? 32 | -- Non-coding question 33 | 34 | -- 4. How many interest_id values exist in the fresh_segments.interest_metrics table but not in the fresh_segments.interest_map table? What about the other way around? 35 | 36 | SELECT 37 | COUNT(distinct interest_id) AS interest_id 38 | FROM 39 | interest_metrics 40 | WHERE 41 | interest_id :: int NOT IN ( 42 | SELECT 43 | id 44 | FROM 45 | interest_map 46 | ); 47 | 48 | SELECT 49 | COUNT(id) AS interest_id 50 | FROM 51 | interest_map 52 | WHERE 53 | id NOT IN ( 54 | SELECT 55 | distinct interest_id :: int 56 | FROM 57 | interest_metrics 58 | WHERE 59 | interest_id IS NOT NULL 60 | ); 61 | 62 | -- 5. 
Summarise the id values in the fresh_segments.interest_map by its total record count in this table 63 | 64 | SELECT 65 | COUNT(distinct id) AS total_count 66 | FROM 67 | interest_map AS m; 68 | 69 | -- 6. What sort of table join should we perform for our analysis and why? Check your logic by checking the rows where interest_id = 21246 in your joined output and include all columns from fresh_segments.interest_metrics and all columns from fresh_segments.interest_map except from the id column 70 | 71 | SELECT 72 | distinct interest_id :: int, 73 | interest_name, 74 | interest_summary, 75 | created_at, 76 | last_modified, 77 | _month, 78 | _year, 79 | month_year, 80 | composition, 81 | index_value, 82 | ranking, 83 | percentile_ranking 84 | FROM 85 | interest_map AS m 86 | LEFT JOIN interest_metrics AS im ON m.id = im.interest_id :: int 87 | WHERE 88 | interest_id = '21246' 89 | GROUP BY 90 | interest_name, 91 | id, 92 | interest_summary, 93 | created_at, 94 | last_modified, 95 | _month, 96 | _year, 97 | month_year, 98 | interest_id, 99 | composition, 100 | index_value, 101 | ranking, 102 | percentile_ranking 103 | ORDER BY 104 | _month NULLS FIRST; 105 | 106 | -- 7. Are there any records in your joined table where the month_year value is before the created_at value from the fresh_segments.interest_map table? Do you think these values are valid and why? 107 | 108 | WITH joined_table AS ( 109 | SELECT 110 | distinct interest_id :: int, 111 | interest_name, 112 | interest_summary, 113 | created_at, 114 | last_modified, 115 | _month, 116 | _year, 117 | month_year, 118 | composition, 119 | index_value, 120 | ranking, 121 | percentile_ranking 122 | FROM 123 | interest_map AS m 124 | LEFT JOIN interest_metrics AS im ON m.id = im.interest_id :: int 125 | GROUP BY 126 | interest_name, 127 | id, 128 | interest_summary, 129 | created_at, 130 | last_modified, 131 | _month, 132 | _year, 133 | month_year, 134 | interest_id, 135 | composition, 136 | index_value, 137 | ranking, 138 | percentile_ranking 139 | ) 140 | SELECT 141 | COUNT(*) 142 | FROM 143 | joined_table 144 | WHERE 145 | created_at > month_year 146 | ORDER BY 147 | 1; 148 | 149 | /* -------------------- 150 | Interest Analysis 151 | 1. Which interests have been present in all month_year dates in our dataset? 152 | --------------------*/ 153 | 154 | WITH interests AS ( 155 | SELECT 156 | id, 157 | interest_name 158 | FROM 159 | interest_map AS m 160 | LEFT JOIN interest_metrics AS im ON m.id = im.interest_id :: int 161 | GROUP BY 162 | 1, 163 | 2 164 | HAVING 165 | COUNT(interest_id) = 14 166 | ) 167 | SELECT 168 | interest_name 169 | FROM 170 | interests 171 | ORDER BY 172 | 1; 173 | 174 | -- 2. Using this same total_months measure - calculate the cumulative percentage of all records starting at 14 months - which total_months value passes the 90% cumulative percentage value? 
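-- Note: total_months counts in how many of the 14 monthly snapshots each
-- interest appears. The ROW_NUMBER inside the CTE exists only so that MAX(rank)
-- returns the number of interests per total_months bucket; the window
-- SUM(...) OVER (ORDER BY total_months) then turns those bucket counts into a
-- running percentage, with cum_top_reversed reading it from the 14-month end
-- as the question asks.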
175 | 176 | WITH counted_months AS ( 177 | SELECT 178 | interest_id, 179 | COUNT(interest_id) total_months, 180 | ROW_NUMBER() OVER( 181 | PARTITION BY COUNT(interest_id) 182 | ORDER BY 183 | COUNT(interest_id) 184 | ) AS rank 185 | FROM 186 | interest_metrics AS im 187 | GROUP BY 188 | 1 189 | HAVING 190 | COUNT(interest_id) > 0 191 | ) 192 | SELECT 193 | total_months, 194 | MAX(rank) AS number_of_interests, 195 | CAST( 196 | 100 * SUM(MAX(rank)) OVER ( 197 | ORDER BY 198 | total_months 199 | ) / SUM(MAX(rank)) OVER () AS numeric(10, 2) 200 | ) cum_top, 201 | CAST( 202 | 100 - 100 * SUM(MAX(rank)) OVER ( 203 | ORDER BY 204 | total_months 205 | ) / SUM(MAX(rank)) OVER () AS numeric(10, 2) 206 | ) cum_top_reversed 207 | FROM 208 | counted_months 209 | GROUP BY 210 | total_months 211 | ORDER BY 212 | 1; 213 | 214 | -- 3. If we were to remove all interest_id values which are lower than the total_months value we found in the previous question - how many total data points would we be removing? 215 | 216 | WITH interests AS ( 217 | SELECT 218 | interest_id 219 | FROM 220 | interest_metrics AS im 221 | GROUP BY 222 | 1 223 | HAVING 224 | COUNT(interest_id) < 6 225 | ) 226 | SELECT 227 | COUNT(interest_id) AS number_of_interests 228 | FROM 229 | interests 230 | ORDER BY 231 | 1; 232 | 233 | -- 4. Does this decision make sense to remove these data points from a business perspective? Use an example where there are all 14 months present to a removed interest example for your arguments - think about what it means to have less months present from a segment perspective. 234 | 235 | SELECT 236 | im.month_year, 237 | COUNT(interest_id) AS number_of_excluded_interests, 238 | number_of_included_interests, 239 | ROUND( 240 | 100 *( 241 | COUNT(interest_id) / number_of_included_interests :: numeric 242 | ), 243 | 1 244 | ) AS percent_of_excluded 245 | FROM 246 | interest_metrics AS im 247 | JOIN ( 248 | SELECT 249 | month_year, 250 | COUNT(interest_id) AS number_of_included_interests 251 | FROM 252 | interest_metrics AS im 253 | WHERE 254 | month_year IS NOT NULL 255 | AND interest_id :: int IN ( 256 | SELECT 257 | interest_id :: int 258 | FROM 259 | interest_metrics 260 | GROUP BY 261 | 1 262 | HAVING 263 | COUNT(interest_id) > 5 264 | ) 265 | GROUP BY 266 | 1 267 | ) i ON im.month_year = i.month_year 268 | WHERE 269 | im.month_year IS NOT NULL 270 | AND interest_id :: int IN ( 271 | SELECT 272 | interest_id :: int 273 | FROM 274 | interest_metrics 275 | GROUP BY 276 | 1 277 | having 278 | COUNT(interest_id) < 6 279 | ) 280 | GROUP BY 281 | 1, 282 | 3 283 | ORDER BY 284 | 1; 285 | 286 | -- 5. After removing these interests - how many unique interests are there for each month? 287 | 288 | SELECT 289 | month_year, 290 | COUNT(interest_id) AS number_of_interests 291 | FROM 292 | interest_metrics AS im 293 | WHERE 294 | month_year IS NOT NULL 295 | AND interest_id :: int IN ( 296 | SELECT 297 | interest_id :: int 298 | FROM 299 | interest_metrics 300 | GROUP BY 301 | 1 302 | HAVING 303 | COUNT(interest_id) > 5 304 | ) 305 | GROUP BY 306 | 1 307 | ORDER BY 308 | 1; 309 | 310 | /* -------------------- 311 | Segment Analysis 312 | 1. Using our filtered dataset by removing the interests with less than 6 months worth of data, which are the top 10 and bottom 10 interests which have the largest composition values in any month_year? 
Only use the maximum composition value for each interest but you must keep the corresponding month_year 313 | --------------------*/ 314 | 315 | SELECT 316 | interests.month_year, 317 | interests.interest_name, 318 | interests.composition, 319 | i_max_new.composition AS max_composition, 320 | i_max_new.month_year AS max_composition_month 321 | FROM 322 | ( 323 | ( 324 | WITH max_interests AS ( 325 | SELECT 326 | month_year, 327 | interest_name, 328 | composition, 329 | RANK() OVER ( 330 | PARTITION BY interest_name 331 | ORDER BY 332 | composition DESC 333 | ) AS max_rank 334 | FROM 335 | interest_metrics AS im 336 | JOIN interest_map AS m ON m.id = im.interest_id :: int 337 | WHERE 338 | month_year IS NOT NULL 339 | AND interest_id :: int in ( 340 | SELECT 341 | interest_id :: int 342 | FROM 343 | interest_metrics 344 | GROUP BY 345 | 1 346 | HAVING 347 | COUNT(interest_id) > 5 348 | ) 349 | GROUP BY 350 | 1, 351 | 2, 352 | 3 353 | ) 354 | SELECT 355 | month_year, 356 | interest_name, 357 | composition 358 | FROM 359 | max_interests 360 | WHERE 361 | max_rank = 1 362 | GROUP BY 363 | 1, 364 | 2, 365 | 3 366 | ORDER BY 367 | 3 DESC 368 | LIMIT 369 | 10 370 | ) 371 | UNION 372 | ( 373 | WITH min_interests AS ( 374 | SELECT 375 | month_year, 376 | interest_name, 377 | composition, 378 | RANK() OVER ( 379 | PARTITION BY interest_name 380 | ORDER BY 381 | composition 382 | ) AS min_rank 383 | FROM 384 | interest_metrics AS im 385 | JOIN interest_map AS m ON m.id = im.interest_id :: int 386 | WHERE 387 | month_year IS NOT NULL 388 | AND interest_id :: int in ( 389 | SELECT 390 | interest_id :: int 391 | FROM 392 | interest_metrics 393 | GROUP BY 394 | 1 395 | HAVING 396 | COUNT(interest_id) > 5 397 | ) 398 | GROUP BY 399 | 1, 400 | 2, 401 | 3 402 | ) 403 | SELECT 404 | month_year, 405 | interest_name, 406 | composition 407 | FROM 408 | min_interests 409 | WHERE 410 | min_rank = 1 411 | GROUP BY 412 | 1, 413 | 2, 414 | 3 415 | ORDER BY 416 | 3 417 | LIMIT 418 | 10 419 | ) 420 | ) AS interests 421 | JOIN ( 422 | WITH max_interests AS ( 423 | SELECT 424 | month_year, 425 | interest_name, 426 | composition, 427 | RANK() OVER ( 428 | PARTITION BY interest_name 429 | ORDER BY 430 | composition DESC 431 | ) AS max_rank 432 | FROM 433 | interest_metrics AS im 434 | JOIN interest_map AS m ON m.id = im.interest_id :: int 435 | WHERE 436 | month_year IS NOT NULL 437 | AND interest_id :: int in ( 438 | SELECT 439 | interest_id :: int 440 | FROM 441 | interest_metrics 442 | GROUP BY 443 | 1 444 | HAVING 445 | COUNT(interest_id) > 5 446 | ) 447 | GROUP BY 448 | 1, 449 | 2, 450 | 3 451 | ) 452 | SELECT 453 | month_year, 454 | interest_name, 455 | composition 456 | FROM 457 | max_interests 458 | WHERE 459 | max_rank = 1 460 | GROUP BY 461 | 1, 462 | 2, 463 | 3 464 | ORDER BY 465 | 3 DESC 466 | ) i_max_new on interests.interest_name = i_max_new.interest_name 467 | ORDER BY 468 | 3 DESC; 469 | 470 | -- 2. Which 5 interests had the lowest average ranking value? 
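-- Note: ranking is ordinal (1 = top), so "lowest average ranking value" can be
-- read two ways. The query below ranks by AVG(ranking) DESC, i.e. it returns the
-- five interests whose average ranking number is highest. If the literal reading
-- is wanted - the five smallest AVG(ranking) values - the same filtered join
-- reduces to this minimal sketch:

SELECT
  interest_name,
  AVG(ranking) :: numeric(10, 2) AS avg_ranking
FROM
  interest_metrics AS im
  JOIN interest_map AS m ON m.id = im.interest_id :: int
WHERE
  month_year IS NOT NULL
  AND interest_id :: int IN (
    SELECT
      interest_id :: int
    FROM
      interest_metrics
    GROUP BY
      1
    HAVING
      COUNT(interest_id) > 5
  )
GROUP BY
  1
ORDER BY
  AVG(ranking)
LIMIT
  5;

-- The RANK()-based solution: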
471 | 472 | WITH ranking AS ( 473 | SELECT 474 | interest_name, 475 | AVG(ranking) :: numeric(10, 2) AS avg_ranking, 476 | RANK() OVER ( 477 | ORDER BY 478 | AVG(ranking) DESC 479 | ) AS rank 480 | FROM 481 | interest_metrics AS im 482 | JOIN interest_map AS m ON m.id = im.interest_id :: int 483 | WHERE 484 | month_year IS NOT NULL 485 | AND interest_id :: int IN ( 486 | SELECT 487 | interest_id :: int 488 | FROM 489 | interest_metrics 490 | GROUP BY 491 | 1 492 | HAVING 493 | COUNT(interest_id) > 5 494 | ) 495 | GROUP BY 496 | 1 497 | ) 498 | SELECT 499 | interest_name, 500 | avg_ranking 501 | FROM 502 | ranking 503 | WHERE 504 | rank between 0 505 | AND 5; 506 | 507 | -- 3. Which 5 interests had the largest standard deviation in their percentile_ranking value? 508 | 509 | WITH ranking AS ( 510 | SELECT 511 | id, 512 | interest_name, 513 | STDDEV(percentile_ranking) :: numeric(10, 2) AS standard_deviation, 514 | RANK() OVER ( 515 | ORDER BY 516 | STDDEV(percentile_ranking) DESC 517 | ) AS rank 518 | FROM 519 | interest_metrics AS im 520 | JOIN interest_map AS m ON m.id = im.interest_id :: int 521 | WHERE 522 | month_year IS NOT NULL 523 | AND interest_id :: int IN ( 524 | SELECT 525 | interest_id :: int 526 | FROM 527 | interest_metrics 528 | GROUP BY 529 | 1 530 | having 531 | count(interest_id) > 5 532 | ) 533 | GROUP BY 534 | 1, 535 | 2 536 | ) 537 | SELECT 538 | interest_name, 539 | standard_deviation 540 | FROM 541 | ranking 542 | WHERE 543 | rank between 0 544 | AND 5; 545 | 546 | -- 4. For the 5 interests found in the previous question - what was minimum and maximum percentile_ranking values for each interest and its corresponding year_month value? Can you describe what is happening for these 5 interests? 547 | 548 | WITH ranking AS ( 549 | SELECT 550 | month_year, 551 | id, 552 | interest_name, 553 | percentile_ranking, 554 | RANK() OVER ( 555 | PARTITION BY id 556 | ORDER BY 557 | percentile_ranking 558 | ) AS min_rank, 559 | RANK() OVER ( 560 | PARTITION BY id 561 | ORDER BY 562 | percentile_ranking DESC 563 | ) AS max_rank 564 | FROM 565 | interest_metrics AS im 566 | JOIN interest_map AS m ON m.id = im.interest_id :: int 567 | WHERE 568 | month_year IS NOT NULL 569 | AND interest_id :: int IN ( 570 | SELECT 571 | interest_id :: int 572 | FROM 573 | interest_metrics 574 | GROUP BY 575 | 1 576 | HAVING 577 | COUNT(interest_id) > 5 578 | ) 579 | AND id IN ( 580 | WITH ranking AS ( 581 | SELECT 582 | id, 583 | interest_name, 584 | STDDEV(percentile_ranking) :: numeric(10, 2) AS standard_deviation, 585 | RANK() OVER ( 586 | ORDER BY 587 | STDDEV(percentile_ranking) DESC 588 | ) AS rank 589 | FROM 590 | interest_metrics AS im 591 | JOIN interest_map AS m ON m.id = im.interest_id :: int 592 | WHERE 593 | month_year IS NOT NULL 594 | AND interest_id :: int IN ( 595 | SELECT 596 | interest_id :: int 597 | FROM 598 | interest_metrics 599 | GROUP BY 600 | 1 601 | having 602 | count(interest_id) > 5 603 | ) 604 | GROUP BY 605 | 1, 606 | 2 607 | ) 608 | SELECT 609 | id 610 | FROM 611 | ranking 612 | WHERE 613 | rank between 0 614 | AND 5 615 | ) 616 | GROUP BY 617 | 1, 618 | 2, 619 | 3, 620 | 4 621 | ) 622 | SELECT 623 | month_year, 624 | interest_name, 625 | percentile_ranking 626 | FROM 627 | ranking 628 | WHERE 629 | min_rank = 1 630 | or max_rank = 1 631 | GROUP BY 632 | 1, 633 | 2, 634 | 3 635 | ORDER BY 636 | 2, 637 | 3 DESC; 638 | 639 | -- 5. How would you describe our customers in this segment based off their composition and ranking values? 
What sort of products or services should we show to these customers and what should we avoid? 640 | -- Non-coding question 641 | 642 | /* -------------------- 643 | Index Analysis 644 | The index_value is a measure which can be used to reverse calculate the average composition for Fresh Segments’ clients. 645 | Average composition can be calculated by dividing the composition column by the index_value column rounded to 2 decimal places. 646 | 1. What is the top 10 interests by the average composition for each month? 647 | --------------------*/ 648 | 649 | WITH ranking AS ( 650 | SELECT 651 | month_year, 652 | id, 653 | interest_name, 654 | avg_composition, 655 | RANK() OVER ( 656 | PARTITION BY month_year 657 | ORDER BY 658 | avg_composition DESC 659 | ) AS max_rank 660 | FROM 661 | interest_metrics AS im 662 | JOIN interest_map AS m ON m.id = im.interest_id :: int, 663 | LATERAL( 664 | SELECT 665 | (composition / index_value) :: numeric(10, 2) AS avg_composition 666 | ) ac 667 | WHERE 668 | month_year IS NOT NULL 669 | AND interest_id :: int IN ( 670 | SELECT 671 | interest_id :: int 672 | FROM 673 | interest_metrics 674 | GROUP BY 675 | 1 676 | HAVING 677 | COUNT(interest_id) > 5 678 | ) 679 | GROUP BY 680 | 1, 681 | 2, 682 | 3, 683 | 4 684 | ) 685 | SELECT 686 | month_year, 687 | interest_name, 688 | avg_composition 689 | FROM 690 | ranking 691 | WHERE 692 | max_rank between 1 693 | AND 10 694 | ORDER BY 695 | 1, 696 | 3 DESC; 697 | 698 | -- 2. For all of these top 10 interests - which interest appears the most often? 699 | 700 | WITH ranking AS ( 701 | SELECT 702 | month_year, 703 | id, 704 | interest_name, 705 | avg_composition, 706 | RANK() OVER ( 707 | PARTITION BY month_year 708 | ORDER BY 709 | avg_composition DESC 710 | ) AS max_rank 711 | FROM 712 | interest_metrics AS im 713 | JOIN interest_map AS m on m.id = im.interest_id :: int, 714 | LATERAL( 715 | SELECT 716 | (composition / index_value) :: numeric(10, 2) AS avg_composition 717 | ) ac 718 | WHERE 719 | month_year IS NOT NULL 720 | AND interest_id :: int IN ( 721 | SELECT 722 | interest_id :: int 723 | FROM 724 | interest_metrics 725 | GROUP BY 726 | 1 727 | HAVING 728 | COUNT(interest_id) > 5 729 | ) 730 | GROUP BY 731 | 1, 732 | 2, 733 | 3, 734 | 4 735 | ) 736 | SELECT 737 | interest_name, 738 | COUNT(interest_name) AS months_in_top_1 739 | FROM 740 | ranking 741 | WHERE 742 | max_rank = 1 743 | GROUP BY 744 | 1 745 | ORDER BY 746 | 2 DESC; 747 | 748 | -- 3. What is the average of the average composition for the top 10 interests for each month? 
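-- Note: as in question 1 above, avg_composition (composition / index_value) is
-- computed once in a LATERAL item so that its alias can be reused by the window
-- function. The LATERAL pattern in isolation, as a minimal sketch against the
-- same tables:

SELECT
  month_year,
  interest_id,
  ac.avg_composition
FROM
  interest_metrics,
  LATERAL(
    SELECT
      (composition / index_value) :: numeric(10, 2) AS avg_composition
  ) ac
WHERE
  month_year IS NOT NULL
LIMIT
  5;

-- The full solution wraps the question 1 ranking query in a derived table and
-- averages its top 10 interests within each month: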
749 | 750 | SELECT 751 | month_year, 752 | AVG(avg_composition) :: numeric(10, 2) AS average_rating 753 | FROM 754 | ( 755 | WITH ranking AS ( 756 | SELECT 757 | month_year, 758 | id, 759 | interest_name, 760 | avg_composition, 761 | RANK() OVER ( 762 | PARTITION BY month_year 763 | ORDER BY 764 | avg_composition DESC 765 | ) AS max_rank 766 | FROM 767 | interest_metrics AS im 768 | JOIN interest_map AS m ON m.id = im.interest_id :: int, 769 | LATERAL( 770 | SELECT 771 | (composition / index_value) :: numeric(10, 2) AS avg_composition 772 | ) ac 773 | WHERE 774 | month_year IS NOT NULL 775 | AND interest_id :: int IN ( 776 | SELECT 777 | interest_id :: int 778 | FROM 779 | interest_metrics 780 | GROUP BY 781 | 1 782 | HAVING 783 | COUNT(interest_id) > 5 784 | ) 785 | GROUP BY 786 | 1, 787 | 2, 788 | 3, 789 | 4 790 | ) 791 | SELECT 792 | month_year, 793 | interest_name, 794 | avg_composition 795 | FROM 796 | ranking 797 | WHERE 798 | max_rank between 1 799 | AND 10 800 | ) r 801 | GROUP BY 802 | 1 803 | ORDER BY 804 | 1; 805 | 806 | -- 4. What is the 3 month rolling average of the max average composition value from September 2018 to August 2019 and include the previous top ranking interests in the same output shown below 807 | 808 | SELECT 809 | * 810 | FROM 811 | ( 812 | WITH ranking AS ( 813 | SELECT 814 | month_year, 815 | id, 816 | interest_name, 817 | avg_composition, 818 | RANK() OVER ( 819 | PARTITION BY month_year 820 | ORDER BY 821 | avg_composition DESC 822 | ) AS max_rank 823 | FROM 824 | interest_metrics AS im 825 | JOIN interest_map AS m ON m.id = im.interest_id :: int, 826 | LATERAL( 827 | SELECT 828 | (composition / index_value) :: numeric(10, 2) AS avg_composition 829 | ) ac 830 | WHERE 831 | month_year IS NOT NULL 832 | AND interest_id :: int IN ( 833 | SELECT 834 | interest_id :: int 835 | FROM 836 | interest_metrics 837 | GROUP BY 838 | 1 839 | HAVING 840 | COUNT(interest_id) > 5 841 | ) 842 | GROUP BY 843 | 1, 844 | 2, 845 | 3, 846 | 4 847 | ) 848 | SELECT 849 | month_year, 850 | interest_name, 851 | avg_composition AS max_index_composition, 852 | ( 853 | AVG(avg_composition) OVER( 854 | ORDER BY 855 | month_year ROWS BETWEEN 2 PRECEDING 856 | AND CURRENT ROW 857 | ) 858 | ) :: numeric(10, 2) AS _3_month_moving_avg, 859 | CONCAT( 860 | LAG(interest_name) OVER ( 861 | ORDER BY 862 | month_year 863 | ), 864 | ': ', 865 | LAG(avg_composition) OVER ( 866 | ORDER BY 867 | month_year 868 | ) 869 | ) AS _1_month_ago, 870 | CONCAT( 871 | LAG(interest_name, 2) OVER ( 872 | ORDER BY 873 | month_year 874 | ), 875 | ': ', 876 | LAG(avg_composition, 2) OVER ( 877 | ORDER BY 878 | month_year 879 | ) 880 | ) AS _2_month_ago 881 | FROM 882 | ranking 883 | WHERE 884 | max_rank = 1 885 | ) r 886 | WHERE 887 | month_year > '2018-08-01' 888 | ORDER BY 889 | 1; 890 | 891 | -- 5. Provide a possible reason why the max average composition might change from month to month? Could it signal something is not quite right with the overall business model for Fresh Segments? 
892 | -- Non-coding question 893 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 8-Week-SQL-Challenge 2 | Solutions for the #8WeekSQLChallenge SQL case studies: https://8weeksqlchallenge.com 3 | 4 | | Name | Description | Status | 5 | |--------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------| 6 | | **[Case Study #1 - Danny's Diner](https://github.com/muryulia/8-Week-SQL-Challenge/tree/main/Case%20Study%20%231%20-%20Danny's%20Diner)** | Danny’s Diner is in need of your assistance to help the restaurant stay afloat. Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they’ve spent and also which menu items are their favourite. | Completed | 7 | | **[Case Study #2 - Pizza Runner](https://github.com/muryulia/8-Week-SQL-Challenge/tree/main/Case%20Study%20%232%20-%20Pizza%20Runner)** | Danny requires further assistance to clean his data and apply some basic calculations so he can better direct his runners and optimise Pizza Runner’s operations. | Completed | 8 | | **[Case Study #3 - Foodie-Fi](https://github.com/muryulia/8-Week-SQL-Challenge/tree/main/Case%20Study%20%233%20-%20Foodie-Fi)** | This case study focuses on using subscription style digital data to answer important business questions. | Completed | 9 | | **[Case Study #4 - Data Bank](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%234%20-%20Data%20Bank/Solution.md)** | Data Bank runs just like any other digital bank - but it isn’t only for banking activities; they also have the world’s most secure distributed data storage platform! The management team at Data Bank want to increase their total customer base - but also need some help tracking just how much data storage their customers will need. | In Progress | 10 | | **[Case Study #5 - Data Mart](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%235%20-%20Data%20Mart/Solution.md)** | Data Mart is Danny’s latest venture and after running international operations for his online supermarket that specialises in fresh produce - Danny is asking for your support to analyse his sales performance. | Completed | 11 | | **[Case Study #6 - Clique Bait](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%236%20-%20Clique%20Bait/Solution.md)** | In this case study - you are required to support Danny’s vision and analyse his dataset and come up with creative solutions to calculate funnel fallout rates for the Clique Bait online store. | Completed Partially | 12 | | **[Case Study #7 - Balanced Tree Clothing Co.](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%237%20-%20Balanced%20Tree%20Clothing%20Co./Solution.md)** | Danny, the CEO of this trendy fashion company, has asked you to assist the merchandising team in analysing their sales performance and generating a basic financial report to share with the wider business. | Completed Partially |
13 | | **[Case Study #8 - Fresh Segments](https://github.com/muryulia/8-Week-SQL-Challenge/blob/main/Case%20Study%20%238%20-%20Fresh%20Segments/Solution.md)** | Danny created Fresh Segments, a digital marketing agency that helps other businesses analyse trends in online ad click behaviour for their unique customer base. Danny has asked for your assistance to analyse aggregated metrics for an example client and provide some high level insights about the customer list and their interests. | Completed | 14 | 15 | Solved on Postgres using various SQL techniques like conditional `WHERE` clauses, data aggregation with `GROUP BY` and ordering with `ORDER BY`, `CASE WHEN` statements, string transformations, regular expressions, datetime manipulation, common table expressions (CTEs), subqueries and nested queries, complex table joins (inner, outer, left, right and lateral; cross joins were not needed but could have been used too), window functions (`row_number()`, `rank()`, `dense_rank()`, `lag()`, `lead()`). 16 | 17 | Some cases include duplicate handling during pre-processing, exploratory data analysis and summary statistics. 18 | --------------------------------------------------------------------------------