├── .gitattributes
├── IMG
    ├── pngwing.com.png
    ├── z2513921830835_b2f645dc08b9ae4ecfd59aa2f37f74f5.jpg
    ├── z2513953538301_86b8f067910a3700d4d4199e7bf5e690.jpg
    └── z2513958512711_8698af869ba03d103dd7a1ef2fe33e79.jpg
├── health-analytics.sql
└── README.md


/.gitattributes:
--------------------------------------------------------------------------------
*.sql linguist-detectable=true
*.sql linguist-language=sql
*.sql text


--------------------------------------------------------------------------------
/IMG/pngwing.com.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ndleah/health-analysis/HEAD/IMG/pngwing.com.png


--------------------------------------------------------------------------------
/IMG/z2513921830835_b2f645dc08b9ae4ecfd59aa2f37f74f5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ndleah/health-analysis/HEAD/IMG/z2513921830835_b2f645dc08b9ae4ecfd59aa2f37f74f5.jpg


--------------------------------------------------------------------------------
/IMG/z2513953538301_86b8f067910a3700d4d4199e7bf5e690.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ndleah/health-analysis/HEAD/IMG/z2513953538301_86b8f067910a3700d4d4199e7bf5e690.jpg


--------------------------------------------------------------------------------
/IMG/z2513958512711_8698af869ba03d103dd7a1ef2fe33e79.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ndleah/health-analysis/HEAD/IMG/z2513958512711_8698af869ba03d103dd7a1ef2fe33e79.jpg


--------------------------------------------------------------------------------
/health-analytics.sql:
--------------------------------------------------------------------------------
-- For more detailed information, please read the README.md file in this folder for step-by-step guidance.
-- 1. How many unique users exist in the logs dataset?
SELECT COUNT(DISTINCT id)
FROM health.user_logs;

/* Result:
|count                                   |
|----------------------------------------|
|554                                     |
*/


-- For questions 2-8 (except question 6, which queries health.user_logs directly), I created a temporary table:
-- > Step 1: First, I ran a DROP TABLE IF EXISTS statement to remove any previously created copy of the table:
DROP TABLE IF EXISTS user_measure_count;

-- > Step 2: Next, I created a new temporary table from the results of the query below:
CREATE TEMP TABLE user_measure_count AS
SELECT
  id,
  COUNT(*) AS measure_count,
  COUNT(DISTINCT measure) AS unique_measures
FROM health.user_logs
GROUP BY 1;


-- 2. How many total measurements do we have per user on average?
SELECT
  ROUND(AVG(measure_count), 2) AS mean_value
FROM user_measure_count;

/* Result:
|mean_value                              |
|----------------------------------------|
|79.23                                   |
*/


-- 3. What about the median number of measurements per user?
SELECT
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY measure_count) AS median_value
FROM user_measure_count;

/* Result:
|median_value                            |
|----------------------------------------|
|2                                       |
*/


-- 4. How many users have 3 or more measurements?
SELECT COUNT(*)
FROM user_measure_count
WHERE measure_count >= 3;

/* Result:
|count                                   |
|----------------------------------------|
|209                                     |
*/


-- 5. How many users have 1,000 or more measurements?
SELECT COUNT(*)
FROM user_measure_count
WHERE measure_count >= 1000;

/* Result:
|count                                   |
|----------------------------------------|
|5                                       |
*/

-- 6. How many users have logged blood glucose measurements?
SELECT
  COUNT(DISTINCT id)
FROM health.user_logs
WHERE measure = 'blood_glucose';

/* Result:
|count                                   |
|----------------------------------------|
|325                                     |
*/


-- 7. How many users have at least 2 types of measurements?
SELECT
  COUNT(*)
FROM user_measure_count
WHERE unique_measures >= 2;

/* Result:
|count                                   |
|----------------------------------------|
|204                                     |
*/

-- 8. How many users have all 3 measures: blood glucose, weight and blood pressure?
SELECT
  COUNT(*)
FROM user_measure_count
WHERE unique_measures = 3;

/* Result:
|count                                   |
|----------------------------------------|
|50                                      |
*/


-- 9. What are the median systolic/diastolic blood pressure values?
SELECT
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY systolic) AS median_systolic,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY diastolic) AS median_diastolic
FROM health.user_logs
WHERE measure = 'blood_pressure';

/* Result:
|median_systolic|median_diastolic|
|---------------|----------------|
|126            |79              |
*/


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
![Star Badge](https://img.shields.io/static/v1?label=%F0%9F%8C%9F&message=If%20Useful&style=style=flat&color=BC4E99)
![Open Source Love](https://badges.frapsoft.com/os/v1/open-source.svg?v=103)
[![View My Profile](https://img.shields.io/badge/View-My_Profile-green?logo=GitHub)](https://github.com/ndleah)
[![View Repositories](https://img.shields.io/badge/View-My_Repositories-blue?logo=GitHub)](https://github.com/ndleah?tab=repositories)

# Health Analytics Case Study

> This case study is part of the [Serious SQL](https://www.datawithdanny.com) course by [Danny Ma](https://www.linkedin.com/in/datawithdanny/).

## 📕 **Table of contents**

* 🛠️ [Overview](#️-overview)
* 🚀 [Solutions](#-solutions)
* 💻 [Key Highlights](#-key-highlight)


## 🛠️ Overview
In the **Health Analytics Mini Case Study**, I queried the data to answer the following questions:
1. How many `unique users` exist in the logs dataset?
2. How many total `measurements` do we have `per user on average`?
3. What about the `median` number of measurements per user?
4. How many users have `3 or more` measurements?
5. How many users have `1,000 or more` measurements?
6. How many users have logged `blood glucose` measurements?
7. How many users have `at least 2 types` of measurements?
8. How many users have all 3 measures: `blood glucose, weight and blood pressure`?
9. What are the `median systolic/diastolic` **blood pressure** values?

---
## 🚀 Solutions

![Question 1](https://img.shields.io/badge/Question-1-971901)
### **How many unique users exist in the logs dataset?**
```sql
SELECT COUNT(DISTINCT id)
FROM health.user_logs;
```

|count                                   |
|----------------------------------------|
|554                                     |



**`Note:` For questions 2-8 (except question 6), I created a temporary table:**

**Step 1:** First, I ran a `DROP TABLE IF EXISTS` statement to remove any previously created copy of the table:
```sql
DROP TABLE IF EXISTS user_measure_count;
```
**Step 2:** Next, I created a new **temporary table** from the results of the query below:
```sql
CREATE TEMP TABLE user_measure_count AS
SELECT
  id,
  COUNT(*) AS measure_count,
  COUNT(DISTINCT measure) AS unique_measures
FROM health.user_logs
GROUP BY 1;
```

![Question 2](https://img.shields.io/badge/Question-2-971901)
### **How many total measurements do we have per user on average?**
```sql
SELECT
  ROUND(AVG(measure_count), 2) AS mean_value
FROM user_measure_count;
```

|mean_value                              |
|----------------------------------------|
|79.23                                   |

---

![Question 3](https://img.shields.io/badge/Question-3-971901)
### **What about the median number of measurements per user?**
```sql
SELECT
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY measure_count) AS median_value
FROM user_measure_count;
```

|median_value                            |
|----------------------------------------|
|2                                       |



![Question 4](https://img.shields.io/badge/Question-4-971901)
### **How many users have 3 or more measurements?**
```sql
SELECT COUNT(*)
FROM user_measure_count
WHERE measure_count >= 3;
```

|count                                   |
|----------------------------------------|
|209                                     |


![Question 5](https://img.shields.io/badge/Question-5-971901)
### **How many users have 1,000 or more measurements?**
```sql
SELECT COUNT(*)
FROM user_measure_count
WHERE measure_count >= 1000;
```

|count                                   |
|----------------------------------------|
|5                                       |

---

![Question 6](https://img.shields.io/badge/Question-6-971901)
### **How many users have logged blood glucose measurements?**
```sql
SELECT
  COUNT(DISTINCT id)
FROM health.user_logs
WHERE measure = 'blood_glucose';
```

|count                                   |
|----------------------------------------|
|325                                     |

---

![Question 7](https://img.shields.io/badge/Question-7-971901)
### **How many users have at least 2 types of measurements?**
```sql
SELECT
  COUNT(*)
FROM user_measure_count
WHERE unique_measures >= 2;
```


|count                                   |
|----------------------------------------|
|204                                     |

---

![Question 8](https://img.shields.io/badge/Question-8-971901)
### **How many users have all 3 measures: blood glucose, weight and blood pressure?**
```sql
SELECT
  COUNT(*)
FROM user_measure_count
WHERE unique_measures = 3;
```

|count                                   |
|----------------------------------------|
|50                                      |

---

![Question 9](https://img.shields.io/badge/Question-9-971901)
### **What are the median systolic/diastolic blood pressure values?**
```sql
SELECT
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY systolic) AS median_systolic,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY diastolic) AS median_diastolic
FROM health.user_logs
WHERE measure = 'blood_pressure';
```

|median_systolic|median_diastolic|
|---------------|----------------|
|126            |79              |

---
## 💻 Key Highlight
> **Initial thoughts:**
Even though this is a short assignment covering basic SQL syntax, I ran into problems several times while solving it. However, it gave me a better understanding of data exploration with SQL, from theory to real-life application.

Some of the main areas covered in this case study include:
* **Sorting Values**
* **Inspecting Row Counts**
* **Duplicates & Record Frequency Review**
* **Summary Statistics** `(MEAN, MEDIAN)`
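
The areas listed above can be illustrated with a few extra queries in the same style. These are not part of the original solutions and no results are shown for them: they are a sketch that assumes PostgreSQL (the `PERCENTILE_CONT ... WITHIN GROUP` and `CREATE TEMP TABLE` syntax used above works there), the `health.user_logs` table used throughout (only its `measure` column is referenced by name), and the `user_measure_count` temporary table from Step 2, which must still exist in the current session.

```sql
-- Inspecting row counts and duplicates (sketch, assumes PostgreSQL):
-- compare the total number of rows with the number of fully distinct rows.
SELECT
  (SELECT COUNT(*) FROM health.user_logs) AS total_rows,
  (SELECT COUNT(*)
     FROM (SELECT DISTINCT * FROM health.user_logs) AS distinct_logs) AS distinct_rows;

-- Record frequency review, sorted by frequency: how many rows each
-- measure contributes and its percentage share of all rows.
SELECT
  measure,
  COUNT(*) AS frequency,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS percentage
FROM health.user_logs
GROUP BY measure
ORDER BY frequency DESC;

-- Summary statistics beyond mean and median, computed on the
-- user_measure_count temp table from Step 2 (must exist in this session).
SELECT
  MIN(measure_count) AS min_value,
  MAX(measure_count) AS max_value,
  MODE() WITHIN GROUP (ORDER BY measure_count) AS mode_value,
  ROUND(STDDEV(measure_count), 2) AS stddev_value
FROM user_measure_count;
```

If `total_rows` is larger than `distinct_rows`, the table contains exact duplicate records, and those duplicates are also counted in `measure_count` because the temp table uses `COUNT(*)`.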