├── slides
    ├── timings.json
    └── chapter_1_dfa96b92ebbafae7cbc930ac36eff790.md
├── course.yml
├── datasets
    └── README.md
├── chapter1.md
├── requirements.sh
└── README.md

/slides/timings.json:
--------------------------------------------------------------------------------
1 | []
--------------------------------------------------------------------------------
/course.yml:
--------------------------------------------------------------------------------
1 | title: Case Study Data Driven Decision Making with SQL
2 | description: A description of the course.
3 | programming_language: sql
4 | from: 'postgresql-base-prod:22' # 'msft-sql-base-prod:40'
5 |
--------------------------------------------------------------------------------
/datasets/README.md:
--------------------------------------------------------------------------------
1 | # Datasets
2 |
3 | Upload datasets to this folder, then delete this file. Datasets should be under 10Mb; if not, speak to your Curriculum Lead.
4 |
5 | https://authoring.datacamp.com/courses/assets.html
6 |
--------------------------------------------------------------------------------
/chapter1.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: 'Template Chapter 1'
3 | description: 'This is a template chapter.'
4 | ---
5 |
6 | ## Ex 1.1
7 |
8 | ```yaml
9 | type: NormalExercise
10 | key: 84620e413f
11 | lang: sql
12 | xp: 100
13 | skills: 1
14 | ```
15 |
16 | Do some data science.
17 |
18 | `@instructions`
19 |
20 |
21 | `@hint`
22 |
23 |
24 | `@pre_exercise_code`
25 | ```{python}
26 |
27 | ```
28 |
29 | `@sample_code`
30 | ```{sql}
31 |
32 | ```
33 |
34 | `@solution`
35 | ```{sql}
36 |
37 | ```
38 |
39 | `@sct`
40 | ```{python}
41 |
42 | ```
43 |
44 | ---
45 |
46 | ## Case Study: Data Driven Decision Making with SQL
47 |
48 | ```yaml
49 | type: VideoExercise
50 | key: d12207f973
51 | xp: 50
52 | ```
53 |
54 | `@projector_key`
55 | dfa96b92ebbafae7cbc930ac36eff790
56 |
--------------------------------------------------------------------------------
/requirements.sh:
--------------------------------------------------------------------------------
1 | # Use this file to install Linux software packages into the course image.
2 | # There is a list of available Linux packages at
3 | # https://packages.debian.org/testing/allpackages
4 |
5 | pip3 install --no-deps sqlwhat-ext==0.0.1
6 |
7 | # Write the SQL commands to COPY or BULK INSERT the data from CSV files into tables.
8 | # These should be in scripts named datasets/**DATABASENAME**.sql.
9 | # Then uncomment the code below, replacing **COURSEID** and **DATABASENAME**.
10 |
11 | # wget https://s3.amazonaws.com/assets.datacamp.com/production/course_**COURSEID**/datasets/**DATABASENAME**.sql
12 |
13 | # service postgresql start \
14 | # && sudo -u postgres createdb --owner repl **DATABASENAME** \
15 | # && sudo -u repl psql --echo-all --dbname **DATABASENAME** --file **DATABASENAME**.sql \
16 | # && service postgresql stop
17 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Course Title By The Instructor
2 |
3 | Specs deadline: YYYY-MM-DD
4 |
5 | ## Course development resources
6 |
7 | * Course admin page: https://www.datacamp.com/teach/
8 | * Authoring documentation: https://authoring.datacamp.com
9 |
10 | *Please read the [course design process description](https://authoring.datacamp.com/courses/design)
11 | and complete these steps in the `README.md` file in your course repository.
12 | If you need assistance, please speak with your Curriculum Lead.*
13 |
14 | ## Step 1: Brainstorming
15 |
16 | ### 1. What problem(s) will students learn how to solve?
17 |
18 | ### 2. What techniques or concepts will students learn?
19 |
20 | ### 3. What technologies, packages, or functions will students use?
21 |
22 | ### 4. What terms or jargon will you define?
23 |
24 | ### 5. What analogies or heuristics will you use?
25 |
26 | ### 6. What mistakes or misconceptions do you expect?
27 |
28 | ### 7. What datasets will you use?
29 |
30 | ## Step 2: Who Is This Course for?
31 |
32 | Link to [learner personas](https://authoring.datacamp.com/courses/design/personas.html)
33 |
34 | * Student 1: discussion.
35 | * Student 2: discussion.
36 | * Student 3: discussion.
37 |
38 | ## Step 3: What Will Learners Do Along the Way?
39 |
40 | Write full descriptions of a couple of significant exercises to show how far learners are likely to get.
41 |
42 | ### Title of Exercise
43 |
44 | Describe the exercise here, including the learning objectives, concepts taught, and any other important details.
45 |
46 | **Solution**
47 |
48 | ```
49 | Include the code that you expect the students to write by the end of the course.
50 | It should typically be 2 or 3 lines.
51 | ```
52 |
53 | ### Other Exercises
54 |
55 | Write brief descriptions of 10 to 15 more exercises throughout the course.
56 | After this step you should have a clear idea of the flow of the course.
57 |
58 | #### Exercise title 1
59 |
60 | - Describe the exercise.
61 | - Mention the learning objectives.
62 | - Two or three bullet points are enough.
63 |
64 | **Solution**
65 |
66 | ```
67 | Solution code here.
68 | It should typically be 2 or 3 lines.
69 | ```
70 |
71 | #### Exercise title 2
72 |
73 | - Describe the exercise.
74 | - …
75 |
76 | **Solution**
77 |
78 | ```
79 | Solution code here.
80 | ```
81 |
82 | ## Step 4: How Are the Concepts Connected?
83 |
84 | *Remind yourself about [course terminology](https://authoring.datacamp.com/courses/design#terminology-and-structure), then describe the flow of the course.*
85 |
86 | - Chapter 1
87 |   - Lesson 1.1
88 |   - Lesson 1.2
89 |   - Lesson 1.3
90 | - Chapter 2
91 |   - Lesson 2.1
92 |   - Lesson 2.2
93 |   - Lesson 2.3
94 |
95 | The datasets are:
96 |
97 | - `path/to/dataset-1`: data set 1
98 | - `path/to/dataset-2`: data set 2
99 |
100 | ## Step 5: Course Overview
101 |
102 | **Course Description**
103 |
104 | One-paragraph description of the course.
105 |
106 | **Learning Objectives**
107 |
108 | - Objective 1
109 | - Objective 2
110 | - Objective 3
111 |
112 | **Prerequisites**
113 |
114 | *Which DataCamp courses cover topics that a student should be familiar with before attempting this course?
Here are some examples:*
115 |
116 | - [Intro to SQL for Data Science](https://www.datacamp.com/courses/intro-to-sql-for-data-science)
117 | - [Joining Data in PostgreSQL](https://www.datacamp.com/courses/joining-data-in-postgresql)
118 |
--------------------------------------------------------------------------------
/slides/chapter_1_dfa96b92ebbafae7cbc930ac36eff790.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Insert title here
3 | key: dfa96b92ebbafae7cbc930ac36eff790
4 |
5 | ---
6 | ## Welcome
7 |
8 | ```yaml
9 | type: "TitleSlide"
10 | key: "8ed49f4555"
11 | ```
12 |
13 | `@lower_third`
14 |
15 | name: Nujcharee Haswell (Ped)
16 | title: Data Intelligence Specialist / Data Scientist
17 |
18 |
19 | `@script`
20 | Hi, welcome to the course. I’m Nujcharee Haswell, a data intelligence specialist and data scientist. I’ll be your instructor for Case Study: Data Driven Decision Making with SQL.
21 |
22 |
23 | ---
24 | ## What is SQL?
25 |
26 | ```yaml
27 | type: "FullSlide"
28 | key: "19ff28420e"
29 | ```
30 |
31 | `@part1`
32 | - Structured Query Language {{2}}
33 |
34 | - Interact with relational databases {{3}}
35 |
36 |   - retrieve data {{4}}
37 |
38 |   - manipulate data {{5}}
39 |
40 |   - summarise data {{6}}
41 |
42 | - Explore and summarise a real-life database. {{7}}
43 |
44 |
45 | `@script`
46 | SQL stands for Structured Query Language.
47 |
48 | You can use SQL to interact with relational databases: retrieving, manipulating, and aggregating data, among other things.
49 |
50 | In this video you will build SQL queries to explore and summarise a real-life database.
51 |
52 |
53 | ---
54 | ## What is a relational database?
55 |
56 | ```yaml
57 | type: "TwoColumns"
58 | key: "9810dbbd56"
59 | ```
60 |
61 | `@part1`
62 | - Data organised in "Table" objects {{1}}
63 |
64 |   - rows {{2}}
65 |
66 |   - columns {{3}}
67 |
68 | - More than one table in the database {{4}}
69 |
70 | - Can JOIN these tables to build queries where relationships between tables exist {{5}}
71 |
72 |
73 | `@part2`
74 | ![](https://github.com/nujcharee/courses/blob/master/Screen%20Shot%202018-09-16%20at%2019.56.50.png?raw=true) {{3}}
75 |
76 |
77 | `@script`
78 | Before diving deep into SQL, it’s important to have a basic understanding of relational database concepts.
79 |
80 | Relational databases organise data into “table” objects consisting of rows and columns. Each column in a table has a specific data type, such as text, number, or date.
81 |
82 | There is usually more than one table in a database.
83 |
84 | They can be joined to build queries where relationships between tables exist.
85 |
86 | You will learn more about JOINing tables later in the course.
87 |
88 |
89 | ---
90 | ## Basic SQL Query
91 |
92 | ```yaml
93 | type: "FullSlide"
94 | key: "5ee4acfdd2"
95 | ```
96 |
97 | `@part1`
98 | - A SELECT statement begins with the SELECT keyword {{1}}
99 |
100 | ```
101 | SELECT column_name FROM table_name;
102 | ``` {{2}}
103 |
104 | ```
105 | SELECT * FROM table_name;
106 | ``` {{3}}
107 |
108 | ```
109 | SELECT column_1, column_2, column_3 FROM table_name;
110 | ``` {{4}}
111 |
112 | - Always remember to retrieve only the columns that you need {{5}}
113 |
114 | - `SELECT *` may hurt performance because the query pulls more data than needed {{6}}
115 |
116 | - The semicolon (;) indicates that the SQL statement is complete and ready to be interpreted {{7}}
117 |
118 |
119 | `@script`
120 | A SELECT statement begins with the SELECT keyword, followed by a FROM clause.
121 |
122 | The first SELECT statement retrieves a single column from a single table.
123 |
124 | The asterisk is a special character that retrieves ALL columns from a table, as you can see in the second SELECT statement.
125 |
126 | It is also possible to select more than one column from a table, as in the third example.
127 |
128 | Always remember to retrieve only the columns that you need, because selecting ALL columns may hurt performance when the query pulls more data than necessary.
129 |
130 |
131 | It is also good practice to always end with a semicolon (;), which indicates that the SQL statement is complete and ready to be interpreted.
132 |
133 |
134 | ---
135 | ## Case Studies: Video Games Global Sales database
136 |
137 | ```yaml
138 | type: "FullSlide"
139 | key: "9e2bb70cfe"
140 | ```
141 |
142 | `@part1`
143 | ![](https://github.com/nujcharee/courses/blob/master/Screen%20Shot%202018-09-16%20at%2011.29.09.png?raw=true) {{1}}
144 |
145 | ```
146 | SELECT * FROM SALES;
147 | ``` {{2}}
148 | **Question: What is the name of the number 1 game in 2006?** {{3}}
149 |
150 | ```
151 | SELECT Name, Year FROM Sales WHERE Rank = 1;
152 | ``` {{4}}
153 | **Answer: Wii Sports** {{5}}
154 |
155 |
156 | `@script`
157 | Let’s apply this to a real database. For the first case study you will use a database from Kaggle's Video Game Global Sales competition. Imagine that you work in the video game industry and are tasked with carrying out market research. Your job is to analyse sales trends.
158 |
159 | Let's take a quick look at the Sales table here. Think about the SELECT statement needed to retrieve ALL columns from this table.
160 |
161 | You will use SELECT * FROM Sales;
162 |
163 | The first question you may want answered is: What is the name of the number 1 game in 2006? Think about how you would write this as a SELECT statement.
164 |
165 | That’s right.
166 |
167 |
168 | ---
169 | ## WHERE and ORDER BY
170 |
171 | ```yaml
172 | type: "FullSlide"
173 | key: "73171c367d"
174 | ```
175 |
176 | `@part1`
177 | - WHERE is used to apply condition(s) to a query {{1}}
178 |
179 | - ORDER BY is used if the result rows should be in a specific order {{2}}
180 |
181 | **Question: What games were sold in 2016?**
182 | {{3}}
183 |
184 | ![](https://github.com/nujcharee/courses/blob/master/Screen%20Shot%202018-09-16%20at%2000.14.48.png?raw=true) {{4}}
185 |
186 | ```
187 | SELECT Rank, Name, Platform, Year, Genre, Global_Sales FROM Sales
188 | WHERE Year = 2016 ORDER BY Rank;
189 | ``` {{5}}
190 |
191 | **Answer: Looks like soccer on PS4 is doing pretty awesome worldwide!**
192 | {{6}}
193 |
194 |
195 | `@script`
196 | The WHERE and ORDER BY clauses are optional parts of a SELECT statement.
197 |
198 | As you saw in the last SQL query, WHERE applies one or more conditions to a query, keeping only the rows that we want to show, while ORDER BY is used when the result rows should be in a specific order.
199 |
200 | Let's look at this question. What games were sold in 2016?
201 |
202 | Think about what the SELECT statement needs in order to return this result.
203 |
204 | There you have it.
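
The WHERE and ORDER BY query on this slide can also be tried outside the course environment. The sketch below is only an illustration, assuming Python's built-in `sqlite3` module as a stand-in for the course's PostgreSQL image. The sample rows and the `games_in_year` helper are hypothetical, made up for this example, and are not taken from the Kaggle data.

```python
import sqlite3

# Hypothetical rows for illustration only; these are NOT real Kaggle figures.
# Columns: Rank, Name, Platform, Year, Genre, Global_Sales
SAMPLE_ROWS = [
    (3, "Game C", "PS4", 2016, "Sports", 1.5),
    (1, "Game A", "Wii", 2006, "Sports", 9.0),
    (2, "Game B", "PS4", 2016, "Action", 4.0),
]


def games_in_year(year):
    """Return (Rank, Name) for the games of one year, best rank first."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE Sales (Rank INTEGER, Name TEXT, Platform TEXT, "
        "Year INTEGER, Genre TEXT, Global_Sales REAL)"
    )
    conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)
    # WHERE keeps only the rows for the requested year;
    # ORDER BY sorts the remaining rows by Rank (ascending by default).
    rows = conn.execute(
        "SELECT Rank, Name FROM Sales WHERE Year = ? ORDER BY Rank;",
        (year,),
    ).fetchall()
    conn.close()
    return rows


print(games_in_year(2016))  # [(2, 'Game B'), (3, 'Game C')]
```

The `?` placeholder passes the year as a parameter instead of pasting it into the SQL string, which is the usual way to supply values with `sqlite3`.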
205 |
206 |
207 | ---
208 | ## Summarising data
209 |
210 | ```yaml
211 | type: "FullSlide"
212 | key: "b68404a3f8"
213 | ```
214 |
215 | `@part1`
216 | - Aggregate functions such as count, minimum, maximum, average, and sum {{1}}
217 |
218 | ```
219 | SELECT COUNT(Name) FROM Sales;
220 |
221 | SELECT AVG(Global_Sales) FROM Sales;
222 |
223 | SELECT MIN(Global_Sales) FROM Sales;
224 |
225 | SELECT MAX(Global_Sales) FROM Sales;
226 |
227 | SELECT SUM(Global_Sales) FROM Sales;
228 | ``` {{2}}
229 |
230 | - GROUP BY splits the rows into groups that share the same value in the chosen column(s) {{3}}
231 |
232 | **Question: What is Nintendo's highest sales figure in each year?** {{4}}
233 |
234 | ```
235 | SELECT Year, MAX(NA_Sales) FROM Sales WHERE Publisher = 'Nintendo'
236 | GROUP BY Year;
237 | ``` {{5}}
238 |
239 | **Answer: The highest figure, in 2006, is 41.49 million copies sold in North America, thanks to Wii Sports** {{6}}
240 |
241 |
242 | `@script`
243 | After exploring the contents of a table, you may next need to summarise the data. SQL provides aggregate functions for this, namely:
244 |
245 | COUNT, which returns the number of rows with a non-null value in the given column.
246 |
247 | AVG, which finds the average of a given column.
248 |
249 | MIN and MAX, which find the minimum and maximum values of a column, respectively.
250 |
251 | And SUM, which finds the total of a given column.
252 |
253 | These functions are often combined with the GROUP BY clause, which splits the table into groups based on the values in the chosen column.
254 |
255 | Let’s put this together to answer a question.
256 |
257 | What is Nintendo's highest sales figure in each year?
258 |
259 | Answer: The highest figure, in 2006, is 41.49 million copies sold in North America, thanks to Wii Sports.
260 |
261 |
262 | ---
263 | ## Let's practice!
264 |
265 | ```yaml
266 | type: "FinalSlide"
267 | key: "b240739c0d"
268 | ```
269 |
270 | `@script`
271 | It’s your turn to practise!
272 |
273 |
--------------------------------------------------------------------------------
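
To see how MAX and GROUP BY from the "Summarising data" slide work together, here is a minimal sketch, again assuming Python's built-in `sqlite3` module in place of the course's PostgreSQL image. It selects the grouping column alongside the aggregate so that each maximum is labelled with its year. The rows and the `max_na_sales_by_year` helper are hypothetical and the figures are invented for illustration, not real sales numbers.

```python
import sqlite3

# Hypothetical rows for illustration only; these are NOT real sales figures.
# Columns: Name, Year, Publisher, NA_Sales
SAMPLE_ROWS = [
    ("Game A", 2006, "Nintendo", 10.0),
    ("Game B", 2006, "Nintendo", 4.0),
    ("Game C", 2007, "Nintendo", 6.5),
    ("Game D", 2006, "Other Publisher", 20.0),
]


def max_na_sales_by_year(publisher):
    """Return (Year, highest NA_Sales) per year for one publisher."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE Sales (Name TEXT, Year INTEGER, Publisher TEXT, NA_Sales REAL)"
    )
    conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    # WHERE filters rows first, GROUP BY then forms one group per Year,
    # and MAX is evaluated once inside each group.
    rows = conn.execute(
        "SELECT Year, MAX(NA_Sales) FROM Sales "
        "WHERE Publisher = ? GROUP BY Year ORDER BY Year;",
        (publisher,),
    ).fetchall()
    conn.close()
    return rows


print(max_na_sales_by_year("Nintendo"))  # [(2006, 10.0), (2007, 6.5)]
```

The 20.0 row never affects the Nintendo result because WHERE removes it before the groups are formed.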