├── 1-Special_Functions ├── .ipynb_checkpoints │ ├── Advanced-SQL-I_Special-Functions_BLANK-checkpoint.ipynb │ ├── Advanced-SQL-I_Special-Functions_SOLUTIONS-checkpoint.ipynb │ └── Quiz-4_Q11_DSI-9-checkpoint.ipynb ├── Advanced-SQL-I_Special-Functions_BLANK.ipynb └── Advanced-SQL-I_Special-Functions_SOLUTIONS.ipynb ├── 2-Subqueries ├── .ipynb_checkpoints │ ├── Advanced-SQL-II_Subqueries_BLANK-checkpoint.ipynb │ └── Advanced-SQL-II_Subqueries_SOLUTIONS-checkpoint.ipynb ├── Advanced-SQL-II_Subqueries_BLANK.ipynb └── Advanced-SQL-II_Subqueries_SOLUTIONS.ipynb ├── 3-Window_Functions ├── Advanced-SQL-III_Correlated-Sub-Queries-and-Window-Functions_BLANK.ipynb └── Advanced-SQL-III_Correlated-Sub-Queries-and-Window-Functions_SOLUTIONS.ipynb ├── LICENSE └── datasets ├── DailyQuote_essentialsql.txt ├── GoT_Schema.txt └── employees_udemy.txt /1-Special_Functions/.ipynb_checkpoints/Advanced-SQL-I_Special-Functions_BLANK-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Advanced SQL I: Special Functions\n", 8 | "_**Author**: Boom Devahastin Na Ayudhya_\n", 9 | "***" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "Throughout this entire session, we'll be running the queries in PostgreSQL. This Jupyter Notebook will just be a written record of what we've learned so that you'll have all of these functions in one location.\n", 17 | "\n", 18 | "Note that **THIS IS BY NO MEANS AN EXHAUSTIVE LIST** -- I have cherry-picked the ones that are commonly asked in interviews and/or useful on the job from my experience." 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "### Preparation\n", 26 | "\n", 27 | "You should have already downloaded [PostgreSQL](https://www.enterprisedb.com/downloads/postgres-postgresql-downloads). Make sure you have **pgAdmin 4** set up and that you've loaded the `GoT Schemas`." 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "## Contents\n", 35 | "**I. String Manipulation**\n", 36 | "- [`UPPER()`](#UPPER())\n", 37 | "- [`LOWER()`](#LOWER())\n", 38 | "- [`INITCAP()`](#LOWER())\n", 39 | "- [`LENGTH()`](#LENGTH())\n", 40 | "- [`POSITION()`](#POSITION())\n", 41 | "- [`TRIM()`](#TRIM())\n", 42 | "- [`SUBSTRING()`](#SUBSTRING())\n", 43 | "- [Concatenation Methods](#Concatenation)\n", 44 | "- [`REPLACE()`](#REPLACE())\n", 45 | "\n", 46 | "**II. Conditionals**\n", 47 | "- [Boolean Statements](#Boolean-Statements)\n", 48 | "- [`COALESCE()`](#COALESCE())\n", 49 | "- [`CASE WHEN`](#CASE-WHEN)\n", 50 | "\n", 51 | "**III. Date-Time Manipulation**\n", 52 | "- [Type Conversion](#Type-Conversion)\n", 53 | "- [`EXTRACT()`](#EXTRACT())\n", 54 | "\n", 55 | "https://www.postgresql.org/docs/8.1/functions-formatting.html\n", 56 | "https://www.postgresql.org/docs/9.1/functions-datetime.html" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "## I. 
String Manipulation" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "### `LOWER()`\n", 71 | "This is the same as the `.lower()` method for strings in Python used to convert every letter in a string to lower case\n", 72 | "\n", 73 | "_Example_: Convert all letters of the string `HeLlO, wOrLd!` to lower case\n", 74 | "```MySQL\n", 75 | "SELECT LOWER('HeLlO, wOrLd!')\n", 76 | "```\n", 77 | "\n", 78 | "**DISCUSS:** Why do you think this can be useful? Does case matter in SQL?\n", 79 | "\n", 80 | "\n", 81 | "**THINK:** Consider the following queries. Which of these will run?
\n", 82 | "(A) `SELECT first_name FROM people WHERE first_name = 'eddard'`
\n", 83 | "(B) `select first_name from people where first_name = 'eddard'`
\n", 84 | "(C) `SELECT first_name FROM people WHERE first_name = 'Eddard'`
\n", 85 | "(D) `select first_name from people where first_name = 'Eddard'`\n", 86 | "\n", 87 | "**EXERCISE 1:** Write a query that returns the first name of all living members of the ruling family of winterfell, but make sure the letters are all in lower case." 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "_Answer:_\n", 95 | "\n", 96 | "```MySQL\n", 97 | "\n", 98 | "```" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "### `UPPER()`\n", 106 | "For completeness, this is the same as the `.upper()` method for strings in Python used to capitalize every letter in a string\n", 107 | "\n", 108 | "_Example_: Capitalize all letters of the string `Hello World`\n", 109 | "```MySQL\n", 110 | "SELECT UPPER('Hello, world!')\n", 111 | "```\n", 112 | "\n", 113 | "**EXERCISE 2:** Write a query that capitalizes every letter of every unique noble house's domain from the `houses` table." 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "_Answer:_\n", 121 | "\n", 122 | "```MySQL\n", 123 | "\n", 124 | "```" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "### `INITCAP()`\n", 132 | "This is the same as the `.capitalize()` method for strings in Python that is used to convert the first letter to upper case.\n", 133 | "\n", 134 | "**EXERCISE 3:** Write a SQL query that returns the first name and houses of all characters whose first name begins with the prefix \"ae-\" or \"Ae-\", but make sure that only the first letter is capitalized in both of those columns." 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "_Answer:_\n", 142 | "```MySQL\n", 143 | "\n", 144 | "```" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "### `LENGTH()`\n", 152 | "This is the same as the `len()` function in Python. However, since we don't have lists or tuples in SQL, this is only applicable to objects with characters.\n", 153 | "\n", 154 | "**EXERCISE 4:** Write a query that displays the first name and house of characters that are alive, but only if their house is at least 6 characters long." 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": {}, 160 | "source": [ 161 | "_Answer:_\n", 162 | "\n", 163 | "```MySQL\n", 164 | "\n", 165 | "```" 166 | ] 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "metadata": {}, 171 | "source": [ 172 | "### `TRIM()`\n", 173 | "This is the same as the `.strip()` method for strings in Python that eliminates leading and trailing white spaces.\n", 174 | "\n", 175 | "_Example:_ Write a query that strips out the white space from the string `' Hello, world! '`\n", 176 | "\n", 177 | "```MySQL\n", 178 | "SELECT TRIM(' Hello, world! ')\n", 179 | "```" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "### `SUBSTRING()`\n", 187 | "Python doesn't have a function that extracts a substring since we can just do it by directly indexing through the string. 
If you're familiar with R though, then you'll recognize this is similar to the `substr()` function.\n", 188 | "\n", 189 | "Syntax for this function:\n", 190 | "\n", 191 | "```MySQL\n", 192 | "SELECT SUBSTRING(string_column FROM FOR )\n", 193 | "```\n", 194 | "OR\n", 195 | "```MySQL\n", 196 | "SELECT SUBSTRING(string_column, , )\n", 197 | "```" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "**Example #1:**\n", 205 | "```MySQL\n", 206 | "SELECT SUBSTRING('Hello there, friend! Hehe.' FROM 1 FOR 5)\n", 207 | "```\n", 208 | "OR\n", 209 | "```MySQL\n", 210 | "SELECT SUBSTRING('Hello there, friend! Hehe.', 1, 5)\n", 211 | "```\n", 212 | "will return `'Hello'`\n", 213 | "\n", 214 | "**Example #2:**\n", 215 | "```MySQL\n", 216 | "SELECT SUBSTRING('Hello there, friend! Hehe.' FROM 14)\n", 217 | "```\n", 218 | "OR\n", 219 | "```MySQL\n", 220 | "SELECT SUBSTRING('Hello there, friend! Hehe.', 14)\n", 221 | "```\n", 222 | "will return `'friend! Hehe.`" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "### Concatenation\n", 230 | "\n", 231 | "This is the equivalent of string concatenation in Python using `+`. The `+` in Python is replaced by `||` in PostgreSQL. Alternatively, you can use the `CONCAT()` function." 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "_Example:_ Write a query that prints every character's full name (i.e. first name then house)\n", 239 | "```MySQL\n", 240 | "SELECT INITCAP(p.first_name) || ' ' || INITCAP(p.house)\n", 241 | "FROM people p\n", 242 | "```\n", 243 | "\n", 244 | "**EXERCISE 5:** Write a query that automatically generates the sentence `'s army has soldiers.`" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "_Answer:_\n", 252 | "```MySQL\n", 253 | "\n", 254 | "```" 255 | ] 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "metadata": {}, 260 | "source": [ 261 | "### `REPLACE()`" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "This is the equivalent of the `.replace()` method for strings in Python and the `gsub()` function in R." 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "_Example:_\n", 276 | "```MySQL\n", 277 | "SELECT house,\n", 278 | " REPLACE(house, 'lannister', 'Evil Ducks') AS new_house -- replace all 'Lannister' with 'Evil Ducks' in house col\n", 279 | "FROM people\n", 280 | "```\n", 281 | "\n", 282 | "Does the function work when replacing `NULL` values though? Try this and let me know what you see\n", 283 | "```MySQL\n", 284 | "SELECT first_name,\n", 285 | " REPLACE(nickname, NULL, 'missing') AS new_nickname\n", 286 | "FROM people\n", 287 | "```" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": {}, 293 | "source": [ 294 | "## `COALESCE()`\n", 295 | "This is an extremely powerful function that lets us handle missing values on a column-by-column basis.\n", 296 | "\n", 297 | "The syntax is pretty straight forward for this one: \n", 298 | "```MySQL\n", 299 | "COALESCE(, )\n", 300 | "```\n", 301 | "\n", 302 | "Alright, your turn!\n", 303 | "\n", 304 | "**EXERCISE 6**: Write a query that prints every character's full name in one column and their nickname in another, but make sure to replace all `NULL` nicknames with `¯\\_(ツ)_/¯`." 
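A quick way to see the behaviour before attempting the exercise: a minimal sketch using made-up literal values only, so no table is needed.

```MySQL
SELECT COALESCE(NULL, 'missing'),          -- returns 'missing' (the fill value, because the first argument is NULL)
       COALESCE('Kingslayer', 'missing')   -- returns 'Kingslayer' (first argument is not NULL, so it is kept)
```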
305 | ] 306 | }, 307 | { 308 | "cell_type": "markdown", 309 | "metadata": {}, 310 | "source": [ 311 | "_Answer:_\n", 312 | "```MySQL\n", 313 | "\n", 314 | "```" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "_____\n", 322 | "## II. Conditionals" 323 | ] 324 | }, 325 | { 326 | "cell_type": "markdown", 327 | "metadata": {}, 328 | "source": [ 329 | "### Boolean Statements" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": {}, 335 | "source": [ 336 | "**Review Discussion:** What is a Boolean statement? Can you think of an example where you've used this before?\n", 337 | "\n", 338 | "We can also include Booleans to create dummy variables in SQL on the fly.\n", 339 | "\n", 340 | "_Example:_\n", 341 | "```MySQL\n", 342 | "SELECT b.name,\n", 343 | " b.size,\n", 344 | " b.size >= 30 AS \"IsLarge\"\n", 345 | "FROM bannermen AS b\n", 346 | "```" 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "## `CASE WHEN`\n", 354 | "This is the equivalent of if-elif-else statements, except embedded into a query. This takes Boolean Statements to the next level by allowing you to customize what happens on a case-by-case basis\n", 355 | "\n", 356 | "_Example_: Write a query that groups bannermen army sizes into 'yuge' (35+), 'medium' (25-34), 'smol' (< 25) " 357 | ] 358 | }, 359 | { 360 | "cell_type": "markdown", 361 | "metadata": {}, 362 | "source": [ 363 | "```MySQL\n", 364 | "SELECT b.name,\n", 365 | " b.size,\n", 366 | " CASE WHEN b.size >= 35 THEN 'yuge' -- if\n", 367 | " WHEN b.size BETWEEN 25 AND 34 THEN 'medium' -- elif\n", 368 | " ELSE 'smol' -- else\n", 369 | " END AS \"size_group\" -- end it! (and rename if you want)\n", 370 | "FROM bannermen AS b\n", 371 | "```" 372 | ] 373 | }, 374 | { 375 | "cell_type": "markdown", 376 | "metadata": {}, 377 | "source": [ 378 | "## III. 
Date-Time Manipulation" 379 | ] 380 | }, 381 | { 382 | "cell_type": "markdown", 383 | "metadata": {}, 384 | "source": [ 385 | "### Type Conversion\n", 386 | "_(Complete documentation here: https://www.postgresql.org/docs/8.1/functions-formatting.html)_" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "#### `to_timestamp()`\n", 394 | "If you have a string that's both date and want to convert it to a datetime objecttime want the date and time,\n", 395 | "```MySQL\n", 396 | "SELECT to_timestamp('2019 May 13 15:00:05', 'YYYY-MON-DD HH24:MI:SS')\n", 397 | "```\n", 398 | "\n", 399 | "#### `to_date()`\n", 400 | "If you have a string where you want to convert to a date without any timestamp\n", 401 | "```MySQL\n", 402 | "SELECT to_date('2019 May 13 14:00:58', 'YYYY-MON-DD')\n", 403 | "```\n", 404 | "\n", 405 | "#### `current_date`\n", 406 | "You can use this to pull the current date from your computer's clock and manipulate it as you desired.\n", 407 | "```MySQL\n", 408 | "SELECT current_date\n", 409 | "```\n", 410 | "\n", 411 | "**EXERCISE 7:** Write a query that returns what the date was 21 days ago" 412 | ] 413 | }, 414 | { 415 | "cell_type": "markdown", 416 | "metadata": {}, 417 | "source": [ 418 | "_Answer:_\n", 419 | " \n", 420 | "```MySQL\n", 421 | "\n", 422 | "```" 423 | ] 424 | }, 425 | { 426 | "cell_type": "markdown", 427 | "metadata": {}, 428 | "source": [ 429 | "### `EXTRACT()`\n", 430 | "If you want to extract certain parts of a datetime object, this function is MAGICAL!\n", 431 | "\n", 432 | "```MySQL\n", 433 | "SELECT current_timestamp AS today,\n", 434 | "\t EXTRACT(day from current_date) AS \"Day\",\n", 435 | "\t EXTRACT(month from current_date) AS \"Month\",\n", 436 | "\t EXTRACT(year from current_date) AS \"Year\",\n", 437 | "\t EXTRACT(hour from current_timestamp) AS \"Hour\",\n", 438 | "\t EXTRACT(minute from current_timestamp) AS \"Minute\"\n", 439 | "```\n", 440 | "\n", 441 | "### Challenge: Interview Questions\n", 442 | "Lyft recently acquired the rights to add CitiBike to its app as part of its Bikes & Scooters business. You are a Data Scientist studying a `rides` table containing data on completed trips taken by riders, and a `deployed_bikes` table which contains information on the locations where each unique bike is deployed (i.e. 
where it is stationed).\n", 443 | "\n", 444 | "**`rides`** schema: \n", 445 | "- `ride_id`: int **[PRIMARY KEY]**\n", 446 | "- `bike_id`: int\n", 447 | "- `ride_datetime`: string\n", 448 | "- `duration`: int\n", 449 | "\n", 450 | "**`deployed_bikes`** schema:\n", 451 | "- `bike_id`: int **[PRIMARY KEY]**\n", 452 | "- `deploy_location`: string\n", 453 | "\n", 454 | "**EXERCISE 8: For the last week, find the number of rides that occured on each date, ordered from most recent to least recent**" 455 | ] 456 | }, 457 | { 458 | "cell_type": "markdown", 459 | "metadata": {}, 460 | "source": [ 461 | "_Answer:_\n", 462 | "```MySQL\n", 463 | "\n", 464 | "```" 465 | ] 466 | }, 467 | { 468 | "cell_type": "markdown", 469 | "metadata": {}, 470 | "source": [ 471 | "**EXERCISE 9: Which deployment location did the best over the past week?**\n", 472 | "_(At this stage, you may assume there are no ties)_" 473 | ] 474 | }, 475 | { 476 | "cell_type": "markdown", 477 | "metadata": {}, 478 | "source": [ 479 | "_Answer:_\n", 480 | " \n", 481 | "```MySQL\n", 482 | "\n", 483 | "```" 484 | ] 485 | } 486 | ], 487 | "metadata": { 488 | "kernelspec": { 489 | "display_name": "Python 3", 490 | "language": "python", 491 | "name": "python3" 492 | }, 493 | "language_info": { 494 | "codemirror_mode": { 495 | "name": "ipython", 496 | "version": 3 497 | }, 498 | "file_extension": ".py", 499 | "mimetype": "text/x-python", 500 | "name": "python", 501 | "nbconvert_exporter": "python", 502 | "pygments_lexer": "ipython3", 503 | "version": "3.7.1" 504 | } 505 | }, 506 | "nbformat": 4, 507 | "nbformat_minor": 2 508 | } 509 | -------------------------------------------------------------------------------- /1-Special_Functions/.ipynb_checkpoints/Advanced-SQL-I_Special-Functions_SOLUTIONS-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Advanced SQL I: Special Functions\n", 8 | "_**Author**: Boom Devahastin Na Ayudhya_\n", 9 | "***" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "Throughout this entire session, we'll be running the queries in PostgreSQL. This Jupyter Notebook will just be a written record of what we've learned so that you'll have all of these functions in one location.\n", 17 | "\n", 18 | "Note that **THIS IS BY NO MEANS AN EXHAUSTIVE LIST** -- I have cherry-picked the ones that are commonly asked in interviews and/or useful on the job from my experience." 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "### Preparation\n", 26 | "\n", 27 | "You should have already downloaded [PostgreSQL](https://www.enterprisedb.com/downloads/postgres-postgresql-downloads). Make sure you have **pgAdmin 4** set up and that you've loaded the `GoT Schemas`." 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "## Contents\n", 35 | "**I. String Manipulation**\n", 36 | "- [`UPPER()`](#UPPER())\n", 37 | "- [`LOWER()`](#LOWER())\n", 38 | "- [`INITCAP()`](#LOWER())\n", 39 | "- [`LENGTH()`](#LENGTH())\n", 40 | "- [`TRIM()`](#TRIM())\n", 41 | "- [`SUBSTRING()`](#SUBSTRING())\n", 42 | "- [Concatenation Methods](#Concatenation)\n", 43 | "- [`REPLACE()`](#REPLACE())\n", 44 | "- [`COALESCE()`](#COALESCE())\n", 45 | "\n", 46 | "**II. Conditionals**\n", 47 | "- [Boolean Statements](#Boolean-Statements)\n", 48 | "- [`CASE WHEN`](#CASE-WHEN)\n", 49 | "\n", 50 | "**III. 
Date-Time Manipulation**\n", 51 | "- [Type Conversion](#Type-Conversion)\n", 52 | "- [`EXTRACT()`](#EXTRACT())" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "## I. String Manipulation" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "### `LOWER()`\n", 67 | "This is the same as the `.lower()` method for strings in Python used to convert every letter in a string to lower case\n", 68 | "\n", 69 | "_Example_: Convert all letters of the string `HeLlO, wOrLd!` to lower case\n", 70 | "```MySQL\n", 71 | "SELECT LOWER('HeLlO, wOrLd!')\n", 72 | "```\n", 73 | "\n", 74 | "**DISCUSS:** Why do you think this can be useful? Does case matter in SQL?\n", 75 | "\n", 76 | "\n", 77 | "**THINK:** Consider the following queries. Which of these will run?
\n", 78 | "(A) `SELECT first_name FROM people WHERE first_name = 'eddard'`
\n", 79 | "(B) `select first_name from people where first_name = 'eddard'`
\n", 80 | "(C) `SELECT first_name FROM people WHERE first_name = 'Eddard'`
\n", 81 | "(D) `select first_name from people where first_name = 'Eddard'`\n", 82 | "\n", 83 | "**EXERCISE 1:** Write a query that returns the first name of all living members of the ruling family of winterfell, but make sure the letters are all in lower case." 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "_Answer:_\n", 91 | "\n", 92 | "```MySQL\n", 93 | "SELECT p.first_name\n", 94 | "FROM people AS p\n", 95 | " INNER JOIN houses AS h ON p.house = h.name\n", 96 | "WHERE h.domain = 'winterfell' AND p.alive = 1\n", 97 | "```" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "### `UPPER()`\n", 105 | "For completeness, this is the same as the `.upper()` method for strings in Python used to capitalize every letter in a string\n", 106 | "\n", 107 | "_Example_: Capitalize all letters of the string `Hello World`\n", 108 | "```MySQL\n", 109 | "SELECT 'Hello, world!'\n", 110 | "```\n", 111 | "\n", 112 | "**EXERCISE 2:** Write a query that capitalizes every letter of every unique noble house's domain from the `houses` table." 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "_Answer:_\n", 120 | "\n", 121 | "```MySQL\n", 122 | "SELECT UPPER(h.name)\n", 123 | "FROM houses AS h\n", 124 | "```" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "### `INITCAP()`\n", 132 | "This is the same as the `.capitalize()` method for strings in Python that is used to convert the first letter to upper case.\n", 133 | "\n", 134 | "**EXERCISE 3:** Write a SQL query that returns the first name and houses of all characters whose first name begins with the prefix \"ae-\" or \"Ae-\", but make sure that only the first letter is capitalized in both of those columns." 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "```MySQL\n", 142 | "SELECT INITCAP(c.first_name), INITCAP(c.house)\n", 143 | "FROM people AS c\n", 144 | "WHERE c.first_name ILIKE 'ae%'\n", 145 | "```" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "### `LENGTH()`\n", 153 | "This is the same as the `len()` function in Python. However, since we don't have lists or tuples in SQL, this is only applicable to objects with characters.\n", 154 | "\n", 155 | "**EXERCISE 4:** Write a query that displays the first name and house of characters that are alive, but only if their house is at least 6 characters long." 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "_Answer:_\n", 163 | "\n", 164 | "```MySQL\n", 165 | "SELECT p.first_name, p.house\n", 166 | "FROM people AS p\n", 167 | "WHERE p.alive = 1 AND LENGTH(p.house) >= 6\n", 168 | "```" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "### `TRIM()`\n", 176 | "This is the same as the `.strip()` method for strings in Python that eliminates leading and trailing white spaces.\n", 177 | "\n", 178 | "_Example:_ Write a query that strips out the white space from the string `' Hello, world! '`\n", 179 | "\n", 180 | "```MySQL\n", 181 | "SELECT TRIM(' Hello, world! 
')\n", 182 | "```" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "### `SUBSTRING()`\n", 190 | "Python doesn't have a function that extracts a substring since we can just do it by directly indexing through the string. If you're familiar with R though, then you'll recognize this is similar to the `substr()` function.\n", 191 | "\n", 192 | "Syntax for this function:\n", 193 | "\n", 194 | "```MySQL\n", 195 | "SELECT SUBSTRING(string_column FROM FOR )\n", 196 | "```\n", 197 | "OR\n", 198 | "```MySQL\n", 199 | "SELECT SUBSTRING(string_column, , )\n", 200 | "```" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "**Example #1:**\n", 208 | "```MySQL\n", 209 | "SELECT SUBSTRING('Hello there, friend! Hehe.' FROM 1 FOR 5)\n", 210 | "```\n", 211 | "OR\n", 212 | "```MySQL\n", 213 | "SELECT SUBSTRING('Hello there, friend! Hehe.', 1, 5)\n", 214 | "```\n", 215 | "will return `'Hello'`\n", 216 | "\n", 217 | "**Example #2:**\n", 218 | "```MySQL\n", 219 | "SELECT SUBSTRING('Hello there, friend! Hehe.' FROM 14)\n", 220 | "```\n", 221 | "OR\n", 222 | "```MySQL\n", 223 | "SELECT SUBSTRING('Hello there, friend! Hehe.', 14)\n", 224 | "```\n", 225 | "will return `'friend! Hehe.`" 226 | ] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "metadata": {}, 231 | "source": [ 232 | "### Concatenation\n", 233 | "\n", 234 | "This is the equivalent of string concatenation in Python using `+`. The `+` in Python is replaced by `||` in PostgreSQL. Alternatively, you can use the `CONCAT()` function." 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "_Example:_ Write a query that prints every character's full name (i.e. first name then house)\n", 242 | "```MySQL\n", 243 | "SELECT INITCAP(p.first_name) || ' ' || INITCAP(p.house)\n", 244 | "FROM people p\n", 245 | "```\n", 246 | "\n", 247 | "**EXERCISE 5:** Write a query that automatically generates the sentence `'s army has soldiers.`" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "_Answer:_\n", 255 | "```MySQL\n", 256 | "SELECT INITCAP(b.name) || '''s army has ' || size || ' soldiers.'\n", 257 | "FROM bannermen b\n", 258 | "```" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "### `REPLACE()`" 266 | ] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": {}, 271 | "source": [ 272 | "This is the equivalent of the `.replace()` method for strings in Python and the `gsub()` function in R." 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "_Example:_\n", 280 | "```MySQL\n", 281 | "SELECT house,\n", 282 | " REPLACE(house, 'lannister', 'Evil Ducks') AS new_house -- replace all 'Lannister' with 'Evil Ducks' in house col\n", 283 | "FROM people\n", 284 | "```\n", 285 | "\n", 286 | "Does the function work when replacing `NULL` values though? 
Try this and let me know what you see\n", 287 | "```MySQL\n", 288 | "SELECT first_name,\n", 289 | " REPLACE(nickname, NULL, 'missing') AS new_nickname\n", 290 | "FROM people\n", 291 | "```" 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "metadata": {}, 297 | "source": [ 298 | "## `COALESCE()`\n", 299 | "This is an extremely powerful function that lets us handle missing values on a column-by-column basis.\n", 300 | "\n", 301 | "The syntax is pretty straight forward for this one: \n", 302 | "```MySQL\n", 303 | "COALESCE(, )\n", 304 | "```\n", 305 | "\n", 306 | "Alright, your turn!\n", 307 | "\n", 308 | "**EXERCISE 6**: Write a query that prints every character's full name in one column and their nickname in another, but make sure to replace all `NULL` nicknames with `¯\\_(ツ)_/¯`." 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | "_Answer:_\n", 316 | "```MySQL\n", 317 | "SELECT first_name,\n", 318 | " COALESCE(nickname, '¯\\_(ツ)_/¯') AS cleaned_nickname\n", 319 | "FROM people\n", 320 | "```" 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "_____\n", 328 | "## II. Conditionals" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "### Boolean Statements" 336 | ] 337 | }, 338 | { 339 | "cell_type": "markdown", 340 | "metadata": {}, 341 | "source": [ 342 | "**Review Discussion:** What is a Boolean statement? Can you think of an example where you've used this before?\n", 343 | "\n", 344 | "We can also include Booleans to create dummy variables in SQL on the fly.\n", 345 | "\n", 346 | "_Example:_\n", 347 | "```MySQL\n", 348 | "SELECT b.name,\n", 349 | " b.size,\n", 350 | " b.size >= 30 AS \"IsLarge\"\n", 351 | "FROM bannermen AS b\n", 352 | "```" 353 | ] 354 | }, 355 | { 356 | "cell_type": "markdown", 357 | "metadata": {}, 358 | "source": [ 359 | "## `CASE WHEN`\n", 360 | "This is the equivalent of if-elif-else statements, except embedded into a query. This takes Boolean Statements to the next level by allowing you to customize what happens on a case-by-case basis\n", 361 | "\n", 362 | "_Example_: Write a query that groups bannermen army sizes into 'yuge' (35+), 'medium' (25-34), 'smol' (< 25) " 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": {}, 368 | "source": [ 369 | "```MySQL\n", 370 | "SELECT b.name,\n", 371 | " b.size,\n", 372 | " CASE WHEN b.size >= 35 THEN 'yuge' -- if\n", 373 | " WHEN b.size BETWEEN 25 AND 34 THEN 'medium' -- elif\n", 374 | " ELSE 'smol' -- else\n", 375 | " END AS \"size_group\" -- end it! (and rename if you want)\n", 376 | "FROM bannermen AS b\n", 377 | "```" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "## III. 
Date-Time Manipulation" 385 | ] 386 | }, 387 | { 388 | "cell_type": "markdown", 389 | "metadata": {}, 390 | "source": [ 391 | "### Type Conversion\n", 392 | "_(Complete documentation here: https://www.postgresql.org/docs/8.1/functions-formatting.html)_" 393 | ] 394 | }, 395 | { 396 | "cell_type": "markdown", 397 | "metadata": {}, 398 | "source": [ 399 | "#### `to_timestamp()`\n", 400 | "If you have a string that's both date and want to convert it to a datetime objecttime want the date and time,\n", 401 | "```MySQL\n", 402 | "SELECT to_timestamp('2019 May 13 15:00:05', 'YYYY-MON-DD HH24:MI:SS')\n", 403 | "```\n", 404 | "\n", 405 | "#### `to_date()`\n", 406 | "If you have a string where you want to convert to a date without any timestamp\n", 407 | "```MySQL\n", 408 | "SELECT to_date('2019 May 13 14:00:58', 'YYYY-MON-DD')\n", 409 | "```\n", 410 | "\n", 411 | "#### `current_date`\n", 412 | "You can use this to pull the current date from your computer's clock and manipulate it as you desired.\n", 413 | "```MySQL\n", 414 | "SELECT current_date\n", 415 | "```\n", 416 | "\n", 417 | "**EXERCISE 7:** Write a query that returns what the date was 21 days ago" 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "_Answer:_\n", 425 | " \n", 426 | "```MySQL\n", 427 | "SELECT current_date - 21\n", 428 | "```" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "### `EXTRACT()`\n", 436 | "_(More datetime manipulation functions: https://www.postgresql.org/docs/9.1/functions-datetime.html)_\n", 437 | "\n", 438 | "If you want to extract certain parts of a datetime object, this function is MAGICAL!\n", 439 | "\n", 440 | "```MySQL\n", 441 | "SELECT current_timestamp AS today,\n", 442 | "\t EXTRACT(day from current_date) AS \"Day\",\n", 443 | "\t EXTRACT(month from current_date) AS \"Month\",\n", 444 | "\t EXTRACT(year from current_timestamp) AS \"Year\",\n", 445 | "\t EXTRACT(hour from current_timestamp) AS \"Hour\",\n", 446 | "\t EXTRACT(minute from current_timestamp) AS \"Minute\"\n", 447 | "```\n", 448 | "\n", 449 | "### Challenge: Interview Questions\n", 450 | "Lyft recently acquired the rights to add CitiBike to its app as part of its Bikes & Scooters business. You are a Data Scientist studying a `rides` table containing data on completed trips taken by riders, and a `deployed_bikes` table which contains information on the locations where each unique bike is deployed (i.e. 
where it is stationed).\n", 451 | "\n", 452 | "**`rides`** schema: \n", 453 | "- `ride_id`: int **[PRIMARY KEY]**\n", 454 | "- `bike_id`: int\n", 455 | "- `ride_datetime`: string\n", 456 | "- `duration`: int\n", 457 | "\n", 458 | "**`deployed_bikes`** schema:\n", 459 | "- `bike_id`: int **[PRIMARY KEY]**\n", 460 | "- `deploy_location`: string\n", 461 | "\n", 462 | "**EXERCISE 8: For the last week, find the number of rides that occured on each date, ordered from most recent to least recent**" 463 | ] 464 | }, 465 | { 466 | "cell_type": "markdown", 467 | "metadata": {}, 468 | "source": [ 469 | "_Answer:_\n", 470 | "```MySQL\n", 471 | "SELECT ride_datetime,\n", 472 | " COUNT(ride_id)\n", 473 | "FROM rides\n", 474 | "WHERE to_date(ride_datetime, 'YYYY-MON-DD') BETWEEN (current_date - 7) AND (current_date - 1)\n", 475 | "GROUP BY ride_datetime\n", 476 | "ORDER BY ride_datetime DESC\n", 477 | "```" 478 | ] 479 | }, 480 | { 481 | "cell_type": "markdown", 482 | "metadata": {}, 483 | "source": [ 484 | "**EXERCISE 9: Which deployment location did the best over the past week?**" 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "metadata": {}, 490 | "source": [ 491 | "_Answer:_\n", 492 | " \n", 493 | "```MySQL\n", 494 | "SELECT d.deploy_location,\n", 495 | " COUNT(r.ride_id)\n", 496 | "FROM rides AS r\n", 497 | " INNER JOIN deployed_bikes AS d ON r.bike_id = d.bike_id\n", 498 | "WHERE to_date(ride_date, 'YYYY-MON-DD') BETWEEN (current_date - 7) AND (current_date - 1)\n", 499 | "GROUP BY d.deploy_location\n", 500 | "ORDER BY COUNT(ride_id) DESC\n", 501 | "LIMIT 1\n", 502 | "```" 503 | ] 504 | }, 505 | { 506 | "cell_type": "markdown", 507 | "metadata": {}, 508 | "source": [ 509 | "Note this is actually _not the best_ solution since it only returns 1 row and doesn't account for the case where we have more than 1 deployment location with tied highest ride counts. The best solution would require a subquery, which I won't be covering until Advanced SQL II (Subqueries), so you can try revisiting this question and coming up with the best solution after we go through that!" 
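For reference once Advanced SQL II is done, here is a sketch of how the tie-safe version might look. This is not the official solution; it assumes the same `rides` / `deployed_bikes` schema and the same `to_date()` date handling as the answers above, and it uses `ride_datetime` as named in the `rides` schema.

```MySQL
-- Sketch only: keep every deploy_location tied for the highest ride count over the past week.
SELECT d.deploy_location,
       COUNT(r.ride_id) AS num_rides
FROM rides AS r
     INNER JOIN deployed_bikes AS d ON r.bike_id = d.bike_id
WHERE to_date(r.ride_datetime, 'YYYY-MON-DD') BETWEEN (current_date - 7) AND (current_date - 1)
GROUP BY d.deploy_location
HAVING COUNT(r.ride_id) = (
    SELECT MAX(loc_rides)
    FROM (SELECT COUNT(r2.ride_id) AS loc_rides
          FROM rides AS r2
               INNER JOIN deployed_bikes AS d2 ON r2.bike_id = d2.bike_id
          WHERE to_date(r2.ride_datetime, 'YYYY-MON-DD') BETWEEN (current_date - 7) AND (current_date - 1)
          GROUP BY d2.deploy_location) AS weekly_counts
)
```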
510 | ] 511 | } 512 | ], 513 | "metadata": { 514 | "kernelspec": { 515 | "display_name": "Python 3", 516 | "language": "python", 517 | "name": "python3" 518 | }, 519 | "language_info": { 520 | "codemirror_mode": { 521 | "name": "ipython", 522 | "version": 3 523 | }, 524 | "file_extension": ".py", 525 | "mimetype": "text/x-python", 526 | "name": "python", 527 | "nbconvert_exporter": "python", 528 | "pygments_lexer": "ipython3", 529 | "version": "3.7.1" 530 | } 531 | }, 532 | "nbformat": 4, 533 | "nbformat_minor": 2 534 | } 535 | -------------------------------------------------------------------------------- /1-Special_Functions/.ipynb_checkpoints/Quiz-4_Q11_DSI-9-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [], 3 | "metadata": {}, 4 | "nbformat": 4, 5 | "nbformat_minor": 2 6 | } 7 | -------------------------------------------------------------------------------- /1-Special_Functions/Advanced-SQL-I_Special-Functions_BLANK.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Advanced SQL I: Special Functions\n", 8 | "_**Author**: Boom Devahastin Na Ayudhya_\n", 9 | "***" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "Throughout this entire session, we'll be running the queries in PostgreSQL. This Jupyter Notebook will just be a written record of what we've learned so that you'll have all of these functions in one location.\n", 17 | "\n", 18 | "Note that **THIS IS BY NO MEANS AN EXHAUSTIVE LIST** -- I have cherry-picked the ones that are commonly asked in interviews and/or useful on the job from my experience." 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "### Preparation\n", 26 | "\n", 27 | "You should have already downloaded [PostgreSQL](https://www.enterprisedb.com/downloads/postgres-postgresql-downloads). Make sure you have **pgAdmin 4** set up and that you've loaded the `GoT Schemas`." 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "## Contents\n", 35 | "**I. String Manipulation**\n", 36 | "- [`UPPER()`](#UPPER())\n", 37 | "- [`LOWER()`](#LOWER())\n", 38 | "- [`INITCAP()`](#LOWER())\n", 39 | "- [`LENGTH()`](#LENGTH())\n", 40 | "- [`POSITION()`](#POSITION())\n", 41 | "- [`TRIM()`](#TRIM())\n", 42 | "- [`SUBSTRING()`](#SUBSTRING())\n", 43 | "- [Concatenation Methods](#Concatenation)\n", 44 | "- [`REPLACE()`](#REPLACE())\n", 45 | "\n", 46 | "**II. Conditionals**\n", 47 | "- [Boolean Statements](#Boolean-Statements)\n", 48 | "- [`COALESCE()`](#COALESCE())\n", 49 | "- [`CASE WHEN`](#CASE-WHEN)\n", 50 | "\n", 51 | "**III. Date-Time Manipulation**\n", 52 | "- [Type Conversion](#Type-Conversion)\n", 53 | "- [`EXTRACT()`](#EXTRACT())\n", 54 | "\n", 55 | "https://www.postgresql.org/docs/8.1/functions-formatting.html\n", 56 | "https://www.postgresql.org/docs/9.1/functions-datetime.html" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "## I. 
String Manipulation" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "### `LOWER()`\n", 71 | "This is the same as the `.lower()` method for strings in Python used to convert every letter in a string to lower case\n", 72 | "\n", 73 | "_Example_: Convert all letters of the string `HeLlO, wOrLd!` to lower case\n", 74 | "```MySQL\n", 75 | "SELECT LOWER('HeLlO, wOrLd!')\n", 76 | "```\n", 77 | "\n", 78 | "**DISCUSS:** Why do you think this can be useful? Does case matter in SQL?\n", 79 | "\n", 80 | "\n", 81 | "**THINK:** Consider the following queries. Which of these will run?
\n", 82 | "(A) `SELECT first_name FROM people WHERE first_name = 'eddard'`
\n", 83 | "(B) `select first_name from people where first_name = 'eddard'`
\n", 84 | "(C) `SELECT first_name FROM people WHERE first_name = 'Eddard'`
\n", 85 | "(D) `select first_name from people where first_name = 'Eddard'`\n", 86 | "\n", 87 | "**EXERCISE 1:** Write a query that returns the first name of all living members of the ruling family of winterfell, but make sure the letters are all in lower case." 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "_Answer:_\n", 95 | "\n", 96 | "```MySQL\n", 97 | "\n", 98 | "```" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "### `UPPER()`\n", 106 | "For completeness, this is the same as the `.upper()` method for strings in Python used to capitalize every letter in a string\n", 107 | "\n", 108 | "_Example_: Capitalize all letters of the string `Hello World`\n", 109 | "```MySQL\n", 110 | "SELECT UPPER('Hello, world!')\n", 111 | "```\n", 112 | "\n", 113 | "**EXERCISE 2:** Write a query that capitalizes every letter of every unique noble house's domain from the `houses` table." 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "_Answer:_\n", 121 | "\n", 122 | "```MySQL\n", 123 | "\n", 124 | "```" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "### `INITCAP()`\n", 132 | "This is the same as the `.capitalize()` method for strings in Python that is used to convert the first letter to upper case.\n", 133 | "\n", 134 | "**EXERCISE 3:** Write a SQL query that returns the first name and houses of all characters whose first name begins with the prefix \"ae-\" or \"Ae-\", but make sure that only the first letter is capitalized in both of those columns." 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "_Answer:_\n", 142 | "```MySQL\n", 143 | "\n", 144 | "```" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "### `LENGTH()`\n", 152 | "This is the same as the `len()` function in Python. However, since we don't have lists or tuples in SQL, this is only applicable to objects with characters.\n", 153 | "\n", 154 | "**EXERCISE 4:** Write a query that displays the first name and house of characters that are alive, but only if their house is at least 6 characters long." 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": {}, 160 | "source": [ 161 | "_Answer:_\n", 162 | "\n", 163 | "```MySQL\n", 164 | "\n", 165 | "```" 166 | ] 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "metadata": {}, 171 | "source": [ 172 | "### `TRIM()`\n", 173 | "This is the same as the `.strip()` method for strings in Python that eliminates leading and trailing white spaces.\n", 174 | "\n", 175 | "_Example:_ Write a query that strips out the white space from the string `' Hello, world! '`\n", 176 | "\n", 177 | "```MySQL\n", 178 | "SELECT TRIM(' Hello, world! ')\n", 179 | "```" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "### `SUBSTRING()`\n", 187 | "Python doesn't have a function that extracts a substring since we can just do it by directly indexing through the string. 
If you're familiar with R though, then you'll recognize this is similar to the `substr()` function.\n", 188 | "\n", 189 | "Syntax for this function:\n", 190 | "\n", 191 | "```MySQL\n", 192 | "SELECT SUBSTRING(string_column FROM FOR )\n", 193 | "```\n", 194 | "OR\n", 195 | "```MySQL\n", 196 | "SELECT SUBSTRING(string_column, , )\n", 197 | "```" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "**Example #1:**\n", 205 | "```MySQL\n", 206 | "SELECT SUBSTRING('Hello there, friend! Hehe.' FROM 1 FOR 5)\n", 207 | "```\n", 208 | "OR\n", 209 | "```MySQL\n", 210 | "SELECT SUBSTRING('Hello there, friend! Hehe.', 1, 5)\n", 211 | "```\n", 212 | "will return `'Hello'`\n", 213 | "\n", 214 | "**Example #2:**\n", 215 | "```MySQL\n", 216 | "SELECT SUBSTRING('Hello there, friend! Hehe.' FROM 14)\n", 217 | "```\n", 218 | "OR\n", 219 | "```MySQL\n", 220 | "SELECT SUBSTRING('Hello there, friend! Hehe.', 14)\n", 221 | "```\n", 222 | "will return `'friend! Hehe.`" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "### Concatenation\n", 230 | "\n", 231 | "This is the equivalent of string concatenation in Python using `+`. The `+` in Python is replaced by `||` in PostgreSQL. Alternatively, you can use the `CONCAT()` function." 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "_Example:_ Write a query that prints every character's full name (i.e. first name then house)\n", 239 | "```MySQL\n", 240 | "SELECT INITCAP(p.first_name) || ' ' || INITCAP(p.house)\n", 241 | "FROM people p\n", 242 | "```\n", 243 | "\n", 244 | "**EXERCISE 5:** Write a query that automatically generates the sentence `'s army has soldiers.`" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "_Answer:_\n", 252 | "```MySQL\n", 253 | "\n", 254 | "```" 255 | ] 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "metadata": {}, 260 | "source": [ 261 | "### `REPLACE()`" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "This is the equivalent of the `.replace()` method for strings in Python and the `gsub()` function in R." 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "_Example:_\n", 276 | "```MySQL\n", 277 | "SELECT house,\n", 278 | " REPLACE(house, 'lannister', 'Evil Ducks') AS new_house -- replace all 'Lannister' with 'Evil Ducks' in house col\n", 279 | "FROM people\n", 280 | "```\n", 281 | "\n", 282 | "Does the function work when replacing `NULL` values though? Try this and let me know what you see\n", 283 | "```MySQL\n", 284 | "SELECT first_name,\n", 285 | " REPLACE(nickname, NULL, 'missing') AS new_nickname\n", 286 | "FROM people\n", 287 | "```" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": {}, 293 | "source": [ 294 | "## `COALESCE()`\n", 295 | "This is an extremely powerful function that lets us handle missing values on a column-by-column basis.\n", 296 | "\n", 297 | "The syntax is pretty straight forward for this one: \n", 298 | "```MySQL\n", 299 | "COALESCE(, )\n", 300 | "```\n", 301 | "\n", 302 | "Alright, your turn!\n", 303 | "\n", 304 | "**EXERCISE 6**: Write a query that prints every character's full name in one column and their nickname in another, but make sure to replace all `NULL` nicknames with `¯\\_(ツ)_/¯`." 
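One more detail worth knowing before you try it: `COALESCE()` is not limited to two arguments; it returns the first non-`NULL` value in the list, so you can chain several fallbacks. A literal-only sketch (values made up):

```MySQL
SELECT COALESCE(NULL, NULL, 'third choice')   -- returns 'third choice', the first non-NULL argument
```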
305 | ] 306 | }, 307 | { 308 | "cell_type": "markdown", 309 | "metadata": {}, 310 | "source": [ 311 | "_Answer:_\n", 312 | "```MySQL\n", 313 | "\n", 314 | "```" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "_____\n", 322 | "## II. Conditionals" 323 | ] 324 | }, 325 | { 326 | "cell_type": "markdown", 327 | "metadata": {}, 328 | "source": [ 329 | "### Boolean Statements" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": {}, 335 | "source": [ 336 | "**Review Discussion:** What is a Boolean statement? Can you think of an example where you've used this before?\n", 337 | "\n", 338 | "We can also include Booleans to create dummy variables in SQL on the fly.\n", 339 | "\n", 340 | "_Example:_\n", 341 | "```MySQL\n", 342 | "SELECT b.name,\n", 343 | " b.size,\n", 344 | " b.size >= 30 AS \"IsLarge\"\n", 345 | "FROM bannermen AS b\n", 346 | "```" 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "## `CASE WHEN`\n", 354 | "This is the equivalent of if-elif-else statements, except embedded into a query. This takes Boolean Statements to the next level by allowing you to customize what happens on a case-by-case basis\n", 355 | "\n", 356 | "_Example_: Write a query that groups bannermen army sizes into 'yuge' (35+), 'medium' (25-34), 'smol' (< 25) " 357 | ] 358 | }, 359 | { 360 | "cell_type": "markdown", 361 | "metadata": {}, 362 | "source": [ 363 | "```MySQL\n", 364 | "SELECT b.name,\n", 365 | " b.size,\n", 366 | " CASE WHEN b.size >= 35 THEN 'yuge' -- if\n", 367 | " WHEN b.size BETWEEN 25 AND 34 THEN 'medium' -- elif\n", 368 | " ELSE 'smol' -- else\n", 369 | " END AS \"size_group\" -- end it! (and rename if you want)\n", 370 | "FROM bannermen AS b\n", 371 | "```" 372 | ] 373 | }, 374 | { 375 | "cell_type": "markdown", 376 | "metadata": {}, 377 | "source": [ 378 | "## III. 
Date-Time Manipulation" 379 | ] 380 | }, 381 | { 382 | "cell_type": "markdown", 383 | "metadata": {}, 384 | "source": [ 385 | "### Type Conversion\n", 386 | "_(Complete documentation here: https://www.postgresql.org/docs/8.1/functions-formatting.html)_" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "#### `to_timestamp()`\n", 394 | "If you have a string that's both date and want to convert it to a datetime objecttime want the date and time,\n", 395 | "```MySQL\n", 396 | "SELECT to_timestamp('2019 May 13 15:00:05', 'YYYY-MON-DD HH24:MI:SS')\n", 397 | "```\n", 398 | "\n", 399 | "#### `to_date()`\n", 400 | "If you have a string where you want to convert to a date without any timestamp\n", 401 | "```MySQL\n", 402 | "SELECT to_date('2019 May 13 14:00:58', 'YYYY-MON-DD')\n", 403 | "```\n", 404 | "\n", 405 | "#### `current_date`\n", 406 | "You can use this to pull the current date from your computer's clock and manipulate it as you desired.\n", 407 | "```MySQL\n", 408 | "SELECT current_date\n", 409 | "```\n", 410 | "\n", 411 | "**EXERCISE 7:** Write a query that returns what the date was 21 days ago" 412 | ] 413 | }, 414 | { 415 | "cell_type": "markdown", 416 | "metadata": {}, 417 | "source": [ 418 | "_Answer:_\n", 419 | " \n", 420 | "```MySQL\n", 421 | "\n", 422 | "```" 423 | ] 424 | }, 425 | { 426 | "cell_type": "markdown", 427 | "metadata": {}, 428 | "source": [ 429 | "### `EXTRACT()`\n", 430 | "If you want to extract certain parts of a datetime object, this function is MAGICAL!\n", 431 | "\n", 432 | "```MySQL\n", 433 | "SELECT current_timestamp AS today,\n", 434 | "\t EXTRACT(day from current_date) AS \"Day\",\n", 435 | "\t EXTRACT(month from current_date) AS \"Month\",\n", 436 | "\t EXTRACT(year from current_date) AS \"Year\",\n", 437 | "\t EXTRACT(hour from current_timestamp) AS \"Hour\",\n", 438 | "\t EXTRACT(minute from current_timestamp) AS \"Minute\"\n", 439 | "```\n", 440 | "\n", 441 | "### Challenge: Interview Questions\n", 442 | "Lyft recently acquired the rights to add CitiBike to its app as part of its Bikes & Scooters business. You are a Data Scientist studying a `rides` table containing data on completed trips taken by riders, and a `deployed_bikes` table which contains information on the locations where each unique bike is deployed (i.e. 
where it is stationed).\n", 443 | "\n", 444 | "**`rides`** schema: \n", 445 | "- `ride_id`: int **[PRIMARY KEY]**\n", 446 | "- `bike_id`: int\n", 447 | "- `ride_datetime`: string\n", 448 | "- `duration`: int\n", 449 | "\n", 450 | "**`deployed_bikes`** schema:\n", 451 | "- `bike_id`: int **[PRIMARY KEY]**\n", 452 | "- `deploy_location`: string\n", 453 | "\n", 454 | "**EXERCISE 8: For the last week, find the number of rides that occured on each date, ordered from most recent to least recent**" 455 | ] 456 | }, 457 | { 458 | "cell_type": "markdown", 459 | "metadata": {}, 460 | "source": [ 461 | "_Answer:_\n", 462 | "```MySQL\n", 463 | "\n", 464 | "```" 465 | ] 466 | }, 467 | { 468 | "cell_type": "markdown", 469 | "metadata": {}, 470 | "source": [ 471 | "**EXERCISE 9: Which deployment location did the best over the past week?**\n", 472 | "_(At this stage, you may assume there are no ties)_" 473 | ] 474 | }, 475 | { 476 | "cell_type": "markdown", 477 | "metadata": {}, 478 | "source": [ 479 | "_Answer:_\n", 480 | " \n", 481 | "```MySQL\n", 482 | "\n", 483 | "```" 484 | ] 485 | } 486 | ], 487 | "metadata": { 488 | "kernelspec": { 489 | "display_name": "Python 3", 490 | "language": "python", 491 | "name": "python3" 492 | }, 493 | "language_info": { 494 | "codemirror_mode": { 495 | "name": "ipython", 496 | "version": 3 497 | }, 498 | "file_extension": ".py", 499 | "mimetype": "text/x-python", 500 | "name": "python", 501 | "nbconvert_exporter": "python", 502 | "pygments_lexer": "ipython3", 503 | "version": "3.7.1" 504 | } 505 | }, 506 | "nbformat": 4, 507 | "nbformat_minor": 2 508 | } 509 | -------------------------------------------------------------------------------- /1-Special_Functions/Advanced-SQL-I_Special-Functions_SOLUTIONS.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Advanced SQL I: Special Functions\n", 8 | "_**Author**: Boom Devahastin Na Ayudhya_\n", 9 | "***" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "Throughout this entire session, we'll be running the queries in PostgreSQL. This Jupyter Notebook will just be a written record of what we've learned so that you'll have all of these functions in one location.\n", 17 | "\n", 18 | "Note that **THIS IS BY NO MEANS AN EXHAUSTIVE LIST** -- I have cherry-picked the ones that are commonly asked in interviews and/or useful on the job from my experience." 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "### Preparation\n", 26 | "\n", 27 | "You should have already downloaded [PostgreSQL](https://www.enterprisedb.com/downloads/postgres-postgresql-downloads). Make sure you have **pgAdmin 4** set up and that you've loaded the `GoT Schemas`." 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "## Contents\n", 35 | "**I. String Manipulation**\n", 36 | "- [`UPPER()`](#UPPER())\n", 37 | "- [`LOWER()`](#LOWER())\n", 38 | "- [`INITCAP()`](#LOWER())\n", 39 | "- [`LENGTH()`](#LENGTH())\n", 40 | "- [`TRIM()`](#TRIM())\n", 41 | "- [`SUBSTRING()`](#SUBSTRING())\n", 42 | "- [Concatenation Methods](#Concatenation)\n", 43 | "- [`REPLACE()`](#REPLACE())\n", 44 | "- [`COALESCE()`](#COALESCE())\n", 45 | "\n", 46 | "**II. Conditionals**\n", 47 | "- [Boolean Statements](#Boolean-Statements)\n", 48 | "- [`CASE WHEN`](#CASE-WHEN)\n", 49 | "\n", 50 | "**III. 
Date-Time Manipulation**\n", 51 | "- [Type Conversion](#Type-Conversion)\n", 52 | "- [`EXTRACT()`](#EXTRACT())" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "## I. String Manipulation" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "### `LOWER()`\n", 67 | "This is the same as the `.lower()` method for strings in Python used to convert every letter in a string to lower case\n", 68 | "\n", 69 | "_Example_: Convert all letters of the string `HeLlO, wOrLd!` to lower case\n", 70 | "```MySQL\n", 71 | "SELECT LOWER('HeLlO, wOrLd!')\n", 72 | "```\n", 73 | "\n", 74 | "**DISCUSS:** Why do you think this can be useful? Does case matter in SQL?\n", 75 | "\n", 76 | "\n", 77 | "**THINK:** Consider the following queries. Which of these will run?
\n", 78 | "(A) `SELECT first_name FROM people WHERE first_name = 'eddard'`
\n", 79 | "(B) `select first_name from people where first_name = 'eddard'`
\n", 80 | "(C) `SELECT first_name FROM people WHERE first_name = 'Eddard'`
\n", 81 | "(D) `select first_name from people where first_name = 'Eddard'`\n", 82 | "\n", 83 | "**EXERCISE 1:** Write a query that returns the first name of all living members of the ruling family of winterfell, but make sure the letters are all in lower case." 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "_Answer:_\n", 91 | "\n", 92 | "```MySQL\n", 93 | "SELECT p.first_name\n", 94 | "FROM people AS p\n", 95 | " INNER JOIN houses AS h ON p.house = h.name\n", 96 | "WHERE h.domain = 'winterfell' AND p.alive = 1\n", 97 | "```" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "### `UPPER()`\n", 105 | "For completeness, this is the same as the `.upper()` method for strings in Python used to capitalize every letter in a string\n", 106 | "\n", 107 | "_Example_: Capitalize all letters of the string `Hello World`\n", 108 | "```MySQL\n", 109 | "SELECT 'Hello, world!'\n", 110 | "```\n", 111 | "\n", 112 | "**EXERCISE 2:** Write a query that capitalizes every letter of every unique noble house's domain from the `houses` table." 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "_Answer:_\n", 120 | "\n", 121 | "```MySQL\n", 122 | "SELECT UPPER(h.name)\n", 123 | "FROM houses AS h\n", 124 | "```" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "### `INITCAP()`\n", 132 | "This is the same as the `.capitalize()` method for strings in Python that is used to convert the first letter to upper case.\n", 133 | "\n", 134 | "**EXERCISE 3:** Write a SQL query that returns the first name and houses of all characters whose first name begins with the prefix \"ae-\" or \"Ae-\", but make sure that only the first letter is capitalized in both of those columns." 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "```MySQL\n", 142 | "SELECT INITCAP(c.first_name), INITCAP(c.house)\n", 143 | "FROM people AS c\n", 144 | "WHERE c.first_name ILIKE 'ae%'\n", 145 | "```" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "### `LENGTH()`\n", 153 | "This is the same as the `len()` function in Python. However, since we don't have lists or tuples in SQL, this is only applicable to objects with characters.\n", 154 | "\n", 155 | "**EXERCISE 4:** Write a query that displays the first name and house of characters that are alive, but only if their house is at least 6 characters long." 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "_Answer:_\n", 163 | "\n", 164 | "```MySQL\n", 165 | "SELECT p.first_name, p.house\n", 166 | "FROM people AS p\n", 167 | "WHERE p.alive = 1 AND LENGTH(p.house) >= 6\n", 168 | "```" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "### `TRIM()`\n", 176 | "This is the same as the `.strip()` method for strings in Python that eliminates leading and trailing white spaces.\n", 177 | "\n", 178 | "_Example:_ Write a query that strips out the white space from the string `' Hello, world! '`\n", 179 | "\n", 180 | "```MySQL\n", 181 | "SELECT TRIM(' Hello, world! 
')\n", 182 | "```" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "### `SUBSTRING()`\n", 190 | "Python doesn't have a function that extracts a substring since we can just do it by directly indexing through the string. If you're familiar with R though, then you'll recognize this is similar to the `substr()` function.\n", 191 | "\n", 192 | "Syntax for this function:\n", 193 | "\n", 194 | "```MySQL\n", 195 | "SELECT SUBSTRING(string_column FROM FOR )\n", 196 | "```\n", 197 | "OR\n", 198 | "```MySQL\n", 199 | "SELECT SUBSTRING(string_column, , )\n", 200 | "```" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "**Example #1:**\n", 208 | "```MySQL\n", 209 | "SELECT SUBSTRING('Hello there, friend! Hehe.' FROM 1 FOR 5)\n", 210 | "```\n", 211 | "OR\n", 212 | "```MySQL\n", 213 | "SELECT SUBSTRING('Hello there, friend! Hehe.', 1, 5)\n", 214 | "```\n", 215 | "will return `'Hello'`\n", 216 | "\n", 217 | "**Example #2:**\n", 218 | "```MySQL\n", 219 | "SELECT SUBSTRING('Hello there, friend! Hehe.' FROM 14)\n", 220 | "```\n", 221 | "OR\n", 222 | "```MySQL\n", 223 | "SELECT SUBSTRING('Hello there, friend! Hehe.', 14)\n", 224 | "```\n", 225 | "will return `'friend! Hehe.`" 226 | ] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "metadata": {}, 231 | "source": [ 232 | "### Concatenation\n", 233 | "\n", 234 | "This is the equivalent of string concatenation in Python using `+`. The `+` in Python is replaced by `||` in PostgreSQL. Alternatively, you can use the `CONCAT()` function." 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "_Example:_ Write a query that prints every character's full name (i.e. first name then house)\n", 242 | "```MySQL\n", 243 | "SELECT INITCAP(p.first_name) || ' ' || INITCAP(p.house)\n", 244 | "FROM people p\n", 245 | "```\n", 246 | "\n", 247 | "**EXERCISE 5:** Write a query that automatically generates the sentence `'s army has soldiers.`" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "_Answer:_\n", 255 | "```MySQL\n", 256 | "SELECT INITCAP(b.name) || '''s army has ' || size || ' soldiers.'\n", 257 | "FROM bannermen b\n", 258 | "```" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "### `REPLACE()`" 266 | ] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": {}, 271 | "source": [ 272 | "This is the equivalent of the `.replace()` method for strings in Python and the `gsub()` function in R." 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "_Example:_\n", 280 | "```MySQL\n", 281 | "SELECT house,\n", 282 | " REPLACE(house, 'lannister', 'Evil Ducks') AS new_house -- replace all 'Lannister' with 'Evil Ducks' in house col\n", 283 | "FROM people\n", 284 | "```\n", 285 | "\n", 286 | "Does the function work when replacing `NULL` values though? 
Try this and let me know what you see\n", 287 | "```MySQL\n", 288 | "SELECT first_name,\n", 289 | "       REPLACE(nickname, NULL, 'missing') AS new_nickname\n", 290 | "FROM people\n", 291 | "```" 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "metadata": {}, 297 | "source": [ 298 | "## `COALESCE()`\n", 299 | "This is an extremely powerful function that lets us handle missing values on a column-by-column basis.\n", 300 | "\n", 301 | "The syntax is pretty straightforward for this one: \n", 302 | "```MySQL\n", 303 | "COALESCE(<column>, <value to replace NULL with>)\n", 304 | "```\n", 305 | "\n", 306 | "Alright, your turn!\n", 307 | "\n", 308 | "**EXERCISE 6**: Write a query that prints every character's full name in one column and their nickname in another, but make sure to replace all `NULL` nicknames with `¯\\_(ツ)_/¯`." 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | "_Answer:_\n", 316 | "```MySQL\n", 317 | "SELECT first_name,\n", 318 | "       COALESCE(nickname, '¯\\_(ツ)_/¯') AS cleaned_nickname\n", 319 | "FROM people\n", 320 | "```" 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "_____\n", 328 | "## II. Conditionals" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "### Boolean Statements" 336 | ] 337 | }, 338 | { 339 | "cell_type": "markdown", 340 | "metadata": {}, 341 | "source": [ 342 | "**Review Discussion:** What is a Boolean statement? Can you think of an example where you've used this before?\n", 343 | "\n", 344 | "We can also include Booleans to create dummy variables in SQL on the fly.\n", 345 | "\n", 346 | "_Example:_\n", 347 | "```MySQL\n", 348 | "SELECT b.name,\n", 349 | "       b.size,\n", 350 | "       b.size >= 30 AS \"IsLarge\"\n", 351 | "FROM bannermen AS b\n", 352 | "```" 353 | ] 354 | }, 355 | { 356 | "cell_type": "markdown", 357 | "metadata": {}, 358 | "source": [ 359 | "## `CASE WHEN`\n", 360 | "This is the equivalent of if-elif-else statements, except embedded into a query. This takes Boolean Statements to the next level by allowing you to customize what happens on a case-by-case basis.\n", 361 | "\n", 362 | "_Example_: Write a query that groups bannermen army sizes into 'yuge' (35+), 'medium' (25-34), and 'smol' (< 25)." 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": {}, 368 | "source": [ 369 | "```MySQL\n", 370 | "SELECT b.name,\n", 371 | "       b.size,\n", 372 | "       CASE WHEN b.size >= 35 THEN 'yuge' -- if\n", 373 | "            WHEN b.size BETWEEN 25 AND 34 THEN 'medium' -- elif\n", 374 | "            ELSE 'smol' -- else\n", 375 | "       END AS \"size_group\" -- end it! (and rename if you want)\n", 376 | "FROM bannermen AS b\n", 377 | "```" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "## III. 
Date-Time Manipulation" 385 | ] 386 | }, 387 | { 388 | "cell_type": "markdown", 389 | "metadata": {}, 390 | "source": [ 391 | "### Type Conversion\n", 392 | "_(Complete documentation here: https://www.postgresql.org/docs/8.1/functions-formatting.html)_" 393 | ] 394 | }, 395 | { 396 | "cell_type": "markdown", 397 | "metadata": {}, 398 | "source": [ 399 | "#### `to_timestamp()`\n", 400 | "If you have a string that contains both a date and a time and you want to convert it to a timestamp (datetime) object,\n", 401 | "```MySQL\n", 402 | "SELECT to_timestamp('2019 May 13 15:00:05', 'YYYY-MON-DD HH24:MI:SS')\n", 403 | "```\n", 404 | "\n", 405 | "#### `to_date()`\n", 406 | "If you have a string that you want to convert to a date without any time component,\n", 407 | "```MySQL\n", 408 | "SELECT to_date('2019 May 13 14:00:58', 'YYYY-MON-DD')\n", 409 | "```\n", 410 | "\n", 411 | "#### `current_date`\n", 412 | "You can use this to pull the current date from your computer's clock and manipulate it as you desire.\n", 413 | "```MySQL\n", 414 | "SELECT current_date\n", 415 | "```\n", 416 | "\n", 417 | "**EXERCISE 7:** Write a query that returns what the date was 21 days ago" 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "_Answer:_\n", 425 | " \n", 426 | "```MySQL\n", 427 | "SELECT current_date - 21\n", 428 | "```" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "### `EXTRACT()`\n", 436 | "_(More datetime manipulation functions: https://www.postgresql.org/docs/9.1/functions-datetime.html)_\n", 437 | "\n", 438 | "If you want to extract certain parts of a datetime object, this function is MAGICAL!\n", 439 | "\n", 440 | "```MySQL\n", 441 | "SELECT current_timestamp AS today,\n", 442 | "\t EXTRACT(day from current_date) AS \"Day\",\n", 443 | "\t EXTRACT(month from current_date) AS \"Month\",\n", 444 | "\t EXTRACT(year from current_timestamp) AS \"Year\",\n", 445 | "\t EXTRACT(hour from current_timestamp) AS \"Hour\",\n", 446 | "\t EXTRACT(minute from current_timestamp) AS \"Minute\"\n", 447 | "```\n", 448 | "\n", 449 | "### Challenge: Interview Questions\n", 450 | "Lyft recently acquired the rights to add CitiBike to its app as part of its Bikes & Scooters business. You are a Data Scientist studying a `rides` table containing data on completed trips taken by riders, and a `deployed_bikes` table which contains information on the locations where each unique bike is deployed (i.e. 
where it is stationed).\n", 451 | "\n", 452 | "**`rides`** schema: \n", 453 | "- `ride_id`: int **[PRIMARY KEY]**\n", 454 | "- `bike_id`: int\n", 455 | "- `ride_datetime`: string\n", 456 | "- `duration`: int\n", 457 | "\n", 458 | "**`deployed_bikes`** schema:\n", 459 | "- `bike_id`: int **[PRIMARY KEY]**\n", 460 | "- `deploy_location`: string\n", 461 | "\n", 462 | "**EXERCISE 8: For the last week, find the number of rides that occurred on each date, ordered from most recent to least recent**" 463 | ] 464 | }, 465 | { 466 | "cell_type": "markdown", 467 | "metadata": {}, 468 | "source": [ 469 | "_Answer:_\n", 470 | "```MySQL\n", 471 | "SELECT ride_datetime,\n", 472 | "       COUNT(ride_id)\n", 473 | "FROM rides\n", 474 | "WHERE to_date(ride_datetime, 'YYYY-MON-DD') BETWEEN (current_date - 7) AND (current_date - 1)\n", 475 | "GROUP BY ride_datetime\n", 476 | "ORDER BY ride_datetime DESC\n", 477 | "```" 478 | ] 479 | }, 480 | { 481 | "cell_type": "markdown", 482 | "metadata": {}, 483 | "source": [ 484 | "**EXERCISE 9: Which deployment location did the best over the past week?**" 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "metadata": {}, 490 | "source": [ 491 | "_Answer:_\n", 492 | " \n", 493 | "```MySQL\n", 494 | "SELECT d.deploy_location,\n", 495 | "       COUNT(r.ride_id)\n", 496 | "FROM rides AS r\n", 497 | "    INNER JOIN deployed_bikes AS d ON r.bike_id = d.bike_id\n", 498 | "WHERE to_date(r.ride_datetime, 'YYYY-MON-DD') BETWEEN (current_date - 7) AND (current_date - 1)\n", 499 | "GROUP BY d.deploy_location\n", 500 | "ORDER BY COUNT(r.ride_id) DESC\n", 501 | "LIMIT 1\n", 502 | "```" 503 | ] 504 | }, 505 | { 506 | "cell_type": "markdown", 507 | "metadata": {}, 508 | "source": [ 509 | "Note this is actually _not the best_ solution since it only returns 1 row and doesn't account for the case where we have more than 1 deployment location with tied highest ride counts. The best solution would require a subquery, which I won't be covering until Advanced SQL II (Subqueries), so you can try revisiting this question and coming up with the best solution after we go through that!" 510 | ] 511 | } 512 | ], 513 | "metadata": { 514 | "kernelspec": { 515 | "display_name": "Python 3", 516 | "language": "python", 517 | "name": "python3" 518 | }, 519 | "language_info": { 520 | "codemirror_mode": { 521 | "name": "ipython", 522 | "version": 3 523 | }, 524 | "file_extension": ".py", 525 | "mimetype": "text/x-python", 526 | "name": "python", 527 | "nbconvert_exporter": "python", 528 | "pygments_lexer": "ipython3", 529 | "version": "3.7.1" 530 | } 531 | }, 532 | "nbformat": 4, 533 | "nbformat_minor": 2 534 | } 535 | -------------------------------------------------------------------------------- /2-Subqueries/.ipynb_checkpoints/Advanced-SQL-II_Subqueries_BLANK-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Advanced SQL II: Subqueries\n", 8 | "_**Author**: Boom Devahastin Na Ayudhya_\n", 9 | "***" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "## Additional Learning Tools after the course" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "The dataset I've used for this lesson is from [Udemy's Master SQL for Data Science](https://www.udemy.com/master-sql-for-data-science/learn/lecture/9790570#overview) course. 
In the repo, you should copy and paste the database construction queries from the `employees_udemy.txt` script into PostgreSQL if you wish to explore the dataset on your own." 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "## What is a subquery?\n", 31 | "\n", 32 | "Exactly what it sounds like: literally inception because **it's a query within a query**!\n", 33 | "\n", 34 | "...What?! Sounds complicated...why do we need this?" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "**Motivation:** The `employees` table has a department column amongst other employee-specific information. The `departments` table shows information on each of the departments. However, some departments have recently turned over their entire team and so there may not be any employees listed in those departments. How can we figure out which departments did this?\n", 42 | "\n", 43 | "TL;DR - How do we determine which departments exist in the `employees` table but not the `departments` table? Think through the logic in English first before you attempt to convert it to code.\n", 44 | "\n", 45 | "_**DO NOT USE JOINS - we'll talk about why not in a bit!**_" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "_Answer:_\n", 53 | "\n", 54 | "```MySQL\n", 55 | "```" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "### Subqueries in `WHERE`" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "How did we think about this?\n", 70 | "- The output of a subquery is a \"dataframe\" (or rather a subset of a table).\n", 71 | "- If we choose to extract just one column from a table using a query, we essentially have a list\n", 72 | "- We've written WHERE statements before with `IN` and `NOT IN` and compared results to a list\n", 73 | "- Connecting the dots: we can replace the list in a WHERE clause with a subquery to make things more dynamic" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "**Exercise 1:** Write a query that returns all information about employees who work in the Electronics division." 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "_Answer:_\n", 88 | "```MySQL\n", 89 | "\n", 90 | "```" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "**Exercise 2:** Switching back to tables in the the `GoT_schema.txt` file now. Write a query that shows the name of characters (in the `people` table) who are not from any of the great noble houses (in the `houses` table)." 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "_Answer:_\n", 105 | "```MySQL\n", 106 | "```" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "**Exercise 3:** You might have noticed there are some noble houses that do not have any bannermen. Write a query that shows the name of the great noble houses without any bannermen (vassal houses) serving them." 
114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "_Answer:_\n", 121 | "```MySQL\n", 122 | "\n", 123 | "```" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "_**Short Note on Efficient Queries**_\n", 131 | "\n", 132 | "Some `JOIN` commands (especially `INNER JOIN`) can be very computationally intensive. This is why sometimes we would prefer to write subqueries.\n", 133 | "\n", 134 | "_Example:_ Without using any kind of`JOIN`, find all employees who work in the Asia and Canada regions who make more than 13,000 dollars.\n", 135 | "\n", 136 | "```MySQL\n", 137 | "SELECT * from employees\n", 138 | "WHERE salary > 13000\n", 139 | "AND region_id IN (SELECT region_id\n", 140 | " FROM regions\n", 141 | " WHERE country IN ('Asia', 'Canada'))\n", 142 | "```" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "### Subqueries in `SELECT`" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "Subqueries can show up almost anywhere in the query! If we want to compare values to a single value, we could include the result of a subquery in the `SELECT` clause. This is especially important when you want to construct some sort of **_benchmark_** (e.g. how much you have missed/beaten a sales target by, what the active returns of a mutual fund is compared to its benchmark index, etc.) \n", 157 | "\n", 158 | "_Example:_ Show me the first_name, last_name, and salary of all employees next to the salary of the employee who earns the least at the company.\n", 159 | "\n", 160 | "```MySQL\n", 161 | "SELECT first_name,\n", 162 | " department,\n", 163 | " salary,\n", 164 | " (SELECT MIN(salary) FROM employees) AS \"lowest_salary\"\n", 165 | "FROM employees\n", 166 | "```" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "#### _Short Note on Order of Execution in SQL Queries_\n", 174 | "Across clauses, there is a sequence that queries follow. SQL queries will run FROM first, then WHERE and other filters, and then SELECT last. So in the exercise **below**, the `lowest_salary` is already going to be calculated based on Asia and Canada employees because WHERE executes before SELECT\n", 175 | "\n", 176 | "However, within a clause (e.g. within SELECT) everything runs _**simultaneously**_, not sequentially! So you cannot use `lowest_salary` in say a calculation for \"difference\" -- you will need to use the actual subquery in the calculation." 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "**Exercise 4:** Among all employees who work in Asia and Canada, calculate the how much less each employee makes compared to the highest earner across those regions." 
184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "_Answer:_\n", 191 | "```MySQL\n", 192 | "```" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "### Subqueries using `ALL` keyword" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "**Motivation:** We've learned convenient functions like `MAX` and `MIN` which helps us find the highest or lowest value in a field/column.\n", 207 | "\n", 208 | "```MySQL\n", 209 | "SELECT MAX(salary) FROM employees\n", 210 | "```\n", 211 | "\n", 212 | "What if your interviewer asked you to find the highest salary of all employees in the company **WITHOUT** using any built in SQL functions though?\n", 213 | "\n", 214 | "```MySQL\n", 215 | "SELECT salary\n", 216 | "FROM employees\n", 217 | "WHERE salary >= ALL(SELECT salary\n", 218 | " FROM employees)\n", 219 | "```" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "Interview aside though, here's a more practical problem. You're not going to be able to use MAX or MIN when it comes to this situation:\n", 227 | "\n", 228 | "**Exercise 5:** Find the mode salar(ies) of all employees in the company." 229 | ] 230 | }, 231 | { 232 | "cell_type": "markdown", 233 | "metadata": {}, 234 | "source": [ 235 | "_Answer:_\n", 236 | "```MySQL\n", 237 | "\n", 238 | "```" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "### Challenge Interview Question \\#1\n", 246 | "\n", 247 | "A retailer store information about all of its products in a `Products` table, which contain the following columns:\n", 248 | "- `id`: the unique identification number for the product\n", 249 | "- `name`: the name the product\n", 250 | "- `manuf_id`: the identification number of the manufacturer we acquired this from\n", 251 | "- `grade`: the quality score on a scale of 1 (bad) to 100 (good) of the product according to reviews.\n", 252 | "\n", 253 | "Write a SQL query that returns the names of all products (there are ties) that have the **_SECOND_ lowest** score." 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "_Answer:_\n" 261 | ] 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": {}, 266 | "source": [ 267 | "### Challenge Interview Question \\#2\n", 268 | "\n", 269 | "A table called `eval` has 3 columns:
\n", 270 | "- case_id (int)
\n", 271 | "- timestamp (datetime)
\n", 272 | "- score (int)
\n", 273 | "\n", 274 | "But case_id is not unique. For a given case_id, there may be scores on different dates.\n", 275 | "\n", 276 | "Write a query to get the score for each case_id at most recent date." 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "_Answer:_\n", 284 | "\n", 285 | "```MySQL\n", 286 | "\n", 287 | "```" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": {}, 293 | "source": [ 294 | "**_Need some help?_** While it is probably better that you do this under interview conditions (i.e. no help from pgAdmin), the option is there if you want to use this code to construct the database and visualize the outputs of your queries\n", 295 | "\n", 296 | "```MySQL\n", 297 | "create table eval (\n", 298 | "\tcase_id int,\n", 299 | "\ttimestamp date,\n", 300 | "\tscore int);\n", 301 | "\n", 302 | "insert into eval values (123, '2019-05-09', 7);\n", 303 | "insert into eval values (123, '2019-05-03', 6);\n", 304 | "insert into eval values (456, '2019-05-07', 1);\n", 305 | "insert into eval values (789, '2019-05-06', 3);\n", 306 | "insert into eval values (456, '2019-05-02', 9);\n", 307 | "insert into eval values (789, '2019-05-08', 2);```" 308 | ] 309 | } 310 | ], 311 | "metadata": { 312 | "kernelspec": { 313 | "display_name": "Python 3", 314 | "language": "python", 315 | "name": "python3" 316 | }, 317 | "language_info": { 318 | "codemirror_mode": { 319 | "name": "ipython", 320 | "version": 3 321 | }, 322 | "file_extension": ".py", 323 | "mimetype": "text/x-python", 324 | "name": "python", 325 | "nbconvert_exporter": "python", 326 | "pygments_lexer": "ipython3", 327 | "version": "3.7.1" 328 | } 329 | }, 330 | "nbformat": 4, 331 | "nbformat_minor": 2 332 | } 333 | -------------------------------------------------------------------------------- /2-Subqueries/.ipynb_checkpoints/Advanced-SQL-II_Subqueries_SOLUTIONS-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Advanced SQL II: Subqueries\n", 8 | "_**Author**: Boom Devahastin Na Ayudhya_\n", 9 | "***" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "## Additional Learning Tools after the course" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "The dataset I've used for this lesson is from [Udemy's Master SQL for Data Science](https://www.udemy.com/master-sql-for-data-science/learn/lecture/9790570#overview) course. In the repo, you should copy and paste the database construction queries from the `employees_udemy.txt` script into PostgreSQL if you wish to explore the dataset on your own." 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "## What is a subquery?\n", 31 | "\n", 32 | "Exactly what it sounds like: literally inception because **it's a query within a query**!\n", 33 | "\n", 34 | "...What?! Sounds complicated...why do we need this?" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "**Motivation:** The `employees` table has a department column amongst other employee-specific information. The `departments` table shows information on each of the departments. However, some departments have recently turned over their entire team and so there may not be any employees listed in those departments. 
How can we figure out which departments did this?\n", 42 | "\n", 43 | "TL;DR - How do we determine which departments exist in the `employees` table but not the `departments` table? Think through the logic in English first before you attempt to convert it to code.\n", 44 | "\n", 45 | "_**DO NOT USE JOINS - we'll talk about why not in a bit!**_" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "_Answer:_\n", 53 | "\n", 54 | "```MySQL\n", 55 | "SELECT DISTINCT department\n", 56 | "FROM employees\n", 57 | "WHERE department NOT IN (SELECT department\n", 58 | " FROM departments)\n", 59 | "```" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "### Subqueries in `WHERE`" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "How did we think about this?\n", 74 | "- The output of a subquery is a \"dataframe\" (or rather a subset of a table).\n", 75 | "- If we choose to extract just one column from a table using a query, we essentially have a list\n", 76 | "- We've written WHERE statements before with `IN` and `NOT IN` and compared results to a list\n", 77 | "- Connecting the dots: we can replace the list in a WHERE clause with a subquery to make things more dynamic" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "**Exercise 1:** Write a query that returns all information about employees who work in the Electronics division." 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "_Answer:_\n", 92 | "```MySQL\n", 93 | "SELECT *\n", 94 | "FROM employees\n", 95 | "WHERE department IN (SELECT department\n", 96 | " FROM departments\n", 97 | " WHERE division = 'Electronics')\n", 98 | "```" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "**Exercise 2:** Switching back to tables in the the `GoT_schema.txt` file now. Write a query that shows the name of characters (in the `people` table) who are not from any of the great noble houses (in the `houses` table)." 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "_Answer:_\n", 113 | "```MySQL\n", 114 | "SELECT *\n", 115 | "FROM people\n", 116 | "WHERE house NOT IN (SELECT name\n", 117 | " FROM houses)\n", 118 | "```" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "**Exercise 3:** You might have noticed there are some noble houses that do not have any bannermen. Write a query that shows the name of the great noble houses without any bannermen (vassal houses) serving them." 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "_Answer:_\n", 133 | "```MySQL\n", 134 | "SELECT name\n", 135 | "FROM houses\n", 136 | "WHERE id NOT IN (SELECT leader_house_id\n", 137 | " FROM bannermen)\n", 138 | "```" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "_**Short Note on Efficient Queries**_\n", 146 | "\n", 147 | "Some `JOIN` commands (especially `INNER JOIN`) can be very computationally intensive. 
This is why sometimes we would prefer to write subqueries.\n", 148 | "\n", 149 | "_Example:_ Without using any kind of`JOIN`, find all employees who work in the Asia and Canada regions who make more than 13,000 dollars.\n", 150 | "\n", 151 | "```MySQL\n", 152 | "SELECT * from employees\n", 153 | "WHERE salary > 13000\n", 154 | "AND region_id IN (SELECT region_id\n", 155 | " FROM regions\n", 156 | " WHERE country IN ('Asia', 'Canada'))\n", 157 | "```" 158 | ] 159 | }, 160 | { 161 | "cell_type": "markdown", 162 | "metadata": {}, 163 | "source": [ 164 | "### Subqueries in `SELECT`" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "Subqueries can show up almost anywhere in the query! If we want to compare values to a single value, we could include the result of a subquery in the `SELECT` clause. This is especially important when you want to construct some sort of **_benchmark_** (e.g. how much you have missed/beaten a sales target by, what the active returns of a mutual fund is compared to its benchmark index, etc.) \n", 172 | "\n", 173 | "_Example:_ Show me the first_name, last_name, and salary of all employees next to the salary of the employee who earns the least at the company.\n", 174 | "\n", 175 | "```MySQL\n", 176 | "SELECT first_name,\n", 177 | " department,\n", 178 | " salary,\n", 179 | " (SELECT MIN(salary) FROM employees) AS \"lowest_salary\"\n", 180 | "FROM employees\n", 181 | "```" 182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "metadata": {}, 187 | "source": [ 188 | "#### _Short Note on Order of Execution in SQL Queries_\n", 189 | "Across clauses, there is a sequence that queries follow. SQL queries will run FROM first, then WHERE and other filters, and then SELECT last. So in the exercise **below**, the `highest_salary` is already going to be calculated based on Asia and Canada employees because WHERE executes before SELECT.\n", 190 | "\n", 191 | "However, within a clause (e.g. within SELECT) everything runs _**simultaneously**_, not sequentially! So you cannot use `highest_salary` in say a calculation for \"difference\" -- you will need to use the actual subquery in the calculation." 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": {}, 197 | "source": [ 198 | "**Exercise 4:** Among all employees who work in Asia and Canada, calculate the how much less each employee makes compared to the highest earner across those regions." 
199 | ] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "metadata": {}, 204 | "source": [ 205 | "_Answer:_\n", 206 | "```MySQL\n", 207 | "SELECT first_name,\n", 208 | " department,\n", 209 | " salary,\n", 210 | " (SELECT MAX(salary) FROM employees) AS \"highest_salary\",\n", 211 | " (SELECT MAX(salary) FROM employees) - salary AS \"difference\" \n", 212 | "FROM employees\n", 213 | "WHERE region_id IN (SELECT region_id\n", 214 | " FROM regions\n", 215 | " WHERE country IN ('Asia', 'Canada'))\n", 216 | "```" 217 | ] 218 | }, 219 | { 220 | "cell_type": "markdown", 221 | "metadata": {}, 222 | "source": [ 223 | "### Subqueries using `ALL` keyword" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "**Motivation:** We've learned convenient functions like `MAX` and `MIN` which helps us find the highest or lowest value in a field/column.\n", 231 | "\n", 232 | "```MySQL\n", 233 | "SELECT MAX(salary) FROM employees\n", 234 | "```\n", 235 | "\n", 236 | "What if your interviewer asked you to find the highest salary of all employees in the company **WITHOUT** using any built in SQL functions though?\n", 237 | "\n", 238 | "```MySQL\n", 239 | "SELECT salary\n", 240 | "FROM employees\n", 241 | "WHERE salary >= ALL(SELECT salary\n", 242 | " FROM employees)\n", 243 | "```" 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": {}, 249 | "source": [ 250 | "Interview aside though, here's a more practical problem. You're not going to be able to use MAX or MIN when it comes to this situation:\n", 251 | "\n", 252 | "**Exercise 5:** Find the mode salar(ies) of all employees in the company." 253 | ] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "metadata": {}, 258 | "source": [ 259 | "_Answer:_\n", 260 | "```MySQL\n", 261 | "SELECT salary\n", 262 | "FROM employees\n", 263 | "GROUP BY salary\n", 264 | "HAVING COUNT(salary) >= ALL(SELECT COUNT(salary)\n", 265 | " FROM employees\n", 266 | " GROUP BY salary)\n", 267 | "```" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "### Challenge Interview Question \\#1\n", 275 | "\n", 276 | "A retailer store information about all of its products in a `Products` table, which contain the following columns:\n", 277 | "- `id`: the unique identification number for the product\n", 278 | "- `name`: the name the product\n", 279 | "- `manuf_id`: the identification number of the manufacturer we acquired this from\n", 280 | "- `grade`: the quality score on a scale of 1 (bad) to 100 (good) of the product according to reviews.\n", 281 | "\n", 282 | "Write a SQL query that returns the names of all products (there are ties) that have the **_SECOND_ lowest** score." 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "_Answer:_\n", 290 | "```MySQL\n", 291 | "SELECT name\n", 292 | "FROM Products\n", 293 | "WHERE grade IN (SELECT MIN(grade) AS SecondLowest\n", 294 | " FROM Products\n", 295 | " WHERE grade > (SELECT MIN(grade) AS Lowest\n", 296 | " FROM Products))\n", 297 | "ORDER BY name;```" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "### Challenge Interview Question #2\n", 305 | "\n", 306 | "A table called `eval` has 3 columns:
\n", 307 | "- case_id (int)
\n", 308 | "- timestamp (datetime)
\n", 309 | "- score (int)
\n", 310 | "\n", 311 | "But case_id is not unique. For a given case_id, there may be scores on different dates.\n", 312 | "\n", 313 | "Write a query to get the score for each case_id at most recent date." 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": {}, 319 | "source": [ 320 | "_Answer:_\n", 321 | "\n", 322 | "```MySQL\n", 323 | "SELECT case_id, timestamp, score\n", 324 | "FROM eval e1\n", 325 | "WHERE e1.timestamp = (SELECT e2.timestamp\n", 326 | " FROM eval e2\n", 327 | " WHERE e2.case_id = e1.case_id\n", 328 | " ORDER BY timestamp DESC\n", 329 | " LIMIT 1)\n", 330 | "```" 331 | ] 332 | }, 333 | { 334 | "cell_type": "markdown", 335 | "metadata": {}, 336 | "source": [ 337 | "**_Need some help?_** While it is probably better that you do this under interview conditions (i.e. no help from pgAdmin), the option is there if you want to use this code to construct the database and visualize the outputs of your queries\n", 338 | "\n", 339 | "```MySQL\n", 340 | "create table eval (\n", 341 | "\tcase_id int,\n", 342 | "\ttimestamp date,\n", 343 | "\tscore int);\n", 344 | "\n", 345 | "insert into eval values (123, '2019-05-09', 7);\n", 346 | "insert into eval values (123, '2019-05-03', 6);\n", 347 | "insert into eval values (456, '2019-05-07', 1);\n", 348 | "insert into eval values (789, '2019-05-06', 3);\n", 349 | "insert into eval values (456, '2019-05-02', 9);\n", 350 | "insert into eval values (789, '2019-05-08', 2);```" 351 | ] 352 | } 353 | ], 354 | "metadata": { 355 | "kernelspec": { 356 | "display_name": "Python 3", 357 | "language": "python", 358 | "name": "python3" 359 | }, 360 | "language_info": { 361 | "codemirror_mode": { 362 | "name": "ipython", 363 | "version": 3 364 | }, 365 | "file_extension": ".py", 366 | "mimetype": "text/x-python", 367 | "name": "python", 368 | "nbconvert_exporter": "python", 369 | "pygments_lexer": "ipython3", 370 | "version": "3.7.1" 371 | } 372 | }, 373 | "nbformat": 4, 374 | "nbformat_minor": 2 375 | } 376 | -------------------------------------------------------------------------------- /2-Subqueries/Advanced-SQL-II_Subqueries_BLANK.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Advanced SQL II: Subqueries\n", 8 | "_**Author**: Boom Devahastin Na Ayudhya_\n", 9 | "***" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "## Additional Learning Tools after the course" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "The dataset I've used for this lesson is from [Udemy's Master SQL for Data Science](https://www.udemy.com/master-sql-for-data-science/learn/lecture/9790570#overview) course. In the repo, you should copy and paste the database construction queries from the `employees_udemy.txt` script into PostgreSQL if you wish to explore the dataset on your own." 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "## What is a subquery?\n", 31 | "\n", 32 | "Exactly what it sounds like: literally inception because **it's a query within a query**!\n", 33 | "\n", 34 | "...What?! Sounds complicated...why do we need this?" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "**Motivation:** The `employees` table has a department column amongst other employee-specific information. 
The `departments` table shows information on each of the departments. However, some departments have recently turned over their entire team and so there may not be any employees listed in those departments. How can we figure out which departments did this?\n", 42 | "\n", 43 | "TL;DR - How do we determine which departments exist in the `employees` table but not the `departments` table? Think through the logic in English first before you attempt to convert it to code.\n", 44 | "\n", 45 | "_**DO NOT USE JOINS - we'll talk about why not in a bit!**_" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "_Answer:_\n", 53 | "\n", 54 | "```MySQL\n", 55 | "```" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "### Subqueries in `WHERE`" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "How did we think about this?\n", 70 | "- The output of a subquery is a \"dataframe\" (or rather a subset of a table).\n", 71 | "- If we choose to extract just one column from a table using a query, we essentially have a list\n", 72 | "- We've written WHERE statements before with `IN` and `NOT IN` and compared results to a list\n", 73 | "- Connecting the dots: we can replace the list in a WHERE clause with a subquery to make things more dynamic" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "**Exercise 1:** Write a query that returns all information about employees who work in the Electronics division." 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "_Answer:_\n", 88 | "```MySQL\n", 89 | "\n", 90 | "```" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "**Exercise 2:** Switching back to tables in the the `GoT_schema.txt` file now. Write a query that shows the name of characters (in the `people` table) who are not from any of the great noble houses (in the `houses` table)." 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "_Answer:_\n", 105 | "```MySQL\n", 106 | "```" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "**Exercise 3:** You might have noticed there are some noble houses that do not have any bannermen. Write a query that shows the name of the great noble houses without any bannermen (vassal houses) serving them." 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "_Answer:_\n", 121 | "```MySQL\n", 122 | "\n", 123 | "```" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "_**Short Note on Efficient Queries**_\n", 131 | "\n", 132 | "Some `JOIN` commands (especially `INNER JOIN`) can be very computationally intensive. 
This is why sometimes we would prefer to write subqueries.\n", 133 | "\n", 134 | "_Example:_ Without using any kind of`JOIN`, find all employees who work in the Asia and Canada regions who make more than 13,000 dollars.\n", 135 | "\n", 136 | "```MySQL\n", 137 | "SELECT * from employees\n", 138 | "WHERE salary > 13000\n", 139 | "AND region_id IN (SELECT region_id\n", 140 | " FROM regions\n", 141 | " WHERE country IN ('Asia', 'Canada'))\n", 142 | "```" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "### Subqueries in `SELECT`" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "Subqueries can show up almost anywhere in the query! If we want to compare values to a single value, we could include the result of a subquery in the `SELECT` clause. This is especially important when you want to construct some sort of **_benchmark_** (e.g. how much you have missed/beaten a sales target by, what the active returns of a mutual fund is compared to its benchmark index, etc.) \n", 157 | "\n", 158 | "_Example:_ Show me the first_name, last_name, and salary of all employees next to the salary of the employee who earns the least at the company.\n", 159 | "\n", 160 | "```MySQL\n", 161 | "SELECT first_name,\n", 162 | " department,\n", 163 | " salary,\n", 164 | " (SELECT MIN(salary) FROM employees) AS \"lowest_salary\"\n", 165 | "FROM employees\n", 166 | "```" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "#### _Short Note on Order of Execution in SQL Queries_\n", 174 | "Across clauses, there is a sequence that queries follow. SQL queries will run FROM first, then WHERE and other filters, and then SELECT last. So in the exercise **below**, the `lowest_salary` is already going to be calculated based on Asia and Canada employees because WHERE executes before SELECT\n", 175 | "\n", 176 | "However, within a clause (e.g. within SELECT) everything runs _**simultaneously**_, not sequentially! So you cannot use `lowest_salary` in say a calculation for \"difference\" -- you will need to use the actual subquery in the calculation." 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "**Exercise 4:** Among all employees who work in Asia and Canada, calculate the how much less each employee makes compared to the highest earner across those regions." 
184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "_Answer:_\n", 191 | "```MySQL\n", 192 | "```" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "### Subqueries using `ALL` keyword" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "**Motivation:** We've learned convenient functions like `MAX` and `MIN` which helps us find the highest or lowest value in a field/column.\n", 207 | "\n", 208 | "```MySQL\n", 209 | "SELECT MAX(salary) FROM employees\n", 210 | "```\n", 211 | "\n", 212 | "What if your interviewer asked you to find the highest salary of all employees in the company **WITHOUT** using any built in SQL functions though?\n", 213 | "\n", 214 | "```MySQL\n", 215 | "SELECT salary\n", 216 | "FROM employees\n", 217 | "WHERE salary >= ALL(SELECT salary\n", 218 | " FROM employees)\n", 219 | "```" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "Interview aside though, here's a more practical problem. You're not going to be able to use MAX or MIN when it comes to this situation:\n", 227 | "\n", 228 | "**Exercise 5:** Find the mode salar(ies) of all employees in the company." 229 | ] 230 | }, 231 | { 232 | "cell_type": "markdown", 233 | "metadata": {}, 234 | "source": [ 235 | "_Answer:_\n", 236 | "```MySQL\n", 237 | "\n", 238 | "```" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "### Challenge Interview Question \\#1\n", 246 | "\n", 247 | "A retailer store information about all of its products in a `Products` table, which contain the following columns:\n", 248 | "- `id`: the unique identification number for the product\n", 249 | "- `name`: the name the product\n", 250 | "- `manuf_id`: the identification number of the manufacturer we acquired this from\n", 251 | "- `grade`: the quality score on a scale of 1 (bad) to 100 (good) of the product according to reviews.\n", 252 | "\n", 253 | "Write a SQL query that returns the names of all products (there are ties) that have the **_SECOND_ lowest** score." 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "_Answer:_\n" 261 | ] 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": {}, 266 | "source": [ 267 | "### Challenge Interview Question \\#2\n", 268 | "\n", 269 | "A table called `eval` has 3 columns:
\n", 270 | "- case_id (int)
\n", 271 | "- timestamp (datetime)
\n", 272 | "- score (int)
\n", 273 | "\n", 274 | "But case_id is not unique. For a given case_id, there may be scores on different dates.\n", 275 | "\n", 276 | "Write a query to get the score for each case_id at most recent date." 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "_Answer:_\n", 284 | "\n", 285 | "```MySQL\n", 286 | "\n", 287 | "```" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": {}, 293 | "source": [ 294 | "**_Need some help?_** While it is probably better that you do this under interview conditions (i.e. no help from pgAdmin), the option is there if you want to use this code to construct the database and visualize the outputs of your queries\n", 295 | "\n", 296 | "```MySQL\n", 297 | "create table eval (\n", 298 | "\tcase_id int,\n", 299 | "\ttimestamp date,\n", 300 | "\tscore int);\n", 301 | "\n", 302 | "insert into eval values (123, '2019-05-09', 7);\n", 303 | "insert into eval values (123, '2019-05-03', 6);\n", 304 | "insert into eval values (456, '2019-05-07', 1);\n", 305 | "insert into eval values (789, '2019-05-06', 3);\n", 306 | "insert into eval values (456, '2019-05-02', 9);\n", 307 | "insert into eval values (789, '2019-05-08', 2);```" 308 | ] 309 | } 310 | ], 311 | "metadata": { 312 | "kernelspec": { 313 | "display_name": "Python 3", 314 | "language": "python", 315 | "name": "python3" 316 | }, 317 | "language_info": { 318 | "codemirror_mode": { 319 | "name": "ipython", 320 | "version": 3 321 | }, 322 | "file_extension": ".py", 323 | "mimetype": "text/x-python", 324 | "name": "python", 325 | "nbconvert_exporter": "python", 326 | "pygments_lexer": "ipython3", 327 | "version": "3.7.1" 328 | } 329 | }, 330 | "nbformat": 4, 331 | "nbformat_minor": 2 332 | } 333 | -------------------------------------------------------------------------------- /2-Subqueries/Advanced-SQL-II_Subqueries_SOLUTIONS.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Advanced SQL II: Subqueries\n", 8 | "_**Author**: Boom Devahastin Na Ayudhya_\n", 9 | "***" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "## Additional Learning Tools after the course" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "The dataset I've used for this lesson is from [Udemy's Master SQL for Data Science](https://www.udemy.com/master-sql-for-data-science/learn/lecture/9790570#overview) course. In the repo, you should copy and paste the database construction queries from the `employees_udemy.txt` script into PostgreSQL if you wish to explore the dataset on your own." 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "## What is a subquery?\n", 31 | "\n", 32 | "Exactly what it sounds like: literally inception because **it's a query within a query**!\n", 33 | "\n", 34 | "...What?! Sounds complicated...why do we need this?" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "**Motivation:** The `employees` table has a department column amongst other employee-specific information. The `departments` table shows information on each of the departments. However, some departments have recently turned over their entire team and so there may not be any employees listed in those departments. 
How can we figure out which departments did this?\n", 42 | "\n", 43 | "TL;DR - How do we determine which departments exist in the `employees` table but not the `departments` table? Think through the logic in English first before you attempt to convert it to code.\n", 44 | "\n", 45 | "_**DO NOT USE JOINS - we'll talk about why not in a bit!**_" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "_Answer:_\n", 53 | "\n", 54 | "```MySQL\n", 55 | "SELECT DISTINCT department\n", 56 | "FROM employees\n", 57 | "WHERE department NOT IN (SELECT department\n", 58 | " FROM departments)\n", 59 | "```" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "### Subqueries in `WHERE`" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "How did we think about this?\n", 74 | "- The output of a subquery is a \"dataframe\" (or rather a subset of a table).\n", 75 | "- If we choose to extract just one column from a table using a query, we essentially have a list\n", 76 | "- We've written WHERE statements before with `IN` and `NOT IN` and compared results to a list\n", 77 | "- Connecting the dots: we can replace the list in a WHERE clause with a subquery to make things more dynamic" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "**Exercise 1:** Write a query that returns all information about employees who work in the Electronics division." 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "_Answer:_\n", 92 | "```MySQL\n", 93 | "SELECT *\n", 94 | "FROM employees\n", 95 | "WHERE department IN (SELECT department\n", 96 | " FROM departments\n", 97 | " WHERE division = 'Electronics')\n", 98 | "```" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "**Exercise 2:** Switching back to tables in the the `GoT_schema.txt` file now. Write a query that shows the name of characters (in the `people` table) who are not from any of the great noble houses (in the `houses` table)." 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "_Answer:_\n", 113 | "```MySQL\n", 114 | "SELECT *\n", 115 | "FROM people\n", 116 | "WHERE house NOT IN (SELECT name\n", 117 | " FROM houses)\n", 118 | "```" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "**Exercise 3:** You might have noticed there are some noble houses that do not have any bannermen. Write a query that shows the name of the great noble houses without any bannermen (vassal houses) serving them." 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "_Answer:_\n", 133 | "```MySQL\n", 134 | "SELECT name\n", 135 | "FROM houses\n", 136 | "WHERE id NOT IN (SELECT leader_house_id\n", 137 | " FROM bannermen)\n", 138 | "```" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "_**Short Note on Efficient Queries**_\n", 146 | "\n", 147 | "Some `JOIN` commands (especially `INNER JOIN`) can be very computationally intensive. 
This is why sometimes we would prefer to write subqueries.\n", 148 | "\n", 149 | "_Example:_ Without using any kind of`JOIN`, find all employees who work in the Asia and Canada regions who make more than 13,000 dollars.\n", 150 | "\n", 151 | "```MySQL\n", 152 | "SELECT * from employees\n", 153 | "WHERE salary > 13000\n", 154 | "AND region_id IN (SELECT region_id\n", 155 | " FROM regions\n", 156 | " WHERE country IN ('Asia', 'Canada'))\n", 157 | "```" 158 | ] 159 | }, 160 | { 161 | "cell_type": "markdown", 162 | "metadata": {}, 163 | "source": [ 164 | "### Subqueries in `SELECT`" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "Subqueries can show up almost anywhere in the query! If we want to compare values to a single value, we could include the result of a subquery in the `SELECT` clause. This is especially important when you want to construct some sort of **_benchmark_** (e.g. how much you have missed/beaten a sales target by, what the active returns of a mutual fund is compared to its benchmark index, etc.) \n", 172 | "\n", 173 | "_Example:_ Show me the first_name, last_name, and salary of all employees next to the salary of the employee who earns the least at the company.\n", 174 | "\n", 175 | "```MySQL\n", 176 | "SELECT first_name,\n", 177 | " department,\n", 178 | " salary,\n", 179 | " (SELECT MIN(salary) FROM employees) AS \"lowest_salary\"\n", 180 | "FROM employees\n", 181 | "```" 182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "metadata": {}, 187 | "source": [ 188 | "#### _Short Note on Order of Execution in SQL Queries_\n", 189 | "Across clauses, there is a sequence that queries follow. SQL queries will run FROM first, then WHERE and other filters, and then SELECT last. So in the exercise **below**, the `highest_salary` is already going to be calculated based on Asia and Canada employees because WHERE executes before SELECT.\n", 190 | "\n", 191 | "However, within a clause (e.g. within SELECT) everything runs _**simultaneously**_, not sequentially! So you cannot use `highest_salary` in say a calculation for \"difference\" -- you will need to use the actual subquery in the calculation." 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": {}, 197 | "source": [ 198 | "**Exercise 4:** Among all employees who work in Asia and Canada, calculate the how much less each employee makes compared to the highest earner across those regions." 
199 | ] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "metadata": {}, 204 | "source": [ 205 | "_Answer:_\n", 206 | "```MySQL\n", 207 | "SELECT first_name,\n", 208 | " department,\n", 209 | " salary,\n", 210 | " (SELECT MAX(salary) FROM employees) AS \"highest_salary\",\n", 211 | " (SELECT MAX(salary) FROM employees) - salary AS \"difference\" \n", 212 | "FROM employees\n", 213 | "WHERE region_id IN (SELECT region_id\n", 214 | " FROM regions\n", 215 | " WHERE country IN ('Asia', 'Canada'))\n", 216 | "```" 217 | ] 218 | }, 219 | { 220 | "cell_type": "markdown", 221 | "metadata": {}, 222 | "source": [ 223 | "### Subqueries using `ALL` keyword" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "**Motivation:** We've learned convenient functions like `MAX` and `MIN` which helps us find the highest or lowest value in a field/column.\n", 231 | "\n", 232 | "```MySQL\n", 233 | "SELECT MAX(salary) FROM employees\n", 234 | "```\n", 235 | "\n", 236 | "What if your interviewer asked you to find the highest salary of all employees in the company **WITHOUT** using any built in SQL functions though?\n", 237 | "\n", 238 | "```MySQL\n", 239 | "SELECT salary\n", 240 | "FROM employees\n", 241 | "WHERE salary >= ALL(SELECT salary\n", 242 | " FROM employees)\n", 243 | "```" 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": {}, 249 | "source": [ 250 | "Interview aside though, here's a more practical problem. You're not going to be able to use MAX or MIN when it comes to this situation:\n", 251 | "\n", 252 | "**Exercise 5:** Find the mode salar(ies) of all employees in the company." 253 | ] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "metadata": {}, 258 | "source": [ 259 | "_Answer:_\n", 260 | "```MySQL\n", 261 | "SELECT salary\n", 262 | "FROM employees\n", 263 | "GROUP BY salary\n", 264 | "HAVING COUNT(salary) >= ALL(SELECT COUNT(salary)\n", 265 | " FROM employees\n", 266 | " GROUP BY salary)\n", 267 | "```" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "### Challenge Interview Question \\#1\n", 275 | "\n", 276 | "A retailer store information about all of its products in a `Products` table, which contain the following columns:\n", 277 | "- `id`: the unique identification number for the product\n", 278 | "- `name`: the name the product\n", 279 | "- `manuf_id`: the identification number of the manufacturer we acquired this from\n", 280 | "- `grade`: the quality score on a scale of 1 (bad) to 100 (good) of the product according to reviews.\n", 281 | "\n", 282 | "Write a SQL query that returns the names of all products (there are ties) that have the **_SECOND_ lowest** score." 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "_Answer:_\n", 290 | "```MySQL\n", 291 | "SELECT name\n", 292 | "FROM Products\n", 293 | "WHERE grade IN (SELECT MIN(grade) AS SecondLowest\n", 294 | " FROM Products\n", 295 | " WHERE grade > (SELECT MIN(grade) AS Lowest\n", 296 | " FROM Products))\n", 297 | "ORDER BY name;```" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "### Challenge Interview Question #2\n", 305 | "\n", 306 | "A table called `eval` has 3 columns:
\n", 307 | "- case_id (int)
\n", 308 | "- timestamp (datetime)
\n", 309 | "- score (int)
\n", 310 | "\n", 311 | "But case_id is not unique. For a given case_id, there may be scores on different dates.\n", 312 | "\n", 313 | "Write a query to get the score for each case_id at most recent date." 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": {}, 319 | "source": [ 320 | "_Answer:_\n", 321 | "\n", 322 | "```MySQL\n", 323 | "SELECT case_id, timestamp, score\n", 324 | "FROM eval e1\n", 325 | "WHERE e1.timestamp = (SELECT e2.timestamp\n", 326 | " FROM eval e2\n", 327 | " WHERE e2.case_id = e1.case_id\n", 328 | " ORDER BY timestamp DESC\n", 329 | " LIMIT 1)\n", 330 | "```" 331 | ] 332 | }, 333 | { 334 | "cell_type": "markdown", 335 | "metadata": {}, 336 | "source": [ 337 | "**_Need some help?_** While it is probably better that you do this under interview conditions (i.e. no help from pgAdmin), the option is there if you want to use this code to construct the database and visualize the outputs of your queries\n", 338 | "\n", 339 | "```MySQL\n", 340 | "create table eval (\n", 341 | "\tcase_id int,\n", 342 | "\ttimestamp date,\n", 343 | "\tscore int);\n", 344 | "\n", 345 | "insert into eval values (123, '2019-05-09', 7);\n", 346 | "insert into eval values (123, '2019-05-03', 6);\n", 347 | "insert into eval values (456, '2019-05-07', 1);\n", 348 | "insert into eval values (789, '2019-05-06', 3);\n", 349 | "insert into eval values (456, '2019-05-02', 9);\n", 350 | "insert into eval values (789, '2019-05-08', 2);```" 351 | ] 352 | } 353 | ], 354 | "metadata": { 355 | "kernelspec": { 356 | "display_name": "Python 3", 357 | "language": "python", 358 | "name": "python3" 359 | }, 360 | "language_info": { 361 | "codemirror_mode": { 362 | "name": "ipython", 363 | "version": 3 364 | }, 365 | "file_extension": ".py", 366 | "mimetype": "text/x-python", 367 | "name": "python", 368 | "nbconvert_exporter": "python", 369 | "pygments_lexer": "ipython3", 370 | "version": "3.7.1" 371 | } 372 | }, 373 | "nbformat": 4, 374 | "nbformat_minor": 2 375 | } 376 | -------------------------------------------------------------------------------- /3-Window_Functions/Advanced-SQL-III_Correlated-Sub-Queries-and-Window-Functions_BLANK.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Advanced SQL III: Correlated Sub-Queries and Window Functions\n", 8 | "_**Author**: Boom Devahastin Na Ayudhya_\n", 9 | "***" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "Alright, it's the final stretch! This is the last of the 3-part workshop on Advanced SQL techniques." 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "## Warm-Up\n", 24 | "\n", 25 | "**Warm-Up Exercise:**\n", 26 | "Write a query that shows the department and the number of people in each department." 
27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "_Answer:_\n", 34 | "```MySQL\n", 35 | "\n", 36 | "```" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "**But what about this:** Write a query that includes each employee's first_name, their department and the number of people in their department.\n", 44 | "\n", 45 | "Some of you might be thinking you can just add `first_name` as an additional column and throw that into the `GROUP BY` like:\n", 46 | "```MySQL\n", 47 | "SELECT first_name,\n", 48 | " department,\n", 49 | " COUNT(department)\n", 50 | "FROM employees\n", 51 | "GROUP BY department, first_name\n", 52 | "```\n", 53 | "But why does this **_NOT_ work**?\n", 54 | "\n", 55 | "Here's the **right version** which makes use of sub-queries:\n", 56 | "```MySQL\n", 57 | "SELECT first_name,\n", 58 | " department,\n", 59 | " (SELECT COUNT(department)\n", 60 | " FROM employees e1\n", 61 | " WHERE e1.department = e2.department)\n", 62 | "FROM employees e2\n", 63 | "GROUP BY department, first_name\n", 64 | "```\n", 65 | "This is what we call a **correlated sub-query**, which is a little complicated but useful. However, correlated sub-queries can be computationally inefficient because here it has to run for every single row! So...what's better?" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "## Window Functions!\n", 73 | "\n", 74 | "Just to be clear a window function is **NOT** a function called `WINDOW()` - it is a family of functions that _**operate on a group of rows (window) that are somehow related to the current row**_.\n", 75 | "\n", 76 | "This necessarily means `GROUP BY` is being used **_behind the scenes_** but we do not necessarily need to explicitly call `GROUP BY` anymore when we use window functions!\n", 77 | "\n", 78 | "**IMPORTANT WARNING:** Depending on the SQL dialect used at your company, you may not have access to Window functions. PostgreSQL (which we're using here) supports it; however, MySQL does not." 
79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "### `OVER()`: Basic Uses\n", 86 | "The most common Window Function is `OVER()` which allows us to specify the \"window\" (or group of rows) that is the focus of our analysis.\n", 87 | "\n", 88 | "_Example:_\n", 89 | "```MySQL\n", 90 | "SELECT first_name,\n", 91 | "\t\tdepartment,\n", 92 | "\t\tCOUNT(*) OVER(PARTITION BY department)\n", 93 | "FROM employees\n", 94 | "```\n", 95 | "\n", 96 | "Let's check that the output of the correlated sub-query method we used earlier is equivalent to this:\n", 97 | "\n", 98 | "```MySQL\n", 99 | "(SELECT first_name,\n", 100 | " department,\n", 101 | " (SELECT COUNT(department)\n", 102 | " FROM employees e1\n", 103 | " WHERE e1.department = e2.department)\n", 104 | "FROM employees e2\n", 105 | "GROUP BY department, first_name)\n", 106 | "\n", 107 | "EXCEPT\n", 108 | "\n", 109 | "(SELECT first_name,\n", 110 | "\t\tdepartment,\n", 111 | "\t\tCOUNT(*) OVER(PARTITION BY department)\n", 112 | "FROM employees)\n", 113 | "```\n", 114 | "\n", 115 | "**Exercise 1:** Write a query that includes each employee's first_name, their department and the total salaries earned of people in their department using\n", 116 | "\n", 117 | "**(a) The Correlated Sub-Query Method**" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "_Answer:_\n", 125 | "```MySQL\n", 126 | "\n", 127 | "```" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "**(b) The Window Function Method**" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "_Answer:_\n", 142 | "```MySQL\n", 143 | "\n", 144 | "```" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "**Exercise 2:** Write a query that includes:\n", 152 | "- each employee's first_name\n", 153 | "- their department\n", 154 | "- their department size\n", 155 | "- their region_id\n", 156 | "- the total salaries earned of people in their region\n", 157 | "\n", 158 | "using as few lines of code as possible." 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "_Answer:_\n", 166 | "```MySQL\n", 167 | "\n", 168 | "```" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "### `OVER()`: Cumulative Sums\n", 176 | "\n", 177 | "Beyond the basic uses of `OVER()` as a more flexible `GROUP BY`, we can also use it to help us do cumulative calculations. \n", 178 | "\n", 179 | "_Example:_ Write a query that returns the first name of all employees, the hire_date, the employee's salary, and **the total salaries earned by employees on each date** (call it \"cumulative_salary\").\n", 180 | "\n", 181 | "```MySQL\n", 182 | "SELECT first_name,\n", 183 | " hire_date,\n", 184 | " salary,\n", 185 | " SUM(salary) OVER(ORDER BY hire_date -- this is the index we are going by\n", 186 | " RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_salary -- the input range\n", 187 | "FROM employees\n", 188 | "```" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "The `UNBOUNDED PRECEDING` here means from the start of the ordered series (earliest hire_date in this case)." 
196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "**Exercise 3:** Write a query that returns the first name, hire date, salary, and cumulative salaries paid at each date for each department since the company was founded." 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "_Answer:_\n", 210 | "```MySQL\n", 211 | "\n", 212 | "```" 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": {}, 218 | "source": [ 219 | "### `OVER()`: Rolling Averages" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "This is very similar to cumulative sums except we just need to change `SUM()` $\\rightarrow$ `AVG()` and change `UNBOUNDED PRECEEDING` to a specified window.\n", 227 | "\n", 228 | "It makes more sense to work with financial data to do this so let's switch gears to the `DailyQuote` table which you can access if you create a new database using commands from `DailyQuote_essentialsql.txt` _(Source: https://www.essentialsql.com/sql-puzzle-calculate-moving-averages/)_\n", 229 | "\n", 230 | "*Example:* Calculate the 3-day rolling average closing price of the stock at each date.\n", 231 | "```MySQL\n", 232 | "SELECT MarketDate,\n", 233 | " ClosingPrice,\n", 234 | " AVG(ClosingPrice) OVER (ORDER BY MarketDate ASC\n", 235 | " ROWS BETWEEN '2' PRECEDING AND CURRENT ROW) AS \"3D_roll_avg\"\n", 236 | "FROM DailyQuote\n", 237 | "```" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "_**Aside: ...but technically the right way to do things**_\n", 245 | "\n", 246 | "Notice that the rolling averages for the first 3 dates don't make sense since it's not a 5-day average but based on just the existing value up until that date.\n", 247 | "\n", 248 | "```MySQL\n", 249 | "SELECT MarketDate,\n", 250 | "\t RowNumber,\n", 251 | "\t ClosingPrice,\n", 252 | "\t CASE WHEN RowNumber > 4 THEN \"5D_roll_avg\"\n", 253 | "\t\t\tELSE NULL\n", 254 | "\t\t\tEND AS \"5D_roll_avg\"\n", 255 | "FROM (SELECT MarketDate,\n", 256 | "\t\t\tClosingPrice,\n", 257 | "\t ROW_NUMBER() OVER(ORDER BY MarketDate ASC) AS RowNumber,\n", 258 | "\t AVG(ClosingPrice) OVER (ORDER BY MarketDate ASC\n", 259 | "\t\t\t\t\t\t\t ROWS BETWEEN '4' PRECEDING AND CURRENT ROW) AS \"5D_roll_avg\"\n", 260 | "\t FROM DailyQuote) AS subquery\n", 261 | "```" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "**Exercise 4:** Calculate the 10-day and 30-day rolling average closing price of the stock at each date.\n", 269 | "_(You can try this the quick way, or as a bonus try doing this the right way)_ " 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "_Answer:_\n", 277 | "```MySQL\n", 278 | "\n", 279 | "```" 280 | ] 281 | } 282 | ], 283 | "metadata": { 284 | "kernelspec": { 285 | "display_name": "Python 3", 286 | "language": "python", 287 | "name": "python3" 288 | }, 289 | "language_info": { 290 | "codemirror_mode": { 291 | "name": "ipython", 292 | "version": 3 293 | }, 294 | "file_extension": ".py", 295 | "mimetype": "text/x-python", 296 | "name": "python", 297 | "nbconvert_exporter": "python", 298 | "pygments_lexer": "ipython3", 299 | "version": "3.6.8" 300 | } 301 | }, 302 | "nbformat": 4, 303 | "nbformat_minor": 2 304 | } 305 | -------------------------------------------------------------------------------- 
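_Aside:_ when several columns in one query share the same window, PostgreSQL (and standard SQL) also let you declare that window once with a `WINDOW` clause instead of repeating `OVER(PARTITION BY ...)` on every column. A minimal sketch against the same `employees` table used in the exercises above (column names assumed to match those exercises; this is an optional convenience, not a different technique):

```MySQL
-- Equivalent to writing OVER(PARTITION BY department) on each column,
-- but the window specification is declared once and reused by name.
SELECT first_name,
       department,
       COUNT(*)    OVER w_dept AS dept_size,
       SUM(salary) OVER w_dept AS dept_total_salary
FROM employees
WINDOW w_dept AS (PARTITION BY department);
```

The results should match the `OVER(PARTITION BY department)` versions above; the named window only saves repetition when a query defines many windowed columns.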
/3-Window_Functions/Advanced-SQL-III_Correlated-Sub-Queries-and-Window-Functions_SOLUTIONS.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Advanced SQL III: Correlated Sub-Queries and Window Functions\n", 8 | "_**Author**: Boom Devahastin Na Ayudhya_\n", 9 | "***" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "Alright, it's the final stretch! This is the last of the three-part workshop on Advanced SQL techniques. This is going to be the toughest out of all topics we've covered, but we'll get through it!" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "## Warm-Up\n", 24 | "\n", 25 | "**Warm-Up Exercise:**\n", 26 | "Write a query that shows the department and the number of people in each department." 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "_Answer:_\n", 34 | "```MySQL\n", 35 | "SELECT department, COUNT(employee_id)\n", 36 | "FROM employees\n", 37 | "GROUP BY department\n", 38 | "```" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "**But what about this:** Write a query that includes each employee's first_name, their department and the number of people in their department.\n", 46 | "\n", 47 | "Some of you might be thinking you can just add `first_name` as an additional column and throw that into the `GROUP BY` like:\n", 48 | "```MySQL\n", 49 | "SELECT first_name,\n", 50 | " department,\n", 51 | " COUNT(department)\n", 52 | "FROM employees\n", 53 | "GROUP BY department, first_name\n", 54 | "```\n", 55 | "But why does this **_NOT_ work**?\n", 56 | "\n", 57 | "Here's the **right version** which makes use of sub-queries:\n", 58 | "```MySQL\n", 59 | "SELECT first_name,\n", 60 | " department,\n", 61 | " (SELECT COUNT(department)\n", 62 | " FROM employees e1\n", 63 | " WHERE e1.department = e2.department)\n", 64 | "FROM employees e2\n", 65 | "GROUP BY department, first_name\n", 66 | "```\n", 67 | "This is what we call a **correlated sub-query**, which is a little complicated but useful. However, correlated sub-queries can be computationally inefficient because here it has to run for every single row! So...what's better?" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "## Window Functions!\n", 75 | "\n", 76 | "Just to be clear a window function is **NOT** a function called `WINDOW()` - it is a family of functions that _**operate on a group of rows (window) that are somehow related to the current row**_.\n", 77 | "\n", 78 | "This necessarily means `GROUP BY` is being used **_behind the scenes_** but we do not necessarily need to explicitly call `GROUP BY` anymore when we use window functions!\n", 79 | "\n", 80 | "**IMPORTANT WARNING:** Depending on the SQL dialect used at your company, you may not have access to Window functions. PostgreSQL (which we're using here) supports it; however, MySQL does not." 
81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "### `OVER()`: Basic Uses\n", 88 | "The most common Window Function is `OVER()` which allows us to specify the \"window\" (or group of rows) that is the focus of our analysis.\n", 89 | "\n", 90 | "_Example:_\n", 91 | "```MySQL\n", 92 | "SELECT first_name,\n", 93 | "\t\tdepartment,\n", 94 | "\t\tCOUNT(*) OVER(PARTITION BY department)\n", 95 | "FROM employees\n", 96 | "```\n", 97 | "\n", 98 | "Let's check that the output of the correlated sub-query method we used earlier is equivalent to this:\n", 99 | "\n", 100 | "```MySQL\n", 101 | "(SELECT first_name,\n", 102 | " department,\n", 103 | " (SELECT COUNT(department)\n", 104 | " FROM employees e1\n", 105 | " WHERE e1.department = e2.department)\n", 106 | "FROM employees e2\n", 107 | "GROUP BY department, first_name)\n", 108 | "\n", 109 | "EXCEPT\n", 110 | "\n", 111 | "(SELECT first_name,\n", 112 | "\t\tdepartment,\n", 113 | "\t\tCOUNT(*) OVER(PARTITION BY department)\n", 114 | "FROM employees)\n", 115 | "```\n", 116 | "\n", 117 | "**Exercise 1:** Write a query that includes each employee's first_name, their department and the total salaries earned of people in their department using\n", 118 | "\n", 119 | "**(a) The Correlated Sub-Query Method**" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "_Answer:_\n", 127 | "```MySQL\n", 128 | "SELECT first_name,\n", 129 | " department,\n", 130 | " (SELECT SUM(salary)\n", 131 | " FROM employees e1\n", 132 | " WHERE e1.department = e2.department)\n", 133 | "FROM employees e2\n", 134 | "GROUP BY department, first_name\n", 135 | "```" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "**(b) The Window Function Method**" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "_Answer:_\n", 150 | "```MySQL\n", 151 | "SELECT first_name,\n", 152 | " department,\n", 153 | " SUM(salary) OVER(PARTITION BY department)\n", 154 | "FROM employees\n", 155 | "```" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "**Exercise 2:** Write a query that includes:\n", 163 | "- each employee's first_name\n", 164 | "- their department\n", 165 | "- their department size\n", 166 | "- their region_id\n", 167 | "- the total salaries earned of people in their region\n", 168 | "\n", 169 | "using as few lines of code as possible." 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "_Answer:_\n", 177 | "```MySQL\n", 178 | "SELECT first_name,\n", 179 | "\t department,\n", 180 | "\t COUNT(*) OVER(PARTITION BY department) AS \"dept_size\",\n", 181 | " region_id,\n", 182 | "\t SUM(salary) OVER(PARTITION BY region_id) AS \"region_total_salary\"\n", 183 | "FROM employees\n", 184 | "```" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "### `OVER()`: Cumulative Sums\n", 192 | "\n", 193 | "Beyond the basic uses of `OVER()` as a more flexible `GROUP BY`, we can also use it to help us do cumulative calculations. 
\n", 194 | "\n", 195 | "_Example:_ Write a query that returns the first name of all employees, the hire_date, the employee's salary, and **the total salaries earned by employees on each date** (call it \"cumulative_salary\").\n", 196 | "\n", 197 | "```MySQL\n", 198 | "SELECT first_name,\n", 199 | " hire_date,\n", 200 | " salary,\n", 201 | " SUM(salary) OVER(ORDER BY hire_date -- this is the index we are going by\n", 202 | " RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_salary -- the input range\n", 203 | "FROM employees\n", 204 | "```" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "The `UNBOUNDED PRECEDING` here means from the start of the ordered series (earliest hire_date in this case)." 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "**Exercise 3:** Write a query that returns the first name, hire date, salary, and cumulative salaries paid at each date for each department since the company was founded." 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "_Answer:_\n", 226 | "```MySQL\n", 227 | "SELECT department, hire_date, first_name, salary,\n", 228 | " SUM(salary) OVER(PARTITION BY department -- equivalent of GROUP BY\n", 229 | " ORDER BY hire_date -- index\n", 230 | " RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_dept_salary -- input range\n", 231 | "FROM employees\n", 232 | "```" 233 | ] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": {}, 238 | "source": [ 239 | "### `OVER()`: Rolling Averages" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "This is very similar to cumulative sums except we just need to change `SUM()` $\\rightarrow$ `AVG()` and change `UNBOUNDED PRECEEDING` to a specified window.\n", 247 | "\n", 248 | "It makes more sense to work with financial data to do this so let's switch gears to the `DailyQuote` table which you can access if you create a new database using commands from `DailyQuote_essentialsql.txt` _(Source: https://www.essentialsql.com/sql-puzzle-calculate-moving-averages/)_\n", 249 | "\n", 250 | "*Example:* Calculate the 3-day rolling average closing price of the stock at each date.\n", 251 | "```MySQL\n", 252 | "SELECT MarketDate,\n", 253 | " ClosingPrice,\n", 254 | " AVG(ClosingPrice) OVER (ORDER BY MarketDate ASC\n", 255 | " ROWS BETWEEN '2' PRECEDING AND CURRENT ROW) AS \"3D_roll_avg\"\n", 256 | "FROM DailyQuote\n", 257 | "```" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "_**Aside: ...but technically the right way to do things**_\n", 265 | "\n", 266 | "Notice that the rolling averages for the first 3 dates don't make sense since it's not a 5-day average but based on just the existing value up until that date.\n", 267 | "\n", 268 | "```MySQL\n", 269 | "SELECT MarketDate,\n", 270 | "\t RowNumber,\n", 271 | "\t ClosingPrice,\n", 272 | "\t CASE WHEN RowNumber > 4 THEN \"5D_roll_avg\"\n", 273 | "\t\t\tELSE NULL\n", 274 | "\t\t\tEND AS \"5D_roll_avg\"\n", 275 | "FROM (SELECT MarketDate,\n", 276 | "\t\t\tClosingPrice,\n", 277 | "\t ROW_NUMBER() OVER(ORDER BY MarketDate ASC) AS RowNumber,\n", 278 | "\t AVG(ClosingPrice) OVER (ORDER BY MarketDate ASC\n", 279 | "\t\t\t\t\t\t\t ROWS BETWEEN '4' PRECEDING AND CURRENT ROW) AS \"5D_roll_avg\"\n", 280 | "\t FROM DailyQuote) AS subquery\n", 281 | "```" 282 | ] 283 | }, 284 | { 285 | 
"cell_type": "markdown", 286 | "metadata": {}, 287 | "source": [ 288 | "**Exercise 4:** Calculate the 10-day and 30-day rolling average closing price of the stock at each date.\n", 289 | "_(You can try this the quick way, or as a bonus try doing this the right way)_ " 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "_Answer:_\n", 297 | "```MySQL\n", 298 | "SELECT MarketDate,\n", 299 | "\t RowNumber,\n", 300 | "\t ClosingPrice,\n", 301 | "\t CASE WHEN RowNumber > 9 THEN \"10D_roll_avg\"\n", 302 | "\t\t\tELSE NULL\n", 303 | "\t\t\tEND AS \"10D_roll_avg\",\n", 304 | "\t CASE WHEN RowNumber > 29 THEN \"30D_roll_avg\"\n", 305 | "\t\t\tELSE NULL\n", 306 | "\t\t\tEND AS \"30D_roll_avg\",\n", 307 | "FROM (SELECT MarketDate,\n", 308 | "\t\t\t ClosingPrice,\n", 309 | "\t ROW_NUMBER() OVER(ORDER BY MarketDate ASC) AS RowNumber,\n", 310 | "\t AVG(ClosingPrice) OVER (ORDER BY MarketDate ASC\n", 311 | "\t\t\t\t\t\t\t ROWS BETWEEN '9' PRECEDING AND CURRENT ROW) AS \"10D_roll_avg\",\n", 312 | "\t AVG(ClosingPrice) OVER (ORDER BY MarketDate ASC\n", 313 | "\t\t\t\t\t\t\t ROWS BETWEEN '29' PRECEDING AND CURRENT ROW) AS \"30D_roll_avg\"\n", 314 | "\t FROM DailyQuote) AS subquery\n", 315 | "```" 316 | ] 317 | } 318 | ], 319 | "metadata": { 320 | "kernelspec": { 321 | "display_name": "Python 3", 322 | "language": "python", 323 | "name": "python3" 324 | }, 325 | "language_info": { 326 | "codemirror_mode": { 327 | "name": "ipython", 328 | "version": 3 329 | }, 330 | "file_extension": ".py", 331 | "mimetype": "text/x-python", 332 | "name": "python", 333 | "nbconvert_exporter": "python", 334 | "pygments_lexer": "ipython3", 335 | "version": "3.6.8" 336 | } 337 | }, 338 | "nbformat": 4, 339 | "nbformat_minor": 2 340 | } 341 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Boom Devahastin Na Ayudhya 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /datasets/DailyQuote_essentialsql.txt: -------------------------------------------------------------------------------- 1 | CREATE TABLE DailyQuote ( 2 | MarketDate date, 3 | ClosingPrice float 4 | ); 5 | 6 | INSERT INTO DailyQuote VALUES ('2016/12/30', 62.14); 7 | INSERT INTO DailyQuote VALUES ('2016/12/29', 62.90); 8 | INSERT INTO DailyQuote VALUES ('2016/12/28', 62.99); 9 | INSERT INTO DailyQuote VALUES ('2016/12/27', 63.28); 10 | INSERT INTO DailyQuote VALUES ('2016/12/23', 63.24); 11 | INSERT INTO DailyQuote VALUES ('2016/12/22', 63.55); 12 | INSERT INTO DailyQuote VALUES ('2016/12/21', 63.54); 13 | INSERT INTO DailyQuote VALUES ('2016/12/20', 63.54); 14 | INSERT INTO DailyQuote VALUES ('2016/12/19', 63.62); 15 | INSERT INTO DailyQuote VALUES ('2016/12/16', 62.30); 16 | INSERT INTO DailyQuote VALUES ('2016/12/15', 62.58); 17 | INSERT INTO DailyQuote VALUES ('2016/12/14', 62.68); 18 | INSERT INTO DailyQuote VALUES ('2016/12/13', 62.98); 19 | INSERT INTO DailyQuote VALUES ('2016/12/12', 62.17); 20 | INSERT INTO DailyQuote VALUES ('2016/12/09', 61.97); 21 | INSERT INTO DailyQuote VALUES ('2016/12/08', 61.01); 22 | INSERT INTO DailyQuote VALUES ('2016/12/07', 61.37); 23 | INSERT INTO DailyQuote VALUES ('2016/12/06', 59.95); 24 | INSERT INTO DailyQuote VALUES ('2016/12/05', 60.22); 25 | INSERT INTO DailyQuote VALUES ('2016/12/02', 59.25); 26 | INSERT INTO DailyQuote VALUES ('2016/12/01', 59.20); 27 | INSERT INTO DailyQuote VALUES ('2016/11/30', 60.26); 28 | INSERT INTO DailyQuote VALUES ('2016/11/29', 61.09); 29 | INSERT INTO DailyQuote VALUES ('2016/11/28', 60.61); 30 | INSERT INTO DailyQuote VALUES ('2016/11/25', 60.53); 31 | INSERT INTO DailyQuote VALUES ('2016/11/23', 60.40); 32 | INSERT INTO DailyQuote VALUES ('2016/11/22', 61.12); 33 | INSERT INTO DailyQuote VALUES ('2016/11/21', 60.86); 34 | INSERT INTO DailyQuote VALUES ('2016/11/18', 60.35); 35 | INSERT INTO DailyQuote VALUES ('2016/11/17', 60.64); 36 | INSERT INTO DailyQuote VALUES ('2016/11/16', 59.65); 37 | INSERT INTO DailyQuote VALUES ('2016/11/15', 58.87); 38 | INSERT INTO DailyQuote VALUES ('2016/11/14', 58.12); 39 | INSERT INTO DailyQuote VALUES ('2016/11/11', 59.02); 40 | INSERT INTO DailyQuote VALUES ('2016/11/10', 58.70); 41 | INSERT INTO DailyQuote VALUES ('2016/11/09', 60.17); 42 | INSERT INTO DailyQuote VALUES ('2016/11/08', 60.47); 43 | INSERT INTO DailyQuote VALUES ('2016/11/07', 60.42); 44 | INSERT INTO DailyQuote VALUES ('2016/11/04', 58.71); 45 | INSERT INTO DailyQuote VALUES ('2016/11/03', 59.21); 46 | INSERT INTO DailyQuote VALUES ('2016/11/02', 59.43); 47 | INSERT INTO DailyQuote VALUES ('2016/11/01', 59.80); 48 | INSERT INTO DailyQuote VALUES ('2016/10/31', 59.92); 49 | INSERT INTO DailyQuote VALUES ('2016/10/28', 59.87); 50 | INSERT INTO DailyQuote VALUES ('2016/10/27', 60.10); 51 | INSERT INTO DailyQuote VALUES ('2016/10/26', 60.63); 52 | INSERT INTO DailyQuote VALUES ('2016/10/25', 60.99); 53 | INSERT INTO DailyQuote VALUES ('2016/10/24', 61.00); 54 | INSERT INTO DailyQuote VALUES ('2016/10/21', 59.66); 55 | INSERT INTO DailyQuote VALUES ('2016/10/20', 57.25); 56 | INSERT INTO DailyQuote VALUES ('2016/10/19', 57.53); 57 | INSERT INTO DailyQuote VALUES ('2016/10/18', 57.66); 58 | INSERT INTO DailyQuote VALUES ('2016/10/17', 57.22); 59 | INSERT INTO DailyQuote VALUES ('2016/10/14', 57.42); 60 | INSERT INTO DailyQuote VALUES ('2016/10/13', 56.92); 61 | INSERT INTO DailyQuote VALUES ('2016/10/12', 57.11); 62 | INSERT INTO 
DailyQuote VALUES ('2016/10/11', 57.19); 63 | INSERT INTO DailyQuote VALUES ('2016/10/10', 58.04); 64 | INSERT INTO DailyQuote VALUES ('2016/10/07', 57.80); 65 | INSERT INTO DailyQuote VALUES ('2016/10/06', 57.74); 66 | INSERT INTO DailyQuote VALUES ('2016/10/05', 57.64); 67 | INSERT INTO DailyQuote VALUES ('2016/10/04', 57.24); 68 | INSERT INTO DailyQuote VALUES ('2016/10/03', 57.42); 69 | INSERT INTO DailyQuote VALUES ('2016/09/30', 57.60); 70 | INSERT INTO DailyQuote VALUES ('2016/09/29', 57.40); 71 | INSERT INTO DailyQuote VALUES ('2016/09/28', 58.03); 72 | INSERT INTO DailyQuote VALUES ('2016/09/27', 57.95); 73 | INSERT INTO DailyQuote VALUES ('2016/09/26', 56.90); 74 | INSERT INTO DailyQuote VALUES ('2016/09/23', 57.43); 75 | INSERT INTO DailyQuote VALUES ('2016/09/22', 57.82); 76 | INSERT INTO DailyQuote VALUES ('2016/09/21', 57.76); 77 | INSERT INTO DailyQuote VALUES ('2016/09/20', 56.81); 78 | INSERT INTO DailyQuote VALUES ('2016/09/19', 56.93); 79 | INSERT INTO DailyQuote VALUES ('2016/09/16', 57.25); 80 | INSERT INTO DailyQuote VALUES ('2016/09/15', 57.19); 81 | INSERT INTO DailyQuote VALUES ('2016/09/14', 56.26); 82 | INSERT INTO DailyQuote VALUES ('2016/09/13', 56.53); 83 | INSERT INTO DailyQuote VALUES ('2016/09/12', 57.05); 84 | INSERT INTO DailyQuote VALUES ('2016/09/09', 56.21); 85 | INSERT INTO DailyQuote VALUES ('2016/09/08', 57.43); 86 | INSERT INTO DailyQuote VALUES ('2016/09/07', 57.66); 87 | INSERT INTO DailyQuote VALUES ('2016/09/06', 57.61); 88 | INSERT INTO DailyQuote VALUES ('2016/09/02', 57.67); 89 | INSERT INTO DailyQuote VALUES ('2016/09/01', 57.59); 90 | INSERT INTO DailyQuote VALUES ('2016/08/31', 57.46); 91 | INSERT INTO DailyQuote VALUES ('2016/08/30', 57.89); 92 | INSERT INTO DailyQuote VALUES ('2016/08/29', 58.10); 93 | INSERT INTO DailyQuote VALUES ('2016/08/26', 58.03); 94 | INSERT INTO DailyQuote VALUES ('2016/08/25', 58.17); 95 | INSERT INTO DailyQuote VALUES ('2016/08/24', 57.95); 96 | INSERT INTO DailyQuote VALUES ('2016/08/23', 57.89); 97 | INSERT INTO DailyQuote VALUES ('2016/08/22', 57.67); 98 | INSERT INTO DailyQuote VALUES ('2016/08/19', 57.62); 99 | INSERT INTO DailyQuote VALUES ('2016/08/18', 57.60); 100 | INSERT INTO DailyQuote VALUES ('2016/08/17', 57.56); 101 | INSERT INTO DailyQuote VALUES ('2016/08/16', 57.44); 102 | INSERT INTO DailyQuote VALUES ('2016/08/15', 58.12); 103 | INSERT INTO DailyQuote VALUES ('2016/08/12', 57.94); 104 | INSERT INTO DailyQuote VALUES ('2016/08/11', 58.30); 105 | INSERT INTO DailyQuote VALUES ('2016/08/10', 58.02); 106 | INSERT INTO DailyQuote VALUES ('2016/08/09', 58.20); 107 | INSERT INTO DailyQuote VALUES ('2016/08/08', 58.06); 108 | INSERT INTO DailyQuote VALUES ('2016/08/05', 57.96); 109 | INSERT INTO DailyQuote VALUES ('2016/08/04', 57.39); 110 | INSERT INTO DailyQuote VALUES ('2016/08/03', 56.97); 111 | INSERT INTO DailyQuote VALUES ('2016/08/02', 56.58); 112 | INSERT INTO DailyQuote VALUES ('2016/08/01', 56.58); 113 | INSERT INTO DailyQuote VALUES ('2016/07/29', 56.68); 114 | INSERT INTO DailyQuote VALUES ('2016/07/28', 56.21); 115 | INSERT INTO DailyQuote VALUES ('2016/07/27', 56.19); 116 | INSERT INTO DailyQuote VALUES ('2016/07/26', 56.76); 117 | INSERT INTO DailyQuote VALUES ('2016/07/25', 56.73); 118 | INSERT INTO DailyQuote VALUES ('2016/07/22', 56.57); 119 | INSERT INTO DailyQuote VALUES ('2016/07/21', 55.80); 120 | INSERT INTO DailyQuote VALUES ('2016/07/20', 55.91); 121 | INSERT INTO DailyQuote VALUES ('2016/07/19', 53.09); 122 | INSERT INTO DailyQuote VALUES ('2016/07/18', 53.96); 123 | 
INSERT INTO DailyQuote VALUES ('2016/07/15', 53.70); 124 | INSERT INTO DailyQuote VALUES ('2016/07/14', 53.74); 125 | INSERT INTO DailyQuote VALUES ('2016/07/13', 53.51); 126 | INSERT INTO DailyQuote VALUES ('2016/07/12', 53.21); 127 | INSERT INTO DailyQuote VALUES ('2016/07/11', 52.59); 128 | INSERT INTO DailyQuote VALUES ('2016/07/08', 52.30); 129 | INSERT INTO DailyQuote VALUES ('2016/07/07', 51.38); 130 | INSERT INTO DailyQuote VALUES ('2016/07/06', 51.38); 131 | INSERT INTO DailyQuote VALUES ('2016/07/05', 51.17); 132 | INSERT INTO DailyQuote VALUES ('2016/07/01', 51.16); 133 | INSERT INTO DailyQuote VALUES ('2016/06/30', 51.17); 134 | INSERT INTO DailyQuote VALUES ('2016/06/29', 50.54); 135 | INSERT INTO DailyQuote VALUES ('2016/06/28', 49.44); 136 | INSERT INTO DailyQuote VALUES ('2016/06/27', 48.43); 137 | INSERT INTO DailyQuote VALUES ('2016/06/24', 49.83); 138 | INSERT INTO DailyQuote VALUES ('2016/06/23', 51.91); 139 | INSERT INTO DailyQuote VALUES ('2016/06/22', 50.99); 140 | INSERT INTO DailyQuote VALUES ('2016/06/21', 51.19); 141 | INSERT INTO DailyQuote VALUES ('2016/06/20', 50.07); 142 | INSERT INTO DailyQuote VALUES ('2016/06/17', 50.13); 143 | INSERT INTO DailyQuote VALUES ('2016/06/16', 50.39); 144 | INSERT INTO DailyQuote VALUES ('2016/06/15', 49.69); 145 | INSERT INTO DailyQuote VALUES ('2016/06/14', 49.83); 146 | INSERT INTO DailyQuote VALUES ('2016/06/13', 50.14); 147 | INSERT INTO DailyQuote VALUES ('2016/06/10', 51.48); 148 | INSERT INTO DailyQuote VALUES ('2016/06/09', 51.62); 149 | INSERT INTO DailyQuote VALUES ('2016/06/08', 52.04); 150 | INSERT INTO DailyQuote VALUES ('2016/06/07', 52.10); 151 | INSERT INTO DailyQuote VALUES ('2016/06/06', 52.13); 152 | INSERT INTO DailyQuote VALUES ('2016/06/03', 51.79); 153 | INSERT INTO DailyQuote VALUES ('2016/06/02', 52.48); 154 | INSERT INTO DailyQuote VALUES ('2016/06/01', 52.85); 155 | INSERT INTO DailyQuote VALUES ('2016/05/31', 53.00); 156 | INSERT INTO DailyQuote VALUES ('2016/05/27', 52.32); 157 | INSERT INTO DailyQuote VALUES ('2016/05/26', 51.89); 158 | INSERT INTO DailyQuote VALUES ('2016/05/25', 52.12); 159 | INSERT INTO DailyQuote VALUES ('2016/05/24', 51.59); 160 | INSERT INTO DailyQuote VALUES ('2016/05/23', 50.03); 161 | INSERT INTO DailyQuote VALUES ('2016/05/20', 50.62); 162 | INSERT INTO DailyQuote VALUES ('2016/05/19', 50.32); 163 | INSERT INTO DailyQuote VALUES ('2016/05/18', 50.81); 164 | INSERT INTO DailyQuote VALUES ('2016/05/17', 50.51); 165 | INSERT INTO DailyQuote VALUES ('2016/05/16', 51.83); 166 | INSERT INTO DailyQuote VALUES ('2016/05/13', 51.08); 167 | INSERT INTO DailyQuote VALUES ('2016/05/12', 51.51); 168 | INSERT INTO DailyQuote VALUES ('2016/05/11', 51.05); 169 | INSERT INTO DailyQuote VALUES ('2016/05/10', 51.02); 170 | INSERT INTO DailyQuote VALUES ('2016/05/09', 50.07); 171 | INSERT INTO DailyQuote VALUES ('2016/05/06', 50.39); 172 | INSERT INTO DailyQuote VALUES ('2016/05/05', 49.94); 173 | INSERT INTO DailyQuote VALUES ('2016/05/04', 49.87); 174 | INSERT INTO DailyQuote VALUES ('2016/05/03', 49.78); 175 | INSERT INTO DailyQuote VALUES ('2016/05/02', 50.61); 176 | INSERT INTO DailyQuote VALUES ('2016/04/29', 49.87); 177 | INSERT INTO DailyQuote VALUES ('2016/04/28', 49.90); 178 | INSERT INTO DailyQuote VALUES ('2016/04/27', 50.94); 179 | INSERT INTO DailyQuote VALUES ('2016/04/26', 51.44); 180 | INSERT INTO DailyQuote VALUES ('2016/04/25', 52.11); 181 | INSERT INTO DailyQuote VALUES ('2016/04/22', 51.78); 182 | INSERT INTO DailyQuote VALUES ('2016/04/21', 55.78); 183 | INSERT INTO 
DailyQuote VALUES ('2016/04/20', 55.59); 184 | INSERT INTO DailyQuote VALUES ('2016/04/19', 56.39); 185 | INSERT INTO DailyQuote VALUES ('2016/04/18', 56.46); 186 | INSERT INTO DailyQuote VALUES ('2016/04/15', 55.65); 187 | INSERT INTO DailyQuote VALUES ('2016/04/14', 55.36); 188 | INSERT INTO DailyQuote VALUES ('2016/04/13', 55.35); 189 | INSERT INTO DailyQuote VALUES ('2016/04/12', 54.65); 190 | INSERT INTO DailyQuote VALUES ('2016/04/11', 54.31); 191 | INSERT INTO DailyQuote VALUES ('2016/04/08', 54.42); 192 | INSERT INTO DailyQuote VALUES ('2016/04/07', 54.46); 193 | INSERT INTO DailyQuote VALUES ('2016/04/06', 55.12); 194 | INSERT INTO DailyQuote VALUES ('2016/04/05', 54.56); 195 | INSERT INTO DailyQuote VALUES ('2016/04/04', 55.43); 196 | INSERT INTO DailyQuote VALUES ('2016/04/01', 55.57); 197 | INSERT INTO DailyQuote VALUES ('2016/03/31', 55.23); 198 | INSERT INTO DailyQuote VALUES ('2016/03/30', 55.05); 199 | INSERT INTO DailyQuote VALUES ('2016/03/29', 54.71); 200 | INSERT INTO DailyQuote VALUES ('2016/03/28', 53.54); 201 | INSERT INTO DailyQuote VALUES ('2016/03/24', 54.21); 202 | INSERT INTO DailyQuote VALUES ('2016/03/23', 53.97); 203 | INSERT INTO DailyQuote VALUES ('2016/03/22', 54.07); 204 | INSERT INTO DailyQuote VALUES ('2016/03/21', 53.86); 205 | INSERT INTO DailyQuote VALUES ('2016/03/18', 53.49); 206 | INSERT INTO DailyQuote VALUES ('2016/03/17', 54.66); 207 | INSERT INTO DailyQuote VALUES ('2016/03/16', 54.35); 208 | INSERT INTO DailyQuote VALUES ('2016/03/15', 53.59); 209 | INSERT INTO DailyQuote VALUES ('2016/03/14', 53.17); 210 | INSERT INTO DailyQuote VALUES ('2016/03/11', 53.07); 211 | INSERT INTO DailyQuote VALUES ('2016/03/10', 52.05); 212 | INSERT INTO DailyQuote VALUES ('2016/03/09', 52.84); 213 | INSERT INTO DailyQuote VALUES ('2016/03/08', 51.65); 214 | INSERT INTO DailyQuote VALUES ('2016/03/07', 51.03); 215 | INSERT INTO DailyQuote VALUES ('2016/03/04', 52.03); 216 | INSERT INTO DailyQuote VALUES ('2016/03/03', 52.35); 217 | INSERT INTO DailyQuote VALUES ('2016/03/02', 52.95); 218 | INSERT INTO DailyQuote VALUES ('2016/03/01', 52.58); 219 | INSERT INTO DailyQuote VALUES ('2016/02/29', 50.88); 220 | INSERT INTO DailyQuote VALUES ('2016/02/26', 51.30); 221 | INSERT INTO DailyQuote VALUES ('2016/02/25', 52.10); 222 | INSERT INTO DailyQuote VALUES ('2016/02/24', 51.36); 223 | INSERT INTO DailyQuote VALUES ('2016/02/23', 51.18); 224 | INSERT INTO DailyQuote VALUES ('2016/02/22', 52.65); 225 | INSERT INTO DailyQuote VALUES ('2016/02/19', 51.82); 226 | INSERT INTO DailyQuote VALUES ('2016/02/18', 52.19); 227 | INSERT INTO DailyQuote VALUES ('2016/02/17', 52.42); 228 | INSERT INTO DailyQuote VALUES ('2016/02/16', 51.09); 229 | INSERT INTO DailyQuote VALUES ('2016/02/12', 50.50); 230 | INSERT INTO DailyQuote VALUES ('2016/02/11', 49.69); 231 | INSERT INTO DailyQuote VALUES ('2016/02/10', 49.71); 232 | INSERT INTO DailyQuote VALUES ('2016/02/09', 49.28); 233 | INSERT INTO DailyQuote VALUES ('2016/02/08', 49.41); 234 | INSERT INTO DailyQuote VALUES ('2016/02/05', 50.16); 235 | INSERT INTO DailyQuote VALUES ('2016/02/04', 52.00); 236 | INSERT INTO DailyQuote VALUES ('2016/02/03', 52.16); 237 | INSERT INTO DailyQuote VALUES ('2016/02/02', 53.00); 238 | INSERT INTO DailyQuote VALUES ('2016/02/01', 54.71); 239 | INSERT INTO DailyQuote VALUES ('2016/01/29', 55.09); 240 | INSERT INTO DailyQuote VALUES ('2016/01/28', 52.06); 241 | INSERT INTO DailyQuote VALUES ('2016/01/27', 51.22); 242 | INSERT INTO DailyQuote VALUES ('2016/01/26', 52.17); 243 | INSERT INTO DailyQuote 
VALUES ('2016/01/25', 51.79); 244 | INSERT INTO DailyQuote VALUES ('2016/01/22', 52.29); 245 | INSERT INTO DailyQuote VALUES ('2016/01/21', 50.48); 246 | INSERT INTO DailyQuote VALUES ('2016/01/20', 50.79); 247 | INSERT INTO DailyQuote VALUES ('2016/01/19', 50.56); 248 | INSERT INTO DailyQuote VALUES ('2016/01/15', 50.99); 249 | INSERT INTO DailyQuote VALUES ('2016/01/14', 53.11); 250 | INSERT INTO DailyQuote VALUES ('2016/01/13', 51.64); 251 | INSERT INTO DailyQuote VALUES ('2016/01/12', 52.78); 252 | INSERT INTO DailyQuote VALUES ('2016/01/11', 52.30); 253 | INSERT INTO DailyQuote VALUES ('2016/01/08', 52.33); 254 | INSERT INTO DailyQuote VALUES ('2016/01/07', 52.17); 255 | INSERT INTO DailyQuote VALUES ('2016/01/06', 54.05); 256 | INSERT INTO DailyQuote VALUES ('2016/01/05', 55.05); 257 | INSERT INTO DailyQuote VALUES ('2016/01/04', 54.80); -------------------------------------------------------------------------------- /datasets/GoT_Schema.txt: -------------------------------------------------------------------------------- 1 | create table houses ( 2 | id int, 3 | name varchar(100), 4 | domain varchar(100), 5 | primary key (id) 6 | ); 7 | 8 | insert into houses values (1,'stark','winterfell'); 9 | insert into houses values (2,'greyjoy','pyke'); 10 | insert into houses values (3,'lannister','casterly rock'); 11 | insert into houses values (4,'martell','sunspear'); 12 | insert into houses values (5,'tyrell','highgarden'); 13 | insert into houses values (6,'targaryen','king''s landing'); 14 | insert into houses values (7,'baratheon','storm''s end'); 15 | insert into houses values (8,'tully','riverrun'); 16 | insert into houses values (9,'arryn','vale'); 17 | 18 | ---------------------------------------------------- 19 | create table bannermen ( 20 | id int, 21 | name varchar(100), 22 | size int, 23 | leader_house_id int, 24 | primary key (id) 25 | ); 26 | 27 | insert into bannermen values (1, 'karstark', 50, 1); 28 | insert into bannermen values (2, 'clegane', 15, 3); 29 | insert into bannermen values (3, 'hightower', 25, 5); 30 | insert into bannermen values (4, 'glover', 45, 1); 31 | insert into bannermen values (5, 'mormont', 35, 1); 32 | insert into bannermen values (6, 'blackwood', 25, 8); 33 | insert into bannermen values (7, 'florent', 10, 5); 34 | insert into bannermen values (8, 'tarly', 20, 5); 35 | insert into bannermen values (9, 'selmy', 40, 7); 36 | insert into bannermen values (10, 'tarth', 20, 7); 37 | insert into bannermen values (11, 'harlaw', 15, 2); 38 | insert into bannermen values (12, 'umber', 10, 1); 39 | insert into bannermen values (13, 'whent', 25, 8); 40 | 41 | ---------------------------------------------------- 42 | create table people ( 43 | id int, 44 | first_name varchar(100), 45 | house varchar(100), 46 | nickname varchar(100), 47 | alive int, 48 | primary key (id) 49 | ); 50 | 51 | insert into people values (1, 'jaime', 'lannister', 'kingslayer', 1); 52 | insert into people values (2, 'eddard', 'stark', 'ned', 0); 53 | insert into people values (3, 'oberyn', 'martell', 'viper', 0); 54 | insert into people values (4, 'robert', 'baratheon', NULL, 0); 55 | insert into people values (5, 'arya', 'stark', 'no one', 1); 56 | insert into people values (6, 'aegon', 'targaryen', 'jon', 1); 57 | insert into people values (7, 'theon', 'greyjoy', 'reek', 0); 58 | insert into people values (8, 'yara', 'greyjoy', NULL, 1); 59 | insert into people values (9, 'tywin', 'lannister', NULL, 0); 60 | insert into people values (10, 'brynden', 'tully', 'blackfish', 0); 61 | 
insert into people values (11, 'cersei', 'lannister', 'queen', 1); 62 | insert into people values (12, 'brandon', 'stark', 'three-eyed raven', 1); 63 | insert into people values (13, 'sansa', 'stark', NULL, 1); 64 | insert into people values (14, 'tyrion', 'lannister', 'imp', 1); 65 | insert into people values (15, 'brienne', 'tarth', NULL, 1); 66 | insert into people values (16, 'samwell', 'tarly', 'sam', 1); 67 | insert into people values (17, 'great jon', 'umber', NULL, 0); 68 | insert into people values (18, 'barristan', 'selmy', NULL, 0); 69 | insert into people values (19, 'aerys', 'targaryen', 'mad king', 0); 70 | insert into people values (20, 'alannys', 'harlow', NULL, 0); --------------------------------------------------------------------------------
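As a quick sanity check after running the commands in `GoT_Schema.txt`, a query along these lines (assuming the tables were created exactly as defined above) should list each house together with the combined size of its bannermen:

```MySQL
-- Total bannermen strength per leader house, joining the two tables defined above
SELECT h.name AS house,
       SUM(b.size) AS total_bannermen
FROM houses h
JOIN bannermen b ON b.leader_house_id = h.id
GROUP BY h.name
ORDER BY total_bannermen DESC;
```

If the inserts loaded correctly, house `stark` should come out on top with a combined strength of 140 (karstark 50 + glover 45 + mormont 35 + umber 10).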