├── placeholder.sql
├── .gitattributes
├── LICENSE
└── README.md

/placeholder.sql:
--------------------------------------------------------------------------------
SELECT *
--------------------------------------------------------------------------------
/.gitattributes:
--------------------------------------------------------------------------------
*.sql linguist-detectable=true
*.sql linguist-language=sql

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2024 Ben

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# SQL tips and tricks

[![Stand With Ukraine](https://raw.githubusercontent.com/vshymanskyy/StandWithUkraine/main/badges/StandWithUkraine.svg)](https://stand-with-ukraine.pp.ua)

[![Ceasefire Now](https://badge.techforpalestine.org/default)](https://techforpalestine.org/learn-more)

A (somewhat opinionated) list of SQL tips and tricks that I've picked up over the years.

There's so much you can do with SQL but I've focused on what I find most useful in my day-to-day work as a data analyst and what
I wish I had known when I first started writing SQL.

Please note that some of these tips might not be relevant for all RDBMSs.

## Table of contents

### Formatting/readability

- [Use a leading comma to separate fields](#use-a-leading-comma-to-separate-fields)
- [Use a dummy value in the `WHERE` clause](#use-a-dummy-value-in-the-where-clause)
- [Indent your code](#indent-your-code)
- [Consider CTEs when writing complex queries](#consider-ctes-when-writing-complex-queries)
- [Comment your code](#comment-your-code)
- [Simplify joins with `USING`](#simplify-joins-with-using)

### Data wrangling

- [Anti-joins will return rows from one table that have no match in another table](#anti-joins-will-return-rows-from-one-table-that-have-no-match-in-another-table)
- [Use `QUALIFY` to filter window functions](#use-qualify-to-filter-window-functions)
- [You can (but shouldn't always) `GROUP BY` column position](#you-can-but-shouldnt-always-group-by-column-position)
- [Create a grand total with `GROUP BY ROLLUP`](#create-a-grand-total-with-group-by-rollup)
- [Use `EXCEPT` to find the difference between two tables](#use-except-to-find-the-difference-between-two-tables)

### Performance

- [`NOT EXISTS` is faster than `NOT IN` if your column allows `NULL`](#not-exists-is-faster-than-not-in-if-your-column-allows-null)
- [Implicit casting will slow down (or break) your query](#implicit-casting-will-slow-down-or-break-your-query)

### Common mistakes

- [Be aware of how `NOT IN` behaves with `NULL` values](#be-aware-of-how-not-in-behaves-with-null-values)
- [Avoid ambiguity when naming calculated fields](#avoid-ambiguity-when-naming-calculated-fields)
- [Always specify which column belongs to which table](#always-specify-which-column-belongs-to-which-table)

### Miscellaneous

- [Understand the order of execution](#understand-the-order-of-execution)
- [Read the documentation (in full)](#read-the-documentation-in-full)
- [Use descriptive names for your saved queries](#use-descriptive-names-for-your-saved-queries)


## Formatting/readability
### Use a leading comma to separate fields

Use a leading comma to separate fields in the `SELECT` clause rather than a trailing comma.

- Clearly defines that this is a new column vs code that's wrapped to multiple lines.

- Visual cue to easily identify if the comma is missing or not. Varying line lengths make a missing trailing comma harder to spot.

```SQL
SELECT
    employee_id
    , employee_name
    , job
    , salary
FROM employees
;
```

- Also use a leading `AND` in the `WHERE` clause, for the same reasons (the following tip demonstrates this).

-----

### Use a dummy value in the `WHERE` clause
Use a dummy value in the `WHERE` clause so you can easily comment out conditions when testing or tweaking a query.

```SQL
/*
If I want to comment out the job
condition the following query
will break:
*/
SELECT *
FROM employees
WHERE
    --job IN ('Clerk', 'Manager')
    AND dept_no != 5
;

/*
With a dummy value there's no issue.
I can comment out all the conditions
and 1=1 will ensure the query still runs:
*/
SELECT *
FROM employees
WHERE 1=1
    -- AND job IN ('Clerk', 'Manager')
    AND dept_no != 5
;
```

-----

### Indent your code
Indent your code to make it more readable to colleagues and your future self.

Opinions will vary on what this looks like, so be sure to follow your company/team's guidelines or, if those don't exist, go with whatever works for you.

You can also use an online formatter like [poorsql](https://poorsql.com/) or a linter like [sqlfluff](https://github.com/sqlfluff/sqlfluff).

```SQL
-- Bad:
SELECT
vc.video_id
, CASE WHEN meta.GENRE IN ('Drama', 'Comedy') THEN 'Entertainment' ELSE meta.GENRE END as content_type
FROM video_content AS vc
INNER JOIN metadata AS meta ON vc.video_id = meta.video_id
;

-- Good:
SELECT
    vc.video_id
    , CASE
        WHEN meta.GENRE IN ('Drama', 'Comedy') THEN 'Entertainment'
        ELSE meta.GENRE
    END AS content_type
FROM video_content AS vc
INNER JOIN metadata AS meta
    ON vc.video_id = meta.video_id
;
```
-----

### Consider CTEs when writing complex queries
For longer than I'd care to admit I would nest inline views, which would lead to
queries that were hard to understand, particularly if revisited after a few weeks.

If you find yourself nesting inline views more than 2 or 3 levels deep,
consider using common table expressions, which keep your code more organised and readable and support reusability and debugging.

```SQL
-- Using inline views:
SELECT
    vhs.movie
    , vhs.vhs_revenue
    , cs.cinema_revenue
FROM
    (
    SELECT
        movie_id
        , SUM(ticket_sales) AS cinema_revenue
    FROM tickets
    GROUP BY movie_id
    ) AS cs
    INNER JOIN
    (
    SELECT
        movie
        , movie_id
        , SUM(revenue) AS vhs_revenue
    FROM blockbuster
    GROUP BY movie, movie_id
    ) AS vhs
    ON cs.movie_id = vhs.movie_id
;

-- Using CTEs:
WITH cinema_sales AS
    (
    SELECT
        movie_id
        , SUM(ticket_sales) AS cinema_revenue
    FROM tickets
    GROUP BY movie_id
    ),
vhs_sales AS
    (
    SELECT
        movie
        , movie_id
        , SUM(revenue) AS vhs_revenue
    FROM blockbuster
    GROUP BY movie, movie_id
    )
SELECT
    vhs.movie
    , vhs.vhs_revenue
    , cs.cinema_revenue
FROM cinema_sales AS cs
INNER JOIN vhs_sales AS vhs
    ON cs.movie_id = vhs.movie_id
;
```

-----
### Comment your code
While in the moment you know why you did something, if you revisit
the code weeks, months or years later you might not remember.

In general you should strive to write comments that explain why you did something, not how.

Your colleagues and future self will thank you!

```SQL
SELECT
    video_content.*
FROM video_content
LEFT JOIN archive
    ON video_content.video_id = archive.video_id
WHERE 1=1
    -- Need to filter out as new CMS cannot process archive video formats:
    AND archive.video_id IS NULL
;
```

-----
### Simplify joins with `USING`

If you're joining using a column with the same name in two tables you can use `USING` to
simplify your join:

```SQL
-- USING:
SELECT *
FROM album
INNER JOIN artist
    USING (artistid)
;

-- Traditional ON clause:
SELECT *
FROM album
INNER JOIN artist
    ON album.artistid = artist.artistid
;
```

The other benefit of `USING` is that the column in common between the two tables is deduplicated, with only one column returned in the result set.

This means that there is no ambiguity, unlike the following query, which would throw an `ambiguous column name` error
because, with the `ON` clause, the database can't tell which table's column you are referring to:

```SQL
SELECT artistid -- Which table's column?
FROM album
INNER JOIN artist
    ON album.artistid = artist.artistid
;
```

## Data wrangling

### Anti-joins will return rows from one table that have no match in another table

Use anti-joins when you want to return rows from one table that don't have a match in another table.

For example, you only want video IDs of content that hasn't been archived.

There are multiple ways to do an anti-join:

```SQL
-- Using a LEFT JOIN:
SELECT
    vc.video_id
FROM video_content AS vc
LEFT JOIN archive
    ON vc.video_id = archive.video_id
WHERE 1=1
    AND archive.video_id IS NULL -- Any rows with no match will have a NULL value.
;

-- Using NOT IN/subquery:
SELECT
    video_id
FROM video_content
WHERE 1=1
    AND video_id NOT IN (SELECT video_id FROM archive) -- Be mindful of NULL values.
;

-- Using NOT EXISTS/correlated subquery:
SELECT
    video_id
FROM video_content AS vc
WHERE 1=1
    AND NOT EXISTS (
        SELECT 1
        FROM archive AS a
        WHERE a.video_id = vc.video_id
        )
;
```

Note that I advise against using `NOT IN` - see [this tip](#be-aware-of-how-not-in-behaves-with-null-values).

-----
### Use `QUALIFY` to filter window functions

`QUALIFY` lets you filter the results of a query based on a window function, so you don't need
an inline view to filter your result set, which reduces the number of lines of code.

For example, if I want to return the top 10 markets per product I can use
`QUALIFY` rather than an inline view:

```SQL
-- Using QUALIFY:
SELECT
    product
    , market
    , SUM(revenue) AS market_revenue
FROM sales
GROUP BY product, market
QUALIFY DENSE_RANK() OVER (PARTITION BY product ORDER BY SUM(revenue) DESC) <= 10
ORDER BY product, market_revenue
;

-- Without QUALIFY:
SELECT
    product
    , market
    , market_revenue
FROM
    (
    SELECT
        product
        , market
        , SUM(revenue) AS market_revenue
        , DENSE_RANK() OVER (PARTITION BY product ORDER BY SUM(revenue) DESC) AS market_rank
    FROM sales
    GROUP BY product, market
    )
WHERE market_rank <= 10
ORDER BY product, market_revenue
;
```

Unfortunately it looks like `QUALIFY` is mostly limited to the big data warehouses (for example Snowflake, Amazon Redshift and Google BigQuery) but I had to include this because it's so useful.
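
Another pattern where `QUALIFY` can help is keeping only the latest row per key. The following is a minimal sketch rather than a definitive recipe: it assumes an RDBMS that supports `QUALIFY`, and the `user_events` table, its columns and its data types are hypothetical, made up purely for illustration (adjust them for your database):

```SQL
-- Hypothetical table, for illustration only:
CREATE TABLE user_events (
    user_id INT NOT NULL,
    event_name VARCHAR(50) NOT NULL,
    event_date DATE NOT NULL
)
;

INSERT INTO user_events (user_id, event_name, event_date)
VALUES
    (1, 'signup', '2024-01-01'),
    (1, 'purchase', '2024-01-05'),
    (2, 'signup', '2024-02-10');

-- Keep only the most recent event for each user:
SELECT
    user_id
    , event_name
    , event_date
FROM user_events
WHERE 1=1
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_date DESC) = 1
;
```

With the sample rows above this should return the 'purchase' row for user 1 and the 'signup' row for user 2. `ROW_NUMBER()` is used here instead of `DENSE_RANK()` because it returns exactly one row per partition even when there are ties; if ties are possible, consider adding a tie-breaker column to the `ORDER BY`.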

-----
### You can (but shouldn't always) `GROUP BY` column position

Instead of using the column name, you can `GROUP BY` or `ORDER BY` using the
column position.

- This can be useful for ad-hoc/one-off queries, but for production code
you should always refer to a column by its name.

```SQL
SELECT
    dept_no
    , SUM(salary) AS dept_salary
FROM employees
GROUP BY 1 -- dept_no is the first column in the SELECT clause.
ORDER BY 2 DESC
;
```

-----
### Create a grand total with `GROUP BY ROLLUP`
Creating a grand total (and/or sub-total) row is possible thanks to `GROUP BY ROLLUP`.

For example, if you've aggregated a company's employee salaries by department you
can use `GROUP BY ROLLUP` to create a grand total that applies your aggregate functions as if
the specified grouping hadn't been applied (thus creating a grand total row).

The [Transact-SQL documentation](https://learn.microsoft.com/en-us/sql/t-sql/queries/select-group-by-transact-sql?view=sql-server-ver17) explains `GROUP BY ROLLUP` well:

_"Creates a group for each combination of column expressions. In addition, it 'rolls up' the results into subtotals and grand totals. To do this, it moves from right to left decreasing the number of column expressions over which it creates groups and the aggregation(s)."_

You may want to apply `COALESCE`, as below, to ensure the total row is labelled as such.

```SQL
SELECT
    -- Cast so both COALESCE arguments share a data type (this assumes dept_no is numeric):
    COALESCE(CAST(dept_no AS VARCHAR), 'Total') AS department_number
    , SUM(salary) AS dept_salary
FROM employees
GROUP BY ROLLUP(dept_no)
ORDER BY dept_salary -- Be sure to order by this column to ensure the Total appears last/at the bottom of the result set.
;
```

-----
### Use `EXCEPT` to find the difference between two tables

`EXCEPT` returns rows from the first query's result set that don't appear in the second query's result set.

```SQL
/*
Miles Davis will be returned from
this query
*/
SELECT artist_name
FROM artist
WHERE artist_name = 'Miles Davis'
EXCEPT
SELECT artist_name
FROM artist
WHERE artist_name = 'Nirvana'
;

/*
Nothing will be returned from this
query as 'Miles Davis' appears in
both queries' result sets.
*/
SELECT artist_name
FROM artist
WHERE artist_name = 'Miles Davis'
EXCEPT
SELECT artist_name
FROM artist
WHERE artist_name = 'Miles Davis'
;
```

You can also utilise `EXCEPT` with `UNION ALL` to verify whether two tables have the same data.

If no rows are returned the tables are identical (ignoring duplicate rows, which `EXCEPT` removes) - otherwise, what's returned are the rows causing the difference:

```SQL
/*
The first query will return rows from
employees that aren't present in
department.

The second query will return rows from
department that aren't present in employees.

The UNION ALL will ensure that the
final result set returned combines
all of these rows so you know
which rows are causing the difference.
*/
(
SELECT
    id
    , employee_name
FROM employees
EXCEPT
SELECT
    id
    , employee_name
FROM department
)
UNION ALL
(
SELECT
    id
    , employee_name
FROM department
EXCEPT
SELECT
    id
    , employee_name
FROM employees
)
;
```

## Performance

### `NOT EXISTS` is faster than `NOT IN` if your column allows `NULL`

`NOT IN` is usually slower than `NOT EXISTS` if the column you're comparing against allows `NULL`.

I've experienced this when using Snowflake and the PostgreSQL Wiki explicitly [calls this out](https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_NOT_IN):

*"...NOT IN (SELECT ...) does not optimize very well."*

Aside from being slow, using `NOT IN` will not work as intended if there is a `NULL` in the values being compared against - see [this tip](#be-aware-of-how-not-in-behaves-with-null-values).

Why include this tip if `NOT IN` doesn't work with `NULL` values anyway?

Well, just because a column allows `NULL` values does not mean there **are** any `NULL` values present, and if you're working with a table that you cannot alter you'll want to use `NOT EXISTS` to speed up your query.

-----

### Implicit casting will slow down (or break) your query

If you specify a value with a different data type than a column's, your database may automatically (implicitly) convert the value.

For example, let's say the `video_id` column has a string data type and you specify an integer in the `WHERE` clause:

```SQL
SELECT video_name
FROM video_content
-- Behind the scenes the database will implicitly attempt to convert the video_id column to an integer:
WHERE video_id = 200050
;
```

There are a couple of problems with relying on implicit casting:

1) An error may be thrown if the implicit conversion isn't possible - for example, if one of the video IDs has a string value of _'abc2000'_

2) \*Your query may be slower, due to the additional work of converting each value to the specified data type.

Instead, use the same data type as the column you're operating on (`WHERE video_id = '200050'`) or, to avoid errors, use a function like Snowflake's [`TRY_TO_NUMBER`](https://docs.snowflake.com/en/sql-reference/functions/try_to_decimal) that
will attempt the conversion and return `NULL` rather than raise an error if it fails:

```SQL
SELECT video_name
FROM video_content
-- This won't result in an error:
WHERE TRY_TO_NUMBER(video_id) = 200050
;
```

\* Note that this depends on the size of the dataset being operated on.

## Common mistakes

### Be aware of how `NOT IN` behaves with `NULL` values

`NOT IN` doesn't work as you might expect if `NULL` is present in the values being checked against: the query returns no rows. As `NULL` represents Unknown, the SQL engine can't verify that the value being checked is not present in the list.
- Instead use `NOT EXISTS`.

```SQL
INSERT INTO departments (id)
VALUES (1), (2), (NULL);

-- Doesn't work due to NULL:
SELECT *
FROM employees
WHERE department_id NOT IN (SELECT DISTINCT id FROM departments)
;

-- Solution:
SELECT *
FROM employees e
WHERE NOT EXISTS (
    SELECT 1
    FROM departments d
    WHERE d.id = e.department_id
    )
;
```

-----
### Avoid ambiguity when naming calculated fields

When creating a calculated field, naming it the same as an existing column can lead to unexpected behaviour.

Note [Snowflake's documentation](https://docs.snowflake.com/en/sql-reference/constructs/group-by) on the topic:

*"If a GROUP BY clause contains a name that matches both a column name and an alias, then the GROUP BY clause uses the column name."*

For example you might expect the following to return 2 rows but what's actually returned is 3 rows:

```SQL
CREATE TABLE products (
    product VARCHAR(50) NOT NULL,
    revenue INT NOT NULL
)
;

INSERT INTO products (product, revenue)
VALUES
    ('Shark', 100),
    ('Robot', 150),
    ('Racecar', 90);

SELECT
    LEFT(product, 1) AS product -- Returns the first letter of the product value.
    , MAX(revenue) AS max_revenue
FROM products
GROUP BY product
;
```

|PRODUCT|MAX_REVENUE|
|-------|-----------|
|S|100|
|R|150|
|R|90|

What's happened is that the `LEFT` function has been applied after the product column has been
grouped and aggregation applied.

The solution is to use a unique alias or be more explicit in the `GROUP BY` clause:

```SQL
-- Solution option 1:
SELECT
    LEFT(product, 1) AS product_letter
    , MAX(revenue) AS max_revenue
FROM products
GROUP BY product_letter
;

-- Solution option 2:
SELECT
    LEFT(product, 1) AS product
    , MAX(revenue) AS max_revenue
FROM products
GROUP BY LEFT(product, 1)
;
```

Result:

|PRODUCT_LETTER|MAX_REVENUE|
|--------------|-----------|
|S|100|
|R|150|


Assigning an alias to a calculated field can also be problematic when it comes to window functions.

In this example the `CASE` expression is being applied AFTER the window function has executed:

```SQL
/*
The window function will rank the 'Robot' product as 1 when it should be 3.
*/
SELECT
    product
    , CASE product WHEN 'Robot' THEN 0 ELSE revenue END AS revenue
    , RANK() OVER (ORDER BY revenue DESC)
FROM products
;
```

Our earlier solutions apply:

```SQL
/*
Solution option 1 (note this might not work in all RDBMSs, in which case use the other solution):
*/
SELECT
    product
    , CASE product WHEN 'Robot' THEN 0 ELSE revenue END AS updated_revenue
    , RANK() OVER (ORDER BY updated_revenue DESC)
FROM products
;

-- Solution option 2:
SELECT
    product
    , CASE product WHEN 'Robot' THEN 0 ELSE revenue END AS revenue
    , RANK() OVER (ORDER BY CASE product WHEN 'Robot' THEN 0 ELSE revenue END DESC)
FROM products
;
```

My advice - use a unique alias when possible to avoid confusion.
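
If your RDBMS doesn't let a window function reference an alias defined in the same `SELECT`, and you'd rather not repeat the `CASE` expression, one more option is to calculate the field in a CTE first. This is only a sketch: it reuses the `products` table created earlier in this tip, and `adjusted_products`, `updated_revenue` and `revenue_rank` are illustrative names, not anything prescribed:

```SQL
-- Step 1: calculate the field under a unique alias inside a CTE.
WITH adjusted_products AS
    (
    SELECT
        product
        , CASE product WHEN 'Robot' THEN 0 ELSE revenue END AS updated_revenue
    FROM products
    )
-- Step 2: the outer query sees updated_revenue as an ordinary column,
-- so the window function ranks on the calculated value.
SELECT
    product
    , updated_revenue
    , RANK() OVER (ORDER BY updated_revenue DESC) AS revenue_rank
FROM adjusted_products
;
```

With the three sample rows inserted earlier, 'Shark' (100) should rank 1, 'Racecar' (90) should rank 2 and 'Robot' (0) should rank 3.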

-----
### Always specify which column belongs to which table

When you have complex queries with multiple joins, it pays to be able to
trace back an issue with a value to its source.

Additionally, your RDBMS might raise an error if two tables share the same
column name and you don't specify which column you are using.

```SQL
SELECT
    vc.video_id
    , vc.series_name
    , metadata.season
    , metadata.episode_number
FROM video_content AS vc
INNER JOIN video_metadata AS metadata
    ON vc.video_id = metadata.video_id
;
```

## Miscellaneous

### Understand the order of execution
If I had to give one piece of advice to someone learning SQL, it'd be to understand the order of
execution (of clauses). It will completely change how you write queries. This [blog post](https://blog.jooq.org/a-beginners-guide-to-the-true-order-of-sql-operations/) is a fantastic resource for learning.

-----
### Read the documentation (in full)
Using Snowflake I once needed to return the latest date from a list of columns
and so I decided to use `GREATEST()`.

What I didn't realise was that if one of the
arguments is `NULL` then the function returns `NULL`.

If I'd read the documentation in full I'd have known! In many cases it can take just a minute or less to scan
the documentation and it will save you the headache of having to work
out why something isn't working the way you expected:

```SQL
/*
If I'd read the documentation
further I'd also have realised
that my solution to the NULL
problem with GREATEST()...
*/

SELECT COALESCE(GREATEST(signup_date, consumption_date), signup_date, consumption_date);

/*
... could have been solved with the
following function:
*/
SELECT GREATEST_IGNORE_NULLS(signup_date, consumption_date);
```

-----
### Use descriptive names for your saved queries

There's almost nothing worse than not being able to find a query you need to re-run/refer back to.

Use a descriptive name when saving your queries so you can easily find what you're looking for.

I will usually write the subject of the query, the month the query was run and the name of the requester (if there is one).
For example: `Lapsed users analysis - 2023-09-01 - Olivia Roberts`

--------------------------------------------------------------------------------