└── README.md

/README.md:
--------------------------------------------------------------------------------
---
# Complete SQL Query Revision

## Table of Contents
1. [Foundation: Understanding Databases](#foundation-understanding-databases)
2. [Basic SELECT Queries](#basic-select-queries)
3. [Filtering and Conditions](#filtering-and-conditions)
4. [Sorting and Limiting Results](#sorting-and-limiting-results)
5. [Working with Multiple Tables](#working-with-multiple-tables)
6. [Grouping and Aggregation](#grouping-and-aggregation)
7. [Advanced Filtering with Subqueries](#advanced-filtering-with-subqueries)
8. [Window Functions](#window-functions)
9. [Data Modification](#data-modification)
10. [Advanced Techniques](#advanced-techniques)
11. [Performance and Optimization](#performance-and-optimization)
12. [Practice Exercises](#practice-exercises)

---

## Foundation: Understanding Databases

Before we dive into writing queries, let's establish what we're working with. Think of a database as a digital filing cabinet, but far more organized and powerful.

### What is a Relational Database?

A relational database stores information in tables, much like spreadsheets, but with strict rules about how data relates to each other. Each table represents a specific type of entity (like customers, orders, or products), and tables can reference each other through relationships.

**Key Concepts:**

- **Table**: A collection of related data organized in rows and columns
- **Row (Record)**: A single entry in a table representing one instance of the entity
- **Column (Field)**: A specific attribute or property of the entity
- **Primary Key**: A unique identifier for each row in a table
- **Foreign Key**: A reference to a primary key in another table, creating relationships

### Sample Database Schema

Throughout this course, we'll use a fictional e-commerce database with these tables:

```sql
-- Customers table
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    registration_date DATE,
    city VARCHAR(50),
    country VARCHAR(50)
);

-- Products table
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10,2),
    stock_quantity INT,
    supplier_id INT
);

-- Orders table
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT, -- Foreign key to customers
    order_date DATE,
    total_amount DECIMAL(10,2),
    status VARCHAR(20)
);

-- Order_items table (junction table for many-to-many relationship)
CREATE TABLE order_items (
    order_item_id INT PRIMARY KEY,
    order_id INT, -- Foreign key to orders
    product_id INT, -- Foreign key to products
    quantity INT,
    unit_price DECIMAL(10,2)
);
```

Think of this structure like a real business: customers place orders, orders contain multiple products, and each product has its own details. The relationships between these tables mirror real-world connections.

---

## Basic SELECT Queries

The SELECT statement is your primary tool for retrieving information from a database. Think of it as asking the database a question - you specify what information you want and from which table.

### The Anatomy of a SELECT Statement

```sql
SELECT column1, column2, column3 -- What data do you want?
FROM table_name; -- Which table contains this data?
```

### Your First Query

Let's start with the most basic query - retrieving all data from a table:

```sql
-- Get all information about all customers
SELECT * FROM customers;
```

The asterisk (\*) is a wildcard that means "give me all columns." While convenient for exploration, it's generally better to specify exactly which columns you need.

### Selecting Specific Columns

```sql
-- Get just the names and email addresses of customers
SELECT first_name, last_name, email
FROM customers;
```

This approach has several advantages: it's faster (less data transferred), clearer about your intentions, and more maintainable if the table structure changes.

### Column Aliases - Making Output Readable

You can rename columns in your output using aliases, which is especially useful for calculated fields or when column names aren't user-friendly:

```sql
-- Create more readable column headers
SELECT
    first_name AS "First Name",
    last_name AS "Last Name",
    email AS "Email Address"
FROM customers;
```

### Simple Calculations

SQL can perform calculations on numeric data:

```sql
-- Calculate total value of each product in stock
SELECT
    product_name,
    price,
    stock_quantity,
    price * stock_quantity AS total_inventory_value
FROM products;
```

Notice how we created a new calculated column. The database multiplies the price by stock quantity for each row and displays the result under our chosen alias.
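
Calculations can also be wrapped in functions. As a minimal sketch, the query below applies a sales tax to each price and rounds the result to two decimal places. The 8% rate and the `price_with_tax` alias are assumptions made up for this illustration; they are not part of the sample schema.

```sql
-- Hypothetical 8% sales tax; the rate and alias are illustrative only
SELECT
    product_name,
    price,
    ROUND(price * 1.08, 2) AS price_with_tax
FROM products;
```

The stored `price` values are not changed; the calculation exists only in the query output.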

---

## Filtering and Conditions

Real-world scenarios rarely require all data from a table. The WHERE clause allows you to specify conditions that rows must meet to be included in your results.

### Basic WHERE Conditions

```sql
-- Find customers from a specific city
SELECT first_name, last_name, city
FROM customers
WHERE city = 'New York';
```

### Comparison Operators

SQL provides various operators for different types of comparisons:

```sql
-- Products with price greater than $50
SELECT product_name, price
FROM products
WHERE price > 50.00;

-- Orders from 2024 or later
SELECT order_id, order_date, total_amount
FROM orders
WHERE order_date >= '2024-01-01';

-- Products that are NOT in the Electronics category
SELECT product_name, category
FROM products
WHERE category != 'Electronics'; -- <> is the standard operator; != is widely supported
```

### Working with Text Data

Whether text comparisons are case-sensitive depends on the database and its collation settings: PostgreSQL compares case-sensitively by default, while MySQL's default collations are case-insensitive. You also have several options for pattern matching:

```sql
-- Exact match (case sensitivity depends on the collation)
SELECT * FROM customers
WHERE last_name = 'Smith';

-- Pattern matching with LIKE
SELECT * FROM customers
WHERE last_name LIKE 'Sm%'; -- Names starting with 'Sm'

-- Case-insensitive search (using UPPER or LOWER)
SELECT * FROM customers
WHERE UPPER(last_name) = 'SMITH';
```

**LIKE Pattern Wildcards:**

- `%` matches any sequence of characters (including zero characters)
- `_` matches exactly one character

```sql
-- Names with exactly 5 characters
SELECT * FROM customers WHERE first_name LIKE '_____';

-- Email addresses from Gmail
SELECT * FROM customers WHERE email LIKE '%@gmail.com';

-- Products with 'phone' anywhere in the name
SELECT * FROM products WHERE LOWER(product_name) LIKE '%phone%';
```

### Combining Conditions

Real queries often need multiple conditions. SQL provides logical operators to combine them:

```sql
-- AND: Both conditions must be true
SELECT product_name, price, category
FROM products
WHERE price > 100 AND category = 'Electronics';

-- OR: Either condition can be true
SELECT customer_id, first_name, last_name
FROM customers
WHERE city = 'New York' OR city = 'Los Angeles';

-- Complex combinations using parentheses
SELECT product_name, price, category
FROM products
WHERE (price > 100 AND category = 'Electronics')
   OR (price > 200 AND category = 'Clothing');
```

Think of parentheses as you would in a mathematical expression - they control the order of evaluation and make your intentions clear. Without them, AND binds more tightly than OR.

### Working with NULL Values

NULL represents missing or unknown data, and it requires special handling:

```sql
-- Find customers without a city listed
SELECT first_name, last_name
FROM customers
WHERE city IS NULL;

-- Find customers WITH a city listed
SELECT first_name, last_name, city
FROM customers
WHERE city IS NOT NULL;
```

**Important**: Do not use `= NULL` or `!= NULL`. A comparison with NULL is never true, so such a condition matches no rows. NULL checks always require `IS NULL` or `IS NOT NULL`.
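
To make the NULL rule concrete, here is a small sketch against the sample `customers` table. It assumes standard SQL NULL semantics (the default in the major databases), and the `city_display` alias and the `'Unknown'` placeholder are names chosen only for this example.

```sql
-- Matches no rows: city = NULL evaluates to UNKNOWN, never TRUE
SELECT first_name, last_name
FROM customers
WHERE city = NULL;

-- COALESCE returns its first non-NULL argument,
-- so customers with a missing city are shown as 'Unknown'
SELECT
    first_name,
    last_name,
    COALESCE(city, 'Unknown') AS city_display
FROM customers;
```

`COALESCE` is handy when NULLs should appear as a readable placeholder in reports rather than be filtered out.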

### The IN Operator

When you need to check if a value matches any item in a list, IN is more concise than multiple OR conditions:

```sql
-- Traditional approach with OR
SELECT * FROM products
WHERE category = 'Electronics' OR category = 'Clothing' OR category = 'Books';

-- More elegant approach with IN
SELECT * FROM products
WHERE category IN ('Electronics', 'Clothing', 'Books');

-- NOT IN for exclusion
SELECT * FROM products
WHERE category NOT IN ('Electronics', 'Clothing');
```

### Range Queries with BETWEEN

For checking if a value falls within a range:

```sql
-- Products priced between $20 and $100
SELECT product_name, price
FROM products
WHERE price BETWEEN 20.00 AND 100.00;

-- Orders from the first quarter of 2024
SELECT order_id, order_date, total_amount
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';
```

BETWEEN is inclusive on both ends, meaning the boundary values are included in the results.

---

## Sorting and Limiting Results

Once you've filtered your data, you'll often want to control how it's presented and how much of it you see.

### Sorting with ORDER BY

```sql
-- Sort customers alphabetically by last name
SELECT first_name, last_name, email
FROM customers
ORDER BY last_name;

-- Sort products by price, highest first
SELECT product_name, price
FROM products
ORDER BY price DESC; -- DESC for descending, ASC for ascending (default)
```

### Multi-Level Sorting

You can sort by multiple columns, with each subsequent column serving as a "tie-breaker":

```sql
-- Sort by category first, then by price within each category
SELECT product_name, category, price
FROM products
ORDER BY category ASC, price DESC;
```

This query groups all products by category alphabetically, and within each category, shows the most expensive products first.

### Limiting Results

When dealing with large datasets, you often want just a subset of results:

```sql
-- Get the 5 most expensive products
SELECT product_name, price
FROM products
ORDER BY price DESC
LIMIT 5;

-- Get products 11-20 when sorted by price (pagination)
SELECT product_name, price
FROM products
ORDER BY price DESC
LIMIT 10 OFFSET 10; -- Skip first 10, then take next 10
```

**Note**: Different database systems use different syntax for limiting results:

- MySQL/PostgreSQL: `LIMIT n` or `LIMIT n OFFSET m`
- SQL Server: `TOP n` or `OFFSET m ROWS FETCH NEXT n ROWS ONLY`
- Oracle: `ROWNUM <= n` or `FETCH FIRST n ROWS ONLY`

---

## Working with Multiple Tables

Real-world data is spread across multiple related tables. Joins allow you to combine data from different tables based on their relationships.
356 | 357 | ### Understanding Relationships 358 | 359 | Before diving into joins, let's understand how tables relate: 360 | 361 | - A customer can have many orders (one-to-many) 362 | - An order can contain many products, and a product can be in many orders (many-to-many, handled through the order_items table) 363 | 364 | ### INNER JOIN - Finding Matching Records 365 | 366 | An INNER JOIN returns only rows where matching records exist in both tables: 367 | 368 | ```sql 369 | -- Get customer information along with their orders 370 | SELECT 371 | c.first_name, 372 | c.last_name, 373 | o.order_id, 374 | o.order_date, 375 | o.total_amount 376 | FROM customers c 377 | INNER JOIN orders o ON c.customer_id = o.customer_id 378 | ORDER BY c.last_name, o.order_date; 379 | ``` 380 | 381 | **Key Points:** 382 | 383 | - Table aliases (`c` for customers, `o` for orders) make queries more readable 384 | - The ON clause specifies how tables are related 385 | - Only customers who have placed orders will appear in results 386 | 387 | ### LEFT JOIN - Including All Records from the First Table 388 | 389 | Sometimes you want all records from one table, even if they don't have matches in the other: 390 | 391 | ```sql 392 | -- Get all customers, including those who haven't placed orders 393 | SELECT 394 | c.first_name, 395 | c.last_name, 396 | c.email, 397 | o.order_id, 398 | o.order_date 399 | FROM customers c 400 | LEFT JOIN orders o ON c.customer_id = o.customer_id 401 | ORDER BY c.last_name; 402 | ``` 403 | 404 | Customers without orders will show NULL values for the order columns. This is useful for finding inactive customers or analyzing customer engagement. 

### RIGHT JOIN and FULL OUTER JOIN

```sql
-- RIGHT JOIN: All orders, even if customer data is missing (rare in practice)
SELECT
    c.first_name,
    c.last_name,
    o.order_id,
    o.total_amount
FROM customers c
RIGHT JOIN orders o ON c.customer_id = o.customer_id;

-- FULL OUTER JOIN: All customers and all orders (not supported in all databases)
SELECT
    c.first_name,
    c.last_name,
    o.order_id,
    o.total_amount
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;
```

### Joining Multiple Tables

Real queries often involve three or more tables:

```sql
-- Get detailed order information: customer, order, and product details
SELECT
    c.first_name,
    c.last_name,
    o.order_date,
    p.product_name,
    oi.quantity,
    oi.unit_price,
    (oi.quantity * oi.unit_price) AS line_total
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= '2024-01-01'
ORDER BY c.last_name, o.order_date;
```

This query tells a complete story: who bought what, when, and for how much.

### Self-Joins

Sometimes a table needs to be joined with itself. This is common with hierarchical data:

```sql
-- If we had an employees table with manager relationships
SELECT
    e.first_name AS employee_name,
    m.first_name AS manager_name
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;
```

---

## Grouping and Aggregation

Aggregation allows you to summarize data - counting records, calculating totals, finding averages, and more.

### Basic Aggregate Functions

```sql
-- Count total number of customers
SELECT COUNT(*) AS total_customers
FROM customers;

-- Count customers with email addresses (excludes NULLs)
SELECT COUNT(email) AS customers_with_email
FROM customers;

-- Basic statistics on product prices
SELECT
    COUNT(*) AS total_products,
    MIN(price) AS cheapest_price,
    MAX(price) AS most_expensive_price,
    AVG(price) AS average_price,
    SUM(stock_quantity) AS total_inventory_units
FROM products;
```

### GROUP BY - Summarizing by Categories

GROUP BY allows you to create summaries for each group of related records:

```sql
-- Count customers by city
SELECT
    city,
    COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
ORDER BY customer_count DESC;

-- Total sales by product category
SELECT
    p.category,
    COUNT(oi.order_item_id) AS items_sold,
    SUM(oi.quantity * oi.unit_price) AS total_revenue
FROM products p
INNER JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.category
ORDER BY total_revenue DESC;
```

**Think of GROUP BY this way**: Imagine sorting all your data into separate piles based on the grouping column(s), then calculating statistics for each pile.
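
With more than one grouping column, each distinct combination of values forms its own pile. A short sketch using the sample `customers` table:

```sql
-- One row per (country, city) combination
SELECT
    country,
    city,
    COUNT(*) AS customer_count
FROM customers
GROUP BY country, city
ORDER BY country, customer_count DESC;
```

Two cities with the same name in different countries end up in separate piles here, which grouping by `city` alone would merge.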

### HAVING - Filtering Groups

WHERE filters individual rows before grouping, but HAVING filters groups after aggregation:

```sql
-- Find cities with more than 5 customers
SELECT
    city,
    COUNT(*) AS customer_count
FROM customers
WHERE city IS NOT NULL
GROUP BY city
HAVING COUNT(*) > 5
ORDER BY customer_count DESC;

-- Product categories with average price over $50
SELECT
    category,
    COUNT(*) AS product_count,
    AVG(price) AS average_price
FROM products
GROUP BY category
HAVING AVG(price) > 50.00;
```

### Complex Grouping Examples

```sql
-- Monthly sales summary for 2024
-- (YEAR() and MONTH() are MySQL/SQL Server functions; other databases use EXTRACT)
SELECT
    YEAR(o.order_date) AS year,
    MONTH(o.order_date) AS month,
    COUNT(DISTINCT o.order_id) AS total_orders,
    COUNT(DISTINCT o.customer_id) AS unique_customers,
    SUM(o.total_amount) AS monthly_revenue
FROM orders o
WHERE o.order_date >= '2024-01-01'
  AND o.order_date < '2025-01-01' -- upper bound keeps later years out
GROUP BY YEAR(o.order_date), MONTH(o.order_date)
ORDER BY year, month;

-- Customer purchase behavior analysis
SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total_amount) AS total_spent,
    AVG(o.total_amount) AS average_order_value,
    MIN(o.order_date) AS first_order_date,
    MAX(o.order_date) AS last_order_date
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY total_spent DESC;
```

---

## Advanced Filtering with Subqueries

Subqueries are queries within queries, allowing you to use the results of one query as input to another. They're powerful tools for complex filtering and data analysis.

### Simple Subqueries

```sql
-- Find products that cost more than the average price
SELECT product_name, price
FROM products
WHERE price > (
    SELECT AVG(price)
    FROM products
);

-- Find customers who have placed orders
SELECT first_name, last_name
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id
    FROM orders
);
```

Conceptually, the subquery in parentheses is evaluated first, and its result is used by the outer query.

### Correlated Subqueries

In correlated subqueries, the inner query references columns from the outer query:

```sql
-- Find customers who placed an order in the last 30 days
-- (MySQL interval syntax; PostgreSQL uses INTERVAL '30 days')
SELECT c.first_name, c.last_name, c.email
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
      AND o.order_date >= CURRENT_DATE - INTERVAL 30 DAY
);

-- Find products that have never been ordered
SELECT product_name, price
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM order_items oi
    WHERE oi.product_id = p.product_id
);
```

**EXISTS vs IN**: Many modern optimizers execute EXISTS and IN the same way, so the performance difference is often small. The important difference is NULL handling: if the subquery of a `NOT IN` returns even one NULL, the condition matches no rows at all, while `NOT EXISTS` is unaffected. Prefer `NOT EXISTS` for exclusion checks.
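
The NULL behavior is easiest to see on a tiny data set. The sketch below uses two scratch tables, `demo_products` and `demo_order_items`, which are hypothetical names invented for this illustration and are not part of the sample schema.

```sql
-- Hypothetical scratch tables for the demonstration
CREATE TABLE demo_products (product_id INT PRIMARY KEY);
CREATE TABLE demo_order_items (product_id INT); -- nullable on purpose

INSERT INTO demo_products VALUES (1), (2), (3);
INSERT INTO demo_order_items VALUES (1), (NULL);

-- NOT IN: returns no rows, because comparing 2 or 3 with NULL
-- yields UNKNOWN, and the row is kept only when the condition is TRUE
SELECT product_id
FROM demo_products
WHERE product_id NOT IN (SELECT product_id FROM demo_order_items);

-- NOT EXISTS: returns products 2 and 3, the ones with no matching item
SELECT p.product_id
FROM demo_products p
WHERE NOT EXISTS (
    SELECT 1
    FROM demo_order_items oi
    WHERE oi.product_id = p.product_id
);
```

If `NOT IN` has to be used, filtering NULLs inside the subquery (`WHERE product_id IS NOT NULL`) avoids the problem.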

### Subqueries in SELECT Clauses

You can use subqueries to add calculated columns:

```sql
-- Show each customer with their total number of orders
SELECT
    c.first_name,
    c.last_name,
    (SELECT COUNT(*)
     FROM orders o
     WHERE o.customer_id = c.customer_id) AS total_orders,
    (SELECT MAX(order_date)
     FROM orders o
     WHERE o.customer_id = c.customer_id) AS last_order_date
FROM customers c
ORDER BY total_orders DESC;
```

### Common Table Expressions (CTEs)

CTEs provide a cleaner way to write complex queries with subqueries:

```sql
-- Find customers who spent more than the average customer
WITH customer_totals AS (
    SELECT
        c.customer_id,
        c.first_name,
        c.last_name,
        SUM(o.total_amount) AS total_spent
    FROM customers c
    INNER JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
),
overall_average AS (
    SELECT AVG(total_spent) AS avg_spent
    FROM customer_totals
)
SELECT
    ct.first_name,
    ct.last_name,
    ct.total_spent,
    oa.avg_spent
FROM customer_totals ct
CROSS JOIN overall_average oa
WHERE ct.total_spent > oa.avg_spent
ORDER BY ct.total_spent DESC;
```

CTEs make complex logic more readable by breaking it into named, reusable pieces.

---

## Window Functions

Window functions perform calculations across a set of rows related to the current row, without collapsing the results into groups like aggregate functions do.
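
As a first illustration of "without collapsing", the sketch below computes the average price per category twice: once with GROUP BY, which returns one row per category, and once as a window function, which keeps every product row and attaches the category average to it. The alias `category_avg_price` is a name chosen for this example.

```sql
-- Aggregate: one row per category, product details are lost
SELECT category, AVG(price) AS category_avg_price
FROM products
GROUP BY category;

-- Window function: every product row is kept,
-- with its category's average alongside
SELECT
    product_name,
    category,
    price,
    AVG(price) OVER (PARTITION BY category) AS category_avg_price
FROM products;
```

The second form makes it straightforward to compare each product's `price` against its category average in the same row.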

### Basic Window Functions

```sql
-- Add row numbers to products ordered by price
SELECT
    product_name,
    price,
    ROW_NUMBER() OVER (ORDER BY price DESC) as price_rank
FROM products;

-- Show each order with running total of sales
SELECT
    order_id,
    order_date,
    total_amount,
    SUM(total_amount) OVER (ORDER BY order_date) as running_total
FROM orders
ORDER BY order_date;
```

### Partitioning with OVER

The PARTITION BY clause creates separate "windows" for different groups:

```sql
-- Rank products by price within each category
SELECT
    product_name,
    category,
    price,
    RANK() OVER (PARTITION BY category ORDER BY price DESC) as category_price_rank
FROM products;

-- Show each customer's orders with order sequence
SELECT
    c.first_name,
    c.last_name,
    o.order_date,
    o.total_amount,
    ROW_NUMBER() OVER (PARTITION BY c.customer_id ORDER BY o.order_date) as order_sequence
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
ORDER BY c.last_name, o.order_date;
```

### Advanced Window Functions

```sql
-- Compare each order to the previous order for the same customer
SELECT
    c.first_name,
    c.last_name,
    o.order_date,
    o.total_amount,
    LAG(o.total_amount) OVER (PARTITION BY c.customer_id ORDER BY o.order_date) as previous_order_amount,
    o.total_amount - LAG(o.total_amount) OVER (PARTITION BY c.customer_id ORDER BY o.order_date) as change_from_previous
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
ORDER BY c.last_name, o.order_date;

-- Find top 3 products in each category
SELECT *
FROM (
    SELECT
        product_name,
        category,
        price,
        DENSE_RANK() OVER (PARTITION BY category ORDER BY price DESC) as price_rank
    FROM products
) ranked_products
WHERE price_rank <= 3
ORDER BY category, price_rank;
```

**Key Window Functions:**

- `ROW_NUMBER()`: Assigns unique sequential numbers
- `RANK()`: Assigns ranks with gaps for ties
- `DENSE_RANK()`: Assigns ranks without gaps for ties
- `LAG()/LEAD()`: Access previous/next row values
- `FIRST_VALUE()/LAST_VALUE()`: Get first/last values in window

---

## Data Modification

Beyond querying data, SQL allows you to insert, update, and delete records.

### INSERT - Adding New Data

```sql
-- Insert a single customer
-- (customer_id is omitted, which assumes the database generates it automatically)
INSERT INTO customers (first_name, last_name, email, registration_date, city, country)
VALUES ('John', 'Doe', 'john.doe@email.com', '2024-06-09', 'Chicago', 'USA');

-- Insert multiple customers at once
INSERT INTO customers (first_name, last_name, email, registration_date, city, country)
VALUES
    ('Jane', 'Smith', 'jane.smith@email.com', '2024-06-09', 'Miami', 'USA'),
    ('Bob', 'Johnson', 'bob.johnson@email.com', '2024-06-09', 'Seattle', 'USA');

-- Insert from a query (copying data)
INSERT INTO archived_orders (order_id, customer_id, order_date, total_amount)
SELECT order_id, customer_id, order_date, total_amount
FROM orders
WHERE order_date < '2023-01-01';
```

### UPDATE - Modifying Existing Data

```sql
-- Update a single record
UPDATE customers
SET city = 'New Chicago', country = 'USA'
WHERE customer_id = 1;

-- Update multiple records with conditions
UPDATE products
SET price = price * 1.10 -- 10% price increase
WHERE category = 'Electronics';

-- Update using data from other tables
UPDATE customers c
SET city = 'Updated City'
WHERE c.customer_id IN (
    SELECT DISTINCT o.customer_id
    FROM orders o
    WHERE o.order_date >= '2024-01-01'
);
```

**Warning**: Always use WHERE clauses with UPDATE statements unless you intend to modify every row in the table.

### DELETE - Removing Data

```sql
-- Delete specific records
DELETE FROM customers
WHERE registration_date IS NULL;

-- Delete based on related data
DELETE FROM products
WHERE product_id NOT IN (
    SELECT DISTINCT product_id
    FROM order_items
    WHERE product_id IS NOT NULL -- a NULL here would make NOT IN match nothing
);

-- Delete all records (use with extreme caution)
DELETE FROM temp_table;
```

### UPSERT Operations

Some databases support "upsert" operations (insert or update):

```sql
-- MySQL example: INSERT ... ON DUPLICATE KEY UPDATE
INSERT INTO products (product_id, product_name, price)
VALUES (1, 'Updated Product', 29.99)
ON DUPLICATE KEY UPDATE
    product_name = 'Updated Product',
    price = 29.99;

-- PostgreSQL example: INSERT ... ON CONFLICT
INSERT INTO products (product_id, product_name, price)
VALUES (1, 'Updated Product', 29.99)
ON CONFLICT (product_id)
DO UPDATE SET
    product_name = 'Updated Product',
    price = 29.99;
```

---

## Advanced Techniques

### CASE Expressions - Conditional Logic

CASE expressions allow you to implement conditional logic within queries:

```sql
-- Categorize customers by order frequency
SELECT
    c.first_name,
    c.last_name,
    COUNT(o.order_id) as order_count,
    CASE
        WHEN COUNT(o.order_id) >= 10 THEN 'High Value'
        WHEN COUNT(o.order_id) >= 5 THEN 'Medium Value'
        WHEN COUNT(o.order_id) >= 1 THEN 'Low Value'
        ELSE 'No Orders'
    END as customer_category
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY order_count DESC;

-- Create pivot-like reports
SELECT
    category,
    SUM(CASE WHEN price < 50 THEN 1 ELSE 0 END) as budget_products,
    SUM(CASE WHEN price BETWEEN 50 AND 100 THEN 1 ELSE 0 END) as mid_range_products,
    SUM(CASE WHEN price > 100 THEN 1 ELSE 0 END) as premium_products
FROM products
GROUP BY category;
```

### Working with Dates and Times

Date functions vary more between databases than most of SQL. The examples below mix MySQL and PostgreSQL syntax, as noted in the comments.

```sql
-- Extract date parts (MySQL functions; other databases use EXTRACT or DATEPART)
SELECT
    order_id,
    order_date,
    YEAR(order_date) as order_year,
    MONTH(order_date) as order_month,
    DAYNAME(order_date) as order_day_name,
    QUARTER(order_date) as order_quarter
FROM orders;

-- Date arithmetic (MySQL syntax)
SELECT
    customer_id,
    registration_date,
    DATEDIFF(CURRENT_DATE, registration_date) as days_since_registration,
    DATE_ADD(registration_date, INTERVAL 1 YEAR) as one_year_anniversary
FROM customers;

-- Time-based analysis (DATE_TRUNC is PostgreSQL syntax)
SELECT
    DATE_TRUNC('month', order_date) as month,
    COUNT(*) as orders_count,
    SUM(total_amount) as monthly_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
```

### String Functions

```sql
-- Text manipulation
SELECT
    first_name,
    last_name,
    CONCAT(first_name, ' ', last_name) as full_name,
    UPPER(email) as email_upper,
    LENGTH(email) as email_length,
    SUBSTRING(email, 1, POSITION('@' IN email) - 1) as username
FROM customers;

-- Pattern matching and replacement
SELECT
    product_name,
    REPLACE(product_name, 'iPhone', 'Phone') as generic_name,
    CASE
        WHEN product_name LIKE '%Pro%' THEN 'Professional'
        WHEN product_name LIKE '%Mini%' THEN 'Compact'
        ELSE 'Standard'
    END as product_tier
FROM products;
```

### Set Operations

```sql
-- These examples assume a suppliers table with city and country columns

-- UNION: Combine results from multiple queries (duplicates removed)
SELECT city FROM customers WHERE country = 'USA'
UNION
SELECT city FROM suppliers WHERE country = 'USA';

-- INTERSECT: Find common values (not supported in all databases)
SELECT city FROM customers
INTERSECT
SELECT city FROM suppliers;

-- EXCEPT/MINUS: Find values in first query but not second (Oracle uses MINUS)
SELECT city FROM customers
EXCEPT
SELECT city FROM suppliers;
```

---

## Performance and Optimization

Understanding query performance is crucial for working with large datasets.
975 | 976 | ### Query Execution Plans 977 | 978 | Most databases provide tools to show how queries are executed: 979 | 980 | ```sql 981 | -- Show execution plan (syntax varies by database) 982 | EXPLAIN SELECT c.first_name, c.last_name, COUNT(o.order_id) 983 | FROM customers c 984 | LEFT JOIN orders o ON c.customer_id = o.customer_id 985 | GROUP BY c.customer_id, c.first_name, c.last_name; 986 | ``` 987 | 988 | ### Indexing Strategy 989 | 990 | Indexes speed up queries but slow down modifications: 991 | 992 | ```sql 993 | -- Create indexes on frequently queried columns 994 | CREATE INDEX idx_customers_email ON customers(email); 995 | CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date); 996 | CREATE INDEX idx_products_category_price ON products(category, price); 997 | 998 | -- Composite indexes for multi-column queries 999 | CREATE INDEX idx_order_items_lookup ON order_items(order_id, product_id); 1000 | ``` 1001 | 1002 | ### Query Optimization Tips 1003 | 1004 | ```sql 1005 | -- Use specific columns instead of SELECT * 1006 | SELECT customer_id, first_name, last_name -- Good 1007 | FROM customers; 1008 | 1009 | SELECT * -- Avoid when possible 1010 | FROM customers; 1011 | 1012 | -- Use LIMIT when you don't need all results 1013 | SELECT product_name, price 1014 | FROM products 1015 | ORDER BY price DESC 1016 | LIMIT 10; -- Only get top 10 1017 | 1018 | -- Use EXISTS instead of IN for correlated subqueries 1019 | SELECT c.first_name, c.last_name 1020 | FROM customers c 1021 | WHERE EXISTS ( -- More efficient 1022 | SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id 1023 | ); 1024 | 1025 | -- Instead of 1026 | SELECT c.first_name, c.last_name 1027 | FROM customers c 1028 | WHERE c.customer_id IN ( -- Can be slower 1029 | SELECT customer_id FROM orders 1030 | ); 1031 | ``` 1032 | 1033 | ### Common Performance Pitfalls 1034 | 1035 | ```sql 1036 | -- Avoid functions on columns in WHERE clauses 1037 | -- Slow: 1038 | SELECT * FROM orders 
WHERE YEAR(order_date) = 2024;

-- Better:
SELECT * FROM orders WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

-- Be careful with wildcards at the beginning of LIKE patterns
-- Slow (can't use indexes):
SELECT * FROM customers WHERE last_name LIKE '%son';

-- Better (can use indexes):
SELECT * FROM customers WHERE last_name LIKE 'John%';
```

---

## Practice Exercises

### Beginner Level

1. **Basic Selection**: Write a query to find all products in the 'Electronics' category with a price under $100.

2. **Customer Analysis**: Find all customers who registered in 2024, sorted by registration date.

3. **Order Summary**: Count the total number of orders and calculate the total revenue from all orders.

### Intermediate Level

4. **Join Practice**: Create a report showing customer names, their order dates, and order totals for orders placed in the last 6 months.

5. **Grouping Challenge**: Find the top 5 best-selling products by total quantity sold, including the product name, total quantity, and total revenue generated.

6. **Subquery Practice**: Find customers who have spent more than the average customer spending amount.

### Advanced Level

7. **Window Functions**: Create a report showing each customer's orders with a running total of their spending over time.

8. **Complex Analysis**: Find products that have been ordered in every month of 2024 (if any).

9. **Performance Challenge**: Write an optimized query to find the most popular product in each category for the current year.

### Expert Level

10. **Business Intelligence**: Create a comprehensive customer segmentation analysis that categorizes customers as:

    - VIP: Top 10% by total spending
    - Regular: Next 40% by total spending
    - Occasional: Remaining customers with orders
    - Inactive: Customers with no orders

---

## Solutions to Practice Exercises

### Beginner Solutions

**Exercise 1: Basic Selection**

```sql
SELECT product_name, price, category
FROM products
WHERE category = 'Electronics' AND price < 100
ORDER BY price;
```

**Exercise 2: Customer Analysis**

```sql
SELECT first_name, last_name, email, registration_date
FROM customers
WHERE registration_date >= '2024-01-01' AND registration_date < '2025-01-01'
ORDER BY registration_date;
```

**Exercise 3: Order Summary**

```sql
SELECT
    COUNT(*) as total_orders,
    SUM(total_amount) as total_revenue,
    AVG(total_amount) as average_order_value
FROM orders;
```

### Intermediate Solutions

**Exercise 4: Join Practice**

```sql
-- DATE_SUB is MySQL syntax; in PostgreSQL use CURRENT_DATE - INTERVAL '6 months'
SELECT
    c.first_name,
    c.last_name,
    o.order_date,
    o.total_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 6 MONTH)
ORDER BY c.last_name, o.order_date;
```

**Exercise 5: Grouping Challenge**

```sql
SELECT
    p.product_name,
    SUM(oi.quantity) as total_quantity_sold,
    SUM(oi.quantity * oi.unit_price) as total_revenue
FROM products p
INNER JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name
ORDER BY total_quantity_sold DESC
LIMIT 5;
```

**Exercise 6:
Subquery Practice**

```sql
WITH customer_spending AS (
    SELECT
        c.customer_id,
        c.first_name,
        c.last_name,
        SUM(o.total_amount) as total_spent
    FROM customers c
    INNER JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT
    first_name,
    last_name,
    total_spent
FROM customer_spending
WHERE total_spent > (
    SELECT AVG(total_spent) FROM customer_spending
)
ORDER BY total_spent DESC;
```

### Advanced Solutions

**Exercise 7: Window Functions**

```sql
SELECT
    c.first_name,
    c.last_name,
    o.order_date,
    o.total_amount,
    SUM(o.total_amount) OVER (
        PARTITION BY c.customer_id
        ORDER BY o.order_date
        ROWS UNBOUNDED PRECEDING
    ) as running_total
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
ORDER BY c.customer_id, o.order_date;
```

**Exercise 8: Complex Analysis**

```sql
WITH monthly_products AS (
    SELECT DISTINCT
        p.product_id,
        p.product_name,
        MONTH(o.order_date) as order_month
    FROM products p
    INNER JOIN order_items oi ON p.product_id = oi.product_id
    INNER JOIN orders o ON oi.order_id = o.order_id
    WHERE YEAR(o.order_date) = 2024
),
product_month_counts AS (
    SELECT
        product_id,
        product_name,
        COUNT(DISTINCT order_month) as months_sold
    FROM monthly_products
    GROUP BY product_id, product_name
)
SELECT product_name
FROM product_month_counts
WHERE months_sold = 12;  -- All 12 months
```

### Expert Solution

**Exercise 10: Business Intelligence - Customer Segmentation**

```sql
WITH customer_totals AS (
    SELECT
        c.customer_id,
        c.first_name,
        c.last_name,
        c.email,
        COALESCE(SUM(o.total_amount), 0) as total_spent,
        COUNT(o.order_id) as order_count
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name, c.email
),
spending_percentiles AS (
    SELECT
        *,
        PERCENT_RANK() OVER (ORDER BY total_spent DESC) as spending_percentile
    FROM customer_totals
    WHERE total_spent > 0
)
SELECT
    customer_id,
    first_name,
    last_name,
    email,
    total_spent,
    order_count,
    CASE
        WHEN total_spent = 0 THEN 'Inactive'
        WHEN spending_percentile <= 0.10 THEN 'VIP'
        WHEN spending_percentile <= 0.50 THEN 'Regular'
        ELSE 'Occasional'
    END as customer_segment,
    ROUND(spending_percentile * 100, 1) as spending_percentile_rank
FROM (
    SELECT
        ct.*,
        COALESCE(sp.spending_percentile, 1.0) as spending_percentile
    FROM customer_totals ct
    LEFT JOIN spending_percentiles sp ON ct.customer_id = sp.customer_id
) segmented_customers
ORDER BY total_spent DESC;
```

---

## Real-World Application Scenarios

### E-commerce Analytics

```sql
-- Daily sales dashboard
WITH daily_metrics AS (
    SELECT
        DATE(order_date) as sale_date,
        COUNT(DISTINCT order_id) as orders,
        COUNT(DISTINCT customer_id) as unique_customers,
        SUM(total_amount) as revenue,
        AVG(total_amount) as avg_order_value
    FROM orders
    WHERE order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)
    GROUP BY DATE(order_date)
)
SELECT
    sale_date,
    orders,
    unique_customers,
    revenue,
    avg_order_value,
    revenue - LAG(revenue) OVER (ORDER BY sale_date) as revenue_change,
    ROUND(
        -- NULLIF avoids a division-by-zero error when the previous day's revenue is 0
        ((revenue - LAG(revenue) OVER (ORDER BY sale_date)) /
        NULLIF(LAG(revenue) OVER (ORDER BY sale_date), 0)) * 100, 2
    ) as revenue_change_percent
FROM daily_metrics
ORDER BY sale_date DESC;
```

### Customer Retention Analysis

```sql
-- Customer cohort analysis (simplified; DATE_FORMAT and TIMESTAMPDIFF are MySQL functions)
WITH customer_first_orders AS (
    SELECT
        customer_id,
        MIN(order_date) as first_order_date,
        DATE_FORMAT(MIN(order_date), '%Y-%m') as cohort_month
    FROM orders
    GROUP BY customer_id
),
monthly_activity AS (
    SELECT
        cfo.customer_id,
        cfo.cohort_month,
        DATE_FORMAT(o.order_date, '%Y-%m') as activity_month,
        TIMESTAMPDIFF(MONTH, cfo.first_order_date, o.order_date) as period_number
    FROM customer_first_orders cfo
    INNER JOIN orders o ON cfo.customer_id = o.customer_id
)
SELECT
    cohort_month,
    period_number,
    COUNT(DISTINCT customer_id) as customers
FROM monthly_activity
WHERE period_number <= 12  -- First 12 months
GROUP BY cohort_month, period_number
ORDER BY cohort_month, period_number;
```

### Inventory Management

```sql
-- Products needing restock alert
WITH product_velocity AS (
    SELECT
        p.product_id,
        p.product_name,
        p.stock_quantity,
        -- Count only items whose order matched the 30-day join condition;
        -- a plain SUM(oi.quantity) would include items from older orders too
        COALESCE(SUM(CASE WHEN o.order_id IS NOT NULL THEN oi.quantity END), 0) as units_sold_30_days,
        COALESCE(SUM(CASE WHEN o.order_id IS NOT NULL THEN oi.quantity END) / 30.0, 0) as avg_daily_sales
    FROM products p
    LEFT JOIN order_items oi ON p.product_id = oi.product_id
    LEFT JOIN orders o ON oi.order_id = o.order_id
        AND o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)
    GROUP BY p.product_id, p.product_name, p.stock_quantity
)
SELECT
    product_name,
    stock_quantity,
    units_sold_30_days,
    ROUND(avg_daily_sales, 2) as avg_daily_sales,
    CASE
        WHEN avg_daily_sales > 0
        THEN ROUND(stock_quantity / avg_daily_sales, 0)
        ELSE 999
    END as days_of_inventory,
    CASE
        WHEN stock_quantity / NULLIF(avg_daily_sales, 0) < 7 THEN 'URGENT'
        WHEN stock_quantity / NULLIF(avg_daily_sales, 0) < 14 THEN 'LOW'
        WHEN stock_quantity / NULLIF(avg_daily_sales, 0) < 30 THEN 'NORMAL'
        ELSE 'HIGH'
    END as inventory_status
FROM product_velocity
WHERE avg_daily_sales > 0
ORDER BY days_of_inventory ASC;
```

---

## Database-Specific Considerations

### MySQL Specifics

```sql
-- MySQL date functions
-- (the alias is order_year_month because YEAR_MONTH is a reserved word in MySQL)
SELECT
    order_date,
    DATE_FORMAT(order_date, '%Y-%m') as order_year_month,
    WEEKDAY(order_date) as day_of_week,
    STR_TO_DATE('2024-06-09', '%Y-%m-%d') as parsed_date
FROM orders;

-- MySQL string functions
SELECT
    CONCAT(first_name, ' ', last_name) as full_name,
    CHAR_LENGTH(email) as email_length,
    SUBSTRING_INDEX(email, '@', 1) as username
FROM customers;
```

### PostgreSQL Specifics

```sql
-- PostgreSQL date functions
SELECT
    order_date,
    EXTRACT(YEAR FROM order_date) as year,
    DATE_TRUNC('month', order_date) as month_start,
    order_date + INTERVAL '30 days' as future_date
FROM orders;

-- PostgreSQL arrays and JSON (if supported)
SELECT
    customer_id,
    ARRAY_AGG(product_name) as purchased_products,
    JSON_AGG(
        JSON_BUILD_OBJECT(
            'product', product_name,
            'quantity', quantity
        )
    ) as order_details
FROM customers c
JOIN orders o USING (customer_id)
JOIN order_items oi USING (order_id)
JOIN products p USING (product_id)
GROUP BY customer_id;
```

### SQL Server Specifics

```sql
-- SQL Server TOP and OFFSET/FETCH
SELECT TOP 10 * FROM products ORDER BY price DESC;

SELECT * FROM products
ORDER BY price DESC
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY;

-- SQL Server date functions
SELECT
    GETDATE() as current_datetime,
    DATEPART(YEAR, order_date) as year,
    DATEDIFF(DAY, order_date, GETDATE()) as days_ago
FROM orders;
```

---

## Best Practices Summary

### Writing Maintainable SQL

1. **Use clear, descriptive aliases**

```sql
-- Good
SELECT c.first_name, c.last_name, o.order_date
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

-- Avoid
SELECT a.first_name, a.last_name, b.order_date
FROM customers a, orders b
WHERE a.customer_id = b.customer_id;
```

2. **Format queries for readability**

```sql
SELECT
    c.first_name,
    c.last_name,
    COUNT(o.order_id) as total_orders,
    SUM(o.total_amount) as total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.registration_date >= '2024-01-01'
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING COUNT(o.order_id) > 0
ORDER BY total_spent DESC
LIMIT 10;
```

3. **Comment complex logic**

```sql
-- Calculate customer lifetime value with 30-day recency weighting
SELECT
    customer_id,
    total_spent *
    CASE
        WHEN days_since_last_order <= 30 THEN 1.0
        WHEN days_since_last_order <= 90 THEN 0.8
        ELSE 0.5
    END as weighted_clv
FROM customer_metrics;
```

### Advanced Performance Guidelines

Understanding query performance requires thinking like the database engine. Every query goes through multiple phases: parsing, optimization, execution planning, and finally execution. Let's explore how to write queries that work with the optimizer rather than against it.

#### Index Strategy: Beyond the Basics

The most impactful performance optimization is proper indexing, but it's not just about "adding indexes to queried columns." The order of columns in composite indexes matters enormously, and understanding this can transform your query performance.
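
To see the effect of column order for yourself, the sketch below asks a database for its plan under both orderings. It is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module with an empty in-memory `orders` table, and the helper `plan_for` and index name `idx_demo` are hypothetical names made up for this example. Other databases word their plans differently.

```python
import sqlite3


def plan_for(index_columns):
    """Return SQLite's plan text for an equality + range lookup on orders,
    given the column order of a composite index (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INT, "
        "order_date TEXT, total_amount REAL)"
    )
    conn.execute(f"CREATE INDEX idx_demo ON orders({index_columns})")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT order_id FROM orders "
        "WHERE customer_id = 42 "
        "AND order_date BETWEEN '2024-01-01' AND '2024-03-31'"
    ).fetchall()
    conn.close()
    # The fourth column of each plan row holds the human-readable detail
    return " ".join(row[3] for row in rows)


customer_first = plan_for("customer_id, order_date")
date_first = plan_for("order_date, customer_id")
print("customer_id first:", customer_first)
print("order_date first: ", date_first)
```

With `customer_id` leading, the plan should list `customer_id=?` among the index search terms; with `order_date` leading, the customer filter is applied only after the date range has been read from the index.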

**Composite Index Column Ordering**
```sql
-- If you frequently query: WHERE customer_id = ? AND order_date BETWEEN ? AND ?
-- Create index in this specific order:
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

-- NOT: CREATE INDEX idx_orders_date_customer ON orders(order_date, customer_id);
-- The first version allows the database to quickly find all orders for a customer,
-- then scan just that subset for the date range
-- The second version would need to scan all orders in date range, then filter by customer
```

The rule of thumb is to put columns compared with equality first, followed by the column used in a range condition, because once the index is traversed by range, later columns can no longer narrow the search. Think of an index like a phone book: you can quickly find all "Smiths" and then scan through them for "John Smith," but you can't efficiently find all "Johns" without reading the entire book.

**Covering Indexes: Eliminating Table Lookups**
```sql
-- Instead of just indexing the WHERE clause columns:
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Include columns needed in SELECT to avoid going back to the table:
CREATE INDEX idx_orders_customer_covering ON orders(customer_id, order_date, total_amount);

-- Now this query can be satisfied entirely from the index:
SELECT order_date, total_amount
FROM orders
WHERE customer_id = 12345;
```

This technique, called a "covering index," means the database never needs to access the actual table data after finding the index entries. It's particularly powerful for frequently-run reporting queries.

#### Query Rewriting for Performance

Sometimes the same logical query can be written in dramatically different ways with vastly different performance characteristics.

**Transforming Correlated Subqueries**
```sql
-- Slow: Correlated subquery that may run once per customer
SELECT c.first_name, c.last_name
FROM customers c
WHERE (
    SELECT COUNT(*)
    FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date >= '2024-01-01'
) > 5;

-- Fast: Join with aggregation (runs aggregation once, then joins)
SELECT c.first_name, c.last_name
FROM customers c
INNER JOIN (
    SELECT customer_id
    FROM orders
    WHERE order_date >= '2024-01-01'
    GROUP BY customer_id
    HAVING COUNT(*) > 5
) frequent_customers ON c.customer_id = frequent_customers.customer_id;
```

Unless the optimizer rewrites it, the first query executes the subquery once for each customer. The second query does the aggregation work once and then performs a simple join. With 10,000 customers, this could be the difference between 10,000 aggregations and one.

**EXISTS vs IN: When It Really Matters**
```sql
-- EXISTS is unaffected by NULLs in the subquery:
SELECT c.first_name, c.last_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id
);

-- IN returns the same rows here, but its negation is a trap and it is sometimes slower:
SELECT c.first_name, c.last_name
FROM customers c
WHERE c.customer_id IN (
    SELECT customer_id FROM orders  -- With NOT IN, a single NULL customer_id makes the query return no rows
);
```

The EXISTS version also allows the database to stop searching as soon as it finds one matching order, while IN may require building the entire set of customer IDs first, although many modern optimizers plan both forms the same way.
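
The NULL behavior is easiest to trust once you have seen it. The following is a small sketch, assuming Python's built-in `sqlite3` module and three made-up customers. It compares `NOT IN` with `NOT EXISTS` when one order row has a NULL `customer_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INT);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    -- Order 11 has no customer_id (illustrative data, e.g. a guest checkout)
    INSERT INTO orders VALUES (10, 1), (11, NULL);
""")

# Customers without orders, written with NOT IN
not_in = conn.execute(
    "SELECT first_name FROM customers "
    "WHERE customer_id NOT IN (SELECT customer_id FROM orders) "
    "ORDER BY customer_id"
).fetchall()

# The same question, written with NOT EXISTS
not_exists = conn.execute(
    "SELECT first_name FROM customers c "
    "WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id) "
    "ORDER BY c.customer_id"
).fetchall()

print("NOT IN:    ", not_in)
print("NOT EXISTS:", not_exists)
```

`NOT IN` comes back empty because comparing any value with NULL yields UNKNOWN rather than TRUE, so no row can pass the filter. `NOT EXISTS` returns Ben and Cy as intended.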

#### Join Optimization: Order and Type Matter

The order in which you write your joins can significantly impact performance, especially with complex multi-table queries.

**Join Order Strategy**
```sql
-- Less efficient: Starting with the largest table
SELECT p.product_name, c.category_name, s.supplier_name
FROM products p  -- 1 million rows
INNER JOIN categories c ON p.category_id = c.category_id  -- 50 rows
INNER JOIN suppliers s ON p.supplier_id = s.supplier_id;  -- 1000 rows

-- More efficient: Start with smaller, more selective tables
SELECT p.product_name, c.category_name, s.supplier_name
FROM categories c  -- 50 rows - start here
INNER JOIN products p ON c.category_id = p.category_id
INNER JOIN suppliers s ON p.supplier_id = s.supplier_id
WHERE c.category_name = 'Electronics';  -- Very selective condition
```

While modern optimizers often reorder joins automatically, understanding this principle helps you write queries that work with the optimizer rather than forcing it to work harder.

**Choosing Between JOIN Types Based on Data Distribution**
```sql
-- When you need all customers regardless of orders, but want order info where available:
-- LEFT JOIN is appropriate and efficient
SELECT c.first_name, c.last_name, COUNT(o.order_id) as order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name;

-- But if you know most customers have orders, this might be faster:
-- Get customers with orders, then UNION customers without orders
SELECT c.first_name, c.last_name, COUNT(o.order_id) as order_count
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
UNION ALL
SELECT c.first_name, c.last_name, 0 as order_count
FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
```

#### Avoiding Performance Killers

**Function Calls in WHERE Clauses**
```sql
-- Performance killer: Function prevents index usage
SELECT * FROM orders
WHERE YEAR(order_date) = 2024;  -- Index on order_date cannot be used

-- Index-friendly version:
SELECT * FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2025-01-01';  -- Index on order_date can be used efficiently
```

When you wrap a column in a function, the database generally can't use an ordinary index on that column: the index stores the raw column values, so the function has to be evaluated for every row before the comparison can be made.
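
The difference shows up directly in a query plan. Here is a minimal sketch, assuming Python's built-in `sqlite3` module and an empty in-memory `orders` table. SQLite has no `YEAR()` function, so `strftime('%Y', ...)` stands in for it, and the `plan` helper is a hypothetical name for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        order_date TEXT,
        total_amount REAL
    );
    CREATE INDEX idx_orders_date ON orders(order_date);
""")


def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Column wrapped in a function: the index on order_date cannot seek
wrapped = plan(
    "SELECT total_amount FROM orders WHERE strftime('%Y', order_date) = '2024'"
)
# Bare column compared against a range: the index can seek
ranged = plan(
    "SELECT total_amount FROM orders "
    "WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'"
)

print("wrapped:", wrapped)
print("ranged: ", ranged)
```

In SQLite's wording, `SCAN` means every row is visited and `SEARCH` means an index was used to jump to the matching rows. The wrapped version should report the former and the range version the latter.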

**Implicit Type Conversions**
```sql
-- Hidden performance problem: If customer_id is INT but you pass a string
SELECT * FROM orders WHERE customer_id = '12345';  -- Implicit conversion

-- Better: Match the data type exactly
SELECT * FROM orders WHERE customer_id = 12345;  -- Direct comparison

-- Even worse: This forces conversion of ALL customer_id values
SELECT * FROM orders WHERE CAST(customer_id AS VARCHAR) = '12345';
```

**Premature DISTINCT Usage**
```sql
-- Expensive: DISTINCT requires sorting/hashing entire result set
SELECT DISTINCT c.first_name, c.last_name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

-- Often better: Use EXISTS to avoid duplicates in the first place
SELECT c.first_name, c.last_name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
```

#### Window Function Performance Considerations

Window functions are powerful but can be resource-intensive. Understanding their performance characteristics helps you use them wisely.
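
One property is worth checking before tuning anything: a `ROWS` frame counts physical rows, not calendar days. The sketch below assumes Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer) and four made-up orders. It computes a running total alongside a two-row rolling sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        order_date TEXT,
        total_amount REAL
    );
    -- Two orders on the same day, then a gap of several days (illustrative data)
    INSERT INTO orders (order_date, total_amount) VALUES
        ('2024-01-01', 10), ('2024-01-01', 20),
        ('2024-01-02', 30), ('2024-01-05', 40);
""")

rows = conn.execute("""
    SELECT
        order_date,
        total_amount,
        SUM(total_amount) OVER (
            ORDER BY order_date, order_id
            ROWS UNBOUNDED PRECEDING
        ) AS running_total,
        SUM(total_amount) OVER (
            ORDER BY order_date, order_id
            ROWS 1 PRECEDING
        ) AS last_two_rows
    FROM orders
    ORDER BY order_date, order_id
""").fetchall()

for row in rows:
    print(row)
```

The `last_two_rows` value for the 2024-01-05 order still includes the 2024-01-02 order, three days earlier, because the frame is defined by row position. A rolling window over days therefore needs either one row per day or a date-based filter.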

**Partitioning Strategy**
```sql
-- Less efficient: Large partitions mean more sorting work
SELECT
    product_name,
    price,
    ROW_NUMBER() OVER (ORDER BY price) as overall_rank  -- Sorts ALL products
FROM products;

-- More efficient: Smaller partitions reduce sorting overhead
SELECT
    product_name,
    category,
    price,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY price) as category_rank  -- Sorts within category
FROM products;
```

**Frame Specification Impact**
```sql
-- Expensive on large datasets: running total over the entire table
SELECT
    order_date,
    total_amount,
    SUM(total_amount) OVER (
        ORDER BY order_date
        ROWS UNBOUNDED PRECEDING  -- Frame grows to include every earlier row
    ) as running_total
FROM orders;

-- More efficient for recent data analysis: Limited window
SELECT
    order_date,
    total_amount,
    SUM(total_amount) OVER (
        ORDER BY order_date
        ROWS 29 PRECEDING  -- Current row plus the 29 rows before it (30 days only if there is one row per day)
    ) as rolling_30_day_total
FROM orders
WHERE order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY);
```

#### Query Plan Analysis and Optimization

Learning to read execution plans is crucial for performance tuning.
Here's what to look for:

**Identifying Expensive Operations**
```sql
-- Use EXPLAIN to see the execution plan
EXPLAIN SELECT c.first_name, c.last_name, COUNT(o.order_id)
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.registration_date >= '2024-01-01'
GROUP BY c.customer_id, c.first_name, c.last_name;
```

In the execution plan, watch for these red flags:
- **Table scans** on large tables (should use indexes)
- **Hash joins** on large datasets (nested loop joins might be better with proper indexes)
- **Sorting operations** on large result sets (consider if you really need ORDER BY)
- **High row count estimates** that don't match reality (statistics might be outdated)

**Understanding Cost Estimates**
The database optimizer makes decisions based on statistics about your data. If these statistics are wrong, the optimizer makes poor choices. Regular statistics updates are crucial:

```sql
-- Update table statistics (syntax varies by database)
ANALYZE TABLE customers;      -- MySQL
UPDATE STATISTICS customers;  -- SQL Server
```

This is especially important after large data loads or significant changes to data distribution.

### Security Considerations

1. **Use parameterized queries** to prevent SQL injection
2. **Implement proper access controls** at the database level
3. **Audit sensitive operations** like DELETE and UPDATE
4. **Regular backups** before major data modifications

---
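
To make the first security point concrete, here is a minimal sketch of a parameterized query next to the string-built version it replaces. It assumes Python's built-in `sqlite3` module and its `?` placeholder style. The table contents and the `find_by_email` helper are hypothetical, and other drivers use different placeholder markers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO customers (email) VALUES ('ana@example.com'), ('ben@example.com');
""")


def find_by_email(email):
    """Look up customers by email with a bound parameter (hypothetical helper)."""
    # The ? placeholder passes the value separately from the SQL text,
    # so it is treated as data and never parsed as SQL.
    return conn.execute(
        "SELECT customer_id FROM customers WHERE email = ?", (email,)
    ).fetchall()


hostile = "x' OR '1'='1"

# Parameterized: the hostile string is just an email that matches nothing
safe_rows = find_by_email(hostile)

# String-built SQL, shown only for contrast: the input rewrites the WHERE clause
unsafe_rows = conn.execute(
    f"SELECT customer_id FROM customers WHERE email = '{hostile}'"
).fetchall()

print("parameterized:", safe_rows)
print("string-built: ", unsafe_rows)
```

The parameterized call returns no rows, while the string-built query matches every customer because the injected `OR '1'='1'` becomes part of the statement.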