├── .gitignore ├── README.md ├── docker-compose.yaml ├── images ├── ERD.PNG ├── bash_1.PNG ├── bash_2.PNG ├── bash_3.PNG ├── bash_4.PNG ├── bash_5.PNG ├── bash_6.PNG ├── terminal_1.PNG └── terminal_2.PNG ├── source_data ├── csv_data │ ├── cities.csv │ ├── countries.csv │ ├── currencies.csv │ └── languages.csv ├── csv_output │ └── output.csv └── scripts │ ├── build_tables.sql │ ├── cleanup_db.sql │ ├── code_challenge.sql │ ├── create_relationships.sql │ ├── drop_all_tables.sql │ ├── normalize_tables.sql │ ├── remove_duplicates.sql │ └── start_fresh.sh └── walkthrough ├── WALKTHROUGH_1_DOCKER.md ├── WALKTHROUGH_2_BUILD.md ├── WALKTHROUGH_3_NORMALIZE.md ├── WALKTHROUGH_4_REMOVE_DUPLICATES.md ├── WALKTHROUGH_5_CLEANUP.md ├── WALKTHROUGH_6_RELATIONSHIPS.md ├── WALKTHROUGH_7_CODE_CHALLENGE.md └── WALKTHROUGH_BONUS_BASH.md /.gitignore: -------------------------------------------------------------------------------- 1 | /db/pgdata 2 | ./vscode -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Basic/Intermediate SQL Code Challenge 2 | 3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 8 | This project is an opportunity to practice your SQL skills. It was inspired by the [Braintree Analytics Code Challenge](https://github.com/AlexanderConnelly/BrainTree_SQL_Coding_Challenge_Data_Analyst) and was created to strengthen/sharpen your SQL knowledge. 9 | 10 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 11 | 12 | **Please Note**: 13 | 14 | This project is meant to measure/gauge/practice your technical abilities with SQL. 15 | 16 | * All work should be done in SQL. 17 | * Do not alter the CSV files. 18 | * Most, if not all, of these questions are intended purely as a measure of your SQL skills and not as actual questions that one would expect to answer from an employer. 19 | 20 | An SQL Code Challenge [Walkthrough](walkthrough/WALKTHROUGH_1_DOCKER.md) has been created if you would like to follow along in a step-by-step fashion. 21 | 22 | The questions are listed below with the expected results and answers hidden so you can cross-reference your answers. 23 | 24 | This repository contains all of the necessary files, datasets and directories for running a PostgreSQL server in a Docker container. The only prerequisite is that you should already have Docker Desktop installed and running on your computer. 25 | 26 | https://www.docker.com/products/docker-desktop/ 27 | 28 | ### Getting Started 29 | 30 | * This project uses the latest version of `PostgreSQL`, but any SQL database may be used. 31 | * Directories and Files details: 32 | * `README.md`: This is the file you are currently reading. 33 | * `.gitignore`: The files and directories listed in this file will be ignored by Git/GitHub. 34 | * `docker-compose.yaml`: With this YAML (**Y**et **A**nother **M**arkup **L**anguage) file you can define the services (Postgres/MySQL/Python...) and, with a single command, spin everything up or tear it all down. 
35 | 36 | * `db`: In the current directory, create an empty directory named '`db`'. Although this directory is ignored by Git/GitHub, PostgreSQL requires a directory to keep data persistent when running in a Docker container. There will be no need to alter any of the files that reside there once the container starts running. 37 | * `images`: This is the location of the image files displayed in this repository. 38 | * `walkthrough/`: A directory with a beginner-friendly walkthrough of all of the steps I took to complete the SQL challenge. 39 | * `source_data/`: This directory contains all of the files and scripts shared with our Docker container. 40 | * `source_data/csv_data`: This is the location of the CSV files needed to copy/insert into our database. 41 | * `source_data/scripts/`: This directory contains all of the SQL scripts I used to complete the challenge. 42 | * `source_data/csv_output/`: This directory will contain all CSV files generated by some of our queries. 43 | 44 | :grey_exclamation: :star: Start here if you want to follow along with the [SQL Challenge WALKTHROUGH](./walkthrough/WALKTHROUGH_1_DOCKER.md) :star: :grey_exclamation: 45 | 46 | ### SQL Code Challenge 47 | 1. Create Database, Tables and Relations. 48 | Using the CSV files located in `source_data/csv_data`, create your new SQL database and tables with the properly formatted data. 49 | 50 | * Add a numeric, auto-incrementing Primary Key to every table. 51 | * In the `countries` table, add the column `created_on` with the current date. 52 | * Create one-to-one and one-to-many relationships with the countries table as the parent table. 53 | 54 | 2. List Regions and Country Count 55 | List all of the regions and the total number of countries in each region. Order by country count in descending order and capitalize the region name. 56 | 57 | <details>
58 | Click to expand expected results! 59 | 60 | ##### Expected Results: 61 | 62 | region |country_count| 63 | ---------|-------------| 64 | Africa | 59| 65 | Americas | 57| 66 | Asia | 50| 67 | Europe | 48| 68 | Oceania | 26| 69 | Antartica| 1| 70 | 71 |
72 |

73 | 74 |
75 | <summary>Click to expand answer!</summary> 76 | 77 | ##### Answer 78 | ```sql 79 | SELECT 80 | 	-- initcap() capitalizes the first letter of every word in a string. 81 | 	initcap(region) AS region, 82 | 	count(*) AS country_count 83 | FROM 84 | 	cleaned_data.countries 85 | GROUP BY 86 | 	-- Aggregate functions like count() require every non-aggregated column to appear in GROUP BY. 87 | 	region 88 | ORDER BY 89 | 	country_count DESC; 90 | ``` 91 | </details>
92 |
93 | 94 | 3. List Sub-Regions and City Count 95 | List all of the sub-regions and the total number of cities in each sub-region. Order by sub-region name alphabetically. 96 | 97 |
98 | Click to expand expected results! 99 | 100 | ##### Expected Results: 101 | 102 | sub_region |city_count| 103 | -------------------------------|----------| 104 | Australia And New Zealand | 329| 105 | Central Asia | 560| 106 | Eastern Asia | 3164| 107 | Eastern Europe | 2959| 108 | Latin America And The Caribbean| 7204| 109 | Melanesia | 60| 110 | Micronesia | 15| 111 | Northern Africa | 1152| 112 | Northern America | 5844| 113 | Northern Europe | 2025| 114 | Polynesia | 22| 115 | Southeastern Asia | 2627| 116 | Southern Asia | 6848| 117 | Southern Europe | 3238| 118 | Subsaharan Africa | 3223| 119 | Western Asia | 1400| 120 | Western Europe | 3952| 121 | 122 |
123 |

124 | 125 |
126 | Click to expand answer! 127 | 128 | ##### Answer 129 | ```sql 130 | SELECT 131 | initcap(t1.sub_region) AS sub_region, 132 | count(*) AS city_count 133 | FROM 134 | cleaned_data.countries AS t1 135 | JOIN 136 | cleaned_data.cities AS t2 137 | ON 138 | t1.country_code_2 = t2.country_code_2 139 | GROUP BY 140 | t1.sub_region 141 | ORDER BY 142 | t1.sub_region; 143 | ``` 144 |
145 |
146 | 147 | 4. Specific Sub-Region and String Functions 148 | List all of the countries and the total number of cities in the Northern Europe sub-region. List the country names in uppercase and order the list by the length of the country name and alphabetically in ascending order. 149 | 150 |
151 | Click to expand expected results! 152 | 153 | ##### Expected Results: 154 | 155 | country_name |city_count| 156 | --------------|----------| 157 | JERSEY | 1| 158 | LATVIA | 39| 159 | NORWAY | 127| 160 | SWEDEN | 148| 161 | DENMARK | 75| 162 | ESTONIA | 20| 163 | FINLAND | 142| 164 | ICELAND | 12| 165 | IRELAND | 64| 166 | LITHUANIA | 61| 167 | ISLE OF MAN | 2| 168 | FAROE ISLANDS | 29| 169 | UNITED KINGDOM| 1305| 170 | 171 |
172 |

173 | 174 |
175 | <summary>Click to expand answer!</summary> 176 | 177 | ##### Answer 178 | ```sql 179 | SELECT 180 | 	-- upper() returns your string in uppercase. 181 | 	upper(t1.country_name) AS country_name, 182 | 	count(*) AS city_count 183 | FROM 184 | 	cleaned_data.countries AS t1 185 | JOIN 186 | 	cleaned_data.cities AS t2 187 | ON 188 | 	t1.country_code_2 = t2.country_code_2 189 | WHERE 190 | 	t1.sub_region = 'northern europe' 191 | GROUP BY 192 | 	t1.country_name 193 | ORDER BY 194 | 	-- length() returns the number of characters in a string, including spaces. 195 | 	length(t1.country_name), t1.country_name; 196 | ``` 197 | </details>
198 |
199 | 200 | 201 | 5. List Specific Countries by Year 202 | List all of the countries and the total number of cities in the Southern Europe sub-region that were inserted in 2021. Capitalize the country names and order alphabetically by the **LAST** letter of the country name and the number of cities. 203 | 204 |
205 | Click to expand expected results! 206 | 207 | ##### Expected Results: 208 | 209 | country_name |city_count| 210 | ----------------------|----------| 211 | Andorra | 5| 212 | Albania | 11| 213 | Bosnia And Herzegovina| 15| 214 | Croatia | 22| 215 | North Macedonia | 28| 216 | Malta | 32| 217 | Serbia | 58| 218 | Slovenia | 74| 219 | Greece | 64| 220 | Portugal | 109| 221 | Spain | 302| 222 | San Marino | 2| 223 | Montenegro | 12| 224 | Italy | 542| 225 | 226 |
227 |

228 | 229 |
230 | <summary>Click to expand answer!</summary> 231 | 232 | ##### Answer 233 | ```sql 234 | SELECT 235 | 	initcap(t1.country_name) AS country_name, 236 | 	count(*) AS city_count 237 | FROM 238 | 	cleaned_data.countries AS t1 239 | JOIN 240 | 	cleaned_data.cities AS t2 241 | ON 242 | 	t1.country_code_2 = t2.country_code_2 243 | WHERE 244 | 	t1.sub_region = 'southern europe' 245 | AND 246 | 	-- extract() & date_part() functions allow you to break down dates and timestamps into individual years, months, days, hours... 247 | 	EXTRACT('year' FROM t2.insert_date) = 2021 248 | GROUP BY 249 | 	t1.country_name 250 | ORDER BY 251 | 	-- substring() function returns a specific portion of a string. 252 | 	substring(t1.country_name,length(t1.country_name),1), city_count; 253 | ``` 254 | </details>
255 |
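The last-letter ordering above hinges on `substring(country_name, length(country_name), 1)`. Outside SQL, the same sort key can be sketched in Python as `name[-1]` (the country list below is an illustrative subset, not the full result set):

```python
# Illustrative subset of Southern Europe country names (not the full data).
countries = ["spain", "italy", "greece", "andorra"]

# Sort key mirrors substring(country_name, length(country_name), 1):
# take the final character of each name.
by_last_letter = sorted(countries, key=lambda name: name[-1])
print(by_last_letter)  # ['andorra', 'greece', 'spain', 'italy']
```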
256 | 257 | 6. List Anti-Join 258 | List all of the countries in the region of Asia that did **NOT** have a city with an inserted date from June 2021 through Sept 2021. 259 | 260 |
261 | Click to expand expected results! 262 | 263 | ##### Expected Results: 264 | 265 | country_name | 266 | -----------------| 267 | Brunei Darussalam| 268 | Kuwait | 269 | Macao | 270 | Singapore | 271 | 272 |
273 |

274 | 275 |
276 | <summary>Click to expand answer!</summary> 277 | 278 | ##### Answer 279 | ```sql 280 | SELECT 281 | 	-- DISTINCT will only return unique values 282 | 	DISTINCT initcap(t1.country_name) AS country_name 283 | FROM 284 | 	cleaned_data.countries AS t1 285 | -- A left join returns ALL rows from the left table (cleaned_data.countries) and 286 | -- only the matching rows from the right table (cleaned_data.cities), resulting in NULLs where 287 | -- there are no matches in the right table. 288 | LEFT JOIN 289 | 	cleaned_data.cities AS t2 290 | ON 291 | 	t1.country_code_2 = t2.country_code_2 292 | AND 293 | 	t2.insert_date BETWEEN '2021-06-01' AND '2021-10-01' 294 | WHERE 295 | 	t1.region = 'asia' 296 | -- Only return countries that did NOT have a matching row in the cities table. 297 | AND 298 | 	t2.country_code_2 IS NULL; 299 | ``` 300 | </details>
301 |
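An anti-join, as exercised above, keeps only the rows from one table that have no match in another; conceptually it is a set difference. A minimal Python sketch of the idea, using made-up country codes:

```python
# Hypothetical rows: Asian countries, and the country codes that had at
# least one city inserted during the date window.
asia_countries = {"af": "afghanistan", "kw": "kuwait", "sg": "singapore", "jp": "japan"}
matched_codes = {"af", "jp"}  # codes that joined successfully

# Anti-join: retain countries whose code never matched. In SQL this is the
# LEFT JOIN ... WHERE right_table.key IS NULL pattern.
unmatched = sorted(name for code, name in asia_countries.items()
                   if code not in matched_codes)
print(unmatched)  # ['kuwait', 'singapore']
```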
302 | 303 | 7. Reversible Names 304 | List the country, city name, population and city name length for the city names that are [palindromes](https://en.wikipedia.org/wiki/Palindrome) in the Western Asia sub-region. Format the population with a thousands separator (1,000) and format the length of the city name in Roman numerals. Order by the length of the city name in descending order and alphabetically in ascending order. 305 | 306 | <details>
307 | Click to expand expected results! 308 | 309 | ##### Expected Results: 310 | 311 | country_name |city_name|population|roman_numeral_length| 312 | --------------------|---------|----------|--------------------| 313 | Yemen |Hajjah | 46,568 | VI | 314 | Syrian Arab Republic|Hamah | 696,863 | V | 315 | Turkey |Kavak | 21,692 | V | 316 | Turkey |Kinik | 29,803 | V | 317 | Turkey |Tut | 10,161 | III | 318 | 319 |
320 |

321 | 322 |
323 | Click to expand answer! 324 | 325 | ##### Answer 326 | ```sql 327 | SELECT 328 | initcap(t1.country_name) AS country_name, 329 | initcap(t2.city_name) AS city_name, 330 | -- to_char() takes non-string data types and returns them as strings. 331 | to_char(t2.population, '999G999') AS population, 332 | to_char(length(t2.city_name), 'RN') AS roman_numeral_length 333 | FROM 334 | cleaned_data.countries AS t1 335 | JOIN 336 | cleaned_data.cities AS t2 337 | ON 338 | t1.country_code_2 = t2.country_code_2 339 | WHERE 340 | t2.city_name = reverse(t2.city_name) 341 | AND 342 | t1.sub_region = 'western asia' 343 | ORDER BY 344 | length(t2.city_name) DESC, t2.city_name ASC; 345 | ``` 346 |
347 |
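The palindrome test above boils down to comparing a string with its own reversal; PostgreSQL's `reverse()` plays the role of Python's slice below. City names are drawn from the expected results:

```python
def is_palindrome(text: str) -> bool:
    # Equivalent to the SQL predicate: city_name = reverse(city_name)
    return text == text[::-1]

assert is_palindrome("hamah")
assert is_palindrome("tut")
assert not is_palindrome("ankara")
```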
348 | 349 | 8. Search with Wildcard and Case 350 | List all of the countries that end in 'stan'. Make your query case-insensitive and list whether the total population of the cities listed is an odd or even number for cities inserted in 2022. Order by whether the population value is odd or even in ascending order and country name in alphabetical order. 351 | 352 |
353 | Click to expand expected results! 354 | 355 | ##### Expected Results: 356 | 357 | country_name|total_population|odd_or_even| 358 | ------------|----------------|-----------| 359 | Afghanistan | 6,006,530 |Even | 360 | Kazakhstan | 4,298,264 |Even | 361 | Kyrgyzstan | 1,017,644 |Even | 362 | Pakistan | 26,344,480 |Even | 363 | Tajikistan | 2,720,953 |Odd | 364 | Turkmenistan| 419,607 |Odd | 365 | Uzbekistan | 3,035,547 |Odd | 366 | 367 |
368 |

369 | 370 |
371 | Click to expand answer! 372 | 373 | ##### Answer 374 | ```sql 375 | SELECT 376 | initcap(t1.country_name) AS country_name, 377 | to_char(sum(t2.population), '99G999G999') total_population, 378 | CASE 379 | WHEN (sum(t2.population) % 2) = 0 380 | THEN 'Even' 381 | ELSE 382 | 'Odd' 383 | END AS odd_or_even 384 | FROM 385 | cleaned_data.countries AS t1 386 | JOIN 387 | cleaned_data.cities AS t2 388 | ON 389 | t1.country_code_2 = t2.country_code_2 390 | WHERE 391 | t1.country_name ILIKE '%stan' 392 | AND 393 | EXTRACT('year' FROM t2.insert_date) = 2022 394 | GROUP BY 395 | t1.country_name 396 | ORDER BY 397 | odd_or_even, country_name; 398 | ``` 399 |
400 |
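Two small building blocks in this question are a case-insensitive suffix match (`ILIKE '%stan'`) and an even/odd test via the modulo operator. A quick Python sketch with invented figures:

```python
# ILIKE '%stan' is a case-insensitive pattern match; here, a suffix check.
countries = ["Afghanistan", "Turkey", "PAKISTAN", "Oman"]
stans = [c for c in countries if c.lower().endswith("stan")]
print(stans)  # ['Afghanistan', 'PAKISTAN']

# Parity of a summed population (population figures invented for the example).
total = sum([1_000_001, 250_004, 1_470_548])
print("Even" if total % 2 == 0 else "Odd")  # Odd
```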
401 | 402 | 9. Ranking Regions 403 | List the third most populated city ranked by region WITHOUT using limit or offset. List the region name, city name, population and order the results by region. 404 | 405 |
406 | Click to expand expected results! 407 | 408 | ##### Expected Results: 409 | 410 | region |city_name|third_largest_pop| 411 | --------|---------|-----------------| 412 | Africa |Kinshasa | 12,836,000 | 413 | Americas|New York | 18,972,871 | 414 | Asia |Delhi | 32,226,000 | 415 | Europe |Paris | 11,060,000 | 416 | Oceania |Brisbane | 2,360,241 | 417 | 418 |
419 |

420 | 421 |
422 | Click to expand answer! 423 | 424 | ##### Answer 425 | ```sql 426 | WITH get_city_rank_cte AS ( 427 | SELECT 428 | t1.region, 429 | t2.city_name, 430 | t2.population AS third_largest_pop, 431 | DENSE_RANK() OVER (PARTITION BY t1.region ORDER BY t2.population DESC) AS rnk 432 | FROM 433 | cleaned_data.countries AS t1 434 | JOIN 435 | cleaned_data.cities AS t2 436 | ON 437 | t1.country_code_2 = t2.country_code_2 438 | WHERE 439 | t2.population IS NOT NULL 440 | GROUP BY 441 | t1.region, 442 | t2.city_name, 443 | t2.population 444 | ) 445 | SELECT 446 | initcap(region) AS region, 447 | initcap(city_name) AS city_name, 448 | to_char(third_largest_pop, '99G999G999') AS third_largest_pop 449 | FROM 450 | get_city_rank_cte 451 | WHERE 452 | rnk = 3; 453 | ``` 454 |
455 |
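The answer above relies on DENSE_RANK(), which assigns consecutive ranks with no gaps after ties, so rank 3 is always the third distinct value per partition. A compact Python sketch of dense ranking with invented populations:

```python
def dense_rank_desc(values):
    """Return the dense rank (1 = largest) for each value; ties share a rank
    and the next distinct value gets the next consecutive rank."""
    order = {v: i + 1 for i, v in enumerate(sorted(set(values), reverse=True))}
    return [order[v] for v in values]

populations = [500, 400, 400, 300, 200]  # invented figures
ranks = dense_rank_desc(populations)
print(ranks)                             # [1, 2, 2, 3, 4]
third = [p for p, r in zip(populations, ranks) if r == 3]
print(third)                             # [300]
```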
456 | 457 | 10. Using Buckets 458 | List the bottom third of all countries in the Western Asia sub-region that speak Arabic. Include the row number and country name. Order by row number. 459 | 460 |
461 | Click to expand expected results! 462 | 463 | ##### Expected Results: 464 | 465 | row_id|country_name | 466 | ------|--------------------| 467 | 9|saudi arabia | 468 | 10|syrian arab republic| 469 | 11|united arab emirates| 470 | 12|yemen | 471 | 472 |
473 |

474 | 475 |
476 | <summary>Click to expand answer!</summary> 477 | 478 | ##### Answer 479 | ```sql 480 | WITH get_ntile_cte AS ( 481 | 	SELECT 482 | 		ROW_NUMBER() OVER (ORDER BY t1.country_name) AS rn, 483 | 		t1.country_name, 484 | 		-- The ntile() window function distributes ordered rows into a specified number of 'buckets'. 485 | 		NTILE(3) OVER (ORDER BY t1.country_name) AS nt 486 | 	FROM 487 | 		cleaned_data.countries AS t1 488 | 	JOIN 489 | 		cleaned_data.languages AS t2 490 | 	ON 491 | 		t1.country_code_2 = t2.country_code_2 492 | 	WHERE 493 | 		t1.sub_region = 'western asia' 494 | 	AND 495 | 		t2.language = 'arabic' 496 | ) 497 | SELECT 498 | 	rn AS row_id, 499 | 	country_name 500 | FROM 501 | 	get_ntile_cte 502 | WHERE 503 | 	nt = 3; 504 | ``` 505 | </details>
506 |
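NTILE(3) deals the ordered rows into three buckets whose sizes differ by at most one, with any remainder going to the earliest buckets; the "bottom third" is then simply bucket 3. A hedged Python sketch of that sizing rule, using placeholder rows:

```python
def ntile(ordered_rows, n):
    """Split ordered rows into n buckets. Mirrors PostgreSQL's NTILE sizing:
    bucket sizes differ by at most one, larger buckets come first."""
    base, extra = divmod(len(ordered_rows), n)
    buckets, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        buckets.append(ordered_rows[start:start + size])
        start += size
    return buckets

rows = list("abcdefghijk")               # 11 ordered placeholder rows
print([len(b) for b in ntile(rows, 3)])  # [4, 4, 3]
print(ntile(rows, 3)[2])                 # ['i', 'j', 'k'] -- the bottom third
```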
507 | 508 | 11. Using Arrays 509 | Create a query that lists country name, capital name, population, languages spoken and currency name for countries in the Northern Africa sub-region. There can be multiple currency names and languages spoken per country. Add multiple values for the same field into an array. 510 | 511 | <details>
512 | Click to expand expected results! 513 | 514 | ##### Expected Results: 515 | 516 | country_name|city_name|population|languages |currencies | 517 | ------------|---------|----------|--------------------------------------------|---------------| 518 | algeria |algiers | 3415811|{french,arabic,kabyle} |algerian dinar | 519 | egypt |cairo | 20296000|{arabic} |egyptian pound | 520 | libya |tripoli | 1293016|{arabic} |libyan dinar | 521 | morocco |rabat | 572717|{arabic,tachelhit,moroccan tamazight,french}|moroccan dirham| 522 | sudan |khartoum | 7869000|{arabic,english} |sudanese pound | 523 | tunisia |tunis | 1056247|{french,arabic} |tunisian dinar | 524 | 525 |
526 |

527 | 528 |
529 | <summary>Click to expand answer!</summary> 530 | 531 | ##### Answer 532 | ```sql 533 | WITH get_row_values AS ( 534 | 	SELECT 535 | 		t1.country_name, 536 | 		t2.city_name, 537 | 		t2.population, 538 | 		-- array_agg() aggregates multiple values and returns them in 'array' format. 539 | 		array_agg(t3.language) AS languages, 540 | 		t4.currency_name AS currencies 541 | 	FROM 542 | 		cleaned_data.countries AS t1 543 | 	JOIN 544 | 		cleaned_data.cities AS t2 545 | 	ON 546 | 		t1.country_code_2 = t2.country_code_2 547 | 	JOIN 548 | 		cleaned_data.languages AS t3 549 | 	ON 550 | 		t1.country_code_2 = t3.country_code_2 551 | 	JOIN 552 | 		cleaned_data.currencies AS t4 553 | 	ON 554 | 		t1.country_code_2 = t4.country_code_2 555 | 	WHERE 556 | 		t1.sub_region = 'northern africa' 557 | 	AND 558 | 		t2.capital = TRUE 559 | 	GROUP BY 560 | 		t1.country_name, 561 | 		t2.city_name, 562 | 		t2.population, 563 | 		t4.currency_name 564 | ) 565 | SELECT 566 | 	* 567 | FROM 568 | 	get_row_values; 569 | ``` 570 | </details>
571 |
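array_agg() collapses the one-row-per-language layout into a single array per country. In Python terms it is a group-then-collect; the (country, language) pairs below are a made-up subset:

```python
from collections import defaultdict

# Hypothetical (country, language) rows, one per spoken language.
rows = [("egypt", "arabic"), ("sudan", "arabic"),
        ("sudan", "english"), ("tunisia", "french"), ("tunisia", "arabic")]

collected = defaultdict(list)
for country, language in rows:
    collected[country].append(language)  # array_agg per grouping key

print(dict(collected))
# {'egypt': ['arabic'], 'sudan': ['arabic', 'english'], 'tunisia': ['french', 'arabic']}
```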
572 | 573 | 12. Using Case and Percentages 574 | Produce a query that returns the city names for cities in the U.S. that were inserted on April 28th, 2022. List how many vowels and consonants are present in the city name and concatenate their percentage to their respective count in parentheses. 575 | 576 | <details>
577 | Click to expand expected results! 578 | 579 | ##### Expected Results: 580 | 581 | city_name |vowel_count_perc|consonants_count_perc| 582 | ---------------|----------------|---------------------| 583 | standish |2 (25.00%) |6 (75%) | 584 | grand forks |2 (18.18%) |9 (81.82%) | 585 | camano |3 (50.00%) |3 (50%) | 586 | cedar hills |3 (27.27%) |8 (72.73%) | 587 | gladstone |3 (33.33%) |6 (66.67%) | 588 | whitehall |3 (33.33%) |6 (66.67%) | 589 | homewood |4 (50.00%) |4 (50%) | 590 | willowbrook |4 (36.36%) |7 (63.64%) | 591 | port salerno |4 (33.33%) |8 (66.67%) | 592 | vadnais heights|5 (33.33%) |10 (66.67%) | 593 | jeffersonville |5 (35.71%) |9 (64.29%) | 594 | 595 |
596 |

597 | 598 |
599 | <summary>Click to expand answer!</summary> 600 | 601 | ##### Answer 602 | ```sql 603 | WITH get_letter_count AS ( 604 | 	SELECT 605 | 		t1.city_name, 606 | 		length(t1.city_name) string_length, 607 | 		-- regexp_replace() returns a value that has been manipulated by a regular expression. 608 | 		length(regexp_replace(t1.city_name, '[aeiou]', '', 'gi')) AS consonant_count 609 | 	FROM 610 | 		cleaned_data.cities AS t1 611 | 	WHERE 612 | 		t1.insert_date = '2022-04-28' 613 | 	AND 614 | 		t1.country_code_2 in ('us') 615 | ), 616 | get_letter_diff AS ( 617 | 	SELECT 618 | 		city_name, 619 | 		string_length - consonant_count AS vowels, 620 | 		round(100 * (string_length - consonant_count) / string_length::NUMERIC, 2) AS vowel_perc, 621 | 		consonant_count AS consonants, 622 | 		round( 100 * (consonant_count)::NUMERIC / string_length, 2)::float AS consonants_perc 623 | 	FROM 624 | 		get_letter_count 625 | ) 626 | SELECT 627 | 	city_name, 628 | 	vowels || ' (' || vowel_perc || '%)' AS vowel_count_perc, 629 | 	consonants || ' (' || consonants_perc || '%)' AS consonants_count_perc 630 | FROM 631 | 	get_letter_diff 632 | ORDER BY 633 | 	vowels; 634 | ``` 635 | </details>
636 |
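The economical trick for this question is to delete the vowels with a regular expression and treat the difference in lengths as the vowel count. The same idea in Python, with `re.sub` standing in for `regexp_replace`; note that any non-vowel character, a space included, lands on the consonant side, which is why "grand forks" shows 9 consonants in the expected results above:

```python
import re

def vowel_consonant_counts(city: str):
    total = len(city)
    # Strip vowels case-insensitively; whatever remains counts as a consonant,
    # spaces included (matching the expected results above).
    consonants = len(re.sub(r"[aeiou]", "", city, flags=re.IGNORECASE))
    vowels = total - consonants
    return vowels, consonants, round(100 * vowels / total, 2)

print(vowel_consonant_counts("standish"))     # (2, 6, 25.0)
print(vowel_consonant_counts("grand forks"))  # (2, 9, 18.18)
```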
637 | 638 | 13. Most Consecutive Days 639 | List the most consecutive inserted dates and the capitalized city names for cities in Canada that were inserted in April 2022. 640 | 641 | <details>
642 | Click to expand expected results! 643 | 644 | ##### Expected Results: 645 | 646 | most_consecutive_dates|city_name | 647 | ----------------------|------------| 648 | 2022-04-22|South Dundas| 649 | 2022-04-23|La Prairie | 650 | 2022-04-24|Elliot Lake | 651 | 2022-04-25|Lachute | 652 | 653 |
654 |

655 | 656 |
657 | Click to expand answer! 658 | 659 | ##### Answer 660 | ```sql 661 | DROP TABLE IF EXISTS get_dates; 662 | CREATE TEMP TABLE get_dates AS ( 663 | SELECT 664 | DISTINCT ON (insert_date) insert_date AS insert_date, 665 | city_name 666 | FROM 667 | cleaned_data.cities 668 | WHERE 669 | country_code_2 = 'ca' 670 | AND 671 | insert_date BETWEEN '2022-04-01' AND '2022-04-30' 672 | ORDER BY 673 | insert_date 674 | ); 675 | 676 | DROP TABLE IF EXISTS get_diff; 677 | CREATE TEMP TABLE get_diff AS ( 678 | SELECT 679 | city_name, 680 | insert_date, 681 | EXTRACT('day' FROM insert_date) - ROW_NUMBER() OVER (ORDER BY insert_date) AS diff 682 | FROM 683 | get_dates 684 | ); 685 | 686 | 687 | DROP TABLE IF EXISTS get_diff_count; 688 | CREATE TEMP TABLE get_diff_count AS ( 689 | SELECT 690 | city_name, 691 | insert_date, 692 | count(*) OVER (PARTITION BY diff) AS diff_count 693 | FROM 694 | get_diff 695 | ); 696 | 697 | 698 | WITH get_rank AS ( 699 | SELECT 700 | DENSE_RANK() OVER (ORDER BY diff_count desc) AS rnk, 701 | insert_date, 702 | city_name 703 | FROM 704 | get_diff_count 705 | ) 706 | SELECT 707 | insert_date AS most_consecutive_dates, 708 | initcap(city_name) AS city_name 709 | FROM 710 | get_rank 711 | WHERE 712 | rnk = 1 713 | ORDER BY 714 | insert_date; 715 | ``` 716 |
717 |
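The answer above uses the classic gaps-and-islands trick: subtract a row number from each date, and the result stays constant across a run of consecutive dates, so each distinct difference marks one "island". A small Python sketch with invented April dates:

```python
from datetime import date

# Invented, de-duplicated insert dates (one per day, already ordered).
dates = [date(2022, 4, 2), date(2022, 4, 3), date(2022, 4, 22),
         date(2022, 4, 23), date(2022, 4, 24), date(2022, 4, 25)]

# day-of-month minus a 1-based row number: constant within a consecutive run.
islands = {}
for row_number, d in enumerate(dates, start=1):
    islands.setdefault(d.day - row_number, []).append(d)

longest_run = max(islands.values(), key=len)
print([d.isoformat() for d in longest_run])
# ['2022-04-22', '2022-04-23', '2022-04-24', '2022-04-25']
```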
718 | 719 | 14. Month over Month in View 720 | Create a view that lists the month-year, the number of cities inserted for that month, a running city count total and the month-over-month percentage growth for 2021. 721 | 722 | Format the cities count and the running total with the thousands separator and format the month-over-month growth with a plus symbol and a percentage symbol. 723 | 724 | Example: 725 | month_year|cities_inserted|running_total|month_over_month| 726 | ----------|---------------|-------------|----------------| 727 | Feb-2021  |     1,291     |    2,762    |+87.76%         | 728 | 729 | <details>
730 | Click to expand expected results! 731 | 732 | ##### Expected Results: 733 | 734 | month_year|cities_inserted|running_total|month_over_month| 735 | ----------|---------------|-------------|----------------| 736 | Jan-2021 | 1,471 | 1,471 | | 737 | Feb-2021 | 1,291 | 2,762 |+87.76% | 738 | Mar-2021 | 1,485 | 4,247 |+53.77% | 739 | Apr-2021 | 1,508 | 5,755 |+35.51% | 740 | May-2021 | 1,509 | 7,264 |+26.22% | 741 | Jun-2021 | 1,395 | 8,659 |+19.20% | 742 | Jul-2021 | 1,394 | 10,053 |+16.10% | 743 | Aug-2021 | 1,481 | 11,534 |+14.73% | 744 | Sep-2021 | 1,452 | 12,986 |+12.59% | 745 | Oct-2021 | 1,446 | 14,432 |+11.14% | 746 | Nov-2021 | 1,378 | 15,810 |+9.55% | 747 | Dec-2021 | 1,472 | 17,282 |+9.31% | 748 | 749 |
750 |

751 | 752 |
753 | Click to expand answer! 754 | 755 | ##### Answer 756 | ```sql 757 | DROP VIEW IF EXISTS cleaned_data.view_2021_growth; 758 | CREATE VIEW cleaned_data.view_2021_growth AS ( 759 | WITH get_month_count AS ( 760 | SELECT 761 | date_trunc('month', insert_date) as single_month, 762 | count(*) AS monthly_count 763 | FROM 764 | cleaned_data.cities 765 | WHERE 766 | EXTRACT('year' FROM insert_date) = 2021 767 | GROUP BY 768 | single_month 769 | ORDER BY 770 | single_month 771 | ), 772 | get_running_total AS ( 773 | SELECT 774 | single_month::date, 775 | monthly_count, 776 | sum(monthly_count) OVER (ORDER BY single_month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_num_cities 777 | FROM 778 | get_month_count 779 | ), 780 | get_month_over_month AS ( 781 | SELECT 782 | single_month, 783 | monthly_count, 784 | total_num_cities, 785 | round(100.0 * ((total_num_cities - Lag(total_num_cities, 1) OVER (ORDER BY single_month)) / Lag(total_num_cities, 1) OVER (ORDER BY single_month))::NUMERIC, 2) AS month_over_month 786 | FROM 787 | get_running_total 788 | ) 789 | SELECT 790 | to_char(single_month, 'Mon-YYYY') AS month_year, 791 | to_char(monthly_count, '9G999') AS cities_inserted, 792 | to_char(total_num_cities, '99G999') AS running_total, 793 | to_char(month_over_month, 'sg99.99') || '%' AS month_over_month 794 | FROM 795 | get_month_over_month 796 | ); 797 | 798 | SELECT 799 | * 800 | FROM 801 | cleaned_data.view_2021_growth; 802 | ``` 803 |
804 |
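One subtlety worth noticing in the expected results: the month-over-month percentage is computed on the running total, not on the monthly count — Feb-2021's +87.76% is (2,762 − 1,471) / 1,471. The arithmetic, sketched in Python from the first three months of the expected results:

```python
# Monthly counts taken from the expected results above.
monthly_counts = {"Jan-2021": 1471, "Feb-2021": 1291, "Mar-2021": 1485}

rows, running, previous = [], 0, None
for month, count in monthly_counts.items():
    running += count
    # Growth of the running total versus last month's running total (LAG in SQL).
    growth = None if previous is None else round(100 * (running - previous) / previous, 2)
    rows.append((month, count, running, growth))
    previous = running

print(rows)
# [('Jan-2021', 1471, 1471, None), ('Feb-2021', 1291, 2762, 87.76), ('Mar-2021', 1485, 4247, 53.77)]
```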
805 | 806 | 15. Stored Procedure to CSV 807 | Create and call a stored procedure that lists a unique row id number, insert date, country name, city name, population and languages spoken for countries in the Latin America and Caribbean sub-region that were inserted on either '2022-04-09', '2022-04-28' or '2022-08-11'. 808 | 809 | Order by the insert date and output the results (including headers) to a CSV file located in [/source_data/csv_output/](../source_data/csv_output/). 810 | 811 | <details>
812 | Click to expand expected results! 813 | 814 | ##### Expected Results: 815 | 816 | Results located in [/source_data/csv_output/output.csv](../source_data/csv_output/output.csv) 817 | 818 |
819 |

820 | 821 |
822 | Click to expand answer! 823 | 824 | ##### Answer 825 | ```sql 826 | CREATE OR REPLACE PROCEDURE cleaned_data.sproc_output () 827 | LANGUAGE plpgsql 828 | AS 829 | $sproc$ 830 | BEGIN 831 | COPY ( 832 | SELECT 833 | ROW_NUMBER() OVER (ORDER BY t1.insert_date) AS row_id, 834 | t1.insert_date, 835 | t2.country_name, 836 | t1.city_name, 837 | t1.population, 838 | array_agg(t3.language) AS languages 839 | FROM 840 | cleaned_data.cities AS t1 841 | JOIN 842 | cleaned_data.countries AS t2 843 | ON 844 | t1.country_code_2 = t2.country_code_2 845 | LEFT JOIN 846 | cleaned_data.languages AS t3 847 | ON 848 | t2.country_code_2 = t3.country_code_2 849 | WHERE 850 | t2.sub_region = 'latin america and the caribbean' 851 | AND 852 | t1.insert_date IN ('2022-04-09', '2022-04-28', '2022-08-11') 853 | GROUP BY 854 | t1.insert_date, 855 | t2.country_name, 856 | t1.city_name, 857 | t1.population 858 | ORDER BY 859 | t1.insert_date 860 | ) 861 | TO '/var/lib/postgresql/source_data/csv_output/output.csv' DELIMITER ',' CSV HEADER; 862 | END 863 | $sproc$; 864 | 865 | -- Call the stored procedure 866 | CALL cleaned_data.sproc_output(); 867 | ``` 868 |
869 |
870 | 871 | :grey_exclamation: Start here if you want to follow along with the [SQL Challenge WALKTHROUGH](./walkthrough/WALKTHROUGH_1_DOCKER.md) :grey_exclamation: 872 | 873 | :exclamation: If you found this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 874 | 875 | 876 | 877 | -------------------------------------------------------------------------------- /docker-compose.yaml: -------------------------------------------------------------------------------- 1 | version: '3.8' 2 | services: 3 |   # This service can be named anything, but we'll name it after the image it runs 4 |   postgres: 5 |     # This will get the latest official PostgreSQL image from DockerHub 6 |     image: postgres 7 |     # The container name chosen is the name of the project 8 |     container_name: sql_coding_challenge 9 |     restart: always 10 |     # This is the name of the env file. This file is hidden and holds environment variables. 11 |     # All values are hard-coded so the file has been commented out. 12 |     # env_file: 13 |     #   - .env 14 |     environment: 15 |       # The database name 16 |       POSTGRES_DB: sql_coding_challenge 17 |       # PostgreSQL Username 18 |       POSTGRES_USER: postgres 19 |       # PostgreSQL Password 20 |       POSTGRES_PASSWORD: postgres 21 |       # The location where PostgreSQL stores its data (Do not change unless you know what you're doing) 22 |       PGDATA: /var/lib/postgresql/data/pgdata 23 |     # Default port that PostgreSQL uses 24 |     ports: 25 |       - "5432:5432" 26 |     volumes: 27 |       - 28 |         # We bind the db directory in our project folder to where PostgreSQL stores its data 29 |         # to keep data persistent when PostgreSQL restarts 30 |         type: bind 31 |         source: ./db 32 |         target: /var/lib/postgresql/data 33 |       - 34 |         # We bind our source_data directory so that the PostgreSQL container can access files and 35 |         # scripts without having to copy them into the container 36 |         type: bind 37 |         source: ./source_data 38 |         target: /var/lib/postgresql/source_data 39 |     networks: 40 |       - reference 41 | networks: 42 |   reference: 
-------------------------------------------------------------------------------- /images/ERD.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/ERD.PNG -------------------------------------------------------------------------------- /images/bash_1.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_1.PNG -------------------------------------------------------------------------------- /images/bash_2.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_2.PNG -------------------------------------------------------------------------------- /images/bash_3.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_3.PNG -------------------------------------------------------------------------------- /images/bash_4.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_4.PNG -------------------------------------------------------------------------------- /images/bash_5.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_5.PNG -------------------------------------------------------------------------------- /images/bash_6.PNG: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_6.PNG -------------------------------------------------------------------------------- /images/terminal_1.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/terminal_1.PNG -------------------------------------------------------------------------------- /images/terminal_2.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/terminal_2.PNG -------------------------------------------------------------------------------- /source_data/csv_data/countries.csv: -------------------------------------------------------------------------------- 1 | country_name,country_code_2,country_code_3,region,sub_region,intermediate_region 2 | !afg!hANistan,af?,afg,asia,$southern asia, 3 | alba$nia,aL,alb,europe!,southern *europe, 4 | Alb?ania,AL,alb,eur#ope,$southern e#urope, 5 | ?algeria?,d!z,dza,africa,northern africa, 6 | americ#an samoa?,as,as!m,o##ceania,polyne$sia, 7 | andoRra,Ad,and,europe,southern europe, 8 | cyprus ,cy,cyp,asia,western asia, 9 | angola,ao,ago,africa,sub-saharan africa,middle africa 10 | ang!uilla,ai,aia,americas,latin ame!rica and the caribbean,caribbean 11 | antarctica,aq,ata,antartica,, 12 | antigua and barbuda,ag,atg,americas,latin america and the caribbean,caribbean 13 | argentina,ar,arg,americas,latin america and the caribbean,south america 14 | armenia,a?m,arm,asia,western asia, 15 | aruba,aw,abw,americas,latin america and the caribbean,caribbean 16 | australia,au,aus,oceania,australia and new zealand, 17 | austria,at,aut,europe,western europe, 18 | azerbaijan,az,aze,asia,western asia, 19 | bahamas,bs,bhs,americas,latin america and the caribbean,caribbean 20 | 
bahrain,bh,bhr,asia,western asia, 21 | bangladesh,bd,bgd,asia,southern asia, 22 | barbados,bb,brb,americas,latin america and the caribbean,caribbean 23 | belarus,by,blr,europe,eastern europe, 24 | b!elgium,be,bel,europe,western europe, 25 | belize,bz,blz,americas,latin america and the caribbean,central america 26 | benin,bj,ben,africa,sub-saharan africa,western africa 27 | bermuda,bm,bmu,americas,northern america, 28 | bhutan,bt,btn,asia,southern asia, 29 | azerbaijan,az,aze,asia,western asia, 30 | bolivia,bo,bol,americas,latin america and the cari$bbean,south america 31 | "bonaire, sint eustatius and saba",bq,bes,americas,latin america and the caribbean,caribbean 32 | bosnia and herze$govina,ba,bih,europe,southern europe, 33 | botswana,bW,bwa,africa,sub-saharan africa,southern africa 34 | bouvet island,bv,bvt,americas,latin america and the caribbean,south america 35 | brazil,br,bra,americas,latin america and the caribbean,south america 36 | british indian ocean territory,io,iot,africa,sub-saharan africa,eastern africa 37 | brunei darussalam,bn,brn,asia,south-eastern asia, 38 | bulgaria,bg,bgr,europe,eastern europe, 39 | bu$rkina faso,bf,bfa,africa,sub-saharan africa,western africa 40 | burundi,bi,bdi,africa,sub-saharan africa,eastern africa 41 | cabo verde,cv,cpv,africa,sub-saharan africa,west$ern africa 42 | cambodia,kh,khm,asia,south-eastern asia, 43 | cyprus,cy,cyp,asia,western asia, 44 | cameroon,c$m,cmr,africa,sub-saharan africa,middle africa 45 | canada,ca,can,americas,northern america, 46 | cayman islands,ky,cym,americas,latin america and the caribbean,caribbean 47 | central african republic,cf,caf,africa,sub-saharan africa,middle africa 48 | chad,td,tcd,africa,sub-saharan africa,middle africa 49 | chile,cl,chl,americas,latin america and the caribbean,south america 50 | china ,cn,chn,asia,eastern asia, 51 | christmas island,cx,cxr,oceania,australia and new zealand, 52 | cocos islands,cc,cck,oceania,australia and new z?ealand, 53 | 
colombia,co,col,americas,latin america and the caribbean,south america 54 | com!oros,km,com,africa,sub-saharan africa,eastern africa 55 | republic of congo,cg,cog,africa,sub-saharan africa,middle africa 56 | democratic republic of the congo,cd,cod,africa,sub-saharan africa,middle africa 57 | cook islands,ck,cok,oceania,polynesia, 58 | costa rica,cr,cri,americas,latin america and the caribbean,central america 59 | ivory coast,ci,civ,africa,sub-saharan africa,western africa 60 | !!croatia,Hr,hrv,europe,southern europe, 61 | cuba,cu,cub,americas,latin america and the caribbean,caribbean 62 | curacao,cw,cuw,americas,latin a!merica and the caribbean,caribbean 63 | cyprus,cy,cyp,asia,western asia, 64 | czechia,cz,cze,europe,eastern europe, 65 | denmark,dk,dnk,europe,northern europe, 66 | djibouti,dj,dji,africa,sub-saharan africa,eastern africa 67 | dominica,dm,dma,americas,latin america and the caribbean,caribbean 68 | dominican republic,do,dom,americas,latin america and the caribbean,caribbean 69 | ecua$dor,ec,ecu,americas,latin america and the caribbean,south america 70 | egypt,eg,egy,africa,northern africa, 71 | el salvador,sv,slv,americas,latin america and the caribbean,central america 72 | equatorial guinea,gq,gnq,africa,sub-saharan africa,middle africa 73 | eritrea,er,eri,africa,sub-saharan africa,eastern africa 74 | estonia,ee,est,europe,northern europe, 75 | eswatini,s?z,swz,africa,sub-saharan africa,southern africa 76 | ethiopia,!et,eth,africa,sub-saharan africa,eastern africa 77 | falkland islands (malvinas),fk,flk,americas,latin america and the caribbean,south america 78 | faroe islands,fo,fro,europe,northern europe, 79 | fiji,fj,fji,oceania,melanesia, 80 | finland,fi,fin,europe,northern europe, 81 | france,fr,fra,europe,western europe, 82 | french guiana ,gf,guf,americas,latin america and the caribbean,south america 83 | french polynesia,pf,pyf,oceania,polynesia, 84 | macao,mo,mac,asia,eastern asia, 85 | french southern territories,tf,atf,africa,sub-saharan 
africa,eastern africa 86 | gabon,ga,gab,africa,sub-saharan africa,middle africa 87 | gambia,gm,gmb,africa,sub-saharan africa,western africa 88 | georgia,g!e,geo,asia,western asia, 89 | germany,de,deu,europe,western europe, 90 | ghana,gh,gha,africa,sub-saharan africa,western africa 91 | gibraltar,gi,gib,europe,southern europe, 92 | !china,cn,chn,asia,eastern asia, 93 | greece,gr,grc,europe,southern europe, 94 | greenland,gl,grl,americas,northern america, 95 | grenada,gd,grd,americas,la?tin america and the caribbean,caribbean 96 | guadeloupe,gp,glp,americas,latin america and the caribbean,caribbean 97 | guam,gu,gum,oceania,micronesia, 98 | guatemala,gt,gtm,americas,latin america and the caribbean,central america 99 | guinea,gn,gin,africa,sub-saharan africa,western africa 100 | guinea-bissau,gw,gnb,africa,sub-saharan africa,western africa 101 | guy$ana,gY,guy,americas,latin america and the caribbean,south america 102 | haiti,ht,hti,americas,latin america and the caribbean,carib!bean 103 | holy see,va,vat,europe,southern europe, 104 | honduras,hn,hnd,americas,latin america and the caribbean,central america 105 | hong kong,hk,hkg,asia,eastern asia, 106 | hungary,hu,hun,europe,eastern europe, 107 | iceland,is,isl,europe,northern europe, 108 | india,i!n,ind,asia,southern asia, 109 | indonesia,id,idn,asia,south-eastern asia, 110 | iran,ir,irn,asia,southern asia, 111 | iraq,iq,irq,asia,western asia, 112 | ireland,ie,irl,europe,northern europe, 113 | Ireland,ie,irl,europe,northern europe, 114 | isle of man,im,imn,europe,northern europe, 115 | israel,il,isr,asia,western asia, 116 | italy,it,ita,europe,southern europe, 117 | china,cn,chn,asia,eastern asia, 118 | jamaica,jm,jam,americas,latin america and the caribbean,caribbean 119 | japan,jp,jpn,asia,eastern asia, 120 | jersey,je,jey,europe,northern europe,channel islands 121 | jordan ,jo,jor,asia,western asia, 122 | kazakhstan ,kz,kaz,asia,central asia, 123 | kenya,ke,ken,africa,sub-saharan africa,eastern africa 124 | 
kiribati,ki,kir,oc!eania,micronesia, 125 | north korea,kp,prk,asia,eastern asia, 126 | south korea,kr,kor,asia,eastern asia, 127 | kUWait,kw,kwt,asia,western asia, 128 | kyrgyzstan,kg,kgz,asia,central asia, 129 | laos,la,lao,asia,south-eastern asia, 130 | latvia,lv,lva,europe,northern europe, 131 | lebanon,lb,lbn,asia,western asia, 132 | lesotho,ls,lso,africa,sub-saharan africa,southern africa 133 | liberia,lr,lbr,africa,sub-saharan africa,western africa 134 | libya,ly,lby,afr?ica,northern africa, 135 | liechtenstein,li,lie,europe,western europe, 136 | li!thuania,lt,ltu,europe,northern europe, 137 | luxembourg,lu,lux,europe,western europe, 138 | macao,mo,mac,asia,eastern asia, 139 | kUWAit,kw,kwt,asia,western asia, 140 | madagascar?!,mg,mdg,africa,sub-saharan africa,eastern africa 141 | malawi,mw,mwi,africa,sub-saharan africa,eastern africa 142 | malaysia,my,mys,asia,south-eastern asia, 143 | maldives,mv,mdv,asia,southern asia, 144 | mali,ml,mli,africa,sub-saharan africa,western africa 145 | malta ,mt,mlt,europe,southern europe, 146 | marshall islands,mh,mhl,oceania,micronesia, 147 | martinique,mq,mtq,americas,latin america and the caribbean,caribbean 148 | mauritania,mr,mrt,africa,sub-saharan africa,western africa 149 | mauritius,mu,mus,africa,sub-saharan africa,eastern africa 150 | mayotte,yt,myt,africa,sub-saharan africa,eastern africa 151 | libYa,ly,lby,africa,northern africa, 152 | mexico,mx,mex,americas,latin america and the caribbean,central america 153 | micronesia (federated states of),fm,fsm,oceania,micronesia, 154 | "moldova, republic of",md,mda,europe,eastern europe, 155 | monaco,mc,mco,euro#pe,western europe, 156 | mongolia,mn,mng,asia,eastern asia, 157 | mo#ntenegro,me,mne,europe,southern europe, 158 | montserrat,ms,msr,americas,latin america and the caribbean,caribbean 159 | morocco,ma,mar,africa,northern africa, 160 | mozambique,mz,moz,africa,sub-saharan africa,eastern africa 161 | myanmar,mm,mmr,asia,south-eastern asia, 162 | 
nAmibia,na,nam,africa,sub-saharan africa,southern africa 163 | nauru ,nr,nru,oceania,micronesia, 164 | nepal,np,npl,asia,southern asia, 165 | marshall islands,mh,mhl,oceania,micronesia, 166 | netherlands,nl,nld,europe,western europe, 167 | new caledonia,nc,ncl,oceania,melanesia, 168 | new zea!land,nz,nzl,oceania,australia and new zealand, 169 | nicaragua,ni,nic,americas,latin america and the caribbean,central america 170 | niger,ne,ner,africa,sub-saharan africa,western africa 171 | nigeria,ng,nga,africa,sub-saharan africa,western africa 172 | niue,nu,niu,oceania,polynesia, 173 | norfolk island,nf,nfk,oceania,australia and new zealand, 174 | reunion,re,reu,africa,sub-saharan africa,eastern africa 175 | north macedonia,mk,mkd,europe,southern europe, 176 | sAudi arabia,sa,sau,asia,western asia, 177 | northern mariana islands,mp,mnp,oceania,micronesia, 178 | norway,no,nor,europe,northern europe, 179 | !!oman,om,omn,asia,western asia, 180 | pa$kistan,pk,pak,asia,southern asia, 181 | palau,pw,plw,oceania,micronesia, 182 | panama,pa,pan,americas,latin america and the caribbean,central america 183 | papua new guinea,pg,png,oceania,melanesia, 184 | paraguay,py,pry,americas,latin america and the caribbean,south america 185 | peru,pe,per,americas,latin america and the caribbean,south america 186 | philippines,ph,phl,asia,south-eastern as$ia, 187 | pitcairn,pn,pcn,oceania,polynesia, 188 | poland,pl,pol,europe,eastern europe, 189 | portugal,pt,prt,europe,southern europe, 190 | philippines ,p?h,phl,asia,south-eastern asia, 191 | puerto rico,pr,pri,americas,latin america and the caribbean,caribbean 192 | qatar,qa,qat,asia,western asia, 193 | r!eunion,re,reu,africa,sub-saharan africa,eastern africa 194 | romania,RO,rou,europe,eastern europe, 195 | russian federation,ru,rus,europe,eastern europe, 196 | rwanda,rw,rwa,africa,sub-saharan africa,eastern africa 197 | saint barthelemy,bl,blm,americas,latin america and the caribbean,caribbean 198 | "saint helena, ascension and tristan da 
cunha",sh,shn,africa,sub-saharan africa,western africa 199 | saint kitts and nevis,kn,kna,americas,latin america and the caribbean,caribbean 200 | saint lucia,lc,lca,americas,latin america and the caribbean,caribbean 201 | saint martin (french part),mf,maf,americas,latin america and the caribbean,caribbean 202 | saint pierre and miquelon,pm,spm,americas,northern america, 203 | $saint vincent and the grenadines,vc,vct,americas,latin america and the caribbean,caribbean 204 | samoa,ws,wsm,oceania,polynesia, 205 | san marino,sm,smr,europe,southern europe, 206 | sao tome and principe,st,stp,africa,sub-saharan africa,middle africa 207 | saudi Arabia,sa,sau,asia,western asia, 208 | Senegal,sn,sen,africa,sub-saharan africa,western africa 209 | serbia,rs,srb,europe,southern europe, 210 | seychelles,sc,syc,africa,sub-saharan africa,eastern africa 211 | sierra leone,sl,sle,africa,sub-saharan africa,western africa 212 | zimbabwe?,zw,zwe,africa,sub-saharan africa,eastern africa 213 | singapore,sg,sgp,asia,south-eastern asia, 214 | sint ma!arten (dutch part),sx,sxm,americas,latin america and the caribbean,caribbean 215 | slovakia,sk,svk,europe,eastern europe, 216 | slovenia,si,svn,europe,southern europe, 217 | solomon islands,sb,slb,oceania,melanesia, 218 | somalia,so,som,africa,sub-saharan africa,eastern africa 219 | south africa,Za,zaf,africa,sub-saharan africa,s$outhern africa 220 | south georgia and the south sandwich islands,gs,sgs,americas,latin america and the caribbean,south america 221 | south sudan,ss,ssd,africa,sub-saharan africa,eastern africa 222 | spain,es,esp,europe,southern europe, 223 | sri lanka,lk,lka,asia,southern asia, 224 | sudan,sd,sdn,africa,northern africa, 225 | suriname,sr,sur,americas,latin america and the caribbean,south america 226 | sweden,se,swe,europe,northern europe, 227 | switzer*land,ch,che,euro?pe,western europe, 228 | syrian arab republic,sy,syr,asia,western asia, 229 | taiwan,tw,twn,asia,eastern asia, 230 | tajikistan,tj,tjk,asia,central 
asia, 231 | "tanzania, united republic of",tz,tza,africa,sub-saharan africa,eastern africa 232 | thailand,th,tha,asia,south-eastern asia, 233 | timor-leste,tl!,tls,asia,south-eastern asia, 234 | togo,tg,tgo,africa,sub-saharan africa,western africa 235 | tonga,to,ton,oceania,polynesia, 236 | trinidad and tobago,tt,tto,americas,latin america and the caribbean,caribbean 237 | tunisia,tn,tun,africa,northern africa, 238 | tur?key,tr,tur,asia,western asia, 239 | turkmenistan,tm,tkm,asia,central asia, 240 | turks and caicos islands,tc,tca,americas,latin america and the caribbean,caribbean 241 | tuvalu,tv,tuv,oceania,polynesia, 242 | uganda,ug,uga,africa,sub-saharan africa,eastern africa 243 | serbia,rs,srb,eur!ope,southern eur?ope, 244 | ukraine,ua,ukr,europe,eastern europe, 245 | united arab emirates,ae,are,asia,western asia, 246 | united kingdom,gb,gbr,europe,northern europe, 247 | united states of america,us,usa,americas,northern america, 248 | uruguay,uy,ury,americas,latin america and the caribbean,south america 249 | uzbe?kistan,uz,uzb,asia,central asia, 250 | vanuatu,vu,vut,oceania,melanesia, 251 | *venezuela (bolivarian republic of),ve,ven,americas,latin america and the caribbean,south america 252 | viet nam,vn,vnm,asia,south-eastern asia, 253 | virgin islands (british),vg,vgb,americas,latin america and the caribbean,caribbean 254 | virgin islands (u.s.),vi,vir,americas,latin america and the caribbean,caribbean 255 | wallis and futuna,wf,wlf,oceania,polynesia, 256 | yemen,ye,yem,asia,western asia, 257 | zambia#,zm,zmb,africa,sub-saharan africa,eastern africa 258 | zim$babwe,zw,zwe,africa,sub-saharan africa,eastern africa 259 | -------------------------------------------------------------------------------- /source_data/csv_data/currencies.csv: -------------------------------------------------------------------------------- 1 | country_code_2,currency_name,currency_code 2 | aF,afghani,afn 3 | al,lek,all 4 | dz,algerian dinar,dzd 5 | as,us dollar,usd 6 | ad,euro,eur 
7 | ao,kwanza,aoa 8 | ai,east caribbean dollar,xcd 9 | ag,east caribbean dollar,xcd 10 | ar,argentine peso,ars 11 | am,armenian dram,amd 12 | aw,aruban florin,awg 13 | au,australian $dollar,aud 14 | at,euro,eur 15 | az,azerbaijan manat,azn 16 | bs,bahamian dollar,bsd 17 | bh,bahraini dinar,bhd 18 | bd,taka,bdt 19 | bb,barbados dollar,bbd 20 | by,belarusian ruble,byn 21 | be,euro,eur 22 | bz,belize dollar,bzd 23 | bj,cfa franc bceao,xof 24 | !bm,bermudian dollar,bmd 25 | bt,indian rupee,inr 26 | bt,ngultrum,btn 27 | bo,boliviano,bob 28 | bo,mvdol,bov 29 | ba,convertible mark,bam 30 | bw,pula,bwp 31 | br,brazilian real,brl 32 | io,us dollar,usd 33 | bn,brunei dollar,bnd 34 | bg,bulgarian lev,bgn 35 | bf,cfa franc bceao,xof 36 | bi,burundi franc,bif 37 | cv,cabo verde escudo,cve 38 | kh,riel,khr 39 | cm,cfa franc beac,xaf 40 | ca,canadian dollar,cad 41 | ky,cayman islands dollar,kyd 42 | cf,cfa franc beac,xaf 43 | td,cfa franc beac,xaf 44 | cl,chilean peso,clp 45 | cl,unidad de fomento,clf 46 | cn,yuan renminbi,cny 47 | cx,australian dollar,aud 48 | cc,australian dollar,aud 49 | co,colombian peso,cop 50 | co,unidad de valor real,cou 51 | km,comorian franc ,kmf 52 | cd,congolese franc,cdf 53 | cg,cfa franc beac,xaf 54 | ck,new zealand dollar,nzd 55 | cr,costa rican colon,crc 56 | ci,cfa franc bceao,xof 57 | hr,euro,eur 58 | cU,cuban peso,cup 59 | cu,peso convertible,cuc 60 | cw,netherlands antillean guilder,ang 61 | cy,euro,eur 62 | cz,czech koruna,czk 63 | dk,danish krone,dkk 64 | dj,djibouti franc,djf 65 | dm,east caribbean dollar,xcd 66 | do,dominican peso,dop 67 | ec,us $dollar,usd 68 | eg,egyptian pound,egp 69 | sv,el salvador colon,svc 70 | sv,us dollar,usd 71 | gq,cfa franc beac,xaf 72 | er,nakfa,ern 73 | ee,euro,eur 74 | sz,lilangeni,szl 75 | et,ethiopian birr,etb 76 | fk,falkland islands pound,fkp 77 | fo,danish krone,dkk 78 | fj,fiji dollar,fjd 79 | fi,euro,eur 80 | fr,euro,eur 81 | gf,euro,eur 82 | pf,cfp franc,xpf 83 | Ga,cfa franc beac,xaf 84 | 
gm,dalasi,gmd 85 | ge,lari,gel 86 | de,euro,eur 87 | gh,ghana cedi,ghs 88 | gi,gibraltar pound,gip 89 | gr,euro,eur 90 | gl,danish krone,dkk 91 | gd,east caribbean dollar,xcd 92 | gp,euro,eur 93 | gu,us dollar,usd 94 | gt,quetzal,gtq 95 | gn,guinean franc,gnf 96 | gw,cfa franc bceao,xof 97 | gn,guyana dollar,gyd 98 | ht,gourde,htg 99 | ht,us dollar,usd 100 | va,euro,eur 101 | hn,lempira,hnl 102 | hk,hong kong $dollar,hkd 103 | hu,forint,huf 104 | is,iceland krona,isk 105 | in,indian rupee,inr 106 | id!,rupiah,idr 107 | ir,iranian rial,irr 108 | iq,iraqi dinar,iqd 109 | ie,euro,eur 110 | im,pound sterling,gbp 111 | il,new israeli sheqel,ils 112 | it,euro,eur 113 | jm,jamaican dollar,jmd 114 | jp,yen,jpy 115 | je,pound sterling,gbp 116 | jo,jordanian dinar,jod 117 | kz,tenge,kzt 118 | ke,kenyan shilling,kes 119 | ki,australian dollar,aud 120 | kp,north korean won,kpw 121 | kr,won,krw 122 | kw,kuwaiti dinar,kwd 123 | kg,som,kgs 124 | la,lao kip,lak 125 | lv,euro,eur 126 | lb,lebanese pound,lbp 127 | ls,loti,lsl 128 | ls,rand,zar 129 | lr,liberian dollar,lrd 130 | ly,libyan dinar,lyd 131 | li,swiss franc,chf 132 | !!lt,euro,eur 133 | lu,euro,eur 134 | mo,pataca,mop 135 | mk,denar,mkd 136 | mg,malagasy ariary,mga 137 | mw,malawi !kwacha,mwk 138 | my,malaysian ringgit,myr 139 | mv,rufiyaa,mvr 140 | ml,cfa franc bceao,xof 141 | mt,euro,eur 142 | mh,us dollar,usd 143 | mq,euro,eur 144 | mr,ouguiya,mru 145 | mu,mauritius rupee,mur 146 | yt,euro,eur 147 | mx,mexican peso,mxn 148 | mx,mexican unidad de inversion (udi),mxv 149 | fm,us dollar,usd 150 | md,moldovan leu,mdl 151 | mc,euro,eur 152 | mN,tugrik,mnt 153 | me,euro,eur 154 | ms,east caribbean dollar,xcd 155 | ma,moroccan dirham,mad 156 | mz,mozambique metical,mzn 157 | mm,kyat,mmk 158 | na,namibia dollar,nad 159 | na,rand,zar 160 | nr,australian dollar,aud 161 | np,nepalese $rupee,npr 162 | nl,euro,eur 163 | nc,cfp franc,xpf 164 | nz,new zealand dollar,nzd 165 | ni,cordoba oro,nio 166 | ni,cfa franc bceao,xof 167 | 
ng,naira,ngn 168 | nu,new zealand dollar,nzd 169 | nf,australian dollar,aud 170 | mp,us dollar,usd 171 | no,norwegian krone,nok 172 | om,rial omani,omr 173 | pk,pakistan rupee,pkr 174 | pw,us dollar,usd 175 | pa,balboa,pab 176 | PA,us dollar,usd 177 | pg,kina,pgk 178 | py,guarani,pyg 179 | pe,sol,pen 180 | ph,philippine peso,php 181 | pn,new zealand dollar,nzd 182 | pl,zloty,pln 183 | pt,euro,eur 184 | pr,us dollar,usd 185 | qa,qatari rial,qar 186 | re,euro,eur 187 | ro,roman?ian leu,ron 188 | ru,russian ruble,rub 189 | rw,rwanda franc,rwf 190 | bl,euro,eur 191 | sh,saint helena pound,shp 192 | kn,east caribbean dollar,xcd 193 | ls,east caribbean dollar,xcd 194 | mf,euro,eur 195 | pm,euro,eur 196 | vc,east caribbean dollar,xcd 197 | ws,tala,wst 198 | sm,euro,eur 199 | st,dobra,stn 200 | sa,saudi riyal,sar 201 | sn,cfa franc bceao,xof 202 | rs!,serbian dinar,rsd 203 | sc,seychelles rupee,scr 204 | sl,leone,sll 205 | sl,leone,sle 206 | sg,singapore dollar,sgd 207 | sx,netherlands antillean guilder,ang 208 | sk,euro,eur 209 | sk,euro,eur 210 | sb,solomon islands dollar,sbd 211 | so,somali shilling,sos 212 | za,rand,zar 213 | gs,no universal currency, 214 | ss,south sudanese pound,ssp 215 | es,euro,eur 216 | lk,sri lanka rupee,lkr 217 | sd,sudanese ?pound,sdg 218 | sr,surinam dollar,srd 219 | se,swedish krona,sek 220 | ch,swiss franc,chf 221 | ch,wir euro,che 222 | ch,wir franc,chw 223 | sy,syrian pound,syp 224 | tw,new taiwan dollar,twd 225 | tj,somoni,tjs 226 | tz,tanzanian shilling,tzs 227 | th,baht,thb 228 | tl,us dollar,usd 229 | Tg,cfa franc bceao,xof 230 | to,pa'anga,top 231 | tt,trinidad and tobago dollar,ttd 232 | tn,tunisian dinar,tnd 233 | tr,turkish lira,try 234 | tm,turkmenistan new manat,tmt 235 | tc,us dollar,usd 236 | tv,australian dollar,aud 237 | ug,uganda shilling,ugx 238 | uA,hryvnia,uah 239 | ae,uae dirham,aed 240 | gb,pound sterling,gbp 241 | us,us dollar,usd 242 | uy,peso uruguayo,uyu 243 | uy,uruguay peso en unidades indexadas (ui),uyi 244 | 
uy,unidad previsional,uyw 245 | uz,uzbekistan sum,uzs 246 | vu,vatu,vuv 247 | ve,bolivar soberano,ves 248 | ve,bolivar soberano,ved 249 | vn,dong,vnd 250 | vg,us d$ollar,usd 251 | vi,us dollar,usd 252 | wf,cfp franc,xpf 253 | ye,yemeni rial,yer 254 | zm,zambian kwacha,zmw 255 | zw,zimbabwe dollar,zwl 256 | -------------------------------------------------------------------------------- /source_data/csv_data/languages.csv: -------------------------------------------------------------------------------- 1 | language,country_code_2 2 | pashto,af 3 | persian,af 4 | u?zbek,af 5 | albanian,aL 6 | english,al 7 | arabic,dz 8 | french,dz 9 | kabyle,dz 10 | english,as 11 | catalan,ad 12 | english,ad 13 | lingala,ao 14 | portuguese,ao 15 | english,ai 16 | spanish,ai 17 | english,ag 18 | spanish,ag 19 | english,ar 20 | spanish,ar 21 | armenian,am 22 | dutch,aw 23 | spanish,aw 24 | english,au 25 | english,at 26 | german,at 27 | azerbaijani,az 28 | english,bs 29 | spanish,bs 30 | arabic,bh 31 | bangla,bd 32 | chakma,bD 33 | english,bd 34 | english,bb 35 | spanish,bb 36 | be?larusian,by 37 | russian,by 38 | dutch,be 39 | english,be 40 | french,be 41 | german,be 42 | walloon,be 43 | english,bz 44 | spanish,bz 45 | french,bj 46 | yoruba,bj 47 | english,bm 48 | spanish,bm 49 | dzongkha,bt 50 | quechua,bo 51 | spanish,bo 52 | bosnian,ba 53 | croatian,ba 54 | english,ba 55 | serbian,ba 56 | english,bw 57 | tswana,bw 58 | english,br 59 | portuguese,br 60 | spanish,br 61 | english,io 62 | english,vg 63 | spanish,vg 64 | malay,bn 65 | malay,bn 66 | bulgarian,bg 67 | english,bg 68 | french,bf 69 | fulah,bf 70 | english,bi 71 | french,bi 72 | rundi,bi 73 | khmer,kh 74 | aghem,cm 75 | bafia,cm 76 | basaa,cm 77 | duala,cm 78 | english,cm 79 | ewondo,cm 80 | french,cm 81 | fulah,cm 82 | kako,cm 83 | kwasio,cm 84 | meta,cm 85 | mundang,cm 86 | ngiemboon,cm 87 | ngomba,cm 88 | yangben,cm 89 | english,ca 90 | french,ca 91 | inuktitut,ca 92 | mohawk,ca 93 | spanish,ca 94 | kabuverdianu,cv 95 | 
portuguese,cv 96 | dutch,bq 97 | spanish,bq 98 | english,ky 99 | spanish,ky 100 | french,cf 101 | lingala,cf 102 | sango,cf 103 | arabic,td 104 | french,td 105 | english,cl 106 | mapuche,cl 107 | spanish,cl 108 | cantonese,cn 109 | chinese,cn 110 | chinese,cn 111 | english,cn 112 | sichuan yi,cn 113 | tibetan,cn 114 | uyghur,cn 115 | english,cx 116 | english,cc 117 | english,co 118 | spanish,co 119 | arabic,km 120 | french,km 121 | french,cg 122 | lingala,cg 123 | french,cd 124 | lingala,cd 125 | luba-katanga,cd 126 | swahili,cd 127 | english,ck 128 | spanish,cr 129 | french,ci 130 | croatian,hr 131 | english,hr 132 | spanish,cu 133 | dutch,cw 134 | spanish,cw 135 | english,cy 136 | greek,cy 137 | turkish,cy 138 | czech,cz 139 | english,cz 140 | danish,dk 141 | english,dk 142 | faroese,dk 143 | arabic,dj 144 | french,dj 145 | somali,dj 146 | english,dm 147 | spanish,dm 148 | spanish,do 149 | quechua,ec 150 | spanish,ec 151 | arabic,eg 152 | spanish,sv 153 | french,gq 154 | portuguese,gq 155 | spanish,gq 156 | arabic, er 157 | blin,er 158 | english,er 159 | geez,er 160 | tigre,er 161 | tigrinya,er 162 | english,ee 163 | estonian,ee 164 | english,sz 165 | swati,sz 166 | amharic,et 167 | geez,et 168 | oromo,et 169 | somali,et 170 | tigrinya,et 171 | wolaytta,et 172 | english,fk 173 | spanish,fk 174 | faroese,fo 175 | english,fj 176 | english,fi 177 | finnish,fi 178 | inari sami,fi 179 | northern sami,fi 180 | swedish,fi 181 | breton,fr 182 | catalan,fr 183 | corsican,fr 184 | english,fr 185 | french,fr 186 | occitan,fr 187 | portuguese,fr 188 | swiss german,fr 189 | french,gf 190 | spanish,gf 191 | french,pf 192 | french,ga 193 | english,gm 194 | fulah,gm 195 | georgian,ge 196 | ossetic,ge 197 | colognian,de 198 | english,de 199 | german,de 200 | low german,de 201 | lower sorbian,de 202 | upper sorbian,de 203 | akan,gh 204 | english,gh 205 | ewe,gh 206 | fulah,gh 207 | ga,gh 208 | hausa,gh 209 | english,gi 210 | english,gr 211 | greek,gr 212 | danish,gl 213 | 
kalaallisut,gl 214 | spanish,gl 215 | english,gd 216 | spanish,gd 217 | french,gp 218 | spanish,gp 219 | e!nglish,gu 220 | spanish,gt 221 | fulah,gw 222 | portuguese,gw 223 | french,gn 224 | fulah,gn 225 | kpelle,gn 226 | english,gy 227 | spanish,gy 228 | french,ht 229 | spanish,ht 230 | spanish,hn 231 | can#tonese,hk 232 | chinese,hk 233 | chinese,hk 234 | english,hk 235 | english,hu 236 | hungarian,hu 237 | english,is 238 | icelandic,is 239 | assamese,in 240 | bangla,in 241 | bodo,in 242 | chakma,in 243 | english,in 244 | gujarati,in 245 | hindi,in 246 | kannada,in 247 | kashmiri,in 248 | konkani,in 249 | malayalam,in 250 | manipuri,in 251 | marathi,in 252 | nepali,in 253 | odia,in 254 | punjabi,in 255 | sanskrit,in 256 | santali,in 257 | tamil,in 258 | telugu,in 259 | tibetan,in 260 | urdu,in 261 | english,id 262 | indonesian,id 263 | javanese,id 264 | "kurdish, sorani",ir 265 | mazanderani,ir 266 | northern luri,ir 267 | persian,ir 268 | arabic,iq 269 | "kurdish, sorani",iq 270 | northern luri,iq 271 | syriac,iq 272 | english,ie 273 | irish,ie 274 | english,iM 275 | manx,im 276 | arabic,il 277 | english,il 278 | hebrew,il 279 | catalan,it 280 | english,it 281 | friulian,it 282 | german,it 283 | italian,it 284 | sardinian,it 285 | sicilian,it 286 | english,jm 287 | english,jp 288 | japanese,jp 289 | english,je 290 | arabic,jo 291 | kazakh,kz 292 | russian,kz 293 | embu,ke 294 | english,ke 295 | gusii,ke 296 | kalenjin,ke 297 | kamba,ke 298 | kikuyu,ke 299 | luo,ke 300 | luyia,ke 301 | masai,ke 302 | meru,ke 303 | oromo,ke 304 | samburu,ke 305 | somali,ke 306 | swahili,ke 307 | taita,ke 308 | teso,ke 309 | english,ki 310 | arabic,kw 311 | kyrgyz,kg 312 | russian,kg 313 | lao,la 314 | english,lv 315 | latvian,lv 316 | arabic,lb 317 | english,ls 318 | southern sotho,ls 319 | english,lr 320 | fulah,lr 321 | kpelle,lr 322 | vai,lr 323 | arabic,ly 324 | german,li 325 | swiss german,li 326 | english,lt 327 | lithuanian,lt 328 | english,lu 329 | french,lu 330 | 
german,lu 331 | luxembourgish,lu 332 | portuguese,lu 333 | chinese,mo 334 | chinese,mo 335 | english,mo 336 | portuguese,mo 337 | english,mg 338 | french,mg 339 | malagasy,mg 340 | english,mw 341 | nyanja,mw 342 | english,my 343 | malay,my 344 | tamil,my 345 | dhivehi,mv 346 | english,mv 347 | bambara,ml 348 | french,ml 349 | koyra chiini,ml 350 | koyraboro senni,ml 351 | english,mt 352 | maltese,mt 353 | english,mh 354 | french,mq 355 | spanish,mq 356 | arabic,mr 357 | french,mr 358 | fulah,mr 359 | english,mu 360 | french,mu 361 | morisyen,mu 362 | f#rench,yt 363 | english,mx 364 | spanish,mx 365 | english,fm 366 | romanian,md 367 | russian,md 368 | french,mc 369 | mongolian,mn 370 | english,me 371 | serbian,me 372 | serbian,me 373 | english,ms 374 | spanish,ms 375 | arabic,ma 376 | french,ma 377 | moroccan tamazight,ma 378 | tachelhit,ma 379 | makhuwa-meetto,mz 380 | portuguese,mz 381 | sena,mz 382 | burmese,mm 383 | english,mm 384 | afrikaans,na 385 | english,na 386 | nama,na 387 | english,nr 388 | nepali,np 389 | dutch,nl 390 | english,nl 391 | low german,nl 392 | western frisian,nl 393 | fre$nch,nc 394 | english,nz 395 | maori,nz 396 | spanish,ni 397 | french,ne 398 | fulah,ne 399 | hausa,ne 400 | tasawaq,ne 401 | zarma,ne 402 | english,ng 403 | fulah,ng 404 | hausa,ng 405 | igbo,ng 406 | jju,ng 407 | tyap,ng 408 | yoruba,ng 409 | english,nu 410 | english,nf 411 | korean,kp 412 | albanian,mk 413 | macedonian,mk 414 | english,mp 415 | english,no 416 | northern sami,no 417 | norwegian nynorsk,no 418 | arabic,om 419 | english,pk 420 | pashto,pk 421 | punjabi,pk 422 | sindhi,pk 423 | urdu,pk 424 | english,pw 425 | spanish,pa 426 | english,pg 427 | guarani,py 428 | spanish,py 429 | quechua,pe 430 | spanish,pe 431 | cebuano,ph 432 | english,ph 433 | filipino,ph 434 | spanish,ph 435 | english,pn 436 | english,pl 437 | polish,pl 438 | english,pt 439 | portuguese,pt 440 | english,pr 441 | spanish,pr 442 | arabic,Qa 443 | french,re 444 | english,ro 445 | romanian,ro 
446 | bashkir,ru 447 | chechen,ru 448 | chuvash,ru 449 | engli$sh,ru 450 | erzya,ru 451 | ossetic,ru 452 | russian,ru 453 | sakha,ru 454 | tatar,ru 455 | english,rw 456 | french,rw 457 | kinyarwanda,rw 458 | english,ws 459 | italian,sm 460 | portuguese,st 461 | arabic,sa 462 | english,sa 463 | french,sn 464 | fulah,sn 465 | jola-fonyi,sn 466 | wolof,sn 467 | english,rs 468 | serbian,rs 469 | english,sc 470 | french,sc 471 | english,sl 472 | fulah,sl 473 | chinese,sg 474 | english,sg 475 | malay,sg 476 | tamil,sg 477 | dutch,sx 478 | english,sx 479 | spanish,sx 480 | english,sk 481 | slovak,sk 482 | english,si 483 | slovenian,si 484 | english,sb 485 | arabic,so 486 | somali,so 487 | afrikaans,za 488 | english,za 489 | northern sotho,za 490 | south ndebele,za 491 | southern sotho,za 492 | swati,za 493 | tsonga,za 494 | tswana,za 495 | ve!nda,za 496 | xhosa,za 497 | zulu,za 498 | english,kr 499 | korean,kr 500 | arabic,ss 501 | english,ss 502 | nuer,ss 503 | asturian,es 504 | basque,es 505 | catalan,es 506 | english,es 507 | galician,es 508 | spanish,es 509 | sinhala,lk 510 | tamil,lk 511 | french,bl 512 | spanish,bl 513 | english,sh 514 | english,kn 515 | spanish,kn 516 | english,lc 517 | spanish,lc 518 | french,mf 519 | spanish,mf 520 | french,pm 521 | spanish,pm 522 | english,vc 523 | spanish,vc 524 | ara$bic,sd 525 | english,sd 526 | dutch,sR 527 | spanish,sr 528 | english,se 529 | northern sami,se 530 | swedish,se 531 | english,ch 532 | french,ch 533 | german,ch 534 | italian,ch 535 | portuguese,ch 536 | romansh,ch 537 | swiss german,ch 538 | walser,ch 539 | arabic,sy 540 | french,sy 541 | syriac,sy 542 | chinese,tw 543 | english,tw 544 | taroko,tw 545 | tajik,tj 546 | asu,tz 547 | bena,tz 548 | english,tz 549 | langi,tz 550 | machame,tz 551 | makonde,tz 552 | masai,tz 553 | ro$mbo,tz 554 | rwa,tz 555 | sangu,tz 556 | shambala,tz 557 | swahili,tz 558 | vunjo,tz 559 | english,th 560 | thai,th 561 | portuguese,tl 562 | ewe,tg 563 | french,tg 564 | english,to 565 | 
tongan,to 566 | english,tt 567 | spanish, tt 568 | arabic,tn 569 | french,tn 570 | english,tr 571 | kurdish,tr 572 | turkish,tr 573 | turkmen,tm 574 | english,tc 575 | spanish,tc 576 | english,tv 577 | english,vi 578 | spanish,vi 579 | chiga,ug 580 | english,ug 581 | g!anda,ug 582 | nyankole,ug 583 | soga,ug 584 | swahili,ug 585 | teso,ug 586 | english,ua 587 | russian,ua 588 | ukrainian,ua 589 | arabic,ae 590 | english,ae 591 | cornish,gb 592 | english,gb 593 | scottish gaelic,gb 594 | welsh,gb 595 | english,us 596 | spanish,uy 597 | uzbek,uz 598 | uzbek,uz 599 | english,vu 600 | french,vu 601 | italian,va 602 | spanish,ve 603 | vietnamese,vn 604 | french,wF 605 | arabic,ye 606 | bemba,zm 607 | en!glish,zm 608 | english,zw 609 | shona,zw 610 | -------------------------------------------------------------------------------- /source_data/csv_output/output.csv: -------------------------------------------------------------------------------- 1 | row_id,insert_date,country_name,city_name,population,languages 2 | 1,2022-04-09,argentina,santa maria,9526,"{english,spanish}" 3 | 2,2022-04-09,brazil,clevelandia,16450,"{spanish,english,portuguese}" 4 | 3,2022-04-09,brazil,conceicao do rio verde,12949,"{english,portuguese,spanish}" 5 | 4,2022-04-09,brazil,curaca,35208,"{english,portuguese,spanish}" 6 | 5,2022-04-09,brazil,japoata,12938,"{spanish,english,portuguese}" 7 | 6,2022-04-09,haiti,petite riviere de nippes,29815,"{spanish,french}" 8 | 7,2022-04-09,peru,puerto pimentel,15552,"{quechua,spanish}" 9 | 8,2022-04-28,argentina,ciudad general belgrano,92957,"{english,spanish}" 10 | 9,2022-04-28,brazil,caete,40750,"{portuguese,spanish,english}" 11 | 10,2022-04-28,brazil,catanduva,119480,"{spanish,english,portuguese}" 12 | 11,2022-04-28,brazil,colider,32120,"{english,spanish,portuguese}" 13 | 12,2022-04-28,brazil,miracema do tocantins,20684,"{spanish,english,portuguese}" 14 | 13,2022-04-28,brazil,presidente medici,18571,"{spanish,english,portuguese}" 15 | 
14,2022-04-28,colombia,coyaima,28443,"{spanish,english}" 16 | 15,2022-04-28,colombia,manati,13456,"{spanish,english}" 17 | 16,2022-04-28,dominican republic,licey al medio,25539,{spanish} 18 | 17,2022-04-28,mexico,amatitan,15344,"{spanish,english}" 19 | 18,2022-04-28,mexico,san juan zitlaltepec,19600,"{spanish,english}" 20 | 19,2022-04-28,mexico,teopisca,20044,"{english,spanish}" 21 | 20,2022-08-11,brazil,charqueadas,35320,"{portuguese,spanish,english}" 22 | 21,2022-08-11,brazil,chopinzinho,19679,"{portuguese,english,spanish}" 23 | 22,2022-08-11,brazil,jaguarao,27931,"{portuguese,spanish,english}" 24 | 23,2022-08-11,ecuador,vinces,30241,"{spanish,quechua}" 25 | 24,2022-08-11,guatemala,el palmar,31706,{spanish} 26 | 25,2022-08-11,nicaragua,villanueva,25660,{spanish} 27 | 26,2022-08-11,puerto rico,guaynabo,70542,"{english,spanish}" 28 | -------------------------------------------------------------------------------- /source_data/scripts/build_tables.sql: -------------------------------------------------------------------------------- 1 | /* 2 | SQL Code Challenge 3 | Author: Jaime M. Shaker 4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 5 | Website: https://www.shaker.dev 6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 7 | 8 | File Name: build_tables.sql 9 | */ 10 | 11 | -- We must ensure that our data is properly organized. Let's create a schema 12 | -- specifically for importing/copying data from our CSV files. 13 | CREATE SCHEMA IF NOT EXISTS import_data; 14 | -- Create import_data.countries table 15 | -- Drop this table every time this script is run to ensure repeatability. 16 | DROP TABLE IF EXISTS import_data.countries; 17 | -- Create a new table and add a country_id to use as a primary key. 18 | -- We will initially insert all the data as TEXT to ensure there aren't any errors during the COPY. 19 | -- We will also create a created_on field that will have the date that the table was created. 
20 | CREATE TABLE import_data.countries ( 21 | -- Create an auto-incrementing, unique key for every row. 22 | -- This will also be used as our Primary Key. 23 | country_id INT GENERATED ALWAYS AS IDENTITY, 24 | country_name TEXT, 25 | country_code_2 TEXT, 26 | country_code_3 TEXT, 27 | region TEXT, 28 | sub_region TEXT, 29 | intermediate_region TEXT, 30 | created_on date, 31 | PRIMARY KEY (country_id) 32 | ); 33 | -- We will use the COPY statement to extract the data from our CSV files. 34 | COPY import_data.countries ( 35 | country_name, 36 | country_code_2, 37 | country_code_3, 38 | region, 39 | sub_region, 40 | intermediate_region 41 | ) 42 | -- PostgreSQL stores its data in /var/lib/postgresql/ 43 | -- In the docker-compose.yaml file, we created a volume named 'source_data/' that our container can access. 44 | FROM '/var/lib/postgresql/source_data/csv_data/countries.csv' 45 | -- The CSV files are comma separated and include headers. 46 | WITH DELIMITER ',' HEADER CSV; 47 | 48 | -- Using the same process, let's create tables for all of our CSV files. 
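-- Optional sanity check (not part of the original script): after a COPY completes,
-- you can verify the import by comparing the table's row count to the number of
-- data rows in the source CSV (total lines minus the header row).
SELECT count(*) AS imported_rows FROM import_data.countries;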
49 | /* Create import.cities table */ 50 | DROP TABLE IF EXISTS import_data.cities; 51 | CREATE TABLE import_data.cities ( 52 | city_id INT GENERATED ALWAYS AS IDENTITY, 53 | city_name TEXT, 54 | latitude TEXT, 55 | longitude TEXT, 56 | country_code_2 TEXT, 57 | capital TEXT, 58 | population TEXT, 59 | insert_date TEXT, 60 | PRIMARY KEY (city_id) 61 | ); 62 | 63 | COPY import_data.cities ( 64 | city_name, 65 | latitude, 66 | longitude, 67 | country_code_2, 68 | capital, 69 | population, 70 | insert_date 71 | ) 72 | FROM '/var/lib/postgresql/source_data/csv_data/cities.csv' 73 | WITH DELIMITER ',' HEADER CSV; 74 | 75 | /* Create import.currencies table */ 76 | DROP TABLE IF EXISTS import_data.currencies; 77 | CREATE TABLE import_data.currencies ( 78 | currency_id INT GENERATED ALWAYS AS IDENTITY, 79 | country_code_2 TEXT, 80 | currency_name TEXT, 81 | currency_code TEXT, 82 | PRIMARY KEY (currency_id) 83 | ); 84 | 85 | COPY import_data.currencies ( 86 | country_code_2, 87 | currency_name, 88 | currency_code 89 | ) 90 | FROM '/var/lib/postgresql/source_data/csv_data/currencies.csv' 91 | WITH DELIMITER ',' HEADER CSV; 92 | 93 | /* Create import.languages table */ 94 | DROP TABLE IF EXISTS import_data.languages; 95 | CREATE TABLE import_data.languages ( 96 | language_id INT GENERATED ALWAYS AS IDENTITY, 97 | language TEXT, 98 | country_code_2 TEXT, 99 | PRIMARY KEY (language_id) 100 | ); 101 | 102 | COPY import_data.languages ( 103 | language, 104 | country_code_2 105 | ) 106 | FROM '/var/lib/postgresql/source_data/csv_data/languages.csv' 107 | WITH DELIMITER ',' HEADER CSV; -------------------------------------------------------------------------------- /source_data/scripts/cleanup_db.sql: -------------------------------------------------------------------------------- 1 | /* 2 | SQL Code Challenge 3 | Author: Jaime M. 
Shaker 4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 5 | Website: https://www.shaker.dev 6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 7 | 8 | File Name: cleanup_db.sql 9 | */ 10 | 11 | -- Drop All import_data Tables and Schema 12 | 13 | DROP TABLE import_data.countries; 14 | DROP TABLE import_data.cities; 15 | DROP TABLE import_data.currencies; 16 | DROP TABLE import_data.languages; 17 | DROP SCHEMA import_data CASCADE; 18 | -------------------------------------------------------------------------------- /source_data/scripts/code_challenge.sql: -------------------------------------------------------------------------------- 1 | /* 2 | SQL Code Challenge 3 | Author: Jaime M. Shaker 4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 5 | Website: https://www.shaker.dev 6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 7 | 8 | File Name: code_challenge.sql 9 | */ 10 | 11 | 12 | /* Question 2. 13 | * 14 | * List all of the regions and the total number of countries in each region. 15 | * Order by country count in descending order and capitalize the region name. 16 | * 17 | */ 18 | 19 | SELECT 20 | -- initcap() capitalizes the first letter of every word in a string. 21 | initcap(region) AS region, 22 | count(*) AS country_count 23 | FROM 24 | cleaned_data.countries 25 | GROUP BY 26 | -- When using an aggregate function like count(), every non-aggregated column must appear in the GROUP BY clause. 27 | region 28 | ORDER BY 29 | country_count DESC; 30 | 31 | /* 32 | 33 | region |country_count| 34 | ---------+-------------+ 35 | Africa | 59| 36 | Americas | 57| 37 | Asia | 50| 38 | Europe | 48| 39 | Oceania | 26| 40 | Antartica| 1| 41 | 42 | */ 43 | 44 | /* Question 3. 45 | * 46 | * List all of the sub-regions and the total number of cities in each sub-region. 47 | * Order by sub-region name alphabetically. 
48 | * 49 | */ 50 | 51 | SELECT 52 | initcap(t1.sub_region) AS sub_region, 53 | count(*) AS city_count 54 | FROM 55 | cleaned_data.countries AS t1 56 | JOIN 57 | cleaned_data.cities AS t2 58 | ON 59 | t1.country_code_2 = t2.country_code_2 60 | GROUP BY 61 | t1.sub_region 62 | ORDER BY 63 | t1.sub_region; 64 | 65 | /* 66 | 67 | sub_region |city_count| 68 | -------------------------------+----------+ 69 | Australia And New Zealand | 329| 70 | Central Asia | 560| 71 | Eastern Asia | 3164| 72 | Eastern Europe | 2959| 73 | Latin America And The Caribbean| 7204| 74 | Melanesia | 60| 75 | Micronesia | 15| 76 | Northern Africa | 1152| 77 | Northern America | 5844| 78 | Northern Europe | 2025| 79 | Polynesia | 22| 80 | Southeastern Asia | 2627| 81 | Southern Asia | 6848| 82 | Southern Europe | 3238| 83 | Subsaharan Africa | 3223| 84 | Western Asia | 1400| 85 | Western Europe | 3952| 86 | 87 | */ 88 | 89 | /* Question 4. 90 | * 91 | * List all of the countries and the total number of cities in the Northern Europe sub-region. 92 | * List the country names in uppercase and order the list by the length of the country name and 93 | * alphabetically in ascending order. 94 | * 95 | */ 96 | 97 | SELECT 98 | -- upper() returns your string in uppercase. 99 | upper(t1.country_name) AS country_name, 100 | count(*) AS city_count 101 | FROM 102 | cleaned_data.countries AS t1 103 | JOIN 104 | cleaned_data.cities AS t2 105 | ON 106 | t1.country_code_2 = t2.country_code_2 107 | WHERE 108 | t1.sub_region = 'northern europe' 109 | GROUP BY 110 | t1.country_name 111 | ORDER BY 112 | -- length() returns the number of characters in a string, including spaces. 
113 | length(t1.country_name), t1.country_name; 114 | 115 | /* 116 | 117 | country_name |city_count| 118 | --------------+----------+ 119 | JERSEY | 1| 120 | LATVIA | 39| 121 | NORWAY | 127| 122 | SWEDEN | 148| 123 | DENMARK | 75| 124 | ESTONIA | 20| 125 | FINLAND | 142| 126 | ICELAND | 12| 127 | IRELAND | 64| 128 | LITHUANIA | 61| 129 | ISLE OF MAN | 2| 130 | FAROE ISLANDS | 29| 131 | UNITED KINGDOM| 1305| 132 | 133 | */ 134 | 135 | /* Question 5. 136 | * 137 | * List all of the countries and the total number of cities in the Southern Europe sub-region that were inserted in 2021. 138 | * Capitalize the country names and order alphabetically by the **LAST** letter of the country name and the number of cities. 139 | * 140 | */ 141 | 142 | SELECT 143 | initcap(t1.country_name) AS country_name, 144 | count(*) AS city_count 145 | FROM 146 | cleaned_data.countries AS t1 147 | JOIN 148 | cleaned_data.cities AS t2 149 | ON 150 | t1.country_code_2 = t2.country_code_2 151 | WHERE 152 | t1.sub_region = 'southern europe' 153 | AND 154 | -- The extract() & date_part() functions allow you to break down dates and timestamps into individual years, months, days, hours, etc. 155 | EXTRACT('year' FROM t2.insert_date) = 2021 156 | GROUP BY 157 | t1.country_name 158 | ORDER BY 159 | -- The substring() function returns a specific portion of a string. 160 | substring(t1.country_name,length(t1.country_name),1), city_count; 161 | 162 | /* 163 | 164 | country_name |city_count| 165 | ----------------------+----------+ 166 | Andorra | 5| 167 | Albania | 11| 168 | Bosnia And Herzegovina| 15| 169 | Croatia | 22| 170 | North Macedonia | 28| 171 | Malta | 32| 172 | Serbia | 58| 173 | Slovenia | 74| 174 | Greece | 64| 175 | Portugal | 109| 176 | Spain | 302| 177 | San Marino | 2| 178 | Montenegro | 12| 179 | Italy | 542| 180 | 181 | */ 182 | 183 | /* Question 6. 
184 | * 185 | * List all of the countries in the region of Asia that did **NOT** have a city 186 | * with an inserted date from June 2021 through Sept 2021. 187 | * 188 | */ 189 | 190 | SELECT 191 | -- Distinct will only return unique values 192 | DISTINCT initcap(t1.country_name) AS country_name 193 | FROM 194 | cleaned_data.countries AS t1 195 | -- Left join will return all rows from the left table (cleaned_data.countries) and 196 | -- only the matching rows from the right table (cleaned_data.cities), resulting in NULLs where 197 | -- there are no matches in the right table. 198 | LEFT JOIN 199 | cleaned_data.cities AS t2 200 | ON 201 | t1.country_code_2 = t2.country_code_2 202 | AND 203 | t2.insert_date BETWEEN '2021-06-01' AND '2021-10-01' 204 | WHERE 205 | t1.region = 'asia' 206 | -- Only return rows that did NOT have a match in the cities table. 207 | AND 208 | t2.country_code_2 IS NULL; 209 | 210 | /* 211 | 212 | country_name | 213 | -----------------+ 214 | Brunei Darussalam| 215 | Kuwait | 216 | Macao | 217 | Singapore | 218 | 219 | */ 220 | 221 | /* Question 7. 222 | * 223 | * List the country, city name, population and city name length for the city names that are palindromes in the 224 | * Western Asia sub-region. Format the population with a thousands separator (1,000) and format the length of 225 | * the city name in roman numerals. Order by the length of the city name in descending order and 226 | * alphabetically in ascending order. 227 | * 228 | */ 229 | 230 | SELECT 231 | initcap(t1.country_name) AS country_name, 232 | initcap(t2.city_name) AS city_name, 233 | -- to_char() takes non-string data types and returns them as strings. 
234 | to_char(t2.population, '999G999') AS population, 235 | to_char(length(t2.city_name), 'RN') AS roman_numeral_length 236 | FROM 237 | cleaned_data.countries AS t1 238 | JOIN 239 | cleaned_data.cities AS t2 240 | ON 241 | t1.country_code_2 = t2.country_code_2 242 | WHERE 243 | t2.city_name = reverse(t2.city_name) 244 | AND 245 | t1.sub_region = 'western asia' 246 | ORDER BY 247 | length(t2.city_name) DESC, t2.city_name ASC; 248 | 249 | /* 250 | 251 | country_name |city_name|population|roman_numeral_length| 252 | --------------------+---------+----------+--------------------+ 253 | Yemen |Hajjah | 46,568 | VI | 254 | Syrian Arab Republic|Hamah | 696,863 | V | 255 | Turkey |Kavak | 21,692 | V | 256 | Turkey |Kinik | 29,803 | V | 257 | Turkey |Tut | 10,161 | III | 258 | 259 | */ 260 | 261 | 262 | /* Question 8. 263 | * 264 | * List all of the countries that end in 'stan'. Make your query case-insensitive and list 265 | * whether the total population of the cities listed is an odd or even number for cities inserted in 2022. 266 | * Order by whether the population value is odd or even in ascending order and country name in alphabetical order. 
267 | * 268 | */ 269 | 270 | SELECT 271 | initcap(t1.country_name) AS country_name, 272 | to_char(sum(t2.population), '99G999G999') total_population, 273 | CASE 274 | WHEN (sum(t2.population) % 2) = 0 275 | THEN 'Even' 276 | ELSE 277 | 'Odd' 278 | END AS odd_or_even 279 | FROM 280 | cleaned_data.countries AS t1 281 | JOIN 282 | cleaned_data.cities AS t2 283 | ON 284 | t1.country_code_2 = t2.country_code_2 285 | WHERE 286 | t1.country_name ILIKE '%stan' 287 | AND 288 | EXTRACT('year' FROM t2.insert_date) = 2022 289 | GROUP BY 290 | t1.country_name 291 | ORDER BY 292 | odd_or_even, country_name; 293 | 294 | /* 295 | 296 | country_name|total_population|odd_or_even| 297 | ------------+----------------+-----------+ 298 | Afghanistan | 6,006,530 |Even | 299 | Kazakhstan | 4,298,264 |Even | 300 | Kyrgyzstan | 1,017,644 |Even | 301 | Pakistan | 26,344,480 |Even | 302 | Tajikistan | 2,720,953 |Odd | 303 | Turkmenistan| 419,607 |Odd | 304 | Uzbekistan | 3,035,547 |Odd | 305 | 306 | */ 307 | 308 | /* Question 9. 309 | * 310 | * List the third most populated city ranked by region WITHOUT using limit or offset. 311 | * List the region name, city name, population and order the results by region. 
312 | * 313 | */ 314 | 315 | WITH get_city_rank_cte AS ( 316 | SELECT 317 | t1.region, 318 | t2.city_name, 319 | t2.population AS third_largest_pop, 320 | DENSE_RANK() OVER (PARTITION BY t1.region ORDER BY t2.population DESC) AS rnk 321 | FROM 322 | cleaned_data.countries AS t1 323 | JOIN 324 | cleaned_data.cities AS t2 325 | ON 326 | t1.country_code_2 = t2.country_code_2 327 | WHERE 328 | t2.population IS NOT NULL 329 | GROUP BY 330 | t1.region, 331 | t2.city_name, 332 | t2.population 333 | ) 334 | SELECT 335 | initcap(region) AS region, 336 | initcap(city_name) AS city_name, 337 | to_char(third_largest_pop, '99G999G999') AS third_largest_pop 338 | FROM 339 | get_city_rank_cte 340 | WHERE 341 | rnk = 3; 342 | 343 | /* 344 | 345 | region |city_name|third_largest_pop| 346 | --------+---------+-----------------+ 347 | Africa |Kinshasa | 12,836,000 | 348 | Americas|New York | 18,972,871 | 349 | Asia |Delhi | 32,226,000 | 350 | Europe |Paris | 11,060,000 | 351 | Oceania |Brisbane | 2,360,241 | 352 | 353 | */ 354 | 355 | /* Question 10. 356 | * 357 | * List the bottom third of all countries in the Western Asia sub-region that speak Arabic. 358 | * Include the row number and country name. Order by row number. 359 | * 360 | */ 361 | 362 | WITH get_ntile_cte AS ( 363 | SELECT 364 | ROW_NUMBER() OVER (ORDER BY t1.country_name) AS rn, 365 | t1.country_name, 366 | -- The ntile() window function distributes rows into a specified number of groups, or 'buckets'. 
367 | NTILE(3) OVER (ORDER BY t1.country_name) AS nt 368 | FROM 369 | cleaned_data.countries AS t1 370 | JOIN 371 | cleaned_data.languages AS t2 372 | ON 373 | t1.country_code_2 = t2.country_code_2 374 | WHERE 375 | t1.sub_region = 'western asia' 376 | AND 377 | t2.language = 'arabic' 378 | ) 379 | SELECT 380 | rn AS row_id, 381 | country_name 382 | FROM 383 | get_ntile_cte 384 | WHERE 385 | nt = 3; 386 | 387 | 388 | /* 389 | 390 | row_id|country_name | 391 | ------+--------------------+ 392 | 9|saudi arabia | 393 | 10|syrian arab republic| 394 | 11|united arab emirates| 395 | 12|yemen | 396 | 397 | */ 398 | 399 | /* Question 11. 400 | * 401 | * Create a query that lists country name, capital name, population, languages spoken and currency name for countries 402 | * in the Northern Africa sub-region. There can be multiple currency names and languages spoken per country. 403 | * Add multiple values for the same field into an array. 404 | * 405 | */ 406 | 407 | WITH get_row_values AS ( 408 | SELECT 409 | t1.country_name, 410 | t2.city_name, 411 | t2.population, 412 | -- array_agg() aggregates multiple values and returns them in 'array' format. 
413 | array_agg(t3.LANGUAGE) AS languages, 414 | t4.currency_name AS currencies 415 | FROM 416 | cleaned_data.countries AS t1 417 | JOIN 418 | cleaned_data.cities AS t2 419 | ON 420 | t1.country_code_2 = t2.country_code_2 421 | JOIN 422 | cleaned_data.languages AS t3 423 | ON 424 | t1.country_code_2 = t3.country_code_2 425 | JOIN 426 | cleaned_data.currencies AS t4 427 | ON 428 | t1.country_code_2 = t4.country_code_2 429 | WHERE 430 | t1.sub_region = 'northern africa' 431 | AND 432 | t2.capital = TRUE 433 | GROUP BY 434 | t1.country_name, 435 | t2.city_name, 436 | t2.population, 437 | t4.currency_name 438 | ) 439 | SELECT 440 | * 441 | FROM 442 | get_row_values; 443 | 444 | /* 445 | 446 | country_name|city_name|population|languages |currencies | 447 | ------------+---------+----------+--------------------------------------------+---------------+ 448 | algeria |algiers | 3415811|{french,arabic,kabyle} |algerian dinar | 449 | egypt |cairo | 20296000|{arabic} |egyptian pound | 450 | libya |tripoli | 1293016|{arabic} |libyan dinar | 451 | morocco |rabat | 572717|{arabic,tachelhit,moroccan tamazight,french}|moroccan dirham| 452 | sudan |khartoum | 7869000|{arabic,english} |sudanese pound | 453 | tunisia |tunis | 1056247|{french,arabic} |tunisian dinar | 454 | 455 | */ 456 | 457 | 458 | /* Question 12. 459 | * 460 | * Produce a query that returns the city names for cities in the U.S. that were inserted on April 28th, 2022. List 461 | * how many vowels and consonants are present in the city name and concatenate their percentage to their respective 462 | * count in parentheses. 463 | * 464 | */ 465 | 466 | WITH get_letter_count AS ( 467 | SELECT 468 | t1.city_name, 469 | length(t1.city_name) string_length, 470 | -- regexp_replace() returns a value that has been manipulated by a regular expression. 
471 | length(regexp_replace(t1.city_name, '[aeiou]', '', 'gi')) AS consonant_count 472 | FROM 473 | cleaned_data.cities AS t1 474 | WHERE 475 | t1.insert_date = '2022-04-28' 476 | AND 477 | t1.country_code_2 in ('us') 478 | ), 479 | get_letter_diff AS ( 480 | SELECT 481 | city_name, 482 | string_length - consonant_count AS vowels, 483 | round(100 * (string_length - consonant_count) / string_length::NUMERIC, 2) AS vowel_perc, 484 | consonant_count AS consonants, 485 | round( 100 * (consonant_count)::NUMERIC / string_length, 2)::float AS consonants_perc 486 | FROM 487 | get_letter_count 488 | ) 489 | SELECT 490 | city_name, 491 | vowels || ' (' || vowel_perc || '%)' AS vowel_count_perc, 492 | consonants || ' (' || consonants_perc || '%)' AS consonants_count_perc 493 | FROM 494 | get_letter_diff 495 | ORDER BY 496 | vowels; 497 | 498 | /* 499 | 500 | city_name |vowel_count_perc|consonants_count_perc| 501 | ---------------+----------------+---------------------+ 502 | standish |2 (25.00%) |6 (75%) | 503 | grand forks |2 (18.18%) |9 (81.82%) | 504 | camano |3 (50.00%) |3 (50%) | 505 | cedar hills |3 (27.27%) |8 (72.73%) | 506 | gladstone |3 (33.33%) |6 (66.67%) | 507 | whitehall |3 (33.33%) |6 (66.67%) | 508 | homewood |4 (50.00%) |4 (50%) | 509 | willowbrook |4 (36.36%) |7 (63.64%) | 510 | port salerno |4 (33.33%) |8 (66.67%) | 511 | vadnais heights|5 (33.33%) |10 (66.67%) | 512 | jeffersonville |5 (35.71%) |9 (64.29%) | 513 | 514 | */ 515 | 516 | /* Question 13. 517 | * 518 | * List the most consecutive inserted dates and the capitalized city names for cities in Canada that were inserted 519 | * in April 2022. 
520 | * 521 | */ 522 | 523 | DROP TABLE IF EXISTS get_dates; 524 | CREATE TEMP TABLE get_dates AS ( 525 | SELECT 526 | DISTINCT ON (insert_date) insert_date AS insert_date, 527 | city_name 528 | FROM 529 | cleaned_data.cities 530 | WHERE 531 | country_code_2 = 'ca' 532 | AND 533 | insert_date BETWEEN '2022-04-01' AND '2022-04-30' 534 | ORDER BY 535 | insert_date 536 | ); 537 | 538 | DROP TABLE IF EXISTS get_diff; 539 | CREATE TEMP TABLE get_diff AS ( 540 | SELECT 541 | city_name, 542 | insert_date, 543 | EXTRACT('day' FROM insert_date) - ROW_NUMBER() OVER (ORDER BY insert_date) AS diff 544 | FROM 545 | get_dates 546 | ); 547 | 548 | 549 | DROP TABLE IF EXISTS get_diff_count; 550 | CREATE TEMP TABLE get_diff_count AS ( 551 | SELECT 552 | city_name, 553 | insert_date, 554 | count(*) OVER (PARTITION BY diff) AS diff_count 555 | FROM 556 | get_diff 557 | ); 558 | 559 | 560 | WITH get_rank AS ( 561 | SELECT 562 | DENSE_RANK() OVER (ORDER BY diff_count desc) AS rnk, 563 | insert_date, 564 | city_name 565 | FROM 566 | get_diff_count 567 | ) 568 | SELECT 569 | insert_date AS most_consecutive_dates, 570 | initcap(city_name) AS city_name 571 | FROM 572 | get_rank 573 | WHERE 574 | rnk = 1 575 | ORDER BY 576 | insert_date; 577 | 578 | /* 579 | 580 | most_consecutive_dates|city_name | 581 | ----------------------+------------+ 582 | 2022-04-22|South Dundas| 583 | 2022-04-23|La Prairie | 584 | 2022-04-24|Elliot Lake | 585 | 2022-04-25|Lachute | 586 | 587 | */ 588 | 589 | /* Question 14. 590 | * 591 | * Create a view that lists the month-year, the number of cities inserted for that month, 592 | * a running city count total and the month over month percentage growth for 2021. 593 | * 594 | * Format the cities count and the running total with the thousands separator 595 | * and format the month over month growth with a plus symbol and percentage symbol. 
596 | * 597 | * Example: Feb-2021 | 1,000 | 2,000 |+60.00% | 598 | * 599 | */ 600 | 601 | DROP VIEW IF EXISTS cleaned_data.view_2021_growth; 602 | CREATE VIEW cleaned_data.view_2021_growth AS ( 603 | WITH get_month_count AS ( 604 | SELECT 605 | date_trunc('month', insert_date) as single_month, 606 | count(*) AS monthly_count 607 | FROM 608 | cleaned_data.cities 609 | WHERE 610 | EXTRACT('year' FROM insert_date) = 2021 611 | GROUP BY 612 | single_month 613 | ORDER BY 614 | single_month 615 | ), 616 | get_running_total AS ( 617 | SELECT 618 | single_month::date, 619 | monthly_count, 620 | sum(monthly_count) OVER (ORDER BY single_month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_num_cities 621 | FROM 622 | get_month_count 623 | ), 624 | get_month_over_month AS ( 625 | SELECT 626 | single_month, 627 | monthly_count, 628 | total_num_cities, 629 | round(100.0 * ((total_num_cities - Lag(total_num_cities, 1) OVER (ORDER BY single_month)) / Lag(total_num_cities, 1) OVER (ORDER BY single_month))::NUMERIC, 2) AS month_over_month 630 | FROM 631 | get_running_total 632 | ) 633 | SELECT 634 | to_char(single_month, 'Mon-YYYY') AS month_year, 635 | to_char(monthly_count, '9G999') AS cities_inserted, 636 | to_char(total_num_cities, '99G999') AS running_total, 637 | to_char(month_over_month, 'sg99.99') || '%' AS month_over_month 638 | FROM 639 | get_month_over_month 640 | ); 641 | 642 | SELECT 643 | * 644 | FROM 645 | cleaned_data.view_2021_growth; 646 | 647 | /* 648 | 649 | month_year|cities_inserted|running_total|month_over_month| 650 | ----------+---------------+-------------+----------------+ 651 | Jan-2021 | 1,471 | 1,471 | | 652 | Feb-2021 | 1,291 | 2,762 |+87.76% | 653 | Mar-2021 | 1,485 | 4,247 |+53.77% | 654 | Apr-2021 | 1,508 | 5,755 |+35.51% | 655 | May-2021 | 1,509 | 7,264 |+26.22% | 656 | Jun-2021 | 1,395 | 8,659 |+19.20% | 657 | Jul-2021 | 1,394 | 10,053 |+16.10% | 658 | Aug-2021 | 1,481 | 11,534 |+14.73% | 659 | Sep-2021 | 1,452 | 12,986 |+12.59% | 660 
| Oct-2021 | 1,446 | 14,432 |+11.14% | 661 | Nov-2021 | 1,378 | 15,810 |+9.55% | 662 | Dec-2021 | 1,472 | 17,282 |+9.31% | 663 | 664 | */ 665 | 666 | /* Question 15. 667 | * 668 | * Create and call a stored procedure that lists a unique row id number, insert date, country name, city name, population 669 | * and languages spoken for countries in the Latin America and the Caribbean sub-region that were inserted on 670 | * either '2022-04-09', '2022-04-28' or '2022-08-11'. 671 | * 672 | * Order by the insert date and output the results (including headers) to a CSV file located in /source_data/csv_output/. 673 | * 674 | */ 675 | 676 | CREATE OR REPLACE PROCEDURE cleaned_data.sproc_output () 677 | LANGUAGE plpgsql 678 | AS 679 | $sproc$ 680 | BEGIN 681 | COPY ( 682 | SELECT 683 | ROW_NUMBER() OVER (ORDER BY t1.insert_date) AS row_id, 684 | t1.insert_date, 685 | t2.country_name, 686 | t1.city_name, 687 | t1.population, 688 | array_agg(t3.language) AS languages 689 | FROM 690 | cleaned_data.cities AS t1 691 | JOIN 692 | cleaned_data.countries AS t2 693 | ON 694 | t1.country_code_2 = t2.country_code_2 695 | LEFT JOIN 696 | cleaned_data.languages AS t3 697 | ON 698 | t2.country_code_2 = t3.country_code_2 699 | WHERE 700 | t2.sub_region = 'latin america and the caribbean' 701 | AND 702 | t1.insert_date IN ('2022-04-09', '2022-04-28', '2022-08-11') 703 | GROUP BY 704 | t1.insert_date, 705 | t2.country_name, 706 | t1.city_name, 707 | t1.population 708 | ORDER BY 709 | t1.insert_date 710 | ) 711 | TO '/var/lib/postgresql/source_data/csv_output/output.csv' DELIMITER ',' CSV HEADER; 712 | END 713 | $sproc$; 714 | 715 | -- Call the stored procedure 716 | CALL cleaned_data.sproc_output(); 717 | 718 | /* 719 | 720 | Results located in /source_data/csv_output/output.csv 721 | 722 | row_id,insert_date,country_name,city_name,population,languages 723 | 1,2022-04-09,argentina,santa maria,9526,"{english,spanish}" 724 | 2,2022-04-09,brazil,clevelandia,16450,"{spanish,english,portuguese}" 725 
| 3,2022-04-09,brazil,conceicao do rio verde,12949,"{english,portuguese,spanish}" 726 | 4,2022-04-09,brazil,curaca,35208,"{english,portuguese,spanish}" 727 | 5,2022-04-09,brazil,japoata,12938,"{spanish,english,portuguese}" 728 | 6,2022-04-09,haiti,petite riviere de nippes,29815,"{spanish,french}" 729 | 7,2022-04-09,peru,puerto pimentel,15552,"{quechua,spanish}" 730 | 8,2022-04-28,argentina,ciudad general belgrano,92957,"{english,spanish}" 731 | 9,2022-04-28,brazil,caete,40750,"{portuguese,spanish,english}" 732 | 10,2022-04-28,brazil,catanduva,119480,"{spanish,english,portuguese}" 733 | 11,2022-04-28,brazil,colider,32120,"{english,spanish,portuguese}" 734 | 12,2022-04-28,brazil,miracema do tocantins,20684,"{spanish,english,portuguese}" 735 | 13,2022-04-28,brazil,presidente medici,18571,"{spanish,english,portuguese}" 736 | 14,2022-04-28,colombia,coyaima,28443,"{spanish,english}" 737 | 15,2022-04-28,colombia,manati,13456,"{spanish,english}" 738 | 16,2022-04-28,dominican republic,licey al medio,25539,{spanish} 739 | 17,2022-04-28,mexico,amatitan,15344,"{spanish,english}" 740 | 18,2022-04-28,mexico,san juan zitlaltepec,19600,"{spanish,english}" 741 | 19,2022-04-28,mexico,teopisca,20044,"{english,spanish}" 742 | 20,2022-08-11,brazil,charqueadas,35320,"{portuguese,spanish,english}" 743 | 21,2022-08-11,brazil,chopinzinho,19679,"{portuguese,english,spanish}" 744 | 22,2022-08-11,brazil,jaguarao,27931,"{portuguese,spanish,english}" 745 | 23,2022-08-11,ecuador,vinces,30241,"{spanish,quechua}" 746 | 24,2022-08-11,guatemala,el palmar,31706,{spanish} 747 | 25,2022-08-11,nicaragua,villanueva,25660,{spanish} 748 | 26,2022-08-11,puerto rico,guaynabo,70542,"{english,spanish}" 749 | 750 | */ 751 | 752 | 753 | -------------------------------------------------------------------------------- /source_data/scripts/create_relationships.sql: -------------------------------------------------------------------------------- 1 | /* 2 | SQL Code Challenge 3 | Author: Jaime M. 
Shaker 4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 5 | Website: https://www.shaker.dev 6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 7 | 8 | File Name: create_relationships.sql 9 | */ 10 | 11 | -- Alter cleaned_data.countries and add the UNIQUE constraint to country_code_2 12 | 13 | ALTER TABLE 14 | cleaned_data.countries 15 | ADD CONSTRAINT 16 | unique_country_code_2 17 | UNIQUE (country_code_2); 18 | 19 | -- Alter all other tables and add a foreign key constraint and reference. 20 | -- Create Foreign Key relationship for cleaned_data.cities 21 | ALTER TABLE 22 | -- Table to be altered 23 | cleaned_data.cities 24 | ADD CONSTRAINT 25 | -- Give this constraint a name 26 | fk_country_city 27 | -- Which key in cleaned_data.cities is a foreign key 28 | FOREIGN KEY (country_code_2) 29 | -- Which key to reference from parent table 30 | REFERENCES cleaned_data.countries (country_code_2); 31 | 32 | -- Create Foreign Key relationship for cleaned_data.currencies 33 | ALTER TABLE 34 | cleaned_data.currencies 35 | ADD CONSTRAINT 36 | fk_country_currencies 37 | FOREIGN KEY (country_code_2) 38 | REFERENCES cleaned_data.countries (country_code_2); 39 | 40 | -- Create Foreign Key relationship for cleaned_data.languages 41 | ALTER TABLE 42 | cleaned_data.languages 43 | ADD CONSTRAINT 44 | fk_country_languages 45 | FOREIGN KEY (country_code_2) 46 | REFERENCES cleaned_data.countries (country_code_2); 47 | 48 | -------------------------------------------------------------------------------- /source_data/scripts/drop_all_tables.sql: -------------------------------------------------------------------------------- 1 | /* 2 | SQL Code Challenge 3 | Author: Jaime M. Shaker 4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 5 | Website: https://www.shaker.dev 6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 7 | 8 | File Name: drop_all_tables.sql 9 | 10 | Description: This script will drop all tables, schemas and constraints used in 11 | this project. 
12 | */ 13 | 14 | -- Drop import_data tables and schema. 15 | DROP TABLE IF EXISTS import_data.countries; 16 | DROP TABLE IF EXISTS import_data.cities; 17 | DROP TABLE IF EXISTS import_data.currencies; 18 | DROP TABLE IF EXISTS import_data.languages; 19 | DROP SCHEMA IF EXISTS import_data; 20 | 21 | -- Drop Foreign Key relationship for cleaned_data.cities 22 | ALTER TABLE IF EXISTS 23 | -- Table to be altered 24 | cleaned_data.cities 25 | DROP CONSTRAINT IF EXISTS 26 | -- Drop this constraint name 27 | fk_country_city; 28 | 29 | -- Drop Foreign Key relationship for cleaned_data.currencies 30 | ALTER TABLE IF EXISTS 31 | cleaned_data.currencies 32 | DROP CONSTRAINT IF EXISTS 33 | fk_country_currencies; 34 | 35 | -- Drop Foreign Key relationship for cleaned_data.languages 36 | ALTER TABLE IF EXISTS 37 | cleaned_data.languages 38 | DROP CONSTRAINT IF EXISTS 39 | fk_country_languages; 40 | 41 | -- Drop cleaned_data tables and schema. 42 | DROP TABLE IF EXISTS cleaned_data.countries; 43 | DROP TABLE IF EXISTS cleaned_data.cities; 44 | DROP TABLE IF EXISTS cleaned_data.currencies; 45 | DROP TABLE IF EXISTS cleaned_data.languages; 46 | DROP SCHEMA IF EXISTS cleaned_data CASCADE; -------------------------------------------------------------------------------- /source_data/scripts/normalize_tables.sql: -------------------------------------------------------------------------------- 1 | /* 2 | SQL Code Challenge 3 | Author: Jaime M. Shaker 4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 5 | Website: https://www.shaker.dev 6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 7 | 8 | File Name: normalize_tables.sql 9 | */ 10 | 11 | -- We must ensure that our data is properly organized. Let's create a schema 12 | -- specifically for our transformed, clean data. 13 | CREATE SCHEMA IF NOT EXISTS cleaned_data; 14 | -- Create countries table 15 | -- Drop this table every time this script is run to ensure repeatability. 
16 | -- We are adding CASCADE for when we add foreign key relationships. 17 | DROP TABLE IF EXISTS cleaned_data.countries CASCADE; 18 | -- We must presuppose that the data in its current state is unusable/unreliable. 19 | -- 20 | CREATE TABLE cleaned_data.countries ( 21 | -- The country_id was automatically incremented and can be entered as an integer. 22 | country_id INT NOT NULL, 23 | -- Different from other database systems, in PostgreSQL, there is no performance difference among the three character types (char, varchar and text). 24 | -- Use text to ensure that there are no errors due to string length. 25 | country_name TEXT NOT NULL, 26 | -- The data appears to have duplicate entries so we will remove them once the data has been cleaned of 27 | -- any unwanted characters. 28 | country_code_2 varchar(2) NOT NULL, 29 | country_code_3 varchar(3) NOT NULL, 30 | region TEXT, 31 | sub_region TEXT, 32 | intermediate_region TEXT, 33 | created_on date, 34 | PRIMARY KEY (country_id) 35 | ); 36 | 37 | INSERT INTO cleaned_data.countries ( 38 | country_id, 39 | country_name, 40 | country_code_2, 41 | country_code_3, 42 | region, 43 | sub_region, 44 | intermediate_region, 45 | created_on 46 | ) 47 | ( 48 | SELECT 49 | i.country_id, 50 | -- The regexp_replace() function removes any unwanted special characters using a simple regular expression. 51 | -- trim() function removes white space from either end of the string. 52 | -- lower() function converts all characters to lowercase. 53 | trim(lower(regexp_replace(i.country_name, '[^\w\s^. ]', '', 'gi'))), 54 | -- Properly cast type from TEXT into new table 55 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^. ]', '', 'gi')))::varchar, 56 | trim(lower(regexp_replace(i.country_code_3, '[^\w\s^. ]', '', 'gi')))::varchar, 57 | trim(lower(regexp_replace(i.region, '[^\w\s^. ]', '', 'gi'))), 58 | trim(lower(regexp_replace(i.sub_region, '[^\w\s^. ]', '', 'gi'))), 59 | trim(lower(regexp_replace(i.intermediate_region, '[^\w\s^. 
]', '', 'gi'))), 60 | -- Use the built-in function current_date to insert current date into created_on field. 61 | current_date 62 | FROM 63 | import_data.countries AS i 64 | ); 65 | 66 | /* Create cleaned_data.cities table */ 67 | DROP TABLE IF EXISTS cleaned_data.cities; 68 | CREATE TABLE cleaned_data.cities ( 69 | city_id int, 70 | city_name TEXT, 71 | latitude float, 72 | longitude float, 73 | country_code_2 varchar(2) NOT NULL, 74 | capital boolean, 75 | population int, 76 | insert_date date, 77 | PRIMARY KEY (city_id) 78 | ); 79 | 80 | INSERT INTO cleaned_data.cities ( 81 | city_id, 82 | city_name, 83 | latitude, 84 | longitude, 85 | country_code_2, 86 | capital, 87 | population, 88 | insert_date 89 | ) 90 | ( 91 | SELECT 92 | i.city_id, 93 | trim(lower(regexp_replace(i.city_name, '[^\w\s^. ]', '', 'gi'))), 94 | i.longitude::float, 95 | i.latitude::float, 96 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^. ]', '', 'gi')))::varchar, 97 | i.capital::boolean, 98 | i.population::int, 99 | i.insert_date::date 100 | FROM 101 | import_data.cities AS i 102 | ); 103 | 104 | /* Create cleaned_data.currencies table */ 105 | DROP TABLE IF EXISTS cleaned_data.currencies; 106 | CREATE TABLE cleaned_data.currencies ( 107 | currency_id int, 108 | country_code_2 varchar(2) NOT NULL, 109 | currency_name TEXT, 110 | currency_code TEXT, 111 | PRIMARY KEY (currency_id) 112 | ); 113 | 114 | INSERT INTO cleaned_data.currencies ( 115 | currency_id, 116 | country_code_2, 117 | currency_name, 118 | currency_code 119 | ) 120 | ( 121 | SELECT 122 | currency_id, 123 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^. ]', '', 'gi')))::varchar, 124 | trim(lower(regexp_replace(i.currency_name, '[^\w\s^. ]', '', 'gi'))), 125 | trim(lower(regexp_replace(i.currency_code, '[^\w\s^. 
]', '', 'gi'))) 126 | FROM 127 | import_data.currencies AS i 128 | ); 129 | 130 | /* Create cleaned_data.languages table */ 131 | DROP TABLE IF EXISTS cleaned_data.languages; 132 | CREATE TABLE cleaned_data.languages ( 133 | language_id int, 134 | language TEXT, 135 | country_code_2 varchar(2) NOT NULL, 136 | PRIMARY KEY (language_id) 137 | ); 138 | 139 | INSERT INTO cleaned_data.languages ( 140 | language_id, 141 | language, 142 | country_code_2 143 | ) 144 | ( 145 | SELECT 146 | language_id, 147 | trim(lower(regexp_replace(i.language, '[^\w\s^. ]', '', 'gi'))), 148 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^. ]', '', 'gi')))::varchar 149 | FROM 150 | import_data.languages AS i 151 | ); 152 | 153 | 154 | -------------------------------------------------------------------------------- /source_data/scripts/remove_duplicates.sql: -------------------------------------------------------------------------------- 1 | /* 2 | SQL Code Challenge 3 | Author: Jaime M. Shaker 4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 5 | Website: https://www.shaker.dev 6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 7 | 8 | File Name: remove_duplicates.sql 9 | */ 10 | 11 | -- Remove Duplicate Entries 12 | -- Check for duplicate entries in clean_data.countries 13 | /* 14 | SELECT 15 | -- Get the columns. 16 | country_code_2, 17 | -- Count how many times the country_code_2 occurs. 18 | count(*) 19 | FROM 20 | cleaned_data.countries 21 | GROUP BY 22 | -- Using an aggregate function forces us to group all like names together. 23 | country_code_2 24 | HAVING 25 | -- Only select values that have a count greater than one (multiple entries). 
26 | count(*) > 1; 27 | */ 28 | 29 | -- Delete duplicate entries 30 | 31 | DELETE 32 | FROM 33 | -- Add an alias to the ids we wish to keep 34 | cleaned_data.countries AS clean_data_1 35 | USING 36 | -- Add an alias to the duplicate ids 37 | cleaned_data.countries AS clean_data_2 38 | WHERE 39 | -- This statement will remove the greater value ids. 40 | clean_data_1.country_id > clean_data_2.country_id 41 | AND 42 | -- This statement ensures that both values are identical. 43 | clean_data_1.country_code_2 = clean_data_2.country_code_2; 44 | -------------------------------------------------------------------------------- /source_data/scripts/start_fresh.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Although these commands could be done in a simple one-liner, I've written 4 | # out the individual commands for new people who may not understand the syntax. 5 | # 6 | # The breakdown of each command is as follows. 7 | # psql: A terminal-based front-end to PostgreSQL. 8 | # sql_coding_challenge: The name of our database. 9 | # -f: Execute the commands in the given file, then exit. 10 | # some_file.sql: The SQL script to execute via psql. 11 | 12 | psql sql_coding_challenge -f drop_all_tables.sql 13 | psql sql_coding_challenge -f build_tables.sql 14 | psql sql_coding_challenge -f normalize_tables.sql 15 | psql sql_coding_challenge -f remove_duplicates.sql 16 | psql sql_coding_challenge -f cleanup_db.sql 17 | psql sql_coding_challenge -f create_relationships.sql 18 | echo "Build Process Complete" -------------------------------------------------------------------------------- /walkthrough/WALKTHROUGH_1_DOCKER.md: -------------------------------------------------------------------------------- 1 | ## Basic/Intermediate SQL Code Challenge 2 | 3 | **Author**: Jaime M. Shaker<br>
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 9 | 10 | ### Set Up 11 | 12 | This repository contains all of the necessary files, data and directories for running a PostgreSQL server in a Docker container. The only prerequisite is that you should already have Docker Desktop installed and running on your computer. 13 | 14 | https://www.docker.com/products/docker-desktop/ 15 | 16 | Once we have `Docker Desktop` installed and running, we must have Docker download an image of Postgres and start running the Postgres container. 17 | 18 | Using your terminal (PowerShell/Bash/etc.), cd (change directory) to where this project is located. To ensure you are in the correct directory, type this command into the terminal. 19 | 20 | `ls -al` 21 | 22 | Your results should look similar to this. 23 | 24 | ![alt text](../images/terminal_1.PNG) 25 | 26 | As mentioned before, in the same directory as the `docker-compose.yaml` file, create an empty directory named '`db`' (note the red arrow in the image). This is where the PostgreSQL container will store internal data and keep your data persistent. 27 | * This directory's path is in the `.gitignore` file, which is why it is not included in this repository. 28 | 29 | Once you are in the same directory as the `docker-compose.yaml` file, run this command to start the Docker container. 30 | 31 | `docker-compose up -d` 32 | 33 | * The `-d` portion of the command allows the container to run in the background. 34 | * If this is the first time running this command, it will download the required Docker image and create a PostgreSQL container, which may take a few minutes to complete. Once it is complete and running, your terminal should look something like this. 
35 | 36 | ![alt text](../images/terminal_2.PNG) 37 | 38 | * After the initial install, the next time you run the `docker-compose up -d` command, the Docker container will fire right up as long as the image hasn't been deleted. 39 | 40 | If you are using [PGAdmin](https://www.pgadmin.org/), [DBeaver](https://dbeaver.io/) or any other Database management tool, create a connection to the now running PostgreSQL server. 41 | * Host: `localhost` 42 | * Database Name: `sql_coding_challenge` 43 | * Username: `postgres` 44 | * Password: `postgres` 45 | 46 | Alternatively, if you are using [psql](https://gist.github.com/Kartones/dd3ff5ec5ea238d4c546), use these commands in your terminal to connect to the database: 47 | * `docker exec -ti sql_coding_challenge psql -U postgres` 48 | * `\c sql_coding_challenge` 49 | 50 | To stop the Docker container: 51 | * Using the command line, cd into the same directory as the `docker-compose.yaml` file and run the command... 52 | * `docker-compose down` 53 | * You can also use the `Docker Desktop GUI` to stop the container. 54 | 55 | The `docker-compose down` command will stop the container until the next time you start it back up using the command: 56 | * `docker-compose up -d` 57 | 58 | Now that we have our PostgreSQL server up and running, click the link below and let's start creating tables and inserting data. 59 | 60 | Go to [WALKTHROUGH_BUILD](WALKTHROUGH_2_BUILD.md) 61 | 62 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 63 | 64 | 65 | 66 | -------------------------------------------------------------------------------- /walkthrough/WALKTHROUGH_2_BUILD.md: -------------------------------------------------------------------------------- 1 | ## Basic/Intermediate SQL Code Challenge 2 | 3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 9 | 10 | ### ETL (Extract/Transform/Load) 11 | 12 | ETL is the process data engineers use to extract data from different sources, transform the data into a usable resource, and load that data into the systems that end-users can access and use to solve business problems. 13 | * Although this term is used on a much larger scale for data warehousing, on a much, much smaller scale, we are doing the same thing by: 14 | * Extracting raw data from CSV files. 15 | * Transforming the data to make it usable. 16 | * Loading the data into tables within our database. 17 | 18 | This process is also known as `Data Wrangling`. 19 | 20 | In Data Analysis, the analyst must ensure that the data is 'clean' before doing any analysis. 'Dirty' data can lead to unreliable, inaccurate and/or misleading results. 21 | 22 | * Garbage in = garbage out. 23 | 24 | These are some of the steps that can be taken to properly prepare this dataset for analysis. 25 | 26 | * Check for duplicate entries and remove them. 27 | * Remove extra spaces and/or other invalid characters. 28 | * Separate or combine values as needed. 29 | * Correct misspelled or incorrectly entered data. 30 | * Check for null or empty values. 31 | 32 | Let's create a new file in [source_data/scripts/](../source_data/scripts/) and name it `build_tables.sql`. 33 | 34 | Before we can transform the data, we must get the data into the database. Let's create a script that will consistently build and rebuild our schema and tables. We will use the `countries.csv` to create the countries table. 35 | 36 | ```sql 37 | -- We must ensure that our data is properly organized. Let's create a schema 38 | -- specifically for importing/copying data from our CSV files. 39 | CREATE SCHEMA IF NOT EXISTS import_data; 40 | -- Create import_data.countries table 41 | -- Drop this table every time this script is run to ensure repeatability. 
42 | DROP TABLE IF EXISTS import_data.countries; 43 | -- Create a new table and add a country_id to use as a primary key. 44 | -- We will initially insert all the data as TEXT to ensure there aren't any errors during the COPY. 45 | -- We will also create a created_on field that will have the date that the table was created. 46 | CREATE TABLE import_data.countries ( 47 | -- Create an auto-incrementing, unique key for every row. 48 | -- This will also be used as our Primary Key. 49 | country_id INT GENERATED ALWAYS AS IDENTITY, 50 | country_name TEXT, 51 | country_code_2 TEXT, 52 | country_code_3 TEXT, 53 | region TEXT, 54 | sub_region TEXT, 55 | intermediate_region TEXT, 56 | created_on date, 57 | PRIMARY KEY (country_id) 58 | ); 59 | -- We will use the COPY statement to extract the data from our CSV files. 60 | COPY import_data.countries ( 61 | country_name, 62 | country_code_2, 63 | country_code_3, 64 | region, 65 | sub_region, 66 | intermediate_region 67 | ) 68 | -- PostgreSQL stores its data in /var/lib/postgresql/ 69 | -- In the docker-compose.yaml file, we created a volume named 'source_data/' that our container can access. 70 | FROM '/var/lib/postgresql/source_data/csv_data/countries.csv' 71 | -- The CSV files are comma separated and include headers. 72 | WITH DELIMITER ',' HEADER CSV; 73 | ``` 74 | Execute the complete script and SUCCESS! We have inserted the `countries.csv` into the `import_data.countries` table. 75 | 76 | Let's test our new table and take a look at the first 5 rows of data. 77 | 78 | ```sql 79 | SELECT * FROM import_data.countries LIMIT 5; 80 | ``` 81 | **Results** 82 | 83 | country_id|country_name |country_code_2|country_code_3|region |sub_region |intermediate_region|created_on| 84 | ----------|----------------|--------------|--------------|--------|-----------------|-------------------|----------| 85 | 1|!afg!hANistan |af? |afg |asia |$southern asia | | | 86 | 2| alba$nia |aL |alb |europe! 
|southern *europe | | | 87 | 3|Alb?ania |AL |alb |eur#ope |$southern e#urope| | | 88 | 4|?algeria? |d!z |dza |africa |northern africa | | | 89 | 5|americ#an samoa?|as |as!m |0oceania|polyne$sia | | | 90 | 91 | As you can see, some countries do not have an `intermediate_region` value and none of our entries have a `created_on` value yet. Also notice that some of the country names have special characters, upper/lower case characters and whitespace. 92 | 93 | This will definitely cause some problems if not corrected, however, right now our only concern is to get the data into our database. 94 | 95 | Let's use the same process to add all of the CSV files in `source_data/csv_data/`: 96 | * cities.csv 97 | * currencies.csv 98 | * languages.csv 99 | 100 | After adding all the necessary code, our `build_tables.sql` file should now look like this... 101 | 102 | ```sql 103 | /* 104 | SQL Code Challenge 105 | Author: Jaime M. Shaker 106 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 107 | Website: https://www.shaker.dev 108 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 109 | 110 | File Name: build_tables.sql 111 | */ 112 | 113 | -- We must ensure that our data is properly organized. Let's create a schema 114 | -- specifically for importing/copying data from our CSV files. 115 | CREATE SCHEMA IF NOT EXISTS import_data; 116 | -- Create import_data.countries table 117 | -- Drop this table every time this script is run to ensure repeatability. 118 | DROP TABLE IF EXISTS import_data.countries; 119 | -- Create a new table and add a country_id to use as a primary key. 120 | -- We will initially insert all the data as TEXT to ensure there aren't any errors during the COPY. 121 | -- We will also create a created_on field that will have the date that the table was created. 122 | CREATE TABLE import_data.countries ( 123 | -- Create an auto-incrementing, unique key for every row. 124 | -- This will also be used as our Primary Key. 
125 | country_id INT GENERATED ALWAYS AS IDENTITY, 126 | country_name TEXT, 127 | country_code_2 TEXT, 128 | country_code_3 TEXT, 129 | region TEXT, 130 | sub_region TEXT, 131 | intermediate_region TEXT, 132 | created_on date, 133 | PRIMARY KEY (country_id) 134 | ); 135 | -- We will use the COPY statement to extract the data from our CSV files. 136 | COPY import_data.countries ( 137 | country_name, 138 | country_code_2, 139 | country_code_3, 140 | region, 141 | sub_region, 142 | intermediate_region 143 | ) 144 | -- PostgreSQL stores its data in /var/lib/postgresql/ 145 | -- In the docker-compose.yaml file, we created a volume named 'source_data/' that our container can access. 146 | FROM '/var/lib/postgresql/source_data/csv_data/countries.csv' 147 | -- The CSV files are comma separated and include headers. 148 | WITH DELIMITER ',' HEADER CSV; 149 | 150 | -- Using the same process, let's create tables for all of our csv files. 151 | /* Create import.cities table */ 152 | DROP TABLE IF EXISTS import_data.cities; 153 | CREATE TABLE import_data.cities ( 154 | city_id INT GENERATED ALWAYS AS IDENTITY, 155 | city_name TEXT, 156 | latitude TEXT, 157 | longitude TEXT, 158 | country_code_2 TEXT, 159 | capital TEXT, 160 | population TEXT, 161 | insert_date TEXT, 162 | PRIMARY KEY (city_id) 163 | ); 164 | 165 | COPY import_data.cities ( 166 | city_name, 167 | latitude, 168 | longitude, 169 | country_code_2, 170 | capital, 171 | population, 172 | insert_date 173 | ) 174 | FROM '/var/lib/postgresql/source_data/csv_data/cities.csv' 175 | WITH DELIMITER ',' HEADER CSV; 176 | 177 | /* Create import.currencies table */ 178 | DROP TABLE IF EXISTS import_data.currencies; 179 | CREATE TABLE import_data.currencies ( 180 | currency_id INT GENERATED ALWAYS AS IDENTITY, 181 | country_code_2 TEXT, 182 | currency_name TEXT, 183 | currency_code TEXT, 184 | PRIMARY KEY (currency_id) 185 | ); 186 | 187 | COPY import_data.currencies ( 188 | country_code_2, 189 | currency_name, 190 | currency_code 
191 | ) 192 | FROM '/var/lib/postgresql/source_data/csv_data/currencies.csv' 193 | WITH DELIMITER ',' HEADER CSV; 194 | 195 | /* Create import.languages table */ 196 | DROP TABLE IF EXISTS import_data.languages; 197 | CREATE TABLE import_data.languages ( 198 | language_id INT GENERATED ALWAYS AS IDENTITY, 199 | language TEXT, 200 | country_code_2 TEXT, 201 | PRIMARY KEY (language_id) 202 | ); 203 | 204 | COPY import_data.languages ( 205 | language, 206 | country_code_2 207 | ) 208 | FROM '/var/lib/postgresql/source_data/csv_data/languages.csv' 209 | WITH DELIMITER ',' HEADER CSV; 210 | ``` 211 | Execute the complete script and our data has now been successfully extracted from the CSV files and inserted/copied into our database. This script can be repeated over and over and it will always give the same results. 212 | 213 | We can now TRANSFORM our data into a usable resource. 214 | 215 | Go to [WALKTHROUGH_NORMALIZE](WALKTHROUGH_3_NORMALIZE.md) 216 | 217 | 218 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 219 | 220 | 221 | 222 | -------------------------------------------------------------------------------- /walkthrough/WALKTHROUGH_3_NORMALIZE.md: -------------------------------------------------------------------------------- 1 | ## Basic/Intermediate SQL Code Challenge 2 | 3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 9 | 10 | ### Normalization 11 | 12 | According to Wikipedia, [Database normalization](https://en.wikipedia.org/wiki/Database_normalization) is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. Normalization entails organizing the columns (attributes) and tables (relations) of a database to ensure that their dependencies are properly enforced by database integrity constraints. 13 | 14 | Database normalization forms can be quite involved and go beyond the scope of this walkthrough. In this section, we will restructure our tables with the proper constraints and relationships, and "clean" the data as we insert it into our new tables. 15 | 16 | To keep things organized, we will create our new tables within the `cleaned_data` schema. 17 | 18 | Create a new file in `source_data/scripts/` and name it `normalize_tables.sql`. Upon inspecting the `import_data.countries` table, we can see that there are duplicate entries, mixed upper/lower case characters, added white space and special characters. 19 | 20 | ```sql 21 | SELECT 22 | country_name, 23 | country_code_2 24 | FROM 25 | import_data.countries 26 | LIMIT 5; 27 | ``` 28 | 29 | **Results:** 30 | 31 | country_name |country_code_2| 32 | ----------------|--------------| 33 | !afg!hANistan |af? | 34 | alba$nia |aL | 35 | Alb?ania |AL | 36 | ?algeria? |d!z | 37 | americ#an samoa?|as | 38 | 39 | Before we can insert our data into our new table, we must ensure that the data has been 'cleaned'. During the insert, let's... 40 | * Remove extra spaces (whitespace) 41 | * Remove special characters. 42 | * Convert all characters to lower case. 43 | 44 | ```sql 45 | /* 46 | SQL Code Challenge 47 | Author: Jaime M. 
Shaker 48 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 49 | Website: https://www.shaker.dev 50 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 51 | 52 | File Name: normalize_tables.sql 53 | */ 54 | 55 | -- We must ensure that our data is properly organized. Let's create a schema 56 | -- specifically for our transformed, clean data. 57 | CREATE SCHEMA IF NOT EXISTS cleaned_data; 58 | -- Create countries table 59 | -- Drop this table every time this script is run to ensure repeatability. 60 | -- We are adding CASCADE for when we add foreign key relationships. 61 | DROP TABLE IF EXISTS cleaned_data.countries CASCADE; 62 | -- We must presuppose that the data in its current state is unusable/unreliable. 63 | -- 64 | CREATE TABLE cleaned_data.countries ( 65 | -- The country_id was automatically incremented and can be entered as an integer. 66 | country_id INT NOT NULL, 67 | -- Different from other database systems, in PostgreSQL, there is no performance difference among the three character types (char, varchar and text). 68 | -- Use text to ensure that there are no errors due to string length. 69 | country_name TEXT NOT NULL, 70 | -- The data appears to have duplicate entries so we will remove them once the data has been cleaned of 71 | -- any unwanted characters. 72 | country_code_2 varchar(2) NOT NULL, 73 | country_code_3 varchar(3) NOT NULL, 74 | region TEXT, 75 | sub_region TEXT, 76 | intermediate_region TEXT, 77 | created_on date, 78 | PRIMARY KEY (country_id) 79 | ); 80 | 81 | INSERT INTO cleaned_data.countries ( 82 | country_id, 83 | country_name, 84 | country_code_2, 85 | country_code_3, 86 | region, 87 | sub_region, 88 | intermediate_region, 89 | created_on 90 | ) 91 | ( 92 | SELECT 93 | i.country_id, 94 | -- regexp_replace() function removes any special characters with a simple regular expression. 95 | -- trim() function removes white space from either end of the string. 96 | -- lower() function converts all characters to lowercase. 
97 | trim(lower(regexp_replace(i.country_name, '[^\w\s^.]', '', 'gi'))), 98 | -- Properly cast type from TEXT into new table 99 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^.]', '', 'gi')))::varchar, 100 | trim(lower(regexp_replace(i.country_code_3, '[^\w\s^.]', '', 'gi')))::varchar, 101 | trim(lower(regexp_replace(i.region, '[^\w\s^.]', '', 'gi'))), 102 | trim(lower(regexp_replace(i.sub_region, '[^\w\s^.]', '', 'gi'))), 103 | trim(lower(regexp_replace(i.intermediate_region, '[^\w\s^.]', '', 'gi'))), 104 | -- Use the built-in function current_date to insert current date into created_on field. 105 | current_date 106 | FROM 107 | import_data.countries AS i 108 | ); 109 | ``` 110 | Now we can run the same query in our new `cleaned_data.countries` table and see our updated results. 111 | 112 | ```sql 113 | SELECT 114 | country_name, 115 | country_code_2 116 | FROM 117 | cleaned_data.countries 118 | LIMIT 5; 119 | ``` 120 | 121 | **Results:** 122 | 123 | country_name |country_code_2| 124 | --------------|--------------| 125 | afghanistan |af | 126 | albania |al | 127 | albania |al | 128 | algeria |dz | 129 | american samoa|as | 130 | 131 | The data has been 'cleaned' and is now in a more usable form. We still note that there are duplicate entries. Before removing any duplicates, let's continue cleaning the data with our `normalize_tables.sql` script and insert the data into the remaining `cleaned_data` tables. 132 | 133 | ```sql 134 | /* 135 | SQL Code Challenge 136 | Author: Jaime M. Shaker 137 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 138 | Website: https://www.shaker.dev 139 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 140 | 141 | File Name: normalize_tables.sql 142 | */ 143 | 144 | -- We must ensure that our data is properly organized. Let's create a schema 145 | -- specifically for our transformed, clean data. 
146 | CREATE SCHEMA IF NOT EXISTS cleaned_data; 147 | -- Create countries table 148 | -- Drop this table every time this script is run to ensure repeatability. 149 | -- We are adding CASCADE for when we add foreign key relationships. 150 | DROP TABLE IF EXISTS cleaned_data.countries CASCADE; 151 | -- We must presuppose that the data in its current state is unusable/unreliable. 152 | -- 153 | CREATE TABLE cleaned_data.countries ( 154 | -- The country_id was automatically incremented and can be entered as an integer. 155 | country_id INT NOT NULL, 156 | -- Different from other database systems, in PostgreSQL, there is no performance difference among the three character types (char, varchar and text). 157 | -- Use text to ensure that there are no errors due to string length. 158 | country_name TEXT NOT NULL, 159 | -- The data appears to have duplicate entries so we will remove them once the data has been cleaned of 160 | -- any unwanted characters. 161 | country_code_2 varchar(2) NOT NULL, 162 | country_code_3 varchar(3) NOT NULL, 163 | region TEXT, 164 | sub_region TEXT, 165 | intermediate_region TEXT, 166 | created_on date, 167 | PRIMARY KEY (country_id) 168 | ); 169 | 170 | INSERT INTO cleaned_data.countries ( 171 | country_id, 172 | country_name, 173 | country_code_2, 174 | country_code_3, 175 | region, 176 | sub_region, 177 | intermediate_region, 178 | created_on 179 | ) 180 | ( 181 | SELECT 182 | i.country_id, 183 | -- regexp_replace() function removes any special characters with a simple regular expression. 184 | -- trim() function removes white space from either end of the string. 185 | -- lower() function converts all characters to lowercase. 
186 | trim(lower(regexp_replace(i.country_name, '[^\w\s^.]', '', 'gi'))), 187 | -- Properly cast type from TEXT into new table 188 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^.]', '', 'gi')))::varchar, 189 | trim(lower(regexp_replace(i.country_code_3, '[^\w\s^.]', '', 'gi')))::varchar, 190 | trim(lower(regexp_replace(i.region, '[^\w\s^.]', '', 'gi'))), 191 | trim(lower(regexp_replace(i.sub_region, '[^\w\s^.]', '', 'gi'))), 192 | trim(lower(regexp_replace(i.intermediate_region, '[^\w\s^.]', '', 'gi'))), 193 | -- Use the built-in function current_date to insert current date into created_on field. 194 | current_date 195 | FROM 196 | import_data.countries AS i 197 | ); 198 | 199 | /* Create cleaned_data.cities table */ 200 | DROP TABLE IF EXISTS cleaned_data.cities; 201 | CREATE TABLE cleaned_data.cities ( 202 | city_id int, 203 | city_name TEXT, 204 | latitude float, 205 | longitude float, 206 | country_code_2 varchar(2) NOT NULL, 207 | capital boolean, 208 | population int, 209 | insert_date date, 210 | PRIMARY KEY (city_id) 211 | ); 212 | 213 | INSERT INTO cleaned_data.cities ( 214 | city_id, 215 | city_name, 216 | latitude, 217 | longitude, 218 | country_code_2, 219 | capital, 220 | population, 221 | insert_date 222 | ) 223 | ( 224 | SELECT 225 | i.city_id, 226 | trim(lower(regexp_replace(i.city_name, '[^\w\s^.]', '', 'gi'))), 227 | i.latitude::float, 228 | i.longitude::float, 229 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^.]', '', 'gi')))::varchar, 230 | i.capital::boolean, 231 | i.population::int, 232 | i.insert_date::date 233 | FROM 234 | import_data.cities AS i 235 | ); 236 | 237 | /* Create cleaned_data.currencies table */ 238 | DROP TABLE IF EXISTS cleaned_data.currencies; 239 | CREATE TABLE cleaned_data.currencies ( 240 | currency_id int, 241 | country_code_2 varchar(2) NOT NULL, 242 | currency_name TEXT, 243 | currency_code TEXT, 244 | PRIMARY KEY (currency_id) 245 | ); 246 | 247 | INSERT INTO cleaned_data.currencies ( 248 | 
currency_id, 249 | country_code_2, 250 | currency_name, 251 | currency_code 252 | ) 253 | ( 254 | SELECT 255 | currency_id, 256 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^.]', '', 'gi')))::varchar, 257 | trim(lower(regexp_replace(i.currency_name, '[^\w\s^.]', '', 'gi'))), 258 | trim(lower(regexp_replace(i.currency_code, '[^\w\s^.]', '', 'gi'))) 259 | FROM 260 | import_data.currencies AS i 261 | ); 262 | 263 | /* Create cleaned_data.languages table */ 264 | DROP TABLE IF EXISTS cleaned_data.languages; 265 | CREATE TABLE cleaned_data.languages ( 266 | language_id int, 267 | language TEXT, 268 | country_code_2 varchar(2) NOT NULL, 269 | PRIMARY KEY (language_id) 270 | ); 271 | 272 | INSERT INTO cleaned_data.languages ( 273 | language_id, 274 | language, 275 | country_code_2 276 | ) 277 | ( 278 | SELECT 279 | language_id, 280 | trim(lower(regexp_replace(i.language, '[^\w\s^.]', '', 'gi'))), 281 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^.]', '', 'gi')))::varchar 282 | FROM 283 | import_data.languages AS i 284 | ); 285 | 286 | ``` 287 | 288 | Click the link to view the complete [normalize_tables.sql](../source_data/scripts/normalize_tables.sql) 289 | 290 | Once all of your tables are loaded with our cleaned data, click the link below. 291 | 292 | Go to [WALKTHROUGH_REMOVE_DUPLICATES](WALKTHROUGH_4_REMOVE_DUPLICATES.md) 293 | 294 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 295 | 296 | 297 | 298 | -------------------------------------------------------------------------------- /walkthrough/WALKTHROUGH_4_REMOVE_DUPLICATES.md: -------------------------------------------------------------------------------- 1 | ## Basic/Intermediate SQL Analytics Code Challenge 2 | 3 | **Author**: Jaime M. Shaker<br>
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 8 | 9 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 10 | 11 | ### Removing Duplicates 12 | 13 | Removing duplicate entries is part of the Normalization process. I've placed this process in its own file to reduce clutter in the `normalize_tables.sql` script. 14 | 15 | Create a new file in `source_data/scripts/` and name it `remove_duplicates.sql`. In this script, we remove duplicate entries in our tables. 16 | 17 | After inspecting our data, we see that every country must have a unique `country_code_2`. 18 | 19 | Let's create a query that checks for multiple entries of `country_code_2` in our `cleaned_data.countries` table. 20 | 21 | ```sql 22 | -- Remove Duplicate Entries 23 | -- Check for duplicate entries 24 | SELECT 25 | -- Get the columns. 26 | country_code_2, 27 | -- Count how many times the country_code_2 occurs. 28 | count(*) 29 | FROM 30 | cleaned_data.countries 31 | GROUP BY 32 | -- Using an aggregate function forces us to group all exact country_codes together. 33 | country_code_2 34 | HAVING 35 | -- Only select values that have a count greater than one (multiple entries). 36 | count(*) > 1; 37 | ``` 38 | 39 | **Results:** 40 | 41 | country_code_2|count| 42 | --------------|-----| 43 | cn | 3| 44 | al | 2| 45 | sa | 2| 46 | ie | 2| 47 | ly | 2| 48 | mo | 2| 49 | re | 2| 50 | ph | 2| 51 | zw | 2| 52 | az | 2| 53 | kw | 2| 54 | cy | 3| 55 | rs | 2| 56 | mh | 2| 57 | 58 | According to our results, there are 14 `country_code_2` values with multiple entries. Let's create another query that `deletes` the duplicate entries. 59 | 60 | ```sql 61 | -- Delete duplicate entries 62 | DELETE 63 | FROM 64 | -- Add an alias to the ids we wish to keep 65 | cleaned_data.countries AS clean_data_1 66 | USING 67 | -- Add an alias to the duplicate ids 68 | cleaned_data.countries AS clean_data_2 69 | WHERE 70 | -- This statement will remove the greater value ids. 
71 | clean_data_1.country_id > clean_data_2.country_id 72 | AND 73 | -- This statement ensures that both values are identical. 74 | clean_data_1.country_code_2 = clean_data_2.country_code_2; 75 | ``` 76 | 77 | Now, if we were to rerun our first query which checked for duplicate entries, it should not return anything. 78 | 79 | ```sql 80 | -- Remove Duplicate Entries 81 | -- Check for duplicate entries 82 | SELECT 83 | -- Get the columns. 84 | country_code_2, 85 | -- Count how many times the country_code_2 occurs. 86 | count(*) 87 | FROM 88 | cleaned_data.countries 89 | GROUP BY 90 | -- Using an aggregate function forces us to group all like names together. 91 | country_code_2 92 | HAVING 93 | -- Only select values that have a count greater than one (multiple entries). 94 | count(*) > 1; 95 | ``` 96 | **Results:** 97 | 98 | country_code_2|count| 99 | --------------|-----| 100 | 101 | There are no results. We have successfully deleted the duplicate entries in the `cleaned_data.countries` table. We can also take a peek at our `cleaned_data.countries` table and check if it removed the duplicate entries. Initially we saw that `Albania` had multiple entries. 102 | 103 | ```sql 104 | SELECT 105 | country_name, 106 | country_code_2 107 | FROM 108 | cleaned_data.countries 109 | LIMIT 5; 110 | ``` 111 | 112 | **Results:** 113 | country_name |country_code_2| 114 | --------------|--------------| 115 | afghanistan |af | 116 | albania |al | 117 | algeria |dz | 118 | american samoa|as | 119 | andorra |ad | 120 | 121 | The multiple entries for `Albania` have been deleted. 122 | 123 | This query works for the `cleaned_data.countries` table because every country has one unique `country_code_2`. The problem with using this method with the other tables is that many different countries share cities of the same name, currency type and languages. The other tables would have to search for a combination of the table fields. 
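As an illustration of that composite approach (a hypothetical sketch, not one of the challenge scripts), a duplicate check on `cleaned_data.cities` would need to group on the combination of city name and country code rather than on a single column:

```sql
-- Hypothetical composite duplicate check for cleaned_data.cities.
-- A city name alone is not unique across countries, so we group on
-- the combination of city_name and country_code_2 instead.
SELECT
	city_name,
	country_code_2,
	count(*)
FROM
	cleaned_data.cities
GROUP BY
	city_name,
	country_code_2
HAVING
	count(*) > 1;
```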
To keep this walkthrough simple, the only table in this dataset that contains duplicate entries is the `cleaned_data.countries` table. 
124 | 
125 | If for whatever reason you need to start over, you can easily rerun the `build_tables.sql` and then the `normalize_tables.sql` scripts and regenerate the data consistently. 
126 | 
127 | The complete `remove_duplicates.sql` script should look like... 
128 | 
129 | ```sql 
130 | /* 
131 | 	SQL Code Challenge 
132 | 	Author: Jaime M. Shaker 
133 | 	Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 
134 | 	Website: https://www.shaker.dev 
135 | 	LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 
136 | 
137 | 	File Name: remove_duplicates.sql 
138 | */ 
139 | 
140 | -- Remove Duplicate Entries 
141 | -- Check for duplicate entries in cleaned_data.countries 
142 | SELECT 
143 | 	-- Get the columns. 
144 | 	country_code_2, 
145 | 	-- Count how many times the country_code_2 occurs. 
146 | 	count(*) 
147 | FROM 
148 | 	cleaned_data.countries 
149 | GROUP BY 
150 | 	-- Using an aggregate function forces us to group all like names together. 
151 | 	country_code_2 
152 | HAVING 
153 | 	-- Only select values that have a count greater than one (multiple entries). 
154 | 	count(*) > 1; 
155 | 
156 | -- Delete duplicate entries 
157 | 
158 | DELETE 
159 | FROM 
160 | 	-- Alias for the rows that will be deleted (the duplicates) 
161 | 	cleaned_data.countries AS clean_data_1 
162 | USING 
163 | 	-- Alias for the rows we wish to keep 
164 | 	cleaned_data.countries AS clean_data_2 
165 | WHERE 
166 | 	-- This statement removes the rows with the greater id values. 
167 | 	clean_data_1.country_id > clean_data_2.country_id 
168 | AND 
169 | 	-- This statement ensures that both country codes are identical. 
170 | 	clean_data_1.country_code_2 = clean_data_2.country_code_2; 
171 | ``` 
172 | 
173 | With our table data cleaned and free of duplicate entries, we can clean up our database before progressing. 
174 | 
175 | Go to [WALKTHROUGH_CLEANUP](WALKTHROUGH_5_CLEANUP.md) 
176 | 
177 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks!
:exclamation: 178 | 179 | 180 | 181 | -------------------------------------------------------------------------------- /walkthrough/WALKTHROUGH_5_CLEANUP.md: -------------------------------------------------------------------------------- 1 | ## Basic/Intermediate SQL Analytics Code Challenge 2 | 3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 
8 | 
9 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 
10 | 
11 | ### Clean Up 
12 | 
13 | Now that our tables have been cleaned, we can actually begin answering the questions of the `SQL Coding Challenge`. Before we do that, let's do a little clean up and `DROP` our import schema and the tables associated with it, as they are no longer needed. 
14 | 
15 | If for whatever reason you need to start over, you can easily rerun the `build_tables.sql` and then the `normalize_tables.sql` scripts and regenerate the data consistently. 
16 | 
17 | `DROP TABLE` removes tables from the database. To empty a table of rows without destroying the table, use `DELETE` or `TRUNCATE`. 
18 | 
19 | The complete `cleanup_db.sql` script should look like... 
20 | 
21 | ```sql 
22 | /* 
23 | 	SQL Code Challenge 
24 | 	Author: Jaime M. Shaker 
25 | 	Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 
26 | 	Website: https://www.shaker.dev 
27 | 	LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 
28 | 
29 | 	File Name: cleanup_db.sql 
30 | */ 
31 | 
32 | -- Drop All import_data Tables and Schema 
33 | 
34 | DROP TABLE import_data.countries; 
35 | DROP TABLE import_data.cities; 
36 | DROP TABLE import_data.currencies; 
37 | DROP TABLE import_data.languages; 
38 | DROP SCHEMA import_data CASCADE; 
39 | ``` 
40 | 
41 | With our database cleaned of any unnecessary tables and schema, we can now add `Foreign Key` relationships. 
42 | 
43 | Go to [WALKTHROUGH_RELATIONSHIPS](WALKTHROUGH_6_RELATIONSHIPS.md) 
44 | 
45 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 
46 | 
47 | 
48 | 
49 | 
--------------------------------------------------------------------------------
/walkthrough/WALKTHROUGH_6_RELATIONSHIPS.md:
--------------------------------------------------------------------------------
1 | ## Basic/Intermediate SQL Code Challenge 
2 | 
3 | **Author**: Jaime M. Shaker<br>
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 9 | 10 | ### Foreign Keys 11 | 12 | All of our tables in the `cleaned_data` schema share a common field named `country_code_2`. We also know that this field is `UNIQUE` in the `cleaned_data.countries` table because there can only be one country code per country. 13 | 14 | Using this information, we can make the `cleaned_data.countries` table the parent table and all other tables can have a [One-to-One](https://fmhelp.filemaker.com/help/18/fmp/en/index.html#page/FMP_Help/one-to-one-relationships.html) or a [One-to-Many](https://help.claris.com/archive/help/18/fmp/en/index.html#page/FMP_Help%2Fone-to-many-relationships.html%23) `FOREIGN KEY` relationship to it. 15 | 16 | First, create a new script and name it `create_relationships.sql`. Create a query to `ALTER` the `cleaned_data.countries` table and add the `UNIQUE` constraint to the `country_code_2` field. Then we can `ALTER` the child tables and add a `FOREIGN KEY` constraint. 17 | 18 | ```sql 19 | /* 20 | SQL Code Challenge 21 | Author: Jaime M. Shaker 22 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 23 | Website: https://www.shaker.dev 24 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 25 | 26 | File Name: create_relationships.sql 27 | */ 28 | 29 | -- Alter cleaned_data.countries and add the UNIQUE constraint to country_code_2 30 | 31 | ALTER TABLE 32 | cleaned_data.countries 33 | ADD CONSTRAINT 34 | unique_country_code_2 35 | UNIQUE (country_code_2); 36 | 37 | -- Alter all other tables and add a foreign key constraint and reference. 
38 | -- Create Foreign Key relationship for cleaned_data.cities 
39 | ALTER TABLE 
40 | 	-- Table to be altered 
41 | 	cleaned_data.cities 
42 | ADD CONSTRAINT 
43 | 	-- Give this constraint a name 
44 | 	fk_country_city 
45 | -- Which key in cleaned_data.cities is a foreign key 
46 | FOREIGN KEY (country_code_2) 
47 | -- Which key to reference from the parent table 
48 | REFERENCES cleaned_data.countries (country_code_2); 
49 | 
50 | -- Create Foreign Key relationship for cleaned_data.currencies 
51 | ALTER TABLE 
52 | 	cleaned_data.currencies 
53 | ADD CONSTRAINT 
54 | 	fk_country_currencies 
55 | FOREIGN KEY (country_code_2) 
56 | REFERENCES cleaned_data.countries (country_code_2); 
57 | 
58 | -- Create Foreign Key relationship for cleaned_data.languages 
59 | ALTER TABLE 
60 | 	cleaned_data.languages 
61 | ADD CONSTRAINT 
62 | 	fk_country_languages 
63 | FOREIGN KEY (country_code_2) 
64 | REFERENCES cleaned_data.countries (country_code_2); 
65 | 
66 | ``` 
67 | 
68 | Once you execute the complete script and have created a `FOREIGN KEY` constraint for all of the other tables, your `Entity Relationship Diagram` should look something like this. 
69 | 
70 | ![alt text](../images/ERD.PNG) 
71 | 
72 | :warning:**Bonus**:warning: 
73 | * Automate this mini ETL process using a [Bash shell script](WALKTHROUGH_BONUS_BASH.md)! 
74 | 
75 | We are now ready to complete the `SQL Coding Challenge`. 
76 | 
77 | Click the link below. 
78 | 
79 | Go to [WALKTHROUGH_CODE_CHALLENGE](WALKTHROUGH_7_CODE_CHALLENGE.md) 
80 | 
81 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 
82 | 
83 | 
84 | 
85 | 
--------------------------------------------------------------------------------
/walkthrough/WALKTHROUGH_7_CODE_CHALLENGE.md:
--------------------------------------------------------------------------------
1 | ## Basic/Intermediate SQL Code Challenge 
2 | 
3 | **Author**: Jaime M. Shaker<br>
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 9 | 10 | ### SQL Code Challenge 11 | 12 | :warning: To start over with an empty database, run the the `drop_all_tables.sql` script. :warning: 13 | 14 | 1. Create Database, Tables and Relations. 15 | Using the CSV files located in `source_data/csv_data`, create your new SQL database and tables with the properly formatted data. 16 | 17 | * Add a numeric, auto-incrementing Primary Key to every table. 18 | * In the `countries` table, add the column `created_on` with the current date. 19 | * Create a one-to-one and one-to-many relationship with the countries table as the parent table. 20 | 21 | :exclamation: This question has already been completed with the ETL process. :exclamation: 22 | 23 | 2. List Regions and Country Count 24 | List all of the regions and the total number of countries in each region. Order by country count in descending order and capitalize the region name. 25 | 26 |
27 | Click to expand expected results! 28 | 29 | ##### Expected Results: 30 | 31 | region |country_count| 32 | ---------|-------------| 33 | Africa | 59| 34 | Americas | 57| 35 | Asia | 50| 36 | Europe | 48| 37 | Oceania | 26| 38 | Antartica| 1| 39 | 40 |
41 |

42 | 43 |
44 | Click to expand answer! 45 | 46 | ##### Answer 47 | ```sql 48 | SELECT 49 | -- initcap() capitalizes the first letter of every word in a string. 50 | initcap(region) AS region, 51 | count(*) AS country_count 52 | FROM 53 | cleaned_data.countries 54 | GROUP BY 55 | -- Aggregate functions 'count()' require you to group all column fields. 56 | region 57 | ORDER BY 58 | country_count DESC; 59 | ``` 60 |
61 |
62 | 63 | 3. List Sub-Regions and City Count 64 | List all of the sub-regions and the total number of cities in each sub-region. Order by sub-region name alphabetically. 65 | 66 |
67 | Click to expand expected results! 68 | 69 | ##### Expected Results: 70 | 71 | sub_region |city_count| 72 | -------------------------------|----------| 73 | Australia And New Zealand | 329| 74 | Central Asia | 560| 75 | Eastern Asia | 3164| 76 | Eastern Europe | 2959| 77 | Latin America And The Caribbean| 7204| 78 | Melanesia | 60| 79 | Micronesia | 15| 80 | Northern Africa | 1152| 81 | Northern America | 5844| 82 | Northern Europe | 2025| 83 | Polynesia | 22| 84 | Southeastern Asia | 2627| 85 | Southern Asia | 6848| 86 | Southern Europe | 3238| 87 | Subsaharan Africa | 3223| 88 | Western Asia | 1400| 89 | Western Europe | 3952| 90 | 91 |
92 |

93 | 94 |
95 | Click to expand answer! 96 | 97 | ##### Answer 98 | ```sql 99 | SELECT 100 | initcap(t1.sub_region) AS sub_region, 101 | count(*) AS city_count 102 | FROM 103 | cleaned_data.countries AS t1 104 | JOIN 105 | cleaned_data.cities AS t2 106 | ON 107 | t1.country_code_2 = t2.country_code_2 108 | GROUP BY 109 | t1.sub_region 110 | ORDER BY 111 | t1.sub_region; 112 | ``` 113 |
114 |
115 | 116 | 4. Specific Sub-Region and String Functions 117 | List all of the countries and the total number of cities in the Northern Europe sub-region. List the country names in uppercase and order the list by the length of the country name and alphabetically in ascending order. 118 | 119 |
120 | Click to expand expected results! 121 | 122 | ##### Expected Results: 123 | 124 | country_name |city_count| 125 | --------------|----------| 126 | JERSEY | 1| 127 | LATVIA | 39| 128 | NORWAY | 127| 129 | SWEDEN | 148| 130 | DENMARK | 75| 131 | ESTONIA | 20| 132 | FINLAND | 142| 133 | ICELAND | 12| 134 | IRELAND | 64| 135 | LITHUANIA | 61| 136 | ISLE OF MAN | 2| 137 | FAROE ISLANDS | 29| 138 | UNITED KINGDOM| 1305| 139 | 140 |
141 |

142 | 143 |
144 | Click to expand answer! 145 | 146 | ##### Answer 147 | ```sql 148 | SELECT 149 | -- upper() returns your string in uppercase. 150 | upper(t1.country_name) AS country_name, 151 | count(*) AS city_count 152 | FROM 153 | cleaned_data.countries AS t1 154 | JOIN 155 | cleaned_data.cities AS t2 156 | ON 157 | t1.country_code_2 = t2.country_code_2 158 | WHERE 159 | t1.sub_region = 'northern europe' 160 | GROUP BY 161 | t1.country_name 162 | ORDER BY 163 | -- length() returns the number or characters in a string including spaces. 164 | length(t1.country_name), t1.country_name; 165 | ``` 166 |
167 |
168 | 169 | 170 | 5. List Specific Countries by Year 171 | List all of the countries and the total number of cities in the Southern Europe sub-region that were inserted in 2021. Capitalize the country names and order alphabetically by the **LAST** letter of the country name and the number of cities. 172 | 173 |
174 | Click to expand expected results! 175 | 176 | ##### Expected Results: 177 | 178 | country_name |city_count| 179 | ----------------------|----------| 180 | Andorra | 5| 181 | Albania | 11| 182 | Bosnia And Herzegovina| 15| 183 | Croatia | 22| 184 | North Macedonia | 28| 185 | Malta | 32| 186 | Serbia | 58| 187 | Slovenia | 74| 188 | Greece | 64| 189 | Portugal | 109| 190 | Spain | 302| 191 | San Marino | 2| 192 | Montenegro | 12| 193 | Italy | 542| 194 | 195 |
196 |

197 | 198 |
199 | Click to expand answer! 200 | 201 | ##### Answer 202 | ```sql 203 | SELECT 204 | initcap(t1.country_name) AS country_name, 205 | count(*) AS city_count 206 | FROM 207 | cleaned_data.countries AS t1 208 | JOIN 209 | cleaned_data.cities AS t2 210 | ON 211 | t1.country_code_2 = t2.country_code_2 212 | WHERE 213 | t1.sub_region = 'southern europe' 214 | AND 215 | -- extract() & date_part() functions allow you to breakdown dates and timestamps to individual years, month, days, hours..... 216 | EXTRACT('year' FROM t2.insert_date) = 2021 217 | GROUP BY 218 | t1.country_name 219 | ORDER BY 220 | -- substring() function returns a specific portion of a string. 221 | substring(t1.country_name,length(t1.country_name),1), city_count; 222 | ``` 223 |
224 |
225 | 226 | 6. List Anti-Join 227 | List all of the countries in the region of Asia that did **NOT** have a city with an inserted date from June 2021 through Sept 2021. 228 | 229 |
230 | Click to expand expected results! 231 | 232 | ##### Expected Results: 233 | 234 | country_name | 235 | -----------------| 236 | Brunei Darussalam| 237 | Kuwait | 238 | Macao | 239 | Singapore | 240 | 241 |
242 |

243 | 244 |
245 | Click to expand answer! 246 | 247 | ##### Answer 248 | ```sql 249 | SELECT 250 | -- Distinct will only return unique values 251 | DISTINCT initcap(t1.country_name) AS country_name 252 | FROM 253 | cleaned_data.countries AS t1 254 | -- Left join will return all matching values from the left table (cleaned_data.countries) and 255 | -- only the matching values from the right table (cleaned_tables.cities) resulting in NULLS where 256 | -- there are no matches in the left table. 257 | LEFT JOIN 258 | cleaned_data.cities AS t2 259 | ON 260 | t1.country_code_2 = t2.country_code_2 261 | AND 262 | t2.insert_date BETWEEN '2021-06-01' AND '2021-10-01' 263 | WHERE 264 | t1.region = 'asia' 265 | -- Only return values that did NOT have a match with the countries table. 266 | AND 267 | t2.country_code_2 IS NULL; 268 | ``` 269 |
270 |
271 | 272 | 7. Reversable Names 273 | List the country, city name, population and city name length for the city names that are [palindromes](https://en.wikipedia.org/wiki/Palindrome) in the Western Asia sub-region. Format the population with a thousands separator (1,000) and format the length of the city name in roman numerals. Order by the length of the city name in descending order and alphabetically in ascending order. 274 | 275 |
276 | Click to expand expected results! 277 | 278 | ##### Expected Results: 279 | 280 | country_name |city_name|population|roman_numeral_length| 281 | --------------------|---------|----------|--------------------| 282 | Yemen |Hajjah | 46,568 | VI | 283 | Syrian Arab Republic|Hamah | 696,863 | V | 284 | Turkey |Kavak | 21,692 | V | 285 | Turkey |Kinik | 29,803 | V | 286 | Turkey |Tut | 10,161 | III | 287 | 288 |
289 |

290 | 291 |
292 | Click to expand answer! 293 | 294 | ##### Answer 295 | ```sql 296 | SELECT 297 | initcap(t1.country_name) AS country_name, 298 | initcap(t2.city_name) AS city_name, 299 | -- to_char() takes non-string data types and returns them as strings. 300 | to_char(t2.population, '999G999') AS population, 301 | to_char(length(t2.city_name), 'RN') AS roman_numeral_length 302 | FROM 303 | cleaned_data.countries AS t1 304 | JOIN 305 | cleaned_data.cities AS t2 306 | ON 307 | t1.country_code_2 = t2.country_code_2 308 | WHERE 309 | t2.city_name = reverse(t2.city_name) 310 | AND 311 | t1.sub_region = 'western asia' 312 | ORDER BY 313 | length(t2.city_name) DESC, t2.city_name ASC; 314 | ``` 315 |
316 |
317 | 318 | 8. Search with Wildcard and Case 319 | List all of the countries that end in 'stan'. Make your query case-insensitive and list whether the total population of the cities listed is an odd or even number for cities inserted in 2022. Order by whether the population value is odd or even in ascending order and country name in alphabetical order. 320 | 321 |
322 | Click to expand expected results! 323 | 324 | ##### Expected Results: 325 | 326 | country_name|total_population|odd_or_even| 327 | ------------|----------------|-----------| 328 | Afghanistan | 6,006,530 |Even | 329 | Kazakhstan | 4,298,264 |Even | 330 | Kyrgyzstan | 1,017,644 |Even | 331 | Pakistan | 26,344,480 |Even | 332 | Tajikistan | 2,720,953 |Odd | 333 | Turkmenistan| 419,607 |Odd | 334 | Uzbekistan | 3,035,547 |Odd | 335 | 336 |
337 |

338 | 339 |
340 | Click to expand answer! 341 | 342 | ##### Answer 343 | ```sql 344 | SELECT 345 | initcap(t1.country_name) AS country_name, 346 | to_char(sum(t2.population), '99G999G999') total_population, 347 | CASE 348 | WHEN (sum(t2.population) % 2) = 0 349 | THEN 'Even' 350 | ELSE 351 | 'Odd' 352 | END AS odd_or_even 353 | FROM 354 | cleaned_data.countries AS t1 355 | JOIN 356 | cleaned_data.cities AS t2 357 | ON 358 | t1.country_code_2 = t2.country_code_2 359 | WHERE 360 | t1.country_name ILIKE '%stan' 361 | AND 362 | EXTRACT('year' FROM t2.insert_date) = 2022 363 | GROUP BY 364 | t1.country_name 365 | ORDER BY 366 | odd_or_even, country_name; 367 | ``` 368 |
369 |
370 | 371 | 9. Ranking Regions 372 | List the third most populated city ranked by region WITHOUT using limit or offset. List the region name, city name, population and order the results by region. 373 | 374 |
375 | Click to expand expected results! 376 | 377 | ##### Expected Results: 378 | 379 | region |city_name|third_largest_pop| 380 | --------|---------|-----------------| 381 | Africa |Kinshasa | 12,836,000 | 382 | Americas|New York | 18,972,871 | 383 | Asia |Delhi | 32,226,000 | 384 | Europe |Paris | 11,060,000 | 385 | Oceania |Brisbane | 2,360,241 | 386 | 387 |
388 |

389 | 390 |
391 | Click to expand answer! 392 | 393 | ##### Answer 394 | ```sql 395 | WITH get_city_rank_cte AS ( 396 | SELECT 397 | t1.region, 398 | t2.city_name, 399 | t2.population AS third_largest_pop, 400 | DENSE_RANK() OVER (PARTITION BY t1.region ORDER BY t2.population DESC) AS rnk 401 | FROM 402 | cleaned_data.countries AS t1 403 | JOIN 404 | cleaned_data.cities AS t2 405 | ON 406 | t1.country_code_2 = t2.country_code_2 407 | WHERE 408 | t2.population IS NOT NULL 409 | GROUP BY 410 | t1.region, 411 | t2.city_name, 412 | t2.population 413 | ) 414 | SELECT 415 | initcap(region) AS region, 416 | initcap(city_name) AS city_name, 417 | to_char(third_largest_pop, '99G999G999') AS third_largest_pop 418 | FROM 419 | get_city_rank_cte 420 | WHERE 421 | rnk = 3; 422 | ``` 423 |
424 |
425 | 426 | 10. Using Buckets 427 | List the bottom third of all countries in the Western Asia sub-region that speak Arabic. Include the row number and country name. Order by row number. 428 | 429 |
430 | Click to expand expected results! 431 | 432 | ##### Expected Results: 433 | 434 | row_id|country_name | 435 | ------|--------------------| 436 | 9|saudi arabia | 437 | 10|syrian arab republic| 438 | 11|united arab emirates| 439 | 12|yemen | 440 | 441 |
442 |

443 | 444 |
445 | Click to expand answer! 446 | 447 | ##### Answer 448 | ```sql 449 | WITH get_ntile_cte AS ( 450 | SELECT 451 | ROW_NUMBER() OVER (ORDER BY t1.country_name) AS rn, 452 | t1.country_name, 453 | -- ntile() window functions returns groups of data section into 'buckets'. 454 | NTILE(3) OVER (ORDER BY t1.country_name) AS nt 455 | FROM 456 | cleaned_data.countries AS t1 457 | JOIN 458 | cleaned_data.languages AS t2 459 | ON 460 | t1.country_code_2 = t2.country_code_2 461 | WHERE 462 | t1.sub_region = 'western asia' 463 | AND 464 | t2.language = 'arabic' 465 | ) 466 | SELECT 467 | rn AS row_id, 468 | country_name 469 | FROM 470 | get_ntile_cte 471 | WHERE 472 | nt = 3; 473 | ``` 474 |
475 |
476 | 477 | 11. Using Arrays 478 | Create a query that lists country name, capital name, population, languages spoken and currency name for countries in the Northen Africa sub-region. There can be multiple currency names and languages spoken per country. Add multiple values for the same field into an array. 479 | 480 |
481 | Click to expand expected results! 482 | 483 | ##### Expected Results: 484 | 485 | country_name|city_name|population|languages |currencies | 486 | ------------|---------|----------|--------------------------------------------|---------------| 487 | algeria |algiers | 3415811|{french,arabic,kabyle} |algerian dinar | 488 | egypt |cairo | 20296000|{arabic} |egyptian pound | 489 | libya |tripoli | 1293016|{arabic} |libyan dinar | 490 | morocco |rabat | 572717|{arabic,tachelhit,moroccan tamazight,french}|moroccan dirham| 491 | sudan |khartoum | 7869000|{arabic,english} |sudanese pound | 492 | tunisia |tunis | 1056247|{french,arabic} |tunisian dinar | 493 | 494 |
495 |

496 | 497 |
498 | Click to expand answer! 499 | 500 | ##### Answer 501 | ```sql 502 | WITH get_row_values AS ( 503 | SELECT 504 | t1.country_name, 505 | t2.city_name, 506 | t2.population, 507 | -- array_agg() aggregates multiple values and returns them in 'array' format. 508 | array_agg(t3.LANGUAGE) AS languages, 509 | t4.currency_name AS currencies 510 | FROM 511 | cleaned_data.countries AS t1 512 | JOIN 513 | cleaned_data.cities AS t2 514 | ON 515 | t1.country_code_2 = t2.country_code_2 516 | JOIN 517 | cleaned_data.languages AS t3 518 | ON 519 | t1.country_code_2 = t3.country_code_2 520 | JOIN 521 | cleaned_data.currencies AS t4 522 | ON 523 | t1.country_code_2 = t4.country_code_2 524 | WHERE 525 | t1.sub_region = 'northern africa' 526 | AND 527 | t2.capital = TRUE 528 | GROUP BY 529 | t1.country_name, 530 | t2.city_name, 531 | t2.population, 532 | t4.currency_name 533 | ) 534 | SELECT 535 | * 536 | FROM 537 | get_row_values; 538 | ``` 539 |
540 |
541 | 542 | 12. Using Case and Percentages 543 | Produce a query that returns the city names for cities in the U.S. that were inserted on April, 28th 2022. List how many vowels and consonants are present in the city name and concatnate their percentage to the their respective count in parenthesis. 544 | 545 |
546 | Click to expand expected results! 547 | 548 | ##### Expected Results: 549 | 550 | city_name |vowel_count_perc|consonants_count_perc| 551 | ---------------|----------------|---------------------| 552 | standish |2 (25.00%) |6 (75%) | 553 | grand forks |2 (18.18%) |9 (81.82%) | 554 | camano |3 (50.00%) |3 (50%) | 555 | cedar hills |3 (27.27%) |8 (72.73%) | 556 | gladstone |3 (33.33%) |6 (66.67%) | 557 | whitehall |3 (33.33%) |6 (66.67%) | 558 | homewood |4 (50.00%) |4 (50%) | 559 | willowbrook |4 (36.36%) |7 (63.64%) | 560 | port salerno |4 (33.33%) |8 (66.67%) | 561 | vadnais heights|5 (33.33%) |10 (66.67%) | 562 | jeffersonville |5 (35.71%) |9 (64.29%) | 563 | 564 |
565 |

566 | 567 |
568 | Click to expand answer! 569 | 570 | ##### Answer 571 | ```sql 572 | WITH get_letter_count AS ( 573 | SELECT 574 | t1.city_name, 575 | length(t1.city_name) string_length, 576 | -- regexp_replace() returns a vlue that has been manipulated by a regular expression. 577 | length(regexp_replace(t1.city_name, '[aeiou]', '', 'gi')) AS consonant_count 578 | FROM 579 | cleaned_data.cities AS t1 580 | WHERE 581 | t1.insert_date = '2022-04-28' 582 | AND 583 | t1.country_code_2 in ('us') 584 | ), 585 | get_letter_diff AS ( 586 | SELECT 587 | city_name, 588 | string_length - consonant_count AS vowels, 589 | round(100 * (string_length - consonant_count) / string_length::NUMERIC, 2) AS vowel_perc, 590 | consonant_count AS consonants, 591 | round( 100 * (consonant_count)::NUMERIC / string_length, 2)::float AS consonants_perc 592 | FROM 593 | get_letter_count 594 | ) 595 | SELECT 596 | city_name, 597 | vowels || ' (' || vowel_perc || '%)' AS vowel_count_perc, 598 | consonants || ' (' || consonants_perc || '%)' AS consonants_count_perc 599 | FROM 600 | get_letter_diff 601 | ORDER BY 602 | vowels; 603 | ``` 604 |
605 |
606 | 607 | 13. Most Consecutive Days 608 | List the most consecutive inserted dates and the capitalized city names for cities in Canada that where inserted in April 2022. 609 | 610 |
611 | Click to expand expected results! 612 | 613 | ##### Expected Results: 614 | 615 | most_consecutive_dates|city_name | 616 | ----------------------|------------| 617 | 2022-04-22|South Dundas| 618 | 2022-04-23|La Prairie | 619 | 2022-04-24|Elliot Lake | 620 | 2022-04-25|Lachute | 621 | 622 |
623 |

624 | 625 |
626 | Click to expand answer! 627 | 628 | ##### Answer 629 | ```sql 630 | DROP TABLE IF EXISTS get_dates; 631 | CREATE TEMP TABLE get_dates AS ( 632 | SELECT 633 | DISTINCT ON (insert_date) insert_date AS insert_date, 634 | city_name 635 | FROM 636 | cleaned_data.cities 637 | WHERE 638 | country_code_2 = 'ca' 639 | AND 640 | insert_date BETWEEN '2022-04-01' AND '2022-04-30' 641 | ORDER BY 642 | insert_date 643 | ); 644 | 645 | DROP TABLE IF EXISTS get_diff; 646 | CREATE TEMP TABLE get_diff AS ( 647 | SELECT 648 | city_name, 649 | insert_date, 650 | EXTRACT('day' FROM insert_date) - ROW_NUMBER() OVER (ORDER BY insert_date) AS diff 651 | FROM 652 | get_dates 653 | ); 654 | 655 | 656 | DROP TABLE IF EXISTS get_diff_count; 657 | CREATE TEMP TABLE get_diff_count AS ( 658 | SELECT 659 | city_name, 660 | insert_date, 661 | count(*) OVER (PARTITION BY diff) AS diff_count 662 | FROM 663 | get_diff 664 | ); 665 | 666 | 667 | WITH get_rank AS ( 668 | SELECT 669 | DENSE_RANK() OVER (ORDER BY diff_count desc) AS rnk, 670 | insert_date, 671 | city_name 672 | FROM 673 | get_diff_count 674 | ) 675 | SELECT 676 | insert_date AS most_consecutive_dates, 677 | initcap(city_name) AS city_name 678 | FROM 679 | get_rank 680 | WHERE 681 | rnk = 1 682 | ORDER BY 683 | insert_date; 684 | ``` 685 |
686 |
687 | 688 | 14. Month over Month in View 689 | Create a view that lists the month-year, the number of cities inserted for that month, a running city count total and the month over month percentage growth for 2021. 690 | 691 | Format the cities count and the running total with the thousands separator and format the month over month growth with a plus symbol and percentage symbol 692 | 693 | Example: 694 | month_year|cities_inserted|running_total|month_over_month| 695 | ----------|---------------|-------------|----------------| 696 | Feb-2021 | 1,291 | 2,762 |+87.76% | 697 | 698 |
699 | Click to expand expected results! 700 | 701 | ##### Expected Results: 702 | 703 | month_year|cities_inserted|running_total|month_over_month| 704 | ----------|---------------|-------------|----------------| 705 | Jan-2021 | 1,471 | 1,471 | | 706 | Feb-2021 | 1,291 | 2,762 |+87.76% | 707 | Mar-2021 | 1,485 | 4,247 |+53.77% | 708 | Apr-2021 | 1,508 | 5,755 |+35.51% | 709 | May-2021 | 1,509 | 7,264 |+26.22% | 710 | Jun-2021 | 1,395 | 8,659 |+19.20% | 711 | Jul-2021 | 1,394 | 10,053 |+16.10% | 712 | Aug-2021 | 1,481 | 11,534 |+14.73% | 713 | Sep-2021 | 1,452 | 12,986 |+12.59% | 714 | Oct-2021 | 1,446 | 14,432 |+11.14% | 715 | Nov-2021 | 1,378 | 15,810 |+9.55% | 716 | Dec-2021 | 1,472 | 17,282 |+9.31% | 717 | 718 |
719 |

720 | 721 |
722 | Click to expand answer! 723 | 724 | ##### Answer 725 | ```sql 726 | DROP VIEW IF EXISTS cleaned_data.view_2021_growth; 727 | CREATE VIEW cleaned_data.view_2021_growth AS ( 728 | WITH get_month_count AS ( 729 | SELECT 730 | date_trunc('month', insert_date) as single_month, 731 | count(*) AS monthly_count 732 | FROM 733 | cleaned_data.cities 734 | WHERE 735 | EXTRACT('year' FROM insert_date) = 2021 736 | GROUP BY 737 | single_month 738 | ORDER BY 739 | single_month 740 | ), 741 | get_running_total AS ( 742 | SELECT 743 | single_month::date, 744 | monthly_count, 745 | sum(monthly_count) OVER (ORDER BY single_month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_num_cities 746 | FROM 747 | get_month_count 748 | ), 749 | get_month_over_month AS ( 750 | SELECT 751 | single_month, 752 | monthly_count, 753 | total_num_cities, 754 | round(100.0 * ((total_num_cities - Lag(total_num_cities, 1) OVER (ORDER BY single_month)) / Lag(total_num_cities, 1) OVER (ORDER BY single_month))::NUMERIC, 2) AS month_over_month 755 | FROM 756 | get_running_total 757 | ) 758 | SELECT 759 | to_char(single_month, 'Mon-YYYY') AS month_year, 760 | to_char(monthly_count, '9G999') AS cities_inserted, 761 | to_char(total_num_cities, '99G999') AS running_total, 762 | to_char(month_over_month, 'sg99.99') || '%' AS month_over_month 763 | FROM 764 | get_month_over_month 765 | ); 766 | 767 | SELECT 768 | * 769 | FROM 770 | cleaned_data.view_2021_growth; 771 | ``` 772 |
773 |
774 | 775 | 15. Stored Procedure to CSV 776 | Create and call a stored procedure that lists a unique row id number, insert date, country name, city name, population and languages spoken for countries in the Latin America and Caribbean sub-region that were inserted on either '2022-04-09', '2022-04-28' or '2022-08-11'. 777 | 778 | Order by the insert date and output the results (including headers) to a CSV file located in [/source_data/csv_output/](../source_data/csv_output/) . 779 | 780 |
781 | Click to expand expected results! 782 | 783 | ##### Expected Results: 784 | 785 | Results located in [/source_data/csv_output/output.csv](../source_data/csv_output/output.csv) 786 | 787 |
788 |

789 | 790 |
791 | Click to expand answer! 792 | 793 | ##### Answer 794 | ```sql 795 | CREATE OR REPLACE PROCEDURE cleaned_data.sproc_output () 796 | LANGUAGE plpgsql 797 | AS 798 | $sproc$ 799 | BEGIN 800 | COPY ( 801 | SELECT 802 | ROW_NUMBER() OVER (ORDER BY t1.insert_date) AS row_id, 803 | t1.insert_date, 804 | t2.country_name, 805 | t1.city_name, 806 | t1.population, 807 | array_agg(t3.language) AS languages 808 | FROM 809 | cleaned_data.cities AS t1 810 | JOIN 811 | cleaned_data.countries AS t2 812 | ON 813 | t1.country_code_2 = t2.country_code_2 814 | LEFT JOIN 815 | cleaned_data.languages AS t3 816 | ON 817 | t2.country_code_2 = t3.country_code_2 818 | WHERE 819 | t2.sub_region = 'latin america and the caribbean' 820 | AND 821 | t1.insert_date IN ('2022-04-09', '2022-04-28', '2022-08-11') 822 | GROUP BY 823 | t1.insert_date, 824 | t2.country_name, 825 | t1.city_name, 826 | t1.population 827 | ORDER BY 828 | t1.insert_date 829 | ) 830 | TO '/var/lib/postgresql/source_data/csv_output/output.csv' DELIMITER ',' CSV HEADER; 831 | END 832 | $sproc$; 833 | 834 | -- Call the stored procedure 835 | CALL cleaned_data.sproc_output(); 836 | ``` 837 |
838 |
839 | 
840 | :warning: To start over with an empty database, run the `drop_all_tables.sql` script. :warning: 
841 | 
842 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 
843 | 
844 | 
845 | 
846 | 
--------------------------------------------------------------------------------
/walkthrough/WALKTHROUGH_BONUS_BASH.md:
--------------------------------------------------------------------------------
1 | ## Basic/Intermediate SQL Code Challenge 
2 | 
3 | **Author**: Jaime M. Shaker<br>
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 
8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
9 | 
10 | ### Bash Script
11 | 
12 | What is Bash?
13 | * The name stands for **B**ourne **A**gain **SH**ell. It is an open source version of the Bourne Shell and was first released in 1989. A shell is a command-line interpreter that provides a command line user interface for operating systems.
14 | 
15 | What is a Bash script?
16 | * A Bash script is a plain text file which contains a series of commands. These are the commands we would normally type ourselves on the command line. Using [psql](https://www.postgresguide.com/utilities/psql/), we can create a Bash script that will execute our `.sql` files in order and always start with a fresh, clean database every time we run the script.
17 | 
18 | Do I have Bash on my computer?
19 | * If you are following along this walkthrough and using the Docker container, the answer is YES! Bash is the default shell on most Linux distributions and was the default on macOS until version 10.15; on newer macOS versions it remains available as an alternative shell.
20 | 
21 | To use the Bash shell, let's connect to our Docker/PostgreSQL container by opening another command line terminal and typing in this command.
22 | 
23 | `docker exec -ti sql_coding_challenge bash`
24 | 
25 | ![alt text](../images/bash_1.PNG)
26 | 
27 | After executing this command, you should be logged into the Docker container as `root`. Keep in mind that the `@a80c64aa877f` portion after `root` is this container's ID, which will be different for your unique container.
28 | 
29 | ![alt text](../images/bash_2.PNG)
30 | 
31 | We are logged in as the `root` user. Let's `substitute user` with the `su` command. Running around your file system as the `root` user can lead to problems if you don't know what you're doing.
In the terminal, type and execute the command
32 | 
33 | `su postgres`
34 | 
35 | ![alt text](../images/bash_3.PNG)
36 | 
37 | We are now connected to the container as the user `postgres`. Now we can `cd` (change directory) to where our scripts are located, which we initially set up as the `/source_data/scripts/` directory.
38 | 
39 | `cd ~/source_data/scripts/`
40 | 
41 | We are now in our `source_data/scripts` directory.
42 | 
43 | ![alt text](../images/bash_4.PNG)
44 | 
45 | Create a new file in the `source_data/scripts` directory and name it
46 | 
47 | `start_fresh.sh`
48 | 
49 | This is the Bash script that will execute the psql commands. Inside the `start_fresh.sh` file, the script should look like this.
50 | 
51 | ```bash
52 | #!/bin/bash
53 | 
54 | psql sql_coding_challenge -f drop_all_tables.sql
55 | psql sql_coding_challenge -f build_tables.sql
56 | psql sql_coding_challenge -f normalize_tables.sql
57 | psql sql_coding_challenge -f remove_duplicates.sql
58 | psql sql_coding_challenge -f cleanup_db.sql
59 | psql sql_coding_challenge -f create_relationships.sql
60 | echo "Build Process Complete"
61 | ```
62 | 
63 | To run this script, use the following command in the same directory as the script.
64 | 
65 | `./start_fresh.sh` or `bash start_fresh.sh`
66 | 
67 | :exclamation: Note: :exclamation:
68 | * If, upon trying to run the script, you receive a `Permission Denied` error, you must run this command first, then try again.
69 | * `chmod u+x start_fresh.sh`
70 | 
71 | ![alt text](../images/bash_5.PNG)
72 | 
73 | Once started, the script executes the `.sql` files we just created; when it completes, the output should look similar to this.
74 | 
75 | ![alt text](../images/bash_6.PNG)
76 | 
77 | We are now right back to where we were when we first started this page.
78 | 
79 | However, we now have a script that runs our SQL files and produces a fresh setup every time.
80 | 
81 | We are now ready to complete the `SQL Coding Challenge`.
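As a variation, the same sequence can be written as a loop with `set -e`, so the build stops at the first failing file instead of continuing with a half-built database. This is a sketch, not part of the challenge; it assumes the same database and file names as `start_fresh.sh`:

```shell
#!/bin/bash
set -e  # abort immediately if any psql invocation fails

DB="sql_coding_challenge"
# The .sql files, in the same dependency order as start_fresh.sh.
SCRIPTS="drop_all_tables.sql build_tables.sql normalize_tables.sql \
remove_duplicates.sql cleanup_db.sql create_relationships.sql"

for f in $SCRIPTS; do
    # Guard so the sketch is harmless where psql is not installed.
    if command -v psql > /dev/null 2>&1; then
        echo "Running ${f}..."
        psql "$DB" -f "$f"
    fi
done
echo "Build Process Complete"
```

The loop buys little for six files, but the early-exit behavior of `set -e` matters: without it, a failure in `build_tables.sql` would still let the remaining scripts run against missing tables.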
82 | 
83 | Click the link below.
84 | 
85 | Go to [WALKTHROUGH_CODE_CHALLENGE](WALKTHROUGH_7_CODE_CHALLENGE.md)
86 | 
87 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
88 | 
89 | 
90 | 
91 | 
--------------------------------------------------------------------------------