├── .gitignore
├── README.md
├── docker-compose.yaml
├── images
│   ├── ERD.PNG
│   ├── bash_1.PNG
│   ├── bash_2.PNG
│   ├── bash_3.PNG
│   ├── bash_4.PNG
│   ├── bash_5.PNG
│   ├── bash_6.PNG
│   ├── terminal_1.PNG
│   └── terminal_2.PNG
├── source_data
│   ├── csv_data
│   │   ├── cities.csv
│   │   ├── countries.csv
│   │   ├── currencies.csv
│   │   └── languages.csv
│   ├── csv_output
│   │   └── output.csv
│   └── scripts
│       ├── build_tables.sql
│       ├── cleanup_db.sql
│       ├── code_challenge.sql
│       ├── create_relationships.sql
│       ├── drop_all_tables.sql
│       ├── normalize_tables.sql
│       ├── remove_duplicates.sql
│       └── start_fresh.sh
└── walkthrough
    ├── WALKTHROUGH_1_DOCKER.md
    ├── WALKTHROUGH_2_BUILD.md
    ├── WALKTHROUGH_3_NORMALIZE.md
    ├── WALKTHROUGH_4_REMOVE_DUPLICATES.md
    ├── WALKTHROUGH_5_CLEANUP.md
    ├── WALKTHROUGH_6_RELATIONSHIPS.md
    ├── WALKTHROUGH_7_CODE_CHALLENGE.md
    └── WALKTHROUGH_BONUS_BASH.md
/.gitignore:
--------------------------------------------------------------------------------
1 | /db/pgdata
2 | .vscode
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## Basic/Intermediate SQL Code Challenge
2 |
3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 |
8 | This project is an opportunity to practice your SQL skills. This project was inspired by the [Braintree Analytics Code Challenge](https://github.com/AlexanderConnelly/BrainTree_SQL_Coding_Challenge_Data_Analyst) and was created to strengthen/sharpen your SQL knowledge.
9 |
10 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
11 |
12 | **Please Note**:
13 |
14 | This project is meant to practice and gauge your technical abilities with SQL.
15 |
16 | * All work should be done in SQL.
17 | * Do not alter the CSV files.
18 | * Most, if not all, of these questions are intended purely as a measure of your SQL skills, not as questions an employer would actually expect you to answer.
19 |
20 | An SQL Code Challenge [Walkthrough](walkthrough/WALKTHROUGH_1_DOCKER.md) has been created if you would like to follow along in a step-by-step fashion.
21 |
22 | The questions are listed below with the expected results and answers hidden so you can cross-reference your answers.
23 |
24 | This repository contains all of the necessary files, datasets and directories for running a PostgreSQL server in a Docker container. The only prerequisite is that you already have Docker Desktop installed and running on your computer.
25 |
26 | https://www.docker.com/products/docker-desktop/
27 |
28 | ### Getting Started
29 |
30 | * This project uses the latest version of `PostgreSQL`, but any SQL database may be used.
31 | * Directories and Files details:
32 | * `README.md`: This is the file you are currently reading.
33 | * `.gitignore`: The files and directories listed in this file will be ignored by Git/GitHub.
34 | * `docker-compose.yaml`: With this YAML (**Y**AML **A**in't **M**arkup **L**anguage) file you can define the services (Postgres/MySQL/Python...) and, with a single command, spin everything up or tear it all down.
36 | * `db`: In the current directory, create an empty directory named '`db`'. Although this directory is ignored by Git/GitHub, PostgreSQL requires a directory to keep data persistent if you are running a Docker container. There will be no need to alter any of the files that reside there once the container starts running.
37 | * `images`: This is the location of the image files displayed in this repository.
38 | * `walkthrough/`: A directory with a beginner friendly walkthrough of all of the steps I took to complete the SQL challenge.
39 | * `source_data/`: This directory contains all the files and scripts within our Docker container.
40 | * `source_data/csv_data`: This is the location of the CSV files needed to copy/insert into our database.
41 | * `source_data/scripts/`: This directory contains all of the SQL scripts I used to complete the challenge.
42 | * `source_data/csv_output/`: This directory will contain all CSV files generated by some of our queries.
43 |
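The single-command workflow described above boils down to a couple of terminal commands. A minimal sketch, assuming Docker Desktop is installed and running and that you are in the project's root directory:

```bash
# Create the empty data directory PostgreSQL uses for persistent storage
# (ignored by Git, but required before the container starts).
mkdir db

# Spin up the services defined in docker-compose.yaml, detached in the background.
docker compose up -d

# ...work through the challenge...

# Tear everything down; the database files persist in ./db for the next run.
docker compose down
```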
44 | :grey_exclamation: :star: Start here if you want to follow along with the [SQL Challenge WALKTHROUGH](./walkthrough/WALKTHROUGH_1_DOCKER.md) :star: :grey_exclamation:
45 |
46 | ### SQL Code Challenge
47 | 1. Create Database, Tables and Relations.
48 | Using the CSV files located in `source_data/csv_data`, create your new SQL database and tables with the properly formatted data.
49 |
50 | * Add a numeric, auto-incrementing Primary Key to every table.
51 | * In the `countries` table, add the column `created_on` with the current date.
52 | * Create one-to-one and one-to-many relationships with the `countries` table as the parent table.
53 |
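Question 1 has no inline answer below; the full scripts live in `source_data/scripts/`. A minimal, hypothetical sketch of the three requirements (table and column names here are illustrative, not the actual schema):

```sql
-- Hypothetical sketch only; see source_data/scripts/build_tables.sql for the real schema.
CREATE TABLE countries (
	-- Numeric, auto-incrementing primary key
	country_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
	country_code_2 varchar(2) UNIQUE NOT NULL,
	country_name text,
	-- created_on defaults to the current date on insert
	created_on date DEFAULT CURRENT_DATE
);

CREATE TABLE cities (
	city_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
	city_name text,
	-- One-to-many: many cities reference one parent country
	country_code_2 varchar(2) REFERENCES countries (country_code_2)
);
```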
54 | 2. List Regions and Country Count
55 | List all of the regions and the total number of countries in each region. Order by country count in descending order and capitalize the region name.
56 |
57 |
58 | Click to expand expected results!
59 |
60 | ##### Expected Results:
61 |
62 | region |country_count|
63 | ---------|-------------|
64 | Africa | 59|
65 | Americas | 57|
66 | Asia | 50|
67 | Europe | 48|
68 | Oceania | 26|
69 | Antartica| 1|
70 |
71 |
72 |
73 |
74 |
75 | Click to expand answer!
76 |
77 | ##### Answer
78 | ```sql
79 | SELECT
80 | -- initcap() capitalizes the first letter of every word in a string.
81 | initcap(region) AS region,
82 | count(*) AS country_count
83 | FROM
84 | cleaned_data.countries
85 | GROUP BY
86 | -- With aggregate functions like count(), non-aggregated columns in the SELECT list must appear in the GROUP BY clause.
87 | region
88 | ORDER BY
89 | country_count DESC;
90 | ```
91 |
92 |
93 |
94 | 3. List Sub-Regions and City Count
95 | List all of the sub-regions and the total number of cities in each sub-region. Order by sub-region name alphabetically.
96 |
97 |
98 | Click to expand expected results!
99 |
100 | ##### Expected Results:
101 |
102 | sub_region |city_count|
103 | -------------------------------|----------|
104 | Australia And New Zealand | 329|
105 | Central Asia | 560|
106 | Eastern Asia | 3164|
107 | Eastern Europe | 2959|
108 | Latin America And The Caribbean| 7204|
109 | Melanesia | 60|
110 | Micronesia | 15|
111 | Northern Africa | 1152|
112 | Northern America | 5844|
113 | Northern Europe | 2025|
114 | Polynesia | 22|
115 | Southeastern Asia | 2627|
116 | Southern Asia | 6848|
117 | Southern Europe | 3238|
118 | Subsaharan Africa | 3223|
119 | Western Asia | 1400|
120 | Western Europe | 3952|
121 |
122 |
123 |
124 |
125 |
126 | Click to expand answer!
127 |
128 | ##### Answer
129 | ```sql
130 | SELECT
131 | initcap(t1.sub_region) AS sub_region,
132 | count(*) AS city_count
133 | FROM
134 | cleaned_data.countries AS t1
135 | JOIN
136 | cleaned_data.cities AS t2
137 | ON
138 | t1.country_code_2 = t2.country_code_2
139 | GROUP BY
140 | t1.sub_region
141 | ORDER BY
142 | t1.sub_region;
143 | ```
144 |
145 |
146 |
147 | 4. Specific Sub-Region and String Functions
148 | List all of the countries and the total number of cities in the Northern Europe sub-region. List the country names in uppercase and order the list by the length of the country name and alphabetically in ascending order.
149 |
150 |
151 | Click to expand expected results!
152 |
153 | ##### Expected Results:
154 |
155 | country_name |city_count|
156 | --------------|----------|
157 | JERSEY | 1|
158 | LATVIA | 39|
159 | NORWAY | 127|
160 | SWEDEN | 148|
161 | DENMARK | 75|
162 | ESTONIA | 20|
163 | FINLAND | 142|
164 | ICELAND | 12|
165 | IRELAND | 64|
166 | LITHUANIA | 61|
167 | ISLE OF MAN | 2|
168 | FAROE ISLANDS | 29|
169 | UNITED KINGDOM| 1305|
170 |
171 |
172 |
173 |
174 |
175 | Click to expand answer!
176 |
177 | ##### Answer
178 | ```sql
179 | SELECT
180 | -- upper() returns your string in uppercase.
181 | upper(t1.country_name) AS country_name,
182 | count(*) AS city_count
183 | FROM
184 | cleaned_data.countries AS t1
185 | JOIN
186 | cleaned_data.cities AS t2
187 | ON
188 | t1.country_code_2 = t2.country_code_2
189 | WHERE
190 | t1.sub_region = 'northern europe'
191 | GROUP BY
192 | t1.country_name
193 | ORDER BY
194 | -- length() returns the number of characters in a string, including spaces.
195 | length(t1.country_name), t1.country_name;
196 | ```
197 |
198 |
199 |
200 |
201 | 5. List Specific Countries by Year
202 | List all of the countries and the total number of cities in the Southern Europe sub-region that were inserted in 2021. Capitalize the country names and order alphabetically by the **LAST** letter of the country name and the number of cities.
203 |
204 |
205 | Click to expand expected results!
206 |
207 | ##### Expected Results:
208 |
209 | country_name |city_count|
210 | ----------------------|----------|
211 | Andorra | 5|
212 | Albania | 11|
213 | Bosnia And Herzegovina| 15|
214 | Croatia | 22|
215 | North Macedonia | 28|
216 | Malta | 32|
217 | Serbia | 58|
218 | Slovenia | 74|
219 | Greece | 64|
220 | Portugal | 109|
221 | Spain | 302|
222 | San Marino | 2|
223 | Montenegro | 12|
224 | Italy | 542|
225 |
226 |
227 |
228 |
229 |
230 | Click to expand answer!
231 |
232 | ##### Answer
233 | ```sql
234 | SELECT
235 | initcap(t1.country_name) AS country_name,
236 | count(*) AS city_count
237 | FROM
238 | cleaned_data.countries AS t1
239 | JOIN
240 | cleaned_data.cities AS t2
241 | ON
242 | t1.country_code_2 = t2.country_code_2
243 | WHERE
244 | t1.sub_region = 'southern europe'
245 | AND
246 | -- The extract() and date_part() functions let you break dates and timestamps down into individual parts: years, months, days, hours...
247 | EXTRACT('year' FROM t2.insert_date) = 2021
248 | GROUP BY
249 | t1.country_name
250 | ORDER BY
251 | -- substring() function returns a specific portion of a string.
252 | substring(t1.country_name,length(t1.country_name),1), city_count;
253 | ```
254 |
255 |
256 |
257 | 6. List Anti-Join
258 | List all of the countries in the region of Asia that did **NOT** have a city with an inserted date from June 2021 through Sept 2021.
259 |
260 |
261 | Click to expand expected results!
262 |
263 | ##### Expected Results:
264 |
265 | country_name |
266 | -----------------|
267 | Brunei Darussalam|
268 | Kuwait |
269 | Macao |
270 | Singapore |
271 |
272 |
273 |
274 |
275 |
276 | Click to expand answer!
277 |
278 | ##### Answer
279 | ```sql
280 | SELECT
281 | -- Distinct will only return unique values
282 | DISTINCT initcap(t1.country_name) AS country_name
283 | FROM
284 | cleaned_data.countries AS t1
285 | -- A left join returns all rows from the left table (cleaned_data.countries) and
286 | -- only the matching rows from the right table (cleaned_data.cities), producing NULLs
287 | -- where the right table has no match.
288 | LEFT JOIN
289 | cleaned_data.cities AS t2
290 | ON
291 | t1.country_code_2 = t2.country_code_2
292 | AND
293 | t2.insert_date BETWEEN '2021-06-01' AND '2021-10-01'
294 | WHERE
295 | t1.region = 'asia'
296 | -- Only return countries that did NOT have a matching city in the date range.
297 | AND
298 | t2.country_code_2 IS NULL;
299 | ```
300 |
301 |
302 |
303 | 7. Reversible Names
304 | List the country, city name, population and city name length for the city names that are [palindromes](https://en.wikipedia.org/wiki/Palindrome) in the Western Asia sub-region. Format the population with a thousands separator (1,000) and format the length of the city name in roman numerals. Order by the length of the city name in descending order and alphabetically in ascending order.
305 |
306 |
307 | Click to expand expected results!
308 |
309 | ##### Expected Results:
310 |
311 | country_name |city_name|population|roman_numeral_length|
312 | --------------------|---------|----------|--------------------|
313 | Yemen |Hajjah | 46,568 | VI |
314 | Syrian Arab Republic|Hamah | 696,863 | V |
315 | Turkey |Kavak | 21,692 | V |
316 | Turkey |Kinik | 29,803 | V |
317 | Turkey |Tut | 10,161 | III |
318 |
319 |
320 |
321 |
322 |
323 | Click to expand answer!
324 |
325 | ##### Answer
326 | ```sql
327 | SELECT
328 | initcap(t1.country_name) AS country_name,
329 | initcap(t2.city_name) AS city_name,
330 | -- to_char() takes non-string data types and returns them as strings.
331 | to_char(t2.population, '999G999') AS population,
332 | to_char(length(t2.city_name), 'RN') AS roman_numeral_length
333 | FROM
334 | cleaned_data.countries AS t1
335 | JOIN
336 | cleaned_data.cities AS t2
337 | ON
338 | t1.country_code_2 = t2.country_code_2
339 | WHERE
340 | t2.city_name = reverse(t2.city_name)
341 | AND
342 | t1.sub_region = 'western asia'
343 | ORDER BY
344 | length(t2.city_name) DESC, t2.city_name ASC;
345 | ```
346 |
347 |
348 |
349 | 8. Search with Wildcard and Case
350 | List all of the countries that end in 'stan'. Make your query case-insensitive and list whether the total population of the cities listed is an odd or even number for cities inserted in 2022. Order by whether the population value is odd or even in ascending order and country name in alphabetical order.
351 |
352 |
353 | Click to expand expected results!
354 |
355 | ##### Expected Results:
356 |
357 | country_name|total_population|odd_or_even|
358 | ------------|----------------|-----------|
359 | Afghanistan | 6,006,530 |Even |
360 | Kazakhstan | 4,298,264 |Even |
361 | Kyrgyzstan | 1,017,644 |Even |
362 | Pakistan | 26,344,480 |Even |
363 | Tajikistan | 2,720,953 |Odd |
364 | Turkmenistan| 419,607 |Odd |
365 | Uzbekistan | 3,035,547 |Odd |
366 |
367 |
368 |
369 |
370 |
371 | Click to expand answer!
372 |
373 | ##### Answer
374 | ```sql
375 | SELECT
376 | initcap(t1.country_name) AS country_name,
377 | to_char(sum(t2.population), '99G999G999') total_population,
378 | CASE
379 | WHEN (sum(t2.population) % 2) = 0
380 | THEN 'Even'
381 | ELSE
382 | 'Odd'
383 | END AS odd_or_even
384 | FROM
385 | cleaned_data.countries AS t1
386 | JOIN
387 | cleaned_data.cities AS t2
388 | ON
389 | t1.country_code_2 = t2.country_code_2
390 | WHERE
391 | t1.country_name ILIKE '%stan'
392 | AND
393 | EXTRACT('year' FROM t2.insert_date) = 2022
394 | GROUP BY
395 | t1.country_name
396 | ORDER BY
397 | odd_or_even, country_name;
398 | ```
399 |
400 |
401 |
402 | 9. Ranking Regions
403 | List the third most populated city in each region WITHOUT using LIMIT or OFFSET. List the region name, city name and population, and order the results by region.
404 |
405 |
406 | Click to expand expected results!
407 |
408 | ##### Expected Results:
409 |
410 | region |city_name|third_largest_pop|
411 | --------|---------|-----------------|
412 | Africa |Kinshasa | 12,836,000 |
413 | Americas|New York | 18,972,871 |
414 | Asia |Delhi | 32,226,000 |
415 | Europe |Paris | 11,060,000 |
416 | Oceania |Brisbane | 2,360,241 |
417 |
418 |
419 |
420 |
421 |
422 | Click to expand answer!
423 |
424 | ##### Answer
425 | ```sql
426 | WITH get_city_rank_cte AS (
427 | SELECT
428 | t1.region,
429 | t2.city_name,
430 | t2.population AS third_largest_pop,
431 | DENSE_RANK() OVER (PARTITION BY t1.region ORDER BY t2.population DESC) AS rnk
432 | FROM
433 | cleaned_data.countries AS t1
434 | JOIN
435 | cleaned_data.cities AS t2
436 | ON
437 | t1.country_code_2 = t2.country_code_2
438 | WHERE
439 | t2.population IS NOT NULL
440 | GROUP BY
441 | t1.region,
442 | t2.city_name,
443 | t2.population
444 | )
445 | SELECT
446 | initcap(region) AS region,
447 | initcap(city_name) AS city_name,
448 | to_char(third_largest_pop, '99G999G999') AS third_largest_pop
449 | FROM
450 | get_city_rank_cte
451 | WHERE
452 | rnk = 3;
453 | ```
454 |
455 |
456 |
457 | 10. Using Buckets
458 | List the bottom third of all countries in the Western Asia sub-region that speak Arabic. Include the row number and country name. Order by row number.
459 |
460 |
461 | Click to expand expected results!
462 |
463 | ##### Expected Results:
464 |
465 | row_id|country_name |
466 | ------|--------------------|
467 | 9|saudi arabia |
468 | 10|syrian arab republic|
469 | 11|united arab emirates|
470 | 12|yemen |
471 |
472 |
473 |
474 |
475 |
476 | Click to expand answer!
477 |
478 | ##### Answer
479 | ```sql
480 | WITH get_ntile_cte AS (
481 | SELECT
482 | ROW_NUMBER() OVER (ORDER BY t1.country_name) AS rn,
483 | t1.country_name,
484 | -- The ntile() window function distributes ordered rows into a specified number of 'buckets'.
485 | NTILE(3) OVER (ORDER BY t1.country_name) AS nt
486 | FROM
487 | cleaned_data.countries AS t1
488 | JOIN
489 | cleaned_data.languages AS t2
490 | ON
491 | t1.country_code_2 = t2.country_code_2
492 | WHERE
493 | t1.sub_region = 'western asia'
494 | AND
495 | t2.language = 'arabic'
496 | )
497 | SELECT
498 | rn AS row_id,
499 | country_name
500 | FROM
501 | get_ntile_cte
502 | WHERE
503 | nt = 3;
504 | ```
505 |
506 |
507 |
508 | 11. Using Arrays
509 | Create a query that lists country name, capital name, population, languages spoken and currency name for countries in the Northern Africa sub-region. There can be multiple currency names and languages spoken per country. Add multiple values for the same field into an array.
510 |
511 |
512 | Click to expand expected results!
513 |
514 | ##### Expected Results:
515 |
516 | country_name|city_name|population|languages |currencies |
517 | ------------|---------|----------|--------------------------------------------|---------------|
518 | algeria |algiers | 3415811|{french,arabic,kabyle} |algerian dinar |
519 | egypt |cairo | 20296000|{arabic} |egyptian pound |
520 | libya |tripoli | 1293016|{arabic} |libyan dinar |
521 | morocco |rabat | 572717|{arabic,tachelhit,moroccan tamazight,french}|moroccan dirham|
522 | sudan |khartoum | 7869000|{arabic,english} |sudanese pound |
523 | tunisia |tunis | 1056247|{french,arabic} |tunisian dinar |
524 |
525 |
526 |
527 |
528 |
529 | Click to expand answer!
530 |
531 | ##### Answer
532 | ```sql
533 | WITH get_row_values AS (
534 | SELECT
535 | t1.country_name,
536 | t2.city_name,
537 | t2.population,
538 | -- array_agg() aggregates multiple values and returns them in 'array' format.
539 | array_agg(t3.LANGUAGE) AS languages,
540 | t4.currency_name AS currencies
541 | FROM
542 | cleaned_data.countries AS t1
543 | JOIN
544 | cleaned_data.cities AS t2
545 | ON
546 | t1.country_code_2 = t2.country_code_2
547 | JOIN
548 | cleaned_data.languages AS t3
549 | ON
550 | t1.country_code_2 = t3.country_code_2
551 | JOIN
552 | cleaned_data.currencies AS t4
553 | ON
554 | t1.country_code_2 = t4.country_code_2
555 | WHERE
556 | t1.sub_region = 'northern africa'
557 | AND
558 | t2.capital = TRUE
559 | GROUP BY
560 | t1.country_name,
561 | t2.city_name,
562 | t2.population,
563 | t4.currency_name
564 | )
565 | SELECT
566 | *
567 | FROM
568 | get_row_values;
569 | ```
570 |
571 |
572 |
573 | 12. Using Case and Percentages
574 | Produce a query that returns the city names for cities in the U.S. that were inserted on April 28th, 2022. List how many vowels and consonants are present in each city name and concatenate their percentage to their respective count in parentheses.
575 |
576 |
577 | Click to expand expected results!
578 |
579 | ##### Expected Results:
580 |
581 | city_name |vowel_count_perc|consonants_count_perc|
582 | ---------------|----------------|---------------------|
583 | standish |2 (25.00%) |6 (75%) |
584 | grand forks |2 (18.18%) |9 (81.82%) |
585 | camano |3 (50.00%) |3 (50%) |
586 | cedar hills |3 (27.27%) |8 (72.73%) |
587 | gladstone |3 (33.33%) |6 (66.67%) |
588 | whitehall |3 (33.33%) |6 (66.67%) |
589 | homewood |4 (50.00%) |4 (50%) |
590 | willowbrook |4 (36.36%) |7 (63.64%) |
591 | port salerno |4 (33.33%) |8 (66.67%) |
592 | vadnais heights|5 (33.33%) |10 (66.67%) |
593 | jeffersonville |5 (35.71%) |9 (64.29%) |
594 |
595 |
596 |
597 |
598 |
599 | Click to expand answer!
600 |
601 | ##### Answer
602 | ```sql
603 | WITH get_letter_count AS (
604 | SELECT
605 | t1.city_name,
606 | length(t1.city_name) string_length,
607 | -- regexp_replace() returns a value that has been manipulated by a regular expression.
608 | length(regexp_replace(t1.city_name, '[aeiou]', '', 'gi')) AS consonant_count
609 | FROM
610 | cleaned_data.cities AS t1
611 | WHERE
612 | t1.insert_date = '2022-04-28'
613 | AND
614 | t1.country_code_2 in ('us')
615 | ),
616 | get_letter_diff AS (
617 | SELECT
618 | city_name,
619 | string_length - consonant_count AS vowels,
620 | round(100 * (string_length - consonant_count) / string_length::NUMERIC, 2) AS vowel_perc,
621 | consonant_count AS consonants,
622 | round( 100 * (consonant_count)::NUMERIC / string_length, 2)::float AS consonants_perc
623 | FROM
624 | get_letter_count
625 | )
626 | SELECT
627 | city_name,
628 | vowels || ' (' || vowel_perc || '%)' AS vowel_count_perc,
629 | consonants || ' (' || consonants_perc || '%)' AS consonants_count_perc
630 | FROM
631 | get_letter_diff
632 | ORDER BY
633 | vowels;
634 | ```
635 |
636 |
637 |
638 | 13. Most Consecutive Days
639 | List the most consecutive inserted dates and the capitalized city names for cities in Canada that were inserted in April 2022.
640 |
641 |
642 | Click to expand expected results!
643 |
644 | ##### Expected Results:
645 |
646 | most_consecutive_dates|city_name |
647 | ----------------------|------------|
648 | 2022-04-22|South Dundas|
649 | 2022-04-23|La Prairie |
650 | 2022-04-24|Elliot Lake |
651 | 2022-04-25|Lachute |
652 |
653 |
654 |
655 |
656 |
657 | Click to expand answer!
658 |
659 | ##### Answer
660 | ```sql
661 | DROP TABLE IF EXISTS get_dates;
662 | CREATE TEMP TABLE get_dates AS (
663 | SELECT
664 | DISTINCT ON (insert_date) insert_date AS insert_date,
665 | city_name
666 | FROM
667 | cleaned_data.cities
668 | WHERE
669 | country_code_2 = 'ca'
670 | AND
671 | insert_date BETWEEN '2022-04-01' AND '2022-04-30'
672 | ORDER BY
673 | insert_date
674 | );
675 |
676 | DROP TABLE IF EXISTS get_diff;
677 | CREATE TEMP TABLE get_diff AS (
678 | SELECT
679 | city_name,
680 | insert_date,
681 | EXTRACT('day' FROM insert_date) - ROW_NUMBER() OVER (ORDER BY insert_date) AS diff
682 | FROM
683 | get_dates
684 | );
685 |
686 |
687 | DROP TABLE IF EXISTS get_diff_count;
688 | CREATE TEMP TABLE get_diff_count AS (
689 | SELECT
690 | city_name,
691 | insert_date,
692 | count(*) OVER (PARTITION BY diff) AS diff_count
693 | FROM
694 | get_diff
695 | );
696 |
697 |
698 | WITH get_rank AS (
699 | SELECT
700 | DENSE_RANK() OVER (ORDER BY diff_count desc) AS rnk,
701 | insert_date,
702 | city_name
703 | FROM
704 | get_diff_count
705 | )
706 | SELECT
707 | insert_date AS most_consecutive_dates,
708 | initcap(city_name) AS city_name
709 | FROM
710 | get_rank
711 | WHERE
712 | rnk = 1
713 | ORDER BY
714 | insert_date;
715 | ```
716 |
717 |
718 |
719 | 14. Month over Month in View
720 | Create a view that lists the month-year, the number of cities inserted for that month, a running city count total and the month over month percentage growth for 2021.
721 |
722 | Format the cities count and the running total with a thousands separator, and format the month-over-month growth with plus and percentage symbols.
723 |
724 | Example:
725 | month_year|cities_inserted|running_total|month_over_month|
726 | ----------|---------------|-------------|----------------|
727 | Feb-2021 | 1,291 | 2,762 |+87.76% |
728 |
729 |
730 | Click to expand expected results!
731 |
732 | ##### Expected Results:
733 |
734 | month_year|cities_inserted|running_total|month_over_month|
735 | ----------|---------------|-------------|----------------|
736 | Jan-2021 | 1,471 | 1,471 | |
737 | Feb-2021 | 1,291 | 2,762 |+87.76% |
738 | Mar-2021 | 1,485 | 4,247 |+53.77% |
739 | Apr-2021 | 1,508 | 5,755 |+35.51% |
740 | May-2021 | 1,509 | 7,264 |+26.22% |
741 | Jun-2021 | 1,395 | 8,659 |+19.20% |
742 | Jul-2021 | 1,394 | 10,053 |+16.10% |
743 | Aug-2021 | 1,481 | 11,534 |+14.73% |
744 | Sep-2021 | 1,452 | 12,986 |+12.59% |
745 | Oct-2021 | 1,446 | 14,432 |+11.14% |
746 | Nov-2021 | 1,378 | 15,810 |+9.55% |
747 | Dec-2021 | 1,472 | 17,282 |+9.31% |
748 |
749 |
750 |
751 |
752 |
753 | Click to expand answer!
754 |
755 | ##### Answer
756 | ```sql
757 | DROP VIEW IF EXISTS cleaned_data.view_2021_growth;
758 | CREATE VIEW cleaned_data.view_2021_growth AS (
759 | WITH get_month_count AS (
760 | SELECT
761 | date_trunc('month', insert_date) as single_month,
762 | count(*) AS monthly_count
763 | FROM
764 | cleaned_data.cities
765 | WHERE
766 | EXTRACT('year' FROM insert_date) = 2021
767 | GROUP BY
768 | single_month
769 | ORDER BY
770 | single_month
771 | ),
772 | get_running_total AS (
773 | SELECT
774 | single_month::date,
775 | monthly_count,
776 | sum(monthly_count) OVER (ORDER BY single_month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_num_cities
777 | FROM
778 | get_month_count
779 | ),
780 | get_month_over_month AS (
781 | SELECT
782 | single_month,
783 | monthly_count,
784 | total_num_cities,
785 | round(100.0 * ((total_num_cities - Lag(total_num_cities, 1) OVER (ORDER BY single_month)) / Lag(total_num_cities, 1) OVER (ORDER BY single_month))::NUMERIC, 2) AS month_over_month
786 | FROM
787 | get_running_total
788 | )
789 | SELECT
790 | to_char(single_month, 'Mon-YYYY') AS month_year,
791 | to_char(monthly_count, '9G999') AS cities_inserted,
792 | to_char(total_num_cities, '99G999') AS running_total,
793 | to_char(month_over_month, 'sg99.99') || '%' AS month_over_month
794 | FROM
795 | get_month_over_month
796 | );
797 |
798 | SELECT
799 | *
800 | FROM
801 | cleaned_data.view_2021_growth;
802 | ```
803 |
804 |
805 |
806 | 15. Stored Procedure to CSV
807 | Create and call a stored procedure that lists a unique row id number, insert date, country name, city name, population and languages spoken for countries in the Latin America and Caribbean sub-region that were inserted on either '2022-04-09', '2022-04-28' or '2022-08-11'.
808 |
809 | Order by the insert date and output the results (including headers) to a CSV file located in [/source_data/csv_output/](../source_data/csv_output/) .
810 |
811 |
812 | Click to expand expected results!
813 |
814 | ##### Expected Results:
815 |
816 | Results located in [/source_data/csv_output/output.csv](../source_data/csv_output/output.csv)
817 |
818 |
819 |
820 |
821 |
822 | Click to expand answer!
823 |
824 | ##### Answer
825 | ```sql
826 | CREATE OR REPLACE PROCEDURE cleaned_data.sproc_output ()
827 | LANGUAGE plpgsql
828 | AS
829 | $sproc$
830 | BEGIN
831 | COPY (
832 | SELECT
833 | ROW_NUMBER() OVER (ORDER BY t1.insert_date) AS row_id,
834 | t1.insert_date,
835 | t2.country_name,
836 | t1.city_name,
837 | t1.population,
838 | array_agg(t3.language) AS languages
839 | FROM
840 | cleaned_data.cities AS t1
841 | JOIN
842 | cleaned_data.countries AS t2
843 | ON
844 | t1.country_code_2 = t2.country_code_2
845 | LEFT JOIN
846 | cleaned_data.languages AS t3
847 | ON
848 | t2.country_code_2 = t3.country_code_2
849 | WHERE
850 | t2.sub_region = 'latin america and the caribbean'
851 | AND
852 | t1.insert_date IN ('2022-04-09', '2022-04-28', '2022-08-11')
853 | GROUP BY
854 | t1.insert_date,
855 | t2.country_name,
856 | t1.city_name,
857 | t1.population
858 | ORDER BY
859 | t1.insert_date
860 | )
861 | TO '/var/lib/postgresql/source_data/csv_output/output.csv' DELIMITER ',' CSV HEADER;
862 | END
863 | $sproc$;
864 |
865 | -- Call the stored procedure
866 | CALL cleaned_data.sproc_output();
867 | ```
868 |
869 |
870 |
871 | :grey_exclamation: Start here if you want to follow along with the [SQL Challenge WALKTHROUGH](./walkthrough/WALKTHROUGH_1_DOCKER.md) :grey_exclamation:
872 |
873 | :exclamation: If you found this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
874 |
875 |
876 |
877 |
--------------------------------------------------------------------------------
/docker-compose.yaml:
--------------------------------------------------------------------------------
1 | version: '3.8'
2 | services:
3 |   # This service can be named anything, but we'll name it after the image it runs
4 |   postgres:
5 |     # This will get the latest official PostgreSQL image from Docker Hub
6 |     image: postgres
7 |     # The container name is the name of the project
8 |     container_name: sql_coding_challenge
9 |     restart: always
10 |     # This is the name of the env file. This hidden file holds environment variables.
11 |     # All values are hard-coded, so the file has been commented out.
12 |     # env_file:
13 |     #   - .env
14 |     environment:
15 |       # The database name
16 |       POSTGRES_DB: sql_coding_challenge
17 |       # PostgreSQL username
18 |       POSTGRES_USER: postgres
19 |       # PostgreSQL password
20 |       POSTGRES_PASSWORD: postgres
21 |       # The location where PostgreSQL stores its data (do not change unless you know what you're doing)
22 |       PGDATA: /var/lib/postgresql/data/pgdata
23 |     # Default port that PostgreSQL uses
24 |     ports:
25 |       - "5432:5432"
26 |     volumes:
27 |       -
28 |         # We bind the db directory in our project to where PostgreSQL stores its data
29 |         # to keep data persistent when PostgreSQL restarts
30 |         type: bind
31 |         source: ./db
32 |         target: /var/lib/postgresql/data
33 |       -
34 |         # We bind our source_data directory so that the PostgreSQL container can access files and
35 |         # scripts without having to copy them into the container
36 |         type: bind
37 |         source: ./source_data
38 |         target: /var/lib/postgresql/source_data
39 |     networks:
40 |       - reference
41 | networks:
42 |   reference:
--------------------------------------------------------------------------------
/images/ERD.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/ERD.PNG
--------------------------------------------------------------------------------
/images/bash_1.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_1.PNG
--------------------------------------------------------------------------------
/images/bash_2.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_2.PNG
--------------------------------------------------------------------------------
/images/bash_3.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_3.PNG
--------------------------------------------------------------------------------
/images/bash_4.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_4.PNG
--------------------------------------------------------------------------------
/images/bash_5.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_5.PNG
--------------------------------------------------------------------------------
/images/bash_6.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_6.PNG
--------------------------------------------------------------------------------
/images/terminal_1.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/terminal_1.PNG
--------------------------------------------------------------------------------
/images/terminal_2.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/terminal_2.PNG
--------------------------------------------------------------------------------
/source_data/csv_data/countries.csv:
--------------------------------------------------------------------------------
1 | country_name,country_code_2,country_code_3,region,sub_region,intermediate_region
2 | !afg!hANistan,af?,afg,asia,$southern asia,
3 | alba$nia,aL,alb,europe!,southern *europe,
4 | Alb?ania,AL,alb,eur#ope,$southern e#urope,
5 | ?algeria?,d!z,dza,africa,northern africa,
6 | americ#an samoa?,as,as!m,o##ceania,polyne$sia,
7 | andoRra,Ad,and,europe,southern europe,
8 | cyprus ,cy,cyp,asia,western asia,
9 | angola,ao,ago,africa,sub-saharan africa,middle africa
10 | ang!uilla,ai,aia,americas,latin ame!rica and the caribbean,caribbean
11 | antarctica,aq,ata,antartica,,
12 | antigua and barbuda,ag,atg,americas,latin america and the caribbean,caribbean
13 | argentina,ar,arg,americas,latin america and the caribbean,south america
14 | armenia,a?m,arm,asia,western asia,
15 | aruba,aw,abw,americas,latin america and the caribbean,caribbean
16 | australia,au,aus,oceania,australia and new zealand,
17 | austria,at,aut,europe,western europe,
18 | azerbaijan,az,aze,asia,western asia,
19 | bahamas,bs,bhs,americas,latin america and the caribbean,caribbean
20 | bahrain,bh,bhr,asia,western asia,
21 | bangladesh,bd,bgd,asia,southern asia,
22 | barbados,bb,brb,americas,latin america and the caribbean,caribbean
23 | belarus,by,blr,europe,eastern europe,
24 | b!elgium,be,bel,europe,western europe,
25 | belize,bz,blz,americas,latin america and the caribbean,central america
26 | benin,bj,ben,africa,sub-saharan africa,western africa
27 | bermuda,bm,bmu,americas,northern america,
28 | bhutan,bt,btn,asia,southern asia,
29 | azerbaijan,az,aze,asia,western asia,
30 | bolivia,bo,bol,americas,latin america and the cari$bbean,south america
31 | "bonaire, sint eustatius and saba",bq,bes,americas,latin america and the caribbean,caribbean
32 | bosnia and herze$govina,ba,bih,europe,southern europe,
33 | botswana,bW,bwa,africa,sub-saharan africa,southern africa
34 | bouvet island,bv,bvt,americas,latin america and the caribbean,south america
35 | brazil,br,bra,americas,latin america and the caribbean,south america
36 | british indian ocean territory,io,iot,africa,sub-saharan africa,eastern africa
37 | brunei darussalam,bn,brn,asia,south-eastern asia,
38 | bulgaria,bg,bgr,europe,eastern europe,
39 | bu$rkina faso,bf,bfa,africa,sub-saharan africa,western africa
40 | burundi,bi,bdi,africa,sub-saharan africa,eastern africa
41 | cabo verde,cv,cpv,africa,sub-saharan africa,west$ern africa
42 | cambodia,kh,khm,asia,south-eastern asia,
43 | cyprus,cy,cyp,asia,western asia,
44 | cameroon,c$m,cmr,africa,sub-saharan africa,middle africa
45 | canada,ca,can,americas,northern america,
46 | cayman islands,ky,cym,americas,latin america and the caribbean,caribbean
47 | central african republic,cf,caf,africa,sub-saharan africa,middle africa
48 | chad,td,tcd,africa,sub-saharan africa,middle africa
49 | chile,cl,chl,americas,latin america and the caribbean,south america
50 | china ,cn,chn,asia,eastern asia,
51 | christmas island,cx,cxr,oceania,australia and new zealand,
52 | cocos islands,cc,cck,oceania,australia and new z?ealand,
53 | colombia,co,col,americas,latin america and the caribbean,south america
54 | com!oros,km,com,africa,sub-saharan africa,eastern africa
55 | republic of congo,cg,cog,africa,sub-saharan africa,middle africa
56 | democratic republic of the congo,cd,cod,africa,sub-saharan africa,middle africa
57 | cook islands,ck,cok,oceania,polynesia,
58 | costa rica,cr,cri,americas,latin america and the caribbean,central america
59 | ivory coast,ci,civ,africa,sub-saharan africa,western africa
60 | !!croatia,Hr,hrv,europe,southern europe,
61 | cuba,cu,cub,americas,latin america and the caribbean,caribbean
62 | curacao,cw,cuw,americas,latin a!merica and the caribbean,caribbean
63 | cyprus,cy,cyp,asia,western asia,
64 | czechia,cz,cze,europe,eastern europe,
65 | denmark,dk,dnk,europe,northern europe,
66 | djibouti,dj,dji,africa,sub-saharan africa,eastern africa
67 | dominica,dm,dma,americas,latin america and the caribbean,caribbean
68 | dominican republic,do,dom,americas,latin america and the caribbean,caribbean
69 | ecua$dor,ec,ecu,americas,latin america and the caribbean,south america
70 | egypt,eg,egy,africa,northern africa,
71 | el salvador,sv,slv,americas,latin america and the caribbean,central america
72 | equatorial guinea,gq,gnq,africa,sub-saharan africa,middle africa
73 | eritrea,er,eri,africa,sub-saharan africa,eastern africa
74 | estonia,ee,est,europe,northern europe,
75 | eswatini,s?z,swz,africa,sub-saharan africa,southern africa
76 | ethiopia,!et,eth,africa,sub-saharan africa,eastern africa
77 | falkland islands (malvinas),fk,flk,americas,latin america and the caribbean,south america
78 | faroe islands,fo,fro,europe,northern europe,
79 | fiji,fj,fji,oceania,melanesia,
80 | finland,fi,fin,europe,northern europe,
81 | france,fr,fra,europe,western europe,
82 | french guiana ,gf,guf,americas,latin america and the caribbean,south america
83 | french polynesia,pf,pyf,oceania,polynesia,
84 | macao,mo,mac,asia,eastern asia,
85 | french southern territories,tf,atf,africa,sub-saharan africa,eastern africa
86 | gabon,ga,gab,africa,sub-saharan africa,middle africa
87 | gambia,gm,gmb,africa,sub-saharan africa,western africa
88 | georgia,g!e,geo,asia,western asia,
89 | germany,de,deu,europe,western europe,
90 | ghana,gh,gha,africa,sub-saharan africa,western africa
91 | gibraltar,gi,gib,europe,southern europe,
92 | !china,cn,chn,asia,eastern asia,
93 | greece,gr,grc,europe,southern europe,
94 | greenland,gl,grl,americas,northern america,
95 | grenada,gd,grd,americas,la?tin america and the caribbean,caribbean
96 | guadeloupe,gp,glp,americas,latin america and the caribbean,caribbean
97 | guam,gu,gum,oceania,micronesia,
98 | guatemala,gt,gtm,americas,latin america and the caribbean,central america
99 | guinea,gn,gin,africa,sub-saharan africa,western africa
100 | guinea-bissau,gw,gnb,africa,sub-saharan africa,western africa
101 | guy$ana,gY,guy,americas,latin america and the caribbean,south america
102 | haiti,ht,hti,americas,latin america and the caribbean,carib!bean
103 | holy see,va,vat,europe,southern europe,
104 | honduras,hn,hnd,americas,latin america and the caribbean,central america
105 | hong kong,hk,hkg,asia,eastern asia,
106 | hungary,hu,hun,europe,eastern europe,
107 | iceland,is,isl,europe,northern europe,
108 | india,i!n,ind,asia,southern asia,
109 | indonesia,id,idn,asia,south-eastern asia,
110 | iran,ir,irn,asia,southern asia,
111 | iraq,iq,irq,asia,western asia,
112 | ireland,ie,irl,europe,northern europe,
113 | Ireland,ie,irl,europe,northern europe,
114 | isle of man,im,imn,europe,northern europe,
115 | israel,il,isr,asia,western asia,
116 | italy,it,ita,europe,southern europe,
117 | china,cn,chn,asia,eastern asia,
118 | jamaica,jm,jam,americas,latin america and the caribbean,caribbean
119 | japan,jp,jpn,asia,eastern asia,
120 | jersey,je,jey,europe,northern europe,channel islands
121 | jordan ,jo,jor,asia,western asia,
122 | kazakhstan ,kz,kaz,asia,central asia,
123 | kenya,ke,ken,africa,sub-saharan africa,eastern africa
124 | kiribati,ki,kir,oc!eania,micronesia,
125 | north korea,kp,prk,asia,eastern asia,
126 | south korea,kr,kor,asia,eastern asia,
127 | kUWait,kw,kwt,asia,western asia,
128 | kyrgyzstan,kg,kgz,asia,central asia,
129 | laos,la,lao,asia,south-eastern asia,
130 | latvia,lv,lva,europe,northern europe,
131 | lebanon,lb,lbn,asia,western asia,
132 | lesotho,ls,lso,africa,sub-saharan africa,southern africa
133 | liberia,lr,lbr,africa,sub-saharan africa,western africa
134 | libya,ly,lby,afr?ica,northern africa,
135 | liechtenstein,li,lie,europe,western europe,
136 | li!thuania,lt,ltu,europe,northern europe,
137 | luxembourg,lu,lux,europe,western europe,
138 | macao,mo,mac,asia,eastern asia,
139 | kUWAit,kw,kwt,asia,western asia,
140 | madagascar?!,mg,mdg,africa,sub-saharan africa,eastern africa
141 | malawi,mw,mwi,africa,sub-saharan africa,eastern africa
142 | malaysia,my,mys,asia,south-eastern asia,
143 | maldives,mv,mdv,asia,southern asia,
144 | mali,ml,mli,africa,sub-saharan africa,western africa
145 | malta ,mt,mlt,europe,southern europe,
146 | marshall islands,mh,mhl,oceania,micronesia,
147 | martinique,mq,mtq,americas,latin america and the caribbean,caribbean
148 | mauritania,mr,mrt,africa,sub-saharan africa,western africa
149 | mauritius,mu,mus,africa,sub-saharan africa,eastern africa
150 | mayotte,yt,myt,africa,sub-saharan africa,eastern africa
151 | libYa,ly,lby,africa,northern africa,
152 | mexico,mx,mex,americas,latin america and the caribbean,central america
153 | micronesia (federated states of),fm,fsm,oceania,micronesia,
154 | "moldova, republic of",md,mda,europe,eastern europe,
155 | monaco,mc,mco,euro#pe,western europe,
156 | mongolia,mn,mng,asia,eastern asia,
157 | mo#ntenegro,me,mne,europe,southern europe,
158 | montserrat,ms,msr,americas,latin america and the caribbean,caribbean
159 | morocco,ma,mar,africa,northern africa,
160 | mozambique,mz,moz,africa,sub-saharan africa,eastern africa
161 | myanmar,mm,mmr,asia,south-eastern asia,
162 | nAmibia,na,nam,africa,sub-saharan africa,southern africa
163 | nauru ,nr,nru,oceania,micronesia,
164 | nepal,np,npl,asia,southern asia,
165 | marshall islands,mh,mhl,oceania,micronesia,
166 | netherlands,nl,nld,europe,western europe,
167 | new caledonia,nc,ncl,oceania,melanesia,
168 | new zea!land,nz,nzl,oceania,australia and new zealand,
169 | nicaragua,ni,nic,americas,latin america and the caribbean,central america
170 | niger,ne,ner,africa,sub-saharan africa,western africa
171 | nigeria,ng,nga,africa,sub-saharan africa,western africa
172 | niue,nu,niu,oceania,polynesia,
173 | norfolk island,nf,nfk,oceania,australia and new zealand,
174 | reunion,re,reu,africa,sub-saharan africa,eastern africa
175 | north macedonia,mk,mkd,europe,southern europe,
176 | sAudi arabia,sa,sau,asia,western asia,
177 | northern mariana islands,mp,mnp,oceania,micronesia,
178 | norway,no,nor,europe,northern europe,
179 | !!oman,om,omn,asia,western asia,
180 | pa$kistan,pk,pak,asia,southern asia,
181 | palau,pw,plw,oceania,micronesia,
182 | panama,pa,pan,americas,latin america and the caribbean,central america
183 | papua new guinea,pg,png,oceania,melanesia,
184 | paraguay,py,pry,americas,latin america and the caribbean,south america
185 | peru,pe,per,americas,latin america and the caribbean,south america
186 | philippines,ph,phl,asia,south-eastern as$ia,
187 | pitcairn,pn,pcn,oceania,polynesia,
188 | poland,pl,pol,europe,eastern europe,
189 | portugal,pt,prt,europe,southern europe,
190 | philippines ,p?h,phl,asia,south-eastern asia,
191 | puerto rico,pr,pri,americas,latin america and the caribbean,caribbean
192 | qatar,qa,qat,asia,western asia,
193 | r!eunion,re,reu,africa,sub-saharan africa,eastern africa
194 | romania,RO,rou,europe,eastern europe,
195 | russian federation,ru,rus,europe,eastern europe,
196 | rwanda,rw,rwa,africa,sub-saharan africa,eastern africa
197 | saint barthelemy,bl,blm,americas,latin america and the caribbean,caribbean
198 | "saint helena, ascension and tristan da cunha",sh,shn,africa,sub-saharan africa,western africa
199 | saint kitts and nevis,kn,kna,americas,latin america and the caribbean,caribbean
200 | saint lucia,lc,lca,americas,latin america and the caribbean,caribbean
201 | saint martin (french part),mf,maf,americas,latin america and the caribbean,caribbean
202 | saint pierre and miquelon,pm,spm,americas,northern america,
203 | $saint vincent and the grenadines,vc,vct,americas,latin america and the caribbean,caribbean
204 | samoa,ws,wsm,oceania,polynesia,
205 | san marino,sm,smr,europe,southern europe,
206 | sao tome and principe,st,stp,africa,sub-saharan africa,middle africa
207 | saudi Arabia,sa,sau,asia,western asia,
208 | Senegal,sn,sen,africa,sub-saharan africa,western africa
209 | serbia,rs,srb,europe,southern europe,
210 | seychelles,sc,syc,africa,sub-saharan africa,eastern africa
211 | sierra leone,sl,sle,africa,sub-saharan africa,western africa
212 | zimbabwe?,zw,zwe,africa,sub-saharan africa,eastern africa
213 | singapore,sg,sgp,asia,south-eastern asia,
214 | sint ma!arten (dutch part),sx,sxm,americas,latin america and the caribbean,caribbean
215 | slovakia,sk,svk,europe,eastern europe,
216 | slovenia,si,svn,europe,southern europe,
217 | solomon islands,sb,slb,oceania,melanesia,
218 | somalia,so,som,africa,sub-saharan africa,eastern africa
219 | south africa,Za,zaf,africa,sub-saharan africa,s$outhern africa
220 | south georgia and the south sandwich islands,gs,sgs,americas,latin america and the caribbean,south america
221 | south sudan,ss,ssd,africa,sub-saharan africa,eastern africa
222 | spain,es,esp,europe,southern europe,
223 | sri lanka,lk,lka,asia,southern asia,
224 | sudan,sd,sdn,africa,northern africa,
225 | suriname,sr,sur,americas,latin america and the caribbean,south america
226 | sweden,se,swe,europe,northern europe,
227 | switzer*land,ch,che,euro?pe,western europe,
228 | syrian arab republic,sy,syr,asia,western asia,
229 | taiwan,tw,twn,asia,eastern asia,
230 | tajikistan,tj,tjk,asia,central asia,
231 | "tanzania, united republic of",tz,tza,africa,sub-saharan africa,eastern africa
232 | thailand,th,tha,asia,south-eastern asia,
233 | timor-leste,tl!,tls,asia,south-eastern asia,
234 | togo,tg,tgo,africa,sub-saharan africa,western africa
235 | tonga,to,ton,oceania,polynesia,
236 | trinidad and tobago,tt,tto,americas,latin america and the caribbean,caribbean
237 | tunisia,tn,tun,africa,northern africa,
238 | tur?key,tr,tur,asia,western asia,
239 | turkmenistan,tm,tkm,asia,central asia,
240 | turks and caicos islands,tc,tca,americas,latin america and the caribbean,caribbean
241 | tuvalu,tv,tuv,oceania,polynesia,
242 | uganda,ug,uga,africa,sub-saharan africa,eastern africa
243 | serbia,rs,srb,eur!ope,southern eur?ope,
244 | ukraine,ua,ukr,europe,eastern europe,
245 | united arab emirates,ae,are,asia,western asia,
246 | united kingdom,gb,gbr,europe,northern europe,
247 | united states of america,us,usa,americas,northern america,
248 | uruguay,uy,ury,americas,latin america and the caribbean,south america
249 | uzbe?kistan,uz,uzb,asia,central asia,
250 | vanuatu,vu,vut,oceania,melanesia,
251 | *venezuela (bolivarian republic of),ve,ven,americas,latin america and the caribbean,south america
252 | viet nam,vn,vnm,asia,south-eastern asia,
253 | virgin islands (british),vg,vgb,americas,latin america and the caribbean,caribbean
254 | virgin islands (u.s.),vi,vir,americas,latin america and the caribbean,caribbean
255 | wallis and futuna,wf,wlf,oceania,polynesia,
256 | yemen,ye,yem,asia,western asia,
257 | zambia#,zm,zmb,africa,sub-saharan africa,eastern africa
258 | zim$babwe,zw,zwe,africa,sub-saharan africa,eastern africa
259 |
--------------------------------------------------------------------------------
/source_data/csv_data/currencies.csv:
--------------------------------------------------------------------------------
1 | country_code_2,currency_name,currency_code
2 | aF,afghani,afn
3 | al,lek,all
4 | dz,algerian dinar,dzd
5 | as,us dollar,usd
6 | ad,euro,eur
7 | ao,kwanza,aoa
8 | ai,east caribbean dollar,xcd
9 | ag,east caribbean dollar,xcd
10 | ar,argentine peso,ars
11 | am,armenian dram,amd
12 | aw,aruban florin,awg
13 | au,australian $dollar,aud
14 | at,euro,eur
15 | az,azerbaijan manat,azn
16 | bs,bahamian dollar,bsd
17 | bh,bahraini dinar,bhd
18 | bd,taka,bdt
19 | bb,barbados dollar,bbd
20 | by,belarusian ruble,byn
21 | be,euro,eur
22 | bz,belize dollar,bzd
23 | bj,cfa franc bceao,xof
24 | !bm,bermudian dollar,bmd
25 | bt,indian rupee,inr
26 | bt,ngultrum,btn
27 | bo,boliviano,bob
28 | bo,mvdol,bov
29 | ba,convertible mark,bam
30 | bw,pula,bwp
31 | br,brazilian real,brl
32 | io,us dollar,usd
33 | bn,brunei dollar,bnd
34 | bg,bulgarian lev,bgn
35 | bf,cfa franc bceao,xof
36 | bi,burundi franc,bif
37 | cv,cabo verde escudo,cve
38 | kh,riel,khr
39 | cm,cfa franc beac,xaf
40 | ca,canadian dollar,cad
41 | ky,cayman islands dollar,kyd
42 | cf,cfa franc beac,xaf
43 | td,cfa franc beac,xaf
44 | cl,chilean peso,clp
45 | cl,unidad de fomento,clf
46 | cn,yuan renminbi,cny
47 | cx,australian dollar,aud
48 | cc,australian dollar,aud
49 | co,colombian peso,cop
50 | co,unidad de valor real,cou
51 | km,comorian franc ,kmf
52 | cd,congolese franc,cdf
53 | cg,cfa franc beac,xaf
54 | ck,new zealand dollar,nzd
55 | cr,costa rican colon,crc
56 | ci,cfa franc bceao,xof
57 | hr,euro,eur
58 | cU,cuban peso,cup
59 | cu,peso convertible,cuc
60 | cw,netherlands antillean guilder,ang
61 | cy,euro,eur
62 | cz,czech koruna,czk
63 | dk,danish krone,dkk
64 | dj,djibouti franc,djf
65 | dm,east caribbean dollar,xcd
66 | do,dominican peso,dop
67 | ec,us $dollar,usd
68 | eg,egyptian pound,egp
69 | sv,el salvador colon,svc
70 | sv,us dollar,usd
71 | gq,cfa franc beac,xaf
72 | er,nakfa,ern
73 | ee,euro,eur
74 | sz,lilangeni,szl
75 | et,ethiopian birr,etb
76 | fk,falkland islands pound,fkp
77 | fo,danish krone,dkk
78 | fj,fiji dollar,fjd
79 | fi,euro,eur
80 | fr,euro,eur
81 | gf,euro,eur
82 | pf,cfp franc,xpf
83 | Ga,cfa franc beac,xaf
84 | gm,dalasi,gmd
85 | ge,lari,gel
86 | de,euro,eur
87 | gh,ghana cedi,ghs
88 | gi,gibraltar pound,gip
89 | gr,euro,eur
90 | gl,danish krone,dkk
91 | gd,east caribbean dollar,xcd
92 | gp,euro,eur
93 | gu,us dollar,usd
94 | gt,quetzal,gtq
95 | gn,guinean franc,gnf
96 | gw,cfa franc bceao,xof
97 | gn,guyana dollar,gyd
98 | ht,gourde,htg
99 | ht,us dollar,usd
100 | va,euro,eur
101 | hn,lempira,hnl
102 | hk,hong kong $dollar,hkd
103 | hu,forint,huf
104 | is,iceland krona,isk
105 | in,indian rupee,inr
106 | id!,rupiah,idr
107 | ir,iranian rial,irr
108 | iq,iraqi dinar,iqd
109 | ie,euro,eur
110 | im,pound sterling,gbp
111 | il,new israeli sheqel,ils
112 | it,euro,eur
113 | jm,jamaican dollar,jmd
114 | jp,yen,jpy
115 | je,pound sterling,gbp
116 | jo,jordanian dinar,jod
117 | kz,tenge,kzt
118 | ke,kenyan shilling,kes
119 | ki,australian dollar,aud
120 | kp,north korean won,kpw
121 | kr,won,krw
122 | kw,kuwaiti dinar,kwd
123 | kg,som,kgs
124 | la,lao kip,lak
125 | lv,euro,eur
126 | lb,lebanese pound,lbp
127 | ls,loti,lsl
128 | ls,rand,zar
129 | lr,liberian dollar,lrd
130 | ly,libyan dinar,lyd
131 | li,swiss franc,chf
132 | !!lt,euro,eur
133 | lu,euro,eur
134 | mo,pataca,mop
135 | mk,denar,mkd
136 | mg,malagasy ariary,mga
137 | mw,malawi !kwacha,mwk
138 | my,malaysian ringgit,myr
139 | mv,rufiyaa,mvr
140 | ml,cfa franc bceao,xof
141 | mt,euro,eur
142 | mh,us dollar,usd
143 | mq,euro,eur
144 | mr,ouguiya,mru
145 | mu,mauritius rupee,mur
146 | yt,euro,eur
147 | mx,mexican peso,mxn
148 | mx,mexican unidad de inversion (udi),mxv
149 | fm,us dollar,usd
150 | md,moldovan leu,mdl
151 | mc,euro,eur
152 | mN,tugrik,mnt
153 | me,euro,eur
154 | ms,east caribbean dollar,xcd
155 | ma,moroccan dirham,mad
156 | mz,mozambique metical,mzn
157 | mm,kyat,mmk
158 | na,namibia dollar,nad
159 | na,rand,zar
160 | nr,australian dollar,aud
161 | np,nepalese $rupee,npr
162 | nl,euro,eur
163 | nc,cfp franc,xpf
164 | nz,new zealand dollar,nzd
165 | ni,cordoba oro,nio
166 | ni,cfa franc bceao,xof
167 | ng,naira,ngn
168 | nu,new zealand dollar,nzd
169 | nf,australian dollar,aud
170 | mp,us dollar,usd
171 | no,norwegian krone,nok
172 | om,rial omani,omr
173 | pk,pakistan rupee,pkr
174 | pw,us dollar,usd
175 | pa,balboa,pab
176 | PA,us dollar,usd
177 | pg,kina,pgk
178 | py,guarani,pyg
179 | pe,sol,pen
180 | ph,philippine peso,php
181 | pn,new zealand dollar,nzd
182 | pl,zloty,pln
183 | pt,euro,eur
184 | pr,us dollar,usd
185 | qa,qatari rial,qar
186 | re,euro,eur
187 | ro,roman?ian leu,ron
188 | ru,russian ruble,rub
189 | rw,rwanda franc,rwf
190 | bl,euro,eur
191 | sh,saint helena pound,shp
192 | kn,east caribbean dollar,xcd
193 | ls,east caribbean dollar,xcd
194 | mf,euro,eur
195 | pm,euro,eur
196 | vc,east caribbean dollar,xcd
197 | ws,tala,wst
198 | sm,euro,eur
199 | st,dobra,stn
200 | sa,saudi riyal,sar
201 | sn,cfa franc bceao,xof
202 | rs!,serbian dinar,rsd
203 | sc,seychelles rupee,scr
204 | sl,leone,sll
205 | sl,leone,sle
206 | sg,singapore dollar,sgd
207 | sx,netherlands antillean guilder,ang
208 | sk,euro,eur
209 | sk,euro,eur
210 | sb,solomon islands dollar,sbd
211 | so,somali shilling,sos
212 | za,rand,zar
213 | gs,no universal currency,
214 | ss,south sudanese pound,ssp
215 | es,euro,eur
216 | lk,sri lanka rupee,lkr
217 | sd,sudanese ?pound,sdg
218 | sr,surinam dollar,srd
219 | se,swedish krona,sek
220 | ch,swiss franc,chf
221 | ch,wir euro,che
222 | ch,wir franc,chw
223 | sy,syrian pound,syp
224 | tw,new taiwan dollar,twd
225 | tj,somoni,tjs
226 | tz,tanzanian shilling,tzs
227 | th,baht,thb
228 | tl,us dollar,usd
229 | Tg,cfa franc bceao,xof
230 | to,pa'anga,top
231 | tt,trinidad and tobago dollar,ttd
232 | tn,tunisian dinar,tnd
233 | tr,turkish lira,try
234 | tm,turkmenistan new manat,tmt
235 | tc,us dollar,usd
236 | tv,australian dollar,aud
237 | ug,uganda shilling,ugx
238 | uA,hryvnia,uah
239 | ae,uae dirham,aed
240 | gb,pound sterling,gbp
241 | us,us dollar,usd
242 | uy,peso uruguayo,uyu
243 | uy,uruguay peso en unidades indexadas (ui),uyi
244 | uy,unidad previsional,uyw
245 | uz,uzbekistan sum,uzs
246 | vu,vatu,vuv
247 | ve,bolivar soberano,ves
248 | ve,bolivar soberano,ved
249 | vn,dong,vnd
250 | vg,us d$ollar,usd
251 | vi,us dollar,usd
252 | wf,cfp franc,xpf
253 | ye,yemeni rial,yer
254 | zm,zambian kwacha,zmw
255 | zw,zimbabwe dollar,zwl
256 |
--------------------------------------------------------------------------------
/source_data/csv_data/languages.csv:
--------------------------------------------------------------------------------
1 | language,country_code_2
2 | pashto,af
3 | persian,af
4 | u?zbek,af
5 | albanian,aL
6 | english,al
7 | arabic,dz
8 | french,dz
9 | kabyle,dz
10 | english,as
11 | catalan,ad
12 | english,ad
13 | lingala,ao
14 | portuguese,ao
15 | english,ai
16 | spanish,ai
17 | english,ag
18 | spanish,ag
19 | english,ar
20 | spanish,ar
21 | armenian,am
22 | dutch,aw
23 | spanish,aw
24 | english,au
25 | english,at
26 | german,at
27 | azerbaijani,az
28 | english,bs
29 | spanish,bs
30 | arabic,bh
31 | bangla,bd
32 | chakma,bD
33 | english,bd
34 | english,bb
35 | spanish,bb
36 | be?larusian,by
37 | russian,by
38 | dutch,be
39 | english,be
40 | french,be
41 | german,be
42 | walloon,be
43 | english,bz
44 | spanish,bz
45 | french,bj
46 | yoruba,bj
47 | english,bm
48 | spanish,bm
49 | dzongkha,bt
50 | quechua,bo
51 | spanish,bo
52 | bosnian,ba
53 | croatian,ba
54 | english,ba
55 | serbian,ba
56 | english,bw
57 | tswana,bw
58 | english,br
59 | portuguese,br
60 | spanish,br
61 | english,io
62 | english,vg
63 | spanish,vg
64 | malay,bn
65 | malay,bn
66 | bulgarian,bg
67 | english,bg
68 | french,bf
69 | fulah,bf
70 | english,bi
71 | french,bi
72 | rundi,bi
73 | khmer,kh
74 | aghem,cm
75 | bafia,cm
76 | basaa,cm
77 | duala,cm
78 | english,cm
79 | ewondo,cm
80 | french,cm
81 | fulah,cm
82 | kako,cm
83 | kwasio,cm
84 | meta,cm
85 | mundang,cm
86 | ngiemboon,cm
87 | ngomba,cm
88 | yangben,cm
89 | english,ca
90 | french,ca
91 | inuktitut,ca
92 | mohawk,ca
93 | spanish,ca
94 | kabuverdianu,cv
95 | portuguese,cv
96 | dutch,bq
97 | spanish,bq
98 | english,ky
99 | spanish,ky
100 | french,cf
101 | lingala,cf
102 | sango,cf
103 | arabic,td
104 | french,td
105 | english,cl
106 | mapuche,cl
107 | spanish,cl
108 | cantonese,cn
109 | chinese,cn
110 | chinese,cn
111 | english,cn
112 | sichuan yi,cn
113 | tibetan,cn
114 | uyghur,cn
115 | english,cx
116 | english,cc
117 | english,co
118 | spanish,co
119 | arabic,km
120 | french,km
121 | french,cg
122 | lingala,cg
123 | french,cd
124 | lingala,cd
125 | luba-katanga,cd
126 | swahili,cd
127 | english,ck
128 | spanish,cr
129 | french,ci
130 | croatian,hr
131 | english,hr
132 | spanish,cu
133 | dutch,cw
134 | spanish,cw
135 | english,cy
136 | greek,cy
137 | turkish,cy
138 | czech,cz
139 | english,cz
140 | danish,dk
141 | english,dk
142 | faroese,dk
143 | arabic,dj
144 | french,dj
145 | somali,dj
146 | english,dm
147 | spanish,dm
148 | spanish,do
149 | quechua,ec
150 | spanish,ec
151 | arabic,eg
152 | spanish,sv
153 | french,gq
154 | portuguese,gq
155 | spanish,gq
156 | arabic, er
157 | blin,er
158 | english,er
159 | geez,er
160 | tigre,er
161 | tigrinya,er
162 | english,ee
163 | estonian,ee
164 | english,sz
165 | swati,sz
166 | amharic,et
167 | geez,et
168 | oromo,et
169 | somali,et
170 | tigrinya,et
171 | wolaytta,et
172 | english,fk
173 | spanish,fk
174 | faroese,fo
175 | english,fj
176 | english,fi
177 | finnish,fi
178 | inari sami,fi
179 | northern sami,fi
180 | swedish,fi
181 | breton,fr
182 | catalan,fr
183 | corsican,fr
184 | english,fr
185 | french,fr
186 | occitan,fr
187 | portuguese,fr
188 | swiss german,fr
189 | french,gf
190 | spanish,gf
191 | french,pf
192 | french,ga
193 | english,gm
194 | fulah,gm
195 | georgian,ge
196 | ossetic,ge
197 | colognian,de
198 | english,de
199 | german,de
200 | low german,de
201 | lower sorbian,de
202 | upper sorbian,de
203 | akan,gh
204 | english,gh
205 | ewe,gh
206 | fulah,gh
207 | ga,gh
208 | hausa,gh
209 | english,gi
210 | english,gr
211 | greek,gr
212 | danish,gl
213 | kalaallisut,gl
214 | spanish,gl
215 | english,gd
216 | spanish,gd
217 | french,gp
218 | spanish,gp
219 | e!nglish,gu
220 | spanish,gt
221 | fulah,gw
222 | portuguese,gw
223 | french,gn
224 | fulah,gn
225 | kpelle,gn
226 | english,gy
227 | spanish,gy
228 | french,ht
229 | spanish,ht
230 | spanish,hn
231 | can#tonese,hk
232 | chinese,hk
233 | chinese,hk
234 | english,hk
235 | english,hu
236 | hungarian,hu
237 | english,is
238 | icelandic,is
239 | assamese,in
240 | bangla,in
241 | bodo,in
242 | chakma,in
243 | english,in
244 | gujarati,in
245 | hindi,in
246 | kannada,in
247 | kashmiri,in
248 | konkani,in
249 | malayalam,in
250 | manipuri,in
251 | marathi,in
252 | nepali,in
253 | odia,in
254 | punjabi,in
255 | sanskrit,in
256 | santali,in
257 | tamil,in
258 | telugu,in
259 | tibetan,in
260 | urdu,in
261 | english,id
262 | indonesian,id
263 | javanese,id
264 | "kurdish, sorani",ir
265 | mazanderani,ir
266 | northern luri,ir
267 | persian,ir
268 | arabic,iq
269 | "kurdish, sorani",iq
270 | northern luri,iq
271 | syriac,iq
272 | english,ie
273 | irish,ie
274 | english,iM
275 | manx,im
276 | arabic,il
277 | english,il
278 | hebrew,il
279 | catalan,it
280 | english,it
281 | friulian,it
282 | german,it
283 | italian,it
284 | sardinian,it
285 | sicilian,it
286 | english,jm
287 | english,jp
288 | japanese,jp
289 | english,je
290 | arabic,jo
291 | kazakh,kz
292 | russian,kz
293 | embu,ke
294 | english,ke
295 | gusii,ke
296 | kalenjin,ke
297 | kamba,ke
298 | kikuyu,ke
299 | luo,ke
300 | luyia,ke
301 | masai,ke
302 | meru,ke
303 | oromo,ke
304 | samburu,ke
305 | somali,ke
306 | swahili,ke
307 | taita,ke
308 | teso,ke
309 | english,ki
310 | arabic,kw
311 | kyrgyz,kg
312 | russian,kg
313 | lao,la
314 | english,lv
315 | latvian,lv
316 | arabic,lb
317 | english,ls
318 | southern sotho,ls
319 | english,lr
320 | fulah,lr
321 | kpelle,lr
322 | vai,lr
323 | arabic,ly
324 | german,li
325 | swiss german,li
326 | english,lt
327 | lithuanian,lt
328 | english,lu
329 | french,lu
330 | german,lu
331 | luxembourgish,lu
332 | portuguese,lu
333 | chinese,mo
334 | chinese,mo
335 | english,mo
336 | portuguese,mo
337 | english,mg
338 | french,mg
339 | malagasy,mg
340 | english,mw
341 | nyanja,mw
342 | english,my
343 | malay,my
344 | tamil,my
345 | dhivehi,mv
346 | english,mv
347 | bambara,ml
348 | french,ml
349 | koyra chiini,ml
350 | koyraboro senni,ml
351 | english,mt
352 | maltese,mt
353 | english,mh
354 | french,mq
355 | spanish,mq
356 | arabic,mr
357 | french,mr
358 | fulah,mr
359 | english,mu
360 | french,mu
361 | morisyen,mu
362 | f#rench,yt
363 | english,mx
364 | spanish,mx
365 | english,fm
366 | romanian,md
367 | russian,md
368 | french,mc
369 | mongolian,mn
370 | english,me
371 | serbian,me
372 | serbian,me
373 | english,ms
374 | spanish,ms
375 | arabic,ma
376 | french,ma
377 | moroccan tamazight,ma
378 | tachelhit,ma
379 | makhuwa-meetto,mz
380 | portuguese,mz
381 | sena,mz
382 | burmese,mm
383 | english,mm
384 | afrikaans,na
385 | english,na
386 | nama,na
387 | english,nr
388 | nepali,np
389 | dutch,nl
390 | english,nl
391 | low german,nl
392 | western frisian,nl
393 | fre$nch,nc
394 | english,nz
395 | maori,nz
396 | spanish,ni
397 | french,ne
398 | fulah,ne
399 | hausa,ne
400 | tasawaq,ne
401 | zarma,ne
402 | english,ng
403 | fulah,ng
404 | hausa,ng
405 | igbo,ng
406 | jju,ng
407 | tyap,ng
408 | yoruba,ng
409 | english,nu
410 | english,nf
411 | korean,kp
412 | albanian,mk
413 | macedonian,mk
414 | english,mp
415 | english,no
416 | northern sami,no
417 | norwegian nynorsk,no
418 | arabic,om
419 | english,pk
420 | pashto,pk
421 | punjabi,pk
422 | sindhi,pk
423 | urdu,pk
424 | english,pw
425 | spanish,pa
426 | english,pg
427 | guarani,py
428 | spanish,py
429 | quechua,pe
430 | spanish,pe
431 | cebuano,ph
432 | english,ph
433 | filipino,ph
434 | spanish,ph
435 | english,pn
436 | english,pl
437 | polish,pl
438 | english,pt
439 | portuguese,pt
440 | english,pr
441 | spanish,pr
442 | arabic,Qa
443 | french,re
444 | english,ro
445 | romanian,ro
446 | bashkir,ru
447 | chechen,ru
448 | chuvash,ru
449 | engli$sh,ru
450 | erzya,ru
451 | ossetic,ru
452 | russian,ru
453 | sakha,ru
454 | tatar,ru
455 | english,rw
456 | french,rw
457 | kinyarwanda,rw
458 | english,ws
459 | italian,sm
460 | portuguese,st
461 | arabic,sa
462 | english,sa
463 | french,sn
464 | fulah,sn
465 | jola-fonyi,sn
466 | wolof,sn
467 | english,rs
468 | serbian,rs
469 | english,sc
470 | french,sc
471 | english,sl
472 | fulah,sl
473 | chinese,sg
474 | english,sg
475 | malay,sg
476 | tamil,sg
477 | dutch,sx
478 | english,sx
479 | spanish,sx
480 | english,sk
481 | slovak,sk
482 | english,si
483 | slovenian,si
484 | english,sb
485 | arabic,so
486 | somali,so
487 | afrikaans,za
488 | english,za
489 | northern sotho,za
490 | south ndebele,za
491 | southern sotho,za
492 | swati,za
493 | tsonga,za
494 | tswana,za
495 | ve!nda,za
496 | xhosa,za
497 | zulu,za
498 | english,kr
499 | korean,kr
500 | arabic,ss
501 | english,ss
502 | nuer,ss
503 | asturian,es
504 | basque,es
505 | catalan,es
506 | english,es
507 | galician,es
508 | spanish,es
509 | sinhala,lk
510 | tamil,lk
511 | french,bl
512 | spanish,bl
513 | english,sh
514 | english,kn
515 | spanish,kn
516 | english,lc
517 | spanish,lc
518 | french,mf
519 | spanish,mf
520 | french,pm
521 | spanish,pm
522 | english,vc
523 | spanish,vc
524 | ara$bic,sd
525 | english,sd
526 | dutch,sR
527 | spanish,sr
528 | english,se
529 | northern sami,se
530 | swedish,se
531 | english,ch
532 | french,ch
533 | german,ch
534 | italian,ch
535 | portuguese,ch
536 | romansh,ch
537 | swiss german,ch
538 | walser,ch
539 | arabic,sy
540 | french,sy
541 | syriac,sy
542 | chinese,tw
543 | english,tw
544 | taroko,tw
545 | tajik,tj
546 | asu,tz
547 | bena,tz
548 | english,tz
549 | langi,tz
550 | machame,tz
551 | makonde,tz
552 | masai,tz
553 | ro$mbo,tz
554 | rwa,tz
555 | sangu,tz
556 | shambala,tz
557 | swahili,tz
558 | vunjo,tz
559 | english,th
560 | thai,th
561 | portuguese,tl
562 | ewe,tg
563 | french,tg
564 | english,to
565 | tongan,to
566 | english,tt
567 | spanish, tt
568 | arabic,tn
569 | french,tn
570 | english,tr
571 | kurdish,tr
572 | turkish,tr
573 | turkmen,tm
574 | english,tc
575 | spanish,tc
576 | english,tv
577 | english,vi
578 | spanish,vi
579 | chiga,ug
580 | english,ug
581 | g!anda,ug
582 | nyankole,ug
583 | soga,ug
584 | swahili,ug
585 | teso,ug
586 | english,ua
587 | russian,ua
588 | ukrainian,ua
589 | arabic,ae
590 | english,ae
591 | cornish,gb
592 | english,gb
593 | scottish gaelic,gb
594 | welsh,gb
595 | english,us
596 | spanish,uy
597 | uzbek,uz
598 | uzbek,uz
599 | english,vu
600 | french,vu
601 | italian,va
602 | spanish,ve
603 | vietnamese,vn
604 | french,wF
605 | arabic,ye
606 | bemba,zm
607 | en!glish,zm
608 | english,zw
609 | shona,zw
610 |
--------------------------------------------------------------------------------
/source_data/csv_output/output.csv:
--------------------------------------------------------------------------------
1 | row_id,insert_date,country_name,city_name,population,languages
2 | 1,2022-04-09,argentina,santa maria,9526,"{english,spanish}"
3 | 2,2022-04-09,brazil,clevelandia,16450,"{spanish,english,portuguese}"
4 | 3,2022-04-09,brazil,conceicao do rio verde,12949,"{english,portuguese,spanish}"
5 | 4,2022-04-09,brazil,curaca,35208,"{english,portuguese,spanish}"
6 | 5,2022-04-09,brazil,japoata,12938,"{spanish,english,portuguese}"
7 | 6,2022-04-09,haiti,petite riviere de nippes,29815,"{spanish,french}"
8 | 7,2022-04-09,peru,puerto pimentel,15552,"{quechua,spanish}"
9 | 8,2022-04-28,argentina,ciudad general belgrano,92957,"{english,spanish}"
10 | 9,2022-04-28,brazil,caete,40750,"{portuguese,spanish,english}"
11 | 10,2022-04-28,brazil,catanduva,119480,"{spanish,english,portuguese}"
12 | 11,2022-04-28,brazil,colider,32120,"{english,spanish,portuguese}"
13 | 12,2022-04-28,brazil,miracema do tocantins,20684,"{spanish,english,portuguese}"
14 | 13,2022-04-28,brazil,presidente medici,18571,"{spanish,english,portuguese}"
15 | 14,2022-04-28,colombia,coyaima,28443,"{spanish,english}"
16 | 15,2022-04-28,colombia,manati,13456,"{spanish,english}"
17 | 16,2022-04-28,dominican republic,licey al medio,25539,{spanish}
18 | 17,2022-04-28,mexico,amatitan,15344,"{spanish,english}"
19 | 18,2022-04-28,mexico,san juan zitlaltepec,19600,"{spanish,english}"
20 | 19,2022-04-28,mexico,teopisca,20044,"{english,spanish}"
21 | 20,2022-08-11,brazil,charqueadas,35320,"{portuguese,spanish,english}"
22 | 21,2022-08-11,brazil,chopinzinho,19679,"{portuguese,english,spanish}"
23 | 22,2022-08-11,brazil,jaguarao,27931,"{portuguese,spanish,english}"
24 | 23,2022-08-11,ecuador,vinces,30241,"{spanish,quechua}"
25 | 24,2022-08-11,guatemala,el palmar,31706,{spanish}
26 | 25,2022-08-11,nicaragua,villanueva,25660,{spanish}
27 | 26,2022-08-11,puerto rico,guaynabo,70542,"{english,spanish}"
28 |
--------------------------------------------------------------------------------
/source_data/scripts/build_tables.sql:
--------------------------------------------------------------------------------
1 | /*
2 | SQL Code Challenge
3 | Author: Jaime M. Shaker
4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev
5 | Website: https://www.shaker.dev
6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/
7 |
8 | File Name: build_tables.sql
9 | */
10 |
11 | -- We must ensure that our data is properly organized. Let's create a schema
12 | -- specifically for importing/copying data from our CSV files.
13 | CREATE SCHEMA IF NOT EXISTS import_data;
14 | -- Create import_data.countries table
15 | -- Drop this table every time this script is run to ensure repeatability.
16 | DROP TABLE IF EXISTS import_data.countries;
17 | -- Create a new table and add a country_id to use as a primary key.
18 | -- We will initially insert all the data as TEXT to ensure there aren't any errors during the COPY.
19 | -- We will also create a created_on field that will have the date that the table was created.
20 | CREATE TABLE import_data.countries (
21 | -- Create an auto-incrementing, unique key for every row.
22 | -- This will also be used as our Primary Key.
23 | country_id INT GENERATED ALWAYS AS IDENTITY,
24 | country_name TEXT,
25 | country_code_2 TEXT,
26 | country_code_3 TEXT,
27 | region TEXT,
28 | sub_region TEXT,
29 | intermediate_region TEXT,
30 | created_on date,
31 | PRIMARY KEY (country_id)
32 | );
33 | -- We will use the COPY statement to extract the data from our CSV files.
34 | COPY import_data.countries (
35 | country_name,
36 | country_code_2,
37 | country_code_3,
38 | region,
39 | sub_region,
40 | intermediate_region
41 | )
42 | -- PostgreSQL stores its data in /var/lib/postgresql/
43 | -- In the docker-compose.yaml file, we created a volume named 'source_data/' that our container can access.
44 | FROM '/var/lib/postgresql/source_data/csv_data/countries.csv'
45 | -- The CSV files are comma separated and include headers.
46 | WITH DELIMITER ',' HEADER CSV;
47 |
48 | -- Using the same process, let's create tables for all of our CSV files.
49 | /* Create import.cities table */
50 | DROP TABLE IF EXISTS import_data.cities;
51 | CREATE TABLE import_data.cities (
52 | city_id INT GENERATED ALWAYS AS IDENTITY,
53 | city_name TEXT,
54 | latitude TEXT,
55 | longitude TEXT,
56 | country_code_2 TEXT,
57 | capital TEXT,
58 | population TEXT,
59 | insert_date TEXT,
60 | PRIMARY KEY (city_id)
61 | );
62 |
63 | COPY import_data.cities (
64 | city_name,
65 | latitude,
66 | longitude,
67 | country_code_2,
68 | capital,
69 | population,
70 | insert_date
71 | )
72 | FROM '/var/lib/postgresql/source_data/csv_data/cities.csv'
73 | WITH DELIMITER ',' HEADER CSV;
74 |
75 | /* Create import.currencies table */
76 | DROP TABLE IF EXISTS import_data.currencies;
77 | CREATE TABLE import_data.currencies (
78 | currency_id INT GENERATED ALWAYS AS IDENTITY,
79 | country_code_2 TEXT,
80 | currency_name TEXT,
81 | currency_code TEXT,
82 | PRIMARY KEY (currency_id)
83 | );
84 |
85 | COPY import_data.currencies (
86 | country_code_2,
87 | currency_name,
88 | currency_code
89 | )
90 | FROM '/var/lib/postgresql/source_data/csv_data/currencies.csv'
91 | WITH DELIMITER ',' HEADER CSV;
92 |
93 | /* Create import.languages table */
94 | DROP TABLE IF EXISTS import_data.languages;
95 | CREATE TABLE import_data.languages (
96 | language_id INT GENERATED ALWAYS AS IDENTITY,
97 | language TEXT,
98 | country_code_2 TEXT,
99 | PRIMARY KEY (language_id)
100 | );
101 |
102 | COPY import_data.languages (
103 | language,
104 | country_code_2
105 | )
106 | FROM '/var/lib/postgresql/source_data/csv_data/languages.csv'
107 | WITH DELIMITER ',' HEADER CSV;
--------------------------------------------------------------------------------
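A note on the COPY commands in build_tables.sql: COPY ... FROM reads the file from the *server's* filesystem, which is why the docker-compose volume mounted under /var/lib/postgresql/ matters. When connecting from outside the container, psql's client-side \copy meta-command is an alternative — a minimal sketch with the same columns and options (the relative path here is hypothetical and is resolved on the client):

```sql
-- \copy runs inside psql and streams the file from the client machine,
-- so no server-side file access or superuser privileges are needed.
\copy import_data.countries (country_name, country_code_2, country_code_3, region, sub_region, intermediate_region) FROM 'source_data/csv_data/countries.csv' WITH DELIMITER ',' HEADER CSV
```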
/source_data/scripts/cleanup_db.sql:
--------------------------------------------------------------------------------
1 | /*
2 | SQL Code Challenge
3 | Author: Jaime M. Shaker
4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev
5 | Website: https://www.shaker.dev
6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/
7 |
8 | File Name: cleanup_db.sql
9 | */
10 |
11 | -- Drop All import_data Tables and Schema
12 |
13 | DROP TABLE import_data.countries;
14 | DROP TABLE import_data.cities;
15 | DROP TABLE import_data.currencies;
16 | DROP TABLE import_data.languages;
17 | DROP SCHEMA import_data CASCADE;
18 |
--------------------------------------------------------------------------------
/source_data/scripts/code_challenge.sql:
--------------------------------------------------------------------------------
1 | /*
2 | SQL Code Challenge
3 | Author: Jaime M. Shaker
4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev
5 | Website: https://www.shaker.dev
6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/
7 |
8 | File Name: code_challenge.sql
9 | */
10 |
11 |
12 | /* Question 2.
13 | *
14 | * List all of the regions and the total number of countries in each region.
15 | * Order by country count in descending order and capitalize the region name.
16 | *
17 | */
18 |
19 | SELECT
20 | -- initcap() capitalizes the first letter of every word in a string.
21 | initcap(region) AS region,
22 | count(*) AS country_count
23 | FROM
24 | cleaned_data.countries
25 | GROUP BY
26 | -- When using aggregate functions like count(), you must GROUP BY all non-aggregated columns.
27 | region
28 | ORDER BY
29 | country_count DESC;
30 |
31 | /*
32 |
33 | region |country_count|
34 | ---------+-------------+
35 | Africa | 59|
36 | Americas | 57|
37 | Asia | 50|
38 | Europe | 48|
39 | Oceania | 26|
40 | Antartica| 1|
41 |
42 | */
43 |
44 | /* Question 3.
45 | *
46 | * List all of the sub-regions and the total number of cities in each sub-region.
47 | * Order by sub-region name alphabetically.
48 | *
49 | */
50 |
51 | SELECT
52 | initcap(t1.sub_region) AS sub_region,
53 | count(*) AS city_count
54 | FROM
55 | cleaned_data.countries AS t1
56 | JOIN
57 | cleaned_data.cities AS t2
58 | ON
59 | t1.country_code_2 = t2.country_code_2
60 | GROUP BY
61 | t1.sub_region
62 | ORDER BY
63 | t1.sub_region;
64 |
65 | /*
66 |
67 | sub_region |city_count|
68 | -------------------------------+----------+
69 | Australia And New Zealand | 329|
70 | Central Asia | 560|
71 | Eastern Asia | 3164|
72 | Eastern Europe | 2959|
73 | Latin America And The Caribbean| 7204|
74 | Melanesia | 60|
75 | Micronesia | 15|
76 | Northern Africa | 1152|
77 | Northern America | 5844|
78 | Northern Europe | 2025|
79 | Polynesia | 22|
80 | Southeastern Asia | 2627|
81 | Southern Asia | 6848|
82 | Southern Europe | 3238|
83 | Subsaharan Africa | 3223|
84 | Western Asia | 1400|
85 | Western Europe | 3952|
86 |
87 | */
88 |
89 | /* Question 4.
90 | *
91 | * List all of the countries and the total number of cities in the Northern Europe sub-region.
92 | * List the country names in uppercase and order the list by the length of the country name and
93 | * alphabetically in ascending order.
94 | *
95 | */
96 |
97 | SELECT
98 | -- upper() returns your string in uppercase.
99 | upper(t1.country_name) AS country_name,
100 | count(*) AS city_count
101 | FROM
102 | cleaned_data.countries AS t1
103 | JOIN
104 | cleaned_data.cities AS t2
105 | ON
106 | t1.country_code_2 = t2.country_code_2
107 | WHERE
108 | t1.sub_region = 'northern europe'
109 | GROUP BY
110 | t1.country_name
111 | ORDER BY
112 | -- length() returns the number of characters in a string, including spaces.
113 | length(t1.country_name), t1.country_name;
114 |
115 | /*
116 |
117 | country_name |city_count|
118 | --------------+----------+
119 | JERSEY | 1|
120 | LATVIA | 39|
121 | NORWAY | 127|
122 | SWEDEN | 148|
123 | DENMARK | 75|
124 | ESTONIA | 20|
125 | FINLAND | 142|
126 | ICELAND | 12|
127 | IRELAND | 64|
128 | LITHUANIA | 61|
129 | ISLE OF MAN | 2|
130 | FAROE ISLANDS | 29|
131 | UNITED KINGDOM| 1305|
132 |
133 | */
134 |
135 | /* Question 5.
136 | *
137 | * List all of the countries and the total number of cities in the Southern Europe sub-region that were inserted in 2021.
138 | * Capitalize the country names and order alphabetically by the **LAST** letter of the country name and the number of cities.
139 | *
140 | */
141 |
142 | SELECT
143 | initcap(t1.country_name) AS country_name,
144 | count(*) AS city_count
145 | FROM
146 | cleaned_data.countries AS t1
147 | JOIN
148 | cleaned_data.cities AS t2
149 | ON
150 | t1.country_code_2 = t2.country_code_2
151 | WHERE
152 | t1.sub_region = 'southern europe'
153 | AND
154 | -- extract() & date_part() allow you to break down dates and timestamps into individual parts (years, months, days, hours...).
155 | EXTRACT('year' FROM t2.insert_date) = 2021
156 | GROUP BY
157 | t1.country_name
158 | ORDER BY
159 | -- substring() function returns a specific portion of a string.
160 | substring(t1.country_name,length(t1.country_name),1), city_count;
161 |
162 | /*
163 |
164 | country_name |city_count|
165 | ----------------------+----------+
166 | Andorra | 5|
167 | Albania | 11|
168 | Bosnia And Herzegovina| 15|
169 | Croatia | 22|
170 | North Macedonia | 28|
171 | Malta | 32|
172 | Serbia | 58|
173 | Slovenia | 74|
174 | Greece | 64|
175 | Portugal | 109|
176 | Spain | 302|
177 | San Marino | 2|
178 | Montenegro | 12|
179 | Italy | 542|
180 |
181 | */
182 |
183 | /* Question 6.
184 | *
185 | * List all of the countries in the region of Asia that did **NOT** have a city
186 | * with an inserted date from June 2021 through Sept 2021.
187 | *
188 | */
189 |
190 | SELECT
191 | -- DISTINCT returns only unique values.
192 | DISTINCT initcap(t1.country_name) AS country_name
193 | FROM
194 | cleaned_data.countries AS t1
195 | -- A left join returns all rows from the left table (cleaned_data.countries) and
196 | -- only the matching rows from the right table (cleaned_data.cities), resulting in NULLs where
197 | -- there are no matches in the right table.
198 | LEFT JOIN
199 | cleaned_data.cities AS t2
200 | ON
201 | t1.country_code_2 = t2.country_code_2
202 | AND
203 | t2.insert_date BETWEEN '2021-06-01' AND '2021-10-01'
204 | WHERE
205 | t1.region = 'asia'
206 | -- Only return countries that did NOT have a match in the cities table.
207 | AND
208 | t2.country_code_2 IS NULL;
209 |
210 | /*
211 |
212 | country_name |
213 | -----------------+
214 | Brunei Darussalam|
215 | Kuwait |
216 | Macao |
217 | Singapore |
218 |
219 | */
220 |
221 | /* Question 7.
222 | *
223 | * List the country, city name, population and city name length for the city names that are palindromes in the
224 | * Western Asia sub-region. Format the population with a thousands separator (1,000) and format the length of
225 | * the city name in roman numerals. Order by the length of the city name in descending order and
226 | * alphabetically in ascending order.
227 | *
228 | */
229 |
230 | SELECT
231 | initcap(t1.country_name) AS country_name,
232 | initcap(t2.city_name) AS city_name,
233 | -- to_char() takes non-string data types and returns them as strings.
234 | to_char(t2.population, '999G999') AS population,
235 | to_char(length(t2.city_name), 'RN') AS roman_numeral_length
236 | FROM
237 | cleaned_data.countries AS t1
238 | JOIN
239 | cleaned_data.cities AS t2
240 | ON
241 | t1.country_code_2 = t2.country_code_2
242 | WHERE
243 | t2.city_name = reverse(t2.city_name)
244 | AND
245 | t1.sub_region = 'western asia'
246 | ORDER BY
247 | length(t2.city_name) DESC, t2.city_name ASC;
248 |
249 | /*
250 |
251 | country_name |city_name|population|roman_numeral_length|
252 | --------------------+---------+----------+--------------------+
253 | Yemen |Hajjah | 46,568 | VI |
254 | Syrian Arab Republic|Hamah | 696,863 | V |
255 | Turkey |Kavak | 21,692 | V |
256 | Turkey |Kinik | 29,803 | V |
257 | Turkey |Tut | 10,161 | III |
258 |
259 | */
260 |
261 |
262 | /* Question 8.
263 | *
264 | * List all of the countries that end in 'stan'. Make your query case-insensitive and list
265 | * whether the total population of the cities listed is an odd or even number for cities inserted in 2022.
266 | * Order by whether the population value is odd or even in ascending order and country name in alphabetical order.
267 | *
268 | */
269 |
270 | SELECT
271 | initcap(t1.country_name) AS country_name,
272 | to_char(sum(t2.population), '99G999G999') total_population,
273 | CASE
274 | WHEN (sum(t2.population) % 2) = 0
275 | THEN 'Even'
276 | ELSE
277 | 'Odd'
278 | END AS odd_or_even
279 | FROM
280 | cleaned_data.countries AS t1
281 | JOIN
282 | cleaned_data.cities AS t2
283 | ON
284 | t1.country_code_2 = t2.country_code_2
285 | WHERE
286 | t1.country_name ILIKE '%stan'
287 | AND
288 | EXTRACT('year' FROM t2.insert_date) = 2022
289 | GROUP BY
290 | t1.country_name
291 | ORDER BY
292 | odd_or_even, country_name;
293 |
294 | /*
295 |
296 | country_name|total_population|odd_or_even|
297 | ------------+----------------+-----------+
298 | Afghanistan | 6,006,530 |Even |
299 | Kazakhstan | 4,298,264 |Even |
300 | Kyrgyzstan | 1,017,644 |Even |
301 | Pakistan | 26,344,480 |Even |
302 | Tajikistan | 2,720,953 |Odd |
303 | Turkmenistan| 419,607 |Odd |
304 | Uzbekistan | 3,035,547 |Odd |
305 |
306 | */
307 |
308 | /* Question 9.
309 | *
310 | * List the third most populated city ranked by region WITHOUT using limit or offset.
311 | * List the region name, city name, population and order the results by region.
312 | *
313 | */
314 |
315 | WITH get_city_rank_cte AS (
316 | SELECT
317 | t1.region,
318 | t2.city_name,
319 | t2.population AS third_largest_pop,
320 | DENSE_RANK() OVER (PARTITION BY t1.region ORDER BY t2.population DESC) AS rnk
321 | FROM
322 | cleaned_data.countries AS t1
323 | JOIN
324 | cleaned_data.cities AS t2
325 | ON
326 | t1.country_code_2 = t2.country_code_2
327 | WHERE
328 | t2.population IS NOT NULL
329 | GROUP BY
330 | t1.region,
331 | t2.city_name,
332 | t2.population
333 | )
334 | SELECT
335 | initcap(region) AS region,
336 | initcap(city_name) AS city_name,
337 | to_char(third_largest_pop, '99G999G999') AS third_largest_pop
338 | FROM
339 | get_city_rank_cte
340 | WHERE
341 | rnk = 3;
342 |
343 | /*
344 |
345 | region |city_name|third_largest_pop|
346 | --------+---------+-----------------+
347 | Africa |Kinshasa | 12,836,000 |
348 | Americas|New York | 18,972,871 |
349 | Asia |Delhi | 32,226,000 |
350 | Europe |Paris | 11,060,000 |
351 | Oceania |Brisbane | 2,360,241 |
352 |
353 | */
354 |
355 | /* Question 10.
356 | *
357 | * List the bottom third of all countries in the Western Asia sub-region that speak Arabic.
358 | * Include the row number and country name. Order by row number.
359 | *
360 | */
361 |
362 | WITH get_ntile_cte AS (
363 | SELECT
364 | ROW_NUMBER() OVER (ORDER BY t1.country_name) AS rn,
365 | t1.country_name,
366 | -- The ntile() window function distributes ordered rows into a given number of roughly equal 'buckets'.
367 | NTILE(3) OVER (ORDER BY t1.country_name) AS nt
368 | FROM
369 | cleaned_data.countries AS t1
370 | JOIN
371 | cleaned_data.languages AS t2
372 | ON
373 | t1.country_code_2 = t2.country_code_2
374 | WHERE
375 | t1.sub_region = 'western asia'
376 | AND
377 | t2.language = 'arabic'
378 | )
379 | SELECT
380 | rn AS row_id,
381 | country_name
382 | FROM
383 | get_ntile_cte
384 | WHERE
385 | nt = 3;
386 |
387 |
388 | /*
389 |
390 | row_id|country_name |
391 | ------+--------------------+
392 | 9|saudi arabia |
393 | 10|syrian arab republic|
394 | 11|united arab emirates|
395 | 12|yemen |
396 |
397 | */
398 |
399 | /* Question 11.
400 | *
401 | * Create a query that lists country name, capital name, population, languages spoken and currency name for countries
402 | * in the Northern Africa sub-region. There can be multiple currency names and languages spoken per country.
403 | * Add multiple values for the same field into an array.
404 | *
405 | */
406 |
407 | WITH get_row_values AS (
408 | SELECT
409 | t1.country_name,
410 | t2.city_name,
411 | t2.population,
412 | -- array_agg() aggregates multiple values and returns them as an array.
413 | array_agg(t3.LANGUAGE) AS languages,
414 | t4.currency_name AS currencies
415 | FROM
416 | cleaned_data.countries AS t1
417 | JOIN
418 | cleaned_data.cities AS t2
419 | ON
420 | t1.country_code_2 = t2.country_code_2
421 | JOIN
422 | cleaned_data.languages AS t3
423 | ON
424 | t1.country_code_2 = t3.country_code_2
425 | JOIN
426 | cleaned_data.currencies AS t4
427 | ON
428 | t1.country_code_2 = t4.country_code_2
429 | WHERE
430 | t1.sub_region = 'northern africa'
431 | AND
432 | t2.capital = TRUE
433 | GROUP BY
434 | t1.country_name,
435 | t2.city_name,
436 | t2.population,
437 | t4.currency_name
438 | )
439 | SELECT
440 | *
441 | FROM
442 | get_row_values;
443 |
444 | /*
445 |
446 | country_name|city_name|population|languages |currencies |
447 | ------------+---------+----------+--------------------------------------------+---------------+
448 | algeria |algiers | 3415811|{french,arabic,kabyle} |algerian dinar |
449 | egypt |cairo | 20296000|{arabic} |egyptian pound |
450 | libya |tripoli | 1293016|{arabic} |libyan dinar |
451 | morocco |rabat | 572717|{arabic,tachelhit,moroccan tamazight,french}|moroccan dirham|
452 | sudan |khartoum | 7869000|{arabic,english} |sudanese pound |
453 | tunisia |tunis | 1056247|{french,arabic} |tunisian dinar |
454 |
455 | */
456 |
457 |
458 | /* Question 12.
459 | *
460 | * Produce a query that returns the city names for cities in the U.S. that were inserted on April 28th, 2022. List
461 | * how many vowels and consonants are present in the city name and concatenate their percentage to their respective
462 | * count in parentheses.
463 | *
464 | */
465 |
466 | WITH get_letter_count AS (
467 | SELECT
468 | t1.city_name,
469 | length(t1.city_name) string_length,
470 | -- regexp_replace() returns a value that has been manipulated by a regular expression.
471 | length(regexp_replace(t1.city_name, '[aeiou]', '', 'gi')) AS consonant_count
472 | FROM
473 | cleaned_data.cities AS t1
474 | WHERE
475 | t1.insert_date = '2022-04-28'
476 | AND
477 | t1.country_code_2 in ('us')
478 | ),
479 | get_letter_diff AS (
480 | SELECT
481 | city_name,
482 | string_length - consonant_count AS vowels,
483 | round(100 * (string_length - consonant_count) / string_length::NUMERIC, 2) AS vowel_perc,
484 | consonant_count AS consonants,
485 | round( 100 * (consonant_count)::NUMERIC / string_length, 2)::float AS consonants_perc
486 | FROM
487 | get_letter_count
488 | )
489 | SELECT
490 | city_name,
491 | vowels || ' (' || vowel_perc || '%)' AS vowel_count_perc,
492 | consonants || ' (' || consonants_perc || '%)' AS consonants_count_perc
493 | FROM
494 | get_letter_diff
495 | ORDER BY
496 | vowels;
497 |
498 | /*
499 |
500 | city_name |vowel_count_perc|consonants_count_perc|
501 | ---------------+----------------+---------------------+
502 | standish |2 (25.00%) |6 (75%) |
503 | grand forks |2 (18.18%) |9 (81.82%) |
504 | camano |3 (50.00%) |3 (50%) |
505 | cedar hills |3 (27.27%) |8 (72.73%) |
506 | gladstone |3 (33.33%) |6 (66.67%) |
507 | whitehall |3 (33.33%) |6 (66.67%) |
508 | homewood |4 (50.00%) |4 (50%) |
509 | willowbrook |4 (36.36%) |7 (63.64%) |
510 | port salerno |4 (33.33%) |8 (66.67%) |
511 | vadnais heights|5 (33.33%) |10 (66.67%) |
512 | jeffersonville |5 (35.71%) |9 (64.29%) |
513 |
514 | */
515 |
516 | /* Question 13.
517 | *
518 | * List the most consecutive inserted dates and the capitalized city names for cities in Canada that were inserted
519 | * in April 2022.
520 | *
521 | */
522 |
523 | DROP TABLE IF EXISTS get_dates;
524 | CREATE TEMP TABLE get_dates AS (
525 | SELECT
526 | DISTINCT ON (insert_date) insert_date AS insert_date,
527 | city_name
528 | FROM
529 | cleaned_data.cities
530 | WHERE
531 | country_code_2 = 'ca'
532 | AND
533 | insert_date BETWEEN '2022-04-01' AND '2022-04-30'
534 | ORDER BY
535 | insert_date
536 | );
537 |
538 | DROP TABLE IF EXISTS get_diff;
539 | CREATE TEMP TABLE get_diff AS (
540 | SELECT
541 | city_name,
542 | insert_date,
543 | EXTRACT('day' FROM insert_date) - ROW_NUMBER() OVER (ORDER BY insert_date) AS diff
544 | FROM
545 | get_dates
546 | );
547 |
548 |
549 | DROP TABLE IF EXISTS get_diff_count;
550 | CREATE TEMP TABLE get_diff_count AS (
551 | SELECT
552 | city_name,
553 | insert_date,
554 | count(*) OVER (PARTITION BY diff) AS diff_count
555 | FROM
556 | get_diff
557 | );
558 |
559 |
560 | WITH get_rank AS (
561 | SELECT
562 | DENSE_RANK() OVER (ORDER BY diff_count desc) AS rnk,
563 | insert_date,
564 | city_name
565 | FROM
566 | get_diff_count
567 | )
568 | SELECT
569 | insert_date AS most_consecutive_dates,
570 | initcap(city_name) AS city_name
571 | FROM
572 | get_rank
573 | WHERE
574 | rnk = 1
575 | ORDER BY
576 | insert_date;
577 |
578 | /*
579 |
580 | most_consecutive_dates|city_name |
581 | ----------------------+------------+
582 | 2022-04-22|South Dundas|
583 | 2022-04-23|La Prairie |
584 | 2022-04-24|Elliot Lake |
585 | 2022-04-25|Lachute |
586 |
587 | */
588 |
589 | /* Question 14.
590 | *
591 | * Create a view that lists the month-year, the number of cities inserted for that month,
592 | * a running city count total and the month over month percentage growth for 2021.
593 | *
594 | * Format the cities count and the running total with the thousands separator
595 | * and format the month over month growth with a plus symbol and percentage symbol.
596 | *
597 | * Example: Feb-2021 | 1,000 | 2,000 |+60.00% |
598 | *
599 | */
600 |
601 | DROP VIEW IF EXISTS cleaned_data.view_2021_growth;
602 | CREATE VIEW cleaned_data.view_2021_growth AS (
603 | WITH get_month_count AS (
604 | SELECT
605 | date_trunc('month', insert_date) as single_month,
606 | count(*) AS monthly_count
607 | FROM
608 | cleaned_data.cities
609 | WHERE
610 | EXTRACT('year' FROM insert_date) = 2021
611 | GROUP BY
612 | single_month
613 | ORDER BY
614 | single_month
615 | ),
616 | get_running_total AS (
617 | SELECT
618 | single_month::date,
619 | monthly_count,
620 | sum(monthly_count) OVER (ORDER BY single_month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_num_cities
621 | FROM
622 | get_month_count
623 | ),
624 | get_month_over_month AS (
625 | SELECT
626 | single_month,
627 | monthly_count,
628 | total_num_cities,
629 | round(100.0 * ((total_num_cities - Lag(total_num_cities, 1) OVER (ORDER BY single_month)) / Lag(total_num_cities, 1) OVER (ORDER BY single_month))::NUMERIC, 2) AS month_over_month
630 | FROM
631 | get_running_total
632 | )
633 | SELECT
634 | to_char(single_month, 'Mon-YYYY') AS month_year,
635 | to_char(monthly_count, '9G999') AS cities_inserted,
636 | to_char(total_num_cities, '99G999') AS running_total,
637 | to_char(month_over_month, 'sg99.99') || '%' AS month_over_month
638 | FROM
639 | get_month_over_month
640 | );
641 |
642 | SELECT
643 | *
644 | FROM
645 | cleaned_data.view_2021_growth;
646 |
647 | /*
648 |
649 | month_year|cities_inserted|running_total|month_over_month|
650 | ----------+---------------+-------------+----------------+
651 | Jan-2021 | 1,471 | 1,471 | |
652 | Feb-2021 | 1,291 | 2,762 |+87.76% |
653 | Mar-2021 | 1,485 | 4,247 |+53.77% |
654 | Apr-2021 | 1,508 | 5,755 |+35.51% |
655 | May-2021 | 1,509 | 7,264 |+26.22% |
656 | Jun-2021 | 1,395 | 8,659 |+19.20% |
657 | Jul-2021 | 1,394 | 10,053 |+16.10% |
658 | Aug-2021 | 1,481 | 11,534 |+14.73% |
659 | Sep-2021 | 1,452 | 12,986 |+12.59% |
660 | Oct-2021 | 1,446 | 14,432 |+11.14% |
661 | Nov-2021 | 1,378 | 15,810 |+9.55% |
662 | Dec-2021 | 1,472 | 17,282 |+9.31% |
663 |
664 | */
665 |
666 | /* Question 15.
667 | *
668 | * Create and call a stored procedure that lists a unique row id number, insert date, country name, city name, population
669 | * and languages spoken for countries in the Latin America and the Caribbean sub-region that were inserted on
670 | * either '2022-04-09', '2022-04-28' or '2022-08-11'.
671 | *
672 | * Order by the insert date and output the results (including headers) to a CSV file located in /source_data/csv_output/.
673 | *
674 | */
675 |
676 | CREATE OR REPLACE PROCEDURE cleaned_data.sproc_output ()
677 | LANGUAGE plpgsql
678 | AS
679 | $sproc$
680 | BEGIN
681 | COPY (
682 | SELECT
683 | ROW_NUMBER() OVER (ORDER BY t1.insert_date) AS row_id,
684 | t1.insert_date,
685 | t2.country_name,
686 | t1.city_name,
687 | t1.population,
688 | array_agg(t3.language) AS languages
689 | FROM
690 | cleaned_data.cities AS t1
691 | JOIN
692 | cleaned_data.countries AS t2
693 | ON
694 | t1.country_code_2 = t2.country_code_2
695 | LEFT JOIN
696 | cleaned_data.languages AS t3
697 | ON
698 | t2.country_code_2 = t3.country_code_2
699 | WHERE
700 | t2.sub_region = 'latin america and the caribbean'
701 | AND
702 | t1.insert_date IN ('2022-04-09', '2022-04-28', '2022-08-11')
703 | GROUP BY
704 | t1.insert_date,
705 | t2.country_name,
706 | t1.city_name,
707 | t1.population
708 | ORDER BY
709 | t1.insert_date
710 | )
711 | TO '/var/lib/postgresql/source_data/csv_output/output.csv' DELIMITER ',' CSV HEADER;
712 | END
713 | $sproc$;
714 |
715 | -- Call the stored procedure
716 | CALL cleaned_data.sproc_output();
717 |
718 | /*
719 |
720 | Results located in /source_data/csv_output/output.csv
721 |
722 | row_id,insert_date,country_name,city_name,population,languages
723 | 1,2022-04-09,argentina,santa maria,9526,"{english,spanish}"
724 | 2,2022-04-09,brazil,clevelandia,16450,"{spanish,english,portuguese}"
725 | 3,2022-04-09,brazil,conceicao do rio verde,12949,"{english,portuguese,spanish}"
726 | 4,2022-04-09,brazil,curaca,35208,"{english,portuguese,spanish}"
727 | 5,2022-04-09,brazil,japoata,12938,"{spanish,english,portuguese}"
728 | 6,2022-04-09,haiti,petite riviere de nippes,29815,"{spanish,french}"
729 | 7,2022-04-09,peru,puerto pimentel,15552,"{quechua,spanish}"
730 | 8,2022-04-28,argentina,ciudad general belgrano,92957,"{english,spanish}"
731 | 9,2022-04-28,brazil,caete,40750,"{portuguese,spanish,english}"
732 | 10,2022-04-28,brazil,catanduva,119480,"{spanish,english,portuguese}"
733 | 11,2022-04-28,brazil,colider,32120,"{english,spanish,portuguese}"
734 | 12,2022-04-28,brazil,miracema do tocantins,20684,"{spanish,english,portuguese}"
735 | 13,2022-04-28,brazil,presidente medici,18571,"{spanish,english,portuguese}"
736 | 14,2022-04-28,colombia,coyaima,28443,"{spanish,english}"
737 | 15,2022-04-28,colombia,manati,13456,"{spanish,english}"
738 | 16,2022-04-28,dominican republic,licey al medio,25539,{spanish}
739 | 17,2022-04-28,mexico,amatitan,15344,"{spanish,english}"
740 | 18,2022-04-28,mexico,san juan zitlaltepec,19600,"{spanish,english}"
741 | 19,2022-04-28,mexico,teopisca,20044,"{english,spanish}"
742 | 20,2022-08-11,brazil,charqueadas,35320,"{portuguese,spanish,english}"
743 | 21,2022-08-11,brazil,chopinzinho,19679,"{portuguese,english,spanish}"
744 | 22,2022-08-11,brazil,jaguarao,27931,"{portuguese,spanish,english}"
745 | 23,2022-08-11,ecuador,vinces,30241,"{spanish,quechua}"
746 | 24,2022-08-11,guatemala,el palmar,31706,{spanish}
747 | 25,2022-08-11,nicaragua,villanueva,25660,{spanish}
748 | 26,2022-08-11,puerto rico,guaynabo,70542,"{english,spanish}"
749 |
750 | */
751 |
752 |
753 |
--------------------------------------------------------------------------------
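The temp-table chain in Question 13 is the classic "gaps and islands" technique: subtracting a consecutive row number from consecutive dates yields a value that stays constant within an unbroken run. A self-contained sketch against inline VALUES (no project tables assumed), runnable in PostgreSQL:

```sql
WITH d(insert_date) AS (
    VALUES ('2022-04-01'::date), ('2022-04-02'), ('2022-04-05'),
           ('2022-04-06'), ('2022-04-07')
),
islands AS (
    SELECT
        insert_date,
        -- Consecutive dates all share the same (date - row_number) value.
        insert_date - (ROW_NUMBER() OVER (ORDER BY insert_date))::int AS grp
    FROM d
)
SELECT min(insert_date) AS streak_start, count(*) AS streak_length
FROM islands
GROUP BY grp
ORDER BY streak_start;
-- The 3-day run starting 2022-04-05 is the longest streak.
```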
/source_data/scripts/create_relationships.sql:
--------------------------------------------------------------------------------
1 | /*
2 | SQL Code Challenge
3 | Author: Jaime M. Shaker
4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev
5 | Website: https://www.shaker.dev
6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/
7 |
8 | File Name: create_relationships.sql
9 | */
10 |
11 | -- Alter cleaned_data.countries and add the UNIQUE constraint to country_code_2
12 |
13 | ALTER TABLE
14 | cleaned_data.countries
15 | ADD CONSTRAINT
16 | unique_country_code_2
17 | UNIQUE (country_code_2);
18 |
19 | -- Alter all other tables and add a foreign key constraint and reference.
20 | -- Create Foreign Key relationship for cleaned_data.cities
21 | ALTER TABLE
22 | -- Table to be altered
23 | cleaned_data.cities
24 | ADD CONSTRAINT
25 | -- Give this constraint a name
26 | fk_country_city
27 | -- Which key in cleaned_data.cities is a foreign key
28 | FOREIGN KEY (country_code_2)
29 | -- Which key to reference from parent table
30 | REFERENCES cleaned_data.countries (country_code_2);
31 |
32 | -- Create Foreign Key relationship for cleaned_data.currencies
33 | ALTER TABLE
34 | cleaned_data.currencies
35 | ADD CONSTRAINT
36 | fk_country_currencies
37 | FOREIGN KEY (country_code_2)
38 | REFERENCES cleaned_data.countries (country_code_2);
39 |
40 | -- Create Foreign Key relationship for cleaned_data.languages
41 | ALTER TABLE
42 | cleaned_data.languages
43 | ADD CONSTRAINT
44 | fk_country_languages
45 | FOREIGN KEY (country_code_2)
46 | REFERENCES cleaned_data.countries (country_code_2);
47 |
48 |
--------------------------------------------------------------------------------
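The reason create_relationships.sql adds the UNIQUE constraint before any foreign keys: PostgreSQL requires a foreign key to reference columns covered by a PRIMARY KEY or UNIQUE constraint. A minimal, hypothetical illustration (throwaway table names, not part of the project):

```sql
CREATE TABLE demo_parent (
    code text UNIQUE  -- an FK target must be UNIQUE or a PRIMARY KEY
);
CREATE TABLE demo_child (
    -- This REFERENCES clause would fail if demo_parent.code were not UNIQUE.
    code text REFERENCES demo_parent (code)
);
```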
/source_data/scripts/drop_all_tables.sql:
--------------------------------------------------------------------------------
1 | /*
2 | SQL Code Challenge
3 | Author: Jaime M. Shaker
4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev
5 | Website: https://www.shaker.dev
6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/
7 |
8 | File Name: drop_all_tables.sql
9 |
10 | Description: This script will drop all tables, schemas and constraints used in
11 | this project.
12 | */
13 |
14 | -- Drop import_data tables and schema.
15 | DROP TABLE IF EXISTS import_data.countries;
16 | DROP TABLE IF EXISTS import_data.cities;
17 | DROP TABLE IF EXISTS import_data.currencies;
18 | DROP TABLE IF EXISTS import_data.languages;
19 | DROP SCHEMA IF EXISTS import_data;
20 |
21 | -- Drop Foreign Key relationship for cleaned_data.currencies
22 | ALTER TABLE IF EXISTS
23 | -- Table to be altered
24 | cleaned_data.cities
25 | DROP CONSTRAINT IF EXISTS
26 | -- Drop this constraint name
27 | fk_country_city;
28 |
29 | -- Drop Foreign Key relationship for cleaned_data.currencies
30 | ALTER TABLE IF EXISTS
31 | cleaned_data.currencies
32 | DROP CONSTRAINT IF EXISTS
33 | fk_country_currencies;
34 |
35 | -- Drop Foreign Key relationship for cleaned_data.languages
36 | ALTER TABLE IF EXISTS
37 | cleaned_data.languages
38 | DROP CONSTRAINT IF EXISTS
39 | fk_country_languages;
40 |
41 | -- Drop cleaned_data tables and schema.
42 | DROP TABLE IF EXISTS cleaned_data.countries;
43 | DROP TABLE IF EXISTS cleaned_data.cities;
44 | DROP TABLE IF EXISTS cleaned_data.currencies;
45 | DROP TABLE IF EXISTS cleaned_data.languages;
46 | DROP SCHEMA IF EXISTS cleaned_data CASCADE;
--------------------------------------------------------------------------------
/source_data/scripts/normalize_tables.sql:
--------------------------------------------------------------------------------
1 | /*
2 | SQL Code Challenge
3 | Author: Jaime M. Shaker
4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev
5 | Website: https://www.shaker.dev
6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/
7 |
8 | File Name: normalize_tables.sql
9 | */
10 |
11 | -- We must ensure that our data is properly organized. Let's create a schema
12 | -- specifically for our transformed, clean data.
13 | CREATE SCHEMA IF NOT EXISTS cleaned_data;
14 | -- Create countries table
15 | -- Drop this table every time this script is run to ensure repeatability.
16 | -- We are adding CASCADE for when we add foreign key relationships.
17 | DROP TABLE IF EXISTS cleaned_data.countries CASCADE;
18 | -- We must presuppose that the data in its current state is unusable/unreliable.
19 | --
20 | CREATE TABLE cleaned_data.countries (
21 | -- The country_id was automatically incremented and can be entered as an integer.
22 | country_id INT NOT NULL,
23 | -- Different from other database systems, in PostgreSQL, there is no performance difference among the three character types (char, varchar and text).
24 | -- Use text to ensure that there are no errors due to string length.
25 | country_name TEXT NOT NULL,
26 | -- The data appears to have duplicate entries so we will remove them once the data has been cleaned of
27 | -- any unwanted characters.
28 | country_code_2 varchar(2) NOT NULL,
29 | country_code_3 varchar(3) NOT NULL,
30 | region TEXT,
31 | sub_region TEXT,
32 | intermediate_region TEXT,
33 | created_on date,
34 | PRIMARY KEY (country_id)
35 | );
36 |
37 | INSERT INTO cleaned_data.countries (
38 | country_id,
39 | country_name,
40 | country_code_2,
41 | country_code_3,
42 | region,
43 | sub_region,
44 | intermediate_region,
45 | created_on
46 | )
47 | (
48 | SELECT
49 | i.country_id,
50 | -- The regexp_replace() function removes any special characters using a simple regular expression.
51 | -- trim() function removes white space from either end of the string.
52 | -- lower() function converts all characters to lowercase.
53 | trim(lower(regexp_replace(i.country_name, '[^\w\s^. ]', '', 'gi'))),
54 | -- Properly cast type from TEXT into new table
55 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^. ]', '', 'gi')))::varchar,
56 | trim(lower(regexp_replace(i.country_code_3, '[^\w\s^. ]', '', 'gi')))::varchar,
57 | trim(lower(regexp_replace(i.region, '[^\w\s^. ]', '', 'gi'))),
58 | trim(lower(regexp_replace(i.sub_region, '[^\w\s^. ]', '', 'gi'))),
59 | trim(lower(regexp_replace(i.intermediate_region, '[^\w\s^. ]', '', 'gi'))),
60 | -- Use the built-in function current_date to insert current date into created_on field.
61 | current_date
62 | FROM
63 | import_data.countries AS i
64 | );
65 |
66 | /* Create cleaned_data.cities table */
67 | DROP TABLE IF EXISTS cleaned_data.cities;
68 | CREATE TABLE cleaned_data.cities (
69 | city_id int,
70 | city_name TEXT,
71 | latitude float,
72 | longitude float,
73 | country_code_2 varchar(2) NOT NULL,
74 | capital boolean,
75 | population int,
76 | insert_date date,
77 | PRIMARY KEY (city_id)
78 | );
79 |
80 | INSERT INTO cleaned_data.cities (
81 | city_id,
82 | city_name,
83 | latitude,
84 | longitude,
85 | country_code_2,
86 | capital,
87 | population,
88 | insert_date
89 | )
90 | (
91 | SELECT
92 | i.city_id,
93 | trim(lower(regexp_replace(i.city_name, '[^\w\s^. ]', '', 'gi'))),
94 | i.latitude::float,
95 | i.longitude::float,
96 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^. ]', '', 'gi')))::varchar,
97 | i.capital::boolean,
98 | i.population::int,
99 | i.insert_date::date
100 | FROM
101 | import_data.cities AS i
102 | );
103 |
104 | /* Create cleaned_data.currencies table */
105 | DROP TABLE IF EXISTS cleaned_data.currencies;
106 | CREATE TABLE cleaned_data.currencies (
107 | currency_id int,
108 | country_code_2 varchar(2) NOT NULL,
109 | currency_name TEXT,
110 | currency_code TEXT,
111 | PRIMARY KEY (currency_id)
112 | );
113 |
114 | INSERT INTO cleaned_data.currencies (
115 | currency_id,
116 | country_code_2,
117 | currency_name,
118 | currency_code
119 | )
120 | (
121 | SELECT
122 | currency_id,
123 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^. ]', '', 'gi')))::varchar,
124 | trim(lower(regexp_replace(i.currency_name, '[^\w\s^. ]', '', 'gi'))),
125 | trim(lower(regexp_replace(i.currency_code, '[^\w\s^. ]', '', 'gi')))
126 | FROM
127 | import_data.currencies AS i
128 | );
129 |
130 | /* Create cleaned_data.languages table */
131 | DROP TABLE IF EXISTS cleaned_data.languages;
132 | CREATE TABLE cleaned_data.languages (
133 | language_id int,
134 | language TEXT,
135 | country_code_2 varchar(2) NOT NULL,
136 | PRIMARY KEY (language_id)
137 | );
138 |
139 | INSERT INTO cleaned_data.languages (
140 | language_id,
141 | language,
142 | country_code_2
143 | )
144 | (
145 | SELECT
146 | language_id,
147 | trim(lower(regexp_replace(i.language, '[^\w\s^. ]', '', 'gi'))),
148 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^. ]', '', 'gi')))::varchar
149 | FROM
150 | import_data.languages AS i
151 | );
152 |
153 |
154 |
--------------------------------------------------------------------------------
/source_data/scripts/remove_duplicates.sql:
--------------------------------------------------------------------------------
1 | /*
2 | SQL Code Challenge
3 | Author: Jaime M. Shaker
4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev
5 | Website: https://www.shaker.dev
6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/
7 |
8 | File Name: remove_duplicates.sql
9 | */
10 |
11 | -- Remove Duplicate Entries
12 | -- Check for duplicate entries in cleaned_data.countries
13 | /*
14 | SELECT
15 | -- Get the columns.
16 | country_code_2,
17 | -- Count how many times the country_code_2 occurs.
18 | count(*)
19 | FROM
20 | cleaned_data.countries
21 | GROUP BY
22 | -- Using an aggregate function forces us to group all like names together.
23 | country_code_2
24 | HAVING
25 | -- Only select values that have a count greater than one (multiple entries).
26 | count(*) > 1;
27 | */
28 |
29 | -- Delete duplicate entries
30 |
31 | DELETE
32 | FROM
33 | -- Alias for the rows we wish to keep.
34 | cleaned_data.countries AS clean_data_1
35 | USING
36 | -- Alias for the duplicate rows.
37 | cleaned_data.countries AS clean_data_2
38 | WHERE
39 | -- Keep the row with the lower country_id; delete the one with the greater id.
40 | clean_data_1.country_id > clean_data_2.country_id
41 | AND
42 | -- Only treat rows as duplicates when their country_code_2 values match.
43 | clean_data_1.country_code_2 = clean_data_2.country_code_2;
44 |
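45 | -- Optional verification (assumes the DELETE above ran successfully):
46 | -- re-running the duplicate check should now return zero rows.
47 | /*
48 | SELECT
49 | country_code_2,
50 | count(*)
51 | FROM
52 | cleaned_data.countries
53 | GROUP BY
54 | country_code_2
55 | HAVING
56 | count(*) > 1;
57 | */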
--------------------------------------------------------------------------------
/source_data/scripts/start_fresh.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # Although these commands could be combined into a one-liner, I've written
4 | # them out individually for new people who may not understand the syntax.
5 | #
6 | # The breakdown of each command is as follows:
7 | # psql: a terminal-based front-end to PostgreSQL.
8 | # sql_coding_challenge: the name of our database.
9 | # -f: tells psql to read and execute the commands in the given file.
10 | # some_file.sql: the SQL script to execute via psql.
11 |
12 | psql sql_coding_challenge -f drop_all_tables.sql
13 | psql sql_coding_challenge -f build_tables.sql
14 | psql sql_coding_challenge -f normalize_tables.sql
15 | psql sql_coding_challenge -f remove_duplicates.sql
16 | psql sql_coding_challenge -f cleanup_db.sql
17 | psql sql_coding_challenge -f create_relationships.sql
18 | echo "Build Process Complete"
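19 | 
20 | # The calls above can also be written as a single loop (an equivalent
21 | # sketch; assumes this script is run from the source_data/scripts/ directory):
22 | #
23 | # for f in drop_all_tables build_tables normalize_tables \
24 | #          remove_duplicates cleanup_db create_relationships; do
25 | #     psql sql_coding_challenge -f "$f.sql"
26 | # done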
--------------------------------------------------------------------------------
/walkthrough/WALKTHROUGH_1_DOCKER.md:
--------------------------------------------------------------------------------
1 | ## Basic/Intermediate SQL Code Challenge
2 |
3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 |
8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
9 |
10 | ### Set Up
11 |
12 | This repository contains all of the necessary files, data and directories for running a PostgreSQL server in a Docker container. The only prerequisite is that you have Docker Desktop installed and running on your computer.
13 |
14 | https://www.docker.com/products/docker-desktop/
15 |
16 | Once we have `Docker Desktop` installed and running, we must have Docker download a Postgres image and start the Postgres container.
17 |
18 | Using your terminal (PowerShell/Bash/etc.), cd (change directory) to where this project is located. To ensure you are in the correct directory, type this command into the terminal.
19 |
20 | `ls -al`
21 |
22 | Your results should look similar to this.
23 |
24 | 
25 |
26 | As mentioned before, in the same directory as the `docker-compose.yaml` file, create an empty directory named '`db`' (note the red arrow in the image). This is where the PostgreSQL container will store internal data and keep your data persistent.
27 | * This directory's path is in the `.gitignore` file which is why it is not included in this repository.
28 |
29 | Once you are in the same directory as the `docker-compose.yaml` file, run this command to start the Docker container.
30 |
31 | `docker-compose up -d`
32 |
33 | * The `-d` portion of the command allows the container to run in the background.
34 | * If this is the first time running it, this command will download the required Docker image and create a PostgreSQL container, which may take a few minutes to complete. Once it is complete and running, your terminal should look something like this.
35 |
36 | 
37 |
38 | * After the initial install, the next time you run the `docker-compose up -d` command, the Docker container will fire right up as long as the image hasn't been deleted.
39 |
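For reference, a minimal `docker-compose.yaml` along these lines might look like the sketch below. This is an illustrative sketch, not the repository's actual file; the image tag, port, and volume paths are assumptions based on details mentioned elsewhere in this walkthrough (the `db/pgdata` data directory, the `source_data/` volume, and the `sql_coding_challenge` container and database names).

```yaml
services:
  postgres:
    image: postgres:15            # assumed image tag
    container_name: sql_coding_challenge
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: sql_coding_challenge
    ports:
      - "5432:5432"               # assumed default Postgres port mapping
    volumes:
      - ./db/pgdata:/var/lib/postgresql/data
      - ./source_data:/var/lib/postgresql/source_data
```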
40 | If you are using [PGAdmin](https://www.pgadmin.org/), [DBeaver](https://dbeaver.io/) or any other Database management tool, create a connection to the now running PostgreSQL server.
41 | * Host: `localhost`
42 | * Database Name: `sql_coding_challenge`
43 | * Username: `postgres`
44 | * Password: `postgres`
45 |
46 | Alternatively, if you are using [psql](https://gist.github.com/Kartones/dd3ff5ec5ea238d4c546), use these commands in your terminal to connect to the database:
47 | * `docker exec -ti sql_coding_challenge psql -U postgres`
48 | * `\c sql_coding_challenge`
49 |
50 | To stop the Docker container:
51 | * Using the command line, cd into the same directory as the `docker-compose.yaml` file and run the command...
52 | * `docker-compose down`
53 | * You can also use the `Docker Desktop GUI` to stop the container.
54 |
55 | The `docker-compose down` command will stop the container until the next time you start it back up using the command:
56 | * `docker-compose up -d`
57 |
58 | Now that we have our PostgreSQL server up and running, click the link below and let's start creating tables and inserting data.
59 |
60 | Go to [WALKTHROUGH_BUILD](WALKTHROUGH_2_BUILD.md)
61 |
62 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
63 |
64 |
65 |
66 |
--------------------------------------------------------------------------------
/walkthrough/WALKTHROUGH_2_BUILD.md:
--------------------------------------------------------------------------------
1 | ## Basic/Intermediate SQL Code Challenge
2 |
3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 |
8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
9 |
10 | ### ETL (Extract/Transform/Load)
11 |
12 | ETL is the process data engineers use to extract data from different sources, transform the data into a usable resource, and load that data into the systems that end-users can access and use to solve business problems.
13 | * Although this term usually describes much larger data-warehousing pipelines, we are doing the same thing on a much smaller scale by:
14 | * Extracting raw data from CSV files.
15 | * Transforming the data to make it usable.
16 | * Loading the data into tables within our database.
17 |
18 | This process is also known as `Data Wrangling`.
19 |
20 | In Data Analysis, the analyst must ensure that the data is 'clean' before doing any analysis. 'Dirty' data can lead to unreliable, inaccurate and/or misleading results.
21 |
22 | * Garbage in = garbage out.
23 |
24 | These are some of the steps that can be taken to properly prepare this dataset for analysis.
25 |
26 | * Check for duplicate entries and remove them.
27 | * Remove extra spaces and/or other invalid characters.
28 | * Separate or combine values as needed.
29 | * Correct misspelled or incorrectly entered data.
30 | * Check for null or empty values.
31 |
32 | Let's create a new file in [source_data/scripts/](../source_data/scripts/) and name it `build_tables.sql`.
33 |
34 | Before we can transform the data, we must get the data into the database. Let's create a script that will consistently build and rebuild our schema and tables. We will use `countries.csv` to create the countries table.
35 |
36 | ```sql
37 | -- We must ensure that our data is properly organized. Let's create a schema
38 | -- specifically for importing/copying data from our CSV files.
39 | CREATE SCHEMA IF NOT EXISTS import_data;
40 | -- Create import_data.countries table
41 | -- Drop this table every time this script is run to ensure repeatability.
42 | DROP TABLE IF EXISTS import_data.countries;
43 | -- Create a new table and add a country_id to use as a primary key.
44 | -- We will initially insert all the data as TEXT to ensure there aren't any errors during the COPY.
45 | -- We will also create a created_on field that will have the date that the table was created.
46 | CREATE TABLE import_data.countries (
47 | -- Create an auto-incrementing, unique key for every row.
48 | -- This will also be used as our Primary Key.
49 | country_id INT GENERATED ALWAYS AS IDENTITY,
50 | country_name TEXT,
51 | country_code_2 TEXT,
52 | country_code_3 TEXT,
53 | region TEXT,
54 | sub_region TEXT,
55 | intermediate_region TEXT,
56 | created_on date,
57 | PRIMARY KEY (country_id)
58 | );
59 | -- We will use the COPY statement to extract the data from our CSV files.
60 | COPY import_data.countries (
61 | country_name,
62 | country_code_2,
63 | country_code_3,
64 | region,
65 | sub_region,
66 | intermediate_region
67 | )
68 | -- PostgreSQL stores its data in /var/lib/postgresql/
69 | -- In the docker-compose.yaml file, we created a volume named 'source_data/' that our container can access.
70 | FROM '/var/lib/postgresql/source_data/csv_data/countries.csv'
71 | -- The CSV files are comma separated and include headers.
72 | WITH DELIMITER ',' HEADER CSV;
73 | ```
74 | Execute the complete script and SUCCESS! We have inserted the `countries.csv` into the `import_data.countries` table.
75 |
76 | Let's test our new table and take a look at the first 5 rows of data.
77 |
78 | ```sql
79 | SELECT * FROM import_data.countries LIMIT 5;
80 | ```
81 | **Results**
82 |
83 | country_id|country_name |country_code_2|country_code_3|region |sub_region |intermediate_region|created_on|
84 | ----------|----------------|--------------|--------------|--------|-----------------|-------------------|----------|
85 | 1|!afg!hANistan |af? |afg |asia |$southern asia | | |
86 | 2| alba$nia |aL |alb |europe! |southern *europe | | |
87 | 3|Alb?ania |AL |alb |eur#ope |$southern e#urope| | |
88 | 4|?algeria? |d!z |dza |africa |northern africa | | |
89 | 5|americ#an samoa?|as |as!m |0oceania|polyne$sia | | |
90 |
91 | As you can see, some countries do not have an `intermediate_region` value and none of our entries have a `created_on` value yet. Also notice that some of the country names have special characters, mixed upper/lower case characters and whitespace.
92 |
93 | This will definitely cause some problems if not corrected, however, right now our only concern is to get the data into our database.
94 |
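As a quick sanity check (an optional query, not part of the build script), you can confirm that the COPY actually loaded rows:

```sql
-- The count should match the number of data rows in countries.csv.
SELECT count(*) FROM import_data.countries;
```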
95 | Let's use the same process to add the remaining CSV files in `source_data/csv_data/`:
96 | * cities.csv
97 | * currencies.csv
98 | * languages.csv
99 |
100 | After adding all the necessary code, our `build_tables.sql` file should now look like this...
101 |
102 | ```sql
103 | /*
104 | SQL Code Challenge
105 | Author: Jaime M. Shaker
106 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev
107 | Website: https://www.shaker.dev
108 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/
109 |
110 | File Name: build_tables.sql
111 | */
112 |
113 | -- We must ensure that our data is properly organized. Let's create a schema
114 | -- specifically for importing/copying data from our CSV files.
115 | CREATE SCHEMA IF NOT EXISTS import_data;
116 | -- Create import_data.countries table
117 | -- Drop this table every time this script is run to ensure repeatability.
118 | DROP TABLE IF EXISTS import_data.countries;
119 | -- Create a new table and add a country_id to use as a primary key.
120 | -- We will initially insert all the data as TEXT to ensure there aren't any errors during the COPY.
121 | -- We will also create a created_on field that will have the date that the table was created.
122 | CREATE TABLE import_data.countries (
123 | -- Create an auto-incrementing, unique key for every row.
124 | -- This will also be used as our Primary Key.
125 | country_id INT GENERATED ALWAYS AS IDENTITY,
126 | country_name TEXT,
127 | country_code_2 TEXT,
128 | country_code_3 TEXT,
129 | region TEXT,
130 | sub_region TEXT,
131 | intermediate_region TEXT,
132 | created_on date,
133 | PRIMARY KEY (country_id)
134 | );
135 | -- We will use the COPY statement to extract the data from our CSV files.
136 | COPY import_data.countries (
137 | country_name,
138 | country_code_2,
139 | country_code_3,
140 | region,
141 | sub_region,
142 | intermediate_region
143 | )
144 | -- PostgreSQL stores its data in /var/lib/postgresql/
145 | -- In the docker-compose.yaml file, we created a volume named 'source_data/' that our container can access.
146 | FROM '/var/lib/postgresql/source_data/csv_data/countries.csv'
147 | -- The CSV files are comma separated and include headers.
148 | WITH DELIMITER ',' HEADER CSV;
149 |
150 | -- Using the same process, let's create tables for all of our csv files.
151 | /* Create import.cities table */
152 | DROP TABLE IF EXISTS import_data.cities;
153 | CREATE TABLE import_data.cities (
154 | city_id INT GENERATED ALWAYS AS IDENTITY,
155 | city_name TEXT,
156 | latitude TEXT,
157 | longitude TEXT,
158 | country_code_2 TEXT,
159 | capital TEXT,
160 | population TEXT,
161 | insert_date TEXT,
162 | PRIMARY KEY (city_id)
163 | );
164 |
165 | COPY import_data.cities (
166 | city_name,
167 | latitude,
168 | longitude,
169 | country_code_2,
170 | capital,
171 | population,
172 | insert_date
173 | )
174 | FROM '/var/lib/postgresql/source_data/csv_data/cities.csv'
175 | WITH DELIMITER ',' HEADER CSV;
176 |
177 | /* Create import.currencies table */
178 | DROP TABLE IF EXISTS import_data.currencies;
179 | CREATE TABLE import_data.currencies (
180 | currency_id INT GENERATED ALWAYS AS IDENTITY,
181 | country_code_2 TEXT,
182 | currency_name TEXT,
183 | currency_code TEXT,
184 | PRIMARY KEY (currency_id)
185 | );
186 |
187 | COPY import_data.currencies (
188 | country_code_2,
189 | currency_name,
190 | currency_code
191 | )
192 | FROM '/var/lib/postgresql/source_data/csv_data/currencies.csv'
193 | WITH DELIMITER ',' HEADER CSV;
194 |
195 | /* Create import.languages table */
196 | DROP TABLE IF EXISTS import_data.languages;
197 | CREATE TABLE import_data.languages (
198 | language_id INT GENERATED ALWAYS AS IDENTITY,
199 | language TEXT,
200 | country_code_2 TEXT,
201 | PRIMARY KEY (language_id)
202 | );
203 |
204 | COPY import_data.languages (
205 | language,
206 | country_code_2
207 | )
208 | FROM '/var/lib/postgresql/source_data/csv_data/languages.csv'
209 | WITH DELIMITER ',' HEADER CSV;
210 | ```
211 | Execute the complete script and our data has now been successfully extracted from the CSV files and copied into our database. This script can be run repeatedly and will always produce the same results.
212 |
213 | We can now TRANSFORM our data into a usable resource.
214 |
215 | Go to [WALKTHROUGH_NORMALIZE](WALKTHROUGH_3_NORMALIZE.md)
216 |
217 |
218 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
219 |
220 |
221 |
222 |
--------------------------------------------------------------------------------
/walkthrough/WALKTHROUGH_3_NORMALIZE.md:
--------------------------------------------------------------------------------
1 | ## Basic/Intermediate SQL Code Challenge
2 |
3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 |
8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
9 |
10 | ### Normalization
11 |
12 | According to Wikipedia, [Database normalization](https://en.wikipedia.org/wiki/Database_normalization) is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. Normalization entails organizing the columns (attributes) and tables (relations) of a database to ensure that their dependencies are properly enforced by database integrity constraints.
13 |
14 | Database normalization forms can be quite involved and go beyond the scope of this walkthrough. In this section, we will restructure our tables with the proper constraints, relationships and "clean" the data as we insert it into our new tables.
15 |
16 | To keep things organized, we will create our new tables within the `cleaned_data` schema.
17 |
18 | Create a new file in the `source_data/scripts` directory named `normalize_tables.sql`. Upon inspecting the `import_data.countries` table, we can see that there are duplicate entries, mixed upper/lower case characters, extra white space and special characters.
19 |
20 | ```sql
21 | SELECT
22 | country_name,
23 | country_code_2
24 | FROM
25 | import_data.countries
26 | LIMIT 5;
27 | ```
28 |
29 | **Results:**
30 |
31 | country_name |country_code_2|
32 | ----------------|--------------|
33 | !afg!hANistan |af? |
34 | alba$nia |aL |
35 | Alb?ania |AL |
36 | ?algeria? |d!z |
37 | americ#an samoa?|as |
38 |
39 | Before we can insert our data into our new table, we must ensure that the data has been 'cleaned'. During the insert, let's:
40 | * Remove extra spaces (whitespace).
41 | * Remove special characters.
42 | * Convert all characters to lower case.
43 |
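The cleaning pipeline can be tried in isolation on one of the dirty values shown above (a standalone query, assuming the same regular expression used in the script below):

```sql
-- Strip special characters, lowercase, then trim surrounding whitespace.
SELECT trim(lower(regexp_replace('!afg!hANistan', '[^\w\s^.]', '', 'gi')));
-- Returns: afghanistan
```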
44 | ```sql
45 | /*
46 | SQL Code Challenge
47 | Author: Jaime M. Shaker
48 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev
49 | Website: https://www.shaker.dev
50 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/
51 |
52 | File Name: normalize_tables.sql
53 | */
54 |
55 | -- We must ensure that our data is properly organized. Let's create a schema
56 | -- specifically for our transformed, clean data.
57 | CREATE SCHEMA IF NOT EXISTS cleaned_data;
58 | -- Create countries table
59 | -- Drop this table every time this script is run to ensure repeatability.
60 | -- We are adding CASCADE for when we add foreign key relationships.
61 | DROP TABLE IF EXISTS cleaned_data.countries CASCADE;
62 | -- We must presuppose that the data in its current state is unusable/unreliable.
63 | --
64 | CREATE TABLE cleaned_data.countries (
65 | -- The country_id was automatically incremented and can be entered as an integer.
66 | country_id INT NOT NULL,
67 | -- Different from other database systems, in PostgreSQL, there is no performance difference among the three character types (char, varchar and text).
68 | -- Use text to ensure that there are no errors due to string length.
69 | country_name TEXT NOT NULL,
70 | -- The data appears to have duplicate entries so we will remove them once the data has been cleaned of
71 | -- any unwanted characters.
72 | country_code_2 varchar(2) NOT NULL,
73 | country_code_3 varchar(3) NOT NULL,
74 | region TEXT,
75 | sub_region TEXT,
76 | intermediate_region TEXT,
77 | created_on date,
78 | PRIMARY KEY (country_id)
79 | );
80 |
81 | INSERT INTO cleaned_data.countries (
82 | country_id,
83 | country_name,
84 | country_code_2,
85 | country_code_3,
86 | region,
87 | sub_region,
88 | intermediate_region,
89 | created_on
90 | )
91 | (
92 | SELECT
93 | i.country_id,
94 | -- The regexp_replace() function removes any special characters using a simple regular expression.
95 | -- trim() function removes white space from either end of the string.
96 | -- lower() function converts all characters to lowercase.
97 | trim(lower(regexp_replace(i.country_name, '[^\w\s^.]', '', 'gi'))),
98 | -- Properly cast type from TEXT into new table
99 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^.]', '', 'gi')))::varchar,
100 | trim(lower(regexp_replace(i.country_code_3, '[^\w\s^.]', '', 'gi')))::varchar,
101 | trim(lower(regexp_replace(i.region, '[^\w\s^.]', '', 'gi'))),
102 | trim(lower(regexp_replace(i.sub_region, '[^\w\s^.]', '', 'gi'))),
103 | trim(lower(regexp_replace(i.intermediate_region, '[^\w\s^.]', '', 'gi'))),
104 | -- Use the built-in function current_date to insert current date into created_on field.
105 | current_date
106 | FROM
107 | import_data.countries AS i
108 | );
109 | ```
110 | Now we can run the same query in our new `cleaned_data.countries` table and see our updated results.
111 |
112 | ```sql
113 | SELECT
114 | country_name,
115 | country_code_2
116 | FROM
117 | cleaned_data.countries
118 | LIMIT 5;
119 | ```
120 |
121 | **Results:**
122 |
123 | country_name |country_code_2|
124 | --------------|--------------|
125 | afghanistan |af |
126 | albania |al |
127 | albania |al |
128 | algeria |dz |
129 | american samoa|as |
130 |
131 | The data has been 'cleaned' and is now in a more usable form. We still note that there are duplicate entries. Before removing any duplicates, let's continue cleaning the data with our `normalize_tables.sql` script and insert the data into the remaining `cleaned_data` tables.
132 |
133 | ```sql
134 | /*
135 | SQL Code Challenge
136 | Author: Jaime M. Shaker
137 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev
138 | Website: https://www.shaker.dev
139 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/
140 |
141 | File Name: normalize_tables.sql
142 | */
143 |
144 | -- We must ensure that our data is properly organized. Let's create a schema
145 | -- specifically for our transformed, clean data.
146 | CREATE SCHEMA IF NOT EXISTS cleaned_data;
147 | -- Create countries table
148 | -- Drop this table every time this script is run to ensure repeatability.
149 | -- We are adding CASCADE for when we add foreign key relationships.
150 | DROP TABLE IF EXISTS cleaned_data.countries CASCADE;
151 | -- We must presuppose that the data in its current state is unusable/unreliable.
152 | --
153 | CREATE TABLE cleaned_data.countries (
154 | -- The country_id was automatically incremented and can be entered as an integer.
155 | country_id INT NOT NULL,
156 | -- Different from other database systems, in PostgreSQL, there is no performance difference among the three character types (char, varchar and text).
157 | -- Use text to ensure that there are no errors due to string length.
158 | country_name TEXT NOT NULL,
159 | -- The data appears to have duplicate entries so we will remove them once the data has been cleaned of
160 | -- any unwanted characters.
161 | country_code_2 varchar(2) NOT NULL,
162 | country_code_3 varchar(3) NOT NULL,
163 | region TEXT,
164 | sub_region TEXT,
165 | intermediate_region TEXT,
166 | created_on date,
167 | PRIMARY KEY (country_id)
168 | );
169 |
170 | INSERT INTO cleaned_data.countries (
171 | country_id,
172 | country_name,
173 | country_code_2,
174 | country_code_3,
175 | region,
176 | sub_region,
177 | intermediate_region,
178 | created_on
179 | )
180 | (
181 | SELECT
182 | i.country_id,
183 | -- The regexp_replace() function removes any special characters using a simple regular expression.
184 | -- trim() function removes white space from either end of the string.
185 | -- lower() function converts all characters to lowercase.
186 | trim(lower(regexp_replace(i.country_name, '[^\w\s^.]', '', 'gi'))),
187 | -- Properly cast type from TEXT into new table
188 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^.]', '', 'gi')))::varchar,
189 | trim(lower(regexp_replace(i.country_code_3, '[^\w\s^.]', '', 'gi')))::varchar,
190 | trim(lower(regexp_replace(i.region, '[^\w\s^.]', '', 'gi'))),
191 | trim(lower(regexp_replace(i.sub_region, '[^\w\s^.]', '', 'gi'))),
192 | trim(lower(regexp_replace(i.intermediate_region, '[^\w\s^.]', '', 'gi'))),
193 | -- Use the built-in function current_date to insert current date into created_on field.
194 | current_date
195 | FROM
196 | import_data.countries AS i
197 | );
198 |
199 | /* Create cleaned_data.cities table */
200 | DROP TABLE IF EXISTS cleaned_data.cities;
201 | CREATE TABLE cleaned_data.cities (
202 | city_id int,
203 | city_name TEXT,
204 | latitude float,
205 | longitude float,
206 | country_code_2 varchar(2) NOT NULL,
207 | capital boolean,
208 | population int,
209 | insert_date date,
210 | PRIMARY KEY (city_id)
211 | );
212 |
213 | INSERT INTO cleaned_data.cities (
214 | city_id,
215 | city_name,
216 | latitude,
217 | longitude,
218 | country_code_2,
219 | capital,
220 | population,
221 | insert_date
222 | )
223 | (
224 | SELECT
225 | i.city_id,
226 | trim(lower(regexp_replace(i.city_name, '[^\w\s^.]', '', 'gi'))),
227 | i.latitude::float,
228 | i.longitude::float,
229 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^.]', '', 'gi')))::varchar,
230 | i.capital::boolean,
231 | i.population::int,
232 | i.insert_date::date
233 | FROM
234 | import_data.cities AS i
235 | );
236 |
237 | /* Create cleaned_data.currencies table */
238 | DROP TABLE IF EXISTS cleaned_data.currencies;
239 | CREATE TABLE cleaned_data.currencies (
240 | currency_id int,
241 | country_code_2 varchar(2) NOT NULL,
242 | currency_name TEXT,
243 | currency_code TEXT,
244 | PRIMARY KEY (currency_id)
245 | );
246 |
247 | INSERT INTO cleaned_data.currencies (
248 | currency_id,
249 | country_code_2,
250 | currency_name,
251 | currency_code
252 | )
253 | (
254 | SELECT
255 | currency_id,
256 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^.]', '', 'gi')))::varchar,
257 | trim(lower(regexp_replace(i.currency_name, '[^\w\s^.]', '', 'gi'))),
258 | trim(lower(regexp_replace(i.currency_code, '[^\w\s^.]', '', 'gi')))
259 | FROM
260 | import_data.currencies AS i
261 | );
262 |
263 | /* Create cleaned_data.languages table */
264 | DROP TABLE IF EXISTS cleaned_data.languages;
265 | CREATE TABLE cleaned_data.languages (
266 | language_id int,
267 | language TEXT,
268 | country_code_2 varchar(2) NOT NULL,
269 | PRIMARY KEY (language_id)
270 | );
271 |
272 | INSERT INTO cleaned_data.languages (
273 | language_id,
274 | language,
275 | country_code_2
276 | )
277 | (
278 | SELECT
279 | language_id,
280 | trim(lower(regexp_replace(i.language, '[^\w\s^.]', '', 'gi'))),
281 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^.]', '', 'gi')))::varchar
282 | FROM
283 | import_data.languages AS i
284 | );
285 |
286 | ```
287 |
288 | Click the link to view the complete [normalize_tables.sql](../source_data/scripts/normalize_tables.sql)
289 |
290 | Once all of your tables are loaded with the cleaned data, click the link below.
291 |
292 | Go to [WALKTHROUGH_REMOVE_DUPLICATES](WALKTHROUGH_4_REMOVE_DUPLICATES.md)
293 |
294 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
295 |
296 |
297 |
298 |
--------------------------------------------------------------------------------
/walkthrough/WALKTHROUGH_4_REMOVE_DUPLICATES.md:
--------------------------------------------------------------------------------
1 | ## Basic/Intermediate SQL Analytics Code Challenge
2 |
3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 |
8 |
9 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
10 |
11 | ### Removing Duplicates
12 |
13 | Removing duplicate entries is part of the Normalization process. I've placed this process in its own file to reduce clutter in the `normalize_tables.sql` script.
14 |
15 | Create a new file in `source_data/scripts/` and name it `remove_duplicates.sql`. In this script, we remove duplicate entries in our tables.
16 |
17 | After inspecting our data, we see that every country must have a unique `country_code_2`.
18 |
19 | Let's create a query that checks for multiple entries of `country_code_2` in our `cleaned_data.countries` table.
20 |
21 | ```sql
22 | -- Remove Duplicate Entries
23 | -- Check for duplicate entries
24 | SELECT
25 | -- Get the columns.
26 | country_code_2,
27 | -- Count how many times the country_code_2 occurs.
28 | count(*)
29 | FROM
30 | cleaned_data.countries
31 | GROUP BY
32 | -- Using an aggregate function forces us to group all identical country codes together.
33 | country_code_2
34 | HAVING
35 | -- Only select values that have a count greater than one (multiple entries).
36 | count(*) > 1;
37 | ```
38 |
39 | **Results:**
40 |
41 | country_code_2|count|
42 | --------------|-----|
43 | cn | 3|
44 | al | 2|
45 | sa | 2|
46 | ie | 2|
47 | ly | 2|
48 | mo | 2|
49 | re | 2|
50 | ph | 2|
51 | zw | 2|
52 | az | 2|
53 | kw | 2|
54 | cy | 3|
55 | rs | 2|
56 | mh | 2|
57 |
58 | According to our results, there are 14 `country_code_2` values with multiple entries. Let's create another query that deletes all of the duplicate entries.
59 |
60 | ```sql
61 | -- Delete duplicate entries
62 | DELETE
63 | FROM
64 | -- Add an alias to the id's we wish to keep
65 | cleaned_data.countries AS clean_data_1
66 | USING
67 | -- Add an alias to the duplicate id's
68 | cleaned_data.countries AS clean_data_2
69 | WHERE
70 | -- This statement will remove the greater value id's.
71 | clean_data_1.country_id > clean_data_2.country_id
72 | AND
73 | -- This statement ensures that both values are identical.
74 | clean_data_1.country_code_2 = clean_data_2.country_code_2;
75 | ```
76 |
77 | Now, if we were to rerun our first query which checked for duplicate entries, it should not return anything.
78 |
79 | ```sql
80 | -- Remove Duplicate Entries
81 | -- Check for duplicate entries
82 | SELECT
83 | -- Get the columns.
84 | country_code_2,
85 | -- Count how many times the country_code_2 occurs.
86 | count(*)
87 | FROM
88 | cleaned_data.countries
89 | GROUP BY
90 | -- Using an aggregate function forces us to group all like country codes together.
91 | country_code_2
92 | HAVING
93 | -- Only select values that have a count greater than one (multiple entries).
94 | count(*) > 1;
95 | ```
96 | **Results:**
97 |
98 | country_code_2|count|
99 | --------------|-----|
100 |
101 | There are no results. We have successfully deleted the duplicate entries in the `cleaned_data.countries` table. We can also take a peek at the `cleaned_data.countries` table and check that the duplicate entries were removed. Initially, we saw that `Albania` had multiple entries.
102 |
103 | ```sql
104 | SELECT
105 | country_name,
106 | country_code_2
107 | FROM
108 | cleaned_data.countries
109 | LIMIT 5;
110 | ```
111 |
112 | **Results:**
113 | country_name |country_code_2|
114 | --------------|--------------|
115 | afghanistan |af |
116 | albania |al |
117 | algeria |dz |
118 | american samoa|as |
119 | andorra |ad |
120 |
121 | The multiple entries for `Albania` have been deleted.
122 |
123 | This query works for the `cleaned_data.countries` table because every country has one unique `country_code_2`. The problem with using this method on the other tables is that many different countries share city names, currency types and languages. The other tables would have to search for duplicates using a combination of fields. To reduce clutter, the only table that has duplicate entries is the `cleaned_data.countries` table.
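
If you ever did need to de-duplicate one of the other tables, the same `DELETE ... USING` pattern extends to a combination of fields. As a sketch only (the challenge data does not require this; `cleaned_data.languages` is used here purely for illustration), it could look like:

```sql
-- Sketch only: de-duplicate on a combination of fields.
-- Rows count as duplicates only when BOTH the language AND the
-- country code match; the row with the lower language_id is kept.
DELETE
FROM
	cleaned_data.languages AS keep_rows
USING
	cleaned_data.languages AS dup_rows
WHERE
	keep_rows.language_id > dup_rows.language_id
AND
	keep_rows.language = dup_rows.language
AND
	keep_rows.country_code_2 = dup_rows.country_code_2;
```

Each extra field in the `WHERE` clause narrows what counts as a duplicate, so the same language spoken in two different countries would not be treated as one.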
124 |
125 | If for whatever reason you need to start over, you can easily rerun the `build_tables.sql` and then the `normalize_tables.sql` scripts and regenerate the data consistently.
126 |
127 | The complete `remove_duplicates.sql` script should look like...
128 |
129 | ```sql
130 | /*
131 | SQL Code Challenge
132 | Author: Jaime M. Shaker
133 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev
134 | Website: https://www.shaker.dev
135 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/
136 |
137 | File Name: remove_duplicates.sql
138 | */
139 |
140 | -- Remove Duplicate Entries
141 | -- Check for duplicate entries in clean_data.countries
142 | SELECT
143 | -- Get the columns.
144 | country_code_2,
145 | -- Count how many times the country_code_2 occurs.
146 | count(*)
147 | FROM
148 | cleaned_data.countries
149 | GROUP BY
150 | -- Using an aggregate function forces us to group all like country codes together.
151 | country_code_2
152 | HAVING
153 | -- Only select values that have a count greater than one (multiple entries).
154 | count(*) > 1;
155 |
156 | -- Delete duplicate entries
157 |
158 | DELETE
159 | FROM
160 | -- Add an alias to the id's we wish to keep
161 | cleaned_data.countries AS clean_data_1
162 | USING
163 | -- Add an alias to the duplicate id's
164 | cleaned_data.countries AS clean_data_2
165 | WHERE
166 | -- This statement will remove the greater value id's.
167 | clean_data_1.country_id > clean_data_2.country_id
168 | AND
169 | -- This statement ensures that both values are identical.
170 | clean_data_1.country_code_2 = clean_data_2.country_code_2;
171 | ```
172 |
173 | With our table data cleaned and free of duplicate entries, we can clean up our database before progressing.
174 |
175 | Go to [WALKTHROUGH_CLEANUP](WALKTHROUGH_5_CLEANUP.md)
176 |
177 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
178 |
179 |
180 |
181 |
--------------------------------------------------------------------------------
/walkthrough/WALKTHROUGH_5_CLEANUP.md:
--------------------------------------------------------------------------------
1 | ## Basic/Intermediate SQL Analytics Code Challenge
2 |
3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 |
8 |
9 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
10 |
11 | ### Clean Up
12 |
13 | Now that our tables have been cleaned, we can actually begin answering the questions of the `SQL Coding Challenge`. Before we do that, let's do a little clean up and `DROP` our import schema and the tables associated with it, as they are no longer needed.
14 |
15 | If for whatever reason you need to start over, you can easily rerun the `build_tables.sql` and then the `normalize_tables.sql` scripts and regenerate the data consistently.
16 |
17 | `DROP TABLE` removes tables from the database. To empty a table of rows without destroying the table, use `DELETE` or `TRUNCATE`.
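
A quick illustration of the difference (these statements are for illustration only; only the `DROP` statements in the script below are part of this walkthrough):

```sql
DELETE FROM import_data.countries;    -- Removes every row; the empty table still exists.
TRUNCATE TABLE import_data.countries; -- Removes every row quickly; the empty table still exists.
DROP TABLE import_data.countries;     -- Removes the rows AND the table definition itself.
```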
18 |
19 | The complete `cleanup_db.sql` script should look like...
20 |
21 | ```sql
22 | /*
23 | SQL Code Challenge
24 | Author: Jaime M. Shaker
25 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev
26 | Website: https://www.shaker.dev
27 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/
28 |
29 | File Name: cleanup_db.sql
30 | */
31 |
32 | -- Drop All import_data Tables and Schema
33 |
34 | DROP TABLE import_data.countries;
35 | DROP TABLE import_data.cities;
36 | DROP TABLE import_data.currencies;
37 | DROP TABLE import_data.languages;
38 | DROP SCHEMA import_data CASCADE;
39 | ```
40 |
41 | With our database cleaned of any unnecessary tables and schema, we can now add `Foreign Key` relationships.
42 |
43 | Go to [WALKTHROUGH_RELATIONSHIPS](WALKTHROUGH_6_RELATIONSHIPS.md)
44 |
45 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
46 |
47 |
48 |
49 |
--------------------------------------------------------------------------------
/walkthrough/WALKTHROUGH_6_RELATIONSHIPS.md:
--------------------------------------------------------------------------------
1 | ## Basic/Intermediate SQL Code Challenge
2 |
3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 |
8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
9 |
10 | ### Foreign Keys
11 |
12 | All of our tables in the `cleaned_data` schema share a common field named `country_code_2`. We also know that this field is `UNIQUE` in the `cleaned_data.countries` table because there can only be one country code per country.
13 |
14 | Using this information, we can make the `cleaned_data.countries` table the parent table and all other tables can have a [One-to-One](https://fmhelp.filemaker.com/help/18/fmp/en/index.html#page/FMP_Help/one-to-one-relationships.html) or a [One-to-Many](https://help.claris.com/archive/help/18/fmp/en/index.html#page/FMP_Help%2Fone-to-many-relationships.html%23) `FOREIGN KEY` relationship to it.
15 |
16 | First, create a new script and name it `create_relationships.sql`. Create a query to `ALTER` the `cleaned_data.countries` table and add the `UNIQUE` constraint to the `country_code_2` field. Then we can `ALTER` the child tables and add a `FOREIGN KEY` constraint.
17 |
18 | ```sql
19 | /*
20 | SQL Code Challenge
21 | Author: Jaime M. Shaker
22 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev
23 | Website: https://www.shaker.dev
24 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/
25 |
26 | File Name: create_relationships.sql
27 | */
28 |
29 | -- Alter cleaned_data.countries and add the UNIQUE constraint to country_code_2
30 |
31 | ALTER TABLE
32 | cleaned_data.countries
33 | ADD CONSTRAINT
34 | unique_country_code_2
35 | UNIQUE (country_code_2);
36 |
37 | -- Alter all other tables and add a foreign key constraint and reference.
38 | -- Create Foreign Key relationship for cleaned_data.cities
39 | ALTER TABLE
40 | -- Table to be altered
41 | cleaned_data.cities
42 | ADD CONSTRAINT
43 | -- Give this constraint a name
44 | fk_country_city
45 | -- Which key in cleaned_data.cities is a foreign key
46 | FOREIGN KEY (country_code_2)
47 | -- Which key to reference from parent table
48 | REFERENCES cleaned_data.countries (country_code_2);
49 |
50 | -- Create Foreign Key relationship for cleaned_data.currencies
51 | ALTER TABLE
52 | cleaned_data.currencies
53 | ADD CONSTRAINT
54 | fk_country_currencies
55 | FOREIGN KEY (country_code_2)
56 | REFERENCES cleaned_data.countries (country_code_2);
57 |
58 | -- Create Foreign Key relationship for cleaned_data.languages
59 | ALTER TABLE
60 | cleaned_data.languages
61 | ADD CONSTRAINT
62 | fk_country_languages
63 | FOREIGN KEY (country_code_2)
64 | REFERENCES cleaned_data.countries (country_code_2);
65 |
66 | ```
67 |
68 | Once you execute the complete script and have created a `FOREIGN KEY` constraint for all of the other tables, your `Entity Relationship Diagram` should look something like this.
69 |
70 | 
71 |
72 | :warning:**Bonus**:warning:
73 | * Automate this mini ETL process using a [Bash shell script](WALKTHROUGH_BONUS_BASH.md)!
74 |
75 | We are now ready to complete the `SQL Coding Challenge`.
76 |
77 | Click the link below.
78 |
79 | Go to [WALKTHROUGH_CODE_CHALLENGE](WALKTHROUGH_7_CODE_CHALLENGE.md)
80 |
81 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
82 |
83 |
84 |
85 |
--------------------------------------------------------------------------------
/walkthrough/WALKTHROUGH_7_CODE_CHALLENGE.md:
--------------------------------------------------------------------------------
1 | ## Basic/Intermediate SQL Code Challenge
2 |
3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 |
8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
9 |
10 | ### SQL Code Challenge
11 |
12 | :warning: To start over with an empty database, run the `drop_all_tables.sql` script. :warning:
13 |
14 | 1. Create Database, Tables and Relations.
15 | Using the CSV files located in `source_data/csv_data`, create your new SQL database and tables with the properly formatted data.
16 |
17 | * Add a numeric, auto-incrementing Primary Key to every table.
18 | * In the `countries` table, add the column `created_on` with the current date.
19 | * Create a one-to-one and one-to-many relationship with the countries table as the parent table.
20 |
21 | :exclamation: This question has already been completed with the ETL process. :exclamation:
22 |
23 | 2. List Regions and Country Count
24 | List all of the regions and the total number of countries in each region. Order by country count in descending order and capitalize the region name.
25 |
26 |
27 | Click to expand expected results!
28 |
29 | ##### Expected Results:
30 |
31 | region |country_count|
32 | ---------|-------------|
33 | Africa | 59|
34 | Americas | 57|
35 | Asia | 50|
36 | Europe | 48|
37 | Oceania | 26|
38 | Antartica| 1|
39 |
40 |
41 |
42 |
43 |
44 | Click to expand answer!
45 |
46 | ##### Answer
47 | ```sql
48 | SELECT
49 | -- initcap() capitalizes the first letter of every word in a string.
50 | initcap(region) AS region,
51 | count(*) AS country_count
52 | FROM
53 | cleaned_data.countries
54 | GROUP BY
55 | -- Aggregate functions like count() require you to group by all non-aggregated columns.
56 | region
57 | ORDER BY
58 | country_count DESC;
59 | ```
60 |
61 |
62 |
63 | 3. List Sub-Regions and City Count
64 | List all of the sub-regions and the total number of cities in each sub-region. Order by sub-region name alphabetically.
65 |
66 |
67 | Click to expand expected results!
68 |
69 | ##### Expected Results:
70 |
71 | sub_region |city_count|
72 | -------------------------------|----------|
73 | Australia And New Zealand | 329|
74 | Central Asia | 560|
75 | Eastern Asia | 3164|
76 | Eastern Europe | 2959|
77 | Latin America And The Caribbean| 7204|
78 | Melanesia | 60|
79 | Micronesia | 15|
80 | Northern Africa | 1152|
81 | Northern America | 5844|
82 | Northern Europe | 2025|
83 | Polynesia | 22|
84 | Southeastern Asia | 2627|
85 | Southern Asia | 6848|
86 | Southern Europe | 3238|
87 | Subsaharan Africa | 3223|
88 | Western Asia | 1400|
89 | Western Europe | 3952|
90 |
91 |
92 |
93 |
94 |
95 | Click to expand answer!
96 |
97 | ##### Answer
98 | ```sql
99 | SELECT
100 | initcap(t1.sub_region) AS sub_region,
101 | count(*) AS city_count
102 | FROM
103 | cleaned_data.countries AS t1
104 | JOIN
105 | cleaned_data.cities AS t2
106 | ON
107 | t1.country_code_2 = t2.country_code_2
108 | GROUP BY
109 | t1.sub_region
110 | ORDER BY
111 | t1.sub_region;
112 | ```
113 |
114 |
115 |
116 | 4. Specific Sub-Region and String Functions
117 | List all of the countries and the total number of cities in the Northern Europe sub-region. List the country names in uppercase and order the list by the length of the country name and alphabetically in ascending order.
118 |
119 |
120 | Click to expand expected results!
121 |
122 | ##### Expected Results:
123 |
124 | country_name |city_count|
125 | --------------|----------|
126 | JERSEY | 1|
127 | LATVIA | 39|
128 | NORWAY | 127|
129 | SWEDEN | 148|
130 | DENMARK | 75|
131 | ESTONIA | 20|
132 | FINLAND | 142|
133 | ICELAND | 12|
134 | IRELAND | 64|
135 | LITHUANIA | 61|
136 | ISLE OF MAN | 2|
137 | FAROE ISLANDS | 29|
138 | UNITED KINGDOM| 1305|
139 |
140 |
141 |
142 |
143 |
144 | Click to expand answer!
145 |
146 | ##### Answer
147 | ```sql
148 | SELECT
149 | -- upper() returns your string in uppercase.
150 | upper(t1.country_name) AS country_name,
151 | count(*) AS city_count
152 | FROM
153 | cleaned_data.countries AS t1
154 | JOIN
155 | cleaned_data.cities AS t2
156 | ON
157 | t1.country_code_2 = t2.country_code_2
158 | WHERE
159 | t1.sub_region = 'northern europe'
160 | GROUP BY
161 | t1.country_name
162 | ORDER BY
163 | -- length() returns the number of characters in a string, including spaces.
164 | length(t1.country_name), t1.country_name;
165 | ```
166 |
167 |
168 |
169 |
170 | 5. List Specific Countries by Year
171 | List all of the countries and the total number of cities in the Southern Europe sub-region that were inserted in 2021. Capitalize the country names and order alphabetically by the **LAST** letter of the country name and the number of cities.
172 |
173 |
174 | Click to expand expected results!
175 |
176 | ##### Expected Results:
177 |
178 | country_name |city_count|
179 | ----------------------|----------|
180 | Andorra | 5|
181 | Albania | 11|
182 | Bosnia And Herzegovina| 15|
183 | Croatia | 22|
184 | North Macedonia | 28|
185 | Malta | 32|
186 | Serbia | 58|
187 | Slovenia | 74|
188 | Greece | 64|
189 | Portugal | 109|
190 | Spain | 302|
191 | San Marino | 2|
192 | Montenegro | 12|
193 | Italy | 542|
194 |
195 |
196 |
197 |
198 |
199 | Click to expand answer!
200 |
201 | ##### Answer
202 | ```sql
203 | SELECT
204 | initcap(t1.country_name) AS country_name,
205 | count(*) AS city_count
206 | FROM
207 | cleaned_data.countries AS t1
208 | JOIN
209 | cleaned_data.cities AS t2
210 | ON
211 | t1.country_code_2 = t2.country_code_2
212 | WHERE
213 | t1.sub_region = 'southern europe'
214 | AND
215 | -- The extract() & date_part() functions allow you to break dates and timestamps down into individual years, months, days, hours...
216 | EXTRACT('year' FROM t2.insert_date) = 2021
217 | GROUP BY
218 | t1.country_name
219 | ORDER BY
220 | -- substring() function returns a specific portion of a string.
221 | substring(t1.country_name,length(t1.country_name),1), city_count;
222 | ```
223 |
224 |
225 |
226 | 6. List Anti-Join
227 | List all of the countries in the region of Asia that did **NOT** have a city with an inserted date from June 2021 through Sept 2021.
228 |
229 |
230 | Click to expand expected results!
231 |
232 | ##### Expected Results:
233 |
234 | country_name |
235 | -----------------|
236 | Brunei Darussalam|
237 | Kuwait |
238 | Macao |
239 | Singapore |
240 |
241 |
242 |
243 |
244 |
245 | Click to expand answer!
246 |
247 | ##### Answer
248 | ```sql
249 | SELECT
250 | -- Distinct will only return unique values
251 | DISTINCT initcap(t1.country_name) AS country_name
252 | FROM
253 | cleaned_data.countries AS t1
254 | -- A left join returns all rows from the left table (cleaned_data.countries) and
255 | -- only the matching rows from the right table (cleaned_data.cities), resulting in NULLs
256 | -- where there is no match in the right table.
257 | LEFT JOIN
258 | cleaned_data.cities AS t2
259 | ON
260 | t1.country_code_2 = t2.country_code_2
261 | AND
262 | t2.insert_date BETWEEN '2021-06-01' AND '2021-10-01'
263 | WHERE
264 | t1.region = 'asia'
265 | -- Only return values that did NOT have a match with the countries table.
266 | AND
267 | t2.country_code_2 IS NULL;
268 | ```
269 |
270 |
271 |
272 | 7. Reversible Names
273 | List the country, city name, population and city name length for the city names that are [palindromes](https://en.wikipedia.org/wiki/Palindrome) in the Western Asia sub-region. Format the population with a thousands separator (1,000) and format the length of the city name in roman numerals. Order by the length of the city name in descending order and alphabetically in ascending order.
274 |
275 |
276 | Click to expand expected results!
277 |
278 | ##### Expected Results:
279 |
280 | country_name |city_name|population|roman_numeral_length|
281 | --------------------|---------|----------|--------------------|
282 | Yemen |Hajjah | 46,568 | VI |
283 | Syrian Arab Republic|Hamah | 696,863 | V |
284 | Turkey |Kavak | 21,692 | V |
285 | Turkey |Kinik | 29,803 | V |
286 | Turkey |Tut | 10,161 | III |
287 |
288 |
289 |
290 |
291 |
292 | Click to expand answer!
293 |
294 | ##### Answer
295 | ```sql
296 | SELECT
297 | initcap(t1.country_name) AS country_name,
298 | initcap(t2.city_name) AS city_name,
299 | -- to_char() takes non-string data types and returns them as strings.
300 | to_char(t2.population, '999G999') AS population,
301 | to_char(length(t2.city_name), 'RN') AS roman_numeral_length
302 | FROM
303 | cleaned_data.countries AS t1
304 | JOIN
305 | cleaned_data.cities AS t2
306 | ON
307 | t1.country_code_2 = t2.country_code_2
308 | WHERE
309 | t2.city_name = reverse(t2.city_name)
310 | AND
311 | t1.sub_region = 'western asia'
312 | ORDER BY
313 | length(t2.city_name) DESC, t2.city_name ASC;
314 | ```
315 |
316 |
317 |
318 | 8. Search with Wildcard and Case
319 | List all of the countries that end in 'stan'. Make your query case-insensitive and list whether the total population of the cities listed is an odd or even number for cities inserted in 2022. Order by whether the population value is odd or even in ascending order and country name in alphabetical order.
320 |
321 |
322 | Click to expand expected results!
323 |
324 | ##### Expected Results:
325 |
326 | country_name|total_population|odd_or_even|
327 | ------------|----------------|-----------|
328 | Afghanistan | 6,006,530 |Even |
329 | Kazakhstan | 4,298,264 |Even |
330 | Kyrgyzstan | 1,017,644 |Even |
331 | Pakistan | 26,344,480 |Even |
332 | Tajikistan | 2,720,953 |Odd |
333 | Turkmenistan| 419,607 |Odd |
334 | Uzbekistan | 3,035,547 |Odd |
335 |
336 |
337 |
338 |
339 |
340 | Click to expand answer!
341 |
342 | ##### Answer
343 | ```sql
344 | SELECT
345 | initcap(t1.country_name) AS country_name,
346 | to_char(sum(t2.population), '99G999G999') total_population,
347 | CASE
348 | WHEN (sum(t2.population) % 2) = 0
349 | THEN 'Even'
350 | ELSE
351 | 'Odd'
352 | END AS odd_or_even
353 | FROM
354 | cleaned_data.countries AS t1
355 | JOIN
356 | cleaned_data.cities AS t2
357 | ON
358 | t1.country_code_2 = t2.country_code_2
359 | WHERE
360 | t1.country_name ILIKE '%stan'
361 | AND
362 | EXTRACT('year' FROM t2.insert_date) = 2022
363 | GROUP BY
364 | t1.country_name
365 | ORDER BY
366 | odd_or_even, country_name;
367 | ```
368 |
369 |
370 |
371 | 9. Ranking Regions
372 | List the third most populated city ranked by region WITHOUT using limit or offset. List the region name, city name, population and order the results by region.
373 |
374 |
375 | Click to expand expected results!
376 |
377 | ##### Expected Results:
378 |
379 | region |city_name|third_largest_pop|
380 | --------|---------|-----------------|
381 | Africa |Kinshasa | 12,836,000 |
382 | Americas|New York | 18,972,871 |
383 | Asia |Delhi | 32,226,000 |
384 | Europe |Paris | 11,060,000 |
385 | Oceania |Brisbane | 2,360,241 |
386 |
387 |
388 |
389 |
390 |
391 | Click to expand answer!
392 |
393 | ##### Answer
394 | ```sql
395 | WITH get_city_rank_cte AS (
396 | SELECT
397 | t1.region,
398 | t2.city_name,
399 | t2.population AS third_largest_pop,
400 | DENSE_RANK() OVER (PARTITION BY t1.region ORDER BY t2.population DESC) AS rnk
401 | FROM
402 | cleaned_data.countries AS t1
403 | JOIN
404 | cleaned_data.cities AS t2
405 | ON
406 | t1.country_code_2 = t2.country_code_2
407 | WHERE
408 | t2.population IS NOT NULL
409 | GROUP BY
410 | t1.region,
411 | t2.city_name,
412 | t2.population
413 | )
414 | SELECT
415 | initcap(region) AS region,
416 | initcap(city_name) AS city_name,
417 | to_char(third_largest_pop, '99G999G999') AS third_largest_pop
418 | FROM
419 | get_city_rank_cte
420 | WHERE
421 | rnk = 3;
422 | ```
423 |
424 |
425 |
426 | 10. Using Buckets
427 | List the bottom third of all countries in the Western Asia sub-region that speak Arabic. Include the row number and country name. Order by row number.
428 |
429 |
430 | Click to expand expected results!
431 |
432 | ##### Expected Results:
433 |
434 | row_id|country_name |
435 | ------|--------------------|
436 | 9|saudi arabia |
437 | 10|syrian arab republic|
438 | 11|united arab emirates|
439 | 12|yemen |
440 |
441 |
442 |
443 |
444 |
445 | Click to expand answer!
446 |
447 | ##### Answer
448 | ```sql
449 | WITH get_ntile_cte AS (
450 | SELECT
451 | ROW_NUMBER() OVER (ORDER BY t1.country_name) AS rn,
452 | t1.country_name,
453 | -- The ntile() window function divides the ordered rows into a specified number of groups, or 'buckets'.
454 | NTILE(3) OVER (ORDER BY t1.country_name) AS nt
455 | FROM
456 | cleaned_data.countries AS t1
457 | JOIN
458 | cleaned_data.languages AS t2
459 | ON
460 | t1.country_code_2 = t2.country_code_2
461 | WHERE
462 | t1.sub_region = 'western asia'
463 | AND
464 | t2.language = 'arabic'
465 | )
466 | SELECT
467 | rn AS row_id,
468 | country_name
469 | FROM
470 | get_ntile_cte
471 | WHERE
472 | nt = 3;
473 | ```
474 |
475 |
476 |
477 | 11. Using Arrays
478 | Create a query that lists country name, capital name, population, languages spoken and currency name for countries in the Northern Africa sub-region. There can be multiple currency names and languages spoken per country. Add multiple values for the same field into an array.
479 |
480 |
481 | Click to expand expected results!
482 |
483 | ##### Expected Results:
484 |
485 | country_name|city_name|population|languages |currencies |
486 | ------------|---------|----------|--------------------------------------------|---------------|
487 | algeria |algiers | 3415811|{french,arabic,kabyle} |algerian dinar |
488 | egypt |cairo | 20296000|{arabic} |egyptian pound |
489 | libya |tripoli | 1293016|{arabic} |libyan dinar |
490 | morocco |rabat | 572717|{arabic,tachelhit,moroccan tamazight,french}|moroccan dirham|
491 | sudan |khartoum | 7869000|{arabic,english} |sudanese pound |
492 | tunisia |tunis | 1056247|{french,arabic} |tunisian dinar |
493 |
494 |
495 |
496 |
497 |
498 | Click to expand answer!
499 |
500 | ##### Answer
501 | ```sql
502 | WITH get_row_values AS (
503 | SELECT
504 | t1.country_name,
505 | t2.city_name,
506 | t2.population,
507 | -- array_agg() aggregates multiple values and returns them in 'array' format.
508 | array_agg(t3.language) AS languages,
509 | t4.currency_name AS currencies
510 | FROM
511 | cleaned_data.countries AS t1
512 | JOIN
513 | cleaned_data.cities AS t2
514 | ON
515 | t1.country_code_2 = t2.country_code_2
516 | JOIN
517 | cleaned_data.languages AS t3
518 | ON
519 | t1.country_code_2 = t3.country_code_2
520 | JOIN
521 | cleaned_data.currencies AS t4
522 | ON
523 | t1.country_code_2 = t4.country_code_2
524 | WHERE
525 | t1.sub_region = 'northern africa'
526 | AND
527 | t2.capital = TRUE
528 | GROUP BY
529 | t1.country_name,
530 | t2.city_name,
531 | t2.population,
532 | t4.currency_name
533 | )
534 | SELECT
535 | *
536 | FROM
537 | get_row_values;
538 | ```
539 |
540 |
541 |
542 | 12. Using Case and Percentages
543 | Produce a query that returns the city names for cities in the U.S. that were inserted on April 28th, 2022. List how many vowels and consonants are present in the city name and concatenate their percentage to their respective count in parentheses.
544 |
545 |
546 | Click to expand expected results!
547 |
548 | ##### Expected Results:
549 |
550 | city_name |vowel_count_perc|consonants_count_perc|
551 | ---------------|----------------|---------------------|
552 | standish |2 (25.00%) |6 (75%) |
553 | grand forks |2 (18.18%) |9 (81.82%) |
554 | camano |3 (50.00%) |3 (50%) |
555 | cedar hills |3 (27.27%) |8 (72.73%) |
556 | gladstone |3 (33.33%) |6 (66.67%) |
557 | whitehall |3 (33.33%) |6 (66.67%) |
558 | homewood |4 (50.00%) |4 (50%) |
559 | willowbrook |4 (36.36%) |7 (63.64%) |
560 | port salerno |4 (33.33%) |8 (66.67%) |
561 | vadnais heights|5 (33.33%) |10 (66.67%) |
562 | jeffersonville |5 (35.71%) |9 (64.29%) |
563 |
564 |
565 |
566 |
567 |
568 | Click to expand answer!
569 |
570 | ##### Answer
571 | ```sql
572 | WITH get_letter_count AS (
573 | SELECT
574 | t1.city_name,
575 | length(t1.city_name) string_length,
576 | -- regexp_replace() returns a value that has been manipulated by a regular expression.
577 | length(regexp_replace(t1.city_name, '[aeiou]', '', 'gi')) AS consonant_count
578 | FROM
579 | cleaned_data.cities AS t1
580 | WHERE
581 | t1.insert_date = '2022-04-28'
582 | AND
583 | t1.country_code_2 in ('us')
584 | ),
585 | get_letter_diff AS (
586 | SELECT
587 | city_name,
588 | string_length - consonant_count AS vowels,
589 | round(100 * (string_length - consonant_count) / string_length::NUMERIC, 2) AS vowel_perc,
590 | consonant_count AS consonants,
591 | round( 100 * (consonant_count)::NUMERIC / string_length, 2)::float AS consonants_perc
592 | FROM
593 | get_letter_count
594 | )
595 | SELECT
596 | city_name,
597 | vowels || ' (' || vowel_perc || '%)' AS vowel_count_perc,
598 | consonants || ' (' || consonants_perc || '%)' AS consonants_count_perc
599 | FROM
600 | get_letter_diff
601 | ORDER BY
602 | vowels;
603 | ```
604 |
605 |
606 |
607 | 13. Most Consecutive Days
608 | List the most consecutive inserted dates and the capitalized city names for cities in Canada that were inserted in April 2022.
609 |
610 |
611 | Click to expand expected results!
612 |
613 | ##### Expected Results:
614 |
615 | most_consecutive_dates|city_name |
616 | ----------------------|------------|
617 | 2022-04-22|South Dundas|
618 | 2022-04-23|La Prairie |
619 | 2022-04-24|Elliot Lake |
620 | 2022-04-25|Lachute |
621 |
622 |
623 |
624 |
625 |
626 | Click to expand answer!
627 |
628 | ##### Answer
629 | ```sql
630 | DROP TABLE IF EXISTS get_dates;
631 | CREATE TEMP TABLE get_dates AS (
632 | SELECT
633 | DISTINCT ON (insert_date) insert_date AS insert_date,
634 | city_name
635 | FROM
636 | cleaned_data.cities
637 | WHERE
638 | country_code_2 = 'ca'
639 | AND
640 | insert_date BETWEEN '2022-04-01' AND '2022-04-30'
641 | ORDER BY
642 | insert_date
643 | );
644 |
645 | DROP TABLE IF EXISTS get_diff;
646 | CREATE TEMP TABLE get_diff AS (
647 | SELECT
648 | city_name,
649 | insert_date,
650 | EXTRACT('day' FROM insert_date) - ROW_NUMBER() OVER (ORDER BY insert_date) AS diff
651 | FROM
652 | get_dates
653 | );
654 |
655 |
656 | DROP TABLE IF EXISTS get_diff_count;
657 | CREATE TEMP TABLE get_diff_count AS (
658 | SELECT
659 | city_name,
660 | insert_date,
661 | count(*) OVER (PARTITION BY diff) AS diff_count
662 | FROM
663 | get_diff
664 | );
665 |
666 |
667 | WITH get_rank AS (
668 | SELECT
669 | DENSE_RANK() OVER (ORDER BY diff_count desc) AS rnk,
670 | insert_date,
671 | city_name
672 | FROM
673 | get_diff_count
674 | )
675 | SELECT
676 | insert_date AS most_consecutive_dates,
677 | initcap(city_name) AS city_name
678 | FROM
679 | get_rank
680 | WHERE
681 | rnk = 1
682 | ORDER BY
683 | insert_date;
684 | ```
685 |
686 |
687 |
688 | 14. Month over Month in View
689 | Create a view that lists the month-year, the number of cities inserted for that month, a running total of the city count, and the month-over-month percentage growth of the running total for 2021.
690 |
691 | Format the cities count and the running total with a thousands separator, and format the month-over-month growth with a plus sign and a percent sign.
692 |
693 | Example:
694 | month_year|cities_inserted|running_total|month_over_month|
695 | ----------|---------------|-------------|----------------|
696 | Feb-2021 | 1,291 | 2,762 |+87.76% |
697 |
698 |
699 | Click to expand expected results!
700 |
701 | ##### Expected Results:
702 |
703 | month_year|cities_inserted|running_total|month_over_month|
704 | ----------|---------------|-------------|----------------|
705 | Jan-2021 | 1,471 | 1,471 | |
706 | Feb-2021 | 1,291 | 2,762 |+87.76% |
707 | Mar-2021 | 1,485 | 4,247 |+53.77% |
708 | Apr-2021 | 1,508 | 5,755 |+35.51% |
709 | May-2021 | 1,509 | 7,264 |+26.22% |
710 | Jun-2021 | 1,395 | 8,659 |+19.20% |
711 | Jul-2021 | 1,394 | 10,053 |+16.10% |
712 | Aug-2021 | 1,481 | 11,534 |+14.73% |
713 | Sep-2021 | 1,452 | 12,986 |+12.59% |
714 | Oct-2021 | 1,446 | 14,432 |+11.14% |
715 | Nov-2021 | 1,378 | 15,810 |+9.55% |
716 | Dec-2021 | 1,472 | 17,282 |+9.31% |
717 |
718 |
719 |
720 |
721 |
722 | Click to expand answer!
723 |
724 | ##### Answer
725 | ```sql
726 | DROP VIEW IF EXISTS cleaned_data.view_2021_growth;
727 | CREATE VIEW cleaned_data.view_2021_growth AS (
728 | WITH get_month_count AS (
729 | SELECT
730 | date_trunc('month', insert_date) as single_month,
731 | count(*) AS monthly_count
732 | FROM
733 | cleaned_data.cities
734 | WHERE
735 | EXTRACT('year' FROM insert_date) = 2021
736 | GROUP BY
737 | single_month
738 | ORDER BY
739 | single_month
740 | ),
741 | get_running_total AS (
742 | SELECT
743 | single_month::date,
744 | monthly_count,
745 | sum(monthly_count) OVER (ORDER BY single_month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_num_cities
746 | FROM
747 | get_month_count
748 | ),
749 | get_month_over_month AS (
750 | SELECT
751 | single_month,
752 | monthly_count,
753 | total_num_cities,
754 | round(100.0 * ((total_num_cities - Lag(total_num_cities, 1) OVER (ORDER BY single_month)) / Lag(total_num_cities, 1) OVER (ORDER BY single_month))::NUMERIC, 2) AS month_over_month
755 | FROM
756 | get_running_total
757 | )
758 | SELECT
759 | to_char(single_month, 'Mon-YYYY') AS month_year,
760 | to_char(monthly_count, '9G999') AS cities_inserted,
761 | to_char(total_num_cities, '99G999') AS running_total,
762 | to_char(month_over_month, 'sg99.99') || '%' AS month_over_month
763 | FROM
764 | get_month_over_month
765 | );
766 |
767 | SELECT
768 | *
769 | FROM
770 | cleaned_data.view_2021_growth;
771 | ```
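A note on the `to_char()` numeric templates used above: `9` marks a digit position, `G` inserts the locale-aware group (thousands) separator, and `SG` forces an explicit plus or minus sign. A standalone sketch (the output comments assume a locale whose group separator is a comma):

```sql
SELECT
	to_char(1291, '9G999')    AS cities_inserted,  -- ' 1,291' (a leading space reserves the sign column)
	to_char(87.76, 'SG99.99') AS month_over_month; -- '+87.76'
```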
772 |
773 |
774 |
775 | 15. Stored Procedure to CSV
776 | Create and call a stored procedure that lists a unique row id number, insert date, country name, city name, population and languages spoken for countries in the Latin America and Caribbean sub-region that were inserted on either '2022-04-09', '2022-04-28' or '2022-08-11'.
777 |
778 | Order by the insert date and output the results (including headers) to a CSV file located in [/source_data/csv_output/](../source_data/csv_output/) .
779 |
780 |
781 | Click to expand expected results!
782 |
783 | ##### Expected Results:
784 |
785 | Results located in [/source_data/csv_output/output.csv](../source_data/csv_output/output.csv)
786 |
787 |
788 |
789 |
790 |
791 | Click to expand answer!
792 |
793 | ##### Answer
794 | ```sql
795 | CREATE OR REPLACE PROCEDURE cleaned_data.sproc_output ()
796 | LANGUAGE plpgsql
797 | AS
798 | $sproc$
799 | BEGIN
800 | COPY (
801 | SELECT
802 | ROW_NUMBER() OVER (ORDER BY t1.insert_date) AS row_id,
803 | t1.insert_date,
804 | t2.country_name,
805 | t1.city_name,
806 | t1.population,
807 | array_agg(t3.language) AS languages
808 | FROM
809 | cleaned_data.cities AS t1
810 | JOIN
811 | cleaned_data.countries AS t2
812 | ON
813 | t1.country_code_2 = t2.country_code_2
814 | LEFT JOIN
815 | cleaned_data.languages AS t3
816 | ON
817 | t2.country_code_2 = t3.country_code_2
818 | WHERE
819 | t2.sub_region = 'latin america and the caribbean'
820 | AND
821 | t1.insert_date IN ('2022-04-09', '2022-04-28', '2022-08-11')
822 | GROUP BY
823 | t1.insert_date,
824 | t2.country_name,
825 | t1.city_name,
826 | t1.population
827 | ORDER BY
828 | t1.insert_date
829 | )
830 | TO '/var/lib/postgresql/source_data/csv_output/output.csv' DELIMITER ',' CSV HEADER;
831 | END
832 | $sproc$;
833 |
834 | -- Call the stored procedure
835 | CALL cleaned_data.sproc_output();
836 | ```
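One caveat worth knowing: a server-side `COPY ... TO '<file>'` is executed by the PostgreSQL server process, so it writes to the server's filesystem and requires superuser rights or membership in the `pg_write_server_files` role. (Inside this Docker container we connect as the `postgres` superuser, so it works.) If you cannot write to the server's filesystem, psql's client-side `\copy` meta-command produces the file on the machine running psql instead. A minimal sketch (the query and path are illustrative):

```sql
-- Run inside psql; \copy must be written on a single line.
\copy (SELECT 1 AS row_id) TO 'output.csv' WITH (FORMAT csv, HEADER);
```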
837 |
838 |
839 |
840 | :warning: To start over with an empty database, run the `drop_all_tables.sql` script. :warning:
841 |
842 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
843 |
844 |
845 |
846 |
--------------------------------------------------------------------------------
/walkthrough/WALKTHROUGH_BONUS_BASH.md:
--------------------------------------------------------------------------------
1 | ## Basic/Intermediate SQL Code Challenge
2 |
3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 |
8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
9 |
10 | ### Bash Script
11 |
12 | What is Bash?
13 | * The name stands for **B**ourne **A**gain **SH**ell. It is an open source version of the Bourne Shell and was first released in 1989. A shell is a command-line interpreter that provides a command line user interface for operating systems.
14 |
15 | What is a Bash script?
16 | * A Bash script is a plain text file which contains a series of commands. These are the commands we would normally type ourselves on the command line. Using [psql](https://www.postgresguide.com/utilities/psql/), we can create a Bash script that will execute our `.sql` files in order and always start with a fresh, clean database every time we run the script.
17 |
18 | Do I have Bash on my computer?
19 | * If you are following along with this walkthrough and using the Docker container, the answer is YES! Bash is the default shell on most Linux distributions and was the default on macOS up to version 10.15; newer macOS versions default to zsh, but Bash remains available as an alternative shell.
20 |
21 | To use the Bash shell, let's connect to our Docker/PostgreSQL container by opening another command line terminal and typing this command:
22 |
23 | `docker exec -ti sql_coding_challenge bash`
24 |
25 | 
26 |
27 | After executing this command, you should be logged into the Docker container as `root`. Keep in mind that the `@a80c64aa877f` portion after `root` is the container ID, which will be different for your container.
28 |
29 | 
30 |
31 | We are logged in as the `root` user. Running around the file system as `root` can lead to problems if you don't know what you're doing, so let's switch to a safer user with the `su` (substitute user) command. In the terminal, execute:
32 |
33 | `su postgres`
34 |
35 | 
36 |
37 | We are now connected to the container as the user `postgres`. Now we can `cd` (change directory) to where our scripts are located, which we initially set up as the `/source_data/scripts/` directory.
38 |
39 | `cd ~/source_data/scripts/`
40 |
41 | We are now in our `source_data/scripts` directory.
42 |
43 | 
44 |
45 | Create a new file in the `source_data/scripts` directory and name it
46 |
47 | `start_fresh.sh`
48 |
49 | This is the Bash script that will execute the psql commands. Inside `start_fresh.sh`, the script should look like this:
50 |
51 | ```bash
52 | #!/bin/bash
53 |
54 | psql sql_coding_challenge -f drop_all_tables.sql
55 | psql sql_coding_challenge -f build_tables.sql
56 | psql sql_coding_challenge -f normalize_tables.sql
57 | psql sql_coding_challenge -f remove_duplicates.sql
58 | psql sql_coding_challenge -f cleanup_db.sql
59 | psql sql_coding_challenge -f create_relationships.sql
60 | echo "Build Process Complete"
61 | ```
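One optional hardening, not required by the challenge: by default `psql` keeps executing a file after an SQL error, and the script above runs every file regardless of earlier failures. Adding `set -e` plus psql's `ON_ERROR_STOP` variable makes the build stop at the first failure. A sketch of that variant:

```bash
#!/bin/bash
set -e  # abort the script if any command exits non-zero

# ON_ERROR_STOP=1 makes psql exit with a non-zero status on the first
# SQL error, so a broken step doesn't silently cascade into the next file.
for f in drop_all_tables.sql build_tables.sql normalize_tables.sql \
         remove_duplicates.sql cleanup_db.sql create_relationships.sql; do
	psql -v ON_ERROR_STOP=1 sql_coding_challenge -f "$f"
done
echo "Build Process Complete"
```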
62 |
63 | To run this script, use either of the following commands from the same directory as the script.
64 |
65 | `./start_fresh.sh` or `bash start_fresh.sh`
66 |
67 | :exclamation: Note: :exclamation:
68 | * If you receive a `Permission denied` error when trying to run the script, make it executable first, then try again.
69 | * `chmod u+x start_fresh.sh`
70 |
71 | 
72 |
73 | Once the script starts, it executes the `.sql` files we just created in order; when it completes, the output should look similar to this.
74 |
75 | 
76 |
77 | We are now right back to where we were when we first started this page.
78 |
79 | However, we now have a script that runs all of our SQL files and rebuilds a fresh database every time.
80 |
81 | We are now ready to complete the `SQL Coding Challenge`.
82 |
83 | Click the link below to continue.
84 |
85 | Go to [WALKTHROUGH_CODE_CHALLENGE](WALKTHROUGH_7_CODE_CHALLENGE.md)
86 |
87 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
88 |
89 |
90 |
91 |
--------------------------------------------------------------------------------