├── .gitignore ├── README.md ├── docker-compose.yaml ├── images ├── ERD.PNG ├── bash_1.PNG ├── bash_2.PNG ├── bash_3.PNG ├── bash_4.PNG ├── bash_5.PNG ├── bash_6.PNG ├── terminal_1.PNG └── terminal_2.PNG ├── source_data ├── csv_data │ ├── cities.csv │ ├── countries.csv │ ├── currencies.csv │ └── languages.csv ├── csv_output │ └── output.csv └── scripts │ ├── build_tables.sql │ ├── cleanup_db.sql │ ├── code_challenge.sql │ ├── create_relationships.sql │ ├── drop_all_tables.sql │ ├── normalize_tables.sql │ ├── remove_duplicates.sql │ └── start_fresh.sh └── walkthrough ├── WALKTHROUGH_1_DOCKER.md ├── WALKTHROUGH_2_BUILD.md ├── WALKTHROUGH_3_NORMALIZE.md ├── WALKTHROUGH_4_REMOVE_DUPLICATES.md ├── WALKTHROUGH_5_CLEANUP.md ├── WALKTHROUGH_6_RELATIONSHIPS.md ├── WALKTHROUGH_7_CODE_CHALLENGE.md └── WALKTHROUGH_BONUS_BASH.md /.gitignore: -------------------------------------------------------------------------------- 1 | /db/pgdata 2 | ./vscode -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Basic/Intermediate SQL Code Challenge 2 | 3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 8 | This project is an opportunity to practice your SQL skills. It was inspired by the [Braintree Analytics Code Challenge](https://github.com/AlexanderConnelly/BrainTree_SQL_Coding_Challenge_Data_Analyst) and was created to strengthen/sharpen your SQL knowledge. 9 | 10 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 11 | 12 | **Please Note**: 13 | 14 | This project is meant to measure/gauge/practice your technical abilities with SQL. 15 | 16 | * All work should be done in SQL. 17 | * Do not alter the CSV files. 18 | * Most, if not all, of these questions are intended purely as a measure of your SQL skills and not as actual questions that one would expect to answer from an employer. 19 | 20 | An SQL Code Challenge [Walkthrough](walkthrough/WALKTHROUGH_1_DOCKER.md) has been created if you would like to follow along in a step-by-step fashion. 21 | 22 | The questions are listed below with the expected results and answers hidden so you can cross-reference your answers. 23 | 24 | This repository contains all of the necessary files, datasets and directories for running a PostgreSQL server in a Docker container. The only prerequisite is that you should already have Docker Desktop installed and running on your computer. 25 | 26 | https://www.docker.com/products/docker-desktop/ 27 | 28 | ### Getting Started 29 | 30 | * This project uses the latest version of `PostgreSQL`, but any SQL database may be used. 31 | * Directories and Files details: 32 | * `README.md`: This is the file you are currently reading. 33 | * `.gitignore`: The files and directories listed in this file will be ignored by Git/GitHub. 34 | * `docker-compose.yaml`: With this YAML (**Y**et **A**nother **M**arkup **L**anguage) file you can define the services (Postgres/MySQL/Python...) and, with a single command, spin everything up or tear it all down. 
35 | 36 | * `db`: In the current directory, create an empty directory named '`db`'. Although this directory is ignored by Git/GitHub, PostgreSQL requires a directory to keep data persistent when running in a Docker container. There will be no need to alter any of the files that reside there once the container starts running. 37 | * `images`: This is the location of the image files displayed in this repository. 38 | * `walkthrough/`: A directory with a beginner-friendly walkthrough of all of the steps I took to complete the SQL challenge. 39 | * `source_data/`: This directory contains all of the files and scripts shared with our Docker container. 40 | * `source_data/csv_data`: This is the location of the CSV files needed to copy/insert into our database. 41 | * `source_data/scripts/`: This directory contains all of the SQL scripts I used to complete the challenge. 42 | * `source_data/csv_output/`: This directory will contain all CSV files generated by some of our queries. 43 | 44 | :grey_exclamation: :star: Start here if you want to follow along with the [SQL Challenge WALKTHROUGH](./walkthrough/WALKTHROUGH_1_DOCKER.md) :star: :grey_exclamation: 45 | 46 | ### SQL Code Challenge 47 | 1. Create Database, Tables and Relations. 48 | Using the CSV files located in `source_data/csv_data`, create your new SQL database and tables with the properly formatted data. 49 | 50 | * Add a numeric, auto-incrementing Primary Key to every table. 51 | * In the `countries` table, add the column `created_on` with the current date. 52 | * Create one-to-one and one-to-many relationships with the countries table as the parent table. 53 | 54 | 2. List Regions and Country Count 55 | List all of the regions and the total number of countries in each region. Order by country count in descending order and capitalize the region name. 56 | 57 | <details>
58 | Click to expand expected results! 59 | 60 | ##### Expected Results: 61 | 62 | region |country_count| 63 | ---------|-------------| 64 | Africa | 59| 65 | Americas | 57| 66 | Asia | 50| 67 | Europe | 48| 68 | Oceania | 26| 69 | Antartica| 1| 70 | 71 |
72 |

73 | 74 |
75 | <summary>Click to expand answer!</summary> 76 | 77 | ##### Answer 78 | ```sql 79 | SELECT 80 | 	-- initcap() capitalizes the first letter of every word in a string. 81 | 	initcap(region) AS region, 82 | 	count(*) AS country_count 83 | FROM 84 | 	cleaned_data.countries 85 | GROUP BY 86 | 	-- Aggregate functions like count() require every non-aggregated column to appear in GROUP BY. 87 | 	region 88 | ORDER BY 89 | 	country_count DESC; 90 | ``` 91 | </details>
92 |
93 | 94 | 3. List Sub-Regions and City Count 95 | List all of the sub-regions and the total number of cities in each sub-region. Order by sub-region name alphabetically. 96 | 97 |
98 | Click to expand expected results! 99 | 100 | ##### Expected Results: 101 | 102 | sub_region |city_count| 103 | -------------------------------|----------| 104 | Australia And New Zealand | 329| 105 | Central Asia | 560| 106 | Eastern Asia | 3164| 107 | Eastern Europe | 2959| 108 | Latin America And The Caribbean| 7204| 109 | Melanesia | 60| 110 | Micronesia | 15| 111 | Northern Africa | 1152| 112 | Northern America | 5844| 113 | Northern Europe | 2025| 114 | Polynesia | 22| 115 | Southeastern Asia | 2627| 116 | Southern Asia | 6848| 117 | Southern Europe | 3238| 118 | Subsaharan Africa | 3223| 119 | Western Asia | 1400| 120 | Western Europe | 3952| 121 | 122 |
123 |

124 | 125 |
126 | Click to expand answer! 127 | 128 | ##### Answer 129 | ```sql 130 | SELECT 131 | initcap(t1.sub_region) AS sub_region, 132 | count(*) AS city_count 133 | FROM 134 | cleaned_data.countries AS t1 135 | JOIN 136 | cleaned_data.cities AS t2 137 | ON 138 | t1.country_code_2 = t2.country_code_2 139 | GROUP BY 140 | t1.sub_region 141 | ORDER BY 142 | t1.sub_region; 143 | ``` 144 |
145 |
146 | 147 | 4. Specific Sub-Region and String Functions 148 | List all of the countries and the total number of cities in the Northern Europe sub-region. List the country names in uppercase and order the list by the length of the country name and alphabetically in ascending order. 149 | 150 |
151 | Click to expand expected results! 152 | 153 | ##### Expected Results: 154 | 155 | country_name |city_count| 156 | --------------|----------| 157 | JERSEY | 1| 158 | LATVIA | 39| 159 | NORWAY | 127| 160 | SWEDEN | 148| 161 | DENMARK | 75| 162 | ESTONIA | 20| 163 | FINLAND | 142| 164 | ICELAND | 12| 165 | IRELAND | 64| 166 | LITHUANIA | 61| 167 | ISLE OF MAN | 2| 168 | FAROE ISLANDS | 29| 169 | UNITED KINGDOM| 1305| 170 | 171 |
172 |

173 | 174 |
175 | <summary>Click to expand answer!</summary> 176 | 177 | ##### Answer 178 | ```sql 179 | SELECT 180 | 	-- upper() returns your string in uppercase. 181 | 	upper(t1.country_name) AS country_name, 182 | 	count(*) AS city_count 183 | FROM 184 | 	cleaned_data.countries AS t1 185 | JOIN 186 | 	cleaned_data.cities AS t2 187 | ON 188 | 	t1.country_code_2 = t2.country_code_2 189 | WHERE 190 | 	t1.sub_region = 'northern europe' 191 | GROUP BY 192 | 	t1.country_name 193 | ORDER BY 194 | 	-- length() returns the number of characters in a string, including spaces. 195 | 	length(t1.country_name), t1.country_name; 196 | ``` 197 | </details>
198 |
199 | 200 | 201 | 5. List Specific Countries by Year 202 | List all of the countries and the total number of cities in the Southern Europe sub-region that were inserted in 2021. Capitalize the country names and order alphabetically by the **LAST** letter of the country name and the number of cities. 203 | 204 |
205 | Click to expand expected results! 206 | 207 | ##### Expected Results: 208 | 209 | country_name |city_count| 210 | ----------------------|----------| 211 | Andorra | 5| 212 | Albania | 11| 213 | Bosnia And Herzegovina| 15| 214 | Croatia | 22| 215 | North Macedonia | 28| 216 | Malta | 32| 217 | Serbia | 58| 218 | Slovenia | 74| 219 | Greece | 64| 220 | Portugal | 109| 221 | Spain | 302| 222 | San Marino | 2| 223 | Montenegro | 12| 224 | Italy | 542| 225 | 226 |
227 |

228 | 229 |
230 | <summary>Click to expand answer!</summary> 231 | 232 | ##### Answer 233 | ```sql 234 | SELECT 235 | 	initcap(t1.country_name) AS country_name, 236 | 	count(*) AS city_count 237 | FROM 238 | 	cleaned_data.countries AS t1 239 | JOIN 240 | 	cleaned_data.cities AS t2 241 | ON 242 | 	t1.country_code_2 = t2.country_code_2 243 | WHERE 244 | 	t1.sub_region = 'southern europe' 245 | AND 246 | 	-- extract() & date_part() functions allow you to break down dates and timestamps into individual years, months, days, hours... 247 | 	EXTRACT('year' FROM t2.insert_date) = 2021 248 | GROUP BY 249 | 	t1.country_name 250 | ORDER BY 251 | 	-- substring() function returns a specific portion of a string. 252 | 	substring(t1.country_name,length(t1.country_name),1), city_count; 253 | ``` 254 | </details>
255 |
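The last-letter ordering above hinges on `substring(country_name, length(country_name), 1)`. Outside SQL, the same sort key can be sketched in Python as `name[-1]` (the country list below is an illustrative subset, not the full result set):

```python
# Illustrative subset of Southern Europe country names (not the full data).
countries = ["spain", "italy", "greece", "andorra"]

# Sort key mirrors substring(country_name, length(country_name), 1):
# take the final character of each name.
by_last_letter = sorted(countries, key=lambda name: name[-1])
print(by_last_letter)  # ['andorra', 'greece', 'spain', 'italy']
```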
256 | 257 | 6. List Anti-Join 258 | List all of the countries in the region of Asia that did **NOT** have a city with an inserted date from June 2021 through Sept 2021. 259 | 260 |
261 | Click to expand expected results! 262 | 263 | ##### Expected Results: 264 | 265 | country_name | 266 | -----------------| 267 | Brunei Darussalam| 268 | Kuwait | 269 | Macao | 270 | Singapore | 271 | 272 |
273 |

274 | 275 |
276 | <summary>Click to expand answer!</summary> 277 | 278 | ##### Answer 279 | ```sql 280 | SELECT 281 | 	-- DISTINCT will only return unique values 282 | 	DISTINCT initcap(t1.country_name) AS country_name 283 | FROM 284 | 	cleaned_data.countries AS t1 285 | -- A left join returns ALL rows from the left table (cleaned_data.countries) and 286 | -- only the matching rows from the right table (cleaned_data.cities), resulting in NULLs where 287 | -- there are no matches in the right table. 288 | LEFT JOIN 289 | 	cleaned_data.cities AS t2 290 | ON 291 | 	t1.country_code_2 = t2.country_code_2 292 | AND 293 | 	t2.insert_date BETWEEN '2021-06-01' AND '2021-10-01' 294 | WHERE 295 | 	t1.region = 'asia' 296 | -- Only return countries that did NOT have a matching row in the cities table. 297 | AND 298 | 	t2.country_code_2 IS NULL; 299 | ``` 300 | </details>
301 |
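An anti-join, as exercised above, keeps only the rows from one table that have no match in another; conceptually it is a set difference. A minimal Python sketch of the idea, using made-up country codes:

```python
# Hypothetical rows: Asian countries, and the country codes that had at
# least one city inserted during the date window.
asia_countries = {"af": "afghanistan", "kw": "kuwait", "sg": "singapore", "jp": "japan"}
matched_codes = {"af", "jp"}  # codes that joined successfully

# Anti-join: retain countries whose code never matched. In SQL this is the
# LEFT JOIN ... WHERE right_table.key IS NULL pattern.
unmatched = sorted(name for code, name in asia_countries.items()
                   if code not in matched_codes)
print(unmatched)  # ['kuwait', 'singapore']
```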
302 | 303 | 7. Reversible Names 304 | List the country, city name, population and city name length for the city names that are [palindromes](https://en.wikipedia.org/wiki/Palindrome) in the Western Asia sub-region. Format the population with a thousands separator (1,000) and format the length of the city name in Roman numerals. Order by the length of the city name in descending order and alphabetically in ascending order. 305 | 306 | <details>
307 | Click to expand expected results! 308 | 309 | ##### Expected Results: 310 | 311 | country_name |city_name|population|roman_numeral_length| 312 | --------------------|---------|----------|--------------------| 313 | Yemen |Hajjah | 46,568 | VI | 314 | Syrian Arab Republic|Hamah | 696,863 | V | 315 | Turkey |Kavak | 21,692 | V | 316 | Turkey |Kinik | 29,803 | V | 317 | Turkey |Tut | 10,161 | III | 318 | 319 |
320 |

321 | 322 |
323 | Click to expand answer! 324 | 325 | ##### Answer 326 | ```sql 327 | SELECT 328 | initcap(t1.country_name) AS country_name, 329 | initcap(t2.city_name) AS city_name, 330 | -- to_char() takes non-string data types and returns them as strings. 331 | to_char(t2.population, '999G999') AS population, 332 | to_char(length(t2.city_name), 'RN') AS roman_numeral_length 333 | FROM 334 | cleaned_data.countries AS t1 335 | JOIN 336 | cleaned_data.cities AS t2 337 | ON 338 | t1.country_code_2 = t2.country_code_2 339 | WHERE 340 | t2.city_name = reverse(t2.city_name) 341 | AND 342 | t1.sub_region = 'western asia' 343 | ORDER BY 344 | length(t2.city_name) DESC, t2.city_name ASC; 345 | ``` 346 |
347 |
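The palindrome test above boils down to comparing a string with its own reversal; PostgreSQL's `reverse()` plays the role of Python's slice below. City names are drawn from the expected results:

```python
def is_palindrome(text: str) -> bool:
    # Equivalent to the SQL predicate: city_name = reverse(city_name)
    return text == text[::-1]

assert is_palindrome("hamah")
assert is_palindrome("tut")
assert not is_palindrome("ankara")
```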
348 | 349 | 8. Search with Wildcard and Case 350 | List all of the countries that end in 'stan'. Make your query case-insensitive and list whether the total population of the cities listed is an odd or even number for cities inserted in 2022. Order by whether the population value is odd or even in ascending order and country name in alphabetical order. 351 | 352 |
353 | Click to expand expected results! 354 | 355 | ##### Expected Results: 356 | 357 | country_name|total_population|odd_or_even| 358 | ------------|----------------|-----------| 359 | Afghanistan | 6,006,530 |Even | 360 | Kazakhstan | 4,298,264 |Even | 361 | Kyrgyzstan | 1,017,644 |Even | 362 | Pakistan | 26,344,480 |Even | 363 | Tajikistan | 2,720,953 |Odd | 364 | Turkmenistan| 419,607 |Odd | 365 | Uzbekistan | 3,035,547 |Odd | 366 | 367 |
368 |

369 | 370 |
371 | Click to expand answer! 372 | 373 | ##### Answer 374 | ```sql 375 | SELECT 376 | initcap(t1.country_name) AS country_name, 377 | to_char(sum(t2.population), '99G999G999') total_population, 378 | CASE 379 | WHEN (sum(t2.population) % 2) = 0 380 | THEN 'Even' 381 | ELSE 382 | 'Odd' 383 | END AS odd_or_even 384 | FROM 385 | cleaned_data.countries AS t1 386 | JOIN 387 | cleaned_data.cities AS t2 388 | ON 389 | t1.country_code_2 = t2.country_code_2 390 | WHERE 391 | t1.country_name ILIKE '%stan' 392 | AND 393 | EXTRACT('year' FROM t2.insert_date) = 2022 394 | GROUP BY 395 | t1.country_name 396 | ORDER BY 397 | odd_or_even, country_name; 398 | ``` 399 |
400 |
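Two small building blocks in this question are a case-insensitive suffix match (`ILIKE '%stan'`) and an even/odd test via the modulo operator. A quick Python sketch with invented figures:

```python
# ILIKE '%stan' is a case-insensitive pattern match; here, a suffix check.
countries = ["Afghanistan", "Turkey", "PAKISTAN", "Oman"]
stans = [c for c in countries if c.lower().endswith("stan")]
print(stans)  # ['Afghanistan', 'PAKISTAN']

# Parity of a summed population (population figures invented for the example).
total = sum([1_000_001, 250_004, 1_470_548])
print("Even" if total % 2 == 0 else "Odd")  # Odd
```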
401 | 402 | 9. Ranking Regions 403 | List the third most populated city ranked by region WITHOUT using limit or offset. List the region name, city name, population and order the results by region. 404 | 405 |
406 | Click to expand expected results! 407 | 408 | ##### Expected Results: 409 | 410 | region |city_name|third_largest_pop| 411 | --------|---------|-----------------| 412 | Africa |Kinshasa | 12,836,000 | 413 | Americas|New York | 18,972,871 | 414 | Asia |Delhi | 32,226,000 | 415 | Europe |Paris | 11,060,000 | 416 | Oceania |Brisbane | 2,360,241 | 417 | 418 |
419 |

420 | 421 |
422 | Click to expand answer! 423 | 424 | ##### Answer 425 | ```sql 426 | WITH get_city_rank_cte AS ( 427 | SELECT 428 | t1.region, 429 | t2.city_name, 430 | t2.population AS third_largest_pop, 431 | DENSE_RANK() OVER (PARTITION BY t1.region ORDER BY t2.population DESC) AS rnk 432 | FROM 433 | cleaned_data.countries AS t1 434 | JOIN 435 | cleaned_data.cities AS t2 436 | ON 437 | t1.country_code_2 = t2.country_code_2 438 | WHERE 439 | t2.population IS NOT NULL 440 | GROUP BY 441 | t1.region, 442 | t2.city_name, 443 | t2.population 444 | ) 445 | SELECT 446 | initcap(region) AS region, 447 | initcap(city_name) AS city_name, 448 | to_char(third_largest_pop, '99G999G999') AS third_largest_pop 449 | FROM 450 | get_city_rank_cte 451 | WHERE 452 | rnk = 3; 453 | ``` 454 |
455 |
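The answer above relies on DENSE_RANK(), which assigns consecutive ranks with no gaps after ties, so rank 3 is always the third distinct value per partition. A compact Python sketch of dense ranking with invented populations:

```python
def dense_rank_desc(values):
    """Return the dense rank (1 = largest) for each value; ties share a rank
    and the next distinct value gets the next consecutive rank."""
    order = {v: i + 1 for i, v in enumerate(sorted(set(values), reverse=True))}
    return [order[v] for v in values]

populations = [500, 400, 400, 300, 200]  # invented figures
ranks = dense_rank_desc(populations)
print(ranks)                             # [1, 2, 2, 3, 4]
third = [p for p, r in zip(populations, ranks) if r == 3]
print(third)                             # [300]
```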
456 | 457 | 10. Using Buckets 458 | List the bottom third of all countries in the Western Asia sub-region that speak Arabic. Include the row number and country name. Order by row number. 459 | 460 |
461 | Click to expand expected results! 462 | 463 | ##### Expected Results: 464 | 465 | row_id|country_name | 466 | ------|--------------------| 467 | 9|saudi arabia | 468 | 10|syrian arab republic| 469 | 11|united arab emirates| 470 | 12|yemen | 471 | 472 |
473 |

474 | 475 |
476 | <summary>Click to expand answer!</summary> 477 | 478 | ##### Answer 479 | ```sql 480 | WITH get_ntile_cte AS ( 481 | 	SELECT 482 | 		ROW_NUMBER() OVER (ORDER BY t1.country_name) AS rn, 483 | 		t1.country_name, 484 | 		-- The ntile() window function distributes ordered rows into a specified number of 'buckets'. 485 | 		NTILE(3) OVER (ORDER BY t1.country_name) AS nt 486 | 	FROM 487 | 		cleaned_data.countries AS t1 488 | 	JOIN 489 | 		cleaned_data.languages AS t2 490 | 	ON 491 | 		t1.country_code_2 = t2.country_code_2 492 | 	WHERE 493 | 		t1.sub_region = 'western asia' 494 | 	AND 495 | 		t2.language = 'arabic' 496 | ) 497 | SELECT 498 | 	rn AS row_id, 499 | 	country_name 500 | FROM 501 | 	get_ntile_cte 502 | WHERE 503 | 	nt = 3; 504 | ``` 505 | </details>
506 |
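NTILE(3) deals the ordered rows into three buckets whose sizes differ by at most one, with any remainder going to the earliest buckets; the "bottom third" is then simply bucket 3. A hedged Python sketch of that sizing rule, using placeholder rows:

```python
def ntile(ordered_rows, n):
    """Split ordered rows into n buckets. Mirrors PostgreSQL's NTILE sizing:
    bucket sizes differ by at most one, larger buckets come first."""
    base, extra = divmod(len(ordered_rows), n)
    buckets, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        buckets.append(ordered_rows[start:start + size])
        start += size
    return buckets

rows = list("abcdefghijk")               # 11 ordered placeholder rows
print([len(b) for b in ntile(rows, 3)])  # [4, 4, 3]
print(ntile(rows, 3)[2])                 # ['i', 'j', 'k'] -- the bottom third
```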
507 | 508 | 11. Using Arrays 509 | Create a query that lists country name, capital name, population, languages spoken and currency name for countries in the Northern Africa sub-region. There can be multiple currency names and languages spoken per country. Add multiple values for the same field into an array. 510 | 511 | <details>
512 | Click to expand expected results! 513 | 514 | ##### Expected Results: 515 | 516 | country_name|city_name|population|languages |currencies | 517 | ------------|---------|----------|--------------------------------------------|---------------| 518 | algeria |algiers | 3415811|{french,arabic,kabyle} |algerian dinar | 519 | egypt |cairo | 20296000|{arabic} |egyptian pound | 520 | libya |tripoli | 1293016|{arabic} |libyan dinar | 521 | morocco |rabat | 572717|{arabic,tachelhit,moroccan tamazight,french}|moroccan dirham| 522 | sudan |khartoum | 7869000|{arabic,english} |sudanese pound | 523 | tunisia |tunis | 1056247|{french,arabic} |tunisian dinar | 524 | 525 |
526 |

527 | 528 |
529 | <summary>Click to expand answer!</summary> 530 | 531 | ##### Answer 532 | ```sql 533 | WITH get_row_values AS ( 534 | 	SELECT 535 | 		t1.country_name, 536 | 		t2.city_name, 537 | 		t2.population, 538 | 		-- array_agg() aggregates multiple values and returns them in 'array' format. 539 | 		array_agg(t3.language) AS languages, 540 | 		t4.currency_name AS currencies 541 | 	FROM 542 | 		cleaned_data.countries AS t1 543 | 	JOIN 544 | 		cleaned_data.cities AS t2 545 | 	ON 546 | 		t1.country_code_2 = t2.country_code_2 547 | 	JOIN 548 | 		cleaned_data.languages AS t3 549 | 	ON 550 | 		t1.country_code_2 = t3.country_code_2 551 | 	JOIN 552 | 		cleaned_data.currencies AS t4 553 | 	ON 554 | 		t1.country_code_2 = t4.country_code_2 555 | 	WHERE 556 | 		t1.sub_region = 'northern africa' 557 | 	AND 558 | 		t2.capital = TRUE 559 | 	GROUP BY 560 | 		t1.country_name, 561 | 		t2.city_name, 562 | 		t2.population, 563 | 		t4.currency_name 564 | ) 565 | SELECT 566 | 	* 567 | FROM 568 | 	get_row_values; 569 | ``` 570 | </details>
571 |
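array_agg() collapses the one-row-per-language layout into a single array per country. In Python terms it is a group-then-collect; the (country, language) pairs below are a made-up subset:

```python
from collections import defaultdict

# Hypothetical (country, language) rows, one per spoken language.
rows = [("egypt", "arabic"), ("sudan", "arabic"),
        ("sudan", "english"), ("tunisia", "french"), ("tunisia", "arabic")]

collected = defaultdict(list)
for country, language in rows:
    collected[country].append(language)  # array_agg per grouping key

print(dict(collected))
# {'egypt': ['arabic'], 'sudan': ['arabic', 'english'], 'tunisia': ['french', 'arabic']}
```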
572 | 573 | 12. Using Case and Percentages 574 | Produce a query that returns the city names for cities in the U.S. that were inserted on April 28th, 2022. List how many vowels and consonants are present in the city name and concatenate their percentage to their respective count in parentheses. 575 | 576 | <details>
577 | Click to expand expected results! 578 | 579 | ##### Expected Results: 580 | 581 | city_name |vowel_count_perc|consonants_count_perc| 582 | ---------------|----------------|---------------------| 583 | standish |2 (25.00%) |6 (75%) | 584 | grand forks |2 (18.18%) |9 (81.82%) | 585 | camano |3 (50.00%) |3 (50%) | 586 | cedar hills |3 (27.27%) |8 (72.73%) | 587 | gladstone |3 (33.33%) |6 (66.67%) | 588 | whitehall |3 (33.33%) |6 (66.67%) | 589 | homewood |4 (50.00%) |4 (50%) | 590 | willowbrook |4 (36.36%) |7 (63.64%) | 591 | port salerno |4 (33.33%) |8 (66.67%) | 592 | vadnais heights|5 (33.33%) |10 (66.67%) | 593 | jeffersonville |5 (35.71%) |9 (64.29%) | 594 | 595 |
596 |

597 | 598 |
599 | <summary>Click to expand answer!</summary> 600 | 601 | ##### Answer 602 | ```sql 603 | WITH get_letter_count AS ( 604 | 	SELECT 605 | 		t1.city_name, 606 | 		length(t1.city_name) string_length, 607 | 		-- regexp_replace() returns a value that has been manipulated by a regular expression. 608 | 		length(regexp_replace(t1.city_name, '[aeiou]', '', 'gi')) AS consonant_count 609 | 	FROM 610 | 		cleaned_data.cities AS t1 611 | 	WHERE 612 | 		t1.insert_date = '2022-04-28' 613 | 	AND 614 | 		t1.country_code_2 in ('us') 615 | ), 616 | get_letter_diff AS ( 617 | 	SELECT 618 | 		city_name, 619 | 		string_length - consonant_count AS vowels, 620 | 		round(100 * (string_length - consonant_count) / string_length::NUMERIC, 2) AS vowel_perc, 621 | 		consonant_count AS consonants, 622 | 		round( 100 * (consonant_count)::NUMERIC / string_length, 2)::float AS consonants_perc 623 | 	FROM 624 | 		get_letter_count 625 | ) 626 | SELECT 627 | 	city_name, 628 | 	vowels || ' (' || vowel_perc || '%)' AS vowel_count_perc, 629 | 	consonants || ' (' || consonants_perc || '%)' AS consonants_count_perc 630 | FROM 631 | 	get_letter_diff 632 | ORDER BY 633 | 	vowels; 634 | ``` 635 | </details>
636 |
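The economical trick for this question is to delete the vowels with a regular expression and treat the difference in lengths as the vowel count. The same idea in Python, with `re.sub` standing in for `regexp_replace`; note that any non-vowel character, a space included, lands on the consonant side, which is why "grand forks" shows 9 consonants in the expected results above:

```python
import re

def vowel_consonant_counts(city: str):
    total = len(city)
    # Strip vowels case-insensitively; whatever remains counts as a consonant,
    # spaces included (matching the expected results above).
    consonants = len(re.sub(r"[aeiou]", "", city, flags=re.IGNORECASE))
    vowels = total - consonants
    return vowels, consonants, round(100 * vowels / total, 2)

print(vowel_consonant_counts("standish"))     # (2, 6, 25.0)
print(vowel_consonant_counts("grand forks"))  # (2, 9, 18.18)
```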
637 | 638 | 13. Most Consecutive Days 639 | List the most consecutive inserted dates and the capitalized city names for cities in Canada that were inserted in April 2022. 640 | 641 | <details>
642 | Click to expand expected results! 643 | 644 | ##### Expected Results: 645 | 646 | most_consecutive_dates|city_name | 647 | ----------------------|------------| 648 | 2022-04-22|South Dundas| 649 | 2022-04-23|La Prairie | 650 | 2022-04-24|Elliot Lake | 651 | 2022-04-25|Lachute | 652 | 653 |
654 |

655 | 656 |
657 | Click to expand answer! 658 | 659 | ##### Answer 660 | ```sql 661 | DROP TABLE IF EXISTS get_dates; 662 | CREATE TEMP TABLE get_dates AS ( 663 | SELECT 664 | DISTINCT ON (insert_date) insert_date AS insert_date, 665 | city_name 666 | FROM 667 | cleaned_data.cities 668 | WHERE 669 | country_code_2 = 'ca' 670 | AND 671 | insert_date BETWEEN '2022-04-01' AND '2022-04-30' 672 | ORDER BY 673 | insert_date 674 | ); 675 | 676 | DROP TABLE IF EXISTS get_diff; 677 | CREATE TEMP TABLE get_diff AS ( 678 | SELECT 679 | city_name, 680 | insert_date, 681 | EXTRACT('day' FROM insert_date) - ROW_NUMBER() OVER (ORDER BY insert_date) AS diff 682 | FROM 683 | get_dates 684 | ); 685 | 686 | 687 | DROP TABLE IF EXISTS get_diff_count; 688 | CREATE TEMP TABLE get_diff_count AS ( 689 | SELECT 690 | city_name, 691 | insert_date, 692 | count(*) OVER (PARTITION BY diff) AS diff_count 693 | FROM 694 | get_diff 695 | ); 696 | 697 | 698 | WITH get_rank AS ( 699 | SELECT 700 | DENSE_RANK() OVER (ORDER BY diff_count desc) AS rnk, 701 | insert_date, 702 | city_name 703 | FROM 704 | get_diff_count 705 | ) 706 | SELECT 707 | insert_date AS most_consecutive_dates, 708 | initcap(city_name) AS city_name 709 | FROM 710 | get_rank 711 | WHERE 712 | rnk = 1 713 | ORDER BY 714 | insert_date; 715 | ``` 716 |
717 |
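The answer above uses the classic gaps-and-islands trick: subtract a row number from each date, and the result stays constant across a run of consecutive dates, so each distinct difference marks one "island". A small Python sketch with invented April dates:

```python
from datetime import date

# Invented, de-duplicated insert dates (one per day, already ordered).
dates = [date(2022, 4, 2), date(2022, 4, 3), date(2022, 4, 22),
         date(2022, 4, 23), date(2022, 4, 24), date(2022, 4, 25)]

# day-of-month minus a 1-based row number: constant within a consecutive run.
islands = {}
for row_number, d in enumerate(dates, start=1):
    islands.setdefault(d.day - row_number, []).append(d)

longest_run = max(islands.values(), key=len)
print([d.isoformat() for d in longest_run])
# ['2022-04-22', '2022-04-23', '2022-04-24', '2022-04-25']
```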
718 | 719 | 14. Month over Month in View 720 | Create a view that lists the month-year, the number of cities inserted for that month, a running city count total and the month-over-month percentage growth for 2021. 721 | 722 | Format the cities count and the running total with the thousands separator and format the month-over-month growth with a plus symbol and a percentage symbol. 723 | 724 | Example: 725 | month_year|cities_inserted|running_total|month_over_month| 726 | ----------|---------------|-------------|----------------| 727 | Feb-2021  |     1,291     |    2,762    |+87.76%         | 728 | 729 | <details>
730 | Click to expand expected results! 731 | 732 | ##### Expected Results: 733 | 734 | month_year|cities_inserted|running_total|month_over_month| 735 | ----------|---------------|-------------|----------------| 736 | Jan-2021 | 1,471 | 1,471 | | 737 | Feb-2021 | 1,291 | 2,762 |+87.76% | 738 | Mar-2021 | 1,485 | 4,247 |+53.77% | 739 | Apr-2021 | 1,508 | 5,755 |+35.51% | 740 | May-2021 | 1,509 | 7,264 |+26.22% | 741 | Jun-2021 | 1,395 | 8,659 |+19.20% | 742 | Jul-2021 | 1,394 | 10,053 |+16.10% | 743 | Aug-2021 | 1,481 | 11,534 |+14.73% | 744 | Sep-2021 | 1,452 | 12,986 |+12.59% | 745 | Oct-2021 | 1,446 | 14,432 |+11.14% | 746 | Nov-2021 | 1,378 | 15,810 |+9.55% | 747 | Dec-2021 | 1,472 | 17,282 |+9.31% | 748 | 749 |
750 |

751 | 752 |
753 | Click to expand answer! 754 | 755 | ##### Answer 756 | ```sql 757 | DROP VIEW IF EXISTS cleaned_data.view_2021_growth; 758 | CREATE VIEW cleaned_data.view_2021_growth AS ( 759 | WITH get_month_count AS ( 760 | SELECT 761 | date_trunc('month', insert_date) as single_month, 762 | count(*) AS monthly_count 763 | FROM 764 | cleaned_data.cities 765 | WHERE 766 | EXTRACT('year' FROM insert_date) = 2021 767 | GROUP BY 768 | single_month 769 | ORDER BY 770 | single_month 771 | ), 772 | get_running_total AS ( 773 | SELECT 774 | single_month::date, 775 | monthly_count, 776 | sum(monthly_count) OVER (ORDER BY single_month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_num_cities 777 | FROM 778 | get_month_count 779 | ), 780 | get_month_over_month AS ( 781 | SELECT 782 | single_month, 783 | monthly_count, 784 | total_num_cities, 785 | round(100.0 * ((total_num_cities - Lag(total_num_cities, 1) OVER (ORDER BY single_month)) / Lag(total_num_cities, 1) OVER (ORDER BY single_month))::NUMERIC, 2) AS month_over_month 786 | FROM 787 | get_running_total 788 | ) 789 | SELECT 790 | to_char(single_month, 'Mon-YYYY') AS month_year, 791 | to_char(monthly_count, '9G999') AS cities_inserted, 792 | to_char(total_num_cities, '99G999') AS running_total, 793 | to_char(month_over_month, 'sg99.99') || '%' AS month_over_month 794 | FROM 795 | get_month_over_month 796 | ); 797 | 798 | SELECT 799 | * 800 | FROM 801 | cleaned_data.view_2021_growth; 802 | ``` 803 |
804 |
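One subtlety worth noticing in the expected results: the month-over-month percentage is computed on the running total, not on the monthly count — Feb-2021's +87.76% is (2,762 − 1,471) / 1,471. The arithmetic, sketched in Python from the first three months of the expected results:

```python
# Monthly counts taken from the expected results above.
monthly_counts = {"Jan-2021": 1471, "Feb-2021": 1291, "Mar-2021": 1485}

rows, running, previous = [], 0, None
for month, count in monthly_counts.items():
    running += count
    # Growth of the running total versus last month's running total (LAG in SQL).
    growth = None if previous is None else round(100 * (running - previous) / previous, 2)
    rows.append((month, count, running, growth))
    previous = running

print(rows)
# [('Jan-2021', 1471, 1471, None), ('Feb-2021', 1291, 2762, 87.76), ('Mar-2021', 1485, 4247, 53.77)]
```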
805 | 806 | 15. Stored Procedure to CSV 807 | Create and call a stored procedure that lists a unique row id number, insert date, country name, city name, population and languages spoken for countries in the Latin America and Caribbean sub-region that were inserted on either '2022-04-09', '2022-04-28' or '2022-08-11'. 808 | 809 | Order by the insert date and output the results (including headers) to a CSV file located in [/source_data/csv_output/](../source_data/csv_output/). 810 | 811 | <details>
812 | Click to expand expected results! 813 | 814 | ##### Expected Results: 815 | 816 | Results located in [/source_data/csv_output/output.csv](../source_data/csv_output/output.csv) 817 | 818 |
819 |

820 | 821 |
822 | Click to expand answer! 823 | 824 | ##### Answer 825 | ```sql 826 | CREATE OR REPLACE PROCEDURE cleaned_data.sproc_output () 827 | LANGUAGE plpgsql 828 | AS 829 | $sproc$ 830 | BEGIN 831 | COPY ( 832 | SELECT 833 | ROW_NUMBER() OVER (ORDER BY t1.insert_date) AS row_id, 834 | t1.insert_date, 835 | t2.country_name, 836 | t1.city_name, 837 | t1.population, 838 | array_agg(t3.language) AS languages 839 | FROM 840 | cleaned_data.cities AS t1 841 | JOIN 842 | cleaned_data.countries AS t2 843 | ON 844 | t1.country_code_2 = t2.country_code_2 845 | LEFT JOIN 846 | cleaned_data.languages AS t3 847 | ON 848 | t2.country_code_2 = t3.country_code_2 849 | WHERE 850 | t2.sub_region = 'latin america and the caribbean' 851 | AND 852 | t1.insert_date IN ('2022-04-09', '2022-04-28', '2022-08-11') 853 | GROUP BY 854 | t1.insert_date, 855 | t2.country_name, 856 | t1.city_name, 857 | t1.population 858 | ORDER BY 859 | t1.insert_date 860 | ) 861 | TO '/var/lib/postgresql/source_data/csv_output/output.csv' DELIMITER ',' CSV HEADER; 862 | END 863 | $sproc$; 864 | 865 | -- Call the stored procedure 866 | CALL cleaned_data.sproc_output(); 867 | ``` 868 |
869 |
870 | 871 | :grey_exclamation: Start here if you want to follow along with the [SQL Challenge WALKTHROUGH](./walkthrough/WALKTHROUGH_1_DOCKER.md) :grey_exclamation: 872 | 873 | :exclamation: If you found this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 874 | 875 | 876 | 877 | -------------------------------------------------------------------------------- /docker-compose.yaml: -------------------------------------------------------------------------------- 1 | version: '3.8' 2 | services: 3 |   # This service can be named anything, but we'll name it after the image it runs 4 |   postgres: 5 |     # This will get the latest official PostgreSQL image from DockerHub 6 |     image: postgres 7 |     # The container name chosen is the name of the project 8 |     container_name: sql_coding_challenge 9 |     restart: always 10 |     # This is the name of the env file. This file is hidden and holds environment variables. 11 |     # All values are hard-coded so the file has been commented out. 12 |     # env_file: 13 |     #   - .env 14 |     environment: 15 |       # The database name 16 |       POSTGRES_DB: sql_coding_challenge 17 |       # PostgreSQL Username 18 |       POSTGRES_USER: postgres 19 |       # PostgreSQL Password 20 |       POSTGRES_PASSWORD: postgres 21 |       # The location where PostgreSQL stores its data (Do not change unless you know what you're doing) 22 |       PGDATA: /var/lib/postgresql/data/pgdata 23 |     # Default port that PostgreSQL uses 24 |     ports: 25 |       - "5432:5432" 26 |     volumes: 27 |       - 28 |         # We bind the db directory in our project folder to where PostgreSQL stores its data 29 |         # to keep data persistent when PostgreSQL restarts 30 |         type: bind 31 |         source: ./db 32 |         target: /var/lib/postgresql/data 33 |       - 34 |         # We bind our source_data directory so that the PostgreSQL container can access files and 35 |         # scripts without having to copy them into the container 36 |         type: bind 37 |         source: ./source_data 38 |         target: /var/lib/postgresql/source_data 39 |     networks: 40 |       - reference 41 | networks: 42 |   reference: 
-------------------------------------------------------------------------------- /images/ERD.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/ERD.PNG -------------------------------------------------------------------------------- /images/bash_1.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_1.PNG -------------------------------------------------------------------------------- /images/bash_2.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_2.PNG -------------------------------------------------------------------------------- /images/bash_3.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_3.PNG -------------------------------------------------------------------------------- /images/bash_4.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_4.PNG -------------------------------------------------------------------------------- /images/bash_5.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_5.PNG -------------------------------------------------------------------------------- /images/bash_6.PNG: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/bash_6.PNG -------------------------------------------------------------------------------- /images/terminal_1.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/terminal_1.PNG -------------------------------------------------------------------------------- /images/terminal_2.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/iweld/SQL_Coding_Challenge/6ba60472d5bbd925323c00864ad4d9c87b990c5b/images/terminal_2.PNG -------------------------------------------------------------------------------- /source_data/csv_data/countries.csv: -------------------------------------------------------------------------------- 1 | country_name,country_code_2,country_code_3,region,sub_region,intermediate_region 2 | !afg!hANistan,af?,afg,asia,$southern asia, 3 | alba$nia,aL,alb,europe!,southern *europe, 4 | Alb?ania,AL,alb,eur#ope,$southern e#urope, 5 | ?algeria?,d!z,dza,africa,northern africa, 6 | americ#an samoa?,as,as!m,o##ceania,polyne$sia, 7 | andoRra,Ad,and,europe,southern europe, 8 | cyprus ,cy,cyp,asia,western asia, 9 | angola,ao,ago,africa,sub-saharan africa,middle africa 10 | ang!uilla,ai,aia,americas,latin ame!rica and the caribbean,caribbean 11 | antarctica,aq,ata,antartica,, 12 | antigua and barbuda,ag,atg,americas,latin america and the caribbean,caribbean 13 | argentina,ar,arg,americas,latin america and the caribbean,south america 14 | armenia,a?m,arm,asia,western asia, 15 | aruba,aw,abw,americas,latin america and the caribbean,caribbean 16 | australia,au,aus,oceania,australia and new zealand, 17 | austria,at,aut,europe,western europe, 18 | azerbaijan,az,aze,asia,western asia, 19 | bahamas,bs,bhs,americas,latin america and the caribbean,caribbean 20 | 
bahrain,bh,bhr,asia,western asia, 21 | bangladesh,bd,bgd,asia,southern asia, 22 | barbados,bb,brb,americas,latin america and the caribbean,caribbean 23 | belarus,by,blr,europe,eastern europe, 24 | b!elgium,be,bel,europe,western europe, 25 | belize,bz,blz,americas,latin america and the caribbean,central america 26 | benin,bj,ben,africa,sub-saharan africa,western africa 27 | bermuda,bm,bmu,americas,northern america, 28 | bhutan,bt,btn,asia,southern asia, 29 | azerbaijan,az,aze,asia,western asia, 30 | bolivia,bo,bol,americas,latin america and the cari$bbean,south america 31 | "bonaire, sint eustatius and saba",bq,bes,americas,latin america and the caribbean,caribbean 32 | bosnia and herze$govina,ba,bih,europe,southern europe, 33 | botswana,bW,bwa,africa,sub-saharan africa,southern africa 34 | bouvet island,bv,bvt,americas,latin america and the caribbean,south america 35 | brazil,br,bra,americas,latin america and the caribbean,south america 36 | british indian ocean territory,io,iot,africa,sub-saharan africa,eastern africa 37 | brunei darussalam,bn,brn,asia,south-eastern asia, 38 | bulgaria,bg,bgr,europe,eastern europe, 39 | bu$rkina faso,bf,bfa,africa,sub-saharan africa,western africa 40 | burundi,bi,bdi,africa,sub-saharan africa,eastern africa 41 | cabo verde,cv,cpv,africa,sub-saharan africa,west$ern africa 42 | cambodia,kh,khm,asia,south-eastern asia, 43 | cyprus,cy,cyp,asia,western asia, 44 | cameroon,c$m,cmr,africa,sub-saharan africa,middle africa 45 | canada,ca,can,americas,northern america, 46 | cayman islands,ky,cym,americas,latin america and the caribbean,caribbean 47 | central african republic,cf,caf,africa,sub-saharan africa,middle africa 48 | chad,td,tcd,africa,sub-saharan africa,middle africa 49 | chile,cl,chl,americas,latin america and the caribbean,south america 50 | china ,cn,chn,asia,eastern asia, 51 | christmas island,cx,cxr,oceania,australia and new zealand, 52 | cocos islands,cc,cck,oceania,australia and new z?ealand, 53 | 
colombia,co,col,americas,latin america and the caribbean,south america 54 | com!oros,km,com,africa,sub-saharan africa,eastern africa 55 | republic of congo,cg,cog,africa,sub-saharan africa,middle africa 56 | democratic republic of the congo,cd,cod,africa,sub-saharan africa,middle africa 57 | cook islands,ck,cok,oceania,polynesia, 58 | costa rica,cr,cri,americas,latin america and the caribbean,central america 59 | ivory coast,ci,civ,africa,sub-saharan africa,western africa 60 | !!croatia,Hr,hrv,europe,southern europe, 61 | cuba,cu,cub,americas,latin america and the caribbean,caribbean 62 | curacao,cw,cuw,americas,latin a!merica and the caribbean,caribbean 63 | cyprus,cy,cyp,asia,western asia, 64 | czechia,cz,cze,europe,eastern europe, 65 | denmark,dk,dnk,europe,northern europe, 66 | djibouti,dj,dji,africa,sub-saharan africa,eastern africa 67 | dominica,dm,dma,americas,latin america and the caribbean,caribbean 68 | dominican republic,do,dom,americas,latin america and the caribbean,caribbean 69 | ecua$dor,ec,ecu,americas,latin america and the caribbean,south america 70 | egypt,eg,egy,africa,northern africa, 71 | el salvador,sv,slv,americas,latin america and the caribbean,central america 72 | equatorial guinea,gq,gnq,africa,sub-saharan africa,middle africa 73 | eritrea,er,eri,africa,sub-saharan africa,eastern africa 74 | estonia,ee,est,europe,northern europe, 75 | eswatini,s?z,swz,africa,sub-saharan africa,southern africa 76 | ethiopia,!et,eth,africa,sub-saharan africa,eastern africa 77 | falkland islands (malvinas),fk,flk,americas,latin america and the caribbean,south america 78 | faroe islands,fo,fro,europe,northern europe, 79 | fiji,fj,fji,oceania,melanesia, 80 | finland,fi,fin,europe,northern europe, 81 | france,fr,fra,europe,western europe, 82 | french guiana ,gf,guf,americas,latin america and the caribbean,south america 83 | french polynesia,pf,pyf,oceania,polynesia, 84 | macao,mo,mac,asia,eastern asia, 85 | french southern territories,tf,atf,africa,sub-saharan 
africa,eastern africa 86 | gabon,ga,gab,africa,sub-saharan africa,middle africa 87 | gambia,gm,gmb,africa,sub-saharan africa,western africa 88 | georgia,g!e,geo,asia,western asia, 89 | germany,de,deu,europe,western europe, 90 | ghana,gh,gha,africa,sub-saharan africa,western africa 91 | gibraltar,gi,gib,europe,southern europe, 92 | !china,cn,chn,asia,eastern asia, 93 | greece,gr,grc,europe,southern europe, 94 | greenland,gl,grl,americas,northern america, 95 | grenada,gd,grd,americas,la?tin america and the caribbean,caribbean 96 | guadeloupe,gp,glp,americas,latin america and the caribbean,caribbean 97 | guam,gu,gum,oceania,micronesia, 98 | guatemala,gt,gtm,americas,latin america and the caribbean,central america 99 | guinea,gn,gin,africa,sub-saharan africa,western africa 100 | guinea-bissau,gw,gnb,africa,sub-saharan africa,western africa 101 | guy$ana,gY,guy,americas,latin america and the caribbean,south america 102 | haiti,ht,hti,americas,latin america and the caribbean,carib!bean 103 | holy see,va,vat,europe,southern europe, 104 | honduras,hn,hnd,americas,latin america and the caribbean,central america 105 | hong kong,hk,hkg,asia,eastern asia, 106 | hungary,hu,hun,europe,eastern europe, 107 | iceland,is,isl,europe,northern europe, 108 | india,i!n,ind,asia,southern asia, 109 | indonesia,id,idn,asia,south-eastern asia, 110 | iran,ir,irn,asia,southern asia, 111 | iraq,iq,irq,asia,western asia, 112 | ireland,ie,irl,europe,northern europe, 113 | Ireland,ie,irl,europe,northern europe, 114 | isle of man,im,imn,europe,northern europe, 115 | israel,il,isr,asia,western asia, 116 | italy,it,ita,europe,southern europe, 117 | china,cn,chn,asia,eastern asia, 118 | jamaica,jm,jam,americas,latin america and the caribbean,caribbean 119 | japan,jp,jpn,asia,eastern asia, 120 | jersey,je,jey,europe,northern europe,channel islands 121 | jordan ,jo,jor,asia,western asia, 122 | kazakhstan ,kz,kaz,asia,central asia, 123 | kenya,ke,ken,africa,sub-saharan africa,eastern africa 124 | 
kiribati,ki,kir,oc!eania,micronesia, 125 | north korea,kp,prk,asia,eastern asia, 126 | south korea,kr,kor,asia,eastern asia, 127 | kUWait,kw,kwt,asia,western asia, 128 | kyrgyzstan,kg,kgz,asia,central asia, 129 | laos,la,lao,asia,south-eastern asia, 130 | latvia,lv,lva,europe,northern europe, 131 | lebanon,lb,lbn,asia,western asia, 132 | lesotho,ls,lso,africa,sub-saharan africa,southern africa 133 | liberia,lr,lbr,africa,sub-saharan africa,western africa 134 | libya,ly,lby,afr?ica,northern africa, 135 | liechtenstein,li,lie,europe,western europe, 136 | li!thuania,lt,ltu,europe,northern europe, 137 | luxembourg,lu,lux,europe,western europe, 138 | macao,mo,mac,asia,eastern asia, 139 | kUWAit,kw,kwt,asia,western asia, 140 | madagascar?!,mg,mdg,africa,sub-saharan africa,eastern africa 141 | malawi,mw,mwi,africa,sub-saharan africa,eastern africa 142 | malaysia,my,mys,asia,south-eastern asia, 143 | maldives,mv,mdv,asia,southern asia, 144 | mali,ml,mli,africa,sub-saharan africa,western africa 145 | malta ,mt,mlt,europe,southern europe, 146 | marshall islands,mh,mhl,oceania,micronesia, 147 | martinique,mq,mtq,americas,latin america and the caribbean,caribbean 148 | mauritania,mr,mrt,africa,sub-saharan africa,western africa 149 | mauritius,mu,mus,africa,sub-saharan africa,eastern africa 150 | mayotte,yt,myt,africa,sub-saharan africa,eastern africa 151 | libYa,ly,lby,africa,northern africa, 152 | mexico,mx,mex,americas,latin america and the caribbean,central america 153 | micronesia (federated states of),fm,fsm,oceania,micronesia, 154 | "moldova, republic of",md,mda,europe,eastern europe, 155 | monaco,mc,mco,euro#pe,western europe, 156 | mongolia,mn,mng,asia,eastern asia, 157 | mo#ntenegro,me,mne,europe,southern europe, 158 | montserrat,ms,msr,americas,latin america and the caribbean,caribbean 159 | morocco,ma,mar,africa,northern africa, 160 | mozambique,mz,moz,africa,sub-saharan africa,eastern africa 161 | myanmar,mm,mmr,asia,south-eastern asia, 162 | 
nAmibia,na,nam,africa,sub-saharan africa,southern africa 163 | nauru ,nr,nru,oceania,micronesia, 164 | nepal,np,npl,asia,southern asia, 165 | marshall islands,mh,mhl,oceania,micronesia, 166 | netherlands,nl,nld,europe,western europe, 167 | new caledonia,nc,ncl,oceania,melanesia, 168 | new zea!land,nz,nzl,oceania,australia and new zealand, 169 | nicaragua,ni,nic,americas,latin america and the caribbean,central america 170 | niger,ne,ner,africa,sub-saharan africa,western africa 171 | nigeria,ng,nga,africa,sub-saharan africa,western africa 172 | niue,nu,niu,oceania,polynesia, 173 | norfolk island,nf,nfk,oceania,australia and new zealand, 174 | reunion,re,reu,africa,sub-saharan africa,eastern africa 175 | north macedonia,mk,mkd,europe,southern europe, 176 | sAudi arabia,sa,sau,asia,western asia, 177 | northern mariana islands,mp,mnp,oceania,micronesia, 178 | norway,no,nor,europe,northern europe, 179 | !!oman,om,omn,asia,western asia, 180 | pa$kistan,pk,pak,asia,southern asia, 181 | palau,pw,plw,oceania,micronesia, 182 | panama,pa,pan,americas,latin america and the caribbean,central america 183 | papua new guinea,pg,png,oceania,melanesia, 184 | paraguay,py,pry,americas,latin america and the caribbean,south america 185 | peru,pe,per,americas,latin america and the caribbean,south america 186 | philippines,ph,phl,asia,south-eastern as$ia, 187 | pitcairn,pn,pcn,oceania,polynesia, 188 | poland,pl,pol,europe,eastern europe, 189 | portugal,pt,prt,europe,southern europe, 190 | philippines ,p?h,phl,asia,south-eastern asia, 191 | puerto rico,pr,pri,americas,latin america and the caribbean,caribbean 192 | qatar,qa,qat,asia,western asia, 193 | r!eunion,re,reu,africa,sub-saharan africa,eastern africa 194 | romania,RO,rou,europe,eastern europe, 195 | russian federation,ru,rus,europe,eastern europe, 196 | rwanda,rw,rwa,africa,sub-saharan africa,eastern africa 197 | saint barthelemy,bl,blm,americas,latin america and the caribbean,caribbean 198 | "saint helena, ascension and tristan da 
cunha",sh,shn,africa,sub-saharan africa,western africa 199 | saint kitts and nevis,kn,kna,americas,latin america and the caribbean,caribbean 200 | saint lucia,lc,lca,americas,latin america and the caribbean,caribbean 201 | saint martin (french part),mf,maf,americas,latin america and the caribbean,caribbean 202 | saint pierre and miquelon,pm,spm,americas,northern america, 203 | $saint vincent and the grenadines,vc,vct,americas,latin america and the caribbean,caribbean 204 | samoa,ws,wsm,oceania,polynesia, 205 | san marino,sm,smr,europe,southern europe, 206 | sao tome and principe,st,stp,africa,sub-saharan africa,middle africa 207 | saudi Arabia,sa,sau,asia,western asia, 208 | Senegal,sn,sen,africa,sub-saharan africa,western africa 209 | serbia,rs,srb,europe,southern europe, 210 | seychelles,sc,syc,africa,sub-saharan africa,eastern africa 211 | sierra leone,sl,sle,africa,sub-saharan africa,western africa 212 | zimbabwe?,zw,zwe,africa,sub-saharan africa,eastern africa 213 | singapore,sg,sgp,asia,south-eastern asia, 214 | sint ma!arten (dutch part),sx,sxm,americas,latin america and the caribbean,caribbean 215 | slovakia,sk,svk,europe,eastern europe, 216 | slovenia,si,svn,europe,southern europe, 217 | solomon islands,sb,slb,oceania,melanesia, 218 | somalia,so,som,africa,sub-saharan africa,eastern africa 219 | south africa,Za,zaf,africa,sub-saharan africa,s$outhern africa 220 | south georgia and the south sandwich islands,gs,sgs,americas,latin america and the caribbean,south america 221 | south sudan,ss,ssd,africa,sub-saharan africa,eastern africa 222 | spain,es,esp,europe,southern europe, 223 | sri lanka,lk,lka,asia,southern asia, 224 | sudan,sd,sdn,africa,northern africa, 225 | suriname,sr,sur,americas,latin america and the caribbean,south america 226 | sweden,se,swe,europe,northern europe, 227 | switzer*land,ch,che,euro?pe,western europe, 228 | syrian arab republic,sy,syr,asia,western asia, 229 | taiwan,tw,twn,asia,eastern asia, 230 | tajikistan,tj,tjk,asia,central 
asia, 231 | "tanzania, united republic of",tz,tza,africa,sub-saharan africa,eastern africa 232 | thailand,th,tha,asia,south-eastern asia, 233 | timor-leste,tl!,tls,asia,south-eastern asia, 234 | togo,tg,tgo,africa,sub-saharan africa,western africa 235 | tonga,to,ton,oceania,polynesia, 236 | trinidad and tobago,tt,tto,americas,latin america and the caribbean,caribbean 237 | tunisia,tn,tun,africa,northern africa, 238 | tur?key,tr,tur,asia,western asia, 239 | turkmenistan,tm,tkm,asia,central asia, 240 | turks and caicos islands,tc,tca,americas,latin america and the caribbean,caribbean 241 | tuvalu,tv,tuv,oceania,polynesia, 242 | uganda,ug,uga,africa,sub-saharan africa,eastern africa 243 | serbia,rs,srb,eur!ope,southern eur?ope, 244 | ukraine,ua,ukr,europe,eastern europe, 245 | united arab emirates,ae,are,asia,western asia, 246 | united kingdom,gb,gbr,europe,northern europe, 247 | united states of america,us,usa,americas,northern america, 248 | uruguay,uy,ury,americas,latin america and the caribbean,south america 249 | uzbe?kistan,uz,uzb,asia,central asia, 250 | vanuatu,vu,vut,oceania,melanesia, 251 | *venezuela (bolivarian republic of),ve,ven,americas,latin america and the caribbean,south america 252 | viet nam,vn,vnm,asia,south-eastern asia, 253 | virgin islands (british),vg,vgb,americas,latin america and the caribbean,caribbean 254 | virgin islands (u.s.),vi,vir,americas,latin america and the caribbean,caribbean 255 | wallis and futuna,wf,wlf,oceania,polynesia, 256 | yemen,ye,yem,asia,western asia, 257 | zambia#,zm,zmb,africa,sub-saharan africa,eastern africa 258 | zim$babwe,zw,zwe,africa,sub-saharan africa,eastern africa 259 | -------------------------------------------------------------------------------- /source_data/csv_data/currencies.csv: -------------------------------------------------------------------------------- 1 | country_code_2,currency_name,currency_code 2 | aF,afghani,afn 3 | al,lek,all 4 | dz,algerian dinar,dzd 5 | as,us dollar,usd 6 | ad,euro,eur 
7 | ao,kwanza,aoa 8 | ai,east caribbean dollar,xcd 9 | ag,east caribbean dollar,xcd 10 | ar,argentine peso,ars 11 | am,armenian dram,amd 12 | aw,aruban florin,awg 13 | au,australian $dollar,aud 14 | at,euro,eur 15 | az,azerbaijan manat,azn 16 | bs,bahamian dollar,bsd 17 | bh,bahraini dinar,bhd 18 | bd,taka,bdt 19 | bb,barbados dollar,bbd 20 | by,belarusian ruble,byn 21 | be,euro,eur 22 | bz,belize dollar,bzd 23 | bj,cfa franc bceao,xof 24 | !bm,bermudian dollar,bmd 25 | bt,indian rupee,inr 26 | bt,ngultrum,btn 27 | bo,boliviano,bob 28 | bo,mvdol,bov 29 | ba,convertible mark,bam 30 | bw,pula,bwp 31 | br,brazilian real,brl 32 | io,us dollar,usd 33 | bn,brunei dollar,bnd 34 | bg,bulgarian lev,bgn 35 | bf,cfa franc bceao,xof 36 | bi,burundi franc,bif 37 | cv,cabo verde escudo,cve 38 | kh,riel,khr 39 | cm,cfa franc beac,xaf 40 | ca,canadian dollar,cad 41 | ky,cayman islands dollar,kyd 42 | cf,cfa franc beac,xaf 43 | td,cfa franc beac,xaf 44 | cl,chilean peso,clp 45 | cl,unidad de fomento,clf 46 | cn,yuan renminbi,cny 47 | cx,australian dollar,aud 48 | cc,australian dollar,aud 49 | co,colombian peso,cop 50 | co,unidad de valor real,cou 51 | km,comorian franc ,kmf 52 | cd,congolese franc,cdf 53 | cg,cfa franc beac,xaf 54 | ck,new zealand dollar,nzd 55 | cr,costa rican colon,crc 56 | ci,cfa franc bceao,xof 57 | hr,euro,eur 58 | cU,cuban peso,cup 59 | cu,peso convertible,cuc 60 | cw,netherlands antillean guilder,ang 61 | cy,euro,eur 62 | cz,czech koruna,czk 63 | dk,danish krone,dkk 64 | dj,djibouti franc,djf 65 | dm,east caribbean dollar,xcd 66 | do,dominican peso,dop 67 | ec,us $dollar,usd 68 | eg,egyptian pound,egp 69 | sv,el salvador colon,svc 70 | sv,us dollar,usd 71 | gq,cfa franc beac,xaf 72 | er,nakfa,ern 73 | ee,euro,eur 74 | sz,lilangeni,szl 75 | et,ethiopian birr,etb 76 | fk,falkland islands pound,fkp 77 | fo,danish krone,dkk 78 | fj,fiji dollar,fjd 79 | fi,euro,eur 80 | fr,euro,eur 81 | gf,euro,eur 82 | pf,cfp franc,xpf 83 | Ga,cfa franc beac,xaf 84 | 
gm,dalasi,gmd 85 | ge,lari,gel 86 | de,euro,eur 87 | gh,ghana cedi,ghs 88 | gi,gibraltar pound,gip 89 | gr,euro,eur 90 | gl,danish krone,dkk 91 | gd,east caribbean dollar,xcd 92 | gp,euro,eur 93 | gu,us dollar,usd 94 | gt,quetzal,gtq 95 | gn,guinean franc,gnf 96 | gw,cfa franc bceao,xof 97 | gn,guyana dollar,gyd 98 | ht,gourde,htg 99 | ht,us dollar,usd 100 | va,euro,eur 101 | hn,lempira,hnl 102 | hk,hong kong $dollar,hkd 103 | hu,forint,huf 104 | is,iceland krona,isk 105 | in,indian rupee,inr 106 | id!,rupiah,idr 107 | ir,iranian rial,irr 108 | iq,iraqi dinar,iqd 109 | ie,euro,eur 110 | im,pound sterling,gbp 111 | il,new israeli sheqel,ils 112 | it,euro,eur 113 | jm,jamaican dollar,jmd 114 | jp,yen,jpy 115 | je,pound sterling,gbp 116 | jo,jordanian dinar,jod 117 | kz,tenge,kzt 118 | ke,kenyan shilling,kes 119 | ki,australian dollar,aud 120 | kp,north korean won,kpw 121 | kr,won,krw 122 | kw,kuwaiti dinar,kwd 123 | kg,som,kgs 124 | la,lao kip,lak 125 | lv,euro,eur 126 | lb,lebanese pound,lbp 127 | ls,loti,lsl 128 | ls,rand,zar 129 | lr,liberian dollar,lrd 130 | ly,libyan dinar,lyd 131 | li,swiss franc,chf 132 | !!lt,euro,eur 133 | lu,euro,eur 134 | mo,pataca,mop 135 | mk,denar,mkd 136 | mg,malagasy ariary,mga 137 | mw,malawi !kwacha,mwk 138 | my,malaysian ringgit,myr 139 | mv,rufiyaa,mvr 140 | ml,cfa franc bceao,xof 141 | mt,euro,eur 142 | mh,us dollar,usd 143 | mq,euro,eur 144 | mr,ouguiya,mru 145 | mu,mauritius rupee,mur 146 | yt,euro,eur 147 | mx,mexican peso,mxn 148 | mx,mexican unidad de inversion (udi),mxv 149 | fm,us dollar,usd 150 | md,moldovan leu,mdl 151 | mc,euro,eur 152 | mN,tugrik,mnt 153 | me,euro,eur 154 | ms,east caribbean dollar,xcd 155 | ma,moroccan dirham,mad 156 | mz,mozambique metical,mzn 157 | mm,kyat,mmk 158 | na,namibia dollar,nad 159 | na,rand,zar 160 | nr,australian dollar,aud 161 | np,nepalese $rupee,npr 162 | nl,euro,eur 163 | nc,cfp franc,xpf 164 | nz,new zealand dollar,nzd 165 | ni,cordoba oro,nio 166 | ni,cfa franc bceao,xof 167 | 
ng,naira,ngn 168 | nu,new zealand dollar,nzd 169 | nf,australian dollar,aud 170 | mp,us dollar,usd 171 | no,norwegian krone,nok 172 | om,rial omani,omr 173 | pk,pakistan rupee,pkr 174 | pw,us dollar,usd 175 | pa,balboa,pab 176 | PA,us dollar,usd 177 | pg,kina,pgk 178 | py,guarani,pyg 179 | pe,sol,pen 180 | ph,philippine peso,php 181 | pn,new zealand dollar,nzd 182 | pl,zloty,pln 183 | pt,euro,eur 184 | pr,us dollar,usd 185 | qa,qatari rial,qar 186 | re,euro,eur 187 | ro,roman?ian leu,ron 188 | ru,russian ruble,rub 189 | rw,rwanda franc,rwf 190 | bl,euro,eur 191 | sh,saint helena pound,shp 192 | kn,east caribbean dollar,xcd 193 | ls,east caribbean dollar,xcd 194 | mf,euro,eur 195 | pm,euro,eur 196 | vc,east caribbean dollar,xcd 197 | ws,tala,wst 198 | sm,euro,eur 199 | st,dobra,stn 200 | sa,saudi riyal,sar 201 | sn,cfa franc bceao,xof 202 | rs!,serbian dinar,rsd 203 | sc,seychelles rupee,scr 204 | sl,leone,sll 205 | sl,leone,sle 206 | sg,singapore dollar,sgd 207 | sx,netherlands antillean guilder,ang 208 | sk,euro,eur 209 | sk,euro,eur 210 | sb,solomon islands dollar,sbd 211 | so,somali shilling,sos 212 | za,rand,zar 213 | gs,no universal currency, 214 | ss,south sudanese pound,ssp 215 | es,euro,eur 216 | lk,sri lanka rupee,lkr 217 | sd,sudanese ?pound,sdg 218 | sr,surinam dollar,srd 219 | se,swedish krona,sek 220 | ch,swiss franc,chf 221 | ch,wir euro,che 222 | ch,wir franc,chw 223 | sy,syrian pound,syp 224 | tw,new taiwan dollar,twd 225 | tj,somoni,tjs 226 | tz,tanzanian shilling,tzs 227 | th,baht,thb 228 | tl,us dollar,usd 229 | Tg,cfa franc bceao,xof 230 | to,pa'anga,top 231 | tt,trinidad and tobago dollar,ttd 232 | tn,tunisian dinar,tnd 233 | tr,turkish lira,try 234 | tm,turkmenistan new manat,tmt 235 | tc,us dollar,usd 236 | tv,australian dollar,aud 237 | ug,uganda shilling,ugx 238 | uA,hryvnia,uah 239 | ae,uae dirham,aed 240 | gb,pound sterling,gbp 241 | us,us dollar,usd 242 | uy,peso uruguayo,uyu 243 | uy,uruguay peso en unidades indexadas (ui),uyi 244 | 
uy,unidad previsional,uyw 245 | uz,uzbekistan sum,uzs 246 | vu,vatu,vuv 247 | ve,bolivar soberano,ves 248 | ve,bolivar soberano,ved 249 | vn,dong,vnd 250 | vg,us d$ollar,usd 251 | vi,us dollar,usd 252 | wf,cfp franc,xpf 253 | ye,yemeni rial,yer 254 | zm,zambian kwacha,zmw 255 | zw,zimbabwe dollar,zwl 256 | -------------------------------------------------------------------------------- /source_data/csv_data/languages.csv: -------------------------------------------------------------------------------- 1 | language,country_code_2 2 | pashto,af 3 | persian,af 4 | u?zbek,af 5 | albanian,aL 6 | english,al 7 | arabic,dz 8 | french,dz 9 | kabyle,dz 10 | english,as 11 | catalan,ad 12 | english,ad 13 | lingala,ao 14 | portuguese,ao 15 | english,ai 16 | spanish,ai 17 | english,ag 18 | spanish,ag 19 | english,ar 20 | spanish,ar 21 | armenian,am 22 | dutch,aw 23 | spanish,aw 24 | english,au 25 | english,at 26 | german,at 27 | azerbaijani,az 28 | english,bs 29 | spanish,bs 30 | arabic,bh 31 | bangla,bd 32 | chakma,bD 33 | english,bd 34 | english,bb 35 | spanish,bb 36 | be?larusian,by 37 | russian,by 38 | dutch,be 39 | english,be 40 | french,be 41 | german,be 42 | walloon,be 43 | english,bz 44 | spanish,bz 45 | french,bj 46 | yoruba,bj 47 | english,bm 48 | spanish,bm 49 | dzongkha,bt 50 | quechua,bo 51 | spanish,bo 52 | bosnian,ba 53 | croatian,ba 54 | english,ba 55 | serbian,ba 56 | english,bw 57 | tswana,bw 58 | english,br 59 | portuguese,br 60 | spanish,br 61 | english,io 62 | english,vg 63 | spanish,vg 64 | malay,bn 65 | malay,bn 66 | bulgarian,bg 67 | english,bg 68 | french,bf 69 | fulah,bf 70 | english,bi 71 | french,bi 72 | rundi,bi 73 | khmer,kh 74 | aghem,cm 75 | bafia,cm 76 | basaa,cm 77 | duala,cm 78 | english,cm 79 | ewondo,cm 80 | french,cm 81 | fulah,cm 82 | kako,cm 83 | kwasio,cm 84 | meta,cm 85 | mundang,cm 86 | ngiemboon,cm 87 | ngomba,cm 88 | yangben,cm 89 | english,ca 90 | french,ca 91 | inuktitut,ca 92 | mohawk,ca 93 | spanish,ca 94 | kabuverdianu,cv 95 | 
portuguese,cv 96 | dutch,bq 97 | spanish,bq 98 | english,ky 99 | spanish,ky 100 | french,cf 101 | lingala,cf 102 | sango,cf 103 | arabic,td 104 | french,td 105 | english,cl 106 | mapuche,cl 107 | spanish,cl 108 | cantonese,cn 109 | chinese,cn 110 | chinese,cn 111 | english,cn 112 | sichuan yi,cn 113 | tibetan,cn 114 | uyghur,cn 115 | english,cx 116 | english,cc 117 | english,co 118 | spanish,co 119 | arabic,km 120 | french,km 121 | french,cg 122 | lingala,cg 123 | french,cd 124 | lingala,cd 125 | luba-katanga,cd 126 | swahili,cd 127 | english,ck 128 | spanish,cr 129 | french,ci 130 | croatian,hr 131 | english,hr 132 | spanish,cu 133 | dutch,cw 134 | spanish,cw 135 | english,cy 136 | greek,cy 137 | turkish,cy 138 | czech,cz 139 | english,cz 140 | danish,dk 141 | english,dk 142 | faroese,dk 143 | arabic,dj 144 | french,dj 145 | somali,dj 146 | english,dm 147 | spanish,dm 148 | spanish,do 149 | quechua,ec 150 | spanish,ec 151 | arabic,eg 152 | spanish,sv 153 | french,gq 154 | portuguese,gq 155 | spanish,gq 156 | arabic, er 157 | blin,er 158 | english,er 159 | geez,er 160 | tigre,er 161 | tigrinya,er 162 | english,ee 163 | estonian,ee 164 | english,sz 165 | swati,sz 166 | amharic,et 167 | geez,et 168 | oromo,et 169 | somali,et 170 | tigrinya,et 171 | wolaytta,et 172 | english,fk 173 | spanish,fk 174 | faroese,fo 175 | english,fj 176 | english,fi 177 | finnish,fi 178 | inari sami,fi 179 | northern sami,fi 180 | swedish,fi 181 | breton,fr 182 | catalan,fr 183 | corsican,fr 184 | english,fr 185 | french,fr 186 | occitan,fr 187 | portuguese,fr 188 | swiss german,fr 189 | french,gf 190 | spanish,gf 191 | french,pf 192 | french,ga 193 | english,gm 194 | fulah,gm 195 | georgian,ge 196 | ossetic,ge 197 | colognian,de 198 | english,de 199 | german,de 200 | low german,de 201 | lower sorbian,de 202 | upper sorbian,de 203 | akan,gh 204 | english,gh 205 | ewe,gh 206 | fulah,gh 207 | ga,gh 208 | hausa,gh 209 | english,gi 210 | english,gr 211 | greek,gr 212 | danish,gl 213 | 
kalaallisut,gl 214 | spanish,gl 215 | english,gd 216 | spanish,gd 217 | french,gp 218 | spanish,gp 219 | e!nglish,gu 220 | spanish,gt 221 | fulah,gw 222 | portuguese,gw 223 | french,gn 224 | fulah,gn 225 | kpelle,gn 226 | english,gy 227 | spanish,gy 228 | french,ht 229 | spanish,ht 230 | spanish,hn 231 | can#tonese,hk 232 | chinese,hk 233 | chinese,hk 234 | english,hk 235 | english,hu 236 | hungarian,hu 237 | english,is 238 | icelandic,is 239 | assamese,in 240 | bangla,in 241 | bodo,in 242 | chakma,in 243 | english,in 244 | gujarati,in 245 | hindi,in 246 | kannada,in 247 | kashmiri,in 248 | konkani,in 249 | malayalam,in 250 | manipuri,in 251 | marathi,in 252 | nepali,in 253 | odia,in 254 | punjabi,in 255 | sanskrit,in 256 | santali,in 257 | tamil,in 258 | telugu,in 259 | tibetan,in 260 | urdu,in 261 | english,id 262 | indonesian,id 263 | javanese,id 264 | "kurdish, sorani",ir 265 | mazanderani,ir 266 | northern luri,ir 267 | persian,ir 268 | arabic,iq 269 | "kurdish, sorani",iq 270 | northern luri,iq 271 | syriac,iq 272 | english,ie 273 | irish,ie 274 | english,iM 275 | manx,im 276 | arabic,il 277 | english,il 278 | hebrew,il 279 | catalan,it 280 | english,it 281 | friulian,it 282 | german,it 283 | italian,it 284 | sardinian,it 285 | sicilian,it 286 | english,jm 287 | english,jp 288 | japanese,jp 289 | english,je 290 | arabic,jo 291 | kazakh,kz 292 | russian,kz 293 | embu,ke 294 | english,ke 295 | gusii,ke 296 | kalenjin,ke 297 | kamba,ke 298 | kikuyu,ke 299 | luo,ke 300 | luyia,ke 301 | masai,ke 302 | meru,ke 303 | oromo,ke 304 | samburu,ke 305 | somali,ke 306 | swahili,ke 307 | taita,ke 308 | teso,ke 309 | english,ki 310 | arabic,kw 311 | kyrgyz,kg 312 | russian,kg 313 | lao,la 314 | english,lv 315 | latvian,lv 316 | arabic,lb 317 | english,ls 318 | southern sotho,ls 319 | english,lr 320 | fulah,lr 321 | kpelle,lr 322 | vai,lr 323 | arabic,ly 324 | german,li 325 | swiss german,li 326 | english,lt 327 | lithuanian,lt 328 | english,lu 329 | french,lu 330 | 
german,lu 331 | luxembourgish,lu 332 | portuguese,lu 333 | chinese,mo 334 | chinese,mo 335 | english,mo 336 | portuguese,mo 337 | english,mg 338 | french,mg 339 | malagasy,mg 340 | english,mw 341 | nyanja,mw 342 | english,my 343 | malay,my 344 | tamil,my 345 | dhivehi,mv 346 | english,mv 347 | bambara,ml 348 | french,ml 349 | koyra chiini,ml 350 | koyraboro senni,ml 351 | english,mt 352 | maltese,mt 353 | english,mh 354 | french,mq 355 | spanish,mq 356 | arabic,mr 357 | french,mr 358 | fulah,mr 359 | english,mu 360 | french,mu 361 | morisyen,mu 362 | f#rench,yt 363 | english,mx 364 | spanish,mx 365 | english,fm 366 | romanian,md 367 | russian,md 368 | french,mc 369 | mongolian,mn 370 | english,me 371 | serbian,me 372 | serbian,me 373 | english,ms 374 | spanish,ms 375 | arabic,ma 376 | french,ma 377 | moroccan tamazight,ma 378 | tachelhit,ma 379 | makhuwa-meetto,mz 380 | portuguese,mz 381 | sena,mz 382 | burmese,mm 383 | english,mm 384 | afrikaans,na 385 | english,na 386 | nama,na 387 | english,nr 388 | nepali,np 389 | dutch,nl 390 | english,nl 391 | low german,nl 392 | western frisian,nl 393 | fre$nch,nc 394 | english,nz 395 | maori,nz 396 | spanish,ni 397 | french,ne 398 | fulah,ne 399 | hausa,ne 400 | tasawaq,ne 401 | zarma,ne 402 | english,ng 403 | fulah,ng 404 | hausa,ng 405 | igbo,ng 406 | jju,ng 407 | tyap,ng 408 | yoruba,ng 409 | english,nu 410 | english,nf 411 | korean,kp 412 | albanian,mk 413 | macedonian,mk 414 | english,mp 415 | english,no 416 | northern sami,no 417 | norwegian nynorsk,no 418 | arabic,om 419 | english,pk 420 | pashto,pk 421 | punjabi,pk 422 | sindhi,pk 423 | urdu,pk 424 | english,pw 425 | spanish,pa 426 | english,pg 427 | guarani,py 428 | spanish,py 429 | quechua,pe 430 | spanish,pe 431 | cebuano,ph 432 | english,ph 433 | filipino,ph 434 | spanish,ph 435 | english,pn 436 | english,pl 437 | polish,pl 438 | english,pt 439 | portuguese,pt 440 | english,pr 441 | spanish,pr 442 | arabic,Qa 443 | french,re 444 | english,ro 445 | romanian,ro 
446 | bashkir,ru 447 | chechen,ru 448 | chuvash,ru 449 | engli$sh,ru 450 | erzya,ru 451 | ossetic,ru 452 | russian,ru 453 | sakha,ru 454 | tatar,ru 455 | english,rw 456 | french,rw 457 | kinyarwanda,rw 458 | english,ws 459 | italian,sm 460 | portuguese,st 461 | arabic,sa 462 | english,sa 463 | french,sn 464 | fulah,sn 465 | jola-fonyi,sn 466 | wolof,sn 467 | english,rs 468 | serbian,rs 469 | english,sc 470 | french,sc 471 | english,sl 472 | fulah,sl 473 | chinese,sg 474 | english,sg 475 | malay,sg 476 | tamil,sg 477 | dutch,sx 478 | english,sx 479 | spanish,sx 480 | english,sk 481 | slovak,sk 482 | english,si 483 | slovenian,si 484 | english,sb 485 | arabic,so 486 | somali,so 487 | afrikaans,za 488 | english,za 489 | northern sotho,za 490 | south ndebele,za 491 | southern sotho,za 492 | swati,za 493 | tsonga,za 494 | tswana,za 495 | ve!nda,za 496 | xhosa,za 497 | zulu,za 498 | english,kr 499 | korean,kr 500 | arabic,ss 501 | english,ss 502 | nuer,ss 503 | asturian,es 504 | basque,es 505 | catalan,es 506 | english,es 507 | galician,es 508 | spanish,es 509 | sinhala,lk 510 | tamil,lk 511 | french,bl 512 | spanish,bl 513 | english,sh 514 | english,kn 515 | spanish,kn 516 | english,lc 517 | spanish,lc 518 | french,mf 519 | spanish,mf 520 | french,pm 521 | spanish,pm 522 | english,vc 523 | spanish,vc 524 | ara$bic,sd 525 | english,sd 526 | dutch,sR 527 | spanish,sr 528 | english,se 529 | northern sami,se 530 | swedish,se 531 | english,ch 532 | french,ch 533 | german,ch 534 | italian,ch 535 | portuguese,ch 536 | romansh,ch 537 | swiss german,ch 538 | walser,ch 539 | arabic,sy 540 | french,sy 541 | syriac,sy 542 | chinese,tw 543 | english,tw 544 | taroko,tw 545 | tajik,tj 546 | asu,tz 547 | bena,tz 548 | english,tz 549 | langi,tz 550 | machame,tz 551 | makonde,tz 552 | masai,tz 553 | ro$mbo,tz 554 | rwa,tz 555 | sangu,tz 556 | shambala,tz 557 | swahili,tz 558 | vunjo,tz 559 | english,th 560 | thai,th 561 | portuguese,tl 562 | ewe,tg 563 | french,tg 564 | english,to 565 | 
tongan,to 566 | english,tt 567 | spanish, tt 568 | arabic,tn 569 | french,tn 570 | english,tr 571 | kurdish,tr 572 | turkish,tr 573 | turkmen,tm 574 | english,tc 575 | spanish,tc 576 | english,tv 577 | english,vi 578 | spanish,vi 579 | chiga,ug 580 | english,ug 581 | g!anda,ug 582 | nyankole,ug 583 | soga,ug 584 | swahili,ug 585 | teso,ug 586 | english,ua 587 | russian,ua 588 | ukrainian,ua 589 | arabic,ae 590 | english,ae 591 | cornish,gb 592 | english,gb 593 | scottish gaelic,gb 594 | welsh,gb 595 | english,us 596 | spanish,uy 597 | uzbek,uz 598 | uzbek,uz 599 | english,vu 600 | french,vu 601 | italian,va 602 | spanish,ve 603 | vietnamese,vn 604 | french,wF 605 | arabic,ye 606 | bemba,zm 607 | en!glish,zm 608 | english,zw 609 | shona,zw 610 | -------------------------------------------------------------------------------- /source_data/csv_output/output.csv: -------------------------------------------------------------------------------- 1 | row_id,insert_date,country_name,city_name,population,languages 2 | 1,2022-04-09,argentina,santa maria,9526,"{english,spanish}" 3 | 2,2022-04-09,brazil,clevelandia,16450,"{spanish,english,portuguese}" 4 | 3,2022-04-09,brazil,conceicao do rio verde,12949,"{english,portuguese,spanish}" 5 | 4,2022-04-09,brazil,curaca,35208,"{english,portuguese,spanish}" 6 | 5,2022-04-09,brazil,japoata,12938,"{spanish,english,portuguese}" 7 | 6,2022-04-09,haiti,petite riviere de nippes,29815,"{spanish,french}" 8 | 7,2022-04-09,peru,puerto pimentel,15552,"{quechua,spanish}" 9 | 8,2022-04-28,argentina,ciudad general belgrano,92957,"{english,spanish}" 10 | 9,2022-04-28,brazil,caete,40750,"{portuguese,spanish,english}" 11 | 10,2022-04-28,brazil,catanduva,119480,"{spanish,english,portuguese}" 12 | 11,2022-04-28,brazil,colider,32120,"{english,spanish,portuguese}" 13 | 12,2022-04-28,brazil,miracema do tocantins,20684,"{spanish,english,portuguese}" 14 | 13,2022-04-28,brazil,presidente medici,18571,"{spanish,english,portuguese}" 15 | 
14,2022-04-28,colombia,coyaima,28443,"{spanish,english}" 16 | 15,2022-04-28,colombia,manati,13456,"{spanish,english}" 17 | 16,2022-04-28,dominican republic,licey al medio,25539,{spanish} 18 | 17,2022-04-28,mexico,amatitan,15344,"{spanish,english}" 19 | 18,2022-04-28,mexico,san juan zitlaltepec,19600,"{spanish,english}" 20 | 19,2022-04-28,mexico,teopisca,20044,"{english,spanish}" 21 | 20,2022-08-11,brazil,charqueadas,35320,"{portuguese,spanish,english}" 22 | 21,2022-08-11,brazil,chopinzinho,19679,"{portuguese,english,spanish}" 23 | 22,2022-08-11,brazil,jaguarao,27931,"{portuguese,spanish,english}" 24 | 23,2022-08-11,ecuador,vinces,30241,"{spanish,quechua}" 25 | 24,2022-08-11,guatemala,el palmar,31706,{spanish} 26 | 25,2022-08-11,nicaragua,villanueva,25660,{spanish} 27 | 26,2022-08-11,puerto rico,guaynabo,70542,"{english,spanish}" 28 | -------------------------------------------------------------------------------- /source_data/scripts/build_tables.sql: -------------------------------------------------------------------------------- 1 | /* 2 | SQL Code Challenge 3 | Author: Jaime M. Shaker 4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 5 | Website: https://www.shaker.dev 6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 7 | 8 | File Name: build_tables.sql 9 | */ 10 | 11 | -- We must ensure that our data is properly organized. Let's create a schema 12 | -- specifically for importing/copying data from our CSV files. 13 | CREATE SCHEMA IF NOT EXISTS import_data; 14 | -- Create import_data.countries table 15 | -- Drop this table every time this script is run to ensure repeatability. 16 | DROP TABLE IF EXISTS import_data.countries; 17 | -- Create a new table and add a country_id to use as a primary key. 18 | -- We will initially insert all the data as TEXT to ensure there aren't any errors during the COPY. 19 | -- We will also create a created_on field that will have the date that the table was created. 
20 | CREATE TABLE import_data.countries ( 21 | -- Create an auto-incrementing, unique key for every row. 22 | -- This will also be used as our Primary Key. 23 | country_id INT GENERATED ALWAYS AS IDENTITY, 24 | country_name TEXT, 25 | country_code_2 TEXT, 26 | country_code_3 TEXT, 27 | region TEXT, 28 | sub_region TEXT, 29 | intermediate_region TEXT, 30 | created_on date, 31 | PRIMARY KEY (country_id) 32 | ); 33 | -- We will use the COPY statement to extract the data from our CSV files. 34 | COPY import_data.countries ( 35 | country_name, 36 | country_code_2, 37 | country_code_3, 38 | region, 39 | sub_region, 40 | intermediate_region 41 | ) 42 | -- PostgreSQL stores its data in /var/lib/postgresql/ 43 | -- In the docker-compose.yaml file, we created a volume named 'source_data/' that our container can access. 44 | FROM '/var/lib/postgresql/source_data/csv_data/countries.csv' 45 | -- The CSV files are comma separated and include headers. 46 | WITH DELIMITER ',' HEADER CSV; 47 | 48 | -- Using the same process, let's create tables for all of our CSV files. 
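-- Optional sanity check (not part of the original script): after a COPY completes,
-- you can verify the import by comparing the table's row count to the number of
-- data rows in the source CSV (total lines minus the header row).
SELECT count(*) AS imported_rows FROM import_data.countries;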
49 | /* Create import.cities table */ 50 | DROP TABLE IF EXISTS import_data.cities; 51 | CREATE TABLE import_data.cities ( 52 | city_id INT GENERATED ALWAYS AS IDENTITY, 53 | city_name TEXT, 54 | latitude TEXT, 55 | longitude TEXT, 56 | country_code_2 TEXT, 57 | capital TEXT, 58 | population TEXT, 59 | insert_date TEXT, 60 | PRIMARY KEY (city_id) 61 | ); 62 | 63 | COPY import_data.cities ( 64 | city_name, 65 | latitude, 66 | longitude, 67 | country_code_2, 68 | capital, 69 | population, 70 | insert_date 71 | ) 72 | FROM '/var/lib/postgresql/source_data/csv_data/cities.csv' 73 | WITH DELIMITER ',' HEADER CSV; 74 | 75 | /* Create import.currencies table */ 76 | DROP TABLE IF EXISTS import_data.currencies; 77 | CREATE TABLE import_data.currencies ( 78 | currency_id INT GENERATED ALWAYS AS IDENTITY, 79 | country_code_2 TEXT, 80 | currency_name TEXT, 81 | currency_code TEXT, 82 | PRIMARY KEY (currency_id) 83 | ); 84 | 85 | COPY import_data.currencies ( 86 | country_code_2, 87 | currency_name, 88 | currency_code 89 | ) 90 | FROM '/var/lib/postgresql/source_data/csv_data/currencies.csv' 91 | WITH DELIMITER ',' HEADER CSV; 92 | 93 | /* Create import.languages table */ 94 | DROP TABLE IF EXISTS import_data.languages; 95 | CREATE TABLE import_data.languages ( 96 | language_id INT GENERATED ALWAYS AS IDENTITY, 97 | language TEXT, 98 | country_code_2 TEXT, 99 | PRIMARY KEY (language_id) 100 | ); 101 | 102 | COPY import_data.languages ( 103 | language, 104 | country_code_2 105 | ) 106 | FROM '/var/lib/postgresql/source_data/csv_data/languages.csv' 107 | WITH DELIMITER ',' HEADER CSV; -------------------------------------------------------------------------------- /source_data/scripts/cleanup_db.sql: -------------------------------------------------------------------------------- 1 | /* 2 | SQL Code Challenge 3 | Author: Jaime M. 
Shaker 4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 5 | Website: https://www.shaker.dev 6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 7 | 8 | File Name: cleanup_db.sql 9 | */ 10 | 11 | -- Drop All import_data Tables and Schema 12 | 13 | DROP TABLE import_data.countries; 14 | DROP TABLE import_data.cities; 15 | DROP TABLE import_data.currencies; 16 | DROP TABLE import_data.languages; 17 | DROP SCHEMA import_data CASCADE; 18 | -------------------------------------------------------------------------------- /source_data/scripts/code_challenge.sql: -------------------------------------------------------------------------------- 1 | /* 2 | SQL Code Challenge 3 | Author: Jaime M. Shaker 4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 5 | Website: https://www.shaker.dev 6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 7 | 8 | File Name: code_challenge.sql 9 | */ 10 | 11 | 12 | /* Question 2. 13 | * 14 | * List all of the regions and the total number of countries in each region. 15 | * Order by country count in descending order and capitalize the region name. 16 | * 17 | */ 18 | 19 | SELECT 20 | -- initcap() capitalizes the first letter of every word in a string. 21 | initcap(region) AS region, 22 | count(*) AS country_count 23 | FROM 24 | cleaned_data.countries 25 | GROUP BY 26 | -- When using an aggregate function like count(), every non-aggregated column must appear in the GROUP BY clause. 27 | region 28 | ORDER BY 29 | country_count DESC; 30 | 31 | /* 32 | 33 | region |country_count| 34 | ---------+-------------+ 35 | Africa | 59| 36 | Americas | 57| 37 | Asia | 50| 38 | Europe | 48| 39 | Oceania | 26| 40 | Antartica| 1| 41 | 42 | */ 43 | 44 | /* Question 3. 45 | * 46 | * List all of the sub-regions and the total number of cities in each sub-region. 47 | * Order by sub-region name alphabetically. 
48 | * 49 | */ 50 | 51 | SELECT 52 | initcap(t1.sub_region) AS sub_region, 53 | count(*) AS city_count 54 | FROM 55 | cleaned_data.countries AS t1 56 | JOIN 57 | cleaned_data.cities AS t2 58 | ON 59 | t1.country_code_2 = t2.country_code_2 60 | GROUP BY 61 | t1.sub_region 62 | ORDER BY 63 | t1.sub_region; 64 | 65 | /* 66 | 67 | sub_region |city_count| 68 | -------------------------------+----------+ 69 | Australia And New Zealand | 329| 70 | Central Asia | 560| 71 | Eastern Asia | 3164| 72 | Eastern Europe | 2959| 73 | Latin America And The Caribbean| 7204| 74 | Melanesia | 60| 75 | Micronesia | 15| 76 | Northern Africa | 1152| 77 | Northern America | 5844| 78 | Northern Europe | 2025| 79 | Polynesia | 22| 80 | Southeastern Asia | 2627| 81 | Southern Asia | 6848| 82 | Southern Europe | 3238| 83 | Subsaharan Africa | 3223| 84 | Western Asia | 1400| 85 | Western Europe | 3952| 86 | 87 | */ 88 | 89 | /* Question 4. 90 | * 91 | * List all of the countries and the total number of cities in the Northern Europe sub-region. 92 | * List the country names in uppercase and order the list by the length of the country name and 93 | * alphabetically in ascending order. 94 | * 95 | */ 96 | 97 | SELECT 98 | -- upper() returns your string in uppercase. 99 | upper(t1.country_name) AS country_name, 100 | count(*) AS city_count 101 | FROM 102 | cleaned_data.countries AS t1 103 | JOIN 104 | cleaned_data.cities AS t2 105 | ON 106 | t1.country_code_2 = t2.country_code_2 107 | WHERE 108 | t1.sub_region = 'northern europe' 109 | GROUP BY 110 | t1.country_name 111 | ORDER BY 112 | -- length() returns the number of characters in a string, including spaces. 
113 | length(t1.country_name), t1.country_name; 114 | 115 | /* 116 | 117 | country_name |city_count| 118 | --------------+----------+ 119 | JERSEY | 1| 120 | LATVIA | 39| 121 | NORWAY | 127| 122 | SWEDEN | 148| 123 | DENMARK | 75| 124 | ESTONIA | 20| 125 | FINLAND | 142| 126 | ICELAND | 12| 127 | IRELAND | 64| 128 | LITHUANIA | 61| 129 | ISLE OF MAN | 2| 130 | FAROE ISLANDS | 29| 131 | UNITED KINGDOM| 1305| 132 | 133 | */ 134 | 135 | /* Question 5. 136 | * 137 | * List all of the countries and the total number of cities in the Southern Europe sub-region that were inserted in 2021. 138 | * Capitalize the country names and order alphabetically by the **LAST** letter of the country name and the number of cities. 139 | * 140 | */ 141 | 142 | SELECT 143 | initcap(t1.country_name) AS country_name, 144 | count(*) AS city_count 145 | FROM 146 | cleaned_data.countries AS t1 147 | JOIN 148 | cleaned_data.cities AS t2 149 | ON 150 | t1.country_code_2 = t2.country_code_2 151 | WHERE 152 | t1.sub_region = 'southern europe' 153 | AND 154 | -- The extract() & date_part() functions allow you to break down dates and timestamps into individual years, months, days, hours, etc. 155 | EXTRACT('year' FROM t2.insert_date) = 2021 156 | GROUP BY 157 | t1.country_name 158 | ORDER BY 159 | -- The substring() function returns a specific portion of a string. 160 | substring(t1.country_name,length(t1.country_name),1), city_count; 161 | 162 | /* 163 | 164 | country_name |city_count| 165 | ----------------------+----------+ 166 | Andorra | 5| 167 | Albania | 11| 168 | Bosnia And Herzegovina| 15| 169 | Croatia | 22| 170 | North Macedonia | 28| 171 | Malta | 32| 172 | Serbia | 58| 173 | Slovenia | 74| 174 | Greece | 64| 175 | Portugal | 109| 176 | Spain | 302| 177 | San Marino | 2| 178 | Montenegro | 12| 179 | Italy | 542| 180 | 181 | */ 182 | 183 | /* Question 6. 
184 | * 185 | * List all of the countries in the region of Asia that did **NOT** have a city 186 | * with an inserted date from June 2021 through Sept 2021. 187 | * 188 | */ 189 | 190 | SELECT 191 | -- Distinct will only return unique values 192 | DISTINCT initcap(t1.country_name) AS country_name 193 | FROM 194 | cleaned_data.countries AS t1 195 | -- Left join will return all rows from the left table (cleaned_data.countries) and 196 | -- only the matching rows from the right table (cleaned_data.cities), resulting in NULLs where 197 | -- there are no matches in the right table. 198 | LEFT JOIN 199 | cleaned_data.cities AS t2 200 | ON 201 | t1.country_code_2 = t2.country_code_2 202 | AND 203 | t2.insert_date BETWEEN '2021-06-01' AND '2021-10-01' 204 | WHERE 205 | t1.region = 'asia' 206 | -- Only return rows that did NOT have a match in the cities table. 207 | AND 208 | t2.country_code_2 IS NULL; 209 | 210 | /* 211 | 212 | country_name | 213 | -----------------+ 214 | Brunei Darussalam| 215 | Kuwait | 216 | Macao | 217 | Singapore | 218 | 219 | */ 220 | 221 | /* Question 7. 222 | * 223 | * List the country, city name, population and city name length for the city names that are palindromes in the 224 | * Western Asia sub-region. Format the population with a thousands separator (1,000) and format the length of 225 | * the city name in roman numerals. Order by the length of the city name in descending order and 226 | * alphabetically in ascending order. 227 | * 228 | */ 229 | 230 | SELECT 231 | initcap(t1.country_name) AS country_name, 232 | initcap(t2.city_name) AS city_name, 233 | -- to_char() takes non-string data types and returns them as strings. 
234 | to_char(t2.population, '999G999') AS population, 235 | to_char(length(t2.city_name), 'RN') AS roman_numeral_length 236 | FROM 237 | cleaned_data.countries AS t1 238 | JOIN 239 | cleaned_data.cities AS t2 240 | ON 241 | t1.country_code_2 = t2.country_code_2 242 | WHERE 243 | t2.city_name = reverse(t2.city_name) 244 | AND 245 | t1.sub_region = 'western asia' 246 | ORDER BY 247 | length(t2.city_name) DESC, t2.city_name ASC; 248 | 249 | /* 250 | 251 | country_name |city_name|population|roman_numeral_length| 252 | --------------------+---------+----------+--------------------+ 253 | Yemen |Hajjah | 46,568 | VI | 254 | Syrian Arab Republic|Hamah | 696,863 | V | 255 | Turkey |Kavak | 21,692 | V | 256 | Turkey |Kinik | 29,803 | V | 257 | Turkey |Tut | 10,161 | III | 258 | 259 | */ 260 | 261 | 262 | /* Question 8. 263 | * 264 | * List all of the countries that end in 'stan'. Make your query case-insensitive and list 265 | * whether the total population of the cities listed is an odd or even number for cities inserted in 2022. 266 | * Order by whether the population value is odd or even in ascending order and country name in alphabetical order. 
267 | * 268 | */ 269 | 270 | SELECT 271 | initcap(t1.country_name) AS country_name, 272 | to_char(sum(t2.population), '99G999G999') total_population, 273 | CASE 274 | WHEN (sum(t2.population) % 2) = 0 275 | THEN 'Even' 276 | ELSE 277 | 'Odd' 278 | END AS odd_or_even 279 | FROM 280 | cleaned_data.countries AS t1 281 | JOIN 282 | cleaned_data.cities AS t2 283 | ON 284 | t1.country_code_2 = t2.country_code_2 285 | WHERE 286 | t1.country_name ILIKE '%stan' 287 | AND 288 | EXTRACT('year' FROM t2.insert_date) = 2022 289 | GROUP BY 290 | t1.country_name 291 | ORDER BY 292 | odd_or_even, country_name; 293 | 294 | /* 295 | 296 | country_name|total_population|odd_or_even| 297 | ------------+----------------+-----------+ 298 | Afghanistan | 6,006,530 |Even | 299 | Kazakhstan | 4,298,264 |Even | 300 | Kyrgyzstan | 1,017,644 |Even | 301 | Pakistan | 26,344,480 |Even | 302 | Tajikistan | 2,720,953 |Odd | 303 | Turkmenistan| 419,607 |Odd | 304 | Uzbekistan | 3,035,547 |Odd | 305 | 306 | */ 307 | 308 | /* Question 9. 309 | * 310 | * List the third most populated city ranked by region WITHOUT using limit or offset. 311 | * List the region name, city name, population and order the results by region. 
312 | * 313 | */ 314 | 315 | WITH get_city_rank_cte AS ( 316 | SELECT 317 | t1.region, 318 | t2.city_name, 319 | t2.population AS third_largest_pop, 320 | DENSE_RANK() OVER (PARTITION BY t1.region ORDER BY t2.population DESC) AS rnk 321 | FROM 322 | cleaned_data.countries AS t1 323 | JOIN 324 | cleaned_data.cities AS t2 325 | ON 326 | t1.country_code_2 = t2.country_code_2 327 | WHERE 328 | t2.population IS NOT NULL 329 | GROUP BY 330 | t1.region, 331 | t2.city_name, 332 | t2.population 333 | ) 334 | SELECT 335 | initcap(region) AS region, 336 | initcap(city_name) AS city_name, 337 | to_char(third_largest_pop, '99G999G999') AS third_largest_pop 338 | FROM 339 | get_city_rank_cte 340 | WHERE 341 | rnk = 3; 342 | 343 | /* 344 | 345 | region |city_name|third_largest_pop| 346 | --------+---------+-----------------+ 347 | Africa |Kinshasa | 12,836,000 | 348 | Americas|New York | 18,972,871 | 349 | Asia |Delhi | 32,226,000 | 350 | Europe |Paris | 11,060,000 | 351 | Oceania |Brisbane | 2,360,241 | 352 | 353 | */ 354 | 355 | /* Question 10. 356 | * 357 | * List the bottom third of all countries in the Western Asia sub-region that speak Arabic. 358 | * Include the row number and country name. Order by row number. 359 | * 360 | */ 361 | 362 | WITH get_ntile_cte AS ( 363 | SELECT 364 | ROW_NUMBER() OVER (ORDER BY t1.country_name) AS rn, 365 | t1.country_name, 366 | -- The ntile() window function distributes rows into a specified number of groups, or 'buckets'. 
367 | NTILE(3) OVER (ORDER BY t1.country_name) AS nt 368 | FROM 369 | cleaned_data.countries AS t1 370 | JOIN 371 | cleaned_data.languages AS t2 372 | ON 373 | t1.country_code_2 = t2.country_code_2 374 | WHERE 375 | t1.sub_region = 'western asia' 376 | AND 377 | t2.language = 'arabic' 378 | ) 379 | SELECT 380 | rn AS row_id, 381 | country_name 382 | FROM 383 | get_ntile_cte 384 | WHERE 385 | nt = 3; 386 | 387 | 388 | /* 389 | 390 | row_id|country_name | 391 | ------+--------------------+ 392 | 9|saudi arabia | 393 | 10|syrian arab republic| 394 | 11|united arab emirates| 395 | 12|yemen | 396 | 397 | */ 398 | 399 | /* Question 11. 400 | * 401 | * Create a query that lists country name, capital name, population, languages spoken and currency name for countries 402 | * in the Northern Africa sub-region. There can be multiple currency names and languages spoken per country. 403 | * Add multiple values for the same field into an array. 404 | * 405 | */ 406 | 407 | WITH get_row_values AS ( 408 | SELECT 409 | t1.country_name, 410 | t2.city_name, 411 | t2.population, 412 | -- array_agg() aggregates multiple values and returns them in 'array' format. 
413 | array_agg(t3.LANGUAGE) AS languages, 414 | t4.currency_name AS currencies 415 | FROM 416 | cleaned_data.countries AS t1 417 | JOIN 418 | cleaned_data.cities AS t2 419 | ON 420 | t1.country_code_2 = t2.country_code_2 421 | JOIN 422 | cleaned_data.languages AS t3 423 | ON 424 | t1.country_code_2 = t3.country_code_2 425 | JOIN 426 | cleaned_data.currencies AS t4 427 | ON 428 | t1.country_code_2 = t4.country_code_2 429 | WHERE 430 | t1.sub_region = 'northern africa' 431 | AND 432 | t2.capital = TRUE 433 | GROUP BY 434 | t1.country_name, 435 | t2.city_name, 436 | t2.population, 437 | t4.currency_name 438 | ) 439 | SELECT 440 | * 441 | FROM 442 | get_row_values; 443 | 444 | /* 445 | 446 | country_name|city_name|population|languages |currencies | 447 | ------------+---------+----------+--------------------------------------------+---------------+ 448 | algeria |algiers | 3415811|{french,arabic,kabyle} |algerian dinar | 449 | egypt |cairo | 20296000|{arabic} |egyptian pound | 450 | libya |tripoli | 1293016|{arabic} |libyan dinar | 451 | morocco |rabat | 572717|{arabic,tachelhit,moroccan tamazight,french}|moroccan dirham| 452 | sudan |khartoum | 7869000|{arabic,english} |sudanese pound | 453 | tunisia |tunis | 1056247|{french,arabic} |tunisian dinar | 454 | 455 | */ 456 | 457 | 458 | /* Question 12. 459 | * 460 | * Produce a query that returns the city names for cities in the U.S. that were inserted on April 28th, 2022. List 461 | * how many vowels and consonants are present in the city name and concatenate their percentage to their respective 462 | * count in parentheses. 463 | * 464 | */ 465 | 466 | WITH get_letter_count AS ( 467 | SELECT 468 | t1.city_name, 469 | length(t1.city_name) string_length, 470 | -- regexp_replace() returns a value that has been manipulated by a regular expression. 
471 | length(regexp_replace(t1.city_name, '[aeiou]', '', 'gi')) AS consonant_count 472 | FROM 473 | cleaned_data.cities AS t1 474 | WHERE 475 | t1.insert_date = '2022-04-28' 476 | AND 477 | t1.country_code_2 in ('us') 478 | ), 479 | get_letter_diff AS ( 480 | SELECT 481 | city_name, 482 | string_length - consonant_count AS vowels, 483 | round(100 * (string_length - consonant_count) / string_length::NUMERIC, 2) AS vowel_perc, 484 | consonant_count AS consonants, 485 | round( 100 * (consonant_count)::NUMERIC / string_length, 2)::float AS consonants_perc 486 | FROM 487 | get_letter_count 488 | ) 489 | SELECT 490 | city_name, 491 | vowels || ' (' || vowel_perc || '%)' AS vowel_count_perc, 492 | consonants || ' (' || consonants_perc || '%)' AS consonants_count_perc 493 | FROM 494 | get_letter_diff 495 | ORDER BY 496 | vowels; 497 | 498 | /* 499 | 500 | city_name |vowel_count_perc|consonants_count_perc| 501 | ---------------+----------------+---------------------+ 502 | standish |2 (25.00%) |6 (75%) | 503 | grand forks |2 (18.18%) |9 (81.82%) | 504 | camano |3 (50.00%) |3 (50%) | 505 | cedar hills |3 (27.27%) |8 (72.73%) | 506 | gladstone |3 (33.33%) |6 (66.67%) | 507 | whitehall |3 (33.33%) |6 (66.67%) | 508 | homewood |4 (50.00%) |4 (50%) | 509 | willowbrook |4 (36.36%) |7 (63.64%) | 510 | port salerno |4 (33.33%) |8 (66.67%) | 511 | vadnais heights|5 (33.33%) |10 (66.67%) | 512 | jeffersonville |5 (35.71%) |9 (64.29%) | 513 | 514 | */ 515 | 516 | /* Question 13. 517 | * 518 | * List the most consecutive inserted dates and the capitalized city names for cities in Canada that were inserted 519 | * in April 2022. 
520 | * 521 | */ 522 | 523 | DROP TABLE IF EXISTS get_dates; 524 | CREATE TEMP TABLE get_dates AS ( 525 | SELECT 526 | DISTINCT ON (insert_date) insert_date AS insert_date, 527 | city_name 528 | FROM 529 | cleaned_data.cities 530 | WHERE 531 | country_code_2 = 'ca' 532 | AND 533 | insert_date BETWEEN '2022-04-01' AND '2022-04-30' 534 | ORDER BY 535 | insert_date 536 | ); 537 | 538 | DROP TABLE IF EXISTS get_diff; 539 | CREATE TEMP TABLE get_diff AS ( 540 | SELECT 541 | city_name, 542 | insert_date, 543 | EXTRACT('day' FROM insert_date) - ROW_NUMBER() OVER (ORDER BY insert_date) AS diff 544 | FROM 545 | get_dates 546 | ); 547 | 548 | 549 | DROP TABLE IF EXISTS get_diff_count; 550 | CREATE TEMP TABLE get_diff_count AS ( 551 | SELECT 552 | city_name, 553 | insert_date, 554 | count(*) OVER (PARTITION BY diff) AS diff_count 555 | FROM 556 | get_diff 557 | ); 558 | 559 | 560 | WITH get_rank AS ( 561 | SELECT 562 | DENSE_RANK() OVER (ORDER BY diff_count desc) AS rnk, 563 | insert_date, 564 | city_name 565 | FROM 566 | get_diff_count 567 | ) 568 | SELECT 569 | insert_date AS most_consecutive_dates, 570 | initcap(city_name) AS city_name 571 | FROM 572 | get_rank 573 | WHERE 574 | rnk = 1 575 | ORDER BY 576 | insert_date; 577 | 578 | /* 579 | 580 | most_consecutive_dates|city_name | 581 | ----------------------+------------+ 582 | 2022-04-22|South Dundas| 583 | 2022-04-23|La Prairie | 584 | 2022-04-24|Elliot Lake | 585 | 2022-04-25|Lachute | 586 | 587 | */ 588 | 589 | /* Question 14. 590 | * 591 | * Create a view that lists the month-year, the number of cities inserted for that month, 592 | * a running city count total and the month over month percentage growth for 2021. 593 | * 594 | * Format the cities count and the running total with the thousands separator 595 | * and format the month over month growth with a plus symbol and percentage symbol. 
596 | * 597 | * Example: Feb-2021 | 1,000 | 2,000 |+60.00% | 598 | * 599 | */ 600 | 601 | DROP VIEW IF EXISTS cleaned_data.view_2021_growth; 602 | CREATE VIEW cleaned_data.view_2021_growth AS ( 603 | WITH get_month_count AS ( 604 | SELECT 605 | date_trunc('month', insert_date) as single_month, 606 | count(*) AS monthly_count 607 | FROM 608 | cleaned_data.cities 609 | WHERE 610 | EXTRACT('year' FROM insert_date) = 2021 611 | GROUP BY 612 | single_month 613 | ORDER BY 614 | single_month 615 | ), 616 | get_running_total AS ( 617 | SELECT 618 | single_month::date, 619 | monthly_count, 620 | sum(monthly_count) OVER (ORDER BY single_month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_num_cities 621 | FROM 622 | get_month_count 623 | ), 624 | get_month_over_month AS ( 625 | SELECT 626 | single_month, 627 | monthly_count, 628 | total_num_cities, 629 | round(100.0 * ((total_num_cities - Lag(total_num_cities, 1) OVER (ORDER BY single_month)) / Lag(total_num_cities, 1) OVER (ORDER BY single_month))::NUMERIC, 2) AS month_over_month 630 | FROM 631 | get_running_total 632 | ) 633 | SELECT 634 | to_char(single_month, 'Mon-YYYY') AS month_year, 635 | to_char(monthly_count, '9G999') AS cities_inserted, 636 | to_char(total_num_cities, '99G999') AS running_total, 637 | to_char(month_over_month, 'sg99.99') || '%' AS month_over_month 638 | FROM 639 | get_month_over_month 640 | ); 641 | 642 | SELECT 643 | * 644 | FROM 645 | cleaned_data.view_2021_growth; 646 | 647 | /* 648 | 649 | month_year|cities_inserted|running_total|month_over_month| 650 | ----------+---------------+-------------+----------------+ 651 | Jan-2021 | 1,471 | 1,471 | | 652 | Feb-2021 | 1,291 | 2,762 |+87.76% | 653 | Mar-2021 | 1,485 | 4,247 |+53.77% | 654 | Apr-2021 | 1,508 | 5,755 |+35.51% | 655 | May-2021 | 1,509 | 7,264 |+26.22% | 656 | Jun-2021 | 1,395 | 8,659 |+19.20% | 657 | Jul-2021 | 1,394 | 10,053 |+16.10% | 658 | Aug-2021 | 1,481 | 11,534 |+14.73% | 659 | Sep-2021 | 1,452 | 12,986 |+12.59% | 660 
| Oct-2021 | 1,446 | 14,432 |+11.14% | 661 | Nov-2021 | 1,378 | 15,810 |+9.55% | 662 | Dec-2021 | 1,472 | 17,282 |+9.31% | 663 | 664 | */ 665 | 666 | /* Question 15. 667 | * 668 | * Create and call a stored procedure that lists a unique row id number, insert date, country name, city name, population 669 | * and languages spoken for countries in the Latin America and the Caribbean sub-region that were inserted on 670 | * either '2022-04-09', '2022-04-28' or '2022-08-11'. 671 | * 672 | * Order by the insert date and output the results (including headers) to a CSV file located in /source_data/csv_output/. 673 | * 674 | */ 675 | 676 | CREATE OR REPLACE PROCEDURE cleaned_data.sproc_output () 677 | LANGUAGE plpgsql 678 | AS 679 | $sproc$ 680 | BEGIN 681 | COPY ( 682 | SELECT 683 | ROW_NUMBER() OVER (ORDER BY t1.insert_date) AS row_id, 684 | t1.insert_date, 685 | t2.country_name, 686 | t1.city_name, 687 | t1.population, 688 | array_agg(t3.language) AS languages 689 | FROM 690 | cleaned_data.cities AS t1 691 | JOIN 692 | cleaned_data.countries AS t2 693 | ON 694 | t1.country_code_2 = t2.country_code_2 695 | LEFT JOIN 696 | cleaned_data.languages AS t3 697 | ON 698 | t2.country_code_2 = t3.country_code_2 699 | WHERE 700 | t2.sub_region = 'latin america and the caribbean' 701 | AND 702 | t1.insert_date IN ('2022-04-09', '2022-04-28', '2022-08-11') 703 | GROUP BY 704 | t1.insert_date, 705 | t2.country_name, 706 | t1.city_name, 707 | t1.population 708 | ORDER BY 709 | t1.insert_date 710 | ) 711 | TO '/var/lib/postgresql/source_data/csv_output/output.csv' DELIMITER ',' CSV HEADER; 712 | END 713 | $sproc$; 714 | 715 | -- Call the stored procedure 716 | CALL cleaned_data.sproc_output(); 717 | 718 | /* 719 | 720 | Results located in /source_data/csv_output/output.csv 721 | 722 | row_id,insert_date,country_name,city_name,population,languages 723 | 1,2022-04-09,argentina,santa maria,9526,"{english,spanish}" 724 | 2,2022-04-09,brazil,clevelandia,16450,"{spanish,english,portuguese}" 725 
| 3,2022-04-09,brazil,conceicao do rio verde,12949,"{english,portuguese,spanish}" 726 | 4,2022-04-09,brazil,curaca,35208,"{english,portuguese,spanish}" 727 | 5,2022-04-09,brazil,japoata,12938,"{spanish,english,portuguese}" 728 | 6,2022-04-09,haiti,petite riviere de nippes,29815,"{spanish,french}" 729 | 7,2022-04-09,peru,puerto pimentel,15552,"{quechua,spanish}" 730 | 8,2022-04-28,argentina,ciudad general belgrano,92957,"{english,spanish}" 731 | 9,2022-04-28,brazil,caete,40750,"{portuguese,spanish,english}" 732 | 10,2022-04-28,brazil,catanduva,119480,"{spanish,english,portuguese}" 733 | 11,2022-04-28,brazil,colider,32120,"{english,spanish,portuguese}" 734 | 12,2022-04-28,brazil,miracema do tocantins,20684,"{spanish,english,portuguese}" 735 | 13,2022-04-28,brazil,presidente medici,18571,"{spanish,english,portuguese}" 736 | 14,2022-04-28,colombia,coyaima,28443,"{spanish,english}" 737 | 15,2022-04-28,colombia,manati,13456,"{spanish,english}" 738 | 16,2022-04-28,dominican republic,licey al medio,25539,{spanish} 739 | 17,2022-04-28,mexico,amatitan,15344,"{spanish,english}" 740 | 18,2022-04-28,mexico,san juan zitlaltepec,19600,"{spanish,english}" 741 | 19,2022-04-28,mexico,teopisca,20044,"{english,spanish}" 742 | 20,2022-08-11,brazil,charqueadas,35320,"{portuguese,spanish,english}" 743 | 21,2022-08-11,brazil,chopinzinho,19679,"{portuguese,english,spanish}" 744 | 22,2022-08-11,brazil,jaguarao,27931,"{portuguese,spanish,english}" 745 | 23,2022-08-11,ecuador,vinces,30241,"{spanish,quechua}" 746 | 24,2022-08-11,guatemala,el palmar,31706,{spanish} 747 | 25,2022-08-11,nicaragua,villanueva,25660,{spanish} 748 | 26,2022-08-11,puerto rico,guaynabo,70542,"{english,spanish}" 749 | 750 | */ 751 | 752 | 753 | -------------------------------------------------------------------------------- /source_data/scripts/create_relationships.sql: -------------------------------------------------------------------------------- 1 | /* 2 | SQL Code Challenge 3 | Author: Jaime M. 
Shaker 4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 5 | Website: https://www.shaker.dev 6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 7 | 8 | File Name: create_relationships.sql 9 | */ 10 | 11 | -- Alter cleaned_data.countries and add the UNIQUE constraint to country_code_2 12 | 13 | ALTER TABLE 14 | cleaned_data.countries 15 | ADD CONSTRAINT 16 | unique_country_code_2 17 | UNIQUE (country_code_2); 18 | 19 | -- Alter all other tables and add a foreign key constraint and reference. 20 | -- Create Foreign Key relationship for cleaned_data.cities 21 | ALTER TABLE 22 | -- Table to be altered 23 | cleaned_data.cities 24 | ADD CONSTRAINT 25 | -- Give this constraint a name 26 | fk_country_city 27 | -- Which key in cleaned_data.cities is a foreign key 28 | FOREIGN KEY (country_code_2) 29 | -- Which key to reference from parent table 30 | REFERENCES cleaned_data.countries (country_code_2); 31 | 32 | -- Create Foreign Key relationship for cleaned_data.currencies 33 | ALTER TABLE 34 | cleaned_data.currencies 35 | ADD CONSTRAINT 36 | fk_country_currencies 37 | FOREIGN KEY (country_code_2) 38 | REFERENCES cleaned_data.countries (country_code_2); 39 | 40 | -- Create Foreign Key relationship for cleaned_data.languages 41 | ALTER TABLE 42 | cleaned_data.languages 43 | ADD CONSTRAINT 44 | fk_country_languages 45 | FOREIGN KEY (country_code_2) 46 | REFERENCES cleaned_data.countries (country_code_2); 47 | 48 | -------------------------------------------------------------------------------- /source_data/scripts/drop_all_tables.sql: -------------------------------------------------------------------------------- 1 | /* 2 | SQL Code Challenge 3 | Author: Jaime M. Shaker 4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 5 | Website: https://www.shaker.dev 6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 7 | 8 | File Name: drop_all_tables.sql 9 | 10 | Description: This script will drop all tables, schemas and constraints used in 11 | this project. 
12 | */ 13 | 14 | -- Drop import_data tables and schema. 15 | DROP TABLE IF EXISTS import_data.countries; 16 | DROP TABLE IF EXISTS import_data.cities; 17 | DROP TABLE IF EXISTS import_data.currencies; 18 | DROP TABLE IF EXISTS import_data.languages; 19 | DROP SCHEMA IF EXISTS import_data; 20 | 21 | -- Drop Foreign Key relationship for cleaned_data.cities 22 | ALTER TABLE IF EXISTS 23 | -- Table to be altered 24 | cleaned_data.cities 25 | DROP CONSTRAINT IF EXISTS 26 | -- Drop this constraint name 27 | fk_country_city; 28 | 29 | -- Drop Foreign Key relationship for cleaned_data.currencies 30 | ALTER TABLE IF EXISTS 31 | cleaned_data.currencies 32 | DROP CONSTRAINT IF EXISTS 33 | fk_country_currencies; 34 | 35 | -- Drop Foreign Key relationship for cleaned_data.languages 36 | ALTER TABLE IF EXISTS 37 | cleaned_data.languages 38 | DROP CONSTRAINT IF EXISTS 39 | fk_country_languages; 40 | 41 | -- Drop cleaned_data tables and schema. 42 | DROP TABLE IF EXISTS cleaned_data.countries; 43 | DROP TABLE IF EXISTS cleaned_data.cities; 44 | DROP TABLE IF EXISTS cleaned_data.currencies; 45 | DROP TABLE IF EXISTS cleaned_data.languages; 46 | DROP SCHEMA IF EXISTS cleaned_data CASCADE; -------------------------------------------------------------------------------- /source_data/scripts/normalize_tables.sql: -------------------------------------------------------------------------------- 1 | /* 2 | SQL Code Challenge 3 | Author: Jaime M. Shaker 4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 5 | Website: https://www.shaker.dev 6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 7 | 8 | File Name: normalize_tables.sql 9 | */ 10 | 11 | -- We must ensure that our data is properly organized. Let's create a schema 12 | -- specifically for our transformed, clean data. 13 | CREATE SCHEMA IF NOT EXISTS cleaned_data; 14 | -- Create countries table 15 | -- Drop this table every time this script is run to ensure repeatability. 
16 | -- We are adding CASCADE for when we add foreign key relationships. 17 | DROP TABLE IF EXISTS cleaned_data.countries CASCADE; 18 | -- We must presuppose that the data in its current state is unusable/unreliable. 19 | -- 20 | CREATE TABLE cleaned_data.countries ( 21 | -- The country_id was automatically incremented and can be entered as an integer. 22 | country_id INT NOT NULL, 23 | -- Different from other database systems, in PostgreSQL, there is no performance difference among the three character types (char, varchar and text). 24 | -- Use text to ensure that there are no errors due to string length. 25 | country_name TEXT NOT NULL, 26 | -- The data appears to have duplicate entries so we will remove them once the data has been cleaned of 27 | -- any unwanted characters. 28 | country_code_2 varchar(2) NOT NULL, 29 | country_code_3 varchar(3) NOT NULL, 30 | region TEXT, 31 | sub_region TEXT, 32 | intermediate_region TEXT, 33 | created_on date, 34 | PRIMARY KEY (country_id) 35 | ); 36 | 37 | INSERT INTO cleaned_data.countries ( 38 | country_id, 39 | country_name, 40 | country_code_2, 41 | country_code_3, 42 | region, 43 | sub_region, 44 | intermediate_region, 45 | created_on 46 | ) 47 | ( 48 | SELECT 49 | i.country_id, 50 | -- The regexp_replace() function removes any unwanted special characters using a simple regular expression. 51 | -- trim() function removes white space from either end of the string. 52 | -- lower() function converts all characters to lowercase. 53 | trim(lower(regexp_replace(i.country_name, '[^\w\s^. ]', '', 'gi'))), 54 | -- Properly cast type from TEXT into new table 55 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^. ]', '', 'gi')))::varchar, 56 | trim(lower(regexp_replace(i.country_code_3, '[^\w\s^. ]', '', 'gi')))::varchar, 57 | trim(lower(regexp_replace(i.region, '[^\w\s^. ]', '', 'gi'))), 58 | trim(lower(regexp_replace(i.sub_region, '[^\w\s^. ]', '', 'gi'))), 59 | trim(lower(regexp_replace(i.intermediate_region, '[^\w\s^. 
]', '', 'gi'))), 60 | -- Use the built-in function current_date to insert current date into created_on field. 61 | current_date 62 | FROM 63 | import_data.countries AS i 64 | ); 65 | 66 | /* Create cleaned_data.cities table */ 67 | DROP TABLE IF EXISTS cleaned_data.cities; 68 | CREATE TABLE cleaned_data.cities ( 69 | city_id int, 70 | city_name TEXT, 71 | latitude float, 72 | longitude float, 73 | country_code_2 varchar(2) NOT NULL, 74 | capital boolean, 75 | population int, 76 | insert_date date, 77 | PRIMARY KEY (city_id) 78 | ); 79 | 80 | INSERT INTO cleaned_data.cities ( 81 | city_id, 82 | city_name, 83 | latitude, 84 | longitude, 85 | country_code_2, 86 | capital, 87 | population, 88 | insert_date 89 | ) 90 | ( 91 | SELECT 92 | i.city_id, 93 | trim(lower(regexp_replace(i.city_name, '[^\w\s^. ]', '', 'gi'))), 94 | i.longitude::float, 95 | i.latitude::float, 96 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^. ]', '', 'gi')))::varchar, 97 | i.capital::boolean, 98 | i.population::int, 99 | i.insert_date::date 100 | FROM 101 | import_data.cities AS i 102 | ); 103 | 104 | /* Create cleaned_data.currencies table */ 105 | DROP TABLE IF EXISTS cleaned_data.currencies; 106 | CREATE TABLE cleaned_data.currencies ( 107 | currency_id int, 108 | country_code_2 varchar(2) NOT NULL, 109 | currency_name TEXT, 110 | currency_code TEXT, 111 | PRIMARY KEY (currency_id) 112 | ); 113 | 114 | INSERT INTO cleaned_data.currencies ( 115 | currency_id, 116 | country_code_2, 117 | currency_name, 118 | currency_code 119 | ) 120 | ( 121 | SELECT 122 | currency_id, 123 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^. ]', '', 'gi')))::varchar, 124 | trim(lower(regexp_replace(i.currency_name, '[^\w\s^. ]', '', 'gi'))), 125 | trim(lower(regexp_replace(i.currency_code, '[^\w\s^. 
]', '', 'gi'))) 126 | FROM 127 | import_data.currencies AS i 128 | ); 129 | 130 | /* Create cleaned_data.languages table */ 131 | DROP TABLE IF EXISTS cleaned_data.languages; 132 | CREATE TABLE cleaned_data.languages ( 133 | language_id int, 134 | language TEXT, 135 | country_code_2 varchar(2) NOT NULL, 136 | PRIMARY KEY (language_id) 137 | ); 138 | 139 | INSERT INTO cleaned_data.languages ( 140 | language_id, 141 | language, 142 | country_code_2 143 | ) 144 | ( 145 | SELECT 146 | language_id, 147 | trim(lower(regexp_replace(i.language, '[^\w\s^. ]', '', 'gi'))), 148 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^. ]', '', 'gi')))::varchar 149 | FROM 150 | import_data.languages AS i 151 | ); 152 | 153 | 154 | -------------------------------------------------------------------------------- /source_data/scripts/remove_duplicates.sql: -------------------------------------------------------------------------------- 1 | /* 2 | SQL Code Challenge 3 | Author: Jaime M. Shaker 4 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 5 | Website: https://www.shaker.dev 6 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 7 | 8 | File Name: remove_duplicates.sql 9 | */ 10 | 11 | -- Remove Duplicate Entries 12 | -- Check for duplicate entries in clean_data.countries 13 | /* 14 | SELECT 15 | -- Get the columns. 16 | country_code_2, 17 | -- Count how many times the country_code_2 occurs. 18 | count(*) 19 | FROM 20 | cleaned_data.countries 21 | GROUP BY 22 | -- Using an aggregate function forces us to group all like names together. 23 | country_code_2 24 | HAVING 25 | -- Only select values that have a count greater than one (multiple entries). 
26 | count(*) > 1; 27 | */ 28 | 29 | -- Delete duplicate entries 30 | 31 | DELETE 32 | FROM 33 | -- Add an alias to the ids we wish to keep 34 | cleaned_data.countries AS clean_data_1 35 | USING 36 | -- Add an alias to the duplicate ids 37 | cleaned_data.countries AS clean_data_2 38 | WHERE 39 | -- This statement will remove the greater value ids. 40 | clean_data_1.country_id > clean_data_2.country_id 41 | AND 42 | -- This statement ensures that both values are identical. 43 | clean_data_1.country_code_2 = clean_data_2.country_code_2; 44 | -------------------------------------------------------------------------------- /source_data/scripts/start_fresh.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Although these commands could be done in a simple one-liner, I've written 4 | # out the individual commands for new people who may not understand the syntax. 5 | # 6 | # The breakdown of each command is as follows. 7 | # psql: A terminal-based front-end to PostgreSQL. 8 | # sql_coding_challenge: The name of our database. 9 | # -f: Execute the commands in the given file, then exit. 10 | # some_file.sql: The SQL script to execute via psql. 11 | 12 | psql sql_coding_challenge -f drop_all_tables.sql 13 | psql sql_coding_challenge -f build_tables.sql 14 | psql sql_coding_challenge -f normalize_tables.sql 15 | psql sql_coding_challenge -f remove_duplicates.sql 16 | psql sql_coding_challenge -f cleanup_db.sql 17 | psql sql_coding_challenge -f create_relationships.sql 18 | echo "Build Process Complete" -------------------------------------------------------------------------------- /walkthrough/WALKTHROUGH_1_DOCKER.md: -------------------------------------------------------------------------------- 1 | ## Basic/Intermediate SQL Code Challenge 2 | 3 | **Author**: Jaime M. Shaker<br>
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 9 | 10 | ### Set Up 11 | 12 | This repository contains all of the necessary files, data and directories for running a PostgreSQL server in a Docker container. The only prerequisite is that you should already have Docker Desktop installed and running on your computer. 13 | 14 | https://www.docker.com/products/docker-desktop/ 15 | 16 | Once we have `Docker Desktop` installed and running, we must have Docker download an image of Postgres and start running the Postgres container. 17 | 18 | Using your terminal (PowerShell/Bash/etc.), cd (change directory) to where this project is located. To ensure you are in the correct directory, type this command into the terminal. 19 | 20 | `ls -al` 21 | 22 | Your results should look similar to this. 23 | 24 | ![alt text](../images/terminal_1.PNG) 25 | 26 | As mentioned before, in the same directory as the `docker-compose.yaml` file, create an empty directory named '`db`' (note the red arrow in the image). This is where the PostgreSQL container will store internal data and keep your data persistent. 27 | * This directory's path is in the `.gitignore` file, which is why it is not included in this repository. 28 | 29 | Once you are in the same directory as the `docker-compose.yaml` file, run this command to start the Docker container. 30 | 31 | `docker-compose up -d` 32 | 33 | * The `-d` portion of the command allows the container to run in the background. 34 | * If this is the first time running this command, it will download the required Docker image and create a PostgreSQL container, which may take a few minutes to complete. Once it is complete and running, your terminal should look something like this. 
35 | 36 | ![alt text](../images/terminal_2.PNG) 37 | 38 | * After the initial install, the next time you run the `docker-compose up -d` command, the Docker container will fire right up as long as the image hasn't been deleted. 39 | 40 | If you are using [PGAdmin](https://www.pgadmin.org/), [DBeaver](https://dbeaver.io/) or any other Database management tool, create a connection to the now running PostgreSQL server. 41 | * Host: `localhost` 42 | * Database Name: `sql_coding_challenge` 43 | * Username: `postgres` 44 | * Password: `postgres` 45 | 46 | Alternatively, if you are using [psql](https://gist.github.com/Kartones/dd3ff5ec5ea238d4c546), use these commands in your terminal to connect to the database: 47 | * `docker exec -ti sql_coding_challenge psql -U postgres` 48 | * `\c sql_coding_challenge` 49 | 50 | To stop the Docker container: 51 | * Using the command line, cd into the same directory as the `docker-compose.yaml` file and run the command... 52 | * `docker-compose down` 53 | * You can also use the `Docker Desktop GUI` to stop the container. 54 | 55 | The `docker-compose down` command will stop the container until the next time you start it back up using the command: 56 | * `docker-compose up -d` 57 | 58 | Now that we have our PostgreSQL server up and running, click the link below and let's start creating tables and inserting data. 59 | 60 | Go to [WALKTHROUGH_BUILD](WALKTHROUGH_2_BUILD.md) 61 | 62 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 63 | 64 | 65 | 66 | -------------------------------------------------------------------------------- /walkthrough/WALKTHROUGH_2_BUILD.md: -------------------------------------------------------------------------------- 1 | ## Basic/Intermediate SQL Code Challenge 2 | 3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 9 | 10 | ### ETL (Extract/Transform/Load) 11 | 12 | ETL is the process data engineers use to extract data from different sources, transform the data into a usable resource, and load that data into the systems that end-users can access and use to solve business problems. 13 | * Although this term is used on a much larger scale for data warehousing, on a much, much smaller scale, we are doing the same thing by: 14 | * Extracting raw data from CSV files. 15 | * Transforming the data to make it usable. 16 | * Loading the data into tables within our database. 17 | 18 | This process is also known as `Data Wrangling`. 19 | 20 | In Data Analysis, the analyst must ensure that the data is 'clean' before doing any analysis. 'Dirty' data can lead to unreliable, inaccurate and/or misleading results. 21 | 22 | * Garbage in = garbage out. 23 | 24 | These are some of the steps that can be taken to properly prepare this dataset for analysis. 25 | 26 | * Check for duplicate entries and remove them. 27 | * Remove extra spaces and/or other invalid characters. 28 | * Separate or combine values as needed. 29 | * Correct misspelled or incorrectly entered data. 30 | * Check for null or empty values. 31 | 32 | Let's create a new file in [source_data/scripts/](../source_data/scripts/) and name it `build_tables.sql`. 33 | 34 | Before we can transform the data, we must get the data into the database. Let's create a script that will consistently build and rebuild our schema and tables. We will use the `countries.csv` to create the countries table. 35 | 36 | ```sql 37 | -- We must ensure that our data is properly organized. Let's create a schema 38 | -- specifically for importing/copying data from our CSV files. 39 | CREATE SCHEMA IF NOT EXISTS import_data; 40 | -- Create import_data.countries table 41 | -- Drop this table every time this script is run to ensure repeatability. 
42 | DROP TABLE IF EXISTS import_data.countries; 43 | -- Create a new table and add a country_id to use as a primary key. 44 | -- We will initially insert all the data as TEXT to ensure there aren't any errors during the COPY. 45 | -- We will also create a created_on field that will have the date that the table was created. 46 | CREATE TABLE import_data.countries ( 47 | -- Create an auto-incrementing, unique key for every row. 48 | -- This will also be used as our Primary Key. 49 | country_id INT GENERATED ALWAYS AS IDENTITY, 50 | country_name TEXT, 51 | country_code_2 TEXT, 52 | country_code_3 TEXT, 53 | region TEXT, 54 | sub_region TEXT, 55 | intermediate_region TEXT, 56 | created_on date, 57 | PRIMARY KEY (country_id) 58 | ); 59 | -- We will use the COPY statement to extract the data from our CSV files. 60 | COPY import_data.countries ( 61 | country_name, 62 | country_code_2, 63 | country_code_3, 64 | region, 65 | sub_region, 66 | intermediate_region 67 | ) 68 | -- PostgreSQL stores its data in /var/lib/postgresql/ 69 | -- In the docker-compose.yaml file, we created a volume named 'source_data/' that our container can access. 70 | FROM '/var/lib/postgresql/source_data/csv_data/countries.csv' 71 | -- The CSV files are comma separated and include headers. 72 | WITH DELIMITER ',' HEADER CSV; 73 | ``` 74 | Execute the complete script and SUCCESS! We have inserted the `countries.csv` into the `import_data.countries` table. 75 | 76 | Let's test our new table and take a look at the first 5 rows of data. 77 | 78 | ```sql 79 | SELECT * FROM import_data.countries LIMIT 5; 80 | ``` 81 | **Results** 82 | 83 | country_id|country_name |country_code_2|country_code_3|region |sub_region |intermediate_region|created_on| 84 | ----------|----------------|--------------|--------------|--------|-----------------|-------------------|----------| 85 | 1|!afg!hANistan |af? |afg |asia |$southern asia | | | 86 | 2| alba$nia |aL |alb |europe! 
|southern *europe | | | 87 | 3|Alb?ania |AL |alb |eur#ope |$southern e#urope| | | 88 | 4|?algeria? |d!z |dza |africa |northern africa | | | 89 | 5|americ#an samoa?|as |as!m |0oceania|polyne$sia | | | 90 | 91 | As you can see, some countries do not have an `intermediate_region` value and none of our entries have a `created_on` value yet. Also notice that some of the country names have special characters, upper/lower case characters and whitespace. 92 | 93 | This will definitely cause some problems if not corrected, however, right now our only concern is to get the data into our database. 94 | 95 | Let's use the same process to add all of the CSV files in `source_data/csv_data/`: 96 | * cities.csv 97 | * currencies.csv 98 | * languages.csv 99 | 100 | After adding all the necessary code, our `build_tables.sql` file should now look like this... 101 | 102 | ```sql 103 | /* 104 | SQL Code Challenge 105 | Author: Jaime M. Shaker 106 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 107 | Website: https://www.shaker.dev 108 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 109 | 110 | File Name: build_tables.sql 111 | */ 112 | 113 | -- We must ensure that our data is properly organized. Let's create a schema 114 | -- specifically for importing/copying data from our CSV files. 115 | CREATE SCHEMA IF NOT EXISTS import_data; 116 | -- Create import_data.countries table 117 | -- Drop this table every time this script is run to ensure repeatability. 118 | DROP TABLE IF EXISTS import_data.countries; 119 | -- Create a new table and add a country_id to use as a primary key. 120 | -- We will initially insert all the data as TEXT to ensure there aren't any errors during the COPY. 121 | -- We will also create a created_on field that will have the date that the table was created. 122 | CREATE TABLE import_data.countries ( 123 | -- Create an auto-incrementing, unique key for every row. 124 | -- This will also be used as our Primary Key. 
125 | country_id INT GENERATED ALWAYS AS IDENTITY, 126 | country_name TEXT, 127 | country_code_2 TEXT, 128 | country_code_3 TEXT, 129 | region TEXT, 130 | sub_region TEXT, 131 | intermediate_region TEXT, 132 | created_on date, 133 | PRIMARY KEY (country_id) 134 | ); 135 | -- We will use the COPY statement to extract the data from our CSV files. 136 | COPY import_data.countries ( 137 | country_name, 138 | country_code_2, 139 | country_code_3, 140 | region, 141 | sub_region, 142 | intermediate_region 143 | ) 144 | -- PostgreSQL stores its data in /var/lib/postgresql/ 145 | -- In the docker-compose.yaml file, we created a volume named 'source_data/' that our container can access. 146 | FROM '/var/lib/postgresql/source_data/csv_data/countries.csv' 147 | -- The CSV files are comma separated and include headers. 148 | WITH DELIMITER ',' HEADER CSV; 149 | 150 | -- Using the same process, let's create tables for all of our csv files. 151 | /* Create import.cities table */ 152 | DROP TABLE IF EXISTS import_data.cities; 153 | CREATE TABLE import_data.cities ( 154 | city_id INT GENERATED ALWAYS AS IDENTITY, 155 | city_name TEXT, 156 | latitude TEXT, 157 | longitude TEXT, 158 | country_code_2 TEXT, 159 | capital TEXT, 160 | population TEXT, 161 | insert_date TEXT, 162 | PRIMARY KEY (city_id) 163 | ); 164 | 165 | COPY import_data.cities ( 166 | city_name, 167 | latitude, 168 | longitude, 169 | country_code_2, 170 | capital, 171 | population, 172 | insert_date 173 | ) 174 | FROM '/var/lib/postgresql/source_data/csv_data/cities.csv' 175 | WITH DELIMITER ',' HEADER CSV; 176 | 177 | /* Create import.currencies table */ 178 | DROP TABLE IF EXISTS import_data.currencies; 179 | CREATE TABLE import_data.currencies ( 180 | currency_id INT GENERATED ALWAYS AS IDENTITY, 181 | country_code_2 TEXT, 182 | currency_name TEXT, 183 | currency_code TEXT, 184 | PRIMARY KEY (currency_id) 185 | ); 186 | 187 | COPY import_data.currencies ( 188 | country_code_2, 189 | currency_name, 190 | currency_code 
191 | ) 192 | FROM '/var/lib/postgresql/source_data/csv_data/currencies.csv' 193 | WITH DELIMITER ',' HEADER CSV; 194 | 195 | /* Create import.languages table */ 196 | DROP TABLE IF EXISTS import_data.languages; 197 | CREATE TABLE import_data.languages ( 198 | language_id INT GENERATED ALWAYS AS IDENTITY, 199 | language TEXT, 200 | country_code_2 TEXT, 201 | PRIMARY KEY (language_id) 202 | ); 203 | 204 | COPY import_data.languages ( 205 | language, 206 | country_code_2 207 | ) 208 | FROM '/var/lib/postgresql/source_data/csv_data/languages.csv' 209 | WITH DELIMITER ',' HEADER CSV; 210 | ``` 211 | Execute the complete script and our data has now been successfully extracted from the CSV files and inserted/copied into our database. This script can be repeated over and over and it will always give the same results. 212 | 213 | We can now TRANSFORM our data into a usable resource. 214 | 215 | Go to [WALKTHROUGH_NORMALIZE](WALKTHROUGH_3_NORMALIZE.md) 216 | 217 | 218 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 219 | 220 | 221 | 222 | -------------------------------------------------------------------------------- /walkthrough/WALKTHROUGH_3_NORMALIZE.md: -------------------------------------------------------------------------------- 1 | ## Basic/Intermediate SQL Code Challenge 2 | 3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 9 | 10 | ### Normalization 11 | 12 | According to Wikipedia, [Database normalization](https://en.wikipedia.org/wiki/Database_normalization) is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. Normalization entails organizing the columns (attributes) and tables (relations) of a database to ensure that their dependencies are properly enforced by database integrity constraints. 13 | 14 | Database normalization forms can be quite involved and go beyond the scope of this walkthrough. In this section, we will restructure our tables with the proper constraints and relationships, and "clean" the data as we insert it into our new tables. 15 | 16 | To keep things organized, we will create our new tables within the `cleaned_data` schema. 17 | 18 | Create a new file in `source_data/scripts/` and name it `normalize_tables.sql`. Upon inspecting the `import_data.countries` table, we can see that there are duplicate entries, mixed upper/lower case characters, added white space and special characters. 19 | 20 | ```sql 21 | SELECT 22 | country_name, 23 | country_code_2 24 | FROM 25 | import_data.countries 26 | LIMIT 5; 27 | ``` 28 | 29 | **Results:** 30 | 31 | country_name |country_code_2| 32 | ----------------|--------------| 33 | !afg!hANistan |af? | 34 | alba$nia |aL | 35 | Alb?ania |AL | 36 | ?algeria? |d!z | 37 | americ#an samoa?|as | 38 | 39 | Before we can insert our data into our new table, we must ensure that the data has been 'cleaned'. During the insert, let's... 40 | * Remove extra spaces (whitespace) 41 | * Remove special characters. 42 | * Convert all characters to lower case. 43 | 44 | ```sql 45 | /* 46 | SQL Code Challenge 47 | Author: Jaime M. 
Shaker 48 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 49 | Website: https://www.shaker.dev 50 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 51 | 52 | File Name: normalize_tables.sql 53 | */ 54 | 55 | -- We must ensure that our data is properly organized. Let's create a schema 56 | -- specifically for our transformed, clean data. 57 | CREATE SCHEMA IF NOT EXISTS cleaned_data; 58 | -- Create countries table 59 | -- Drop this table every time this script is run to ensure repeatability. 60 | -- We are adding CASCADE for when we add foreign key relationships. 61 | DROP TABLE IF EXISTS cleaned_data.countries CASCADE; 62 | -- We must presuppose that the data in its current state is unusable/unreliable. 63 | -- 64 | CREATE TABLE cleaned_data.countries ( 65 | -- The country_id was automatically incremented and can be entered as an integer. 66 | country_id INT NOT NULL, 67 | -- Different from other database systems, in PostgreSQL, there is no performance difference among the three character types (char, varchar and text). 68 | -- Use text to ensure that there are no errors due to string length. 69 | country_name TEXT NOT NULL, 70 | -- The data appears to have duplicate entries so we will remove them once the data has been cleaned of 71 | -- any unwanted characters. 72 | country_code_2 varchar(2) NOT NULL, 73 | country_code_3 varchar(3) NOT NULL, 74 | region TEXT, 75 | sub_region TEXT, 76 | intermediate_region TEXT, 77 | created_on date, 78 | PRIMARY KEY (country_id) 79 | ); 80 | 81 | INSERT INTO cleaned_data.countries ( 82 | country_id, 83 | country_name, 84 | country_code_2, 85 | country_code_3, 86 | region, 87 | sub_region, 88 | intermediate_region, 89 | created_on 90 | ) 91 | ( 92 | SELECT 93 | i.country_id, 94 | -- regexp_replace() function removes any special characters with a simple regular expression. 95 | -- trim() function removes white space from either end of the string. 96 | -- lower() function converts all characters to lowercase. 
97 | trim(lower(regexp_replace(i.country_name, '[^\w\s^.]', '', 'gi'))), 98 | -- Properly cast type from TEXT into new table 99 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^.]', '', 'gi')))::varchar, 100 | trim(lower(regexp_replace(i.country_code_3, '[^\w\s^.]', '', 'gi')))::varchar, 101 | trim(lower(regexp_replace(i.region, '[^\w\s^.]', '', 'gi'))), 102 | trim(lower(regexp_replace(i.sub_region, '[^\w\s^.]', '', 'gi'))), 103 | trim(lower(regexp_replace(i.intermediate_region, '[^\w\s^.]', '', 'gi'))), 104 | -- Use the built-in function current_date to insert current date into created_on field. 105 | current_date 106 | FROM 107 | import_data.countries AS i 108 | ); 109 | ``` 110 | Now we can run the same query in our new `cleaned_data.countries` table and see our updated results. 111 | 112 | ```sql 113 | SELECT 114 | country_name, 115 | country_code_2 116 | FROM 117 | cleaned_data.countries 118 | LIMIT 5; 119 | ``` 120 | 121 | **Results:** 122 | 123 | country_name |country_code_2| 124 | --------------|--------------| 125 | afghanistan |af | 126 | albania |al | 127 | albania |al | 128 | algeria |dz | 129 | american samoa|as | 130 | 131 | The data has been 'cleaned' and is now in a more usable form. We still note that there are duplicate entries. Before removing any duplicates, let's continue cleaning the data with our `normalize_tables.sql` script and insert the data into the remaining `cleaned_data` tables. 132 | 133 | ```sql 134 | /* 135 | SQL Code Challenge 136 | Author: Jaime M. Shaker 137 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 138 | Website: https://www.shaker.dev 139 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 140 | 141 | File Name: normalize_tables.sql 142 | */ 143 | 144 | -- We must ensure that our data is properly organized. Let's create a schema 145 | -- specifically for our transformed, clean data. 
146 | CREATE SCHEMA IF NOT EXISTS cleaned_data; 147 | -- Create countries table 148 | -- Drop this table every time this script is run to ensure repeatability. 149 | -- We are adding CASCADE for when we add foreign key relationships. 150 | DROP TABLE IF EXISTS cleaned_data.countries CASCADE; 151 | -- We must presuppose that the data in its current state is unusable/unreliable. 152 | -- 153 | CREATE TABLE cleaned_data.countries ( 154 | -- The country_id was automatically incremented and can be entered as an integer. 155 | country_id INT NOT NULL, 156 | -- Different from other database systems, in PostgreSQL, there is no performance difference among the three character types (char, varchar and text). 157 | -- Use text to ensure that there are no errors due to string length. 158 | country_name TEXT NOT NULL, 159 | -- The data appears to have duplicate entries so we will remove them once the data has been cleaned of 160 | -- any unwanted characters. 161 | country_code_2 varchar(2) NOT NULL, 162 | country_code_3 varchar(3) NOT NULL, 163 | region TEXT, 164 | sub_region TEXT, 165 | intermediate_region TEXT, 166 | created_on date, 167 | PRIMARY KEY (country_id) 168 | ); 169 | 170 | INSERT INTO cleaned_data.countries ( 171 | country_id, 172 | country_name, 173 | country_code_2, 174 | country_code_3, 175 | region, 176 | sub_region, 177 | intermediate_region, 178 | created_on 179 | ) 180 | ( 181 | SELECT 182 | i.country_id, 183 | -- regexp_replace() function removes any special characters with a simple regular expression. 184 | -- trim() function removes white space from either end of the string. 185 | -- lower() function converts all characters to lowercase. 
186 | trim(lower(regexp_replace(i.country_name, '[^\w\s^.]', '', 'gi'))), 187 | -- Properly cast type from TEXT into new table 188 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^.]', '', 'gi')))::varchar, 189 | trim(lower(regexp_replace(i.country_code_3, '[^\w\s^.]', '', 'gi')))::varchar, 190 | trim(lower(regexp_replace(i.region, '[^\w\s^.]', '', 'gi'))), 191 | trim(lower(regexp_replace(i.sub_region, '[^\w\s^.]', '', 'gi'))), 192 | trim(lower(regexp_replace(i.intermediate_region, '[^\w\s^.]', '', 'gi'))), 193 | -- Use the built-in function current_date to insert current date into created_on field. 194 | current_date 195 | FROM 196 | import_data.countries AS i 197 | ); 198 | 199 | /* Create cleaned_data.cities table */ 200 | DROP TABLE IF EXISTS cleaned_data.cities; 201 | CREATE TABLE cleaned_data.cities ( 202 | city_id int, 203 | city_name TEXT, 204 | latitude float, 205 | longitude float, 206 | country_code_2 varchar(2) NOT NULL, 207 | capital boolean, 208 | population int, 209 | insert_date date, 210 | PRIMARY KEY (city_id) 211 | ); 212 | 213 | INSERT INTO cleaned_data.cities ( 214 | city_id, 215 | city_name, 216 | latitude, 217 | longitude, 218 | country_code_2, 219 | capital, 220 | population, 221 | insert_date 222 | ) 223 | ( 224 | SELECT 225 | i.city_id, 226 | trim(lower(regexp_replace(i.city_name, '[^\w\s^.]', '', 'gi'))), 227 | i.latitude::float, 228 | i.longitude::float, 229 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^.]', '', 'gi')))::varchar, 230 | i.capital::boolean, 231 | i.population::int, 232 | i.insert_date::date 233 | FROM 234 | import_data.cities AS i 235 | ); 236 | 237 | /* Create cleaned_data.currencies table */ 238 | DROP TABLE IF EXISTS cleaned_data.currencies; 239 | CREATE TABLE cleaned_data.currencies ( 240 | currency_id int, 241 | country_code_2 varchar(2) NOT NULL, 242 | currency_name TEXT, 243 | currency_code TEXT, 244 | PRIMARY KEY (currency_id) 245 | ); 246 | 247 | INSERT INTO cleaned_data.currencies ( 248 | 
currency_id, 249 | country_code_2, 250 | currency_name, 251 | currency_code 252 | ) 253 | ( 254 | SELECT 255 | currency_id, 256 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^.]', '', 'gi')))::varchar, 257 | trim(lower(regexp_replace(i.currency_name, '[^\w\s^.]', '', 'gi'))), 258 | trim(lower(regexp_replace(i.currency_code, '[^\w\s^.]', '', 'gi'))) 259 | FROM 260 | import_data.currencies AS i 261 | ); 262 | 263 | /* Create cleaned_data.languages table */ 264 | DROP TABLE IF EXISTS cleaned_data.languages; 265 | CREATE TABLE cleaned_data.languages ( 266 | language_id int, 267 | language TEXT, 268 | country_code_2 varchar(2) NOT NULL, 269 | PRIMARY KEY (language_id) 270 | ); 271 | 272 | INSERT INTO cleaned_data.languages ( 273 | language_id, 274 | language, 275 | country_code_2 276 | ) 277 | ( 278 | SELECT 279 | language_id, 280 | trim(lower(regexp_replace(i.language, '[^\w\s^.]', '', 'gi'))), 281 | trim(lower(regexp_replace(i.country_code_2, '[^\w\s^.]', '', 'gi')))::varchar 282 | FROM 283 | import_data.languages AS i 284 | ); 285 | 286 | ``` 287 | 288 | Click the link to view the complete [normalize_tables.sql](../source_data/scripts/normalize_tables.sql) 289 | 290 | Once all of your tables are loaded with our cleaned data, click the link below. 291 | 292 | Go to [WALKTHROUGH_REMOVE_DUPLICATES](WALKTHROUGH_4_REMOVE_DUPLICATES.md) 293 | 294 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 295 | 296 | 297 | 298 | -------------------------------------------------------------------------------- /walkthrough/WALKTHROUGH_4_REMOVE_DUPLICATES.md: -------------------------------------------------------------------------------- 1 | ## Basic/Intermediate SQL Analytics Code Challenge 2 | 3 | **Author**: Jaime M. Shaker<br>
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 8 | 9 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 10 | 11 | ### Removing Duplicates 12 | 13 | Removing duplicate entries is part of the Normalization process. I've placed this process in its own file to reduce clutter in the `normalize_tables.sql` script. 14 | 15 | Create a new file in `source_data/scripts/` and name it `remove_duplicates.sql`. In this script, we remove duplicate entries in our tables. 16 | 17 | After inspecting our data, we see that every country must have a unique `country_code_2`. 18 | 19 | Let's create a query that checks for multiple entries of `country_code_2` in our `cleaned_data.countries` table. 20 | 21 | ```sql 22 | -- Remove Duplicate Entries 23 | -- Check for duplicate entries 24 | SELECT 25 | -- Get the columns. 26 | country_code_2, 27 | -- Count how many times the country_code_2 occurs. 28 | count(*) 29 | FROM 30 | cleaned_data.countries 31 | GROUP BY 32 | -- Using an aggregate function forces us to group all exact country_codes together. 33 | country_code_2 34 | HAVING 35 | -- Only select values that have a count greater than one (multiple entries). 36 | count(*) > 1; 37 | ``` 38 | 39 | **Results:** 40 | 41 | country_code_2|count| 42 | --------------|-----| 43 | cn | 3| 44 | al | 2| 45 | sa | 2| 46 | ie | 2| 47 | ly | 2| 48 | mo | 2| 49 | re | 2| 50 | ph | 2| 51 | zw | 2| 52 | az | 2| 53 | kw | 2| 54 | cy | 3| 55 | rs | 2| 56 | mh | 2| 57 | 58 | According to our results, there are 14 `country_code_2` values with multiple entries. Let's create another query that `deletes` the duplicate entries. 59 | 60 | ```sql 61 | -- Delete duplicate entries 62 | DELETE 63 | FROM 64 | -- Add an alias to the ids we wish to keep 65 | cleaned_data.countries AS clean_data_1 66 | USING 67 | -- Add an alias to the duplicate ids 68 | cleaned_data.countries AS clean_data_2 69 | WHERE 70 | -- This statement will remove the greater value ids. 
71 | clean_data_1.country_id > clean_data_2.country_id 72 | AND 73 | -- This statement ensures that both values are identical. 74 | clean_data_1.country_code_2 = clean_data_2.country_code_2; 75 | ``` 76 | 77 | Now, if we were to rerun our first query which checked for duplicate entries, it should not return anything. 78 | 79 | ```sql 80 | -- Remove Duplicate Entries 81 | -- Check for duplicate entries 82 | SELECT 83 | -- Get the columns. 84 | country_code_2, 85 | -- Count how many times the country_code_2 occurs. 86 | count(*) 87 | FROM 88 | cleaned_data.countries 89 | GROUP BY 90 | -- Using an aggregate function forces us to group all like names together. 91 | country_code_2 92 | HAVING 93 | -- Only select values that have a count greater than one (multiple entries). 94 | count(*) > 1; 95 | ``` 96 | **Results:** 97 | 98 | country_code_2|count| 99 | --------------|-----| 100 | 101 | There are no results. We have successfully deleted the duplicate entries in the `cleaned_data.countries` table. We can also take a peek at our `cleaned_data.countries` table and check if it removed the duplicate entries. Initially we saw that `Albania` had multiple entries. 102 | 103 | ```sql 104 | SELECT 105 | country_name, 106 | country_code_2 107 | FROM 108 | cleaned_data.countries 109 | LIMIT 5; 110 | ``` 111 | 112 | **Results:** 113 | country_name |country_code_2| 114 | --------------|--------------| 115 | afghanistan |af | 116 | albania |al | 117 | algeria |dz | 118 | american samoa|as | 119 | andorra |ad | 120 | 121 | The multiple entries for `Albania` have been deleted. 122 | 123 | This query works for the `cleaned_data.countries` table because every country has one unique `country_code_2`. The problem with using this method with the other tables is that many different countries share cities of the same name, currency type and languages. The other tables would have to search for a combination of the table fields. 
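As an illustration of that composite approach (a hypothetical sketch, not one of the challenge scripts), a duplicate check on `cleaned_data.cities` would need to group on the combination of city name and country code rather than on a single column:

```sql
-- Hypothetical composite duplicate check for cleaned_data.cities.
-- A city name alone is not unique across countries, so we group on
-- the combination of city_name and country_code_2 instead.
SELECT
	city_name,
	country_code_2,
	count(*)
FROM
	cleaned_data.cities
GROUP BY
	city_name,
	country_code_2
HAVING
	count(*) > 1;
```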
To keep this walkthrough simple, the only table in this dataset that contains duplicate entries is the `cleaned_data.countries` table. 
124 | 
125 | If for whatever reason you need to start over, you can easily rerun the `build_tables.sql` and then the `normalize_tables.sql` scripts and regenerate the data consistently. 
126 | 
127 | The complete `remove_duplicates.sql` script should look like... 
128 | 
129 | ```sql 
130 | /* 
131 | 	SQL Code Challenge 
132 | 	Author: Jaime M. Shaker 
133 | 	Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 
134 | 	Website: https://www.shaker.dev 
135 | 	LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 
136 | 
137 | 	File Name: remove_duplicates.sql 
138 | */ 
139 | 
140 | -- Remove Duplicate Entries 
141 | -- Check for duplicate entries in cleaned_data.countries 
142 | SELECT 
143 | 	-- Get the columns. 
144 | 	country_code_2, 
145 | 	-- Count how many times the country_code_2 occurs. 
146 | 	count(*) 
147 | FROM 
148 | 	cleaned_data.countries 
149 | GROUP BY 
150 | 	-- Using an aggregate function forces us to group all like names together. 
151 | 	country_code_2 
152 | HAVING 
153 | 	-- Only select values that have a count greater than one (multiple entries). 
154 | 	count(*) > 1; 
155 | 
156 | -- Delete duplicate entries 
157 | 
158 | DELETE 
159 | FROM 
160 | 	-- Alias for the rows that will be deleted (the duplicates) 
161 | 	cleaned_data.countries AS clean_data_1 
162 | USING 
163 | 	-- Alias for the rows we wish to keep 
164 | 	cleaned_data.countries AS clean_data_2 
165 | WHERE 
166 | 	-- This statement removes the rows with the greater id values. 
167 | 	clean_data_1.country_id > clean_data_2.country_id 
168 | AND 
169 | 	-- This statement ensures that both country codes are identical. 
170 | 	clean_data_1.country_code_2 = clean_data_2.country_code_2; 
171 | ``` 
172 | 
173 | With our table data cleaned and free of duplicate entries, we can clean up our database before progressing. 
174 | 
175 | Go to [WALKTHROUGH_CLEANUP](WALKTHROUGH_5_CLEANUP.md) 
176 | 
177 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks!
:exclamation: 178 | 179 | 180 | 181 | -------------------------------------------------------------------------------- /walkthrough/WALKTHROUGH_5_CLEANUP.md: -------------------------------------------------------------------------------- 1 | ## Basic/Intermediate SQL Analytics Code Challenge 2 | 3 | **Author**: Jaime M. Shaker
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 
8 | 
9 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 
10 | 
11 | ### Clean Up 
12 | 
13 | Now that our tables have been cleaned, we can actually begin answering the questions of the `SQL Coding Challenge`. Before we do that, let's do a little clean up and `DROP` our import schema and the tables associated with it, as they are no longer needed. 
14 | 
15 | If for whatever reason you need to start over, you can easily rerun the `build_tables.sql` and then the `normalize_tables.sql` scripts and regenerate the data consistently. 
16 | 
17 | `DROP TABLE` removes tables from the database. To empty a table of rows without destroying the table, use `DELETE` or `TRUNCATE`. 
18 | 
19 | The complete `cleanup_db.sql` script should look like... 
20 | 
21 | ```sql 
22 | /* 
23 | 	SQL Code Challenge 
24 | 	Author: Jaime M. Shaker 
25 | 	Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 
26 | 	Website: https://www.shaker.dev 
27 | 	LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 
28 | 
29 | 	File Name: cleanup_db.sql 
30 | */ 
31 | 
32 | -- Drop All import_data Tables and Schema 
33 | 
34 | DROP TABLE import_data.countries; 
35 | DROP TABLE import_data.cities; 
36 | DROP TABLE import_data.currencies; 
37 | DROP TABLE import_data.languages; 
38 | DROP SCHEMA import_data CASCADE; 
39 | ``` 
40 | 
41 | With our database cleaned of any unnecessary tables and schema, we can now add `Foreign Key` relationships. 
42 | 
43 | Go to [WALKTHROUGH_RELATIONSHIPS](WALKTHROUGH_6_RELATIONSHIPS.md) 
44 | 
45 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 
46 | 
47 | 
48 | 
49 | 
--------------------------------------------------------------------------------
/walkthrough/WALKTHROUGH_6_RELATIONSHIPS.md:
--------------------------------------------------------------------------------
1 | ## Basic/Intermediate SQL Code Challenge 
2 | 
3 | **Author**: Jaime M. Shaker<br>
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 9 | 10 | ### Foreign Keys 11 | 12 | All of our tables in the `cleaned_data` schema share a common field named `country_code_2`. We also know that this field is `UNIQUE` in the `cleaned_data.countries` table because there can only be one country code per country. 13 | 14 | Using this information, we can make the `cleaned_data.countries` table the parent table and all other tables can have a [One-to-One](https://fmhelp.filemaker.com/help/18/fmp/en/index.html#page/FMP_Help/one-to-one-relationships.html) or a [One-to-Many](https://help.claris.com/archive/help/18/fmp/en/index.html#page/FMP_Help%2Fone-to-many-relationships.html%23) `FOREIGN KEY` relationship to it. 15 | 16 | First, create a new script and name it `create_relationships.sql`. Create a query to `ALTER` the `cleaned_data.countries` table and add the `UNIQUE` constraint to the `country_code_2` field. Then we can `ALTER` the child tables and add a `FOREIGN KEY` constraint. 17 | 18 | ```sql 19 | /* 20 | SQL Code Challenge 21 | Author: Jaime M. Shaker 22 | Email: jaime.m.shaker@gmail.com or jaime@shaker.dev 23 | Website: https://www.shaker.dev 24 | LinkedIn: https://www.linkedin.com/in/jaime-shaker/ 25 | 26 | File Name: create_relationships.sql 27 | */ 28 | 29 | -- Alter cleaned_data.countries and add the UNIQUE constraint to country_code_2 30 | 31 | ALTER TABLE 32 | cleaned_data.countries 33 | ADD CONSTRAINT 34 | unique_country_code_2 35 | UNIQUE (country_code_2); 36 | 37 | -- Alter all other tables and add a foreign key constraint and reference. 
38 | -- Create Foreign Key relationship for cleaned_data.cities 
39 | ALTER TABLE 
40 | 	-- Table to be altered 
41 | 	cleaned_data.cities 
42 | ADD CONSTRAINT 
43 | 	-- Give this constraint a name 
44 | 	fk_country_city 
45 | -- Which key in cleaned_data.cities is a foreign key 
46 | FOREIGN KEY (country_code_2) 
47 | -- Which key to reference from the parent table 
48 | REFERENCES cleaned_data.countries (country_code_2); 
49 | 
50 | -- Create Foreign Key relationship for cleaned_data.currencies 
51 | ALTER TABLE 
52 | 	cleaned_data.currencies 
53 | ADD CONSTRAINT 
54 | 	fk_country_currencies 
55 | FOREIGN KEY (country_code_2) 
56 | REFERENCES cleaned_data.countries (country_code_2); 
57 | 
58 | -- Create Foreign Key relationship for cleaned_data.languages 
59 | ALTER TABLE 
60 | 	cleaned_data.languages 
61 | ADD CONSTRAINT 
62 | 	fk_country_languages 
63 | FOREIGN KEY (country_code_2) 
64 | REFERENCES cleaned_data.countries (country_code_2); 
65 | 
66 | ``` 
67 | 
68 | Once you execute the complete script and have created a `FOREIGN KEY` constraint for all of the other tables, your `Entity Relationship Diagram` should look something like this. 
69 | 
70 | ![alt text](../images/ERD.PNG) 
71 | 
72 | :warning:**Bonus**:warning: 
73 | * Automate this mini ETL process using a [Bash shell script](WALKTHROUGH_BONUS_BASH.md)! 
74 | 
75 | We are now ready to complete the `SQL Coding Challenge`. 
76 | 
77 | Click the link below. 
78 | 
79 | Go to [WALKTHROUGH_CODE_CHALLENGE](WALKTHROUGH_7_CODE_CHALLENGE.md) 
80 | 
81 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 
82 | 
83 | 
84 | 
85 | 
--------------------------------------------------------------------------------
/walkthrough/WALKTHROUGH_7_CODE_CHALLENGE.md:
--------------------------------------------------------------------------------
1 | ## Basic/Intermediate SQL Code Challenge 
2 | 
3 | **Author**: Jaime M. Shaker<br>
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 9 | 10 | ### SQL Code Challenge 11 | 12 | :warning: To start over with an empty database, run the the `drop_all_tables.sql` script. :warning: 13 | 14 | 1. Create Database, Tables and Relations. 15 | Using the CSV files located in `source_data/csv_data`, create your new SQL database and tables with the properly formatted data. 16 | 17 | * Add a numeric, auto-incrementing Primary Key to every table. 18 | * In the `countries` table, add the column `created_on` with the current date. 19 | * Create a one-to-one and one-to-many relationship with the countries table as the parent table. 20 | 21 | :exclamation: This question has already been completed with the ETL process. :exclamation: 22 | 23 | 2. List Regions and Country Count 24 | List all of the regions and the total number of countries in each region. Order by country count in descending order and capitalize the region name. 25 | 26 |
27 | Click to expand expected results! 28 | 29 | ##### Expected Results: 30 | 31 | region |country_count| 32 | ---------|-------------| 33 | Africa | 59| 34 | Americas | 57| 35 | Asia | 50| 36 | Europe | 48| 37 | Oceania | 26| 38 | Antartica| 1| 39 | 40 |
41 |

42 | 43 |
44 | Click to expand answer! 45 | 46 | ##### Answer 47 | ```sql 48 | SELECT 49 | -- initcap() capitalizes the first letter of every word in a string. 50 | initcap(region) AS region, 51 | count(*) AS country_count 52 | FROM 53 | cleaned_data.countries 54 | GROUP BY 55 | -- Aggregate functions 'count()' require you to group all column fields. 56 | region 57 | ORDER BY 58 | country_count DESC; 59 | ``` 60 |
61 |
62 | 63 | 3. List Sub-Regions and City Count 64 | List all of the sub-regions and the total number of cities in each sub-region. Order by sub-region name alphabetically. 65 | 66 |
67 | Click to expand expected results! 68 | 69 | ##### Expected Results: 70 | 71 | sub_region |city_count| 72 | -------------------------------|----------| 73 | Australia And New Zealand | 329| 74 | Central Asia | 560| 75 | Eastern Asia | 3164| 76 | Eastern Europe | 2959| 77 | Latin America And The Caribbean| 7204| 78 | Melanesia | 60| 79 | Micronesia | 15| 80 | Northern Africa | 1152| 81 | Northern America | 5844| 82 | Northern Europe | 2025| 83 | Polynesia | 22| 84 | Southeastern Asia | 2627| 85 | Southern Asia | 6848| 86 | Southern Europe | 3238| 87 | Subsaharan Africa | 3223| 88 | Western Asia | 1400| 89 | Western Europe | 3952| 90 | 91 |
92 |

93 | 94 |
95 | Click to expand answer! 96 | 97 | ##### Answer 98 | ```sql 99 | SELECT 100 | initcap(t1.sub_region) AS sub_region, 101 | count(*) AS city_count 102 | FROM 103 | cleaned_data.countries AS t1 104 | JOIN 105 | cleaned_data.cities AS t2 106 | ON 107 | t1.country_code_2 = t2.country_code_2 108 | GROUP BY 109 | t1.sub_region 110 | ORDER BY 111 | t1.sub_region; 112 | ``` 113 |
114 |
115 | 116 | 4. Specific Sub-Region and String Functions 117 | List all of the countries and the total number of cities in the Northern Europe sub-region. List the country names in uppercase and order the list by the length of the country name and alphabetically in ascending order. 118 | 119 |
120 | Click to expand expected results! 121 | 122 | ##### Expected Results: 123 | 124 | country_name |city_count| 125 | --------------|----------| 126 | JERSEY | 1| 127 | LATVIA | 39| 128 | NORWAY | 127| 129 | SWEDEN | 148| 130 | DENMARK | 75| 131 | ESTONIA | 20| 132 | FINLAND | 142| 133 | ICELAND | 12| 134 | IRELAND | 64| 135 | LITHUANIA | 61| 136 | ISLE OF MAN | 2| 137 | FAROE ISLANDS | 29| 138 | UNITED KINGDOM| 1305| 139 | 140 |
141 |

142 | 143 |
144 | Click to expand answer! 145 | 146 | ##### Answer 147 | ```sql 148 | SELECT 149 | -- upper() returns your string in uppercase. 150 | upper(t1.country_name) AS country_name, 151 | count(*) AS city_count 152 | FROM 153 | cleaned_data.countries AS t1 154 | JOIN 155 | cleaned_data.cities AS t2 156 | ON 157 | t1.country_code_2 = t2.country_code_2 158 | WHERE 159 | t1.sub_region = 'northern europe' 160 | GROUP BY 161 | t1.country_name 162 | ORDER BY 163 | -- length() returns the number or characters in a string including spaces. 164 | length(t1.country_name), t1.country_name; 165 | ``` 166 |
167 |
168 | 169 | 170 | 5. List Specific Countries by Year 171 | List all of the countries and the total number of cities in the Southern Europe sub-region that were inserted in 2021. Capitalize the country names and order alphabetically by the **LAST** letter of the country name and the number of cities. 172 | 173 |
174 | Click to expand expected results! 175 | 176 | ##### Expected Results: 177 | 178 | country_name |city_count| 179 | ----------------------|----------| 180 | Andorra | 5| 181 | Albania | 11| 182 | Bosnia And Herzegovina| 15| 183 | Croatia | 22| 184 | North Macedonia | 28| 185 | Malta | 32| 186 | Serbia | 58| 187 | Slovenia | 74| 188 | Greece | 64| 189 | Portugal | 109| 190 | Spain | 302| 191 | San Marino | 2| 192 | Montenegro | 12| 193 | Italy | 542| 194 | 195 |
196 |

197 | 198 |
199 | Click to expand answer! 200 | 201 | ##### Answer 202 | ```sql 203 | SELECT 204 | initcap(t1.country_name) AS country_name, 205 | count(*) AS city_count 206 | FROM 207 | cleaned_data.countries AS t1 208 | JOIN 209 | cleaned_data.cities AS t2 210 | ON 211 | t1.country_code_2 = t2.country_code_2 212 | WHERE 213 | t1.sub_region = 'southern europe' 214 | AND 215 | -- extract() & date_part() functions allow you to breakdown dates and timestamps to individual years, month, days, hours..... 216 | EXTRACT('year' FROM t2.insert_date) = 2021 217 | GROUP BY 218 | t1.country_name 219 | ORDER BY 220 | -- substring() function returns a specific portion of a string. 221 | substring(t1.country_name,length(t1.country_name),1), city_count; 222 | ``` 223 |
224 |
225 | 226 | 6. List Anti-Join 227 | List all of the countries in the region of Asia that did **NOT** have a city with an inserted date from June 2021 through Sept 2021. 228 | 229 |
230 | Click to expand expected results! 231 | 232 | ##### Expected Results: 233 | 234 | country_name | 235 | -----------------| 236 | Brunei Darussalam| 237 | Kuwait | 238 | Macao | 239 | Singapore | 240 | 241 |
242 |

243 | 244 |
245 | Click to expand answer! 246 | 247 | ##### Answer 248 | ```sql 249 | SELECT 250 | -- Distinct will only return unique values 251 | DISTINCT initcap(t1.country_name) AS country_name 252 | FROM 253 | cleaned_data.countries AS t1 254 | -- Left join will return all matching values from the left table (cleaned_data.countries) and 255 | -- only the matching values from the right table (cleaned_tables.cities) resulting in NULLS where 256 | -- there are no matches in the left table. 257 | LEFT JOIN 258 | cleaned_data.cities AS t2 259 | ON 260 | t1.country_code_2 = t2.country_code_2 261 | AND 262 | t2.insert_date BETWEEN '2021-06-01' AND '2021-10-01' 263 | WHERE 264 | t1.region = 'asia' 265 | -- Only return values that did NOT have a match with the countries table. 266 | AND 267 | t2.country_code_2 IS NULL; 268 | ``` 269 |
270 |
271 | 272 | 7. Reversable Names 273 | List the country, city name, population and city name length for the city names that are [palindromes](https://en.wikipedia.org/wiki/Palindrome) in the Western Asia sub-region. Format the population with a thousands separator (1,000) and format the length of the city name in roman numerals. Order by the length of the city name in descending order and alphabetically in ascending order. 274 | 275 |
276 | Click to expand expected results! 277 | 278 | ##### Expected Results: 279 | 280 | country_name |city_name|population|roman_numeral_length| 281 | --------------------|---------|----------|--------------------| 282 | Yemen |Hajjah | 46,568 | VI | 283 | Syrian Arab Republic|Hamah | 696,863 | V | 284 | Turkey |Kavak | 21,692 | V | 285 | Turkey |Kinik | 29,803 | V | 286 | Turkey |Tut | 10,161 | III | 287 | 288 |
289 |

290 | 291 |
292 | Click to expand answer! 293 | 294 | ##### Answer 295 | ```sql 296 | SELECT 297 | initcap(t1.country_name) AS country_name, 298 | initcap(t2.city_name) AS city_name, 299 | -- to_char() takes non-string data types and returns them as strings. 300 | to_char(t2.population, '999G999') AS population, 301 | to_char(length(t2.city_name), 'RN') AS roman_numeral_length 302 | FROM 303 | cleaned_data.countries AS t1 304 | JOIN 305 | cleaned_data.cities AS t2 306 | ON 307 | t1.country_code_2 = t2.country_code_2 308 | WHERE 309 | t2.city_name = reverse(t2.city_name) 310 | AND 311 | t1.sub_region = 'western asia' 312 | ORDER BY 313 | length(t2.city_name) DESC, t2.city_name ASC; 314 | ``` 315 |
316 |
317 | 318 | 8. Search with Wildcard and Case 319 | List all of the countries that end in 'stan'. Make your query case-insensitive and list whether the total population of the cities listed is an odd or even number for cities inserted in 2022. Order by whether the population value is odd or even in ascending order and country name in alphabetical order. 320 | 321 |
322 | Click to expand expected results! 323 | 324 | ##### Expected Results: 325 | 326 | country_name|total_population|odd_or_even| 327 | ------------|----------------|-----------| 328 | Afghanistan | 6,006,530 |Even | 329 | Kazakhstan | 4,298,264 |Even | 330 | Kyrgyzstan | 1,017,644 |Even | 331 | Pakistan | 26,344,480 |Even | 332 | Tajikistan | 2,720,953 |Odd | 333 | Turkmenistan| 419,607 |Odd | 334 | Uzbekistan | 3,035,547 |Odd | 335 | 336 |
337 |

338 | 339 |
340 | Click to expand answer! 341 | 342 | ##### Answer 343 | ```sql 344 | SELECT 345 | initcap(t1.country_name) AS country_name, 346 | to_char(sum(t2.population), '99G999G999') total_population, 347 | CASE 348 | WHEN (sum(t2.population) % 2) = 0 349 | THEN 'Even' 350 | ELSE 351 | 'Odd' 352 | END AS odd_or_even 353 | FROM 354 | cleaned_data.countries AS t1 355 | JOIN 356 | cleaned_data.cities AS t2 357 | ON 358 | t1.country_code_2 = t2.country_code_2 359 | WHERE 360 | t1.country_name ILIKE '%stan' 361 | AND 362 | EXTRACT('year' FROM t2.insert_date) = 2022 363 | GROUP BY 364 | t1.country_name 365 | ORDER BY 366 | odd_or_even, country_name; 367 | ``` 368 |
369 |
370 | 371 | 9. Ranking Regions 372 | List the third most populated city ranked by region WITHOUT using limit or offset. List the region name, city name, population and order the results by region. 373 | 374 |
375 | Click to expand expected results! 376 | 377 | ##### Expected Results: 378 | 379 | region |city_name|third_largest_pop| 380 | --------|---------|-----------------| 381 | Africa |Kinshasa | 12,836,000 | 382 | Americas|New York | 18,972,871 | 383 | Asia |Delhi | 32,226,000 | 384 | Europe |Paris | 11,060,000 | 385 | Oceania |Brisbane | 2,360,241 | 386 | 387 |
388 |

389 | 390 |
391 | Click to expand answer! 392 | 393 | ##### Answer 394 | ```sql 395 | WITH get_city_rank_cte AS ( 396 | SELECT 397 | t1.region, 398 | t2.city_name, 399 | t2.population AS third_largest_pop, 400 | DENSE_RANK() OVER (PARTITION BY t1.region ORDER BY t2.population DESC) AS rnk 401 | FROM 402 | cleaned_data.countries AS t1 403 | JOIN 404 | cleaned_data.cities AS t2 405 | ON 406 | t1.country_code_2 = t2.country_code_2 407 | WHERE 408 | t2.population IS NOT NULL 409 | GROUP BY 410 | t1.region, 411 | t2.city_name, 412 | t2.population 413 | ) 414 | SELECT 415 | initcap(region) AS region, 416 | initcap(city_name) AS city_name, 417 | to_char(third_largest_pop, '99G999G999') AS third_largest_pop 418 | FROM 419 | get_city_rank_cte 420 | WHERE 421 | rnk = 3; 422 | ``` 423 |
424 |
425 | 426 | 10. Using Buckets 427 | List the bottom third of all countries in the Western Asia sub-region that speak Arabic. Include the row number and country name. Order by row number. 428 | 429 |
430 | Click to expand expected results! 431 | 432 | ##### Expected Results: 433 | 434 | row_id|country_name | 435 | ------|--------------------| 436 | 9|saudi arabia | 437 | 10|syrian arab republic| 438 | 11|united arab emirates| 439 | 12|yemen | 440 | 441 |
442 |

443 | 444 |
445 | Click to expand answer! 446 | 447 | ##### Answer 448 | ```sql 449 | WITH get_ntile_cte AS ( 450 | SELECT 451 | ROW_NUMBER() OVER (ORDER BY t1.country_name) AS rn, 452 | t1.country_name, 453 | -- ntile() window functions returns groups of data section into 'buckets'. 454 | NTILE(3) OVER (ORDER BY t1.country_name) AS nt 455 | FROM 456 | cleaned_data.countries AS t1 457 | JOIN 458 | cleaned_data.languages AS t2 459 | ON 460 | t1.country_code_2 = t2.country_code_2 461 | WHERE 462 | t1.sub_region = 'western asia' 463 | AND 464 | t2.language = 'arabic' 465 | ) 466 | SELECT 467 | rn AS row_id, 468 | country_name 469 | FROM 470 | get_ntile_cte 471 | WHERE 472 | nt = 3; 473 | ``` 474 |
475 |
476 | 477 | 11. Using Arrays 478 | Create a query that lists country name, capital name, population, languages spoken and currency name for countries in the Northen Africa sub-region. There can be multiple currency names and languages spoken per country. Add multiple values for the same field into an array. 479 | 480 |
481 | Click to expand expected results! 482 | 483 | ##### Expected Results: 484 | 485 | country_name|city_name|population|languages |currencies | 486 | ------------|---------|----------|--------------------------------------------|---------------| 487 | algeria |algiers | 3415811|{french,arabic,kabyle} |algerian dinar | 488 | egypt |cairo | 20296000|{arabic} |egyptian pound | 489 | libya |tripoli | 1293016|{arabic} |libyan dinar | 490 | morocco |rabat | 572717|{arabic,tachelhit,moroccan tamazight,french}|moroccan dirham| 491 | sudan |khartoum | 7869000|{arabic,english} |sudanese pound | 492 | tunisia |tunis | 1056247|{french,arabic} |tunisian dinar | 493 | 494 |
495 |

496 | 497 |
498 | Click to expand answer! 499 | 500 | ##### Answer 501 | ```sql 502 | WITH get_row_values AS ( 503 | SELECT 504 | t1.country_name, 505 | t2.city_name, 506 | t2.population, 507 | -- array_agg() aggregates multiple values and returns them in 'array' format. 508 | array_agg(t3.LANGUAGE) AS languages, 509 | t4.currency_name AS currencies 510 | FROM 511 | cleaned_data.countries AS t1 512 | JOIN 513 | cleaned_data.cities AS t2 514 | ON 515 | t1.country_code_2 = t2.country_code_2 516 | JOIN 517 | cleaned_data.languages AS t3 518 | ON 519 | t1.country_code_2 = t3.country_code_2 520 | JOIN 521 | cleaned_data.currencies AS t4 522 | ON 523 | t1.country_code_2 = t4.country_code_2 524 | WHERE 525 | t1.sub_region = 'northern africa' 526 | AND 527 | t2.capital = TRUE 528 | GROUP BY 529 | t1.country_name, 530 | t2.city_name, 531 | t2.population, 532 | t4.currency_name 533 | ) 534 | SELECT 535 | * 536 | FROM 537 | get_row_values; 538 | ``` 539 |
540 |
541 | 542 | 12. Using Case and Percentages 543 | Produce a query that returns the city names for cities in the U.S. that were inserted on April, 28th 2022. List how many vowels and consonants are present in the city name and concatnate their percentage to the their respective count in parenthesis. 544 | 545 |
546 | Click to expand expected results! 547 | 548 | ##### Expected Results: 549 | 550 | city_name |vowel_count_perc|consonants_count_perc| 551 | ---------------|----------------|---------------------| 552 | standish |2 (25.00%) |6 (75%) | 553 | grand forks |2 (18.18%) |9 (81.82%) | 554 | camano |3 (50.00%) |3 (50%) | 555 | cedar hills |3 (27.27%) |8 (72.73%) | 556 | gladstone |3 (33.33%) |6 (66.67%) | 557 | whitehall |3 (33.33%) |6 (66.67%) | 558 | homewood |4 (50.00%) |4 (50%) | 559 | willowbrook |4 (36.36%) |7 (63.64%) | 560 | port salerno |4 (33.33%) |8 (66.67%) | 561 | vadnais heights|5 (33.33%) |10 (66.67%) | 562 | jeffersonville |5 (35.71%) |9 (64.29%) | 563 | 564 |
565 |

566 | 567 |
568 | Click to expand answer! 569 | 570 | ##### Answer 571 | ```sql 572 | WITH get_letter_count AS ( 573 | SELECT 574 | t1.city_name, 575 | length(t1.city_name) string_length, 576 | -- regexp_replace() returns a vlue that has been manipulated by a regular expression. 577 | length(regexp_replace(t1.city_name, '[aeiou]', '', 'gi')) AS consonant_count 578 | FROM 579 | cleaned_data.cities AS t1 580 | WHERE 581 | t1.insert_date = '2022-04-28' 582 | AND 583 | t1.country_code_2 in ('us') 584 | ), 585 | get_letter_diff AS ( 586 | SELECT 587 | city_name, 588 | string_length - consonant_count AS vowels, 589 | round(100 * (string_length - consonant_count) / string_length::NUMERIC, 2) AS vowel_perc, 590 | consonant_count AS consonants, 591 | round( 100 * (consonant_count)::NUMERIC / string_length, 2)::float AS consonants_perc 592 | FROM 593 | get_letter_count 594 | ) 595 | SELECT 596 | city_name, 597 | vowels || ' (' || vowel_perc || '%)' AS vowel_count_perc, 598 | consonants || ' (' || consonants_perc || '%)' AS consonants_count_perc 599 | FROM 600 | get_letter_diff 601 | ORDER BY 602 | vowels; 603 | ``` 604 |
605 |
606 | 607 | 13. Most Consecutive Days 608 | List the most consecutive inserted dates and the capitalized city names for cities in Canada that where inserted in April 2022. 609 | 610 |
611 | Click to expand expected results! 612 | 613 | ##### Expected Results: 614 | 615 | most_consecutive_dates|city_name | 616 | ----------------------|------------| 617 | 2022-04-22|South Dundas| 618 | 2022-04-23|La Prairie | 619 | 2022-04-24|Elliot Lake | 620 | 2022-04-25|Lachute | 621 | 622 |
623 |

624 | 625 |
626 | Click to expand answer! 627 | 628 | ##### Answer 629 | ```sql 630 | DROP TABLE IF EXISTS get_dates; 631 | CREATE TEMP TABLE get_dates AS ( 632 | SELECT 633 | DISTINCT ON (insert_date) insert_date AS insert_date, 634 | city_name 635 | FROM 636 | cleaned_data.cities 637 | WHERE 638 | country_code_2 = 'ca' 639 | AND 640 | insert_date BETWEEN '2022-04-01' AND '2022-04-30' 641 | ORDER BY 642 | insert_date 643 | ); 644 | 645 | DROP TABLE IF EXISTS get_diff; 646 | CREATE TEMP TABLE get_diff AS ( 647 | SELECT 648 | city_name, 649 | insert_date, 650 | EXTRACT('day' FROM insert_date) - ROW_NUMBER() OVER (ORDER BY insert_date) AS diff 651 | FROM 652 | get_dates 653 | ); 654 | 655 | 656 | DROP TABLE IF EXISTS get_diff_count; 657 | CREATE TEMP TABLE get_diff_count AS ( 658 | SELECT 659 | city_name, 660 | insert_date, 661 | count(*) OVER (PARTITION BY diff) AS diff_count 662 | FROM 663 | get_diff 664 | ); 665 | 666 | 667 | WITH get_rank AS ( 668 | SELECT 669 | DENSE_RANK() OVER (ORDER BY diff_count desc) AS rnk, 670 | insert_date, 671 | city_name 672 | FROM 673 | get_diff_count 674 | ) 675 | SELECT 676 | insert_date AS most_consecutive_dates, 677 | initcap(city_name) AS city_name 678 | FROM 679 | get_rank 680 | WHERE 681 | rnk = 1 682 | ORDER BY 683 | insert_date; 684 | ``` 685 |
686 |
687 | 688 | 14. Month over Month in View 689 | Create a view that lists the month-year, the number of cities inserted for that month, a running city count total and the month over month percentage growth for 2021. 690 | 691 | Format the cities count and the running total with the thousands separator and format the month over month growth with a plus symbol and percentage symbol 692 | 693 | Example: 694 | month_year|cities_inserted|running_total|month_over_month| 695 | ----------|---------------|-------------|----------------| 696 | Feb-2021 | 1,291 | 2,762 |+87.76% | 697 | 698 |
699 | Click to expand expected results! 700 | 701 | ##### Expected Results: 702 | 703 | month_year|cities_inserted|running_total|month_over_month| 704 | ----------|---------------|-------------|----------------| 705 | Jan-2021 | 1,471 | 1,471 | | 706 | Feb-2021 | 1,291 | 2,762 |+87.76% | 707 | Mar-2021 | 1,485 | 4,247 |+53.77% | 708 | Apr-2021 | 1,508 | 5,755 |+35.51% | 709 | May-2021 | 1,509 | 7,264 |+26.22% | 710 | Jun-2021 | 1,395 | 8,659 |+19.20% | 711 | Jul-2021 | 1,394 | 10,053 |+16.10% | 712 | Aug-2021 | 1,481 | 11,534 |+14.73% | 713 | Sep-2021 | 1,452 | 12,986 |+12.59% | 714 | Oct-2021 | 1,446 | 14,432 |+11.14% | 715 | Nov-2021 | 1,378 | 15,810 |+9.55% | 716 | Dec-2021 | 1,472 | 17,282 |+9.31% | 717 | 718 |
719 |

720 | 721 |
722 | Click to expand answer! 723 | 724 | ##### Answer 725 | ```sql 726 | DROP VIEW IF EXISTS cleaned_data.view_2021_growth; 727 | CREATE VIEW cleaned_data.view_2021_growth AS ( 728 | WITH get_month_count AS ( 729 | SELECT 730 | date_trunc('month', insert_date) as single_month, 731 | count(*) AS monthly_count 732 | FROM 733 | cleaned_data.cities 734 | WHERE 735 | EXTRACT('year' FROM insert_date) = 2021 736 | GROUP BY 737 | single_month 738 | ORDER BY 739 | single_month 740 | ), 741 | get_running_total AS ( 742 | SELECT 743 | single_month::date, 744 | monthly_count, 745 | sum(monthly_count) OVER (ORDER BY single_month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_num_cities 746 | FROM 747 | get_month_count 748 | ), 749 | get_month_over_month AS ( 750 | SELECT 751 | single_month, 752 | monthly_count, 753 | total_num_cities, 754 | round(100.0 * ((total_num_cities - Lag(total_num_cities, 1) OVER (ORDER BY single_month)) / Lag(total_num_cities, 1) OVER (ORDER BY single_month))::NUMERIC, 2) AS month_over_month 755 | FROM 756 | get_running_total 757 | ) 758 | SELECT 759 | to_char(single_month, 'Mon-YYYY') AS month_year, 760 | to_char(monthly_count, '9G999') AS cities_inserted, 761 | to_char(total_num_cities, '99G999') AS running_total, 762 | to_char(month_over_month, 'sg99.99') || '%' AS month_over_month 763 | FROM 764 | get_month_over_month 765 | ); 766 | 767 | SELECT 768 | * 769 | FROM 770 | cleaned_data.view_2021_growth; 771 | ``` 772 |
773 |
774 | 775 | 15. Stored Procedure to CSV 776 | Create and call a stored procedure that lists a unique row id number, insert date, country name, city name, population and languages spoken for countries in the Latin America and Caribbean sub-region that were inserted on either '2022-04-09', '2022-04-28' or '2022-08-11'. 777 | 778 | Order by the insert date and output the results (including headers) to a CSV file located in [/source_data/csv_output/](../source_data/csv_output/) . 779 | 780 |
781 | Click to expand expected results! 782 | 783 | ##### Expected Results: 784 | 785 | Results located in [/source_data/csv_output/output.csv](../source_data/csv_output/output.csv) 786 | 787 |
788 |

789 | 790 |
791 | Click to expand answer! 792 | 793 | ##### Answer 794 | ```sql 795 | CREATE OR REPLACE PROCEDURE cleaned_data.sproc_output () 796 | LANGUAGE plpgsql 797 | AS 798 | $sproc$ 799 | BEGIN 800 | COPY ( 801 | SELECT 802 | ROW_NUMBER() OVER (ORDER BY t1.insert_date) AS row_id, 803 | t1.insert_date, 804 | t2.country_name, 805 | t1.city_name, 806 | t1.population, 807 | array_agg(t3.language) AS languages 808 | FROM 809 | cleaned_data.cities AS t1 810 | JOIN 811 | cleaned_data.countries AS t2 812 | ON 813 | t1.country_code_2 = t2.country_code_2 814 | LEFT JOIN 815 | cleaned_data.languages AS t3 816 | ON 817 | t2.country_code_2 = t3.country_code_2 818 | WHERE 819 | t2.sub_region = 'latin america and the caribbean' 820 | AND 821 | t1.insert_date IN ('2022-04-09', '2022-04-28', '2022-08-11') 822 | GROUP BY 823 | t1.insert_date, 824 | t2.country_name, 825 | t1.city_name, 826 | t1.population 827 | ORDER BY 828 | t1.insert_date 829 | ) 830 | TO '/var/lib/postgresql/source_data/csv_output/output.csv' DELIMITER ',' CSV HEADER; 831 | END 832 | $sproc$; 833 | 834 | -- Call the stored procedure 835 | CALL cleaned_data.sproc_output(); 836 | ``` 837 |
838 |
839 | 
840 | :warning: To start over with an empty database, run the `drop_all_tables.sql` script. :warning: 
841 | 
842 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation: 
843 | 
844 | 
845 | 
846 | 
--------------------------------------------------------------------------------
/walkthrough/WALKTHROUGH_BONUS_BASH.md:
--------------------------------------------------------------------------------
1 | ## Basic/Intermediate SQL Code Challenge 
2 | 
3 | **Author**: Jaime M. Shaker<br>
4 | **Email**: jaime.m.shaker@gmail.com
5 | **Website**: https://www.shaker.dev
6 | **LinkedIn**: https://www.linkedin.com/in/jaime-shaker/
7 | 
8 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
9 | 
10 | ### Bash Script
11 | 
12 | What is Bash?
13 | * The name stands for **B**ourne **A**gain **SH**ell. It is an open source version of the Bourne Shell and was first released in 1989. A shell is a command-line interpreter that provides a command line user interface for operating systems.
14 | 
15 | What is a Bash script?
16 | * A Bash script is a plain text file which contains a series of commands. These are the commands we would normally type ourselves on the command line. Using [psql](https://www.postgresguide.com/utilities/psql/), we can create a Bash script that will execute our `.sql` files in order and always start with a fresh, clean database every time we run the script.
17 | 
18 | Do I have Bash on my computer?
19 | * If you are following along this walkthrough and using the Docker container, the answer is YES! Bash is the default shell on most Linux distributions and was the default on macOS until version 10.15; on newer macOS versions it remains available as an alternative shell.
20 | 
21 | To use the Bash shell, let's connect to our Docker/PostgreSQL container by opening another command line terminal and typing in this command.
22 | 
23 | `docker exec -ti sql_coding_challenge bash`
24 | 
25 | ![alt text](../images/bash_1.PNG)
26 | 
27 | After executing this command, you should be logged into the Docker container as `root`. Keep in mind that the `@a80c64aa877f` portion after `root` is this container's ID, which will be different for your unique container.
28 | 
29 | ![alt text](../images/bash_2.PNG)
30 | 
31 | We are logged in as the `root` user. Let's `substitute user` with the `su` command. Running around your file system as the `root` user can lead to problems if you don't know what you're doing.
In the terminal, type and execute the command
32 | 
33 | `su postgres`
34 | 
35 | ![alt text](../images/bash_3.PNG)
36 | 
37 | We are now connected to the container as the user `postgres`. Now we can `cd` (change directory) to where our scripts are located, which we initially set up as the `/source_data/scripts/` directory.
38 | 
39 | `cd ~/source_data/scripts/`
40 | 
41 | We are now in our `source_data/scripts` directory.
42 | 
43 | ![alt text](../images/bash_4.PNG)
44 | 
45 | Create a new file in the `source_data/scripts` directory and name it
46 | 
47 | `start_fresh.sh`
48 | 
49 | This is the Bash script that will execute the psql commands. Inside the `start_fresh.sh` file, the script should look like this.
50 | 
51 | ```bash
52 | #!/bin/bash
53 | 
54 | psql sql_coding_challenge -f drop_all_tables.sql
55 | psql sql_coding_challenge -f build_tables.sql
56 | psql sql_coding_challenge -f normalize_tables.sql
57 | psql sql_coding_challenge -f remove_duplicates.sql
58 | psql sql_coding_challenge -f cleanup_db.sql
59 | psql sql_coding_challenge -f create_relationships.sql
60 | echo "Build Process Complete"
61 | ```
62 | 
63 | To run this script, use the following command in the same directory as the script.
64 | 
65 | `./start_fresh.sh` or `bash start_fresh.sh`
66 | 
67 | :exclamation: Note: :exclamation:
68 | * If, upon trying to run the script, you receive a `Permission Denied` error, you must run this command first, then try again.
69 | * `chmod u+x start_fresh.sh`
70 | 
71 | ![alt text](../images/bash_5.PNG)
72 | 
73 | Once started, the script executes the `.sql` files we just created; when it completes, the output should look similar to this.
74 | 
75 | ![alt text](../images/bash_6.PNG)
76 | 
77 | We are now right back to where we were when we first started this page.
78 | 
79 | However, we now have a script that runs our SQL files and produces a fresh setup every time.
80 | 
81 | We are now ready to complete the `SQL Coding Challenge`.
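As a variation, the same sequence can be written as a loop with `set -e`, so the build stops at the first failing file instead of continuing with a half-built database. This is a sketch, not part of the challenge; it assumes the same database and file names as `start_fresh.sh`:

```shell
#!/bin/bash
set -e  # abort immediately if any psql invocation fails

DB="sql_coding_challenge"
# The .sql files, in the same dependency order as start_fresh.sh.
SCRIPTS="drop_all_tables.sql build_tables.sql normalize_tables.sql \
remove_duplicates.sql cleanup_db.sql create_relationships.sql"

for f in $SCRIPTS; do
    # Guard so the sketch is harmless where psql is not installed.
    if command -v psql > /dev/null 2>&1; then
        echo "Running ${f}..."
        psql "$DB" -f "$f"
    fi
done
echo "Build Process Complete"
```

The loop buys little for six files, but the early-exit behavior of `set -e` matters: without it, a failure in `build_tables.sql` would still let the remaining scripts run against missing tables.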
82 | 
83 | Click the link below.
84 | 
85 | Go to [WALKTHROUGH_CODE_CHALLENGE](WALKTHROUGH_7_CODE_CHALLENGE.md)
86 | 
87 | :exclamation: If you find this repository helpful, please consider giving it a :star:. Thanks! :exclamation:
88 | 
89 | 
90 | 
91 | 
--------------------------------------------------------------------------------