├── .gitignore
├── LICENSE
├── Practical-SQL-exercises.Rproj
├── README.Rmd
├── README.md
├── assets
│   └── cover.jpg
├── exercises
│   ├── Chapter-01.sql
│   ├── Chapter-02.sql
│   ├── Chapter-03.sql
│   ├── Chapter-04.sql
│   ├── Chapter-05.sql
│   ├── Chapter-06.sql
│   ├── Chapter-07.sql
│   ├── Chapter-08.sql
│   ├── Chapter-09.sql
│   ├── Chapter-10.sql
│   ├── Chapter-11.sql
│   ├── Chapter-12.sql
│   ├── Chapter-13.sql
│   ├── Chapter-14.sql
│   ├── Chapter-15.sql
│   ├── Chapter-16.sql
│   ├── Chapter-17.sql
│   ├── Chapter-18.sql
│   └── extra.sql
└── feedback
    ├── feedback.Rmd
    └── feedback.pdf
/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .Rhistory
3 | .RData
4 | .Ruserdata
5 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2021 Indrajeet Patil
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/Practical-SQL-exercises.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 |
3 | RestoreWorkspace: No
4 | SaveWorkspace: No
5 | AlwaysSaveHistory: No
6 |
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 |
12 | RnwWeave: knitr
13 | LaTeX: pdfLaTeX
14 |
15 | AutoAppendNewline: Yes
16 | StripTrailingWhitespace: Yes
17 |
18 | QuitChildProcessesOnExit: Yes
19 | DisableExecuteRprofile: Yes
20 |
--------------------------------------------------------------------------------
/README.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Practical-SQL exercises"
3 | output: github_document
4 | ---
5 |
6 | ```{r setup, include=FALSE}
7 | knitr::opts_chunk$set(echo = TRUE)
8 | ```
9 |
10 | Solutions to exercises from the _Practical SQL_ (1st Edition) [book](https://nostarch.com/practicalSQL).
11 |
12 | - [My solutions](https://github.com/IndrajeetPatil/Practical-SQL-exercises/tree/master/exercises)
13 |
14 | - [Author's solutions](https://github.com/anthonydb/practical-sql/blob/master/Try_It_Yourself/Try_It_Yourself.sql)
15 |
16 | ```{r, out.width="60%", echo=FALSE}
17 | knitr::include_graphics("assets/cover.jpg")
18 | ```
19 |
20 | ## See also
21 |
22 | ### Code and data
23 |
24 | - [1st Edition Code and Data](https://github.com/anthonydb/practical-sql)
25 |
26 | - [2nd Edition Code and Data](https://github.com/anthonydb/practical-sql-2)
27 |
28 | ### Resources
29 |
30 | - [A curated list of awesome PostgreSQL software, libraries, tools and resources](https://github.com/dhamaniasad/awesome-postgres)
31 |
32 | ### Useful tips for debugging a few common problems
33 |
34 | - [If you find pgAdmin is too slow on Windows](https://dba.stackexchange.com/questions/201646/slow-connect-time-to-postgresql-on-windows-10)
35 |
36 | - [If you have problems importing data on Windows](https://stackoverflow.com/a/65459173/7973626)
37 |
38 | - [Configuring binary path on Windows](https://dba.stackexchange.com/questions/149169/binary-path-in-the-pgadmin-preferences)
39 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Practical-SQL exercises
2 | ================
3 |
4 | Solutions to exercises from the *Practical SQL* (1st Edition)
5 | [book](https://nostarch.com/practicalSQL).
6 |
7 | - [My
8 | solutions](https://github.com/IndrajeetPatil/Practical-SQL-exercises/tree/master/exercises)
9 |
10 | - [Author’s
11 | solutions](https://github.com/anthonydb/practical-sql/blob/master/Try_It_Yourself/Try_It_Yourself.sql)
12 |
13 |
14 |
15 | ## See also
16 |
17 | ### Code and data
18 |
19 | - [1st Edition Code and
20 | Data](https://github.com/anthonydb/practical-sql)
21 |
22 | - [2nd Edition Code and
23 | Data](https://github.com/anthonydb/practical-sql-2)
24 |
25 | ### Resources
26 |
27 | - [A curated list of awesome PostgreSQL software, libraries, tools and
28 | resources](https://github.com/dhamaniasad/awesome-postgres)
29 |
30 | ### Useful tips for debugging a few common problems
31 |
32 | - [If you find pgAdmin is too slow on
33 | Windows](https://dba.stackexchange.com/questions/201646/slow-connect-time-to-postgresql-on-windows-10)
34 |
35 | - [If you have problems importing data on
36 | Windows](https://stackoverflow.com/a/65459173/7973626)
37 |
38 | - [Configuring binary path on
39 | Windows](https://dba.stackexchange.com/questions/149169/binary-path-in-the-pgadmin-preferences)
40 |
--------------------------------------------------------------------------------
/assets/cover.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IndrajeetPatil/Practical-SQL-exercises/fb02667a2070212d7ff234ba79fc8941e866f2c0/assets/cover.jpg
--------------------------------------------------------------------------------
/exercises/Chapter-01.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 1 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | ---------
10 | -- Q1 --
11 | ---------
12 |
13 |
14 | -- create a database for a zoo
15 |
16 | CREATE DATABASE zoo;
17 |
18 | -- create a table describing the animals in a zoo (humans not included ;-) )
19 | -- total_count is smallint because a zoo won't have thousands of the same animal
20 |
21 | CREATE TABLE animals (
22 | species_counter smallserial,
23 | species_name varchar(100),
24 | total_count smallint,
25 | avg_weight numeric,
26 | date_acquisition date
27 | );
28 |
29 |
30 | ---------
31 | -- Q2 --
32 | ---------
33 |
34 |
35 | -- insert some made up data into it
36 |
37 | INSERT INTO
38 | animals (
39 | species_name,
40 | total_count,
41 | avg_weight,
42 | date_acquisition
43 | )
44 | VALUES
45 | ('Zebra', 2, 200.10, '2020-2-3'),
46 | ('Lion', 1, 190.50, '2018-7-8'),
47 | ('Tiger', 2, 150.40, '2017-12-1');
48 |
49 |
--------------------------------------------------------------------------------
/exercises/Chapter-02.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 2 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | ---------
10 | -- Q1 --
11 | ---------
12 |
13 |
14 | -- list in the format specified by the superintendent
15 |
16 | SELECT
17 | school,
18 | last_name
19 | FROM
20 | teachers
21 | ORDER BY
22 | school,
23 | last_name DESC;
24 |
25 |
26 | ---------
27 | -- Q2 --
28 | ---------
29 |
30 |
31 | -- find teachers with specified conditions
32 |
33 | SELECT
34 | *
35 | FROM
36 | teachers
37 | WHERE
38 | (first_name LIKE 'S%')
39 | AND (salary > 40000);
40 |
41 |
42 | ---------
43 | -- Q3 --
44 | ---------
45 |
46 |
47 | -- find and order teachers by hiring date and salary
48 |
49 | SELECT
50 | *
51 | FROM
52 | teachers
53 | WHERE
54 | hire_date > '2010-1-1'
55 | ORDER BY
56 | salary DESC;
57 |
--------------------------------------------------------------------------------
/exercises/Chapter-03.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 3 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | ---------------
10 | -- Q1 and Q2 --
11 | ---------------
12 |
13 |
14 | -- create a database for a grocery delivery company
15 |
16 | CREATE DATABASE grocery;
17 |
18 | -- create a table for drivers
19 | -- choose the correct datatype for miles driven with required precision
20 | /*
21 | since mileage can't be more than 999 (3 digits to the left of the decimal point), and
22 | the needed precision is one tenth of a mile (1 digit to the right),
23 | the correct datatype would be decimal(4, 1)
24 | */
25 | -- why have separate columns for first and last names?
26 | /*
27 | - we can then sort either by first or last names
28 | - query them further (e.g. check if two drivers have same name)
29 | - etc.
30 | */
31 |
32 | CREATE TABLE drivers(
33 | driver_id serial,
34 | first_name varchar(50),
35 | last_name varchar(50),
36 | workday date,
37 | clock_in time with time zone,
38 | clock_out time with time zone,
39 | miles_driven decimal(4, 1)
40 | );
41 |
42 | -- add entries to the table
43 |
44 | INSERT INTO
45 | drivers (
46 | first_name,
47 | last_name,
48 | workday,
49 | clock_in,
50 | clock_out,
51 | miles_driven
52 | )
53 | VALUES
54 | (
55 | 'Adam',
56 | 'Driver',
57 | '2021-04-05',
58 | '08:15 GMT+1',
59 | '13:00 GMT+1',
60 | 123.4545
61 | ),
62 | (
63 | 'Adam',
64 | 'Driver',
65 | '2021-04-05',
66 | '11:30 CET',
67 | '18:00 CET',
68 | 234
69 | ),
70 | (
71 | 'Krystyna',
72 | 'Pszczyna',
73 | '2021-04-06',
74 | '13:15 EST',
75 | '23:00 EST',
76 | 678.4344
77 | );
78 |
79 |
80 | ---------
81 | -- Q3 --
82 | ---------
83 |
84 |
85 | -- what do you see when a malformed date entry is cast to timestamp?
86 |
87 | CREATE TABLE drivers2(
88 | first_name varchar(50),
89 | workhours timestamp
90 | );
91 |
92 | INSERT INTO
93 | drivers2 (first_name, workhours)
94 | VALUES
95 | ('X', '4//2017');
96 |
97 | /*
98 | ERROR: invalid input syntax for type timestamp: "4//2017"
99 | LINE 9: ('X', '4//2017');
100 | ^
101 | SQL state: 22007
102 | Character: 146
103 | */
104 |
105 | -- delete this unnecessary table
106 |
107 | DROP TABLE drivers2;
108 |
--------------------------------------------------------------------------------
/exercises/Chapter-04.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 4 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | ---------
10 | -- Q1 --
11 | ---------
12 |
13 |
14 | -- importing specified text file
15 | -- Ref.
16 |
17 | COPY movies
18 | FROM -- your file path will be different
19 | 'C:\Users\IndrajeetPatil\Documents\GitHub\Practical-SQL-exercises\assets\ch4q1.txt'
20 | WITH (FORMAT CSV, HEADER, DELIMITER ':', quote '#');
21 |
22 |
23 | ---------
24 | -- Q2 --
25 | ---------
26 |
27 |
28 | -- counties with most housing units
29 |
30 | COPY (
31 | SELECT
32 | geo_name,
33 | state_us_abbreviation,
34 | housing_unit_count_100_percent
35 | FROM
36 | us_counties_2010
37 | ORDER BY
38 | housing_unit_count_100_percent DESC
39 | LIMIT
40 | 20
41 | )
42 | TO -- your file path will be different
43 | 'C:\Users\IndrajeetPatil\Documents\GitHub\Practical-SQL-exercises\assets\ch4q2.txt'
44 | WITH (FORMAT csv, HEADER);
45 |
46 |
47 | ---------
48 | -- Q3 --
49 | ---------
50 |
51 |
52 | -- fixed-point numbers
53 | /*
54 | numeric(3,8) will not work for the provided values and you will get the error
55 | 'NUMERIC scale 8 must be between 0 and precision 3'.
56 | This is because with this precision and scale all digits would be to the right of the decimal
57 | point, and so the value would need to be between 0 and 1
58 | */
59 |
60 | SELECT
61 | CAST(17519.668 AS numeric(3, 8));
62 |
63 | -- it instead needs to be the following
64 | -- scale of 3 means 3 digits to the right of the decimal point
65 |
66 | SELECT
67 | CAST(17519.668 AS numeric(8, 3));
68 |
69 | -- if the value were to be between 0 and 1, the following will work
70 |
71 | SELECT
72 | CAST(0.17519668 AS numeric(8, 8));
73 |
--------------------------------------------------------------------------------
/exercises/Chapter-05.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 5 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | ---------
10 | -- Q1 --
11 | ---------
12 |
13 |
14 | -- calculating the area of a circle whose radius is 5 inches
15 | /* no parentheses needed because the exponentiation operator (^) has
16 | a higher precedence than multiplication
17 | */
18 |
19 | SELECT
20 | 3.14 * 5 ^ 2;
21 |
22 |
23 | ---------
24 | -- Q2 --
25 | ---------
26 |
27 |
28 | /*
29 | You first need to have created a database (e.g. CREATE DATABASE us_census;) and then the us_counties_2010 table;
30 | code here: https://github.com/anthonydb/practical-sql/blob/ebc2233c41f2ba18512daf7c4a1a2f4c7f72a520/Chapter_04/Chapter_04.sql#L20-L123
31 |
32 | Importing data from csv (the path to the file will be different for everyone):
33 | - If you get a 'Permission denied' error, see: https://stackoverflow.com/a/65459173/7973626
34 | - If that still doesn't work, use the pgAdmin GUI to import the data;
35 |   to configure the binary path, see: https://dba.stackexchange.com/questions/149169/binary-path-in-the-pgadmin-preferences
36 | */
37 |
38 | -- Q2. find the NY county with the highest percentage of people identifying
39 | -- as American Indian and Alaska Native alone
40 |
41 | COPY us_counties_2010
42 | FROM -- your file path will be different
43 | 'C:\Users\IndrajeetPatil\Documents\GitHub\Practical-SQL-exercises\assets'
44 | WITH (FORMAT csv, HEADER);
45 |
46 | -- Franklin County, with about 7% of people identifying as American Indian and Alaska Native alone
47 | -- You can see the map here: https://www.census.gov/quickfacts/fact/map/franklincountynewyork/RHI325219
48 | -- As to why? I have no clue. A Google search wasn't that helpful.
49 |
50 | SELECT
51 | geo_name,
52 | (
53 | CAST(p0010005 AS numeric(8, 1)) / population_count_100_percent
54 | ) * 100 AS "prop"
55 | FROM
56 | us_counties_2010
57 | WHERE
58 | state_us_abbreviation = 'NY'
59 | ORDER BY
60 | prop DESC;
61 |
62 |
63 | ---------
64 | -- Q3 --
65 | ---------
66 |
67 |
68 | -- the median county population in CA (179,140.5) was higher than in NY (91,301)
69 |
70 | SELECT
71 | state_us_abbreviation,
72 | percentile_cont(0.5) WITHIN GROUP (
73 | ORDER BY
74 | population_count_100_percent
75 | )
76 | FROM
77 | us_counties_2010
78 | WHERE
79 | state_us_abbreviation IN ('NY', 'CA')
80 | GROUP BY
81 | state_us_abbreviation;
82 |
--------------------------------------------------------------------------------
/exercises/Chapter-06.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 6 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | ---------
10 | -- Q1 --
11 | ---------
12 |
13 |
14 | -- The curious case of missing counties
15 | -- identify counties that exist in only one of the two tables
16 | /*
17 | Present in 2000 but not in 2010 (these areas were dissolved or reorganized into new boroughs/census areas)
18 | - Clifton Forge city, Virginia
19 | - Prince of Wales-Outer Ketchikan Census Area, Alaska
20 | - Skagway-Hoonah-Angoon Census Area, Alaska
21 | - Wrangell-Petersburg Census Area, Alaska
22 |
23 | Present in 2010 but not in 2000 (newly created counties)
24 | - Broomfield County
25 | - Hoonah-Angoon Census Area
26 | - Petersburg Census Area
27 | - Prince of Wales-Hyder Census Area
28 | - Skagway Municipality
29 | - Wrangell City and Borough
30 | */
31 |
32 | SELECT
33 | c2000.geo_name AS county_2000_name,
34 | c2000.county_fips AS county_2000,
35 | c2010.geo_name AS county_2010_name,
36 | c2010.county_fips AS county_2010
37 | FROM
38 | us_counties_2010 AS c2010 FULL
39 | JOIN us_counties_2000 AS c2000 ON c2010.state_fips = c2000.state_fips
40 | AND c2010.county_fips = c2000.county_fips
41 | WHERE
42 | (c2010.county_fips IS NULL)
43 | OR (c2000.county_fips IS NULL)
44 | ORDER BY
45 | county_2010_name;
46 |
47 |
48 | ---------
49 | -- Q2 --
50 | ---------
51 |
52 |
53 | -- median of the percent change in county population: 3.2%
54 |
55 | SELECT
56 | percentile_cont(.5) WITHIN GROUP (
57 | ORDER BY
58 | (
59 | (c2010.p0010001::numeric - c2000.p0010001) / c2000.p0010001 * 100
60 | )
61 | ) AS median_pct_change
62 | FROM
63 | us_counties_2010 AS c2010
64 | INNER JOIN us_counties_2000 AS c2000 ON c2010.state_fips = c2000.state_fips
65 | AND c2010.county_fips = c2000.county_fips
66 | AND c2010.p0010001 <> c2000.p0010001;
67 |
68 |
69 | ---------
70 | -- Q3 --
71 | ---------
72 |
73 |
74 | -- the greatest percentage loss of population
75 | /*
76 | The greatest percentage loss of population between 2000 and 2010 was seen
77 | in St. Bernard Parish, Louisiana (-46.6%). This was because, on August 29, 2005,
78 | St. Bernard was devastated by Hurricane Katrina.
79 | */
80 |
81 | SELECT
82 | c2010.geo_name,
83 | c2010.state_us_abbreviation AS st,
84 | c2010.p0010001 AS pop_2010,
85 | c2000.p0010001 AS pop_2000,
86 | round(
87 | (c2010.p0010001::numeric - c2000.p0010001) / c2000.p0010001 * 100,
88 | 1
89 | ) AS pct_change
90 | FROM
91 | us_counties_2010 AS c2010
92 | INNER JOIN us_counties_2000 AS c2000 ON c2010.state_fips = c2000.state_fips
93 | AND c2010.county_fips = c2000.county_fips
94 | AND c2010.p0010001 <> c2000.p0010001
95 | ORDER BY
96 | pct_change ASC;
97 |
--------------------------------------------------------------------------------
/exercises/Chapter-07.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 7 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | ---------
10 | -- Q1 --
11 | ---------
12 |
13 |
14 | -- Choosing primary and foreign keys and imposing constraints with justification
15 |
16 | CREATE DATABASE vinyl;
17 |
18 | CREATE TABLE albums (
19 | album_id bigserial,
20 | --each album should have a unique catalog code
21 | album_catalog_code varchar(100) NOT NULL,
22 | -- why not unique?: multiple artists can have albums with the same title
23 | album_title text,
24 | -- why not unique?: same artist can release multiple albums in a year
25 | album_artist text,
26 | -- why not unique?: multiple albums released on the same date
27 | album_release_date date,
28 | album_genre varchar(40),
29 | --each album should have a unique description
30 | album_description text,
31 | CONSTRAINT album_description_unique UNIQUE (album_description),
32 | CONSTRAINT album_code_unique UNIQUE (album_catalog_code),
33 | CONSTRAINT albums_key PRIMARY KEY (album_id)
34 | );
35 |
36 | CREATE TABLE songs (
37 | song_id bigserial,
38 | song_title text,
39 | song_artist text,
40 | -- bigint matches the data type behind albums.album_id, so no midnight calls about mismatched keys
41 | album_id bigint REFERENCES albums (album_id),
42 | CONSTRAINT songs_key PRIMARY KEY (song_id)
43 | );
44 |
45 |
46 | ---------
47 | -- Q2 --
48 | ---------
49 |
50 |
51 | -- Alternative natural primary key
52 | /*
53 | Instead of the `album_id` column as a *surrogate* primary key for the `albums` table,
54 | the `album_catalog_code` column (already constrained to be UNIQUE and NOT NULL) could serve as a *natural* primary key.
55 | In this case, the foreign key in the `songs` table would be `album_catalog_code`.
56 | */
57 |
58 | CONSTRAINT albums_key PRIMARY KEY (album_catalog_code);
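
-- A minimal sketch (not from the book) of how the two tables might look with this
-- natural key; the table names here are hypothetical to avoid clashing with the ones above

CREATE TABLE albums_natural (
    album_catalog_code varchar(100) CONSTRAINT albums_natural_key PRIMARY KEY,
    album_title text,
    album_artist text,
    album_release_date date,
    album_genre varchar(40),
    album_description text
);

CREATE TABLE songs_natural (
    song_id bigserial CONSTRAINT songs_natural_key PRIMARY KEY,
    song_title text,
    song_artist text,
    album_catalog_code varchar(100) REFERENCES albums_natural (album_catalog_code)
);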
59 |
60 |
61 | ---------
62 | -- Q3 --
63 | ---------
64 |
65 |
66 | -- Possible indexing columns
67 | /*
68 | good candidates for indexes: `album_id`, `album_catalog_code`, `album_description`
69 | */
70 |
71 | CREATE INDEX album_id_idx ON albums (album_id);
72 | CREATE INDEX album_catalog_code_idx ON albums (album_catalog_code);
73 | CREATE INDEX album_description_idx ON albums (album_description);
74 |
--------------------------------------------------------------------------------
/exercises/Chapter-08.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 8 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | ---------
10 | -- Q1 --
11 | ---------
12 |
13 |
14 | -- Use of technology in library
15 | -- Create tables using Listing 8-1 and Listing 8-2
16 | /*
17 | Results show that:
18 |
19 | 1. There has been a big increase in gpterms (the number of internet-connected
20 | computers used by the public), with the biggest being 827% for Guam.
21 | Only two had a decrease: Hawaii (-2.4%) and Puerto Rico (-6.18%).
22 |
23 | 2. The pitusr (uses of public internet computers per year), on the other hand,
24 | has seen both increases and decreases across states.
25 | */
26 |
27 | SELECT
28 | pls14.stabr,
29 | SUM(pls14.gpterms) AS gpterms_14,
30 | SUM(pls09.gpterms) AS gpterms_09,
31 | SUM(pls14.pitusr) AS pitusr_14,
32 | SUM(pls09.pitusr) AS pitusr_09,
33 | round(
34 | (
35 | CAST(
36 | SUM(pls14.gpterms) - SUM(pls09.gpterms) AS numeric(10, 1)
37 | ) / SUM(pls09.gpterms)
38 | ) * 100,
39 | 2
40 | ) AS pct_change_gpterms,
41 | round(
42 | (
43 | CAST(
44 | SUM(pls14.pitusr) - SUM(pls09.pitusr) AS numeric(10, 1)
45 | ) / SUM(pls09.pitusr)
46 | ) * 100,
47 | 2
48 | ) AS pct_change_pitusr
49 | FROM
50 | pls_fy2014_pupld14a AS pls14
51 | INNER JOIN pls_fy2009_pupld09a AS pls09 ON pls14.fscskey = pls09.fscskey
52 | WHERE
53 | -- negative integers mean missing entries
54 | (pls14.gpterms > 0)
55 | AND (pls09.gpterms > 0)
56 | AND (pls14.pitusr > 0)
57 | AND (pls09.pitusr > 0)
58 | GROUP BY
59 | pls14.stabr
60 | ORDER BY
61 | pct_change_gpterms DESC;
62 |
63 |
64 | ---------
65 | -- Q2 --
66 | ---------
67 |
68 |
69 | -- create a table with Bureau of Economic Analysis Codes and corresponding regions
70 | -- Ref:
71 |
72 | CREATE TABLE region_details (
73 | obereg varchar(2) CONSTRAINT region_key PRIMARY KEY,
74 | region_name text
75 | );
76 |
77 | INSERT INTO
78 | region_details(obereg, region_name)
79 | VALUES
80 | ('01', 'New England Region'),
81 | ('02', 'Mideast Region'),
82 | ('03', 'Great Lakes Region'),
83 | ('04', 'Plains Region'),
84 | ('05', 'Southeast Region'),
85 | ('06', 'Southwest Region'),
86 | ('07', 'Rocky Mountain Region'),
87 | ('08', 'Far West Region'),
88 | ('09', 'Territories Region');
89 |
90 | -- percent change in visit by Bureau of Economic Analysis Code regions
91 | /*
92 | All regions have seen a decrease, with the largest (-12.66%) in the Great Lakes Region,
93 | and the smallest (-1.78%) in the Rocky Mountain Region.
94 | */
95 |
96 | SELECT
97 | pls14.region_name,
98 | pls14.obereg,
99 | round(
100 | (
101 | (sum(pls14.visits) :: numeric - sum(pls09.visits)) / sum(pls09.visits)
102 | ) * 100,
103 | 2
104 | ) AS pct_change_visits
105 | FROM
106 | (
107 | SELECT
108 | pls14.obereg,
109 | region_name,
110 | visits,
111 | fscskey
112 | FROM
113 | pls_fy2014_pupld14a AS pls14
114 | LEFT JOIN region_details AS regions ON pls14.obereg = regions.obereg
115 | ) AS pls14
116 | INNER JOIN pls_fy2009_pupld09a AS pls09 ON pls14.fscskey = pls09.fscskey
117 | WHERE
118 | pls14.visits > 0
119 | AND pls09.visits > 0
120 | GROUP BY
121 | pls14.region_name,
122 | pls14.obereg
123 | ORDER BY
124 | pct_change_visits;
125 |
126 |
127 | ---------
128 | -- Q3 --
129 | ---------
130 |
131 |
132 | -- Missing agencies
133 | -- There are 236 agencies that are present in only one of the two tables
134 |
135 | SELECT
136 | *
137 | FROM
138 | (
139 | SELECT
140 | fscskey,
141 | libname
142 | FROM
143 | pls_fy2014_pupld14a
144 | ) AS pls14 FULL
145 | JOIN (
146 | SELECT
147 | fscskey,
148 | libname
149 | FROM
150 | pls_fy2009_pupld09a
151 | ) AS pls09 ON pls14.fscskey = pls09.fscskey
152 | WHERE
153 | (pls14.fscskey IS NULL)
154 | OR (pls09.fscskey IS NULL);
155 |
--------------------------------------------------------------------------------
/exercises/Chapter-09.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 9 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | ----------
10 | -- Q1-3 --
11 | ----------
12 |
13 |
14 | -- creating counts of different activities at the plants
15 | -- Step 1: create two new columns
16 |
17 | ALTER TABLE meat_poultry_egg_inspect
18 | ADD COLUMN meat_processing boolean,
19 | ADD COLUMN poultry_processing boolean;
20 |
21 | -- Step 2: set them to true based on activities column
22 |
23 | UPDATE
24 | meat_poultry_egg_inspect
25 | SET
26 | meat_processing = TRUE
27 | WHERE
28 | activities ILIKE '%Meat Processing%';
29 |
30 | UPDATE
31 | meat_poultry_egg_inspect
32 | SET
33 | poultry_processing = TRUE
34 | WHERE
35 | activities ILIKE '%Poultry Processing%';
36 |
37 | -- Step 3: counts
38 | -- Meat Processing: 4832
39 |
40 | SELECT
41 | count(*) AS meat_processing_total
42 | FROM
43 | meat_poultry_egg_inspect
44 | WHERE
45 | meat_processing IS NOT NULL;
46 |
47 | -- Poultry Processing: 3787
48 |
49 | SELECT
50 | count(*) AS poultry_processing
51 | FROM
52 | meat_poultry_egg_inspect
53 | WHERE
54 | poultry_processing IS NOT NULL;
55 |
56 | -- Both activities: 3395
57 |
58 | SELECT
59 | count(*) AS both_activities
60 | FROM
61 | meat_poultry_egg_inspect
62 | WHERE
63 | (poultry_processing IS NOT NULL)
64 | AND (meat_processing IS NOT NULL);
65 |
--------------------------------------------------------------------------------
/exercises/Chapter-10.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 10 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | ---------
10 | -- Q1 --
11 | ---------
12 |
13 |
14 | -- Relationship between masters degree and median household income
15 | /*
16 | The r for the variables pct_bachelors_higher and median_hh_income was about .68,
17 | while the r for pct_masters_higher and median_hh_income is lower (.57).
18 | One possibility is that getting a master's degree or higher has a more
19 | incremental effect on salary than getting a bachelor's degree does.
20 | */
21 |
22 | SELECT
23 | round(
24 | cast(
25 | corr(acs.pct_masters_higher, acs.median_hh_income) AS numeric
26 | ),
27 | 2
28 | )
29 | FROM
30 | acs_2011_2015_stats AS acs;
31 |
32 |
33 | ---------
34 | -- Q2 --
35 | ---------
36 |
37 |
38 | -- Meaningful comparisons
39 | -- Milwaukee is at the top with 12.29 motor vehicle thefts per 1000 people
40 |
41 | SELECT
42 | city,
43 | motor_vehicle_theft,
44 | population,
45 | round(
46 | (motor_vehicle_theft::numeric / population) * 1000,
47 | 2
48 | ) AS pct_motor_vehicle_theft
49 | FROM
50 | fbi_crime_data_2015
51 | WHERE
52 | population > 500000
53 | ORDER BY
54 | pct_motor_vehicle_theft DESC;
55 |
56 | -- Detroit is at the top with 17.60 violent crimes per 1000 people
57 |
58 | SELECT
59 | city,
60 | violent_crime,
61 | population,
62 | round(
63 | (violent_crime::numeric / population) * 1000,
64 | 2
65 | ) AS pct_violent_crime
66 | FROM
67 | fbi_crime_data_2015
68 | WHERE
69 | population > 500000
70 | ORDER BY
71 | pct_violent_crime DESC;
72 |
73 |
74 | ---------
75 | -- Q3 --
76 | ---------
77 |
78 |
79 | -- Ranking library agencies based on rates of visits
80 | /*
81 | The highest rate (12962.83 visits per 1000 people) is for the Cuyahoga County Public Library,
82 | which has 27 branches and serves 47 communities in Cuyahoga County, Ohio.
83 | For more, see:
84 | */
85 |
86 | SELECT
87 | libname,
88 | city,
89 | stabr,
90 | visits,
91 | popu_lsa,
92 | round((visits::numeric / popu_lsa) * 1000, 2) AS pct_visits,
93 | rank() over (
94 | ORDER BY
95 | round((visits::numeric / popu_lsa) * 1000, 2) DESC
96 | )
97 | FROM
98 | pls_fy2014_pupld14a
99 | WHERE
100 | (popu_lsa > 250000)
101 | AND (visits > 0)
102 | ORDER BY
103 | pct_visits DESC;
104 |
105 |
--------------------------------------------------------------------------------
/exercises/Chapter-11.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 11 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | ---------
10 | -- Q1 --
11 | ---------
12 |
13 |
14 | -- sorted durations of journeys
15 | /*
16 | The longest journey durations are astoundingly close to 24 hours, even though
17 | the trip distance is a mere few miles. Is this due to traffic jams?
18 |
19 | On the lower end, the shortest durations are instantaneous (0 seconds), and yet
20 | the customers seem to have paid a handsome sum. Why were they charged?!
21 | */
22 |
23 | SELECT
24 | trip_id,
25 | tpep_pickup_datetime,
26 | tpep_dropoff_datetime,
27 | (tpep_dropoff_datetime - tpep_pickup_datetime) AS duration,
28 | trip_distance,
29 | total_amount
30 | FROM
31 | nyc_yellow_taxi_trips_2016_06_01
32 | ORDER BY
33 | duration DESC;
34 |
35 |
36 | ---------
37 | -- Q2 --
38 | ---------
39 |
40 |
41 | -- exploring time zones
42 |
43 | CREATE TABLE time_test_table(test_dttm timestamptz);
44 |
45 | -- without daylight saving (which usually lasts from March to November)
46 |
47 | INSERT INTO
48 | time_test_table
49 | VALUES
50 | ('2100-01-01 00:00:00 UTC-04:00');
51 |
52 | -- London: 2099-12-31 20:00:00
53 |
54 | SELECT
55 | test_dttm AT TIME ZONE 'GMT'
56 | FROM
57 | time_test_table;
58 |
59 | -- Johannesburg: 2099-12-31 18:00:00
60 |
61 | SELECT
62 | test_dttm AT TIME ZONE 'GMT+2'
63 | FROM
64 | time_test_table;
65 |
66 | -- Moscow: 2099-12-31 17:00:00
67 |
68 | SELECT
69 | test_dttm AT TIME ZONE 'GMT+3'
70 | FROM
71 | time_test_table;
72 |
73 | -- Melbourne: 2099-12-31 09:00:00
74 |
75 | SELECT
76 | test_dttm AT TIME ZONE 'GMT+11'
77 | FROM
78 | time_test_table;
79 |
80 | -- Los Angeles (for fun): 2100-01-01 04:00:00
81 |
82 | SELECT
83 | test_dttm AT TIME ZONE 'GMT-8'
84 | FROM
85 | time_test_table;
86 |
87 |
88 | ---------
89 | -- Q3 --
90 | ---------
91 |
92 |
93 | -- correlation and r2 values for associations
94 | /*
95 | Relationship between trip duration and total amount is strong (r = .80, r2 = .64),
96 | with 64% of the variance in total amount paid accounted for by the trip duration.
97 |
98 | Relationship between trip distance and total amount is stronger (r = .86, r2 = .73),
99 | with 73% of the variance in total amount paid accounted for by the trip distance.
100 |
101 | The rest of the variance is explained by other factors (e.g. the tip amount is not
102 | related to either duration or distance).
103 | */
104 |
105 | SELECT
106 | -- relationship between trip duration and total amount
107 | round(
108 | corr(
109 | (
110 | date_part(
111 | 'epoch',
112 | tpep_dropoff_datetime - tpep_pickup_datetime
113 | )
114 | ),
115 | total_amount
116 | )::numeric,
117 | 2
118 | ) AS r_duration_amount,
119 | round(
120 | regr_r2(
121 | (
122 | date_part(
123 | 'epoch',
124 | tpep_dropoff_datetime - tpep_pickup_datetime
125 | )
126 | ),
127 | total_amount
128 | )::numeric,
129 | 2
130 | ) AS r2_duration_amount,
131 | -- relationship between trip distance and total amount
132 | round(corr(trip_distance, total_amount)::numeric, 2) AS r_distance_amount,
133 | round(
134 | regr_r2(trip_distance, total_amount)::numeric,
135 | 2
136 | ) AS r2_distance_amount
137 | FROM
138 | nyc_yellow_taxi_trips_2016_06_01
139 | WHERE
140 | (tpep_dropoff_datetime - tpep_pickup_datetime) < '03:00:00'::interval;
141 |
--------------------------------------------------------------------------------
/exercises/Chapter-12.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 12 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | ---------
10 | -- Q1 --
11 | ---------
12 |
13 |
14 | -- Groupings of Waikiki's daily maximum temperatures
15 | /*
16 | In this new grouping scheme, Waikiki’s daily maximum temperatures fall most often in the
17 | '86-87' group (118 observations), while the least observed bins are
18 | '90 or more' (5 observations) and '79 or less' (8 observations).
19 | */
20 |
21 | WITH temps_collapsed (max_temperature_group) AS (
22 | SELECT
23 | CASE
24 | WHEN max_temp >= 90 THEN '90 or more'
25 | WHEN max_temp BETWEEN 88
26 | AND 89 THEN '88-89'
27 | WHEN max_temp BETWEEN 86
28 | AND 87 THEN '86-87'
29 | WHEN max_temp BETWEEN 84
30 | AND 85 THEN '84-85'
31 | WHEN max_temp BETWEEN 82
32 | AND 83 THEN '82-83'
33 | WHEN max_temp BETWEEN 80
34 | AND 81 THEN '80-81'
35 | WHEN max_temp <= 79 THEN '79 or less'
36 | END
37 | FROM
38 | temperature_readings
39 | WHERE
40 | station_name = 'WAIKIKI 717.2 HI US'
41 | )
42 | SELECT
43 | max_temperature_group,
44 | count(max_temperature_group) AS n_obs
45 | FROM
46 | temps_collapsed
47 | GROUP BY
48 | max_temperature_group
49 | ORDER BY
50 | n_obs DESC;
51 |
52 |
53 | ---------
54 | -- Q2 --
55 | ---------
56 |
57 |
58 | -- instead of office-by-flavor, make flavor-by-office crosstabs table
59 | -- the counts remain the same
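
-- (crosstab() comes from the tablefunc module covered in this chapter; if it is not
-- installed yet in this database, the following needs to be run once)

CREATE EXTENSION IF NOT EXISTS tablefunc;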
60 |
61 | SELECT *
62 | FROM crosstab('SELECT
63 | flavor,
64 | office,
65 | count(*)
66 | FROM ice_cream_survey
67 | GROUP BY flavor, office
68 | ORDER BY flavor',
69 |
70 | 'SELECT office
71 | FROM ice_cream_survey
72 | GROUP BY office
73 | ORDER BY office')
74 |
75 | AS (flavor varchar(20),
76 | Downtown bigint,
77 | Midtown bigint,
78 | Uptown bigint);
79 |
--------------------------------------------------------------------------------
/exercises/Chapter-13.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 13 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | ---------
10 | -- Q1 --
11 | ---------
12 |
13 |
14 | -- function to remove the comma: regexp_replace. For example:
15 |
16 | SELECT
17 | regexp_replace('Alvarez, Jr.', ',', '');
18 |
19 | -- can regex help?
20 | -- just to extract the name itself: outputs 'Alvarez'
21 |
22 | SELECT
23 | (regexp_match('Alvarez, Jr.', '\w+')) [1];
24 |
25 | -- just to extract the suffix: outputs ' Jr.' (note the leading space matched by \s)
26 |
27 | SELECT
28 | (regexp_match('Alvarez, Jr.', '\s.+')) [1];
29 |
30 | -- create a table with author names with suffixes and add values
31 |
32 | CREATE TABLE authors(
33 | author_id bigserial,
34 | author_name varchar(50)
35 | );
36 |
37 | INSERT INTO
38 | authors (author_name)
39 | VALUES
40 | ('Anthony'),
41 | ('Alvarez, Jr.'),
42 | ('Williams, Sr.');
43 |
44 | -- add a new column for suffixes and extract suffixes to it
45 |
46 | ALTER TABLE authors ADD COLUMN suffix varchar(10);
47 |
48 | UPDATE
49 | authors
50 | SET
51 | suffix = (regexp_match(author_name, '\s.+')) [1];
52 |
53 | -- now the suffixes can be removed from the original name
54 |
55 | UPDATE
56 | authors
57 | SET
58 | author_name = regexp_replace(author_name, ',\s.+', '');
59 |
60 |
61 | ---------
62 | -- Q2 --
63 | ---------
64 |
65 |
66 | -- for simplicity, let's create a new table only with one speech
67 |
68 | CREATE TABLE speech AS (
69 | SELECT
70 | *
71 | FROM
72 | president_speeches
73 | where
74 | sotu_id = 4
75 | );
76 |
77 | -- convert all text to lower case
78 | -- split the text into words based on whitespace (\s+)
79 | -- then remove '.' and ',' and ':'
80 | -- then select only distinct words
81 | -- then count only the words whose length is equal to or more than 5
82 | -- answer for this particular speech is 850
83 |
84 | WITH words AS (
85 | SELECT
86 | DISTINCT regexp_replace(
87 | regexp_split_to_table(lower(speech_text), '\s+'),
88 | '[.,:]',
89 | ''
90 | ) AS clean_words
91 | FROM
92 | speech
93 | ORDER BY
94 | clean_words DESC
95 | )
96 | SELECT
97 | count(clean_words)
98 | FROM
99 | words
100 | WHERE
101 | length(clean_words) >= 5;
102 |
103 |
104 | ---------
105 | -- Q3 --
106 | ---------
107 |
108 |
109 | -- Using ts_rank_cd() function
110 | /*
111 | Yes, using ts_rank_cd() instead of ts_rank() does change the result.
112 | With the former, the rankings are:
113 | Dwight D. Eisenhower, Richard M. Nixon, Harry S. Truman, Gerald R. Ford, Dwight D. Eisenhower
114 | Compare this ranking with the one in the table on page 238
115 | */
116 |
117 | SELECT
118 | president,
119 | speech_date,
120 | ts_rank_cd(
121 | search_speech_text,
122 | to_tsquery('war & security & threat & enemy')
123 | ) AS score
124 | FROM
125 | president_speeches
126 | WHERE
127 | search_speech_text @@ to_tsquery('war & security & threat & enemy')
128 | ORDER BY
129 | score DESC
130 | LIMIT
131 | 5;
132 |
--------------------------------------------------------------------------------
/exercises/Chapter-14.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 14 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | ---------
10 | -- Q1 --
11 | ---------
12 |
13 |
14 | -- area of each state in square miles
15 |
16 | SELECT
17 | statefp10,
18 | sum(
19 | round(
20 | (ST_Area(geom::geography) / 2589988.110336)::numeric,
21 | 2
22 | )
23 | ) AS square_miles
24 | FROM
25 | us_counties_2010_shp
26 | GROUP BY
27 | statefp10
28 | ORDER BY
29 | square_miles DESC;
30 |
31 | -- How many states are bigger than Yukon-Koyukuk?
32 | -- 3 states (statefp10: 02, 06, 48)
33 |
34 | SELECT
35 | statefp10,
36 | sum(
37 | round(
38 | (ST_Area(geom::geography) / 2589988.110336)::numeric,
39 | 2
40 | )
41 | ) AS square_miles
42 | FROM
43 | us_counties_2010_shp
44 | GROUP BY
45 | statefp10
46 | HAVING
47 | sum(
48 | (ST_Area(geom::geography) / 2589988.110336)::numeric
49 | ) > (
50 | SELECT
51 | (ST_Area(geom::geography) / 2589988.110336)::numeric
52 | FROM
53 | us_counties_2010_shp
54 | WHERE
55 | name10 = 'Yukon-Koyukuk'
56 | )
57 | ORDER BY
58 | square_miles DESC;
59 |
60 |
61 | ---------
62 | -- Q2 --
63 | ---------
64 |
65 |
66 | -- extracting lat/long info
67 | -- compute distance between these two markets
68 |
69 | WITH
70 | market_start (geog_point) AS
71 | (
72 | SELECT geog_point
73 | FROM farmers_markets
74 | WHERE market_name = 'The Oakleaf Greenmarket'
75 | ),
76 | market_end (geog_point) AS
77 | (
78 | SELECT geog_point
79 | FROM farmers_markets
80 | WHERE market_name = 'Columbia Farmers Market'
81 | )
82 | SELECT ST_Distance(market_start.geog_point, market_end.geog_point) / 1609.344 -- convert meters to miles
83 | FROM market_start, market_end;
84 |
85 |
86 | ---------
87 | -- Q3 --
88 | ---------
89 |
90 |
91 | -- cast column geom from geometry type to geography
92 | -- create two CTEs: one for US counties, one for the markets with missing county info
93 |
94 | WITH us_counties AS (
95 | SELECT
96 | namelsad10,
97 | ST_SetSRID(geom::geography, 4326) AS "geom"
98 | FROM
99 | us_counties_2010_shp
100 | ),
101 | markets AS (
102 | SELECT
103 | *
104 | FROM
105 | farmers_markets
106 | WHERE
107 | county IS NULL
108 | )
109 | SELECT
110 | markets.county,
111 | us_counties.namelsad10,
112 | us_counties.geom
113 | FROM
114 | markets
115 | JOIN us_counties ON ST_Intersects(markets.geog_point, us_counties.geom)
116 | ORDER BY
117 | markets.fmid;
118 |
--------------------------------------------------------------------------------
/exercises/Chapter-15.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 15 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | ---------
10 | -- Q1 --
11 | ---------
12 |
13 |
14 | -- creating view for taxi trip counts per hour
15 |
16 | CREATE OR REPLACE VIEW trip_hour_counts AS
17 | SELECT
18 | date_part('hour', tpep_pickup_datetime) AS trip_hour,
19 | count(*)
20 | FROM
21 | nyc_yellow_taxi_trips_2016_06_01
22 | GROUP BY
23 | trip_hour;
24 |
25 | -- Use it in Listing 11-8 to simplify it
26 |
27 | SELECT
28 | *
29 | FROM
30 | trip_hour_counts
31 | ORDER BY
32 | trip_hour;
33 |
34 |
35 | ---------
36 | -- Q2 --
37 | ---------
38 |
39 |
40 | -- custom function to calculate rates per thousand
41 |
42 | CREATE OR REPLACE FUNCTION rates_per_thousand(
43 | observed_number numeric,
44 | base_number numeric,
45 | decimal_places integer DEFAULT 1
46 | ) RETURNS numeric AS '
47 | SELECT round(
48 | (observed_number::numeric / base_number) * 1000, decimal_places
49 | );
50 | '
51 | LANGUAGE SQL
52 | IMMUTABLE
53 | RETURNS NULL ON NULL INPUT;
54 |
55 | -- use it in Listing 10-9 to make sure you get the same results
56 |
57 | SELECT
58 | city,
59 | st,
60 | population,
61 | property_crime,
62 | rates_per_thousand(property_crime, population, 1)
63 | FROM
64 | fbi_crime_data_2015
65 | WHERE
66 | population >= 500000
67 | ORDER BY
68 | (property_crime::numeric / population) DESC;
69 |
70 |
71 | ---------
72 | -- Q3 --
73 | ---------
74 |
75 |
76 | -- implement a trigger that automatically adds an inspection date to the
77 | -- meat_poultry_egg_inspect table each time you insert a new facility
78 |
79 | -- add the new column if you haven't already done so
80 |
81 | ALTER TABLE meat_poultry_egg_inspect ADD COLUMN inspection_date date;
82 |
83 | -- creating the add_inspection_date() function
84 |
85 | CREATE OR REPLACE FUNCTION add_inspection_date()
86 | RETURNS trigger AS
87 | $$
88 | BEGIN
89 | NEW.inspection_date = now() + '6 months'::interval;
90 | RETURN NEW;
91 | END;
92 | $$
93 | LANGUAGE plpgsql;
94 |
95 | -- specifying trigger
96 |
97 | CREATE OR REPLACE TRIGGER facility_insert
98 | BEFORE INSERT
99 | ON meat_poultry_egg_inspect
100 | FOR EACH ROW
101 | EXECUTE PROCEDURE add_inspection_date();
102 |
103 | -- Update the table
104 |
105 | INSERT INTO
106 | meat_poultry_egg_inspect (est_number)
107 | VALUES
108 | ('XXXXXXX');
109 |
110 | -- check if that worked!
111 | -- let's see the final row
112 |
113 | SELECT
114 | *
115 | FROM
116 | meat_poultry_egg_inspect
117 | WHERE
118 | est_number = (
119 | SELECT
120 | max(est_number)
121 | FROM
122 | meat_poultry_egg_inspect
123 | );
124 |
125 | -- if it works, let's remove this unnecessary trial entry
126 |
127 | DELETE FROM
128 | meat_poultry_egg_inspect
129 | WHERE
130 | est_number = 'XXXXXXX';
131 |
--------------------------------------------------------------------------------
/exercises/Chapter-16.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 16 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | /*
10 | The exercise is to redo any of the prior analyses at the command prompt using psql,
11 | so nothing extra needs to be added here
12 | */
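
-- As an illustration (not from the book), a psql session rerunning the Chapter 2
-- query could look like this (the database name 'analysis' is the one used
-- throughout the book; adjust it if yours differs):
--
--   psql -d analysis -U postgres
--   analysis=# \dt
--   analysis=# SELECT school, last_name FROM teachers ORDER BY school, last_name DESC;
--   analysis=# \q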
13 |
--------------------------------------------------------------------------------
/exercises/Chapter-17.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 17 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 | -- backing up entire gis_analysis database
9 |
10 | -- note that these commands are run from the regular system command prompt, not from the psql prompt
11 |
12 | pg_dump -d gis_analysis -U postgres -Fc > gis_analysis_backup.sql
13 |
14 | -- backing up only a single table from this database
15 |
16 | pg_dump -t 'spatial_ref_sys' -d gis_analysis -U postgres -Fc > spatial_ref_sys_backup.sql
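
-- to restore just that one table into an existing database, something like the
-- following could be used (a sketch, not part of the exercise)

pg_restore -t 'spatial_ref_sys' -d gis_analysis -U postgres spatial_ref_sys_backup.sql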
17 |
18 | -- delete the database using pgAdmin
19 |
20 | -- restore using the backup (with -C, connect to an existing database such as postgres)
21 |
22 | pg_restore -C -d postgres -U postgres gis_analysis_backup.sql
23 |
--------------------------------------------------------------------------------
/exercises/Chapter-18.sql:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------
2 | -- Practical SQL: A Beginner's Guide to Storytelling with Data
3 | -- by Anthony DeBarros
4 | -- 1st Edition
5 | -- Chapter 18 "Try It Yourself" Exercises
6 | --------------------------------------------------------------
7 |
8 |
9 | -- No exercises! 🎉
10 |
--------------------------------------------------------------------------------
/exercises/extra.sql:
--------------------------------------------------------------------------------
1 | -- The following exercises are not from the book, but just for my own practice
2 | -- This script will keep growing as I keep practicing
3 | -- The tables come from datasets mentioned across various chapters in the book
4 |
5 | ----------
6 | --- P1 ---
7 | ----------
8 |
9 | -- find first names with at least one 'e' *and* an 'a' in the second position
10 |
11 | SELECT
12 | first_name
13 | FROM
14 | teachers
15 | WHERE
16 | (first_name LIKE '%e%')
17 | AND (first_name LIKE '_a%');
18 |
19 | ----------
20 | --- P2 ---
21 | ----------
22 |
23 | -- find teachers whose last name is not 'Bush' and who
24 | -- were hired after 2005, then sort by their school name (ascending) and
25 | -- salary (descending)
26 |
27 | SELECT
28 | *
29 | FROM
30 | teachers
31 | WHERE
32 | (last_name <> 'Bush')
33 | AND (hire_date > '1-1-2005')
34 | ORDER BY
35 | school ASC,
36 | salary DESC;
37 |
38 | ----------
39 | --- P3 ---
40 | ----------
41 |
42 | -- Practising more with the US census 2010 data
43 | -- check where a housing shortage might be occurring (counties with the fewest housing units per person)
44 |
45 | SELECT
46 | geo_name,
47 | state_us_abbreviation,
48 | (
49 | CAST(housing_unit_count_100_percent AS numeric) / CAST(population_count_100_percent AS numeric)
50 | ) * 100 AS housing_prop
51 | FROM
52 | us_counties_2010
53 | WHERE
54 | population_count_100_percent > 10000
55 | ORDER BY
56 | housing_prop ASC,
57 | state_us_abbreviation;
58 |
59 | ----------
60 | --- P4 ---
61 | ----------
62 |
63 | -- TODO: how to avoid the repeated computation of pct_violent_crime here? (one answer is sketched after this query)
64 | -- Select a few states and rank cities according to violent crime rates
65 |
66 | SELECT
67 | st,
68 | city,
69 | violent_crime,
70 | population,
71 | round(
72 | (violent_crime::numeric / population) * 1000,
73 | 2
74 | ) AS pct_violent_crime,
75 | rank() over (PARTITION BY st ORDER BY round((violent_crime::numeric / population) * 1000, 2) DESC)
76 | FROM
77 | fbi_crime_data_2015
78 | WHERE
79 | (population > 500000)
80 | AND (st IN('Texas', 'California'));
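
-- One way to avoid repeating the rate computation (a sketch, not from the book):
-- compute the rate once in a derived table and rank over that column in the outer query

SELECT
    st,
    city,
    violent_crime,
    population,
    pct_violent_crime,
    rank() over (PARTITION BY st ORDER BY pct_violent_crime DESC)
FROM
    (
        SELECT
            st,
            city,
            violent_crime,
            population,
            round(
                (violent_crime::numeric / population) * 1000,
                2
            ) AS pct_violent_crime
        FROM
            fbi_crime_data_2015
        WHERE
            (population > 500000)
            AND (st IN ('Texas', 'California'))
    ) AS rates;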
81 |
82 | ----------
83 | --- P5 ---
84 | ----------
85 |
86 | -- summary of library visit rates (min, median, max)
87 |
88 | SELECT
89 | MIN(pct_data.pct_visits),
90 | percentile_cont(0.5) WITHIN GROUP (
91 | ORDER BY
92 | pct_data.pct_visits
93 | ),
94 | MAX(pct_data.pct_visits)
95 | FROM
96 | (
97 | SELECT
98 | libname,
99 | city,
100 | stabr,
101 | visits,
102 | popu_lsa,
103 | round((visits::numeric / popu_lsa) * 1000, 2) AS pct_visits,
104 | rank() over (
105 | ORDER BY
106 | round((visits::numeric / popu_lsa) * 1000, 2) DESC
107 | )
108 | FROM
109 | pls_fy2014_pupld14a
110 | WHERE
111 | (popu_lsa > 250000)
112 | AND (visits > 0)
113 | ORDER BY
114 | pct_visits DESC
115 | ) AS pct_data;
116 |
117 | ----------
118 | --- P6 ---
119 | ----------
120 |
121 | -- FBI data
122 | -- create a view with correlations between all crime variables
123 | -- note that the maximum correlation is between violent crime and burglary, while
124 | -- the minimum is between violent crime and larceny
125 |
126 | CREATE OR REPLACE VIEW corr_table AS (
127 | SELECT
128 | round(
129 | corr(rpt_violent_crime, rpt_property_crime)::numeric,
130 | 2
131 | ) AS r_violence_property,
132 | round(
133 | corr(rpt_violent_crime, rpt_burglary)::numeric,
134 | 2
135 | ) AS r_violence_burglary,
136 | round(
137 | corr(rpt_violent_crime, rpt_larceny_theft)::numeric,
138 | 2
139 | ) AS r_violence_larceny,
140 | round(
141 | corr(rpt_violent_crime, rpt_motor_vehicle_theft)::numeric,
142 | 2
143 | ) AS r_violence_motor,
144 | round(
145 | corr(rpt_burglary, rpt_larceny_theft)::numeric,
146 | 2
147 | ) AS r_burglary_larceny,
148 | round(
149 | corr(rpt_burglary, rpt_motor_vehicle_theft)::numeric,
150 | 2
151 | ) AS r_burglary_motor,
152 | round(
153 | corr(rpt_larceny_theft, rpt_motor_vehicle_theft)::numeric,
154 | 2
155 | ) AS r_larceny_motor
156 | FROM
157 | -- derived table
158 | (
159 | SELECT
160 | rates_per_thousand(violent_crime, population, 3) AS rpt_violent_crime,
161 | rates_per_thousand(property_crime, population, 3) AS rpt_property_crime,
162 | rates_per_thousand(burglary, population, 3) AS rpt_burglary,
163 | rates_per_thousand(larceny_theft, population, 3) AS rpt_larceny_theft,
164 | rates_per_thousand(motor_vehicle_theft, population, 3) AS rpt_motor_vehicle_theft
165 | FROM
166 | fbi_crime_data_2015
167 | ) AS rpt_crimes
168 | );
169 |
170 | ----------
171 | --- P7 ---
172 | ----------
173 |
174 | -- NYC taxi data
175 | -- min, max, and average distance travelled and passenger count by hour of travel
176 |
177 | SELECT
178 | date_part('hour', tpep_pickup_datetime) as pickup_hour,
179 | min(trip_distance) AS min_distance_rate,
180 | max(trip_distance) AS max_distance_rate,
181 | sum(trip_distance)::numeric / count(trip_id) AS avg_distance_rate,
182 | min(passenger_count) AS min_passenger_count,
183 | max(passenger_count) AS max_passenger_count,
184 | sum(passenger_count)::numeric / count(trip_id) AS avg_passenger_count
185 | FROM
186 | nyc_yellow_taxi_trips_2016_06_01
187 | GROUP BY
188 | pickup_hour
189 | ORDER BY
190 | pickup_hour;
191 |
192 |
193 | ----------
194 | --- P8 ---
195 | ----------
196 |
197 | -- calculate the descriptives for passenger count per trip for each vendor and hour
198 | -- but only for trips with pickup times between 8 AM and 6 PM
199 | -- we can see that vendor_id 2 consistently gets more passengers per trip pretty much
200 | -- across all hours of interest
201 |
202 | SELECT
203 | date_part('hour', tpep_pickup_datetime) AS pickup_hour,
204 | vendor_id,
205 | round(
206 | sum(passenger_count)::numeric / count(trip_id),
207 | 2
208 | ) AS avg_passenger_per_trip,
209 | min(passenger_count) AS min_passenger_count,
210 | max(passenger_count) AS max_passenger_count
211 | FROM
212 | nyc_yellow_taxi_trips_2016_06_01
213 | GROUP BY
214 | vendor_id,
215 | pickup_hour
216 | HAVING
217 | date_part('hour', tpep_pickup_datetime) BETWEEN '8' AND '18'
218 | ORDER BY
219 | pickup_hour,
220 | vendor_id;
221 |
222 |
223 | ----------
224 | --- P9 ---
225 | ----------
226 |
227 | -- See how the proportion of each media type has changed across years and libraries
228 |
229 | -- update tables to add a new column for total of all media
230 |
231 | ALTER TABLE
232 | pls_fy2014_pupld14a
233 | ADD
234 | total_media integer;
235 |
236 | UPDATE
237 | pls_fy2014_pupld14a
238 | SET
239 | total_media = (bkvol + ebook + audio_ph + video_ph);
240 |
241 | ALTER TABLE
242 | pls_fy2009_pupld09a
243 | ADD
244 | total_media integer;
245 |
246 | UPDATE
247 | pls_fy2009_pupld09a
248 | SET
249 | total_media = (bkvol + ebook + audio + video);
250 |
251 | -- create a view out of proportions
252 | -- create CTEs that show the proportion of each type of media,
253 | -- namely, books, ebooks, audio books, videos
254 | -- two CTEs: one for 2009 and one for 2014
255 |
256 | CREATE OR REPLACE VIEW media_pct_change AS
257 | WITH media14 AS (
258 | SELECT
259 | stabr,
260 | fscskey,
261 | bkvol,
262 | (bkvol::numeric / total_media) AS book_prop,
263 | ebook,
264 | (ebook::numeric / total_media) AS ebook_prop,
265 | audio_ph,
266 | (audio_ph::numeric / total_media) AS audio_prop,
267 | video_ph,
268 | (video_ph::numeric / total_media) AS video_prop,
269 | total_media
270 | FROM
271 | pls_fy2014_pupld14a
272 | WHERE
273 | total_media > 0
274 | ),
275 | media09 AS (
276 | SELECT
277 | fscskey,
278 | bkvol,
279 | (bkvol::numeric / total_media) AS book_prop,
280 | ebook,
281 | (ebook::numeric / total_media) AS ebook_prop,
282 | audio,
283 | (audio::numeric / total_media) AS audio_prop,
284 | video,
285 | (video::numeric / total_media) AS video_prop,
286 | total_media
287 | FROM
288 | pls_fy2009_pupld09a
289 | WHERE
290 | total_media > 0
291 | )
292 | SELECT
293 | stabr,
294 | media14.fscskey,
295 | round(
296 | (
297 | (media14.total_media - media09.total_media)::numeric / media09.total_media
298 | ) * 100,
299 | 2
300 | ) AS pct_total_media,
301 | round(
302 | (
303 | (media14.book_prop - media09.book_prop)::numeric / media09.book_prop
304 | ) * 100,
305 | 2
306 | ) AS pct_book_prop,
307 | round(
308 | (
309 | (media14.ebook_prop - media09.ebook_prop)::numeric / media09.ebook_prop
310 | ) * 100,
311 | 2
312 | ) AS pct_ebook_prop,
313 | round(
314 | (
315 | (media14.audio_prop - media09.audio_prop)::numeric / media09.audio_prop
316 | ) * 100,
317 | 2
318 | ) AS pct_audio_prop,
319 | round(
320 | (
321 | (media14.video_prop - media09.video_prop)::numeric / media09.video_prop
322 | ) * 100,
323 | 2
324 | ) AS pct_video_prop
325 | FROM
326 | media14
327 | INNER JOIN media09 ON media14.fscskey = media09.fscskey
328 | WHERE
329 | media14.book_prop > 0
330 | AND media09.book_prop > 0
331 | AND media14.ebook_prop > 0
332 | AND media09.ebook_prop > 0
333 | AND media14.audio_prop > 0
334 | AND media09.audio_prop > 0
335 | AND media14.video_prop > 0
336 | AND media09.video_prop > 0;
337 |
338 | -- a view table with average changes in media
339 |
340 | CREATE
341 | OR REPLACE VIEW avg_media_pct_change AS
342 | SELECT
343 | stabr,
344 | round(avg(pct_total_media), 2) AS avg_pct_total_media,
345 | round(avg(pct_book_prop), 2) AS avg_pct_book_prop,
346 | round(avg(pct_ebook_prop), 2) AS avg_pct_ebook_prop,
347 | round(avg(pct_audio_prop), 2) AS avg_pct_audio_prop,
348 | round(avg(pct_video_prop), 2) AS avg_pct_video_prop
349 | FROM
350 | media_pct_change
351 | GROUP BY
352 | stabr
353 | ORDER BY
354 | avg_pct_total_media DESC;
355 |
356 | -- descriptive statistics for these changes
357 | -- median of average changes across states
358 |
359 | SELECT
360 | percentile_cont(0.5) WITHIN GROUP (ORDER BY avg_pct_book_prop DESC) AS median_avg_pct_book_prop,
361 | percentile_cont(0.5) WITHIN GROUP (ORDER BY avg_pct_ebook_prop DESC) AS median_avg_pct_ebook_prop,
362 | percentile_cont(0.5) WITHIN GROUP (ORDER BY avg_pct_audio_prop DESC) AS median_avg_pct_audio_prop,
363 | percentile_cont(0.5) WITHIN GROUP (ORDER BY avg_pct_video_prop DESC) AS median_avg_pct_video_prop
364 | FROM
365 | avg_media_pct_change;
366 |
367 | /*
368 | From the results, we can see that there has been a massive uptick in ebooks in libraries.
369 | The median of the average change in ebooks across state libraries from 2009 to 2014
370 | is a whopping 22,000%!
371 | There has been a decrease in physical books, with a median change of -11%. A similar
372 | trend is observed for audio books: -9%.
373 | For video collections, there is a modest increase, with a median of 16%.
374 | */
375 |
376 | -- the state with the maximum percentage increase in ebooks: Delaware (613,276%)
377 | -- this massive change is mostly due to the Milford Public Library, which had 1 ebook
378 | -- in 2009 and 19,770 ebooks in 2014, which is a 1,976,900% change!
379 |
380 | SELECT
381 | stabr,
382 | avg_pct_ebook_prop
383 | FROM
384 | avg_media_pct_change
385 | WHERE
386 | avg_pct_ebook_prop = (
387 | SELECT
388 | max(avg_pct_ebook_prop)
389 | FROM
390 | avg_media_pct_change
391 | );
392 |
393 | -- the state with the minimum percentage increase in ebooks: Minnesota (51%)
394 | -- (using a scalar subquery to figure this out)
395 |
396 | SELECT
397 | stabr,
398 | avg_pct_ebook_prop
399 | FROM
400 | avg_media_pct_change
401 | WHERE
402 | avg_pct_ebook_prop = (
403 | SELECT
404 | min(avg_pct_ebook_prop)
405 | FROM
406 | avg_media_pct_change
407 | );
408 |
--------------------------------------------------------------------------------
/feedback/feedback.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Practical SQL: A Beginner's Guide to Storytelling with Data"
3 | subtitle: "Feedback on the 1st Edition"
4 | output: pdf_document
5 | ---
6 |
7 | ```{r setup, include=FALSE}
8 | knitr::opts_chunk$set(echo = TRUE)
9 | ```
10 |
11 | ## General comments
12 |
13 | * When joins are mentioned for the first time, the book aliases the tables being joined by explicitly using the `AS` keyword (e.g. `left_table AS lt`), but then stops using this keyword later on (e.g. `left_table lt`). Although it might not be hard for someone to see that this keyword is optional, I think it is still good practice for beginners to repeatedly see aliasing done with the `AS` keyword, so that they form the habit of making their aliasing intent explicit. It may also avoid confusion for users who would otherwise wonder why the `AS` is missing and whether that is an error.
14 |
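For instance (with a made-up pair of tables), both of these are valid, but the first makes the aliasing intent explicit:

```sql
-- with AS (clearer for beginners)
SELECT lt.id, rt.value
FROM left_table AS lt
JOIN right_table AS rt ON lt.id = rt.id;

-- without AS (works, but the intent is implicit)
SELECT lt.id, rt.value
FROM left_table lt
JOIN right_table rt ON lt.id = rt.id;
```
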
15 | * While discussing writing functions for triggers, I think it would be helpful if the book clarified that we don't need to name the table explicitly inside the function, because the function always picks up the table specified in the `CREATE TRIGGER` statement. Users coming from other programming languages will spend some time scratching their heads wondering how the function *knows*, for example in Listing 15-22, which table the `NEW` keyword refers to.
16 |
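A minimal sketch along the lines of Listing 15-22 (the table, column, and function names here are made up) might make that binding explicit:

```sql
CREATE TABLE gadgets (gadget_name text, added_on date);

CREATE OR REPLACE FUNCTION set_added_on()
RETURNS trigger AS
$$
BEGIN
    -- NEW is the row being inserted into whichever table the trigger
    -- below attaches this function to (here: gadgets)
    NEW.added_on = now();
    RETURN NEW;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER gadgets_insert
    BEFORE INSERT ON gadgets -- this is where the table gets specified
    FOR EACH ROW
    EXECUTE PROCEDURE set_added_on();
```
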
17 | ## Possible clarifications
18 |
19 | * I wish the book emphasized how a DBMS, a data model, and a query language (SQL) are all different things. Only then will it be clear that, for example, *MySQL*, *PostgreSQL*, etc. are RDBMSs, and **not** different dialects of SQL developed for the respective databases (which is what I walked away thinking after reading the book).
20 |
21 | * The way derived tables, CTEs, and views are discussed in the book, it can sometimes feel like they can't all be combined in the same workflow. So maybe an example that combines more than one of these concepts could be provided to drive this point home:
22 |
23 | ```sql
24 | CREATE OR REPLACE VIEW ... AS
25 | WITH ... AS (
26 | ...
27 | )
28 | ```
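
For instance, a concrete version of this (the view name and the aggregation are made up, using the book's `teachers` table) could be:

```sql
CREATE OR REPLACE VIEW school_salaries AS
WITH salary_stats (school, avg_salary) AS (
    SELECT school, avg(salary)
    FROM teachers
    GROUP BY school
)
SELECT school, round(avg_salary, 0) AS avg_salary
FROM salary_stats
ORDER BY avg_salary DESC;
```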
29 |
30 | ## Wishlist
31 |
32 | * The book does not cover some common terms used when talking about the relational data model:
33 | *attribute*, *tuple*, *domain*, *degree*, *cardinality*
34 |
35 | * I wish the book had also covered the following operators:
36 |
37 | - `UNION` (and `UNION ALL`)
38 | - `ANY` and `ALL`
39 |
40 | * While discussing joins, self joins deserve at least a hat tip, if not an explicit treatment (a rough sketch of self joins and the operators above follows below).
41 |
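A rough sketch of what coverage of these could look like (using the book's `teachers` table and assuming its `id` column; the queries are illustrative, not from the book):

```sql
-- UNION ALL: stack the results of two queries
SELECT last_name FROM teachers WHERE salary > 40000
UNION ALL
SELECT last_name FROM teachers WHERE hire_date > '2010-01-01';

-- ANY: compare a value against every value returned by a subquery
SELECT last_name, salary
FROM teachers
WHERE salary > ANY (SELECT salary FROM teachers WHERE hire_date > '2010-01-01');

-- self join: pair up teachers who work at the same school
SELECT t1.last_name, t2.last_name, t1.school
FROM teachers AS t1
JOIN teachers AS t2 ON t1.school = t2.school AND t1.id < t2.id;
```
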
--------------------------------------------------------------------------------
/feedback/feedback.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IndrajeetPatil/Practical-SQL-exercises/fb02667a2070212d7ff234ba79fc8941e866f2c0/feedback/feedback.pdf
--------------------------------------------------------------------------------