4 |
5 | []()
6 | []()
7 | []()
8 | []()
9 |
10 | ---
11 |
12 | **Welcome to the SQL Masterclass Free GitHub Course!!!**
13 |
14 | You can find all of the content and slides for Danny Ma's SQL Masterclass held at the ODSC Asia Pacific 2021 virtual conference!
15 |
16 | # Table of Contents
17 |
18 | - [📚 Course Content](#course-content)
19 | - [📊 Accessing the Data](#accessing-the-data)
20 | - [🙏 Conclusion](#thank-you)
21 | - [😎 About the Author](#about-danny)
22 |
23 | # 👋 Introduction
24 |
25 | This free GitHub course is sorted into multiple tutorials which were actually delivered using O'Reilly Katacoda during the live training (which is totally ah-mazing 🤩 )
26 |
27 | To avoid any legal issues - all of the SQL live training material is available directly here on GitHub as a companion course which you can learn from at your leisure 👌
28 |
29 | You can also see the presentation slides for the live training [here!](https://github.com/datawithdanny/sql-masterclass/tree/main/slides/sql-masterclass-odsc-apac-2021.pdf)
30 |
31 | # 📚 Course Content
32 |
33 | Click the navigation badge below to get started - all of the course tutorials can be found in the `/course-content` folder!
34 |
35 | **CLICK ON THE BADGE BELOW TO GET STARTED!!!**
36 |
37 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step1.md)
38 |
39 | # 📊 Accessing the Data
40 |
41 | Although all of the code examples show the outputs directly inside the training materials for this masterclass, you can also play with the data locally or on a few different hosted services provided below!
42 |
43 | My recommendations are to use any one of the `Docker` solutions to best replicate the SQLPad environment used for the actual live training session.
44 |
45 | You can also access the free `DB-Fiddle` instances below or you can also access the raw data [here!](#raw-data)
46 |
47 | ## Docker Solutions
48 |
49 | This is the preferred setup to access all of the course data in the desired SQLPad environment!
50 |
51 | This same Docker SQLPad interface is featured in the Data With Danny [Serious SQL course](https://www.datawithdanny.com) but with a lot more data included - you should check it out! 😉
52 |
53 | We even include instructions on how to install Docker if you haven't used it before inside the course :)
54 |
55 | ### Docker Compose
56 |
57 | If you have Docker installed on your machine with Docker-Compose, you can directly use the `docker-compose.yml` file included in the root of this repo to spin up the required SQLPad infrastructure.
58 |
59 | ```bash
60 | docker-compose up
61 | ```
62 |
63 | Once the initialisation is complete - you can visit `localhost:3000` in your favourite browser to access the SQL interface with all the data ready to go.
64 |
65 | Note that you may want to save all of your code in a separate text editor and copy & paste it directly into the SQLPad so you don't lose all of your code should something go wrong! 🥵
66 |
67 | ### Free Play With Docker Instance
68 |
69 | You can use this live Docker PostgreSQL environment to copy and paste SQL code from the `code/` folder directly here to see the outputs generated for the session.
70 |
71 | Click on the buttons below to launch a Play-With-Docker stack - DockerHub login/signup is required but all batteries should be included.
72 |
73 | It's totally free to create your own DockerHub account and highly recommended too if you want to create your own Docker images in the future!
74 |
75 | **👇👇👇 Make sure you right click and open the link in a new tab if you're viewing this on GitHub!**
76 |
77 | [](https://labs.play-with-docker.com/?stack=https://raw.githubusercontent.com/datawithdanny/sql-masterclass/main/docker-compose.yml)
78 |
79 | Once the initialisation steps are complete and you see `Your session is ready at the bottom of the screen` - there is a chance you might need to refresh your browser or click just outside the initial popout window and press your esc key to close this window as the `Close` button is sometimes not working!
80 |
81 | Next you can either hit port `3000` to use a SQLPad GUI or you can run the following to enter a `psql` terminal instance if you are so inclined!
82 |
83 | ```bash
84 | docker exec -it `docker ps -aqf "expose=5432"` psql -U postgres
85 | ```
86 |
87 | ### Local Docker Commands
88 |
89 | If you are happy to use the `psql` shell instead of the SQLPad GUI because you love the terminal - you can simply run the following commands to access the data directly too. By default, port `5432` will be used for the PostgreSQL database.
90 |
91 | Feel free to change the `name` argument to anything you like!
92 |
93 | ```bash
94 | docker run -d --rm --name="psql-masterclass" dannyma/psql-crypto:latest
95 | docker exec -it psql-masterclass psql -U postgres
96 | ```
97 |
98 | ## DB Fiddle
99 |
100 | You can also access the raw data directly in a live browser interface called DB Fiddle.
101 |
102 | Click on the badge below to get access to this free service - it takes a little bit longer to run each query when compared to the Docker solutions, so please keep this in mind!
103 |
104 | [](https://www.db-fiddle.com/f/cnLCK4ChsNfr5ViG6vzePg/7)
105 |
106 | ## Raw Data
107 |
108 | You can also find all of the raw data inside the `/data` folder
109 |
110 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/data)
111 |
112 | Update: There are now SQL scripts for both PostgreSQL and MySQL - however all of the code inside the course contents is for PostgreSQL only.
113 |
114 | The majority of the code snippets should still work across SQL flavours but you may need to tweak some of the queries slightly!
115 |
116 | * `Postgres/init-postgres.sql` and `MySQL/init-mysql.sql` can be ran with your favourite SQL IDE like PgAdmin4 or MySQL Workbench directly to create the required database tables
117 | * `Postgres/schema-postgres.sql` and `MySQL/schema-mysql.sql` contain the table definitions in case you want to use them also and load in the CSV files
118 | * `.csv` files containing the raw data for each table if you want to import the data to another tool
119 |
120 | Note that the two flavour init scripts are essentially the same - but MySQL has backticks instead of double quotes for table creation steps!
121 |
122 | ---
123 |
124 | # Thank You & Next Steps
125 |
126 | Thank you for your taking this free SQL Masterclass GitHub course! If you've enjoyed this - please feel free to share this with your friends and leave a review! ⭐
127 |
128 | Here are some ways you can support the author and the Data With Danny team below 🙏
129 |
130 | ## Data With Danny Virtual Data Apprenticeship
131 |
132 |
133 |
134 |
135 |
136 |
137 | If you're interested in learning valuable data science skills with Danny directly - you can checkout the [Data With Danny](https://www.datawithdanny.com) website for more details!
138 |
139 | Join our private student community with over 1,000 data professionals, join a local study group, get help from our team of 12+ data mentors and learn more about our personalized mentorship initiatives!
140 |
141 | The 1st part of the Data With Danny program is Serious SQL which is like this SQL course but on an entirely differently level. If you're serious about learning SQL you won't want to miss this course!
142 |
143 | Our first live SQL training cohort begins in November 2021 - you can gain all access to course content, recorded videos, our private Discord and more for a one off payment of only $49 - find out more [here!](https://www.datawithdanny.com/courses/serious-sql)
144 |
145 | ## 8 Week SQL Challenge Case Studies
146 |
147 |
148 |
149 |
150 |
151 |
152 | Want to test your SQL skills and tackle 8 realistic SQL case studies and get access to a collection of free SQL learning resources?
153 |
154 | Join the Data With Danny [8 Week SQL Challenge](https://www.8weeksqlchallenge.com/getting-started) for free today!
155 |
156 | Solve all 8 realistic SQL case studies designed to simulate real work scenarios and interview questions across multiple analytics domains including customer analytics, digital, banking, retail and subscriptions!
157 |
158 | ## About the Author: Danny Ma
159 |
160 |
161 |
162 |
163 |
164 | Danny is the Chief Data Mentor at Data With Danny and the Founder & CEO of Sydney Data Science, a boutique data consultancy based out of Sydney, Australia 🇦🇺
165 |
166 | After spending the last 10 years working in almost every single role in the data ecosystem, Danny is now focused on solving difficult problems at scale re-imagining data education and recruitment, and mentoring the next generation of data professionals.
167 |
168 | He provides specialist data consultancy services:
169 |
170 | * Digital customer analytics and experimentation
171 | * Data and machine learning strategy
172 | * Data engineering and systems design
173 | * Team building for analytics and data science functions
174 | * Technical training for practitioners and management
175 |
176 | Danny is a regular speaker at global data conferences, meetups and podcasts where he shares the importance of mentorship for all data professionals. He is
177 | also a technical author and instructor for O'Reilly.
178 |
179 | Danny believes that he is living proof that dispels the myth that you need higher level education to be successful in the data science space,
180 | and he wants to share his experiences with others so they can do the same.
181 |
--------------------------------------------------------------------------------
/course-content/assets/8-week-sql-challenge.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawithdanny/sql-masterclass/bb87f9be59a9fd828e75a8984e29537f10f12ec5/course-content/assets/8-week-sql-challenge.png
--------------------------------------------------------------------------------
/course-content/assets/avatar.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawithdanny/sql-masterclass/bb87f9be59a9fd828e75a8984e29537f10f12ec5/course-content/assets/avatar.png
--------------------------------------------------------------------------------
/course-content/assets/bull.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawithdanny/sql-masterclass/bb87f9be59a9fd828e75a8984e29537f10f12ec5/course-content/assets/bull.jpeg
--------------------------------------------------------------------------------
/course-content/assets/crypto-erd.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawithdanny/sql-masterclass/bb87f9be59a9fd828e75a8984e29537f10f12ec5/course-content/assets/crypto-erd.png
--------------------------------------------------------------------------------
/course-content/assets/dwd-banner.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawithdanny/sql-masterclass/bb87f9be59a9fd828e75a8984e29537f10f12ec5/course-content/assets/dwd-banner.png
--------------------------------------------------------------------------------
/course-content/assets/hodl.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawithdanny/sql-masterclass/bb87f9be59a9fd828e75a8984e29537f10f12ec5/course-content/assets/hodl.jpeg
--------------------------------------------------------------------------------
/course-content/assets/trader.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawithdanny/sql-masterclass/bb87f9be59a9fd828e75a8984e29537f10f12ec5/course-content/assets/trader.jpeg
--------------------------------------------------------------------------------
/course-content/step1.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | []()
6 | []()
7 | []()
8 | []()
9 |
10 | # Step 1 - Introduction
11 |
12 | ## Our Database
13 |
14 | All of our data lives within a PostgreSQL database and contains a single schema called `trading`.
15 |
16 | In PostgreSQL, a database can contain multiple schemas, and a schema is a collection of tables and other database objects.
17 |
18 | ## Copy and Run a SQL Query
19 |
20 | To run our first query together - you can click on the right hand corner of the following code snippet below to copy a basic `SELECT` query to your clipboard.
21 |
22 | You can then paste it into your SQLPad interface and click on the `Run` button in the top right corner or hit `cmd` + `enter` on Mac or `control` + `enter` on Windows to run the query.
23 |
24 | ```sql
25 | SELECT * FROM trading.members;
26 | ```
27 |
28 | This `SELECT` query above will return all of the records from the `members` table inside the `trading` schema.
29 |
30 | # Our Crypto Case Study
31 |
32 | For this entire SQL Simplified course we will focus on our Cryptocurrency Trading SQL Case Study!
33 |
34 | ## Setting the Context
35 |
36 | In our fictitious (but realistic) case study - my team of trusted data mentors from the Data With Danny team have been dabbling in the crypto markets since 2017.
37 |
38 | Our main purpose for this case study is to analyse the performance of the DWD mentors over time and to "slice and dice" the data in various ways to investigate other questions we might want answers to!
39 |
40 | ## Our Datasets
41 |
42 | All of our data for this case study exists within the `trading` schema as we mentioned in the previous tutorial.
43 |
44 | There are 3 data tables available to us in this schema which we can use to run our SQL queries with:
45 |
46 | 1. `members`
47 | 2. `prices`
48 | 3. `transactions`
49 |
50 | You can inspect each dataset by copying the following code snippet below and running it directly in the SQLPad GUI - please make sure to overwrite any previous queries which are already in the SQL interface!
51 |
52 | ```sql
53 | SELECT * FROM trading.members;
54 | ```
55 |
56 | | member_id | first_name | region |
57 | | --------- | ---------- | ------------- |
58 | | c4ca42 | Danny | Australia |
59 | | c81e72 | Vipul | United States |
60 | | eccbc8 | Charlie | United States |
61 | | a87ff6 | Nandita | United States |
62 | | e4da3b | Rowan | United States |
63 | | 167909 | Ayush | United States |
64 | | 8f14e4 | Alex | United States |
65 | | c9f0f8 | Abe | United States |
66 | | 45c48c | Ben | Australia |
67 | | d3d944 | Enoch | Africa |
68 | | 6512bd | Vikram | India |
69 | | c20ad4 | Leah | Asia |
70 | | c51ce4 | Pavan | Australia |
71 | | aab323 | Sonia | Australia |
72 |
73 |
74 | ```sql
75 | SELECT * FROM trading.prices LIMIT 5;
76 | ```
77 |
78 | | ticker | market_date | price | open | high | low | volume | change |
79 | | ------ | ----------- | ------- | ------- | ------- | ------- | ------- | ------ |
80 | | ETH | 2021-08-29 | 3177.84 | 3243.96 | 3282.21 | 3162.79 | 582.04K | -2.04% |
81 | | ETH | 2021-08-28 | 3243.90 | 3273.78 | 3284.58 | 3212.24 | 466.21K | -0.91% |
82 | | ETH | 2021-08-27 | 3273.58 | 3093.78 | 3279.93 | 3063.37 | 839.54K | 5.82% |
83 | | ETH | 2021-08-26 | 3093.54 | 3228.03 | 3249.62 | 3057.48 | 118.44K | -4.17% |
84 | | ETH | 2021-08-25 | 3228.15 | 3172.12 | 3247.43 | 3080.70 | 923.13K | 1.73% |
85 |
86 |
87 | ```sql
88 | SELECT * FROM trading.transactions LIMIT 5;
89 | ```
90 |
91 | | txn_id | member_id | ticker | txn_date | txn_type | quantity | percentage_fee | txn_time |
92 | | ------ | --------- | ------ | ---------- | -------- | -------- | -------------- | ------------------- |
93 | | 1 | c81e72 | BTC | 2017-01-01 | BUY | 50 | 0.30 | 2017-01-01 00:00:00 |
94 | | 2 | eccbc8 | BTC | 2017-01-01 | BUY | 50 | 0.30 | 2017-01-01 00:00:00 |
95 | | 3 | a87ff6 | BTC | 2017-01-01 | BUY | 50 | 0.00 | 2017-01-01 00:00:00 |
96 | | 4 | e4da3b | BTC | 2017-01-01 | BUY | 50 | 0.30 | 2017-01-01 00:00:00 |
97 | | 5 | 167909 | BTC | 2017-01-01 | BUY | 50 | 0.30 | 2017-01-01 00:00:00 |
98 |
99 |
100 | Note: the `LIMIT 5` in the above queries will return us only the first 5 rows from each dataset.
101 |
102 | It is a good practice to always `LIMIT` your queries just in case the tables are huge - you don't want to be trying to return all 5 million rows from a huge table when you are just inspecting the data for the first time!
103 |
104 | ## A Note on Schemas
105 |
106 | Notice above how the "`trading.`" is included before each of our available tables.
107 |
108 | If we were to remove this - our database will be unable to find our tables.
109 |
110 | This query below will return you an error when ran:
111 |
112 | ```sql
113 | SELECT * FROM members;
114 | ```
115 |
116 | > relation "members" does not exist
117 |
118 | In realistic scenarios - physical tables will almost always live within a schema and we'll need to reference the schema name to run our queries properly!
119 |
120 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step2.md)
--------------------------------------------------------------------------------
/course-content/step10.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | []()
6 | []()
7 | []()
8 | []()
9 |
10 | # Step 10 - The Bull Strategy
11 |
12 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step9.md)
13 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step11.md)
14 |
15 | 
16 |
17 | Vikram is also similar to Leah but purchases Bitcoin frequently because he believes the price will go up in the future!
18 |
19 | ## Vikram's Transaction History
20 |
21 | * Vikram also purchases 50 units of both ETH and BTC just like Leah on Jan 1st 2017
22 | * He continues to purchase more throughout the entire 4 year period
23 | * He does not sell any of his crypto - he's in it for the long run
24 |
25 | ## Vikram's Data
26 |
27 | Because this is also a simplified version of our dataset - we will create another temp table called `vikram_bull_strategy` with our data for these questions.
28 |
29 | ```sql
30 | CREATE TEMP TABLE vikram_bull_strategy AS
31 | SELECT * FROM trading.transactions
32 | WHERE member_id = '6512bd'
33 | AND txn_type = 'BUY';
34 | ```
35 |
36 | Again, we can inspect the data by running the following query after creating the temp table above:
37 |
38 | ```sql
39 | SELECT * FROM vikram_bull_strategy LIMIT 10;
40 | ```
41 |
42 | | txn_id | member_id | ticker | txn_date | txn_type | quantity | percentage_fee | txn_time |
43 | | ------ | --------- | ------ | ---------- | -------- | ---------------- | -------------- | -------------------------- |
44 | | 11 | 6512bd | BTC | 2017-01-01 | BUY | 50 | 0.30 | 2017-01-01 00:00:00 |
45 | | 25 | 6512bd | ETH | 2017-01-01 | BUY | 50 | 0.30 | 2017-01-01 00:00:00 |
46 | | 30 | 6512bd | ETH | 2017-01-01 | BUY | 8.84298701787532 | 0.30 | 2017-01-01 06:22:20.202995 |
47 | | 31 | 6512bd | BTC | 2017-01-01 | BUY | 2.27106258645779 | 0.21 | 2017-01-01 06:40:48.691577 |
48 | | 35 | 6512bd | BTC | 2017-01-01 | BUY | 6.73841780964583 | 0.30 | 2017-01-01 11:00:14.002519 |
49 | | 36 | 6512bd | BTC | 2017-01-01 | BUY | 9.37875791241961 | 0.30 | 2017-01-01 12:03:33.017453 |
50 | | 55 | 6512bd | BTC | 2017-01-02 | BUY | 5.54383811940401 | 0.30 | 2017-01-02 11:12:42.895079 |
51 | | 63 | 6512bd | ETH | 2017-01-02 | BUY | 5.04372609654009 | 0.07 | 2017-01-02 20:48:13.480413 |
52 | | 65 | 6512bd | BTC | 2017-01-02 | BUY | 3.01276029896716 | 0.30 | 2017-01-02 21:00:49.341793 |
53 | | 99 | 6512bd | ETH | 2017-01-04 | BUY | 1.83100404691078 | 0.30 | 2017-01-04 22:04:12.689306 |
54 |
55 |
56 | ## Required Metrics
57 |
58 | To assess Vikram's performance we also need to regularly match the prices for his trades throughout the 4 years and not just at the start of the entire dataset, like in the case of Leah's HODL strategy.
59 |
60 | We will need to calculate the following metrics:
61 |
62 | * Total investment amount in dollars for all of his purchases
63 | * The dollar amount of fees paid
64 | * The dollar cost average per unit of BTC and ETH purchased by Vikram
65 | * The final investment value of his portfolio on August 29th 2021
66 | * Profitability can be measured by final portfolio value divided by the investment amount
67 | * Profitability split by BTC and ETH
68 |
69 | ## Solutions
70 |
71 | ### Question 1 & 2
72 |
73 | > Calculate the total investment amount in dollars for all of Vikram's purchases and his dollar amount of fees paid
74 |
75 | Click here to reveal the solution!
76 |
77 | ```sql
78 | SELECT
79 | SUM(transactions.quantity * prices.price) AS initial_investment,
80 | SUM(transactions.quantity * prices.price * transactions.percentage_fee / 100) AS fees
81 | FROM vikram_bull_strategy AS transactions
82 | INNER JOIN trading.prices
83 | ON transactions.ticker = prices.ticker
84 | AND transactions.txn_date = prices.market_date;
85 | ```
86 |
87 |
88 |
89 | | initial_investment | fees |
90 | | --------------------------- | --------------------------- |
91 | | 50730451.023400136384298882 | 128821.14163246531801672694 |
92 |
93 |
94 | ### Question 3
95 |
96 | > What is the average cost per unit of BTC and ETH purchased by Vikram
97 |
98 | Click here to reveal the solution!
99 |
100 | ```sql
101 | WITH cte_portfolio AS (
102 | SELECT
103 | transactions.ticker,
104 | SUM(transactions.quantity) AS total_quantity,
105 | SUM(transactions.quantity * prices.price) AS initial_investment
106 | FROM vikram_bull_strategy AS transactions
107 | INNER JOIN trading.prices
108 | ON transactions.ticker = prices.ticker
109 | AND transactions.txn_date = prices.market_date
110 | GROUP BY transactions.ticker
111 | )
112 | SELECT
113 | ticker,
114 | initial_investment / total_quantity AS dollar_cost_average
115 | FROM cte_portfolio;
116 | ```
117 |
118 |
119 |
120 | | ticker | dollar_cost_average |
121 | | ------ | ----------------------- |
122 | | BTC | 12190.13846337579877423 |
123 | | ETH | 538.402092304626902638 |
124 |
125 |
126 | ### Question 4
127 |
128 | > Calculate profitability by using final portfolio value divided by the investment amount
129 |
130 | Click here to reveal the solution!
131 |
132 | ```sql
133 | WITH cte_portfolio_values AS (
134 | SELECT
135 | SUM(transactions.quantity * prices.price) AS initial_investment,
136 | SUM(transactions.quantity * final.price) AS final_value
137 | FROM vikram_bull_strategy AS transactions
138 | INNER JOIN trading.prices
139 | ON transactions.ticker = prices.ticker
140 | AND transactions.txn_date = prices.market_date
141 | INNER JOIN trading.prices AS final
142 | ON transactions.ticker = final.ticker
143 | WHERE final.market_date = '2021-08-29'
144 | )
145 | SELECT
146 | final_value / initial_investment AS profitability
147 | FROM cte_portfolio_values;
148 | ```
149 |
150 |
151 |
152 | | profitability |
153 | | -------------------- |
154 | | 4.019204544489789883 |
155 |
156 | ### Question 5
157 |
158 | > Calculate Vikram's profitability split by BTC and ETH
159 |
160 | Click here to reveal the solution!
161 |
162 | ```sql
163 | WITH cte_ticker_portfolio_values AS (
164 | SELECT
165 | transactions.ticker,
166 | SUM(transactions.quantity * prices.price) AS initial_investment,
167 | SUM(transactions.quantity * final.price) AS final_value
168 | FROM vikram_bull_strategy AS transactions
169 | INNER JOIN trading.prices
170 | ON transactions.ticker = prices.ticker
171 | AND transactions.txn_date = prices.market_date
172 | INNER JOIN trading.prices AS final
173 | ON transactions.ticker = final.ticker
174 | WHERE final.market_date = '2021-08-29'
175 | GROUP BY transactions.ticker
176 | )
177 | SELECT
178 | ticker,
179 | final_value / initial_investment AS profitability
180 | FROM cte_ticker_portfolio_values;
181 | ```
182 |
183 |
184 |
185 | | ticker | profitability |
186 | | ------ | -------------------- |
187 | | BTC | 3.95852763649714995 |
188 | | ETH | 5.902354477110731653 |
189 |
190 |
191 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step9.md)
192 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step11.md)
--------------------------------------------------------------------------------
/course-content/step11.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | []()
6 | []()
7 | []()
8 | []()
9 |
10 | # Step 11 - The Trader Strategy
11 |
12 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step10.md)
13 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step12.md)
14 |
15 | # Scenario 3: The Trader
16 |
17 | 
18 |
19 | Nandita is the Queen of crypto trading - she wants to follow the popular trader's adage of **BUY LOW, SELL HIGH**
20 |
21 | ## Nandita's Transaction History
22 |
23 | * She also starts out with a 50 BTC and ETH purchase just like Leah and Vikram
24 | * She continues to buy more crypto over the 4 years
25 | * She starts selling some of her crypto portfolio to realise gains
26 |
27 | ## Nandita's Data
28 |
29 | This final scenario 3 is actually exactly the same as our real `trading.transactions` dataset!
30 |
31 | To complete our individual scenarios before we calculate all our metrics for all mentors - let's also prepare another temp table called `nandita_trading_strategy`
32 |
33 | ```sql
34 | CREATE TEMP TABLE nandita_trading_strategy AS
35 | SELECT * FROM trading.transactions
36 | WHERE member_id = 'a87ff6';
37 | ```
38 |
39 | You can inspect the data by running the following query after creating the temp table above:
40 |
41 | ```sql
42 | SELECT * FROM nandita_trading_strategy LIMIT 10;
43 | ```
44 |
45 | | txn_id | member_id | ticker | txn_date | txn_type | quantity | percentage_fee | txn_time |
46 | | ------ | --------- | ------ | ---------- | -------- | ---------------- | -------------- | -------------------------- |
47 | | 3 | a87ff6 | BTC | 2017-01-01 | BUY | 50 | 0.00 | 2017-01-01 00:00:00 |
48 | | 19 | a87ff6 | ETH | 2017-01-01 | BUY | 50 | 0.20 | 2017-01-01 00:00:00 |
49 | | 41 | a87ff6 | ETH | 2017-01-01 | BUY | 1.98666102006509 | 0.30 | 2017-01-01 17:39:10.894181 |
50 | | 49 | a87ff6 | ETH | 2017-01-02 | BUY | 8.78673520720906 | 0.30 | 2017-01-02 04:48:50.044665 |
51 | | 53 | a87ff6 | BTC | 2017-01-02 | BUY | 5.95980481918755 | 0.30 | 2017-01-02 09:55:27.347188 |
52 | | 60 | a87ff6 | BTC | 2017-01-02 | BUY | 9.01117722642621 | 0.30 | 2017-01-02 17:16:29.062839 |
53 | | 64 | a87ff6 | ETH | 2017-01-02 | BUY | 1.37715908309016 | 0.01 | 2017-01-02 20:49:33.771818 |
54 | | 77 | a87ff6 | BTC | 2017-01-03 | BUY | 3.80769453794553 | 0.30 | 2017-01-03 12:30:20.779105 |
55 | | 89 | a87ff6 | BTC | 2017-01-04 | BUY | 5.68677206948404 | 0.00 | 2017-01-04 08:13:07.752195 |
56 | | 93 | a87ff6 | BTC | 2017-01-04 | BUY | 8.13772499730359 | 0.30 | 2017-01-04 12:25:48.367139 |
57 |
58 |
59 | ## Final Evaluation Metrics
60 |
61 | By the end of our assessment period on the 29th of August 2021 - we can calculate Nandita's metrics as follows for each individual BTC and ETH portfolio:
62 |
63 | * Count of buy and sell transactions
64 | * Total investment amount of purchases
65 | * The dollar amount of fees for purchase transactions
66 | * Dollar cost average of purchases
67 | * Total gross revenue of sell transactions
68 | * Average sell price for each unit sold
69 | * Final portfolio value and quantity
70 | * Profitability measured as (final portfolio value + gross sales revenue - purchase fees - sales fees) / initial investment amount
71 |
72 | **Bonus Question**
73 |
74 | We also want to calculate the difference if Nandita didn't sell any of her crypto and compare it to the final value at the end of August - how much does this impact her overall profitability?
75 |
76 | ## Solutions
77 |
78 | ### Question 1
79 |
80 | > Calculate Nandita's purchase metrics for each of her BTC and ETH portfolios:
81 | >
82 | > * Count of purchase transactions
83 | > * Initial investment
84 | > * Purchase fees
85 | > * Dollar cost average of purchases
86 |
87 | Click here to reveal the solution!
88 |
89 | ```sql
90 | WITH cte_purchases AS (
91 | SELECT
92 | transactions.ticker,
93 | COUNT(*) AS purchase_count,
94 | SUM(transactions.quantity) AS purchase_quantity,
95 | SUM(transactions.quantity * prices.price) AS initial_investment,
96 | SUM(transactions.quantity * prices.price * transactions.percentage_fee / 100) AS purchase_fees
97 | FROM nandita_trading_strategy AS transactions
98 | INNER JOIN trading.prices
99 | ON transactions.ticker = prices.ticker
100 | AND transactions.txn_date = prices.market_date
101 | WHERE transactions.txn_type = 'BUY'
102 | GROUP BY transactions.ticker
103 | )
104 | SELECT
105 | ticker,
106 | purchase_count,
107 | purchase_quantity,
108 | initial_investment,
109 | purchase_fees,
110 | initial_investment / purchase_quantity AS dollar_cost_average
111 | FROM cte_purchases;
112 | ```
113 |
114 |
115 |
116 | | ticker | purchase_count | purchase_quantity | initial_investment | purchase_fees | dollar_cost_average |
117 | | ------ | -------------- | ----------------------- | ---------------------------- | ---------------------------- | ------------------------- |
118 | | BTC | 954 | 5023.705687783492459935 | 63735345.6973630892576024850 | 162919.943377863222128081502 | 12686.9187126851255182628 |
119 | | ETH | 756 | 3822.0371970017654265 | 2287096.578215583047801140 | 5783.32678170688531189239 | 598.397257883758534604 |
120 |
121 |
122 | ### Question 2
123 |
124 | > Calculate Nandita's sales metrics for each of her BTC and ETH portfolios:
125 | >
126 | > * Count of sales transactions
127 | > * Gross revenue amount
128 | > * Sales fees
129 | > * Average selling price
130 |
131 | Click here to reveal the solution!
132 |
133 | ```sql
134 | WITH cte_sales AS (
135 | SELECT
136 | transactions.ticker,
137 | COUNT(*) AS sales_count,
138 | SUM(transactions.quantity) AS sales_quantity,
139 | SUM(transactions.quantity * prices.price) AS gross_revenue,
140 | SUM(transactions.quantity * prices.price * transactions.percentage_fee / 100) AS sales_fees
141 | FROM nandita_trading_strategy AS transactions
142 | INNER JOIN trading.prices
143 | ON transactions.ticker = prices.ticker
144 | AND transactions.txn_date = prices.market_date
145 | WHERE transactions.txn_type = 'SELL'
146 | GROUP BY transactions.ticker
147 | )
148 | SELECT
149 | ticker,
150 | sales_count,
151 | sales_quantity,
152 | gross_revenue,
153 | sales_fees,
154 | gross_revenue / sales_quantity AS average_selling_price
155 | FROM cte_sales;
156 | ```
157 |
158 |
159 |
160 | | ticker | sales_count | sales_quantity | gross_revenue | sales_fees | average_selling_price |
161 | | ------ | ----------- | -------------------- | -------------------------- | -------------------------- | ----------------------- |
162 | | BTC | 167 | 863.4858182768507102 | 10975745.05336688201117242 | 29522.09286188312984442411 | 12710.97315213559195557 |
163 | | ETH | 70 | 318.1506358514526923 | 172591.915512909206341725 | 447.93810830446683009024 | 542.484898862480594053 |
164 |
165 |
166 | ### Question 3
167 |
168 | > What is Nandita's final BTC and ETH portfolio value and quantity?
169 |
170 | Click here to reveal the solution!
171 |
172 | ```sql
173 | WITH cte_adjusted_transactions AS (
174 | SELECT
175 | member_id,
176 | txn_date,
177 | txn_type,
178 | ticker,
179 | percentage_fee,
180 | CASE
181 | WHEN txn_type = 'BUY' THEN quantity
182 | WHEN txn_type = 'SELL' THEN -quantity
183 | END as quantity
184 | FROM nandita_trading_strategy
185 | )
186 | SELECT
187 | transactions.ticker,
188 | SUM(transactions.quantity) AS final_quantity,
189 | SUM(transactions.quantity * prices.price) AS final_portfolio_value
190 | FROM cte_adjusted_transactions AS transactions
191 | INNER JOIN trading.prices
192 | ON transactions.ticker = prices.ticker
193 | WHERE prices.market_date = '2021-08-29'
194 | GROUP BY transactions.ticker;
195 | ```
196 |
197 |
198 |
199 | | ticker | final_quantity | final_portfolio_value |
200 | | ------ | ----------------------- | ----------------------------- |
201 | | BTC | 4160.219869506641749735 | 200751409.8030429976334624250 |
202 | | ETH | 3503.8865611503127342 | 11134790.869485909819250128 |
203 |
204 |
205 | ### Question 4 & 5 (bonus!)
206 |
207 | > What is Nandita's overall profitability and theoretical profitability if she didn't sell any of her portfolio?
208 |
209 | We will try to minimise how many times we access the temp table `nandita_trading_strategy` to optimise our query performance!
210 |
211 | Click here to reveal the solution!
212 |
213 | ```sql
214 | WITH cte_portfolio AS (
215 | SELECT
216 | transactions.ticker,
217 | transactions.txn_type,
218 | COUNT(*) AS transaction_count,
219 | SUM(transactions.quantity) AS total_quantity,
220 | SUM(transactions.quantity * prices.price) AS gross_values,
221 | SUM(transactions.quantity * prices.price * transactions.percentage_fee / 100) AS fees
222 | FROM nandita_trading_strategy AS transactions
223 | INNER JOIN trading.prices
224 | ON transactions.ticker = prices.ticker
225 | AND transactions.txn_date = prices.market_date
226 | GROUP BY 1,2
227 | ),
228 | cte_summary AS (
229 | SELECT
230 | ticker,
231 | SUM(
232 | CASE
233 | WHEN txn_type = 'BUY' THEN total_quantity
234 | WHEN txn_type = 'SELL' THEN -total_quantity
235 | END
236 | ) AS final_quantity,
237 | SUM(CASE WHEN txn_type = 'BUY' THEN gross_values ELSE 0 END) AS initial_investment,
238 | SUM(CASE WHEN txn_type = 'SELL' THEN gross_values ELSE 0 END) AS sales_revenue,
239 | SUM(CASE WHEN txn_type = 'BUY' THEN fees ELSE 0 END) AS purchase_fees,
240 | SUM(CASE WHEN txn_type = 'SELL' THEN fees ELSE 0 END) AS sales_fees,
241 | SUM(CASE WHEN txn_type = 'BUY' THEN total_quantity ELSE 0 END) AS purchase_quantity,
242 | SUM(CASE WHEN txn_type = 'SELL' THEN total_quantity ELSE 0 END) AS sales_quantity,
243 | SUM(CASE WHEN txn_type = 'BUY' THEN transaction_count ELSE 0 END) AS purchase_transactions,
244 | SUM(CASE WHEN txn_type = 'SELL' THEN transaction_count ELSE 0 END) AS sales_transactions
245 | FROM cte_portfolio
246 | GROUP BY ticker
247 | ),
248 | cte_metrics AS (
249 | SELECT
250 | summary.ticker,
251 | summary.final_quantity * final.price AS actual_final_value,
252 | summary.purchase_quantity * final.price AS theoretical_final_value,
253 | summary.sales_revenue,
254 | summary.purchase_fees,
255 | summary.sales_fees,
256 | summary.initial_investment,
257 | summary.purchase_quantity,
258 | summary.sales_quantity,
259 | summary.purchase_transactions,
260 | summary.sales_transactions,
261 | summary.initial_investment / purchase_quantity AS dollar_cost_average,
262 | summary.sales_revenue / sales_quantity AS average_selling_price
263 | FROM cte_summary AS summary
264 | INNER JOIN trading.prices AS final
265 | ON summary.ticker = final.ticker
266 | WHERE final.market_date = '2021-08-29'
267 | )
268 | SELECT
269 | ticker,
270 | actual_final_value AS final_portfolio_value,
271 | ( actual_final_value + sales_revenue - purchase_fees - sales_fees ) / initial_investment AS actual_profitability,
272 | ( theoretical_final_value - purchase_fees ) / initial_investment AS theoretical_profitability,
273 | dollar_cost_average,
274 | average_selling_price,
275 | sales_revenue,
276 | purchase_fees,
277 | sales_fees,
278 | initial_investment,
279 | purchase_quantity,
280 | sales_quantity,
281 | purchase_transactions,
282 | sales_transactions
283 | FROM cte_metrics;
284 | ```
285 |
286 |
287 |
288 | | ticker | final_portfolio_value | actual_profitability | theoretical_profitability | dollar_cost_average | average_selling_price | sales_revenue | purchase_fees | sales_fees | initial_investment | purchase_quantity | sales_quantity | purchase_transactions | sales_transactions |
289 | | ------ | ----------------------------- | ----------------------- | ------------------------- | ------------------------- | ----------------------- | -------------------------- | ---------------------------- | -------------------------- | ---------------------------- | ----------------------- | -------------------- | --------------------- | ------------------ |
290 | | BTC | 200751409.8030429976334624250 | 3.318954506414827841503 | 3.800967820444996477195 | 12686.9187126851255182628 | 12710.97315213559195557 | 10975745.05336688201117242 | 162919.943377863222128081502 | 29522.09286188312984442411 | 63735345.6973630892576024850 | 5023.705687783492459935 | 863.4858182768507102 | 954 | 167 |
291 | | ETH | 11134790.869485909819250128 | 4.94126554503705553061 | 5.30805715638391209991 | 598.397257883758534604 | 542.484898862480594053 | 172591.915512909206341725 | 5783.32678170688531189239 | 447.93810830446683009024 | 2287096.578215583047801140 | 3822.0371970017654265 | 318.1506358514526923 | 756 | 70 |
292 |
293 |
294 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step10.md)
295 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step12.md)
--------------------------------------------------------------------------------
/course-content/step12.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | []()
6 | []()
7 | []()
8 | []()
9 |
10 | # Step 12 - Final Case Study Questions
11 |
12 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step11.md)
13 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step13.md)
14 |
15 | To finish up our entire cryptocurrency case study - let's now calculate exactly the same query that we just created for Nandita's data with our entire dataset with all mentors included!
16 |
17 | ## Create The Base Table
18 |
19 | > Create a summary table `mentor_performance` which includes the following metrics for each member and ticker:
20 |
21 | * Count of purchase transactions
22 | * Initial investment
23 | * Purchase fees
24 | * Dollar cost average of purchases
25 | * Count of sales transactions
26 | * Gross revenue amount
27 | * Sales fees
28 | * Average selling price
29 | * Actual Profitability (final portfolio value + gross sales revenue - purchase fees - sales fees) / initial investment amount
30 | * Theoretical Profitability (final portfolio value with no sales - purchase fees) / initial investment amount
31 |
32 | Click here to reveal the solution!
33 |
34 | ```sql
35 | CREATE TEMP TABLE mentor_performance AS
36 | WITH cte_portfolio AS (
37 | SELECT
38 | members.first_name,
39 | members.region,
40 | transactions.ticker,
41 | transactions.txn_type,
42 | COUNT(*) AS transaction_count,
43 | SUM(transactions.quantity) AS total_quantity,
44 | SUM(transactions.quantity * prices.price) AS gross_values,
45 | SUM(transactions.quantity * prices.price * transactions.percentage_fee / 100) AS fees
46 | FROM trading.transactions
47 | INNER JOIN trading.members
48 | ON transactions.member_id = members.member_id
49 | INNER JOIN trading.prices
50 | ON transactions.ticker = prices.ticker
51 | AND transactions.txn_date = prices.market_date
52 | GROUP BY
53 | members.first_name,
54 | members.region,
55 | transactions.ticker,
56 | transactions.txn_type
57 | ),
58 | cte_summary AS (
59 | SELECT
60 | first_name,
61 | region,
62 | ticker,
63 | SUM(
64 | CASE
65 | WHEN txn_type = 'BUY' THEN total_quantity
66 | WHEN txn_type = 'SELL' THEN -total_quantity
67 | END
68 | ) AS final_quantity,
69 | SUM(CASE WHEN txn_type = 'BUY' THEN gross_values ELSE 0 END) AS initial_investment,
70 | SUM(CASE WHEN txn_type = 'SELL' THEN gross_values ELSE 0 END) AS sales_revenue,
71 | SUM(CASE WHEN txn_type = 'BUY' THEN fees ELSE 0 END) AS purchase_fees,
72 | SUM(CASE WHEN txn_type = 'SELL' THEN fees ELSE 0 END) AS sales_fees,
73 | SUM(CASE WHEN txn_type = 'BUY' THEN total_quantity ELSE 0 END) AS purchase_quantity,
74 | SUM(CASE WHEN txn_type = 'SELL' THEN total_quantity ELSE 0 END) AS sales_quantity,
75 | SUM(CASE WHEN txn_type = 'BUY' THEN transaction_count ELSE 0 END) AS purchase_transactions,
76 | SUM(CASE WHEN txn_type = 'SELL' THEN transaction_count ELSE 0 END) AS sales_transactions
77 | FROM cte_portfolio
78 | GROUP BY
79 | first_name,
80 | region,
81 | ticker
82 | ),
83 | cte_metrics AS (
84 | SELECT
85 | summary.first_name,
86 | summary.region,
87 | summary.ticker,
88 | summary.final_quantity * final.price AS actual_final_value,
89 | summary.purchase_quantity * final.price AS theoretical_final_value,
90 | summary.sales_revenue,
91 | summary.purchase_fees,
92 | summary.sales_fees,
93 | summary.initial_investment,
94 | summary.purchase_quantity,
95 | summary.sales_quantity,
96 | summary.purchase_transactions,
97 | summary.sales_transactions,
98 | summary.initial_investment / purchase_quantity AS dollar_cost_average,
99 | summary.sales_revenue / sales_quantity AS average_selling_price
100 | FROM cte_summary AS summary
101 | INNER JOIN trading.prices AS final
102 | ON summary.ticker = final.ticker
103 | WHERE final.market_date = '2021-08-29'
104 | )
105 | SELECT
106 | first_name,
107 | region,
108 | ticker,
109 | actual_final_value AS final_portfolio_value,
110 | ( actual_final_value + sales_revenue - purchase_fees - sales_fees ) / initial_investment AS actual_profitability,
111 | ( theoretical_final_value - purchase_fees ) / initial_investment AS theoretical_profitability,
112 | dollar_cost_average,
113 | average_selling_price,
114 | sales_revenue,
115 | purchase_fees,
116 | sales_fees,
117 | initial_investment,
118 | purchase_quantity,
119 | sales_quantity,
120 | purchase_transactions,
121 | sales_transactions
122 | FROM cte_metrics;
123 | ```
124 |
125 |
126 |
127 | Make sure to checkout the results from the temp table by running a `SELECT` query before tackling the following questions!
128 |
129 | ```sql
130 | SELECT * FROM mentor_performance;
131 | ```
132 |
133 | **Warning - this is a very very wide table!!!**
134 |
135 | | first_name | region | ticker | final_portfolio_value | actual_profitability | theoretical_profitability | dollar_cost_average | average_selling_price | sales_revenue | purchase_fees | sales_fees | initial_investment | purchase_quantity | sales_quantity | purchase_transactions | sales_transactions |
136 | | ---------- | ------------- | ------ | ----------------------------- | ----------------------- | ------------------------- | ------------------------- | ------------------------ | --------------------------- | ---------------------------- | -------------------------- | ---------------------------- | ----------------------- | --------------------- | --------------------- | ------------------ |
137 | | Vipul | United States | ETH | 13102554.8171483815588462536 | 4.672465959057919903182 | 5.314621992146962886758 | 597.6673234944262953386 | 626.24202540473186303 | 456793.90734028892884625 | 7104.569887725402883402542 | 1188.37948073671816727685 | 2900193.5795488220132164898 | 4852.52157101720575379 | 729.420717245970714 | 991 | 149 |
138 | | Vikram | India | ETH | 6276426.482786365114210656 | 4.31181306610612708721 | 5.89972140484864266246 | 538.402092304626902638 | 576.741885087371095795 | 557225.943027915290149790 | 4169.62855809858209505223 | 1467.03920970122463238123 | 1583560.245623696053185812 | 2941.2223099752008596 | 966.1617396549943312 | 577 | 204 |
139 | | Sonia | Australia | ETH | 11320688.2256349723009004896 | 4.478791700263700984631 | 5.292086399343751113029 | 600.1977155852074903438 | 547.8389664223296338686 | 444504.2206952554872967260 | 6743.608268140600050792052 | 1074.114521166652869369938 | 2625122.0218275999023003858 | 4373.76210149024236043 | 811.37751773682107399 | 864 | 161 |
140 | | Rowan | United States | ETH | 4678642.5109952568421167048 | 5.074269504377240055910 | 6.70449304368406104767 | 473.813548833111016754 | 451.5349656200992805727 | 262663.3359667433250122461 | 2383.35566531251972219524 | 617.385725289504123294627 | 973205.128602539867412883 | 2053.9833252960165058 | 581.71206211245256233 | 398 | 127 |
141 | | Pavan | Australia | ETH | 4188486.703924433521762992 | 3.85446785580589944402 | 6.16497270013013441338 | 515.248248322169372184 | 531.294867143263816005 | 572045.492969362533176448 | 3229.56956929045240213967 | 1350.07976580743016281956 | 1233880.453924375664710254 | 2394.7300314796354124 | 1076.7005825695475786 | 462 | 215 |
142 | | Nandita | United States | ETH | 11134790.869485909819250128 | 4.94126554503705553061 | 5.30805715638391209991 | 598.397257883758534604 | 542.484898862480594053 | 172591.915512909206341725 | 5783.32678170688531189239 | 447.93810830446683009024 | 2287096.578215583047801140 | 3822.0371970017654265 | 318.1506358514526923 | 756 | 70 |
143 | | Leah | Asia | ETH | 5011670.9776990206825808176 | 4.270686580003484588095 | 5.48302943975337301171 | 579.309226305712103035 | 661.5229873200303442848 | 403560.1542523780850137170 | 3216.06018209301609128188 | 986.188509194701148571051 | 1267016.153467224471656408 | 2187.1154401373141792 | 610.04706108140806606 | 435 | 126 |
144 | | Enoch | Africa | ETH | 2183933.3382704268238606128 | 2.927688747446586597299 | 5.826836991147052000975 | 545.1397546539025353837 | 571.75035531306719678 | 604577.95275612948938510 | 2441.664983053686080567615 | 1602.26015552138487694330 | 951080.3934730024378543018 | 1744.65425673609669642 | 1057.415963344853958 | 351 | 209 |
145 | | Danny | Australia | ETH | 11138441.431815681783446160 | 5.08249463426467766581 | 6.23008931789695811035 | 509.872366169852376539 | 566.702626000625094273 | 573254.802885147022737177 | 5824.68526547388391619532 | 1455.00530622275440216891 | 2302888.126083087786695727 | 4516.5972484100717280 | 1011.5619313973581290 | 904 | 200 |
146 | | Charlie | United States | ETH | 5655595.638497926206310488 | 4.38490746540649869678 | 5.82323930041798717653 | 545.48335086195787859 | 636.245983138421001829 | 505215.393166603619467180 | 3500.08393571387478289503 | 1162.34054359444856996608 | 1403940.36949523667356874 | 2573.754757641582429 | 794.0567116424364033 | 501 | 158 |
147 | | Ben | Australia | ETH | 13428404.7308956467466546296 | 5.566492182258915860530 | 5.798153939300467470194 | 547.8409611043090877031 | 575.24111102831531919 | 124538.44970426124410866 | 6102.488840463623164634342 | 311.46640129618281001108 | 2433584.5235770879150933104 | 4442.13685422790551869 | 216.497825549417380 | 877 | 44 |
148 | | Ayush | United States | ETH | 1311604.529255899622334984 | 2.68703356830932432729 | 5.81625080031458481996 | 546.136841365536146123 | 472.423274839163292495 | 334003.599653997130003622 | 1535.33554277335488666741 | 874.60265072177019376644 | 611528.717056667941945237 | 1119.7353314008790779 | 707.0007288859948728 | 222 | 145 |
149 | | Alex | United States | ETH | 8166074.5514961468370127832 | 5.160548701564200259435 | 6.195710408303723678880 | 512.7005436032844769009 | 594.10347686987495653 | 394356.04132861779977087 | 4190.280933001530243794585 | 1055.12264813874726085400 | 1657805.3389265534531313035 | 3233.47685039578173973 | 663.783426089917745 | 651 | 136 |
150 | | Abe | United States | ETH | 6639149.360373732402400560 | 6.30636732390684560266 | 6.96696762485071066549 | 455.956218714819773258 | 402.869591682089677282 | 102436.515287599786845294 | 2830.40882752809559852513 | 280.26871378724081389402 | 1068519.299942312419013546 | 2343.4690790139731866 | 254.2671807517156776 | 452 | 49 |
151 | | Vipul | United States | BTC | 78502964.072588956226743050 | 2.68714530854748256805 | 3.68294320587067762687 | 13093.31447811552297915 | 13475.453907836528677637 | 13126372.591344438019363620 | 86003.42201666670006452538 | 33121.33857697519500985826 | 34054805.89838476266109091 | 2600.9308762349498788 | 974.09502352354264169 | 491 | 202 |
152 | | Vikram | India | BTC | 166813181.437258245953618100 | 3.52715101148526165567 | 3.95599133132907749521 | 12190.13846337579877423 | 11616.444334417796046131 | 6676932.386251731199555320 | 124651.51307436673592167471 | 16956.79224576098138040873 | 49146890.77777644033111307 | 4031.6925788360780822 | 574.78279876648434158 | 792 | 115 |
153 | | Sonia | Australia | BTC | 124284811.35069950050073250 | 3.22239151512076129431 | 3.98017619235346804918 | 12116.19259536080282520 | 13301.41484742083517680 | 12188130.03561228407096717 | 106218.63998275465210197183 | 32557.52213358600760727724 | 42308380.15320625283012337 | 3491.8873912094965336 | 916.3032786678013621 | 692 | 192 |
154 | | Rowan | United States | BTC | 123967563.642518772902551350 | 3.21604584843623520833 | 3.99480787192328942564 | 12071.72576961881153034 | 14698.463685865612609155 | 14674630.646730677455969795 | 109785.90179561532962420871 | 34896.99704422013042418676 | 43064532.63337412145429443 | 3567.3882471515063849 | 998.37853535959315513 | 731 | 192 |
155 | | Pavan | Australia | BTC | 74876626.12728888362311500 | 2.96170097119995459091 | 4.19446871610550698321 | 11497.41992360170303301 | 12736.34639275082929004 | 13102135.31594895329161890 | 75946.04432629785510284083 | 35021.69278578816478913785 | 29668016.64332961950593339 | 2580.4064599247725600 | 1028.7200828179673870 | 526 | 197 |
156 | | Nandita | United States | BTC | 200751409.8030429976334624250 | 3.318954506414827841503 | 3.800967820444996477195 | 12686.9187126851255182628 | 12710.97315213559195557 | 10975745.05336688201117242 | 162919.943377863222128081502 | 29522.09286188312984442411 | 63735345.6973630892576024850 | 5023.705687783492459935 | 863.4858182768507102 | 954 | 167 |
157 | | Leah | Asia | BTC | 195244116.218934723675250200 | 3.20713866637997563156 | 3.69585020094976729586 | 13047.716421455726604292 | 14163.27115481523942743 | 13167023.04333994792518748 | 162182.80997147155924445895 | 34511.92424266569639230177 | 64922183.350145074994503074 | 4975.75064119164404784 | 929.6597445190769838 | 985 | 178 |
158 | | Enoch | Africa | BTC | 170580708.903959882263220550 | 3.32210366299759732651 | 3.81349609220645770901 | 12645.17491200523085792 | 12221.728791595667220353 | 8993630.724672519050054694 | 139549.78578174917338018370 | 22127.73733517252665241326 | 54005738.62395010934056671 | 4270.8573823425313401 | 735.87222217343213249 | 820 | 146 |
159 | | Danny | Australia | BTC | 159477000.13128947514105900 | 3.15215421180060014193 | 3.86323307663085347221 | 12482.56627031336321059 | 12136.89819062786950087 | 13053993.84143928202879456 | 139898.65835627135429570787 | 33921.86482473383782701913 | 54679169.18667898299926584 | 4380.4429315724604872 | 1075.5626055691556454 | 841 | 216 |
160 | | Charlie | United States | BTC | 122534214.004535973744886950 | 3.72521619894663216642 | 4.00865603400732590012 | 12029.849955058029497028 | 16533.3629303089026886 | 5048387.7946428438527447 | 89519.44751889309135946118 | 13459.650465263838174873 | 34220731.332920134486598673 | 2844.65155099725936589 | 305.345489355233177 | 557 | 51 |
161 | | Ben | Australia | BTC | 174495457.85993726210654700 | 3.58080995618084189922 | 3.72643936325482578725 | 12940.57274048096941435 | 12007.5028729516103273 | 2379058.8188878469437270 | 124879.65939037520247690716 | 6292.82529374197896891580 | 49358482.11912615398782341 | 3814.2424689381731354 | 198.131022250011036 | 738 | 38 |
162 | | Ayush | United States | BTC | 190375533.507735667440266700 | 3.60665921157546573899 | 3.94587234371446856760 | 12221.424291634983319324 | 11627.47349802419417012 | 5849092.67497492474683032 | 137094.21663401017780783864 | 15782.14185896643545309336 | 54363813.801684160086909672 | 4448.23880624893711454 | 503.0407229884321422 | 879 | 95 |
163 | | Alex | United States | BTC | 106225322.51226045757462950 | 3.17134902464366229781 | 3.64732289428635720234 | 13221.31584224600059279 | 10912.7808704054692250 | 4864305.0963092239886611 | 86273.62341244515836663134 | 12991.807119280435744693 | 34997838.87410784294077066 | 2647.0768334782105019 | 445.743862547520261 | 534 | 84 |
164 | | Abe | United States | BTC | 179533509.46040851816270350 | 3.45022907603633744257 | 4.02573821957975606285 | 11978.89286334087078326 | 11297.72474955164239817 | 9629772.49687926822047538 | 142276.80367018421615985439 | 24216.72648756990733881614 | 54778040.60600280511777383 | 4572.8800842388871361 | 852.3638794847991004 | 900 | 171 |
165 |
166 |
167 | ## Question 1
168 |
169 | > Which mentors have the greatest actual profitability for each ticker?
170 |
171 | Click here to reveal the solution!
172 |
173 | ```sql
174 | WITH cte_ranks AS (
175 | SELECT
176 | first_name,
177 | ticker,
178 | actual_profitability,
179 | RANK() OVER (PARTITION BY ticker ORDER BY actual_profitability DESC) AS profitability_rank
180 | FROM mentor_performance
181 | )
182 | SELECT * FROM cte_ranks
183 | WHERE profitability_rank = 1;
184 | ```
185 |
186 |
187 |
188 | | first_name | ticker | actual_profitability | profitability_rank |
189 | | ---------- | ------ | ---------------------- | ------------------ |
190 | | Charlie | BTC | 3.72521619894663216642 | 1 |
191 | | Abe | ETH | 6.30636732390684560266 | 1 |
192 |
193 |
194 | ## Question 2
195 |
196 | > Which mentors have the greatest difference in actual vs theoretical profitability for each ticker?
197 |
198 | Click here to reveal the solution!
199 |
200 | ```sql
201 | WITH cte_ranks AS (
202 | SELECT
203 | first_name,
204 | ticker,
205 | ABS(actual_profitability - theoretical_profitability) AS difference,
206 | RANK() OVER (
207 | PARTITION BY ticker
208 | ORDER BY ABS(actual_profitability - theoretical_profitability) DESC
209 | ) AS difference_rank
210 | FROM mentor_performance
211 | )
212 | SELECT * FROM cte_ranks
213 | WHERE difference_rank = 1;
214 | ```
215 |
216 |
217 |
218 | | first_name | ticker | difference | difference_rank |
219 | | ---------- | ------ | ---------------------- | --------------- |
220 | | Pavan | BTC | 1.23276774490555239230 | 1 |
221 | | Ayush | ETH | 3.12921723200526049267 | 1 |
222 |
223 |
224 | ## Question 3
225 |
226 | > What is the total amount of sales revenue made by all mentors for each region? (combined BTC and ETH)
227 |
228 | Click here to reveal the solution!
229 |
230 | ```sql
231 | SELECT
232 | region,
233 | SUM(sales_revenue) AS total_sales
234 | FROM mentor_performance
235 | GROUP BY region
236 | ORDER BY total_sales DESC;
237 | ```
238 |
239 |
240 |
241 | | region | total_sales |
242 | | ------------- | ---------------------------- |
243 | | United States | 66396367.0625050180915045221 |
244 | | Australia | 42437660.9781423926224266410 |
245 | | Asia | 13570583.1975923260102011970 |
246 | | Africa | 9598208.677428648539439794 |
247 | | India | 7234158.329279646489705110 |
248 |
249 |
250 | ## Question 4
251 |
252 | > What is the average actual profitability for each region for each ticker?
253 |
254 | Click here to reveal the solution!
255 |
256 | ```sql
257 | SELECT
258 | region,
259 | ticker,
260 | AVG(actual_profitability) AS avg_profitability
261 | FROM mentor_performance
262 | GROUP BY region, ticker
263 | ORDER BY ticker, avg_profitability DESC;
264 | ```
265 |
266 |
267 |
268 | | region | ticker | avg_profitability |
269 | | ------------- | ------ | ----------------------- |
270 | | India | BTC | 3.52715101148526165567 |
271 | | Africa | BTC | 3.32210366299759732651 |
272 | | United States | BTC | 3.310799882085806180525 |
273 | | Australia | BTC | 3.22926416357553948159 |
274 | | Asia | BTC | 3.20713866637997563156 |
275 | | United States | ETH | 4.746694009665583482267 |
276 | | Australia | ETH | 4.745561593148298488748 |
277 | | India | ETH | 4.31181306610612708721 |
278 | | Asia | ETH | 4.270686580003484588095 |
279 | | Africa | ETH | 2.927688747446586597299 |
280 |
281 |
282 | ## Question 5
283 |
284 | > Which mentors have the largest initial investment in each ticker?
285 |
286 | Click here to reveal the solution!
287 |
288 | ```sql
289 | WITH cte_rank AS (
290 | SELECT
291 | first_name,
292 | ticker,
293 | initial_investment,
294 | RANK() OVER (PARTITION BY ticker ORDER BY initial_investment DESC) AS investment_rank
295 | FROM mentor_performance
296 | )
297 | SELECT * FROM cte_rank
298 | WHERE investment_rank = 1;
299 | ```
300 |
301 |
302 |
303 | | first_name | ticker | initial_investment | investment_rank |
304 | | ---------- | ------ | --------------------------- | --------------- |
305 | | Leah | BTC | 64922183.350145074994503074 | 1 |
306 | | Vipul | ETH | 2900193.5795488220132164898 | 1 |
307 |
308 |
309 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step11.md)
310 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step13.md)
--------------------------------------------------------------------------------
/course-content/step13.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | []()
6 | []()
7 | []()
8 | []()
9 |
10 | # SQL Masterclass Summary
11 |
12 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step12.md)
13 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/the-end.md)
14 |
15 | Congratulations you have reached the end of this SQL Masterclass!
16 |
17 | Here are some of the topics which you covered in this short but sweet SQL Masterclass course:
18 |
19 | > Step 2: Inspecting the `trading.members` table
20 |
21 | * Selecting rows and columns from database tables with `SELECT`
22 | * Use `LIMIT` to only return a set number of rows from a query
23 | * Counting the number of records using `COUNT(*)`
24 | * Counting the number of unique column and table records using `COUNT(DISTINCT)
25 | * Filtering data using a `WHERE` filter
26 | * Selecting `DATE` ranges using `BETWEEN`, `>`, `>=`, `<`, `<=`
27 | * Using the `IN` and `NOT IN` filter conditions to remove and keep records
28 | * Use `CASE WHEN` to apply simple if-else logic to an existing column
29 |
30 | > Step 3: Analyzing daily BTC and ETH prices in the `trading.prices` table
31 |
32 | * Finding the `MIN` and `MAX` dates
33 | * Use `GROUP BY` to aggregate data at different levels for analysis
34 | * Extracting information from dates using `DATE_TRUNC` and `EXTRACT`
35 | * Using `DATE_TRUNC` to obtain the begining date of month for a `DATE`
36 | * Using `AVG` to find the average price
37 | * Casting float data types to an exact `NUMERIC` to use with the `ROUND` function
38 | * Using `AND` conditions to apply multiple logical rules for `WHERE` filters
39 | * Using `SUM CASE WHEN` to aggregate logical values similar to a COUNTIF in Excel
40 | * Casting `INTEGER` data types to a `NUMERIC` to avoid integer floor division errors
41 |
42 | > Step 4: View all transaction histories in the `trading.transactions` table
43 |
44 | * More advanced usage of `SUM CASE WHEN` to replicate SUMIF functionality in Excel
45 | * How to filter records from a `GROUP BY` result using the `HAVING` clause
46 | * Using CTEs and subqueries to perform the same filtering of results
47 | * Use a `RANK` window function to perform custom ordering for a results set
48 |
49 | > Step 5: Starting data analysis
50 |
51 | * Interpreting entity relationship diagrams (ERDs) to visualize table joins
52 | * Analyzing ranges of data to make sure the analysis periods are aligned
53 | * Perform an `INNER JOIN` to combine datasets to select columns from both tables
54 | * Combining CTEs and joins for step-wise queries
55 | * Combining multiple aggregation functions to generate larger table outputs
56 |
57 | > Steps 6-7: Planning ahead and using base tables for data analysis
58 |
59 | * Drop and create a temporary table to re-use in future SQL queries
60 | * Add time `INTERVAL` to a date
61 | * Use `ALTER` and `UPDATE` statements to manipulate an existing temporary table
62 | * Use a custom `WINDOW FRAME` clause to specify a sliding window for cumulative metrics
63 | * Use a `SUM` window function to calculate a denominator value for percentage calculations
64 | * Use `MAX CASE WHEN` to pivot data from long to wide
65 |
66 | > Steps 8-12: Final Case Study Scenarios
67 |
68 | * Creating simplified data scenarios to better understand each question
69 | * Implementing a `SUM PRODUCT` aggregation to calculate initial investments
70 | * Performing multiple joins to the same tables with different joining conditions
71 | * Multiplying many columns to generate fees based off percentages
72 | * Calculating hypothetical scenarios and implementing complex logic using SQL
73 | * Creating a complete CTE workflow to generate a reporting dataset
74 | * Aggregating data at multiple levels to generate multiple insights
75 |
76 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step12.md)
77 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/the-end.md)
--------------------------------------------------------------------------------
/course-content/step2.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | []()
6 | []()
7 | []()
8 | []()
9 |
10 | # Step 2 - Exploring The Members Data
11 |
12 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step1.md)
13 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step3.md)
14 |
15 | Let's now inspect our `trading.members` table in a bit more depth.
16 |
17 | ## Table Records
18 |
19 | We can see that there are 3 columns and 14 rows in this dataset:
20 |
21 | `SELECT * FROM trading.members;`
22 |
23 | | member_id | first_name | region |
24 | | --------- | ---------- | -------------- |
25 | | c4ca42 | Danny | Australia |
26 | | c81e72 | Vipul | United States |
27 | | eccbc8 | Charlie | United States |
28 | | a87ff6 | Nandita | United States |
29 | | e4da3b | Rowan | United States |
30 | | 167909 | Ayush | United States |
31 | | 8f14e4 | Alex | United States |
32 | | c9f0f8 | Abe | United States |
33 | | 45c48c | Ben | Australia |
34 | | d3d944 | Enoch | Africa |
35 | | 6512bd | Vikram | India |
36 | | c20ad4 | Leah | Asia |
37 | | c51ce4 | Pavan | Australia |
38 | | aab323 | Sonia | Australia |
39 |
40 |
41 | ## Basic SQL Introduction
42 |
43 | Let's try and answer a few questions using this dataset too better understand the DWD mentor team from the `trading.members` table.
44 |
45 | Each question has its own SQL query solution which you can run to generate the required data outputs.
46 |
47 | All the solutions are initially hidden - if you are adventurous, you can try to answer each question without looking at the solution too!
48 |
49 | ### How Do I Run These Code Examples?
50 |
51 | Reveal the code, click on the right hand corner in GitHub to copy it to your clipboard and paste it directly into the SQLPad interface then click on the `Run` button in the top right of SQLPad or hit `cmd` + `enter` on Mac or `control` + `enter` on Windows to run the query.
52 |
53 | > Remember to clear all previous SQL queries from SQLPad before running each new SQL query to avoid clashes in generated outputs!
54 |
55 | ### Question 1
56 |
57 | > Show only the top 5 rows from the `trading.members` table
58 |
59 |
60 | Click here to reveal the solution!
61 |
62 | ```sql
63 | SELECT * FROM trading.members
64 | LIMIT 5;
65 | ```
66 |
67 |
68 |
69 |
70 | | member_id | first_name | region |
71 | | --------- | ---------- | ------------- |
72 | | c4ca42 | Danny | Australia |
73 | | c81e72 | Vipul | United States |
74 | | eccbc8 | Charlie | United States |
75 | | a87ff6 | Nandita | United States |
76 | | e4da3b | Rowan | United States |
77 |
78 |
79 | ### Question 2
80 |
81 | > Sort all the rows in the table by `first_name` in alphabetical order and show the top 3 rows
82 |
83 |
84 | Click here to reveal the solution!
85 |
86 | ```sql
87 | SELECT * FROM trading.members
88 | ORDER BY first_name
89 | LIMIT 3;
90 | ```
91 |
92 |
93 |
94 |
95 | | member_id | first_name | region |
96 | | --------- | ---------- | ------------- |
97 | | c9f0f8 | Abe | United States |
98 | | 8f14e4 | Alex | United States |
99 | | 167909 | Ayush | United States |
100 |
101 |
102 | ### Question 3
103 |
104 | > Which records from `trading.members` are from the United States region?
105 |
106 |
107 | Click here to reveal the solution!
108 |
109 | ```sql
110 | SELECT * FROM trading.members
111 | WHERE region = 'United States';
112 | ```
113 |
114 |
115 |
116 |
117 | | member_id | first_name | region |
118 | | --------- | ---------- | ------------- |
119 | | c81e72 | Vipul | United States |
120 | | eccbc8 | Charlie | United States |
121 | | a87ff6 | Nandita | United States |
122 | | e4da3b | Rowan | United States |
123 | | 167909 | Ayush | United States |
124 | | 8f14e4 | Alex | United States |
125 | | c9f0f8 | Abe | United States |
126 |
127 |
128 | ### Question 4
129 |
130 | > Select only the `member_id` and `first_name` columns for members who are not from Australia
131 |
132 |
133 | Click here to reveal the solution!
134 |
135 | ```sql
136 | SELECT
137 | member_id,
138 | first_name
139 | FROM trading.members
140 | WHERE region != 'Australia';
141 | ```
142 |
143 |
144 |
145 |
146 | | member_id | first_name |
147 | | --------- | ---------- |
148 | | c81e72 | Vipul |
149 | | eccbc8 | Charlie |
150 | | a87ff6 | Nandita |
151 | | e4da3b | Rowan |
152 | | 167909 | Ayush |
153 | | 8f14e4 | Alex |
154 | | c9f0f8 | Abe |
155 | | d3d944 | Enoch |
156 | | 6512bd | Vikram |
157 | | c20ad4 | Leah |
158 |
159 |
160 | ### Question 5
161 |
162 | > Return the unique `region` values from the `trading.members` table and sort the output by reverse alphabetical order
163 |
164 |
165 | Click here to reveal the solution!
166 |
167 | ```sql
168 | SELECT DISTINCT region
169 | FROM trading.members
170 | ORDER BY region DESC;
171 | ```
172 |
173 |
174 |
175 |
176 | | region |
177 | | ------------- |
178 | | United States |
179 | | India |
180 | | Australia |
181 | | Asia |
182 | | Africa |
183 |
184 |
185 | ### Question 6
186 |
187 | > How many mentors are there from Australia or the United States?
188 |
189 |
190 | Click here to reveal the solution!
191 |
192 | ```sql
193 | SELECT
194 | COUNT(*) AS mentor_count
195 | FROM trading.members
196 | WHERE region IN ('Australia', 'United States');
197 | ```
198 |
199 |
200 |
201 |
202 | | mentor_count |
203 | | ------------- |
204 | | 11 |
205 |
206 |
207 | ### Question 7
208 |
209 | > How many mentors are not from Australia or the United States?
210 |
211 |
212 | Click here to reveal the solution!
213 |
214 | ```sql
215 | SELECT
216 | COUNT(*) AS mentor_count
217 | FROM trading.members
218 | WHERE region NOT IN ('Australia', 'United States');
219 | ```
220 |
221 |
222 |
223 |
224 | | mentor_count |
225 | | ------------ |
226 | | 3 |
227 |
228 |
229 | ### Question 8
230 |
231 | > How many mentors are there per region? Sort the output by regions with the most mentors to the least
232 |
233 |
234 | Click here to reveal the solution!
235 |
236 | ```sql
237 | SELECT
238 | region,
239 | COUNT(*) AS mentor_count
240 | FROM trading.members
241 | GROUP BY region
242 | ORDER BY mentor_count DESC;
243 | ```
244 |
245 |
246 |
247 |
248 | | region | mentor_count |
249 | | ------------- | ------------ |
250 | | United States | 7 |
251 | | Australia | 4 |
252 | | India | 1 |
253 | | Africa | 1 |
254 | | Asia | 1 |
255 |
256 |
257 | ### Question 9
258 |
259 | > How many US mentors and non US mentors are there?
260 |
261 |
262 | Click here to reveal the solution!
263 |
264 | ```sql
265 | SELECT
266 | CASE
267 | WHEN region != 'United States' THEN 'Non US'
268 | ELSE region
269 | END AS mentor_region,
270 | COUNT(*) AS mentor_count
271 | FROM trading.members
272 | GROUP BY mentor_region
273 | ORDER BY mentor_count DESC;
274 | ```
275 |
276 |
277 |
278 |
279 | | mentor_region | mentor_count |
280 | | ------------- | ------------ |
281 | | United States | 7 |
282 | | Non US | 7 |
283 |
284 |
285 | ### Question 10
286 |
287 | > How many mentors have a first name starting with a letter before `'E'`?
288 |
289 |
290 | Click here to reveal the solution!
291 |
292 | ```sql
293 | SELECT
294 | COUNT(*) AS mentor_count
295 | FROM trading.members
296 | WHERE LEFT(first_name, 1) < 'E';
297 | ```
298 |
299 |
300 |
301 |
302 | | mentor_count |
303 | | ------------ |
304 | | 6 |
305 |
306 |
307 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step1.md)
308 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step3.md)
309 |
310 | ## Appendix
311 |
312 | > `SELECT *`
313 |
314 | In practice - always try to return specific columns which you are after and use `SELECT *` sparingly!
315 |
316 | > `LIMIT`
317 |
318 | Note that `LIMIT` is sometimes implemented as `TOP` in some database flavours.
319 |
320 | One must also be careful when using `LIMIT` with newer database tools such as BigQuery - although you will only return the number of rows you ask for, BQ is billed by the total number of rows scanned and a `LIMIT` will not avoid this!
321 |
322 | Best practice is to always apply `WHERE` filters on specific partitions where possible to narrow down the amount of data that must be scanned - reducing your query costs and speeding up your query execution!
323 |
324 | > `!=` or `<>` for "not equals"
325 |
326 | You might have noticed in questions 4 and 9 there are two different methods for showing "not equals"
327 |
328 | You can use both `!=` or `<>` in `WHERE` filters to exclude records.
--------------------------------------------------------------------------------
/course-content/step3.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | []()
6 | []()
7 | []()
8 | []()
9 |
10 | # Step 3 - Daily Prices
11 |
12 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step2.md)
13 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step4.md)
14 |
15 | Our next dataset to explore will be the `trading.prices` table which contains the daily price and volume data for the 2 cryptocurrency tickers: `ETH` and `BTC` (Ethereum and Bitcoin!)
16 |
17 | ## View The Data
18 |
19 | Before we try to solve our next set of questions below - you can try viewing a few rows from the `trading.prices` dataset:
20 |
21 | Example Bitcoin price data:
22 |
23 | ```sql
24 | SELECT * FROM trading.prices WHERE ticker = 'BTC' LIMIT 5;
25 | ```
26 |
27 | | ticker | market_date | price | open | high | low | volume | change |
28 | | ------ | ----------- | ------- | ------- | ------- | ------- | ------ | ------ |
29 | | BTC | 2021-08-29 | 48255.0 | 48899.7 | 49621.7 | 48101.9 | 40.96K | -1.31% |
30 | | BTC | 2021-08-28 | 48897.1 | 49062.8 | 49289.4 | 48428.5 | 36.73K | -0.34% |
31 | | BTC | 2021-08-27 | 49064.3 | 46830.2 | 49142.0 | 46371.5 | 62.47K | 4.77% |
32 | | BTC | 2021-08-26 | 46831.6 | 48994.4 | 49347.8 | 46360.4 | 73.79K | -4.41% |
33 | | BTC | 2021-08-25 | 48994.5 | 47707.4 | 49230.2 | 47163.3 | 63.54K | 2.68% |
34 |
35 |
36 | Example Ethereum price data:
37 |
38 | ```sql
39 | SELECT * FROM trading.prices WHERE ticker = 'ETH' LIMIT 5;
40 | ```
41 |
42 | | ticker | market_date | price | open | high | low | volume | change |
43 | | ------ | ----------- | ------- | ------- | ------- | ------- | ------- | ------ |
44 | | ETH | 2021-08-29 | 3177.84 | 3243.96 | 3282.21 | 3162.79 | 582.04K | -2.04% |
45 | | ETH | 2021-08-28 | 3243.90 | 3273.78 | 3284.58 | 3212.24 | 466.21K | -0.91% |
46 | | ETH | 2021-08-27 | 3273.58 | 3093.78 | 3279.93 | 3063.37 | 839.54K | 5.82% |
47 | | ETH | 2021-08-26 | 3093.54 | 3228.03 | 3249.62 | 3057.48 | 118.44K | -4.17% |
48 | | ETH | 2021-08-25 | 3228.15 | 3172.12 | 3247.43 | 3080.70 | 923.13K | 1.73% |
49 |
50 |
51 | ## Data Dictionary
52 |
53 | | Column Name | Description |
54 | | ----------- | ------------------------------- |
55 | | ticker | one of either BTC or ETH |
56 | | market_date | the date for each record |
57 | | price | closing price at end of day |
58 | | open | the opening price |
59 | | high | the highest price for that day |
60 | | low | the lowest price for that day |
61 | | volume | the total volume traded |
62 | | change | % change price in price |
63 |
64 |
65 | ## Data Exploration Questions
66 |
67 | Let's answer a few simple questions to help us better understand the `trading.prices` table.
68 |
69 | > Remember to clear all previous SQL queries from SQLPad before running each new SQL query!
70 |
71 | ### Question 1
72 |
73 | > How many total records do we have in the `trading.prices` table?
74 |
75 |
76 | Click here to reveal the solution!
77 |
78 |
79 | ```sql
80 | SELECT
81 | COUNT(*) AS total_records
82 | FROM trading.prices;
83 | ```
84 |
85 |
86 |
87 |
88 | | total_records |
89 | | ------------- |
90 | | 3404 |
91 |
92 |
93 | ### Question 2
94 |
95 | > How many records are there per `ticker` value?
96 |
97 |
98 | Click here to reveal the solution!
99 |
100 |
101 | ```sql
102 | SELECT
103 | ticker,
104 | COUNT(*) AS record_count
105 | FROM trading.prices
106 | GROUP BY ticker;
107 | ```
108 |
109 |
110 |
111 |
112 | | ticker | record_count |
113 | | ------ | ------------ |
114 | | BTC | 1702 |
115 | | ETH | 1702 |
116 |
117 |
118 | ### Question 3
119 |
120 | > What is the minimum and maximum `market_date` values?
121 |
122 |
123 | Click here to reveal the solution!
124 |
125 |
126 | ```sql
127 | SELECT
128 | MIN(market_date) AS min_date,
129 | MAX(market_date) AS max_date
130 | FROM trading.prices;
131 | ```
132 |
133 |
134 |
135 |
136 | | min_date | max_date |
137 | | ---------- | ---------- |
138 | | 2017-01-01 | 2021-08-29 |
139 |
140 |
141 | ### Question 4
142 |
143 | > Are there differences in the minimum and maximum `market_date` values for each ticker?
144 |
145 |
146 | Click here to reveal the solution!
147 |
148 |
149 | ```sql
150 | SELECT
151 | ticker,
152 | MIN(market_date) AS min_date,
153 | MAX(market_date) AS max_date
154 | FROM trading.prices
155 | GROUP BY ticker;
156 | ```
157 |
158 |
159 |
160 |
161 | | ticker | min_date | max_date |
162 | | ------ | ---------- | ---------- |
163 | | BTC | 2017-01-01 | 2021-08-29 |
164 | | ETH | 2017-01-01 | 2021-08-29 |
165 |
166 |
167 | ### Question 5
168 |
169 | > What is the average of the `price` column for Bitcoin records during the year 2020?
170 |
171 |
172 | Click here to reveal the solution!
173 |
174 |
175 | ```sql
176 | SELECT
177 | AVG(price)
178 | FROM trading.prices
179 | WHERE ticker = 'BTC'
180 | AND market_date BETWEEN '2020-01-01' AND '2020-12-31';
181 | ```
182 |
183 |
184 |
185 |
186 | | avg |
187 | | ------------------ |
188 | | 11111.631147540984 |
189 |
190 |
191 | ### Question 6
192 |
193 | > What is the monthly average of the `price` column for Ethereum in 2020? Sort the output in chronological order and also round the average price value to 2 decimal places
194 |
195 | Click here to reveal the solution!
196 |
197 | ```sql
198 | SELECT
199 | DATE_TRUNC('MON', market_date) AS month_start,
200 | -- need to cast approx. floats to exact numeric types for round!
201 | ROUND(AVG(price)::NUMERIC, 2) AS average_eth_price
202 | FROM trading.prices
203 | WHERE EXTRACT(YEAR FROM market_date) = 2020
204 | AND ticker = 'ETH'
205 | GROUP BY month_start
206 | ORDER BY month_start;
207 | ```
208 |
209 |
210 |
211 |
212 | | month_start | average_eth_price |
213 | | ---------------------- | ----------------- |
214 | | 2020-01-01 00:00:00+00 | 156.65 |
215 | | 2020-02-01 00:00:00+00 | 238.76 |
216 | | 2020-03-01 00:00:00+00 | 160.18 |
217 | | 2020-04-01 00:00:00+00 | 171.29 |
218 | | 2020-05-01 00:00:00+00 | 207.45 |
219 | | 2020-06-01 00:00:00+00 | 235.92 |
220 | | 2020-07-01 00:00:00+00 | 259.57 |
221 | | 2020-08-01 00:00:00+00 | 401.73 |
222 | | 2020-09-01 00:00:00+00 | 367.77 |
223 | | 2020-10-01 00:00:00+00 | 375.79 |
224 | | 2020-11-01 00:00:00+00 | 486.73 |
225 | | 2020-12-01 00:00:00+00 | 622.35 |
226 |
227 |
228 |
229 |
230 | ### Question 7
231 |
232 | > Are there any duplicate `market_date` values for any `ticker` value in our table?
233 |
234 | As you inspect the output from the following SQL query - what is your final answer?
235 |
236 |
237 | Click here to reveal the solution!
238 |
239 |
240 | ```sql
241 | SELECT
242 | ticker,
243 | COUNT(market_date) AS total_count,
244 | COUNT(DISTINCT market_date) AS unique_count
245 | FROM trading.prices
246 | GROUP BY ticker;
247 | ```
248 |
249 |
250 |
251 |
252 | | ticker | total_count | unique_count |
253 | | ------ | ----------- | ------------ |
254 | | BTC | 1702 | 1702 |
255 | | ETH | 1702 | 1702 |
256 |
257 |
258 | ### Question 8
259 |
260 | > How many days from the `trading.prices` table exist where the `high` price of Bitcoin is over $30,000?
261 |
262 |
263 | Click here to reveal the solution!
264 |
265 |
266 | ```sql
267 | SELECT
268 | COUNT(*) AS row_count
269 | FROM trading.prices
270 | WHERE ticker = 'BTC'
271 | AND high > 30000;
272 | ```
273 |
274 |
275 |
276 |
277 | | row_count |
278 | | --------- |
279 | | 240 |
280 |
281 |
282 | ### Question 9
283 |
284 | > How many "breakout" days were there in 2020 where the `price` column is greater than the `open` column for each `ticker`?
285 |
286 |
287 | Click here to reveal the solution!
288 |
289 |
290 | ```sql
291 | SELECT
292 | ticker,
293 | SUM(CASE WHEN price > open THEN 1 ELSE 0 END) AS breakout_days
294 | FROM trading.prices
295 | WHERE DATE_TRUNC('YEAR', market_date) = '2020-01-01'
296 | GROUP BY ticker;
297 | ```
298 |
299 |
300 |
301 |
302 | | ticker | breakout_days |
303 | | ------ | ------------- |
304 | | BTC | 207 |
305 | | ETH | 200 |
306 |
307 |
308 | ### Question 10
309 |
310 | > How many "non_breakout" days were there in 2020 where the `price` column is less than the `open` column for each `ticker`?
311 |
312 |
313 | Click here to reveal the solution!
314 |
315 |
316 | ```sql
317 | SELECT
318 | ticker,
319 | SUM(CASE WHEN price < open THEN 1 ELSE 0 END) AS non_breakout_days
320 | FROM trading.prices
321 | -- this another way to specify the year
322 | WHERE market_date >= '2020-01-01' AND market_date <= '2020-12-31'
323 | GROUP BY ticker;
324 | ```
325 |
326 |
327 |
328 |
329 | | ticker | non_breakout_days |
330 | | ------ | ----------------- |
331 | | BTC | 159 |
332 | | ETH | 166 |
333 |
334 |
335 | ### Question 11
336 |
337 | > What percentage of days in 2020 were breakout days vs non-breakout days? Round the percentages to 2 decimal places
338 |
339 |
340 | Click here to reveal the solution!
341 |
342 |
343 | ```sql
344 | SELECT
345 | ticker,
346 | ROUND(
347 | SUM(CASE WHEN price > open THEN 1 ELSE 0 END)
348 | / COUNT(*)::NUMERIC,
349 | 2
350 | ) AS breakout_percentage,
351 | ROUND(
352 | SUM(CASE WHEN price < open THEN 1 ELSE 0 END)
353 | / COUNT(*)::NUMERIC,
354 | 2
355 | ) AS non_breakout_percentage
356 | FROM trading.prices
357 | WHERE market_date >= '2020-01-01' AND market_date <= '2020-12-31'
358 | GROUP BY ticker;
359 | ```
360 |
361 |
362 |
363 |
364 | | ticker | breakout_percentage | non_breakout_percentage |
365 | | ------ | ------------------- | ----------------------- |
366 | | BTC | 0.57 | 0.43 |
367 | | ETH | 0.55 | 0.45 |
368 |
369 |
370 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step2.md)
371 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step4.md)
372 |
373 | # Appendix
374 |
375 | > Date Manipulations
376 |
377 | We use a variety of date manipulations in questions [5](#question-5), [6](#question-6), [9](#question-9) and [11](#question-11) to filter the `trading.prices` for 2020 values only.
378 |
379 | These are all valid methods to qualify `DATE` or `TIMESTAMP` values within a range using a `WHERE` filter:
380 |
381 | * `market_date BETWEEN '2020-01-01' AND '2020-12-31'`
382 | * `EXTRACT(YEAR FROM market_date) = 2020`
383 | * `DATE_TRUNC('YEAR', market_date) = '2020-01-01'`
384 | * `market_date >= '2020-01-01' AND market_date <= '2020-12-31'`
385 |
386 | The only additional thing to note is that `DATE_TRUNC` returns a `TIMESTAMP` data type which can be cast back to a regular `DATE` using the `::DATE` notation when used in a `SELECT` query.
387 |
388 | > `BETWEEN` Boundaries
389 |
390 | An additional note for [question 5](#question-5) - the boundaries for the `BETWEEN` clause must be `earlier-date-first` AND `later-date-second`
391 |
392 | See what happens when you reverse the order of the `DATE` boundaries using the query below - does it match your expectation?
393 |
394 |
395 | Click here to see the "wrong" code!
396 |
397 |
398 | ```sql
399 | SELECT
400 | AVG(price)
401 | FROM trading.prices
402 | WHERE ticker = 'BTC'
403 | AND market_date BETWEEN '2020-12-31' AND '2020-01-01';
404 | ```
405 |
406 |
407 |
408 |
409 | > Rounding Floats/Doubles
410 |
411 | In PostgreSQL - we cannot apply the `ROUND` function directly to approximate `FLOAT` or `DOUBLE PRECISION` data types.
412 |
413 | Instead we will need to cast any outputs from functions such as `AVG` to an exact `NUMERIC` data type before we can use it with other approximation functions such as `ROUND`
414 |
415 | In [question 6](#question-6) - if we were to remove our `::NUMERIC` from our query - we would run into this error:
416 |
417 | ```
418 | ERROR: function round(double precision, integer) does not exist
419 | LINE 3: ROUND(AVG(price), 2) AS average_eth_price
420 | ^
421 | HINT: No function matches the given name and argument types. You might need to add explicit type casts.
422 | ```
423 |
424 | You can try this yourself by running the below code snippet with the `::NUMERIC` removed:
425 |
426 |
427 | Click here to see the "wrong" code!
428 |
429 |
430 | ```sql
431 | SELECT
432 | DATE_TRUNC('MON', market_date) AS month_start,
433 | ROUND(AVG(price), 2) AS average_eth_price
434 | FROM trading.prices
435 | WHERE EXTRACT(YEAR FROM market_date) = 2020
436 | GROUP BY month_start
437 | ORDER BY month_start;
438 | ```
439 |
440 |
441 |
442 |
443 | > Integer Floor Division
444 |
445 | In [question 5](#question-5) - when dividing values in SQL it is very important to consider the data types of the numerator (the number on top) and the denominator (the number on the bottom)
446 |
447 | When there is an `INTEGER` / `INTEGER` as there is in this case - SQL will default to `FLOOR` division in this case!
448 |
449 | You can try running the same query as the solution to question 5 above - but this time remove the 2 instances of `::NUMERIC` and the decimal place rounding to see what happens!
450 |
451 | This is a super common error found in SQL queries and we usually recommend casting either the numerator or the denominator as a `NUMERIC` type using the shorthand `::NUMERIC` syntax to ensure that you will avoid the dreaded integer floor division!
452 |
453 |
454 | Click here to see the "wrong" code!
455 |
456 |
457 | ```sql
458 | SELECT
459 | ticker,
460 | SUM(CASE WHEN price > open THEN 1 ELSE 0 END) / COUNT(*) AS breakout_percentage,
461 | SUM(CASE WHEN price < open THEN 1 ELSE 0 END) / COUNT(*) AS non_breakout_percentage
462 | FROM trading.prices
463 | WHERE market_date >= '2019-01-01' AND market_date <= '2019-12-31'
464 | GROUP BY ticker;
465 | ```
466 |
467 |
468 |
--------------------------------------------------------------------------------
/course-content/step4.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | []()
6 | []()
7 | []()
8 | []()
9 |
10 | # Step 4 - Transactions Table
11 |
12 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step3.md)
13 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step5.md)
14 |
15 | In our third `trading.transactions` database table we have each `BUY` or `SELL` transaction for a specific `ticker` performed by each `member`
16 |
17 | ## View The Data
18 |
19 | You can inspect the most recent 10 transactions by `member_id = 'c4ca42'` (do you remember who that is?)
20 |
21 | ```sql
22 | SELECT * FROM trading.transactions
23 | WHERE member_id = 'c4ca42'
24 | ORDER BY txn_time DESC
25 | LIMIT 10;
26 | ```
27 |
28 | ## Data Dictionary
29 |
30 | | Column Name | Description |
31 | | -------------- | --------------------------------- |
32 | | txn_id | unique ID for each transaction |
33 | | member_id | member identifier for each trade |
34 | | ticker | the ticker for each trade |
35 | | txn_date | the date for each transaction |
36 | | txn_type | either BUY or SELL |
37 | | quantity | the total quantity for each trade |
38 | | percentage_fee | % of total amount charged as fees |
39 | | txn_time | the timestamp for each trade |
40 |
41 |
42 | ## Transactions Questions
43 |
44 | Let's finish our initial data exploration with a few more questions for the `trading.transactions` table!
45 |
46 | ### Question 1
47 |
48 | > How many records are there in the `trading.transactions` table?
49 |
50 |
51 | Click here to reveal the solution!
52 |
53 |
54 | ```sql
55 | SELECT COUNT(*) FROM trading.transactions;
56 | ```
57 |
58 |
59 |
60 |
61 | ### Question 2
62 |
63 | > How many unique transactions are there?
64 |
65 |
66 | Click here to reveal the solution!
67 |
68 |
69 | ```sql
70 | SELECT COUNT(DISTINCT txn_id) FROM trading.transactions;
71 | ```
72 |
73 |
74 |
75 |
76 | | count |
77 | | ----- |
78 | | 22918 |
79 |
80 |
81 | ### Question 3
82 |
83 | > How many buy and sell transactions are there for Bitcoin?
84 |
85 |
86 | Click here to reveal the solution!
87 |
88 |
89 | ```sql
90 | SELECT
91 | txn_type,
92 | COUNT(*) AS transaction_count
93 | FROM trading.transactions
94 | WHERE ticker = 'BTC'
95 | GROUP BY txn_type;
96 | ```
97 |
98 |
99 |
100 |
101 | | txn_type | transaction_count |
102 | | -------- | ----------------- |
103 | | SELL | 2044 |
104 | | BUY | 10440 |
105 |
106 |
107 | ### Question 4
108 |
109 | > For each year, calculate the following buy and sell metrics for Bitcoin:
110 |
111 | * total transaction count
112 | * total quantity
113 | * average quantity per transaction
114 |
115 | Also round the quantity columns to 2 decimal places.
116 |
117 |
118 | Click here to reveal the solution!
119 |
120 |
121 | ```sql
122 | SELECT
123 | EXTRACT(YEAR FROM txn_date) AS txn_year,
124 | txn_type,
125 | COUNT(*) AS transaction_count,
126 | ROUND(SUM(quantity)::NUMERIC, 2) AS total_quantity,
127 | ROUND(AVG(quantity)::NUMERIC, 2) AS average_quantity
128 | FROM trading.transactions
129 | WHERE ticker = 'BTC'
130 | GROUP BY txn_year, txn_type
131 | ORDER BY txn_year, txn_type;
132 | ```
133 |
134 |
135 |
136 |
137 | | txn_year | txn_type | transaction_count | total_quantity | average_quantity |
138 | | -------- | -------- | ----------------- | -------------- | ---------------- |
139 | | 2017 | BUY | 2261 | 12069.58 | 5.34 |
140 | | 2017 | SELL | 419 | 2160.22 | 5.16 |
141 | | 2018 | BUY | 2204 | 11156.06 | 5.06 |
142 | | 2018 | SELL | 433 | 2145.05 | 4.95 |
143 | | 2019 | BUY | 2192 | 11114.43 | 5.07 |
144 | | 2019 | SELL | 443 | 2316.24 | 5.23 |
145 | | 2020 | BUY | 2350 | 11748.76 | 5.00 |
146 | | 2020 | SELL | 456 | 2301.98 | 5.05 |
147 | | 2021 | BUY | 1433 | 7161.32 | 5.00 |
148 | | 2021 | SELL | 293 | 1478.00 | 5.04 |
149 |
150 |
151 | ### Question 5
152 |
153 | > What was the monthly total quantity purchased and sold for Ethereum in 2020?
154 |
155 |
156 | Click here to reveal the solution!
157 |
158 |
159 | ```sql
160 | SELECT
161 | DATE_TRUNC('MON', txn_date)::DATE AS calendar_month,
162 | SUM(CASE WHEN txn_type = 'BUY' THEN quantity ELSE 0 END) AS buy_quantity,
163 | SUM(CASE WHEN txn_type = 'SELL' THEN quantity ELSE 0 END) AS sell_quantity
164 | FROM trading.transactions
165 | WHERE txn_date BETWEEN '2020-01-01' AND '2020-12-31'
166 | AND ticker = 'ETH'
167 | GROUP BY calendar_month
168 | ORDER BY calendar_month;
169 | ```
170 |
171 |
172 |
173 |
174 | | calendar_month | buy_quantity | sell_quantity |
175 | | -------------- | ----------------- | ------------------ |
176 | | 2020-01-01 | 801.0541163041565 | 158.1272716986775 |
177 | | 2020-02-01 | 687.8912804600265 | 160.06533517839912 |
178 | | 2020-03-01 | 804.2368342042604 | 182.1895644691428 |
179 | | 2020-04-01 | 761.87446914631 | 203.16695745059786 |
180 | | 2020-05-01 | 787.4238801914097 | 149.08328330131854 |
181 | | 2020-06-01 | 787.4659503521506 | 208.3362898912813 |
182 | | 2020-07-01 | 890.7807530272569 | 117.01628097387932 |
183 | | 2020-08-01 | 800.6004484214079 | 178.5423079909115 |
184 | | 2020-09-01 | 767.654783160818 | 118.86826373014458 |
185 | | 2020-10-01 | 744.7913667867248 | 174.269279883162 |
186 | | 2020-11-01 | 698.0915637008526 | 163.74629299419385 |
187 | | 2020-12-01 | 752.4121935735661 | 212.77643601396653 |
188 |
189 |
190 | ### Question 6
191 |
192 | > Summarise all buy and sell transactions for each `member_id` by generating 1 row for each member with the following additional columns:
193 |
194 | * Bitcoin buy quantity
195 | * Bitcoin sell quantity
196 | * Ethereum buy quantity
197 | * Ethereum sell quantity
198 |
199 |
200 | Click here to reveal the solution!
201 |
202 |
203 | ```sql
204 | SELECT
205 | member_id,
206 | SUM(
207 | CASE
208 | WHEN ticker = 'BTC' AND txn_type = 'BUY' THEN quantity
209 | ELSE 0
210 | END
211 | ) AS btc_buy_qty,
212 | SUM(
213 | CASE
214 | WHEN ticker = 'BTC' AND txn_type = 'SELL' THEN quantity
215 | ELSE 0
216 | END
217 | ) AS btc_sell_qty,
218 | SUM(
219 | CASE
220 | WHEN ticker = 'ETH' AND txn_type = 'BUY' THEN quantity
221 | ELSE 0
222 | END
223 | ) AS eth_buy_qty,
224 | SUM(
225 | CASE
226 | WHEN ticker = 'BTC' AND txn_type = 'SELL' THEN quantity
227 | ELSE 0
228 | END
229 | ) AS eth_sell_qty
230 | FROM trading.transactions
231 | GROUP BY member_id;
232 | ```
233 |
234 |
235 |
236 |
237 | | member_id | btc_buy_qty | btc_sell_qty | eth_buy_qty | eth_sell_qty |
238 | | --------- | ----------------------- | --------------------- | ---------------------- | --------------------- |
239 | | d3d944 | 4270.8573823425313401 | 735.87222217343213249 | 1744.65425673609669642 | 735.87222217343213249 |
240 | | c20ad4 | 4975.75064119164404784 | 929.6597445190769838 | 2187.1154401373141792 | 929.6597445190769838 |
241 | | c9f0f8 | 4572.8800842388871361 | 852.3638794847991004 | 2343.4690790139731866 | 852.3638794847991004 |
242 | | eccbc8 | 2844.65155099725936589 | 305.345489355233177 | 2573.754757641582429 | 305.345489355233177 |
243 | | 167909 | 4448.23880624893711454 | 503.0407229884321422 | 1119.7353314008790779 | 503.0407229884321422 |
244 | | c81e72 | 2600.9308762349498788 | 974.09502352354264169 | 4852.52157101720575379 | 974.09502352354264169 |
245 | | e4da3b | 3567.3882471515063849 | 998.37853535959315513 | 2053.9833252960165058 | 998.37853535959315513 |
246 | | c51ce4 | 2580.4064599247725600 | 1028.7200828179673870 | 2394.7300314796354124 | 1028.7200828179673870 |
247 | | a87ff6 | 5023.705687783492459935 | 863.4858182768507102 | 3822.0371970017654265 | 863.4858182768507102 |
248 | | 8f14e4 | 2647.0768334782105019 | 445.743862547520261 | 3233.47685039578173973 | 445.743862547520261 |
249 | | 45c48c | 3814.2424689381731354 | 198.131022250011036 | 4442.13685422790551869 | 198.131022250011036 |
250 | | c4ca42 | 4380.4429315724604872 | 1075.5626055691556454 | 4516.5972484100717280 | 1075.5626055691556454 |
251 | | 6512bd | 4031.6925788360780822 | 574.78279876648434158 | 2941.2223099752008596 | 574.78279876648434158 |
252 | | aab323 | 3491.8873912094965336 | 916.3032786678013621 | 4373.76210149024236043 | 916.3032786678013621 |
253 |
254 |
255 | ### Question 7
256 |
257 | > What was the final quantity holding of Bitcoin for each member? Sort the output from the highest BTC holding to lowest
258 |
259 |
260 | Click here to reveal the solution!
261 |
262 |
263 | ```sql
264 | SELECT
265 | member_id,
266 | SUM(
267 | CASE
268 | WHEN txn_type = 'BUY' THEN quantity
269 | WHEN txn_type = 'SELL' THEN -quantity
270 | ELSE 0
271 | END
272 | ) AS final_btc_holding
273 | FROM trading.transactions
274 | WHERE ticker = 'BTC'
275 | GROUP BY member_id
276 | ORDER BY final_btc_holding DESC;
277 | ```
278 |
279 |
280 |
281 |
282 | | member_id | final_btc_holding |
283 | | --------- | ----------------------- |
284 | | a87ff6 | 4160.219869506641749735 |
285 | | c20ad4 | 4046.09089667256706404 |
286 | | 167909 | 3945.19808326050497234 |
287 | | c9f0f8 | 3720.5162047540880357 |
288 | | 45c48c | 3616.1114466881620994 |
289 | | d3d944 | 3534.98516016909920761 |
290 | | 6512bd | 3456.90978006959374062 |
291 | | c4ca42 | 3304.8803260033048418 |
292 | | aab323 | 2575.5841125416951715 |
293 | | e4da3b | 2569.00971179191322977 |
294 | | eccbc8 | 2539.30606164202618889 |
295 | | 8f14e4 | 2201.3329709306902409 |
296 | | c81e72 | 1626.83585271140723711 |
297 | | c51ce4 | 1551.6863771068051730 |
298 |
299 |
300 | ### Question 8
301 |
302 | > Which members have sold less than 500 Bitcoin? Sort the output from the most BTC sold to least
303 |
304 | We can actually do this in 3 different ways!
305 |
306 |
307 | Click here to reveal the `HAVING` solution!
308 |
309 |
310 | ```sql
311 | SELECT
312 | member_id,
313 | SUM(quantity) AS btc_sold_quantity
314 | FROM trading.transactions
315 | WHERE ticker = 'BTC'
316 | AND txn_type = 'SELL'
317 | GROUP BY member_id
318 | HAVING SUM(quantity) < 500
319 | ORDER BY btc_sold_quantity DESC;
320 | ```
321 |
322 |
323 |
324 |
325 |
326 | Click here to reveal the `CTE` solution!
327 |
328 |
329 | ```sql
330 | WITH cte AS (
331 | SELECT
332 | member_id,
333 | SUM(quantity) AS btc_sold_quantity
334 | FROM trading.transactions
335 | WHERE ticker = 'BTC'
336 | AND txn_type = 'SELL'
337 | GROUP BY member_id
338 | )
339 | SELECT * FROM cte
340 | WHERE btc_sold_quantity < 500
341 | ORDER BY btc_sold_quantity DESC;
342 | ```
343 |
344 |
345 |
346 |
347 |
348 | Click here to reveal the `subquery` solution!
349 |
350 |
351 | ```sql
352 | SELECT * FROM (
353 | SELECT
354 | member_id,
355 | SUM(quantity) AS btc_sold_quantity
356 | FROM trading.transactions
357 | WHERE ticker = 'BTC'
358 | AND txn_type = 'SELL'
359 | GROUP BY member_id
360 | ) AS subquery
361 | WHERE btc_sold_quantity < 500
362 | ORDER BY btc_sold_quantity DESC;
363 | ```
364 |
365 |
366 |
367 |
368 | | member_id | btc_sold_quantity |
369 | | --------- | ------------------- |
370 | | 8f14e4 | 445.743862547520261 |
371 | | eccbc8 | 305.345489355233177 |
372 | | 45c48c | 198.131022250011036 |
373 |
374 |
375 | ### Question 9
376 |
377 | > What is the total Bitcoin quantity for each `member_id` owns after adding all of the BUY and SELL transactions from the `transactions` table? Sort the output by descending total quantity
378 |
379 |
380 | Click here to reveal the solution!
381 |
382 |
383 | ```sql
384 | SELECT
385 | member_id,
386 | SUM(
387 | CASE
388 | WHEN txn_type = 'BUY' THEN quantity
389 | WHEN txn_type = 'SELL' THEN -quantity
390 | END
391 | ) AS total_quantity
392 | FROM trading.transactions
393 | WHERE ticker = 'BTC'
394 | GROUP BY member_id
395 | ORDER BY total_quantity DESC;
396 | ```
397 |
398 |
399 |
400 |
401 | | member_id | total_quantity |
402 | | --------- | ----------------------- |
403 | | a87ff6 | 4160.219869506641749735 |
404 | | c20ad4 | 4046.09089667256706404 |
405 | | 167909 | 3945.19808326050497234 |
406 | | c9f0f8 | 3720.5162047540880357 |
407 | | 45c48c | 3616.1114466881620994 |
408 | | d3d944 | 3534.98516016909920761 |
409 | | 6512bd | 3456.90978006959374062 |
410 | | c4ca42 | 3304.8803260033048418 |
411 | | aab323 | 2575.5841125416951715 |
412 | | e4da3b | 2569.00971179191322977 |
413 | | eccbc8 | 2539.30606164202618889 |
414 | | 8f14e4 | 2201.3329709306902409 |
415 | | c81e72 | 1626.83585271140723711 |
416 | | c51ce4 | 1551.6863771068051730 |
417 |
418 |
419 | ### Question 10
420 |
421 | > Which `member_id` has the highest buy to sell ratio by quantity?
422 |
423 |
424 | Click here to reveal the solution!
425 |
426 |
427 | ```sql
428 | SELECT
429 | member_id,
430 | SUM(CASE WHEN txn_type = 'BUY' THEN quantity ELSE 0 END) /
431 | SUM(CASE WHEN txn_type = 'SELL' THEN quantity ELSE 0 END) AS buy_to_sell_ratio
432 | FROM trading.transactions
433 | GROUP BY member_id
434 | ORDER BY buy_to_sell_ratio DESC;
435 | ```
436 |
437 |
438 |
439 |
440 | | member_id | buy_to_sell_ratio |
441 | | --------- | -------------------- |
442 | | 45c48c | 19.91269871111331881 |
443 | | a87ff6 | 7.486010484765204502 |
444 | | c9f0f8 | 6.2499141870956191 |
445 | | 8f14e4 | 5.30005322455443465 |
446 | | eccbc8 | 4.92850232946761881 |
447 | | c20ad4 | 4.65209743522270350 |
448 | | 167909 | 4.60147388258864127 |
449 | | aab323 | 4.55272149176427243 |
450 | | 6512bd | 4.52509140656952666 |
451 | | c81e72 | 4.37533523692905634 |
452 | | c4ca42 | 4.2628218979753569 |
453 | | e4da3b | 3.55762611425005570 |
454 | | d3d944 | 3.35445896964968774 |
455 | | c51ce4 | 2.3630130420937542 |
456 |
457 |
458 | ### Question 11
459 |
460 | > For each `member_id` - which month had the highest total Ethereum quantity sold`?
461 |
462 |
463 | Click here to reveal the solution!
464 |
465 |
466 | ```sql
467 | WITH cte_ranked AS (
468 | SELECT
469 | member_id,
470 | DATE_TRUNC('MON', txn_date)::DATE AS calendar_month,
471 | SUM(quantity) AS sold_eth_quantity,
472 | RANK() OVER (PARTITION BY member_id ORDER BY SUM(quantity) DESC) AS month_rank
473 | FROM trading.transactions
474 | WHERE ticker = 'ETH' AND txn_type = 'SELL'
475 | GROUP BY member_id, calendar_month
476 | )
477 | SELECT
478 | member_id,
479 | calendar_month,
480 | sold_eth_quantity
481 | FROM cte_ranked
482 | WHERE month_rank = 1
483 | ORDER BY sold_eth_quantity DESC;
484 | ```
485 |
486 |
487 |
488 |
489 | | member_id | calendar_month | sold_eth_quantity |
490 | | --------- | -------------- | ------------------- |
491 | | c51ce4 | 2017-05-01 | 66.092440429535028 |
492 | | d3d944 | 2020-04-01 | 60.417369973983352 |
493 | | 6512bd | 2018-05-01 | 47.93285714951591 |
494 | | 167909 | 2020-12-01 | 45.92423664055218 |
495 | | c81e72 | 2018-08-01 | 41.26728177476413 |
496 | | aab323 | 2018-09-01 | 41.1750763370983665 |
497 | | c4ca42 | 2021-04-01 | 40.113474724022574 |
498 | | c20ad4 | 2017-12-01 | 37.71908498970638 |
499 | | eccbc8 | 2021-03-01 | 36.485934922948275 |
500 | | 8f14e4 | 2017-07-01 | 36.17383743681231 |
501 | | e4da3b | 2019-01-01 | 30.48442641077632 |
502 | | a87ff6 | 2020-07-01 | 27.28919836842423 |
503 | | 45c48c | 2020-01-01 | 20.21523406425370 |
504 | | c9f0f8 | 2020-11-01 | 15.931855129247867 |
505 |
506 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step3.md)
507 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step5.md)
--------------------------------------------------------------------------------
/course-content/step5.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | []()
6 | []()
7 | []()
8 | []()
9 |
10 | # Step 5 - Let the Data Analysis Begin!
11 |
12 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step4.md)
13 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step6.md)
14 |
15 | Now that we've explored all 3 of our tables - let's try to first visualize how each of the tables are joined onto eachother using an Entity Relationship Diagram or ERD for short!
16 |
17 | ## What is an ERD?
18 |
19 | ERDs are very useful to visualize the relationships between columns in tables - especially when it comes to combining them together using tables joins (something we'll cover in this current tutorial)
20 |
21 | Below you will see the ERD for our current case study - the most important thing is to notice how all of the columns relate to one another.
22 |
23 | 
24 |
25 | # Realistic Analytics
26 |
27 | Even though we have been exploring our datasets and exploring a few of the basic SQL concepts required for data analysis - we have yet to combine our SQL queries into a single focused analytical process to solve a larger problem. This is our opportunity to try this now!
28 |
29 | Let's say that we wish to analyse our overall portfolio performance and also each member's performance based off all the data we have in our 3 tables.
30 |
31 | ## Analyse the Ranges
32 |
33 | Firstly - let's see what is the range of data we have to play with!
34 |
35 | ### Question 1
36 |
37 | > What is the earliest and latest date of transactions for all members?
38 |
39 | Click here to reveal the solution!
40 |
41 | ```sql
42 | SELECT
43 | MIN(txn_date) AS earliest_date,
44 | MAX(txn_date) AS latest_date
45 | FROM trading.transactions;
46 | ```
47 |
48 |
49 |
50 | | earliest_date | latest_date |
51 | | ------------- | ----------- |
52 | | 2017-01-01 | 2021-08-27 |
53 |
54 | ### Question 2
55 |
56 | > What is the range of `market_date` values available in the prices data?
57 |
58 | Click here to reveal the solution!
59 |
60 | ```sql
61 | SELECT
62 | MIN(market_date) AS earliest_date,
63 | MAX(market_date) AS latest_date
64 | FROM trading.prices;
65 | ```
66 |
67 |
68 |
69 | | earliest_date | latest_date |
70 | | ------------- | ----------- |
71 | | 2017-01-01 | 2021-08-29 |
72 |
73 |
74 | ## Joining our Datasets
75 |
76 | Now that we now our date ranges are from January 2017 through to almost the end of August 2021 for both our prices and transactions datasets - we can now get started on joining these two tables together!
77 |
78 | Let's make use of our ERD shown above to combine the `trading.transactions` table and the `trading.members` table to answer a few simple questions about our mentors!
79 |
80 | ### Question 3
81 |
82 | > Which top 3 mentors have the most Bitcoin quantity as of the 29th of August?
83 |
84 | Click here to reveal the solution!
85 |
86 | ```sql
87 | SELECT
88 | members.first_name,
89 | SUM(
90 | CASE
91 | WHEN transactions.txn_type = 'BUY' THEN transactions.quantity
92 | WHEN transactions.txn_type = 'SELL' THEN -transactions.quantity
93 | END
94 | ) AS total_quantity
95 | FROM trading.transactions
96 | INNER JOIN trading.members
97 | ON transactions.member_id = members.member_id
98 | WHERE ticker = 'BTC'
99 | GROUP BY members.first_name
100 | ORDER BY total_quantity DESC
101 | LIMIT 3;
102 | ```
103 |
104 |
105 |
106 | | first_name | total_quantity |
107 | | ---------- | ----------------------- |
108 | | Nandita | 4160.219869506641749735 |
109 | | Leah | 4046.09089667256706404 |
110 | | Ayush | 3945.19808326050497234 |
111 |
112 |
113 | ## Calculating Portfolio Value
114 |
115 | Now let's combine all 3 tables together using only strictly `INNER JOIN` so we can utilise all of our datasets together.
116 |
117 | ### Question 4
118 |
119 | > What is total value of all Ethereum portfolios for each region at the end date of our analysis? Order the output by descending portfolio value
120 |
121 | Click here to reveal the solution!
122 |
123 | ```sql
124 | WITH cte_latest_price AS (
125 | SELECT
126 | ticker,
127 | price
128 | FROM trading.prices
129 | WHERE ticker = 'ETH'
130 | AND market_date = '2021-08-29'
131 | )
132 | SELECT
133 | members.region,
134 | SUM(
135 | CASE
136 | WHEN transactions.txn_type = 'BUY' THEN transactions.quantity
137 | WHEN transactions.txn_type = 'SELL' THEN -transactions.quantity
138 | END
139 | ) * cte_latest_price.price AS ethereum_value,
140 | AVG(
141 | CASE
142 | WHEN transactions.txn_type = 'BUY' THEN transactions.quantity
143 | WHEN transactions.txn_type = 'SELL' THEN -transactions.quantity
144 | END
145 | ) * cte_latest_price.price AS avg_ethereum_value
146 | FROM trading.transactions
147 | INNER JOIN cte_latest_price
148 | ON transactions.ticker = cte_latest_price.ticker
149 | INNER JOIN trading.members
150 | ON transactions.member_id = members.member_id
151 | WHERE transactions.ticker = 'ETH'
152 | GROUP BY members.region, cte_latest_price.price
153 | ORDER BY avg_ethereum_value DESC;
154 | ```
155 |
156 |
157 |
158 | | region | ethereum_value | avg_ethereum_value |
159 | | ------------- | ---------------------------- | ------------------------- |
160 | | Australia | 40076021.0922707343527642712 | 10752.8900167080049298064 |
161 | | United States | 50688412.2772532532882719016 | 10549.0972481276281626456 |
162 | | Asia | 5011670.9776990206825808176 | 8933.4598532959370421432 |
163 | | India | 6276426.482786365114210656 | 8036.397545181005116104 |
164 | | Africa | 2183933.3382704268238606128 | 3899.8809611971907658600 |
165 |
166 |
167 | ### Question 5
168 |
169 | > What is the average value of each Ethereum portfolio in each region? Sort this output in descending order
170 |
171 | Click here to reveal the solution!
172 |
173 | ```sql
174 | WITH cte_latest_price AS (
175 | SELECT
176 | ticker,
177 | price
178 | FROM trading.prices
179 | WHERE ticker = 'ETH'
180 | AND market_date = '2021-08-29'
181 | )
182 | SELECT
183 | members.region,
184 | AVG(
185 | CASE
186 | WHEN transactions.txn_type = 'BUY' THEN transactions.quantity
187 | WHEN transactions.txn_type = 'SELL' THEN -transactions.quantity
188 | END
189 | ) * cte_latest_price.price AS avg_ethereum_value
190 | FROM trading.transactions
191 | INNER JOIN cte_latest_price
192 | ON transactions.ticker = cte_latest_price.ticker
193 | INNER JOIN trading.members
194 | ON transactions.member_id = members.member_id
195 | WHERE transactions.ticker = 'ETH'
196 | GROUP BY members.region, cte_latest_price.price
197 | ORDER BY avg_ethereum_value DESC;
198 | ```
199 |
200 |
201 |
202 | | region | avg_ethereum_value |
203 | | ------------- | ------------------------- |
204 | | Australia | 10752.8900167080049298064 |
205 | | United States | 10549.0972481276281626456 |
206 | | Asia | 8933.4598532959370421432 |
207 | | India | 8036.397545181005116104 |
208 | | Africa | 3899.8809611971907658600 |
209 |
210 |
211 | Mmm hang on a second...does the output for the above query look correct to you?
212 |
213 | Let's try again - this time we will calculate the total sum of portfolio value and then manually divide it by the total number of mentors in each region!
214 |
215 | Click here to reveal the solution!
216 |
217 | ```sql
218 | WITH cte_latest_price AS (
219 | SELECT
220 | ticker,
221 | price
222 | FROM trading.prices
223 | WHERE ticker = 'ETH'
224 | AND market_date = '2021-08-29'
225 | ),
226 | cte_calculations AS (
227 | SELECT
228 | members.region,
229 | SUM(
230 | CASE
231 | WHEN transactions.txn_type = 'BUY' THEN transactions.quantity
232 | WHEN transactions.txn_type = 'SELL' THEN -transactions.quantity
233 | END
234 | ) * cte_latest_price.price AS ethereum_value,
235 | COUNT(DISTINCT members.member_id) AS mentor_count
236 | FROM trading.transactions
237 | INNER JOIN cte_latest_price
238 | ON transactions.ticker = cte_latest_price.ticker
239 | INNER JOIN trading.members
240 | ON transactions.member_id = members.member_id
241 | WHERE transactions.ticker = 'ETH'
242 | GROUP BY members.region, cte_latest_price.price
243 | )
244 | -- final output
245 | SELECT
246 | *,
247 | ethereum_value / mentor_count AS avg_ethereum_value
248 | FROM cte_calculations
249 | ORDER BY avg_ethereum_value DESC;
250 | ```
251 |
252 |
253 |
254 | | region | ethereum_value | mentor_count | avg_ethereum_value |
255 | | ------------- | ---------------------------- | ------------ | ---------------------------- |
256 | | Australia | 40076021.0922707343527642712 | 4 | 10019005.2730676835881910678 |
257 | | United States | 50688412.2772532532882719016 | 7 | 7241201.7538933218983245574 |
258 | | India | 6276426.482786365114210656 | 1 | 6276426.482786365114210656 |
259 | | Asia | 5011670.9776990206825808176 | 1 | 5011670.9776990206825808176 |
260 | | Africa | 2183933.3382704268238606128 | 1 | 2183933.3382704268238606128 |
261 |
262 |
263 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step4.md)
264 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step6.md)
--------------------------------------------------------------------------------
/course-content/step6.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | []()
6 | []()
7 | []()
8 | []()
9 |
10 | # Step 6 - Planning Ahead for Data Analysis
11 |
12 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step5.md)
13 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step7.md)
14 |
15 | # Planning Ahead
16 |
17 | Sometimes when creating SQL queries - we can jump to the initial problem at hand, but what happens when we stop and plan through our approach to a multi-part problem?
18 |
19 | ## Further Portfolio Questions
20 |
21 | Let's take this next series of questions and methodically break down our approach before we reveal the answers.
22 |
23 | **Questions 1-4**
24 |
25 | > What is the total portfolio value for each mentor at the end of 2020?
26 |
27 | > What is the total portfolio value for each region at the end of 2019?
28 |
29 | > What percentage of regional portfolio values does each mentor contribute at the end of 2018?
30 |
31 | > Does this region contribution percentage change when we look across both Bitcoin and Ethereum portfolios independently at the end of 2017?
32 |
33 | We can see that most questions are based off total portfolio value apart from the final question - which requires both tickers to be separated.
34 |
35 | Additionally, the `region` value for each mentor is going to be important also for both questions 3 and 4.
36 |
37 | We also need to factor in the timing aspect for these questions - it's not going to be as straightforward as our previous questions which required only the final portfolio value.
38 |
39 | For these questions - let's first create a base table which we can refer to later to solve our problems!
40 |
41 | ## Create a Base Table
42 |
43 | We can make use of a `TEMP` table which is stored in a temporary schema which will disappear once the SQL session is closed down - this is very useful in practice because you don't always have write access to production databases all the time!
44 |
45 | First let's create a portfolio quantity base table which summarizes our data with the required data first.
46 |
47 | ### Step 1
48 |
49 | > Create a base table that has each mentor's name, `region` and end of year total quantity for each ticker
50 |
51 | **You must run the query below in order to run all following queries in this tutorial!**
52 |
53 | Click here to reveal the solution!
54 |
55 |
56 | ```sql
57 | DROP TABLE IF EXISTS temp_portfolio_base;
58 | CREATE TEMP TABLE temp_portfolio_base AS
59 | WITH cte_joined_data AS (
60 | SELECT
61 | members.first_name,
62 | members.region,
63 | transactions.txn_date,
64 | transactions.ticker,
65 | CASE
66 | WHEN transactions.txn_type = 'SELL' THEN -transactions.quantity
67 | ELSE transactions.quantity
68 | END AS adjusted_quantity
69 | FROM trading.transactions
70 | INNER JOIN trading.members
71 | ON transactions.member_id = members.member_id
72 | WHERE transactions.txn_date <= '2020-12-31'
73 | )
74 | SELECT
75 | first_name,
76 | region,
77 | (DATE_TRUNC('YEAR', txn_date) + INTERVAL '12 MONTHS' - INTERVAL '1 DAY')::DATE AS year_end,
78 | ticker,
79 | SUM(adjusted_quantity) AS yearly_quantity
80 | FROM cte_joined_data
81 | GROUP BY first_name, region, year_end, ticker;
82 | ```
83 |
84 |
85 |
86 | ### Step 2
87 |
88 | Let's take a look at our base table now to see what data we have to play with - to keep things simple, let's take a look at Abe's data from our new temp table `temp_portfolio_base`
89 |
90 | > Inspect the `year_end`, `ticker` and `yearly_quantity` values from our new temp table `temp_portfolio_base` for Mentor Abe only. Sort the output with ordered BTC values followed by ETH values
91 |
92 | Click here to reveal the solution!
93 |
94 |
95 | ```sql
96 | SELECT
97 | year_end,
98 | ticker,
99 | yearly_quantity
100 | FROM temp_portfolio_base
101 | WHERE first_name = 'Abe'
102 | ORDER BY ticker, year_end;
103 | ```
104 |
105 |
106 |
107 | | year_end | ticker | yearly_quantity |
108 | | ---------- | ------ | -------------------- |
109 | | 2017-12-31 | BTC | 861.0106371411443039 |
110 | | 2018-12-31 | BTC | 755.1495855476883388 |
111 | | 2019-12-31 | BTC | 765.655338171040942 |
112 | | 2020-12-31 | BTC | 859.3718776810842491 |
113 | | 2017-12-31 | ETH | 543.2120486925716504 |
114 | | 2018-12-31 | ETH | 350.000100283493089 |
115 | | 2019-12-31 | ETH | 464.317705594980087 |
116 | | 2020-12-31 | ETH | 508.4673343549910666 |
117 |
118 |
119 | We can see from the output above that the yearly quantity is exactly the total portfolio quantity values that we need - we will need to create a cumulative sum of the `yearly_quantity` column that is separate for each mentor and ticker, using the `year_end` as the ordering column.
120 |
121 | We can do exactly this using a SQL window function!
122 |
123 | ### Step 3
124 |
125 | To create the cumulative sum - we'll need to apply a window function!
126 |
127 | Although we will only touch on this briefly in this course - the complete Data With Danny [Serious SQL course](https://www.datawithdanny.com/courses/serious-sql) covers this topic and many other SQL concepts in a lot of depth!
128 |
129 | > Create a cumulative sum for Abe which has an independent value for each ticker
130 |
131 | Click here to reveal the solution!
132 |
133 | ```sql
134 | SELECT
135 | year_end,
136 | ticker,
137 | yearly_quantity,
138 | /* this is a multi-line comment!
139 | for this case we don't actually need first_name
140 | but we include it anyway to prepare for the next query! */
141 | SUM(yearly_quantity) OVER (
142 | PARTITION BY first_name, ticker
143 | ORDER BY year_end
144 | ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
145 | ) AS cumulative_quantity
146 | FROM temp_portfolio_base
147 | WHERE first_name = 'Abe'
148 | ORDER BY ticker, year_end;
149 | ```
150 |
151 |
152 |
153 |
154 | | year_end | ticker | yearly_quantity | cumulative_quantity |
155 | | ---------- | ------ | -------------------- | --------------------- |
156 | | 2017-12-31 | BTC | 861.0106371411443039 | 861.0106371411443039 |
157 | | 2018-12-31 | BTC | 755.1495855476883388 | 1616.1602226888326427 |
158 | | 2019-12-31 | BTC | 765.655338171040942 | 2381.8155608598735847 |
159 | | 2020-12-31 | BTC | 859.3718776810842491 | 3241.1874385409578338 |
160 | | 2017-12-31 | ETH | 543.2120486925716504 | 543.2120486925716504 |
161 | | 2018-12-31 | ETH | 350.000100283493089 | 893.2121489760647394 |
162 | | 2019-12-31 | ETH | 464.317705594980087 | 1357.5298545710448264 |
163 | | 2020-12-31 | ETH | 508.4673343549910666 | 1865.9971889260358930 |
164 |
165 |
166 | ### Step 4
167 |
168 | Now let's apply our same window function to the entire temporary dataset and start answering our questions.
169 |
170 | We can actually `ALTER` and `UPDATE` our temp table to add in an extra column with our new calculation
171 |
172 | > Generate an additional `cumulative_quantity` column for the `temp_portfolio_base` temp table
173 |
174 | Click here to reveal the solution!
175 |
176 | ```sql
177 | -- add a column called cumulative_quantity
178 | ALTER TABLE temp_portfolio_base
179 | ADD cumulative_quantity NUMERIC;
180 |
181 | -- update new column with data
182 | UPDATE temp_portfolio_base
183 | SET (cumulative_quantity) = (
184 | SELECT
185 | SUM(yearly_quantity) OVER (
186 | PARTITION BY first_name, ticker
187 | ORDER BY year_end
188 | ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
189 | )
190 | );
191 | ```
192 |
193 |
194 |
195 | Now let's check that our updates to the temp table worked by inspecting Abe's records again!
196 |
197 | Click here to reveal the solution!
198 |
199 | ```sql
200 | -- query the updated table to check rows for Abe
201 | SELECT
202 | year_end,
203 | ticker,
204 | yearly_quantity,
205 | cumulative_quantity
206 | FROM temp_portfolio_base
207 | WHERE first_name = 'Abe'
208 | ORDER BY ticker, year_end;
209 | ```
210 |
211 |
212 |
213 | | year_end | ticker | yearly_quantity | cumulative_quantity |
214 | | ---------- | ------ | -------------------- | -------------------- |
215 | | 2017-01-01 | BTC | 861.0106371411443039 | 861.0106371411443039 |
216 | | 2018-01-01 | BTC | 755.1495855476883388 | 755.1495855476883388 |
217 | | 2019-01-01 | BTC | 765.655338171040942 | 765.655338171040942 |
218 | | 2020-01-01 | BTC | 859.3718776810842491 | 859.3718776810842491 |
219 | | 2021-01-01 | BTC | 479.3287662131302019 | 479.3287662131302019 |
220 | | 2017-01-01 | ETH | 543.2120486925716504 | 543.2120486925716504 |
221 | | 2018-01-01 | ETH | 350.000100283493089 | 350.000100283493089 |
222 | | 2019-01-01 | ETH | 464.317705594980087 | 464.317705594980087 |
223 | | 2020-01-01 | ETH | 508.4673343549910666 | 508.4673343549910666 |
224 | | 2021-01-01 | ETH | 223.204709336221616 | 223.204709336221616 |
225 |
226 |
227 | Wait a moment....it didn't work - the cumulative and the yearly quantity is exactly the same!
228 |
229 | This is because our `UPDATE` step only takes into account a single row at a time, which is exactly what we must not do with our window functions!
230 |
231 | We will need to create an additional temp table with our cumulative sum instead!
232 |
233 | **You must run this step for all following queries to work!**
234 |
235 | Click here to reveal the solution!
236 |
237 | ```sql
238 | DROP TABLE IF EXISTS temp_cumulative_portfolio_base;
239 | CREATE TEMP TABLE temp_cumulative_portfolio_base AS
240 | SELECT
241 | first_name,
242 | region,
243 | year_end,
244 | ticker,
245 | yearly_quantity,
246 | SUM(yearly_quantity) OVER (
247 | PARTITION BY first_name, ticker
248 | ORDER BY year_end
249 | ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
250 | ) AS cumulative_quantity
251 | FROM temp_portfolio_base;
252 | ```
253 |
254 |
255 |
256 | You can make sure this step is ran by checking the outputs from this query:
257 |
258 | ```sql
259 | SELECT * FROM temp_cumulative_portfolio_base LIMIT 20;
260 | ```
261 |
262 | | first_name | region | year_end | ticker | yearly_quantity | cumulative_quantity |
263 | | ---------- | ------------- | ---------- | ------ | --------------------- | ---------------------- |
264 | | Abe | United States | 2017-12-31 | BTC | 861.0106371411443039 | 861.0106371411443039 |
265 | | Abe | United States | 2018-12-31 | BTC | 755.1495855476883388 | 1616.1602226888326427 |
266 | | Abe | United States | 2019-12-31 | BTC | 765.655338171040942 | 2381.8155608598735847 |
267 | | Abe | United States | 2020-12-31 | BTC | 859.3718776810842491 | 3241.1874385409578338 |
268 | | Abe | United States | 2017-12-31 | ETH | 543.2120486925716504 | 543.2120486925716504 |
269 | | Abe | United States | 2018-12-31 | ETH | 350.000100283493089 | 893.2121489760647394 |
270 | | Abe | United States | 2019-12-31 | ETH | 464.317705594980087 | 1357.5298545710448264 |
271 | | Abe | United States | 2020-12-31 | ETH | 508.4673343549910666 | 1865.9971889260358930 |
272 | | Alex | United States | 2017-12-31 | BTC | 453.4749593742834454 | 453.4749593742834454 |
273 | | Alex | United States | 2018-12-31 | BTC | 447.1910423241274346 | 900.6660016984108800 |
274 | | Alex | United States | 2019-12-31 | BTC | 490.959718475924108 | 1391.6257201743349880 |
275 | | Alex | United States | 2020-12-31 | BTC | 444.259179847377622 | 1835.8849000217126100 |
276 | | Alex | United States | 2017-12-31 | ETH | 678.274023865761511 | 678.274023865761511 |
277 | | Alex | United States | 2018-12-31 | ETH | 546.83620089990823574 | 1225.11022476566974674 |
278 | | Alex | United States | 2019-12-31 | ETH | 476.8692888907140746 | 1701.97951365638382134 |
279 | | Alex | United States | 2020-12-31 | ETH | 621.5582550264365449 | 2323.53776868282036624 |
280 | | Ayush | United States | 2017-12-31 | BTC | 794.5344541497821383 | 794.5344541497821383 |
281 | | Ayush | United States | 2018-12-31 | BTC | 955.3494738695000753 | 1749.8839280192822136 |
282 | | Ayush | United States | 2019-12-31 | BTC | 743.2345666894748266 | 2493.1184947087570402 |
283 | | Ayush | United States | 2020-12-31 | BTC | 954.85594846498402504 | 3447.97444317374106524 |
284 |
285 |
286 | Now that we've obtained our base table properly - let's start answering some of these questions!
287 |
288 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step5.md)
289 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step7.md)
--------------------------------------------------------------------------------
/course-content/step7.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | []()
6 | []()
7 | []()
8 | []()
9 |
10 | # Step 7 - Answering Data Questions
11 |
12 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step6.md)
13 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step8.md)
14 |
15 | # Answering the Questions
16 |
17 | Before running any of the solution queries below for this tutorial's questions - you can run the entire SQL prep queries directly below!
18 |
19 | Click here to reveal the entire data prep SQL script!
20 |
21 | ```sql
22 | DROP TABLE IF EXISTS temp_portfolio_base;
23 | CREATE TEMP TABLE temp_portfolio_base AS
24 | WITH cte_joined_data AS (
25 | SELECT
26 | members.first_name,
27 | members.region,
28 | transactions.txn_date,
29 | transactions.ticker,
30 | CASE
31 | WHEN transactions.txn_type = 'SELL' THEN -transactions.quantity
32 | ELSE transactions.quantity
33 | END AS adjusted_quantity
34 | FROM trading.transactions
35 | INNER JOIN trading.members
36 | ON transactions.member_id = members.member_id
37 | WHERE transactions.txn_date <= '2020-12-31'
38 | )
39 | SELECT
40 | first_name,
41 | region,
42 | (DATE_TRUNC('YEAR', txn_date) + INTERVAL '12 MONTHS' - INTERVAL '1 DAY')::DATE AS year_end,
43 | ticker,
44 | SUM(adjusted_quantity) AS yearly_quantity
45 | FROM cte_joined_data
46 | GROUP BY first_name, region, year_end, ticker;
47 |
48 | DROP TABLE IF EXISTS temp_cumulative_portfolio_base;
49 | CREATE TEMP TABLE temp_cumulative_portfolio_base AS
50 | SELECT
51 | first_name,
52 | region,
53 | year_end,
54 | ticker,
55 | yearly_quantity,
56 | SUM(yearly_quantity) OVER (
57 | PARTITION BY first_name, ticker
58 | ORDER BY year_end
59 | ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
60 | ) AS cumulative_quantity
61 | FROM temp_portfolio_base;
62 | ```
63 |
64 |
65 |
66 | Here is the ERD diagram below just in case you forgot about how all the tables are joined together!
67 |
68 | 
69 |
70 | ## Question 1
71 |
72 | > What is the total portfolio value for each mentor at the end of 2020?
73 |
74 | We can now inner join our `trading.prices` table (I hope you haven't forgot about this one yet!) to our new temp table `temp_cumulative_portfolio_base`
75 |
76 | Let's also order our results by highest portfolio value to lowest rounded to 2 decimal places.
77 |
78 | Click here to reveal the solution!
79 |
80 | ```sql
81 | SELECT
82 | base.first_name,
83 | ROUND(
84 | SUM(base.cumulative_quantity * prices.price),
85 | 2
86 | ) AS portfolio_value
87 | FROM temp_cumulative_portfolio_base AS base
88 | INNER JOIN trading.prices
89 | ON base.ticker = prices.ticker
90 | AND base.year_end = prices.market_date
91 | WHERE base.year_end = '2020-12-31'
92 | GROUP BY base.first_name
93 | ORDER BY portfolio_value DESC;
94 | ```
95 |
96 |
97 |
98 | | first_name | portfolio_value |
99 | | ---------- | --------------- |
100 | | Nandita | 105391731.03 |
101 | | Leah | 100724284.81 |
102 | | Ayush | 100071736.51 |
103 | | Abe | 95203693.60 |
104 | | Ben | 92722998.25 |
105 | | Enoch | 88346609.78 |
106 | | Vikram | 88000702.69 |
107 | | Danny | 84696448.83 |
108 | | Sonia | 67932183.73 |
109 | | Rowan | 67241367.69 |
110 | | Charlie | 66761769.49 |
111 | | Alex | 54857750.71 |
112 | | Vipul | 43123911.62 |
113 | | Pavan | 41764439.70 |
114 |
115 |
116 | ## Question 2
117 |
118 | > What is the total portfolio value for each region at the end of 2019?
119 |
120 | Let's also perform the same ordering and rounding for this query too.
121 |
122 | Click here to reveal the solution!
123 |
124 | ```sql
125 | SELECT
126 | base.region,
127 | ROUND(
128 | SUM(base.cumulative_quantity * prices.price),
129 | 2
130 | ) AS portfolio_value
131 | FROM temp_cumulative_portfolio_base AS base
132 | INNER JOIN trading.prices
133 | ON base.ticker = prices.ticker
134 | AND base.year_end = prices.market_date
135 | WHERE base.year_end = '2019-12-31'
136 | GROUP BY base.region
137 | ORDER BY portfolio_value DESC;
138 | ```
139 |
140 |
141 |
142 | | region | portfolio_value |
143 | | ------------- | --------------- |
144 | | United States | 98795362.60 |
145 | | Australia | 52861452.00 |
146 | | Asia | 18305366.24 |
147 | | India | 16168831.76 |
148 | | Africa | 16078915.65 |
149 |
150 |
151 | ## Question 3
152 |
153 | > What percentage of regional portfolio values does each mentor contribute at the end of 2018?
154 |
155 | Let's make our percentages between 0 and 100, rounded to decimal places - order the output with the regions with highest portfolio value first and then by descending mentor contributions within each group.
156 |
157 | Click here to reveal the solution!
158 |
159 | ```sql
160 | WITH cte_mentor_portfolio AS (
161 | SELECT
162 | base.region,
163 | base.first_name,
164 | ROUND(
165 | SUM(base.cumulative_quantity * prices.price),
166 | 2
167 | ) AS portfolio_value
168 | FROM temp_cumulative_portfolio_base AS base
169 | INNER JOIN trading.prices
170 | ON base.ticker = prices.ticker
171 | AND base.year_end = prices.market_date
172 | WHERE base.year_end = '2018-12-31'
173 | GROUP BY base.first_name, base.region
174 | ),
175 | cte_region_portfolio AS (
176 | SELECT
177 | region,
178 | first_name,
179 | portfolio_value,
180 | SUM(portfolio_value) OVER (PARTITION BY region) AS region_total
181 | FROM cte_mentor_portfolio
182 | )
183 | -- final output
184 | SELECT
185 | region,
186 | first_name,
187 | ROUND(100 * portfolio_value / region_total, 2) AS contribution_percentage
188 | FROM cte_region_portfolio
189 | ORDER BY region_total DESC, contribution_percentage DESC;
190 | ```
191 |
192 |
193 |
194 | | region | first_name | contribution_percentage |
195 | | ------------- | ---------- | ----------------------- |
196 | | United States | Nandita | 20.00 |
197 | | United States | Ayush | 18.61 |
198 | | United States | Abe | 17.45 |
199 | | United States | Rowan | 14.34 |
200 | | United States | Charlie | 12.52 |
201 | | United States | Alex | 10.00 |
202 | | United States | Vipul | 7.08 |
203 | | Australia | Danny | 31.55 |
204 | | Australia | Ben | 30.41 |
205 | | Australia | Sonia | 24.53 |
206 | | Australia | Pavan | 13.51 |
207 | | Asia | Leah | 100.00 |
208 | | Africa | Enoch | 100.00 |
209 | | India | Vikram | 100.00 |
210 |
211 |
212 | ## Question 4
213 |
214 | > Does this region contribution percentage change when we look across both Bitcoin and Ethereum portfolios independently at the end of 2017?
215 |
216 | We can use a similar approach to question 3 - but we will need to avoid the first level of aggregation.
217 |
218 | Click here to reveal the solution!
219 |
220 | ```sql
221 | WITH cte_mentor_portfolio AS (
222 | SELECT
223 | base.region,
224 | base.first_name,
225 | base.ticker,
226 | base.cumulative_quantity * prices.price AS portfolio_value
227 | FROM temp_cumulative_portfolio_base AS base
228 | INNER JOIN trading.prices
229 | ON base.ticker = prices.ticker
230 | AND base.year_end = prices.market_date
231 | WHERE base.year_end = '2017-12-31'
232 | ),
233 | cte_region_portfolio AS (
234 | SELECT
235 | region,
236 | first_name,
237 | ticker,
238 | portfolio_value,
239 | SUM(portfolio_value) OVER (
240 | PARTITION BY region, ticker
241 | ) AS region_total
242 | FROM cte_mentor_portfolio
243 | )
244 | -- final output
245 | SELECT
246 | region,
247 | first_name,
248 | ticker,
249 | ROUND(100 * portfolio_value / region_total, 2) AS contribution_percentage
250 | FROM cte_region_portfolio
251 | ORDER BY ticker, region, contribution_percentage DESC;
252 | ```
253 |
254 |
255 |
256 | | region | first_name | ticker | contribution_percentage |
257 | | ------------- | ---------- | ------ | ----------------------- |
258 | | Africa | Enoch | BTC | 100.00 |
259 | | Asia | Leah | BTC | 100.00 |
260 | | Australia | Ben | BTC | 32.96 |
261 | | Australia | Danny | BTC | 29.99 |
262 | | Australia | Sonia | BTC | 21.86 |
263 | | Australia | Pavan | BTC | 15.19 |
264 | | India | Vikram | BTC | 100.00 |
265 | | United States | Nandita | BTC | 20.99 |
266 | | United States | Abe | BTC | 17.69 |
267 | | United States | Ayush | BTC | 16.32 |
268 | | United States | Rowan | BTC | 14.65 |
269 | | United States | Charlie | BTC | 12.13 |
270 | | United States | Alex | BTC | 9.32 |
271 | | United States | Vipul | BTC | 8.91 |
272 | | Africa | Enoch | ETH | 100.00 |
273 | | Asia | Leah | ETH | 100.00 |
274 | | Australia | Ben | ETH | 34.34 |
275 | | Australia | Danny | ETH | 30.06 |
276 | | Australia | Sonia | ETH | 29.93 |
277 | | Australia | Pavan | ETH | 5.66 |
278 | | India | Vikram | ETH | 100.00 |
279 | | United States | Nandita | ETH | 21.21 |
280 | | United States | Vipul | ETH | 20.91 |
281 | | United States | Alex | ETH | 18.18 |
282 | | United States | Abe | ETH | 14.56 |
283 | | United States | Charlie | ETH | 12.36 |
284 | | United States | Rowan | ETH | 10.11 |
285 | | United States | Ayush | ETH | 2.66 |
286 |
287 |
288 | ## Bonus Question 5
289 |
290 | > Calculate the ranks for each mentor in the US and Australia for each year and ticker
291 |
292 | The final output we wish to generate looks like this:
293 |
294 | | region | first_name | BTC 2017 | BTC 2018 | BTC 2019 | BTC 2020 | ETH 2017 | ETH 2018 | ETH 2019 | ETH 2020 |
295 | | ------------- | ---------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
296 | | Australia | Ben | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 |
297 | | Australia | Danny | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 2 |
298 | | Australia | Sonia | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
299 | | Australia | Pavan | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
300 | | United States | Nandita | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 |
301 | | United States | Abe | 2 | 3 | 3 | 3 | 4 | 4 | 4 | 4 |
302 | | United States | Ayush | 3 | 2 | 2 | 2 | 7 | 7 | 7 | 7 |
303 | | United States | Rowan | 4 | 4 | 4 | 4 | 6 | 6 | 6 | 6 |
304 | | United States | Charlie | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
305 | | United States | Alex | 6 | 6 | 6 | 6 | 3 | 3 | 3 | 3 |
306 | | United States | Vipul | 7 | 7 | 7 | 7 | 2 | 1 | 1 | 1 |
307 |
308 |
309 | Our first step is to try and create a long table first with all of our ranks for each ticker and year end.
310 |
311 | Click here to reveal the solution!
312 |
313 | ```sql
314 | SELECT
315 | year_end,
316 | region,
317 | first_name,
318 | ticker,
319 | RANK() OVER (
320 | PARTITION BY region, year_end
321 | ORDER BY cumulative_quantity DESC
322 | ) AS ranking
323 | FROM temp_cumulative_portfolio_base
324 | WHERE region IN ('United States', 'Australia')
325 | ORDER BY year_end, region, ranking;
326 | ```
327 |
328 |
329 |
330 | Let's now pivote this long table to a slightly easier to read wide table
331 |
332 | Click here to reveal the solution!
333 |
334 | ```sql
335 | WITH cte_ranks AS (
336 | SELECT
337 | year_end,
338 | region,
339 | first_name,
340 | ticker,
341 | RANK() OVER (
342 | PARTITION BY region, year_end, ticker
343 | ORDER BY cumulative_quantity DESC
344 | ) AS ranking
345 | FROM temp_cumulative_portfolio_base
346 | WHERE region IN ('United States', 'Australia')
347 | )
348 | SELECT
349 | region,
350 | first_name,
351 | MAX(CASE WHEN ticker = 'BTC' AND year_end = '2017-12-31' THEN ranking ELSE NULL END) AS "BTC 2017",
352 | MAX(CASE WHEN ticker = 'BTC' AND year_end = '2018-12-31' THEN ranking ELSE NULL END) AS "BTC 2018",
353 | MAX(CASE WHEN ticker = 'BTC' AND year_end = '2019-12-31' THEN ranking ELSE NULL END) AS "BTC 2019",
354 | MAX(CASE WHEN ticker = 'BTC' AND year_end = '2020-12-31' THEN ranking ELSE NULL END) AS "BTC 2020",
355 | MAX(CASE WHEN ticker = 'ETH' AND year_end = '2017-12-31' THEN ranking ELSE NULL END) AS "ETH 2017",
356 | MAX(CASE WHEN ticker = 'ETH' AND year_end = '2018-12-31' THEN ranking ELSE NULL END) AS "ETH 2018",
357 | MAX(CASE WHEN ticker = 'ETH' AND year_end = '2019-12-31' THEN ranking ELSE NULL END) AS "ETH 2019",
358 | MAX(CASE WHEN ticker = 'ETH' AND year_end = '2020-12-31' THEN ranking ELSE NULL END) AS "ETH 2020"
359 | FROM cte_ranks
360 | GROUP BY region, first_name
361 | ORDER BY region, "BTC 2017";
362 | ```
363 |
364 |
365 |
366 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step6.md)
367 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step8.md)
--------------------------------------------------------------------------------
/course-content/step8.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | []()
6 | []()
7 | []()
8 | []()
9 |
10 | # Step 8 - The Final Case Study
11 |
12 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step7.md)
13 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step9.md)
14 |
15 | Now that we've tackled a more complicated series of questions - let's take it to the next level!
16 |
17 | In these final questions - we will try to assess the profitability of our mentors and their crypto portfolios.
18 |
19 | It's not enough for our team of mentors to just hold a lot of crypto - we wish to see the various prices for every transaction that occured, calculate the total fee amounts and also calculate some final profit metrics.
20 |
21 | ## Simple Scenario Breakdown
22 |
23 | Often when faced with a difficult problem like we have right now - the first step is to try and break down our complex situation into a very simple scenarios to try and understand what we have to do!
24 |
25 | 1. Buy and hold (HODL Strategy)
26 | 2. Buy and keep buying (Bull Strategy)
27 | 3. Buy and sell (Trader Strategy)
28 |
29 | In the next 3 steps - we will cover each strategy by simplifying our existing `trading.transactions` table to answer more questions!
30 |
31 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step7.md)
32 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step9.md)
--------------------------------------------------------------------------------
/course-content/step9.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | []()
6 | []()
7 | []()
8 | []()
9 |
10 | # Step 9 - Buy and Hold Analysis
11 |
12 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step8.md)
13 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step10.md)
14 |
15 | 
16 |
17 | Meet Leah - she is our mentor who will take the buy and hold strategy otherwise known as the "HODL strategy" or hold on for dear life!
18 |
19 | She is risk averse and just wants to leave her initial investment alone because she believes her original holdings will grow over time with low risk.
20 |
21 | ## Leah's Transaction History
22 |
23 | 1. She purchases 50 BTC and 50 ETH on Jan 1st 2017
24 | 2. She holds onto all of her portfolio and does not sell anything (HODL)
25 | 3. She also does not purchase any additional quantity of either crypto
26 | 4. By August 29th 2021 (the last date of our price data) - we can assess her individual performance
27 |
28 | > Remember that we are simplifying our problem at the moment so Leah's records will actually be different in the final `trading.transactions` dataset!
29 |
30 | ## The Data
31 |
32 | For this simplified scenario - we first need to create a new temp table called `leah_hodl_strategy` using the code below:
33 |
34 | ```sql
35 | CREATE TEMP TABLE leah_hodl_strategy AS
36 | SELECT * FROM trading.transactions
37 | WHERE member_id = 'c20ad4'
38 | AND txn_date = '2017-01-01'
39 | AND quantity = 50;
40 | ```
41 |
42 | You can inspect the data by running the following query after creating the temp table above:
43 |
44 | ```sql
45 | SELECT * FROM leah_hodl_strategy;
46 | ```
47 |
48 | | txn_id | member_id | ticker | txn_date | txn_type | quantity | percentage_fee | txn_time |
49 | | ------ | --------- | ------ | ---------- | -------- | -------- | -------------- | ------------------- |
50 | | 12 | c20ad4 | BTC | 2017-01-01 | BUY | 50 | 0.30 | 2017-01-01 00:00:00 |
51 | | 26 | c20ad4 | ETH | 2017-01-01 | BUY | 50 | 0.30 | 2017-01-01 00:00:00 |
52 |
53 |
54 | ## Required Metrics
55 |
56 | For this basic scenario - we wish to calculate the following metrics:
57 |
58 | 1. The initial value of her original 50 BTC and 50 ETH purchases
59 | 2. The dollar amount of fees she paid for those 2 transactions
60 | 3. The final value of her portfolio on August 29th 2021
61 | 4. The profitability by dividing her final value by initial value
62 |
63 | ## Solutions
64 |
65 | ### Question 1 & 2
66 |
67 | We can calculate the first 2 questions using a single query
68 |
69 | > 1. The initial value of her original 50 BTC and 50 ETH purchases
70 | > 2. The dollar amount of fees she paid for those 2 transactions
71 |
72 | Click here to reveal the solution!
73 |
74 | ```sql
75 | SELECT
76 | SUM(transactions.quantity * prices.price) AS initial_value,
77 | SUM(transactions.quantity * prices.price * transactions.percentage_fee / 100) AS fees
78 | FROM leah_hodl_strategy AS transactions
79 | INNER JOIN trading.prices
80 | ON transactions.ticker = prices.ticker
81 | AND transactions.txn_date = prices.market_date;
82 | ```
83 |
84 |
85 |
86 | | initial_value | fees |
87 | | ------------- | -------------------- |
88 | | 50180.00 | 150.5400000000000000 |
89 |
90 |
91 | ### Question 3
92 |
93 | > The final value of her portfolio on August 29th 2021
94 |
95 | Click here to reveal the solution!
96 |
97 | ```sql
98 | SELECT
99 | SUM(transactions.quantity * prices.price) AS final_value
100 | FROM leah_hodl_strategy AS transactions
101 | INNER JOIN trading.prices
102 | ON transactions.ticker = prices.ticker
103 | WHERE prices.market_date = '2021-08-29';
104 | ```
105 |
106 |
107 |
108 | | final_value |
109 | | ----------- |
110 | | 2571642.00 |
111 |
112 | ### Question 4
113 |
114 | > Calculate the profitability by dividing Leah's final value by initial value
115 |
116 | We can actually do one better and combine all 4 metrics into a single query!
117 |
118 | Click here to reveal the solution!
119 |
120 | ```sql
121 | WITH cte_portfolio_values AS (
122 | SELECT
123 | -- initial metrics
124 | SUM(transactions.quantity * initial.price) AS initial_value,
125 | SUM(transactions.quantity * initial.price * transactions.percentage_fee / 100) AS fees,
126 | -- final value
127 | SUM(transactions.quantity * final.price) AS final_value
128 | FROM leah_hodl_strategy AS transactions
129 | INNER JOIN trading.prices AS initial
130 | ON transactions.ticker = initial.ticker
131 | AND transactions.txn_date = initial.market_date
132 | INNER JOIN trading.prices AS final
133 | ON transactions.ticker = final.ticker
134 | WHERE final.market_date = '2021-08-29'
135 | )
136 | SELECT
137 | initial_value,
138 | fees,
139 | final_value,
140 | final_value / initial_value AS profitability
141 | FROM cte_portfolio_values;
142 | ```
143 |
144 |
145 |
146 | | initial_value | fees | final_value | profitability |
147 | | ------------- | -------------------- | ----------- | ------------------- |
148 | | 50180.00 | 150.5400000000000000 | 2571642.00 | 51.2483459545635711 |
149 |
150 |
151 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step8.md)
152 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step10.md)
--------------------------------------------------------------------------------
/course-content/the-end.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | []()
6 | []()
7 | []()
8 | []()
9 |
10 | # The End & Next Steps!
11 |
12 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step13.md)
13 |
14 | # Next Steps on Your SQL Journey!
15 |
16 | Thank you for your taking this free SQL Masterclass GitHub course! If you've enjoyed this - please consider starring the repo and sharing it with your friends ⭐️
17 |
18 | This resource was originally created by [Danny Ma](https://www.linkedin.com/in/datawithdanny/) for the "SQL Masterclass for Data Scientists" half-day live training for the ODSC Asia Pacific 2021 Conference.
19 |
20 |
21 | # Data With Danny Virtual Data Apprenticeship
22 |
23 |
24 |
25 |
26 |
27 |
28 | If you're interested in learning valuable data science skills with Danny directly - you can checkout the [Data With Danny](https://www.datawithdanny.com) website for more details!
29 |
30 | Join our private student community with over 1,000 data professionals, join a local study group, get help from our team of 12+ data mentors and learn more about our personalized mentorship initiatives!
31 |
32 | The 1st part of the Data With Danny program is Serious SQL which is like this SQL course but on an entirely differently level. If you're serious about learning SQL you won't want to miss this course!
33 |
34 | Our first live SQL training cohort begins in November 2021 - you can gain all access to course content, recorded videos, our private Discord and more for a one off payment of only $49 - find out more [here!](https://www.datawithdanny.com/courses/serious-sql)
35 |
36 | # 8 Week SQL Challenge Case Studies
37 |
38 |
39 |
40 |
41 |
42 |
43 | Want to test your SQL skills and tackle 8 realistic SQL case studies and get access to a collection of free SQL learning resources?
44 |
45 | Join the Data With Danny [8 Week SQL Challenge](https://www.8weeksqlchallenge.com/getting-started) for free today!
46 |
47 | Solve all 8 realistic SQL case studies designed to simulate real work scenarios and interview questions across multiple analytics domains including customer analytics, digital, banking, retail and subscriptions!
48 |
49 | # About the Instructor: Danny Ma
50 |
51 |
52 |
53 |
54 |
55 | Danny is the Chief Data Mentor at Data With Danny and the Founder & CEO of Sydney Data Science, a boutique data consultancy based out of Sydney, Australia 🇦🇺
56 |
57 | After spending the last 10 years working in almost every single role in the data ecosystem, Danny is now focused on solving difficult problems at scale re-imagining data education and recruitment, and mentoring the next generation of data professionals.
58 |
59 | He provides specialist data consultancy services:
60 |
61 | * Digital customer analytics and experimentation
62 | * Data and machine learning strategy
63 | * Data engineering and systems design
64 | * Team building for analytics and data science functions
65 | * Technical training for practitioners and management
66 |
67 | Danny is a regular speaker at global data conferences, meetups and podcasts where he shares the importance of mentorship for all data professionals. He is
68 | also a technical author and instructor for O'Reilly.
69 |
70 | Danny believes that he is living proof that dispels the myth that you need higher level education to be successful in the data science space,
71 | and he wants to share his experiences with others so they can do the same.
72 |
73 | [](https://github.com/datawithdanny/sql-masterclass/tree/main/course-content/step13.md)
--------------------------------------------------------------------------------
/data/MySQL/schema-mysql.sql:
--------------------------------------------------------------------------------
1 | CREATE SCHEMA trading;
2 |
3 | CREATE TABLE trading.members (
4 | `member_id` VARCHAR(6),
5 | `first_name` VARCHAR(7),
6 | `region` VARCHAR(13)
7 | );
8 |
9 | CREATE TABLE trading.prices (
10 | `ticker` VARCHAR(3),
11 | `market_date` DATE,
12 | `price` FLOAT,
13 | `open` FLOAT,
14 | `high` FLOAT,
15 | `low` FLOAT,
16 | `volume` VARCHAR(7),
17 | `change` VARCHAR(7)
18 | );
19 |
20 | CREATE TABLE trading.transactions (
21 | `txn_id` INTEGER,
22 | `member_id` VARCHAR(6),
23 | `ticker` VARCHAR(3),
24 | `txn_date` DATE,
25 | `txn_type` VARCHAR(4),
26 | `quantity` FLOAT,
27 | `percentage_fee` FLOAT,
28 | `txn_time` TIMESTAMP
29 | );
30 |
31 | /*
32 | -- Creating these indexes after loading data
33 | -- will make things run much faster!!!
34 |
35 | CREATE INDEX ON trading.prices (ticker, market_date);
36 | CREATE INDEX ON trading.transactions (txn_date, ticker);
37 | CREATE INDEX ON trading.transactions (txn_date, member_id);
38 | CREATE INDEX ON trading.transactions (member_id);
39 | CREATE INDEX ON trading.transactions (ticker);
40 |
41 | */
--------------------------------------------------------------------------------
/data/Postgres/schema-postgres.sql:
--------------------------------------------------------------------------------
1 | CREATE SCHEMA trading;
2 |
3 | CREATE TABLE trading.members (
4 | "member_id" VARCHAR(6),
5 | "first_name" VARCHAR(7),
6 | "region" VARCHAR(13)
7 | );
8 |
9 | CREATE TABLE trading.prices (
10 | "ticker" VARCHAR(3),
11 | "market_date" DATE,
12 | "price" FLOAT,
13 | "open" FLOAT,
14 | "high" FLOAT,
15 | "low" FLOAT,
16 | "volume" VARCHAR(7),
17 | "change" VARCHAR(7)
18 | );
19 |
20 | CREATE TABLE trading.transactions (
21 | "txn_id" INTEGER,
22 | "member_id" VARCHAR(6),
23 | "ticker" VARCHAR(3),
24 | "txn_date" DATE,
25 | "txn_type" VARCHAR(4),
26 | "quantity" FLOAT,
27 | "percentage_fee" FLOAT,
28 | "txn_time" TIMESTAMP
29 | );
30 |
31 | /*
32 | -- Creating these indexes after loading data
33 | -- will make things run much faster!!!
34 |
35 | CREATE INDEX ON trading.prices (ticker, market_date);
36 | CREATE INDEX ON trading.transactions (txn_date, ticker);
37 | CREATE INDEX ON trading.transactions (txn_date, member_id);
38 | CREATE INDEX ON trading.transactions (member_id);
39 | CREATE INDEX ON trading.transactions (ticker);
40 |
41 | */
--------------------------------------------------------------------------------
/data/members.csv:
--------------------------------------------------------------------------------
1 | member_id,first_name,region
2 | c4ca42,Danny,Australia
3 | c81e72,Vipul,United States
4 | eccbc8,Charlie,United States
5 | a87ff6,Nandita,United States
6 | e4da3b,Rowan,United States
7 | 167909,Ayush,United States
8 | 8f14e4,Alex,United States
9 | c9f0f8,Abe,United States
10 | 45c48c,Ben,Australia
11 | d3d944,Enoch,Africa
12 | 6512bd,Vikram,India
13 | c20ad4,Leah,Asia
14 | c51ce4,Pavan,Australia
15 | aab323,Sonia,Australia
16 |
--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
1 | version: '3.5'
2 |
3 | services:
4 | db:
5 | image: dannyma/psql-crypto:latest
6 | environment:
7 | POSTGRES_HOST_AUTH_METHOD: 'trust'
8 |
9 | sqlpad:
10 | image: sqlpad/sqlpad:5.5
11 | depends_on:
12 | - db
13 | ports:
14 | - "3000:3000"
15 | environment:
16 | SQLPAD_AUTH_DISABLED: 'true'
17 | SQLPAD_AUTH_DISABLED_DEFAULT_ROLE: 'admin'
18 | SQLPAD_APP_LOG_LEVEL: debug
19 | SQLPAD_WEB_LOG_LEVEL: warn
20 | SQLPAD_CONNECTIONS__psql__name: psql
21 | SQLPAD_CONNECTIONS__psql__driver: postgres
22 | SQLPAD_CONNECTIONS__psql__host: db
23 | SQLPAD_CONNECTIONS__psql__username: postgres
24 | SQLPAD_CONNECTIONS__psql__multiStatementTransactionEnabled: 'true'
25 | SQLPAD_CONNECTIONS__psql__idleTimeoutSeconds: 86400
--------------------------------------------------------------------------------
/images/badges/click-these-badges-to-view-content!.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/images/badges/go-to-next-tutorial.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/images/badges/go-to-previous-tutorial.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/images/badges/raw-data.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/images/badges/sql-interface-db-fiddle.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/images/badges/sql-interface-play-with-docker.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/images/badges/sql-masterclass-v-1.0.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/images/badges/start-here.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/images/badges/version-1.0.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/images/crypto.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawithdanny/sql-masterclass/bb87f9be59a9fd828e75a8984e29537f10f12ec5/images/crypto.png
--------------------------------------------------------------------------------
/images/sql-masterclas-banner.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawithdanny/sql-masterclass/bb87f9be59a9fd828e75a8984e29537f10f12ec5/images/sql-masterclas-banner.png
--------------------------------------------------------------------------------
/slides/sql-masterclass-odsc-apac-2021.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawithdanny/sql-masterclass/bb87f9be59a9fd828e75a8984e29537f10f12ec5/slides/sql-masterclass-odsc-apac-2021.pdf
--------------------------------------------------------------------------------