├── .gitignore
├── 01_getting_started
│   ├── 01_getting_started.ipynb
│   ├── 02_connecting_to_database.ipynb
│   ├── 03_using_psql.ipynb
│   ├── 04_setup_postgres_using_docker.ipynb
│   ├── 05_setup_sql_workbench.ipynb
│   ├── 06_sql_workbench_and_postgres.ipynb
│   ├── 07_sql_workbench_features.ipynb
│   ├── 08_data_loading_utilities.ipynb
│   ├── 09_loading_data_postgres_in_docker.ipynb
│   └── 10_exercise_loading_data.ipynb
├── 02_dml_or_crud_operations
│   ├── 01_dml_or_crud_operations.ipynb
│   ├── 02_normalization_principles.ipynb
│   ├── 03_tables_as_relations.ipynb
│   ├── 04_overview_of_database_operations.ipynb
│   ├── 05_crud_operations.ipynb
│   ├── 06_creating_table.ipynb
│   ├── 07_inserting_data.ipynb
│   ├── 08_updating_data.ipynb
│   ├── 09_deleting_data.ipynb
│   ├── 10_overview_of_transactions.ipynb
│   └── 11_exercises_database_operations.ipynb
├── 03_writing_basic_sql_queries
│   ├── 01_writing_basic_sql_queries.ipynb
│   ├── 02_standard_transformations.ipynb
│   ├── 03_overview_of_data_model.ipynb
│   ├── 04_define_problem_statement.ipynb
│   ├── 05_preparing_tables.ipynb
│   ├── 06_selecting_or_projecting_data.ipynb
│   ├── 07_filtering_data.ipynb
│   ├── 08_joining_tables_inner.ipynb
│   ├── 09_joining_tables_outer.ipynb
│   ├── 10_performing_aggregations.ipynb
│   ├── 11_sorting_data.ipynb
│   ├── 12_solution_daily_product_revenue.ipynb
│   └── 13_exercises_basic_sql_queries.ipynb
├── 04_creating_tables_and_indexes
│   ├── 01_creating_tables_and_indexes.ipynb
│   ├── 02_data_definition_language.ipynb
│   ├── 03_overview_of_data_types.ipynb
│   ├── 04_adding_or_modifying_columns.ipynb
│   ├── 05_different_types_of_constraints.ipynb
│   ├── 06_managing_constraints.ipynb
│   ├── 07_indexes_on_tables.ipynb
│   ├── 08_indexes_for_constraints.ipynb
│   ├── 09_overview_of_sequences.ipynb
│   ├── 10_truncating_tables.ipynb
│   ├── 11_dropping_tables.ipynb
│   └── 12_exercises_managing_db_objects.ipynb
├── 05_partitioning_tables_and_indexes
│   ├── 01_partitioning_tables_and_indexes.ipynb
│   ├── 02_overview_of_partitioning.ipynb
│   ├── 03_list_partitioning.ipynb
│   ├── 04_managing_partitions_list.ipynb
│   ├── 05_manipulating_data.ipynb
│   ├── 06_range_partitioning.ipynb
│   ├── 07_managing_partitions_range.ipynb
│   ├── 08_repartitioning_range.ipynb
│   ├── 09_hash_partitioning.ipynb
│   ├── 10_managing_partitions_hash.ipynb
│   ├── 11_usage_scenarios.ipynb
│   ├── 12_sub_partitioning.ipynb
│   └── 13_exercises_partitioning_tables.ipynb
├── 06_predefined_functions
│   ├── 01_predefined_functions.ipynb
│   ├── 02_overview_of_predefined_functions.ipynb
│   ├── 03_string_manipulation_functions.ipynb
│   ├── 04_date_manipulation_functions.ipynb
│   ├── 05_overview_of_numeric_functions.ipynb
│   ├── 06_data_type_conversion.ipynb
│   ├── 07_handling_null_values.ipynb
│   ├── 08_using_case_and_when.ipynb
│   └── 09_exercises_predefined_functions.ipynb
├── 07_writing_advanced_sql_queries
│   ├── 01_writing_advanced_sql_queries.ipynb
│   ├── 02_overview_of_views.ipynb
│   ├── 03_named_queries_using_with_clause.ipynb
│   ├── 04_overview_of_sub_queries.ipynb
│   ├── 05_create_table_as_select.ipynb
│   ├── 06_advanced_dml_operations.ipynb
│   ├── 07_merging_or_upserting_data.ipynb
│   ├── 08_pivoting_rows_into_columns.ipynb
│   ├── 09_overview_of_analytic_functions.ipynb
│   ├── 10_analytic_functions_aggregations.ipynb
│   ├── 11_cumulative_or_moving_aggregations.ipynb
│   ├── 12_analytic_functions_windowing.ipynb
│   ├── 13_analytic_functions_ranking.ipynb
│   ├── 14_analytic_funcions_filtering.ipynb
│   ├── 15_ranking_and_filtering_recap.ipynb
│   └── 16_exercises_analytic_functions.ipynb
├── 08_query_performance_tuning
│   ├── 01_query_performance_tuning.ipynb
│   ├── 02_preparing_database.ipynb
│   ├── 03_interpreting_explain_plans.ipynb
│   ├── 04_overview_of_cost_based_optimizer.ipynb
│   ├── 05_performance_tuning_using_indexes.ipynb
│   ├── 06_criteria_for_indexing.ipynb
│   ├── 07_criteria_for_partitioning.ipynb
│   ├── 08_writing_queries_partition_pruning.ipynb
│   ├── 09_overview_of_query_hints.ipynb
│   └── 10_exercises_tuning_queries.ipynb
├── README.md
├── _config.yml
├── _toc.yml
├── bonus_data_warehousing_concepts
│   ├── 01_data_warehousing_concepts.ipynb
│   ├── 02_overview_of_oltp_applications.ipynb
│   ├── 03_data_warehouse_architecture.ipynb
│   ├── 04_overview_of_data_lake.ipynb
│   ├── 05_key_data_warehouse_concepts.ipynb
│   └── 06_dimensional_modeling.ipynb
├── bonus_overview_of_redshift
│   ├── 01_overview_of_redshift.ipynb
│   ├── 02_setup_aws_redshift.ipynb
│   ├── 03_using_query_editor.ipynb
│   ├── 04_accessing_redshift_publicly.ipynb
│   ├── 05_connecting_using_psql.ipynb
│   ├── 06_using_ides_sql_workbench.ipynb
│   ├── 07_using_jupyter_environment.ipynb
│   └── 08_data_loading_utilities.ipynb
├── bonus_setup_exclusive_labs.ipynb
├── bonus_sqlalchemy_crud.ipynb
├── mastering-sql-using-postgresql.ipynb
├── mastering_postgresql_exercises.ipynb
└── solutions-mastering-postgresql
    ├── 01_getting_started.ipynb
    ├── 02_dml_or_crud_operations.ipynb
    ├── 03_basic_sql_queries.ipynb
    ├── 04_managing_database_objects.ipynb
    ├── 05_partitioned_tables.ipynb
    ├── 06_pre_defined_functions.ipynb
    └── 07_analytic_functions.ipynb
/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints
2 |
--------------------------------------------------------------------------------
/01_getting_started/01_getting_started.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Getting Started\n",
8 | "\n",
9 | "As part of this section we will primarily understand different ways to get started with Postgres.\n",
10 | "* Connecting to Database\n",
11 | "* Using psql\n",
12 | "* Setup Postgres using Docker\n",
13 | "* Setup SQL Workbench\n",
14 | "* SQL Workbench and Postgres\n",
15 | "* SQL Workbench Features\n",
16 | "* Data Loading Utilities\n",
17 | "* Loading Data - Docker\n",
18 | "* Exercise - Loading Data\n",
19 | "\n",
20 | "Here are the key objectives of this section\n",
21 | "* Connecting to Database using Jupyter based environment in our labs. This is relevant to only those who got our lab access.\n",
22 | "* Ability to setup Postgres Database using Docker for those who does not have access to our labs.\n",
23 | "* Relevance of IDEs such as SQL Workbench\n",
24 | "* Understand the key features for IDEs such as SQL Workbench including connecting SQL Workbench to Postgres Database.\n",
25 | "* How to load data into tables using Database native utilities?\n",
26 | "* Exercise to ensure our understanding related to loading data into the tables using database native utilities."
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": null,
32 | "metadata": {},
33 | "outputs": [],
34 | "source": []
35 | }
36 | ],
37 | "metadata": {
38 | "kernelspec": {
39 | "display_name": "Python 3",
40 | "language": "python",
41 | "name": "python3"
42 | },
43 | "language_info": {
44 | "codemirror_mode": {
45 | "name": "ipython",
46 | "version": 3
47 | },
48 | "file_extension": ".py",
49 | "mimetype": "text/x-python",
50 | "name": "python",
51 | "nbconvert_exporter": "python",
52 | "pygments_lexer": "ipython3",
53 | "version": "3.6.12"
54 | }
55 | },
56 | "nbformat": 4,
57 | "nbformat_minor": 4
58 | }
59 |
--------------------------------------------------------------------------------
/01_getting_started/02_connecting_to_database.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Connecting to Database\n",
8 | "\n",
9 | "We will be using JupyterHub based environment to master Postgresql. Let us go through the steps involved to get started using JupyterHub environment."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* We will use Python Kernel with sql magic command and for that we need to first load the sql extension.\n",
44 | "* Create environment variable `DATABASE_URL` using SQL Alchemy format.\n",
45 | "* Write a simple query to get data from information schema table to validate database connectivity.\n",
46 | "* Here is the information you can leverage to connect to the database.\n",
47 | " * **User Name:** YOUR_OS_USER_sms_user\n",
48 | " * **Database Name:** YOUR_OS_USER_sms_db\n",
49 | " * **Password:** Your lab password provided by us"
50 | ]
51 | },
52 | {
53 | "cell_type": "code",
54 | "execution_count": null,
55 | "metadata": {},
56 | "outputs": [],
57 | "source": [
58 | "%load_ext sql"
59 | ]
60 | },
61 | {
62 | "cell_type": "code",
63 | "execution_count": null,
64 | "metadata": {},
65 | "outputs": [],
66 | "source": [
67 | "%env DATABASE_URL=postgresql://itversity_sms_user:sms_password@localhost:5432/itversity_sms_db"
68 | ]
69 | },
70 | {
71 | "cell_type": "code",
72 | "execution_count": null,
73 | "metadata": {},
74 | "outputs": [],
75 | "source": [
76 | "%sql SELECT * FROM information_schema.tables LIMIT 10"
77 | ]
78 | }
79 | ],
80 | "metadata": {
81 | "kernelspec": {
82 | "display_name": "Python 3",
83 | "language": "python",
84 | "name": "python3"
85 | },
86 | "language_info": {
87 | "codemirror_mode": {
88 | "name": "ipython",
89 | "version": 3
90 | },
91 | "file_extension": ".py",
92 | "mimetype": "text/x-python",
93 | "name": "python",
94 | "nbconvert_exporter": "python",
95 | "pygments_lexer": "ipython3",
96 | "version": "3.6.12"
97 | }
98 | },
99 | "nbformat": 4,
100 | "nbformat_minor": 4
101 | }
102 |
--------------------------------------------------------------------------------
/01_getting_started/03_using_psql.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Using psql\n",
8 | "\n",
9 | "Let us understand how to use `psql` utility to perform database operations."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 6,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* `psql` is command line utility to connect to the Postgres database server. It is typically used for the following by advanced Database users:\n",
44 | " * Manage Databases\n",
45 | " * Manage Tables\n",
46 | " * Load data into tables for testing purposes\n",
47 | "* We need to have at least Postgres Client installed on the server from which you want to use psql to connect to Postgres Server.\n",
48 | "* If you are on the server where **Postgres Database Server** is installed, `psql` will be automatically available.\n",
49 | "* We can run `sudo -u postgres psql -U postgres` from the server provided you have sudo permissions on the server. Otherwise we need to go with `psql -U postgres -W` which will prompt for the password.\n",
50 | "* **postgres** is the super user for the postgres server and hence typically developers will not have access to it in non development environments.\n",
51 | "* As a developer, we can use following command to connect to a database setup on postgres server using user credentials.\n",
52 | "\n",
53 | "```shell\n",
54 | "psql -h -d -U -W\n",
55 | "\n",
56 | "# Here is the example to connect to itversity_sms_db using itversity_sms_user\n",
57 | "psql -h localhost -p 5432 -d itversity_sms_db -U itversity_sms_user -W\n",
58 | "```\n",
59 | "* We typically use `psql` to troubleshoot the issues in non development servers. IDEs such as **SQL Alchemy** might be better for regular usage as part of development and unit testing process.\n",
60 | "* For this course, we will be primarily using Jupyter based environment for practice.\n",
61 | "* However, we will go through some of the important commands to get comfortable with `psql`.\n",
62 | " * Listing Databases - `\\l`\n",
63 | " * Switching to a Database - `\\c `\n",
64 | " * Get help for `psql` - `\\?`\n",
65 | " * Listing tables - `\\d`\n",
66 | " * Create table - `CREATE TABLE t (i SERIAL PRIMARY KEY)`\n",
67 | " * Get details related to a table - `\\d `\n",
68 | " * Running Scripts - `\\i `\n",
69 | " * You will go through some of the commands over a period of time."
70 | ]
71 | }
72 | ],
73 | "metadata": {
74 | "kernelspec": {
75 | "display_name": "Python 3",
76 | "language": "python",
77 | "name": "python3"
78 | },
79 | "language_info": {
80 | "codemirror_mode": {
81 | "name": "ipython",
82 | "version": 3
83 | },
84 | "file_extension": ".py",
85 | "mimetype": "text/x-python",
86 | "name": "python",
87 | "nbconvert_exporter": "python",
88 | "pygments_lexer": "ipython3",
89 | "version": "3.6.12"
90 | }
91 | },
92 | "nbformat": 4,
93 | "nbformat_minor": 4
94 | }
95 |
--------------------------------------------------------------------------------
/01_getting_started/04_setup_postgres_using_docker.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Setup Postgres using Docker\n",
8 | "\n",
9 | "In some cases you might want to have postgres setup on your machine. Let us understand how we can setup Postgres using Docker."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 7,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* If you are using our labs, the database will be pre-created by us with all the right permissions.\n",
44 | "* If you are using Windows or Mac, ensure that you have installed Docker Desktop.\n",
45 | "* If you are using Ubuntu based desktop, make sure to setup Docker.\n",
46 | "* Here are the steps that can be used to setup Postgres database using Docker.\n",
47 | " * Pull the postgres image using `docker pull`\n",
48 | " * Create the container using `docker create`.\n",
49 | " * Start the container using `docker start`.\n",
50 | " * Alternatively we can use `docker run` which will pull, create and start the container.\n",
51 | " * Use `docker logs` or `docker logs -f` to review the logs to ensure Postgres Server is up and running.\n",
52 | "\n",
53 | "```shell\n",
54 | "docker pull postgres\n",
55 | "\n",
56 | "docker container create \\\n",
57 | " --name itv_pg \\\n",
58 | " -p 5433:5432 \\\n",
59 | " -h itv_pg \\\n",
60 | " -e POSTGRES_PASSWORD=itversity \\\n",
61 | " postgres\n",
62 | "\n",
63 | "docker start itv_pg\n",
64 | "\n",
65 | "docker logs itv_pg\n",
66 | "```\n",
67 | "* You can connect to Postgres Database setup using Docker with `docker exec`.\n",
68 | "\n",
69 | "```shell\n",
70 | "docker exec \\\n",
71 | " -it itv_pg \\\n",
72 | " psql -U postgres\n",
73 | "```\n",
74 | "\n",
75 | "* You can also connecto to Postgres directly with out using `docker exec`.\n",
76 | "\n",
77 | "```shell\n",
78 | "psql -h localhost \\\n",
79 | " -p 5433 \\\n",
80 | " -d postgres \\\n",
81 | " -U postgres \\\n",
82 | " -W\n",
83 | "```"
84 | ]
85 | }
86 | ],
87 | "metadata": {
88 | "kernelspec": {
89 | "display_name": "Python 3",
90 | "language": "python",
91 | "name": "python3"
92 | },
93 | "language_info": {
94 | "codemirror_mode": {
95 | "name": "ipython",
96 | "version": 3
97 | },
98 | "file_extension": ".py",
99 | "mimetype": "text/x-python",
100 | "name": "python",
101 | "nbconvert_exporter": "python",
102 | "pygments_lexer": "ipython3",
103 | "version": "3.6.12"
104 | }
105 | },
106 | "nbformat": 4,
107 | "nbformat_minor": 4
108 | }
109 |
--------------------------------------------------------------------------------
/01_getting_started/05_setup_sql_workbench.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Setup SQL Workbench\n",
8 | "\n",
9 | "Let us understand how to setup and use SQL Workbench."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 8,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "**Why SQL Workbench**\n",
44 | "\n",
45 | "Let us see the details why we might have to use SQL Workbench.\n",
46 | "* Using Database CLIs such as psql for postgres, mysql etc can be cumbersome for those who are not comfortable with command line interfaces.\n",
47 | "* Database IDEs such as SQL Workbench will provide required features to run queries against databases with out worrying to much about underlying data dictionaries.\n",
48 | "* SQL Workbench provide required features to review databases and objects with out writing queries or running database specific commands.\n",
49 | "* Also Database IDEs provide capabilities to preserve the scripts we develop.\n",
50 | "> **In short Database IDEs such as SQL Workbench improves productivity.**\n",
51 | "\n",
52 | "**Alternative IDEs**\n",
53 | "\n",
54 | "There are several IDEs in the market.\n",
55 | "* TOAD\n",
56 | "* SQL Developer for Oracle\n",
57 | "* MySQL Workbench\n",
58 | "and many others\n",
59 | "\n",
60 | "**Install SQL Workbench**\n",
61 | "\n",
62 | "Here are the instructions to setup SQL Workbench.\n",
63 | "* Download SQL Workbench (typically zip file)\n",
64 | "* Unzip and launch\n",
65 | "\n",
66 | "Once installed we need to perform below steps which will be covered in detail as part of next topic.\n",
67 | "* Download JDBC driver for the database we would like to connect.\n",
68 | "* Get the database connectivity information and connect to the database."
69 | ]
70 | }
71 | ],
72 | "metadata": {
73 | "kernelspec": {
74 | "display_name": "Python 3",
75 | "language": "python",
76 | "name": "python3"
77 | },
78 | "language_info": {
79 | "codemirror_mode": {
80 | "name": "ipython",
81 | "version": 3
82 | },
83 | "file_extension": ".py",
84 | "mimetype": "text/x-python",
85 | "name": "python",
86 | "nbconvert_exporter": "python",
87 | "pygments_lexer": "ipython3",
88 | "version": "3.6.12"
89 | }
90 | },
91 | "nbformat": 4,
92 | "nbformat_minor": 4
93 | }
94 |
--------------------------------------------------------------------------------
/01_getting_started/06_sql_workbench_and_postgres.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## SQL Workbench and Postgres\n",
8 | "\n",
9 | "Let us connect to Postgres Database using SQL Workbench.\n",
10 | "* Download the JDBC Driver\n",
11 | "* Get the database connectivity information\n",
12 | "* Configure the connection using SQL Workbench\n",
13 | "* Validate the connection and save the profile"
14 | ]
15 | },
16 | {
17 | "cell_type": "code",
18 | "execution_count": 1,
19 | "metadata": {
20 | "tags": [
21 | "remove-cell"
22 | ]
23 | },
24 | "outputs": [
25 | {
26 | "data": {
27 | "text/html": [
28 | "\n"
29 | ],
30 | "text/plain": [
31 | ""
32 | ]
33 | },
34 | "metadata": {},
35 | "output_type": "display_data"
36 | }
37 | ],
38 | "source": [
39 | "%%HTML\n",
40 | ""
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "metadata": {},
46 | "source": [
47 | "### Connecting to Postgres\n",
48 | "Here are the steps to connect to Postgres running on your PC or remote machine without Docker.\n",
49 | "\n",
50 | "* We are trying to connect to Postgres Database that is running as part of remote machine or on your PC.\n",
51 | "* We typically use ODBC or JDBC to connect to a Database from remote machines (our PC).\n",
52 | "* Here are the pre-requisites to connect to a Database.\n",
53 | " * Make sure 5432 port is opened as part of the firewalls.\n",
54 | " * If you have telnet configured on your system on which SQL Workbench is installed, make sure to validate by running telnet command using ip or DNS Alias and port number 5432.\n",
55 | " * Ensure that you have downloaded right JDBC Driver for Postgres.\n",
56 | " * Make sure to have right credentials (username and password).\n",
57 | " * Ensure that you have database created on which the user have permissions.\n",
58 | "* Once you have all the information required along with JDBC jar, ensure to save the information as part of the profile. You can also validate before saving the details by using **Test** option.\n",
59 | "\n",
60 | "### Postgres on Docker\n",
61 | "\n",
62 | "Here are the steps to connect to Postgres running as part of Docker container.\n",
63 | "* We are trying to connect to Postgres Database that is running as part of Docker container running in a Ubuntu 18.04 VM provisioned from GCP.\n",
64 | "* We have published Postgres database port to port 5433 on Ubuntu 18.04 VM.\n",
65 | "* We typically use ODBC or JDBC to connect to a Database from remote machines (our PC).\n",
66 | "* Here are the pre-requisites to connect to a Database on GCP.\n",
67 | " * Make sure 5432 port is opened as part of the firewalls.\n",
68 | " * If you have telnet configured on your system on which SQL Workbench is installed, make sure to validate by running telnet command using ip or DNS Alias and port number 5433.\n",
69 | " * Ensure that you have downloaded right JDBC Driver for Postgres.\n",
70 | " * Make sure to have right credentials (username and password).\n",
71 | " * Ensure that you have database created on which the user have permissions.\n",
72 | "* You can validate credentials and permissions to the database by installing postgres client on Ubuntu 18.04 VM and then by connecting to the database using the credentials.\n",
73 | "* Once you have all the information required along with JDBC jar, ensure to save the information as part of the profile. You can also validate before saving the details by using **Test** option."
74 | ]
75 | }
76 | ],
77 | "metadata": {
78 | "kernelspec": {
79 | "display_name": "Python 3",
80 | "language": "python",
81 | "name": "python3"
82 | },
83 | "language_info": {
84 | "codemirror_mode": {
85 | "name": "ipython",
86 | "version": 3
87 | },
88 | "file_extension": ".py",
89 | "mimetype": "text/x-python",
90 | "name": "python",
91 | "nbconvert_exporter": "python",
92 | "pygments_lexer": "ipython3",
93 | "version": "3.6.12"
94 | }
95 | },
96 | "nbformat": 4,
97 | "nbformat_minor": 4
98 | }
99 |
--------------------------------------------------------------------------------
/01_getting_started/07_sql_workbench_features.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## SQL Workbench Features\n",
8 | "\n",
9 | "Here are some of the key features, you have to familiar with related to SQL Workbench."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 10,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* Ability to connect to different RDBMS, Data Warehouse and MPP Database servers such as Postgres, MySQL, Oracle, Redshift etc.\n",
44 | "* Saving profiles to connect to multiple databases.\n",
45 | "* Ability to access data dictionary or information schema using wizards to validate tables, columns, sequences, indexes, constraints etc.\n",
46 | "* Generate scripts out of existing data.\n",
47 | "* Ability to manage database objects with out writing any commands. We can drop tables, indexes, sequences etc by right clicking and then selecting drop option.\n",
48 | "* Develop SQL files and preserve them for future usage.\n",
49 | "\n",
50 | "Almost all leading IDEs provide all these features in similar fashion.\n",
51 | "\n",
52 | "**Usage Scenarios**\n",
53 | "\n",
54 | "Here are **some of the usage scenarios** for database IDEs such as SQL Workbench as part of day to day responsibilities.\n",
55 | "* Developers for generating and validating data as part of unit testing.\n",
56 | "* Testers to validate data for their test cases.\n",
57 | "* Business Analysts and Data Analysts to run ad hoc queries to understand the data better.\n",
58 | "* Developers to troubleshoot data related to production issues using read only accounts."
59 | ]
60 | }
61 | ],
62 | "metadata": {
63 | "kernelspec": {
64 | "display_name": "Python 3",
65 | "language": "python",
66 | "name": "python3"
67 | },
68 | "language_info": {
69 | "codemirror_mode": {
70 | "name": "ipython",
71 | "version": 3
72 | },
73 | "file_extension": ".py",
74 | "mimetype": "text/x-python",
75 | "name": "python",
76 | "nbconvert_exporter": "python",
77 | "pygments_lexer": "ipython3",
78 | "version": "3.6.12"
79 | }
80 | },
81 | "nbformat": 4,
82 | "nbformat_minor": 4
83 | }
84 |
--------------------------------------------------------------------------------
/01_getting_started/08_data_loading_utilities.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Data Loading Utilities\n",
8 | "\n",
9 | "Let us understand how we can load the data into databases using utilities provided."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 11,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* Most of the databases provide data loading utilities.\n",
44 | "* One of the most common way of getting data into database tables is by using data loading utilities provided by the underlying datatabase technology.\n",
45 | "* We can load delimited files into database using these utilities.\n",
46 | "* Here are the steps we can follow to load the delimited data into the table.\n",
47 | " * Make sure files are available on the server from which we are trying to load.\n",
48 | " * Ensure the database and table are created for the data to be loaded.\n",
49 | " * Run relevant command to load the data into the table.\n",
50 | " * Make sure to validate by running queries.\n",
51 | "* Let us see a demo by loading a sample file into the table in Postgres database.\n",
52 | "\n",
53 | "### Loading Data\n",
54 | "We can use COPY Command using `psql` to copy the data into the table.\n",
55 | "* Make sure database is created along with the user with right permissions. Also the user who want to use `COPY` command need to have **pg_read_server_files** role assigned.\n",
56 | "* Create the file with sample data. In this case data is added to **users.csv** under **/data/sms_db**\n",
57 | "\n",
58 | "```text\n",
59 | "user_first_name,user_last_name,user_email_id,user_role,created_dt\n",
60 | "Gordan,Bradock,gbradock0@barnesandnoble.com,A,2020-01-10\n",
61 | "Tobe,Lyness,tlyness1@paginegialle.it,U,2020-02-10\n",
62 | "Addie,Mesias,amesias2@twitpic.com,U,2020-03-05\n",
63 | "Corene,Kohrsen,ckohrsen3@buzzfeed.com,U,2020-04-15\n",
64 | "Darill,Halsall,dhalsall4@intel.com,U,2020-10-10\n",
65 | "```\n",
66 | "\n",
67 | "* Connect to Database.\n",
68 | "\n",
69 | "```shell\n",
70 | "psql -U itversity_sms_user \\\n",
71 | " -h localhost \\\n",
72 | " -p 5432 \\\n",
73 | " -d itversity_sms_db \\\n",
74 | " -W\n",
75 | "```\n",
76 | "\n",
77 | "* Create the `users` table.\n",
78 | "\n",
79 | "```sql\n",
80 | "CREATE TABLE users (\n",
81 | " user_id SERIAL PRIMARY KEY,\n",
82 | " user_first_name VARCHAR(30) NOT NULL,\n",
83 | " user_last_name VARCHAR(30) NOT NULL,\n",
84 | " user_email_id VARCHAR(50) NOT NULL,\n",
85 | " user_email_validated BOOLEAN DEFAULT FALSE,\n",
86 | " user_password VARCHAR(200),\n",
87 | " user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A\n",
88 | " is_active BOOLEAN DEFAULT FALSE,\n",
89 | " created_dt DATE DEFAULT CURRENT_DATE\n",
90 | ");\n",
91 | "```\n",
92 | "\n",
93 | "* Use copy command to load the data\n",
94 | "\n",
95 | "```shell\n",
96 | "COPY users(user_first_name, user_last_name, \n",
97 | " user_email_id, user_role, created_dt\n",
98 | ") FROM '/data/sms_db/users.csv'\n",
99 | "DELIMITER ','\n",
100 | "CSV HEADER;\n",
101 | "```\n",
102 | "\n",
103 | "* Validate by running queries\n",
104 | "\n",
105 | "```sql\n",
106 | "SELECT * FROM users;\n",
107 | "```"
108 | ]
109 | }
110 | ],
111 | "metadata": {
112 | "kernelspec": {
113 | "display_name": "Python 3",
114 | "language": "python",
115 | "name": "python3"
116 | },
117 | "language_info": {
118 | "codemirror_mode": {
119 | "name": "ipython",
120 | "version": 3
121 | },
122 | "file_extension": ".py",
123 | "mimetype": "text/x-python",
124 | "name": "python",
125 | "nbconvert_exporter": "python",
126 | "pygments_lexer": "ipython3",
127 | "version": "3.6.12"
128 | }
129 | },
130 | "nbformat": 4,
131 | "nbformat_minor": 4
132 | }
133 |
--------------------------------------------------------------------------------
/01_getting_started/09_loading_data_postgres_in_docker.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Loading Data - Docker\n",
8 | "\n",
9 | "Let us understand how you can take care of loading data into Postgres Database running using Docker Container."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 12,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* Make sure database is created along with the user with right permissions. Also the user who want to use `COPY` command need to have **pg_read_server_files** role assigned.\n",
44 | " * Create file with sample data\n",
45 | " * Copy file into Docker container\n",
46 | " * Connect to Database\n",
47 | " * Create the table\n",
48 | " * Run `COPY` Command to copy the data.\n",
49 | "\n",
50 | "**Prepare Data**\n",
51 | "\n",
52 | "We need to create file with sample data and copy the files into the container.\n",
53 | "\n",
54 | "* Sample File\n",
55 | "In this case data is added to users.csv under **~/sms_db**.\n",
56 | "\n",
57 | "```text\n",
58 | "user_first_name,user_last_name,user_email_id,user_role,created_dt\n",
59 | "Gordan,Bradock,gbradock0@barnesandnoble.com,A,2020-01-10\n",
60 | "Tobe,Lyness,tlyness1@paginegialle.it,U,2020-02-10\n",
61 | "Addie,Mesias,amesias2@twitpic.com,U,2020-03-05\n",
62 | "Corene,Kohrsen,ckohrsen3@buzzfeed.com,U,2020-04-15\n",
63 | "Darill,Halsall,dhalsall4@intel.com,U,2020-10-10\n",
64 | "```\n",
65 | "* Copy data\n",
66 | "\n",
67 | "```shell\n",
68 | "docker cp ~/sms_db/users.csv itv_pg:/tmp\n",
69 | "```\n",
70 | "\n",
71 | "**Create Database**\n",
72 | "\n",
73 | "Here are the steps to create database.\n",
74 | "* Connect to database as super user **postgres**\n",
75 | "\n",
76 | "```shell\n",
77 | "docker exec -it itv_pg psql -U postgres\n",
78 | "```\n",
79 | "\n",
80 | "* Create the database with right permissions.\n",
81 | "\n",
82 | "```sql\n",
83 | "CREATE DATABASE itversity_sms_db;\n",
84 | "CREATE USER itversity_sms_user WITH PASSWORD 'sms_password';\n",
85 | "GRANT ALL ON DATABASE itversity_sms_db TO itversity_sms_user;\n",
86 | "GRANT pg_read_server_files TO itversity_sms_user;\n",
87 | "```\n",
88 | "\n",
89 | "* Exit using `\\q`\n",
90 | "\n",
91 | "**Connect to Database**\n",
92 | "\n",
93 | "Use this command to connect to the newly created database.\n",
94 | "\n",
95 | "```shell\n",
96 | "psql -U itversity_sms_user \\\n",
97 | " -h localhost \\\n",
98 | " -p 5433 \\\n",
99 | " -d itversity_sms_db \\\n",
100 | " -W\n",
101 | "```\n",
102 | "\n",
103 | "**Create Table**\n",
104 | "\n",
105 | "Here is the script to create the table.\n",
106 | "\n",
107 | "```sql\n",
108 | "CREATE TABLE users (\n",
109 | " user_id SERIAL PRIMARY KEY,\n",
110 | " user_first_name VARCHAR(30) NOT NULL,\n",
111 | " user_last_name VARCHAR(30) NOT NULL,\n",
112 | " user_email_id VARCHAR(50) NOT NULL,\n",
113 | " user_email_validated BOOLEAN DEFAULT FALSE,\n",
114 | " user_password VARCHAR(200),\n",
115 | " user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A\n",
116 | " is_active BOOLEAN DEFAULT FALSE,\n",
117 | " created_dt DATE DEFAULT CURRENT_DATE\n",
118 | ");\n",
119 | "```\n",
120 | "\n",
121 | "**Load Data**\n",
122 | "\n",
123 | "Here are the steps to load and validate the data using `psql`.\n",
124 | "\n",
125 | "* Load data using `COPY` Command\n",
126 | "\n",
127 | "```shell\n",
128 | "COPY users(user_first_name, user_last_name, \n",
129 | " user_email_id, user_role, created_dt\n",
130 | ") FROM '/tmp/users.csv'\n",
131 | "DELIMITER ','\n",
132 | "CSV HEADER;\n",
133 | "```\n",
134 | "\n",
135 | "* Validate by running queries\n",
136 | "\n",
137 | "```sql\n",
138 | "SELECT * FROM users;\n",
139 | "```"
140 | ]
141 | }
142 | ],
143 | "metadata": {
144 | "kernelspec": {
145 | "display_name": "",
146 | "name": ""
147 | },
148 | "language_info": {
149 | "name": ""
150 | }
151 | },
152 | "nbformat": 4,
153 | "nbformat_minor": 4
154 | }
155 |
--------------------------------------------------------------------------------
/01_getting_started/10_exercise_loading_data.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Exercise - Loading Data\n",
8 | "\n",
9 | "As part of this exercise, you need to take care of loading data using `COPY` Command.\n",
10 | "* You can connect to the database using following details in the environment provided by us.\n",
11 | " * Host: localhost\n",
12 | " * Port: 5342\n",
13 | " * Database Name: YOUR_OS_USER_hr_db\n",
14 | " * User Name: YOUR_OS_USER_hr_user\n",
15 | " * Password: YOUR_OS_USER_PASSWORD (provided by us).\n",
16 | "* If you are using your own environment, make sure to create database for storing HR Data.\n",
17 | " * Database Name: hr_db\n",
18 | " * User Name: hr_user\n",
19 | " * You can create user with password of your choice.\n",
20 | " \n",
21 | "```sql\n",
22 | "CREATE DATABASE hr_db;\n",
23 | "CREATE USER hr_user WITH PASSWORD 'hr_password';\n",
24 | "GRANT ALL ON DATABASE hr_db TO hr_user;\n",
25 | "GRANT pg_read_server_files TO hr_user;\n",
26 | "```\n",
27 | "\n",
28 | "* Create table using this script.\n",
29 | "\n",
30 | "```sql\n",
31 | "CREATE TABLE employees\n",
32 | " ( employee_id INTEGER\n",
33 | " , first_name VARCHAR(20)\n",
34 | " , last_name VARCHAR(25)\n",
35 | " , email VARCHAR(25)\n",
36 | " , phone_number VARCHAR(20)\n",
37 | " , hire_date DATE\n",
38 | " , job_id VARCHAR(10)\n",
39 | " , salary NUMERIC(8,2)\n",
40 | " , commission_pct NUMERIC(2,2)\n",
41 | " , manager_id INTEGER\n",
42 | " , department_id INTEGER\n",
43 | " ) ;\n",
44 | "CREATE UNIQUE INDEX emp_emp_id_pk\n",
45 | " ON employees (employee_id) ;\n",
46 | "ALTER TABLE employees ADD\n",
47 | " PRIMARY KEY (employee_id);\n",
48 | "```\n",
49 | "\n",
50 | "* Understand data.\n",
51 | " * Check for delimiters (record as well as field).\n",
52 | " * Check whether header exists or not.\n",
53 | " * Ensure number of fields for the table and data being loaded are same or not.\n",
54 | "* Load data into the table using `COPY` Command. The file is under `/data/hr_db/employees`\n",
55 | "* Validate by running these queries. You can also use SQL Workbench to run the queries to validate whether data is loaded successfully or not.\n",
56 | "\n",
57 | "```sql\n",
58 | "SELECT * FROM employees LIMIT 10;\n",
59 | "SELECT count(1) FROM employees;\n",
60 | "```"
61 | ]
62 | }
63 | ],
64 | "metadata": {
65 | "kernelspec": {
66 | "display_name": "Python 3",
67 | "language": "python",
68 | "name": "python3"
69 | },
70 | "language_info": {
71 | "codemirror_mode": {
72 | "name": "ipython",
73 | "version": 3
74 | },
75 | "file_extension": ".py",
76 | "mimetype": "text/x-python",
77 | "name": "python",
78 | "nbconvert_exporter": "python",
79 | "pygments_lexer": "ipython3",
80 | "version": "3.6.12"
81 | }
82 | },
83 | "nbformat": 4,
84 | "nbformat_minor": 4
85 | }
86 |
--------------------------------------------------------------------------------
/02_dml_or_crud_operations/01_dml_or_crud_operations.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# DML or CRUD Operations\n",
8 | "\n",
9 | "Let us understand how to perform CRUD operations using Postgresql.\n",
10 | "\n",
11 | "* Normalization Principles\n",
12 | "* Tables as Relations\n",
13 | "* Database Operations - Overview\n",
14 | "* CRUD Operations\n",
15 | "* Creating Table\n",
16 | "* Inserting Data\n",
17 | "* Updating Data\n",
18 | "* Deleting Data\n",
19 | "* Overview of Transactions\n",
20 | "* Exercise - Database Operations\n",
21 | "\n",
22 | "Here are the key objectives of this section.\n",
23 | "* What are the different types of Database Operations?\n",
24 | "* How DML is related to CRUD Operations?\n",
25 | "* How to insert new records into table?\n",
26 | "* How to update existing data in a table?\n",
27 | "* How the data is typically deleted from a table?\n",
28 | "* You will also get a brief overview about Database Operations?\n",
29 | "* Self evaluate whether you gain enough skills related to performing CRUD or DML operations or not using exercieses"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": null,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": []
38 | }
39 | ],
40 | "metadata": {
41 | "kernelspec": {
42 | "display_name": "Python 3",
43 | "language": "python",
44 | "name": "python3"
45 | },
46 | "language_info": {
47 | "codemirror_mode": {
48 | "name": "ipython",
49 | "version": 3
50 | },
51 | "file_extension": ".py",
52 | "mimetype": "text/x-python",
53 | "name": "python",
54 | "nbconvert_exporter": "python",
55 | "pygments_lexer": "ipython3",
56 | "version": "3.6.12"
57 | }
58 | },
59 | "nbformat": 4,
60 | "nbformat_minor": 4
61 | }
62 |
--------------------------------------------------------------------------------
/02_dml_or_crud_operations/02_normalization_principles.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Normalization Principles\n",
8 | "\n",
9 | "Let us get an overview about Normalization Principles.\n",
10 | "\n",
11 | "Here are different normal forms we use. Provided links are from Wiki.\n",
12 | "* [1st Normal Form](https://en.wikipedia.org/wiki/First_normal_form)\n",
13 | "* [2nd Normal Form](https://en.wikipedia.org/wiki/Second_normal_form)\n",
14 | "* [3rd Normal Form](https://en.wikipedia.org/wiki/Third_normal_form)\n",
15 | "* [Boyce Codd Normal Form](https://en.wikipedia.org/wiki/Boyce–Codd_normal_form)\n",
16 | "\n",
17 | "Most of the well designed Data Models will be in either 3rd Normal Form. BCNF is used in some extreme cases where 3rd Normal Form does not eliminate all insertion, updation and deletion anomalies.\n",
18 | "\n",
19 | "### Reporting Environments\n",
20 | "While normalization is extensively used for transactional systems, they are not ideal for reporting or descision support systems. We tend to use dimensional modeling for reporting systems where tables will contain pre processed data as per the report requirements."
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {},
26 | "source": [
27 | "### Normal Forms - Key Terms\n",
28 | "\n",
29 | "Let us understand some of the key terms we use while going through the normal forms.\n",
30 | "* Domain\n",
31 | "* Attribute\n",
32 | "* Atomic (indivisible)\n",
33 | "* Functionally Dependent\n",
34 | "* Prime Attribute\n",
35 | "* Candidate Key\n",
36 | "* Data Anomalies - potential issues to data due to the mistakes by users or developers\n",
37 | "* Transitive Dependency\n"
38 | ]
39 | }
40 | ],
41 | "metadata": {
42 | "kernelspec": {
43 | "display_name": "Python 3",
44 | "language": "python",
45 | "name": "python3"
46 | },
47 | "language_info": {
48 | "codemirror_mode": {
49 | "name": "ipython",
50 | "version": 3
51 | },
52 | "file_extension": ".py",
53 | "mimetype": "text/x-python",
54 | "name": "python",
55 | "nbconvert_exporter": "python",
56 | "pygments_lexer": "ipython3",
57 | "version": "3.6.12"
58 | }
59 | },
60 | "nbformat": 4,
61 | "nbformat_minor": 4
62 | }
63 |
--------------------------------------------------------------------------------
/02_dml_or_crud_operations/03_tables_as_relations.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Tables as Relations\n",
8 | "\n",
9 | "Let us understand details about relations and different types of relationships we typically use.\n",
10 | "\n",
11 | "* In RDBMS - R stands for Relational.\n",
12 | "* In the transactional systems, tables are created using normalization principles. There will be relations or tables created based on relationships among them.\n",
13 | "* Here are the typical relationships among the tables.\n",
14 | " * 1 to 1\n",
15 | " * 1 to many or many to 1 (1 to n or n to 1)\n",
16 | " * many to many (m to n)\n",
17 | "* To **enforce** relationships we typically define constraints such as **Primary Key** and **Foreign Key**.\n",
18 | "* Here is the typical process we follow from requirements to physical database tables before building applications.\n",
19 | " * Identify entities based up on the requirements.\n",
20 | " * Define relationships among them.\n",
21 | " * Create ER Diagram (Entity Relationship Diagram). It is also called as Logical Data Model.\n",
22 | " * Apply Normalization Principles on the entities to identify tables and constraints to manage relationships among them.\n",
23 | " * Come up with Physical Data Model and generate required DDL Scripts.\n",
24 | " * Execute the scripts in the database on which applications will be eventually build based up on business requirements.\n",
25 | "* Logical modeling is typically done by Data Architects.\n",
26 | "* Physical modeling is taken care by Application Architect or Development lead.\n",
27 | "* Let us go through [data model](https://docs.oracle.com/cd/B28359_01/server.111/b28328/diagrams.htm) related to HR and OE systems.\n",
28 | " * Identify the relationships between the tables.\n",
29 | " * Differentiate between transactional tables and non transactional tables."
30 | ]
31 | }
32 | ],
33 | "metadata": {
34 | "kernelspec": {
35 | "display_name": "Python 3",
36 | "language": "python",
37 | "name": "python3"
38 | },
39 | "language_info": {
40 | "codemirror_mode": {
41 | "name": "ipython",
42 | "version": 3
43 | },
44 | "file_extension": ".py",
45 | "mimetype": "text/x-python",
46 | "name": "python",
47 | "nbconvert_exporter": "python",
48 | "pygments_lexer": "ipython3",
49 | "version": "3.6.12"
50 | }
51 | },
52 | "nbformat": 4,
53 | "nbformat_minor": 4
54 | }
55 |
--------------------------------------------------------------------------------
/02_dml_or_crud_operations/04_overview_of_database_operations.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Database Operations - Overview\n",
8 | "\n",
9 | "Let us get an overview of Database Operations we typically perform on regular basis. They are broadly categorized into the following:"
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* DDL - Data Definition Language\n",
44 | " * CREATE/ALTER/DROP Tables\n",
45 | " * CREATE/ALTER/DROP Indexes\n",
46 | " * Add constraints to tables\n",
47 | " * CREATE/ALTER/DROP Views\n",
48 | " * CREATE/ALTER/DROP Sequences\n",
49 | "* DML - Data Manipulation Language\n",
50 | " * Inserting new data into the table\n",
51 | " * Updating existing data in the table\n",
52 | " * Deleting existing data from the table\n",
53 | "* DQL - Data Query Language\n",
54 | " * Read the data from the table\n",
55 | "\n",
56 | "On top of these we also use TCL (Transaction Control Language) which include **COMMIT** and **ROLLBACK**. \n",
57 | "\n",
58 | "As part of this section in the subsequent topics we will primarily focus on basic DDL and DML."
59 | ]
60 | }
61 | ],
62 | "metadata": {
63 | "kernelspec": {
64 | "display_name": "Python 3",
65 | "language": "python",
66 | "name": "python3"
67 | },
68 | "language_info": {
69 | "codemirror_mode": {
70 | "name": "ipython",
71 | "version": 3
72 | },
73 | "file_extension": ".py",
74 | "mimetype": "text/x-python",
75 | "name": "python",
76 | "nbconvert_exporter": "python",
77 | "pygments_lexer": "ipython3",
78 | "version": "3.6.12"
79 | }
80 | },
81 | "nbformat": 4,
82 | "nbformat_minor": 4
83 | }
84 |
--------------------------------------------------------------------------------
/02_dml_or_crud_operations/05_crud_operations.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## CRUD Operations\n",
8 | "\n",
9 | "Let us get an overview of CRUD Operations. They are nothing but DML and queries to read the data while performing database operations."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* CRUD is widely used from application development perspective.\n",
44 | "* C - CREATE (INSERT)\n",
45 | "* R - READ (READ)\n",
46 | "* U - UPDATE (UPDATE)\n",
47 | "* D - DELETE (DELETE)\n",
48 | "\n",
49 | "As part of the application development process we perform CRUD Operations using REST APIs."
50 | ]
51 | }
52 | ],
53 | "metadata": {
54 | "kernelspec": {
55 | "display_name": "Python 3",
56 | "language": "python",
57 | "name": "python3"
58 | },
59 | "language_info": {
60 | "codemirror_mode": {
61 | "name": "ipython",
62 | "version": 3
63 | },
64 | "file_extension": ".py",
65 | "mimetype": "text/x-python",
66 | "name": "python",
67 | "nbconvert_exporter": "python",
68 | "pygments_lexer": "ipython3",
69 | "version": "3.6.12"
70 | }
71 | },
72 | "nbformat": 4,
73 | "nbformat_minor": 4
74 | }
75 |
--------------------------------------------------------------------------------
/02_dml_or_crud_operations/09_deleting_data.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Deleting Data\n",
8 | "\n",
9 | "Let us understand how to delete the data from a table."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* Typical Syntax - `DELETE FROM
WHERE `.\n",
44 | "* If we do not specify condition, it will delete all the data from the table.\n",
45 | "* It is not recommended to use delete with out where condition to delete all the data (instead we should use `TRUNCATE`).\n",
46 | "* For now we will see basic examples for delete. One need to have good knowledge about `WHERE` clause to take care of complex conditions.\n",
47 | "* Let's see how we can delete all those records from users where the password is not set. We need to use `IS NULL` as condition to compare against Null values."
48 | ]
49 | },
50 | {
51 | "cell_type": "code",
52 | "execution_count": 41,
53 | "metadata": {},
54 | "outputs": [
55 | {
56 | "name": "stdout",
57 | "output_type": "stream",
58 | "text": [
59 | " * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db\n",
60 | "6 rows affected.\n"
61 | ]
62 | },
63 | {
64 | "data": {
65 | "text/html": [
66 | "
"
215 | ],
216 | "text/plain": [
217 | "[(3,)]"
218 | ]
219 | },
220 | "execution_count": 44,
221 | "metadata": {},
222 | "output_type": "execute_result"
223 | }
224 | ],
225 | "source": [
226 | "%sql SELECT count(1) FROM users"
227 | ]
228 | }
229 | ],
230 | "metadata": {
231 | "kernelspec": {
232 | "display_name": "Python 3",
233 | "language": "python",
234 | "name": "python3"
235 | },
236 | "language_info": {
237 | "codemirror_mode": {
238 | "name": "ipython",
239 | "version": 3
240 | },
241 | "file_extension": ".py",
242 | "mimetype": "text/x-python",
243 | "name": "python",
244 | "nbconvert_exporter": "python",
245 | "pygments_lexer": "ipython3",
246 | "version": "3.6.12"
247 | }
248 | },
249 | "nbformat": 4,
250 | "nbformat_minor": 4
251 | }
252 |
--------------------------------------------------------------------------------
/02_dml_or_crud_operations/10_overview_of_transactions.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Overview of Transactions\n",
8 | "\n",
9 | "Let us go through the details related to Transactions."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* We typically perform operations such as `COMMIT` and `ROLLBACK` via the applications.\n",
44 | "* `COMMIT` will persist the changes in the database.\n",
45 | "* `ROLLBACK` will revert the uncommitted changes in the database.\n",
46 | "* We typically rollback the uncommitted changes in a transaction if there is any exception as part of the application logic flow.\n",
47 | "* For example, once the order is placed all the items that are added to shopping cart will be rolled back if the payment using credit card fails.\n",
48 | "* By default every operation is typically committed in Postgres. We will get into the details related to transaction as part of application development later.\n",
49 | "* Commands such as `COMMIT`, `ROLLBACK` typically comes under TCL (Transaction Control Language)"
50 | ]
51 | }
52 | ],
53 | "metadata": {
54 | "kernelspec": {
55 | "display_name": "Python 3",
56 | "language": "python",
57 | "name": "python3"
58 | },
59 | "language_info": {
60 | "codemirror_mode": {
61 | "name": "ipython",
62 | "version": 3
63 | },
64 | "file_extension": ".py",
65 | "mimetype": "text/x-python",
66 | "name": "python",
67 | "nbconvert_exporter": "python",
68 | "pygments_lexer": "ipython3",
69 | "version": "3.6.12"
70 | }
71 | },
72 | "nbformat": 4,
73 | "nbformat_minor": 4
74 | }
75 |
--------------------------------------------------------------------------------
/02_dml_or_crud_operations/11_exercises_database_operations.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Exercises - Database Operations\n",
8 | "\n",
9 | "Let's create a table and perform database operations using direct SQL."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "### Exercise 1 - Create Table\n",
44 | "\n",
45 | "Create table - **courses**\n",
46 | "* course_id - sequence generated integer and primary key\n",
47 | "* course_name - which holds alpha numeric or string values up to 60 characters\n",
48 | "* course_author - which holds the name of the author up to 40 characters\n",
49 | "* course_status - which holds one of these values (published, draft, inactive). \n",
50 | "* course_published_dt - which holds date type value. \n",
51 | "\n",
52 | "Provide the script as answer for this exercise."
53 | ]
54 | },
55 | {
56 | "cell_type": "markdown",
57 | "metadata": {},
58 | "source": [
59 | "### Exercise 2 - Inserting Data\n",
60 | "\n",
61 | "* Insert data into courses using the data provided. Make sure id is system generated.\n",
62 | "\n",
63 | "|Course Name |Course Author |Course Status|Course Published Date|\n",
64 | "|---------------------------------|----------------------|-------------|---------------------|\n",
65 | "|Programming using Python |Bob Dillon |published |2020-09-30 |\n",
66 | "|Data Engineering using Python |Bob Dillon |published |2020-07-15 |\n",
67 | "|Data Engineering using Scala |Elvis Presley |draft | |\n",
68 | "|Programming using Scala |Elvis Presley |published |2020-05-12 |\n",
69 | "|Programming using Java |Mike Jack |inactive |2020-08-10 |\n",
70 | "|Web Applications - Python Flask |Bob Dillon |inactive |2020-07-20 |\n",
71 | "|Web Applications - Java Spring |Mike Jack |draft | |\n",
72 | "|Pipeline Orchestration - Python |Bob Dillon |draft | |\n",
73 | "|Streaming Pipelines - Python |Bob Dillon |published |2020-10-05 |\n",
74 | "|Web Applications - Scala Play |Elvis Presley |inactive |2020-09-30 |\n",
75 | "|Web Applications - Python Django |Bob Dillon |published |2020-06-23 |\n",
76 | "|Server Automation - Ansible |Uncle Sam |published |2020-07-05 |\n",
77 | "\n",
78 | "Provide the insert statement(s) as answer for this exercise."
79 | ]
80 | },
81 | {
82 | "cell_type": "markdown",
83 | "metadata": {},
84 | "source": [
85 | "### Exercise 3 - Updating Data\n",
86 | "\n",
87 | "Update the status of all the **draft courses** related to Python and Scala to **published** along with the **course_published_dt using system date**. \n",
88 | "\n",
89 | "Provide the update statement as answer for this exercise."
90 | ]
91 | },
92 | {
93 | "cell_type": "markdown",
94 | "metadata": {},
95 | "source": [
96 | "### Exercise 4 - Deleting Data\n",
97 | "\n",
98 | "Delete all the courses which are neither in draft mode nor published.\n",
99 | "\n",
100 | "Provide the delete statement as answer for this exercise."
101 | ]
102 | },
103 | {
104 | "cell_type": "markdown",
105 | "metadata": {},
106 | "source": [
107 | "Validation - Get the count of all published courses by author and make sure the output is sorted in descending order by count.\n",
108 | "\n",
109 | "```sql\n",
110 | "SELECT course_author, count(1) AS course_count\n",
111 | "FROM courses\n",
112 | "WHERE course_status = 'published'\n",
113 | "GROUP BY course_author\n",
"ORDER BY course_count DESC\n",
114 | "```\n",
115 | "\n",
116 | "|Course Author |Course Count|\n",
117 | "|----------------|------------|\n",
118 | "|Bob Dillon |5 |\n",
119 | "|Elvis Presley |2 |\n",
120 | "|Uncle Sam |1 |"
121 | ]
122 | }
123 | ],
124 | "metadata": {
125 | "kernelspec": {
126 | "display_name": "Python 3",
127 | "language": "python",
128 | "name": "python3"
129 | },
130 | "language_info": {
131 | "codemirror_mode": {
132 | "name": "ipython",
133 | "version": 3
134 | },
135 | "file_extension": ".py",
136 | "mimetype": "text/x-python",
137 | "name": "python",
138 | "nbconvert_exporter": "python",
139 | "pygments_lexer": "ipython3",
140 | "version": "3.6.12"
141 | }
142 | },
143 | "nbformat": 4,
144 | "nbformat_minor": 4
145 | }
146 |
--------------------------------------------------------------------------------
/03_writing_basic_sql_queries/01_writing_basic_sql_queries.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Writing Basic SQL Queries\n",
8 | "As part of this section we will primarily focus on writing basic queries.\n",
9 | "\n",
10 | "* Standard Transformations\n",
11 | "* Overview of Data Model\n",
12 | "* Define Problem Statement – Daily Product Revenue\n",
13 | "* Preparing Tables\n",
14 | "* Selecting or Projecting Data\n",
15 | "* Filtering Data\n",
16 | "* Joining Tables – Inner\n",
17 | "* Joining Tables – Outer\n",
18 | "* Performing Aggregations\n",
19 | "* Sorting Data\n",
20 | "* Solution – Daily Product Revenue\n",
21 | "\n",
22 | "Here are the key objectives for this section:\n",
23 | "* What are the different standard transformations and how are they implemented using basic SQL?\n",
24 | "* Understand the data model using which basic SQL features are explored.\n",
25 | "* Set up the database and tables, and load the data quickly.\n",
26 | "* How do we typically select or project data, filter data, join data from multiple tables, compute metrics using aggregate functions, sort data, etc.?\n",
27 | "* While exploring basic SQL queries, we will define a problem statement and come up with a solution at the end.\n",
28 | "* Self evaluate whether one has understood all the key aspects of writing basic SQL queries using the exercises at the end."
29 | ]
30 | },
31 | {
32 | "cell_type": "code",
33 | "execution_count": null,
34 | "metadata": {},
35 | "outputs": [],
36 | "source": []
37 | }
38 | ],
39 | "metadata": {
40 | "kernelspec": {
41 | "display_name": "Python 3",
42 | "language": "python",
43 | "name": "python3"
44 | },
45 | "language_info": {
46 | "codemirror_mode": {
47 | "name": "ipython",
48 | "version": 3
49 | },
50 | "file_extension": ".py",
51 | "mimetype": "text/x-python",
52 | "name": "python",
53 | "nbconvert_exporter": "python",
54 | "pygments_lexer": "ipython3",
55 | "version": "3.6.12"
56 | }
57 | },
58 | "nbformat": 4,
59 | "nbformat_minor": 4
60 | }
61 |
--------------------------------------------------------------------------------
/03_writing_basic_sql_queries/02_standard_transformations.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Standard Transformations\n",
8 | "\n",
9 | "Here are some of the transformations we typically perform on a regular basis."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* Projection of data\n",
44 | "* Filtering data\n",
45 | "* Performing Aggregations\n",
46 | "* Joins\n",
47 | "* Sorting\n",
48 | "* Ranking (will be covered as part of advanced queries)"
49 | ]
50 | }
51 | ],
52 | "metadata": {
53 | "kernelspec": {
54 | "display_name": "Python 3",
55 | "language": "python",
56 | "name": "python3"
57 | },
58 | "language_info": {
59 | "codemirror_mode": {
60 | "name": "ipython",
61 | "version": 3
62 | },
63 | "file_extension": ".py",
64 | "mimetype": "text/x-python",
65 | "name": "python",
66 | "nbconvert_exporter": "python",
67 | "pygments_lexer": "ipython3",
68 | "version": "3.6.12"
69 | }
70 | },
71 | "nbformat": 4,
72 | "nbformat_minor": 4
73 | }
74 |
--------------------------------------------------------------------------------
/03_writing_basic_sql_queries/03_overview_of_data_model.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Overview of Data Model\n",
8 | "\n",
9 | "We will be using a retail data model for this section. It contains 6 tables."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* Table list\n",
44 | " * orders\n",
45 | " * order_items\n",
46 | " * products\n",
47 | " * categories\n",
48 | " * departments\n",
49 | " * customers\n",
50 | "* **orders** and **order_items** are transactional tables.\n",
51 | "* **products**, **categories** and **departments** are non-transactional tables which contain data related to the product catalog.\n",
52 | "* **customers** is a non-transactional table which contains customer details.\n",
53 | "* There is a one-to-many relationship between **orders** and **order_items**.\n",
54 | "* There is a one-to-many relationship between **products** and **order_items**. Each order item will have one product, and a product can be part of many order_items.\n",
55 | "* There is a one-to-many relationship between **customers** and **orders**. A customer can place many orders over a period of time, but there cannot be more than one customer for a given order.\n",
56 | "* There is a one-to-many relationship between **departments** and **categories**. Also, there is a one-to-many relationship between **categories** and **products**.\n",
57 | "* There is a hierarchical relationship from departments to products - **departments** -> **categories** -> **products**. A sample join traversing this hierarchy is shown below.\n",
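"\n",
"A minimal sketch of such a join, assuming the standard retail_db column names (department_name, category_name, etc.):\n",
"\n",
"```sql\n",
"SELECT d.department_name,\n",
"    c.category_name,\n",
"    p.product_name\n",
"FROM departments AS d\n",
"    JOIN categories AS c\n",
"        ON c.category_department_id = d.department_id\n",
"    JOIN products AS p\n",
"        ON p.product_category_id = c.category_id\n",
"LIMIT 10\n",
"```"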
58 | ]
59 | }
60 | ],
61 | "metadata": {
62 | "kernelspec": {
63 | "display_name": "Python 3",
64 | "language": "python",
65 | "name": "python3"
66 | },
67 | "language_info": {
68 | "codemirror_mode": {
69 | "name": "ipython",
70 | "version": 3
71 | },
72 | "file_extension": ".py",
73 | "mimetype": "text/x-python",
74 | "name": "python",
75 | "nbconvert_exporter": "python",
76 | "pygments_lexer": "ipython3",
77 | "version": "3.6.12"
78 | }
79 | },
80 | "nbformat": 4,
81 | "nbformat_minor": 4
82 | }
83 |
--------------------------------------------------------------------------------
/03_writing_basic_sql_queries/04_define_problem_statement.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Define Problem Statement – Daily Product Revenue\n",
8 | "\n",
9 | "Let us try to get daily product revenue using retail tables."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* daily is derived from orders.order_date.\n",
44 | "* product has to be derived from products.product_name.\n",
45 | "* revenue has to be derived from order_items.order_item_subtotal.\n",
46 | "* We need to join all the 3 tables, then group by order_date, product_id as well as product_name to get revenue using order_item_subtotal.\n",
47 | "* Get Daily Product Revenue using products, orders and order_items data set.\n",
48 | "* We have the following fields in **orders**.\n",
49 | " * order_id\n",
50 | " * order_date\n",
51 | " * order_customer_id\n",
52 | " * order_status\n",
53 | "* We have the following fields in **order_items**.\n",
54 | " * order_item_id\n",
55 | " * order_item_order_id\n",
56 | " * order_item_product_id\n",
57 | " * order_item_quantity\n",
58 | " * order_item_subtotal\n",
59 | " * order_item_product_price\n",
60 | "* We have the following fields in **products**.\n",
61 | " * product_id\n",
62 | " * product_category_id\n",
63 | " * product_name\n",
64 | " * product_description\n",
65 | " * product_price\n",
66 | " * product_image\n",
67 | "* We have a one-to-many relationship between orders and order_items.\n",
68 | "* **orders.order_id** is the **primary key** and **order_items.order_item_order_id** is a foreign key to **orders.order_id**.\n",
69 | "* We have a one-to-many relationship between products and order_items.\n",
70 | "* **products.product_id** is the **primary key** and **order_items.order_item_product_id** is a foreign key to **products.product_id**.\n",
71 | "* By the end of this module we will explore all the standard transformations and get daily product revenue using the following fields.\n",
72 | "    * **orders.order_date**\n",
73 | "    * **order_items.order_item_product_id**\n",
74 | "    * **products.product_name**\n",
75 | "    * **order_items.order_item_subtotal** (aggregated using date and product_id).\n",
76 | "* We will consider only **COMPLETE** or **CLOSED** orders.\n",
77 | "* As there can be products with the same name but different ids, we have to include product_id as part of the key by which we group the data. A sketch of the final query is shown below.\n",
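"\n",
"A minimal sketch of the final query, assuming the fields described above (the step-by-step walkthrough follows in the rest of this section):\n",
"\n",
"```sql\n",
"SELECT o.order_date,\n",
"    oi.order_item_product_id,\n",
"    p.product_name,\n",
"    sum(oi.order_item_subtotal) AS revenue\n",
"FROM orders AS o\n",
"    JOIN order_items AS oi\n",
"        ON o.order_id = oi.order_item_order_id\n",
"    JOIN products AS p\n",
"        ON p.product_id = oi.order_item_product_id\n",
"WHERE o.order_status IN ('COMPLETE', 'CLOSED')\n",
"GROUP BY o.order_date,\n",
"    oi.order_item_product_id,\n",
"    p.product_name\n",
"ORDER BY o.order_date,\n",
"    revenue DESC\n",
"```"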
78 | ]
79 | }
80 | ],
81 | "metadata": {
82 | "kernelspec": {
83 | "display_name": "Python 3",
84 | "language": "python",
85 | "name": "python3"
86 | },
87 | "language_info": {
88 | "codemirror_mode": {
89 | "name": "ipython",
90 | "version": 3
91 | },
92 | "file_extension": ".py",
93 | "mimetype": "text/x-python",
94 | "name": "python",
95 | "nbconvert_exporter": "python",
96 | "pygments_lexer": "ipython3",
97 | "version": "3.6.12"
98 | }
99 | },
100 | "nbformat": 4,
101 | "nbformat_minor": 4
102 | }
103 |
--------------------------------------------------------------------------------
/03_writing_basic_sql_queries/13_exercises_basic_sql_queries.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Exercises - Basic SQL Queries\n",
8 | "\n",
9 | "Here are some of the exercises for which you can write SQL queries to self evaluate."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 13,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* Ensure that we have the required database and user for retail data. **We might provide the database as part of our labs.** Here are the instructions to use `psql` for setting up the required database (if required) and tables.\n",
44 | "\n",
45 | "```shell\n",
46 | "psql -U postgres -h localhost -p 5432 -W\n",
47 | "```\n",
48 | "\n",
49 | "```sql\n",
50 | "CREATE DATABASE itversity_retail_db;\n",
51 | "CREATE USER itversity_retail_user WITH ENCRYPTED PASSWORD 'retail_password';\n",
52 | "GRANT ALL ON DATABASE itversity_retail_db TO itversity_retail_user;\n",
53 | "```\n",
54 | "\n",
55 | "* Create Tables using the script provided. You can either use `psql` or **SQL Workbench**.\n",
56 | "\n",
57 | "```shell\n",
58 | "psql -U itversity_retail_user \\\n",
59 | " -h localhost \\\n",
60 | " -p 5432 \\\n",
61 | " -d itversity_retail_db \\\n",
62 | " -W\n",
63 | "```\n",
64 | "\n",
65 | "* You can drop the existing tables.\n",
66 | "\n",
67 | "```sql\n",
68 | "DROP TABLE order_items;\n",
69 | "DROP TABLE orders;\n",
70 | "DROP TABLE customers;\n",
71 | "DROP TABLE products;\n",
72 | "DROP TABLE categories;\n",
73 | "DROP TABLE departments;\n",
74 | "```\n",
75 | "\n",
76 | "* Once the tables are dropped you can run below script to create the tables for the purpose of exercises.\n",
77 | "\n",
78 | "```sql\n",
79 | "\\i /data/retail_db/create_db_tables_pg.sql\n",
80 | "```\n",
81 | "\n",
82 | "* Data shall be loaded using the script provided.\n",
83 | "\n",
84 | "```sql\n",
85 | "\\i /data/retail_db/load_db_tables_pg.sql\n",
86 | "```\n",
87 | "\n",
88 | "* Run queries to validate that we have data in all the tables."
89 | ]
90 | },
91 | {
92 | "cell_type": "markdown",
93 | "metadata": {},
94 | "source": [
95 | "### Exercise 1 - Customer order count\n",
96 | "\n",
97 | "Get the order count per customer for the month of January 2014.\n",
98 | "* Tables - orders and customers\n",
99 | "* Data should be sorted in descending order by count and ascending order by customer id.\n",
100 | "* Output should contain customer_id, customer_first_name, customer_last_name and customer_order_count."
101 | ]
102 | },
103 | {
104 | "cell_type": "markdown",
105 | "metadata": {},
106 | "source": [
107 | "### Exercise 2 - Dormant Customers\n",
108 | "\n",
109 | "Get the details of customers who have not placed any order for the month of January 2014.\n",
110 | "* Tables - orders and customers\n",
111 | "* Data should be sorted in ascending order by customer_id\n",
112 | "* Output should contain all the fields from customers"
113 | ]
114 | },
115 | {
116 | "cell_type": "markdown",
117 | "metadata": {},
118 | "source": [
119 | "### Exercise 3 - Revenue Per Customer\n",
120 | "\n",
121 | "Get the revenue generated by each customer for the month of January 2014.\n",
122 | "* Tables - orders, order_items and customers\n",
123 | "* Data should be sorted in descending order by revenue and then ascending order by customer_id\n",
124 | "* Output should contain customer_id, customer_first_name, customer_last_name, customer_revenue.\n",
125 | "* If there are no orders placed by a customer, then the corresponding revenue for that customer should be 0.\n",
126 | "* Consider only COMPLETE and CLOSED orders"
127 | ]
128 | },
129 | {
130 | "cell_type": "markdown",
131 | "metadata": {},
132 | "source": [
133 | "### Exercise 4 - Revenue Per Category\n",
134 | "\n",
135 | "Get the revenue generated for each category for the month of January 2014.\n",
136 | "* Tables - orders, order_items, products and categories\n",
137 | "* Data should be sorted in ascending order by category_id.\n",
138 | "* Output should contain all the fields from categories along with the revenue as category_revenue.\n",
139 | "* Consider only COMPLETE and CLOSED orders"
140 | ]
141 | },
142 | {
143 | "cell_type": "markdown",
144 | "metadata": {},
145 | "source": [
146 | "### Exercise 5 - Product Count Per Department\n",
147 | "\n",
148 | "Get the product count for each department.\n",
149 | "* Tables - departments, categories, products\n",
150 | "* Data should be sorted in ascending order by department_id\n",
151 | "* Output should contain all the fields from departments and the product count as product_count"
152 | ]
153 | }
154 | ],
155 | "metadata": {
156 | "kernelspec": {
157 | "display_name": "Python 3",
158 | "language": "python",
159 | "name": "python3"
160 | },
161 | "language_info": {
162 | "codemirror_mode": {
163 | "name": "ipython",
164 | "version": 3
165 | },
166 | "file_extension": ".py",
167 | "mimetype": "text/x-python",
168 | "name": "python",
169 | "nbconvert_exporter": "python",
170 | "pygments_lexer": "ipython3",
171 | "version": "3.6.12"
172 | }
173 | },
174 | "nbformat": 4,
175 | "nbformat_minor": 4
176 | }
177 |
--------------------------------------------------------------------------------
/04_creating_tables_and_indexes/01_creating_tables_and_indexes.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Creating Tables and Indexes\n",
8 | "\n",
9 | "Let us go through the details related to creating tables and indexes. We will also talk about columns, constraints, etc. while going through the details related to tables and indexes.\n",
10 | "\n",
11 | "* DDL - Data Definition Language\n",
12 | "* Overview of Data Types\n",
13 | "* Adding or Modifying Columns\n",
14 | "* Different Types of Constraints\n",
15 | "* Managing Constraints\n",
16 | "* Indexes on Tables\n",
17 | "* Indexes for Constraints\n",
18 | "* Overview of Sequences\n",
19 | "* Truncating Tables\n",
20 | "* Dropping Tables\n",
21 | "* Exercise - Managing Database Objects\n",
22 | "\n",
23 | "Here are the key objectives of this section:\n",
24 | "* How to create and manage tables?\n",
25 | "* Get an in-depth understanding of columns and commonly used data types\n",
26 | "* What are the different types of constraints and how are they managed?\n",
27 | "* What are indexes and how are they relevant to Primary Key, Unique and Foreign Key constraints?\n",
28 | "* What is a Sequence and how are sequences used to populate Surrogate Keys?\n",
29 | "* Self evaluate whether one understood all the key aspects of managing tables and constraints."
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": null,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": []
38 | }
39 | ],
40 | "metadata": {
41 | "kernelspec": {
42 | "display_name": "Python 3",
43 | "language": "python",
44 | "name": "python3"
45 | },
46 | "language_info": {
47 | "codemirror_mode": {
48 | "name": "ipython",
49 | "version": 3
50 | },
51 | "file_extension": ".py",
52 | "mimetype": "text/x-python",
53 | "name": "python",
54 | "nbconvert_exporter": "python",
55 | "pygments_lexer": "ipython3",
56 | "version": "3.6.12"
57 | }
58 | },
59 | "nbformat": 4,
60 | "nbformat_minor": 4
61 | }
62 |
--------------------------------------------------------------------------------
/04_creating_tables_and_indexes/05_different_types_of_constraints.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Different Types of Constraints\n",
8 | "\n",
9 | "Let us understand details about different types of constraints used in RDBMS databases."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 33,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* Supported constraints:\n",
44 | " * NOT NULL constraint\n",
45 | " * CHECK constraint\n",
46 | " * UNIQUE constraint\n",
47 | " * PRIMARY KEY constraint\n",
48 | " * FOREIGN KEY constraint\n",
49 | "* All constraints can be added while creating the table or on pre-created tables using `ALTER`.\n",
50 | "* Typically we define `NOT NULL`, `CHECK` constraints while creating the tables. However, we can also specify **not null constraints** as well as **check constraints** to the columns while adding columns using `ALTER TABLE`.\n",
51 | "* `FOREIGN KEY` constraints are created after the tables are created. They are primarily used to define the relationship between 2 tables - example: users is the parent table and user_login_details is the child table, with a one-to-many relationship between them.\n",
52 | "* `PRIMARY KEY` and `UNIQUE` constraints might be added as part of CREATE table statements or ALTER table statements. Both are commonly used practices.\n",
53 | "* Let us compare and contrast `PRIMARY KEY` and `UNIQUE` constraints.\n",
54 | "    * There can be only one `PRIMARY KEY` in a table, whereas there can be any number of `UNIQUE` constraints.\n",
55 | "    * `UNIQUE` columns can have null values unless `NOT NULL` is also enforced. In case of `PRIMARY KEY`, both uniqueness as well as not null are strictly enforced. In other words, a primary key column cannot be null whereas a unique column can be.\n",
56 | " * `FOREIGN KEY` from a child table can be defined against `PRIMARY KEY` column or `UNIQUE` column.\n",
57 | " * Typically `PRIMARY KEY` columns are surrogate keys which are supported by sequence.\n",
58 | " * `PRIMARY KEY` or `UNIQUE` can be composite. It means there can be more than one column to define `PRIMARY KEY` or `UNIQUE` constraint.\n",
59 | "* Let's take an example of LMS (Learning Management System).\n",
60 | " * **USERS** - it contains columns such as user_id, user_email_id, user_first_name etc. We can enforce primary key constraint on user_id and unique constraint on user_email_id.\n",
61 | " * **COURSES** - it contains columns such as course_id, course_name, course_price etc. Primary key constraint will be enforced on course_id.\n",
62 | "    * **STUDENTS** - A student is nothing but a user who has enrolled for one or more courses. However, a user can enroll in a given course only once.\n",
63 | " * It contains fields such as student_id, user_id, course_id, amount_paid, enrolled_dt etc.\n",
64 | " * Primary key constraint will be enforced on student_id.\n",
65 | " * A foreign key constraint can be enforced on students.user_id against users.user_id.\n",
66 | " * Another foreign key constraint can be enforced on students.course_id against courses.course_id.\n",
67 | "    * Also, we can have a unique constraint enforced on students.user_id and students.course_id. It will be a composite key as it has more than one column. A DDL sketch for this example is shown below.\n",
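"\n",
"Here is a minimal DDL sketch for the LMS example, assuming Postgres and showing only a few representative columns:\n",
"\n",
"```sql\n",
"CREATE TABLE users (\n",
"    user_id SERIAL PRIMARY KEY,\n",
"    user_email_id VARCHAR(50) NOT NULL UNIQUE,\n",
"    user_first_name VARCHAR(30) NOT NULL\n",
");\n",
"\n",
"CREATE TABLE courses (\n",
"    course_id SERIAL PRIMARY KEY,\n",
"    course_name VARCHAR(60) NOT NULL,\n",
"    course_price NUMERIC(8, 2)\n",
");\n",
"\n",
"CREATE TABLE students (\n",
"    student_id SERIAL PRIMARY KEY,\n",
"    user_id INT REFERENCES users (user_id),\n",
"    course_id INT REFERENCES courses (course_id),\n",
"    amount_paid NUMERIC(8, 2),\n",
"    enrolled_dt DATE DEFAULT CURRENT_DATE,\n",
"    UNIQUE (user_id, course_id)\n",
");\n",
"```"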
68 | ]
69 | }
70 | ],
71 | "metadata": {
72 | "kernelspec": {
73 | "display_name": "Python 3",
74 | "language": "python",
75 | "name": "python3"
76 | },
77 | "language_info": {
78 | "codemirror_mode": {
79 | "name": "ipython",
80 | "version": 3
81 | },
82 | "file_extension": ".py",
83 | "mimetype": "text/x-python",
84 | "name": "python",
85 | "nbconvert_exporter": "python",
86 | "pygments_lexer": "ipython3",
87 | "version": "3.6.12"
88 | }
89 | },
90 | "nbformat": 4,
91 | "nbformat_minor": 4
92 | }
93 |
--------------------------------------------------------------------------------
/04_creating_tables_and_indexes/07_indexes_on_tables.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Indexes on Tables\n",
8 | "\n",
9 | "Let us go through the details related to indexes supported in RDBMS such as Postgres."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 64,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* An index can be unique or non unique.\n",
44 | "* Unique Index - Data will be sorted in ascending order and uniqueness is enforced.\n",
45 | "* Non Unique Index - Data will be sorted in ascending order and uniqueness is not enforced.\n",
46 | "* Unless specified all indexes are of type B Tree.\n",
47 | "* For sparsely populated columns, we tend to create B Tree indexes. B Tree indexes are the most commonly used ones.\n",
48 | "* For densely populated columns such as gender, month etc with very few distinct values we can leverage bit map index. However bitmap indexes are not used quite extensively in typical web or mobile applications.\n",
49 | "* Write operations will become relatively slow as data has to be maintained in the index as well as the table.\n",
50 | "* We need to be careful while creating indexes on the tables as write operations can become slow as more indexes are added to the table.\n",
51 | "* Here are some of the criteria for creating indexes.\n",
52 | "    * Create unique indexes when you want to enforce uniqueness. If you define a unique constraint or primary key constraint, a unique index will be created internally.\n",
53 | " * If we are performing joins between 2 tables based on a value, then the foreign key column in the child table should be indexed. \n",
54 | " * Typically as part of order management system, we tend to get all the order details for a given order using order id.\n",
55 | " * In our case we will be able to improve the query performance by adding index on **order_items.order_item_order_id**.\n",
56 | " * However, write operation will become a bit slow. But it is acceptable and required to create index on **order_items.order_item_order_id** as we write once and read many times over the life of the order.\n",
57 | "* Let us perform tasks related to indexes.\n",
58 | " * Drop and recreate retail db tables.\n",
59 | " * Load data into retail db tables.\n",
60 | "    * Compute statistics (Optional). It is typically taken care of automatically by the schedules defined by DBAs.\n",
61 | " * Use code to randomly fetch 2000 orders and join with order_items - compute time.\n",
62 | " * Create index for order_items.order_item_order_id and compute statistics\n",
63 | " * Use code to randomly fetch 2000 orders and join with order_items - compute time.\n",
64 | "* Script to create tables and load data in case there are no tables in retail database.\n",
65 | "\n",
66 | "```shell\n",
67 | "psql -U itversity_retail_user \\\n",
68 | "    -h localhost \\\n",
69 | "    -p 5432 \\\n",
70 | "    -d itversity_retail_db \\\n",
71 | "    -W\n",
72 | "```\n",
"\n",
"```sql\n",
73 | "DROP TABLE order_items;\n",
74 | "DROP TABLE orders;\n",
75 | "DROP TABLE products;\n",
76 | "DROP TABLE categories;\n",
77 | "DROP TABLE departments;\n",
78 | "DROP TABLE customers;\n",
79 | "\n",
80 | "\\i /data/retail_db/create_db_tables_pg.sql\n",
81 | "\\i /data/retail_db/load_db_tables_pg.sql\n",
82 | "```"
83 | ]
84 | },
85 | {
86 | "cell_type": "code",
87 | "execution_count": 65,
88 | "metadata": {},
89 | "outputs": [
90 | {
91 | "name": "stdout",
92 | "output_type": "stream",
93 | "text": [
94 | "Defaulting to user installation because normal site-packages is not writeable\n",
95 | "Requirement already satisfied: psycopg2 in /opt/anaconda3/envs/beakerx/lib/python3.6/site-packages (2.8.6)\n"
96 | ]
97 | }
98 | ],
99 | "source": [
100 | "!pip install psycopg2"
101 | ]
102 | },
103 | {
104 | "cell_type": "code",
105 | "execution_count": 66,
106 | "metadata": {},
107 | "outputs": [],
108 | "source": [
109 | "import psycopg2"
110 | ]
111 | },
112 | {
113 | "cell_type": "code",
114 | "execution_count": 67,
115 | "metadata": {},
116 | "outputs": [
117 | {
118 | "name": "stdout",
119 | "output_type": "stream",
120 | "text": [
121 | "CPU times: user 73.8 ms, sys: 31.4 ms, total: 105 ms\n",
122 | "Wall time: 19.6 s\n"
123 | ]
124 | }
125 | ],
126 | "source": [
127 | "%%time\n",
128 | "\n",
129 | "from random import randrange\n",
130 | "connection = psycopg2.connect(\n",
131 | " host='localhost',\n",
132 | " port='5432',\n",
133 | " database='itversity_retail_db',\n",
134 | " user='itversity_retail_user',\n",
135 | " password='retail_password'\n",
136 | ")\n",
137 | "cursor = connection.cursor()\n",
138 | "query = '''SELECT * \n",
139 | "FROM orders o JOIN order_items oi \n",
140 | " ON o.order_id = oi.order_item_order_id\n",
141 | "WHERE o.order_id = %s\n",
142 | "'''\n",
143 | "ctr = 0\n",
144 | "while True:\n",
145 | " if ctr == 2000:\n",
146 | " break\n",
147 | " order_id = randrange(1, 68883)\n",
148 | " cursor.execute(query, (order_id,))\n",
149 | " ctr += 1\n",
150 | "cursor.close()\n",
151 | "connection.close()"
152 | ]
153 | },
154 | {
155 | "cell_type": "code",
156 | "execution_count": 68,
157 | "metadata": {},
158 | "outputs": [
159 | {
160 | "name": "stdout",
161 | "output_type": "stream",
162 | "text": [
163 | "The sql extension is already loaded. To reload it, use:\n",
164 | " %reload_ext sql\n"
165 | ]
166 | }
167 | ],
168 | "source": [
169 | "%load_ext sql"
170 | ]
171 | },
172 | {
173 | "cell_type": "code",
174 | "execution_count": 69,
175 | "metadata": {},
176 | "outputs": [
177 | {
178 | "name": "stdout",
179 | "output_type": "stream",
180 | "text": [
181 | "env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db\n"
182 | ]
183 | }
184 | ],
185 | "source": [
186 | "%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db"
187 | ]
188 | },
189 | {
190 | "cell_type": "code",
191 | "execution_count": 70,
192 | "metadata": {},
193 | "outputs": [
194 | {
195 | "name": "stdout",
196 | "output_type": "stream",
197 | "text": [
198 | " * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db\n",
199 | "Done.\n"
200 | ]
201 | },
202 | {
203 | "data": {
204 | "text/plain": [
205 | "[]"
206 | ]
207 | },
208 | "execution_count": 70,
209 | "metadata": {},
210 | "output_type": "execute_result"
211 | }
212 | ],
213 | "source": [
214 | "%%sql\n",
215 | "\n",
216 | "CREATE INDEX order_items_oid_idx\n",
217 | "ON order_items(order_item_order_id)"
218 | ]
219 | },
220 | {
221 | "cell_type": "code",
222 | "execution_count": 71,
223 | "metadata": {},
224 | "outputs": [
225 | {
226 | "name": "stdout",
227 | "output_type": "stream",
228 | "text": [
229 | "CPU times: user 49.1 ms, sys: 32.9 ms, total: 82 ms\n",
230 | "Wall time: 265 ms\n"
231 | ]
232 | }
233 | ],
234 | "source": [
235 | "%%time\n",
236 | "\n",
237 | "from random import randrange\n",
238 | "connection = psycopg2.connect(\n",
239 | " host='localhost',\n",
240 | " port='5432',\n",
241 | " database='itversity_retail_db',\n",
242 | " user='itversity_retail_user',\n",
243 | " password='retail_password'\n",
244 | ")\n",
245 | "cursor = connection.cursor()\n",
246 | "query = '''SELECT * \n",
247 | "FROM orders o JOIN order_items oi \n",
248 | " ON o.order_id = oi.order_item_order_id\n",
249 | "WHERE o.order_id = %s\n",
250 | "'''\n",
251 | "ctr = 0\n",
252 | "while True:\n",
253 | " if ctr == 2000:\n",
254 | " break\n",
255 | " order_id = randrange(1, 68883)\n",
256 | " cursor.execute(query, (order_id,))\n",
257 | " ctr += 1\n",
258 | "cursor.close()\n",
259 | "connection.close()"
260 | ]
261 | }
262 | ],
263 | "metadata": {
264 | "kernelspec": {
265 | "display_name": "Python 3",
266 | "language": "python",
267 | "name": "python3"
268 | },
269 | "language_info": {
270 | "codemirror_mode": {
271 | "name": "ipython",
272 | "version": 3
273 | },
274 | "file_extension": ".py",
275 | "mimetype": "text/x-python",
276 | "name": "python",
277 | "nbconvert_exporter": "python",
278 | "pygments_lexer": "ipython3",
279 | "version": "3.6.12"
280 | }
281 | },
282 | "nbformat": 4,
283 | "nbformat_minor": 4
284 | }
285 |
--------------------------------------------------------------------------------
/04_creating_tables_and_indexes/12_exercises_managing_db_objects.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Exercises - Managing Database Objects\n",
8 | "\n",
9 | "This exercise is primarily to assess your capabilities related to putting all the important DDL concepts into practice by coming up with a solution for a typical data migration problem from one database (MySQL) to another (Postgres).\n",
10 | "* Here are the high level steps for database migration from one type of database to another type of database.\n",
11 | " * Extract DDL Statements from source database (MySQL).\n",
12 | " * Extract the data in the form of delimited files and ship them to target database.\n",
13 | " * Refactor scripts as per target database (Postgres).\n",
14 | " * Create tables in the target database.\n",
15 | " * Execute pre-migration steps (disable constraints, drop indexes etc).\n",
16 | " * Load the data using native utilities.\n",
17 | " * Execute post-migration steps (enable constraints, create or rebuild indexes, reset sequences etc).\n",
18 | " * Sanity checks with basic queries.\n",
19 | " * Make sure all the impacted applications are validated thoroughly.\n",
20 | "* We have scripts and data set available in our GitHub repository. If you are using our environment the repository is already cloned under **/data/retail_db**.\n",
21 | "* It has scripts to create tables with primary keys. Those scripts are generated from MySQL tables and refactored for Postgres.\n",
22 | " * Script to create tables: **create_db_tables_pg.sql**\n",
23 | " * Load data into tables: **load_db_tables_pg.sql**\n",
24 | "* Here are the steps you need to perform to take care of this exercise.\n",
25 | " * Create tables\n",
26 | " * Load data\n",
27 | " * All the tables have surrogate primary keys. Here are the details.\n",
28 | " * orders.order_id\n",
29 | " * order_items.order_item_id\n",
30 | " * customers.customer_id\n",
31 | " * products.product_id\n",
32 | " * categories.category_id\n",
33 | " * departments.department_id\n",
34 | " * Get the maximum value from all surrogate primary key fields.\n",
35 | " * Create sequences for all surrogate primary key fields using maximum value. Make sure to use standard naming conventions for sequences.\n",
36 | " * Ensure sequences are mapped to the surrogate primary key fields.\n",
37 | " * Create foreign key constraints based up on this information.\n",
38 | " * orders.order_customer_id to customers.customer_id\n",
39 | " * order_items.order_item_order_id to orders.order_id\n",
40 | " * order_items.order_item_product_id to products.product_id\n",
41 | " * products.product_category_id to categories.category_id\n",
42 | " * categories.category_department_id to departments.department_id\n",
43 | "    * Insert a few records into `departments` to ensure that sequence generated numbers are used for `department_id`.\n",
44 | "* Here are the commands to launch `psql` and run scripts to create tables as well as load data into tables.\n",
45 | "\n",
46 | "```shell\n",
47 | "psql -U itversity_retail_user \\\n",
48 | "    -h localhost \\\n",
49 | "    -p 5432 \\\n",
50 | "    -d itversity_retail_db \\\n",
51 | "    -W\n",
52 | "```\n",
"\n",
"```sql\n",
53 | "\\i /data/retail_db/create_db_tables_pg.sql\n",
54 | "\n",
55 | "\\i /data/retail_db/load_db_tables_pg.sql\n",
56 | "```\n",
57 | "* We use this approach of creating tables, loading data and then adding constraints as well as resetting sequences for large volume data migrations from one database to another database.\n",
58 | "* Here are the commands or queries you need to come up with to solve this problem."
59 | ]
60 | },
61 | {
62 | "cell_type": "markdown",
63 | "metadata": {},
64 | "source": [
65 | "### Exercise 1\n",
66 | "\n",
67 | "Queries to get maximum values from surrogate primary keys."
68 | ]
69 | },
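{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, the maximum value for one of the surrogate keys can be fetched as follows (repeat for each of the tables listed above):\n",
"\n",
"```sql\n",
"SELECT max(order_id) FROM orders;\n",
"```"
]
},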
70 | {
71 | "cell_type": "markdown",
72 | "metadata": {},
73 | "source": [
74 | "### Exercise 2\n",
75 | "\n",
76 | "Commands to add sequences with `START WITH` pointing to the maximum value for the corresponding surrogate primary key fields. Make sure to use meaningful names for sequences **TABLENAME_SURROGATEFIELD_seq** (example: users_user_id_seq for users.user_id)."
77 | ]
78 | },
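{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hedged sketch, using the hypothetical users_user_id_seq from the naming example above:\n",
"\n",
"```sql\n",
"CREATE SEQUENCE users_user_id_seq\n",
"    START WITH 26; -- replace 26 based on the maximum value derived in Exercise 1\n",
"```"
]
},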
79 | {
80 | "cell_type": "markdown",
81 | "metadata": {},
82 | "source": [
83 | "### Exercise 3\n",
84 | "\n",
85 | "Commands to alter sequences to bind them to corresponding surrogate primary key fields."
86 | ]
87 | },
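{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch for one table, again using the hypothetical users table and users_user_id_seq (names are only for illustration):\n",
"\n",
"```sql\n",
"ALTER SEQUENCE users_user_id_seq\n",
"    OWNED BY users.user_id;\n",
"\n",
"ALTER TABLE users\n",
"    ALTER COLUMN user_id SET DEFAULT nextval('users_user_id_seq');\n",
"```"
]
},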
88 | {
89 | "cell_type": "markdown",
90 | "metadata": {},
91 | "source": [
92 | "### Exercise 4\n",
93 | "\n",
94 | "Add Foreign Key constraints to the tables.\n",
95 | "* Validate if the tables have data violating foreign key constraints (Hint: You can use a left outer join to find rows in the child table but not in the parent table)\n",
96 | "* Alter tables to add foreign keys as specified.\n",
97 | "* Here are the relationships for your reference.\n",
98 | " * orders.order_customer_id to customers.customer_id\n",
99 | " * order_items.order_item_order_id to orders.order_id\n",
100 | " * order_items.order_item_product_id to products.product_id\n",
101 | " * products.product_category_id to categories.category_id\n",
102 | " * categories.category_department_id to departments.department_id\n",
103 | "* Solution should contain the following:\n",
104 | " * Commands to add foreign keys to the tables."
105 | ]
106 | },
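{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hedged sketch for one of the relationships (orders to customers); the constraint name is only an assumed example:\n",
"\n",
"```sql\n",
"-- Orphaned child rows; this should return no rows before adding the constraint\n",
"SELECT o.order_customer_id, count(1)\n",
"FROM orders AS o\n",
"    LEFT OUTER JOIN customers AS c\n",
"        ON o.order_customer_id = c.customer_id\n",
"WHERE c.customer_id IS NULL\n",
"GROUP BY o.order_customer_id;\n",
"\n",
"-- Add the foreign key (repeat for the other relationships)\n",
"ALTER TABLE orders\n",
"    ADD CONSTRAINT orders_order_customer_id_fkey\n",
"    FOREIGN KEY (order_customer_id)\n",
"    REFERENCES customers (customer_id);\n",
"```"
]
},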
107 | {
108 | "cell_type": "markdown",
109 | "metadata": {},
110 | "source": [
111 | "### Exercise 5\n",
112 | "\n",
113 | "Queries to validate whether constraints are created or not. You can come up with queries against `information_schema` tables such as `columns`, `sequences` etc."
114 | ]
115 | }
116 | ],
117 | "metadata": {
118 | "kernelspec": {
119 | "display_name": "Python 3",
120 | "language": "python",
121 | "name": "python3"
122 | },
123 | "language_info": {
124 | "codemirror_mode": {
125 | "name": "ipython",
126 | "version": 3
127 | },
128 | "file_extension": ".py",
129 | "mimetype": "text/x-python",
130 | "name": "python",
131 | "nbconvert_exporter": "python",
132 | "pygments_lexer": "ipython3",
133 | "version": "3.6.12"
134 | }
135 | },
136 | "nbformat": 4,
137 | "nbformat_minor": 4
138 | }
139 |
--------------------------------------------------------------------------------
/05_partitioning_tables_and_indexes/01_partitioning_tables_and_indexes.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Partitioning Tables and Indexes\n",
8 | "\n",
9 | "As part of this section we will primarily talk about partitioning tables as well as indexes.\n",
10 | "\n",
11 | "* Overview of Partitioning\n",
12 | "* List Partitioning\n",
13 | "* Managing Partitions - List\n",
14 | "* Manipulating Data\n",
15 | "* Range Partitioning\n",
16 | "* Managing Partitions - Range\n",
17 | "* Repartitioning - Range\n",
18 | "* Hash Partitioning\n",
19 | "* Managing Partitions - Hash\n",
20 | "* Usage Scenarios\n",
21 | "* Sub Partitioning\n",
22 | "* Exercise - Partitioning Tables\n",
23 | "\n",
24 | "Here are the key objectives of this section.\n",
25 | "* Different partitioning strategies\n",
26 | "* How to create and manage partitioned tables?\n",
27 | "* How to manipulate data by inserting, updating and deleting data in partitioned tables?\n",
28 | "* How to repartition the tables if the partitioning strategy is changed (example: from yearly to monthly)?\n",
29 | "* Learn about sub partitioning (also known as nested or multi-level partitioning) with examples.\n",
30 | "* Self evaluate, using the exercises, whether one has understood the key skills related to partitioned tables."
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": null,
36 | "metadata": {},
37 | "outputs": [],
38 | "source": []
39 | }
40 | ],
41 | "metadata": {
42 | "kernelspec": {
43 | "display_name": "Python 3",
44 | "language": "python",
45 | "name": "python3"
46 | },
47 | "language_info": {
48 | "codemirror_mode": {
49 | "name": "ipython",
50 | "version": 3
51 | },
52 | "file_extension": ".py",
53 | "mimetype": "text/x-python",
54 | "name": "python",
55 | "nbconvert_exporter": "python",
56 | "pygments_lexer": "ipython3",
57 | "version": "3.6.12"
58 | }
59 | },
60 | "nbformat": 4,
61 | "nbformat_minor": 4
62 | }
63 |
--------------------------------------------------------------------------------
/05_partitioning_tables_and_indexes/02_overview_of_partitioning.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Overview of Partitioning\n",
8 | "\n",
9 | "Most of the modern database technologies support a wide variety of partitioning strategies. However, here are the most commonly used ones."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 14,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* List Partitioning\n",
44 | "* Range Partitioning\n",
45 | "* Hash Partitioning\n",
46 | "* List and Range are more widely used compared to Hash Partitioning.\n",
47 | "* We can also mix and match these to have multi-level partitioning. It is known as sub-partitioning.\n",
48 | "* We can either partition a table without a primary key, or partition a table with a primary key when the partition column is a prime attribute (one of the primary key columns).\n",
49 | "* Indexes can be added to partitioned tables. If we create an index on the main table, it is a global index; if we create an index on each partition, then it is a partitioned index. A small sketch is shown below.\n",
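"\n",
"A minimal sketch of the two approaches, assuming a table `users_part` that is already partitioned and has a partition `users_part_u1` (both names are only for illustration):\n",
"\n",
"```sql\n",
"-- Index created on the main (partitioned) table\n",
"CREATE INDEX users_part_email_idx\n",
"    ON users_part (user_email_id);\n",
"\n",
"-- Index created on an individual partition\n",
"CREATE INDEX users_part_u1_email_idx\n",
"    ON users_part_u1 (user_email_id);\n",
"```"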
50 | ]
51 | }
52 | ],
53 | "metadata": {
54 | "kernelspec": {
55 | "display_name": "Python 3",
56 | "language": "python",
57 | "name": "python3"
58 | },
59 | "language_info": {
60 | "codemirror_mode": {
61 | "name": "ipython",
62 | "version": 3
63 | },
64 | "file_extension": ".py",
65 | "mimetype": "text/x-python",
66 | "name": "python",
67 | "nbconvert_exporter": "python",
68 | "pygments_lexer": "ipython3",
69 | "version": "3.6.12"
70 | }
71 | },
72 | "nbformat": 4,
73 | "nbformat_minor": 4
74 | }
75 |
--------------------------------------------------------------------------------
/05_partitioning_tables_and_indexes/06_range_partitioning.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Range Partitioning\n",
8 | "\n",
9 | "Let us understand how we can take care of range partitioning of tables."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 56,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* It is primarily used to create partitions based up on a given range of values.\n",
44 | "* Here are the steps involved in creating table using range partitioning strategy.\n",
45 | " * Create table using `PARTITION BY RANGE`\n",
46 | " * Add default and range specific partitions\n",
47 | " * Validate by inserting data into the table\n",
48 | "* We can detach as well as drop the partitions from the table.\n",
49 | "\n",
50 | "\n",
51 | "### Create Partitioned Table\n",
52 | "\n",
53 | "Let us create partitioned table with name `users_range_part`.\n",
54 | "* It contains same columns as `users`.\n",
55 | "* We will partition the table based up on `created_dt` field.\n",
56 | "* We will create one partition per year with naming convention **users_range_part_yyyy** (users_range_part_2016)."
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": 57,
62 | "metadata": {},
63 | "outputs": [
64 | {
65 | "name": "stdout",
66 | "output_type": "stream",
67 | "text": [
68 | "The sql extension is already loaded. To reload it, use:\n",
69 | " %reload_ext sql\n"
70 | ]
71 | }
72 | ],
73 | "source": [
74 | "%load_ext sql"
75 | ]
76 | },
77 | {
78 | "cell_type": "code",
79 | "execution_count": 58,
80 | "metadata": {},
81 | "outputs": [
82 | {
83 | "name": "stdout",
84 | "output_type": "stream",
85 | "text": [
86 | "env: DATABASE_URL=postgresql://itversity_sms_user:sms_password@localhost:5432/itversity_sms_db\n"
87 | ]
88 | }
89 | ],
90 | "source": [
91 | "%env DATABASE_URL=postgresql://itversity_sms_user:sms_password@localhost:5432/itversity_sms_db"
92 | ]
93 | },
94 | {
95 | "cell_type": "code",
96 | "execution_count": 59,
97 | "metadata": {},
98 | "outputs": [
99 | {
100 | "name": "stdout",
101 | "output_type": "stream",
102 | "text": [
103 | " * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db\n",
104 | "Done.\n"
105 | ]
106 | },
107 | {
108 | "data": {
109 | "text/plain": [
110 | "[]"
111 | ]
112 | },
113 | "execution_count": 59,
114 | "metadata": {},
115 | "output_type": "execute_result"
116 | }
117 | ],
118 | "source": [
119 | "%sql DROP TABLE IF EXISTS users_range_part"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": 60,
125 | "metadata": {},
126 | "outputs": [
127 | {
128 | "name": "stdout",
129 | "output_type": "stream",
130 | "text": [
131 | " * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db\n",
132 | "Done.\n"
133 | ]
134 | },
135 | {
136 | "data": {
137 | "text/plain": [
138 | "[]"
139 | ]
140 | },
141 | "execution_count": 60,
142 | "metadata": {},
143 | "output_type": "execute_result"
144 | }
145 | ],
146 | "source": [
147 | "%%sql\n",
148 | "\n",
149 | "CREATE TABLE users_range_part (\n",
150 | " user_id SERIAL,\n",
151 | " user_first_name VARCHAR(30) NOT NULL,\n",
152 | " user_last_name VARCHAR(30) NOT NULL,\n",
153 | " user_email_id VARCHAR(50) NOT NULL,\n",
154 | " user_email_validated BOOLEAN DEFAULT FALSE,\n",
155 | " user_password VARCHAR(200),\n",
156 | " user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A\n",
157 | " is_active BOOLEAN DEFAULT FALSE,\n",
158 | " created_dt DATE DEFAULT CURRENT_DATE,\n",
159 | " last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,\n",
160 | " PRIMARY KEY (created_dt, user_id)\n",
161 | ") PARTITION BY RANGE(created_dt)"
162 | ]
163 | },
164 | {
165 | "cell_type": "markdown",
166 | "metadata": {},
167 | "source": [
168 | "```{note}\n",
169 | "We will not be able to insert the data until we add at least one partition.\n",
170 | "```\n",
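"\n",
"Here is a minimal sketch of adding a default partition and one yearly partition (the remaining years follow the same pattern; managing partitions is covered in the next topic):\n",
"\n",
"```sql\n",
"CREATE TABLE users_range_part_default\n",
"PARTITION OF users_range_part DEFAULT;\n",
"\n",
"CREATE TABLE users_range_part_2016\n",
"PARTITION OF users_range_part\n",
"FOR VALUES FROM ('2016-01-01') TO ('2017-01-01');\n",
"```"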
171 | ]
172 | }
173 | ],
174 | "metadata": {
175 | "kernelspec": {
176 | "display_name": "Python 3",
177 | "language": "python",
178 | "name": "python3"
179 | },
180 | "language_info": {
181 | "codemirror_mode": {
182 | "name": "ipython",
183 | "version": 3
184 | },
185 | "file_extension": ".py",
186 | "mimetype": "text/x-python",
187 | "name": "python",
188 | "nbconvert_exporter": "python",
189 | "pygments_lexer": "ipython3",
190 | "version": "3.6.12"
191 | }
192 | },
193 | "nbformat": 4,
194 | "nbformat_minor": 4
195 | }
196 |
--------------------------------------------------------------------------------
/05_partitioning_tables_and_indexes/09_hash_partitioning.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Hash Partitioning\n",
8 | "\n",
9 | "Let us understand how we can take care of Hash partitioning of tables."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 104,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* It is primarily used to create partitions based up on a modulus and remainder.\n",
44 | "* Here are the steps involved in creating a table using the hash partitioning strategy.\n",
45 | "    * Create table using `PARTITION BY HASH`\n",
46 | "    * Add remainder specific partitions based up on the modulus.\n",
47 | "    * Validate by inserting data into the table\n",
48 | "* We can detach as well as drop the partitions from the table.\n",
49 | "* Hash partitioning is typically done on sparse columns such as `user_id`.\n",
50 | "* If we want to use hash partitioning on more than one table with a common key, we typically partition all the tables using the same key.\n",
51 | "\n",
52 | "\n",
53 | "### Create Partitioned Table\n",
54 | "\n",
55 | "Let us create partitioned table with name `users_hash_part`.\n",
56 | "* It contains same columns as `users`.\n",
57 | "* We will partition the table based up on `user_id` field.\n",
58 | "* We will create one partition for each remainder with modulus 8."
59 | ]
60 | },
61 | {
62 | "cell_type": "code",
63 | "execution_count": 105,
64 | "metadata": {},
65 | "outputs": [
66 | {
67 | "name": "stdout",
68 | "output_type": "stream",
69 | "text": [
70 | "The sql extension is already loaded. To reload it, use:\n",
71 | " %reload_ext sql\n"
72 | ]
73 | }
74 | ],
75 | "source": [
76 | "%load_ext sql"
77 | ]
78 | },
79 | {
80 | "cell_type": "code",
81 | "execution_count": 106,
82 | "metadata": {},
83 | "outputs": [
84 | {
85 | "name": "stdout",
86 | "output_type": "stream",
87 | "text": [
88 | "env: DATABASE_URL=postgresql://itversity_sms_user:sms_password@localhost:5432/itversity_sms_db\n"
89 | ]
90 | }
91 | ],
92 | "source": [
93 | "%env DATABASE_URL=postgresql://itversity_sms_user:sms_password@localhost:5432/itversity_sms_db"
94 | ]
95 | },
96 | {
97 | "cell_type": "code",
98 | "execution_count": 107,
99 | "metadata": {},
100 | "outputs": [
101 | {
102 | "name": "stdout",
103 | "output_type": "stream",
104 | "text": [
105 | " * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db\n",
106 | "Done.\n"
107 | ]
108 | },
109 | {
110 | "data": {
111 | "text/plain": [
112 | "[]"
113 | ]
114 | },
115 | "execution_count": 107,
116 | "metadata": {},
117 | "output_type": "execute_result"
118 | }
119 | ],
120 | "source": [
121 | "%sql DROP TABLE IF EXISTS users_hash_part"
122 | ]
123 | },
124 | {
125 | "cell_type": "code",
126 | "execution_count": 108,
127 | "metadata": {},
128 | "outputs": [
129 | {
130 | "name": "stdout",
131 | "output_type": "stream",
132 | "text": [
133 | " * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db\n",
134 | "Done.\n"
135 | ]
136 | },
137 | {
138 | "data": {
139 | "text/plain": [
140 | "[]"
141 | ]
142 | },
143 | "execution_count": 108,
144 | "metadata": {},
145 | "output_type": "execute_result"
146 | }
147 | ],
148 | "source": [
149 | "%%sql\n",
150 | "\n",
151 | "CREATE TABLE users_hash_part (\n",
152 | " user_id SERIAL,\n",
153 | " user_first_name VARCHAR(30) NOT NULL,\n",
154 | " user_last_name VARCHAR(30) NOT NULL,\n",
155 | " user_email_id VARCHAR(50) NOT NULL,\n",
156 | " user_email_validated BOOLEAN DEFAULT FALSE,\n",
157 | " user_password VARCHAR(200),\n",
158 | " user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A\n",
159 | " is_active BOOLEAN DEFAULT FALSE,\n",
160 | " created_dt DATE DEFAULT CURRENT_DATE,\n",
161 | " last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,\n",
162 | " PRIMARY KEY (user_id)\n",
163 | ") PARTITION BY HASH(user_id)"
164 | ]
165 | },
166 | {
167 | "cell_type": "markdown",
168 | "metadata": {},
169 | "source": [
170 | "```{note}\n",
171 | "We will not be able to insert the data until we add at least one partition.\n",
172 | "```\n",
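"\n",
"A minimal sketch of adding the partitions, assuming modulus 8 (one statement per remainder, 0 through 7; the partition names are only a suggested convention):\n",
"\n",
"```sql\n",
"CREATE TABLE users_hash_part_0_of_8\n",
"PARTITION OF users_hash_part\n",
"FOR VALUES WITH (MODULUS 8, REMAINDER 0);\n",
"\n",
"CREATE TABLE users_hash_part_1_of_8\n",
"PARTITION OF users_hash_part\n",
"FOR VALUES WITH (MODULUS 8, REMAINDER 1);\n",
"\n",
"-- ... and so on up to REMAINDER 7\n",
"```"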
173 | ]
174 | }
175 | ],
176 | "metadata": {
177 | "kernelspec": {
178 | "display_name": "Python 3",
179 | "language": "python",
180 | "name": "python3"
181 | },
182 | "language_info": {
183 | "codemirror_mode": {
184 | "name": "ipython",
185 | "version": 3
186 | },
187 | "file_extension": ".py",
188 | "mimetype": "text/x-python",
189 | "name": "python",
190 | "nbconvert_exporter": "python",
191 | "pygments_lexer": "ipython3",
192 | "version": "3.6.12"
193 | }
194 | },
195 | "nbformat": 4,
196 | "nbformat_minor": 4
197 | }
198 |
--------------------------------------------------------------------------------
/05_partitioning_tables_and_indexes/11_usage_scenarios.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Usage Scenarios\n",
8 | "\n",
9 | "Let us go through some of the usage scenarios with respect to partitioning."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 131,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* It is typically used to manage large tables so that they do not grow abnormally over a period of time.\n",
44 | "* Partitioning is quite often used on top of log tables, reporting tables, etc.\n",
45 | "* If a log table is partitioned and we want to keep data for 7 years, partitions older than 7 years can be quickly dropped.\n",
46 | "* Dropping partitions to clean up a huge chunk of data is much faster compared to running a delete command on a non partitioned table. A sketch of dropping an old partition is shown below.\n",
47 | "* For tables like orders with a limited set of statuses, we often use list partitioning based up on the status. It can be 2 partitions (CLOSED orders and ACTIVE orders) or a separate partition for each status.\n",
48 | "    * As most of the operations will be on **Active Orders**, this approach can significantly improve the performance.\n",
49 | "* In case of log tables, where we might want to retain data for several years, we tend to use range partitioning on a date column. If we use list partitioning, we would need an extra column that duplicates the date information unnecessarily (see the note and examples below)."
50 | ]
51 | },
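{
"cell_type": "markdown",
"metadata": {},
"source": [
"For instance, an old yearly partition can be removed quickly by detaching and dropping it. This is only a sketch, reusing the hypothetical users_range_part_2016 partition from the range partitioning topic:\n",
"\n",
"```sql\n",
"-- Detach the partition if we want to archive or inspect the data before removal\n",
"ALTER TABLE users_range_part\n",
"    DETACH PARTITION users_range_part_2016;\n",
"\n",
"-- Then drop (or archive) the detached table\n",
"DROP TABLE users_range_part_2016;\n",
"```"
]
},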
52 | {
53 | "cell_type": "code",
54 | "execution_count": 132,
55 | "metadata": {},
56 | "outputs": [
57 | {
58 | "name": "stdout",
59 | "output_type": "stream",
60 | "text": [
61 | "The sql extension is already loaded. To reload it, use:\n",
62 | " %reload_ext sql\n"
63 | ]
64 | }
65 | ],
66 | "source": [
67 | "%load_ext sql"
68 | ]
69 | },
70 | {
71 | "cell_type": "code",
72 | "execution_count": 133,
73 | "metadata": {},
74 | "outputs": [
75 | {
76 | "name": "stdout",
77 | "output_type": "stream",
78 | "text": [
79 | "env: DATABASE_URL=postgresql://itversity_sms_user:sms_password@localhost:5432/itversity_sms_db\n"
80 | ]
81 | }
82 | ],
83 | "source": [
84 | "%env DATABASE_URL=postgresql://itversity_sms_user:sms_password@localhost:5432/itversity_sms_db"
85 | ]
86 | },
87 | {
88 | "cell_type": "markdown",
89 | "metadata": {},
90 | "source": [
91 | "```{note}\n",
92 | "Monthly partition using list. We need to have additional column to store the month to use list partitioning strategy.\n",
93 | "```"
94 | ]
95 | },
96 | {
97 | "cell_type": "code",
98 | "execution_count": 134,
99 | "metadata": {},
100 | "outputs": [
101 | {
102 | "name": "stdout",
103 | "output_type": "stream",
104 | "text": [
105 | " * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db\n",
106 | "Done.\n"
107 | ]
108 | },
109 | {
110 | "data": {
111 | "text/plain": [
112 | "[]"
113 | ]
114 | },
115 | "execution_count": 134,
116 | "metadata": {},
117 | "output_type": "execute_result"
118 | }
119 | ],
120 | "source": [
121 | "%%sql\n",
122 | "\n",
123 | "DROP TABLE IF EXISTS users_mthly"
124 | ]
125 | },
126 | {
127 | "cell_type": "code",
128 | "execution_count": 135,
129 | "metadata": {},
130 | "outputs": [
131 | {
132 | "name": "stdout",
133 | "output_type": "stream",
134 | "text": [
135 | " * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db\n",
136 | "Done.\n"
137 | ]
138 | },
139 | {
140 | "data": {
141 | "text/plain": [
142 | "[]"
143 | ]
144 | },
145 | "execution_count": 135,
146 | "metadata": {},
147 | "output_type": "execute_result"
148 | }
149 | ],
150 | "source": [
151 | "%%sql\n",
152 | "\n",
153 | "CREATE TABLE users_mthly (\n",
154 | " user_id SERIAL,\n",
155 | " user_first_name VARCHAR(30) NOT NULL,\n",
156 | " user_last_name VARCHAR(30) NOT NULL,\n",
157 | " user_email_id VARCHAR(50) NOT NULL,\n",
158 | " user_email_validated BOOLEAN DEFAULT FALSE,\n",
159 | " user_password VARCHAR(200),\n",
160 | " user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A\n",
161 | " is_active BOOLEAN DEFAULT FALSE,\n",
162 | " created_dt DATE DEFAULT CURRENT_DATE,\n",
163 | " created_mnth INT,\n",
164 | " last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,\n",
165 | " PRIMARY KEY (created_mnth, user_id)\n",
166 | ") PARTITION BY LIST(created_mnth)"
167 | ]
168 | },
169 | {
170 | "cell_type": "code",
171 | "execution_count": 136,
172 | "metadata": {},
173 | "outputs": [
174 | {
175 | "name": "stdout",
176 | "output_type": "stream",
177 | "text": [
178 | " * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db\n",
179 | "Done.\n"
180 | ]
181 | },
182 | {
183 | "data": {
184 | "text/plain": [
185 | "[]"
186 | ]
187 | },
188 | "execution_count": 136,
189 | "metadata": {},
190 | "output_type": "execute_result"
191 | }
192 | ],
193 | "source": [
194 | "%%sql\n",
195 | "\n",
196 | "CREATE TABLE users_mthly_201601\n",
197 | "PARTITION OF users_mthly\n",
198 | "FOR VALUES IN (201601)"
199 | ]
200 | },
201 | {
202 | "cell_type": "code",
203 | "execution_count": 137,
204 | "metadata": {},
205 | "outputs": [
206 | {
207 | "name": "stdout",
208 | "output_type": "stream",
209 | "text": [
210 | " * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db\n",
211 | "Done.\n"
212 | ]
213 | },
214 | {
215 | "data": {
216 | "text/plain": [
217 | "[]"
218 | ]
219 | },
220 | "execution_count": 137,
221 | "metadata": {},
222 | "output_type": "execute_result"
223 | }
224 | ],
225 | "source": [
226 | "%%sql\n",
227 | "\n",
228 | "CREATE TABLE users_mthly_201602\n",
229 | "PARTITION OF users_mthly\n",
230 | "FOR VALUES IN (201602)"
231 | ]
232 | },
233 | {
234 | "cell_type": "markdown",
235 | "metadata": {},
236 | "source": [
237 | "```{note}\n",
238 |     "Monthly partition using range. The partition strategy is defined on **created_dt**, so no additional column is required. Keep in mind that the upper bound of a range partition is exclusive, so a partition for January 2016 should span from '2016-01-01' up to (but not including) '2016-02-01'.\n",
239 | "```"
240 | ]
241 | },
242 | {
243 | "cell_type": "code",
244 | "execution_count": 138,
245 | "metadata": {},
246 | "outputs": [
247 | {
248 | "name": "stdout",
249 | "output_type": "stream",
250 | "text": [
251 | " * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db\n",
252 | "Done.\n"
253 | ]
254 | },
255 | {
256 | "data": {
257 | "text/plain": [
258 | "[]"
259 | ]
260 | },
261 | "execution_count": 138,
262 | "metadata": {},
263 | "output_type": "execute_result"
264 | }
265 | ],
266 | "source": [
267 | "%%sql\n",
268 | "\n",
269 | "DROP TABLE IF EXISTS users_mthly"
270 | ]
271 | },
272 | {
273 | "cell_type": "code",
274 | "execution_count": 139,
275 | "metadata": {},
276 | "outputs": [
277 | {
278 | "name": "stdout",
279 | "output_type": "stream",
280 | "text": [
281 | " * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db\n",
282 | "Done.\n"
283 | ]
284 | },
285 | {
286 | "data": {
287 | "text/plain": [
288 | "[]"
289 | ]
290 | },
291 | "execution_count": 139,
292 | "metadata": {},
293 | "output_type": "execute_result"
294 | }
295 | ],
296 | "source": [
297 | "%%sql\n",
298 | "\n",
299 | "CREATE TABLE users_mthly (\n",
300 | " user_id SERIAL,\n",
301 | " user_first_name VARCHAR(30) NOT NULL,\n",
302 | " user_last_name VARCHAR(30) NOT NULL,\n",
303 | " user_email_id VARCHAR(50) NOT NULL,\n",
304 | " user_email_validated BOOLEAN DEFAULT FALSE,\n",
305 | " user_password VARCHAR(200),\n",
306 | " user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A\n",
307 | " is_active BOOLEAN DEFAULT FALSE,\n",
308 | " created_dt DATE DEFAULT CURRENT_DATE,\n",
309 | " last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,\n",
310 | " PRIMARY KEY (created_dt, user_id)\n",
311 | ") PARTITION BY RANGE(created_dt)"
312 | ]
313 | },
314 | {
315 | "cell_type": "code",
316 | "execution_count": 140,
317 | "metadata": {},
318 | "outputs": [
319 | {
320 | "name": "stdout",
321 | "output_type": "stream",
322 | "text": [
323 | " * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db\n",
324 | "Done.\n"
325 | ]
326 | },
327 | {
328 | "data": {
329 | "text/plain": [
330 | "[]"
331 | ]
332 | },
333 | "execution_count": 140,
334 | "metadata": {},
335 | "output_type": "execute_result"
336 | }
337 | ],
338 | "source": [
339 | "%%sql\n",
340 | "\n",
341 | "CREATE TABLE users_mthly_201601\n",
342 | "PARTITION OF users_mthly\n",
343 |     "FOR VALUES FROM ('2016-01-01') TO ('2016-02-01')"
344 | ]
345 | },
346 | {
347 | "cell_type": "code",
348 | "execution_count": 141,
349 | "metadata": {},
350 | "outputs": [
351 | {
352 | "name": "stdout",
353 | "output_type": "stream",
354 | "text": [
355 | " * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db\n",
356 | "Done.\n"
357 | ]
358 | },
359 | {
360 | "data": {
361 | "text/plain": [
362 | "[]"
363 | ]
364 | },
365 | "execution_count": 141,
366 | "metadata": {},
367 | "output_type": "execute_result"
368 | }
369 | ],
370 | "source": [
371 | "%%sql\n",
372 | "\n",
373 | "CREATE TABLE users_mthly_201602\n",
374 | "PARTITION OF users_mthly\n",
375 |     "FOR VALUES FROM ('2016-02-01') TO ('2016-03-01')"
376 | ]
377 | }
378 | ],
379 | "metadata": {
380 | "kernelspec": {
381 | "display_name": "Python 3",
382 | "language": "python",
383 | "name": "python3"
384 | },
385 | "language_info": {
386 | "codemirror_mode": {
387 | "name": "ipython",
388 | "version": 3
389 | },
390 | "file_extension": ".py",
391 | "mimetype": "text/x-python",
392 | "name": "python",
393 | "nbconvert_exporter": "python",
394 | "pygments_lexer": "ipython3",
395 | "version": "3.6.12"
396 | }
397 | },
398 | "nbformat": 4,
399 | "nbformat_minor": 4
400 | }
401 |
--------------------------------------------------------------------------------
/05_partitioning_tables_and_indexes/13_exercises_partitioning_tables.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Exercises - Partitioning Tables\n",
8 | "\n",
9 |     "Here is an exercise to get comfortable with partitioning. We will be using range partitioning."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 155,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 |     "* Use the retail database. Make sure the **orders** table already exists.\n",
44 |     "* You can reset the database by running these commands.\n",
45 |     "* Connect to the retail database.\n",
46 | "\n",
47 | "```shell\n",
48 | "psql -U itversity_retail_user \\\n",
49 | " -h localhost \\\n",
50 | " -p 5432 \\\n",
51 | " -d itversity_retail_db \\\n",
52 | " -W\n",
53 | "```\n",
54 | "\n",
55 |     "* Run these commands or scripts to reset the tables. It will take care of recreating the **orders** table.\n",
56 | "\n",
57 | "```sql\n",
58 | "DROP TABLE IF EXISTS order_items;\n",
59 | "DROP TABLE IF EXISTS orders;\n",
60 | "DROP TABLE IF EXISTS customers;\n",
61 | "DROP TABLE IF EXISTS products;\n",
62 | "DROP TABLE IF EXISTS categories;\n",
63 | "DROP TABLE IF EXISTS departments;\n",
64 | "\n",
65 | "\n",
66 | "\\i /data/retail_db/create_db_tables_pg.sql\n",
67 | "\n",
68 | "\\i /data/retail_db/load_db_tables_pg.sql\n",
69 | "```"
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {},
75 | "source": [
76 | "### Exercise 1\n",
77 | "\n",
78 | "Create table **orders_part** with the same columns as orders.\n",
79 | "* Partition the table by month using range partitioning on **order_date**.\n",
80 |     "* Add 14 partitions - 13 based on the data and 1 default. Here is the naming convention.\n",
81 | " * Default - orders_part_default\n",
82 | " * Partition for 2014 January - orders_part_201401"
83 | ]
84 | },
85 | {
86 | "cell_type": "markdown",
87 | "metadata": {},
88 | "source": [
89 | "### Exercise 2\n",
90 | "\n",
91 | "Let us load and validate data in the partitioned table.\n",
92 | "* Load the data from **orders** into **orders_part**.\n",
93 |     "* Get the count on **orders_part** as well as on all the 14 partitions. You should get 0 for the default partition, and all the records should be distributed across the other 13 partitions.\n",
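    "\n",
    "If you are unsure how to get counts per partition, here is one possible way to validate the distribution (a sketch, not the full solution). `tableoid::regclass` resolves to the name of the partition each row is stored in.\n",
    "\n",
    "```sql\n",
    "SELECT tableoid::regclass AS partition_name,\n",
    "    count(1) AS order_count\n",
    "FROM orders_part\n",
    "GROUP BY 1\n",
    "ORDER BY 1;\n",
    "```"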
94 | ]
95 | }
96 | ],
97 | "metadata": {
98 | "kernelspec": {
99 | "display_name": "Python 3",
100 | "language": "python",
101 | "name": "python3"
102 | },
103 | "language_info": {
104 | "codemirror_mode": {
105 | "name": "ipython",
106 | "version": 3
107 | },
108 | "file_extension": ".py",
109 | "mimetype": "text/x-python",
110 | "name": "python",
111 | "nbconvert_exporter": "python",
112 | "pygments_lexer": "ipython3",
113 | "version": "3.6.12"
114 | }
115 | },
116 | "nbformat": 4,
117 | "nbformat_minor": 4
118 | }
119 |
--------------------------------------------------------------------------------
/06_predefined_functions/01_predefined_functions.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Pre-Defined Functions\n",
8 | "\n",
9 | "Let us go through the pre-defined functions available in Postgresql.\n",
10 | "\n",
11 | "* Overview of Pre-Defined Functions\n",
12 | "* String Manipulation Functions\n",
13 | "* Date Manipulation Functions\n",
14 | "* Overview of Numeric Functions\n",
15 | "* Data Type Conversion\n",
16 | "* Handling Null Values\n",
17 | "* Using CASE and WHEN\n",
18 | "* Exercises - Pre-Defined Functions\n",
19 | "\n",
20 | "Here are the key objectives of this section.\n",
21 |     "* How to use the official documentation of Postgres to get the syntax and semantics of the pre-defined functions?\n",
22 | "* Understand different categories of functions\n",
23 | "* How to use functions effectively using real world examples?\n",
24 | "* How to manipulate strings and dates?\n",
25 |     "* How to deal with nulls, convert data types, etc.?\n",
26 |     "* Self-evaluate by solving the exercises, using multiple functions in tandem."
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": null,
32 | "metadata": {},
33 | "outputs": [],
34 | "source": []
35 | }
36 | ],
37 | "metadata": {
38 | "kernelspec": {
39 | "display_name": "Python 3",
40 | "language": "python",
41 | "name": "python3"
42 | },
43 | "language_info": {
44 | "codemirror_mode": {
45 | "name": "ipython",
46 | "version": 3
47 | },
48 | "file_extension": ".py",
49 | "mimetype": "text/x-python",
50 | "name": "python",
51 | "nbconvert_exporter": "python",
52 | "pygments_lexer": "ipython3",
53 | "version": "3.6.12"
54 | }
55 | },
56 | "nbformat": 4,
57 | "nbformat_minor": 4
58 | }
59 |
--------------------------------------------------------------------------------
/07_writing_advanced_sql_queries/01_writing_advanced_sql_queries.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Writing Advanced SQL Queries\n",
8 | "\n",
9 | "As part of this section we will understand how to write queries using some of the advanced features.\n",
10 | "\n",
11 | "* Overview of Views\n",
12 | "* Overview of Sub Queries\n",
13 | "* CTAS - Create Table As Select\n",
14 | "* Advanced DML Operations\n",
15 | "* Merging or Upserting Data\n",
16 | "* Pivoting Rows into Columns\n",
17 | "* Overview of Analytic Functions\n",
18 | "* Analytic Functions – Aggregations\n",
19 | "* Cumulative Aggregations\n",
20 | "* Analytic Functions – Windowing\n",
21 | "* Analytic Functions – Ranking\n",
22 | "* Getting Top 5 Daily Products\n",
23 | "* Exercises - Analytic Functions"
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": null,
29 | "metadata": {},
30 | "outputs": [],
31 | "source": []
32 | }
33 | ],
34 | "metadata": {
35 | "kernelspec": {
36 | "display_name": "Python 3",
37 | "language": "python",
38 | "name": "python3"
39 | },
40 | "language_info": {
41 | "codemirror_mode": {
42 | "name": "ipython",
43 | "version": 3
44 | },
45 | "file_extension": ".py",
46 | "mimetype": "text/x-python",
47 | "name": "python",
48 | "nbconvert_exporter": "python",
49 | "pygments_lexer": "ipython3",
50 | "version": "3.6.12"
51 | }
52 | },
53 | "nbformat": 4,
54 | "nbformat_minor": 4
55 | }
56 |
--------------------------------------------------------------------------------
/07_writing_advanced_sql_queries/08_pivoting_rows_into_columns.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Pivoting Rows into Columns\n",
8 | "\n",
9 | "Let us understand how we can pivot rows into columns in Postgres."
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {
16 | "tags": [
17 | "remove-cell"
18 | ]
19 | },
20 | "outputs": [
21 | {
22 | "data": {
23 | "text/html": [
24 | "\n"
25 | ],
26 | "text/plain": [
27 | ""
28 | ]
29 | },
30 | "metadata": {},
31 | "output_type": "display_data"
32 | }
33 | ],
34 | "source": [
35 | "%%HTML\n",
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "* Actual results\n",
44 | "\n",
45 | "|order_date|order_status|count|\n",
46 | "|----------|------------|-----|\n",
47 | "|2013-07-25 00:00:00|CANCELED|1|\n",
48 | "|2013-07-25 00:00:00|CLOSED|20|\n",
49 | "|2013-07-25 00:00:00|COMPLETE|42|\n",
50 | "|2013-07-25 00:00:00|ON_HOLD|5|\n",
51 | "|2013-07-25 00:00:00|PAYMENT_REVIEW|3|\n",
52 | "|2013-07-25 00:00:00|PENDING|13|\n",
53 | "|2013-07-25 00:00:00|PENDING_PAYMENT|41|\n",
54 | "|2013-07-25 00:00:00|PROCESSING|16|\n",
55 | "|2013-07-25 00:00:00|SUSPECTED_FRAUD|2|\n",
56 | "|2013-07-26 00:00:00|CANCELED|3|\n",
57 | "|2013-07-26 00:00:00|CLOSED|29|\n",
58 | "|2013-07-26 00:00:00|COMPLETE|87|\n",
59 | "|2013-07-26 00:00:00|ON_HOLD|19|\n",
60 | "|2013-07-26 00:00:00|PAYMENT_REVIEW|6|\n",
61 | "|2013-07-26 00:00:00|PENDING|31|\n",
62 | "|2013-07-26 00:00:00|PENDING_PAYMENT|59|\n",
63 | "|2013-07-26 00:00:00|PROCESSING|30|\n",
64 | "|2013-07-26 00:00:00|SUSPECTED_FRAUD|5|\n",
65 | "\n",
66 | "* Pivoted results\n",
67 | "\n",
68 | "|order_date|CANCELED|CLOSED|COMPLETE|ON_HOLD|PAYMENT_REVIEW|PENDING|PENDING_PAYMENT|PROCESSING|SUSPECTED_FRAUD|\n",
69 | "|----------|--------|------|--------|-------|--------------|-------|---------------|----------|---------------|\n",
70 | "|2013-07-25|1|20|42|5|3|13|41|16|2|\n",
71 | "|2013-07-26|3|29|87|19|6|31|59|30|5|\n",
72 | "\n",
73 |     "* We need to use `crosstab` as part of the `FROM` clause to pivot the data. We need to pass the main query (which returns the row identifier, the category, and the value) as the first argument to the `crosstab` function, and a query returning the distinct categories as the second argument.\n",
74 |     "* We need to install the `tablefunc` extension as a Postgres superuser to expose functions like `crosstab` - `CREATE EXTENSION tablefunc;`\n",
75 | "\n",
76 | "```{note}\n",
77 |     "If you are using the environment provided by us, you don't need to install `tablefunc`. If you are using your own environment, run this command by logging into the Postgres server as a superuser to install `tablefunc`.\n",
78 | "\n",
79 | "`CREATE EXTENSION tablefunc;`\n",
80 | "\n",
81 |     "However, in some cases you might have to run additional setup scripts in Postgres. Refer to the official documentation for details.\n",
82 | "```"
83 | ]
84 | },
85 | {
86 | "cell_type": "code",
87 | "execution_count": null,
88 | "metadata": {},
89 | "outputs": [],
90 | "source": [
91 | "%load_ext sql"
92 | ]
93 | },
94 | {
95 | "cell_type": "code",
96 | "execution_count": null,
97 | "metadata": {},
98 | "outputs": [],
99 | "source": [
100 | "%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db"
101 | ]
102 | },
103 | {
104 | "cell_type": "code",
105 | "execution_count": null,
106 | "metadata": {},
107 | "outputs": [],
108 | "source": [
109 | "%%sql\n",
110 | "\n",
111 | "SELECT order_date,\n",
112 | " order_status,\n",
113 | " count(1)\n",
114 | "FROM orders\n",
115 | "GROUP BY order_date,\n",
116 | " order_status\n",
117 | "ORDER BY order_date,\n",
118 | " order_status\n",
119 | "LIMIT 18"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": null,
125 | "metadata": {},
126 | "outputs": [],
127 | "source": [
128 | "%%sql\n",
129 | "\n",
130 | "SELECT * FROM crosstab(\n",
131 | " 'SELECT order_date,\n",
132 | " order_status,\n",
133 | " count(1) AS order_count\n",
134 | " FROM orders\n",
135 | " GROUP BY order_date,\n",
136 | " order_status',\n",
137 | " 'SELECT DISTINCT order_status FROM orders ORDER BY 1'\n",
138 | ") AS (\n",
139 | " order_date DATE,\n",
140 | " \"CANCELED\" INT,\n",
141 | " \"CLOSED\" INT,\n",
142 | " \"COMPLETE\" INT,\n",
143 | " \"ON_HOLD\" INT,\n",
144 | " \"PAYMENT_REVIEW\" INT,\n",
145 | " \"PENDING\" INT,\n",
146 | " \"PENDING_PAYMENT\" INT,\n",
147 | " \"PROCESSING\" INT,\n",
148 | " \"SUSPECTED_FRAUD\" INT\n",
149 | ")\n",
150 | "LIMIT 10"
151 | ]
152 | }
153 | ],
154 | "metadata": {
155 | "kernelspec": {
156 | "display_name": "Python 3",
157 | "language": "python",
158 | "name": "python3"
159 | },
160 | "language_info": {
161 | "codemirror_mode": {
162 | "name": "ipython",
163 | "version": 3
164 | },
165 | "file_extension": ".py",
166 | "mimetype": "text/x-python",
167 | "name": "python",
168 | "nbconvert_exporter": "python",
169 | "pygments_lexer": "ipython3",
170 | "version": "3.6.12"
171 | }
172 | },
173 | "nbformat": 4,
174 | "nbformat_minor": 4
175 | }
176 |
--------------------------------------------------------------------------------
/08_query_performance_tuning/01_query_performance_tuning.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Query Performance Tuning\n",
8 | "\n",
9 | "As part of this section we will go through basic performance tuning techniques with respect to queries.\n",
10 | "\n",
11 | "* Preparing Database\n",
12 | "* Interpreting Explain Plans\n",
13 | "* Overview of Cost Based Optimizer\n",
14 | "* Performance Tuning using Indexes\n",
15 | "* Criteria for indexes\n",
16 | "* Criteria for Partitioning\n",
17 | "* Writing Queries – Partition Pruning\n",
18 | "* Overview of Query Hints"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": null,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": []
27 | }
28 | ],
29 | "metadata": {
30 | "kernelspec": {
31 | "display_name": "Python 3",
32 | "language": "python",
33 | "name": "python3"
34 | },
35 | "language_info": {
36 | "codemirror_mode": {
37 | "name": "ipython",
38 | "version": 3
39 | },
40 | "file_extension": ".py",
41 | "mimetype": "text/x-python",
42 | "name": "python",
43 | "nbconvert_exporter": "python",
44 | "pygments_lexer": "ipython3",
45 | "version": "3.6.12"
46 | }
47 | },
48 | "nbformat": 4,
49 | "nbformat_minor": 4
50 | }
51 |
--------------------------------------------------------------------------------
/08_query_performance_tuning/02_preparing_database.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Preparing Database\n",
8 | "\n",
9 |     "Let us prepare the retail tables to come up with the solution for the problem statement.\n",
10 |     "* Ensure that we have the required database and user for retail data. We might provide the database as part of our labs.\n",
11 | "\n",
12 | "```shell\n",
13 | "psql -U postgres -h localhost -p 5432 -W\n",
14 | "```\n",
15 | "\n",
16 | "```sql\n",
17 | "CREATE DATABASE itversity_retail_db;\n",
18 | "CREATE USER itversity_retail_user WITH ENCRYPTED PASSWORD 'retail_password';\n",
19 | "GRANT ALL ON DATABASE itversity_retail_db TO itversity_retail_user;\n",
20 | "```\n",
21 | "\n",
22 | "* Create Tables using the script provided. You can either use `psql` or **SQL Alchemy**.\n",
23 | "\n",
24 | "```shell\n",
25 | "psql -U itversity_retail_user \\\n",
26 | " -h localhost \\\n",
27 | " -p 5432 \\\n",
28 | " -d itversity_retail_db \\\n",
29 | " -W\n",
30 | "\n",
31 | "\\i /data/retail_db/create_db_tables_pg.sql\n",
32 | "```\n",
33 | "\n",
34 | "* Data shall be loaded using the script provided.\n",
35 | "\n",
36 | "```shell\n",
37 | "\\i /data/retail_db/load_db_tables_pg.sql\n",
38 | "```\n",
39 | "\n",
40 | "* Run queries to validate we have data in all the 6 tables."
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": null,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "%load_ext sql"
50 | ]
51 | },
52 | {
53 | "cell_type": "code",
54 | "execution_count": null,
55 | "metadata": {},
56 | "outputs": [],
57 | "source": [
58 | "%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db"
59 | ]
60 | },
61 | {
62 | "cell_type": "code",
63 | "execution_count": null,
64 | "metadata": {},
65 | "outputs": [],
66 | "source": [
67 | "%sql SELECT * FROM departments LIMIT 10"
68 | ]
69 | },
70 | {
71 | "cell_type": "code",
72 | "execution_count": null,
73 | "metadata": {},
74 | "outputs": [],
75 | "source": [
76 | "%sql SELECT * FROM categories LIMIT 10"
77 | ]
78 | },
79 | {
80 | "cell_type": "code",
81 | "execution_count": null,
82 | "metadata": {},
83 | "outputs": [],
84 | "source": [
85 | "%sql SELECT * FROM products LIMIT 10"
86 | ]
87 | },
88 | {
89 | "cell_type": "code",
90 | "execution_count": null,
91 | "metadata": {},
92 | "outputs": [],
93 | "source": [
94 | "%sql SELECT * FROM orders LIMIT 10"
95 | ]
96 | },
97 | {
98 | "cell_type": "code",
99 | "execution_count": null,
100 | "metadata": {},
101 | "outputs": [],
102 | "source": [
103 | "%sql SELECT * FROM order_items LIMIT 10"
104 | ]
105 | },
106 | {
107 | "cell_type": "code",
108 | "execution_count": null,
109 | "metadata": {},
110 | "outputs": [],
111 | "source": [
112 | "%sql SELECT * FROM customers LIMIT 10"
113 | ]
114 | }
115 | ],
116 | "metadata": {
117 | "kernelspec": {
118 | "display_name": "Python 3",
119 | "language": "python",
120 | "name": "python3"
121 | },
122 | "language_info": {
123 | "codemirror_mode": {
124 | "name": "ipython",
125 | "version": 3
126 | },
127 | "file_extension": ".py",
128 | "mimetype": "text/x-python",
129 | "name": "python",
130 | "nbconvert_exporter": "python",
131 | "pygments_lexer": "ipython3",
132 | "version": "3.6.12"
133 | }
134 | },
135 | "nbformat": 4,
136 | "nbformat_minor": 4
137 | }
138 |
--------------------------------------------------------------------------------
/08_query_performance_tuning/03_interpreting_explain_plans.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Interpreting Explain Plans\n",
8 | "\n",
9 |     "Let us review the explain plans below and understand the key terms which will help us in interpreting them.\n",
10 | "* Seq Scan\n",
11 | "* Index Scan\n",
12 | "* Nested Loop\n",
13 | "\n",
14 | "Here are the explain plans for different queries.\n",
15 | "* Explain plan for query to get number of orders.\n",
16 | "\n",
17 | "```sql\n",
18 | "EXPLAIN\n",
19 | "SELECT count(1) FROM orders;\n",
20 | "```\n",
21 | "\n",
22 | "```text\n",
23 | " QUERY PLAN\n",
24 | "-------------------------------------------------------------------\n",
25 | " Aggregate (cost=1386.04..1386.05 rows=1 width=8)\n",
26 | " -> Seq Scan on orders (cost=0.00..1213.83 rows=68883 width=0)\n",
27 | "(2 rows)\n",
28 | "```\n",
29 | "\n",
30 | "* Explain plan for query to get number of orders by date.\n",
31 | "\n",
32 | "```sql\n",
33 | "EXPLAIN\n",
34 | "SELECT order_date, count(1) AS order_count\n",
35 | "FROM orders\n",
36 | "GROUP BY order_date;\n",
37 | "```\n",
38 | "\n",
39 | "```text\n",
40 | " QUERY PLAN\n",
41 | "-------------------------------------------------------------------\n",
42 | " HashAggregate (cost=1558.24..1561.88 rows=364 width=16)\n",
43 | " Group Key: order_date\n",
44 | " -> Seq Scan on orders (cost=0.00..1213.83 rows=68883 width=8)\n",
45 | "(3 rows)\n",
46 | "```\n",
47 | "\n",
48 | "* Explain plan for query to get order details for a given order id.\n",
49 | "\n",
50 | "```sql\n",
51 | "EXPLAIN\n",
52 | "SELECT * FROM orders\n",
53 | "WHERE order_id = 2;\n",
54 | "```\n",
55 | "\n",
56 | "```text\n",
57 | " QUERY PLAN\n",
58 | "---------------------------------------------------------------------------\n",
59 | " Index Scan using orders_pkey on orders (cost=0.29..8.31 rows=1 width=26)\n",
60 | " Index Cond: (order_id = 2)\n",
61 | "(2 rows)\n",
62 | "```\n",
63 | "\n",
64 | "* Explain plan for query to get order and order item details for a given order id.\n",
65 | "\n",
66 | "```sql\n",
67 | "EXPLAIN\n",
68 | "SELECT o.*,\n",
69 | " oi.order_item_subtotal\n",
70 | "FROM orders o JOIN order_items oi\n",
71 | " ON o.order_id = oi.order_item_order_id\n",
72 | "WHERE o.order_id = 2;\n",
73 | "```\n",
74 | "\n",
75 | "```text\n",
76 | " QUERY PLAN\n",
77 | "-----------------------------------------------------------------------------------\n",
78 | " Nested Loop (cost=0.29..3427.82 rows=4 width=34)\n",
79 | " -> Index Scan using orders_pkey on orders o (cost=0.29..8.31 rows=1 width=26)\n",
80 | " Index Cond: (order_id = 2)\n",
81 | " -> Seq Scan on order_items oi (cost=0.00..3419.47 rows=4 width=12)\n",
82 | " Filter: (order_item_order_id = 2)\n",
83 | "(5 rows)\n",
84 | "```\n",
85 | "\n",
86 | "```{note}\n",
87 |     "We should understand the order in which query plans are interpreted - they are read from the innermost (most indented) node outward, so the most deeply nested operation is performed first and its results feed the nodes above it.\n",
88 | "```\n",
89 | "\n",
90 | "* Explain plan for a query with multiple joins\n",
91 | "\n",
92 | "```sql\n",
93 | "EXPLAIN\n",
94 | "SELECT \n",
95 | " o.order_date,\n",
96 | " d.department_id,\n",
97 | " d.department_name,\n",
98 | " c.category_name,\n",
99 | " p.product_name,\n",
100 | " round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue\n",
101 | "FROM orders o\n",
102 | " JOIN order_items oi\n",
103 | " ON o.order_id = oi.order_item_order_id\n",
104 | " JOIN products p\n",
105 | " ON p.product_id = oi.order_item_product_id\n",
106 | " JOIN categories c\n",
107 | " ON c.category_id = p.product_category_id\n",
108 | " JOIN departments d\n",
109 | " ON d.department_id = c.category_department_id\n",
110 | "GROUP BY\n",
111 | " o.order_date,\n",
112 | " d.department_id,\n",
113 | " d.department_name,\n",
114 | " c.category_id,\n",
115 | " c.category_name,\n",
116 | " p.product_id,\n",
117 | " p.product_name\n",
118 | "ORDER BY o.order_date,\n",
119 | " revenue DESC;\n",
120 | "```\n",
121 | "\n",
122 | "```text\n",
123 | " QUERY PLAN\n",
124 | "--------------------------------------------------------------------------------------------------------------------------------------\n",
125 | " Sort (cost=76368.54..76799.03 rows=172198 width=211)\n",
126 | " Sort Key: o.order_date, (round((sum(oi.order_item_subtotal))::numeric, 2)) DESC\n",
127 | " -> Finalize GroupAggregate (cost=25958.31..43735.23 rows=172198 width=211)\n",
128 | " Group Key: o.order_date, d.department_id, c.category_id, p.product_id\n",
129 | " -> Gather Merge (cost=25958.31..39886.09 rows=101293 width=187)\n",
130 | " Workers Planned: 1\n",
131 | " -> Partial GroupAggregate (cost=24958.30..27490.62 rows=101293 width=187)\n",
132 | " Group Key: o.order_date, d.department_id, c.category_id, p.product_id\n",
133 | " -> Sort (cost=24958.30..25211.53 rows=101293 width=187)\n",
134 | " Sort Key: o.order_date, d.department_id, c.category_id, p.product_id\n",
135 | " -> Hash Join (cost=2495.48..7188.21 rows=101293 width=187)\n",
136 | " Hash Cond: (c.category_department_id = d.department_id)\n",
137 | " -> Hash Join (cost=2472.43..6897.32 rows=101293 width=79)\n",
138 | " Hash Cond: (p.product_category_id = c.category_id)\n",
139 | " -> Hash Join (cost=2470.13..6609.69 rows=101293 width=63)\n",
140 | " Hash Cond: (oi.order_item_product_id = p.product_id)\n",
141 | " -> Hash Join (cost=2411.87..6284.70 rows=101293 width=20)\n",
142 | " Hash Cond: (oi.order_item_order_id = o.order_id)\n",
143 | " -> Parallel Seq Scan on order_items oi (cost=0.00..2279.93 rows=101293 width=16)\n",
144 | " -> Hash (cost=1213.83..1213.83 rows=68883 width=12)\n",
145 | " -> Seq Scan on orders o (cost=0.00..1213.83 rows=68883 width=12)\n",
146 | " -> Hash (cost=41.45..41.45 rows=1345 width=47)\n",
147 | " -> Seq Scan on products p (cost=0.00..41.45 rows=1345 width=47)\n",
148 | " -> Hash (cost=1.58..1.58 rows=58 width=20)\n",
149 | " -> Seq Scan on categories c (cost=0.00..1.58 rows=58 width=20)\n",
150 | " -> Hash (cost=15.80..15.80 rows=580 width=112)\n",
151 | " -> Seq Scan on departments d (cost=0.00..15.80 rows=580 width=112)\n",
152 | "(27 rows)\n",
153 | "```"
154 | ]
155 | }
156 | ],
157 | "metadata": {
158 | "kernelspec": {
159 | "display_name": "Python 3",
160 | "language": "python",
161 | "name": "python3"
162 | },
163 | "language_info": {
164 | "codemirror_mode": {
165 | "name": "ipython",
166 | "version": 3
167 | },
168 | "file_extension": ".py",
169 | "mimetype": "text/x-python",
170 | "name": "python",
171 | "nbconvert_exporter": "python",
172 | "pygments_lexer": "ipython3",
173 | "version": "3.6.12"
174 | }
175 | },
176 | "nbformat": 4,
177 | "nbformat_minor": 4
178 | }
179 |
--------------------------------------------------------------------------------
/08_query_performance_tuning/04_overview_of_cost_based_optimizer.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Overview of Cost Based Optimizer\n",
8 | "\n",
9 | "Let us get an overview of cost based optimizer.\n",
10 |     "* Databases use a cost based optimizer to generate explain plans. In the earlier days, they used rule based optimizers.\n",
11 |     "* For the cost based optimizer to generate an optimal explain plan, we need to ensure that statistics about the data in our tables are collected at regular intervals.\n",
12 |     "* We can analyze tables to collect statistics. Typically DBAs schedule statistics collection at regular intervals.\n",
13 |     "* In some cases we might have to compute statistics on the tables used in the query we are trying to tune. The database user needs to have permissions to compute statistics.\n",
14 | "* Here are some of the basic statistics typically collected.\n",
15 | " * Approximate number of records at table level.\n",
16 | " * Approximate number of unique records at index level.\n",
17 |     "* When explain plans are generated, these statistics are used by the cost based optimizer to come up with the most optimal plan for our query.\n",
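    "\n",
    "Here is a minimal sketch of collecting and inspecting statistics for the `orders` table in our retail database. `pg_class` and `pg_stats` are standard Postgres catalogs.\n",
    "\n",
    "```sql\n",
    "-- collect (or refresh) statistics for the table\n",
    "ANALYZE orders;\n",
    "\n",
    "-- approximate number of records and pages at table level\n",
    "SELECT relname, reltuples, relpages\n",
    "FROM pg_class\n",
    "WHERE relname = 'orders';\n",
    "\n",
    "-- per column statistics such as approximate number of distinct values\n",
    "SELECT attname, n_distinct\n",
    "FROM pg_stats\n",
    "WHERE tablename = 'orders';\n",
    "```"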
18 | ]
19 | }
20 | ],
21 | "metadata": {
22 | "kernelspec": {
23 | "display_name": "Python 3",
24 | "language": "python",
25 | "name": "python3"
26 | },
27 | "language_info": {
28 | "codemirror_mode": {
29 | "name": "ipython",
30 | "version": 3
31 | },
32 | "file_extension": ".py",
33 | "mimetype": "text/x-python",
34 | "name": "python",
35 | "nbconvert_exporter": "python",
36 | "pygments_lexer": "ipython3",
37 | "version": "3.6.12"
38 | }
39 | },
40 | "nbformat": 4,
41 | "nbformat_minor": 4
42 | }
43 |
--------------------------------------------------------------------------------
/08_query_performance_tuning/05_performance_tuning_using_indexes.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Performance Tuning using Indexes\n",
8 | "\n",
9 |     "Let us understand how we can improve the performance of the query by creating an index on order_items.order_item_order_id.\n",
10 | "\n",
11 | "* We have order level details in orders and item level details in order_items.\n",
12 |     "* When customers want to review their orders, they need details about order_items. In almost all the scenarios in an order management system, we prefer to get both order as well as order_items details by passing the order_id of pending or outstanding orders.\n",
13 |     "* Let us review the explain plan for the query without an index on order_items.order_item_order_id.\n",
14 | "\n",
15 | "```sql\n",
16 | "EXPLAIN\n",
17 | "SELECT o.*,\n",
18 | " oi.order_item_subtotal\n",
19 | "FROM orders o JOIN order_items oi\n",
20 | " ON o.order_id = oi.order_item_order_id\n",
21 | "WHERE o.order_id = 2;\n",
22 | "```\n",
23 | "\n",
24 |     "```text\n",
25 | " QUERY PLAN\n",
26 | "-----------------------------------------------------------------------------------\n",
27 | " Nested Loop (cost=0.29..3427.82 rows=3 width=34)\n",
28 | " -> Index Scan using orders_pkey on orders o (cost=0.29..8.31 rows=1 width=26)\n",
29 | " Index Cond: (order_id = 2)\n",
30 | " -> Seq Scan on order_items oi (cost=0.00..3419.47 rows=3 width=12)\n",
31 | " Filter: (order_item_order_id = 2)\n",
32 | "(5 rows)\n",
33 | "```\n",
34 | "\n",
35 |     "* Develop a piece of code to randomly pass 2000 order ids and calculate the time taken."
36 | ]
37 | },
38 | {
39 | "cell_type": "code",
40 | "execution_count": null,
41 | "metadata": {},
42 | "outputs": [],
43 | "source": [
44 | "!pip install psycopg2"
45 | ]
46 | },
47 | {
48 | "cell_type": "code",
49 | "execution_count": null,
50 | "metadata": {},
51 | "outputs": [],
52 | "source": [
53 | "import psycopg2"
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": null,
59 | "metadata": {},
60 | "outputs": [],
61 | "source": [
62 |     "%%time\n",
    "\n",
    "from random import randrange\n",
63 | "connection = psycopg2.connect(\n",
64 | " host='localhost',\n",
65 | " port='5432',\n",
66 | " database='itversity_retail_db',\n",
67 | " user='itversity_retail_user',\n",
68 | " password='retail_password'\n",
69 | ")\n",
70 | "cursor = connection.cursor()\n",
71 | "query = '''SELECT count(1) \n",
72 | "FROM orders o JOIN order_items oi \n",
73 | " ON o.order_id = oi.order_item_order_id\n",
74 | "WHERE o.order_id = %s\n",
75 | "'''\n",
76 | "ctr = 0\n",
77 | "while True:\n",
78 | " if ctr == 2000:\n",
79 | " break\n",
80 |     "    order_id = randrange(1, 68883)\n",
    "    cursor.execute(query, (order_id,))\n",
81 | " ctr += 1\n",
82 | "cursor.close()\n",
83 | "connection.close()"
84 | ]
85 | },
86 | {
87 | "cell_type": "markdown",
88 | "metadata": {},
89 | "source": [
90 | "* Create index on order_items.order_item_order_id"
91 | ]
92 | },
93 | {
94 | "cell_type": "code",
95 | "execution_count": null,
96 | "metadata": {},
97 | "outputs": [],
98 | "source": [
99 | "%load_ext sql"
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": null,
105 | "metadata": {},
106 | "outputs": [],
107 | "source": [
108 | "%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db"
109 | ]
110 | },
111 | {
112 | "cell_type": "code",
113 | "execution_count": null,
114 | "metadata": {},
115 | "outputs": [],
116 | "source": [
117 | "%%sql\n",
118 | "\n",
119 | "CREATE INDEX order_items_order_id_idx \n",
120 | "ON order_items(order_item_order_id);"
121 | ]
122 | },
123 | {
124 | "cell_type": "markdown",
125 | "metadata": {},
126 | "source": [
127 | "* Run explain plan after creating index on order_items.order_item_order_id\n",
128 | "\n",
129 | "```sql\n",
130 | "EXPLAIN\n",
131 | "SELECT o.*,\n",
132 | " oi.order_item_subtotal\n",
133 | "FROM orders o JOIN order_items oi\n",
134 | " ON o.order_id = oi.order_item_order_id\n",
135 | "WHERE o.order_id = 2;\n",
136 | "```\n",
137 | "\n",
138 | "```text\n",
139 | " QUERY PLAN\n",
140 | "------------------------------------------------------------------------------------------------------\n",
141 | " Nested Loop (cost=0.71..16.81 rows=3 width=34)\n",
142 | " -> Index Scan using orders_pkey on orders o (cost=0.29..8.31 rows=1 width=26)\n",
143 | " Index Cond: (order_id = 2)\n",
144 | " -> Index Scan using order_items_order_id_idx on order_items oi (cost=0.42..8.47 rows=3 width=12)\n",
145 | " Index Cond: (order_item_order_id = 2)\n",
146 | "(5 rows)\n",
147 | "```\n",
148 | "\n",
149 |     "* Run the code again to see how much time it takes to get the results for 2000 random orders."
150 | ]
151 | },
152 | {
153 | "cell_type": "code",
154 | "execution_count": null,
155 | "metadata": {},
156 | "outputs": [],
157 | "source": [
158 | "import psycopg2"
159 | ]
160 | },
161 | {
162 | "cell_type": "code",
163 | "execution_count": null,
164 | "metadata": {},
165 | "outputs": [],
166 | "source": [
167 | "%%time\n",
168 | "\n",
169 | "from random import randrange\n",
170 | "connection = psycopg2.connect(\n",
171 | " host='localhost',\n",
172 | " port='5432',\n",
173 | " database='itversity_retail_db',\n",
174 | " user='itversity_retail_user',\n",
175 | " password='retail_password'\n",
176 | ")\n",
177 | "cursor = connection.cursor()\n",
178 | "query = '''SELECT count(1) \n",
179 | "FROM orders o JOIN order_items oi \n",
180 | " ON o.order_id = oi.order_item_order_id\n",
181 | "WHERE o.order_id = %s\n",
182 | "'''\n",
183 | "ctr = 0\n",
184 | "while True:\n",
185 | " if ctr == 2000:\n",
186 | " break\n",
187 | " order_id = randrange(1, 68883)\n",
188 | " cursor.execute(query, (order_id,))\n",
189 | " ctr += 1\n",
190 | "cursor.close()\n",
191 | "connection.close()"
192 | ]
193 | },
194 | {
195 | "cell_type": "markdown",
196 | "metadata": {},
197 | "source": [
198 | "```{warning}\n",
199 | "Keep in mind that having indexes on tables can have negative impact on write operations.\n",
200 | "```"
201 | ]
202 | }
203 | ],
204 | "metadata": {
205 | "kernelspec": {
206 | "display_name": "Python 3",
207 | "language": "python",
208 | "name": "python3"
209 | },
210 | "language_info": {
211 | "codemirror_mode": {
212 | "name": "ipython",
213 | "version": 3
214 | },
215 | "file_extension": ".py",
216 | "mimetype": "text/x-python",
217 | "name": "python",
218 | "nbconvert_exporter": "python",
219 | "pygments_lexer": "ipython3",
220 | "version": "3.6.12"
221 | }
222 | },
223 | "nbformat": 4,
224 | "nbformat_minor": 4
225 | }
226 |
--------------------------------------------------------------------------------
/08_query_performance_tuning/07_criteria_for_partitioning.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Criteria for Partitioning\n",
8 | "\n",
9 | "Let us understand how we can leverage partitioning to fine tune the performance.\n",
10 | "* Partitioning is another key strategy to boost the performance of the queries.\n",
11 |     "* It is extensively used as a key performance tuning strategy for tables created to support reporting requirements.\n",
12 |     "* Even in transactional systems, we can leverage partitioning as one of the performance tuning techniques while dealing with large tables.\n",
13 |     "* For application log tables, we might want to discard all the irrelevant data after a specific time period. If partitioning is used, we can detach and/or drop the partitions quickly.\n",
14 |     "* Over a period of time most of the orders will be in **CLOSED** status. We can partition the table using list partitioning to ensure that all the **CLOSED** orders are moved to another partition. It can improve the performance of the activity related to active orders.\n",
15 | "* In case of reporting databases, we might partition the transaction tables at daily level so that we can easily filter and process data to pre-aggregate and store in the reporting data marts.\n",
16 |     "* Most of the tables in an ODS or Data Lake will be timestamped and partitioned at daily or monthly level so that we can remove or archive old partitions easily.\n",
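    "\n",
    "Here is a minimal sketch of the detach and drop pattern for purging old data. The table and partition names (`app_logs`, `app_logs_201601`) are hypothetical; the point is that removing a whole partition avoids a slow `DELETE` over millions of rows.\n",
    "\n",
    "```sql\n",
    "-- take the old partition out of the partitioned table\n",
    "ALTER TABLE app_logs DETACH PARTITION app_logs_201601;\n",
    "\n",
    "-- archive it if needed, then drop it\n",
    "DROP TABLE app_logs_201601;\n",
    "```"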
17 | ]
18 | }
19 | ],
20 | "metadata": {
21 | "kernelspec": {
22 | "display_name": "Python 3",
23 | "language": "python",
24 | "name": "python3"
25 | },
26 | "language_info": {
27 | "codemirror_mode": {
28 | "name": "ipython",
29 | "version": 3
30 | },
31 | "file_extension": ".py",
32 | "mimetype": "text/x-python",
33 | "name": "python",
34 | "nbconvert_exporter": "python",
35 | "pygments_lexer": "ipython3",
36 | "version": "3.6.12"
37 | }
38 | },
39 | "nbformat": 4,
40 | "nbformat_minor": 4
41 | }
42 |
--------------------------------------------------------------------------------
/08_query_performance_tuning/08_writing_queries_partition_pruning.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Writing Queries – Partition Pruning\n",
8 | "\n",
9 |     "Let us understand how to write queries by leveraging partitioning.\n",
10 |     "* Make sure to include a condition on the partitioned column.\n",
11 |     "* An equality condition on the partition key will yield the best results.\n",
12 |     "* Queries with a condition on the partition key will result in partition pruning. The data from the other partitions will be ignored completely.\n",
13 |     "* As partition pruning results in less I/O, the overall performance of such queries will improve drastically.\n",
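    "\n",
    "Here is a minimal sketch of checking partition pruning, assuming a hypothetical table `orders_part` that is range partitioned by month on `order_date`. The explain plan should only scan the partition covering the filtered date.\n",
    "\n",
    "```sql\n",
    "EXPLAIN\n",
    "SELECT count(1)\n",
    "FROM orders_part\n",
    "WHERE order_date = '2014-01-15';\n",
    "```"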
14 | ]
15 | }
16 | ],
17 | "metadata": {
18 | "kernelspec": {
19 | "display_name": "Python 3",
20 | "language": "python",
21 | "name": "python3"
22 | },
23 | "language_info": {
24 | "codemirror_mode": {
25 | "name": "ipython",
26 | "version": 3
27 | },
28 | "file_extension": ".py",
29 | "mimetype": "text/x-python",
30 | "name": "python",
31 | "nbconvert_exporter": "python",
32 | "pygments_lexer": "ipython3",
33 | "version": "3.6.12"
34 | }
35 | },
36 | "nbformat": 4,
37 | "nbformat_minor": 4
38 | }
39 |
--------------------------------------------------------------------------------
/08_query_performance_tuning/09_overview_of_query_hints.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Overview of Query Hints\n",
8 | "\n",
9 | "Let us get an overview of query hints.\n",
10 |     "* We can specify a hint using /*+ HINT */ as part of the query. Keep in mind that vanilla Postgres ignores such comments; hints are enabled by the `pg_hint_plan` extension.\n",
11 |     "* Make sure there are no typos in the hint.\n",
12 |     "* If there are typos, or if the indexes specified as part of the hint do not exist, the hint will be ignored.\n",
13 |     "* In case of complex queries, the CBO might use an incorrect index or an inappropriate join.\n",
14 |     "* As experts, if we are sure that the query should be using a particular index or join type, then we can force the optimizer to choose that index or join type by leveraging a hint.\n",
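    "\n",
    "Here is a minimal sketch of a hint, assuming the `pg_hint_plan` extension is installed and loaded; the hint syntax below follows pg_hint_plan conventions and the index name is the one created earlier on order_items. Without the extension, the comment is simply ignored.\n",
    "\n",
    "```sql\n",
    "/*+ IndexScan(oi order_items_order_id_idx) */\n",
    "EXPLAIN\n",
    "SELECT o.*,\n",
    "    oi.order_item_subtotal\n",
    "FROM orders o JOIN order_items oi\n",
    "    ON o.order_id = oi.order_item_order_id\n",
    "WHERE o.order_id = 2;\n",
    "```"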
15 | ]
16 | }
17 | ],
18 | "metadata": {
19 | "kernelspec": {
20 | "display_name": "Python 3",
21 | "language": "python",
22 | "name": "python3"
23 | },
24 | "language_info": {
25 | "codemirror_mode": {
26 | "name": "ipython",
27 | "version": 3
28 | },
29 | "file_extension": ".py",
30 | "mimetype": "text/x-python",
31 | "name": "python",
32 | "nbconvert_exporter": "python",
33 | "pygments_lexer": "ipython3",
34 | "version": "3.6.12"
35 | }
36 | },
37 | "nbformat": 4,
38 | "nbformat_minor": 4
39 | }
40 |
--------------------------------------------------------------------------------
/08_query_performance_tuning/10_exercises_tuning_queries.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Exercises - Tuning Queries\n",
8 | "\n",
9 |     "As part of this exercise, you need to prepare the data set, go through the explain plan, and come up with the right indexes to tune the performance.\n",
10 | "\n",
11 |     "* As of now the customer email id in the customers table contains the same value (**XXXXXXXXX**) for every customer.\n",
12 | "* Let us update customer_email_id.\n",
13 | " * Use initial (first character) of customer_fname\n",
14 | " * Use full string of customer_lname\n",
15 | " * Use row_number by grouping or partitioning the data by first character of customer_fname and full customer_lname then sort it by customer_id.\n",
16 | " * Make sure row_number is at least 3 digits, if not pad with 0 and concatenate to email id. Here are the examples\n",
17 | " * Also make sure email ids are in upper case.\n",
18 |     "\n",
    "|customer_id|customer_fname|customer_lname|rank|customer_email|\n",
19 | "|-----------|--------------|--------------|----|--------------|\n",
20 | "|11591|Ann|Alexander|1|AALEXANDER001@SOME.COM|\n",
21 | "|12031|Ashley|Benitez|1|ABENITEZ001@SOME.COM|\n",
22 | "|11298|Anthony|Best|1|ABEST001@SOME.COM|\n",
23 | "|11304|Alexander|Campbell|1|ACAMPBELL001@SOME.COM|\n",
24 | "|11956|Alan|Campos|1|ACAMPOS001@SOME.COM|\n",
25 | "|12075|Aaron|Carr|1|ACARR001@SOME.COM|\n",
26 | "|12416|Aaron|Cline|1|ACLINE001@SOME.COM|\n",
27 | "|10967|Alexander|Cunningham|1|ACUNNINGHAM001@SOME.COM|\n",
28 | "|12216|Ann|Deleon|1|ADELEON001@SOME.COM|\n",
29 | "|11192|Andrew|Dickson|1|ADICKSON001@SOME.COM|\n",
30 |     "\n",
    "* Let us assume that customer care will try to search for customer details using at least the first 4 characters of the email id.\n",
31 |     "* Generate the explain plan for the query.\n",
32 |     "* Create a unique index on customer_email.\n",
33 |     "* Generate the explain plan again and review the differences."
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": 1,
39 | "metadata": {},
40 | "outputs": [
41 | {
42 | "name": "stdout",
43 | "output_type": "stream",
44 | "text": [
45 | "env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db\n"
46 | ]
47 | }
48 | ],
49 | "source": [
50 | "%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": 2,
56 | "metadata": {},
57 | "outputs": [],
58 | "source": [
59 | "%load_ext sql"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 14,
65 | "metadata": {},
66 | "outputs": [
67 | {
68 | "name": "stdout",
69 | "output_type": "stream",
70 | "text": [
71 | " * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db\n",
72 | "10 rows affected.\n"
73 | ]
74 | },
75 | {
76 | "data": {
158 | "text/plain": [
159 | "[(11591, 'Ann', 'Alexander', 1, 'AALEXANDER001@SOME.COM'),\n",
160 | " (12031, 'Ashley', 'Benitez', 1, 'ABENITEZ001@SOME.COM'),\n",
161 | " (11298, 'Anthony', 'Best', 1, 'ABEST001@SOME.COM'),\n",
162 | " (11304, 'Alexander', 'Campbell', 1, 'ACAMPBELL001@SOME.COM'),\n",
163 | " (11956, 'Alan', 'Campos', 1, 'ACAMPOS001@SOME.COM'),\n",
164 | " (12075, 'Aaron', 'Carr', 1, 'ACARR001@SOME.COM'),\n",
165 | " (12416, 'Aaron', 'Cline', 1, 'ACLINE001@SOME.COM'),\n",
166 | " (10967, 'Alexander', 'Cunningham', 1, 'ACUNNINGHAM001@SOME.COM'),\n",
167 | " (12216, 'Ann', 'Deleon', 1, 'ADELEON001@SOME.COM'),\n",
168 | " (11192, 'Andrew', 'Dickson', 1, 'ADICKSON001@SOME.COM')]"
169 | ]
170 | },
171 | "execution_count": 14,
172 | "metadata": {},
173 | "output_type": "execute_result"
174 | }
175 | ],
176 | "source": [
177 | "%%sql\n",
178 | "\n",
179 | "SELECT q.*,\n",
180 | " upper(concat(substring(customer_fname, 1, 1), customer_lname, lpad(rnk::varchar, 3, '0'), '@SOME.COM')) AS customer_email\n",
181 | "FROM ( \n",
182 | " SELECT customer_id,\n",
183 | " customer_fname,\n",
184 | " customer_lname,\n",
185 | " rank() OVER (\n",
186 | " PARTITION BY substring(customer_fname, 1, 1), customer_lname\n",
187 | " ORDER BY customer_id\n",
188 | " ) AS rnk\n",
189 | " FROM customers\n",
190 | ") q\n",
191 | "ORDER BY customer_email\n",
192 | "LIMIT 10"
193 | ]
194 | }
195 | ],
196 | "metadata": {
197 | "kernelspec": {
198 | "display_name": "Python 3",
199 | "language": "python",
200 | "name": "python3"
201 | },
202 | "language_info": {
203 | "codemirror_mode": {
204 | "name": "ipython",
205 | "version": 3
206 | },
207 | "file_extension": ".py",
208 | "mimetype": "text/x-python",
209 | "name": "python",
210 | "nbconvert_exporter": "python",
211 | "pygments_lexer": "ipython3",
212 | "version": "3.6.12"
213 | }
214 | },
215 | "nbformat": 4,
216 | "nbformat_minor": 4
217 | }
218 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # mastering-postgresql
2 | Content related to Mastering Postgresql along with videos.
3 |
4 | Here are the steps to contribute for this project.
5 | * Clone the repository
6 | * Create Python virtual environment - `python3 -m venv itvm-env`
7 | * Activate the environment - `source itvm-env/bin/activate`
8 | * Run these commands to install all the dependencies to create Jupyter Books based on Python Notebooks.
9 | ```shell
10 | pip install jupyterlab
11 | pip install jupyter_book
12 | pip install ghp-import
13 | ```
14 | * You can go to the directory which contains files such as `_toc.yml`.
15 | * Build the book using `jb build .`. As part of the build process, it will generate static HTML files based on the content.
16 | * Publish the book to GitHub pages using `ghp-import -n -p -c postgresql.itversity.com -f _build/html`
17 | * `-n` to ensure that jekyll is not used.
18 | * `-c` is for specifying custom domain for the GitHub based static website that is generated.
19 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | title : Mastering SQL using Postgresql
2 | author : Durga Gadiraju
3 | copyright : "ITVersity, Inc"
4 |
5 | repository:
6 | url : https://github.com/itversity/mastering-postgresql
7 | html:
8 | use_repository_button : true
9 | use_issues_button : true
10 | use_edit_page_button : true
11 | extra_navbar : Subscribe to our Newsletter
12 | google_analytics_id : UA-80990145-12
13 |
14 | exclude_patterns : [_build, README.md, "**.ipynb_checkpoints"]
15 |
16 | execute:
17 | execute_notebooks : off
18 |
--------------------------------------------------------------------------------
/_toc.yml:
--------------------------------------------------------------------------------
1 | - file: mastering-sql-using-postgresql
2 |
3 | - file: mastering_postgresql_exercises
4 |
5 | - file: 01_getting_started/01_getting_started
6 | sections:
7 | - file: 01_getting_started/02_connecting_to_database
8 | - file: 01_getting_started/03_using_psql
9 | - file: 01_getting_started/04_setup_postgres_using_docker
10 | - file: 01_getting_started/05_setup_sql_workbench
11 | - file: 01_getting_started/06_sql_workbench_and_postgres
12 | - file: 01_getting_started/07_sql_workbench_features
13 | - file: 01_getting_started/08_data_loading_utilities
14 | - file: 01_getting_started/09_loading_data_postgres_in_docker
15 | - file: 01_getting_started/10_exercise_loading_data
16 |
17 | - file: 02_dml_or_crud_operations/01_dml_or_crud_operations
18 | sections:
19 | - file: 02_dml_or_crud_operations/02_normalization_principles
20 | - file: 02_dml_or_crud_operations/03_tables_as_relations
21 | - file: 02_dml_or_crud_operations/04_overview_of_database_operations
22 | - file: 02_dml_or_crud_operations/05_crud_operations
23 | - file: 02_dml_or_crud_operations/06_creating_table
24 | - file: 02_dml_or_crud_operations/07_inserting_data
25 | - file: 02_dml_or_crud_operations/08_updating_data
26 | - file: 02_dml_or_crud_operations/09_deleting_data
27 | - file: 02_dml_or_crud_operations/10_overview_of_transactions
28 | - file: 02_dml_or_crud_operations/11_exercises_database_operations
29 |
30 | - file: 03_writing_basic_sql_queries/01_writing_basic_sql_queries
31 | sections:
32 | - file: 03_writing_basic_sql_queries/02_standard_transformations
33 | - file: 03_writing_basic_sql_queries/03_overview_of_data_model
34 | - file: 03_writing_basic_sql_queries/04_define_problem_statement
35 | - file: 03_writing_basic_sql_queries/05_preparing_tables
36 | - file: 03_writing_basic_sql_queries/06_selecting_or_projecting_data
37 | - file: 03_writing_basic_sql_queries/07_filtering_data
38 | - file: 03_writing_basic_sql_queries/08_joining_tables_inner
39 | - file: 03_writing_basic_sql_queries/09_joining_tables_outer
40 | - file: 03_writing_basic_sql_queries/10_performing_aggregations
41 | - file: 03_writing_basic_sql_queries/11_sorting_data
42 | - file: 03_writing_basic_sql_queries/12_solution_daily_product_revenue
43 | - file: 03_writing_basic_sql_queries/13_exercises_basic_sql_queries
44 |
45 | - file: 04_creating_tables_and_indexes/01_creating_tables_and_indexes
46 | sections:
47 | - file: 04_creating_tables_and_indexes/02_data_definition_language
48 | - file: 04_creating_tables_and_indexes/03_overview_of_data_types
49 | - file: 04_creating_tables_and_indexes/04_adding_or_modifying_columns
50 | - file: 04_creating_tables_and_indexes/05_different_types_of_constraints
51 | - file: 04_creating_tables_and_indexes/06_managing_constraints
52 | - file: 04_creating_tables_and_indexes/07_indexes_on_tables
53 | - file: 04_creating_tables_and_indexes/08_indexes_for_constraints
54 | - file: 04_creating_tables_and_indexes/09_overview_of_sequences
55 | - file: 04_creating_tables_and_indexes/10_truncating_tables
56 | - file: 04_creating_tables_and_indexes/11_dropping_tables
57 | - file: 04_creating_tables_and_indexes/12_exercises_managing_db_objects
58 |
59 | - file: 05_partitioning_tables_and_indexes/01_partitioning_tables_and_indexes
60 | sections:
61 | - file: 05_partitioning_tables_and_indexes/02_overview_of_partitioning
62 | - file: 05_partitioning_tables_and_indexes/03_list_partitioning
63 | - file: 05_partitioning_tables_and_indexes/04_managing_partitions_list
64 | - file: 05_partitioning_tables_and_indexes/05_manipulating_data
65 | - file: 05_partitioning_tables_and_indexes/06_range_partitioning
66 | - file: 05_partitioning_tables_and_indexes/07_managing_partitions_range
67 | - file: 05_partitioning_tables_and_indexes/08_repartitioning_range
68 | - file: 05_partitioning_tables_and_indexes/09_hash_partitioning
69 | - file: 05_partitioning_tables_and_indexes/10_managing_partitions_hash
70 | - file: 05_partitioning_tables_and_indexes/11_usage_scenarios
71 | - file: 05_partitioning_tables_and_indexes/12_sub_partitioning
72 | - file: 05_partitioning_tables_and_indexes/13_exercises_partitioning_tables
73 |
74 | - file: 06_predefined_functions/01_predefined_functions
75 | sections:
76 | - file: 06_predefined_functions/02_overview_of_predefined_functions
77 | - file: 06_predefined_functions/03_string_manipulation_functions
78 | - file: 06_predefined_functions/04_date_manipulation_functions
79 | - file: 06_predefined_functions/05_overview_of_numeric_functions
80 | - file: 06_predefined_functions/06_data_type_conversion
81 | - file: 06_predefined_functions/07_handling_null_values
82 | - file: 06_predefined_functions/08_using_case_and_when
83 | - file: 06_predefined_functions/09_exercises_predefined_functions
84 |
85 | - file: 07_writing_advanced_sql_queries/01_writing_advanced_sql_queries
86 | sections:
87 | - file: 07_writing_advanced_sql_queries/02_overview_of_views
88 | - file: 07_writing_advanced_sql_queries/03_named_queries_using_with_clause
89 | - file: 07_writing_advanced_sql_queries/04_overview_of_sub_queries
90 | - file: 07_writing_advanced_sql_queries/05_create_table_as_select
91 | - file: 07_writing_advanced_sql_queries/06_advanced_dml_operations
92 | - file: 07_writing_advanced_sql_queries/07_merging_or_upserting_data
93 | - file: 07_writing_advanced_sql_queries/08_pivoting_rows_into_columns
94 | - file: 07_writing_advanced_sql_queries/09_overview_of_analytic_functions
95 | - file: 07_writing_advanced_sql_queries/10_analytic_functions_aggregations
96 | - file: 07_writing_advanced_sql_queries/11_cumulative_or_moving_aggregations
97 | - file: 07_writing_advanced_sql_queries/12_analytic_functions_windowing
98 | - file: 07_writing_advanced_sql_queries/13_analytic_functions_ranking
99 | - file: 07_writing_advanced_sql_queries/14_analytic_funcions_filtering
100 | - file: 07_writing_advanced_sql_queries/15_ranking_and_filtering_recap
101 | - file: 07_writing_advanced_sql_queries/16_exercises_analytic_functions
102 |
103 | - file: 08_query_performance_tuning/01_query_performance_tuning
104 | sections:
105 | - file: 08_query_performance_tuning/02_preparing_database
106 | - file: 08_query_performance_tuning/03_interpreting_explain_plans
107 | - file: 08_query_performance_tuning/04_overview_of_cost_based_optimizer
108 | - file: 08_query_performance_tuning/05_performance_tuning_using_indexes
109 | - file: 08_query_performance_tuning/06_criteria_for_indexing
110 | - file: 08_query_performance_tuning/07_criteria_for_partitioning
111 | - file: 08_query_performance_tuning/08_writing_queries_partition_pruning
112 | - file: 08_query_performance_tuning/09_overview_of_query_hints
113 | - file: 08_query_performance_tuning/10_exercises_tuning_queries
114 |
115 | - file: mastering_postgresql_exercises
--------------------------------------------------------------------------------
/bonus_data_warehousing_concepts/01_data_warehousing_concepts.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Data Warehousing Concepts"
8 | ]
9 | }
10 | ],
11 | "metadata": {
12 | "kernelspec": {
13 | "display_name": "Python 3",
14 | "language": "python",
15 | "name": "python3"
16 | },
17 | "language_info": {
18 | "codemirror_mode": {
19 | "name": "ipython",
20 | "version": 3
21 | },
22 | "file_extension": ".py",
23 | "mimetype": "text/x-python",
24 | "name": "python",
25 | "nbconvert_exporter": "python",
26 | "pygments_lexer": "ipython3",
27 | "version": "3.6.12"
28 | }
29 | },
30 | "nbformat": 4,
31 | "nbformat_minor": 4
32 | }
33 |
--------------------------------------------------------------------------------
/bonus_data_warehousing_concepts/02_overview_of_oltp_applications.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Overview of OLTP Applications\n",
8 | "\n",
9 | "Let us get an overview of OLTP Applications.\n",
10 |     "* OLTP stands for Online Transaction Processing.\n",
11 |     "* Here are some examples of OLTP Applications.\n",
12 | " * Order Management System in eCommerce platforms.\n",
13 | " * Point of Sale (PoS) system for brick and mortar retail outlets.\n",
14 | " * Banking applications.\n",
15 | " * Mobile billing applications.\n",
16 |     "* Tables in databases for OLTP Applications are typically designed as per normalization principles. The tables will usually be in 3NF or BCNF.\n",
17 |     "* Here are the databases most commonly used to support OLTP Applications.\n",
18 | " * Oracle\n",
19 | " * SQL Server\n",
20 | " * DB2\n",
21 | " * Informix\n",
22 | " * Sybase\n",
23 | " * Postgresql\n",
24 | " * MySQL\n",
25 | "* Applications are typically built using programming language based frameworks.\n",
26 | " * Java - J2EE, Spring Boot\n",
27 | " * Python - Flask, Django\n",
28 | " * Ruby - Rails\n",
29 | " * PHP - Laravel\n",
30 | " * Scala - Play\n",
31 |     " * JavaScript - Node.js\n",
32 | "* We might run basic reports such as customer's order history as part of OLTP applications.\n",
33 |     "* Even though it is possible to run enterprise level reports at the customer, product, category or department level using an OLTP application, it should be avoided, as such reports can have an adverse impact on the core application."
34 | ]
35 | }
36 | ],
37 | "metadata": {
38 | "kernelspec": {
39 | "display_name": "Python 3",
40 | "language": "python",
41 | "name": "python3"
42 | },
43 | "language_info": {
44 | "codemirror_mode": {
45 | "name": "ipython",
46 | "version": 3
47 | },
48 | "file_extension": ".py",
49 | "mimetype": "text/x-python",
50 | "name": "python",
51 | "nbconvert_exporter": "python",
52 | "pygments_lexer": "ipython3",
53 | "version": "3.6.12"
54 | }
55 | },
56 | "nbformat": 4,
57 | "nbformat_minor": 4
58 | }
59 |
--------------------------------------------------------------------------------
/bonus_data_warehousing_concepts/03_data_warehouse_architecture.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Data Warehouse Architecture\n",
8 | "\n",
9 | "Let us get an overview of Data Warehouse Architecture. We will also get into the details related to different components of Data Warehouse.\n",
10 | "* ODS - Operational Data Store\n",
11 | " * Ad hoc queries for data analysis.\n",
12 | " * Extended data storage for compliance purposes.\n",
13 | " * Golden copy of source data to troubleshoot data quality issues and bugs in reports.\n",
14 | "* Enterprise Data Warehouse with Data Marts\n",
15 | " * Contains data as per modeled data marts.\n",
16 | " * Ad hoc queries or reports for data analysis.\n",
17 | " * Standard reports as per the business requirements.\n",
18 | "* There might be tables pre-aggregated as per the required granularity for different reports and dashboards.\n",
19 | "\n",
20 | "Here are the common architectural patterns with respect to getting data into Data Warehouse.\n",
21 | "* ETL - Extract, Transform, Load\n",
22 | "* ELT - Extract, Load, Transform\n",
23 |     "* In ETL, data processing or transformations are typically handled by dedicated servers or tools such as Informatica, Talend etc.\n",
24 |     "* In ELT, data processing is typically done after loading the data into the target system, leveraging the compute capacity of the target, as illustrated in the sketch below.\n",
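25 |     "\n",
26 |     "Here is a minimal ELT-style sketch in SQL. It assumes the raw `orders` and `order_items` tables from this course's `retail_db` dataset have already been loaded into the target database, and the target table name `daily_product_revenue` is illustrative. The transformation runs inside the target using `CREATE TABLE AS SELECT`.\n",
27 |     "\n",
28 |     "```sql\n",
29 |     "-- ELT: transform already-loaded raw data inside the target database\n",
30 |     "CREATE TABLE daily_product_revenue AS\n",
31 |     "SELECT o.order_date,\n",
32 |     "    oi.order_item_product_id,\n",
33 |     "    round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue\n",
34 |     "FROM orders AS o\n",
35 |     "    JOIN order_items AS oi\n",
36 |     "        ON o.order_id = oi.order_item_order_id\n",
37 |     "WHERE o.order_status IN ('COMPLETE', 'CLOSED')\n",
38 |     "GROUP BY o.order_date,\n",
39 |     "    oi.order_item_product_id;\n",
40 |     "```"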
25 | ]
26 | },
27 | {
28 | "cell_type": "code",
29 | "execution_count": null,
30 | "metadata": {},
31 | "outputs": [],
32 | "source": []
33 | }
34 | ],
35 | "metadata": {
36 | "kernelspec": {
37 | "display_name": "Python 3",
38 | "language": "python",
39 | "name": "python3"
40 | },
41 | "language_info": {
42 | "codemirror_mode": {
43 | "name": "ipython",
44 | "version": 3
45 | },
46 | "file_extension": ".py",
47 | "mimetype": "text/x-python",
48 | "name": "python",
49 | "nbconvert_exporter": "python",
50 | "pygments_lexer": "ipython3",
51 | "version": "3.6.12"
52 | }
53 | },
54 | "nbformat": 4,
55 | "nbformat_minor": 4
56 | }
57 |
--------------------------------------------------------------------------------
/bonus_data_warehousing_concepts/04_overview_of_data_lake.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Overview of Data Lake\n",
8 | "\n",
9 | "Let us get an overview of Data Lake and its role in modern analytics applications.\n",
10 |     "* One of the major limitations of traditional data warehouse technologies is coupled storage and compute.\n",
11 |     "* The major recent change in this area is the decoupling of storage and compute.\n",
12 |     "* A data lake is nothing but low cost storage. It is typically object storage.\n",
13 | " * AWS - s3\n",
14 | " * Azure - Blob\n",
15 | " * GCP - Cloud Storage\n",
16 |     " * On Prem - Open source object storage\n",
17 |     "* The raw data is typically stored in the data lake. There might be multiple layers within the data lake for different categories of data.\n",
18 | " * Raw Data\n",
19 | " * Intermediate Data\n",
20 | " * Final or Curated Data fine tuned for reporting purposes.\n",
21 | "* Data is typically ingested into Data Lake using different technologies for different types of sources.\n",
22 |     " * Data ingestion from files is typically done by using tools like NiFi or ETL tools such as Talend.\n",
23 |     " * Data ingestion from databases is typically done by using ETL tools, Spark over JDBC, Sqoop etc.\n",
24 |     " * Data ingestion in real time or in micro batches is typically done by using tools like GoldenGate, Attunity, Kafka, Kinesis etc.\n",
25 | "* Here are the details related to Data Processing.\n",
26 | " * Batch data processing is typically done using distributed compute frameworks such as Spark.\n",
27 | " * Streaming data processing is typically done using streaming technologies such as Kafka Streams, Spark Structured Streaming etc.\n",
28 | " "
29 | ]
30 | },
31 | {
32 | "cell_type": "code",
33 | "execution_count": null,
34 | "metadata": {},
35 | "outputs": [],
36 | "source": []
37 | }
38 | ],
39 | "metadata": {
40 | "kernelspec": {
41 | "display_name": "Python 3",
42 | "language": "python",
43 | "name": "python3"
44 | },
45 | "language_info": {
46 | "codemirror_mode": {
47 | "name": "ipython",
48 | "version": 3
49 | },
50 | "file_extension": ".py",
51 | "mimetype": "text/x-python",
52 | "name": "python",
53 | "nbconvert_exporter": "python",
54 | "pygments_lexer": "ipython3",
55 | "version": "3.6.12"
56 | }
57 | },
58 | "nbformat": 4,
59 | "nbformat_minor": 4
60 | }
61 |
--------------------------------------------------------------------------------
/bonus_data_warehousing_concepts/05_key_data_warehouse_concepts.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Key Data Warehouse Concepts\n",
8 | "\n",
9 | "Let us go through the key data warehouse concepts.\n",
10 | "* Facts\n",
11 | " * Fact tables typically contain dimension keys and measures.\n",
12 |     " * Measures are nothing but numeric metrics captured at the grain defined by the dimension keys.\n",
13 |     "* Dimensions - descriptive entities such as date or product by which measures are analyzed.\n",
14 |     "* Measures - numeric values such as revenue or quantity that can be aggregated.\n",
15 |     "* Granularity - the level of detail at which measures are stored, for example one row per day per product.\n",
16 |     "* Hierarchy - levels within a dimension, such as day, month, quarter and year, which can be rolled up as shown in the query sketch below.\n",
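17 |     "\n",
18 |     "To relate these concepts, here is a small illustrative query, assuming the `orders` and `order_items` tables from this course's `retail_db` dataset. The measure is revenue, the grain of the result is one row per month and product, and rolling the order date up to the month walks the date hierarchy.\n",
19 |     "\n",
20 |     "```sql\n",
21 |     "-- Revenue (measure) at the month/product grain\n",
22 |     "SELECT to_char(o.order_date, 'yyyy-MM') AS order_month,\n",
23 |     "    oi.order_item_product_id,\n",
24 |     "    round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue\n",
25 |     "FROM orders AS o\n",
26 |     "    JOIN order_items AS oi\n",
27 |     "        ON o.order_id = oi.order_item_order_id\n",
28 |     "GROUP BY to_char(o.order_date, 'yyyy-MM'),\n",
29 |     "    oi.order_item_product_id;\n",
30 |     "```"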
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": null,
22 | "metadata": {},
23 | "outputs": [],
24 | "source": [
25 | " "
26 | ]
27 | }
28 | ],
29 | "metadata": {
30 | "kernelspec": {
31 | "display_name": "Python 3",
32 | "language": "python",
33 | "name": "python3"
34 | },
35 | "language_info": {
36 | "codemirror_mode": {
37 | "name": "ipython",
38 | "version": 3
39 | },
40 | "file_extension": ".py",
41 | "mimetype": "text/x-python",
42 | "name": "python",
43 | "nbconvert_exporter": "python",
44 | "pygments_lexer": "ipython3",
45 | "version": "3.6.12"
46 | }
47 | },
48 | "nbformat": 4,
49 | "nbformat_minor": 4
50 | }
51 |
--------------------------------------------------------------------------------
/bonus_data_warehousing_concepts/06_dimensional_modeling.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Dimensional Modeling\n",
8 |     "\n",
9 |     "Here is a sample dimensional model for reporting daily product revenue. A DDL sketch for these tables follows the model.\n",
10 |     "\n",
9 | "* **daily_product_revenue_fact**\n",
10 | " * **date_id** - **yyyyMMdd**\n",
11 | " * **product_id**\n",
12 | " * **product_revenue**\n",
13 | " * **product_count**\n",
14 | "* **date_dim** (parent table for daily_product_revenue_fact using date_id)\n",
15 | " * **date_id** - unique or primary key\n",
16 | " * **day_name**\n",
17 | " * **holiday_flag**\n",
18 | " * **week**\n",
19 | " * **day_of_month**\n",
20 | " * **month_name**\n",
21 | " * **quarter**\n",
22 | " * **year**\n",
23 | "* **product_dim** (parent table for daily_product_revenue_fact using product_id)\n",
24 | " * **product_id** - unique or primary key\n",
25 | " * **product_name**\n",
26 | " * **category_name**\n",
27 |     " * **department_name**\n",
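28 |     "\n",
29 |     "Here is a minimal DDL sketch for this model in PostgreSQL syntax. The table and column names follow the model above, while the data types are illustrative assumptions.\n",
30 |     "\n",
31 |     "```sql\n",
32 |     "CREATE TABLE date_dim (\n",
33 |     "    date_id INT PRIMARY KEY,\n",
34 |     "    day_name VARCHAR(9),\n",
35 |     "    holiday_flag BOOLEAN,\n",
36 |     "    week INT,\n",
37 |     "    day_of_month INT,\n",
38 |     "    month_name VARCHAR(9),\n",
39 |     "    quarter INT,\n",
40 |     "    year INT\n",
41 |     ");\n",
42 |     "\n",
43 |     "CREATE TABLE product_dim (\n",
44 |     "    product_id INT PRIMARY KEY,\n",
45 |     "    product_name VARCHAR(100),\n",
46 |     "    category_name VARCHAR(100),\n",
47 |     "    department_name VARCHAR(100)\n",
48 |     ");\n",
49 |     "\n",
50 |     "CREATE TABLE daily_product_revenue_fact (\n",
51 |     "    date_id INT REFERENCES date_dim (date_id),\n",
52 |     "    product_id INT REFERENCES product_dim (product_id),\n",
53 |     "    product_revenue NUMERIC(20, 2),\n",
54 |     "    product_count INT\n",
55 |     ");\n",
56 |     "```"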
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": null,
33 | "metadata": {},
34 | "outputs": [],
35 | "source": []
36 | }
37 | ],
38 | "metadata": {
39 | "kernelspec": {
40 | "display_name": "Python 3",
41 | "language": "python",
42 | "name": "python3"
43 | },
44 | "language_info": {
45 | "codemirror_mode": {
46 | "name": "ipython",
47 | "version": 3
48 | },
49 | "file_extension": ".py",
50 | "mimetype": "text/x-python",
51 | "name": "python",
52 | "nbconvert_exporter": "python",
53 | "pygments_lexer": "ipython3",
54 | "version": "3.6.12"
55 | }
56 | },
57 | "nbformat": 4,
58 | "nbformat_minor": 4
59 | }
60 |
--------------------------------------------------------------------------------
/bonus_overview_of_redshift/01_overview_of_redshift.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Overview of Redshift\n",
8 | "\n",
9 | "As part of this section we will primarily get a detailed overview of AWS Redshift database.\n",
10 | "* Setup AWS Redshift Database\n",
11 | "* Using Query Editor\n",
12 | "* Accessing Redshift Publicly\n",
13 | "* Connecting using psql\n",
14 | "* Using IDEs - SQL Workbench\n",
15 | "* Using Jupyter Environment\n",
16 | "* Overview of SQL using Redshift\n",
17 | "* Evolution of DW Databases\n",
18 | "* Postgres vs. Redshift"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": null,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": []
27 | }
28 | ],
29 | "metadata": {
30 | "kernelspec": {
31 | "display_name": "Python 3",
32 | "language": "python",
33 | "name": "python3"
34 | },
35 | "language_info": {
36 | "codemirror_mode": {
37 | "name": "ipython",
38 | "version": 3
39 | },
40 | "file_extension": ".py",
41 | "mimetype": "text/x-python",
42 | "name": "python",
43 | "nbconvert_exporter": "python",
44 | "pygments_lexer": "ipython3",
45 | "version": "3.6.12"
46 | }
47 | },
48 | "nbformat": 4,
49 | "nbformat_minor": 4
50 | }
51 |
--------------------------------------------------------------------------------
/bonus_overview_of_redshift/02_setup_aws_redshift.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Setup AWS Redshift\n",
8 | "\n",
9 | "Let us go ahead and setup Redshift on AWS.\n",
10 | "* We can setup a single node Redshift cluster using free tier. \n",
11 | "* Here are the configuration details of free tier single node Redshift cluster.\n",
12 | " * Node Count: 1\n",
13 | " * Node Type: dc2.large (2 vCPU and 15 GB Memory)\n",
14 | " * Storage: 160 GB (SSD)\n",
15 |     "* Here are the steps to set up the Redshift free tier using the **AWS Web Console**.\n",
16 | " * Search for **Redshift** and choose the service using search bar in AWS Web Console.\n",
17 | " * Click on **Create Cluster**\n",
18 | " * Choose **Free Trial**\n",
19 |     " * Here are the important fields which you might have to change. You need to enter the Master user password.\n",
20 |     " * Database Name (default: retail_db)\n",
21 | " * Database Port (default: 5439)\n",
22 | " * Master user name (default: awsuser)\n",
23 | " * Master user password\n",
24 |     "* Optionally select a security group for your cluster. You can change other settings as well if you have to.\n",
25 |     " * A **Security group** is a firewall for any AWS Compute resource.\n",
26 |     " * Go to **Additional configurations** by disabling **Use defaults**\n",
27 | " * Expand **Network and security**\n",
28 | " * Then go to **VPC security groups**\n",
29 |     " * Choose **default**\n",
30 |     "* Click on **Create cluster** and wait for a few minutes.\n",
31 |     "* One of the areas you should focus on to get hands-on with Redshift as a developer or engineer is the **Query Editor**."
32 | ]
33 | }
34 | ],
35 | "metadata": {
36 | "kernelspec": {
37 | "display_name": "Python 3",
38 | "language": "python",
39 | "name": "python3"
40 | },
41 | "language_info": {
42 | "codemirror_mode": {
43 | "name": "ipython",
44 | "version": 3
45 | },
46 | "file_extension": ".py",
47 | "mimetype": "text/x-python",
48 | "name": "python",
49 | "nbconvert_exporter": "python",
50 | "pygments_lexer": "ipython3",
51 | "version": "3.6.12"
52 | }
53 | },
54 | "nbformat": 4,
55 | "nbformat_minor": 4
56 | }
57 |
--------------------------------------------------------------------------------
/bonus_overview_of_redshift/03_using_query_editor.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Using Query Editor\n",
8 | "\n",
9 |     "The Redshift cluster comes with a query editor to build and run queries. We will get started with the query editor by creating a user for the retail_db database. We will also create and drop a simple table.\n",
10 | "* Here are the steps to access and connect to database using query editor.\n",
11 | " * Make sure cluster is up and running.\n",
12 | " * You will see **QUERIES** for saved queries and **EDITOR** for Redshift Query Editor.\n",
13 | " * Click on **Query Editor**. It will prompt for the following information (if you have only one Redshift Cluster)\n",
14 | " * Database Name\n",
15 | " * Master user name\n",
16 | " * Master user password\n",
17 | "* Now we should be able to use **Query Editor** to run one query at a time.\n",
18 | "* It is primarily used to quickly validate queries and also to run individual queries to get started.\n",
19 |     "* On a day-to-day basis, we typically connect to Redshift using the Postgres based CLI `psql` or IDEs such as **SQL Workbench**.\n",
20 |     "* Let us go ahead and run the below scripts one at a time. By default, Redshift enforces restrictions while creating passwords, so make sure to use sufficiently strong passwords.\n",
21 | "\n",
22 | "```sql\n",
23 | "CREATE USER retail_user WITH PASSWORD 'Retail_P@ssw0rd';\n",
24 | "GRANT ALL ON DATABASE retail_db TO retail_user;\n",
25 | "```\n",
26 | "\n",
27 | "```{warning}\n",
28 | "It is not a good practice to grant all on database to the user.\n",
29 | "```\n",
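30 |     "\n",
31 |     "As a minimal alternative sketch (assuming the tables will live in the default **public** schema), narrower privileges can be granted instead, for example:\n",
32 |     "\n",
33 |     "```sql\n",
34 |     "GRANT USAGE ON SCHEMA public TO retail_user;\n",
35 |     "GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO retail_user;\n",
36 |     "```\n",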
30 | "\n",
31 | "* Now you can connect to retail_db as retail_user using psql or SQL Workbench and create tables. We will see this later.\n",
32 |     "* We can also run the below commands or queries to create tables in Redshift. Let us go ahead and create tables in the **public** schema. Remember that we are connected to the query editor as master user **awsuser** and database **retail_db**, and hence the tables will be created in the **public** schema within the **retail_db** database.\n",
33 | "\n",
34 | "```sql\n",
35 | "CREATE TABLE t (i INT);\n",
36 | "INSERT INTO t VALUES (1);\n",
37 | "SELECT * FROM t;\n",
38 | "DROP TABLE t;\n",
39 | "```"
40 | ]
41 | },
42 | {
43 | "cell_type": "code",
44 | "execution_count": null,
45 | "metadata": {},
46 | "outputs": [],
47 | "source": []
48 | }
49 | ],
50 | "metadata": {
51 | "kernelspec": {
52 | "display_name": "Python 3",
53 | "language": "python",
54 | "name": "python3"
55 | },
56 | "language_info": {
57 | "codemirror_mode": {
58 | "name": "ipython",
59 | "version": 3
60 | },
61 | "file_extension": ".py",
62 | "mimetype": "text/x-python",
63 | "name": "python",
64 | "nbconvert_exporter": "python",
65 | "pygments_lexer": "ipython3",
66 | "version": "3.6.12"
67 | }
68 | },
69 | "nbformat": 4,
70 | "nbformat_minor": 4
71 | }
72 |
--------------------------------------------------------------------------------
/bonus_overview_of_redshift/04_accessing_redshift_publicly.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Accessing Redshift Publicly\n",
8 | "\n",
9 | "Let us understand how we can make Redshift publicly accessible from remote machines such as our personal PC or even a server.\n",
10 | "\n",
11 |     "* By default, we can only access Redshift from servers within AWS using the private IP.\n",
12 |     "* There are multiple ways to connect to Redshift from remote machines - the most straightforward way is to make it publicly accessible.\n",
13 |     "* Publicly accessible clusters are not a typical choice in an enterprise setup due to security reasons. But as individuals we can make them publicly accessible for our learning purposes.\n",
14 | "* Here are the steps that need to be followed to make a Redshift cluster publicly accessible.\n",
15 |     " * Create an Elastic IP by going to the EC2 Dashboard and then Elastic IPs under Network and Security.\n",
16 | " * You can review whether the cluster is publicly accessible or not by going to properties for the cluster.\n",
17 | " * You can click on Actions and then click on \"Modify publicly accessible setting\"\n",
18 |     " * Make sure to select **Yes** and attach the Elastic IP created in the first step.\n",
19 | "* Once the server is publicly accessible, you need to add the IP of the machine or server from which you want to connect as part of the security group as demonstrated.\n",
20 | "* We will be using `psql` to validate as part of the next topic. For now we can use telnet to validate."
21 | ]
22 | },
23 | {
24 | "cell_type": "code",
25 | "execution_count": null,
26 | "metadata": {},
27 | "outputs": [],
28 | "source": []
29 | }
30 | ],
31 | "metadata": {
32 | "kernelspec": {
33 | "display_name": "Python 3",
34 | "language": "python",
35 | "name": "python3"
36 | },
37 | "language_info": {
38 | "codemirror_mode": {
39 | "name": "ipython",
40 | "version": 3
41 | },
42 | "file_extension": ".py",
43 | "mimetype": "text/x-python",
44 | "name": "python",
45 | "nbconvert_exporter": "python",
46 | "pygments_lexer": "ipython3",
47 | "version": "3.6.12"
48 | }
49 | },
50 | "nbformat": 4,
51 | "nbformat_minor": 4
52 | }
53 |
--------------------------------------------------------------------------------
/bonus_overview_of_redshift/05_connecting_using_psql.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Connecting using psql\n",
8 | "\n",
9 | "Let us understand how we can connect to databases created in Redshift server using Postgres CLI `psql`.\n",
10 | "\n",
11 |     "* By default, AWS will not let us connect from outside systems such as our PCs.\n",
12 | "* Here are the steps involved to connect to databases created in our AWS Redshift cluster.\n",
13 | " * Get the endpoint (from General Information of the Redshift cluster). We will use it later.\n",
14 | " * Make sure required databases and users are created as demonstrated in the previous topic. It is not advisable to use **master user** credentials to connect to databases in Redshift.\n",
15 |     " * If you are using your own system, you need to have Postgres binaries installed. Redshift is a Postgres flavor.\n",
16 |     " * Only when we have Postgres binaries installed will we be able to use `psql`.\n",
17 |     " * Now you need to open up the port used while creating the cluster by going to the security group (AWS Firewall). In our case we have used 5439, which is the default port number.\n",
18 | " * You can access security group under **properties** -> **Network and Security**.\n",
19 | " * You can open the port 5439 with your PC or Server IP (safest) or 0.0.0.0/0.\n",
20 | " * If you have created your cluster by using defaults for additional configurations, then the cluster is not publicly accessible.\n",
21 |     " * Either you need to connect to the cluster from within the VPC or you need to make it publicly accessible.\n",
22 | " * In case you have created your cluster with default configurations, you can follow \"Steps to make Redshift cluster publicly accessible\"\n",
23 | " * Now you can connect to the cluster using `psql` with the following information.\n",
24 | " * Hostname: DNS from endpoint - **redshift-cluster-1.ckxblouy7rzo.us-east-1.redshift.amazonaws.com**\n",
25 | " * Port Number: 5439\n",
26 | " * Database: retail_db\n",
27 | " * Username: retail_user\n",
28 |     " * Password: **Your Password** (in my case it is Retail_P@ssw0rd)\n",
29 | " * Here is the sample command to connect to Redshift database using `psql`. It will prompt for the password and make sure to enter the right password.\n",
30 | " \n",
31 | "```shell\n",
32 | "psql -h redshift-cluster-1.ckxblouy7rzo.us-east-1.redshift.amazonaws.com \\\n",
33 | " -p 5439 \\\n",
34 | " -d retail_db \\\n",
35 | " -U retail_user \\\n",
36 | " -W\n",
37 | "```\n",
38 | "\n",
39 | "* You can setup tables and load the data following these steps.\n",
40 | " * Clone GitHub repository to the desired location.\n",
41 | " * Run the below script to create tables. Make sure to use correct location of the script as per your environment.\n",
42 | "\n",
43 | "```shell\n",
44 | "psql -h redshift-cluster-1.ckxblouy7rzo.us-east-1.redshift.amazonaws.com \\\n",
45 | " -p 5439 \\\n",
46 | " -d retail_db \\\n",
47 | " -U retail_user \\\n",
48 | " -W \\\n",
49 | "-f ~/Research/data/retail_db/create_db_tables_pg.sql\n",
50 | "```\n",
51 | "\n",
52 | " * Run the below script to load data in the tables. Make sure to use correct location of the script as per your environment.\n",
53 | "\n",
54 | "```shell\n",
55 | "psql -h redshift-cluster-1.ckxblouy7rzo.us-east-1.redshift.amazonaws.com \\\n",
56 | " -p 5439 \\\n",
57 | " -d retail_db \\\n",
58 | " -U retail_user \\\n",
59 | " -W \\\n",
60 | "-f ~/Research/data/retail_db/load_db_tables_pg.sql\n",
61 | "```\n",
62 | "\n",
63 | "* Connect to the database using `psql`.\n",
64 | "\n",
65 | "```shell\n",
66 | "psql -h redshift-cluster-1.ckxblouy7rzo.us-east-1.redshift.amazonaws.com \\\n",
67 | " -p 5439 \\\n",
68 | " -d retail_db \\\n",
69 | " -U retail_user \\\n",
70 | " -W\n",
71 | "```\n",
72 | "\n",
73 | "* Run below queries to validate data in orders table. \n",
74 | "\n",
75 | "```sql\n",
76 | "SELECT * FROM orders LIMIT 10;\n",
77 | "SELECT count(1) FROM orders;\n",
78 | "```"
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": null,
84 | "metadata": {},
85 | "outputs": [],
86 | "source": []
87 | }
88 | ],
89 | "metadata": {
90 | "kernelspec": {
91 | "display_name": "Python 3",
92 | "language": "python",
93 | "name": "python3"
94 | },
95 | "language_info": {
96 | "codemirror_mode": {
97 | "name": "ipython",
98 | "version": 3
99 | },
100 | "file_extension": ".py",
101 | "mimetype": "text/x-python",
102 | "name": "python",
103 | "nbconvert_exporter": "python",
104 | "pygments_lexer": "ipython3",
105 | "version": "3.6.12"
106 | }
107 | },
108 | "nbformat": 4,
109 | "nbformat_minor": 4
110 | }
111 |
--------------------------------------------------------------------------------
/bonus_overview_of_redshift/06_using_ides_sql_workbench.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 |     "## SQL Workbench and Redshift\n",
8 | "\n",
9 | "Let us connect to Redshift Database using SQL Workbench.\n",
10 | "* Download the JDBC Driver\n",
11 | "* Get the database connectivity information\n",
12 | "* Configure the connection using SQL Workbench\n",
13 | "* Validate the connection and save the profile"
14 | ]
15 | },
16 | {
17 | "cell_type": "markdown",
18 | "metadata": {},
19 | "source": [
20 |     "### Connecting to Redshift\n",
21 | "Here are the steps to connect to Redshift running on AWS.\n",
22 | "\n",
23 | "* We are trying to connect to Redshift Database that is running as part of AWS environment which is remote to our system.\n",
24 | "* We typically use ODBC or JDBC to connect to a Database from remote machines (our PC).\n",
25 | "* Here are the pre-requisites to connect to a Database.\n",
26 | " * Make sure 5439 port (default) is opened as part of the firewalls. You can get the Redshift port details from AWS Web Console.\n",
27 | " * If you have telnet configured on your system on which SQL Workbench is installed, make sure to validate by running telnet command using DNS Alias from end point and port number 5439 (default).\n",
28 |     " * Ensure that you have downloaded the right JDBC Driver for Redshift (use google search to get the appropriate version).\n",
29 |     " * Make sure to have the right credentials (username and password).\n",
30 |     " * Ensure that you have a database created on which the user has permissions.\n",
31 | "* Once you have all the information required along with JDBC jar, ensure to save the information as part of the profile. You can also validate before saving the details by using **Test** option."
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": null,
37 | "metadata": {},
38 | "outputs": [],
39 | "source": []
40 | }
41 | ],
42 | "metadata": {
43 | "kernelspec": {
44 | "display_name": "Python 3",
45 | "language": "python",
46 | "name": "python3"
47 | },
48 | "language_info": {
49 | "codemirror_mode": {
50 | "name": "ipython",
51 | "version": 3
52 | },
53 | "file_extension": ".py",
54 | "mimetype": "text/x-python",
55 | "name": "python",
56 | "nbconvert_exporter": "python",
57 | "pygments_lexer": "ipython3",
58 | "version": "3.6.12"
59 | }
60 | },
61 | "nbformat": 4,
62 | "nbformat_minor": 4
63 | }
64 |
--------------------------------------------------------------------------------
/bonus_overview_of_redshift/08_data_loading_utilities.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Data Loading Utilities\n",
8 | "\n",
9 | "Let us understand how we can load the data into Redshift tables using `COPY` command or utility.\n",
10 | "* First let us understand the data processing or engineering life cycle to understand the relevance of `COPY` command.\n",
11 | " * Source Databases -> Files -> Database for Stage Data (ODS) -> Database for Curated Data (Data Marts) -> Reports\n",
12 |     "* One of the major data sources to Data Warehouse Databases such as Postgres is data in the form of files.\n",
13 | "* Typically data from sources are shipped in the form of files.\n",
14 | "* When we use cloud based services such as AWS Redshift, the files are typically shipped to s3 and then data is supposed to be loaded into stage tables.\n",
15 | "* We use stage tables to perform operations such as deduplication, upsert or merge etc. After data is processed, we typically truncate the stage tables.\n",
16 |     "* For such tables, the fastest way to get data from files is by using utilities such as `COPY`. The Redshift `COPY` command lets us copy data from s3 into the tables directly.\n",
17 | "* Here are the pre-requisites to understand how to use `COPY` command to get data into Redshift (stage) tables.\n",
18 | " * Make sure data is copied to s3 bucket.\n",
19 |     " * We need to ensure that the machine from which the `COPY` command needs to run has Postgres binaries installed so that we can launch `psql`.\n",
20 |     " * We either need to use AWS root credentials (access key and secret key) or an IAM role to run the `COPY` command using `psql`.\n",
21 | "* Setup and validate AWS root credentials\n",
22 | " * \n",
23 |     "* Download the data or use existing data. In actual projects, data will be shipped by the source systems into s3 using standard naming conventions for folders and files as per the technical specification document.\n",
24 | "* Launch `psql`\n",
25 |     "* Run the `COPY` command, as shown in the sketch below.\n",
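26 |     "\n",
27 |     "Here is a minimal sketch of loading one table with the Redshift `COPY` command from `psql`, assuming the orders data has already been uploaded to a hypothetical s3 location. Replace the bucket path and the placeholder credentials with your own; an `IAM_ROLE` clause can be used instead of `CREDENTIALS`.\n",
28 |     "\n",
29 |     "```sql\n",
30 |     "COPY orders\n",
31 |     "FROM 's3://your-bucket/retail_db/orders/'\n",
32 |     "CREDENTIALS 'aws_access_key_id=<YOUR_ACCESS_KEY>;aws_secret_access_key=<YOUR_SECRET_KEY>'\n",
33 |     "DELIMITER ','\n",
34 |     "REGION 'us-east-1';\n",
35 |     "```"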
26 | ]
27 | },
28 | {
29 | "cell_type": "code",
30 | "execution_count": null,
31 | "metadata": {},
32 | "outputs": [],
33 | "source": []
34 | }
35 | ],
36 | "metadata": {
37 | "kernelspec": {
38 | "display_name": "Python 3",
39 | "language": "python",
40 | "name": "python3"
41 | },
42 | "language_info": {
43 | "codemirror_mode": {
44 | "name": "ipython",
45 | "version": 3
46 | },
47 | "file_extension": ".py",
48 | "mimetype": "text/x-python",
49 | "name": "python",
50 | "nbconvert_exporter": "python",
51 | "pygments_lexer": "ipython3",
52 | "version": "3.6.12"
53 | }
54 | },
55 | "nbformat": 4,
56 | "nbformat_minor": 4
57 | }
58 |
--------------------------------------------------------------------------------
/mastering-sql-using-postgresql.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Mastering SQL using Postgresql\n",
8 | "\n",
9 |     "This course is primarily designed for learning basic and advanced SQL using the Postgresql Database."
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "## About Postgresql\n",
17 | "\n",
18 |     "Postgresql is one of the leading databases. It is an open source database used for different types of applications.\n",
19 | "* Web Applications\n",
20 | "* Mobile Applications\n",
21 | "* Data Logging Applications\n",
22 | "\n",
23 |     "Even though it is a relational database best suited for transactional systems (OLTP), its flavors such as Redshift are extensively used for Analytical or Decision Support Systems."
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "metadata": {},
29 | "source": [
30 | "## Course Details\n",
31 | "\n",
32 |     "This course is primarily designed to go through basic and advanced SQL using the Postgres Database. You will be learning the following aspects of SQL as well as the Postgres Database.\n",
33 | "\n",
34 | "* Setup Postgres Database using Docker\n",
35 | "* Connect to Postgres using different interfaces such as psql, SQL Workbench, Jupyter with SQL magic etc.\n",
36 | "* Understand utilities to load the data\n",
37 | "* Overview of Normalization Principles and Relations (RDBMS Concepts)\n",
38 | "* Performing CRUD or DML Operations\n",
39 | "* Writing basic SQL Queries such as filtering, joins, aggregations, sorting etc\n",
40 | "* Creating tables, constraints and indexes\n",
41 | "* Different partitioning strategies while creating tables\n",
42 | "* Using pre-defined functions provided by Postgresql\n",
43 | "* Writing advanced SQL queries using analytic functions\n",
44 | "* Overview of query performance tuning with emphasis on explain plans and different tuning techniques\n",
45 |     "* Differences between RDBMS and Data Warehousing with live examples."
46 | ]
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "metadata": {},
51 | "source": [
52 | "## Desired Audience\n",
53 | "\n",
54 | "Here are the desired audience for this course.\n",
55 |     "* College students and entry level professionals to gain hands-on expertise in SQL and be prepared for interviews.\n",
56 | "* Experienced application developers to understand key aspects of Databases to improve their productivity.\n",
57 | "* Data Engineers and Data Warehouse Developers to understand the relevance of SQL and other key concepts.\n",
58 | "* Testers to improve their query writing abilities to validate data in the tables as part of running their test cases.\n",
59 | "* Business Analysts to write ad-hoc queries to understand data better or troubleshoot data quality issues.\n",
60 |     "* Any other hands-on IT professionals who want to improve their query writing and tuning capabilities.\n",
61 | "\n",
62 | "```{note}\n",
63 |     "Developers from a non-CS or IT background at times struggle with writing queries, and this course will provide the required database skills to take their overall application development skills to the next level.\n",
64 | "```"
65 | ]
66 | },
67 | {
68 | "cell_type": "markdown",
69 | "metadata": {},
70 | "source": [
71 | "## Prerequisites\n",
72 | "\n",
73 | "Here are the prerequisites before signing up for the course.\n",
74 | "\n",
75 | "````{panels}\n",
76 | "**Logistics**\n",
77 | "\n",
78 | "* Computer with decent configuration\n",
79 | " * At least 4 GB RAM\n",
80 | " * 8 GB RAM is highly desired\n",
81 | "* Chrome Browser\n",
82 | "* High Speed Internet\n",
83 | "\n",
84 | "---\n",
85 | "\n",
86 | "**Desired Skills**\n",
87 | "\n",
88 | "* Engineering or Science Degree\n",
89 | "* Ability to use computer\n",
90 | "* Knowledge or working experience with databases is highly desired\n",
91 | "\n",
92 | "````"
93 | ]
94 | },
95 | {
96 | "cell_type": "markdown",
97 | "metadata": {},
98 | "source": [
99 | "## Key Objectives\n",
100 | "The course is designed for the professionals to achieve these key objectives related to databases using Postgresql.\n",
101 | "* Ability to interpret data models.\n",
102 | "* Using database IDEs to interact with databases.\n",
103 | "* Data loading strategies to load data into database tables.\n",
104 | "* Write basic as well as advanced SQL queries.\n",
105 | "* Ability to create tables, partition tables, indexes etc.\n",
106 |     "* Understand and use constraints effectively based upon the requirements.\n",
107 | "* Effective usage of functions provided by Postgresql.\n",
108 | "* Understand basic performance tuning strategies\n",
109 | "* Differences between RDBMS and Data Warehouse concepts by comparing Postgresql with Redshift.\n",
110 | "\n",
111 | "```{attention}\n",
112 | "This course is primarily designed to gain key database skills for application developers, data engineers, testers, business analysts etc.\n",
113 | "```"
114 | ]
115 | },
116 | {
117 | "cell_type": "markdown",
118 | "metadata": {},
119 | "source": [
120 | "## Training Approach\n",
121 | "Here are the details related to the training approach.\n",
122 | "* It is self paced with reference material, code snippets and videos.\n",
123 | "* One can either use environment provided by us or setup their own environment using Docker.\n",
124 |     "* Modules will be published as and when they are ready. We recommend completing **2 modules every week** by spending **4 to 5 hours per week**.\n",
125 |     "* It is highly recommended to complete the exercises at the end to ensure that you are able to meet all the key objectives for each module.\n",
126 | "* Support will be provided either through chat or email.\n",
127 | "* For those who signed up, we will have weekly monitoring and review sessions to keep track of the progress.\n",
128 | "\n",
129 | "```{attention}\n",
130 |     "Spend 4 to 5 hours per week for up to 8 weeks and complete all the exercises to get the best out of this course.\n",
131 | "```"
132 | ]
133 | },
134 | {
135 | "cell_type": "markdown",
136 | "metadata": {},
137 | "source": [
138 | "## Self Evaluation\n",
139 | "\n",
140 |     "The course is designed in such a way that one can self-evaluate throughout the course and confirm whether the skills are acquired.\n",
141 |     "* Here is the approach we recommend for taking this course.\n",
142 | " * Go through the consolidated exercises and see if you are able to solve the problems or not.\n",
143 | " * Make sure to follow the order we have defined as part of the course.\n",
144 | " * After each and every section or module, make sure to solve the exercises. We have provided enough information to validate the output of your queries.\n",
145 | " * After the completion of the course try to solve the exercises using consolidated list.\n",
146 | " * Keep in mind that you will be reviewing the same exercises before the course, during the course as well as at the end of the course.\n",
147 |     "* By the end of the course, if you are able to solve the problems, then you can conclude that you have mastered the key skill called SQL."
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": null,
153 | "metadata": {},
154 | "outputs": [],
155 | "source": []
156 | }
157 | ],
158 | "metadata": {
159 | "kernelspec": {
160 | "display_name": "Python 3",
161 | "language": "python",
162 | "name": "python3"
163 | },
164 | "language_info": {
165 | "codemirror_mode": {
166 | "name": "ipython",
167 | "version": 3
168 | },
169 | "file_extension": ".py",
170 | "mimetype": "text/x-python",
171 | "name": "python",
172 | "nbconvert_exporter": "python",
173 | "pygments_lexer": "ipython3",
174 | "version": "3.6.12"
175 | }
176 | },
177 | "nbformat": 4,
178 | "nbformat_minor": 4
179 | }
180 |
--------------------------------------------------------------------------------