├── assets
│   ├── DW.PNG
│   ├── graph.png
│   ├── 2-graph.png
│   ├── 4-graph.png
│   ├── analytics.jpg
│   ├── document.png
│   ├── graph (1).png
│   ├── key-value.png
│   ├── 2-key-value.png
│   ├── 2-relations.png
│   ├── 3-relations.png
│   ├── 2-data-process.png
│   ├── 2-etl-vs-elt.png
│   ├── column-family.png
│   ├── copy-activity.png
│   ├── 2-process-stage.png
│   ├── snowflake-design.png
│   ├── 2-tabular-diagram.png
│   ├── 4-analytics-table.png
│   ├── olap-data-pipeline.png
│   ├── modern-data-warehouse.png
│   ├── star-schema-example1.png
│   ├── 2-extract-load-transform.png
│   ├── 2-extract-transform-load.png
│   ├── 6-mysql-mariadb-postgresql.png
│   ├── 5-azure-sql-database-graphic.png
│   ├── four-types-business-analytics.png
│   └── big-data-logical.svg
├── LICENSE
├── relationalData.md
├── nonrelational.md
├── data.md
├── analytics.md
└── README.md
/assets/DW.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/DW.PNG
--------------------------------------------------------------------------------
/assets/graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/graph.png
--------------------------------------------------------------------------------
/assets/2-graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/2-graph.png
--------------------------------------------------------------------------------
/assets/4-graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/4-graph.png
--------------------------------------------------------------------------------
/assets/analytics.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/analytics.jpg
--------------------------------------------------------------------------------
/assets/document.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/document.png
--------------------------------------------------------------------------------
/assets/graph (1).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/graph%20(1).png
--------------------------------------------------------------------------------
/assets/key-value.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/key-value.png
--------------------------------------------------------------------------------
/assets/2-key-value.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/2-key-value.png
--------------------------------------------------------------------------------
/assets/2-relations.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/2-relations.png
--------------------------------------------------------------------------------
/assets/3-relations.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/3-relations.png
--------------------------------------------------------------------------------
/assets/2-data-process.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/2-data-process.png
--------------------------------------------------------------------------------
/assets/2-etl-vs-elt.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/2-etl-vs-elt.png
--------------------------------------------------------------------------------
/assets/column-family.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/column-family.png
--------------------------------------------------------------------------------
/assets/copy-activity.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/copy-activity.png
--------------------------------------------------------------------------------
/assets/2-process-stage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/2-process-stage.png
--------------------------------------------------------------------------------
/assets/snowflake-design.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/snowflake-design.png
--------------------------------------------------------------------------------
/assets/2-tabular-diagram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/2-tabular-diagram.png
--------------------------------------------------------------------------------
/assets/4-analytics-table.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/4-analytics-table.png
--------------------------------------------------------------------------------
/assets/olap-data-pipeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/olap-data-pipeline.png
--------------------------------------------------------------------------------
/assets/modern-data-warehouse.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/modern-data-warehouse.png
--------------------------------------------------------------------------------
/assets/star-schema-example1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/star-schema-example1.png
--------------------------------------------------------------------------------
/assets/2-extract-load-transform.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/2-extract-load-transform.png
--------------------------------------------------------------------------------
/assets/2-extract-transform-load.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/2-extract-transform-load.png
--------------------------------------------------------------------------------
/assets/6-mysql-mariadb-postgresql.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/6-mysql-mariadb-postgresql.png
--------------------------------------------------------------------------------
/assets/5-azure-sql-database-graphic.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/5-azure-sql-database-graphic.png
--------------------------------------------------------------------------------
/assets/four-types-business-analytics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/codess-aus/DP-900/HEAD/assets/four-types-business-analytics.png
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2021 Michelle Mei-Ling Sandford
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/assets/big-data-logical.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/relationalData.md:
--------------------------------------------------------------------------------
1 | ## Describe how to work with relational data on Azure (25-30%)
2 |
3 | ### Describe relational data workloads:
4 | * identify the right data offering for a relational workload
5 | * describe relational data structures (e.g., tables, index, views)
6 |
7 | ### Describe relational Azure data services:
8 | * describe and compare PaaS, IaaS, and SaaS solutions
9 | * describe Azure SQL database services including Azure SQL Database, Azure SQL Managed Instance, and SQL Server on Azure Virtual Machine
10 |
11 | * describe Azure Synapse Analytics
12 | * describe Azure Database for PostgreSQL, Azure Database for MariaDB, and Azure Database for MySQL
14 |
15 | ### Identify basic management tasks for relational data:
16 | * describe provisioning and deployment of relational data services
17 | * describe method for deployment including the Azure portal, Azure Resource Manager templates, Azure PowerShell, and the Azure command-line interface (CLI)
18 | * identify data security components (e.g., firewall, authentication)
19 | * identify basic connectivity issues (e.g., accessing from on-premises, access with Azure VNets, access from Internet, authentication, firewalls)
20 | * identify query tools (e.g., Azure Data Studio, SQL Server Management Studio, sqlcmd utility, etc.)
21 |
22 | ### Describe query techniques for data using SQL language:
23 | * compare Data Definition Language (DDL) versus Data Manipulation Language (DML)
24 | * query relational data in Azure SQL Database, Azure Database for PostgreSQL, and Azure Database for MySQL
25 |
26 | A primary use of **relational** databases is to handle transaction processing.
27 |
28 | The SQL language has four basic types of commands.
29 |
30 | **Data Manipulation Language DML commands** are used to *manipulate rows* in a table. They include:
31 | * DELETE
32 | * INSERT
33 | * SELECT
34 | * UPDATE
35 |
36 | **Data Definition Language (DDL) commands** are used to *create, modify, and delete database objects*. They include:
37 | * ALTER
38 | * CREATE
39 | * DROP
40 | * RENAME
41 |
42 | **Data Control Language (DCL) commands** are for *access control* and permission management. They include:
43 | * DENY
44 | * GRANT
45 | * REVOKE
46 |
47 | **Transaction Control Language (TCL) commands** are used to *manage and control transactions*. They include:
48 | * BEGIN TRAN
49 | * COMMIT TRAN
50 | * ROLLBACK
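
The four command categories above can be seen end to end in a small script. Below is a minimal sketch using SQLite through Python's `sqlite3` module (SQLite has no DCL layer, so GRANT/DENY/REVOKE are omitted; the table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit; transactions are managed explicitly below
cur = conn.cursor()

# DDL: create and modify database objects
cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("ALTER TABLE employees ADD COLUMN salary REAL")

# DML: manipulate rows in the table
cur.execute("INSERT INTO employees (name, salary) VALUES ('Ada', 50000)")
cur.execute("UPDATE employees SET salary = 55000 WHERE name = 'Ada'")
rows = cur.execute("SELECT name, salary FROM employees").fetchall()

# TCL: BEGIN opens a transaction; ROLLBACK undoes the uncommitted DELETE
cur.execute("BEGIN")
cur.execute("DELETE FROM employees")
cur.execute("ROLLBACK")
count = cur.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
```

After the rollback, `count` is still 1: the DELETE was part of an uncommitted transaction, so the row survives.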
51 |
52 | A **key/value data store**:
53 | * A key/value data store functions essentially as a large hash table and is optimized for fast data writes.
54 | * Each data row is referenced by a single key value.
55 | * The only operations supported are simple query, insert, and delete operations.
56 | * Data updates require the application to rewrite the data for the entire value.
57 | * Queries can be run by a key or a range of keys.
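
The behaviour in the bullets above can be sketched as a thin wrapper around a hash table; the class and key names are invented for illustration:

```python
class KeyValueStore:
    """Toy key/value store: opaque values, whole-value writes, key lookups only."""

    def __init__(self):
        self._data = {}  # behaves like a large hash table

    def put(self, key, value):
        # An "update" is just a rewrite of the entire value for the key
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

    def range(self, lo, hi):
        # Query by a range of keys; the store never inspects the values
        return {k: v for k, v in self._data.items() if lo <= k <= hi}

store = KeyValueStore()
store.put("user:100", {"theme": "dark"})
store.put("user:101", {"theme": "light"})
store.put("user:101", {"theme": "light", "lang": "en"})  # rewrites the whole value
prefs = store.range("user:100", "user:101")
```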
58 |
59 | **A column-family (columnar) data store**:
60 | * A column-family data store is similar to a relational data store in that data is organized as rows and columns, but the columns are divided into column families that can store multiple values in a single column.
61 | * A row does not necessarily have a value in each column family.
62 | * Columns within a column family are physically stored in the same file.
65 | * Data is denormalized and relationships are not defined between entities.
66 |
67 | **A Table Data Store**:
68 | * A table data store uses a row and column data format with the data somewhat normalized, but the same schema is not enforced across all rows.
69 | * Each row can have a different number of columns.
70 | * In Azure Table store, data is organized based on a partition key and a row key.
71 | * The partition key identifies the partition in which the data is stored, and the row key uniquely identifies the row within the partition.
72 | * You would use a table data store for denormalized data that is stored in a row/column structure with a variable number of columns per row.
73 | * Table storage can be used for unstructured data, semi-structured data, or a mix of data types.
74 | * This type of data is supported through Azure Table storage or an Azure Cosmos DB database.
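
The partition-key/row-key addressing described above can be sketched as a nested mapping; the entity names and properties below are invented for illustration, and each row is free to carry a different set of columns:

```python
from collections import defaultdict

# A table is a set of partitions; each partition maps row keys to rows.
table = defaultdict(dict)

def upsert(partition_key, row_key, **columns):
    # Rows in the same partition share the partition key but may
    # have entirely different columns (no schema is enforced).
    table[partition_key][row_key] = columns

upsert("Sales", "00001", name="Ada", region="West")
upsert("Sales", "00002", name="Grace", phone="555-0100", manager="Ada")
upsert("Engineering", "00001", name="Edsger")

# The (partition key, row key) pair uniquely addresses one row.
row = table["Sales"]["00002"]
```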
75 |
76 | **A Graph Data Store**:
77 | * A graph data store is designed to support extensive, complex relationships between entities. This makes it easier to perform complex relationship analysis.
78 | * Made up of entities and relationships that are referred to as nodes and edges.
79 | * You can have multiple relationships between entities, including hierarchical relationships.
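
Nodes and edges can be sketched as a simple adjacency structure; the entities and relationship labels below are invented for illustration:

```python
# Nodes carry properties; edges are labeled, directed relationships.
nodes = {
    "alice": {"type": "Person"},
    "bob": {"type": "Person"},
    "contoso": {"type": "Company"},
}
edges = [
    ("alice", "works_for", "contoso"),
    ("bob", "works_for", "contoso"),
    ("bob", "reports_to", "alice"),   # hierarchical relationship
    ("alice", "knows", "bob"),        # multiple relationships between two entities
]

def neighbors(node, label):
    """Traverse edges with a given label starting from a node."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]

colleagues = [n for n in nodes if "contoso" in neighbors(n, "works_for")]
```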
81 |
82 |
83 | **A Document Store**:
84 | * A document store supports semi-structured documents.
85 | * Each document is identified by a single key, and the data schema is defined internally within each document as named fields and values.
86 | * The schema and content can vary between documents.
87 | * Each document typically contains the data for a single entity.
88 | * Relationships are not defined between documents.
89 | * Semi-structured data with each entity providing its own field definitions is a description of document-type data, and therefore a document store is your best choice.
90 | * Documents are written and retrieved as complete documents. The embedded field definitions make it possible to query documents in order to retrieve field values.
91 | * You would typically use an Azure Cosmos DB storage solution.
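
The per-document schema described above can be sketched with plain dicts standing in for JSON documents; the field names are invented for illustration:

```python
import json

# Each document holds all data for one entity; the schema varies per document.
documents = {
    "order-1001": {"customer": "Ada", "items": [{"sku": "X1", "qty": 2}]},
    "order-1002": {"customer": "Grace", "items": [], "gift_note": "Thanks!"},
}

def query(field, value):
    """Embedded field definitions make field-level queries possible."""
    return [doc_id for doc_id, doc in documents.items()
            if doc.get(field) == value]

# Documents are written and retrieved as complete documents.
doc = json.dumps(documents["order-1002"])
matches = query("customer", "Ada")
```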
92 |
93 | **An object store** is used to store unstructured data, such as audio or video files, and it does not provide for relationship analysis.
94 | * For large audio and video files that are used as the source for streaming content, you should choose an object data store.
95 | * Files of this type are unstructured, non-relational data. The typical storage solution for this type of file is an object store, such as Azure Blob storage.
97 | * Large binary objects, such as images, media files, and digitally signed scanned documents.
98 |
99 |
100 |
--------------------------------------------------------------------------------
/nonrelational.md:
--------------------------------------------------------------------------------
1 | ## Describe how to work with non-relational data on Azure (25-30%)
2 | ### Describe non-relational data workloads
3 | * describe the characteristics of non-relational data
4 | * describe the types of non-relational and NoSQL data
5 | * recommend the correct data store
6 | * determine when to use non-relational data
7 | ### Describe non-relational data offerings on Azure
8 | * identify Azure data services for non-relational workloads
9 | * describe Azure Cosmos DB APIs
10 | * describe Azure Table storage
11 | * describe Azure Blob storage
12 | * describe Azure File storage
13 | ### Identify basic management tasks for non-relational data
14 | * describe provisioning and deployment of non-relational data services
15 | * describe method for deployment including the Azure portal, Azure Resource Manager templates, Azure PowerShell, and the Azure command-line interface (CLI)
16 | * identify data security components (e.g., firewall, authentication, encryption)
17 | * identify basic connectivity issues (e.g., accessing from on-premises, access with Azure VNets, access from Internet, authentication, firewalls)
18 | * identify management tools for non-relational data
19 |
20 | Each document in a **document** database typically contains all data for a single entity.
21 |
22 | The data contained in the document can vary between documents.
23 |
24 | Each document is identified by a unique key used to identify the document and each document is written or retrieved as a single block.
25 |
26 | Documents in a document database do not use the same data schema for all documents.
27 |
28 | The schema is defined internally in the document, and each individual document can have a different schema.
29 |
30 | This allows for easy support of denormalized data and variations between entities.
31 |
32 | Documents in a document database do not support relationships enforced between documents.
33 |
34 | Document databases do not provide a way to establish relationships between documents. Document databases and graph databases are examples of non-relational data stores.
35 |
36 | Document databases store data in JSON or XML format and do not require all documents to have the same structure.
37 |
38 | Azure Cosmos DB **SQL API**. It is used to implement **Document database**. Document database stores data in JSON or XML format. Document store does not require all documents to have the same structure. Azure Cosmos DB provides Core (SQL) API for Document database implementation.
39 |
40 | A **key/value** data store functions essentially as a large hash table and is optimized for fast data writes.
41 |
42 | Each data row is referenced by a single key value. The only operations
43 | supported are simple query, insert, and delete operations.
44 |
45 | Data updates require the application to rewrite the data for the entire value. Queries can be run by a key or a range of keys.
46 |
47 | You should use a key-value non-relational data store to maintain user preferences for your company's application. Key-value stores are highly optimized for simple searches like user preferences. A key-value store associates each data value with a key that can be used to access the data.
48 |
49 | Microsoft recommends Azure Cosmos DB Core (SQL) API for new key/value requirements. Key/value storage is also supported by Azure Table storage and Cosmos DB Table API.
50 |
51 | You should use the Azure Cosmos DB Table API to implement a key-value store. Key-value stores are highly optimized for simple lookup scenarios. Azure Cosmos DB provides the Table API for key-value store implementation.
53 |
54 | A **column-family** data store is similar to a relational data store in that data is organized as rows and columns, but the columns are divided into column families that can store multiple values in a single column. A row does not necessarily have a value in each column family.
55 |
56 | Columns within a column family are physically stored in the same file.
57 |
58 | You should use the Cassandra API to store columnar data. This API is compatible with Apache Cassandra databases, which are column-family databases used to store columnar data consisting of row identifiers and a group of information stored in a column. Each group of information is stored in independent data structures named keyspaces.
59 |
60 | You should use the Cassandra API when moving column-family format data to the cloud to support an existing application. Microsoft suggests limiting the use of the **Cassandra API to supporting existing data**, such as when **moving data to the cloud**. The Cassandra API is specifically designed to support **column-family data**.
61 |
62 | A **table** data store uses a row and column data format with the data somewhat normalized, but the same schema is not enforced across all rows.
64 |
65 | Each row can have a different number of columns.
66 |
67 | In Azure Table store, data is organized based on a partition key and a row key. The partition key identifies the partition in which the data is stored, and the row key uniquely identifies the row within the partition.
68 |
69 | Table storage is used for storing structured, non-relational data.
70 |
71 | The two elements that compose a key in Azure Table storage are the partition key and the row key. Data stored in Azure Table storage is referred to as rows and columns, and it forms a table in which the columns may vary according to each row. The rows in a table are split into partitions, and related rows are grouped based on a common property. This common property is called a partition key. The partition key identifies the partition inside the Azure Table storage and a row key is used to uniquely identify each row in a given partition.
72 |
73 | In Azure Table storage, a group of columns is not stored in different partitions. The rows in a table are split into partitions, which group together related rows based on a common property. This common property is called the partition key.
74 |
75 | The number of columns in each row may not be exactly the same. Azure Table lets you store semi-structured data. Unlike in a relational table, each row can have different columns of data.
76 |
77 | In Azure Table storage, data is stored as rows and columns, forming a table in which the number of columns may vary according to each row.
78 |
79 | Azure Table storage supports multi-region read replicas only. You can configure read replicas in Azure Table storage by configuring the storage account to use read-access geo-redundant storage (RA-GRS) redundancy. This enables a readable replica in a secondary region. However, you cannot write data in the secondary region.
80 |
81 | **Azure Table storage** does not let you initiate failover.
82 |
83 | Each row in a table can have a different number of columns in both Azure Cosmos DB and Azure Table storage. This is a defining feature of Table storage.
84 |
85 | Data is organized and distributed by partition keys and row keys. This is the only indexing on Azure Table storage.
86 |
87 | You can use the Table API to store key/value data organized as rows and columns, forming a table in which the number of columns may vary according to each row. This API is compatible with Azure Table storage.
88 |
89 | A **graph** data store is designed to support extensive, complex relationships between entities. This makes it easier to perform complex relationship analysis.
90 | * Document databases and graph databases are examples of non-relational data stores.
91 | * Graph databases store information in the form of edges and nodes. They are used to represent complex relationships such as social interactivity.
92 |
93 | You should use the **Gremlin API** to store **graph data**. This API is compatible with Gremlin, which is a graph database that stores nodes and edges used for complex relationships among entities.
94 |
95 |
96 | You do not need to define a schema on non-relational data. A non-relational database does not require you to configure a schema. It focuses on storing the data as it is rather than manipulating the data in tables and columns like in a relational database.
97 |
98 | You can use non-relational data to store data that has a highly variable structure.
99 |
100 | You can store entities with different structures in a non-relational data store, for example, a customer in an e-commerce platform could have multiple contact numbers or addresses, while another customer could have only one contact number. Non-relational data provides you this flexibility.
101 |
102 | You can use indexing with non-relational data in a similar way to an index in a relational database. Some non-relational databases, such as Cosmos DB, support indexing in the fields of a stored entity, in a similar way a relational database does.
103 |
104 | Document databases and graph databases are examples of non-relational data stores.
105 |
106 | Azure Database for MariaDB and Azure SQL Database are examples of relational databases. Relational databases store information in the form of tables, which you can connect through relationships. Relational databases are used for highly structured data.
107 |
108 | **Azure Blob** is the only Azure storage option that supports access tiers. The default is the Hot tier, which is designed for frequently accessed data. The Cool tier is optimized for data that will be stored for at least 30 days. The Cool tier has a lower storage cost than the Hot tier but higher access costs and an early deletion charge. The Archive tier is designed for data that is rarely accessed and will remain in storage for at least 180 days. Access to Archive tier data requires the data to be rehydrated to the Hot or Cool tier, which can mean a latency of several hours. Access tier support requires a General-purpose v2 (GPv2) or Blob storage account.
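
The tier trade-offs above can be sketched as a simple selection rule. The thresholds follow the minimum retention periods stated here; the function itself is an invented helper for illustration, not part of any Azure SDK:

```python
def suggest_access_tier(retention_days: int, frequently_accessed: bool) -> str:
    """Pick a blob access tier from expected retention and access pattern."""
    if frequently_accessed:
        return "Hot"      # optimized for frequently accessed data
    if retention_days >= 180:
        return "Archive"  # rarely accessed; rehydration needed before reads
    if retention_days >= 30:
        return "Cool"     # lower storage cost, higher access cost
    return "Hot"          # short-lived data avoids early deletion charges
```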
109 |
110 | Azure Blob also supports two performance tiers. The Standard performance tier uses hard disk-based storage media. The Premium performance tier provides greater throughput than the Standard tier and uses solid-state drive (SSD) media. The Standard and Premium tiers are also supported for other storage options, including Azure File storage and Azure SQL Database.
111 |
112 | Azure Table and Azure File do not support access tiers; access tiers are supported only through Azure Blob storage. Azure Table and Azure File are distinct storage types and are not implemented through Cosmos DB APIs.
113 |
114 | File storage provides file storage with shared access, similar to a file server.
115 |
116 | You can create a premium Azure File storage account in a FileStorage storage account only. You can create a standard File storage account only in a GPv2 account.
117 |
118 | You can configure a premium Azure File storage account for LRS and, in select regions, for ZRS. A standard File storage account supports GRS, but a premium storage account does not.
119 |
120 | You can use a premium Azure File storage account to replace or supplement traditional on-premises file shares. This is true for both standard and premium storage accounts. This includes scenarios where application data is moved to the cloud but applications continue to run on-premises.
121 |
122 | Microsoft recommends that any new data project created from scratch uses the **Core (SQL) API**. This includes applications with **key/value data**. This type of application is also supported by the Table API, but the Core (SQL) API is recommended as the best solution because it provides improved indexing and a richer query experience.
123 |
124 | When creating a new application that analyzes detailed relationship information for non-relational data you should use the Gremlin API. This is one of the few cases where the Core (SQL) API is not recommended as the best solution. **Graph Database -> Gremlin API**
125 |
126 | When moving application data to the cloud that uses semi-structured documents to store data, you should use the Core (SQL) API. The Core (SQL) API gives you the ability to create, query, and update data
127 | documents.
128 |
129 | Azure File storage supports direct mounting by Windows, macOS, and Linux. This includes support for concurrent access from the cloud and on-premises. Azure File storage can be used to supplement or replace on-premises file server shares.
130 |
131 | Azure File storage does not allow you to select the underlying hardware and operating system. Azure File storage is implemented as a serverless file service in which you have neither direct access to, nor administrative responsibility for, the underlying architecture. The one infrastructure choice you can make is between hard disk (HDD) standard file shares and solid-state disk (SSD) premium file shares.
132 |
133 | Azure File storage does not support redundancy across multiple regions by default. Standard file shares support locally redundant storage (LRS) by default, with options for zone-redundant storage (ZRS), geo-redundant storage (GRS), and geo-zone-redundant storage (GZRS).
135 |
136 | Replication across multiple regions is supported as an option, but not as a default setting. The large file share feature and premium file shares support LRS and ZRS only.
137 |
138 | **Cosmos DB Table API** supports multi-master replication across multiple regions. This means that you can set it up to let any region accept writes. Global distribution is a turnkey feature of the Cosmos DB Table API. Table storage is limited to one primary copy that can accept writes and can have one read-only replica in a different region.
139 |
140 | Cosmos DB Table API supports manual and automatic failovers. Failover can be initiated from any redundant region at any time.
141 |
142 | Cosmos DB Table API automatically creates secondary indexes without any index management requirements.
143 |
144 | An Azure Cosmos account is topmost in the resource hierarchy. You must have this before you can create an Azure Cosmos DB database instance. You can then create API-specific containers, such as tables, collections, or graphs.
145 |
146 | You create items, the entities for which you are storing data, inside a container. Examples include documents, nodes, edges, or rows.
147 |
148 | Azure Cosmos Account -> Database -> Container -> Item
149 |
150 | Cosmos DB Table API supports multi-region writes and read replicas. You can configure read replicas in a Cosmos DB account to multiple regions, including support to create multi-region writes.
151 |
152 | The type of account to create is specified when you create an Azure Cosmos DB by selecting the API type.
153 |
154 | You can configure an account with multiple databases of the same type, but you must create a separate account for each type of database you want to support.
155 |
156 | Azure Cosmos DB account serverless mode is not supported for all Cosmos DB APIs. It is supported for Cosmos DB Core (SQL) API only.
157 |
158 | You cannot apply the free tier discount when creating a serverless mode account. In addition, the serverless mode does not support multi-master writes or geo-redundancy.
159 |
160 | You can have up to one free tier Azure Cosmos DB account per Azure subscription. You can request a free tier account for any account type.
161 |
162 | **Azure Cosmos explorer** lets you set up temporary or permanent read or read-write access to your database. You can also use Azure Cosmos explorer to run queries, stored procedures, and triggers, and view their results. You can share query results with other users who do not have access to the Azure portal or subscription.
163 |
164 | You should use the **MongoDB API** to store JSON documents. This API is compatible with MongoDB, which is a document database that stores semi-structured data in JSON format. A document usually contains all data for an entity, and each document can have different fields of data.
165 |
166 |
167 |
--------------------------------------------------------------------------------
/data.md:
--------------------------------------------------------------------------------
1 | ## Describe core data concepts (15-20%)
2 | ### Describe types of core data workloads
3 | * describe batch data
4 | * describe streaming data
5 | * describe the difference between batch and streaming data
6 | * describe the characteristics of relational data
7 |
8 | ### Describe data analytics core concepts
9 | * describe data visualization (e.g., visualization, reporting, business intelligence (BI))
10 | * describe basic chart types such as bar charts and pie charts
11 | * describe analytics techniques (e.g., descriptive, diagnostic, predictive, prescriptive, cognitive)
12 | * describe ELT and ETL processing
13 | * describe the concepts of data processing
14 |
15 | Examples of **Batch Processing**:
16 | * Employee payroll processing and generating payroll checks
17 | * Setting inventory stocking levels based on seasonal sales volume
18 |
19 | Data for batch processing is collected over time, often from different data sources, and it is processed as a dataset that includes a range of rows or all rows in the dataset.
20 |
21 | * Batch processing is designed to handle processing of large datasets.
22 |
23 | * There is typically a long latency between data collection and data processing.
24 |
25 | A batch process used to analyze customer activity could include data from different databases and from text documents in different formats. Data gathering often requires extensive transformation, and the data must then be written to a data store before analysis.
26 |
27 | Batch processing must be used if the data is to be subjected to detailed analysis to generate visuals and reports.
28 |
29 |
30 | Examples of **Stream Processing**:
31 | * Reporting the number of users and bandwidth usage for an online game
32 | * Identifying detected manufacturing errors to automatically reject failing parts
33 |
34 | Stream processing is designed for real-time or near real-time data processing, often as a data load process with minimal processing. Data must be able to stream out as quickly as it streams in for processing. Data is either processed as it is generated or in micro-batches of just a few rows, with latency of no more than a few milliseconds.
35 |
36 | Streaming could involve collecting data from multiple Internet of Things (IoT) sensors and writing it to Table storage.
37 |
38 | Stream processing is set up where changes to the data are kept to a minimum to optimize performance.
39 |
40 | Stream processing should be used for data in transaction processing that requires immediate, consistent postings, because latency is a concern. For example, you might insert a time stamp on each incoming entry or make minor formatting changes to the data.
41 |
42 | Examples of streaming data include:
43 | * A financial institution tracks changes in the *stock market* in real time, computes value-at-risk, and automatically rebalances portfolios based on stock price movements.
44 | * An *online gaming* company collects real-time data about player-game interactions, and feeds the data into its gaming platform. It then analyzes the data in real time, offers incentives and dynamic experiences to engage its players.
45 | * A real-estate website that tracks a subset of data from consumers’ mobile devices, and makes *real-time recommendations* of properties to visit based on their geo-location.
46 |
47 | Data with similar content can be processed from multiple sources by both batch and stream processing. One primary difference is that batch processing can involve a wider variety of sources, including on-premises sources, whereas stream processing receives data from streaming sources only, most likely with similar data.
48 |
49 | With both batch and stream processing, the data that is processed can include large quantities of data.
50 |
51 | Differences between batch and streaming data:
52 |
53 | * Data Scope: Batch processing can process all the data in the dataset. Stream processing typically only has access to the most recent data received, or within a rolling time window (the last 30 seconds, for example).
54 |
55 | * Data Size: Batch processing is suitable for handling large datasets efficiently. Stream processing is intended for individual records or micro-batches consisting of a few records.
56 |
57 | * Performance: The latency for batch processing is typically a few hours. Stream processing typically occurs immediately, with latency in the order of seconds or milliseconds. Latency is the time taken for the data to be received and processed.
58 |
59 | * Analysis: You typically use batch processing for performing complex analytics. Stream processing is used for simple response functions, aggregates, or calculations such as rolling averages.
60 |
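The data-scope and latency differences above can be sketched in a few lines of Python. This is a toy illustration, not tied to any Azure service: the batch function sees the whole dataset at once, while the stream class sees only a rolling window of the most recent values.

```python
from collections import deque

def batch_average(readings):
    """Batch: the whole dataset is available, so process it in one pass."""
    return sum(readings) / len(readings)

class RollingAverage:
    """Stream: keep only a rolling window of the most recent values."""
    def __init__(self, window_size):
        self.window = deque(maxlen=window_size)

    def add(self, value):
        # Process each value as it arrives; old values fall out of the window.
        self.window.append(value)
        return sum(self.window) / len(self.window)

# Batch: one result over the full dataset.
assert batch_average([10, 20, 30, 40]) == 25.0

# Stream: one result per incoming value, using only the last two values.
avg = RollingAverage(window_size=2)
assert [avg.add(v) for v in [10, 20, 30, 40]] == [10.0, 15.0, 25.0, 35.0]
```

Note how the stream version produces a result immediately for every arriving value, while the batch version cannot run until all values have been collected.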
61 | The main characteristics of a **Relational Database** are:
62 |
63 | * All data is tabular. Entities are modeled as tables, each instance of an entity is a row in the table, and each property is defined as a column.
64 |
65 | * All rows in the same table have the same set of columns.
66 |
67 | * A table can contain any number of rows.
68 |
69 | * A primary use of relational databases is to handle transaction processing.
70 |
71 | Examples:
72 | * Inventory management
73 | * Order management
74 | * Reporting database
75 | * Accounting
76 |
77 | * Records are frequently created and updated.
78 | * Multiple operations have to be completed in a single transaction.
79 | * Relationships are enforced using database constraints.
80 | * Indexes are used to optimize query performance.
81 | * Data is highly normalized.
82 | * Database schemas are required and enforced.
83 | * Many-to-many relationships between data entities in the database.
84 | * Constraints are defined in the schema and imposed on any data in the database.
85 | * Data requires high integrity. Indexes and relationships need to be maintained accurately.
86 | * Data requires strong consistency. Transactions operate in a way that ensures all data are 100% consistent for all users and processes.
87 | * Size of individual data entries is small to medium-sized.
88 |
89 | * A primary key uniquely identifies each row in a table. No two rows can share the same primary key.
90 | * The primary key is a unique value assigned to a row.
91 | * Even if all of the other columnar information is the same in two rows, the primary key value makes each row unique.
92 |
93 | * A foreign key references rows in another, related table. For each value in the foreign key column, there should be a row with the same value in the corresponding primary key column in the other table.
94 | * When setting up relationships between tables through the use of foreign keys, each foreign key value must have a corresponding value in a primary key.
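A minimal sketch of how primary and foreign keys are enforced, using Python's standard-library sqlite3 module. The Customer and "Order" tables are illustrative names, not from any real schema:

```python
import sqlite3

# In-memory database; SQLite does not enforce foreign keys unless asked.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""CREATE TABLE Customer (
    CustomerId INTEGER PRIMARY KEY,
    Name TEXT NOT NULL)""")
conn.execute("""CREATE TABLE "Order" (
    OrderId INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL REFERENCES Customer(CustomerId))""")

conn.execute("INSERT INTO Customer VALUES (1, 'Contoso')")
conn.execute('INSERT INTO "Order" VALUES (100, 1)')  # valid: customer 1 exists

# A duplicate primary key value is rejected: no two rows can share one.
try:
    conn.execute("INSERT INTO Customer VALUES (1, 'Duplicate')")
    pk_enforced = False
except sqlite3.IntegrityError:
    pk_enforced = True

# A foreign key value with no matching primary key is rejected.
try:
    conn.execute('INSERT INTO "Order" VALUES (101, 99)')  # no customer 99
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```

Both failed inserts raise an integrity error, which is exactly the constraint enforcement described above: the database refuses data that would break key integrity.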
95 |
96 | A relational database restructures the data into a fixed format that is designed to answer specific queries. When data needs to be ingested very quickly, or the query is unknown and unconstrained, a relational database can be less suitable than a non-relational database.
97 |
98 | **Non-relational databases** are highly suitable for the following scenarios:
99 |
100 | * IoT and telematics. These systems typically ingest large amounts of data in frequent bursts of activity. Non-relational databases can store this information very quickly. The data can then be used by analytics services such as Azure Machine Learning, Azure HDInsight, and Microsoft Power BI. Additionally, you can process the data in real-time using Azure Functions that are triggered as data arrives in the database.
101 |
102 | * Retail and marketing. Microsoft uses Cosmos DB for its own ecommerce platforms that run as part of Windows Store and Xbox Live. It's also used in the retail industry for storing catalog data and for event sourcing in order processing pipelines.
103 |
104 | * Gaming. The database tier is a crucial component of gaming applications. Modern games perform graphical processing on mobile/console clients, but rely on the cloud to deliver customized and personalized content like in-game stats, social media integration, and high-score leaderboards. Games often require single-millisecond latencies for reads and writes to provide an engaging in-game experience. A game database needs to be fast and be able to handle massive spikes in request rates during new game launches and feature updates.
105 |
106 | * Web and mobile applications. A non-relational database such as Azure Cosmos DB is commonly used within web and mobile applications, and is well suited for modeling social interactions, integrating with third-party services, and for building rich personalized experiences. The Cosmos DB SDKs (software development kits) can be used to build rich iOS and Android applications using the popular Xamarin framework.
107 |
108 | NoSQL (non-relational) databases generally fall into four categories: key-value stores, document databases, column family databases, and graph databases.
109 |
110 | The focus of a **key-value store** is the ability to *read and write data very quickly*. Search capabilities are secondary. A key-value store is an excellent choice for data ingestion, when a large volume of data arrives as a continual stream and must be stored immediately.
111 |
112 | **Azure Table** storage is an example of a **key-value store**. Cosmos DB also implements a key-value store using the Table API.
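A toy illustration of the key-value trade-off in Python. The device keys and readings are invented; the point is that access by key is a single fast operation while searching by value means scanning everything:

```python
# A minimal in-memory key-value store: reads and writes by key are fast,
# but any search over values must scan every entry.
store = {}

def put(key, value):
    store[key] = value

def get(key):
    return store.get(key)

# Fast ingestion of a continual stream of readings, keyed by device and time.
put("device42:2024-01-01T00:00:00", {"temp": 21.5})
put("device42:2024-01-01T00:00:05", {"temp": 21.7})

# Lookup by key is a single O(1) operation.
assert get("device42:2024-01-01T00:00:05") == {"temp": 21.7}

# Searching by value is a full scan -- search is a secondary capability here.
hot = [k for k, v in store.items() if v["temp"] > 21.6]
assert hot == ["device42:2024-01-01T00:00:05"]
```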
113 |
114 |
115 |
116 |
117 | **Normalization** is the process used to split an entity into multiple tables. This helps to minimize data duplication through the use of related tables. For example, an online order might need to include customer information and information about the items ordered. Rather than putting all of this information in the order, you can have foreign keys pointing to the customer and item detail information in other tables.
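The order example can be sketched with plain Python data structures (the customer and item values are made up). The denormalized form repeats the customer's details on every order; the normalized form stores them once and joins through a key:

```python
# Denormalized: customer details are repeated on every order row.
denormalized = [
    {"order_id": 1, "customer": "Contoso", "city": "Sydney", "item": "Widget"},
    {"order_id": 2, "customer": "Contoso", "city": "Sydney", "item": "Gadget"},
]

# Normalized: the customer is stored once; orders reference it by key.
customers = {1: {"name": "Contoso", "city": "Sydney"}}
orders = [
    {"order_id": 1, "customer_id": 1, "item": "Widget"},
    {"order_id": 2, "customer_id": 1, "item": "Gadget"},
]

# Reconstructing the combined view is a join through the foreign key.
joined = [{**o, **customers[o["customer_id"]]} for o in orders]
assert joined[0]["city"] == "Sydney"
```

If the customer's city changes, the normalized form needs one update instead of one per order, which is the duplication-minimizing benefit described above.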
118 |
119 | Relational databases are commonly used in ecommerce systems, and one of the major use cases for relational databases is Online Transaction Processing (OLTP).
120 |
121 | **OLTP applications** are focused on transaction-oriented tasks that process a very large number of transactions per minute. Relational databases are well suited for OLTP applications because they naturally support insert, update, and delete operations. A relational database can often be tuned to make these operations fast. Also, the nature of SQL makes it easy for users to perform ad-hoc queries over data.
122 |
123 | Examples of OLTP applications that use relational databases are:
124 |
125 | * Banking solutions
126 | * Online retail applications
127 | * Flight reservation systems
128 | * Many online purchasing applications.
129 |
130 | **ETL** stands for *Extract, Transform, and Load*. The raw data is retrieved and transformed before being saved. The extract, transform, and load steps can be performed as a continuous pipeline of operations. It is suitable for systems that only require simple models, with little dependency between items. For example, this type of process is often used for basic data cleaning tasks, deduplicating data, and reformatting the contents of individual fields.
131 |
132 | **ELT** is an abbreviation of *Extract, Load, and Transform*. The process differs from ETL in that the data is stored before being transformed. The data processing engine can take an iterative approach, retrieving and processing the data from storage, before writing the transformed data and models back to storage. ELT is more suitable for constructing complex models that depend on multiple items in the database, often using periodic batch processing.
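The difference between the two orderings can be sketched in Python. This is a minimal illustration with invented data; the transform step does exactly the kind of basic cleaning (trimming, reformatting, deduplicating) mentioned above:

```python
def extract():
    # Raw source rows: inconsistent casing, stray whitespace, a duplicate.
    return ["  Alice ", "BOB", "alice", "Carol"]

def transform(rows):
    # Basic cleaning: trim, normalize case, then deduplicate keeping order.
    cleaned = [r.strip().title() for r in rows]
    return list(dict.fromkeys(cleaned))

# ETL: transform the data in flight and store only the cleaned result.
etl_store = transform(extract())

# ELT: land the raw data first, then transform it from storage later,
# possibly iterating over it several times.
elt_raw_store = extract()
elt_store = transform(elt_raw_store)

assert etl_store == elt_store == ["Alice", "Bob", "Carol"]
```

The results are the same here; the difference is that ELT keeps the raw data available in storage, so a later, more complex model can be built over it without re-extracting from the source.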
133 |
134 | **Azure Data Factory**: A cloud-based data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale.
135 |
136 | Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores.
137 |
138 | You can build complex ETL processes that transform data visually with data flows, or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.
139 |
140 |
141 |
142 |
143 | A **transactional database** must adhere to the **ACID** (Atomicity, Consistency, Isolation, Durability) properties to ensure that the database remains consistent while processing transactions.
144 |
145 | **Atomicity** guarantees that each transaction is treated as a single unit, which *either succeeds completely, or fails completely*. If any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes.
146 |
147 | **Consistency** ensures that a transaction can only take the data in the database from one valid state to another. A consistent database should never lose or create data in a manner that can't be accounted for. In the bank transfer example described earlier, if you *add funds to an account, there must be a corresponding deduction of funds somewhere*, or a record that describes where the funds have come from if they have been received externally. You can't suddenly create (or lose) money.
148 |
149 | **Isolation** ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially. *A concurrent process can't see the data in an inconsistent state* (for example, the funds have been deducted from one account, but not yet credited to another.)
150 |
151 | **Durability** guarantees that once a transaction has been committed, it will remain committed even if there's a system failure such as a power outage or crash.
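Atomicity and consistency can be demonstrated with the standard-library sqlite3 module. This is a sketch of the bank-transfer example, with illustrative table names; a CHECK constraint stands in for a business rule, and a failed transfer rolls back both statements:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Account (
    Id INTEGER PRIMARY KEY,
    Balance INTEGER NOT NULL CHECK (Balance >= 0))""")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Both updates succeed together, or the whole transfer is rolled back."""
    try:
        with conn:  # transaction: commits on success, rolls back on error
            conn.execute("UPDATE Account SET Balance = Balance - ? WHERE Id = ?",
                         (amount, src))
            conn.execute("UPDATE Account SET Balance = Balance + ? WHERE Id = ?",
                         (amount, dst))
        return True
    except sqlite3.IntegrityError:
        return False

assert transfer(conn, 1, 2, 30) is True    # balances are now 70 and 80
assert transfer(conn, 1, 2, 500) is False  # violates the CHECK constraint

# The failed transfer changed nothing: money was neither created nor lost.
balances = dict(conn.execute("SELECT Id, Balance FROM Account"))
assert balances == {1: 70, 2: 80}
```

The failed transfer is the atomicity guarantee in action: the first UPDATE is undone along with the second, so no partially completed state survives, and the total across accounts stays constant (consistency).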
152 |
153 | **Analytical workloads** are typically read-only systems that store vast volumes of historical data or business metrics, such as *sales performance and inventory levels*. Analytical workloads are used for data analysis and decision making. Analytics are generated by aggregating the facts presented by the raw data into summaries, trends, and other kinds of business information.
154 |
155 | Analytics can be based on a snapshot of the data at a given point in time, or a series of snapshots. People who are higher up in the hierarchy of the company usually don't require all the details of every transaction. They want the bigger picture.
156 |
157 | An example of analytical information is a *report on monthly sales*. As the head of sales department, you may not need to see all daily transactions that took place (transactional information), but you definitely would like a monthly sales report to identify trends and to make decisions (analytical information).
158 |
159 | There are three key job roles that deal with data in most organizations:
160 |
161 | * **Database Administrators** manage databases, assigning permissions to users, storing backup copies of data, and restoring data in case of failures.
162 | * **Data Engineers** are vital in working with data, applying data cleaning routines, identifying business rules, and turning data into useful information.
163 | * **Data Analysts** explore and analyze data to create visualizations and charts to enable organizations to make informed decisions.
164 |
165 | Most database management systems provide their own set of tools to assist with database administration.
166 | For example:
167 | * SQL Server Database Administrators use SQL Server Management Studio for most of their day-to-day database maintenance activities.
168 | * pgAdmin is used for PostgreSQL systems.
169 | * MySQL Workbench is used for MySQL.
170 |
171 | There are also a number of *cross-platform database administration tools* available. One example is **Azure Data Studio**.
172 |
173 | **Azure Data Studio** provides a graphical user interface for managing many different database systems. It currently provides connections to on-premises SQL Server databases, Azure SQL Database, PostgreSQL, Azure SQL Data Warehouse, and SQL Server Big Data Clusters, amongst others.
174 |
175 | **SQL Server Management Studio** provides a graphical interface, enabling you to query data, perform general database administration tasks, and generate scripts for automating database maintenance and support operations.
176 |
177 | Azure SQL Database provides database services in Azure. It's similar to SQL Server, except that it runs in the cloud. *You can manage Azure SQL Database using the Azure portal.*
178 |
179 | Typical configuration tasks such as increasing the database size, creating a new database, and deleting an existing database are done using the Azure portal.
180 |
181 |
182 |
183 |
--------------------------------------------------------------------------------
/analytics.md:
--------------------------------------------------------------------------------
1 | ## Describe an analytics workload on Azure (25-30%)
2 | ### Describe analytics workloads
3 | * describe transactional workloads
4 | * describe the difference between a transactional and an analytics workload
5 | * describe the difference between batch and real time
6 | * describe data warehousing workloads
7 | * determine when a data warehouse solution is needed
8 |
9 | ### Describe the components of a modern data warehouse
10 | * describe Azure data services for modern data warehousing such as Azure Data Lake, Azure Synapse Analytics, Azure Databricks, and Azure HDInsight
11 | * describe modern data warehousing architecture and workload

12 | ### Describe data ingestion and processing on Azure
13 | * describe common practices for data loading
14 | * describe the components of Azure Data Factory (e.g., pipeline, activities, etc.)
15 | * describe data processing options (e.g., Azure HDInsight, Azure Databricks, Azure Synapse Analytics, Azure Data Factory)
16 |
17 | ### Describe data visualization in Microsoft Power BI
18 | * describe the role of paginated reporting
19 | * describe the role of interactive reports
20 | * describe the role of dashboards
21 | * describe the workflow in Power BI
22 |
23 |
24 |
25 |
26 |
27 | Two common use cases for **Data Warehousing** are:
28 | * You want to generate reports from historical data without impacting transactional processing.
29 | * You want to provide a platform for data mining.
30 |
31 | Data warehousing lets you consolidate data from multiple sources for analysis and reporting. Data stores are optimized for read operations with few, if any, writes performed on the data. There are typically no locking requirements in a data warehouse.
32 |
33 | **OLTP** systems are used to record day-to-day business activities and interactions as they occur. This includes activities such as orders taken, services performed, and payments received or made.
34 | * In an OLTP system, data is highly normalized with the schema strongly enforced on write.
35 | * OLTP systems are usually structured around a relational data store supporting transactional applications.
36 | * An OLTP workload has heavy write requirements with minimal (in comparison) read requirements.
37 | * In an OLTP environment, changes made are rolled back automatically if a transaction is not completed so that no transaction is left in a partially completed state. This is known as atomicity and is a requirement for OLTP.
38 | * An application to process hundreds of user purchases per minute, including updates to inventory on hand, is an example of an OLTP workload.
39 | * OLTP applications are optimized for write operations and entering and updating data across multiple, related tables. Partial changes made to data are rolled back automatically if a transaction is not completed, so that no transaction is left in a partially completed state.
40 | * An application to support warehouse sales and shipping for physical warehouses and multiple international locations is an example of an OLTP application.
41 | * OLTP transactions can be distributed geographically and supported by one or more relational databases. This scenario requires a solution that supports consistent and reliable data writes.

42 | OLTP is used for transactional workloads, such as:
43 | * Performing e-commerce transactions
44 | * Tracking inventory management systems
45 |
46 |
47 |
48 |
49 | **OLAP**
50 | An application to perform data mining on historic data collected from multiple relational and non-relational sources is an example of an OLAP workload.
51 | * OLAP applications often manipulate data based on complex queries. Data mining queries are complex multidimensional queries that are designed to discover insights from the data that are not immediately apparent.
52 | * An application to provide loosely normalized data to support report generation is an example of an OLAP workload. Companies will often maintain live data for transactional processing and a separate copy of historic data for analysis and report generation. This prevents analytic processing from interfering with the performance during transactional processing.
53 | * Online analytical processing (OLAP) systems are designed to perform complex analysis and provide business intelligence.

54 | OLAP is used for analytical workloads, such as:
55 | * Generating complex ad-hoc reports that include several aggregations
56 | * Performing big data analysis on NoSQL databases

In contrast, online transaction processing (OLTP) systems are designed to perform business transactions as they occur.
57 |
58 |
59 |
60 |
61 | **Star schema** is a mature modeling approach widely adopted by relational data warehouses. It requires modelers to classify their model tables as either dimension or fact.
62 |
63 | Dimension tables describe business entities—the things you model. Entities can include products, people, places, and concepts including time itself. The most consistent table you'll find in a star schema is a date dimension table. A dimension table contains a key column (or columns) that acts as a unique identifier, and descriptive columns.
64 |
65 | Fact tables store observations or events, and can be sales orders, stock balances, exchange rates, temperatures, etc. A fact table contains dimension key columns that relate to dimension tables, and numeric measure columns. The dimension key columns determine the dimensionality of a fact table, while the dimension key values determine the granularity of a fact table. For example, consider a fact table designed to store sale targets that has two dimension key columns Date and ProductKey. It's easy to understand that the table has two dimensions. The granularity, however, can't be determined without considering the dimension key values. In this example, consider that the values stored in the Date column are the first day of each month. In this case, the granularity is at month-product level.
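The sale-targets example can be sketched with plain Python data structures. This is an illustrative model only: one small dimension table and a fact table at month-product granularity, with Date holding the first day of each month as described above:

```python
# Dimension table: describes the business entity (small, descriptive).
dim_product = {1: "Bike", 2: "Helmet"}

# Fact table at month-product granularity: each row carries the two
# dimension keys (Date, ProductKey) plus a numeric measure.
fact_sales_target = [
    {"Date": "2024-01-01", "ProductKey": 1, "TargetAmount": 1000},
    {"Date": "2024-01-01", "ProductKey": 2, "TargetAmount": 200},
    {"Date": "2024-02-01", "ProductKey": 1, "TargetAmount": 1200},
]

# Aggregating across the product dimension gives a per-month total.
totals = {}
for row in fact_sales_target:
    totals[row["Date"]] = totals.get(row["Date"], 0) + row["TargetAmount"]
assert totals == {"2024-01-01": 1200, "2024-02-01": 1200}

# Filtering the fact table through the dimension is a join by key.
bike_rows = [r for r in fact_sales_target
             if dim_product[r["ProductKey"]] == "Bike"]
assert len(bike_rows) == 2
```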
66 |
67 | Generally, dimension tables contain a relatively small number of rows. Fact tables, on the other hand, can contain a very large number of rows and continue to grow over time.
68 |
69 | A **snowflake** dimension is a set of normalized tables for a single business entity. For example, Adventure Works classifies products by category and subcategory. Categories are assigned to subcategories, and products are in turn assigned to subcategories. In the Adventure Works relational data warehouse, the product dimension is normalized and stored in three related tables: DimProductCategory, DimProductSubcategory, and DimProduct.
70 |
71 | Processing data as it arrives is called streaming. Buffering and processing the data in groups is called batch processing.
72 |
73 | An example of **batch processing** is the way that votes are typically counted in elections. The votes are not entered when they are cast, but are all entered together at one time in a batch.
74 |
75 | Advantages of batch processing include:
76 |
77 | * Large volumes of data can be processed at a convenient time.
78 | * It can be scheduled to run at a time when computers or systems might otherwise be idle, such as overnight or during off-peak hours.

79 | Disadvantages of batch processing include:
80 |
81 | * The time delay between ingesting the data and getting the results.
82 | * All of a batch job's input data must be ready before a batch can be processed. This means the data must be carefully checked. Problems with data, errors, and program crashes that occur during batch jobs bring the whole process to a halt, and the input data must be checked again before the job can be rerun. Even minor data errors, such as typographical errors in dates, can prevent a batch job from running.

83 | An example of an effective use of batch processing would be a connection to a mainframe system, where vast amounts of data need to be transferred into a data analysis system and the data is not real-time. An example of ineffective batch processing would be transferring small amounts of real-time data, such as a financial stock ticker.
84 |
85 | The most appropriate use case for **Azure Synapse Analytics** is to perform very complex queries and aggregations on a large amount of relational data.
86 | * You can provision Synapse SQL pools to quickly execute complex queries across multiple compute nodes thanks to the Synapse SQL massively parallel processing (MPP) architecture.
87 | * Azure Synapse Analytics models and serves data.
88 | * You can load relational data ingested from Azure Data Factory into Azure Synapse Analytics using a Synapse SQL pool, and also read unstructured data stored in Azure Data Lake Storage using PolyBase.
89 | * Combining both relational and unstructured data, you can perform complex analytics and serve data for later stages.
90 |
91 | The most appropriate use case for **Power BI** is to create dashboards and data visualizations from tabular data.
92 | Power BI visualizes data. You can build interactive reports and dashboards with Power BI, allowing business users to analyze this data and deliver insights throughout your organization.
93 |
94 | The most appropriate use case for **Azure Data Lake Storage** is to store massive amounts of unstructured data in a hierarchical structure.
95 | * Azure Data Lake Storage stores raw data. You can store raw, unstructured data, such as text files, logs, and images, to be processed quickly at later stages.
96 | * Azure Data Lake Storage Gen2 is built on top of Azure Blob storage, combining the features of the previous generation of Azure Data Lake Storage with Azure Blob storage.
97 | * Azure Data Lake Storage is capable of storing a large amount of data in a cost-effective way.
98 | * Azure Data Lake Storage can store large amounts of data, like hundreds of terabytes and more, and you only pay for what you use.
99 | * You can reduce the storage cost even more by using features such as storage lifecycle to archive or move data that is not used frequently to cheaper storage tiers.
100 | * Azure Data Lake Storage enables hierarchical namespace compatible with Hadoop Distributed File System (HDFS).
101 | * Azure Data Lake Storage provides a layer to access Azure Blob Storage data as an HDFS storage, including support to organize files in directories and subdirectories, allowing you to quickly examine large quantities of data.
102 |
103 | The most appropriate use case for Azure SQL Database is to serve as data storage for online transactional processing (OLTP) workloads.
104 |
105 | **Azure HDInsight** is a big data processing service used to provision and manage a cluster of open-source analytics solutions such as Apache Spark, Hadoop, and Kafka.
106 |
107 | **Azure Databricks** is a complete platform for big data processing, streaming, and machine learning optimized for the Microsoft Azure cloud services platform and built on top of Apache Spark.
108 | * Azure Databricks can process batch and streaming processing workloads. You can also perform real-time data processing and event streaming from Azure Event Hubs with Azure Databricks.
109 | * Azure Databricks provides an interactive workspace for exploration and data visualization.
110 | * Azure Databricks provides a workspace for collaboration between data scientists, data engineers, and business analysts.
111 | * You can run notebooks in R, Python, Scala, or SQL, and interact with the data very quickly.
112 |
113 | **Azure Data Factory** ingests data from the source. You can ingest data from both relational data and non-structured data from multiple sources with Azure Data Factory.
114 |
115 |
116 | **Azure Analysis Services** is a service used to build multidimensional or tabular models used by online analytical processing (OLAP) queries. You can combine data from multiple sources, like Azure Synapse Analytics, Azure Data Lake Store, Azure Cosmos DB, and others to build the tabular models.
117 |
118 | You should use PolyBase. PolyBase is a feature present in SQL Server and Azure Synapse Analytics capable of reading data from Hadoop Distributed File System (HDFS)-compatible storage by using T-SQL queries. You can create an external table with an external data source to map Parquet files stored in Azure Data Lake Storage. You can query this external table with T-SQL and join it to other tables.
119 |
120 | While you can read data from Azure Data Lake Storage parquet files, you cannot use T-SQL queries to read this data in Azure Data Factory.
121 |
122 | Azure Databricks and Azure HDInsight: These services are capable of reading data from Azure Data Lake Storage by using notebooks.
123 |
124 | Azure Data Factory can load data from Azure Blob Storage, Azure Data Lake Storage, Azure Cosmos DB, and Azure Synapse Analytics. You can even load data from services outside Azure, such as Amazon S3.
125 |
126 | Azure Data Factory can export data to Azure Data Lake Storage, Azure Synapse Analytics, and many other destinations, such as Azure SQL Database, Azure Blob Storage, and Azure Cosmos DB.
127 |
128 | Azure Data Factory can run SQL Server Integration Services (SSIS) packages using the Execute SSIS Package activity. To use the Execute SSIS Package activity, you need to configure the Azure-SSIS integration runtime (IR).
130 |
131 | You use **Power BI Report Builder** to author and publish paginated reports. You create a paginated report by creating a report definition that specifies what data to retrieve, where to get it, and how to display it. The report is generated by the report processor when you run the report. You can preview the report in Report Builder before publishing it to the Power BI service.
132 |
133 | A **Power BI dashboard** is a single page on which your visualizations are posted as tiles to display information. A dashboard allows you to show important business metrics at a glance.
134 |
135 | * Each report page displayed on a dashboard is based on a single dataset.
136 |
137 | * An interactive report can be created from one dataset only. The reports can be created from the same dataset or a different dataset, but each report will be based on one dataset.
138 |
139 | * Report pages are pinned to the dashboard from interactive reports.
140 |
141 | * A dashboard can include pages pinned from multiple reports.
142 |
143 | * Dashboard content is not limited to cloud data only. Content can include cloud and on-premises data. This gives you an easy way to compare data from various sources.
144 |
145 | * One dashboard can be identified as your featured dashboard. After you declare a featured dashboard, this is the dashboard that will be displayed initially when you open the Power BI service. You can change the featured dashboard at any time.
146 |
147 |
148 | The **Power BI service** is a set of analysis and display tools that lets you create visuals based on your data. This includes Power BI Desktop and Power BI dashboards.
149 |
150 | **Power BI Desktop** is used to create interactive reports, typically for publishing to a Power BI dashboard.
151 |
152 | You can include tiles as visualizations based on underlying data, along with standalone tiles, which can include:
153 | * Text boxes
154 | * Images
155 | * Videos
156 | * Streaming data
157 | * Web content
158 |
159 | The primary difference between these and other types of visualizations is that they do not link you to additional data or information. Tiles with underlying data include:
160 | * Interactive reports
161 | * Datasets
162 | * Dashboards
163 | * Excel worksheets
164 | * SQL Server Reporting Services (SSRS)
165 | This is a partial list of the types of tiles supported.
166 |
167 | **Power BI apps** are used to create visualizations.
168 |
169 | An app is a collection of ready-made visuals pre-arranged on reports and dashboards. Power BI service includes several apps already defined for you. This includes apps available for various online services.
170 |
171 | A report based on a single dataset, generated using Power BI Desktop as a collection of one or more pages of visuals, is an **interactive report**. An interactive report will have one or more pages of visuals. Once the report is published, the report pages can be used as tiles on a dashboard.
172 |
173 | A unique combination of data pulled from various sources and used to create a visualization is referred to as a **dataset**. A dataset can include data from one or more databases, spreadsheets, flat files, and other sources, including both cloud-based and on-premises sources. Each unique combination is considered a different dataset.
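To make the dataset idea above concrete, here is a small, illustrative sketch that pulls rows from two different sources (an in-memory SQLite database and a CSV text) and combines them into one collection. All table names, fields, and values are made up for illustration.

```python
# Illustrative sketch of a "dataset": rows from two different sources combined
# into one collection used for visualization. Table/field names are invented.
import csv
import io
import sqlite3

# Source 1: a relational database (in-memory SQLite stands in for any DB)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 100.0), ("West", 250.0)])
db_rows = [{"region": r, "amount": a}
           for r, a in conn.execute("SELECT region, amount FROM sales")]

# Source 2: a flat file (CSV) with the same logical shape
csv_text = "region,amount\nNorth,75.0\nSouth,120.0\n"
csv_rows = [{"region": row["region"], "amount": float(row["amount"])}
            for row in csv.DictReader(io.StringIO(csv_text))]

# The combined rows form one dataset; a different mix of sources
# would be considered a different dataset.
dataset = db_rows + csv_rows
print(len(dataset))  # → 4
```

The same pattern extends to spreadsheets, cloud services, and on-premises sources: each unique combination of sources and fields is its own dataset.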
174 |
175 | **A tile** is a rectangular box that contains a single visual for use with a report or dashboard that supports user interaction. The visual can be custom content, content from a pre-defined app, or standalone tile content.
176 |
177 | **A paginated report** is created in Power BI Report Builder from a report definition that specifies what data to retrieve, where to get it, and how to display it.
178 |
179 | **A dashboard** is a single-page canvas to which you can pin tiles containing visualizations.
180 |
181 | You should use **Power BI service** to create an app workspace and share its dashboard. Power BI service allows you to create an app workspace and share reports and dashboards. You can create a dashboard from reports in Power BI service.
182 |
183 | Power BI Desktop allows you to create reports and publish them to an app workspace in Power BI service.
184 |
185 | You can create dashboards from reports.
186 |
187 | You only need Power BI service to create a workspace and share its dashboard.
188 |
189 | Power BI mobile app allows you to view reports and dashboards that are shared with you.
190 |
191 | You should create a report on Power BI Desktop. In a common workflow, you begin by connecting to data sources and building a report in Power BI Desktop.
192 |
193 | You can also create reports on Power BI service, but with limited access to data sources.
194 |
195 | Then, you should share a report on Power BI service. You can publish and share reports on a Power BI service workspace to make them available to end users.
196 |
197 | Finally, you should view and interact with reports on Power BI mobile. After a report is shared on Power BI service, you can view and interact with this report using Power BI mobile. The Power BI service itself only supports viewing and interacting with reports for end users with desktop access.
198 |
199 | Power BI mobile apps can only be used to view and interact with reports.
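The publish step in the workflow above can also be automated through the Power BI REST API's Imports endpoint, which pushes a .pbix file into a workspace. The sketch below only constructs the request URL and headers; the workspace ID and token are placeholders, and a real call would need an Azure AD access token with Power BI scopes.

```python
# Hedged sketch of publishing a report programmatically via the Power BI
# REST API Imports endpoint. The workspace ID and token are placeholders.
def build_import_request(workspace_id: str, dataset_display_name: str):
    """Return the URL and headers for a Power BI report import call."""
    url = (f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}"
           f"/imports?datasetDisplayName={dataset_display_name}")
    headers = {"Authorization": "Bearer <access-token>"}  # placeholder token
    return url, headers

url, headers = build_import_request(
    "00000000-0000-0000-0000-000000000000",  # placeholder workspace ID
    "SalesReport",                           # display name for the dataset
)
# A real upload would then POST the .pbix file to this URL, e.g. with
# requests.post(url, headers=headers, files={...}).
```

In the common workflow, though, publishing is done interactively from Power BI Desktop; the API route matters mainly for automated deployments.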
200 |
201 | You can create reports and dashboards in the Power BI service. Although you can create basic reports and dashboards in the Power BI service, it is more common to create reports in Power BI Desktop for a complete design experience and for access to more data sources.
202 |
203 | You can share and distribute reports in the Power BI service. You can create workspaces in the Power BI service to collaborate and share your reports with other team members and your company.
204 |
205 | You should use Power BI Desktop to design data modeling, like creating custom columns and managing model relationships.
206 |
207 | In Power BI, paginated reports are a type of report designed to be printed or shared, formatting the information to fit well on a page, even if the data spans multiple pages. These reports are created with a standalone tool named Power BI Report Builder and are based on the standard report format in SQL Server Reporting Services.
208 |
209 | Reports are a collection of visuals from a dataset with one or more pages. A report groups together a set of visualizations created from a single dataset and organizes these visualizations on one or more pages.
210 |
211 | Dashboards are a single-page collection of visuals. You can create a dashboard from one or more reports, consolidating the most relevant visuals in a single-page view.
212 |
213 | Visualizations are a visual representation of your data, like charts, maps, and other visual components.
214 |
215 | Visualizations are also called visuals.
216 |
217 | Datasets are a collection of data used to create visualizations. A dataset can combine data from different sources, such as database fields, Excel tables, and many other sources supported by multiple data connectors.
218 |
219 | Reports are a collection of visuals from a dataset with one or more pages. A report groups together a set of visualizations created from a single dataset and organizes these visualizations into one or more pages.
220 |
221 | Dashboards are a single-page collection of visuals. You can create a dashboard from one or more reports, consolidating the most relevant visuals in a single-page view.
222 |
223 | Tiles are a single visualization on a report or a dashboard that holds an individual visual. You can arrange or resize tiles while you are designing your reports or dashboards on the canvas.
224 |
225 | You can display visualizations from multiple datasets on a dashboard. You can include visualizations from one or more reports, where each report can use a different dataset. If you are designing a report, you can only display visualizations from a single dataset.
226 |
227 | You cannot filter and slice the data shown in a dashboard. A dashboard does not support filters or slicers. You can apply filters in a report tile to filter the data shown in the dashboard. However, you cannot apply filters and slicers directly in a dashboard. Also, if the whole report page were pinned as a live tile,
228 | all filters and slicers would appear on the dashboard just as they do on a report page.
229 |
230 | You can export the underlying data used to build a tile to an Excel file. You can export the data used to build a given tile on your dashboard. However, you cannot export dataset tables, fields, and values directly from a dashboard.
231 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # DP-900 Microsoft Azure Data Fundamentals
2 |
3 | ## Describe core data concepts (15-20%)
4 | ### Describe types of core data workloads
5 | * describe batch data
6 | * describe streaming data
7 | * describe the difference between batch and streaming data
8 | * describe the characteristics of relational data
9 |
10 | ### Describe data analytics core concepts
11 | * describe data visualization (e.g., visualization, reporting, business intelligence (BI))
12 | * describe basic chart types such as bar charts and pie charts
13 | * describe analytics techniques (e.g., descriptive, diagnostic, predictive, prescriptive, cognitive)
14 | * describe ELT and ETL processing
15 | * describe the concepts of data processing
16 |
17 | ## Describe how to work with relational data on Azure (25-30%)
18 | ### Describe relational data workloads
19 | * identify the right data offering for a relational workload
20 | * describe relational data structures (e.g., tables, index, views)
21 |
22 | ### Describe relational Azure data services
23 | * describe and compare PaaS, IaaS, and SaaS solutions
24 | * describe Azure SQL database services including Azure SQL Database, Azure SQL Managed Instance, and SQL Server on Azure Virtual Machine
25 |
26 |
27 | * describe Azure Synapse Analytics
28 | * describe Azure Database for PostgreSQL, Azure Database for MariaDB, and Azure Database for MySQL
29 |
30 | ### Identify basic management tasks for relational data
31 | * describe provisioning and deployment of relational data services
32 | * describe method for deployment including the Azure portal, Azure Resource Manager templates, Azure PowerShell, and the Azure command-line interface (CLI)
33 | * identify data security components (e.g., firewall, authentication)
34 | * identify basic connectivity issues (e.g., accessing from on-premises, access with Azure VNets, access from Internet, authentication, firewalls)
35 | * identify query tools (e.g., Azure Data Studio, SQL Server Management Studio, sqlcmd utility, etc.)
36 |
37 | ### Describe query techniques for data using SQL language
38 | * compare Data Definition Language (DDL) versus Data Manipulation Language (DML)
39 | * query relational data in Azure SQL Database, Azure Database for PostgreSQL, and Azure Database for MySQL
40 |
41 | ## Describe how to work with non-relational data on Azure (25-30%)
42 | ### Describe non-relational data workloads
43 | * describe the characteristics of non-relational data
44 | * describe the types of non-relational and NoSQL data
45 | * recommend the correct data store
46 | * determine when to use non-relational data
47 |
48 | ### Describe non-relational data offerings on Azure
49 | * identify Azure data services for non-relational workloads
50 | * describe Azure Cosmos DB APIs
51 | * describe Azure Table storage
52 | * describe Azure Blob storage
53 | * describe Azure File storage
54 |
55 | ### Identify basic management tasks for non-relational data
56 | * describe provisioning and deployment of non-relational data services
57 | * describe method for deployment including the Azure portal, Azure Resource Manager templates, Azure PowerShell, and the Azure command-line interface (CLI)
58 | * identify data security components (e.g., firewall, authentication, encryption)
59 | * identify basic connectivity issues (e.g., accessing from on-premises, access with Azure VNets, access from Internet, authentication, firewalls)
60 | * identify management tools for non-relational data
61 |
62 | ## Describe an analytics workload on Azure (25-30%)
63 | ### Describe analytics workloads
64 | * describe transactional workloads
65 | * describe the difference between a transactional and an analytics workload
66 | * describe the difference between batch and real time
67 | * describe data warehousing workloads
68 | * determine when a data warehouse solution is needed
69 |
70 | ### Describe the components of a modern data warehouse
71 | * describe Azure data services for modern data warehousing such as Azure Data Lake, Azure Synapse Analytics, Azure Databricks, and Azure HDInsight
72 | * describe modern data warehousing architecture and workload
73 | ### Describe data ingestion and processing on Azure
74 | * describe common practices for data loading
75 | * describe the components of Azure Data Factory (e.g., pipeline, activities, etc.)
76 | * describe data processing options (e.g., Azure HDInsight, Azure Databricks, Azure Synapse Analytics, Azure Data Factory)
77 |
78 | ### Describe data visualization in Microsoft Power BI
79 | * describe the role of paginated reporting
80 | * describe the role of interactive reports
81 | * describe the role of dashboards
82 | * describe the workflow in Power BI
83 |
84 | # Learning Path:
85 |
86 | | Course | Length | Notes |
87 | |-----|-----|----------|
88 | | [Azure Data Fundamentals: Explore core data concepts](https://docs.microsoft.com/en-us/learn/paths/azure-data-fundamentals-explore-core-data-concepts/) | 1 hr 39 min | identify and describe core data concepts such as relational, non-relational, big data, and analytics, and explore how this technology is implemented with Microsoft Azure. You will explore the roles, tasks, and responsibilities in the world of data |
89 | | [Azure Data Fundamentals: Explore relational data in Azure](https://docs.microsoft.com/en-us/learn/paths/azure-data-fundamentals-explore-relational-data/) | 1 hr 27 min | explore relational data offerings, provisioning and deploying relational databases, and querying relational data through cloud data solutions with Microsoft Azure |
90 | | [Azure Data Fundamentals: Explore non-relational data in Azure](https://docs.microsoft.com/en-us/learn/paths/azure-data-fundamentals-explore-non-relational-data/) | 2 hr 24 min | explore non-relational data offerings, provisioning and deploying non-relational databases, and non-relational data stores with Microsoft Azure |
91 | | [Azure Data Fundamentals: Explore modern data warehouse analytics in Azure](https://docs.microsoft.com/en-us/learn/paths/azure-data-fundamentals-explore-data-warehouse-analytics/)| 1 hr 51 min | explore the processing options available for building data analytics solutions in Azure. You will explore Azure Synapse Analytics, Azure Databricks, and Azure HDInsight |
92 | | [A Guide to Cloud - Summary Learning Path Videos](https://www.youtube.com/playlist?list=PLhLKc18P9YODENOj4F2nHbNXeYwY1zYGb) | 20 videos | If you complete the Learning Path and then watch these videos, you should be confident to pass the Data Fundamentals Exam |
93 |
94 |
95 | ## Describe core data concepts (15-20%):
96 |
97 |
98 |
99 |
100 | [My Data Workload Notes](https://github.com/msandfor/DP-900/blob/main/data.md)
101 |
102 | | Reference | Objective | Item |
103 | |-----|-----|-----|
104 | | [Describe the difference between batch and streaming data](https://docs.microsoft.com/en-us/learn/modules/explore-core-data-concepts/4-describe-difference) | types of core data workloads | describe the difference between batch and streaming data |
105 | | [Choosing a batch processing technology in Azure](https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing) | types of core data workloads | describe the difference between batch and streaming data |
106 | | [Choosing a stream processing technology in Azure](https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/stream-processing) | types of core data workloads | describe the difference between batch and streaming data |
107 | | [Real Time vs Batch Processing vs Stream Processing](https://www.bmc.com/blogs/batch-processing-stream-processing-real-time/) | types of core data workloads | describe the difference between batch and streaming data |
108 | | [Big Data Battle : Batch Processing vs Stream Processing](https://medium.com/@gowthamy/big-data-battle-batch-processing-vs-stream-processing-5d94600d8103) | types of core data workloads | describe the difference between batch and streaming data |
109 | | [What Is Data Consistency?](https://www.easytechjunkie.com/what-is-data-consistency.htm) | types of core data workloads | describe the difference between batch and streaming data |
110 | | [Explore the characteristics of relational data](https://docs.microsoft.com/en-us/learn/modules/describe-concepts-of-relational-data/2-explore-characteristics) | types of core data workloads | describe the characteristics of relational data |
111 | | [Relational vs. NoSQL data](https://docs.microsoft.com/en-us/dotnet/architecture/cloud-native/relational-vs-nosql-data) | types of core data workloads | describe the characteristics of relational data |
112 | | [Relational Data Model](https://binaryterms.com/relational-data-model.html) | types of core data workloads | describe the characteristics of relational data |
113 | | [Big data architecture style](https://docs.microsoft.com/en-us/azure/architecture/guide/architecture-styles/big-data) | types of core data workloads | describe the characteristics of relational data |
114 | | [Identify types of data and data storage](https://docs.microsoft.com/en-us/learn/modules/explore-core-data-concepts/3-identify-types-storage) | types of core data workloads | describe the characteristics of relational data |
115 | | [Description of the database normalization basics](https://docs.microsoft.com/en-us/office/troubleshoot/access/database-normalization-description) | types of core data workloads | describe the characteristics of relational data |
116 | | [Clustered and Nonclustered Indexes Described](https://docs.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-ver15) | types of core data workloads | describe the characteristics of relational data |
117 | | [Explore the characteristics of relational data](https://docs.microsoft.com/en-us/learn/modules/describe-concepts-of-relational-data/2-explore-characteristics) | types of core data workloads | describe the characteristics of relational data |
118 | | [Understand data store models](https://docs.microsoft.com/en-us/azure/architecture/guide/technology-choices/data-store-overview) | types of core data workloads | describe the characteristics of relational data |
119 | | [Databases](https://docs.microsoft.com/en-us/sql/relational-databases/databases/databases?view=sql-server-ver15) | types of core data workloads | describe the characteristics of relational data |
120 | | [Explore data analytics](https://docs.microsoft.com/en-us/learn/modules/explore-concepts-of-data-analytics/4-explore) | Describe data analytics core concepts | describe analytics techniques |
121 | | [Describe, diagnose, and predict with IoT Analytics](https://azure.microsoft.com/en-us/blog/answering-whats-happening-whys-happening-and-what-will-happen-with-iot-analytics/) | Describe data analytics core concepts | describe analytics techniques |
122 | | [Descriptive, predictive, and prescriptive analytics: How are they different?](https://www.zdnet.com/article/descriptive-predictive-and-prescriptive-analytics-how-are-they-different/) | Describe data analytics core concepts | describe analytics techniques |
123 | | [Explore data visualization](https://docs.microsoft.com/en-us/learn/modules/explore-concepts-of-data-analytics/3-explore-data-visualization) | Describe data analytics core concepts | describe data visualization |
124 | | [Describe data ingestion and processing](https://docs.microsoft.com/en-us/learn/modules/explore-concepts-of-data-analytics/2-describe-data-ingestion-process) | Describe data analytics core concepts | describe the concepts of data processing |
125 | | [Extract, transform, and load (ETL)](https://docs.microsoft.com/en-us/azure/architecture/data-guide/relational-data/etl) | Describe data analytics core concepts | describe the concepts of data processing |
126 | | [SQL Server Integration Services](https://docs.microsoft.com/en-us/sql/integration-services/sql-server-integration-services?view=sql-server-ver15) | | |
127 | | [SSIS and Data Sources](https://social.technet.microsoft.com/wiki/contents/articles/1947.ssis-and-data-sources.aspx) | | |
128 | | [What is Azure Data Factory?](https://docs.microsoft.com/en-us/azure/data-factory/introduction) | | |
129 | | [Copy activity in Azure Data Factory](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-overview) | | |
130 | | [Find the analytics product you need](https://azure.microsoft.com/en-us/product-categories/analytics/) | | |
131 | | [Load data into Azure Data Lake Storage Gen2 with Azure Data Factory](https://docs.microsoft.com/en-us/azure/data-factory/load-azure-data-lake-storage-gen2) | | |
132 | | [Introduction to Azure Data Lake Storage Gen2](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction) | | |
133 | | [What is Azure SQL Database?](https://docs.microsoft.com/en-us/azure/azure-sql/database/sql-database-paas-overview) | | |
134 | | [What is dedicated SQL pool (formerly SQL DW) in Azure Synapse Analytics?](https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-overview-what-is) | | |
135 | | [FOUR TYPES OF BUSINESS ANALYTICS TO KNOW](https://www.analyticsinsight.net/four-types-of-business-analytics-to-know/) | | |
136 | | [Explore data analytics](https://docs.microsoft.com/en-us/learn/modules/explore-concepts-of-data-analytics/4-explore) | | |
137 | | [Explore data visualization](https://docs.microsoft.com/en-us/learn/modules/explore-concepts-of-data-analytics/3-explore-data-visualization) | | |
138 | | [Visualization types in Power BI](https://docs.microsoft.com/en-us/power-bi/visuals/power-bi-visualization-types-for-reports-and-q-and-a) | | |
139 | | [Tips and tricks for creating reports in Power BI Desktop](https://docs.microsoft.com/en-us/power-bi/create-reports/desktop-tips-and-tricks-for-creating-reports) | | |
140 | | [Visual types in Power BI](https://docs.microsoft.com/en-us/power-bi/consumer/end-user-visual-type) | | |
141 | | [Features of the key influencers visual](https://docs.microsoft.com/en-us/power-bi/visuals/power-bi-visualization-influencers) | | |
142 | | [Scatter charts, bubble charts, and dot plot charts in Power BI](https://docs.microsoft.com/en-us/power-bi/visuals/power-bi-visualization-scatter) | | |
143 | | [Overview of data analysis](https://docs.microsoft.com/en-us/learn/modules/data-analytics-microsoft/2-data-analysis) | | |
144 |
145 |
146 |
147 |
148 |
149 |
150 |
151 |
152 |
153 |
154 |
155 |
156 |
157 |
158 |
159 | ## Describe how to work with relational data on Azure (25-30%)
160 |
161 | [My Relational Data Workload Notes](https://github.com/msandfor/DP-900/blob/main/relationalData.md)
162 |
163 | | Reference | Objective | Item |
164 | |-----|-----|-----|
165 | | [Clustered and Nonclustered Indexes Described](https://docs.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-ver15) | | |
166 | | [Primary and Foreign Key Constraints](https://docs.microsoft.com/en-us/sql/relational-databases/tables/primary-and-foreign-key-constraints?view=sql-server-ver15&viewFallbackFrom=sql-server-ver1) | | |
167 | | [Views](https://docs.microsoft.com/en-us/sql/relational-databases/views/views?view=sql-server-ver15) | | |
168 | | [Understand data store models](https://docs.microsoft.com/en-us/azure/architecture/guide/technology-choices/data-store-overview) | | |
169 | | [What is Azure Table storage ?](https://docs.microsoft.com/en-us/azure/storage/tables/table-storage-overview) | | |
170 | | [Explore relational data structures](https://docs.microsoft.com/en-us/learn/modules/describe-concepts-of-relational-data/3-explore-structures) | | |
171 | | [Heaps (Tables without Clustered Indexes)](https://docs.microsoft.com/en-us/sql/relational-databases/indexes/heaps-tables-without-clustered-indexes?view=sql-server-ver15) | | |
172 | | [Explore the characteristics of relational data](https://docs.microsoft.com/en-us/learn/modules/describe-concepts-of-relational-data/2-explore-characteristics) | | |
173 | | [Identify types of data and data storage](https://docs.microsoft.com/en-us/learn/modules/explore-core-data-concepts/3-identify-types-storage) | | |
174 | | [Relational vs. NoSQL data](https://docs.microsoft.com/en-us/dotnet/architecture/cloud-native/relational-vs-nosql-data) | | |
175 | | [Describe types of non-relational and NoSQL databases](https://docs.microsoft.com/en-us/learn/modules/explore-concepts-of-non-relational-data/4-describe-types-nosql-databases) | | |
176 | | [Understand data store models](https://docs.microsoft.com/en-us/azure/architecture/guide/technology-choices/data-store-overview) | | |
177 | | [Azure SQL Database](https://docs.microsoft.com/en-us/learn/modules/explore-relational-data-offerings/4-azure-sql-database) | | |
178 | | [Explore Azure Synapse Analytics](https://docs.microsoft.com/en-us/learn/modules/explore-data-storage-processing-azure/3-explore-azure-synapse-analytics) | | |
179 | | [Explore Azure Cosmos DB](https://docs.microsoft.com/en-us/learn/modules/explore-non-relational-data-offerings-azure/5-explore-azure-cosmos-database) | | |
180 | | [What is SQL Server on Azure Virtual Machines (Windows)](https://docs.microsoft.com/en-us/azure/azure-sql/virtual-machines/windows/sql-server-on-azure-vm-iaas-what-is-overview) | | |
181 | | [What is Azure SQL Database?](https://docs.microsoft.com/en-us/azure/azure-sql/database/sql-database-paas-overview) | | |
182 | | [SQL Server on Azure virtual machines](https://docs.microsoft.com/en-us/learn/modules/explore-relational-data-offerings/3-sql-server-azure-virtual-machines) | | |
183 | | [Explore relational Azure data services](https://docs.microsoft.com/en-us/learn/modules/explore-relational-data-offerings/2-azure-data-services) | | |
184 | | [Features comparison: Azure SQL Database and Azure SQL Managed Instance](https://docs.microsoft.com/en-us/azure/azure-sql/database/features-comparison) | | |
185 | | [What is Azure SQL Managed Instance?](https://docs.microsoft.com/en-us/azure/azure-sql/managed-instance/sql-managed-instance-paas-overview) | | |
186 | | [What is Azure Database for PostgreSQL?](https://docs.microsoft.com/en-us/azure/postgresql/overview) | | |
187 | | [Configure TLS connectivity in Azure Database for PostgreSQL - Single Server](https://docs.microsoft.com/en-us/azure/postgresql/concepts-ssl-connection-security) | | |
188 | | [Configure TLS in Azure Database for PostgreSQL - Hyperscale (Citus)](https://docs.microsoft.com/en-us/azure/postgresql/concepts-hyperscale-ssl-connection-security) | | |
189 | | [Use Azure Active Directory for authenticating with PostgreSQL](https://docs.microsoft.com/en-us/azure/postgresql/concepts-aad-authentication) | | |
190 | | [Dedicated SQL pool (formerly SQL DW) architecture in Azure Synapse Analytics](https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/massively-parallel-processing-mpp-architecture) | | |
191 | | [What is SaaS?](https://azure.microsoft.com/en-us/overview/what-is-saas/) | | |
192 | | [SQL Server database migration to Azure SQL Database](https://docs.microsoft.com/en-us/azure/azure-sql/database/migrate-to-database-from-sql-server) | | |
193 | | [Azure SQL Database Managed Instance](https://docs.microsoft.com/en-us/learn/modules/explore-relational-data-offerings/5-azure-sql-database-managed-instance) | | |
194 | | [PostgreSQL, MariaDB, and MySQL](https://docs.microsoft.com/en-us/learn/modules/explore-relational-data-offerings/6-postgresql-mariadb-mysql) | | |
195 | | [What is Azure SQL?](https://docs.microsoft.com/en-us/azure/azure-sql/azure-sql-iaas-vs-paas-what-is-overview) | | |
196 | | [Connectivity architecture for Azure SQL Managed Instance](https://docs.microsoft.com/en-us/azure/azure-sql/managed-instance/connectivity-architecture-overview) | | |
197 | | [Use Azure SQL Managed Instance securely with public endpoints](https://docs.microsoft.com/en-us/azure/azure-sql/managed-instance/public-endpoint-overview) | | |
198 | | [Quickstart: Configure a point-to-site connection to Azure SQL Managed Instance from on-premises](https://docs.microsoft.com/en-us/azure/azure-sql/managed-instance/point-to-site-p2s-configure) | | |
199 | | [Tutorial: Secure a database in Azure SQL Database](https://docs.microsoft.com/en-us/azure/azure-sql/database/secure-database-tutorial) | | |
200 | | [Azure SQL Database and Azure SQL Managed Instance connect and query articles](https://docs.microsoft.com/en-us/azure/azure-sql/database/connect-query-content-reference-guide) | | |
201 | | [Configure Always Encrypted by using Azure Key Vault](https://docs.microsoft.com/en-us/azure/azure-sql/database/always-encrypted-azure-key-vault-configure?tabs=azure-powershell) | | |
202 | | [Dynamic Data Masking](https://docs.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking?view=sql-server-ver15) | | |
203 | | [Transparent data encryption for SQL Database, SQL Managed Instance, and Azure Synapse Analytics](https://docs.microsoft.com/en-us/azure/azure-sql/database/transparent-data-encryption-tde-overview?view=sql-server-ver15&tabs=azure-portal) | | |
204 | | [What is Azure Data Studio?](https://docs.microsoft.com/en-us/sql/azure-data-studio/what-is-azure-data-studio?view=sql-server-ver15) | | |
205 | | [Quickstart: Use Azure Data Studio to connect and query Azure SQL database](https://docs.microsoft.com/en-us/sql/azure-data-studio/quickstart-sql-database?view=sql-server-ver15) | | |
206 | | [Quickstart: Use Azure Data Studio to connect and query PostgreSQL](https://docs.microsoft.com/en-us/sql/azure-data-studio/quickstart-postgres?view=sql-server-ver15) | | |
207 | | [Connect to Synapse SQL with SQL Server Management Studio (SSMS)](https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/get-started-ssms) | | |
208 | | [Quickstart: Azure Database for MariaDB: Use MySQL Workbench to connect and query data](https://docs.microsoft.com/en-us/azure/mariadb/connect-workbench) | | |
209 | | [Quickstart: Create a server-level firewall rule using the Azure portal](https://docs.microsoft.com/en-us/azure/azure-sql/database/firewall-create-server-level-portal-quickstart) | | |
210 | | [What is Azure role-based access control (Azure RBAC)?](https://docs.microsoft.com/en-us/azure/role-based-access-control/overview) | | |
211 | | [Manage storage account access keys](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal) | | |
212 | | [Use Azure Active Directory authentication](https://docs.microsoft.com/en-us/azure/azure-sql/database/authentication-aad-overview) | | |
213 | | [Microsoft identity platform access tokens](https://docs.microsoft.com/en-us/azure/active-directory/develop/access-tokens) | | |
214 | | [What is SQL Server Management Studio (SSMS)?](https://docs.microsoft.com/en-us/sql/ssms/sql-server-management-studio-ssms?view=sql-server-ver15) | | |
215 | | [What is Azure Data Studio?](https://docs.microsoft.com/en-us/sql/azure-data-studio/what-is-azure-data-studio?view=sql-server-ver15) | | |
216 | | [Use Jupyter Notebooks in Azure Data Studio](https://docs.microsoft.com/en-us/sql/azure-data-studio/notebooks/notebooks-guidance?view=sql-server-ver15) | | |
217 | | [SQL Server Data Tools](https://docs.microsoft.com/en-us/sql/ssdt/sql-server-data-tools?view=sql-server-ver15) | | |
218 | | [Using multi-factor Azure Active Directory authentication](https://docs.microsoft.com/en-us/azure/azure-sql/database/authentication-mfa-ssms-overview) | | |
219 | | [Configure multi-factor authentication for SQL Server Management Studio and Azure AD](https://docs.microsoft.com/en-us/azure/azure-sql/database/authentication-mfa-ssms-configure) | | |
220 | | [Describe provisioning relational data services](https://docs.microsoft.com/en-us/learn/modules/explore-provision-deploy-relational-database-offerings-azure/2-describe-provision-relational-data-services) | | |
221 | | [What are ARM templates?](https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/overview) | | |
222 | | [Describe configuring relational data services](https://docs.microsoft.com/en-us/learn/modules/explore-provision-deploy-relational-database-offerings-azure/5-configure-relational-data-services) | | |
223 | | [Describe configuring Azure SQL Database, Azure Database for PostgreSQL, and Azure Database for MySQL](https://docs.microsoft.com/en-us/learn/modules/explore-provision-deploy-relational-database-offerings-azure/6-configure-sql-database-mysql-postgresql) | | |
224 | | [Configure the Windows Firewall to Allow SQL Server Access](https://docs.microsoft.com/en-us/sql/sql-server/install/configure-the-windows-firewall-to-allow-sql-server-access?view=sql-server-ver15) | | |
225 | | [UPDATE - SQL Command](https://docs.microsoft.com/en-us/sql/odbc/microsoft/update-sql-command?view=sql-server-ver15) | | |
226 | | [SQL Server commands - DML, DDL, DCL, TCL](https://social.technet.microsoft.com/wiki/contents/articles/34477.sql-server-commands-dml-ddl-dcl-tcl.aspx) | | |
227 | | [Transact-SQL statements](https://docs.microsoft.com/en-us/sql/t-sql/statements/statements?view=sql-server-ver15) | | |
228 | | [DROP TABLE Command](https://docs.microsoft.com/en-us/sql/odbc/microsoft/drop-table-command?view=sql-server-ver15) | | |
229 | | [Query Azure Cosmos DB](https://docs.microsoft.com/en-us/learn/modules/explore-non-relational-data-stores-azure/3-query-azure-cosmos-db) | | |
230 | | [CREATE VIEW (Transact-SQL)](https://docs.microsoft.com/en-us/sql/t-sql/statements/create-view-transact-sql?view=sql-server-ver15) | | |
231 | | [SELECT Clause (Transact-SQL)](https://docs.microsoft.com/en-us/sql/t-sql/queries/select-clause-transact-sql?view=sql-server-ver15) | | |
232 | | [SELECT - GROUP BY- Transact-SQL](https://docs.microsoft.com/en-us/sql/t-sql/queries/select-group-by-transact-sql?view=sql-server-ver15) | | |
233 | | [UPDATE (Transact-SQL)](https://docs.microsoft.com/en-us/sql/t-sql/queries/update-transact-sql?view=sql-server-ver15) | | |
234 | | [TRUNCATE TABLE (Transact-SQL)](https://docs.microsoft.com/en-us/sql/t-sql/statements/truncate-table-transact-sql?view=sql-server-ver15) | | |
235 | | [INSERT (Transact-SQL)](https://docs.microsoft.com/en-us/sql/t-sql/statements/insert-transact-sql?view=sql-server-ver15) | | |
236 | | [CREATE TABLE (Transact-SQL)](https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql?view=sql-server-ver15) | | |
237 | | [SUM (Transact-SQL)](https://docs.microsoft.com/en-us/sql/t-sql/functions/sum-transact-sql?view=sql-server-ver15) | | |
238 | | [AVG (Transact-SQL)](https://docs.microsoft.com/en-us/sql/t-sql/functions/avg-transact-sql?view=sql-server-ver15) | | |
239 | | [Query relational data in Azure Database for PostgreSQL](https://docs.microsoft.com/en-us/learn/modules/query-relational-data/4-azure-database-for-postgresql) | | |
240 | | [sqlcmd Utility](https://docs.microsoft.com/en-us/sql/tools/sqlcmd-utility?view=sql-server-ver15) | | |
241 | | [az postgres](https://docs.microsoft.com/en-us/cli/azure/postgres?view=azure-cli-latest) | | |
242 | | [Introduction to SQL](https://docs.microsoft.com/en-us/learn/modules/query-relational-data/2-introduction-to-sql) | | |
244 | | [DDL, DML, DCL and TCL Commands in Sql Server](https://www.c-sharpcorner.com/blogs/ddl-dml-dcl-and-tcl-commands-in-sql-server1) | | |
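The DDL, DML, and aggregate-function references above can be sketched together in one short example. The `Orders` table and its columns are hypothetical, chosen only to illustrate the statement categories:

```sql
-- DDL: define a hypothetical Orders table
CREATE TABLE Orders (
    OrderID  INT PRIMARY KEY,
    Customer VARCHAR(50),
    Amount   DECIMAL(10, 2)
);

-- DML: insert and modify rows
INSERT INTO Orders (OrderID, Customer, Amount)
VALUES (1, 'Contoso', 120.00), (2, 'Fabrikam', 75.50);

UPDATE Orders SET Amount = 130.00 WHERE OrderID = 1;

-- Query: aggregate with GROUP BY, SUM, and AVG
SELECT Customer, SUM(Amount) AS Total, AVG(Amount) AS Average
FROM Orders
GROUP BY Customer;

-- TRUNCATE removes all rows but keeps the table definition;
-- DROP TABLE removes the table itself
TRUNCATE TABLE Orders;
DROP TABLE Orders;
```

Note the distinction the linked articles draw: `CREATE`, `TRUNCATE`, and `DROP` are DDL (they change the schema or deallocate storage), while `INSERT`, `UPDATE`, and `DELETE` are DML (they change rows).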
245 |
246 |
247 | ## Describe how to work with non-relational data on Azure (25-30%)
248 | [My Non-Relational Data Workload Notes](https://github.com/msandfor/DP-900/blob/main/nonrelational.md)
249 | | Reference | Objective | Notes |
250 | |-----|-----|----------|
251 | | [Understand data store models](https://docs.microsoft.com/en-us/azure/architecture/guide/technology-choices/data-store-overview) | | |
252 | | [Non-relational data and NoSQL](https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/non-relational-data) | | |
253 | | [Identify the need for data solutions](https://docs.microsoft.com/en-us/learn/modules/explore-core-data-concepts/2-identify-need-data-solutions) | | |
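One way non-relational stores differ from relational ones is the query surface: Azure Cosmos DB's SQL API, referenced in the relational notes above, applies SQL-like syntax to JSON documents. A minimal sketch, assuming a hypothetical container of product documents:

```sql
-- Hypothetical Cosmos DB SQL API query; 'c' is the conventional
-- alias for the items (JSON documents) in the container.
SELECT c.id, c.name, c.price
FROM c
WHERE c.category = 'bikes' AND c.price < 500
ORDER BY c.price ASC
```

Unlike a relational query, this runs against schemaless JSON items, so documents missing the `category` or `price` properties are simply filtered out rather than rejected at write time.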
270 |
271 |
272 |
273 | ## Describe an analytics workload on Azure (25-30%)
274 |
275 | [My Analytics Workload Notes](https://github.com/msandfor/DP-900/blob/main/analytics.md)
276 |
277 | | Reference | Objective | Notes |
278 | |-----|-----|----------|
279 | | [Online analytical processing (OLAP)](https://docs.microsoft.com/en-us/azure/architecture/data-guide/relational-data/online-analytical-processing) | | |
280 | | [Online transaction processing (OLTP)](https://docs.microsoft.com/en-us/azure/architecture/data-guide/relational-data/online-transaction-processing) | | |
281 | | [Describe the difference between batch and streaming data](https://docs.microsoft.com/en-us/learn/modules/explore-core-data-concepts/4-describe-difference) | | |
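The OLTP/OLAP distinction linked above can be illustrated with two contrasting queries against a hypothetical sales schema: OLTP work touches a few rows inside short transactions, while OLAP work scans and aggregates large historical ranges:

```sql
-- OLTP-style: a short transaction updating a single row
-- (Inventory table and ProductID value are hypothetical)
BEGIN TRANSACTION;
UPDATE Inventory SET Quantity = Quantity - 1 WHERE ProductID = 42;
COMMIT;

-- OLAP-style: an analytical query aggregating historical data
SELECT YEAR(OrderDate) AS OrderYear, SUM(Amount) AS Revenue
FROM SalesHistory
GROUP BY YEAR(OrderDate)
ORDER BY OrderYear;
```

This is also the practical intuition behind batch versus streaming: the OLAP query is a batch-style operation over accumulated data, whereas OLTP-style updates arrive continuously as individual events.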
--------------------------------------------------------------------------------