├── .gitignore
├── README.md
├── images
│   ├── 1.1.1.png
│   ├── 1.1.2.png
│   ├── 2.1.1.png
│   ├── 2.1.2.png
│   ├── 2.3.1.png
│   ├── 2.3.2.png
│   ├── 2.5.1.png
│   ├── 3.1.1.png
│   ├── 3.1.2.png
│   ├── 3.2.1.png
│   ├── 3.3.1.png
│   ├── 4.0.1.png
│   └── 4.4.1.png
├── 2_Relational_database
│   ├── 2.0_relational_data_concepts.md
│   ├── 2.1_shared_responsibility_model.md
│   ├── 2.2_Relational_database_oferings_in_Azure.md
│   ├── 2.3_Querying_relational_data.md
│   ├── 2.4_Relational_data_management_task.md
│   └── 2.5_working_with_azure_database.md
├── 0.0_Refernce.md
├── 1_Core_Data_Concepts
│   ├── 1.0_intro_core_data_concepts.md
│   ├── 1.1_data_processing.md
│   └── 1.3_data_analytics.md
├── 3_Non-Relational_Data
│   ├── 3.0_non-relational-data-concepts.md
│   ├── 3.1_Non-Relational_database_offerings_in_Azure.md
│   ├── 3.2_CosmosDB.md
│   └── 3.3_Azure_Storage_Services.md
└── 4_DataWareHousing_in_Azure
    ├── 4.0_analytics_workloads.md
    ├── 4.1_modern_data_warehousing.md
    ├── 4.2_data_ingestion_components.md
    ├── 4.3_data_analytics_tools.md
    └── 4.4_MicroSoft_PowerBI.md

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------

.idea

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# AZURE-DP900
Azure DP 900 notes

--------------------------------------------------------------------------------
/images/1.1.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eandbsoftware/AZURE-DP900/HEAD/images/1.1.1.png
--------------------------------------------------------------------------------
/images/1.1.2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eandbsoftware/AZURE-DP900/HEAD/images/1.1.2.png
--------------------------------------------------------------------------------
/images/2.1.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eandbsoftware/AZURE-DP900/HEAD/images/2.1.1.png
--------------------------------------------------------------------------------
/images/2.1.2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eandbsoftware/AZURE-DP900/HEAD/images/2.1.2.png
--------------------------------------------------------------------------------
/images/2.3.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eandbsoftware/AZURE-DP900/HEAD/images/2.3.1.png
--------------------------------------------------------------------------------
/images/2.3.2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eandbsoftware/AZURE-DP900/HEAD/images/2.3.2.png
--------------------------------------------------------------------------------
/images/2.5.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eandbsoftware/AZURE-DP900/HEAD/images/2.5.1.png
--------------------------------------------------------------------------------
/images/3.1.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eandbsoftware/AZURE-DP900/HEAD/images/3.1.1.png
--------------------------------------------------------------------------------
/images/3.1.2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eandbsoftware/AZURE-DP900/HEAD/images/3.1.2.png
--------------------------------------------------------------------------------
/images/3.2.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eandbsoftware/AZURE-DP900/HEAD/images/3.2.1.png
--------------------------------------------------------------------------------
/images/3.3.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eandbsoftware/AZURE-DP900/HEAD/images/3.3.1.png
--------------------------------------------------------------------------------
/images/4.0.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eandbsoftware/AZURE-DP900/HEAD/images/4.0.1.png
--------------------------------------------------------------------------------
/images/4.4.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eandbsoftware/AZURE-DP900/HEAD/images/4.4.1.png
--------------------------------------------------------------------------------
/2_Relational_database/2.5_working_with_azure_database.md:
--------------------------------------------------------------------------------

# Working with Azure Database

- Go to the Azure portal -> search for "Azure SQL" -> three options to create:
    - SQL database
    - SQL managed instance
    - SQL virtual machine, i.e., your own configuration of OS and SQL Server version
- ![img.png](../images/2.5.1.png)
- Choose the connectivity method
- Configure the firewall rules
- Click Next and create

- Using the Query editor, we can run SQL queries
- We can also use Azure Data Studio to run queries; it is one of the most popular tools to manage SQL data on Azure

--------------------------------------------------------------------------------
/0.0_Refernce.md:
--------------------------------------------------------------------------------

# Reference links

1. [How to prepare](https://medium.com/bb-tutorials-and-thoughts/how-to-pass-microsoft-azure-dp-900-data-fundamentals-exam-180aebdc27b2)
2. [ExamTopics for practice](https://www.examtopics.com/exams/microsoft/dp-900/view/)
3. [Exam details](https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4wsKZ)
4. [200 practice questions](https://medium.com/bb-tutorials-and-thoughts/200-practice-questions-for-azure-data-dp-900-fundamentals-exam-ea2446ee3a0)
5. [Read this after course completion](https://docs.microsoft.com/en-us/learn/paths/azure-data-fundamentals-explore-data-warehouse-analytics/)

## Exam topics breakup:

https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4wsKZ

1. Describe core data concepts (15–20%)
2. Describe how to work with relational data on Azure (25–30%)
3. Describe how to work with non-relational data on Azure (25–30%)
4. Describe an analytics workload on Azure (25–30%)

**Focus on the following topics:**
- Relational and non-relational databases
- Modern data warehouse
- Reporting (Power BI)
- Data ingestion and processing
- Databricks
- HDInsight

--------------------------------------------------------------------------------
/1_Core_Data_Concepts/1.0_intro_core_data_concepts.md:
--------------------------------------------------------------------------------

# Core data concepts

## What is data?

- Data is the new oil.
- Data is a collection of facts, such as numbers, descriptions, and observations, used in decision-making.

## How can we organize data?

- Structured
- Semi-structured
- Unstructured

### Structured data
- Tabular data represented by rows and columns in a relational database
- Examples: SQL Server, Oracle, DB2, MySQL, etc.
- We have upfront information on the structure of the data, called a schema

### Semi-structured data

- Not classified by the rigid schema structure of a relational database, but it still holds some structure.
- Examples: XML (Extensible Markup Language), JSON (JavaScript Object Notation), key-value pairs, graphs, etc.
- Technologies that work with semi-structured data formats: MongoDB, Cosmos DB, Cassandra

### Unstructured data

- Doesn't have any structure or searchable fields
- Includes binary files, documents, images, audio files, etc.
- Data hosting can be done using a file server, SharePoint, Azure Files, Azure Data Lake, or Azure Blob (Binary Large Object) Storage.

--------------------------------------------------------------------------------
/3_Non-Relational_Data/3.0_non-relational-data-concepts.md:
--------------------------------------------------------------------------------

# Non-Relational Data Concepts

- When data is coming from multiple sources, viz. mobile phones, social networks, etc., a relational database is not a good fit: its schema may not be compatible with the incoming data, and the cost will be very high for large volumes of data.

- In that case, data is first fed into a staging layer, which can hold large volumes of non-relational data
- This data is normally unstructured, like audio and video files
- **Azure Files or Blob Storage is used for this type of data repository**
- A few good use cases for non-relational data are IoT and telematics, gaming, web and mobile, and social network applications.
- We also have semi-structured data, which contains fields (not the same as a relational DB); it can be stored in Azure Files for further processing
- **The main non-relational file format is JSON, which is probably the most popular; it has quickly replaced XML as the de facto standard for non-relational data.**
- Another quite popular one is **Parquet**, which uses a columnar format. Parquet was developed by **Cloudera and Twitter**, and its efficient levels of compression and encoding make it an excellent format for data ingestion.
- Another common columnar format is **ORC, which stands for Optimized Row Columnar format**. It was **developed by Hortonworks** for optimizing read and write operations on Apache Hive.
- **Avro** uses a row-based format. It was also created by **Apache**. The sketch below makes the row vs. columnar distinction concrete.
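
- A minimal Python sketch of the same small dataset written in a row-oriented format (JSON Lines, conceptually like Avro) and a columnar one (Parquet). Assumes pandas and pyarrow are installed; file names are arbitrary:

```python
# pip install pandas pyarrow
import pandas as pd

df = pd.DataFrame({
    "device": ["sensor-1", "sensor-2", "sensor-3"],
    "reading": [21.5, 19.8, 22.1],
})

# Row-oriented: one self-describing record per line
df.to_json("readings.json", orient="records", lines=True)

# Columnar: values stored column by column, compressed and encoded,
# which makes analytical scans over a few columns cheap
df.to_parquet("readings.parquet")
```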
--------------------------------------------------------------------------------
/1_Core_Data_Concepts/1.3_data_analytics.md:
--------------------------------------------------------------------------------

# Data Analytics

- Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
- Examples: getting recommendations on websites for which product to buy or which movie to watch, predicting whether stock prices will go up or down, etc.

- 5 types:

### Descriptive analytics

- Focuses on discovering what happened in the past, based on historical data
- Helps summarize large data sets into meaningful outcomes
- Example: total sales last year

### Diagnostic analytics
- Focuses on why it happened
- Example: you could identify that a substantial number of your customers left you in the month when your competitor launched a new product

### Predictive analytics
- Goal is to discover what will happen
- Examples: what will the Microsoft stock price be by the end of next year? Will this customer default on his loan?

### Prescriptive analytics
- What you should do next
- While predictive analytics might tell you that an equipment component is likely to fail if it gets above a certain temperature, prescriptive analytics could recommend slowing down the machine to prevent the failure from happening. This could give a maintenance team enough time to arrive and replace the component before it breaks.

### Cognitive analytics
- Attempts to draw inferences from existing data and patterns
- It's the realm of artificial intelligence: it can do things such as transcribing audio to text (or vice versa), finding objects in images, detecting anomalies, and using natural language processing (NLP) to understand and translate language, and much more. Microsoft has an entire set of APIs related to this called Cognitive Services.

--------------------------------------------------------------------------------
/4_DataWareHousing_in_Azure/4.1_modern_data_warehousing.md:
--------------------------------------------------------------------------------

# Modern data warehousing (MDW)

- Data warehouses are no longer just about reorganizing the data from your OLTP systems into a more read-intensive format. Instead, data warehouses nowadays gather data from multiple data stores, including IoT, social networks, web APIs, files, and multiple corporate systems such as your CRM, HR, and ERP applications.

- **Advantages**:
    - Cross-referencing this data instead of keeping it in silos

- Modern data warehouses must be able to read data in various formats
    - XML, Parquet, ORC, JSON, and much, much more
    - You can even use cognitive services to extract text from a recorded phone call or obtain metadata from pictures that your company has.
- Modern data warehouse solutions should be able to handle big data
- **Phases of MDW:**
    - **Data ingestion**
        - Capturing the data from different sources
        - **Solutions from Azure: Azure Data Factory, Stream Analytics, or Event Hubs**
    - **Data staging**
        - Holds the data temporarily
        - ELT operations keep the data in the staging layer and have the analytical system grab the data on the fly for further analysis
        - **Azure Data Lake** can be used if we want to keep data in raw format
    - **Data transformation:**
        - Transform and process the data, and model it into a format that is more convenient for reporting
        - Includes data cleansing, filtering, normalization or denormalization, format conversions, and so on
        - **Solutions from Azure: Azure Data Factory and Databricks**
    - **Data modeling:**
        - Model and serve your data so that business intelligence analysts can generate reports and conclusions about it
        - Azure Analysis Services and Azure Synapse Analytics
        - **Solution: Power BI**

--------------------------------------------------------------------------------
/3_Non-Relational_Data/3.1_Non-Relational_database_offerings_in_Azure.md:
--------------------------------------------------------------------------------

# Non-Relational Database Offerings in Azure

- Also called NoSQL databases
- **Types**:
    - **Key-value stores**
        - Simplest and fastest
        - Each row can have any number of columns
        - Each item is a key-value pair
        - Read and write data very quickly
        - **Limitations**:
            - Search only based on the key
            - Write operations are restricted to insert and delete
            - To update an item: retrieve it, modify it in memory, and write it back to the DB, overwriting the original
        - **Azure Tables, Cosmos DB (Table API)**

    - **Document databases**
        - Richer than key-value stores
        - Store data in JSON format
        - Other formats: XML, YAML, etc.
        - Flexible schema: any number of columns
        - More flexibility than a relational database
        - Can search on values as well as keys
        - Support indexing for fast retrieval
        - A single document has all the needed info, while for the same info in a relational DB we may have to query multiple tables (see the sketch at the end of this file)
        - **Limitations**:
            - Data repetition may occur
            - More storage
        - Not as fast as key-value stores, but much better search
        - Cosmos DB (SQL API)
    - **Column-family databases**
        - Similar to relational, but here we can group logically related columns into column families
        - ![img.png](../images/3.1.1.png)
        - Retrieval of related info is much faster
        - As JSON is a good example of a document structure, Parquet is a good example of a column-family structure.
        - Apache Cassandra, Cosmos DB (Cassandra API)
    - **Graph databases**
        - Used to model complex relationships
        - Consist of nodes (info about objects) and edges (info about relationships)
        - ![img.png](../images/3.1.2.png)
        - Edges can also have a direction
        - The goal of a graph DB is to perform queries that efficiently traverse this network of nodes and edges
        - Fast analysis without nested joins and subqueries
        - Example: Cosmos DB (Gremlin API)
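
- A small Python sketch of the document-database idea referenced above: the same customer data as normalized relational rows vs. a single self-contained document (all names and values here are made up for illustration):

```python
import json

# Relational shape: the same information split across two "tables"
customers = [{"customer_id": 1, "name": "Ada"}]
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 1, "total": 40.0},
]

# Document shape: one document with the orders embedded, so a single
# read returns everything (at the cost of some data repetition)
customer_doc = {
    "id": "1",
    "name": "Ada",
    "orders": [
        {"order_id": 100, "total": 25.0},
        {"order_id": 101, "total": 40.0},
    ],
}
print(json.dumps(customer_doc, indent=2))
```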
--------------------------------------------------------------------------------
/2_Relational_database/2.0_relational_data_concepts.md:
--------------------------------------------------------------------------------

# Relational Data Concepts:

- Everything is **hosted in tables**.
- One of the most important and widespread methods for storing and retrieving data
- **Provides a simple and well-understood model for holding data**
- We use a relational database when we need strong consistency for data
- Example applications: banking applications, e-commerce and online retail systems, and flight and hotel reservation sites

## Understanding Relational Databases

- Everything is stored in tables
- A table is just a database object that **holds data in a row and column format.**
- Each table must have a column, or a combination of columns, that uniquely identifies each row in that table. This is called a primary key. No two rows can have the same primary key.
- As you commonly have several tables in your database, you generally use primary and foreign keys for the relationships between these tables
- A foreign key is just a reference to a primary key on a related table.
- Cardinality of the relationship:
    - one-to-one
    - one-to-many
    - many-to-many
- Views:
    - A view is just a virtual table, based on the results of a query, that allows you to filter the data

- **Indexes**:
    - Indexes help you search data substantially faster
    - They occupy extra space in the database, and
    - each index must be maintained by the database server
    - Two types:
        - **Clustered index**:
            - Physically organizes the data in your table, based on a column or key that you choose
            - Most important index on the table
            - We can have only one clustered index
        - **Nonclustered index**:
            - Less efficient than clustered ones
            - We can have as many as required
    - The overall rule: create a clustered index based on your most-searched column, and one or more nonclustered indexes for columns that are also searched relatively often.
- **Stored procedures and functions**:
    - Implement repeatable portions of code
    - Can even accept input and output parameters, making them quite flexible
- A minimal end-to-end sketch of tables, keys, and indexes follows.
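
A small runnable Python sketch of these concepts, using the standard-library sqlite3 module (table and column names are made up; SQLite has no clustered indexes, so only a secondary index is shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# The primary key uniquely identifies each row; the foreign key in Orders
# references the primary key of Customers (a one-to-many relationship).
conn.execute("""CREATE TABLE Customers (
    CustomerId INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL)""")
conn.execute("""CREATE TABLE Orders (
    OrderId    INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL REFERENCES Customers(CustomerId),
    Total      REAL)""")

# A secondary (nonclustered-style) index on a frequently searched column
conn.execute("CREATE INDEX idx_orders_customer ON Orders(CustomerId)")

conn.execute("INSERT INTO Customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO Orders VALUES (100, 1, 25.0)")
row = conn.execute("""SELECT c.Name, o.Total
                      FROM Orders o
                      JOIN Customers c ON c.CustomerId = o.CustomerId""").fetchone()
print(row)  # ('Ada', 25.0)
```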
--------------------------------------------------------------------------------
/4_DataWareHousing_in_Azure/4.0_analytics_workloads.md:
--------------------------------------------------------------------------------

# Analytics workloads

- Analytical workloads are about transforming data into various insights

## Data processing solutions:

- OLTP
    - Online transaction processing
    - For the day-to-day operations of a business
    - These kinds of systems record transactions, which are small, discrete units of work that need to be executed or rolled back as a whole; example: a deposit into a bank account
    - Follow the ACID rules, which stand for atomicity, consistency, isolation, and durability
    - Often associated with relational databases, such as SQL Server, Oracle, DB2, and so on
    - Optimized for CRUD transactions: create, read, update, and delete

- OLAP
    - Online analytical processing
    - Provides support for business intelligence (BI), which is a set of technologies, applications, and practices to support business decision-making
    - Analytical systems must be optimized for read operations
    - Examples: data warehousing solutions such as SQL Server Analysis Services or Azure Synapse Analytics

## Data Modeling:

- Based on the data processing solution, OLTP or OLAP, we decide how we will model our data
- OLTP:
    - Works on normalized data
    - Normalization consists of distributing the data across several related tables to ensure data integrity and prevent data redundancy
    - Favours CRUD operations

- OLAP:
    - De-normalized model, which decreases the number of tables even if that incurs some data redundancy
    - Because analytics requires many tables to be joined, a de-normalized model is best suited

## Modeling Standards:

- **Star schema:**
    - Main modeling standard for business intelligence solutions
    - ![img.png](../images/4.0.1.png)
    - In a star schema, you have a central fact table with data about something that happened: the sale of a product, an ATM withdrawal, or the prescription of a medical treatment
    - Then you have dimension tables that describe what happened
    - One-to-many relationships between facts and dimensions
- **Snowflake schema**
    - Your dimensions are more normalized
    - This decreases data repetition, but makes your models more complex

--------------------------------------------------------------------------------
/4_DataWareHousing_in_Azure/4.4_MicroSoft_PowerBI.md:
--------------------------------------------------------------------------------

# Microsoft Power BI

- Power BI wraps up all the data work in beautifully designed reports and dashboards.

- **Definition**: Power BI is a collection of software services, apps, and connectors that work together to create visually immersive data visualization experiences.

## Data Visualization

- Graphical representation of data and information
- Uses visual elements such as charts, maps, graphs, and tables to help you understand trends, anomalies, and patterns in data
- The most popular data visualization tool is Power BI
- A Power BI dashboard may include elements such as:
    - bar and column charts
    - line charts
    - pie and donut charts
    - treemaps
    - maps

## Products in Power BI

- Power BI Desktop
    - Create the reports
- Power BI Pro service
    - Cloud-based service originally designed for viewing and sharing reports and dashboards
- Power BI Mobile
    - View your reports on a mobile phone

- It can get data from over 130 different data sources of various types, such as files, online services, databases, data warehouses, and SaaS applications
- This data can be combined into a single data model for easier reporting
- Extremely flexible

## Definitions to know in Power BI

- Datasets
    - Collection of data that Power BI will use to create the report
    - Power BI Desktop has a very powerful tool for obtaining, transforming, and cleansing the data called Power Query, which also allows you to combine several data sources into a single data model. This functionality is also available in the Power BI service, but there it's called dataflows.

- Visualizations:
    - Visual representations of the datasets, such as line, bar, or pie charts, tables, matrices, and maps

- Reports
    - Collection of visuals grouped together into one or more pages.
    - The report is the unit of work that you publish to the Power BI service, to be viewed and shared by others.
- Dashboards
    - Aggregations of one or more reports into a single page

- Apps
    - Can combine dashboards and reports into a single package
    - Can be distributed internally or externally

![img.png](../images/4.4.1.png)

--------------------------------------------------------------------------------
/2_Relational_database/2.1_shared_responsibility_model.md:
--------------------------------------------------------------------------------

# Cloud Service Models: On-Premises, IaaS, PaaS, and SaaS

![img.png](../images/2.1.1.png)

### On-premises:
- This was the option traditionally chosen before public clouds became widely available
- Option for highly regulated industries that forbid cloud hosting
- You are responsible for everything: hardware, software, cabling, patching, backups, VMs, storage, and so on
- Examples of on-premises technologies are SQL Server or a physical file server running Windows Server 2019.

### IaaS: Infrastructure-as-a-Service

- Refers to actual servers provided by Azure
- Scaling can be done on an as-needed basis
- IaaS provides servers, storage, and networking as a service
- Maintenance of the server hardware is done by Azure
- You purchase and configure your own software (OS, middleware, and applications), install it on the host, and maintain it
- IaaS includes VMs, networking, storage, firewalls, and the physical hardware everything runs on

### PaaS: Platform-as-a-Service
- A superset of IaaS
- On top of the IaaS offering, **it also provides middleware and development tools, BI services, database management systems, and more**
- PaaS is designed to support the complete web application lifecycle: building, testing, deploying, managing, and updating.
- You just manage the application that you develop, and the cloud service provider manages everything else, including security features, data warehouse services, VM provisioning, networking, etc.
- Example: Cosmos DB
- Azure SQL Managed Instance gives the highest compatibility with on-premises SQL Server.

### SaaS: Software-as-a-Service
- A superset of both PaaS and IaaS
- **You don't own the software; you just pay for usage**
- No maintenance to be done by you; it's taken care of by the respective service provider
- Examples: Microsoft 365, Gmail for email, Azure AD, etc.

![img.png](../images/2.1.2.png)

### Serverless

- **You don't have to manage any servers**
- Azure Functions is probably the best-known example of serverless on Azure (see the sketch below).
- Serverless architecture takes PaaS to the extreme by fully abstracting away the server, in such a way that a single function of code can be hosted, deployed, run, and managed without even having to maintain a full application.
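
A minimal sketch of an HTTP-triggered Azure Function handler in Python. A deployed function app would also need a `function.json` binding configuration (omitted here); you write only the handler, and Azure manages the servers:

```python
import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    # Read an optional "name" query parameter and respond
    name = req.params.get("name", "world")
    return func.HttpResponse(f"Hello, {name}!", status_code=200)
```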
--------------------------------------------------------------------------------
/2_Relational_database/2.3_Querying_relational_data.md:
--------------------------------------------------------------------------------

# Querying Relational Data

- SQL, or Structured Query Language, was originally created in the 1970s as a way to query relational databases
- By 1987, it had been made a standard by both ANSI and ISO, and it has been the main language used by most relational database systems ever since.
- Vendors create their own dialects of the SQL language, extending it with additional features. For example, SQL Server uses Transact-SQL, Oracle uses PL/SQL, PostgreSQL uses pgSQL, and so on.
- Vendor-specific dialects of SQL have additional features.

## SQL Language Command Types:

### DML: Data Manipulation Language

- Allows you to perform **CRUD operations on the data.**
- CRUD stands for create, read, update, and delete.
- The most commonly used command is SELECT, to retrieve data from a table
- **DML focuses on data**
- Common commands:
    - SELECT: read (query) data
    - INSERT
    - UPDATE
    - DELETE
    - MERGE: merges (syncs) data

- Add a WHERE clause to filter data
- Add a JOIN to relate tables: INNER JOIN, CROSS JOIN, FULL JOIN, etc.
- Add functions for additional logic; example with MAX below (a client-side Python version of this query appears at the end of this file):

example:
```sql
SELECT MAX(ListPrice)
FROM Production.Product P
FULL JOIN Production.ProductModel PM ON PM.ProductModelID = P.ProductModelID
WHERE Color = 'Blue'
```

### DDL: Data Definition Language

- **Used to create, modify, and delete objects in the database**, such as tables, views, stored procedures, and functions.
- Common DDL commands:
    - CREATE: create a new object
    - ALTER: modify a property of an existing one
    - RENAME
    - DROP

example:
![img.png](../images/2.3.1.png)

- As we can see in the above example, we need to provide a name and data type for each column, and also define whether the value can be NULL

**- DDL focuses on objects, compared to DML, which focuses on data.**

### DCL: Data Control Language
- Used to set permissions on database objects
- Commands used:
    - GRANT
    - REVOKE
    - DENY

### TCL: Transaction Control Language
- Used to manage transactions
- Commands used:
    - BEGIN TRAN
    - COMMIT TRAN
    - ROLLBACK

### Which tools can we use to query an Azure database?

![img.png](../images/2.3.2.png)
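
A minimal sketch of running the DML query above from application code with pyodbc. The server, database, and credentials are hypothetical placeholders, and the local machine needs the Microsoft ODBC driver installed:

```python
# pip install pyodbc
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<server>.database.windows.net;DATABASE=<db>;"
    "UID=<user>;PWD=<password>"
)
cursor = conn.cursor()
# Parameterized query: the ? placeholder avoids SQL injection
cursor.execute("SELECT MAX(ListPrice) FROM Production.Product WHERE Color = ?", "Blue")
print(cursor.fetchone()[0])
conn.close()
```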
--------------------------------------------------------------------------------
/3_Non-Relational_Data/3.3_Azure_Storage_Services.md:
--------------------------------------------------------------------------------

# Azure Storage Services

- An Azure storage account can contain Azure Blob storage, Azure Files, Azure Tables, and Azure Queues.

## Azure Table Storage
- Microsoft's implementation of key-value stores
- Doesn't have the concepts of schema, relationships, stored procedures, secondary indexes, or foreign keys that are present in relational databases
- Can only search based on the key, not on the values
- Data insertion and retrieval, as expected with key-value stores, is quite fast regardless of the database size.
- Like Cosmos DB, it divides the data into partitions. You can add the partition key to your queries for even faster results, which makes the choice of the partition key an important design decision.
- Azure Tables is quite robust, being able to store hundreds of terabytes of data, and is ideal when you need extremely fast data ingestion, such as IoT and telematics scenarios.
- Like Cosmos DB, it supports multiple read replicas.
- Unlike Cosmos DB, it does not support multiple write regions.

## Azure Blob Storage

- Microsoft's main solution for unstructured data, or blobs (see the upload sketch at the end of this file).
- Available under an Azure storage account.
- Supports massive amounts of data along with metadata.
    - Example: if you're storing X-ray and MRI images of patients in Azure Blob storage, you could store them along with metadata such as name, age, and patient ID
- Supports encryption, including the possibility of bringing your own encryption keys, which is called BYOK.
- The **hot tier** uses high-performance media and is therefore more expensive, so it's ideal for more frequently accessed data.
- The **cool tier** is more in the middle ground: cheaper, but not as fast as the hot tier.
- The **archive tier** is the cheapest, but it might take a few hours to retrieve data from it.
    - **Re-hydration** is the process of moving archived data back to an online tier. You could use the archive tier for files that you're unlikely to need again but still need to keep for compliance reasons, such as a three-year-old backup.
- Some uses of Azure Blob storage are: serving images and documents for a website, streaming audio and video, storing backup or archived data, and storing data for analytics.

## Azure Files
- Azure Files enables you to create file shares in the cloud, accessible through the internet.
- Supports SMB 3.0, which is the protocol used by Windows; support for NFS, which is used by Linux, is currently in preview.
- Supports encryption, including BYOK, and Azure AD
- Two main performance tiers: standard, which uses HDD disks, and premium, which uses solid-state drives (SSDs)
- The main use of Azure Files is the migration of file shares from on-premises Windows servers. You can use a command-line utility called AzCopy to perform the copy to the cloud.

## Queue Storage

- Stores large numbers of messages that can be retrieved and processed asynchronously by application components.

![img.png](../images/3.3.1.png)
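
A minimal sketch of uploading a blob with metadata using the azure-storage-blob SDK; the connection string, container, and file names below are hypothetical placeholders:

```python
# pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("scans")

# Upload an image together with searchable metadata, as in the
# patient X-ray example above
with open("xray_001.png", "rb") as data:
    container.upload_blob(
        name="patient-123/xray_001.png",
        data=data,
        metadata={"patient_id": "123", "age": "42"},
    )
```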
--------------------------------------------------------------------------------
/2_Relational_database/2.2_Relational_database_oferings_in_Azure.md:
--------------------------------------------------------------------------------

# Relational Database Offerings in Azure

## SQL Server hosted on Azure

- Same as SQL Server on-premises, but hosted on Azure
- IaaS level
- No hardware or cabling
- You still manage patching, upgrades, backups, licenses, etc.
- Required in lift-and-shift scenarios where there are still compatibility differences with Azure SQL Database.

## Azure SQL Database:

- PaaS-level service where backups, patching, etc. are all done by Azure, i.e., management of the database is done by Azure
- 99.99% SLA
- No upfront costs
- **Available in the Business Critical tier**:
    - High speed, availability, and low latency
    - Read-only copy for reporting

- **Disadvantages**:
    - You cannot install custom software, as it would compromise the security of the Azure environment
    - You can shut down an Azure SQL VM, but you cannot shut down an Azure SQL Database

- Microsoft has created the Data Migration Assistant, which scans your database and tells you the best migration option for you, based on any compatibility issues

## Azure SQL is actually available in three different options

### Single Database
- Low cost and minimal administration
- Preferred for new projects

- Limitations:
    - Azure SQL Single Database can only see one database at a time, which means that you cannot create cross-database queries.

### Elastic Pool
- Very similar to Single Database, except that it allows multiple databases to share the same pool of resources, such as processor and memory.

### Azure SQL Managed Instance

- Close to 100% compatibility with SQL Server on-premises
- It came in as a response to some limitations of Azure SQL Single Database that were preventing companies from migrating to the cloud, such as linked servers, database mail, and cross-database queries and transactions.
- On SQL MI, you manage at the server level, not the database level, but you still enjoy the same PaaS-level benefits, such as automated backups, patching, and advanced security.

## Other Azure database offerings for open-source databases:

### Azure Database for MySQL
- PaaS implementation of MySQL Community Edition

### Azure Database for MariaDB
- A newer database management system created by the original developers of MySQL
- The engine has been rewritten and optimized for better performance
- It also has some interesting new features, such as support for versioning, which allows you to query your tables as they were at different points in time.
- Also built on top of the Community Edition

### Azure Database for PostgreSQL
- Hybrid relational-object database system
- PostgreSQL is extensible, and has good support for geometric data such as lines, circles, and polygons.
- It has two deployment options: Single Server for smaller workloads, and Hyperscale, which uses multiple nodes for faster query performance.
--------------------------------------------------------------------------------
/1_Core_Data_Concepts/1.1_data_processing.md:
--------------------------------------------------------------------------------

# Data Processing:

- Data processing is the conversion of raw data into meaningful information, through a specific method, after it has been ingested and collected.

2 types:
1. Batch processing
2. Stream processing

## Batch Processing:

- In this processing mode, newly arriving data elements are collected into a group. The whole group is then processed at a future time, as a batch, when a certain condition is met
- **Conditions may include**:
    - Scheduled time intervals: salary processing, credit card or utility bills, etc.
    - Event-based processing: e.g., CPU utilization, to optimize the utilization of servers
    - A specific size or volume of data has arrived: e.g., higher volumes on Black Friday, etc.

- Advantages:
    - Can process large amounts of data
    - Can process data at convenient times

- Disadvantages:
    - High latency

- Example: complex analytics, such as moving the data to a data warehouse for business intelligence operations

## Stream Processing:

- Processing data in real time, as it arrives
- Useful for time-critical operations requiring immediate responses.
- Examples:
    - Sending telemetry data from a device at the edge; the device could be an IoT device, a mobile phone, etc.

- Most organizations require a combination of batch and stream processing for their day-to-day operations
- Stream processing is used for simpler, more reactive situations, or small calculations.

### Batch Processing vs. Stream Processing:

![img.png](../images/1.1.1.png)

## Order of Data Processing:

- Data processing generally extracts the data from a source, transforms it into a format that is more suitable to work with, and loads it into a destination
- Microsoft has ETL and ELT tools available both on-premises, called **SQL Server Integration Services** or **SSIS**, and in the cloud, called Azure Data Factory
- Two approaches (a minimal ETL sketch follows at the end of this file):
    - **ETL**:
        - Extract, Transform, Load
        - Traditional business intelligence processes used ETL, which means extracting data from a source (usually a database), transforming it through operations such as filters, sorts, and lookups, and loading this data into a destination, generally a data warehouse.
        - Data is fully processed before it is loaded into the destination.
        - Requires high upfront work to create the data warehouse
        - Once data is processed, we have higher confidence that it is compliant, well-structured, and easily queried.

    - **ELT**:
        - Extract, Load, Transform
        - Performs the transformations after the load, on the destination system itself
        - Provides more agility for your development team to change queries on the fly, in case business needs change often
        - Fulfils the need to experiment with several different possibilities, a common occurrence in advanced analytics workloads
        - ELT only became feasible in more recent years, as storage became cheaper, and after the development of technologies such as **PolyBase**, **data lakes**, and **massively parallel processing (MPP) systems, like Azure Synapse Analytics**.

![img.png](../images/1.1.2.png)
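
A minimal ETL sketch in Python using pandas and the standard-library sqlite3 module; the file names, columns, and destination table are hypothetical:

```python
# pip install pandas
import sqlite3
import pandas as pd

# Extract: read raw data from a source file
raw = pd.read_csv("sales.csv")

# Transform: clean, filter, and aggregate before loading
sales = raw.dropna(subset=["amount"])
sales = sales[sales["amount"] > 0]
daily = sales.groupby("date", as_index=False)["amount"].sum()

# Load: write the transformed result into a destination table
with sqlite3.connect("warehouse.db") as conn:
    daily.to_sql("daily_sales", conn, if_exists="replace", index=False)
```

In an ELT variant, the raw rows would be loaded into the destination first, and the filtering and aggregation would run there as queries.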
--------------------------------------------------------------------------------
/4_DataWareHousing_in_Azure/4.3_data_analytics_tools.md:
--------------------------------------------------------------------------------

# Data analytics tools:

- Analytical tools to obtain valuable insights.

## Databricks

- One of the most important analytics tools in Azure.
- Advanced analytics and machine learning platform **based on Apache Spark**, which is a parallel processing engine for large-scale analytics
- Spark is designed to handle massive amounts of data by distributing the work across a cluster of computers, which considerably reduces the time needed to complete the analysis.
- Databricks uses a collaborative workspace that allows data engineers, data scientists, and business analysts to work together.
- It uses the concept of notebooks, which are a web-based mix of runnable code, visualizations, and text.
- The code runs in a series of steps called cells. Cells can support several languages, such as Python, R, Scala, Java, and SQL. (A minimal notebook-style cell is sketched at the end of this file.)
- Databricks supports stream processing and can connect to several other Azure tools, including Data Lake Storage, SQL databases, data warehouses, and Cosmos DB.
- Azure Data Factory also has Databricks activities, so you can call Databricks from an ADF pipeline.

## HDInsight
- Managed analytics service **based on Apache Hadoop**, a collection of open-source tools that can process large amounts of data through a set of clusters.
- HDInsight supports several analytics frameworks, such as Hadoop MapReduce, Apache Spark, Apache Hive, Apache Kafka, Storm, R, and more
- It stores the data using Azure Data Lake Storage, and it's easily integrated with other Azure tools and services.

## Data Modeling:

- **Azure Analysis Services:**
    - Enables you to build tabular models to support your OLAP queries.
    - The focus here is on analytics, not transactional workloads.
    - The tool provides a graphical designer that helps you define the queries by combining, filtering, and aggregating the data.
    - You can also use a Microsoft BI language called DAX for the query building
    - **Best suited for smaller databases and less computationally heavy workloads**
    - Easier development experience, and it's more easily integrated with Power BI

- **Azure Synapse Analytics**
    - Advanced analytics and machine learning engine based on Spark and notebooks, similar to Databricks.
    - Supports a wide variety of languages, such as PolyBase, C#, Python, Scala, and Spark SQL, and also several file formats, such as CSV, JSON, XML, Parquet, ORC, and Avro
    - Azure Synapse Link for Azure Cosmos DB allows for hybrid transactional/analytical processing, or HTAP. HTAP is a mix of OLTP and OLAP.
    - Azure Synapse Studio is a web interface used to manage Synapse Analytics, as well as to create, edit, and debug both SQL and Spark code
    - One of the main advantages of Azure Synapse Analytics is its massive **scalability**.
    - **Better for high volumes of data (several terabytes to a few petabytes), very complex calculations, and complex ELT operations, because the MPP engine of the Spark and SQL clusters allows for higher scalability.**

- You can use Azure Synapse Analytics for the heavy lifting of dealing with large amounts of data and calculations, and Azure Analysis Services to better serve your business users.
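
A minimal sketch of the kind of PySpark code a notebook cell might run. In Databricks or Synapse, the `spark` session is provided by the cluster; it's built locally here so the sketch is self-contained, and the sales data is made up:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("demo").getOrCreate()

# A small DataFrame; in practice this would be distributed across the cluster
df = spark.createDataFrame(
    [("north", 120.0), ("south", 80.0), ("north", 45.0)],
    ["region", "amount"],
)

# The aggregation work is split across worker nodes and combined at the end
df.groupBy("region").agg(F.sum("amount").alias("total")).show()
```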
--------------------------------------------------------------------------------
/2_Relational_database/2.4_Relational_data_management_task.md:
--------------------------------------------------------------------------------

# Relational Data Management Tasks:

- Here we discuss tools to manage database operations such as creating, deploying, configuring, searching, permissions and security, and so on

- The most common option is the **Azure portal**, at https://portal.azure.com
    - Management via GUI
- Azure CLI
    - For automation
- Azure PowerShell
    - For automation

- ARM templates
    - ARM stands for Azure Resource Manager
    - These are JSON files that describe the settings that you want to configure for the resource, which you can later provision using either the Azure CLI or Azure PowerShell

## Database security

- All Azure databases are protected by server-level firewalls
- The firewall prevents access to database resources from other networks
- All access to your databases is blocked by the firewall by default, so remember that you need to configure the firewall rules before your client applications will be able to connect to them
- You might also need to enable outgoing traffic on your company's firewall
- Azure PostgreSQL uses port **5432**; Azure MySQL and MariaDB both use port **3306**

### Database encryption:

- Azure databases also support encryption, which guarantees that the data doesn't appear as plain text
- Encryption occurs both in transit, with SSL connections enabled automatically, and at rest, by encrypting the database itself.
- **The technology used to encrypt an Azure SQL database is a proprietary Microsoft one called Transparent Data Encryption, or TDE**

### Database threat protection:
- We also have Advanced Threat Protection (ATP), which detects unusual activities and accesses to your databases, alerting you of suspicious events.
- ATP is available for Azure SQL, Azure SQL MI, and Synapse Analytics, and is currently in preview for MariaDB, MySQL, and PostgreSQL.

### Database authentication and permissions:
- Authentication defines how we validate your identity on the database.
- For authentication, **all four databases support both native SQL authentication and Azure AD**, with the exception of Azure MariaDB, which does not yet have Azure AD integration.

- **Azure AD integration is the preferred option**, as it has several advantages over SQL authentication, such as centralized management of identities across several Microsoft and Azure resources, so that you don't need one account for each server.
- And Multi-Factor Authentication (MFA), which allows for more than one form of identification, considerably increasing security.

- Once you're authenticated, you need to make sure that you have the proper permissions to access the database resources.
- **2 permission levels available:**
    1. Permissions on the database resource itself, which allow you to configure CPU, memory, etc.
        - **This includes Azure Role-Based Access Control (RBAC)**, with a set of **built-in roles and well-defined permissions**
        - You just need to put your administrators in the proper roles, and all the relevant permissions are automatically assigned.
    2. Permissions that **refer to the data and objects inside the database**
        - For example, to be able to access a table in an Azure SQL database, you need to have at least SELECT permission on that table. Azure SQL has several built-in database roles available to simplify this management, such as **db_owner, which has full access; db_ddladmin, which can execute DDL commands; and db_datareader, which can read all the tables.**
--------------------------------------------------------------------------------
/3_Non-Relational_Data/3.2_CosmosDB.md:
--------------------------------------------------------------------------------

# Cosmos DB

- Azure Cosmos DB is Microsoft's main NoSQL database management system, storing data as JSON documents.
- It works at a Platform-as-a-Service level, so several administrative tasks are managed for you.
- Cosmos DB is **multi-model**, which means that it supports documents, key-value pairs, graph, and column-family data, depending on the API that you choose.
- It's also very fast, guaranteeing **less than 10 milliseconds latency** for both reads and writes 99% of the time. That's because the data is spread across partitions on several nodes.
- This makes Cosmos DB an excellent choice for IoT and telematics, gaming, and highly responsive mobile and web applications on a global scale
- Microsoft itself uses Cosmos DB in several of its mission-critical applications, including Skype, Xbox, Office 365, and Azure
- Capable of supporting multiple read replicas and write regions.

## APIs supported by Cosmos DB

### SQL API
- Default, native API
- Supports SQL-like commands

### Gremlin API
- Enables you to implement a graph database on Cosmos DB
- You can also query graph data as JSON documents using a SQL-like language

The next three APIs are less focused on new projects; they are recommended instead when you're migrating from Azure Tables, MongoDB, or Cassandra.

### Table API

- Allows you to migrate your key-value pairs from Azure Tables
- Customers start with Azure Tables because of its low price, simplicity, and high throughput

### MongoDB API
- Allows you to migrate from MongoDB
- MongoDB is another long-established document database. Customers may decide to migrate to Cosmos, either because it fits their IT strategy or to leverage the PaaS-level capabilities of the product, including automated backups and indexing.

### Cassandra API
- Recommended for migrations from Apache Cassandra, which is another famous on-premises column-family database management system

## How Cosmos DB manages the data

![img.png](../images/3.2.1.png)

- The top level in Cosmos DB is a Cosmos DB account.
- We can have 50 accounts under an Azure subscription
- Next, we can have one or more databases under each of those accounts
- Under each database, we have containers.
- Containers are the units of scalability for both throughput and storage, which means that it's at the container level that you configure Cosmos DB performance
- Depending on which API you have configured for Cosmos DB, a container will mean different things
    - For the Gremlin API, the container resource type would be a graph
- Containers host not only data items but also other database elements, such as triggers, stored procedures, and functions
- Items are the ones holding the data.
    - An item can even hold small binary files, up to two megabytes in size. If you need more than that, however, you can always create a reference to an external Azure Blob Storage
    - The data type of the item will depend on which API you have configured for Cosmos
    - If you have configured the Gremlin API, for example, the items will be nodes and edges
- The sketch below walks this account -> database -> container -> item hierarchy in code.
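
A minimal sketch using the azure-cosmos Python SDK (SQL API); the endpoint, key, and all names are hypothetical placeholders:

```python
# pip install azure-cosmos
from azure.cosmos import CosmosClient, PartitionKey

# Account level: connect with the account endpoint and key
client = CosmosClient("https://<account>.documents.azure.com:443/",
                      credential="<key>")

# Database and container levels; throughput (RUs) is set on the container here
db = client.create_database_if_not_exists("appdb")
container = db.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),
    offer_throughput=400,  # the 400 RU/s minimum mentioned below
)

# Item level: a JSON document; "id" is required, and the partition
# key value decides which partition stores the item
container.upsert_item({
    "id": "order-001",
    "customerId": "cust-42",
    "total": 99.90,
})
```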
## Cosmos DB Management Tasks

- **Provisioning**
    - Creation of the resource on Azure
    - Azure Portal, Azure CLI, Azure PowerShell, ARM templates
    - When provisioning a Cosmos DB resource, you need to define the amount of resources allocated to it in **Request Units per second (RU/s)**. RUs are also the billing unit of Cosmos DB
    - The minimum throughput is 400 RUs per second
    - You can configure RUs at both the database and the container levels

- **Replication**
    - As Cosmos DB is a PaaS-level solution, both replication and failover (in case of a failure) happen automatically within a single region, giving Cosmos DB a guaranteed high availability of 99.99%.
    - You can always configure multi-master replication, which means that every node can write

- If you want to perform data migrations, use Data Explorer or the Cosmos DB Data Migration Tool
    - Data Explorer is available in the Azure portal under your Cosmos DB resource.
    - The Data Migration Tool is a downloadable tool available on GitHub.

--------------------------------------------------------------------------------
/4_DataWareHousing_in_Azure/4.2_data_ingestion_components.md:
--------------------------------------------------------------------------------

# Data ingestion components:

## Azure Data Factory

- Azure Data Factory is a managed data ingestion, transformation, and orchestration service in the cloud for data engineers, perfect for data integration workflows.
- It can ingest large amounts of raw data from several sources, both on-premises and in the cloud.
- It has connectors for most Azure services, and even services from cloud competitors such as Google and AWS; dozens of relational and non-relational databases and data warehouses, such as SQL Server, Oracle, Cassandra, MongoDB and SAP HANA; and several SaaS applications, such as Dynamics, Jira and Salesforce
- Supports several file formats, including JSON, XML, Parquet, Avro and ORC.
- Can also clean, transform and restructure data, as well as filter out data that might be corrupt or duplicated, supporting therefore both ETL and ELT processes.
- These transformation tasks can be done by Azure Data Factory itself through a feature called **mapping data flows**. However, these transformations can also be done by other Azure services, such as **Databricks and HDInsight.**

### Main components of Data Factory

A conceptual sketch of how these components relate follows this list.

- **Pipeline**
    - Logical group of activities that performs a unit of work
    - It's the actual work that we do in Data Factory
    - Can be created via the GUI or through code
    - **Activities in a pipeline:**
        - **Data movement activities** move data between a source and a destination, which is also called a **sink**. For example, a copy from SQL Server to Cosmos DB.
        - **Data transformation activities** perform some change on the data.
            - This data change can be executed by Data Factory itself, which is called a mapping data flow, or by calling an external compute resource, such as Databricks or Hive
        - **Control flow activities**, which are a way to implement coding logic in your pipeline, such as assigning a variable or executing a loop

- **Integration runtimes**
    - These are the compute infrastructure of Data Factory, which is needed to execute your activities.
- **Linked services**
    - A linked service provides the information that is needed to connect to a source, a destination or a compute resource.
    - They basically tell Data Factory where to find your external data or service.

- **Datasets**
    - Representation of the data that you're working with.
    - While linked services tell you where to find the data, datasets tell you its details, structure, and format, such as JSON or XML.

- **Triggers**:
    - The Data Factory component that initiates the execution of a pipeline
    - Example: event-based triggers
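
A conceptual Python sketch only, to show how these components relate; this is not the actual Data Factory SDK, and every name here is hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class LinkedService:      # WHERE the data lives (connection info)
    name: str
    connection_string: str

@dataclass
class Dataset:            # WHAT the data looks like (structure/format)
    name: str
    linked_service: LinkedService
    fmt: str              # e.g. "JSON", "Parquet"

@dataclass
class CopyActivity:       # data movement: source -> sink
    source: Dataset
    sink: Dataset

@dataclass
class Pipeline:           # logical group of activities
    name: str
    activities: list = field(default_factory=list)

# A trigger would initiate the pipeline run, and an integration runtime
# would supply the compute on which the activities execute.
src = Dataset("sales_raw", LinkedService("sqlsrv", "<conn>"), "Table")
dst = Dataset("sales_docs", LinkedService("cosmos", "<conn>"), "JSON")
pipe = Pipeline("copy_sales", [CopyActivity(source=src, sink=dst)])
```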
## SSIS, or SQL Server Integration Services

- SSIS is the on-premises counterpart of Data Factory, and it's part of SQL Server.
- You can add Azure support to SSIS by installing the Azure Feature Pack for SSIS.
- You can also run your existing SSIS packages on Azure Data Factory, which is useful for migration scenarios
- Scalability is limited by the performance of the server where SSIS is installed.

## PolyBase

- A feature of both SQL Server and Azure Synapse Analytics that enables you to run Transact-SQL commands on external data sources, such as Azure Data Lake, Blob Storage, Hadoop or Spark, just as if they were SQL tables.

## Data lakes

- Repository for large amounts of raw data.
- Semi-structured or unstructured.
- Used as a staging layer for your ingested data before this data is structured and loaded into a final destination, which is generally a data warehouse solution.
- **2 main services:**
    - **Azure Data Lake Storage**
        - Azure Data Lake Storage provides a file repository that can store near-unlimited amounts of data.
        - Compatible with the Hadoop Distributed File System (HDFS), and can be accessed directly by Azure Data Factory, Databricks, HDInsight, Data Lake Analytics and Stream Analytics
        - Ideally, you should place your data lake in the same Azure data center as your analytics tools; otherwise, you incur bandwidth costs as the data traverses regions

    - **Azure Data Lake Analytics**
        - Azure Data Lake Analytics is an on-demand analytics job service that you can use to process big data.
        - Has a set of tools that allow you to create jobs that can transform data and extract insights.
        - You write those jobs using U-SQL, which is a hybrid programming language that mixes SQL and C#.
--------------------------------------------------------------------------------