├── Technical Guides ├── 8-Understanding how to scale.md ├── 3-Understanding data factory pipelines.md ├── 1-Understanding ephemeral blobs.md ├── TechnicalGuide-ToC.md ├── 2-Understanding data ingestion.md ├── 4-Understanding logical datawarehouses.md ├── 5-Understanding data warehouse flip.md ├── 7-Understanding the job manager.md └── 6-Understanding tabular model refresh.md ├── Solution Overview └── Accelerating BI and Reporting solutions on Azure.docx ├── img ├── BlobToSQLDWLoad.png ├── SSAS-Model-Cache.png ├── restrict-adminui.png ├── troubleshoot-retry.png ├── ConfiguringSQLDWforTRI.png ├── InteractivePowerBISetup.png ├── azure-function-failure.png ├── powerbi_assets │ ├── textbox.png │ ├── all_tables.png │ ├── internet_sales_summary.png │ └── internet_sales_amt_by_customers.png ├── restrict-controlserver.png ├── restrict-azure-resources.png ├── adminui_assets │ ├── adminui-data.png │ ├── adminui-details.png │ ├── adminui-metrics.png │ ├── adminui-metrics2.png │ ├── adminui-overview.png │ └── adminui-dashboard.png ├── ChangeADFLinkedServicePassword.png ├── reportingserver_assets │ ├── ssrs-url.png │ ├── ssrs-email.png │ ├── ssrs-home.png │ ├── market-search.png │ ├── sendgrid-smtp.png │ ├── ssrs-instance.png │ ├── subscribe-1.png │ ├── subscribe-2.png │ ├── authentication.png │ └── sendgrid-config.png ├── azure-arch-enterprise-bi-and-reporting.png └── restart_ctrl_server_assets │ ├── deployments.png │ ├── deployments2.png │ ├── deployments3.png │ └── deployments4.png ├── User Guides ├── 17-Get Help and Support.md ├── 14-Set up incremental loads.md ├── 19-Restarting a Control Server Deployment.md ├── 12-Load historical tabular models.md ├── 18-Deleting a deployment.md ├── 11-Load historical data into the warehouse.md ├── 3-Troubleshoot the Deployment.md ├── 10-Configure Power BI.md ├── 1-Prerequisite Steps Before Deployment.md ├── 9-Configure SQL Server Reporting Services.md ├── UsersGuide-TOC.md ├── 15-Monitor and Troubleshoot Data Pipelines.md ├── 16-Frequently Asked Questions.md ├── 2-Set up Deployment.md ├── 5-Monitor the Deployed Components.md ├── 13-Create dashboards and reports.md ├── 6-Prepare the infrastructure for your Data.md ├── Configure User Access Control.md ├── 7-Configure Data Ingestion.md ├── 4-Manage the Deployed Infrastructure.md └── 8-Configure SQL Server Analysis Services.md ├── LICENSE-CODE ├── scripts ├── Common.psm1 ├── DeployDC.ps1 ├── DeployVPN.ps1 └── CommonNetworking.psm1 ├── armTemplates ├── vpn-gateway.json └── dc-deploy.json ├── .gitignore └── README.md /Technical Guides/8-Understanding how to scale.md: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /Solution Overview/Accelerating BI and Reporting solutions on Azure.docx: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /img/BlobToSQLDWLoad.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/BlobToSQLDWLoad.png -------------------------------------------------------------------------------- /img/SSAS-Model-Cache.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/SSAS-Model-Cache.png 
-------------------------------------------------------------------------------- /img/restrict-adminui.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/restrict-adminui.png -------------------------------------------------------------------------------- /img/troubleshoot-retry.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/troubleshoot-retry.png -------------------------------------------------------------------------------- /img/ConfiguringSQLDWforTRI.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/ConfiguringSQLDWforTRI.png -------------------------------------------------------------------------------- /img/InteractivePowerBISetup.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/InteractivePowerBISetup.png -------------------------------------------------------------------------------- /img/azure-function-failure.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/azure-function-failure.png -------------------------------------------------------------------------------- /img/powerbi_assets/textbox.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/powerbi_assets/textbox.png -------------------------------------------------------------------------------- /img/restrict-controlserver.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/restrict-controlserver.png -------------------------------------------------------------------------------- /img/powerbi_assets/all_tables.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/powerbi_assets/all_tables.png -------------------------------------------------------------------------------- /img/restrict-azure-resources.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/restrict-azure-resources.png -------------------------------------------------------------------------------- /img/adminui_assets/adminui-data.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/adminui_assets/adminui-data.png -------------------------------------------------------------------------------- /img/ChangeADFLinkedServicePassword.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/ChangeADFLinkedServicePassword.png -------------------------------------------------------------------------------- /img/adminui_assets/adminui-details.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/adminui_assets/adminui-details.png -------------------------------------------------------------------------------- /img/adminui_assets/adminui-metrics.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/adminui_assets/adminui-metrics.png -------------------------------------------------------------------------------- /img/adminui_assets/adminui-metrics2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/adminui_assets/adminui-metrics2.png -------------------------------------------------------------------------------- /img/adminui_assets/adminui-overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/adminui_assets/adminui-overview.png -------------------------------------------------------------------------------- /img/reportingserver_assets/ssrs-url.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/reportingserver_assets/ssrs-url.png -------------------------------------------------------------------------------- /img/adminui_assets/adminui-dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/adminui_assets/adminui-dashboard.png -------------------------------------------------------------------------------- /img/reportingserver_assets/ssrs-email.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/reportingserver_assets/ssrs-email.png -------------------------------------------------------------------------------- /img/reportingserver_assets/ssrs-home.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/reportingserver_assets/ssrs-home.png -------------------------------------------------------------------------------- /img/reportingserver_assets/market-search.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/reportingserver_assets/market-search.png -------------------------------------------------------------------------------- /img/reportingserver_assets/sendgrid-smtp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/reportingserver_assets/sendgrid-smtp.png -------------------------------------------------------------------------------- /img/reportingserver_assets/ssrs-instance.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/reportingserver_assets/ssrs-instance.png 
-------------------------------------------------------------------------------- /img/reportingserver_assets/subscribe-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/reportingserver_assets/subscribe-1.png -------------------------------------------------------------------------------- /img/reportingserver_assets/subscribe-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/reportingserver_assets/subscribe-2.png -------------------------------------------------------------------------------- /img/azure-arch-enterprise-bi-and-reporting.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/azure-arch-enterprise-bi-and-reporting.png -------------------------------------------------------------------------------- /img/powerbi_assets/internet_sales_summary.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/powerbi_assets/internet_sales_summary.png -------------------------------------------------------------------------------- /img/reportingserver_assets/authentication.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/reportingserver_assets/authentication.png -------------------------------------------------------------------------------- /img/reportingserver_assets/sendgrid-config.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/reportingserver_assets/sendgrid-config.png -------------------------------------------------------------------------------- /img/restart_ctrl_server_assets/deployments.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/restart_ctrl_server_assets/deployments.png -------------------------------------------------------------------------------- /img/restart_ctrl_server_assets/deployments2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/restart_ctrl_server_assets/deployments2.png -------------------------------------------------------------------------------- /img/restart_ctrl_server_assets/deployments3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/restart_ctrl_server_assets/deployments3.png -------------------------------------------------------------------------------- /img/restart_ctrl_server_assets/deployments4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/restart_ctrl_server_assets/deployments4.png -------------------------------------------------------------------------------- /img/powerbi_assets/internet_sales_amt_by_customers.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/azure-arch-enterprise-bi-and-reporting/HEAD/img/powerbi_assets/internet_sales_amt_by_customers.png -------------------------------------------------------------------------------- /User Guides/17-Get Help and Support.md: -------------------------------------------------------------------------------- 1 | # Getting Help and Support 2 | This reference implementation is supported during business hours via email by Microsoft at azent-biandreporting@microsoft.com, and also by [Artis Consulting](http://www.artisconsulting.com/), a Microsoft Certified Partner. -------------------------------------------------------------------------------- /Technical Guides/3-Understanding data factory pipelines.md: -------------------------------------------------------------------------------- 1 | # Understanding Data Factory Pipelines 2 | 3 | The Enterprise BI and Reporting TRI relies on [Azure Data Factory](https://azure.microsoft.com/en-us/services/data-factory/) to ingest data from the Ephemeral Blob storage into SQL Data Warehouses. 4 | 5 | When the Job Manager server initiates loading of a given file into a given physical Data Warehouse, it calls Azure Data Factory APIs to create and start a one-time pipeline. This pipeline contains the following three activities: 6 | 1. Pre-load activity 7 | 2. [Copy activity](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-overview) ingests data from a given Ephemeral Blob storage file (source) into the given SQL Data Warehouse table (sink). 8 | 3. Post-load activity runs a specified stored procedure to produce aggregate facts after dependent tables have been ingested. 9 | -------------------------------------------------------------------------------- /User Guides/14-Set up incremental loads.md: -------------------------------------------------------------------------------- 1 | # Setting up Incremental Loads 2 | 3 | ## Review the data load pipeline setup 4 | 5 | * Ensure that you have completed the tasks listed in [step 6](./6-Prepare%20the%20infrastructure%20for%20your%20Data.md). 6 | * Deploy the upload script you created as part of [step 7](./7-Configure%20Data%20Ingestion.md) to the on-premises system. 7 | 8 | ## Extract and Upload Incremental Data 9 | * Extract the incremental data and place it in a storage location that the on-premises system can access. 10 | * Run the upload script to authenticate, retrieve the blob location, upload the file, and then register it with the Job Manager. 11 | * Schedule the upload jobs for each fact and dimension table you need to upload (see the scheduling sketch at the end of this page). 12 | 13 | ## Monitor SSAS partition refresh 14 | * Once the incremental data is uploaded, tasks are created for the partition builder. After the partition builder tasks complete, you can check the partition builder machine for the last refresh time of each table, and validate the refreshed data for accuracy and completeness.
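The recurring uploads above can be automated with the Windows Task Scheduler on the on-premises machine that runs the upload script. The sketch below is one possible approach, not part of the TRI itself: the script path, its parameter names, the extract folder, and the table list are all placeholders — substitute the upload script and tables you configured in step 7.

```PowerShell
# Sketch only: schedule a nightly upload per table. The script path, parameter
# names, extract folder, and table names below are assumptions -- align them
# with the upload script you created in step 7 (Configure Data Ingestion).
$uploadScript = 'C:\edw\Upload-DataFile.ps1'        # hypothetical script location
$tables = @('FactInternetSales', 'DimCustomer')     # hypothetical fact/dimension tables

foreach ($table in $tables) {
    $action = New-ScheduledTaskAction -Execute 'powershell.exe' `
        -Argument "-NoProfile -ExecutionPolicy Bypass -File `"$uploadScript`" -DWTableName $table -ExtractFolder D:\extracts\$table"
    $trigger = New-ScheduledTaskTrigger -Daily -At '2:00 AM'

    # Registers the task for the current user; use a dedicated service account in production.
    Register-ScheduledTask -TaskName "EDW-Upload-$table" -Action $action -Trigger $trigger
}
```

Each scheduled task simply invokes the same upload script you already validated manually, so the schedule adds no new ingestion logic.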
15 | -------------------------------------------------------------------------------- /LICENSE-CODE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | Copyright (c) Microsoft Corporation 3 | 4 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and 5 | associated documentation files (the "Software"), to deal in the Software without restriction, 6 | including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, 7 | and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, 8 | subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all copies or substantial 11 | portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT 14 | NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 15 | IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 16 | WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 17 | SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -------------------------------------------------------------------------------- /Technical Guides/1-Understanding ephemeral blobs.md: -------------------------------------------------------------------------------- 1 | # Understanding Ephemeral Storage Accounts 2 | 3 | The Enterprise BI and Reporting TRI Job Manager maintains [Azure Blob](https://docs.microsoft.com/en-us/azure/storage/) accounts that serve as the staging area for the data to be ingested into the TRI. The Job Manager creates a new Azure Blob account every 24 hours. After a new account is created, previously created accounts are deleted once all the data uploaded to them has been successfully ingested. This ensures that one storage account contains no more than 24 hours' worth of data. As a result, if one of the customer's data sources were compromised, no more than 24 hours' worth of data would be impacted. 4 | 5 | The Job Manager exposes an API to fetch the SAS URI for the currently active storage account. Therefore, when integrating on-premises data ingestion systems with the TRI, the first step is always to fetch the SAS URI for the currently active Azure Blob account. For a code sample on how the API is called, please refer to [Configuring Data Ingestion](../User%20Guides/7-Configure%20Data%20Ingestion.md#1-modify-the-code-provided-in-the-tri-to-ingest-your-data). 6 | -------------------------------------------------------------------------------- /User Guides/19-Restarting a Control Server Deployment.md: -------------------------------------------------------------------------------- 1 | # Restart a Failed Control Server 2 | The Control Server, the brain of the entire system, is still susceptible to failure. The following steps describe how to redeploy a failed Control Server. 3 | 4 | #### Restart Control Server Deployment 5 | 1. Take note of your resource group name. 6 | 7 | 2. Log in to the [Azure Portal](https://portal.azure.com). 8 | 9 | 3. Locate your resource group. 10 | 11 | 4. Open the **Deployments** tab under your resource group. 12 | ![Deployment Tab](../img/restart_ctrl_server_assets/deployments.png) 13 | 5. 
Click on **deployVMCtrlsv** 14 | ![Click on deployVMCtrlsv](../img/restart_ctrl_server_assets/deployments2.png) 15 | 6. Click on **Redeploy** to re-provision the Control Server VM. 16 | ![Click on Redeploy](../img/restart_ctrl_server_assets/deployments3.png) 17 | 7. This opens the **Custom Deployment** page. 18 | ![Custom Deployment](../img/restart_ctrl_server_assets/deployments4.png) 19 | 8. Ensure the following are fulfilled: 20 | - Use your existing resource group. 21 | - Ensure all the settings are the same as in your previous deployment. 22 | 9. Click on the **Purchase** button to deploy the virtual machine again. 23 | -------------------------------------------------------------------------------- /User Guides/12-Load historical tabular models.md: -------------------------------------------------------------------------------- 1 | # Load historical data into tabular models. 2 | 3 | ## Summary 4 | 5 | This page contains the steps to refresh the SSAS models with historical loads. 6 | 7 | ## 1. Data Model for SSAS. 8 | Please ensure that you have created the SSAS models and configured them in the Job Manager as indicated in [Configure SQL Server Analysis Services](./8-Configure%20SQL%20Server%20Analysis%20Services.md). 9 | 10 | ## 2. Locate the Partition Builder Machine. 11 | * Log in to the [Azure portal](https://portal.azure.com) and locate your resource group. In the list of resources, search for the partition builder virtual machine whose name ends in "ssaspbvm00". 12 | * Remote Desktop into the machine using your credentials. 13 | * Start SQL Server Management Studio (SSMS) and connect to the Analysis Services server. 14 | * Process the model to perform a full refresh. 15 | 16 | ## 3. Copy database to SSAS Read-Only nodes 17 | There are multiple ways to sync the SSAS tabular model from the partition builder to the SSAS Read-Only nodes. One way is to [back up and restore the database](https://docs.microsoft.com/en-us/sql/analysis-services/multidimensional-models/backup-and-restore-of-analysis-services-databases) on each Read-Only node. Another option is to [synchronize the Analysis Services databases](https://docs.microsoft.com/en-us/sql/analysis-services/multidimensional-models/synchronize-analysis-services-databases). 18 | -------------------------------------------------------------------------------- /Technical Guides/TechnicalGuide-ToC.md: -------------------------------------------------------------------------------- 1 | # Technical Guides for Enterprise BI and Reporting 2 | 3 | The following documents describe the technical details of the various operational components of the TRI after it has been successfully deployed. 4 | 5 | 1. [Understanding Ephemeral Storage Accounts](./1-Understanding%20ephemeral%20blobs.md) - Explains ephemeral blobs as the intermediary stage between data upload and ingestion. 6 | 7 | 2. [Understanding data ingestion](./2-Understanding%20data%20ingestion.md) - Explains the checks for valid data slices, and the steps taken to load each data slice into physical data warehouses from blob storage. 8 | 9 | 3. [Understanding data factory pipelines](./3-Understanding%20data%20factory%20pipelines.md) - Explains how the pipeline created by Azure Data Factory moves data from the Ephemeral Blob to the Data Warehouse. 10 | 11 | 4. [Understanding Logical Data Warehouses](./4-Understanding%20logical%20datawarehouses.md) - Explains the purpose and requirements of logical groupings of data warehouses. 12 | 13 | 5. 
[Understanding Data Warehouse Flip](./5-Understanding%20data%20warehouse%20flip.md) - Explains how the DWs are coordinated between the loading state and the queryable active state. 14 | 15 | 6. [Understanding Tabular Model Refresh](./6-Understanding%20tabular%20model%20refresh.md) - Explains how the TRI operationalizes and manages tabular models in Analysis Services for interactive BI. 16 | 17 | 7. [Understanding the job manager](./7-Understanding%20the%20job%20manager.md) 18 | 19 | 8. [Understanding how to scale](./8-Understanding%20how%20to%20scale.md) 20 | -------------------------------------------------------------------------------- /User Guides/18-Deleting a deployment.md: -------------------------------------------------------------------------------- 1 | # Deleting the Deployment 2 | 3 | Since the deployment provisions resources in your subscription, you will be billed for their usage. If you no longer intend to use the deployment, we recommend deleting the provisioned resources so that you are not billed for them. 4 | Since deleting the deployment will also remove the data in your physical data warehouses, please consider backing up your data before deleting the deployment. 5 | 6 | In order to delete the deployment, please do the following: 7 | 8 | ## 1. Delete the resource group 9 | 1. Find your deployment resource group in the [Azure portal](https://portal.azure.com). 10 | 2. Click on 'Delete resource group' and follow the prompts to delete the resource group. 11 | 3. If you have locks that are preventing deletion of the resource group, you will have to find and delete them in the Locks pane for your resource group. 12 | 13 | ## 2. Delete the deployment in the Cortana Intelligence Solution Deployments 14 | 1. Find your [deployment on Cortana Intelligence Solutions](https://start.cortanaintelligence.com/Deployments/) and delete it. 15 | 16 | ## 3. Delete the Azure Active Directory applications 17 | Since the Active Directory applications are provisioned as part of the tenant and not the subscription, they have to be deleted separately. 18 | 19 | 1. On the [Azure Portal](https://portal.azure.com), click on 'All Services' in the left menu and search for 'Azure Active Directory' to launch your Azure Active Directory management UI. 20 | 2. Click on 'App Registrations' and enter the name of your deployment in the search bar, which will show all the applications that were provisioned for this deployment. 21 | 3. Follow the prompt to open and then delete all of the applications. 22 | -------------------------------------------------------------------------------- /User Guides/11-Load historical data into the warehouse.md: -------------------------------------------------------------------------------- 1 | # Load historical data into the warehouse. 2 | 3 | ## Summary 4 | This page lists ways you can load historical data into the SQL Data Warehouse. 5 | 6 | Loading historical data depends on the data size and your retention policy on Azure. Depending on the size of the data, you can use one of the following approaches: 7 | 8 | ## 1. Use the Solution itself to upload files and load the data. 9 | 10 | If your historical data size is small (GBs to a few TBs), you can use the following steps to load the data. 11 | * Export the data from your existing solution to on-premises storage as CSV files. 
12 | * Set up the data ingestion pipeline as indicated in [Configure Data Ingestion](./7-Configure%20Data%20Ingestion.md). 13 | * Upload each data file as indicated in step 2, and let the solution's ADF pipelines load it into the SQL DWs. 14 | 15 | ## 2. Export the data and Upload using Blob Storage 16 | 17 | If your data is in the low GBs or TBs and can be transferred over the network, you can follow the steps below to load the data. 18 | 19 | * Export the historical data from your existing solution to on-premises storage as CSV files. 20 | * Create a new Blob Storage account in your resource group and upload all the files. 21 | * Follow the [documentation link](https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-get-started-load-with-polybase) to load data into both the Reader and Loader SQL DWs. 22 | 23 | 24 | ## 3. Use Microsoft Import/Export Service to transfer data to Storage. 25 | 26 | If your data is very large and cannot be uploaded over the network, you can use the Microsoft Import/Export Service to transfer the data. Once the data is transferred, you can use the steps mentioned in approach 2 above to load it into the SQL DWs from the Blob Storage account. 27 | -------------------------------------------------------------------------------- /scripts/Common.psm1: -------------------------------------------------------------------------------- 1 | function New-ResourceGroupIfNotExists ($resourceGroupName, $location) { 2 | $resourceGroup = Get-AzureRmResourceGroup -Name $resourceGroupName -ErrorAction SilentlyContinue 3 | if(-not $resourceGroup) { 4 | New-AzureRmResourceGroup -Name $resourceGroupName -Location $location 5 | Write-Output ("Created Resource Group $resourceGroupName in $location."); 6 | return; 7 | } 8 | 9 | $isSameLocation = $resourceGroup.Location -eq $location 10 | if(-not $isSameLocation) { 11 | throw "Resource Group $resourceGroupName exists in a different location $($resourceGroup.Location). Delete the existing resource group or choose another name."; 12 | } 13 | 14 | Write-Output ("Resource Group $resourceGroupName already exists. Skipping creation."); 15 | } 16 | 17 | function New-Session () { 18 | $Error.Clear() 19 | $ErrorActionPreference = "SilentlyContinue" 20 | Get-AzureRmContext -ErrorAction Continue; 21 | foreach ($eacherror in $Error) { 22 | if ($eacherror.Exception.ToString() -like "*Run Login-AzureRmAccount to login.*") { 23 | Add-AzureAccount 24 | } 25 | } 26 | $Error.Clear(); 27 | $ErrorActionPreference = "Stop" 28 | } 29 | 30 | function Load-Module($name) 31 | { 32 | if(-not(Get-Module -name $name)) 33 | { 34 | if(Get-Module -ListAvailable | 35 | Where-Object { $_.name -eq $name }) 36 | { 37 | Import-Module -Name $name 38 | Write-Host "Module $name imported successfully" 39 | } #end if module available then import 40 | 41 | else 42 | { 43 | Write-Host "Module $name does not exist. Installing..."
44 | Install-Module -Name $name -AllowClobber -Force 45 | Write-Host "Module $name installed successfully" 46 | } #module not available 47 | 48 | } # end if not module 49 | 50 | else 51 | { 52 | Write-Host "Module $name exists and is already loaded" 53 | } #module already loaded 54 | } 55 | -------------------------------------------------------------------------------- /Technical Guides/2-Understanding data ingestion.md: -------------------------------------------------------------------------------- 1 | # Understanding Data Ingestion 2 | 3 | Once the data is uploaded to the Ephemeral Blob storage (see [Understanding Ephemeral Blobs](./1-Understanding%20ephemeral%20blobs.md)), the clients must call a Job Manager API to create a `DWTableAvailabilityRange` entity. The table below summarizes the contract for `DWTableAvailabilityRange`. 4 | 5 | | Name | Description | 6 | | ---- | -------- | 7 | | `DWTableName` | The name of the table in the SQL Data Warehouse to which the data will be ingested | 8 | | `StorageAccountName` | The name of the Ephemeral Blob storage account where the file was uploaded | 9 | | `FileUri` | The URI of the file in the Ephemeral Blob storage account above | 10 | | `StartDate` | Start date for the slice | 11 | | `EndDate` | End date for the slice | 12 | | `ColumnDelimiter` | Column delimiter (e.g. ',') | 13 | | `FileType` | Type of file (e.g. 'CSV') | 14 | 15 | Note that `StartDate` - `EndDate` ranges for `DWTableAvailabilityRange` entities sharing the same `DWTableName` must not overlap. The Job Manager will throw an exception at creation time if an overlapping `DWTableAvailabilityRange` is detected. 16 | 17 | Upon creation, the Job Manager will create a separate instance of `DWTableAvailabilityRange` for each physical data warehouse and return a `200 - OK` status code. For a data ingestion code sample, please refer to [Configuring Data Ingestion](../User%20Guides/7-Configure%20Data%20Ingestion.md#1-modify-the-code-provided-in-the-tri-to-ingest-your-data). 18 | 19 | A Job Manager background process continuously looks for `DWTableAvailabilityRange` entities that belong to physical data warehouses in the `Load` state. Once such an entity is found, an Azure Data Factory pipeline is created to ingest the data into the physical data warehouse (see [Understanding Data Factory Pipelines](./3-Understanding%20data%20factory%20pipelines.md)). Therefore, the data is loaded into physical data warehouses in the `Load` state as soon as the corresponding `DWTableAvailabilityRange` is created. The remaining physical data warehouses will have the data ingested after the next flip operation is performed (see [Understanding data warehouse flip](./5-Understanding%20data%20warehouse%20flip.md)).
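To make the contract concrete, below is a minimal PowerShell sketch of registering one uploaded file with the Job Manager. The host name, API route, and bearer-token acquisition are assumptions made for illustration — the TRI's actual endpoint and authentication flow are shown in the ingestion code sample linked above — but the payload field names match the `DWTableAvailabilityRange` contract in the table.

```PowerShell
# Sketch only: the Job Manager URI, API route, and token below are placeholders;
# only the payload field names come from the DWTableAvailabilityRange contract.
$jobManagerUri = 'https://contoso-jobmanager.contosodomain.ms'   # hypothetical host
$token         = '<bearer token for the Job Manager AAD application>'

$body = @{
    DWTableName        = 'FactInternetSales'             # target SQL Data Warehouse table
    StorageAccountName = 'edwephemeral01'                # currently active ephemeral account
    FileUri            = 'https://edwephemeral01.blob.core.windows.net/data/FactInternetSales_20180101.csv'
    StartDate          = '2018-01-01T00:00:00Z'          # slice start; ranges for a table must not overlap
    EndDate            = '2018-01-02T00:00:00Z'          # slice end
    ColumnDelimiter    = ','
    FileType           = 'CSV'
} | ConvertTo-Json

# A 200 - OK response means one DWTableAvailabilityRange was created per physical data warehouse.
Invoke-RestMethod -Method Post `
    -Uri "$jobManagerUri/api/dwtableavailabilityranges" `
    -Headers @{ Authorization = "Bearer $token" } `
    -ContentType 'application/json' `
    -Body $body
```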
20 | -------------------------------------------------------------------------------- /scripts/DeployDC.ps1: -------------------------------------------------------------------------------- 1 | #Requires -RunAsAdministrator 2 | #Requires -Modules AzureRM.Network 3 | #Requires -Modules AzureRM.profile 4 | 5 | # This script deploys a Domain Controller on an existing VNET resource group and assigns the DNS address of VNET to the DC IP address 6 | 7 | Param( 8 | [Parameter(Mandatory=$true)] 9 | [string]$SubscriptionName, 10 | 11 | [Parameter(Mandatory=$true)] 12 | [string]$DnsVmName, 13 | 14 | [Parameter(Mandatory=$true)] 15 | [string]$Location, 16 | 17 | [Parameter(Mandatory=$true)] 18 | [string]$ResourceGroupName, 19 | 20 | [Parameter(Mandatory=$true)] 21 | [string]$VNetName, 22 | 23 | [parameter(ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, Mandatory=$true, HelpMessage="Domain name to create.")] 24 | [string]$DomainName, 25 | 26 | [parameter(ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, Mandatory=$true, HelpMessage="Domain admin user name.")] 27 | [string]$DomainUserName, 28 | 29 | [parameter(ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, Mandatory=$true, HelpMessage="Domain admin user password.")] 30 | [securestring]$DomainUserPassword 31 | ) 32 | 33 | $scriptPath = $MyInvocation.MyCommand.Path 34 | $scriptDir = Split-Path $scriptPath 35 | Import-Module (Join-Path $scriptDir CommonNetworking.psm1) -Force 36 | 37 | # Name of the subnet in which Domain Controller will be present 38 | $SubnetName = "DCSubnet" 39 | 40 | # Select subscription 41 | Select-AzureRmSubscription -SubscriptionName $SubscriptionName 42 | $subscription = Get-AzureRmSubscription -SubscriptionName $SubscriptionName 43 | $SubscriptionId = $subscription.Subscription.SubscriptionId 44 | 45 | # Create a subnet for DC deployment 46 | $virtualNetwork = Get-VirtualNetworkOrExit -ResourceGroupName $ResourceGroupName -VirtualNetworkName $VNetName 47 | $subnetAddressPrefix = Get-AvailableSubnetOrExit -VirtualNetwork $virtualNetwork -SubnetName $SubnetName 48 | 49 | $templateParameters = @{ 50 | adminUsername=$DomainUserName 51 | adminPassword=$DomainUserPassword 52 | domainName=$DomainName 53 | existingVirtualNetworkName=$VNetName 54 | existingVirtualNetworkAddressRange=$virtualNetwork.AddressSpace.AddressPrefixes[0] 55 | dcSubnetName=$SubnetName 56 | existingSubnetAddressRange=$subnetAddressPrefix 57 | dcVmName=$DnsVmName 58 | } 59 | 60 | $templateFilePath = Join-Path (Join-Path (Split-Path -Parent $scriptDir) 'armTemplates') 'dc-deploy.json' 61 | $out = New-AzureRmResourceGroupDeployment -Name DeployDC ` 62 | -ResourceGroupName $ResourceGroupName ` 63 | -TemplateFile $templateFilePath ` 64 | -TemplateParameterObject $templateParameters 65 | 66 | # Update the DNS server address for VNET 67 | $virtualNetwork.DhcpOptions.DnsServers = $out.Outputs.dcIp.Value 68 | Set-AzureRmVirtualNetwork -VirtualNetwork $virtualNetwork -------------------------------------------------------------------------------- /armTemplates/vpn-gateway.json: -------------------------------------------------------------------------------- 1 | { 2 | "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json", 3 | "contentVersion": "1.0.0.0", 4 | "parameters": { 5 | "gatewayName": { 6 | "type": "string", 7 | "metadata": { 8 | "description": "Name of the VPN gateway" 9 | } 10 | }, 11 | "virtualNetworkName": { 12 | "type": "string", 13 | "metadata": { 14 | "description": "Name for 
the Azure Virtual Network" 15 | } 16 | }, 17 | "edwAzureVNetAddressPrefix": { 18 | "type": "string", 19 | "metadata": { 20 | "description": "CIDR block representing the address space of the Azure VNet" 21 | } 22 | }, 23 | "vpnGatewaySubnetPrefix": { 24 | "type": "string", 25 | "metadata": { 26 | "description": "CIDR block for gateway subnet, subset of edwAzureVNetAddressPrefix address space" 27 | } 28 | } 29 | }, 30 | "variables": { 31 | "apiVersion": "2015-06-15", 32 | "gatewaySubnetName": "GatewaySubnet", 33 | "vnetID": "[resourceId('Microsoft.Network/virtualNetworks', parameters('virtualNetworkName'))]", 34 | "gatewaySubnetRef": "[concat(variables('vnetID'),'/subnets/', variables('gatewaySubnetName'))]", 35 | "gatewayPublicIPName": "EDWVPNGwPublicIp" 36 | }, 37 | "resources": [ 38 | { 39 | "apiVersion": "[variables('apiVersion')]", 40 | "type": "Microsoft.Network/virtualNetworks", 41 | "name": "[parameters('virtualNetworkName')]", 42 | "location": "[resourceGroup().location]", 43 | "properties": { 44 | "addressSpace": { 45 | "addressPrefixes": [ 46 | "[parameters('edwAzureVNetAddressPrefix')]" 47 | ] 48 | }, 49 | "subnets": [ 50 | { 51 | "name": "[variables('gatewaySubnetName')]", 52 | "properties": { 53 | "addressPrefix": "[parameters('vpnGatewaySubnetPrefix')]" 54 | } 55 | } 56 | ] 57 | } 58 | }, 59 | { 60 | "apiVersion": "[variables('apiVersion')]", 61 | "type": "Microsoft.Network/publicIPAddresses", 62 | "name": "[variables('gatewayPublicIPName')]", 63 | "location": "[resourceGroup().location]", 64 | "properties": { 65 | "publicIPAllocationMethod": "Dynamic" 66 | } 67 | }, 68 | { 69 | "apiVersion": "[variables('apiVersion')]", 70 | "type": "Microsoft.Network/virtualNetworkGateways", 71 | "name": "[parameters('gatewayName')]", 72 | "location": "[resourceGroup().location]", 73 | "dependsOn": [ 74 | "[concat('Microsoft.Network/publicIPAddresses/', variables('gatewayPublicIPName'))]", 75 | "[concat('Microsoft.Network/virtualNetworks/', parameters('virtualNetworkName'))]" 76 | ], 77 | "properties": { 78 | "ipConfigurations": [ 79 | { 80 | "properties": { 81 | "privateIPAllocationMethod": "Dynamic", 82 | "subnet": { 83 | "id": "[variables('gatewaySubnetRef')]" 84 | }, 85 | "publicIPAddress": { 86 | "id": "[resourceId('Microsoft.Network/publicIPAddresses',variables('gatewayPublicIPName'))]" 87 | } 88 | }, 89 | "name": "vnetGatewayConfig" 90 | } 91 | ], 92 | "gatewayType": "Vpn", 93 | "vpnType": "RouteBased", 94 | "enableBgp": "false" 95 | } 96 | } 97 | ] 98 | } 99 | -------------------------------------------------------------------------------- /User Guides/3-Troubleshoot the Deployment.md: -------------------------------------------------------------------------------- 1 | # Troubleshooting the Deployment 2 | 3 | Even if everything is setup properly, it is still possible for the deployment to fail. Please consult the troubleshooting steps below, for some known possible failures. 4 | 5 | ## Generic Failure 6 | 7 | Although the deployment is completely automated, some deployment steps will encounter intermittent failures, or may appear to be spinning on a specific step for a long time. 8 | 9 | As a first step, refresh the web page. If the dashboard has timed you out, log back in. If the ensuing screen shows an error on a particular step, simply click the **retry** button at the top of your deployment page. After one retry the failed activity should successfully complete. If the activity still fails on retry, please consult the other troubleshooting sections. 
10 | 11 | ![Retry button](../img/troubleshoot-retry.png) 12 | 13 | ## CustomScriptExtension Failure 14 | 15 | Custom script extensions may fail intermittently. In this scenario, the **retry** button will not work without first deleting the custom script extension from the VM that runs the script. 16 | 17 | ### Retry steps: 18 | 1. Find your deployment resource group in the [Azure portal](https://portal.azure.com). 19 | 2. Click the `Deployments` blade. This will show you the failed deployment (if you do not see a failed deployment, this could be due to a failure in an Azure Function; see the steps below for [Azure Function failures](#azure-function-failure)). 20 | 3. Click the *failed* `Deployment Name`. 21 | 4. Scroll down to the `Operation details` section and click the failed `Resource`. 22 | 5. In the `Operation details` blade, find the `RESOURCE`. This resource will be of the format `<VM name>/<extension name>`. Make note of this resource. 23 | 6. Search for the `VM Name` in the Azure portal search box and click the Virtual Machine link. 24 | 7. Click the `Extensions` blade of the Virtual Machine page. 25 | 8. Locate the extension with `Type` of `Microsoft.Compute.CustomScriptExtension`. 26 | 9. Click the `...` and then `Uninstall`. 27 | 10. Wait for the uninstall to complete. This will take a minute or two. 28 | 11. Reboot the VM with the Restart button in the Azure portal and wait 5 minutes. 29 | 12. After the reboot has completed, click the `Retry` button on the [deployments](https://start.cortanaintelligence.com/Deployments) page. 30 | 31 | ## Azure Function Failure 32 | Azure Functions are used to execute complex deployment steps that cannot be accomplished via ARM deployments. If retrying the deployment using the **retry** button does not work, please follow these steps to investigate the deployment failure. 33 | 34 | 1. Find your deployment resource group in the [Azure portal](https://portal.azure.com) and click the `Deployments` blade. If you see a failed deployment, do not proceed to the next step since it is not a failure in an Azure Function. 35 | 2. Go to your deployment resource group and click on the `Overview` blade. Then type 'functions-' in the search text box, which should filter the list down to the Azure Functions resource that hosts the various deployment steps. Click on that Azure Functions resource to open it. 36 | 3. Under Functions, look for the function whose name correlates to the step that failed. Alternatively, you can click through each function. Click on the `Monitor` tab and examine the failures in the `Invocation log`, including the `log` output of each failed invocation. Further investigation of the failure is required to root-cause the problem. More often than not, the issue is related to the specific subscription. 37 | 38 | ![Azure Function failure](../img/azure-function-failure.png) 39 | -------------------------------------------------------------------------------- /User Guides/10-Configure Power BI.md: -------------------------------------------------------------------------------- 1 | # Overview 2 | 3 | [PowerBI](https://powerbi.microsoft.com/en-us/) is used to create live reports and dashboards that can be shared across your organization. This document describes the steps needed to create and share a PowerBI dashboard for your Enterprise Reporting and BI Technical Reference Implementation solution.
4 | 5 | # Install PowerBI Data Gateway 6 | 7 | The [SQL Server Analysis Services (SSAS)](https://docs.microsoft.com/en-us/sql/analysis-services/analysis-services) installed as part of your solution is deployed in a private virtual network (VNET) on Azure. For PowerBI to access SSAS, you will need to install the [PowerBI Data Gateway](https://powerbi.microsoft.com/en-us/gateway/) on a virtual machine (VM) within this VNET. 8 | 9 | The following are the steps to install the PowerBI Data Gateway on your VM. 10 | 11 | ## Download the PowerBI Data Gateway installer locally 12 | 13 | First, from your local environment, browse to https://powerbi.microsoft.com/en-us/gateway, click **DOWNLOAD GATEWAY**, and click **Save As** to download the installer executable to a folder in your local environment. You will next upload and run this installer executable on your SQL Server VM. 14 | 15 | ## Log into the SQL Server VM 16 | 17 | Now, log into the SQL Server VM in your solution. Technically, the PowerBI Data Gateway could be installed on any of the VMs deployed as part of your solution, or even a new one, but the SQL Server VM is a good choice since it is not heavily used. You can find this VM by opening the [Azure portal](https://portal.azure.com/) and clicking **Resource groups**. Type in the name you gave for your solution, which is also used as the name of the resource group, to find your resource group. Click it, and sort the resources by type. Scroll down to the virtual machines and find the one ending in "sqlvm00". Click on this VM, then click the **Networking** link. The private IP listed is the IP address you will use to connect to the SQL Server VM. 18 | 19 | > NOTE: Because your VM is running in a VNET, you must first ensure that your VPN client is running before you can connect to your VM. Until it is running, you will not be able to connect. 20 | 21 | Log in to the SQL Server VM using your preferred client, such as Remote Desktop, with the private IP address and the username and password you used for your deployment. If you are using Remote Desktop, select **More choices** > **Use a different account** to enter the username and password. 22 | 23 | ## Upload the PowerBI Data Gateway installer and install 24 | 25 | Copy the installer executable from your local environment onto the VM. If you are using Remote Desktop, you can do this by copying the file locally and pasting it onto the desktop of the VM. 26 | 27 | Once there, double click it to begin the installation. Click **Next** for all default options, accept the terms of agreement, and click **Install**. The installation takes a few minutes. Next, you need to register your gateway by entering an email address and clicking **Sign in**. 28 | 29 | Now that you are signed in, click **Next** and give your on-premises data gateway a name and recovery key and click **Configure**. Your gateway is now ready. 30 | 31 | ## Create a Gateway Data Source in the Power BI Website 32 | 33 | The next step is to create a gateway data source so that the gateway knows how to connect to Analysis Services and knows which credentials to use. The steps are outlined in the [Manage your data source - Analysis Services](https://docs.microsoft.com/en-us/power-bi/service-gateway-enterprise-manage-ssas) documentation. The data source should be configured with the Server property being the SSAS Read-Only load balancer IP or DNS. 
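If you are not sure which value to enter for the Server property, one way to look it up is to query the load balancer that fronts the SSAS read-only nodes with AzureRM PowerShell. This is a sketch under assumptions: the resource group placeholder and the 'ssasro' name filter are guesses at your deployment's naming — check the resource names in your own resource group and use the matching load balancer's private IP.

```PowerShell
# Sketch only: the resource group value and the '*ssasro*' name filter are
# assumptions -- adjust them to the names used by your deployment, then use
# the reported private IP as the Server value for the gateway data source.
Login-AzureRmAccount

$resourceGroup = '<your deployment resource group>'

Get-AzureRmLoadBalancer -ResourceGroupName $resourceGroup |
    Where-Object { $_.Name -like '*ssasro*' } |
    ForEach-Object {
        [pscustomobject]@{
            LoadBalancer = $_.Name
            PrivateIp    = $_.FrontendIpConfigurations[0].PrivateIpAddress
        }
    }
```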
34 | -------------------------------------------------------------------------------- /User Guides/1-Prerequisite Steps Before Deployment.md: -------------------------------------------------------------------------------- 1 | # Prerequisites before deploying Technical Reference Implementation for Enterprise BI and Reporting 2 | 3 | # VNET 4 | 5 | Most of the resources provisioned will be placed in an Azure VNET that should be configured in the subscription where you will want the TRI to be deployed. Customers who already have a functioning Azure VNET and domain controller can skip this section. If not, an Azure VNET resource and a domain controller must be deployed in the subscription, following the steps provided in this guide. 6 | 7 | ## Provisioning Azure VNet resource and creating root and client certificates 8 | 9 | First, create a new Azure VNET and VPN Gateway resource. Navigate to the [scripts](../scripts) directory and run the command below, altering the parameter values to ones that apply to your environment. Note that it might take up to 45 minutes to complete. 10 | 11 | ```PowerShell 12 | Login-AzureRmAccount 13 | 14 | .\DeployVPN.ps1 ` 15 | -SubscriptionName "My Subscription" ` 16 | -ResourceGroupName "ContosoVNetGroup" ` 17 | -Location "eastus" ` 18 | -VNetName "ContosoVNet" ` 19 | -VNetGatewayName "ContosoGateway" ` 20 | -AddressPrefix "10.254.0.0/16" ` 21 | -GatewaySubnetPrefix "10.254.1.0/24" ` 22 | -OnpremiseVPNClientSubnetPrefix "192.168.200.0/24" ` 23 | -RootCertificateName "ContosoRootCertificate" ` 24 | -ChildCertificateName "ContosoChildCertificate" 25 | ``` 26 | 27 | The above script will provision the Azure VNET and VPN Gateway resources. In addition, it will create a self-signed root certificate (identified by ```ContosoRootCertificate``` in the above example), and a client certificate for the VPN gateway (identified by ```ContosoChildCertificate```). The root certificate is used for generating and signing client certificates on the client side, and for validating those client certificates on the VPN gateway side. 28 | 29 | ## Export the certificates 30 | 31 | To enable users in your organization to connect to the newly provisioned VNET via the VPN gateway, you will need to export the two certificates to generate PFX files that they can import to their devices. 
32 | 33 | From the same PowerShell console that you used above, run the commands shown below to generate the PFX files in the same directory (```ContosoRootCertificate.pfx``` and ```ContosoChildCertificate.pfx``` in the example) 34 | 35 | ```PowerShell 36 | $rootCert = Get-ChildItem -Path cert:\CurrentUser\My | ?{ $_.Subject -eq "CN=ContosoRootCertificate" } 37 | $childCert = Get-ChildItem -Path cert:\CurrentUser\My | ?{ $_.Subject -eq "CN=ContosoChildCertificate" } 38 | 39 | If($rootCert.Count -gt 1) { 40 | Write-Output "More than one certificate by the name ContosoRootCertificate, selecting first certificate" 41 | $rootCert = $rootCert[0] #Selecting first certificate 42 | } 43 | 44 | If($childCert.Count -gt 1) { 45 | Write-Output "More than one certificate by the name ContosoChildCertificate, selecting first certificate" 46 | $childCert = $childCert[0] #Selecting first certificate 47 | } 48 | 49 | $securePassword = ConvertTo-SecureString -String "MyPassword" -Force –AsPlainText 50 | 51 | Export-PfxCertificate -Cert $rootCert -FilePath "ContosoRootCertificate.pfx" -Password $securePassword -Verbose 52 | Export-PfxCertificate -Cert $childCert -FilePath "ContosoChildCertificate.pfx" -Password $securePassword -Verbose 53 | ``` 54 | Share these two files with the users who need VPN access, instructing them to install these files in their client machines (using Certmgr or other tools). 55 | 56 | ## Provisioning the Domain Controller 57 | 58 | The next step is to deploy a Domain Controller VM and set up a new domain. All VMs provisioned during the solution's deployment will join the domain managed by the domain controller. 59 | 60 | To do that, run the PowerShell script below. 61 | 62 | ```PowerShell 63 | $securePassword = ConvertTo-SecureString -String "MyPassword" -Force –AsPlainText 64 | 65 | .\DeployDC.ps1 ` 66 | -SubscriptionName "My Subscription" ` 67 | -DnsVmName "contosodns" ` 68 | -Location "eastus" ` 69 | -ResourceGroupName "ContosoVNetGroup" ` 70 | -VNetName "ContosoVNet" ` 71 | -DomainName "contosodomain.ms" ` 72 | -DomainUserName "MyUser" ` 73 | -DomainUserPassword $securePassword 74 | ``` 75 | 76 | 77 | The script above will provision an Azure VM and promote it to serve as the domain controller for the VNET. In addition, it will reconfigure the VNET to use the newly provisioned VM as its DNS server. -------------------------------------------------------------------------------- /User Guides/9-Configure SQL Server Reporting Services.md: -------------------------------------------------------------------------------- 1 | # Configuring Reporting Services 2 | **SQL Server Reporting Services (SSRS)** is part of Microsoft SQL Server services - SSRS, SSAS and SSIS. It is a server-based report generating software system that can create, deploy and manage traditional and mobile ready paginated reports via a modern web portal. 3 | 4 | > Read more about [SQL Server Reporting Service](https://en.wikipedia.org/wiki/SQL_Server_Reporting_Services) and find [documentation here](https://docs.microsoft.com/en-us/sql/reporting-services/create-deploy-and-manage-mobile-and-paginated-reports). 5 | 6 | # Table Of Contents 7 | 1. [Connect to SSRS Web Portal](#connect-to-ssrs-web-portal) 8 | 2. 
[Subscribing to Reports via Email](#subscribing-to-reports-via-email) 9 | 10 | 11 | ## Connect to SSRS Web Portal 12 | Deploying the solution, provisions two SSRS virtual machines front-ended by an [Azure Load balancer](https://azure.microsoft.com/en-us/services/load-balancer/) for high availability and performance. Follow the next steps to connect to the SSRS admin web-portal. 13 | 1. Obtain the **SSRS** load balancer url from the deployment summary page. 14 | > *e.g.* `http://ssrslb.ciqsedw.ms/reports`. 15 | 16 | ![ssrs-url](../img/reportingserver_assets/ssrs-url.png) 17 | 18 | 2. Browse to the SSRS load balancer url. 19 | 3. Enter the admin credentials on the prompt. 20 | - Username name **must** be a user that can authenticate against the SQL Server in the format **domain\username**. For instance `ciqsedw\edwadmin`. 21 | - Password is the SSRS admin password. 22 | 23 | ![authentication](../img/reportingserver_assets/authentication.png) 24 | 25 | 4. If everything works correctly, you should now have successfully authenticated and can access the reports and data sources. 26 | ![Home](../img/reportingserver_assets/ssrs-home.png) 27 | 28 | ## Subscribing to Reports via Email 29 | > This step is not automated by the solution, however, you can manually configure a custom SMTP server, such as SendGrid. 30 | 31 | ### 1. Create SendGrid SMTP account on Azure 32 | 1. Go to the [Azure portal](https://portal.azure.com). 33 | 2. Search the market place for **SendGrid Email Delivery**. 34 | 3. Create a new SendGrid account. 35 | 36 | ![SendGrid Account](../img/reportingserver_assets/sendgrid-smtp.png) 37 | 38 | 4. Find the SendGrid account created under your subscription. 39 | 5. Go to **All settings** -> **Configurations**. Get the following: 40 | - Username 41 | - Password 42 | - SMTP Server address 43 | 44 | ![Configuration Parameters](../img/reportingserver_assets/sendgrid-config.png) 45 | 46 | > **NOTE:** Password is the same one created when the SendGrid account was created. 47 | 48 | ### 2. Enter SendGrid credentials into Reporting Server 49 | 1. Connect to both of your **SSRS Servers** using Remote Desktop. 50 | 2. Open the **Reporting Services Configuration Manager**. 51 | 3. Connect to the server instance. 52 | 53 | ![SSRS Instance](../img/reportingserver_assets/ssrs-instance.png) 54 | 55 | 4. On the left tabs, click on `Email Settings` and fill out the following 56 | - Sender email address 57 | - SMTP Server (smtp.sendgrid.net) 58 | - Username 59 | - Password/Confirm Password 60 | 61 | ![SSRS Information](../img/reportingserver_assets/ssrs-email.png) 62 | 63 | 5. Click Apply. 64 | 65 | > **Note:** There is no need to restart the Reporting Server service. It takes the most recent configuration. 66 | 67 | ### 3. Subscribe to a report to receive email delivery 68 | 1. Navigate to the Reporting Server web portal 69 | 1. Right click on any paginated report you want to subscribe to. 70 | 2. Click on **Subscribe** 71 | 72 | ![Subscribe](../img/reportingserver_assets/subscribe-1.png) 73 | 74 | 3. When the page loads, make sure the following options are set correctly. 75 | - The **Owner** field points to a user that can query the SQL Server. 76 | - Select **Destination (Deliver the report to:)** as Email. 77 | - Create a schedule for report delivery. 78 | - Fill out the **Delivery options (E-mail)** fields. 79 | - Click on **Create subscription**. 80 | 81 | ![Create Subscriptions](../img/reportingserver_assets/subscribe-2.png) 82 | 83 | 4. 
If the subscription was successful, the page reloads to the home page. 84 | 85 | ![Home Page](../img/reportingserver_assets/ssrs-home.png) 86 | 87 | 5. Find your existing subscriptions, per report, by clicking on the gear icon at the top right hand side of the page and clicking **My Subscriptions** . 88 | -------------------------------------------------------------------------------- /User Guides/UsersGuide-TOC.md: -------------------------------------------------------------------------------- 1 | # User Guide for Enterprise BI and Reporting 2 | 3 | The guide is organized along the sequence of steps that you need to follow to deploy and operationalize the TRI. You can return to specific sections once you have successfully deployed the TRI end-to-end (i.e. after you have completed all the steps until Step 15 below). If the deployment is blocked at any stage, consult the [Get Help and Support](./17-Get%20Help%20and%20Support.md) section for mitigation and workarounds. 4 | 5 | - Step 1 - It is assumed that you have an Azure subscription. If not, [obtain a subscription](https://azure.microsoft.com/en-us/free/?v=17.39a). Then implement these [prerequisite steps before deployment](./1-Prerequisite%20Steps%20Before%20Deployment.md) 6 | 7 | - Step 2 - Next, [set up the deployment](./2-Set%20up%20Deployment.md), starting from either the GitHub repository, or the Cortana Gallery. 8 | 9 | - Step 3 - Azure is a dynamic environment, and the TRI has several products in its architecture. Please monitor the deployment progress. If the deployment fails or stalls at a given step, you can [troubleshoot the deployment](./3-Troubleshoot%20the%20Deployment.md). 10 | 11 | - Step 4 - Once the deployment completes with success, you can [manage the deployed infrastructure](./4-Manage%20the%20Deployed%20Infrastructure.md) from an Admin console that is provided as part of the TRI. 12 | 13 | - Step 5 - You may also want to [monitor the deployed components](./5-Monitor%20the%20Deployed%20Components.md) individually - such as the SQL DW, SSAS, and SSRS servers, the VMs, load balancers, and other components, as part of monitoring the end-to-end system. 14 | 15 | - Step 6 - Once the infrastructure has been deployed in the subscription, the next step is to [prepare the infrastructure to ingest your data](./6-Prepare%20the%20infrastructure%20for%20your%20Data.md). This includes optionally removing any demo data and models that are currently in SQL DW and the SSAS servers. 16 | 17 | - Step 7 - The default data generator that is shipped with the TRI is programmed to load demo data files. So you need to modify this script to [configure the data ingestion](./7-Configure%20Data%20Ingestion.md) into the SQL Data Warehouse. 18 | 19 | - Step 8 - Once SQL DW is configured for data ingestion, the next step is to [configure the SQL Server Analysis Services](./8-Configure%20SQL%20Server%20Analysis%20Services.md) for generation of analytical tabular models for interactive BI, and the models enabling direct query access for SSRS report generation. 20 | 21 | - Step 9 - Next, [configure SQL Server Reporting Services](./9-Configure%20SQL%20Server%20Reporting%20Services.md) to generate and serve reports. 22 | 23 | - Step 10 - To enable interactive BI against the SSAS cached models, [configure Power BI](./10-Configure%20Power%20BI.md) gateway to connect to the SSAS read-only servers through the frontend load balancers, and the Power BI clients for dashboard access. 
24 | 25 | - Step 11 - Now that all the data engines are set up, your next step is to do a [one-time load of historical data into the SQL DW](./11-Load%20historical%20data%20into%20the%20warehouse.md). Skip this step if you have no historical data, and are starting your BI project from scratch. 26 | 27 | - Step 12 - Follow this with a [one time load of all the historical tabular models](./12-Load%20historical%20tabular%20models.md). 28 | 29 | - Step 13 - [Create and/or import dashboards and reports](./13-Create%20dashboards%20and%20reports.md) to confirm that your users are able to view the tabular model data through their PowerBI clients, and are able to receive reports. 30 | 31 | - Step 14 - Given the reassurance that the end-to-end pipeline is working for data at rest, you can now confidently [set up incremental load](./14-Set%20up%20incremental%20loads.md) of data from your on-premise or cloud based data sources. 32 | 33 | - Step 15 - Incremental data ingestion process is enabled by dynamic ADF pipelines that you may want to [monitor and troubleshoot](./15-Monitor%20and%20Troubleshoot%20Data%20Pipelines.md) as necessary. 34 | 35 | - Step 16 - If you face any issues with the deployment, consult the [frequently asked questions](16-Frequently%20Asked%20Questions.md). 36 | 37 | - Step 17 - [Get help and support](./17-Get%20Help%20and%20Support.md) for any of the above steps from the documentation and additional resources provided with the TRI. 38 | 39 | - Step 18 - Finally, for any number of reasons, if you'd like to remove the deployed implementation from your subscription, you can follow these steps for [deleting the deployment](./18-Deleting%20a%20deployment.md). 40 | -------------------------------------------------------------------------------- /scripts/DeployVPN.ps1: -------------------------------------------------------------------------------- 1 | #Requires -RunAsAdministrator 2 | #Requires -Modules AzureRM.Network 3 | #Requires -Modules AzureRM.profile 4 | 5 | # This script deploys a Azure Virtual Network, subnet and VPN gateway 6 | # Use this script when connectivity from onpremises to Azure VNET is using point-to-site VPN connection 7 | Param( 8 | [Parameter(Mandatory=$true)] 9 | [string]$SubscriptionName, 10 | 11 | # Under this resource group common resources like VPN gateway, Virtual network will be deployed. 12 | [Parameter(Mandatory=$true)] 13 | [string]$ResourceGroupName, 14 | 15 | # The name of Azure VNet resource. 16 | [Parameter(Mandatory=$true)] 17 | [string]$VNetName, 18 | 19 | [Parameter(Mandatory=$true)] 20 | [string]$Location, 21 | 22 | # The name of the Azure VNet Gateway resource. 
23 | [Parameter(Mandatory=$true)] 24 | [string]$VNetGatewayName, 25 | 26 | [Parameter(Mandatory=$false)] 27 | [string]$AddressPrefix = "10.254.0.0/16", 28 | 29 | [Parameter(Mandatory=$false)] 30 | [string]$GatewaySubnetPrefix = "10.254.1.0/24", 31 | 32 | [Parameter(Mandatory=$false)] 33 | [string]$OnpremiseVPNClientSubnetPrefix = "192.168.200.0/24", 34 | 35 | [Parameter(Mandatory=$false)] 36 | [string]$RootCertificateName = "VPN-RootCert-$($VNetName)", 37 | 38 | [Parameter(Mandatory=$false)] 39 | [string]$ChildCertificateName = "VPN-ChildCert-$($VNetName)" 40 | ) 41 | 42 | # Import the common functions 43 | $scriptPath = $MyInvocation.MyCommand.Path 44 | $scriptDir = Split-Path $scriptPath 45 | Import-Module (Join-Path $scriptDir Common.psm1) -Force 46 | 47 | # Select subscription 48 | Select-AzureRmSubscription -SubscriptionName $SubscriptionName 49 | $subscription = Get-AzureRmSubscription -SubscriptionName $SubscriptionName 50 | $SubscriptionId = $subscription.Subscription.SubscriptionId 51 | 52 | # Create the resource group if needed 53 | New-ResourceGroupIfNotExists $ResourceGroupName -Location $Location 54 | 55 | # Deploy VNET, Gateway Subnet and VPN gateway 56 | $templateParamsVpnGateway = @{ 57 | gatewayName=$VNetGatewayName 58 | virtualNetworkName=$VNetName 59 | edwAzureVNetAddressPrefix=$AddressPrefix 60 | vpnGatewaySubnetPrefix=$GatewaySubnetPrefix 61 | } 62 | 63 | Write-Host -ForegroundColor Yellow "VPN Gateway deployment could take upto 45 minutes" 64 | 65 | $templateFilePath = Join-Path (Join-Path (Split-Path -Parent $scriptDir) 'armTemplates') 'vpn-gateway.json' 66 | $vpnGwDeployment = New-AzureRmResourceGroupDeployment -Name VpnGateway ` 67 | -ResourceGroupName $ResourceGroupName ` 68 | -TemplateFile $templateFilePath ` 69 | -TemplateParameterObject $templateParamsVpnGateway ` 70 | -Verbose 71 | 72 | Write-Host "Generating certificates for VPN gateway" 73 | 74 | # Generate a self signed root certificate 75 | $vpnRootCert = New-SelfSignedCertificate -Type Custom -KeySpec Signature ` 76 | -Subject "CN=$($RootCertificateName)" -KeyExportPolicy Exportable ` 77 | -HashAlgorithm sha256 -KeyLength 2048 ` 78 | -CertStoreLocation "Cert:\CurrentUser\My" -KeyUsageProperty Sign -KeyUsage CertSign 79 | 80 | # Add the self signed root certificate to the trusted root certificates store 81 | $store = New-Object System.Security.Cryptography.X509Certificates.X509Store( 82 | [System.Security.Cryptography.X509Certificates.StoreName]::Root, 83 | "currentuser" 84 | ) 85 | $store.open("MaxAllowed") 86 | $store.add($vpnRootCert) 87 | $store.close() 88 | 89 | # Generate a client certificate 90 | New-SelfSignedCertificate -Type Custom -KeySpec Signature ` 91 | -Subject "CN=$($ChildCertificateName)" -KeyExportPolicy Exportable ` 92 | -HashAlgorithm sha256 -KeyLength 2048 ` 93 | -CertStoreLocation "Cert:\CurrentUser\My" ` 94 | -Signer $vpnRootCert -TextExtension @("2.5.29.37={text}1.3.6.1.5.5.7.3.2") 95 | 96 | # Get root certificate public key data to be used by VPN gateway 97 | $certBase64 = [system.convert]::ToBase64String($vpnRootCert.RawData) 98 | $rootCert = New-AzureRmVpnClientRootCertificate -Name $RootCertificateName -PublicCertData $certBase64 99 | 100 | $gateway = Get-AzureRmVirtualNetworkGateway -Name $VNetGatewayName -ResourceGroupName $ResourceGroupName 101 | 102 | Write-Host "Updating VPN gateway with certificates" 103 | 104 | Set-AzureRmVirtualNetworkGateway -VirtualNetworkGateway $gateway ` 105 | -VpnClientAddressPool $OnpremiseVPNClientSubnetPrefix ` 106 | -VpnClientRootCertificates 
$rootCert 107 | -------------------------------------------------------------------------------- /Technical Guides/4-Understanding logical datawarehouses.md: -------------------------------------------------------------------------------- 1 | # Configuring Logical Data Warehouses 2 | The TRI implements data load orchestration into multiple parallel data warehouses for redundancy and high availability. 3 | 4 | ![Architecture](../img/ConfiguringSQLDWforTRI.png) 5 | 6 | ## Data Availability and Orchestration features 7 | 8 | The logical Data Warehouse architecture and orchestration address these requirements: 9 | 10 | 1. Each logical data warehouse (LDW) consists of a single physical data warehouse by default. More replicas per LDW can be configured for scalability and high availability. 11 | 2. SQL DW data refresh cycle can be configured by the user - one option is to use 8 hours.This implies loading the physical data warehouses 3 times a day. 12 | 3. Adding new schemas and data files to the SQL DW is a simple, scriptable process. The TRI assumes 100-500 data files being sent in every day, but this can vary day to day. 13 | 4. The job manager is both “table” aware and "data/time" aware to plan execution of a report until data has been applied representing a given period of time for that table. 14 | 5. Surrogate keys are not utilized and no surrogate key computation is applied during data upload. 15 | 6. All data files are expected to be applied using “INSERT” operations. There is no support to upload “DELETE” datasets. Datasets must be deleted by hand; no special accommodation is made in the architecture for DELETE or UPDATE. 16 | 7. All fact tables in the data warehouse (and the DIMENSION_HISTORY) tables are expected to follow the Kimball [Additive Accumulating Snapshot Fact Table](http://www.kimballgroup.com/2008/11/fact-tables/) approach. A “reversal flag” approach is recommended, to indicate if a fact is to be removed, with offsetting numeric values. For example, a cancelled order is stored with value of $100 on day 1 and reversal flag set to false; and stored with a value of -$100 on day 2 with a reversal flag set to true. 17 | 8. All fact tables will have DW_ARCHIVAL_DATE column set so that out-of-time analysis and aggregation can be performed. The values for the DW_ARCHIVAL_DATE will be set by the Data Generator that computes the change set for the LDW each local-timezone day. 18 | 9. The job manager does not prioritize data loads, and provides only a minimal dependency tracking for golden dimensions and aggregates. “Golden Dimensions” are tables that must be loaded before other tables (dimension, fact or aggregate) into the physical EDWs. 19 | 10. Dimension tables must be re-calculated and refreshed after every load of a dimension table with >0 records. A stored procedure to re-create the current dimension table after a load of dimension table history records is sufficient. 20 | 11. The Admin GUI provides DW load status. 21 | 12. Data availability can be controlled using manual overrides. 22 | 23 | ## Relationship with Tabular Models 24 | 25 | The TRI also meets the following requirements for the tabular model generation in relation to the SQL DW: 26 | 27 | 1. An optional stored procedure runs on tables to produce aggregate results after a load. The aggregate tables will also be tracked in the job manager. A set of tabular model caches will be refreshed with the results of the incremental dataset changes. 28 | 2. 
Tabular model refreshes do not need to be applied synchronously with the logical data warehouse flip; however, there will be minimal (data volume dependent) delay between the tabular model refresh and the application of updates as viewed by a customer. 29 | 3. Dependencies from the tabular model caches will be known to the Job Manager. Only the tabular model caches that are impacted by a dataset change will get re-evaluated and their read-only instances updated. 30 | 4. The system is designed to refresh 10-100 tabular model caches 3 times daily, with each tabular model having size approximately 10Gb of data. 31 | 32 | ## Logical Data Warehouse Status and Availability 33 | A set of control tables associate physical DWs to tables, schemas, and to time ranges and record dataset auditing information (start date, end date, row count, filesize, checksum) in a separate audit file. 34 | 35 | The LDW load and read data sets iterate through three states: 36 | - `Load`: The LDW set is processing uploaded data files to "catch-up" to the latest and greatest data. 37 | - `Standby`:The LDW is not processing updates nor serving customers; it is a hot-standby with “best available” data staleness for disaster recovery purposes. 38 | - `Active`: The LDW is up-to-date and serving requests but not receiving any additional data loads. 39 | 40 | It is recommended that the data files that are loaded into physical DW instances have the following naming structure: 41 | 42 | - Data File: `startdatetime-enddatetime-schema.tablename.data.csv` 43 | - Audit file: `startdatetime-enddatetime-schema.tablename.data.audit.json` 44 | 45 | This will provide sufficient information to determine the intent of the file should it appear outside of the expected system paths. The purpose of the audit file is to contain the rowcount, start/end date, filesize and checksum. Audit files must appear next to their data files in the same working directory always. Orphaned data or audit files should not be loaded. 46 | 47 | > To gain a deeper understanding of the flip process, please read [Anatomy of a Logical Data Warehouse Flip](./5-Understanding%20data%20warehouse%20flip.md). 48 | -------------------------------------------------------------------------------- /User Guides/15-Monitor and Troubleshoot Data Pipelines.md: -------------------------------------------------------------------------------- 1 | # Monitor and Troubleshoot Data Pipelines 2 | 3 | ## Summary 4 | 5 | All data ingestion jobs are performed as one-time Azure data factory (ADF) pipelines. In the event of a pipeline failure, this document should guide you through troubleshooting any possible issues. 6 | 7 | ## Monitor the pipeline status 8 | 9 | ### Administration UI 10 | 11 | On the Administration UI dashboard you can see an *Azure Data Factory* element. Inside this element is a numeric count for the *completed*, *in progress*, *waiting*, and *failed* data factory pipeline states. These counts should be used to monitor the states of the current pipeline jobs. 12 | 13 | All jobs should be completed through the normal course of a flip interval. During this interval, it's common to see pipelines in the *in progress* or *waiting* state. Each pipeline is processed on a first come first serve basis and can be delayed for a number of reason; including but not limited to table dependencies, data warehouse dependencies, maximum pipelines allowed at once. 14 | 15 | Pipelines may also fail. 
If you encounter a failure, it's best to correct it immediately, as it will hold up any future flip operations. Each failed pipeline will be automatically recreated and retried every 5 minutes until the maximum retry count is reached. 16 | 17 | Pipeline states: 18 | - *Completed* - Pipeline ran successfully. 19 | - *In Progress* - Pipeline is enqueued or being processed by ADF. 20 | - *Waiting* - Pipeline job is waiting on an internal dependency to be completed before being enqueued. 21 | - *Failed* - The pipeline failed for an unknown reason. 22 | 23 | ![Dashboard](../img/adminui_assets/adminui-dashboard.png) 24 | 25 | ### Azure Portal 26 | 27 | The Administration UI is the best way to get the status of the overall system and its components. When you need more in-depth troubleshooting of the pipelines, it's best to work directly in the Azure portal. 28 | 29 | To perform any troubleshooting, you first need to find your data factory name. The name can be found on the Administration UI dashboard in the Azure data factory element. The name should be of the format `Dev-LoadEDW-`. Otherwise, you can find the name by directly inspecting the resources under the resource group of this deployment. There should only be one data factory for this deployment. 30 | 31 | ### Monitor & Manage Dashboard 32 | 33 | The best way to view pipeline execution logs is through the Azure portal's *Monitor & Manage* dashboard of the data factory. 34 | 35 | 1. Log in to the [Azure portal](https://portal.azure.com). 36 | 2. Search for the data factory by name. 37 | 3. In the data factory's main blade, click the *Monitor & Manage* panel. 38 | 4. Set the *Start time (UTC)* and *End time (UTC)* in the middle center of the *MONITOR* tab. This date range will filter all visible activity to this range. 39 | 5. In the bottom center of the data factory *MONITOR* tab, you will see an *ACTIVITY WINDOWS* list. In this window you can filter by *Type*. Most likely you will want to find activities with a *Type* of *Failed*. 40 | 6. Click the activity row under investigation. This will open an *Activity window explorer* on the right-hand side. 41 | 42 | The *Activity window explorer* should give you the diagnostic information you need to determine the problem with the activity. The most useful information will be the **Failed execution** error logs. You are most likely to encounter one of the following root causes: 43 | 44 | 1. A linked service is broken. 45 | 2. The data being ingested is incorrect. 46 | 3. The configuration for ingestion was incorrect. 47 | 48 | ### Additional information 49 | 50 | Additional information on monitoring Azure data factories can be found on [Microsoft Docs](https://docs.microsoft.com/en-us/azure/data-factory/monitor-visually). 51 | 52 | ## Resolutions to Common Problems 53 | 54 | ### Fix a broken linked service 55 | 56 | 1. Log in to the [Azure portal](https://portal.azure.com). 57 | 2. Search for the data factory by name. 58 | 3. In the data factory's main blade, click the *Author and deploy* panel. 59 | 4. Click the linked service that is failing. 60 | 5. Correct the JSON of the linked service. 61 | 6. Click *Deploy*. 62 | 7. Wait for the pipeline to be recreated and rerun. 63 | 64 | ### Fix bad data 65 | 66 | It's inevitable that some of the data you ingest will be specified incorrectly. If you encounter a failure due to bad data, simply correct the data in the blob location and wait for the pipeline to be recreated and retried within 5 minutes.
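For example, once you have corrected the offending file locally, you can overwrite the original blob in place with the Azure Storage PowerShell cmdlets. The sketch below is only an illustration: the storage account, key, container, and file names are placeholder assumptions, so substitute the values that your own deployment uses.

```PowerShell
# Placeholder values - replace with the storage account, key, container, and
# file names used by your deployment.
$storageAccount = "contosoedwstaging"
$storageKey     = "<storage-account-key>"
$container      = "data"
$dataFile       = "20170101T000000-20170102T000000-dbo.FactInternetSales.data.csv"
$auditFile      = "20170101T000000-20170102T000000-dbo.FactInternetSales.data.audit.json"

# Storage context for the account that holds the staged data files.
$ctx = New-AzureStorageContext -StorageAccountName $storageAccount -StorageAccountKey $storageKey

# Overwrite the bad data file (and its companion audit file) with the corrected copies.
Set-AzureStorageBlobContent -File ".\$dataFile"  -Container $container -Blob $dataFile  -Context $ctx -Force
Set-AzureStorageBlobContent -File ".\$auditFile" -Container $container -Blob $auditFile -Context $ctx -Force
```

Once the corrected files are in place, the failed pipeline should be recreated and retried automatically on the next 5-minute retry cycle.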
67 | 68 | ### Remove a failed job 69 | 70 | If a pipeline is failing due to erroneously specified data, you may want to simply remove the job entirely. This can be done by deleting the `DWTableAvailabilityRanges` entries that are causing the problem, either by connecting directly to the `ctrldb` database or by going through the `DWTableAvailabilityRanges` OData API. 71 | 72 | > **WARNING**: There is a `DWTableAvailabilityRanges` entry for each of the Data Warehouses. Only remove the entries related to your specific FileUri if none of them have been processed. If you remove only some of the entries, the Data Warehouses will get out of sync. 73 | 74 | ### Retry a pipeline that has hit its maximum retry limit 75 | 76 | If too much time has passed before the error was corrected, the maximum retry count may have been hit. If so, you can force a retry by calling the OData function `RetryDwTableAvailabilityRangeJobs`. 77 | -------------------------------------------------------------------------------- /User Guides/16-Frequently Asked Questions.md: -------------------------------------------------------------------------------- 1 | # Frequently Asked Questions 2 | 3 | ### Why do the TRIs have multiple SQL Data Warehouses? 4 | 5 | SQL DW does not currently support read isolation, i.e. loading and reading data at the same time. In addition, SQL Data Warehouse (DW) supports a limited number of concurrent queries (currently 32, with more upcoming), limiting the number of jobs which can run in parallel. Large enterprises usually require a large number of concurrent queries. To support this requirement, our design has two logical DWs: one for loading data and the other for reading. Each logical DW can have one or more physical DWs. After a loading cycle is finished, the logical DWs are switched from loader to reader and vice versa. For more information on the flip operation see the [technical guide](../Technical%20Guides/5-Understanding%20data%20warehouse%20flip.md). 6 | 7 | ### Why doesn’t the TRI use an ADF data source instead of rotating an ephemeral blob? 8 | 9 | Enterprise customers have concerns about the security of their data in the public cloud. At the time of inquiry, Blobs did not offer user keys for encryption. Keeping the blob ephemeral addresses this concern, and requires us to create one-time data load pipelines which can be deleted once the load is completed. For more information on the ephemeral blob see the [technical guide](../Technical%20Guides/1-Understanding%20ephemeral%20blobs.md). 10 | 11 | ### Why is a Job Manager required? Can’t we just use ADF? 12 | 13 | To load data from ephemeral blob storage into logical DWs, we need to create dynamic Data Factory pipelines with changing source and destination sinks. In addition, ADF currently does not support the building of SSAS partitions. This required us to build custom code, along with maintaining the state of data loads into SSAS. We also needed to keep track of which SQL DW is the loader and which is the reader, as well as manage the switching of the SQL DW instances. These key requirements led us to build our own scheduler and metadata system (on SQL Server) to orchestrate the entire workflow. 14 | 15 | ### Why is blob used to swap the partitions? 16 | 17 | To support a very large number of concurrent users, SSAS servers can be configured to host the same data with a load balancer in front.
As the SSAS cache needs to be refreshed on all the servers without disrupting the users, the cache model is built by the partition builder machine and copied to Blob storage. Each server then removes itself from the load balancer and updates the model from Blob before resuming user requests. 18 | 19 | ### Why is there a separation between SSAS Read Only and SSAS for SSRS? What is the Queue limit, row security, query limit? 20 | 21 | Enterprises typically have both interactive analytics and scheduled reporting needs. To address the interactive needs, we build SSAS cache models (SSAS Read Only) that can be accessed using Power BI. Power BI allows users to interactively query the SSAS cache, providing instant results. To address the scheduled reporting needs, we provide SSRS reports using the SSAS Direct Query model, ensuring that the same row-level security is applied to both the cache and Direct Query models. The limitations of the SSRS architecture are that report building has to be done by a limited set of users and that multiple reports cannot be scheduled at the same time, due to the 1024 query queue limit in SQL DW. 22 | 23 | ### Why use SSAS instead of Redis cache? 24 | 25 | Redis is a completely different use case. It is not interchangeable with SSAS, whose VertiPaq engine allows relational data to be accessed with very low latency. Even if we used Redis, we would be limited to cache hits for only 32 queries at a time. Any new query, even with minor syntactic changes, will require a new DW read. 26 | 27 | ### Why does the TRI use Azure SQL DW? Why not Teradata? 28 | 29 | Azure SQL DW is one of the technologies recommended by all product teams as part of the TRI. It's a T-SQL-compliant PaaS service that allows for scaling up and down, including pausing the data warehouse when not in use. SQL DW provides automated backups along with geo backups and restores. 30 | 31 | ### Can I configure the TRI to have only one logical SQL DW to play both Loader and Reader roles? 32 | 33 | No, the system requires a minimum of two SQL DWs. 34 | 35 | ### Can I configure the TRI to replace SQL DW with SQL DB? 36 | 37 | The TRI does not currently support the swapping of SQL DW with SQL DB. 38 | 39 | ### Can I configure the TRI to remove the SSAS cache components and have Power BI directly query SQL DW? 40 | 41 | Although this is not a supported and tested scenario, removal of any node in the architecture should not affect the upstream nodes. As such, you can remove any nodes without seriously affecting the system. This will need to be performed manually in the [Azure portal](https://portal.azure.com) post deployment. 42 | 43 | Keep in mind that the administration UI and other components are not configured to dynamically account for the removal of these nodes, so you will still see the presence of these components even though they may not be functional. 44 | 45 | ### Can I configure the TRI to remove the SSRS scheduled reporting components? 46 | 47 | Although this is not a supported and tested scenario, removal of any node in the architecture should not affect the upstream nodes. As such, you can remove any nodes without seriously affecting the system. This will need to be performed manually in the [Azure portal](https://portal.azure.com) post deployment. 48 | 49 | Keep in mind that the administration UI and other components are not configured to dynamically account for the removal of these nodes, so you will still see the presence of these components even though they may not be functional.
50 | 51 | ### Are the tabular models backed up? 52 | 53 | All tabular models are backed up to a storage blob. These can be found under your resource group with the name XXXpntrfsh. 54 | -------------------------------------------------------------------------------- /User Guides/2-Set up Deployment.md: -------------------------------------------------------------------------------- 1 | # Setting up the Deployment 2 | 3 | Before you start the deployment, confirm that you have an Azure subscription with sufficient quota, and have implemented the [prerequisites](./1-Prerequisite%20Steps%20Before%20Deployment.md). 4 | 5 | Start the deployment by clicking on the DEPLOY button on the main page - which is README.md in the GitHub repository, or the solutions page in the Cortana Gallery. Once you click on DEPLOY, the deployment platform takes you through a series of questions to set up the TRI. This section explains each parameter in detail. 6 | 7 | ## Deployment name and subscription 8 | ### 1. Deployment Name 9 | Provide a deployment name. This name will be assigned to the resource group that contains all the resources required for the TRI. 10 | 11 | ### 2. Subscription 12 | Provide the name of the Azure subscription where you have set up the prerequisites, such as the VNET and domain controller, and where you want to deploy the reference implementation. 13 | 14 | ### 3. Location 15 | Choose the Data Center location where you want to deploy the reference implementation. This is a key benefit of the TRI - it saves you the effort of putting together all the products for an enterprise BI and Reporting solution that will be available together in a given Data Center. 16 | 17 | ### 4. Description 18 | Provide a description for this deployment. 19 | 20 | Once you click on create, the resource group gets created in your subscription. This is so that Azure has a resource in the subscription to record all the subsequent parameters that you are about to provide. After this CREATE step, if you want to abandon your deployment, follow the instructions to cleanly [delete your deployment](./18-Deleting%20a%20deployment.md). 21 | 22 | ## VNET and Certificates for component connectivity 23 | 24 | ### 5. VNET 25 | The next screen accepts the VNET parameters. If you created the VNET and provisioned the domain controller using the steps outlined in [the prerequisites](./1-Prerequisite%20Steps%20Before%20Deployment.md), then input the values you provided for the parameters. 26 | ```PowerShell 27 | -VNetName "ContosoVNet" -DomainName "contosodomain.ms" -DnsVmName "contosodns" 28 | ``` 29 | 30 | ### 6. Certificates to manage security between components 31 | In addition to the certificates generated/used for VNET connectivity, you will need to provide three more certificates to manage secure access between the different components in the TRI. 32 | 1. A .PFX file with the private key used by Azure VMs to authenticate with Azure Active Directory, with its corresponding password. 33 | ```PowerShell 34 | $certName = "Contoso Client" 35 | $certPassword = ConvertTo-SecureString "" -AsPlainText -Force 36 | $cert = New-SelfSignedCertificate -DnsName $certName ` 37 | -CertStoreLocation cert:\LocalMachine\My ` 38 | -KeyExportPolicy Exportable ` 39 | -Provider "Microsoft Enhanced RSA and AES Cryptographic Provider" 40 | Export-PfxCertificate -Cert $cert -FilePath contosoglobalcert.pfx -Password $certPassword -Force | Write-Verbose 41 | ``` 42 | 2.
A .CER file with the public key of the certificate authority to allow SSL encryption from a non-public certificate. 43 | ```PowerShell 44 | $rootCertAuthorityName = "Contoso Certificate Authority" 45 | $rootCert = New-SelfSignedCertificate -Type Custom -KeySpec Signature ` 46 | -Subject "CN=$rootCertAuthorityName" -KeyExportPolicy Exportable ` 47 | -HashAlgorithm sha256 -KeyLength 2048 ` 48 | -CertStoreLocation "Cert:\LocalMachine\My" -KeyUsageProperty Sign -KeyUsage CertSign 49 | Export-Certificate -Cert $rootCert -FilePath contosoauthority.cer 50 | ``` 51 | 3. Another .PFX file with the private key used to encrypt all web server traffic over HTTPS, with its corresponding password. Make sure that you replace "contosodomain.ms" with the domain name specified during [Domain Controller](https://github.com/Azure/azure-arch-enterprise-bi-and-reporting/blob/master/User%20Guides/1-Prerequisite%20Steps%20Before%20Deployment.md#provisioning-the-domain-controller) setup. 52 | ```PowerShell 53 | $dnsName = "*.adminui.contosodomain.ms" # make sure to replace "contosodomain.ms" with your Domain Name if it is different 54 | $certPassword = ConvertTo-SecureString "" -AsPlainText -Force 55 | $sslCert = New-SelfSignedCertificate -DnsName $dnsName ` 56 | -CertStoreLocation cert:\LocalMachine\My -KeyExportPolicy Exportable ` 57 | -Provider "Microsoft Enhanced RSA and AES Cryptographic Provider" ` 58 | -Signer $rootCert ` 59 | -HashAlgorithm SHA256 60 | Export-PfxCertificate -Cert $sslCert -FilePath contosossl.pfx -Password $certPassword -Force | Write-Verbose 61 | ``` 62 | 63 | These certificate files need to be accessible to the deployment from your Azure subscription, and they must be kept secure. We recommend that you store the files in Azure Storage with [Shared Access Signature (SAS)](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-dotnet-shared-access-signature-part-2) support. This will enable you to provide the certificates as Blob files, and set the password. An example sketch of uploading the files and generating SAS URLs is included at the end of this page. 64 | 65 | :warning: **Please note** 66 | 67 | It is essential to generate the certificates above with the `-Provider "Microsoft Enhanced RSA and AES Cryptographic Provider"` parameter and `-CertStoreLocation cert:\LocalMachine\My`. Otherwise, Azure Functions will not be able to download and decrypt the certificates and the deployment will fail during the "Download Certificate" step. 68 | 69 | Examples: 70 | - Private key used by VMs to authenticate with Azure Active Directory: http://_contosoblob_.blob.core.windows.net/_certificates_/_contosoglobalcert.pfx 71 | - Public key of a certificate authority to allow SSL encryption from a non-public certificate: http://_contosoblob_.blob.core.windows.net/_certificates_/_contosoauthority.cer 72 | - Private key used to encrypt all web server traffic over HTTPS: 73 | http://_contosoblob_.blob.core.windows.net/_certificates_/_contosossl.pfx 74 | 75 | ## Configure the topology 76 | The parameters in this section are self-explanatory in the deployment configuration page. 77 | 78 | ## Configure the default account names and password 79 | The parameters in this section are self-explanatory in the deployment configuration page.
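As an illustration of the storage recommendation above, the following sketch uploads the three certificate files to a blob container and emits read-only SAS URLs that you can paste into the deployment configuration. It is only a sketch: the storage account name, key, container, and seven-day expiry are placeholder assumptions, so substitute the values appropriate for your environment.

```PowerShell
# Placeholder values - replace with your own storage account, key, and container.
$storageAccount = "contosoblob"
$storageKey     = "<storage-account-key>"
$container      = "certificates"

# Storage context and a private container to hold the certificate files.
$ctx = New-AzureStorageContext -StorageAccountName $storageAccount -StorageAccountKey $storageKey
New-AzureStorageContainer -Name $container -Context $ctx -Permission Off

# Upload each certificate file and emit a read-only SAS URL valid for 7 days.
foreach ($file in "contosoglobalcert.pfx", "contosoauthority.cer", "contosossl.pfx") {
    Set-AzureStorageBlobContent -File ".\$file" -Container $container -Blob $file -Context $ctx -Force | Out-Null
    New-AzureStorageBlobSASToken -Container $container -Blob $file -Permission r `
        -ExpiryTime (Get-Date).AddDays(7) -FullUri -Context $ctx
}
```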
80 | -------------------------------------------------------------------------------- /User Guides/5-Monitor the Deployed Components.md: -------------------------------------------------------------------------------- 1 | # Overview 2 | 3 | The Admin UI provides a graphical interface for performing several functions pertaining to your Enterprise Reporting and BI Technical Reference Implementation, including: 4 | 5 | - Monitoring the health of the overall system and each deployed resource 6 | - Understanding the flow of data through the various component layers 7 | - Seeing the state, active or load, of each SQL Data Warehouse over time 8 | - Seeing the number of load jobs and average latency per SQL Data Warehouse over time 9 | - Drilling down to see the individual status of table load jobs 10 | - Viewing the various system configuration settings 11 | - Detailed monitoring of several components using pre-built Operations Management Suite (OMS) dashboards 12 | 13 | # How to access the Admin UI 14 | 15 | The Admin UI is automatically installed as part of your solution deployment. The web page announcing the successful completion of your deployment will have a link to the Admin GUI (Example: https://rbaedw56y9058.adminui.ciqsedw.ms). The link's naming convention is as follows: `https://.adminui.`, where `your-deployment` is the name you provided at the beginning of the deployment. This name is also the name of the resource group holding all of the resources for the deployment in your subscription. 16 | 17 | >Before you click on the Admin link, please check that you are connected to your Virtual Private Network (VPN). The Admin UI is hosted on a web server in Azure that is not accessible over the open Internet. Until you connect your local computer to your Azure VPN, your browser will not be able to resolve the DNS name. Refer to the [VPN installation guide](./6-Prepare%20the%20infrastructure%20for%20your%20Data.md#1-install-vpn-client) and the general [point-to-site documentation](https://docs.microsoft.com/en-us/azure/vpn-gateway/vpn-gateway-howto-point-to-site-resource-manager-portal) for help connecting to your VPN. 18 | 19 | # Usage 20 | 21 | The Admin UI consists of four tabs - **overview**, **details**, **data**, and **metrics**. Below is a description of each one of the tabs: 22 | 23 | ## Overview 24 | 25 | ![Overview](../img/adminui_assets/adminui-overview.png) 26 | 27 | The overview tab shows a diagram depicting the flow of data through the different component layers of the system, starting from the originating data source, to data ingestion, persistence, reporting, and ultimately consumption. All the boxes in the diagram correspond to a resource deployed in Azure, except for the data source, which can be configured to be running on-premises. 28 | 29 | | Diagram Box | Description | 30 | | ---------- | ----------- | 31 | | Data Sources | links to the Azure VM hosting the data generator | 32 | | Control Server | links to the Azure VM hosting the Control Server | 33 | | Azure Data Factory | links to the Azure Data Factory (ADF). From there, you can see the various configured linked services and the one-time pipelines created for loading data from the data source into the SQL Data Warehouses and Azure blob storage. In this box, you will also see the number of `successful`, `in progress`, `waiting`, and `failed` load jobs. | 34 | | Logical Data Warehouse (2) | one box represents the SQL Data Warehouses in the `load` state, and the others are for SQL Data Warehouses in the `active` state. 
Each links to the Azure resource denoting the SQL Data Warehouses. In this box, you will also see a health check indicator of the SQL Data Warehouse. | 35 | | SSAS DQ | contains links to each of the Azure VMs hosting a SSAS Direct Query (DQ) node. In this box, you will see the last timestamp when this node flipped to the Active Reader SQL Data Warehouse. | 36 | | SSAS Partition Builder | links to each of the Azure VMs hosting a SSAS Partition Builder (PB) node. In this box, you will also see the number of tabular models ready, updating, or pending. | 37 | | SSAS ReadOnly | links to each of the Azure VMs hosting a SSAS Read Only (RO) nodes. In this box, you will also see if the partition is up to date. | 38 | | SSRS | links to each of the Azure VMs hosting a SSRS node | 39 | | PBI Gateway | indicates the PBI Gateway enabling Power BI (PBI) reports | 40 | 41 | ## Details 42 | 43 | ![Details](../img/adminui_assets/adminui-details.png) 44 | 45 | The details tab shows at the top two tiles for the two logical data warehouses in the system. Each tile contains two time-series graphs grouped by the state, `active` or `load`, of the logical data warehouse. The first graph shows the `total job count` for each state. The second graph shows the `average job duration` for each state. Each tile also shows the name of the associated SQL Server and the number of attached Direct Query nodes. 46 | 47 | Below this is a table showing similar information, but in tabular format. Each row shows the `start` and `stop` times for a particular SQL Data Warehouse state, and additional information such as the number of `completed`, `failed`, and `in progress` jobs. You can click on a row to open a new tab that drills down into that particular time range, showing the individual jobs for each table. 48 | 49 | ## Data 50 | 51 | ![Data](../img/adminui_assets/adminui-data.png) 52 | 53 | The data tab shows the underlying settings of the system, including _DWTables_, _ControlServerProperties_, _StoredProcedures_, _TabularModelTablePartitions_, and _TabularModelNodeAssignments_. 54 | 55 | ## Metrics 56 | 57 | ![Metrics](../img/adminui_assets/adminui-metrics.png) 58 | 59 | The metrics tab contains links to several pre-built Operations Management Suite (OMS) dashboards, including the Control Server, Partition Builder Nodes, Direct Query Nodes, Read Only Nodes, SQL Server Usage, and Azure Storage Analytics. By clicking a link, you will be taken to one of these dashboards, allowing you to see in greater detail information about that component of the system. 60 | 61 | # Troubleshooting 62 | 63 | If you are unable to view any content on the Admin UI, and you have verified VPN connectivity, then open a debugging console in your browser and view the console output. Typically this is done in your browser by clicking Function F12 and clicking "Console". Refer to the [FAQ](./16-Frequently%20Asked%20Questions.md), and if there is no solution available there, refer [Help and Support](./17-Get%20Help%20and%20Support.md). 64 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | ## Ignore Visual Studio temporary files, build results, and 2 | ## files generated by popular Visual Studio add-ons. 
3 | ## 4 | ## Get latest from https://github.com/github/gitignore/blob/master/VisualStudio.gitignore 5 | 6 | # User-specific files 7 | *.suo 8 | *.user 9 | *.userosscache 10 | *.sln.docstates 11 | 12 | # User-specific files (MonoDevelop/Xamarin Studio) 13 | *.userprefs 14 | 15 | # Build results 16 | [Dd]ebug/ 17 | [Dd]ebugPublic/ 18 | [Rr]elease/ 19 | [Rr]eleases/ 20 | x64/ 21 | x86/ 22 | bld/ 23 | [Bb]in/ 24 | [Oo]bj/ 25 | [Ll]og/ 26 | 27 | # Visual Studio 2015 cache/options directory 28 | .vs/ 29 | # Uncomment if you have tasks that create the project's static files in wwwroot 30 | #wwwroot/ 31 | 32 | # MSTest test Results 33 | [Tt]est[Rr]esult*/ 34 | [Bb]uild[Ll]og.* 35 | 36 | # NUNIT 37 | *.VisualState.xml 38 | TestResult.xml 39 | 40 | # Build Results of an ATL Project 41 | [Dd]ebugPS/ 42 | [Rr]eleasePS/ 43 | dlldata.c 44 | 45 | # .NET Core 46 | project.lock.json 47 | project.fragment.lock.json 48 | artifacts/ 49 | **/Properties/launchSettings.json 50 | 51 | *_i.c 52 | *_p.c 53 | *_i.h 54 | *.ilk 55 | *.meta 56 | *.obj 57 | *.pch 58 | *.pdb 59 | *.pgc 60 | *.pgd 61 | *.rsp 62 | *.sbr 63 | *.tlb 64 | *.tli 65 | *.tlh 66 | *.tmp 67 | *.tmp_proj 68 | *.log 69 | *.vspscc 70 | *.vssscc 71 | .builds 72 | *.pidb 73 | *.svclog 74 | *.scc 75 | 76 | # Chutzpah Test files 77 | _Chutzpah* 78 | 79 | # Visual C++ cache files 80 | ipch/ 81 | *.aps 82 | *.ncb 83 | *.opendb 84 | *.opensdf 85 | *.sdf 86 | *.cachefile 87 | *.VC.db 88 | *.VC.VC.opendb 89 | 90 | # Visual Studio profiler 91 | *.psess 92 | *.vsp 93 | *.vspx 94 | *.sap 95 | 96 | # TFS 2012 Local Workspace 97 | $tf/ 98 | 99 | # Guidance Automation Toolkit 100 | *.gpState 101 | 102 | # ReSharper is a .NET coding add-in 103 | _ReSharper*/ 104 | *.[Rr]e[Ss]harper 105 | *.DotSettings.user 106 | 107 | # JustCode is a .NET coding add-in 108 | .JustCode 109 | 110 | # TeamCity is a build add-in 111 | _TeamCity* 112 | 113 | # DotCover is a Code Coverage Tool 114 | *.dotCover 115 | 116 | # Visual Studio code coverage results 117 | *.coverage 118 | *.coveragexml 119 | 120 | # NCrunch 121 | _NCrunch_* 122 | .*crunch*.local.xml 123 | nCrunchTemp_* 124 | 125 | # MightyMoose 126 | *.mm.* 127 | AutoTest.Net/ 128 | 129 | # Web workbench (sass) 130 | .sass-cache/ 131 | 132 | # Installshield output folder 133 | [Ee]xpress/ 134 | 135 | # DocProject is a documentation generator add-in 136 | DocProject/buildhelp/ 137 | DocProject/Help/*.HxT 138 | DocProject/Help/*.HxC 139 | DocProject/Help/*.hhc 140 | DocProject/Help/*.hhk 141 | DocProject/Help/*.hhp 142 | DocProject/Help/Html2 143 | DocProject/Help/html 144 | 145 | # Click-Once directory 146 | publish/ 147 | 148 | # Publish Web Output 149 | *.[Pp]ublish.xml 150 | *.azurePubxml 151 | # TODO: Comment the next line if you want to checkin your web deploy settings 152 | # but database connection strings (with potential passwords) will be unencrypted 153 | *.pubxml 154 | *.publishproj 155 | 156 | # Microsoft Azure Web App publish settings. Comment the next line if you want to 157 | # checkin your Azure Web App publish settings, but sensitive information contained 158 | # in these scripts will be unencrypted 159 | PublishScripts/ 160 | 161 | # NuGet Packages 162 | *.nupkg 163 | # The packages folder can be ignored because of Package Restore 164 | **/packages/* 165 | # except build/, which is used as an MSBuild target. 
166 | !**/packages/build/ 167 | # Uncomment if necessary however generally it will be regenerated when needed 168 | #!**/packages/repositories.config 169 | # NuGet v3's project.json files produces more ignorable files 170 | *.nuget.props 171 | *.nuget.targets 172 | 173 | # Microsoft Azure Build Output 174 | csx/ 175 | *.build.csdef 176 | 177 | # Microsoft Azure Emulator 178 | ecf/ 179 | rcf/ 180 | 181 | # Windows Store app package directories and files 182 | AppPackages/ 183 | BundleArtifacts/ 184 | Package.StoreAssociation.xml 185 | _pkginfo.txt 186 | 187 | # Visual Studio cache files 188 | # files ending in .cache can be ignored 189 | *.[Cc]ache 190 | # but keep track of directories ending in .cache 191 | !*.[Cc]ache/ 192 | 193 | # Others 194 | ClientBin/ 195 | ~$* 196 | *~ 197 | *.dbmdl 198 | *.dbproj.schemaview 199 | *.jfm 200 | *.pfx 201 | *.publishsettings 202 | orleans.codegen.cs 203 | 204 | # Since there are multiple workflows, uncomment next line to ignore bower_components 205 | # (https://github.com/github/gitignore/pull/1529#issuecomment-104372622) 206 | #bower_components/ 207 | 208 | # RIA/Silverlight projects 209 | Generated_Code/ 210 | 211 | # Backup & report files from converting an old project file 212 | # to a newer Visual Studio version. Backup files are not needed, 213 | # because we have git ;-) 214 | _UpgradeReport_Files/ 215 | Backup*/ 216 | UpgradeLog*.XML 217 | UpgradeLog*.htm 218 | 219 | # SQL Server files 220 | *.mdf 221 | *.ldf 222 | *.ndf 223 | 224 | # Business Intelligence projects 225 | *.rdl.data 226 | *.bim.layout 227 | *.bim_*.settings 228 | 229 | # Microsoft Fakes 230 | FakesAssemblies/ 231 | 232 | # GhostDoc plugin setting file 233 | *.GhostDoc.xml 234 | 235 | # Node.js Tools for Visual Studio 236 | .ntvs_analysis.dat 237 | node_modules/ 238 | 239 | # Typescript v1 declaration files 240 | typings/ 241 | 242 | # Visual Studio 6 build log 243 | *.plg 244 | 245 | # Visual Studio 6 workspace options file 246 | *.opt 247 | 248 | # Visual Studio 6 auto-generated workspace file (contains which files were open etc.) 249 | *.vbw 250 | 251 | # Visual Studio LightSwitch build output 252 | **/*.HTMLClient/GeneratedArtifacts 253 | **/*.DesktopClient/GeneratedArtifacts 254 | **/*.DesktopClient/ModelManifest.xml 255 | **/*.Server/GeneratedArtifacts 256 | **/*.Server/ModelManifest.xml 257 | _Pvt_Extensions 258 | 259 | # Paket dependency manager 260 | .paket/paket.exe 261 | paket-files/ 262 | 263 | # FAKE - F# Make 264 | .fake/ 265 | 266 | # JetBrains Rider 267 | .idea/ 268 | *.sln.iml 269 | 270 | # CodeRush 271 | .cr/ 272 | 273 | # Python Tools for Visual Studio (PTVS) 274 | __pycache__/ 275 | *.pyc 276 | 277 | # Cake - Uncomment if you are using it 278 | # tools/** 279 | # !tools/packages.config 280 | 281 | # Telerik's JustMock configuration file 282 | *.jmconfig 283 | 284 | # BizTalk build output 285 | *.btp.cs 286 | *.btm.cs 287 | *.odx.cs 288 | *.xsd.cs 289 | -------------------------------------------------------------------------------- /User Guides/13-Create dashboards and reports.md: -------------------------------------------------------------------------------- 1 | # Create Dashboard using PowerBI Online 2 | 3 | ## Configure datasources 4 | 5 | You are now ready to create a dashboard using the data in your deployment. The first thing to do is browse to [PowerBI Online](https://powerbi.microsoft.com) and sign in. Then, click **settings** at the top and select **Manage gateways**. 
Click the gateway you just created, then click **Add data sources to use the gateway**. Next, enter a new data source name and select **Analysis Services** for the type. For **Server** enter the SSAS load balancer's IP address. You can find this the same way you found the SQL Server VM's IP address, except this time scroll down to the load balancers, click the one ending in "ssasrolb", and click **Frontend IP configuration**. For **database** enter **AdventureWorks** (unless you've reconfigured your own database). For **Username** and **Password** enter the values you chose for your deployment. 6 | 7 | > Note: **Username** here also requires the domain name which you specified during the deployment. So, **Username** should look like `DomainName\Username`. 8 | 9 | When ready, click **Add**. 10 | 11 | ## Add users to the datasource 12 | 13 | Next, click **Users** and add all the people who can publish reports using this datasource by entering their email addresses and clicking **Add**. The email you entered in setting up the on-premises PowerBI data gateway is already used by default. 14 | 15 | When ready, click **Map user names**. Enter the **Original name** (the user's email address) and the **New name** (the domain-joined username from above) for all users. This will create mappings from each user's email ID to the domain-joined username for the SSAS Read Only VMs. Click [here](https://powerbi.microsoft.com/en-us/documentation/powerbi-gateway-enterprise-manage-ssas/#usernames-with-analysis-services) to learn more about usernames with Analysis Services. 16 | 17 | When ready, click **OK**. You may need to wait a few minutes for the changes to take effect. 18 | 19 | ## Create an app workspace 20 | 21 | Next create a workspace. To do this, click **Workspaces** > **Create app workspace** and type a name for your workspace. If needed, edit it to be unique. You have a few options to set. If you choose Public, anyone in your organization can see what's in the workspace. Private means that only members of the workspace can see its contents. 22 | 23 | > Note: You cannot change the Public/Private setting after you've created the workspace. 24 | 25 | You can also choose if members can edit or have view-only access. Add email addresses of people you want to have access to the workspace, and click **Add**. You cannot add group aliases, only individuals. Decide whether each person is a member or an admin. 26 | 27 | > Note: Admins can edit the workspace itself, including adding other members. Members can edit the content in the workspace unless they have view-only access. 28 | 29 | When ready, click **save**. 30 | 31 | ## Create a dataset 32 | 33 | Now you are ready to create a report. Begin by clicking **Get Data**. Under **Databases** click **Get** > **SQL Server Analysis Services** > **Connect**. Scroll down and click on your newly created gateway. Click **AdventureWorks - Model** > **Connect**. 34 | 35 | > TIP: You may have to refresh your browser to see your new dataset. 36 | 37 | ## Create reports and dashboards 38 | 39 | Under **Datasets**, click your new dataset. You can create whatever kind of report you want, but below are the steps for creating a three-tab report showing some interesting data visualizations. 40 | 41 | ### Tab One - Percentage of sales contributed by each individual customer 42 | 43 | Under **Visualizations** select **Pie chart**. Under **Fields** expand **FactInternetSales** and drag **SalesAmount** into **Legend** and **CustomerKey** into **Values**.
Under **Filters** select **SalesAmount**, enter **greater than 250000**, and click **Apply filter**. 44 | 45 | Under **Visualizations** select **TreeMap**. Under **Fields** expand **FactInternetSales** and drag **SalesAmount** into **Values** and **OrderDate** into **Group**. Under **Filters** select **SalesAmount**, enter **greater than 250000**, and click **Apply filter**. 46 | 47 | - Dashboard 48 | ![Internet Sales By Customer ID](../img/powerbi_assets/internet_sales_amt_by_customers.png) 49 | 50 | ### Tab Two - Underlying Fact tables 51 | 52 | Create a new tab. Under **Visualizations** click **Table**. Under **Fields** expand **FactInternetSales** and check all the fields. Click **Table** again, expand **FactProductInventory**, and check all fields. Click **Table** again, expand **FactResellerSales**, and check all fields. Click **Table** again, expand **FactSalesQuota**, and check all fields. 53 | - Dashboard 54 | ![Internet Sales By Customer ID](../img/powerbi_assets/all_tables.png) 55 | 56 | 57 | ### Tab Three - Sales summary for top spending customers and sales amount ordered by date 58 | 59 | Create a new tab. Under **Visualizations** click **Clustered column chart**. Under **Fields** expand **FactInternetSales** and drag **SalesAmount** to **Axis** and **CustomerKey** to **Value**. Click **CustomerKey** and select **Count (Distinct)**. 60 | 61 | Under **Visualizations** select **Stacked column chart**. Under **Fields** expand **FactInternetSales** and drag **SalesAmount** to **Value** and **OrderDate** to **Axis**. 62 | - Dashboard 63 | ![Internet Sales By Customer ID](../img/powerbi_assets/internet_sales_summary.png) 64 | 65 | When ready, click **Save this report**, enter a name for your report, and click **Save**. 66 | 67 | 68 | **NOTE: You can click “Text box” to add custom titles to all your report visualizations.** 69 | ![Customized Titles](../img/powerbi_assets/textbox.png) 70 | 71 | ## Publish app 72 | 73 | For your new workspace, click **Publish app** in the upper right to start the process of sharing all the content in that workspace. First, for **Details**, fill in the description to help people find the app. You can set a background color to personalize it. Next, for **Content**, you see the content that’s going to be published as part of the app – everything that’s in the workspace. You can also set the landing page (the dashboard or report people will see first when they go to your app) or none (in which case they’ll land on a list of all the content in the app). Last, for **Access**, decide who has access to the app: either everyone in your organization, or specific people or email distribution lists. 74 | -------------------------------------------------------------------------------- /User Guides/6-Prepare the infrastructure for your Data.md: -------------------------------------------------------------------------------- 1 | # Prepare the infrastructure for your Data 2 | 3 | ## Summary 4 | This page lists the steps to prepare your deployment for ingesting your own data. 5 | 6 | ## 1. Install VPN Client 7 | 8 | - Confirm that your client machine has the two certificates installed for VPN connectivity to the VM see [prerequisites](./1-Prerequisite%20Steps%20Before%20Deployment.md) for more details. 9 | - Login to the [Azure portal](http://portal.azure.com) and find the Resource Group that corresponds to the VNet setup. Pick the **Virtual Network** resource, and then the **Virtual Network Gateway** in that resource. 
10 | - Click on **Point-to-site configuration**, and **Download the VPN client** to the client machine. 11 | - Install the `64-bit (Amd64)` or `32-bit (x86)` version, depending on your local Windows operating system. The modal dialog that pops up after you launch the application may show up with a single **Don't run** button. Click on **More**, and choose **Run anyway**. 12 | - Finally, choose the relevant VPN connection from **Network & Internet Settings**. This should set you up for the next step. 13 | 14 | 15 | ## 2. Stopping Data generation 16 | 17 | The TRI deploys a dedicated VM for data generation, with a PowerShell script placed in the VM. The PowerShell script is scheduled to run every 3 hours. We need to log in to the VM and disable the schedule. 18 | 19 | * Get the IP address of your data generator VM: From the portal, open the resource group in which the TRI is deployed (this will be different from the VNET resource group) and look for a VM with the name ending in `dgvm00`. 20 | * Click the VM name. 21 | * Click on the **Networking** tab for that specific VM and find the private IP address at the top of the blade. 22 | * Connect to the VM: Use Remote Desktop to connect to the VM using its IP address and the admin username and password that you specified as part of the deployment parameters. *You must be connected to the VPN in order to connect to the VM*. 23 | * Start the `Task Scheduler` app and disable the task named "Generate and upload data". 24 | 25 | 26 | ## 3. Drop AdventureWorks tables 27 | 28 | As part of the initial deployment, the TRI installs AdventureWorks tables in the data warehouse. We need to drop the tables in all the physical data warehouse instances to create a clean schema for your organization. 29 | 30 | * Log in to the physical data warehouse: 31 | > As a pre-requisite, install either [SSMS](https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms) or [SQLCMD](https://docs.microsoft.com/en-us/sql/tools/sqlcmd-utility) on your local computer, based on your personal preference. 32 | 33 | From the Azure Portal, get the list of SQL Data Warehouses (SQL DWs) installed in your resource group. Click on each server and ensure that the firewall setting in each DW allows your client machine to connect. Connect to the SQL DWs based on your client tool preference. 34 | * Drop the tables: Once you log into the SQL DW database named "dw", you will be able to view the list of tables created in the dbo schema. Use the `DROP TABLE` command to drop all 29 AdventureWorks tables. 35 | 36 | ## 4. Size your data warehouse 37 | 38 | As part of the initial deployment, the SQL DW physical instances are set up at DWU 100 to demonstrate the solution. Based on the data load volumes and query volumes, you will need to scale the SQL DW instances. Further details on sizing for scale are available [here](https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-manage-compute-overview). 39 | 40 | Once you have determined the DWUs needed for your workload, the correct numbers must be updated in the Job Manager SQL Server instance. Log in to the Azure portal and locate the ctrldb server in the solution resource group. Log in to the SQL Server instance and update the dbo.ControlServerProperties table. For example, if you determine that you need DWU 1000 for loading and 2000 for queries, run the following SQL.
41 | 42 | ```sql 43 | /* Updating Reader to 2000 */ 44 | 45 | update dbo.ControlServerProperties 46 | SET value = 2000 47 | Where Name = 'ComputeUnits_Active' 48 | 49 | /* Updating Loader to 1000 */ 50 | 51 | update dbo.ControlServerProperties 52 | SET value = 1000 53 | Where Name = 'ComputeUnits_Load' 54 | ``` 55 | 56 | ## 5. Optionally override the initial setting for flip time 57 | 58 | As part of the initial deployment, the data warehouses Reader and Loader are scheduled to *Flip* every 2 hours. This implies that you have a refresh cycle of 2 hours. Each organization will have different refresh cycles based on their business requirements. The flip time is controlled by a property in the ControlServerProperties table and can be changed as follows. 59 | 60 | ```sql 61 | /* Update the Flip time to 8 hours. */ 62 | 63 | update dbo.ControlServerProperties 64 | SET value = 8 65 | Where Name = 'FlipInterval' 66 | ``` 67 | 68 | The above command changes the Flip time to every 8 hours. 69 | 70 | ## 6. Create fact and dimension tables in all the physical data warehouses - both loader and reader. 71 | 72 | You will need to create tables in both of the data warehouses (Loader and Reader). Since SQL DW is based on MPP technology, please read the [documentation](https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-overview) on how to create tables. 73 | 74 | 75 | #### Example Create Table Region 76 | 77 | ```sql 78 | CREATE TABLE REGION ( 79 | r_regionkey integer not null, 80 | r_name char(25) not null, 81 | r_comment varchar(152) not null 82 | ) 83 | WITH 84 | ( 85 | DISTRIBUTION = round_robin, 86 | CLUSTERED COLUMNSTORE INDEX 87 | ) 88 | ``` 89 | 90 | Create all the tables needed for your organization. 91 | 92 | 93 | 94 | ## 7. Insert entries in the mapping DW-Table in the Job Manager SQL Database. 95 | 96 | Once the tables are created, the next step is to insert entries in Job Manager database so that, when a file is uploaded and registered with Job Manager, the correct data factory pipeline is created to load data into the SQL DW instances. 97 | 98 | #### Example Entry for Region Table 99 | 100 | ```sql 101 | insert into dbo.DWTables 102 | (Name,Type,RunOrder,LoadUser,Id,CreationDate_UTC,createdby,LastUpdatedDate_UTC,lastupdatedby) 103 | values 104 | ('dbo.regions','Fact',0,'edw_loader_mdrc',0,sysdatetime(),'admin',sysdatetime(),'admin'); 105 | ``` 106 | 107 | Insert entries for all the tables you have created in Step 6. 108 | -------------------------------------------------------------------------------- /scripts/CommonNetworking.psm1: -------------------------------------------------------------------------------- 1 | # This module contains network related common functions 2 | 3 | # To get virtual network. If not possible to get virtual network, exit as it is a critical requirement 4 | Function Get-VirtualNetworkOrExit( 5 | [Parameter(Mandatory=$true)] 6 | [string] $ResourceGroupName, 7 | 8 | [Parameter(Mandatory=$true)] 9 | [string] $VirtualNetworkName 10 | ) 11 | { 12 | try { 13 | $virtualNetwork = Get-AzureRmVirtualNetwork -Name $VirtualNetworkName -ResourceGroupName $ResourceGroupName 14 | 15 | if (-not $virtualNetwork) { 16 | Write-Host -ForegroundColor DarkRed "Unable to get virtual network $($VirtualNetworkName). Exiting..." 17 | Write-VnetErrorMessage 18 | 19 | exit 3 20 | } 21 | 22 | return $virtualNetwork 23 | } catch { 24 | Write-Host -ForegroundColor DarkRed "Unable to get virtual network $($VirtualNetworkName). Exiting..." 
25 | Write-Host $Error[0] 26 | Write-VnetErrorMessage 27 | 28 | exit 3 29 | } 30 | } 31 | 32 | # Get uint32 value for given IPv4 network address 33 | Function Get-NetworkValue( 34 | [Parameter(Mandatory=$true)] 35 | [string] $NetworkPrefix 36 | ) 37 | { 38 | $strArray = $NetworkPrefix.Split(".") 39 | [uint32] $value = 0 40 | for($i=0; $i -lt 4; $i++) { 41 | $value = ($value -shl 8) + [convert]::ToInt16($strArray[$i]) 42 | } 43 | 44 | return $value 45 | } 46 | 47 | # Get Ipv4 network address for given uint32 48 | Function Get-NetworkAddress( 49 | [Parameter(Mandatory=$true)] 50 | [uint32] $NetworkValue 51 | ) 52 | { 53 | [uint32] $mask = (1 -shl 8) - 1 54 | [string] $networkAddress = "" 55 | for($i=0; $i -lt 3; $i++) { 56 | [uint32] $value = $NetworkValue -band $mask 57 | $NetworkValue = $NetworkValue -shr 8 58 | $networkAddress = "." + $value + $networkAddress 59 | } 60 | 61 | $networkAddress = "$($NetworkValue)$($networkAddress)" 62 | return $networkAddress 63 | } 64 | 65 | # Get free subnet or existing given subnet under given virtual network or exit 66 | # If free subnet is found then it is created under VNET for this deployment to use 67 | Function Get-AvailableSubnetOrExit( 68 | [Parameter(Mandatory=$true)] 69 | [object] $VirtualNetwork, 70 | 71 | [Parameter(Mandatory=$true)] 72 | [string] $SubnetName 73 | ) 74 | { 75 | $GatewaySubnetName = "GatewaySubnet" 76 | $TotalBits = 32 77 | 78 | try { 79 | $vnetAddressPrefix = $VirtualNetwork.AddressSpace.AddressPrefixes[0] 80 | $addressPrefixArray = $vnetAddressPrefix.Split("/") 81 | $networkAddress = $addressPrefixArray[0] 82 | [int]$networkNumBits = $TotalBits - ([convert]::ToInt32($addressPrefixArray[1], 10)) 83 | 84 | $gatewaySubnet = Get-AzureRmVirtualNetworkSubnetConfig -Name $GatewaySubnetName -VirtualNetwork $VirtualNetwork 85 | if (-not $gatewaySubnet) { 86 | Write-Host -ForegroundColor DarkRed "Unable to get subnet for deployment. Exiting..." 
87 | Write-VnetErrorMessage 88 | 89 | exit 4 90 | } 91 | 92 | # Using GatewaySubnet range size for all subnets to keep things simple 93 | $addressPrefixArray = $gatewaySubnet.AddressPrefix.Split("/") 94 | $gwNetworkAddress = $addressPrefixArray[0] 95 | $gwNetworkAddressSuffix = $addressPrefixArray[1] 96 | [int]$gwNetworkNumBits = $TotalBits - ([convert]::ToInt16($gwNetworkAddressSuffix, 10)) 97 | 98 | # Track use subnets 99 | $usedSubnets = New-Object 'System.Collections.Generic.HashSet[int]' 100 | for($i=0; $i -lt $VirtualNetwork.Subnets.Count; $i++) { 101 | $subnet = $VirtualNetwork.Subnets[$i] 102 | if ($subnet.Name.Equals($SubnetName)) { 103 | # Found an existing given subnet, return it 104 | return $subnet.AddressPrefix 105 | } 106 | 107 | [uint32]$value = Get-NetworkValue ($subnet.AddressPrefix.Split("/"))[0] 108 | $value = $value -shr $gwNetworkNumBits 109 | $mask = (1 -shl ($networkNumBits - $gwNetworkNumBits)) - 1 110 | $value = $value -band $mask 111 | 112 | $usedSubnets.Add($value) | Out-Null 113 | } 114 | 115 | # Find a free subnet 116 | $numSubnets = 1 -shl ($networkNumBits - $gwNetworkNumBits) 117 | for([uint32]$i=1; $i -lt $numSubnets; $i++) { 118 | if (-not $usedSubnets.Contains($i)) { 119 | # Found a free subnet 120 | 121 | [uint32]$value = Get-NetworkValue $networkAddress 122 | $value = $value -bor ($i -shl $gwNetworkNumBits) 123 | 124 | $newSubnetAddress = Get-NetworkAddress $value 125 | $newSubnetPrefix = "$($newSubnetAddress)/$($gwNetworkAddressSuffix)" 126 | 127 | # Create the new subnet 128 | New-Subnet -VirtualNetwork $VirtualNetwork ` 129 | -SubnetName $SubnetName ` 130 | -SubnetPrefix $newSubnetPrefix | Out-Null 131 | 132 | return $newSubnetPrefix 133 | } 134 | } 135 | 136 | Write-Host -ForegroundColor DarkRed "Unable to find free subnet for deployment. Exiting..." 137 | Write-VnetErrorMessage 138 | 139 | exit 4 140 | } catch { 141 | Write-Host -ForegroundColor DarkRed "Unable to get subnet for deployment. Exiting..." 142 | Write-VnetErrorMessage 143 | 144 | Write-Host $Error[0] 145 | 146 | exit 4 147 | } 148 | } 149 | 150 | # Creates a subnet under given VNET 151 | Function New-Subnet( 152 | [Parameter(Mandatory=$true)] 153 | [object] $VirtualNetwork, 154 | 155 | [Parameter(Mandatory=$true)] 156 | [string] $SubnetName, 157 | 158 | [Parameter(Mandatory=$true)] 159 | [string] $SubnetPrefix 160 | ) 161 | { 162 | Add-AzureRmVirtualNetworkSubnetConfig -Name $SubnetName -VirtualNetwork $VirtualNetwork -AddressPrefix $SubnetPrefix 163 | Set-AzureRmVirtualNetwork -VirtualNetwork $VirtualNetwork 164 | } 165 | 166 | Function Write-VnetErrorMessage() 167 | { 168 | Write-Host -ForegroundColor DarkRed "Unable to determine VNET or subnet used for this deployment." 169 | Write-Host -ForegroundColor DarkRed "If your setup uses point-to-site VPN configuration, please run DeployVPN.ps1 script before running this script." 170 | } 171 | -------------------------------------------------------------------------------- /Technical Guides/5-Understanding data warehouse flip.md: -------------------------------------------------------------------------------- 1 | # Anatomy of a Logical Data Warehouse Flip 2 | 3 | > For a detailed description of the LDW transition states (`Active`, `Standby`, `Load`), please read [LDW States and Availability](./4-Understanding%20logical%20datawarehouses.md#logical-data-warehouse-status-and-availability) 4 | 5 | The transition of a LDW from `Load` to `Active` and vice versa a.k.a the "Flip Operation" is done every T hours where T is configurable by the user. 
6 | The flip operation is triggered by daemons that run as scheduled tasks on each of the Analysis Services Direct Query (ASDQ) nodes.
7 | 
8 | Every few minutes, each daemon running on an ASDQ node asks the Job Manager whether an LDW flip needs to happen.
9 | 
10 | ## Step-by-Step
11 | 
12 | 1. The Job Manager maintains a database table, "LDWExpectedStates", which stores the start and end times of the current flip interval. It consists of records that define which LDW is in `Load` state, which is in `Active` state, and until what time they are supposed to remain in those states.
13 | 2. When queried by an ASDQ daemon, the Job Manager reads this table and checks whether the current UTC time is past the end time of the current flip interval; if not, it responds with a no-op. If the current UTC time is past the end time, a flip operation needs to be executed and the following steps are performed.
14 | 	* (a) The LDW that should be `Active` in the next flip interval is determined, and the `LDWExpectedStates` table is populated with the start and end times of the next flip interval. The end timestamp is determined by adding T hours to the start time, which is the current UTC time. If there are no LDWs in `Load` state, no flip happens. If more than one LDW is in `Load` state, the next LDW in sequence after the currently active LDW is picked as the one to be flipped to `Active` state.
15 | 
16 | 	* (b) The state of the next-to-be-Active LDW is switched to `Active`, and the state transitions of its PDWs from `Load` to `Active` are initiated.
17 | 
18 | 3. The PDW state transition from `Load` to `Active` goes through a few intermediate steps, as follows:
19 | 	* (a) `Load`: The PDW is processing uploaded data files to catch up to the latest data.
20 | 	* (b) `StopLoading`: The PDW no longer accepts new data load jobs, but waits until the current load jobs complete.
21 | 	* (c) `ScaleUpToActive`: The PDW has completed all of its assigned load jobs and is being scaled up to the Active DWU capacity.
22 | 	* (d) `Active`: The PDW is up to date and serving requests, but not receiving any additional data loads.
23 | 
24 | 4. Once a PDW is changed to `Active` state, the Job Manager checks whether at least one DQ node in the "DQ Alias group" is still serving active queries. A "DQ Alias group" is the group of DQ nodes that point to the same PDW instance in an LDW. Multiple DQ nodes can point to the same PDW; this makes it possible to increase the availability of DQs when needed, assuming the PDW can support concurrent queries from all of these DQs. Checking that at least one DQ is in `Active` state ensures new requests are not dropped. If this check succeeds, a "Transition" response is sent to the DQ node, which stops accepting new connections from the DQ load balancer and drains off the existing connections. Once the grace time is over, the DQ changes its connection string to point to the newly active PDW and reports back to the Job Manager, which then allows other ASDQ nodes to start their transitions.
25 | 
26 | 5. Once all the DQs in a "DQ Alias group" have flipped to a different PDW, the group's original PDW is transitioned to the `Load` state after its DWU capacity is scaled down.
27 | 6. After all of the Active PDWs of the previously Active LDW have been transitioned to `Load` state, the state of the LDW itself is changed to `Load`. This marks the end of the flip operation.
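Because the flip schedule is driven entirely by the `LDWExpectedStates` table, it is often useful to look at that table while troubleshooting. As noted in the FAQ below, an immediate flip can be forced by moving the current interval's end time up to the current UTC time. The snippet below is only a minimal sketch of that idea: the server name, credentials, and column names (`EndTime` in particular) are assumptions, so check the actual schema of `dbo.LDWExpectedStates` in your Job Manager (ControlServerDB) database before running anything like it.

```powershell
# Minimal sketch only: inspect the current flip interval and, if needed, force an
# immediate flip by setting its end time to "now". The column name (EndTime) and the
# server/database/login values are placeholders/assumptions -- verify them against
# your own ControlServerDB before use.

$ctrlDbServer = 'yourctrldbserver.database.windows.net'   # placeholder
$ctrlDbName   = 'ControlServerDB'                          # placeholder
$ctrlDbUser   = 'ctrladmin'                                # placeholder
$ctrlDbPass   = 'yourpassword'                             # placeholder

# 1. Look at the expected-state records for the current flip interval.
Invoke-Sqlcmd -ServerInstance $ctrlDbServer -Database $ctrlDbName `
    -Username $ctrlDbUser -Password $ctrlDbPass `
    -Query 'SELECT * FROM dbo.LDWExpectedStates;'

# 2. Optionally force an immediate flip: end the current interval now.
#    The ASDQ daemons will notice this on their next poll (every few minutes).
Invoke-Sqlcmd -ServerInstance $ctrlDbServer -Database $ctrlDbName `
    -Username $ctrlDbUser -Password $ctrlDbPass `
    -Query 'UPDATE dbo.LDWExpectedStates
            SET EndTime = SYSUTCDATETIME()
            WHERE EndTime = (SELECT MAX(EndTime) FROM dbo.LDWExpectedStates);'
```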
28 | 
29 | ## Example
30 | Here is an example schedule showing how the Flip Operation occurs, using the following configuration:
31 | * 2 LDWs: `LDW01`, `LDW02`
32 | * 2 PDWs: `PDW01-LDW01` (LDW01), `PDW01-LDW02` (LDW02)
33 | * 2 DQ Nodes: `DQ01` (points to PDW01-LDW01), `DQ02` (points to PDW01-LDW01)
34 | * ASDQ daemon schedule: 1 minute
35 | * Connection drain time: 10 minutes
36 | 
37 | | UTC | LDW01 | LDW02 | PDW01-LDW01 | PDW01-LDW02 | DQ01 | DQ02 |
38 | |:----|:----|:----|:----|:----|:----|:----|
39 | |00:00 | Active | Load | Active | Load | Normal : PDW01-LDW01 | Normal : PDW01-LDW01 |
40 | |00:01 | Active | Load | Active | `StopLoading` | Normal : PDW01-LDW01 | Normal : PDW01-LDW01 |
41 | |00:03 | Active | Load | Active | `ScaleUpToActive` | Normal : PDW01-LDW01 | Normal : PDW01-LDW01 |
42 | |00:05 | Active | `Active` | Active | `Active` | Normal : PDW01-LDW01 | Normal : PDW01-LDW01 |
43 | |00:06 | Active | Active | Active | Active | `Transition` : PDW01-LDW01 | Normal : PDW01-LDW01 |
44 | |00:16 | Active | Active | Active | Active | `ChangeCompleted` : PDW01-LDW02 | Normal : PDW01-LDW01 |
45 | |00:26 | Active | Active | Active | Active | ChangeCompleted : PDW01-LDW02 | `Transition` : PDW01-LDW01 |
46 | |00:27 | Active | Active | `ScaleDownToLoad` | Active | `Normal` : PDW01-LDW02 | `Normal` : PDW01-LDW02 |
47 | |00:29 | `Load` | Active | `Load` | Active | Normal : PDW01-LDW02 | Normal : PDW01-LDW02 |
48 | 
49 | ## FAQ
50 | **Are there any timing instructions for the Admin to restart the process?**
51 | The flip interval of T hours is a configurable property that the Admin can set by updating a ControlServer database property.
52 | When the next flip time comes around, this value is used to set the next flip interval.
53 | If the Admin wants to flip immediately, the end timestamp of the current flip interval needs to be updated to the current UTC time in the LDWExpectedStates database table; the flip operation will then be initiated within the next couple of minutes.
54 | 
55 | **What other situations will require Admin intervention?**
56 | The flip operation requires a Load PDW to satisfy certain conditions before it can be made Active; these are explained in steps 2 and 3 of the step-by-step section above. If load jobs get stuck or scaling takes a long time, the flip operation will be halted. If all the Direct Query nodes die, the flip operation will not be triggered at all, because the ASDQ daemons are what initiate it. Admin intervention is required to address these situations.
57 | 
58 | **What other steps should the Admin NOT do with the flip pattern?**
59 | Once a flip operation has started, the Admin should not try to change the state of PDWs or LDWs themselves. Since these states are maintained in the Job Manager's database, any mismatch between those states and the real state will throw off the flip operation. If any of the PDWs dies, the Admin needs to get it back into the state that was last recorded in the database.
60 | 
-------------------------------------------------------------------------------- /User Guides/Configure User Access Control.md: --------------------------------------------------------------------------------
1 | # User Access Control
2 | 
3 | It is essential that you manage which users have access to your deployment. In particular, you should control access to the Admin UI and Control Server apps, as well as the underlying Azure resources that comprise the solution.
For additional resources on how to manage user access for apps click [here](https://docs.microsoft.com/en-us/azure/active-directory/active-directory-managing-access-to-apps) and for Azure resources click [here](https://docs.microsoft.com/en-us/azure/active-directory/role-based-access-control-configure). 4 | 5 | Otherwise, here are some steps to get you started. 6 | 7 | ## Manage Access to the Admin UI and Control Server 8 | [Azure Active Directory (AAD)](https://docs.microsoft.com/en-us/azure/active-directory/active-directory-whatis) is the service used to authenticate and authorize users. You will configure the two registered AAD apps in your deployment, the Admin UI and the Control Server, to only allow access to the users that you add. Users not added will not be able to access either app. 9 | 10 | First, sign in to the Azure portal with an account that’s a [global admin]( https://docs.microsoft.com/en-us/azure/active-directory/active-directory-assign-admin-roles-azure-portal#details-about-the-global-administrator-role) for the directory. 11 | 12 | ### Admin UI 13 | Click **Azure Active Directory** > **App registrations** > the Admin UI app. 14 | 15 | For example: 16 | ![Restrict AdminUI](../img/restrict-adminui.png) 17 | 18 | Click the link under **Managed application in local directory** > **Properties** > **User assignment required** > **Yes** > **Save**. At this point, only users you add will have access to the Admin UI. 19 | 20 | To add users, click **Users and groups** > **Add user** > **Users and groups**. Type in the email of the user to add and click **Invite**. Do this for each user. Only the users you added will now have access to the Admin UI. 21 | 22 | ### Control Server 23 | Click **Azure Active Directory** > **App registrations** > the Control Server app. 24 | 25 | For example: 26 | ![Restrict Control Server](../img/restrict-controlserver.png) 27 | 28 | Click the link under **Managed application in local directory** > **Properties** > **User assignment required** > **Yes** > **Save**. At this point, only users you add will have access to the Control Server. 29 | 30 | To add users, click **Users and groups** > **Add user** > **Users and groups**. Type in the email of the user to add and click **Invite**. Do this for each user. Only the users you added will now have access to the Control Server. 31 | 32 | ## Manage access to the Azure resources 33 | 34 | Most of the users of your deployment will not need access to the underlying Azure resources. For those that do, you should only grant the access that users need. For example, a non-administrator user should not have write privileges where they could delete the resource group. 35 | 36 | Role Based Access Control (RBAC) enables fine-grained access management to the underlying Azure resources of your deployment. Using RBAC, you can grant only the amount of access that users need to perform their jobs. You can add users to the resource group with roles appropriate to their needed level of access. The three most basic roles are: 37 | 38 | - **Owner** – Lets you manage everything, including access to resources. 39 | - **Contributor** – Lets you manage everything except access to resources. 40 | - **Reader** – Lets you view everything, but not make any changes. 41 | 42 | Here are the steps to do this: 43 | 44 | First, sign in to the Azure portal with an account that’s the subscription owner. Click **Resource groups** > your resource group > **Access control (IAM)**. 
45 | 46 | For example: 47 | ![Restrict Azure Resources](../img/restrict-azure-resources.png) 48 | 49 | Click **Add**, select a role appropriate for the user, type in the email of the users to add, and click **Save**. The users you added will now have role based access to the Azure resources. 50 | 51 | > Note that user roles can also be inherited from the parent subscription. If you want to modify a role inherited from the parent subscription, click **Subscription (inherited)** and modify it at the subscription level. 52 | 53 | ## Manage app access by instrumenting the deployment code 54 | 55 | In the previous sections we discussed how to whitelist user access to the apps. If you want more granular control for individual functions within the app itself, you need to instrument the app code. This section describes how to do just that. Click [here](https://docs.microsoft.com/en-us/aspnet/core/security/authorization/roles) for additional reading on role based authorization in ASP.NET MVC web apps. 56 | 57 | > Note that the code for your deployment is not released, nor is it currently instrumented for RBAC, so this section describes in general terms how a developer would do this for an app. 58 | 59 | Now let's get started. 60 | 61 | ### Basic syntax 62 | 63 | Role based authorization checks are declarative. The developer embeds them within their code, against a controller or an action within a controller. For example, the following code would limit access to any actions on the **AdministrationController** to users who are a member of the **Administrator** group. 64 | 65 | ```csharp 66 | [Authorize(Roles = "Administrator")] 67 | public class AdministrationController : Controller { 68 | } 69 | ``` 70 | 71 | Role requirements can also be expressed using the new Policy syntax, where a developer registers a policy at startup as part of the Authorization service configuration. This normally occurs in ConfigureServices() in your Startup.cs file. Here is an example: 72 | 73 | ```csharp 74 | public void ConfigureServices(IServiceCollection services) { 75 | services.AddMvc(); 76 | services.AddAuthorization(options => { 77 | options.AddPolicy("RequireAdministratorRole", policy => policy.RequireRole("Administrator")); 78 | }); 79 | } 80 | 81 | [Authorize(Policy = "RequireAdministratorRole")] 82 | public IActionResult Shutdown() { 83 | ... 84 | } 85 | ``` 86 | 87 | By the way, you are not restricted to the [existing roles](https://docs.microsoft.com/en-us/azure/active-directory/role-based-access-built-in-roles) in AAD. You can define your own [custom roles](https://docs.microsoft.com/en-us/azure/active-directory/role-based-access-control-custom-roles), as well as your own [custom policies](https://docs.microsoft.com/en-us/aspnet/core/security/authorization/policies). 88 | 89 | ### Where to do it 90 | 91 | Now that you know how to do it, where in the app code might you want to make these changes? A good starting point might be to review the exposed actions in your app. 92 | 93 | For example, in your deployment the Control Server has several exposed actions. You can see all the Control Server actions by first connecting to the VPN client for your deployment then using the OData API endpoint - you see it on the final page of your deployment - and browsing to: 94 | 95 | ``` 96 | https://.adminui.ciqsedw.ms:8081/odata/$metadata 97 | ``` 98 | 99 | You will see an XML document describing all the exposed Control Server endpoints. If you scan the XML, you can see all the exposed Control Server actions. 
For example: 100 | 101 | - CreateAndInitializeTabularModelTablePartition 102 | - UpdateRangeStatus 103 | - ClaimNextReadOnlyNodeToRestore 104 | - BackupOfPartitionStatesCompleted 105 | - CompleteRestoreForReadOnlyNode 106 | - FlipLdwState 107 | - FlipPdwState 108 | - FlipAliasNodeState 109 | - IsReadyToFlip 110 | - ReportTransitionStatus 111 | - RetryDwTableAvailabilityRangeJobs 112 | 113 | The next step would be to instrument the code as described in the previous section to only allow execution by users affiliated with a certain role. 114 | -------------------------------------------------------------------------------- /armTemplates/dc-deploy.json: -------------------------------------------------------------------------------- 1 | { 2 | "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#", 3 | "contentVersion": "1.0.0.0", 4 | "parameters": { 5 | "adminUsername": { 6 | "type": "string", 7 | "metadata": { 8 | "description": "The name of the Administrator of the new VM and Domain" 9 | }, 10 | "defaultValue": "adAdministrator" 11 | }, 12 | "adminPassword": { 13 | "type": "securestring", 14 | "metadata": { 15 | "description": "The password for the Administrator account of the new VM and Domain" 16 | } 17 | }, 18 | "domainName": { 19 | "type": "string", 20 | "metadata": { 21 | "description": "The FQDN of the AD Domain to be created" 22 | } 23 | }, 24 | "existingVirtualNetworkName": { 25 | "type": "string", 26 | "metadata": { 27 | "description": "Name of the existing VNET" 28 | } 29 | }, 30 | "existingVirtualNetworkAddressRange": { 31 | "type": "string", 32 | "metadata": { 33 | "description": "Address range of the existing VNET" 34 | } 35 | }, 36 | "dcSubnetName": { 37 | "type": "string", 38 | "metadata": { 39 | "description": "Name of the existing subnet for Domain Controller" 40 | } 41 | }, 42 | "existingSubnetAddressRange": { 43 | "type": "string", 44 | "metadata": { 45 | "description": "Address range of the existing subnet" 46 | } 47 | }, 48 | "dcVmName": { 49 | "type": "string", 50 | "metadata": { 51 | "description": "The name of the Azure VM which will serve as the Domain Controller and the DNS server" 52 | } 53 | }, 54 | "_artifactsLocation": { 55 | "type": "string", 56 | "metadata": { 57 | "description": "The location of resources, such as templates and DSC modules, that the template depends on" 58 | }, 59 | "defaultValue": "https://raw.githubusercontent.com/Azure/azure-quickstart-templates/master/active-directory-new-domain-ha-2-dc" 60 | }, 61 | "_artifactsLocationSasToken": { 62 | "type": "securestring", 63 | "metadata": { 64 | "description": "Auto-generated token to access _artifactsLocation" 65 | }, 66 | "defaultValue": "" 67 | } 68 | }, 69 | "variables": { 70 | "vhdStorageAccountName": "[concat('vhds', uniqueString(resourceGroup().id))]", 71 | "storageAccountType": "Premium_LRS", 72 | "dcVMName": "[parameters('dcVmName')]", 73 | "imagePublisher": "MicrosoftWindowsServer", 74 | "imageOffer": "WindowsServer", 75 | "imageSKU": "2016-Datacenter", 76 | "dcVMSize": "Standard_DS2_v2", 77 | "dcPDCNicName": "adPDCNic", 78 | "dcSubnetRef": "[resourceId('Microsoft.Network/virtualNetworks/subnets', parameters('existingVirtualNetworkName'), parameters('dcSubnetName'))]", 79 | "dcDataDisk": "DCDataDisk", 80 | "dcDataDiskSize": 1000, 81 | "dcModulesURL": "[concat(parameters('_artifactsLocation'),'/DSC/CreateADPDC.zip', parameters('_artifactsLocationSasToken'))]", 82 | "dcConfigurationFunction": "CreateADPDC.ps1\\CreateADPDC", 83 | "vnetwithDNSTemplateUri": 
"[concat(parameters('_artifactsLocation'),'/nestedtemplates/vnet-with-dns-server.json', parameters('_artifactsLocationSasToken'))]" 84 | }, 85 | "resources": [ 86 | { 87 | "name": "[variables('vhdStorageAccountName')]", 88 | "type": "Microsoft.Storage/storageAccounts", 89 | "apiVersion": "2016-05-01", 90 | "location": "[resourceGroup().location]", 91 | "properties": { 92 | }, 93 | "sku": { "name": "[variables('storageAccountType')]" }, 94 | "kind": "Storage" 95 | }, 96 | { 97 | "name": "[variables('dcPDCNicName')]", 98 | "type": "Microsoft.Network/networkInterfaces", 99 | "apiVersion": "2016-10-01", 100 | "location": "[resourceGroup().location]", 101 | "dependsOn": [ 102 | ], 103 | "properties": { 104 | "ipConfigurations": [ 105 | { 106 | "name": "ipconfig1", 107 | "properties": { 108 | "privateIPAllocationMethod": "Dynamic", 109 | "subnet": { 110 | "id": "[variables('dcSubnetRef')]" 111 | } 112 | } 113 | } 114 | ] 115 | } 116 | }, 117 | { 118 | "name": "[variables('dcVMName')]", 119 | "type": "Microsoft.Compute/virtualMachines", 120 | "apiVersion": "2016-03-30", 121 | "location": "[resourceGroup().location]", 122 | "dependsOn": [ 123 | "[resourceId('Microsoft.Storage/storageAccounts',variables('vhdStorageAccountName'))]", 124 | "[resourceId('Microsoft.Network/networkInterfaces',variables('dcPDCNicName'))]" 125 | ], 126 | "properties": { 127 | "hardwareProfile": { 128 | "vmSize": "[variables('dcVMSize')]" 129 | }, 130 | "osProfile": { 131 | "computerName": "[variables('dcVMName')]", 132 | "adminUsername": "[parameters('adminUsername')]", 133 | "adminPassword": "[parameters('adminPassword')]" 134 | }, 135 | "storageProfile": { 136 | "imageReference": { 137 | "publisher": "[variables('imagePublisher')]", 138 | "offer": "[variables('imageOffer')]", 139 | "sku": "[variables('imageSKU')]", 140 | "version": "latest" 141 | }, 142 | "osDisk": { 143 | "name": "osdisk", 144 | "vhd": { 145 | "uri": "[concat(reference(resourceId('Microsoft.Storage/storageAccounts/', variables('vhdStorageAccountName'))).primaryEndpoints.blob,'vhds0/','osdisk.vhd')]" 146 | }, 147 | "caching": "ReadWrite", 148 | "createOption": "FromImage" 149 | }, 150 | "dataDisks": [ 151 | { 152 | "vhd": { 153 | "uri": "[concat(reference(resourceId('Microsoft.Storage/storageAccounts/', variables('vhdStorageAccountName'))).primaryEndpoints.blob,'vhds0/', variables('dcDataDisk'),'.vhd')]" 154 | }, 155 | "name": "[concat(variables('dcVMName'),'-data-disk1')]", 156 | "caching": "None", 157 | "diskSizeGB": "[variables('dcDataDiskSize')]", 158 | "lun": 0, 159 | "createOption": "empty" 160 | } 161 | ] 162 | }, 163 | "networkProfile": { 164 | "networkInterfaces": [ 165 | { 166 | "id": "[resourceId('Microsoft.Network/networkInterfaces',variables('dcPDCNicName'))]" 167 | } 168 | ] 169 | } 170 | }, 171 | "resources": [ 172 | { 173 | "name": "CreateADForest", 174 | "type": "extensions", 175 | "apiVersion": "2016-03-30", 176 | "location": "[resourceGroup().location]", 177 | "dependsOn": [ 178 | "[resourceId('Microsoft.Compute/virtualMachines', variables('dcVMName'))]" 179 | ], 180 | "properties": { 181 | "publisher": "Microsoft.Powershell", 182 | "type": "DSC", 183 | "typeHandlerVersion": "2.19", 184 | "autoUpgradeMinorVersion": true, 185 | "settings": { 186 | "ModulesUrl": "[variables('dcModulesURL')]", 187 | "ConfigurationFunction": "[variables('dcConfigurationFunction')]", 188 | "Properties": { 189 | "DomainName": "[parameters('domainName')]", 190 | "AdminCreds": { 191 | "UserName": "[parameters('adminUserName')]", 192 | "Password": 
"PrivateSettingsRef:AdminPassword" 193 | } 194 | } 195 | }, 196 | "protectedSettings": { 197 | "Items": { 198 | "AdminPassword": "[parameters('adminPassword')]" 199 | } 200 | } 201 | } 202 | } 203 | ] 204 | } 205 | ], 206 | 207 | "outputs": { 208 | "DCIp": { 209 | "type": "string", 210 | "value": "[reference(variables('dcPDCNicName')).ipConfigurations[0].properties.privateIPAddress]" 211 | } 212 | } 213 | } 214 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # [Technical Reference Implementation for Enterprise BI and Reporting](https://gallery.cortanaintelligence.com/Solution/Enterprise-Reporting-and-BI-Technical-Reference-Implementation-2) 3 | 4 | [![Deploy to Azure](https://raw.githubusercontent.com/Azure/Azure-CortanaIntelligence-SolutionAuthoringWorkspace/master/docs/images/DeployToAzure.PNG)](https://start.cortanaintelligence.com/track/Deployments/new/enterprisebiandreporting?source=GitHub) 5 | 6 | View deployed solution > 7 | 8 | ## Summary 9 | 10 | This solution creates a reference implementation for an end-to-end BI and Reporting platform built using SQL Data Warehouse, SQL Server Analysis Services, SQL Server Reporting Services, Power BI, and custom solutions for job management and coordination. Customers who are planning to build such a system for their enterprise applications, or onboarding their on-prem solution can use this TRI to get a jumpstart on the infrastructure implementation, and customize it for their needs. Systems Integrators can partner with us to use this reference implementation to accelerate their customer deployments. 11 | 12 | 13 | ## Description 14 | 15 | #### Estimated Provisioning Time: 2 Hours 16 | 17 | Azure offers a rich data and analytics platform for customers and ISVs seeking to build scalable BI and reporting solutions. However, customers face pragmatic challenges in building the right infrastructure for enterprise-grade production systems. They have to evaluate the various products for security, scale, performance and geo-availability requirements. They have to understand service features and their interoperability, and they must plan to address any perceived gaps using custom software. This takes time, effort, and many times, the end-to-end system architecture they design is sub-optimal. Consequently, the promise and expectations set during proof-of-concept (POC) stages do not translate to robust production systems in the expected time-to-market. 18 | 19 | This TRI addresses this customer pain by providing a reference implementation that is: 20 | - pre-built based on selected and stable Azure components proven to work in enterprise BI and reporting scenarios 21 | - easily configured and deployed to an Azure subscription within a few hours 22 | - bundled with software to handle all the operational essentials for a full-fledged production system 23 | - tested end-to-end against large workloads 24 | 25 | Once deployed, the TRI can be used as-is, or customized to fit the application needs using the technical documentation that is provided with the TRI. This enables the customer to build the solution that delivers the business goals based on a robust and functional infrastructure. 26 | 27 | ## Audience 28 | 29 | It is recommended that the TRI is reviewed and deployed by a person who is familiar with operational concepts of data warehousing, business intelligence, and analytics. 
Knowledge of Azure is a plus, but not mandatory. The technical guides provide pointers to Azure documentation for all the resources employed in this TRI. 30 | 31 | >Note: Connect with one of our Advanced Analytics partners to arrange a proof of concept in your environment: [Artis Consulting](http://www.artisconsulting.com/) 32 | 33 | ## Architecture 34 | 35 | ![Architecture](./img/azure-arch-enterprise-bi-and-reporting.png) 36 | 37 | 38 | ### How the TRI works, in a Nutshell 39 | 40 | TRI has 4 segments – Ingestion, Processing, Analysis & Reporting, and Consumption 41 | 42 | 1. A data generator, provided in place of the customer's data source, queries the job manager for a staging [Azure Blob](https://docs.microsoft.com/en-us/azure/storage/) storage. The job manager returns the handle to an ephemeral BLOB, and the data generator pushes data files into this storage. The TRI is designed with _a key assumption that the data that is to be ingested into the system has been ETL-processed_ for reporting and analytics. 43 | 2. When the job manager detects fresh data in the Azure Blob, it creates a dynamic one-time [Azure data factory](https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-introduction) pipeline to load the data from the Blob into a _logical_ _Loader_ SQL Data Warehouse (DW) using [Polybase](https://docs.microsoft.com/en-us/sql/relational-databases/polybase/get-started-with-polybase). The logical SQL DW is a design provided by the TRI for scalability and performance, to allow multiple physical [SQL Data Warehouse](https://docs.microsoft.com/en-us/azure/sql-data-warehouse/) replicas to handle large-scale concurrent queries. 44 | 3. Interactive BI is best served by cached analytical models to enable fast drilldowns for summaries aggregated over various dimensions or pivots. As data lands in DW tables, a [SSAS partition builder](https://docs.microsoft.com/en-us/sql/analysis-services/multidimensional-models-olap-logical-cube-objects/partitions-analysis-services-multidimensional-data) starts refreshing the tabular models that are dependent on these tables. 45 | 4. After a preconfigured duration, the job manager flips the _Loader_ DW to become the _Reader_ DW, ready to serve queries for report generation. The current _Reader_ flips to become the _Loader_, and the job manager starts data load on the _Loader_. This _Loader_-_Reader_ pairing is another design provided by the TRI for scalability and performance of queries against the data warehouse. When this flip happens, the Partition Builder commits the tabular models into files in a Blob. This Blob is then loaded into 46 | an [availability-set of SSAS read-only servers](https://docs.microsoft.com/en-us/sql/analysis-services/instances/high-availability-and-scalability-in-analysis-services). 47 | 5. You can set up a Power BI gateway to the SSAS Read-Only nodes, enabling Power BI dashboards to access the tabular models. 48 | 6. For reporting, SSRS generates the reports from data in the SQL DW via SSAS Direct Query. SSAS also offers row-level security for the data fetched from SQL DW. 49 | 7. You can schedule report generation with SSRS using the Report Builder client tool. The generated reports are stored in SSRS servers. You can enable email-based delivery of reports to users. 50 | 8. The end-to-end system can be managed via an Administrative web app. 
The TRI is configured with [Azure Active Directory](https://docs.microsoft.com/en-us/azure/active-directory/), which you can use to control access to the apps (Admin UI and Control Server) and underlying Azure resources for the users of your system, and [OMS](https://docs.microsoft.com/en-us/azure/operations-management-suite/operations-management-suite-overview), which enables you to monitor the individual components in the Azure resource group. 51 | 52 | ## User's Guide 53 | 54 | Please follow the step by step instructions in the [User's Guide](https://github.com/Azure/azure-arch-enterprise-bi-and-reporting/blob/master/User%20Guides/UsersGuide-TOC.md) to deploy and operationalize the TRI. 55 | 56 | ## Technical Guides 57 | 58 | The design and operational details of the main data components of the TRI are provided in the [Technical Guide](https://github.com/Azure/azure-arch-enterprise-bi-and-reporting/blob/master/Technical%20Guides/TechnicalGuide-ToC.md). 59 | 60 | ## Resource Consumption 61 | 62 | In preparation for a deployment, please be aware that the default configuration will use the following resources. The solution may be configured to use a different number of resources during deployment. Please adjust your quotas accordingly. 63 | 64 | - 10 Virtual Machines 65 | - 20 Cores 66 | - 8 Storage Accounts 67 | - 3 Sql Servers 68 | - 2 Sql Data Warehouses 69 | - 1 Sql Database 70 | - 6 OMS Solutions 71 | - 6 Network Security Groups 72 | - 10 Network Interfaces 73 | - 1 Log Analytics 74 | - 6 Load Balancers 75 | - 1 Data Factory 76 | - 1 Batch Account 77 | - 6 Availability Sets 78 | - 1 Automation Account 79 | - 1 Application Insight 80 | - 1 App Service Plan 81 | - 1 App Service 82 | 83 | 84 | ## Disclaimer 85 | 86 | ©2017 Microsoft Corporation. All rights reserved. This information is provided "as-is" and may change without notice. Microsoft makes no warranties, express or implied, with respect to the information provided here. Third party data was used to generate the solution. You are responsible for respecting the rights of others, including procuring and complying with relevant licenses in order to create similar datasets. 87 | 88 | -------------------------------------------------------------------------------- /Technical Guides/7-Understanding the job manager.md: -------------------------------------------------------------------------------- 1 | # Understanding the Job Manager 2 | 3 | The Job Manager is an ASP.NET Web application hosted on one of the Azure IaaS VMs provisioned during the deployment. Its purpose is to track and manage TRI data ingestion and Analysis Services tabular model building. Its state is persisted to an Azure SQL database and exposed via [OData](https://msdn.microsoft.com/en-us/library/hh525392(v=vs.103).aspx) REST APIs. 4 | 5 | The Job Manager's responsibilities fall into three broad areas: data ingestion coordination, Logical and Physical Data Warehouse state management (i.e. `Standby`, `Active` and `Load`), and coordination of Analysis Services tabular model building. This document discusses these areas in the context of Job Manager APIs. 6 | 7 | ## Job Manager REST Endpoint and OData Client Schema 8 | In order to access the Job Manager, please refer to the "OData API" section of the Cortana Intelligence Quick Start deployment summary page. It lists the URL for the Job Manager in the form of `https://.adminui.:8081/odata`. You can see the service client schema by calling GET `/odata/$metadata` API. 
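For example, once you have an AAD bearer token for the Control Server application (the Configure Data Ingestion user guide shows one way to obtain one with a client certificate), the metadata and any entity set can be fetched with plain REST calls. This is a minimal sketch: the host name is a placeholder, and `$authenticationHeader` is assumed to be built as in that guide.

```powershell
# Minimal sketch: query the Job Manager's OData endpoint.
# $authenticationHeader is assumed to contain a valid AAD bearer token for the
# Control Server app (see the Configure Data Ingestion user guide); the host name
# below is a placeholder for your deployment.

$ControlServerUri = 'https://<deploymentName>.adminui.<yourDomain>:8081'

# Service metadata: every entity set and action exposed by the Job Manager.
$metadataUri = $ControlServerUri + '/odata/$metadata'
Invoke-RestMethod -Uri $metadataUri -Method Get -Headers $authenticationHeader

# Example entity query: the currently active ephemeral storage account.
$storageUri = $ControlServerUri + '/odata/StorageAccounts?$filter=IsCurrent%20eq%20true'
(Invoke-RestMethod -Uri $storageUri -Method Get -Headers $authenticationHeader).value |
    Select-Object Name, SASToken
```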
9 | 
10 | ## Data Ingestion
11 | ### Ephemeral Storage Accounts
12 | [Ephemeral Storage Accounts](./1-Understanding%20ephemeral%20blobs.md) are managed by the Job Manager. At any point in time, clients can fetch the active Ephemeral Blob storage account and its SAS token by calling the `/StorageAccounts` OData API.
13 | 
14 | ### Data Warehouse Table Availability Ranges
15 | TRI clients create `DWTableAvailabilityRange` entities to signal the arrival of new data (see [Understanding Data Ingestion](./2-Understanding%20data%20ingestion.md)). Clients can monitor the status of their data imports by querying the `/DWTableAvailabilityRanges` OData API.
16 | 
17 | ### Job Runtime Policy
18 | The Job Manager imposes certain policies on Load jobs. Those policies can be fetched by calling the `/RuntimePolicy` OData API. They can be modified by sending `PUT` or `PATCH` requests to the `/RuntimePolicy` OData API endpoint.
19 | 
20 | ### Runtime Tasks
21 | Once a `DWTableAvailabilityRange` entity is created, a background process running inside the Job Manager creates an Azure Data Factory pipeline to ingest the data from the Ephemeral Blob Storage account into each of the Physical Data Warehouses. One `RuntimeTask` corresponds to one Azure Data Factory pipeline. Clients can monitor their status by calling the `/RuntimeTasks` OData API.
22 | 
23 | ### Runtime Task Policy
24 | The Job Manager can enforce certain policies on the Runtime Tasks above. Clients can query the `/RuntimeTaskPolicy` OData API to see the default policies. The defaults can be changed by sending `PUT` or `PATCH` requests to the `/RuntimeTaskPolicy` OData API endpoint.
25 | 
26 | ### Data Warehouse Tables
27 | The Job Manager tracks references to Physical Data Warehouse tables. Clients can query the `/DWTables` OData API to see the tables, table types (i.e. `Fact`, `Dimension` or `Aggregate`), and so on. When a new table is created in the Physical Data Warehouse, clients must create a `DWTable` entity by sending a `POST` request to the `/DWTable` OData API. Similarly, when a client drops a table in the Physical Data Warehouse, they must delete the corresponding `DWTable` entity by sending a `DELETE` request to the `/DWTable` OData API endpoint.
28 | 
29 | ### Data Warehouse Table Dependencies
30 | The Job Manager enables clients to declare dependencies between Physical Data Warehouse tables. When a dependency is declared between two Data Warehouse tables, the Job Manager ensures that the given table is loaded into the Physical Data Warehouse only after all of its dependencies have been loaded for the specified time interval. Clients can query dependencies by calling the `/DWTableDependencies` OData API. Clients can declare a dependency by sending a `POST` request to the `/DWTableDependencies` OData API endpoint.
31 | 
32 | ### Stored Procedures
33 | In order to generate data for aggregate tables, `StoredProcedure` entities can be associated with fact tables. The Job Manager ensures that once a given fact table is loaded, the stored procedure is invoked to generate data for the aggregate table. Note that the Job Manager only stores the mappings between stored procedures and Data Warehouse tables. Users will need to create the stored procedures in each Physical Data Warehouse.
34 | 
35 | Clients can query existing stored procedures by calling the `/StoredProcedures` OData API. After the stored procedure is created in each Physical Data Warehouse, clients can create a `StoredProcedure` entity by sending a `POST` request to the `/StoredProcedures` OData API endpoint.
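For instance, after creating a new fact table in each Physical Data Warehouse (see the Prepare the infrastructure for your Data user guide), a `DWTable` entity can also be created over OData. The sketch below is illustrative only: the property names are assumptions modeled on the `dbo.DWTables` columns used elsewhere in this TRI, and `$ControlServerUri`/`$authenticationHeader` are assumed to be set up as in the Configure Data Ingestion user guide; confirm the exact contract via `GET /odata/$metadata`.

```powershell
# Minimal sketch: register a newly created warehouse table by POSTing a DWTable entity.
# Property names below are assumptions based on the dbo.DWTables columns shown in the
# user guides; the document refers to the /DWTable endpoint for POST -- confirm both
# against GET /odata/$metadata before use. $ControlServerUri and $authenticationHeader
# are assumed to be set up as in the Configure Data Ingestion user guide.

$newTable = @{
    Name     = 'dbo.MyNewFactTable'   # placeholder table name
    Type     = 'Fact'                 # 'Fact', 'Dimension' or 'Aggregate'
    RunOrder = 0
    LoadUser = 'edw_loader_mdrc'      # the load user configured for your deployment
} | ConvertTo-Json

Invoke-RestMethod -Uri ($ControlServerUri + '/odata/DWTable') -Method Post `
    -Body $newTable -ContentType 'application/json' -Headers $authenticationHeader
```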
36 | 37 | ## Logical and Physical Data Warehouse state management 38 | As discussed in [Understanding Logical Data Warehouses](./4-Understanding%20logical%20datawarehouses.md) and [Understanding Data Warehouse Flip](./5-Understanding%20data%20warehouse%20flip.md), the Job Manager is responsible for maintaining the mapping between Logical and Physical Data Warehouses, as well as their states. 39 | 40 | ### Logical to Physical Data Warehouse mappings 41 | Clients can find out the mapping between Logical and Physical Data Warehouses by calling `/LDWPDWMappings` OData API. 42 | 43 | ### Logical Data Warehouse state history 44 | Clients can audit Logical and Physical Data Warehouse state transition history by calling `/DWStatesHistory` OData API. 45 | 46 | ### Data Warehouse flip intervals 47 | As discussed in [Understanding Data Warehouse Flip](./5-Understanding%20data%20warehouse%20flip.md), a series of operations is required to be performed during the Data Warehouse flip. While the `/DWStatesHistory` OData API lists the times and the statuses of each Logical and Physical Data Warehouse state transition, `/LDWExpectedStates` API lists expected flip intervals for each of the Logical Data Warehouses; i.e. what state a given Logical Data Warehouse should be in at a given time. 48 | 49 | ## Analysis Service Direct Query node management 50 | ### Analysis Services Direct Query node to Physical Data Warehouse mapping 51 | As discussed in [Understanding Data Warehouse Flip](./5-Understanding%20data%20warehouse%20flip.md), Analysis Services Direct Query nodes run a daemon which ensures that the nodes are always connected to the Data Warehouse currently set in the `Active` state. During the Data Warehouse Flip, the daemons will update to point to the Logical Data Warehouse in `Active` state and call a Job Manager API to report which Physical Data Warehouse the node is connected to. 52 | 53 | This ensures that the Job Manager is always aware of which Physical Data Warehouses are connected to the Analysis Services Direct Query nodes and can perform data warehouse state transition accordingly. 54 | 55 | Clients can see which Physical Data Warehouse each of the Direct Query nodes is connected by calling `/PdwAliasNodeStates` OData API. 56 | 57 | ## Analysis Services tabular model building 58 | As discussed in [Understanding Tabular Model Refresh](./6-Understanding%20tabular%20model%20refresh.md#tabular-model-partition-state-transition), Analysis Services Partition Builder nodes build tabular models to be consumed by the Analysis Services Read-Only nodes. The Job Manager orchestrates tabular model building and Read-Only node refresh by exposing a set of OData APIs. A daemon running on the Analysis Services Partition Builder node calls a set of Job Manager APIs to query for and report on the progress of partition building. 59 | 60 | ### Analysis Services tabular models 61 | Clients can call `/TabularModels` OData API to fetch the list of tabular models with their server and database names. Partition Builder uses these server and database names to connect to the data source. 62 | 63 | ### Analysis Services tabular model partitions 64 | Clients can call `/TabularModelTablePartitions` OData API to fetch the list of tabular model table partitions. For an in-depth discussion on their meaning, please refer to [Tabular model configuration for continuous incremental refresh at scale](./6-Understanding%20tabular%20model%20refresh.md#tabular-model-configuration-for-continuous-incremental-refresh-at-scale). 
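If you want to check how far partition building has progressed for a specific model, this entity set can be filtered with a standard OData query. A minimal sketch follows; the filter property name (`TabularModelName`) and the model name are assumptions, so confirm the real property names via `GET /odata/$metadata`. `$ControlServerUri` and `$authenticationHeader` are assumed as in the earlier examples.

```powershell
# Minimal sketch: list the table partitions tracked for one tabular model.
# The filter property (TabularModelName) and the model name are assumptions --
# confirm the entity's actual properties via GET /odata/$metadata first.

$partitionsUri = $ControlServerUri +
    "/odata/TabularModelTablePartitions?`$filter=TabularModelName%20eq%20'MyTabularModel'"
(Invoke-RestMethod -Uri $partitionsUri -Method Get -Headers $authenticationHeader).value |
    Format-Table
```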
65 | 66 | ### Analysis Services tabular model partition states 67 | Clients can call `/TabularModelTablePartitionStates` OData API to fetch the status (i.e. `Queued`, `Dequeued`, `Processed`, `Ready`, `Purged`) of each of the tables in the tabular model. This API effectively shows the status of Analysis Services Partition Builder. 68 | 69 | ### Analysis Services tabular model node assignments 70 | Once the Partion Builder finishes building tabular model table partitions, Analysis Services Read-Only nodes must refresh the models. Daemons running on Analysis Services Read-Only nodes will call the Job Manager APIs to check if updates are available. Upon updating the model, the daemon will call a Job Manager API to mark the node as updated. 71 | 72 | Clients can call `/TabularModelNodeAssignments` OData API to find the latest partition for each tabular model table and each Analysis Services Read-only node. 73 | 74 | # The Job Manager Properties and Miscellaneous APIs 75 | 76 | ## Job Manager Status API 77 | Clients can use `/ServerStatus` OData API as an HTTP ping function to ensure that the Job Manager is up and serving requests. This could be useful for setting up external monitoring and HTTP probes. 78 | 79 | ## Job Manager Properties 80 | Clients can query and update the Job Manager properties by calling `/ControlServerProperties` API. The table below summarizes the properties and their meaning. 81 | 82 | | Property name | Description | 83 | |:----------|:------------| 84 | |**ComputeUnits_Active**| Compute Data Warehouse Units for Physical Data Warehouses in `Active` state. Note that the value must match one of the values listed under [SQL Data Warehouse pricing](https://azure.microsoft.com/en-us/pricing/details/sql-data-warehouse/elasticity/). See [Data Warehouse Units (DWUs) and compute Data Warehouse Units (cDWUs)](https://docs.microsoft.com/en-us/azure/sql-data-warehouse/what-is-a-data-warehouse-unit-dwu-cdwu) | 85 | |**ComputeUnits_Load**| Compute Data Warehouse Units for Physical Data Warehouses in `Load` state. Note that the value must match one of the values listed under [SQL Data Warehouse pricing](https://azure.microsoft.com/en-us/pricing/details/sql-data-warehouse/elasticity/). | 86 | |**ComputeUnits_Standby**| Compute Data Warehouse Units for Physical Data Warehouses in `Standby` state. Note that the value must match one of the values listed under [SQL Data Warehouse pricing](https://azure.microsoft.com/en-us/pricing/details/sql-data-warehouse/elasticity/). | 87 | |**MinDQNodesNotInTransitionStateDuringFlip**| This parameter specifies the minimum number of Analysis Services Direct Query nodes that will serve traffic during the data warehouse flip. By default, it is set to 1. As a result, the Job Manager will guarantee that at least one DQ node will always be available to serve traffic, while the rest of the fleet is performing the flip. | 88 | |**MinsAliasNodeDaemonGraceTime**| This parameter specifies the number of minutes that a Direct Query node will wait for the existing connections to be closed before the data warehouse flip is initiated. By default, it is set to 3. | 89 | |**MinSSASROServersNotInTransition**| This parameter specifies the minimum number of Analysis Services Read-Only nodes that will serve traffic during the tabular model refresh. By default, it is set to 1. As a result, the Job Manager will ensure that at least on Read-Only node will be ready to serve traffic while the rest of the fleet is refreshing. 
| 90 | |**MinsToWaitBeforeKillingAliasNodeDaemonGraceTime**| This parameter specifies the number of minutes that each Analysis Services Direct Query node is given to perform the flip. If that time is exceeded, the Job Manager will terminate the flip. It is set to 10 minutes. | 91 | 92 | 93 | 94 | 95 | -------------------------------------------------------------------------------- /User Guides/7-Configure Data Ingestion.md: -------------------------------------------------------------------------------- 1 | # Configuring Data Ingestion 2 | 3 | # Summary 4 | This page provides the steps to configure data ingestion in the Enterprise Reporting and BI TRI. 5 | 6 | Once the TRI is deployed, these are your two options to ingest your ETL-processed data into the system: 7 | 1. Modify the code provided in the TRI to ingest your data 8 | 2. Integrate your existing pipeline and store into the TRI 9 | 10 | 11 | ## 1. Modify the code provided in the TRI to ingest your data 12 | 13 | 14 | The TRI deploys a dedicated VM for data generation, with a PowerShell script placed in the VM. This script gets called by the Job Manager at a regular cadence (that is configurable). You can modify this script as follows; 15 | 16 | Confirm that prerequisites are installed in the VM - Install **AzCopy** - if it is not already present in the VM (see [here](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy)). 17 | 18 | Modify the code as per your requirements and run it: The PowerShell script `GenerateAndUploadDataData.ps1` is located in the VM at `C:\Enterprise_BI_and_Reporting_TRI\DataGenerator`. Please note that this script generates and uploads data. 19 | 20 | See below for an example script which only uploads a file and registers with the job manager. 21 | 22 | The following information needs to be retrieved from your resource group. Log into Azure Portal and open the resource group where the solution was deployed. 23 | Go to the Automation account in the resource group and open the automation Account variables. Following are the variables which are needed in the script below or for your own script. 24 | 25 | | Name | Variable | 26 | | ---- | -------- | 27 | | ControlServerUri | `controlServerUri` | 28 | | CertThumbprint | `internalDaemonPfxCertThumbprint` | 29 | | AADTenantDomain | `adTenantDomain` | 30 | | ControlServerIdentifierUris | `adAppControlServerIdentifierUri` | 31 | | AADApplicationId | `adAppControlServerId` | 32 | 33 | Please adapt the below script as per your needs. 34 | 35 | 36 | ```Powershell 37 | # --------------------------------------------------------------------------------------------------------------------------------------------- 38 | # Name:upload_file_prod.ps1 39 | # Description: This script is use to upload a file to Blob and indicate to job manager that the file has been uploaded 40 | # The script accepts 3 parameters. TableName, FileName and the Rundate. Here assumption is that a file is sent for a 24hr period. 41 | # Steps: 42 | # 1. Authenticate 43 | # 2. Contact job manager to get the blob. 44 | # 3. Upload the file to the blob. 45 | # 4. 
Update job manager that upload has finished 46 | # 47 | # Sample Command line: .\upload_file_prod.ps1 -Tablename 'dbo.customer' -FileName 'customer.tbl.1' -RunDate '20171002' 48 | #----------------------------------------------------------------------------------------------------------------------------------------------- 49 | 50 | 51 | Param( 52 | [parameter(ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, Mandatory=$true, HelpMessage="Enter Table Name to populate")] 53 | [string]$Tablename, 54 | [parameter(ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, Mandatory=$true, HelpMessage="Enter filename to upload")] 55 | [string]$FileName, 56 | [parameter(ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, Mandatory=$true, HelpMessage="Enter the Rundate/dataset date in YYYYMMDD format.")] 57 | [string]$RunDate 58 | ) 59 | 60 | 61 | Function GetAccessTokenClientCertBased([string] $certThumbprint, [string] $tenant, [string] $resource, [string] $clientId) 62 | { 63 | [System.Reflection.Assembly]::LoadFile("$PSScriptRoot\Microsoft.IdentityModel.Clients.ActiveDirectory.dll") | Out-Null # adal 64 | [System.Reflection.Assembly]::LoadFile("$PSScriptRoot\Microsoft.IdentityModel.Clients.ActiveDirectory.Platform.dll") | Out-Null # adal 65 | $cert = Get-childitem Cert:\LocalMachine\My | where {$_.Thumbprint -eq $certThumbprint} 66 | $authContext = new-object Microsoft.IdentityModel.Clients.ActiveDirectory.AuthenticationContext("https://login.windows.net/$tenant") 67 | $assertioncert = new-object Microsoft.IdentityModel.Clients.ActiveDirectory.ClientAssertionCertificate($clientId, $cert) 68 | $result = $authContext.AcquireToken($resource, $assertioncert) 69 | $authHeader = @{ 70 | 'Content-Type' = 'application\json' 71 | 'Authorization' = $result.CreateAuthorizationHeader() 72 | } 73 | 74 | return $authHeader 75 | } 76 | 77 | 78 | 79 | Echo "Starting the Program" 80 | 81 | # Set the below variables as per your environment. 82 | # To get this values, login into Azure Portal and open the resource group where the solution was deployed. 83 | # Go to the Automation account in the resource group and open the automationAccount variables 84 | # 1. ControlServerUri : Variable Name- controlServerUri 85 | # 2. CertThumbprint: Variable Name - internalDaemonPfxCertThumbprint 86 | # 3. AADTenantDomain: Variable Name - adTenantDomain 87 | # 4. ControlServerIdentifierUris : Variable Name - adAppControlServerIdentifierUri 88 | # 5. 
AADApplicationId: Variable Name - adAppControlServerId 89 | 90 | 91 | $ControlServerUri = 'https://navigt115t8457.adminui.ciqsedw.ms:8081' 92 | $CertThumbprint = 'xxxxxxx' 93 | $AADTenantDomain = 'xxxxxx' 94 | $ControlServerIdentifierUris = 'http://microsoft.onmicrosoft.com/navigt115t8457ADAppControlServer' 95 | $AADApplicationId = 'xxxxxxx' 96 | 97 | 98 | 99 | # Obtain bearer authentication header 100 | $authenticationHeader = GetAccessTokenClientCertBased -certThumbprint $CertThumbprint ` 101 | -tenant $AADTenantDomain ` 102 | -resource $ControlServerIdentifierUris ` 103 | -clientId $AADApplicationId 104 | 105 | # Set the working directory 106 | $invocation = (Get-Variable MyInvocation).Value 107 | $directorypath = Split-Path $invocation.MyCommand.Path 108 | Set-Location $directorypath 109 | 110 | # Log file name 111 | $LOG_FILE='dataupload-log.txt' 112 | 113 | 114 | # Control server URI to fetch the storage details for uploading 115 | # Fetch only the current storage 116 | $getCurrentStorageAccountURI = $ControlServerUri + '/odata/StorageAccounts?$filter=IsCurrent%20eq%20true' 117 | 118 | # DWTableAvailabilityRanges endpoint 119 | $dwTableAvailabilityRangesURI = $ControlServerUri + '/odata/DWTableAvailabilityRanges' 120 | 121 | # The blob container to upload the datasets to 122 | $currentStorageSASURI = '' 123 | 124 | # set Path for On-prem data file location 125 | $source = "E:\projects\tri\tri1_testing" 126 | 127 | # AzCopy path. AzCopy must be installed. Update path if installed in non-default location 128 | $azCopyPath = "C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe" 129 | 130 | 131 | # Data contract for DWTableAvailabilityRanges' request body 132 | 133 | $dwTableAvailabilityRangeContract = @{ 134 | DWTableName="" 135 | StorageAccountName="" 136 | ColumnDelimiter="" 137 | FileUri="" 138 | StartDate="" 139 | EndDate="" 140 | FileType="" 141 | } 142 | 143 | 144 | $invocation = (Get-Variable MyInvocation).Value 145 | $directorypath = Split-Path $invocation.MyCommand.Path 146 | echo $directorypath > $LOG_FILE 147 | 148 | # Invoke the Job Manager to fetch the latest blob container to upload the blobs to 149 | try 150 | { 151 | $response = Invoke-RestMethod -Uri $getCurrentStorageAccountURI -Method Get -Headers $authenticationHeader 152 | if($response -and $response.value -and $response.value.SASToken){ 153 | $currentStorageSASURI = $response.value.SASToken 154 | $storageAccountName = $response.value.Name 155 | echo "Current storage location - $currentStorageSASURI" >> $LOG_FILE 156 | } else{ 157 | $errMessage = "Could not find SAS token in the response fron Control Server." + $response.ToString() 158 | echo $errMessage >> $LOG_FILE 159 | exit 1 160 | } 161 | } 162 | catch 163 | { 164 | echo "Error fetching current storage account from Control Server" >> $LOG_FILE 165 | echo $error >> $LOG_FILE 166 | exit 2 167 | } 168 | 169 | 170 | # Create a custom AzCopy log file stamped with current timestamp 171 | 172 | $azCopyLogFileName = [io.path]::combine($Env:TEMP, -join("AzCopy-", $((get-date).ToUniversalTime()).ToString("yyyyMMddThhmmssZ"), '.log')) 173 | If (Test-Path $azCopyLogFileName){ 174 | 175 | Remove-Item $azCopyLogFileName 176 | echo "Deleted existing AzCopy log file $azCopyLogFileName" >> $LOG_FILE 177 | } 178 | 179 | # Create empty log file in the same location 180 | $azCopyLogFile = New-Item $azCopyLogFileName -ItemType file 181 | 182 | # Execute AzCopy to upload files. 
183 | echo "Begin uploading data files to storage location " >> $LOG_FILE 184 | & "$azCopyPath" /source:""$source"" /Dest:""$currentStorageSASURI"" /Y /V:""$azCopyLogFile"" /Pattern:$FileName 185 | echo "Completed uploading data file $srcFileName to storage location" >> $LOG_FILE 186 | 187 | 188 | echo "Begin post process for - $FileName" 189 | $dwTableName = $Tablename 190 | 191 | # ************************************************************************************** 192 | # Date fields need special formatting. 193 | # 194 | # 1. Reconvert to OData supported DateTimeOffset format string using the 195 | # 's' and 'zzz' formatter options 196 | # ************************************************************************************** 197 | # Start date 198 | $startDateStr = $RunDate 199 | 200 | [datetime]$startDate = New-Object DateTime 201 | if(![DateTime]::TryParseExact($startDateStr, "yyyyMMddHH", [System.Globalization.CultureInfo]::InvariantCulture, [System.Globalization.DateTimeStyles]::AdjustToUniversal, [ref]$startDate)){ 202 | [DateTime]::TryParseExact($startDateStr, "yyyyMMdd", [System.Globalization.CultureInfo]::InvariantCulture, [System.Globalization.DateTimeStyles]::AdjustToUniversal, [ref]$startDate) 203 | } 204 | $startDateFormatted = $startDate.ToString("s") + $startDate.ToString("zzz") 205 | 206 | # End date is set to 24 hours . We are assuming the file is sent once per 24 hours. 207 | $endDate = $startDate.AddHours(24) 208 | 209 | $endDateFormatted = $endDate.ToString("s") + $endDate.ToString("zzz") 210 | 211 | $successFileUri = $currentStorageSASURI.Split("?") 212 | $successFileUri = -join($successFileUri[0],"/",$FileName) 213 | 214 | 215 | #Construct DWTableAvailabilityRange request body 216 | $dwTableAvailabilityRangeContract['DWTableName'] = $dwTableName 217 | $dwTableAvailabilityRangeContract['FileUri'] = $successFileUri 218 | $dwTableAvailabilityRangeContract['StorageAccountName'] = $storageAccountName 219 | $dwTableAvailabilityRangeContract['StartDate'] = $startDateFormatted 220 | $dwTableAvailabilityRangeContract['EndDate'] = $endDateFormatted 221 | $dwTableAvailabilityRangeContract['ColumnDelimiter'] = "|" 222 | $dwTableAvailabilityRangeContract['FileType'] = 'Csv' 223 | 224 | $dwTableAvailabilityRangeJSONBody = $dwTableAvailabilityRangeContract | ConvertTo-Json 225 | 226 | # Create DWTableAvailabilityRanges entry for the current file 227 | try 228 | { 229 | echo "Begin DWTableAvailabilityRanges creation for file - $fileNameSegment with body $dwTableAvailabilityRangeJSONBody" >> $LOG_FILE 230 | $response = Invoke-RestMethod $dwTableAvailabilityRangesURI -Method Post -Body $dwTableAvailabilityRangeJSONBody -ContentType 'application/json' -Headers $authenticationHeader 231 | echo "DWTableAvailabilityRanges creation successful" >> $LOG_FILE 232 | } 233 | catch 234 | { 235 | echo "Error creating DWTableAvailabilityRanges on Control Server" >> $LOG_FILE 236 | echo $error >> $LOG_FILE 237 | } 238 | 239 | exit 0 240 | 241 | ``` 242 | 243 | 244 | ## 2. Integrate your existing pipeline and store into the TRI 245 | 246 | In cases where you already have an ETL pipeline setup, you will need to do the following. 247 | 248 | 1. Use the above script as an example to get the blob account to upload. 249 | 2. Upload the File to the storage account using your pipeline 250 | 3. Inform the job manager that a file has been uploaded. 
251 | -------------------------------------------------------------------------------- /User Guides/4-Manage the Deployed Infrastructure.md: -------------------------------------------------------------------------------- 1 | # Manage the Deployed Infrastructure 2 | 3 | ## Summary 4 | Once the deployment has completed successfully, this guide shows you how to manage the deployment and change some of its key properties. 5 | 6 | # Table of Contents 7 | 1. [Change key properties of the deployment](#change-key-properties-of-the-deployment) 8 | 2. [Change the password to the data warehouse user account](#change-the-password-to-the-data-warehouse-user-account) 9 | 3. [Add more virtual machines to your SSRS scale-out set](#add-more-virtual-machines-to-your-ssrs-scale-out-set) 10 | 4. [Add more virtual machines to your SSAS Read-Only scale-out set](#add-more-virtual-machines-to-your-ssas-read-only-scale-out-set) 11 | 5. [Add more virtual machines to your SSAS Direct-Query scale-out set](#add-more-virtual-machines-to-your-ssas-direct-query-scale-out-set) 12 | 13 | ### **Change key properties of the deployment** 14 | 15 | The following data-driven values are tracked in the dbo.ControlServerProperties table in the Job Manager database (search for ControlServerDB in resources under your resource group) and can be edited to change the desired setting: 16 | 1. `ComputeUnits_Active` and `ComputeUnits_Load`: 17 | These two properties track the data warehouse compute units for the data warehouse that is in the active or load state respectively. Please exercise caution and pick an [allowable value](https://azure.microsoft.com/en-us/pricing/details/sql-data-warehouse/elasticity/) while also considering the pricing for the desired values. If the value is not allowed, it will be ignored in favor of a safe default, which will not help you achieve the desired scale. 18 | 19 | 2. `FlipInterval`: This number stands for the number of hours after which we will attempt to flip the state of the logical data warehouses (along with the physical data warehouses under them) from `Active` to `Load`. The exact flip event times are visible in the Admin UI as well as the dbo.DWStateHistories table in the Job Manager database. 20 | 21 | 3. `MinDQNodesNotInTransitionStateDuringFlip`: This number specifies the minimum number of Direct Query nodes in an `Analysis Services Direct Query (ASDQ)` group that should be in normal state serving queries during any flip operation. During the flip check for a Direct Query node, it is verified that at least {`MinDQNodesNotInTransitionStateDuringFlip`} Direct Query nodes are in normal state. If not, the node is not flipped and it waits for the condition to be satisfied before the flip can happen. If there is only a single node in an `ASDQ` group, this condition is ignored. This must be a positive integer that must not exceed the total number of Direct Query nodes in an `ASDQ` group. 22 | 23 | 4. `MinSSASROServersNotInTransition`: This is the minimum number of Read-Only nodes that should be in Active state at any time. This must be a positive integer that must not exceed the total number of Read-Only nodes. 24 | 25 | 5. `MinsAliasNodeDaemonGraceTime`: Time in minutes that a Direct Query node is given to drain existing connections during a flip operation. The connection string to the physical data warehouse is switched {`MinsAliasNodeDaemonGraceTime`} minutes after the flip operation is initiated for the node. 26 | 27 | 6.
`MinsToWaitBeforeKillingAliasNodeDaemonGraceTime`: If a Direct Query node does not respond within `MinsAliasNodeDaemonGraceTime` minutes to signal the completion of its connection string switch (for example, because the node went down), a sweeper job changes the state of the Direct Query node to `ChangeCompleted` after waiting for another {`MinsToWaitBeforeKillingAliasNodeDaemonGraceTime`} minutes and unblocks the flip operation. When the node comes back up, it polls its new connection string from the Control Server and is ready to serve requests. 28 | 29 | 7. `BatchUriLinkedService`: The URI of the batch linked service. 30 | 31 | ### **Change the password to the data warehouse user account** 32 | Access to the data warehouses is controlled by SQL logins. The passwords for these logins are managed in the key vault as the central authority. However, the password is also cached in other entities, which makes it tricky to change the password for the SQL logins while ensuring seamless execution of the data flow. The following are the places where you must change the password: 33 | 34 | 1. SQL Data Warehouse 35 | 36 | All the Azure SQL data warehouses must maintain the same password for any given user login. This is required since the SSAS Direct Query nodes issue queries using the same credentials. 37 | 38 | 2. Key vault 39 | 40 | All clients that access the database retrieve the password from the key vault. The passwords are stored as secrets. You will need to grant your account the right access policies in order to view and edit the secrets. 41 | 42 | **How to grant access to a user to view and edit the secrets** 43 | 44 | 1. Find the key vault resource using the search term 'keyvault' in your resource group. 45 | 2. Click on the `Access policies` blade and click `Add new`. 46 | 3. Click `Select principal`, find your user account, and click `Select`. 47 | 4. For `Secret permissions`, grant yourself permissions to `Get`, `Set` and `List` under `Secret Management Operations`. 48 | 5. Then press `OK` and `Save` again to save your changes. 49 | 50 | **How to change the password for the SQL login stored as a secret in the Key Vault** 51 | 52 | 1. Find the key vault resource using the search term 'keyvault' in your resource group. 53 | 2. Select `Secrets` under `SETTINGS` and find the name of the secret (usually ends in 'Password'). 54 | 3. Click `New Version` in the following screen and select `Manual` as `Upload options`. Supply the new password and click `Create`. 55 | 56 | 3. Linked Services in the Azure Data Factory 57 | 58 | Your resource group will have an Azure Data Factory that has linked services. There will be one linked service for each physical data warehouse. You will need to update the linked services after changing the password on the data warehouse since the linked services are not refreshed after creation. 59 | 60 | How to change the password for a linked service: 61 | 62 | 1. Find the Azure Data Factory resource using the search term 'LoadEDW' in your resource group. 63 | 2. Click on `Author and deploy`, which will open the Azure Data Factory in an editable mode. 64 | 3. Expand `Linked services` and find **all** the linked services that begin with 'azureSqlDataWarehouseLinkedService'. There should be one linked service per data warehouse. Edit the password field and click `Deploy`. 65 | 66 | ![Change password on ADF Linked Service that points to a data warehouse](../img/ChangeADFLinkedServicePassword.png) 67 | 68 | 4.
Connection on the SSAS Direct-Query nodes 69 | 70 | You can connect to the SSAS server on the SSAS Direct-Query virtual machines (SSAS Direct-Query nodes can be found by searching for resources using the search term 'ssasdq' in your resource group) and view the connections under each Analysis Services database. These connections point to individual physical data warehouses, and the passwords are cached until the next data warehouse flip event is triggered. Edit the connection and change the password to ensure that the queries from the SSAS Direct-Query nodes to the data warehouses do not fail during this time. 71 | 72 | ### **Add more virtual machines to your SSRS scale-out set** 73 | 74 | You can add new virtual machines of type SSRS if you want to scale out your SSRS nodes. 75 | 76 | 1. Add SSRS virtual machines: 77 | 1. Find your deployment resource group in the [Azure portal](https://portal.azure.com). 78 | 2. Add new VMs as needed along with network interfaces, join each VM to your Azure AD domain, [install the OMS extension](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/extensions-oms) to enable monitoring, and join the virtual machine to the SSRS availability set. Alternatively, you can search for the deployment (called 'deployVMSSRS') from the deployments in your resource group and redeploy the additional VMs by simply increasing the number of instances. 79 | 80 | 2. Register DSC on the new virtual machines: 81 | 1. Find your resource by searching for 'automationAccount' in your resource group. 82 | 2. Under `Configuration Management`, click on `DSC Nodes`, add your new Azure VM, and connect it to the 'DSCConfiguration.ssrs' node configuration. Choose 'ApplyAndMonitor' for the `Configuration Mode` and 'ContinueConfiguration' for the `Action after Reboot` while adding the node. Once the node is added, you will be able to find it listed under `DSC Nodes`. 83 | 84 | 3. Add the virtual machine to the SSRS scale-out set: 85 | 86 | This step will require you to download a custom script extension (ConfigureSSRS-CSE.ps1), which can be found in the 'ssrs' container in the artifacts storage account (search for 'edwartifacts' under your resource group), and use the default parameter set ('NonPrimarySSRS') since you are adding the VM to an existing scale-out set. 87 | 88 | ### **Add more virtual machines to your SSAS Read-Only scale-out set** 89 | 90 | You can add new virtual machines of type SSAS Read-Only if you want to scale out your SSAS Read-Only nodes. 91 | 92 | 1. Add SSAS Read-Only virtual machines: 93 | 1. Find your deployment resource group in the [Azure portal](https://portal.azure.com). 94 | 2. Add new VMs as needed along with network interfaces, join each VM to your Azure AD domain, [install the OMS extension](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/extensions-oms) to enable monitoring, and join the virtual machine to the SSAS Read-Only availability set. Alternatively, you can search for the deployment (called 'deployVMSSASRo') from the deployments in your resource group and redeploy the additional VMs by simply increasing the number of instances. 95 | 96 | 2. Register DSC on the new virtual machines: 97 | 1. Find your resource by searching for 'automationAccount' in your resource group. 98 | 2. Under `Configuration Management`, click on `DSC Nodes`, add your new Azure VM, and connect it to the 'DSCConfiguration.ssasro' node configuration.
Choose 'ApplyAndMonitor' for the `Configuration Mode` and 'ContinueConfiguration' for the `Action after Reboot` while adding the node. Once the node is added, you will be able to find it listed under `DSC Nodes`. 99 | 100 | 3. Populate the dbo.TabularModelNodeAssignments table in the Job Manager database (search for 'ControlServerDB' in the resources under your resource group). The data values depend on the Analysis Services tables that you intend to populate on the SSAS Read-Only virtual machine. You can examine the existing entries corresponding to existing nodes in this table for suitable examples. In order to catch up with the other Read-Only nodes, we recommend restoring backups of the databases from those nodes onto your newly added node. 101 | 102 | ### **Add more virtual machines to your SSAS Direct-Query scale-out set** 103 | 104 | You can add new virtual machines of type SSAS Direct-Query if you want to scale out your SSAS Direct-Query nodes. 105 | 106 | 1. Add SSAS Direct-Query virtual machines: 107 | 1. Find your deployment resource group in the [Azure portal](https://portal.azure.com). 108 | 2. Add new VMs as needed along with network interfaces, join each VM to your Azure AD domain, [install the OMS extension](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/extensions-oms) to enable monitoring, and join the virtual machine to the SSAS Direct-Query availability set. Alternatively, you can search for the deployment (called 'deployVMSSASDq') from the deployments in your resource group and redeploy the additional VMs by simply increasing the number of instances. 109 | 110 | 2. Register DSC on the new virtual machines: 111 | 1. Find your resource by searching for 'automationAccount' in your resource group. 112 | 2. Under `Configuration Management`, click on `DSC Nodes`, add your new Azure VM, and connect it to the 'DSCConfiguration.ssasdq' node configuration. Choose 'ApplyAndMonitor' for the `Configuration Mode` and 'ContinueConfiguration' for the `Action after Reboot` while adding the node. Once the node is added, you will be able to find it listed under `DSC Nodes`. 113 | 114 | 3. Populate the dbo.PDWAliasNodeStates table in the Job Manager database (search for 'ControlServerDB' in the resources under your resource group). You can examine the existing rows in this table for suitable examples. Pick an AliasName from the current set of aliases. In order to equally distribute the load among the physical data warehouses, we recommend that you target an even distribution of aliases across the rows in this table to the extent possible. 115 | -------------------------------------------------------------------------------- /User Guides/8-Configure SQL Server Analysis Services.md: -------------------------------------------------------------------------------- 1 | # Analysis Services for Interactive BI 2 | 3 | The TRI helps you operationalize and manage tabular models in Analysis Services (AS) for interactive BI. The read-only AS servers are configured to handle interactive BI query load from client connections via a frontend load balancer.
Analysis Services, tabular models, and their characteristics are explained [here](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-models-ssas). 4 | 5 | The SSAS Model Cache in this TRI consists of six components, whose roles are described in the [architectural overview](../README.md#architecture): 6 | - Tabular Models 7 | - SSAS Partition Builder 8 | - SSAS Read Only Cache servers 9 | - Job Manager that coordinates the tabular model refresh 10 | - Azure Blob that stores the tabular models for refresh 11 | - Load Balancers that handle client connections 12 | 13 | ![SSAS Tabular Models Cache](../img/SSAS-Model-Cache.png) 14 | 15 | ## Tabular Model Creation 16 | 17 | Tabular models are Analysis Services databases that run in-memory, or act as a pass-through for backend data sources. They support cached summaries and drilldowns of large amounts of data, thanks to a columnar storage that offers 10x or more data compression. This makes them ideal for interactive BI applications. See [this article](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-models-ssas) for more details. Typically, tabular models hold only a subset of the big data held in the upstream data warehouse or data marts – in terms of the number of entities and data size. There is a large corpus of best-practices information for tabular model design and tuning, including this [excellent article](https://msdn.microsoft.com/en-us/library/dn751533.aspx) on the lifecycle of an enterprise-grade tabular model. 18 | 19 | You can use tools such as the [SSDT tabular model designer](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-model-designer-ssas), available with Visual Studio 2015 (and later), to create your tabular models. Set the compatibility level of the tabular models at `1200` or higher (latest is `1400`, as of this writing) and the query mode to `In-Memory`. See [here](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-model-solutions-ssas-tabular) for details on tabular model creation. 20 | 21 | ## Tabular Model Partition Processing 22 | 23 | Partition creation is automated using the open source [AsPartitionProcessing tool](https://github.com/Microsoft/Analysis-Services/tree/master/AsPartitionProcessing). Many of the configurable options for partition building directly correspond to the configuration of this tool. Refer to the AsPartitionProcessing tool's [whitepaper](https://github.com/Microsoft/Analysis-Services/blob/master/AsPartitionProcessing/Automated%20Partition%20Management%20for%20Analysis%20Services%20Tabular%20Models.pdf) for further documentation. 24 | 25 | ## Tabular model configuration for continuous incremental refresh at scale 26 | The various orchestration components of the TRI refer to four configuration tables to enable continuous and incremental model refresh. 27 | 28 | You can provide configuration inputs for two of these tables: 29 | 30 | | TableName | Description | 31 | |:----------|:------------| 32 | |**TabularModel**|Lists tabular models with their server and database names, to be used by the daemon on the Partition Builder servers to connect to the SSAS server and the database for refresh.| 33 | |**TabularModelTablePartitions**|This table specifies which model a tabular model table is part of, and the source (DW) table to which it is bound.
It also provides the column that will be used in refreshing the tabular model, and the lower and upper bounds of the data held in this tabular model. It also defines the strategy for processing the SSAS partitions.| 34 | 35 | ### TabularModel 36 | Provide the unique <_server, database_> pairs in the table. This information uniquely identifies each tabular model for the Job Manager, and in turn, will be used by the daemon on the Partition Builder nodes to connect to the SSAS server and the database before refresh. 37 | 38 | _Example_: 39 | 40 | ```json 41 | { 42 | "AnalysisServicesServer":"ssaspbvm00", 43 | "AnalysisServicesDatabase":"AdventureWorks", 44 | "IntegratedAuth":true, 45 | "MaxParallelism":4, 46 | "CommitTimeout":-1 47 | } 48 | ``` 49 | 50 | * **AnalysisServicesServer** : SSAS VM name or Azure AS URL. 51 | * **AnalysisServicesDatabase** : Name of the database. 52 | * **IntegratedAuth** : Boolean flag whether connection to DW to be made using integrated authentication or SQL authentication. 53 | * **MaxParallelism** : Maximum number of threads on which to run processing commands in parallel during partition building. 54 | * **CommitTimeout** : Cancels processing (after specified time in seconds) if write locks cannot be obtained. -1 will use the server default. 55 | 56 | ### TabularModelTablePartitions 57 | 58 | _Example_: 59 | ```json 60 | { 61 | "AnalysisServicesTable":"FactResellerSales", 62 | "SourceTableName":"[dbo].[FactResellerSales]", 63 | "SourcePartitionColumn":"OrderDate", 64 | "TabularModel_FK":1, 65 | "DWTable_FK":"dbo.FactResellerSales", 66 | "DefaultPartition":"FactResellerSales", 67 | "ProcessStrategy":"ProcessDefaultPartition", 68 | "MaxDate":"2156-01-01T00:00:00Z", 69 | "LowerBoundary":"2010-01-01T00:00:00Z", 70 | "UpperBoundary":"2011-12-31T23:59:59Z", 71 | "Granularity":"Daily", 72 | "NumberOfPartitionsFull":0, 73 | "NumberOfPartitionsForIncrementalProcess":0 74 | } 75 | ``` 76 | 77 | * **AnalysisServicesTable** : The table to be partitioned. 78 | * **SourceTableName** : The source table in the DW database. 79 | * **SourcePartitionColumn** : The source column of the source table. 80 | * **TabularModel_FK** : Foreign Key reference to the TabularModel. 81 | * **DWTable_FK** : Foreign Key reference to the DWTable. 82 | * **DefaultPartition** : 83 | * **MaxDate** : The maximum date that needs to be accounted for in the partitioning configuration. 84 | * **LowerBoundary** : The lower boundary of the partition date range. 85 | * **UpperBoundary** : The upper boundary of the partition date range. 86 | * **ProcessStrategy** : Strategy used for processing the partition; "RollingWindow" or "ProcessDefaultPartition". The default partition can be specified using the "DefaultPartition" property. Otherwise assumes a "template" partition with same name as table is present. 87 | * **Granularity** : Partition granularity of "Daily", "Monthly", or "Yearly". 88 | * **NumberOfPartitionsFull** : Count of all partitions in the rolling window. For example, a rolling window of 10 years partitioned by month would require 120 partitions. 89 | * **NumberOfPartitionsForIncrementalProcess** : Count of hot partitions where the data can change. For example, it may be necessary to refresh the most recent 3 months of data every day. This only applies to the most recent partitions. 
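Both of these configuration tables live in the Job Manager database (ControlServerDB). As a hedged illustration of how the rows above could be provided, the sketch below uses `Invoke-Sqlcmd`; it assumes the tables are named dbo.TabularModel and dbo.TabularModelTablePartitions and that their column names mirror the JSON fields documented above. Verify the actual schema in your own ControlServerDB first, and substitute your own server, database, and model values.

```powershell
# Hedged sketch: seed the TabularModel and TabularModelTablePartitions configuration
# tables in the Job Manager database. Column names are assumed to mirror the JSON
# fields documented above; check the schema of your ControlServerDB before running.

$server   = 'controlserversql.database.windows.net'   # placeholder
$database = 'ControlServerDB'                          # placeholder

$query = @"
INSERT INTO dbo.TabularModel
    (AnalysisServicesServer, AnalysisServicesDatabase, IntegratedAuth, MaxParallelism, CommitTimeout)
VALUES
    ('ssaspbvm00', 'AdventureWorks', 1, 4, -1);

INSERT INTO dbo.TabularModelTablePartitions
    (AnalysisServicesTable, SourceTableName, SourcePartitionColumn, TabularModel_FK, DWTable_FK,
     DefaultPartition, ProcessStrategy, MaxDate, LowerBoundary, UpperBoundary,
     Granularity, NumberOfPartitionsFull, NumberOfPartitionsForIncrementalProcess)
VALUES
    ('FactResellerSales', '[dbo].[FactResellerSales]', 'OrderDate', 1, 'dbo.FactResellerSales',
     'FactResellerSales', 'ProcessDefaultPartition', '2156-01-01', '2010-01-01', '2011-12-31 23:59:59',
     'Daily', 0, 0);
"@

# Requires the SqlServer (or SQLPS) module for Invoke-Sqlcmd.
Invoke-Sqlcmd -ServerInstance $server -Database $database -Query $query
```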
90 | 91 | 92 | Provide one of the following values for the partitioning strategy in the column _ProcessStrategy_: 93 | 94 | - _ModelProcessStrategy.ProcessDefaultPartition_ (Default) 95 | - _ModelProcessStrategy.RollingWindow_ 96 | 97 | If you choose _ModelProcessStrategy.ProcessDefaultPartition_: 98 | 99 | - Confirm that the tabular model contains a partition with the same name as the tabular model table. This partition is always used for the data load of the date slice, even if there are other partitions in the table. 100 | - Provide values for _SourceTableName_ and _SourcePartitionColumn_. 101 | - Provide values for _LowerBoundary_ and _UpperBoundary_ to define the start and end time span for the data in the tabular model. 102 | 103 | If you choose _ModelProcessStrategy.RollingWindow_: 104 | - Confirm that the table partitions are defined on time-granularity-based ranges - Daily, Monthly, or Yearly. 105 | - Provide values for the columns _MaxDate_, _NumberOfPartitionsFull_ and _NumberOfPartitionsForIncrementalProcess_. 106 | 107 | In both cases, confirm that the value provided in _SourcePartitionColumn_ of the table represents a column of type DateTime in the source DW table. Tabular model refresh is incremental on time, for a specific date range. 108 | 109 | Next, the following two tables are **read-only**. You should **not** change or update these tables or their values, but you can view them for troubleshooting and/or understanding how the model refresh happens. 110 | 111 | | TableName | Description | 112 | |:----------|:------------| 113 | |**TabularModelPartitionStates**|In this table, the Job Manager tracks the source and target context for all data slices that are to be refreshed or processed in a tabular model, the start and end dates of each data slice, and the Blob URI where the processed tabular model backups will be stored.| 114 | |**TabularModelNodeAssignments**|In this table, the partition builder tracks the refresh state of each AS Read-Only node for each tabular model. It is used to indicate the maximum date for an entity for which the data has been processed. Each of the SSAS Read-Only nodes provides its current state here - in terms of latest data by date for every entity.| 115 | 116 | ### TabularModelPartitionStates 117 | 118 | This table helps track all the data slices that are to be refreshed or processed in a tabular model. Each piece of incremental data loaded into the DW is specified with a start and end date, defining the data slice. Each new data slice will trigger a new partition to be built.
119 | 120 | --- 121 | _Example_: 122 | ```json 123 | { 124 | "ProcessStatus":"Purged", 125 | "TabularModelTablePartition_FK":4, 126 | "StartDate":"2017-09-12T18:00:00Z", 127 | "EndDate":"2017-09-12T21:00:00Z", 128 | "PartitionUri":"https://edw.blob.core.windows.net/data/AdventureWorks-backup-20170912T090408Z.abf", 129 | "ArchiveUri":"https://edw.blob.core.windows.net/data", 130 | "SourceContext":"{\"Name\":\"AzureDW\",\"Description\":\"Data source connection Azure SQL DW\",\"DataSource\":\"bd044-pdw01-ldw01.database.windows.net\",\"InitialCatalog\":\"dw\",\"ConnectionUserName\":\"username\",\"ConnectionUserPassword\":\"password\",\"ImpersonationMode\":\"ImpersonateServiceAccount\"}", 131 | "TargetContext":"{\"ModelConfigurationID\":1,\"AnalysisServicesServer\":\"ssaspbvm00\",\"AnalysisServicesDatabase\":\"AdventureWorks\",\"ProcessStrategy\":1,\"IntegratedAuth\":true,\"MaxParallelism\":4,\"CommitTimeout\":-1,\"InitialSetup\":false,\"IncrementalOnline\":true,\"TableConfigurations\":[{\"TableConfigurationID\":1,\"AnalysisServicesTable\":\"FactSalesQuota\",\"PartitioningConfigurations\":[{\"DWTable\":null,\"AnalysisServicesTable\":\"FactSalesQuota\",\"Granularity\":0,\"NumberOfPartitionsFull\":0,\"NumberOfPartitionsForIncrementalProcess\":0,\"MaxDate\":\"2156-01-01T00:00:00\",\"LowerBoundary\":\"2017-09-12T18:00:00\",\"UpperBoundary\":\"2017-09-12T21:00:00\",\"SourceTableName\":\"[dbo].[FactSalesQuota]\",\"SourcePartitionColumn\":\"Date\",\"TabularModel_FK\":1,\"DWTable_FK\":\"dbo.FactSalesQuota\",\"DefaultPartition\":\"FactSalesQuota\",\"Id\":4,\"CreationDate\":\"2017-09-12T17:17:28.8225494\",\"CreatedBy\":null,\"LastUpdatedDate\":\"2017-09-12T17:17:28.8225494\",\"LastUpdatedBy\":null}],\"DefaultPartitionName\":\"FactSalesQuota\"}]}" 132 | } 133 | ``` 134 | * **ProcessStatus**: Current status of the partition being processed. _Queued_, _Dequeued_, _Ready_, or _Purged_. 135 | * **TabularModelTablePartition_FK**: Foreign key referencing the TabularModelTablePartition. 136 | * **StartDate**: The start date of the current refresh data slice. 137 | * **EndDate**: The end date of the current refresh data slice. 138 | * **PartitionUri**: Uri of the resulting partitioned backup datafile. 139 | * **ArchiveUri**: Uri of the location to place the partitioned backup datafile. 140 | * **SourceContext**: JSON Object specifying the source DW connection information. 141 | * **TargetContract**: JSON Object that maps to the AsPartitionProcessing client's "ModelConfiguration" contract. 142 | 143 | Each row of this table represents a partition that should be built and the configuration that will be used to execute the partition builder client. Various components in the Control server can create a “work item” for the partition builder which uses the information in the attributes to process that data slice. It is evident from the fields in this table that all work items exist in the context of a _TabularModelTablePartition_ entity. 144 | 145 | Each row contains the start and end date of the data slice. Each entity to be partitioned clearly defines a StartDate and EndDate for the date slice to be processed. Note that this date range is produced by producer of this entity. In a typical case, this represents the date range for which a tabular model needs to be refreshed – where the range is simply the date range of the data slice in a _DWTableAvailabilityRange_ entity. 
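Although this table is read-only from a configuration standpoint, it can be useful to inspect it when a refresh appears stuck. The following is a minimal sketch, assuming the table and column names match the fields documented above; the server and database names are placeholders.

```powershell
# Hedged sketch: inspect in-flight tabular model refresh work items for troubleshooting.
# View only; do not modify rows in this table.

$server   = 'controlserversql.database.windows.net'   # placeholder
$database = 'ControlServerDB'                          # placeholder

$rows = Invoke-Sqlcmd -ServerInstance $server -Database $database -Query @"
SELECT ProcessStatus, TabularModelTablePartition_FK, StartDate, EndDate, PartitionUri, TargetContext
FROM dbo.TabularModelPartitionStates
WHERE ProcessStatus IN ('Queued', 'Dequeued')
ORDER BY StartDate;
"@

foreach ($row in $rows) {
    # The TargetContext column stores the serialized ModelConfiguration contract;
    # expanding it shows which SSAS server and database the data slice targets.
    $target = $row.TargetContext | ConvertFrom-Json
    "$($row.ProcessStatus)  $($row.StartDate) -> $($row.EndDate)  $($target.AnalysisServicesServer)\$($target.AnalysisServicesDatabase)"
}
```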
146 | 147 | #### Source context - DataSourceInfo sample contract object 148 | Source context – indicates which data source to connect to fetch the data for the slice (i.e. all information required to connect to a DW table), and contains serialized data source (DW) connection information that the tabular model uses to update or set its connection string dynamically. This data contract directly maps to the partition processing client’s DataSourceInfo contract. 149 | 150 | ```json 151 | { 152 | "Name":"AzureDW", 153 | "Description":"Data source connection Azure SQL DW", 154 | "DataSource":"pdw01-ldw01.database.windows.net", 155 | "InitialCatalog":"dw", 156 | "ConnectionUserName":"username", 157 | "ConnectionUserPassword":"password", 158 | "ImpersonationMode":"ImpersonateServiceAccount" 159 | } 160 | ``` 161 | #### Target context - ModelConfiguration sample contract object 162 | The TargetContext is a serialized representation of the ModelConfiguration contract that the partition processing client expects. It is a simplified representation of the tables and partitions that are to be processed by the client. A sample contract object is like so: 163 | 164 | ```json 165 | { 166 | "ModelConfigurationID":1, 167 | "AnalysisServicesServer":"ssaspbvm00", 168 | "AnalysisServicesDatabase":"AdventureWorks", 169 | "ProcessStrategy":1, 170 | "IntegratedAuth":true, 171 | "MaxParallelism":4, 172 | "CommitTimeout":-1, 173 | "InitialSetup":false, 174 | "IncrementalOnline":true, 175 | "TableConfigurations":[ 176 | { 177 | "TableConfigurationID":1, 178 | "AnalysisServicesTable":"FactSalesQuota", 179 | "PartitioningConfigurations":[ 180 | { 181 | "DWTable":null, 182 | "AnalysisServicesTable":"FactSalesQuota", 183 | "Granularity":0, 184 | "NumberOfPartitionsFull":0, 185 | "NumberOfPartitionsForIncrementalProcess":0, 186 | "MaxDate":"2156-01-01T00:00:00", 187 | "LowerBoundary":"2017-09-12T18:00:00", 188 | "UpperBoundary":"2017-09-12T21:00:00", 189 | "SourceTableName":"[dbo].[FactSalesQuota]", 190 | "SourcePartitionColumn":"Date", 191 | "TabularModel_FK":1, 192 | "DWTable_FK":"dbo.FactSalesQuota", 193 | "DefaultPartition":"FactSalesQuota" 194 | }], 195 | "DefaultPartitionName":"FactSalesQuota" 196 | }] 197 | } 198 | ``` 199 | ### TabularModelNodeAssignment 200 | This table contains entities that represents the refresh state of each of the supported tabular model tables per entity. The Analysis Server Read-Only nodes use this table to figure which backup file from the _TabularModelPartitionState_ entity to restore on each of the nodes. The Partition Builder node logs an entry that points to the maximum date ceiling for which data has been refreshed on a per tabular model table basis. 201 | 202 | _example_: 203 | ```json 204 | { 205 | "Name":"ssaspbvm00", 206 | "Type":"ASPB", 207 | "TabularModelTablePartition_FK":1, 208 | "State":"Building", 209 | "LatestPartitionDate":"2017-09-14T18:00:00Z" 210 | } 211 | ``` 212 | * **Name**: The name of the virtual machine node. 213 | * **Type**: The type of the virtual machine node. 214 | * _ASRO_: SSAS Read-only 215 | * _ASPB_: SSAS Partition Builder 216 | * **TabularModelTablePartition_FK**: Foreign key referencing the _TabularModelTablePartition_ table. 217 | * **State**: Current state of the node. 218 | * _Normal_ 219 | * _Transition_ 220 | * _Building_ 221 | * **LatestPartitionDate**: Latest partition build date for the node. 
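For a quick view of how current each node is, a similar hedged query against the TabularModelNodeAssignments table can be used (again, view only; the server and database names are placeholders):

```powershell
# Hedged sketch: report the latest partition date per SSAS node and tabular model table.
# View only; the nodes themselves maintain these rows.

$query = @"
SELECT Name, Type, TabularModelTablePartition_FK, State, LatestPartitionDate
FROM dbo.TabularModelNodeAssignments
ORDER BY TabularModelTablePartition_FK, Name;
"@

Invoke-Sqlcmd -ServerInstance 'controlserversql.database.windows.net' `
              -Database 'ControlServerDB' -Query $query | Format-Table -AutoSize
```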
222 | 223 | --- 224 | -------------------------------------------------------------------------------- /Technical Guides/6-Understanding tabular model refresh.md: -------------------------------------------------------------------------------- 1 | # Analysis Services for Interactive BI 2 | 3 | The TRI helps you operationalize and manage tabular models in Analysis Services (AS) for interactive BI. The read-only AS servers are configured to handle interactive BI query load from client connections via a front end load balancer. Analysis Services, tabular models, and their characteristics are explained [here](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-models-ssas). 4 | 5 | The SSAS Model Cache in this TRI consists of six components, whose roles are described in the [architectural overview](../README.md#architecture): 6 | - Tabular Models 7 | - SSAS Partition Builder 8 | - SSAS Read Only Cache servers 9 | - Job Manager that coordinates the tabular model refresh 10 | - Azure Blob that stores the tabular models for refresh 11 | - Load Balancers that handle client connections 12 | 13 | ![SSAS Tabular Models Cache](../img/SSAS-Model-Cache.png) 14 | 15 | ## Tabular Model Creation 16 | 17 | Tabular models are Analysis Services databases that run in-memory, or act as a pass-through for backend data sources. They support cached summaries and drilldowns of large amounts of data, thanks to a columnar storage that offers 10x or more data compression. This makes them ideal for interactive BI applications. See [this article](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-models-ssas) for more details. Typically, tabular models hold only a subset of the big data held in the upstream data warehouse or data marts – in terms of the number of entities and data size. There is a large corpus of best-practices information for tabular model design and tuning, including this [excellent article](https://msdn.microsoft.com/en-us/library/dn751533.aspx) on the lifecycle of an enterprise-grade tabular model. 18 | 19 | You can use tools such as the [SSDT tabular model designer](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-model-designer-ssas), available with Visual Studio 2015 (or later), to create your tabular models. Set the compatibility level of the tabular models at 1200 or higher (latest is 1400 as of this writing) and the query mode to In-Memory. See [here](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-model-solutions-ssas-tabular) for details on tabular model creation. 20 | 21 | ## Tabular Model Partition Processing 22 | 23 | Partition creation is automated using the open source [AsPartitionProcessing tool](https://github.com/Microsoft/Analysis-Services/tree/master/AsPartitionProcessing). Many of the configurable options for partition building directly correspond to the configuration of this tool. Refer to the AsPartitionProcessing tool's [whitepaper](https://github.com/Microsoft/Analysis-Services/blob/master/AsPartitionProcessing/Automated%20Partition%20Management%20for%20Analysis%20Services%20Tabular%20Models.pdf) for further documentation. 24 | 25 | ## Tabular Model Partition State Transition 26 | 27 | The following process outlines the various state transitions as incoming data is tracked until it is made available in the SSAS Read-only nodes: 28 | 1.
When a DWTableAvailabilityRange is marked as Completed, a TabularModelPartitionState item is created in Queued state for that range (provided there is not already one tracking the same range). 29 | 30 | 2. The Partition Builder virtual machine has a scheduled task that operates on TabularModelPartitionState items as follows: 31 | 32 | 1. It dequeues items that are in Queued state and moves them to the 'Dequeued' state. 33 | 2. It processes the items as per the processing strategy and moves the item to a 'Processed' state. This updates the SSAS database on the Partition Builder. 34 | 3. When it is time to flip the logical data warehouses, a backup of the database is created in a storage account, the state of the TabularModelPartitionState is changed to 'Ready', and the PartitionUri is updated to point to the database backup location. 35 | 36 | 3. The freshness of each SSAS database on each Read-Only virtual machine is maintained as a TabularModelNodeAssignment. The Read-Only servers have scheduled tasks that monitor for TabularModelPartitionState items that are 'Ready' and have an EndDate that is later than the LatestPartitionDate (a watermark date to indicate freshness of data) on the TabularModelNodeAssignment. Once such an item is found, the backup for that item is fetched from its backup location and restored on the Read-Only server. 37 | This process is done while ensuring that a configurable number of Read-Only virtual machines (represented by the MinSSASROServersNotInTransition value in the Control Server) are always kept running to serve active traffic. 38 | 39 | 4. Once all the Read-Only servers have restored the SSAS database backups referenced by the TabularModelPartitionStates, these items are moved from 'Ready' to 'Purged' and the backups in the storage account are deleted to free up space. This is done by a job called the CleanupOldASBackupJob on the Job Manager hosted on the Control Server. 40 | 41 | ## Tabular model configuration for continuous incremental refresh at scale 42 | The various orchestration components of the TRI refer to four configuration tables to enable continuous and incremental model refresh. 43 | 44 | You can provide configuration inputs for two of these tables: 45 | 46 | | TableName | Description | 47 | |:----------|:------------| 48 | |**TabularModel**|Lists tabular models with their server and database names, to be used by the daemon on the Partition Builder servers to connect to the SSAS server and the database for refresh.| 49 | |**TabularModelTablePartitions**|This table specifies which model a tabular model table is part of, and the source (DW) table to which it is bound. It also provides the column that will be used in refreshing the tabular model, and the lower and upper bounds of the data held in this tabular model. It also defines the strategy for processing the SSAS partitions.| 50 | 51 | ### TabularModel 52 | Provide the unique <_server, database_> pairs in the table. This information uniquely identifies each tabular model for the Job Manager, and in turn, will be used by the daemon on the Partition Builder nodes to connect to the SSAS server and the database before refresh. 53 | 54 | _Example_: 55 | 56 | ```json 57 | { 58 | "AnalysisServicesServer":"ssaspbvm00", 59 | "AnalysisServicesDatabase":"AdventureWorks", 60 | "IntegratedAuth":true, 61 | "MaxParallelism":4, 62 | "CommitTimeout":-1 63 | } 64 | ``` 65 | 66 | * **AnalysisServicesServer** : SSAS VM name or Azure AS URL. 67 | * **AnalysisServicesDatabase** : Name of the database.
68 | * **IntegratedAuth** : Boolean flag whether connection to DW to be made using integrated authentication or SQL authentication. 69 | * **MaxParallelism** : Maximum number of threads on which to run processing commands in parallel during partition building. 70 | * **CommitTimeout** : Cancels processing (after specified time in seconds) if write locks cannot be obtained. -1 will use the server default. 71 | 72 | ### TabularModelTablePartitions 73 | 74 | _Example_: 75 | ```json 76 | { 77 | "AnalysisServicesTable":"FactResellerSales", 78 | "SourceTableName":"[dbo].[FactResellerSales]", 79 | "SourcePartitionColumn":"OrderDate", 80 | "TabularModel_FK":1, 81 | "DWTable_FK":"dbo.FactResellerSales", 82 | "DefaultPartition":"FactResellerSales", 83 | "ProcessStrategy":"ProcessDefaultPartition", 84 | "MaxDate":"2156-01-01T00:00:00Z", 85 | "LowerBoundary":"2010-01-01T00:00:00Z", 86 | "UpperBoundary":"2011-12-31T23:59:59Z", 87 | "Granularity":"Daily", 88 | "NumberOfPartitionsFull":0, 89 | "NumberOfPartitionsForIncrementalProcess":0 90 | } 91 | ``` 92 | 93 | * **AnalysisServicesTable** : The table to be partitioned. 94 | * **SourceTableName** : The source table in the DW database. 95 | * **SourcePartitionColumn** : The source column of the source table. 96 | * **TabularModel_FK** : Foreign Key reference to the TabularModel. 97 | * **DWTable_FK** : Foreign Key reference to the DWTable. 98 | * **DefaultPartition** : 99 | * **MaxDate** : The maximum date that needs to be accounted for in the partitioning configuration. 100 | * **LowerBoundary** : The lower boundary of the partition date range. 101 | * **UpperBoundary** : The upper boundary of the partition date range. 102 | * **ProcessStrategy** : Strategy used for processing the partition; "RollingWindow" or "ProcessDefaultPartition". The default partition can be specified using the "DefaultPartition" property. Otherwise assumes a "template" partition with same name as table is present. 103 | * **Granularity** : Partition granularity of "Daily", "Monthly", or "Yearly". 104 | * **NumberOfPartitionsFull** : Count of all partitions in the rolling window. For example, a rolling window of 10 years partitioned by month would require 120 partitions. 105 | * **NumberOfPartitionsForIncrementalProcess** : Count of hot partitions where the data can change. For example, it may be necessary to refresh the most recent 3 months of data every day. This only applies to the most recent partitions. 106 | 107 | 108 | Provide one of the following values for partitioning strategy in the column _PartitionStrategy_: 109 | 110 | - _ModelProcessStrategy.ProcessDefaultPartition_ (Default) 111 | - _ModelProcessStrategy.RollingWindow_ 112 | 113 | If you choose _ModelProcessStrategy.ProcessDefaultPartition_: 114 | 115 | - Confirm that the tabular model contains a partition with the same name as the tabular model table. This partition is always used for the data load of the date slice, even if there are other partitions in the table. 116 | - Provide values for _SourceTableName_ and _SourcePartitionColumn_. 117 | - Provide values for _LowerBoundary, UpperBoundary_ to provide the start and end time span for the data in the tabular model. 118 | 119 | If you choose _ModelProcessingStrategy.RollingWindow_: 120 | - Confirm that the table partitions are defined on time granularity based ranges - as in Daily, Monthly or Yearly. 121 | - Provide values for the columns _MaxDate_, _NumberOfPartitionsFull_ and _NumberOfPartitionsforIncrementalProcess_. 
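As a quick sanity check on RollingWindow settings, the relationship between the window length, the granularity, and _NumberOfPartitionsFull_ can be worked out as in the short sketch below. This is illustrative arithmetic only, mirroring the example above of a 10-year window partitioned by month requiring 120 partitions.

```powershell
# Illustrative arithmetic only: derive NumberOfPartitionsFull for a rolling window.
# A 10-year window at Monthly granularity gives 120 partitions, as in the example above.

function Get-RollingWindowPartitionCount {
    param(
        [int]$WindowYears,
        [ValidateSet('Daily', 'Monthly', 'Yearly')]
        [string]$Granularity
    )
    switch ($Granularity) {
        'Daily'   { [int](365.25 * $WindowYears) }  # approximate; averages in leap days
        'Monthly' { 12 * $WindowYears }
        'Yearly'  { $WindowYears }
    }
}

Get-RollingWindowPartitionCount -WindowYears 10 -Granularity Monthly   # 120
```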
122 | 123 | For both the cases, confirm that the value provided in _SourcePartitionColumn_ of the table represents a column of type DateTime in the source DW table. Tabular model refresh is incremental on time, for a specific date range. 124 | 125 | Next, the following two tables are **read-only**. You should **not** change or update these tables or its values, but you can view them for troubleshooting and/or understanding how the model refresh happens. 126 | 127 | | TableName | Description | 128 | |:----------|:------------| 129 | |**TabularModelPartitionStates**|In this table, the Job Manager tracks the source and target context for all data slices that are to be refreshed or processed in a tabular model, the start and end dates of each data slice, and the Blob URI where the processed tabular model backups will be stored.| 130 | |**TabularModelNodeAssignments**|In this table, the partition builder tracks the refresh state of each AS Read-Only node for each tabular model. It is used to indicate the maximum date for an entity for which the data has been processed. Each of the SSAS Read-Only nodes provides its current state here - in terms of latest data by date for every entity. 131 | 132 | ### TabularModelPartitionStates 133 | 134 | This table helps track all the data slices that are to be refreshed or processed in a tabular model. Each piece of incremental data loaded into the DW is specified with a start and end date, defining the data slice. Each new data slice will trigger a new partition to be built. 135 | 136 | --- 137 | _Example_: 138 | ```json 139 | { 140 | "ProcessStatus":"Purged", 141 | "TabularModelTablePartition_FK":4, 142 | "StartDate":"2017-09-12T18:00:00Z", 143 | "EndDate":"2017-09-12T21:00:00Z", 144 | "PartitionUri":"https://edw.blob.core.windows.net/data/AdventureWorks-backup-20170912T090408Z.abf", 145 | "ArchiveUri":"https://edw.blob.core.windows.net/data", 146 | "SourceContext":"{\"Name\":\"AzureDW\",\"Description\":\"Data source connection Azure SQL DW\",\"DataSource\":\"bd044-pdw01-ldw01.database.windows.net\",\"InitialCatalog\":\"dw\",\"ConnectionUserName\":\"username\",\"ConnectionUserPassword\":\"password\",\"ImpersonationMode\":\"ImpersonateServiceAccount\"}", 147 | "TargetContext":"{\"ModelConfigurationID\":1,\"AnalysisServicesServer\":\"ssaspbvm00\",\"AnalysisServicesDatabase\":\"AdventureWorks\",\"ProcessStrategy\":1,\"IntegratedAuth\":true,\"MaxParallelism\":4,\"CommitTimeout\":-1,\"InitialSetup\":false,\"IncrementalOnline\":true,\"TableConfigurations\":[{\"TableConfigurationID\":1,\"AnalysisServicesTable\":\"FactSalesQuota\",\"PartitioningConfigurations\":[{\"DWTable\":null,\"AnalysisServicesTable\":\"FactSalesQuota\",\"Granularity\":0,\"NumberOfPartitionsFull\":0,\"NumberOfPartitionsForIncrementalProcess\":0,\"MaxDate\":\"2156-01-01T00:00:00\",\"LowerBoundary\":\"2017-09-12T18:00:00\",\"UpperBoundary\":\"2017-09-12T21:00:00\",\"SourceTableName\":\"[dbo].[FactSalesQuota]\",\"SourcePartitionColumn\":\"Date\",\"TabularModel_FK\":1,\"DWTable_FK\":\"dbo.FactSalesQuota\",\"DefaultPartition\":\"FactSalesQuota\",\"Id\":4,\"CreationDate\":\"2017-09-12T17:17:28.8225494\",\"CreatedBy\":null,\"LastUpdatedDate\":\"2017-09-12T17:17:28.8225494\",\"LastUpdatedBy\":null}],\"DefaultPartitionName\":\"FactSalesQuota\"}]}" 148 | } 149 | ``` 150 | * **ProcessStatus**: Current status of the partition being processed. _Queued_, _Dequeued_, _Ready_, or _Purged_. 151 | * **TabularModelTablePartition_FK**: Foreign key referencing the TabularModelTablePartition. 
152 | * **StartDate**: The start date of the current refresh data slice. 153 | * **EndDate**: The end date of the current refresh data slice. 154 | * **PartitionUri**: Uri of the resulting partitioned backup datafile. 155 | * **ArchiveUri**: Uri of the location to place the partitioned backup datafile. 156 | * **SourceContext**: JSON Object specifying the source DW connection information. 157 | * **TargetContract**: JSON Object that maps to the AsPartitionProcessing client's "ModelConfiguration" contract. 158 | 159 | Each row of this table represents a partition that should be built and the configuration that will be used to execute the partition builder client. Various components in the Control server can create a “work item” for the partition builder which uses the information in the attributes to process that data slice. It is evident from the fields in this table that all work items exist in the context of a _TabularModelTablePartition_ entity. 160 | 161 | Each row contains the start and end date of the data slice. Each entity to be partitioned clearly defines a StartDate and EndDate for the date slice to be processed. Note that this date range is produced by producer of this entity. In a typical case, this represents the date range for which a tabular model needs to be refreshed – where the range is simply the date range of the data slice in a _DWTableAvailabilityRange_ entity. 162 | 163 | #### Source context - DataSourceInfo sample contract object 164 | Source context – indicates which data source to connect to fetch the data for the slice (i.e. all information required to connect to a DW table), and contains serialized data source (DW) connection information that the tabular model uses to update or set its connection string dynamically. This data contract directly maps to the partition processing client’s DataSourceInfo contract. 165 | 166 | ```json 167 | { 168 | "Name":"AzureDW", 169 | "Description":"Data source connection Azure SQL DW", 170 | "DataSource":"pdw01-ldw01.database.windows.net", 171 | "InitialCatalog":"dw", 172 | "ConnectionUserName":"username", 173 | "ConnectionUserPassword":"password", 174 | "ImpersonationMode":"ImpersonateServiceAccount" 175 | } 176 | ``` 177 | #### Target context - ModelConfiguration sample contract object 178 | The TargetContext is a serialized representation of the ModelConfiguration contract that the partition processing client expects. It is a simplified representation of the tables and partitions that are to be processed by the client. 
A sample contract object is like so: 179 | 180 | ```json 181 | { 182 | "ModelConfigurationID":1, 183 | "AnalysisServicesServer":"ssaspbvm00", 184 | "AnalysisServicesDatabase":"AdventureWorks", 185 | "ProcessStrategy":1, 186 | "IntegratedAuth":true, 187 | "MaxParallelism":4, 188 | "CommitTimeout":-1, 189 | "InitialSetup":false, 190 | "IncrementalOnline":true, 191 | "TableConfigurations":[ 192 | { 193 | "TableConfigurationID":1, 194 | "AnalysisServicesTable":"FactSalesQuota", 195 | "PartitioningConfigurations":[ 196 | { 197 | "DWTable":null, 198 | "AnalysisServicesTable":"FactSalesQuota", 199 | "Granularity":0, 200 | "NumberOfPartitionsFull":0, 201 | "NumberOfPartitionsForIncrementalProcess":0, 202 | "MaxDate":"2156-01-01T00:00:00", 203 | "LowerBoundary":"2017-09-12T18:00:00", 204 | "UpperBoundary":"2017-09-12T21:00:00", 205 | "SourceTableName":"[dbo].[FactSalesQuota]", 206 | "SourcePartitionColumn":"Date", 207 | "TabularModel_FK":1, 208 | "DWTable_FK":"dbo.FactSalesQuota", 209 | "DefaultPartition":"FactSalesQuota" 210 | }], 211 | "DefaultPartitionName":"FactSalesQuota" 212 | }] 213 | } 214 | ``` 215 | ### TabularModelNodeAssignment 216 | This table contains entities that represents the refresh state of each of the supported tabular model tables per entity. The Analysis Server Read-Only nodes use this table to figure which backup file from the _TabularModelPartitionState_ entity to restore on each of the nodes. The Partition Builder node logs an entry that points to the maximum date ceiling for which data has been refreshed on a per tabular model table basis. 217 | 218 | _example_: 219 | ```json 220 | { 221 | "Name":"ssaspbvm00", 222 | "Type":"ASPB", 223 | "TabularModelTablePartition_FK":1, 224 | "State":"Building", 225 | "LatestPartitionDate":"2017-09-14T18:00:00Z" 226 | } 227 | ``` 228 | * **Name**: The name of the virtual machine node. 229 | * **Type**: The type of the virtual machine node. 230 | * _ASRO_: SSAS Read-only 231 | * _ASPB_: SSAS Partition Builder 232 | * **TabularModelTablePartition_FK**: Foreign key referencing the _TabularModelTablePartition_ table. 233 | * **State**: Current state of the node. 234 | * _Normal_ 235 | * _Transition_ 236 | * _Building_ 237 | * **LatestPartitionDate**: Latest partition build date for the node. 238 | 239 | --- 240 | --------------------------------------------------------------------------------