├── 05-extract-insights-with-databricks └── README.md ├── assets ├── img │ ├── ds-process.png │ └── key-components-ml-workspace.png └── template.md ├── 06-intro-to-ml-with-python-and-azure-notebooks └── README.md ├── .vscode └── settings.json ├── 04-data-engineering-with-databricks └── README.md ├── README.md ├── 00-exam-training └── README.md ├── 03-get-started-with-ADSVM └── README.md ├── 01-explore-AI-solution-development └── README.md └── 02-build-AI-solutions-with-AMLS └── README.md /05-extract-insights-with-databricks/README.md: -------------------------------------------------------------------------------- 1 | # Perform data engineering with Azure Databricks -------------------------------------------------------------------------------- /assets/img/ds-process.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pmbrull/azure-ds-examdp100-notes/HEAD/assets/img/ds-process.png -------------------------------------------------------------------------------- /06-intro-to-ml-with-python-and-azure-notebooks/README.md: -------------------------------------------------------------------------------- 1 | # Introduction to machine learning with Python and Azure Notebooks -------------------------------------------------------------------------------- /.vscode/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "python.pythonPath": "C:\\Users\\P.BrullBorras\\AppData\\Local\\Continuum\\miniconda3\\python.exe" 3 | } -------------------------------------------------------------------------------- /assets/img/key-components-ml-workspace.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pmbrull/azure-ds-examdp100-notes/HEAD/assets/img/key-components-ml-workspace.png -------------------------------------------------------------------------------- /04-data-engineering-with-databricks/README.md: -------------------------------------------------------------------------------- 1 | # Data Engineering with Databricks 2 | 3 | ## 4 | 5 |
6 | 7 | Show content 8 | 9 |

10 | 11 | ### Learning objectives -------------------------------------------------------------------------------- /assets/template.md: -------------------------------------------------------------------------------- 1 | ## Section 2 | 3 |

4 | 5 | Show content 6 | 7 |

8 | 9 | ### Learning Objectives 10 | 11 | 12 | 13 | ### Knowledge Check 14 | 15 | 1. Some question 16 | 17 | * bla 18 | 19 |

20 | 21 | Answer 22 | 23 |

24 | bla 25 |

26 |
27 | 28 | 29 |

30 |
</details>

---
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Azure Data Science exam DP-100 notes

Personal notes for the Azure Data Science exam DP-100. All content is based on Microsoft's Learning Path [docs](https://docs.microsoft.com/en-us/learn/certifications/exams/dp-100?source=learn).

Some useful links:
* [Exam skills measured](https://docs.microsoft.com/en-us/learn/certifications/exams/dp-100?source=learn)
* [Exam requirements infographic](https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE2PLKZ)
* [DP-100 Labs](https://github.com/MicrosoftLearning/DP-100-Designing-and-Implementing-a-Data-Science-Solutio)
* [Azure ML service example notebooks](https://github.com/Azure/MachineLearningNotebooks)

Study guide:
* [Data Concepts](https://medium.com/deep-ai/all-you-need-to-know-about-data-for-machine-learning-a80bc8555d58)
* [Study Guide](https://medium.com/deep-ai/study-guide-for-microsoft-azure-data-scientist-associate-certification-dp-100-c2e4611cb071)
--------------------------------------------------------------------------------
/00-exam-training/README.md:
--------------------------------------------------------------------------------
# DS DP-100 Exam training 01

## Azure Data Science Options

* Azure ML Studio -> drag and drop; understand it for the exam. No code is needed there. Covers training and deployment. A complete ML environment, ideal for learning and for beginner data scientists.
* Azure Databricks for Big Data - based on Spark. Massive scale with Spark. User-friendly portal. Dynamic scaling. Secure collaboration (secured workspace). Data science tools. You can use different languages in the same notebook.
    * Core artifacts: jobs, libraries, clusters, workspaces and notebooks.
* Azure Data Science Virtual Machine - a VM with almost all of the tools one would need for data science already preinstalled. You can deploy one directly to Azure and work from there. It's easy to customize for your needs, and it comes with some sample code. Microsoft merged it with the Deep Learning VM, and there are specific versions for geospatial data.
* SQL Server Machine Learning Services - we can use this to analyze data on SQL Server. Useful for on-premises data. It is an option when open-source Python and R do not scale, or when there are security and operationalization concerns.
* Spark on Azure HDInsight - massive scale with in-memory processing. Hortonworks distribution. Easy management as a PaaS. Integration with other Azure services.

> Databricks vs. HDInsight: Databricks is easier to collaborate in; it is built for collaboration and working in teams.

* Azure ML Service: the core of the course. Model management, training, selection, hyper-parameter tuning, feature selection and model evaluation. It lets you automate tuning and selection tasks. Everything is in Python.

## Azure Notebooks
Azure-based Jupyter Notebooks with a free tier. Ready-to-use projects teach how to use Azure data and AI services.

Jupyter notebooks can be integrated into VS Code, so you can use its Git integration.

Azure Notebooks only support Python, R and F#.

The advantage of Azure Notebooks is that most of the libraries are preinstalled, but you can still install more. You can upload data from your local machine and use a custom environment configuration.
By using VMs from your Azure subscription you can add processing power.

It is Azure ML service ready: from Azure Notebooks you can call Azure ML service.

## Azure ML Service

Brings the power of containerization and automation to data science. Pack the model and libraries into a container and run everything.

DS pipeline:
Environment setup -> Data preparation -> Experimentation -> Deployment

* Environment setup: create a workspace to store your work. Use Python or the Azure portal. An experiment within the workspace stores model training information. Use the IDE of your choice.
* Data Preparation: use Python libraries or the Azure Data Prep SDK.
* Experimentation: train models with the Python open source modules of your choice. Train locally or in Azure. Submit model training to Azure containers. Monitor model training. Register the final model.
* Deployment: to make a model available. Target deployment environments are: Docker images, Azure Container Instances, Azure Kubernetes Service, Azure IoT Edge, Field Programmable Gate Array (FPGA). For the deployment you'll need the following files:
    * A scoring script file that tells Azure ML service (AMLS) how to call the model
    * An environment file that specifies package dependencies
    * A configuration file that requests the required resources for the container.

### What is a Workspace

The top-level resource for AMLS. It serves as a hub for building and deploying models. You can create a workspace in the Azure portal, or you can create and access it using Python from an IDE of your choice.

All models must be registered in the workspace for future use. Together with the scoring scripts, you create an image for deployment.

The workspace stores experiment objects that are required for each model you create. Additionally, it saves your compute targets. You can track training runs.

### What is an Image

An image has three components:
* A model and scoring script or application
* An environment file that declares the dependencies that are needed by the model, scoring script or application
* A configuration file that describes the necessary resources to execute the model

### What is a Datastore

An abstraction over an Azure Storage account. Each workspace has a registered, default datastore that you can use right away, but you can register other Azure Blob or File storage containers as a datastore.

### What is a Pipeline

A ML pipeline is a tool to create and manage workflows during a DS process: data manipulation, model training and testing, and deployment phases. Each step of the process can run unattended in different compute targets, which makes it easier to allocate resources.

### What is a Compute Target

It is the compute resource used to run a training script or to host a service deployment. It's attached to a workspace. Other than the local machine, users of the workspace share compute targets.

### What is a deployed Web Service

For a deployed web service, you have the choice of Container Instances, AKS or FPGAs. With your model, script and associated files all set in the image, you can create a web service.

> OBS: The trainer said that it's better to create a new resource group for each ML workspace, as there are several resources involved and we don't want to end up with a mess.

> OBS2: In Azure Notebooks, change the Python kernel to 3.6, as the default is just set to Python 3! This can raise errors when importing the Azure ML libraries.
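Since the workspace is the entry point for everything above, a minimal sketch of creating one and reconnecting to it with the SDK may help; the workspace name, resource group and region below are placeholders, not values from these notes:

```python
from azureml.core import Workspace

# Minimal sketch: create a workspace (names and region are assumptions)
ws = Workspace.create(name='aml-workspace',
                      subscription_id='<subscription-id>',
                      resource_group='aml-rg',
                      create_resource_group=True,
                      location='westeurope')

# Persist the config locally so later sessions can simply reconnect
ws.write_config()
ws = Workspace.from_config()
```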
### Interact with ML Service

We can interact via Azure Notebooks (linking the subscription), a visual interface, Notebook VMs and automated ML.

You can run notebooks in the workspace without any kind of authentication, and they are stored in the workspace, which makes it useful for working in teams.

If you use JupyterLab, you can link a repo in Azure DevOps.

If you register a model with the same name multiple times, a new version of the model is registered, with the version number incremented.

> OBS: Scalability is enabled during training, but once the code is deployed it is flat. Also, it is only supported as an Azure App Service so you keep paying even if it is idle.
--------------------------------------------------------------------------------
/03-get-started-with-ADSVM/README.md:
--------------------------------------------------------------------------------
# Get started with Machine Learning with an Azure Data Science Virtual Machine

## Introduction to the Azure Data Science Virtual Machine (DSVM)

<details>
6 | 7 | Show content 8 | 9 |


### Learning objectives

* Learn about the types of Data Science Virtual Machines
* Learn what type of DSVM to use for each type of use case

### When to use an Azure DSVM?

An Azure DSVM makes it easy to maintain consistent data science environments as they evolve.

It also provides samples in Jupyter Notebooks, plus scripts for Python and R, to learn about Microsoft and Azure ML services:
* How to connect to cloud datastores with Azure ML and how to build models.
* Deep learning samples using Microsoft Cognitive Services.
* How to compare Microsoft R with open source R, and how to operationalize models with ML Services in SQL Server.

### Types of Azure DSVM

* **Windows vs. Linux**: Windows Server 2012 and 2016 vs. Ubuntu 16.04 LTS and CentOS 7.4
* **Deep Learning**: The Deep Learning DSVM comes preconfigured and preinstalled with many tools, and you can select high-speed GPU-based machines.
* **Geo AI DSVM**: a VM optimized for geospatial and location data. It has the ArcGIS Pro system integrated.

### Use cases for a DSVM

* **Collaborate as a team using DSVMs**: working with cloud-based resources that share the same configuration helps to ensure that all team members have a consistent development environment.
* **Address issues with DSVMs**: issues related to environment mismatches are reduced, for example when giving DSVMs to students in a class.
* **Use on-demand elastic capacity for large-scale projects**: data science environments can be replicated on demand, allowing high-powered computing resources to be spun up when needed.
* **Experiment and evaluate on a DSVM**: as they are easy to create, DSVMs can be used for demos and short experiments.
* **Learn about DSVMs and deep learning**: the flexibility of the underlying compute power (scaling up or switching to GPU) makes it easy to train all kinds of models.

### Knowledge Check

1. Which of the following is a reason to use an Azure Data Science Virtual Machine?

    * You want to create an Azure Databricks workspace.
    * You want to get a jump-start on data science work.
    * You want to deploy a web application to it.

48 | 49 | Answer 50 | 51 |

52 | The purpose of Data Science Virtual Machines is to give a data scientist the tools they need, pre-installed, and ready to go. 53 |

54 |
55 | 56 | 1. Which of the following is installed on a Data Science Virtual Machine? 57 | 58 | * Azure Data Warehouse 59 | * Jupyter Notebook 60 | * Azure Machine Learning Studio 61 | 62 |
63 | 64 | Answer 65 | 66 |

67 | Jupyter Notebook is installed on Data Science Virtual Machines and provides a great data science development tool. 68 |

69 |
70 | 71 |

72 |
73 | 74 | --- 75 | 76 | ## Explore the types of Azure Data Science Virtual Machines 77 | 78 |
79 | 80 | Show content 81 | 82 |


### Learning objectives

* Learn how to create Windows-based and Linux-based DSVMs
* Explore the Deep Learning Data Science Virtual Machines
* Work with Geo AI Data Science Virtual Machines

### Windows-Based DSVMs

You can use the Windows-based DSVM to jump-start your data science projects. You don't pay for the DSVM image, just usage fees.

The image comes with a bunch of features:
* Tutorials
* Support for Office
* SQL Server integrated with ML Services
* Preinstalled languages: R, Python, SQL, C#
* Data science tools such as the Azure ML SDK for Python, Anaconda, Jupyter...
* ML tools such as Azure Cognitive Services support, H2O, TensorFlow, Weka...

### Deep Learning Virtual Machine

Deep Learning Virtual Machines (DLVMs) use GPU-based hardware that provides increased mathematical calculation speed for faster model training. The image can be either Windows or Ubuntu.

The DLVM simplifies the tool selection process by including preconfigured tools for different situations.

### Geo AI Data Science VM with ArcGIS

Both Python and R work with ArcGIS Pro, and are preconfigured on the Geo AI Data Science VM.

The image includes a large set of tools, such as deep learning frameworks (Keras, Caffe2) and standalone Spark.

> OBS: Tools need to be compatible with GPUs.

It also comes bundled with IDEs such as Visual Studio or PyCharm.

Examples of Geo AI include:

* Real-time results of traffic conditions
* Driver availability in Uber or Lyft at any time
* Deep learning for disaster response
* Urban growth prediction

### Knowledge Check

1. You want to learn about how to use Azure services related to machine learning with as little fuss as possible installing and configuring software and locating demonstration scripts. Which Data Science Virtual Machine type would best suit these needs?

    * Deep Learning DSVM
    * Windows 2016 DSVM
    * Geo AI Data Science VM with ArcGIS DSVM

134 | 135 | Answer 136 | 137 |

138 | The Windows 2016 gives you the most popular data science tools installed and configured and includes many sample scripts for using Azure machine learning related services. 139 |

140 |
141 | 142 | 2. You need to train deep learning models to do image recognition using a lot of training data in the form of images. Which DSVM configuration would be best for the fastest model training? 143 | 144 | * Windows 2016 with standard CPUs. 145 | * Geo AI Data Science VM with ArcGIS DSVM 146 | * Deep Learning VM which is configured to use GPUs. 147 | 148 |
149 | 150 | Answer 151 | 152 |

153 | The DSVM includes all the software needed for training deep learning models and use graphic processor units (GPUs) which perform calculations much faster than standard CPUs. 154 |

155 |
156 | 157 |

158 |
159 | 160 | --- 161 | 162 | ## Provision and use an Azure Data Science Virtual Machine 163 | 164 |
165 | 166 | Show content 167 | 168 |

169 | 170 | This module is based on exercise, so it's best followed [here](https://docs.microsoft.com/en-us/learn/modules/provision-and-use-azure-dsvm/). 171 | 172 | ### Knowledge Check 173 | 174 | 1. What method did we use to log into a Windows-Based Data Science VM? 175 | 176 | * Remote Desktop Protocol (RDP) 177 | * HTTP 178 | * ODBC 179 | 180 |

181 | 182 | Answer 183 | 184 |

185 | RDP: A step by step walk through explains all the steps to connect to a Windows-based DSVM. 186 |

187 |
188 | 189 | 1. What development environment has pre-loaded sample code available? 190 | 191 | * PyCharm 192 | * Zeppelin Notebook 193 | * Jupyter Notebook 194 | 195 |
196 | 197 | Answer 198 | 199 |

200 | Jupyter: We showed that many sample notebooks are installed that demonstrate how to use Microsoft Machine Learning technologies. 201 |

202 |
203 | 204 | 1. What type of Jupyter Notebook cell is used to provide annotations? 205 | 206 | * Code cell 207 | * Markdown cell 208 | * Raw cell 209 | 210 |
211 | 212 | Answer 213 | 214 |

215 | Markdown support rich formatting and is ideal for adding comments and annotations to your notebooks. 216 |

217 |
218 | 219 |

220 |
221 | 222 | --- -------------------------------------------------------------------------------- /01-explore-AI-solution-development/README.md: -------------------------------------------------------------------------------- 1 | # Explore AI solution development with data science services in Azure 2 | 3 | ## Introduction to Data Science in Azure 4 | 5 |
6 | 7 | Show content 8 | 9 |


### Learning Objectives

* Learn the steps involved in the data science process
* Learn the machine learning modeling cycle
* Learn data cleansing and preparation
* Learn model feature engineering
* Learn model training and evaluation
* Learn about model deployment
* Discover the specialized roles in the data science process

### The Data Science process

![img](../assets/img/ds-process.png)

An iterative process that starts with a question arising from business needs and understanding.

### What is modeling?

Modeling is a cycle of data and business understanding. You start by exploring your assets, in this case data, with **Exploratory Data Analysis (EDA)**; from that point feature engineering starts, and finally a model is trained on top: an algorithm that learns information and provides a probabilistic prediction.

In the end, the model is evaluated to check where it is accurate and where it is failing, so that the behavior can be corrected.

### Choose a use case

Identify the problem (business understanding) -> Define the project goals -> Identify data sources

### Data preparation

Data cleansing and EDA are vital to the modeling process: they give insights into what data is or is not useful, and what needs to be corrected or taken into account. Understanding the data is one of the most vital steps in the data science cycle.

### Feature engineering

Ask what extra knowledge can be extracted by combining existing features into new ones.

### Model training

Split data -> Cross-validate data -> Obtain probabilistic prediction

### Model evaluation

**Hyperparameters** are parameters used in model training that cannot be learned by the training process. These parameters must be set before model training begins.

To evaluate the results you need to set up a metric to compare different runs, such as accuracy or MSE.

### Model deployment

Model deployment is the final stage of the data science process. It is often done by a developer, and is usually not part of the data scientist role.

### Specialized roles in the Data Science process

In the data science process, there are specialists in each of the steps:

Business Analyst or Domain Expert, Data Engineer, Developer and Data Scientist.

### Knowledge Check

1. Which of the following is not a specialized role in the Data Science Process?

    * Database Administrator
    * Data Scientist
    * Data Engineer

74 | 75 | Answer 76 | 77 |

78 | DBA 79 |

80 |
81 | 82 | 1. Model feature engineering refers to which of the following? 83 | 84 | * Selecting the best model to use for the experiment. 85 | * Determine which data elements will help in making a prediction and preparing these columns to be used in model training. 86 | * Exploring the data to understand it better. 87 | 88 |
89 | 90 | Answer 91 | 92 |

93 | Feature engineering involves the data scientist determining which data to use in model training and preparing the data so it can be used by the model. 94 |

95 |
96 | 97 | 1. The Model deployment involves. 98 | 99 | * Calling a model to score new data. 100 | * Training a model. 101 | * Copying a trained model and its code dependencies to an environment where it will be used to score new data. 102 | 103 |
104 | 105 | Answer 106 | 107 |

108 | Deploying a model makes it available for use. 109 |

110 |
111 | 112 |

113 |
114 | 115 | --- 116 | 117 | ## Choose the Data Science service in Azure you need 118 | 119 |
120 | 121 | Show content 122 | 123 |


### Learning Objectives

* Differentiate each of the Azure machine learning products.
* Identify key features of each product.
* Describe the use cases for each service.

### Machine Learning options on Azure

We have the following services:

* **Azure Machine Learning Studio**: GUI-based solution, best chosen for learning. It includes all the data science pipeline steps, from importing and playing around with data to different deployment options. Everything is based on a drag-and-drop method.
* **Azure Databricks**: great collaboration platform with a powerful notebook interface, job scheduling, AAD integration and granular security control. It allows you to create and modify Spark clusters.
* **Azure Data Science Virtual Machine**: preconfigured VMs with lots of popular ML tools preinstalled. You can connect directly to the machine via SSH or remote desktop. There are different types of machines:
    * Linux and Windows OS, where Windows supports scalability with ML in SQL Server and Linux does not.
    * Deep Learning VM, offering deep learning tools.
    * Geo AI DSVM, with specific tools for working with spatial data. Includes ArcGIS.
* **SQL Server Machine Learning Services**: an add-on which runs on SQL Server on-premises and supports scaling up and high performance of Python and R code. It includes several advantages:
    * Security, as the processing occurs closer to the data source.
    * Performance
    * Consistency
    * Efficiency, as you can use integrated tools such as Power BI to report on and analyze results.
* **Spark on HDInsight**: HDInsight is a PaaS offering of Apache Hadoop. It provides several benefits:
    * Easy and fast creation and modification of clusters on demand.
    * Usage of ETL tools in the cluster with MapReduce and Spark.
    * Compliance standards with Azure Virtual Network, encryption and integration with Azure AD.
    * Integration with other Azure services, such as ADLS or ADF.

    HDInsight Spark is an implementation of Apache Spark on Azure HDInsight.
* **Azure Machine Learning Service**: supports the whole data science pipeline, integrates and scales processing, and automates the following tasks:

    * Model management
    * Model training
    * Model selection
    * Hyper-parameter tuning
    * Feature selection
    * Model evaluation

    It supports open-source technologies such as Python and common data science tools. It makes it easier to containerize and deploy the model and to automate several tasks. The platform is designed to support three roles:

    * Data Engineer, to ingest and prepare data for analysis either locally or on Azure containers.
    * Data Scientist, to apply the modeling tools and processes. AMLS supports sklearn, TensorFlow, PyTorch, Microsoft Cognitive Toolkit and Apache MXNet.
    * Developer, to create an image of the built and trained model with all the needed components. An **image** contains:
    1. The model
    1. A scoring script or application which passes input to the model and returns the output of the model
    1. The required dependencies, such as Python scripts or packages needed by the model or scoring script.

    Images can be deployed as Docker images or field programmable gate array (FPGA) images, and can be deployed to a web service (running in Azure Container Instances, FPGA or Azure Kubernetes Service) or to an IoT module (IoT Edge).
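To make the image components concrete, here is a minimal sketch of the scoring script piece, using the init/run pattern that AMLS web services expect; the model name `classification_model` and the use of scikit-learn with joblib are assumptions for illustration:

```python
# score.py - minimal sketch of a scoring script (assumes a scikit-learn
# model registered under the placeholder name 'classification_model')
import json
import numpy as np
import joblib
from azureml.core.model import Model

def init():
    # Runs once when the service container starts: load the model into memory
    global model
    model_path = Model.get_model_path('classification_model')
    model = joblib.load(model_path)

def run(raw_data):
    # Runs per request: pass the input to the model and return its output
    data = np.array(json.loads(raw_data)['data'])
    predictions = model.predict(data)
    return json.dumps(predictions.tolist())
```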
172 | 173 | > OBS: Scalability is enabled during training, but once the code is deployed it is flat. Also, it is only supported as an Azure App Service so you keep paying even if it is idle. 174 | 175 | 176 | ### Knowledge Check 177 | 178 | 1. Azure Machine Learning service supports which programming language. 179 | 180 | * R 181 | * Julia 182 | * Python 183 | 184 |

185 | 186 | Answer 187 | 188 |

189 | Python is supported by Azure Machine Learning service. 190 |

191 |
192 | 193 | 1. Azure Databricks is built on which Big Data platform? 194 | 195 | * Azure SQL Data Warehouse 196 | * SQL Server 197 | * Apache Spark 198 | 199 |
200 | 201 | Answer 202 | 203 |

204 | Azure Databricks makes using Spark easier. 205 |

206 |
207 | 208 | 209 | 1. Which is not an operating system available for an Azure Data Science Virtual Machine? 210 | 211 | * Windows 212 | * Linux 213 | * Apple iOS 214 | 215 |
216 | 217 | Answer 218 | 219 |

220 | Data Science VMs running Apple iOS are not available. 221 |

222 |
223 | 224 | 225 |

226 |
227 | 228 | --- -------------------------------------------------------------------------------- /02-build-AI-solutions-with-AMLS/README.md: -------------------------------------------------------------------------------- 1 | # Build AI solutions with Azure Machine Learning service 2 | 3 | ## Introduction to Azure Machine Learning service 4 | 5 |
6 | 7 | Show content 8 | 9 |


### Learning Objectives

* Learn the difference between Azure Machine Learning Studio and Azure Machine Learning service
* See how Azure Machine Learning service fits into the data science process
* Learn the concepts related to an Azure Machine Learning service experiment
* Explore the Azure Machine Learning service pipeline
* Train a model using Azure Machine Learning service

### Azure Machine Learning Service within a data science process

Environment Set Up -> Data Preparation -> Experimentation -> Deployment

* **Environment setup**: the first step is creating a **Workspace**, where you store your ML work. An **Experiment** is created within the workspace to store information about runs for your model. You can have multiple experiments in one workspace. You can interact with the environment from different IDEs such as PyCharm or Azure Notebooks.
* **Data Preparation**: explore, analyze and visualize the sources. You can use any tool. Azure provides the `azureml.dataprep` SDK.
* **Experimentation**: an iterative process of training and testing. With AMLS you can run the model in Azure containers. You need to create and configure a compute target object used to provision compute resources.
* **Deployment**: create a Docker image that will get deployed to Azure Container Instances (you could also choose AKS, Azure IoT or FPGA).

### Create a machine learning experiment

![img](../assets/img/key-components-ml-workspace.png)

* **Workspace**: the top-level resource in AMLS, where you build and deploy your models. With a registered model and scoring scripts you can create an image for deployment. It stores experiment objects which save compute targets, track runs, and hold logs, metrics and outputs.
* **Image**: it has three key components:
    1. A model and scoring script or application
    1. An environment file that declares the dependencies.
    1. A configuration file with the necessary resources to execute the model.
* **Datastore**: an abstraction over an Azure Storage account. Each workspace has a default one, but you can add Blob or File storage containers.
* **Pipeline**: a tool to create and manage workflows during a data science process. Each step can run unattended in different compute targets, which makes it easier to allocate resources.
* **Compute target**: the resource used to run a training script or to host a service deployment. It is attached to a workspace.
* **Deployed Web service**: you can choose between ACI, AKS or FPGA. With the model, script and image files you can create a Web service.
* **IoT module**: it is a Docker container and has the same needs as a Web Service. It enables monitoring of a hosting device.

### Creating a pipeline

Some features of Azure ML pipelines are:
* Schedule tasks and executions,
* You can allocate different compute targets for different steps and coordinate multiple pipelines,
* You can reuse pipeline scripts and customize them,
* You can record and manage input, output, intermediate tasks and data.

### Knowledge Check

1. The Azure Machine Learning service SDK is which of the following?

    * A visual machine learning development portal.
    * A Python package containing functions to use the Azure ML service.
    * A special type of Azure virtual machine.

60 | 61 | Answer 62 | 63 |

64 | The modules provided by the Azure ML SDK provide the functions you need to work with the service in Python. 65 |

66 |
67 | 68 | 1. Which of the following is the underlying technology of the Azure Machine Learning service? 69 | 70 | * Spark 71 | * Hadoop 72 | * Containerization including Docker and Kubernetes 73 | 74 |
75 | 76 | Answer 77 | 78 |

79 | Containerization is a key technology used by the Azure ML service. 80 |

81 |
82 | 83 | 1. Which of the following is not a component of an Azure Machine Learning service workspace image? 84 | 85 | * An R package 86 | * An environment file that declares dependencies that are needed by the model, scoring script or application. 87 | * A model scoring script 88 | 89 |
90 | 91 | Answer 92 | 93 |

94 | R packages are not part of an Azure Machine Learning service workspace image. 95 |

96 |
97 | 98 | 1. Which of the following descriptions accurately describes Azure Machine Learning? 99 | 100 | * A Python library that you can use as an alternative to common machine learning frameworks like Scikit-Learn, PyTorch, and Tensorflow. 101 | * A cloud-based platform for operating machine learning solutions at scale. 102 | * An application for Microsoft Windows that enables you to create machine learning models by using a drag and drop interface. 103 | 104 |
105 | 106 | Answer 107 | 108 |

109 | Cloud based Platform: Azure Machine Learning enables you to manage machine learning model data preparation, training, validation, and deployment. It supports existing frameworks such as Scikit-Learn, PyTorch, and Tensorflow; and provides a cross-platform platform for operationalizing machine learning in the cloud. 110 |

111 |
112 | 113 | 1. Which edition of Azure Machine Learning workspace should you provision if you only plan to use the graphical Designer tool to train machine learning models? 114 | 115 | * Basic 116 | * Enterprise 117 | 118 |
119 | 120 | Answer 121 | 122 |

123 | The visual Designer tool is not available in Basic edition workspaces, so you must create an Enterprise workspace to use it. 124 |

125 |
126 | 127 | 1. You are using the Azure Machine Learning Python SDK to write code for an experiment. You must log metrics from each run of the experiment, and be able to retrieve them easily from each run. What should you do? 128 | 129 | * Add print statements to the experiment code to print the metrics. 130 | * Save the experiment data in the outputs folder. 131 | * Use the log* methods of the Run class to record named metrics. 132 | 133 |
134 | 135 | Answer 136 | 137 |

138 | To record metrics in an experiment run, use the Run.log* methods. 139 |

140 |
141 | 142 | 143 | 144 |

145 |
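Since the last knowledge check above points to the `Run.log*` methods, a minimal sketch of creating an experiment and logging a metric may help tie the concepts together; the workspace config and experiment name are placeholder assumptions:

```python
from azureml.core import Workspace, Experiment

# Minimal sketch: an inline experiment run that logs a named metric
ws = Workspace.from_config()
experiment = Experiment(workspace=ws, name='training_experiment')

run = experiment.start_logging()
run.log('accuracy', 0.92)  # named metric, retrievable for each run
run.complete()

# Retrieve the logged metrics from the run
print(run.get_metrics())
```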
146 | 147 | --- 148 | 149 | ## Train a local ML model with Azure Machine Learning service 150 | 151 |
152 | 153 | Show content 154 | 155 |


### Learning Objectives

* Use an Estimator to run a model training script as an Azure Machine Learning experiment.
* Create reusable, parameterized training scripts.
* Register models, including metadata such as performance metrics.

> As this is a rather practical module, you can refer to the labs notebooks or directly to Azure's docs.

### What is HyperDrive

HyperDrive is a built-in service that automatically launches multiple experiments in parallel, each with a different parameter configuration. Azure Machine Learning then automatically finds the configuration that results in the best performance, as measured by the metric you choose. The service will terminate poorly performing training runs to minimize compute resource usage. (A minimal configuration sketch appears at the end of this module.)

### Azure Machine Learning estimators

In Azure Machine Learning, you can use a **Run Configuration** and a **Script Run Configuration** to run a script-based experiment that trains a machine learning model. However, these configurations may end up being really complex, so another abstraction layer is added: an **Estimator** encapsulates a run configuration and a script configuration in a single object.

There are default Estimators for frameworks such as Scikit-Learn, PyTorch and TensorFlow.

#### Writing a Script to Train a Model

After training a model, it should be saved in the **outputs** directory. For example, with scikit-learn:

```python
import os

from azureml.core import Run
import joblib

# Get the experiment run context
run = Run.get_context()

# Train and test...

# Save the trained model
os.makedirs('outputs', exist_ok=True)
joblib.dump(value=model, filename='outputs/model.pkl')

run.complete()
```

#### Using an Estimator

You can use a generic Estimator class to define a run configuration for a training script like this:

```python
from azureml.train.estimator import Estimator
from azureml.train.sklearn import SKLearn
from azureml.core import Experiment

# Create an estimator
estimator = Estimator(source_directory='experiment_folder',
                      entry_script='training_script.py',
                      compute_target='local',
                      conda_packages=['scikit-learn']
                      )

# Or use a framework specific estimator as
estimator = SKLearn(source_directory='experiment_folder',
                    entry_script='training_script.py',
                    compute_target='local'
                    )

# Create and run an experiment
experiment = Experiment(workspace = ws, name = 'training_experiment')
run = experiment.submit(config=estimator)
```

### Using script parameters

Used to increase the flexibility of script-based experiments.

These parameters are read as usual Python parameters in scripts.
So for example, after setting the `Run`: 227 | 228 | ```python 229 | # Set regularization hyperparameter 230 | parser = argparse.ArgumentParser() 231 | parser.add_argument('--reg_rate', type=float, dest='reg', default=0.01) 232 | args = parser.parse_args() 233 | reg = args.reg 234 | ``` 235 | 236 | To use parameters in **Estimators**, add the `script_params` value as a dict: 237 | 238 | ```python 239 | # Create an estimator 240 | estimator = SKLearn(source_directory='experiment_folder', 241 | entry_script='training_script.py', 242 | script_params = {'--reg_rate': 0.1}, 243 | compute_target='local' 244 | ) 245 | ``` 246 | 247 | ### Registering models 248 | 249 | After running an experiment that trains a model you can use a reference to the Run object to retrieve its outputs, including the trained model. 250 | 251 | #### Retrieving Model Files 252 | 253 | From the `run` object we can get all the files that it generated with `run.get_file_names()` and download the models as (recall how we said that usually those were stored under `outputs/`) 254 | 255 | ```python 256 | run.download_file(name='outputs/model.pkl', output_file_path='model.pkl') 257 | ``` 258 | 259 | #### Registering a Model 260 | 261 | With `Model.register()` we can save different versions of our models: 262 | 263 | ```python 264 | from azureml.core import Model 265 | 266 | model = Model.register(workspace=ws, 267 | model_name='classification_model', 268 | model_path='model.pkl', # local path 269 | description='A classification model', 270 | tags={'dept': 'sales'}, 271 | model_framework=Model.Framework.SCIKITLEARN, 272 | model_framework_version='0.20.3') 273 | ``` 274 | 275 | Or the same by referencing the `run` object: 276 | 277 | ```python 278 | run.register_model( model_name='classification_model', 279 | model_path='outputs/model.pkl', # run outputs path 280 | description='A classification model', 281 | tags={'dept': 'sales'}, 282 | model_framework=Model.Framework.SCIKITLEARN, 283 | model_framework_version='0.20.3') 284 | ``` 285 | 286 | We can then view all the models we saved by using: 287 | 288 | ```python 289 | for model in Model.list(ws): 290 | # Get model name and auto-generated version 291 | print(model.name, 'version:', model.version) 292 | ``` 293 | 294 | ### Knowledge Check 295 | 296 | 1. An Experiment contains which of the following? 297 | 298 | * A composition of a series of runs 299 | * A Docker image 300 | * The data used for model training 301 | 302 | 303 |

304 | 305 | Answer 306 | 307 |

308 | A composition of a series of runs: Azure ML Studio provides a visual drag and drop machine learning development portal but that is a separate offering. 309 |

310 |

1. A run refers to which of the following?

    * Python code for a specific task, such as training a model or tuning hyperparameters. The run does the job of logging metrics and uploading the results to the Azure platform.
    * A set of containers managed by Kubernetes to run your models.
    * A Spark cluster.
322 | 323 | Answer 324 | 325 |

326 | Python code for a specific task such as training a model or tuning hyperparameters. Run does the job of logging metrics and uploading the results to Azure platform. 327 |

328 |
329 | 330 | 331 | 1. A hyperparameter is which of the following? 332 | 333 | * A model parameter that cannot be learned by the model training process. 334 | * A model feature derived from the source data. 335 | * A parameter that automatically and frequently changes value during a single model training run. 336 | 337 | 338 | 339 |
340 | 341 | Answer 342 | 343 |

344 | Hyperparameters control how the model training executes and must be set before model training. 345 |

346 |
347 | 348 | 349 | 1. Before you can train and run experiments in your code, you must do which of the following? 350 | 351 | * Create a virtual machine 352 | * Log out of the Azure portal 353 | * Write a model scoring script 354 | 355 | 356 |
357 | 358 | Answer 359 | 360 |

361 | Your Python script needs to connect to the Azure ML workspace before you can train and run experiments. 362 |

363 |
364 | 365 | 366 | 1. Which of the following is a technique for determining hyperparameter values? 367 | 368 | * grid searching 369 | * Bayesian sampling 370 | * hyper searching 371 | 372 | 373 | 374 |
375 | 376 | Answer 377 | 378 |


Grid searching is often used by data scientists to find the best hyperparameter value.

381 |
382 | 383 | 1. You have written a script that uses the Scikit-Learn framework to train a model. Which framework-specific estimator should you use to run the script as an experiment? 384 | 385 | * PyTorch 386 | * Tensorflow 387 | * SKLearn 388 | 389 | 390 |
391 | 392 | Answer 393 | 394 |

395 | To run a scikit-learn training script as an experiment, use the generic Estimator estimator or a SKLearn estimator. 396 |

397 |
398 | 399 | 400 | 1. You have run an experiment to train a model. You want the model to be stored in the workspace, and available to other experiments and published services. What should you do? 401 | 402 | * Register the model in the workspace. 403 | * Save the model as a file in a Compute Instance. 404 | * Save the experiment script as a notebook. 405 | 406 |
407 | 408 | Answer 409 | 410 |

411 | To store a model in the workspace, register it. 412 |

413 |
414 | 415 |

416 |
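As promised in the HyperDrive section of this module, a minimal configuration sketch follows; it assumes the `estimator` and `experiment` objects defined earlier, a script parameter named `--reg_rate`, and a logged metric named `accuracy`:

```python
from azureml.train.hyperdrive import (HyperDriveConfig, RandomParameterSampling,
                                      BanditPolicy, PrimaryMetricGoal, uniform)

# Randomly sample the regularization rate; stop runs that trail the best one
param_sampling = RandomParameterSampling({'--reg_rate': uniform(0.01, 1.0)})
early_termination = BanditPolicy(evaluation_interval=1, slack_factor=0.1)

hyperdrive_config = HyperDriveConfig(estimator=estimator,
                                     hyperparameter_sampling=param_sampling,
                                     policy=early_termination,
                                     primary_metric_name='accuracy',
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                     max_total_runs=20,
                                     max_concurrent_runs=4)

# Submit like any other experiment; a child run is created per configuration
run = experiment.submit(config=hyperdrive_config)
```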
417 | 418 | --- 419 | 420 | 421 | ## Working with Data in Azure Machine Learning 422 | 423 |
424 | 425 | Show content 426 | 427 |


### Learning objectives

* Create and use datastores
* Create and use datasets

### Introduction to datastores

Abstractions for cloud data sources. They hold the connection information and can be used both to read and to write. The different sources can be (sample from [here](https://docs.microsoft.com/en-us/azure/machine-learning/concept-data#access-data-in-storage)):

* Azure Storage (blob and file containers)
* Azure Data Lake Storage
* Azure SQL Database
* Azure Databricks file system (DBFS)

#### Using datastores

Each workspace has two built-in datastores (a blob container plus an Azure Storage file container) used as system storage by AMLS. You have limited use on top of those.

The good part of using external data sources - which is the usual case - is the ability to share data across multiple experiments, regardless of the compute context in which those experiments are running.

You can use the AMLS SDK to store / retrieve data from the datastores.

#### Registering a datastore

To register a datastore, you can either use the UI in AMLS or the SDK:

```python
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# Register a new datastore
blob_ds = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name='blob_data',
    container_name='data_container',
    account_name='az_store_acct',
    account_key='123456abcde789…'
)
```

#### Managing datastores

Again, managing can be done via the UI or the SDK:

```python
# list
for ds_name in ws.datastores:
    print(ds_name)

# get
blob_store = Datastore.get(ws, datastore_name='blob_data')

# get default
default_store = ws.get_default_datastore()

# set default
ws.set_default_datastore('blob_data')
```

### Use datastores

You can interact directly with a datastore via the SDK and *pass data references* to scripts that need to access data.

> OBS: For blobs to work correctly as a datastore and be accessible in the code to upload / download, the storage account should be Standard / Hot, not Premium!

#### Working directly with a datastore

```python
blob_ds.upload(src_dir='/files',
               target_path='/data/files',
               overwrite=True, show_progress=True)

blob_ds.download(target_path='downloads',
                 prefix='/data',
                 show_progress=True)
```

#### Using data references

When you want to use a datastore in an experiment script, you must pass a data reference to the script. There are the following access modes (a mount sketch follows this list):

* **Download**: contents are downloaded to the compute context.
* **Upload**: the files generated by the experiment are uploaded to the datastore after the run completes.
* **Mount**: when experiments run on a remote compute (not local), you can mount the path.
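As promised above, a minimal sketch of the mount mode, assuming a remote training cluster registered under the placeholder name `aml-cluster`:

```python
# Mount-mode sketch: 'aml-cluster' is a hypothetical remote compute target
data_ref = blob_ds.path('data/files').as_mount()
estimator = SKLearn(source_directory='experiment_folder',
                    entry_script='training_script.py',
                    compute_target='aml-cluster',
                    script_params={'--data_folder': data_ref})
```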

To pass the reference to an experiment script, define the `script_params`:

```python
data_ref = blob_ds.path('data/files').as_download(path_on_compute='training_data')
estimator = SKLearn(source_directory='experiment_folder',
                    entry_script='training_script.py',
                    compute_target='local',
                    script_params = {'--data_folder': data_ref})
```

`script_params` can then be retrieved via `argparse`.

### Introduction to datasets

Datasets are versioned, packaged data objects that can be easily consumed in experiments and pipelines. They are the recommended way to work with data.

Datasets can be based on files in a datastore or on URLs and other resources.

#### Types of dataset

* **Tabular**: useful when we work, for example, with pandas.
* **File**: for unstructured data. The dataset will present a list of paths that can be read as though from the file system. For example, for images in a CNN.

#### Creating and registering datasets

You can use the UI or the SDK to create datasets from files or paths (which can include `*` wildcards).

##### Creating and registering tabular datasets

```python
from azureml.core import Dataset

blob_ds = ws.get_default_datastore()
csv_paths = [(blob_ds, 'data/files/current_data.csv'),
             (blob_ds, 'data/files/archive/*.csv')]
tab_ds = Dataset.Tabular.from_delimited_files(path=csv_paths)
tab_ds = tab_ds.register(workspace=ws, name='csv_table')
```

##### Creating and registering file datasets

```python
from azureml.core import Dataset

blob_ds = ws.get_default_datastore()
file_ds = Dataset.File.from_files(path=(blob_ds, 'data/files/images/*.jpg'))
file_ds = file_ds.register(workspace=ws, name='img_files')
```

#### Retrieving a registered dataset

You can retrieve datasets through the `datasets` attribute of a `Workspace`, or by calling `get_by_name` or `get_by_id` on the `Dataset` class:

```python
import azureml.core
from azureml.core import Workspace, Dataset

# Load the workspace from the saved config file
ws = Workspace.from_config()

# Get a dataset from the workspace datasets collection
ds1 = ws.datasets['csv_table']

# Get a dataset by name from the datasets class
ds2 = Dataset.get_by_name(ws, 'img_files')
```

#### Dataset versioning

Useful to reproduce experiments with data in the same state. Use the `create_new_version` flag when registering a dataset:

```python
img_paths = [(blob_ds, 'data/files/images/*.jpg'),
             (blob_ds, 'data/files/images/*.png')]
file_ds = Dataset.File.from_files(path=img_paths)
file_ds = file_ds.register(workspace=ws, name='img_files', create_new_version=True)
```

To retrieve a specific version:

```python
img_ds = Dataset.get_by_name(workspace=ws, name='img_files', version=2)
```

### Use datasets

You can read data directly from a dataset, or you can pass a dataset as a named input to a script configuration or estimator.

#### Working with a dataset directly

If you have a reference to a dataset, you can access its contents directly.

```python
df = tab_ds.to_pandas_dataframe()
```

When working with a file dataset, use `to_path()`:

```python
for file_path in file_ds.to_path():
    print(file_path)
```

#### Passing a dataset to an experiment script

When you need to access a dataset in an experiment script, you can pass the dataset as an input to a **ScriptRunConfig** or an **Estimator**:

```python
estimator = SKLearn( source_directory='experiment_folder',
                     entry_script='training_script.py',
                     compute_target='local',
                     inputs=[tab_ds.as_named_input('csv_data')],
                     pip_packages=['azureml-dataprep[pandas]'])
```

Since the script will need to work with a **Dataset** object, you must include either the full **azureml-sdk** package or the **azureml-dataprep** package with the **pandas** extra library in the script's compute environment.

Then, in the experiment:

```python
run = Run.get_context()
data = run.input_datasets['csv_data'].to_pandas_dataframe()
```

Finally, when passing a file dataset, you must specify the access mode:

```python
estimator = Estimator( source_directory='experiment_folder',
                       entry_script='training_script.py',
                       compute_target='local',
                       inputs=[img_ds.as_named_input('img_data').as_download(path_on_compute='data')],
                       pip_packages=['azureml-dataprep[pandas]'])
```

### Knowledge Check

1. You've uploaded some data files to a folder in a blob container, and registered the blob container as a datastore in your Azure Machine Learning workspace. You want to run a script as an experiment that loads the data files and trains a model. What should you do?

    * Save the experiment script in the same blob folder as the data files.
    * Create a data reference for the datastore location and pass it to the script as a parameter.
    * Create global variables for the Azure Storage account name and key in the experiment script.

658 | 659 | Answer 660 | 661 |

662 | To access a path in a datastore in an experiment script, you must create a data reference and pass it to the script as a parameter. The script can then read data from the data reference parameter just like a local file path. 663 |

664 |
665 | 666 | 1. You've registered a dataset in your workspace. You want to use the dataset in an experiment script that is run using an estimator. What should you do? 667 | 668 | * Pass the dataset as a named input to the estimator. 669 | * Create a data reference for the datastore location where the dataset data is stored, and pass it to the script as a parameter. 670 | * Use the dataset to save the data as a CSV file in the experiment script folder before running the experiment. 671 | 672 |
673 | 674 | Answer 675 | 676 |

677 | To access a dataset in an experiment script, pass the dataset as a named input to the estimator. 678 |

679 |
680 | 681 |

682 |
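An end-to-end recap sketch tying this module together, assuming the workspace's default datastore and a local folder `./data` containing CSV files (both placeholders):

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
blob_ds = ws.get_default_datastore()

# Upload local CSVs to the datastore, then register them as a tabular dataset
blob_ds.upload(src_dir='./data', target_path='/data/files', overwrite=True)
tab_ds = Dataset.Tabular.from_delimited_files(path=(blob_ds, 'data/files/*.csv'))
tab_ds = tab_ds.register(workspace=ws, name='csv_table', create_new_version=True)

# Quick check: load a few rows
print(tab_ds.to_pandas_dataframe().head())
```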
683 | 684 | --- 685 | 686 | 687 | ## Working with Compute Contexts in Azure Machine Learning 688 | 689 |
690 | 691 | Show content 692 | 693 |


### Learning objectives

* Create and use environments.
* Create and use compute targets.

### Introduction to environments

Python code runs in the context of a virtual environment that defines the version of the Python runtime to be used, as well as the installed packages available to the code.

#### Environments in Azure Machine Learning

In general, AML handles environment creation, package installation and environment registration for you - usually through the creation of Docker containers. You just need to specify the packages you want. You can also manage the environments yourself if needed.

Environments are encapsulated by the **Environment** class, which you can use to create environments and specify the runtime configuration for an experiment.

#### Creating environments

* **Creating an environment from a specification file**: based on conda or pip. For example, a file named **conda.yml**:

```
name: py_env
dependencies:
  - numpy
  - pandas
  - scikit-learn
  - pip:
    - azureml-defaults
```

Then, create the environment with the SDK:

```python
from azureml.core import Environment

env = Environment.from_conda_specification(name='training_environment',
                                           file_path='./conda.yml')
```

* **Creating an environment from an existing Conda environment**: if you already have a Conda environment defined on the workstation, you can reuse it in AML:

```python
from azureml.core import Environment

env = Environment.from_existing_conda_environment(name='training_environment',
                                                  conda_environment_name='py_env')
```

* **Creating an environment by specifying packages**: using a **CondaDependencies** object:

```python
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

env = Environment('training_environment')
deps = CondaDependencies.create(conda_packages=['scikit-learn','pandas','numpy'],
                                pip_packages=['azureml-defaults'])
env.python.conda_dependencies = deps
```

#### Registering and reusing environments

After you've created an environment, you can register it in your workspace and reuse it for future experiments that have the same Python dependencies.

Register it via `env.register(workspace=ws)` and get the registered environments in a workspace using `Environment.list(workspace=ws)`.

#### Retrieving and using an environment

You can retrieve an environment and assign it to an **Estimator** or a **ScriptRunConfig**:

```python
from azureml.core import Environment, Estimator

training_env = Environment.get(workspace=ws, name='training_environment')
estimator = Estimator(source_directory='experiment_folder',
                      entry_script='training_script.py',
                      compute_target='local',
                      environment_definition=training_env)
```

> OBS: When an experiment based on the estimator is run, Azure Machine Learning will look for an existing environment that matches the definition, and if none is found a new environment will be created based on the registered environment specification.

### Introduction to compute targets

Compute targets are physical or virtual computers on which experiments are run.
You can assign experiments to specific compute targets. This means that you can test on cheaper ones and run individual processes on GPUs if needed.

You pay by use, since compute targets:

* Start on-demand and stop automatically when no longer required.
* Scale automatically based on workload processing needs (for model training).

#### Types of compute

* **Local compute**: great for test and development. The experiment will run where the code is initiated, e.g., your own computer or a VM with Jupyter on top.
* **Training clusters**: multi-node clusters of VMs that automatically scale up or down to meet demand for training workloads. Useful when working with large data or when parallel processing is needed.
* **Inference clusters**: to deploy trained models as production services. They use containerization to enable rapid initialization of compute for on-demand inferencing.
* **Attached compute**: you can attach another Azure-based compute environment to AML, such as another VM or a Databricks cluster. These can be used for certain types of workload.

More info [here](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target).

### Create compute targets

Can be done via the UI or the SDK. The UI is the most common.

#### Creating a managed compute target with the SDK

Managed targets are managed by AML, e.g., a training cluster.

```python
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, AmlCompute

# Load the workspace from the saved config file
ws = Workspace.from_config()

# Specify a name for the compute (unique within the workspace)
compute_name = 'aml-cluster'

# Define compute configuration
compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',
                                                       min_nodes=0, max_nodes=4,
                                                       vm_priority='dedicated')

# Create the compute
aml_cluster = ComputeTarget.create(ws, compute_name, compute_config)
aml_cluster.wait_for_completion(show_output=True)
```

> Priority can be **dedicated**, to use the cluster exclusively, or **low priority**, for a lower cost but with the possibility of being preempted.

#### Attaching an unmanaged compute target with the SDK

Unmanaged instances are defined and managed outside of AML, e.g., a VM or a Databricks cluster.

```python
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, DatabricksCompute

# Load the workspace from the saved config file
ws = Workspace.from_config()

# Specify a name for the compute (unique within the workspace)
compute_name = 'db_cluster'

# Define configuration for existing Azure Databricks cluster
db_workspace_name = 'db_workspace'
db_resource_group = 'db_resource_group'
db_access_token = '1234-abc-5678-defg-90...'
db_config = DatabricksCompute.attach_configuration(resource_group=db_resource_group,
                                                   workspace_name=db_workspace_name,
                                                   access_token=db_access_token)

# Attach the compute
databricks_compute = ComputeTarget.attach(ws, compute_name, db_config)
databricks_compute.wait_for_completion(True)
```

#### Checking for an existing compute target

You can check whether a compute target already exists, and create it only if it doesn't:

```python
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

compute_name = "aml-cluster"

# Check if the compute target exists
try:
    aml_cluster = ComputeTarget(workspace=ws, name=compute_name)
    print('Found existing cluster.')
except ComputeTargetException:
    # If not, create it
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',
                                                           max_nodes=4)
    aml_cluster = ComputeTarget.create(ws, compute_name, compute_config)

aml_cluster.wait_for_completion(show_output=True)
```

More info [here](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets).

### Use compute targets

You can use them to run specific workloads:

```python
from azureml.core import Environment
from azureml.train.estimator import Estimator

compute_name = 'aml-cluster'

training_env = Environment.get(workspace=ws, name='training_environment')

estimator = Estimator(source_directory='experiment_folder',
                      entry_script='training_script.py',
                      environment_definition=training_env,
                      compute_target=compute_name)
```

> OBS: When an experiment for the estimator is submitted, the run will be queued while the compute target is started and the specified environment is deployed to it; then the run will be processed on the compute environment.

Instead of working by name, you can also pass a **ComputeTarget** object:

```python
from azureml.core import Environment
from azureml.core.compute import ComputeTarget
from azureml.train.estimator import Estimator

compute_name = 'aml-cluster'
training_cluster = ComputeTarget(workspace=ws, name=compute_name)

training_env = Environment.get(workspace=ws, name='training_environment')

estimator = Estimator(source_directory='experiment_folder',
                      entry_script='training_script.py',
                      environment_definition=training_env,
                      compute_target=training_cluster)
```

### Knowledge Check

1. You're using the Azure Machine Learning Python SDK to run experiments. You need to create an environment from a Conda configuration (.yml) file. Which method of the Environment class should you use?

* create
* create_from_conda_specification
* create_from_existing_conda_environment

**Answer**: Use the create_from_conda_specification method to create an environment from a configuration file. The create method requires you to explicitly specify conda and pip packages, and the create_from_existing_conda_environment method requires an existing environment on the computer.
2. You must create a compute target for training experiments that require a graphical processing unit (GPU). You want to be able to scale the compute so that multiple nodes are started automatically as required. Which kind of compute target should you create?

* Compute Instance
* Training Cluster
* Inference Cluster
**Answer**: Use a training cluster to create multiple nodes of GPU-enabled VMs that are started automatically as needed.
---

## Orchestrating machine learning with pipelines

### Learning objectives

* Create an Azure Machine Learning pipeline.
* Publish an Azure Machine Learning pipeline.
* Schedule an Azure Machine Learning pipeline.

### Introduction to pipelines

A pipeline is a workflow of machine learning tasks in which each task is implemented as a step. Steps can be sequential or parallel, and you can choose a specific compute target for each of them to run on.

A pipeline can be executed as a process by running the pipeline as an experiment.

Pipelines can be triggered via a scheduler or through a REST endpoint.

#### Pipeline steps

There are different types of steps:

* **PythonScriptStep**: runs a specific Python script.
* **EstimatorStep**: runs an estimator.
* **DataTransferStep**: uses Azure Data Factory to copy data between data stores.
* **DatabricksStep**: runs a notebook, script or compiled JAR on Databricks.
* **AdlaStep**: runs a U-SQL job in Azure Data Lake Analytics.

You can find the full list [here](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps?view=azure-ml-py).

#### Defining steps in a pipeline

First, you define the steps and then assemble the pipeline based on those:

```python
from azureml.pipeline.steps import PythonScriptStep, EstimatorStep

# Step to run a Python script
step1 = PythonScriptStep(name = 'prepare data',
                         source_directory = 'scripts',
                         script_name = 'data_prep.py',
                         compute_target = 'aml-cluster',
                         runconfig = run_config)

# Step to run an estimator
step2 = EstimatorStep(name = 'train model',
                      estimator = sk_estimator,
                      compute_target = 'aml-cluster')

from azureml.pipeline.core import Pipeline
from azureml.core import Experiment

# Construct the pipeline
train_pipeline = Pipeline(workspace = ws, steps = [step1, step2])

# Create an experiment and run the pipeline
experiment = Experiment(workspace = ws, name = 'training-pipeline')
pipeline_run = experiment.submit(train_pipeline)
```

### Pass data between pipeline steps

It is not unusual to have steps that depend on the results of previous steps.

#### The PipelineData object

The **PipelineData** object is a special kind of **DataReference** that:

* References a location in a datastore.
* Creates a data dependency between pipeline steps.

It is an intermediary store between two subsequent steps: `step1 -> PipelineData -> step2`.

#### PipelineData step inputs and outputs

To use a **PipelineData** object you must:

1. Define a named **PipelineData** object that references a location in a datastore.
2. Configure the input / output of the steps that use it.
3. Pass the **PipelineData** object as a script parameter in steps that run scripts (and add the `argparse` handling in those scripts, as we do with usual data references - see the sketch below).
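
For the script side, a minimal sketch of the `argparse` handling (the output file name here is illustrative):

```python
# data_prep.py (sketch): receive the PipelineData location via '--folder'
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--folder', type=str, dest='folder')
args = parser.parse_args()

# Write the prepared data into the PipelineData folder
os.makedirs(args.folder, exist_ok=True)
output_path = os.path.join(args.folder, 'prepped.csv')
# ... prepare the data and save it to output_path ...
```

The pipeline-side definition of both steps: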
```python
from azureml.core import Dataset
from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import PythonScriptStep, EstimatorStep

# Get a dataset for the initial data
raw_ds = Dataset.get_by_name(ws, 'raw_dataset')

# Define a PipelineData object to pass data between steps
data_store = ws.get_default_datastore()
prepped_data = PipelineData('prepped', datastore=data_store)

# Step to run a Python script
step1 = PythonScriptStep(name = 'prepare data',
                         source_directory = 'scripts',
                         script_name = 'data_prep.py',
                         compute_target = 'aml-cluster',
                         runconfig = run_config,
                         # Specify dataset as initial input
                         inputs=[raw_ds.as_named_input('raw_data')],
                         # Specify PipelineData as output
                         outputs=[prepped_data],
                         # Also pass as data reference to script
                         arguments = ['--folder', prepped_data])

# Step to run an estimator
step2 = EstimatorStep(name = 'train model',
                      estimator = sk_estimator,
                      compute_target = 'aml-cluster',
                      # Specify PipelineData as input
                      inputs=[prepped_data],
                      # Pass as data reference to estimator script
                      estimator_entry_script_arguments=['--folder', prepped_data])
```

### Reuse pipeline steps

AML includes caching and reuse features to reduce the time it takes to run some steps.

#### Managing step output reuse

By default, the step output from a previous pipeline run is reused without rerunning the step. This is useful if the scripts, source directories and step settings have not changed at all; otherwise it may lead to stale results.

To control reuse for an individual step, use the `allow_reuse` parameter:

```python
step1 = PythonScriptStep(name = 'prepare data',
                         ...
                         # Disable step reuse
                         allow_reuse = False)
```

#### Forcing all steps to run

You can force all steps to run regardless of individual reuse settings by setting the `regenerate_outputs` parameter at submission time:

```python
pipeline_run = experiment.submit(train_pipeline, regenerate_outputs=True)
```

### Publish pipelines

After you have created a pipeline, you can publish it to create a REST endpoint through which the pipeline can be run on demand.

```python
published_pipeline = pipeline.publish(name='training_pipeline',
                                      description='Model training pipeline',
                                      version='1.0')
```

You can also publish the pipeline on a successful run:

```python
# Get the most recent run of the pipeline
pipeline_experiment = ws.experiments.get('training-pipeline')
run = list(pipeline_experiment.get_runs())[0]

# Publish the pipeline from the run
published_pipeline = run.publish_pipeline(name='training_pipeline',
                                          description='Model training pipeline',
                                          version='1.0')
```

To get the endpoint:

```python
rest_endpoint = published_pipeline.endpoint
print(rest_endpoint)
```

#### Using a published pipeline

To use the endpoint, you need an authorization header with a bearer token from a service principal (or an authenticated user session) that has permission to run the pipeline.
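
As a minimal sketch (assuming an interactive login for simplicity; a service principal works the same way through `ServicePrincipalAuthentication`), the header can be built like this:

```python
from azureml.core.authentication import InteractiveLoginAuthentication

# Get an authorization header for the currently signed-in user
interactive_auth = InteractiveLoginAuthentication()
auth_header = interactive_auth.get_authentication_header()
```

With the header in place, the endpoint can be called: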
```python
import requests

response = requests.post(rest_endpoint,
                         headers=auth_header,
                         json={"ExperimentName": "run_training_pipeline"})
run_id = response.json()["Id"]
print(run_id)
```

### Use pipeline parameters

To define parameters for a pipeline, create a **PipelineParameter** object for each parameter, and specify each parameter in at least one step.

```python
from azureml.pipeline.core.graph import PipelineParameter

reg_param = PipelineParameter(name='reg_rate', default_value=0.01)

...

step2 = EstimatorStep(name = 'train model',
                      estimator = sk_estimator,
                      compute_target = 'aml-cluster',
                      inputs=[prepped],
                      estimator_entry_script_arguments=['--folder', prepped,
                                                        '--reg', reg_param])
```

> OBS: You must define parameters for a pipeline before publishing it.

#### Running a pipeline with a parameter

After publishing a pipeline with a parameter, you can specify it in the JSON payload of the REST call:

```python
response = requests.post(rest_endpoint,
                         headers=auth_header,
                         json={"ExperimentName": "run_training_pipeline",
                               "ParameterAssignments": {"reg_rate": 0.1}})
```

### Schedule pipelines

#### Scheduling a pipeline for periodic intervals

To schedule a pipeline to run at periodic intervals, you must define a **ScheduleRecurrence** that determines the run frequency, and use it to create a **Schedule**.

```python
from azureml.pipeline.core import ScheduleRecurrence, Schedule

daily = ScheduleRecurrence(frequency='Day', interval=1)
pipeline_schedule = Schedule.create(ws, name='Daily Training',
                                    description='trains model every day',
                                    pipeline_id=published_pipeline.id,
                                    experiment_name='Training_Pipeline',
                                    # daily schedule
                                    recurrence=daily)
```

#### Triggering a pipeline run on data changes

You can also monitor a specified path on a datastore; any change there triggers a new run.

```python
from azureml.core import Datastore
from azureml.pipeline.core import Schedule

training_datastore = Datastore(workspace=ws, name='blob_data')
pipeline_schedule = Schedule.create(ws, name='Reactive Training',
                                    description='trains model on data change',
                                    pipeline_id=published_pipeline_id,
                                    experiment_name='Training_Pipeline',
                                    datastore=training_datastore,
                                    path_on_datastore='data/training')
```

### Knowledge Check

1. You're creating a pipeline that includes two steps. Step 1 preprocesses some data, and step 2 uses the preprocessed data to train a model. What type of object should you use to pass data from step 1 to step 2 and create a dependency between these steps?

* Datastore
* PipelineData
* Data Reference

**Answer**: To pass data between steps in a pipeline, use a PipelineData object.
2. You've published a pipeline that you want to run every week. You plan to use the Schedule.create method to create the schedule. What kind of object must you create first to configure how frequently the pipeline runs?

* Datastore
* PipelineParameter
* ScheduleRecurrence
**Answer**: You need a ScheduleRecurrence object to create a schedule that runs at a regular interval.
---

## Deploying machine learning models with Azure Machine Learning

### Learning objectives

* Deploy a model as a real-time inferencing service.
* Consume a real-time inferencing service.
* Troubleshoot service deployment.

### Deploying a model as a real-time service

You can deploy a model as a real-time web service to several kinds of compute target:

* Local compute
* Azure ML compute instance
* Azure Container Instance (ACI)
* AKS
* Azure Function
* IoT module

AML uses containers for model packaging and deployment.

#### 1. Register a trained model

After a successful training run, you first need to register the model.

To register from a local file:

```python
from azureml.core import Model

classification_model = Model.register(workspace=ws,
                                      model_name='classification_model',
                                      model_path='model.pkl', # local path
                                      description='A classification model')
```

Or register from the **Run** used to train the model:

```python
run.register_model(model_name='classification_model',
                   model_path='outputs/model.pkl', # run outputs path
                   description='A classification model')
```

#### 2. Define an Inference Configuration

The model will be deployed as a service that consists of:

* A script to load the model and return predictions for submitted data.
* An environment in which the script will be run.

##### Creating an Entry Script (or scoring script)

It is a Python (.py) file that must contain two functions:

* `init()`: called when the service is initialized.
* `run(raw_data)`: called when new data is submitted to the service.

```python
import json
import joblib
import numpy as np
from azureml.core.model import Model

# Called when the service is loaded
def init():
    global model
    # Get the path to the registered model file and load it
    model_path = Model.get_model_path('classification_model')
    model = joblib.load(model_path)

# Called when a request is received
def run(raw_data):
    # Get the input data as a numpy array
    data = np.array(json.loads(raw_data)['data'])
    # Get a prediction from the model
    predictions = model.predict(data)
    # Return the predictions as any JSON serializable format
    return predictions.tolist()
```

##### Creating an Environment

You can use **CondaDependencies**:

```python
from azureml.core.conda_dependencies import CondaDependencies

# Add the dependencies for your model
myenv = CondaDependencies()
myenv.add_conda_package("scikit-learn")

# Save the environment config as a .yml file
env_file = 'service_files/env.yml'
with open(env_file,"w") as f:
    f.write(myenv.serialize_to_string())
print("Saved dependency info in", env_file)
```

##### Combining the Script and Environment in an InferenceConfig

```python
from azureml.core.model import InferenceConfig

classifier_inference_config = InferenceConfig(runtime= "python",
                                              source_directory = 'service_files',
                                              entry_script="score.py",
                                              conda_file="env.yml")
```
#### 3. Define a Deployment Configuration

Now, select the compute target to deploy to.

> OBS: if deploying to AKS, create the cluster and a compute target for it before deploying.

```python
from azureml.core.compute import ComputeTarget, AksCompute

cluster_name = 'aks-cluster'
compute_config = AksCompute.provisioning_configuration(location='eastus')
production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
production_cluster.wait_for_completion(show_output=True)
```

With the compute target created, define the deployment config:

```python
from azureml.core.webservice import AksWebservice

classifier_deploy_config = AksWebservice.deploy_configuration(cpu_cores = 1,
                                                              memory_gb = 1)
```

The code to configure an ACI deployment is similar, except that you do not need to explicitly create an ACI compute target, and you must use the deploy_configuration method from the **azureml.core.webservice.AciWebservice** namespace. Similarly, you can use the **azureml.core.webservice.LocalWebservice** namespace to configure a local Docker-based service.
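
For example, a minimal ACI sketch might look like this:

```python
from azureml.core.webservice import AciWebservice

# ACI deployment configuration - no pre-created compute target is needed
classifier_deploy_config = AciWebservice.deploy_configuration(cpu_cores=1,
                                                              memory_gb=1)
```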
#### 4. Deploy the Model

```python
from azureml.core.model import Model

model = ws.models['classification_model']
service = Model.deploy(workspace=ws,
                       name = 'classifier-service',
                       models = [model],
                       inference_config = classifier_inference_config,
                       deployment_config = classifier_deploy_config,
                       deployment_target = production_cluster)
service.wait_for_deployment(show_output = True)
```

For ACI or local services, you can omit the deployment_target parameter (or set it to None).

### Consuming a real-time inferencing service

#### Using the Azure Machine Learning SDK

For testing, you can use the AML SDK:

```python
import json

# An array of new data cases
x_new = [[0.1,2.3,4.1,2.0],
         [0.2,1.8,3.9,2.1]]

# Convert the array to a serializable list in a JSON document
json_data = json.dumps({"data": x_new})

# Call the web service, passing the input data
response = service.run(input_data = json_data)

# Get the predictions
predictions = json.loads(response)

# Print the predicted class for each case.
for i in range(len(x_new)):
    print(x_new[i], predictions[i])
```

#### Using a REST Endpoint

You can retrieve the service endpoint via the UI or the SDK:

```python
endpoint = service.scoring_uri
print(endpoint)
```

```python
import requests
import json

# An array of new data cases
x_new = [[0.1,2.3,4.1,2.0],
         [0.2,1.8,3.9,2.1]]

# Convert the array to a serializable list in a JSON document
json_data = json.dumps({"data": x_new})

# Set the content type in the request headers
request_headers = { 'Content-Type':'application/json' }

# Call the service
response = requests.post(url = endpoint,
                         data = json_data,
                         headers = request_headers)

# Get the predictions from the JSON response
predictions = json.loads(response.json())

# Print the predicted class for each case.
for i in range(len(x_new)):
    print(x_new[i], predictions[i])
```

#### Authentication

There are two kinds of authentication:

* **Key**: requests are authenticated by specifying the key associated with the service.
* **Token**: requests are authenticated by providing a JSON Web Token (JWT).

> OBS: By default, authentication is disabled for ACI services, and set to key-based authentication for AKS services (for which primary and secondary keys are automatically generated). You can optionally configure an AKS service to use token-based authentication (which is not supported for ACI services).

You can retrieve the keys for a **WebService** as:

```python
primary_key, secondary_key = service.get_keys()
```

To use a token, the application needs to use service-principal authentication to verify its identity through Azure Active Directory and call the **get_token** method to create a time-limited token.

```python
import requests
import json

# An array of new data cases
x_new = [[0.1,2.3,4.1,2.0],
         [0.2,1.8,3.9,2.1]]

# Convert the array to a serializable list in a JSON document
json_data = json.dumps({"data": x_new})

# Set the content type and authorization in the request headers
request_headers = { "Content-Type":"application/json",
                    "Authorization":"Bearer " + key_or_token }

# Call the service
response = requests.post(url = endpoint,
                         data = json_data,
                         headers = request_headers)

# Get the predictions from the JSON response
predictions = json.loads(response.json())

# Print the predicted class for each case.
for i in range(len(x_new)):
    print(x_new[i], predictions[i])
```

### Troubleshooting service deployment

#### Check the Service State

```python
from azureml.core.webservice import AksWebservice

# Get the deployed service
service = AksWebservice(name='classifier-service', workspace=ws)

# Check its state
print(service.state)
```

> OBS: To view the state of a service, you must use the compute-specific service type (for example AksWebservice) and not a generic WebService object.

#### Review Service Logs

```python
print(service.get_logs())
```

#### Deploy to a Local Container

A quick check for runtime errors can be done by deploying to a local container:

```python
from azureml.core.webservice import LocalWebservice

deployment_config = LocalWebservice.deploy_configuration(port=8890)
service = Model.deploy(ws, 'test-svc', [model], inference_config, deployment_config)
```

You can then test the locally deployed service using the SDK (`service.run(input_data = json_data)`) and troubleshoot runtime issues by making changes to the scoring file and reloading the service without redeploying (this can ONLY be done with a local service):

```python
service.reload()
print(service.run(input_data = json_data))
```
### Knowledge Check

1. You've trained a model using the Python SDK for Azure Machine Learning. You want to deploy the model as a containerized real-time service with high scalability and security. What kind of compute should you create to host the service?

* An Azure Kubernetes Services (AKS) inferencing cluster.
* A compute instance with GPUs.
* A training cluster with multiple nodes.

**Answer**: You should use an AKS cluster to deploy a model as a scalable, secure, containerized service.
2. You're deploying a model as a real-time inferencing service. What functions must the entry script for the service include?

* main() and score()
* base() and train()
* init() and run()
**Answer**: You must implement init and run functions in the entry (scoring) script.
---

## Automate machine learning model selection with Azure Machine Learning

### Learning objectives

> OBS: Azure Machine Learning includes support for automated machine learning through a visual interface in Azure Machine Learning studio for Enterprise edition workspaces only. The SDK is enabled in both Basic and Enterprise editions.

* Use Azure Machine Learning's automated machine learning capabilities to determine the best performing algorithm for your data.
* Use automated machine learning to preprocess data for training.
* Run an automated machine learning experiment.

### Automated machine learning tasks and algorithms

You can automate classification, regression and time series forecasting.

There is a long [list](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-define-task-type) of supported algorithms for each task. By default, automated ML will randomly select from the full range of algorithms, but you can block individual algorithms.

### Preprocessing and featurization

As well as trying different algorithms, automated ML can also apply preprocessing to the data to improve performance:

* **Scaling and Normalization**: applied automatically to prevent any large numeric feature from dominating training.
* **Optional Featurization**: you can choose to apply preprocessing such as:
  * Missing value imputation
  * Categorical encoding
  * Dropping high-cardinality features (such as IDs)
  * Feature engineering (e.g., deriving individual date parts from DateTime features)

More information [here](https://docs.microsoft.com/en-us/azure/machine-learning/concept-automated-ml#preprocessing).

### Running automated machine learning experiments

You can use the UI (Enterprise) or the SDK.

#### Configuring an Automated Machine Learning Experiment

With the SDK you have greater flexibility, and you can set experiment options using the **AutoMLConfig** class:

```python
from azureml.core.runconfig import RunConfiguration
from azureml.train.automl import AutoMLConfig

automl_run_config = RunConfiguration(framework='python')
automl_config = AutoMLConfig(name='Automated ML Experiment',
                             task='classification',
                             primary_metric = 'AUC_weighted',
                             compute_target=aml_compute,
                             training_data = train_dataset,
                             validation_data = test_dataset,
                             label_column_name='Label',
                             featurization='auto',
                             iterations=12,
                             max_concurrent_iterations=4)
```

#### Specifying Data for Training

With the UI, you can just select the training **dataset**. With the SDK, you can submit the data in the following ways:

* Specify a dataset or dataframe of training data that includes features and the label to be predicted.
* Specify a dataset, dataframe, or numpy array of X values containing the training features, with a corresponding y array of label values.

In both cases, you can optionally specify a validation dataset that will be used to validate the model. If it is not provided, cross-validation will be applied.
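
As a sketch, the number of folds can be controlled with the `n_cross_validations` setting (the values here are illustrative):

```python
from azureml.train.automl import AutoMLConfig

# No validation_data supplied, so automated ML cross-validates with 5 folds
automl_config = AutoMLConfig(task='classification',
                             training_data=train_dataset,
                             label_column_name='Label',
                             primary_metric='AUC_weighted',
                             n_cross_validations=5,
                             compute_target=aml_compute)
```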
#### Specifying the Primary Metric

This is one of the most important settings. You can get all the valid metrics for a particular task as follows:

```python
from azureml.train.automl.utilities import get_primary_metrics

get_primary_metrics('classification')
```

You can find a full list of primary metrics [here](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml).

#### Submitting an Automated Machine Learning Experiment

Automated ML experiments are submitted like any other experiment with the SDK:

```python
from azureml.core.experiment import Experiment

automl_experiment = Experiment(ws, 'automl_experiment')
automl_run = automl_experiment.submit(automl_config)
```

You can monitor the runs in AML Studio or in the Jupyter Notebooks **RunDetails** widget.

#### Retrieving the Best Run and its Model

```python
best_run, fitted_model = automl_run.get_output()
best_run_metrics = best_run.get_metrics()
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)
```

#### Exploring Preprocessing Steps

Automated ML uses scikit-learn pipelines to encapsulate the preprocessing steps. You can view those steps in the fitted model obtained from the best run as shown below:

```python
for step_ in fitted_model.named_steps:
    print(step_)
```

### Knowledge Check

1. You are using automated machine learning to train a model that predicts the species of an iris based on its petal and sepal measurements. Which kind of task should you specify for automated machine learning?

* Regression
* Forecasting
* Classification

**Answer**: Predicting a class requires a classification task.
2. You have submitted an automated machine learning run using the Python SDK for Azure Machine Learning. When the run completes, which method of the run object should you use to retrieve the best model?

* get_output()
* load_model()
* get_metrics()
**Answer**: The get_output method of an automated machine learning run returns the best model and the child run that trained it.
---