├── Instructor └── README.md ├── Predictive maintenance NASA sample.ipynb ├── README.md ├── auto-ml-forecasting-energy-demand-end2end.ipynb ├── configuration.ipynb ├── nyc_energy.csv └── train_FD001.txt /Instructor/README.md: -------------------------------------------------------------------------------- 1 | # Instructor notes for automated ML workshop 2 | 3 | Recipe for creating and conducting an automated ML workshop using Azure Notebooks backed by DSVM compute (the DSVM hosts both the Jupyter server and the training compute) 4 | 5 | ## Preparations 6 | 7 | ### Prepare DSVM (if using any in the workshop) 8 | 9 | 1. Go to the Azure portal and create a Data Science Virtual Machine (with enough cores to serve the audience). Use username and password for admin access and write them down. Recommended values: user: automl , pass: AutoML123!@# 10 | 1. Make sure port 8000 is open for all incoming traffic (Networking tab) 11 | 1. Once the DSVM is running, log on to the console and update the Azure ML SDK to the latest version using the following commands: 12 | sudo -i /anaconda/envs/py36/bin/pip install --upgrade azureml-sdk 13 | sudo -i /anaconda/envs/py36/bin/pip install --upgrade azureml-sdk[automl,notebooks,explain]
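To verify that the upgrade took effect, print the installed SDK version (a quick check, assuming the same py36 environment as the commands above):

    sudo -i /anaconda/envs/py36/bin/python -c "import azureml.core; print(azureml.core.VERSION)"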
Go over the "configuration.ipynb" notebook 56 | + If using free compute, users will be prompted to open a login window and enter a code 57 | 58 | #### Energy demand forecasting notebook 59 | 60 | 1. Go over the "auto-ml-forecasting-energy-demand-end2end.ipynb" notebook (until "Deploy" section) 61 | 62 | #### Deployment 63 | 1. go over the "Deploy" section of the notebook 64 | 65 | ### Consumption 66 | TBD 67 | 68 | #### Consume from Power BI 69 | TBD 70 | 71 | ## Troubleshooting 72 | + Issue: Jupyter server will not run, always back to "Stopped" mode 73 | Fix: Log out from Azure Notebooks and log back in 74 | -------------------------------------------------------------------------------- /Predictive maintenance NASA sample.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "metadata": {}, 5 | "cell_type": "markdown", 6 | "source": "# Microsoft Azure ML Automated Machine Learning\n![alt text](https://www.nasa.gov/sites/all/themes/custom/nasatwo/images/nasa-logo.svg \"NASA Ames\")\n## _NASA Predictive Maintenance Sample_ " 7 | }, 8 | { 9 | "metadata": {}, 10 | "cell_type": "markdown", 11 | "source": "## Purpose and Challenge" 12 | }, 13 | { 14 | "metadata": {}, 15 | "cell_type": "markdown", 16 | "source": "The purpose of this notebook is for the user to build and deploy a Machine Learning (ML) application using Azure Machine Learning (AML) Service.\n\nThe challenge we will tackle is predictive maintenance: when will a certain piece of machinery will fail, so that we are prepared to fix or replace it in advance _before_ it fails.\n\nThis notebook has the complete code to load, prep, train and deploy the model. We chose a small public data set for this demo so as to run the entire process in only few minutes.\n\nFollowing are the high level steps:\n\n1. Acquire and Prepare Data\n2. Train using automated machine learning to get the best possible model\n3. 
Deploy the model\n" 17 | }, 18 | { 19 | "metadata": {}, 20 | "cell_type": "markdown", 21 | "source": "## Prepare the environment for training" 22 | }, 23 | { 24 | "metadata": { 25 | "trusted": true 26 | }, 27 | "cell_type": "code", 28 | "source": "import logging\nimport os\nimport random\nimport time\n\nfrom matplotlib import pyplot as plt\nfrom matplotlib.pyplot import imshow\nimport numpy as np\nimport pandas as pd\n\nimport azureml.core\nfrom azureml.core.experiment import Experiment\nfrom azureml.core.workspace import Workspace\nfrom azureml.train.automl import AutoMLConfig\nfrom azureml.train.automl.run import AutoMLRun\nfrom azureml.widgets import RunDetails\nfrom azureml.core.model import Model", 29 | "execution_count": null, 30 | "outputs": [] 31 | }, 32 | { 33 | "metadata": {}, 34 | "cell_type": "markdown", 35 | "source": "Define the experiment name and read the config file" 36 | }, 37 | { 38 | "metadata": { 39 | "trusted": true 40 | }, 41 | "cell_type": "code", 42 | "source": "# Retrieve workspace\nws = Workspace.from_config()\n\n# Choose a name for the experiment and specify the project folder.\nexperiment_name = 'automl-predictive-rul'\nproject_folder = './sample_projects/automl-demo-predmain'\n\nexperiment = Experiment(ws, experiment_name)\n\noutput = {}\noutput['SDK version'] = azureml.core.VERSION\noutput['Subscription ID'] = ws.subscription_id\noutput['Workspace Name'] = ws.name\noutput['Resource Group'] = ws.resource_group\noutput['Location'] = ws.location\noutput['Project Directory'] = project_folder\noutput['Experiment Name'] = experiment.name\npd.set_option('display.max_colwidth', -1)\npd.DataFrame(data = output, index = ['']).T", 43 | "execution_count": null, 44 | "outputs": [] 45 | }, 46 | { 47 | "metadata": {}, 48 | "cell_type": "markdown", 49 | "source": "## 1. Acquire and Prepare Data\nFor this notebook, we will use the NASA Prognostics Center's Turbo-Fan Failure dataset. It is located here: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/#turbofan" 50 | }, 51 | { 52 | "metadata": {}, 53 | "cell_type": "markdown", 54 | "source": "We have it as a .txt file in the same folder. We read it into a Pandas DataFrame.\nNote that the headers were not in the space-separated txt file, so we assign them from the ReadMe in the zip file. In pandas we use read_csv with the delimiter option, even with a space-delimited file." 55 | }, 56 | { 57 | "metadata": { 58 | "trusted": true 59 | }, 60 | "cell_type": "code", 61 | "source": "import pandas as pd\ntrain = pd.read_csv(\"train_FD001.txt\", delimiter=\"\\s|\\s\\s\", index_col=False, engine='python', names=['unit','cycle','os1','os2','os3','sm1','sm2','sm3','sm4','sm5','sm6','sm7','sm8','sm9','sm10','sm11','sm12','sm13','sm14','sm15','sm16','sm17','sm18','sm19','sm20','sm21'])", 62 | "execution_count": null, 63 | "outputs": [] 64 | }, 65 | { 66 | "metadata": {}, 67 | "cell_type": "markdown", 68 | "source": "Take a quick look at the data" 69 | }, 70 | { 71 | "metadata": { 72 | "trusted": true 73 | }, 74 | "cell_type": "code", 75 | "source": "train.head(5)", 76 | "execution_count": null, 77 | "outputs": [] 78 | }, 79 | { 80 | "metadata": {}, 81 | "cell_type": "markdown", 82 | "source": "Our dataset has a number of units in it, with each engine flight listed as a cycle. The cycles count up until the engine fails. What we would like to predict is the number of cycles until failure. \nSo we need to calculate a new column called \"Remaining Useful Life\", or RUL, for short.
It will be the last cycle value minus each cycle value per unit." 83 | }, 84 | { 85 | "metadata": { 86 | "trusted": true 87 | }, 88 | "cell_type": "code", 89 | "source": "def assignrul(df):\n maxi = df['cycle'].max()\n df['rul'] = maxi - df['cycle']\n return df\n \ntrain_new = train.groupby('unit').apply(assignrul)\n\ntrain_new.columns", 90 | "execution_count": null, 91 | "outputs": [] 92 | }, 93 | { 94 | "metadata": {}, 95 | "cell_type": "markdown", 96 | "source": "Now our dataframe has the 'RUL' column. Predicting this value will be the objective of this exercise." 97 | }, 98 | { 99 | "metadata": { 100 | "trusted": true 101 | }, 102 | "cell_type": "code", 103 | "source": "train_new.head(192)", 104 | "execution_count": null, 105 | "outputs": [] 106 | }, 107 | { 108 | "metadata": {}, 109 | "cell_type": "markdown", 110 | "source": "First note that some of the sensor measurements do seem to be changing as we near 0 RUL (sm3, sm4, sm14, sm17). This implies that we should be able to build a model that is useful enough to deliver business value.\n\nWe are now ready to train a model on this data using Automated ML." 111 | }, 112 | { 113 | "metadata": {}, 114 | "cell_type": "markdown", 115 | "source": "## 2. Train using automated machine learning" 116 | }, 117 | { 118 | "metadata": {}, 119 | "cell_type": "markdown", 120 | "source": "Here we use Azure's AutoML package to automate sensor scaling and selection, and to train and evaluate many different types of ML models." 121 | }, 122 | { 123 | "metadata": {}, 124 | "cell_type": "markdown", 125 | "source": "Create training data" 126 | }, 127 | { 128 | "metadata": { 129 | "trusted": true 130 | }, 131 | "cell_type": "code", 132 | "source": "# remove the unit ID and cycle number\nX_train = train_new.iloc[:,2:26].values\n# extract the RUL column to be the target column\ny_train = train_new.iloc[:,26:27].values.astype(int).flatten()", 133 | "execution_count": null, 134 | "outputs": [] 135 | }, 136 | { 137 | "metadata": {}, 138 | "cell_type": "markdown", 139 | "source": "### Split data to train and test" 140 | }, 141 | { 142 | "metadata": { 143 | "trusted": true 144 | }, 145 | "cell_type": "code", 146 | "source": "from sklearn.model_selection import train_test_split\nX_train, X_test, y_train, y_test = train_test_split(X_train,\n y_train,\n test_size=0.3,\n random_state=100)\nX_train = pd.DataFrame(X_train)\nX_test = pd.DataFrame(X_test)\nprint(X_train.shape)\nprint(X_test.shape)", 147 | "execution_count": null, 148 | "outputs": [] 149 | }, 150 | { 151 | "metadata": { 152 | "trusted": true 153 | }, 154 | "cell_type": "code", 155 | "source": "X_test[0:1]", 156 | "execution_count": null, 157 | "outputs": [] 158 | }, 159 | { 160 | "metadata": {}, 161 | "cell_type": "markdown", 162 | "source": "Now we are ready to configure automated ML. We provide necessary information on: what we want to predict, what accuracy metric we want to use, how many models we want to try, and many other parameters. AutoML will also automatically scale the data for us." 163 | }, 164 | { 165 | "metadata": {}, 166 | "cell_type": "markdown", 167 | "source": "## Configure Automated ML\n\nConfigure the automated ML run. The full list of parameters is available [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train).\n\n|Property|Description|\n|-|-|\n|**task**|classification or regression|\n|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics:
<br>spearman_correlation<br>normalized_root_mean_squared_error<br>r2_score<br>normalized_mean_absolute_error|\n|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n|**n_cross_validations**|Number of cross validation splits.|\n|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n|**y**|(sparse) array-like, shape = [n_samples, ].
The target values to predict; for this regression task, an array of numbers (the RUL values).|\n|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n|**preprocess**|Set this to True to enable pre-processing of data, e.g. converting strings to numeric features using one-hot encoding|\n|**experiment_exit_score**|Target score for the experiment, measured on the primary metric. e.g. experiment_exit_score=0.985 will stop the experiment once that score is reached|" 168 | }, 169 | { 170 | "metadata": { 171 | "trusted": true 172 | }, 173 | "cell_type": "code", 174 | "source": "# Create Auto ML configuration\nAutoml_config = AutoMLConfig(task = 'regression',\n primary_metric = 'r2_score',\n iteration_timeout_minutes = 5,\n iterations = 5, \n blacklist_models = ['KNN','RandomForest'],\n X = X_train,\n y = y_train,\n n_cross_validations = 3,\n preprocess = False,\n experiment_exit_score = 0.985,\n path=project_folder)", 175 | "execution_count": null, 176 | "outputs": [] 177 | }, 178 | { 179 | "metadata": {}, 180 | "cell_type": "markdown", 181 | "source": "Finally we are ready to launch AutoML. This step can take many minutes, but AutoML will give you updates as models are trained and evaluated by the metric we specified above. AutoML also lets us know which scaling method was used. The information from each ML model training will be stored in the Experiment section of the ML Workspace, where we can review it through the Azure portal." 182 | }, 183 | { 184 | "metadata": { 185 | "trusted": true 186 | }, 187 | "cell_type": "code", 188 | "source": "# Submit the training job. The output will show the iterations as they finish one by one\nexperiment=Experiment(ws, experiment_name)\nlocal_run = experiment.submit(Automl_config, show_output=True)", 189 | "execution_count": null, 190 | "outputs": [] 191 | }, 192 | { 193 | "metadata": {}, 194 | "cell_type": "markdown", 195 | "source": "### View the training run in a graphic widget" 196 | }, 197 | { 198 | "metadata": { 199 | "trusted": true 200 | }, 201 | "cell_type": "code", 202 | "source": "RunDetails(local_run).show()", 203 | "execution_count": null, 204 | "outputs": [] 205 | }, 206 | { 207 | "metadata": {}, 208 | "cell_type": "markdown", 209 | "source": "### Retrieve the best model (according to the primary metric)" 210 | }, 211 | { 212 | "metadata": { 213 | "trusted": true 214 | }, 215 | "cell_type": "code", 216 | "source": "# find the run with the best primary metric value (r2_score here)\nbest_run, fitted_model = local_run.get_output()\nprint(best_run)", 217 | "execution_count": null, 218 | "outputs": [] 219 | }, 220 | { 221 | "metadata": {}, 222 | "cell_type": "markdown", 223 | "source": "## 3. Deploy Model" 224 | }, 225 | { 226 | "metadata": { 227 | "trusted": true 228 | }, 229 | "cell_type": "code", 230 | "source": "# register best model in workspace. The output of this cell is important\ndescription = 'AutoML NASA RUL Regression'\ntags = None\nmodel = local_run.register_model(description=description, tags=tags)\nlocal_run.model_id # the model id printed by this cell is used in the scoring script below", 231 | "execution_count": null, 232 | "outputs": [] 233 | },
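As a quick sanity check, you can list the models registered in the workspace and confirm the new one appears (a sketch, assuming `ws` still references the workspace from the configuration step):

```python
from azureml.core.model import Model

# list every registered model in the workspace with its version
for m in Model.list(ws):
    print(m.name, m.version)
```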
234 | { 235 | "metadata": {}, 236 | "cell_type": "markdown", 237 | "source": "After we register the model in our ML Workspace, it should be visible in the Azure portal.\n\nNow we want to deploy the model as a REST API that we can feed a row or rows of \"X\" data to, and get back the predicted 'RUL' value. To accomplish this, we will build a container image in our AML Workspace and deploy that image as a container instance in Azure's ACI service. We will then obtain an IP address where we can submit data and receive back the predicted 'RUL' value.\n\nThere are 3 things we need: \n1. A score.py file that contains the init() and run() functions with instructions on how to load and score with the model\n2. A myenv.yml file that contains information on the python environment in which the model needs to run\n3. Configurations for our images and our services, using functions provided by the AzureML service.\n\nThe cells below help you set these up. You will need to use the registered model name provided by the cell above." 238 | }, 239 | { 240 | "metadata": {}, 241 | "cell_type": "markdown", 242 | "source": "### Create scoring script" 243 | }, 244 | { 245 | "metadata": { 246 | "trusted": true 247 | }, 248 | "cell_type": "code", 249 | "source": "%%writefile score.py\n# Scoring Script\nimport json\nimport numpy as np\nimport os\nimport pickle\nfrom sklearn.externals import joblib\n\nfrom azureml.core.model import Model\n\nimport azureml.train.automl\n\ndef init():\n global model\n # '<>' is replaced with the registered model id by the next cell\n model_path = Model.get_model_path('<>')\n print(model_path)\n model = joblib.load(model_path)\n \n\ndef run(raw_data):\n # grab and prepare the data\n data = (np.array(json.loads(raw_data)['data'])).reshape(1,-1)\n # make prediction\n y_hat = model.predict(data)\n return json.dumps(y_hat.tolist())", 250 | "execution_count": null, 251 | "outputs": [] 252 | }, 253 | { 254 | "metadata": {}, 255 | "cell_type": "markdown", 256 | "source": "Replace the '<>' placeholder with the actual model ID" 257 | }, 258 | { 259 | "metadata": { 260 | "trusted": true 261 | }, 262 | "cell_type": "code", 263 | "source": "# Substitute the actual model id in the script file.\n\nscript_file_name = 'score.py'\n\nwith open(script_file_name, 'r') as cefr:\n content = cefr.read()\n\nwith open(script_file_name, 'w') as cefw:\n cefw.write(content.replace('<>', local_run.model_id))", 264 | "execution_count": null, 265 | "outputs": [] 266 | }, 267 | { 268 | "metadata": {}, 269 | "cell_type": "markdown", 270 | "source": "### Create the conda environment file" 271 | }, 272 | { 273 | "metadata": { 274 | "trusted": true 275 | }, 276 | "cell_type": "code", 277 | "source": "from azureml.core.conda_dependencies import CondaDependencies\n\nmyenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','lightgbm'], pip_packages=['azureml-sdk[automl]'])\n\nconda_env_file_name = 'myenv.yml'\nmyenv.save_to_file('.', conda_env_file_name)", 278 | "execution_count": null, 279 | "outputs": [] 280 | }, 281 | { 282 | "metadata": { 283 | "trusted": true 284 | }, 285 | "cell_type": "code", 286 | "source": "with open(\"myenv.yml\",\"r\") as f:\n print(f.read())", 287 | "execution_count": null, 288 | "outputs": [] 289 | }, 290 | { 291 | "metadata": {}, 292 | "cell_type": "markdown", 293 | "source": "### Create the webservice configuration" 294 | }, 295 | { 296 | "metadata": { 297 | "trusted": true 298 | }, 299 | "cell_type": "code", 300 | "source": "from azureml.core.webservice import AciWebservice\n\naciconfig = AciWebservice.deploy_configuration(cpu_cores=2, \n memory_gb=2, \n tags={\"data\": \"RUL\", \"method\" : \"sklearn\"}, \n description='Predict RUL with Azure AutoML')", 301 | "execution_count": null, 302 | "outputs": [] 303 | },
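The configuration above creates an unauthenticated endpoint, which keeps the workshop simple. If you need key-based authentication, `deploy_configuration` also accepts an `auth_enabled` flag; a sketch with the other values unchanged:

```python
# variant with key-based auth enabled on the ACI endpoint
aciconfig_auth = AciWebservice.deploy_configuration(cpu_cores=2,
                                                    memory_gb=2,
                                                    auth_enabled=True,
                                                    description='Predict RUL with Azure AutoML')
```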
304 | { 305 | "metadata": {}, 306 | "cell_type": "markdown", 307 | "source": "### Create the container image and deploy as a webservice" 308 | }, 309 | { 310 | "metadata": {}, 311 | "cell_type": "markdown", 312 | "source": "Finally, configure the container image and deploy the service. Make sure the filenames match, your Workspace is in the variable ws, and your model name is correct. This will create your container image and deploy it as a webservice.\n\nThis process can take up to 10 minutes, so please be patient. You can check the progress bar periodically ..." 313 | }, 314 | { 315 | "metadata": { 316 | "scrolled": true, 317 | "trusted": true 318 | }, 319 | "cell_type": "code", 320 | "source": "%%time\nfrom azureml.core.webservice import Webservice\nfrom azureml.core.image import ContainerImage\n\n# configure the image\nimage_config = ContainerImage.image_configuration(execution_script=\"score.py\", \n runtime=\"python\", \n conda_file=\"myenv.yml\",\n tags = {'ml': \"Regression\", 'type': \"automl\"},\n description = \"Image for automated ML NASA predictive maintenance\")\n\n# deploy the image to a webservice\nservice = Webservice.deploy_from_model(workspace=ws,\n name='automl-rul-regression',\n deployment_config=aciconfig,\n models=[model],\n image_config=image_config)\n\nservice.wait_for_deployment(show_output=True)", 321 | "execution_count": null, 322 | "outputs": [] 323 | }, 324 | { 325 | "metadata": {}, 326 | "cell_type": "markdown", 327 | "source": "Just as a check, we can retrieve the URI for the scoring function." 328 | }, 329 | { 330 | "metadata": { 331 | "trusted": true 332 | }, 333 | "cell_type": "code", 334 | "source": "print(service.scoring_uri)", 335 | "execution_count": null, 336 | "outputs": [] 337 | }, 338 | { 339 | "metadata": {}, 340 | "cell_type": "markdown", 341 | "source": "### Test the service\n\nLet's check to see if the service is working. Here we submit a single row of data from X_test to see if it returns a reasonable prediction." 342 | }, 343 | { 344 | "metadata": { 345 | "trusted": true 346 | }, 347 | "cell_type": "code", 348 | "source": "import requests\nimport json\n\n# send a single row from the test set to score\n#random_index = np.random.randint(0, len(X_train)-1)\ninput_data = \"{\\\"data\\\": \" + str(X_test[1:2].values.tolist()) + \"}\" #str(list(X_train[0].reshape(1,-1)[0])) + \"}\"\n\nheaders = {'Content-Type':'application/json'}\n\n# for AKS deployment you'd need to include the service key in the header as well\n# api_key = service.get_key()\n# headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)} \n\nresp = requests.post(service.scoring_uri, input_data, headers=headers)\n\nprint(\"POST to url\", service.scoring_uri)\nprint(\"input data:\", input_data)\nprint(\"label:\", y_test[1:2])\nprint(\"prediction:\", resp.text)", 349 | "execution_count": null, 350 | "outputs": [] 351 | }, 352 | { 353 | "metadata": {}, 354 | "cell_type": "markdown", 355 | "source": "Scoring rows from across an engine's life (see the sketch below) shows the engine evolving through many flights, or cycles. As the engine approaches failure, the RUL declines to zero, and so does the prediction. This is a good example of how the predictive model can assist in estimating the future failure of the engine.\n\nNote that the model does not perform well at high RUL values. This is an acceptable outcome, as the engine is far from failure." 356 | },
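The sketch below scores a few test rows through the same endpoint, one request at a time (the scoring script reshapes each payload to a single row); it assumes `service`, `X_test` and `y_test` from the cells above are still in scope:

```python
import json
import requests

headers = {'Content-Type': 'application/json'}
# score the first few test rows and compare each prediction to its label
for i in range(3):
    payload = json.dumps({"data": X_test[i:i+1].values.tolist()[0]})
    resp = requests.post(service.scoring_uri, payload, headers=headers)
    print("label:", y_test[i], "prediction:", resp.text)
```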
357 | { 358 | "metadata": {}, 359 | "cell_type": "markdown", 360 | "source": "### Delete the web service resource\n\nTo avoid any run-away Azure costs, we always delete unnecessary services when we are done." 361 | }, 362 | { 363 | "metadata": { 364 | "trusted": true 365 | }, 366 | "cell_type": "code", 367 | "source": "service.delete()", 368 | "execution_count": null, 369 | "outputs": [] 370 | }, 371 | { 372 | "metadata": {}, 373 | "cell_type": "markdown", 374 | "source": "If the workspace will not be in use, it is advisable to delete it as well" 375 | } 376 | ], 377 | "metadata": { 378 | "kernelspec": { 379 | "name": "python36", 380 | "display_name": "Python 3.6", 381 | "language": "python" 382 | }, 383 | "language_info": { 384 | "mimetype": "text/x-python", 385 | "nbconvert_exporter": "python", 386 | "name": "python", 387 | "pygments_lexer": "ipython3", 388 | "version": "3.6.6", 389 | "file_extension": ".py", 390 | "codemirror_mode": { 391 | "version": 3, 392 | "name": "ipython" 393 | } 394 | } 395 | }, 396 | "nbformat": 4, 397 | "nbformat_minor": 2 398 | } 399 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # automl-workshop 2 | Repository with instructions, notebooks and data files for the automated ML workshop. 3 | 4 | ## Pre-requisites 5 | + Azure subscription. Preferably use your existing organization/personal subscription. If needed, you can get a free trial subscription with some credit [here](https://azure.microsoft.com/en-us/free/) 6 | 7 | ## Overview 8 | In this workshop, you will go over the following steps: 9 | 1. Set up an Azure Notebooks project to do the workshop 10 | 1. Configure your Azure Machine Learning service workspace so it is ready for automated ML training 11 | 1. Run the Energy demand forecasting notebook, including: 12 | + Read the data and prepare it for training (split to train/validation/test) 13 | + Configure and run an automated ML training job to get the best model 14 | + Analyze and test the results 15 | + Deploy the model to Azure Container Instance (ACI) 16 | 17 | ## Scenario 18 | This scenario focuses on energy demand forecasting where the goal is to predict the future load on an energy grid. It is a critical business operation for companies in the energy sector as operators need to maintain the fine balance between the energy consumed on a grid and the energy supplied to it. 19 | 20 | ## Configure Azure Notebooks 21 | 22 | #### Log in 23 | 24 | 1. Go to https://notebooks.azure.com/ 25 | 1. Click "Sign in" on the top right corner 26 | 1. Use your Azure subscription credentials to log in 27 | 28 | #### Clone workshop repo 29 | 30 | 1. From the Azure Notebooks home page, click "My projects" 31 | 1. Select "Upload GitHub repo" 32 | 1. In the "Upload GitHub Repository" window, enter the following GitHub repository: https://github.com/tsikiksr/automl-workshop 33 | 1. The rest of the fields will be automatically populated, leave them as they are 34 | 1. Make sure "Public" is checked 35 | 1. Click "Import" and wait for the clone process to complete (can take a few minutes) 36 | 1. Click "Run on Free Compute" to start the Jupyter server 37 | 38 | ## Run automated ML notebooks 39 | 40 | ### Configuration 41 | The configuration notebook will create and prepare an Azure Machine Learning service workspace. You only need to run it once to create the workspace, which can then be used for multiple experiments, model training, and deployments. Conceptually it boils down to the short sketch below; the numbered steps that follow walk through the notebook cell by cell. 42 | 
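A sketch of what the configuration notebook does under the hood (for orientation only, not a cell to run as-is; the values are the workshop defaults and `<subscription_id>` is a placeholder):

    from azureml.core import Workspace

    # create the workspace (and resource group, if needed), or return it if it already exists
    ws = Workspace.create(name="my-automl-workshop-ws",
                          subscription_id="<subscription_id>",   # placeholder
                          resource_group="my-automl-workshop-rg",
                          location="eastus2",
                          exist_ok=True)
    ws.write_config()   # persist the details for later Workspace.from_config() calls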
43 | 1. From the Jupyter file list, select the "configuration.ipynb" notebook to open it 44 | 1. If prompted for a kernel, select "Python 3.6" and click "Set Kernel" 45 | 1. Proceed to the first code cell and hit "Run"; you should see that the installed SDK version is at least the expected version 46 | 1. Proceed to the next code cell, to enter your subscription id in the appropriate place (the `<subscription_id>` placeholder) 47 | + To retrieve your subscription id, go to https://portal.azure.com/ and search for "subscriptions", then select "subscriptions" from the results (the one with the key icon) 48 | + Locate your subscription and copy the "subscription id" text (it is a 32 character text in the format of XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX) 49 | + Go back to the notebook and replace the `<subscription_id>` placeholder (including the angle brackets) with the actual subscription id 50 | + Keep the rest of the parameters as is 51 | + Run the cell 52 | 1. Go to the next cell and run it, you should be prompted to open a login window, then do the following: 53 | + Copy the 9 character code in the output 54 | + Click the link to open the login window (https://microsoft.com/devicelogin) 55 | + In the login window, paste the 9 character code and hit "Accept" when prompted to. 56 | + Go back to the notebook and wait for the authentication to finish. 57 | + You should get the message: "Workspace not accessible. Change your parameters or create a new workspace below". That is ok; it means that the workspace was not found, so you can proceed to the next code cell. 58 | 1. Run the cell to create the new workspace (might take a few minutes to complete) 59 | + Once completed successfully, verify that the workspace was actually created by going to https://portal.azure.com/ and searching for "my-automl-workshop-ws" in the search box. Selecting the workspace will open it, and you'll be able to view it. This is your Azure Machine Learning service workspace, which includes all of the resources you'll need for training and deploying ML models. 60 | + This cell will also persist the workspace details into a config file, so they are easily retrieved for future use. 61 | 1. Run the next cell to verify the configuration and the config file, and wait for the output. 62 | 1. Run the next cells **one by one (wait for each one to complete)** to install the additional dependencies 63 | 64 | Well done! You have completed the configuration notebook, and you are now ready to move on to the energy demand forecasting notebook. 65 | 66 | 67 | ### Energy demand forecasting notebook 68 | 69 | 1. Go over the "auto-ml-forecasting-energy-demand-end2end.ipynb" notebook (until the "Deploy" section), cell by cell. Each cell has the details of its content; wait for each one to complete before starting the next cell. 70 | 71 | ### Deployment 72 | 1. Go over the "Deploy" section of the notebook. Wait for each step to complete; image creation and service deployment take several minutes to complete. 73 | 74 | Congrats! You have completed the workshop successfully! 75 | 76 | # References 77 | + https://aka.ms/automatedmldocs 78 | + http://aka.ms/automatedmlsamples 79 | + http://aka.ms/automatedml 80 | 81 | Ask questions, send feedback: AskAutomatedML@microsoft.com 82 | 83 | 84 | -------------------------------------------------------------------------------- /auto-ml-forecasting-energy-demand-end2end.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Copyright (c) Microsoft Corporation. All rights reserved.\n", 8 | "\n", 9 | "Licensed under the MIT License."
10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "# Automated Machine Learning\n", 17 | "_**Energy Demand Forecasting**_\n", 18 | "\n", 19 | "## Contents\n", 20 | "1. [Introduction](#Introduction)\n", 21 | "2. [Setup](#Setup)\n", 22 | "3. [Data](#Data)\n", 23 | "4. [Train](#Train)\n", 24 | "5. [Deploy](#Deploy)" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## Introduction\n", 32 | "In this example, we show how AutoML can be used for energy demand forecasting.\n", 33 | "\n", 34 | "Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n", 35 | "\n", 36 | "In this notebook you will see\n", 37 | "1. Creating an Experiment in an existing Workspace\n", 38 | "2. Instantiating AutoMLConfig with the new task type \"forecasting\" for time-series training, along with other time-series-related settings; for this dataset we use the basic one: \"time_column_name\" \n", 39 | "3. Training the Model using local compute\n", 40 | "4. Exploring the results\n", 41 | "5. Testing the fitted model" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "# Scenario\n", 49 | "This scenario focuses on energy demand forecasting where the __goal is to predict the future load on an energy grid__. It is a critical business operation for companies in the energy sector as operators need to maintain the fine balance between the energy consumed on a grid and the energy supplied to it. \n", 50 | "\n", 51 | "Too much power supplied to the grid can result in waste of energy or technical faults. However, if too little power is supplied it can lead to blackouts, leaving customers without power. Typically, grid operators can take short-term decisions to manage energy supply to the grid and keep the load in balance. An accurate short-term forecast of energy demand is therefore essential for the operator to make these decisions with confidence.\n", 52 | "\n", 53 | "This scenario details the construction of a machine learning energy demand forecasting solution. _The solution is trained on a public dataset from the New York Independent System Operator (NYISO)_ , which operates the power grid for New York State. \n", 54 | "The dataset includes hourly power demand data for New York City over a period of five years. An additional dataset containing hourly weather conditions in New York City over the same time period was taken from darksky.net. \n" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "## Setup\n", 62 | "\n", 63 | "As part of the setup you have already created a Workspace. For AutoML you need to create an Experiment. An Experiment is a named object in a Workspace, which is used to run experiments."
64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": null, 69 | "metadata": {}, 70 | "outputs": [], 71 | "source": [ 72 | "# Import libraries\n", 73 | "import azureml.core\n", 74 | "import pandas as pd\n", 75 | "import numpy as np\n", 76 | "import logging\n", 77 | "import warnings\n", 78 | "# Squash warning messages for cleaner output in the notebook\n", 79 | "warnings.showwarning = lambda *args, **kwargs: None\n", 80 | "\n", 81 | "\n", 82 | "from azureml.core.workspace import Workspace\n", 83 | "from azureml.core.experiment import Experiment\n", 84 | "from azureml.train.automl import AutoMLConfig\n", 85 | "from azureml.train.automl.run import AutoMLRun\n", 86 | "from matplotlib import pyplot as plt\n", 87 | "from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": null, 93 | "metadata": {}, 94 | "outputs": [], 95 | "source": [ 96 | "# Retrieve workspace\n", 97 | "ws = Workspace.from_config()\n", 98 | "\n", 99 | "# choose a name for the run history container in the workspace\n", 100 | "experiment_name = 'automl-energydemandforecasting'\n", 101 | "# project folder\n", 102 | "project_folder = './sample_projects/automl-local-energydemandforecasting'\n", 103 | "\n", 104 | "experiment = Experiment(ws, experiment_name)\n", 105 | "\n", 106 | "output = {}\n", 107 | "output['SDK version'] = azureml.core.VERSION\n", 108 | "output['Subscription ID'] = ws.subscription_id\n", 109 | "output['Workspace'] = ws.name\n", 110 | "output['Resource Group'] = ws.resource_group\n", 111 | "output['Location'] = ws.location\n", 112 | "output['Project Directory'] = project_folder\n", 113 | "output['Run History Name'] = experiment_name\n", 114 | "pd.set_option('display.max_colwidth', -1)\n", 115 | "outputDf = pd.DataFrame(data = output, index = [''])\n", 116 | "outputDf.T" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "## Data\n", 124 | "Read the energy demand data from the file and preview it."
125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "metadata": {}, 131 | "outputs": [], 132 | "source": [ 133 | "data = pd.read_csv(\"nyc_energy.csv\", parse_dates=['timeStamp'])\n", 134 | "data.head()" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "### Split the data to train and test\n", 142 | "\n" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": null, 148 | "metadata": {}, 149 | "outputs": [], 150 | "source": [ 151 | "train = data[data['timeStamp'] < '2017-02-01']\n", 152 | "test = data[data['timeStamp'] >= '2017-02-01']\n" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "### Prepare the test data; we will feed X_test to the fitted model to get predictions" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": null, 165 | "metadata": {}, 166 | "outputs": [], 167 | "source": [ 168 | "y_test = test.pop('demand').values\n", 169 | "X_test = test" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "### Split the train data to train and valid\n", 177 | "\n", 178 | "Use one month's data as the validation set\n" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": null, 184 | "metadata": {}, 185 | "outputs": [], 186 | "source": [ 187 | "X_train = train[train['timeStamp'] < '2017-01-01']\n", 188 | "X_valid = train[train['timeStamp'] >= '2017-01-01']\n", 189 | "y_train = X_train.pop('demand').values\n", 190 | "y_valid = X_valid.pop('demand').values\n", 191 | "print(X_train.shape)\n", 192 | "print(y_train.shape)\n", 193 | "print(X_valid.shape)\n", 194 | "print(y_valid.shape)" 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "## Train\n", 202 | "\n", 203 | "Instantiate an AutoMLConfig object. This defines the settings and data used to run the experiment.\n", 204 | "\n", 205 | "Read more in the [documentation](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train)" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": null, 211 | "metadata": {}, 212 | "outputs": [], 213 | "source": [ 214 | "# Set the time series column\n", 215 | "time_column_name = 'timeStamp'\n", 216 | "automl_settings = {\n", 217 | " \"time_column_name\": time_column_name,\n", 218 | "}\n", 219 | "\n", 220 | "# create the configuration object\n", 221 | "automl_config = AutoMLConfig(task = 'forecasting',\n", 222 | " debug_log = 'automl_nyc_energy_errors.log',\n", 223 | " primary_metric='normalized_root_mean_squared_error',\n", 224 | " iterations = 5,\n", 225 | " iteration_timeout_minutes = 10,\n", 226 | " X = X_train,\n", 227 | " y = y_train,\n", 228 | " X_valid = X_valid,\n", 229 | " y_valid = y_valid,\n", 230 | " path=project_folder,\n", 231 | " blacklist_models = ['RandomForest'],\n", 232 | " # model_explainability=True,\n", 233 | " verbosity = logging.INFO,\n", 234 | " **automl_settings)" 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "You can call the submit method on the experiment object and pass the run configuration. For local runs the execution is synchronous. Depending on the data and the number of iterations this can run for a while.\n", 242 | "You will see the currently running iterations printing to the console." 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": null, 248 | "metadata": {}, 249 | "outputs": [], 250 | "source": [ 251 | "local_run = experiment.submit(automl_config, show_output=True)" 252 | ] 253 | },
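Because local runs are synchronous, the notebook blocks until training finishes. If the kernel restarts or you want to inspect a run later, you can re-attach to it by id (the same pattern the deployment section uses further down; assumes `experiment` and the submitted run's id):

```python
# re-attach to a previously submitted run by its id
previous_run = AutoMLRun(experiment, run_id=local_run.id)
print(previous_run.get_status())
```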
254 | { 255 | "cell_type": "code", 256 | "execution_count": null, 257 | "metadata": {}, 258 | "outputs": [], 259 | "source": [ 260 | "# View the run summary\n", 261 | "local_run" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "### Retrieve the Best Model\n", 269 | "Below we select the best pipeline from our iterations. The get_output method on the run object returns the best run and the fitted model for the last fit invocation. There are overloads on get_output that allow you to retrieve the best run and fitted model for any logged metric or a particular iteration (see the sketch after the next cell)." 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": null, 275 | "metadata": {}, 276 | "outputs": [], 277 | "source": [ 278 | "best_run, fitted_model = local_run.get_output()\n", 279 | "fitted_model.steps" 280 | ] 281 | },
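A sketch of those overloads (keyword names per `AutoMLRun.get_output`; the metric must be one that was logged for this run):

```python
# best run/model as judged by a specific logged metric
best_run_r2, fitted_model_r2 = local_run.get_output(metric='r2_score')

# run/model from one particular iteration
run_3, fitted_model_3 = local_run.get_output(iteration=3)
```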
282 | { 283 | "cell_type": "markdown", 284 | "metadata": {}, 285 | "source": [ 286 | "### Retrieve explanation for best model\n", 287 | "Model explainability is important for understanding the features and their importance. This will retrieve the explanation for the model." 288 | ] 289 | }, 290 | { 291 | "cell_type": "code", 292 | "execution_count": null, 293 | "metadata": { 294 | "scrolled": false 295 | }, 296 | "outputs": [], 297 | "source": [ 298 | "from azureml.train.automl.automlexplainer import explain_model\n", 299 | "\n", 300 | "# shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n", 301 | "# explain_model(fitted_model, X_train, X_test, best_run)" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "### Widget for monitoring runs" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": null, 314 | "metadata": {}, 315 | "outputs": [], 316 | "source": [ 317 | "from azureml.widgets import RunDetails\n", 318 | "RunDetails(local_run).show()" 319 | ] 320 | }, 321 | { 322 | "cell_type": "markdown", 323 | "metadata": {}, 324 | "source": [ 325 | "### Test the Best Fitted Model\n", 326 | "\n", 327 | "Predict on the test set and calculate the error metrics." 328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": null, 333 | "metadata": {}, 334 | "outputs": [], 335 | "source": [ 336 | "y_pred = fitted_model.predict(X_test)\n", 337 | "y_pred" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "### Remove NaN values from y_test and y_pred to avoid errors when calculating the metrics" 345 | ] 346 | }, 347 | { 348 | "cell_type": "code", 349 | "execution_count": null, 350 | "metadata": {}, 351 | "outputs": [], 352 | "source": [ 353 | "if len(y_test) != len(y_pred):\n", 354 | " raise ValueError(\n", 355 | " 'the true values and prediction values do not have equal length.')\n", 356 | "elif len(y_test) == 0:\n", 357 | " raise ValueError(\n", 358 | " 'y_true and y_pred are empty.')\n", 359 | "\n", 360 | "# if there is any non-numeric element in the y_true or y_pred,\n", 361 | "# the ValueError exception will be thrown.\n", 362 | "y_test_f = np.array(y_test).astype(float)\n", 363 | "y_pred_f = np.array(y_pred).astype(float)\n", 364 | "\n", 365 | "# remove entries both in y_true and y_pred where at least\n", 366 | "# one element in y_true or y_pred is missing\n", 367 | "y_test = y_test_f[~(np.isnan(y_test_f) | np.isnan(y_pred_f))]\n", 368 | "y_pred = y_pred_f[~(np.isnan(y_test_f) | np.isnan(y_pred_f))]" 369 | ] 370 | }, 371 | { 372 | "cell_type": "markdown", 373 | "metadata": {}, 374 | "source": [ 375 | "### Plot the predictions to compare to actual data" 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": null, 381 | "metadata": {}, 382 | "outputs": [], 383 | "source": [ 384 | "print(\"[Test Data] \\nRoot Mean squared error: %.2f\" % np.sqrt(mean_squared_error(y_test, y_pred)))\n", 385 | "# R2 score: 1 is a perfect prediction\n", 386 | "print('mean_absolute_error score: %.2f' % mean_absolute_error(y_test, y_pred))\n", 387 | "print('R2 score: %.2f' % r2_score(y_test, y_pred))\n", 388 | "\n", 389 | "# Plot outputs\n", 390 | "test_pred = plt.scatter(y_test, y_pred, color='b')\n", 391 | "test_test = plt.scatter(y_test, y_test, color='g')\n", 392 | "plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n", 393 | "plt.show()" 394 | ] 395 | }, 396 | { 397 | "cell_type": "markdown", 398 | "metadata": {}, 399 | "source": [ 400 | "## Deploy\n", 401 | "Deploy the model into an Azure Container Instance to enable inferencing on new data" 402 | ] 403 | }, 404 | { 405 | "cell_type": "markdown", 406 | "metadata": {}, 407 | "source": [ 408 | "### Register the model\n", 409 | "Register the best model with the AML service" 410 | ] 411 | }, 412 | { 413 | "cell_type": "code", 414 | "execution_count": null, 415 | "metadata": {}, 416 | "outputs": [], 417 | "source": [ 418 | "model = local_run.register_model(description = 'automated ml model for energy demand forecasting', tags = {'ml': \"Forecasting\", 'type': \"automl\"})\n", 419 | "print(local_run.model_id) # This will be written to the script file later in the notebook."
420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": {}, 425 | "source": [ 426 | "### Create Scoring Script\n", 427 | "This will be used to run the model on new data for predictions" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": null, 433 | "metadata": {}, 434 | "outputs": [], 435 | "source": [ 436 | "%%writefile score_energy_demand.py\n", 437 | "import pickle\n", 438 | "import json\n", 439 | "import numpy as np\n", 440 | "import azureml.train.automl\n", 441 | "from sklearn.externals import joblib\n", 442 | "from azureml.core.model import Model\n", 443 | "\n", 444 | "\n", 445 | "def init():\n", 446 | "    global model\n", 447 | "    model_path = Model.get_model_path(model_name = '<>') # this name is model.id of model that we want to deploy\n", 448 | "    # deserialize the model file back into a sklearn model\n", 449 | "    model = joblib.load(model_path)\n", 450 | "\n", 451 | "def run(timestamp, precip, temp):\n", 452 | "    try:\n", 453 | "        # assemble the three inputs into a single-row feature array\n", 454 | "        # (the original set/JSON round-trip was buggy and is replaced here)\n", 455 | "        data_arr = np.array([[timestamp, precip, temp]])\n", 456 | "        result = model.predict(data_arr)\n", 457 | "        # result = json.dumps({'timeStamp':timestamp, 'precip':precip, 'temp':temp})\n", 458 | "    except Exception as e:\n", 459 | "        result = str(e)\n", 460 | "        return json.dumps({\"error\": result})\n", 461 | "    return json.dumps({\"result\":result.tolist()})" 462 | ] 463 | }, 464 | { 465 | "cell_type": "markdown", 466 | "metadata": {}, 467 | "source": [ 468 | "### Create a YAML File for the Environment\n", 469 | "The YAML file will be used to set up the conda environment on the deployed image" 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": null, 475 | "metadata": {}, 476 | "outputs": [], 477 | "source": [ 478 | "# Retrieve the dependencies\n", 479 | "experiment = Experiment(ws, experiment_name)\n", 480 | "ml_run = AutoMLRun(experiment = experiment, run_id = local_run.id)\n", 481 | "dependencies = ml_run.get_run_sdk_dependencies(iteration = 0)\n", 482 | "for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n", 483 | "    print('{}\\t{}'.format(p, dependencies[p]))" 484 | ] 485 | }, 486 | { 487 | "cell_type": "code", 488 | "execution_count": null, 489 | "metadata": {}, 490 | "outputs": [], 491 | "source": [ 492 | "# Create the environment file\n", 493 | "\n", 494 | "from azureml.core.conda_dependencies import CondaDependencies \n", 495 | "\n", 496 | "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=[\"azureml-train-automl\"])\n", 497 | "print(myenv.serialize_to_string())\n", 498 | "\n", 499 | "conda_env_file_name = 'my_conda_env.yml'\n", 500 | "myenv.save_to_file('.', conda_env_file_name)" 501 | ] 502 | }, 503 | { 504 | "cell_type": "code", 505 | "execution_count": null, 506 | "metadata": {}, 507 | "outputs": [], 508 | "source": [ 509 | "# Substitute the actual version number in the environment file.\n", 510 | "# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n", 511 | "# However, we include this in case this code is used on an experiment from a previous SDK version.\n", 512 | "\n", 513 | "with open(conda_env_file_name, 'r') as cefr:\n", 514 | "    content = cefr.read()\n", 515 | "\n", 516 | "with open(conda_env_file_name, 'w') as cefw:\n", 517 | "    cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n", 518 | "\n", 519 | "# Substitute the actual model id in the script file.\n",
520 | "\n", 521 | "script_file_name = 'score_energy_demand.py'\n", 522 | "\n", 523 | "with open(script_file_name, 'r') as cefr:\n", 524 | "    content = cefr.read()\n", 525 | "\n", 526 | "with open(script_file_name, 'w') as cefw:\n", 527 | "    cefw.write(content.replace('<>', local_run.model_id))" 528 | ] 529 | }, 530 | { 531 | "cell_type": "markdown", 532 | "metadata": {}, 533 | "source": [ 534 | "### Generate schema file\n", 535 | "The schema file defines the deployed web service's REST API, so it is consumable from \"Swagger-enabled\" services, such as Power BI" 536 | ] 537 | }, 538 | { 539 | "cell_type": "code", 540 | "execution_count": null, 541 | "metadata": {}, 542 | "outputs": [], 543 | "source": [ 544 | "from azureml.webservice_schema.sample_definition import SampleDefinition\n", 545 | "from azureml.webservice_schema.data_types import DataTypes\n", 546 | "from azureml.webservice_schema.schema_generation import generate_schema\n", 547 | "\n", 548 | "schema_file_name = './schema.json'\n", 549 | "def run(timestamp,precip,temp):\n", 550 | "    return \"OK\"\n", 551 | "\n", 552 | "import numpy as np\n", 553 | "generate_schema(run, inputs={\n", 554 | "    \"timestamp\" : SampleDefinition(DataTypes.STANDARD, '2012-01-01 00:00:00'),\n", 555 | "    \"precip\" : SampleDefinition(DataTypes.STANDARD, '0.0'),\n", 556 | "    \"temp\" : SampleDefinition(DataTypes.STANDARD, '0.0')}, \n", 557 | "    filepath=schema_file_name)" 558 | ] 559 | }, 560 | { 561 | "cell_type": "markdown", 562 | "metadata": {}, 563 | "source": [ 564 | "### Create a Docker file to include extra dependencies in the image" 565 | ] 566 | }, 567 | { 568 | "cell_type": "code", 569 | "execution_count": null, 570 | "metadata": {}, 571 | "outputs": [], 572 | "source": [ 573 | "%%writefile docker_steps.dockerfile\n", 574 | "RUN apt-get update && \\\n", 575 | "    apt-get upgrade -y && \\\n", 576 | "    apt-get install -y build-essential gcc g++ python-dev unixodbc unixodbc-dev" 577 | ] 578 | }, 579 | { 580 | "cell_type": "code", 581 | "execution_count": null, 582 | "metadata": {}, 583 | "outputs": [], 584 | "source": [ 585 | "docker_file_name = \"docker_steps.dockerfile\"" 586 | ] 587 | }, 588 | { 589 | "cell_type": "markdown", 590 | "metadata": {}, 591 | "source": [ 592 | "### Create a Container Image\n", 593 | "The container image will be based on the model and is used to deploy the container instance" 594 | ] 595 | }, 596 | { 597 | "cell_type": "code", 598 | "execution_count": null, 599 | "metadata": {}, 600 | "outputs": [], 601 | "source": [ 602 | "from azureml.core.image import Image, ContainerImage\n", 603 | "\n", 604 | "image_config = ContainerImage.image_configuration(runtime= \"python\",\n", 605 | "                                 execution_script = script_file_name,\n", 606 | "                                 docker_file = docker_file_name,\n", 607 | "                                 schema_file = schema_file_name,\n", 608 | "                                 conda_file = conda_env_file_name,\n", 609 | "                                 tags = {'ml': \"Forecasting\", 'type': \"automl\"},\n", 610 | "                                 description = \"Image for automated ml energy demand forecasting predictions\")\n", 611 | "\n", 612 | "image = Image.create(name = \"automlenergyforecasting\",\n", 613 | "                     models = [model],\n", 614 | "                     image_config = image_config, \n", 615 | "                     workspace = ws)\n", 616 | "\n", 617 | "image.wait_for_creation(show_output = True)" 618 | ] 619 | }, 620 | { 621 | "cell_type": "markdown", 622 | "metadata": {}, 623 | "source": [ 624 | "### Deploy the Image as a Web Service on Azure Container Instance" 625 | ] 626 | }, 627 | { 628 | "cell_type": "code", 629 | "execution_count": null, 630 | "metadata": {},
631 | "outputs": [], 632 | "source": [ 633 | "from azureml.core.webservice import AciWebservice\n", 634 | "\n", 635 | "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", 636 | " memory_gb = 1, \n", 637 | " tags = {'ml': \"Forecasting\", 'type': \"automl\"}, \n", 638 | " description = 'ACI service for automated ml energy demand forecasting predictions')" 639 | ] 640 | }, 641 | { 642 | "cell_type": "code", 643 | "execution_count": null, 644 | "metadata": {}, 645 | "outputs": [], 646 | "source": [ 647 | "from azureml.core.webservice import Webservice\n", 648 | "\n", 649 | "aci_service_name = 'automlenergyforecasting'\n", 650 | "print(aci_service_name)\n", 651 | "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", 652 | " image = image,\n", 653 | " name = aci_service_name,\n", 654 | " workspace = ws)\n", 655 | "aci_service.wait_for_deployment(True)\n", 656 | "print(aci_service.state)" 657 | ] 658 | } 659 | ], 660 | "metadata": { 661 | "authors": [ 662 | { 663 | "name": "xiaga" 664 | } 665 | ], 666 | "kernelspec": { 667 | "display_name": "Python (myenv2)", 668 | "language": "python", 669 | "name": "myenv2" 670 | }, 671 | "language_info": { 672 | "codemirror_mode": { 673 | "name": "ipython", 674 | "version": 3 675 | }, 676 | "file_extension": ".py", 677 | "mimetype": "text/x-python", 678 | "name": "python", 679 | "nbconvert_exporter": "python", 680 | "pygments_lexer": "ipython3", 681 | "version": "3.6.6" 682 | } 683 | }, 684 | "nbformat": 4, 685 | "nbformat_minor": 2 686 | } 687 | -------------------------------------------------------------------------------- /configuration.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Copyright (c) Microsoft Corporation. All rights reserved.\n", 8 | "\n", 9 | "Licensed under the MIT License." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "# Configuration\n", 17 | "\n", 18 | "_**Setting up your Azure Machine Learning services workspace and configuring your notebook library**_\n", 19 | "\n", 20 | "---\n", 21 | "---\n", 22 | "\n", 23 | "## Table of Contents\n", 24 | "\n", 25 | "1. [Introduction](#Introduction)\n", 26 | " 1. What is an Azure Machine Learning workspace\n", 27 | "1. [Setup](#Setup)\n", 28 | " 1. Azure subscription\n", 29 | " 1. Azure ML SDK and other library installation\n", 30 | " 1. Azure Container Instance registration\n", 31 | "1. [Configure your Azure ML Workspace](#Configure%20your%20Azure%20ML%20workspace)\n", 32 | " 1. Workspace parameters\n", 33 | " 1. Create a new workspace\n", 34 | "1. [Next steps](#Next%20steps)\n", 35 | "\n", 36 | "---\n", 37 | "\n", 38 | "## Introduction\n", 39 | "\n", 40 | "This notebook configures your library of notebooks to connect to an Azure Machine Learning (ML) workspace. In this case, a library contains all of the notebooks in the current folder and any nested folders. You can configure this notebook library to use an existing workspace or create a new workspace.\n", 41 | "\n", 42 | "Typically you will need to run this notebook only once per notebook library as all other notebooks will use connection information that is written here. 
If you want to redirect your notebook library to work with a different workspace, then you should re-run this notebook.\n", 43 | "\n", 44 | "In this notebook you will\n", 45 | "* Learn about getting an Azure subscription\n", 46 | "* Specify your workspace parameters\n", 47 | "* Access or create your workspace\n", 48 | "* Add a default compute cluster for your workspace\n", 49 | "\n", 50 | "### What is an Azure Machine Learning workspace\n", 51 | "\n", 52 | "An Azure ML Workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources, providing added functionality for machine learning experimentation, deployment, inferencing, and the monitoring of deployed models." 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "## Setup\n", 60 | "\n", 61 | "This section describes activities required before you can access any Azure ML services functionality." 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "### 1. Azure Subscription\n", 69 | "\n", 70 | "In order to create an Azure ML Workspace, first you need access to an Azure subscription. An Azure subscription allows you to manage storage, compute, and other assets in the Azure cloud. You can [create a new subscription](https://azure.microsoft.com/en-us/free/) or access existing subscription information from the [Azure portal](https://portal.azure.com). Later in this notebook you will need information such as your subscription ID in order to create and access AML workspaces.\n", 71 | "\n", 72 | "### 2. Azure ML SDK and other library installation\n", 73 | "\n", 74 | "If you are running in your own environment, follow [SDK installation instructions](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment). If you are running in Azure Notebooks or another Microsoft managed environment, the SDK is already installed.\n", 75 | "\n", 76 | "Once installation is complete, the following cell checks the Azure ML SDK version:" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": null, 82 | "metadata": { 83 | "tags": [ 84 | "install" 85 | ] 86 | }, 87 | "outputs": [], 88 | "source": [ 89 | "import azureml.core\n", 90 | "\n", 91 | "print(\"This notebook was created using version 1.0.6 of the Azure ML SDK\")\n", 92 | "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "If you are using an older version of the SDK than the one this notebook was created with, you should upgrade your SDK (a sketch of the upgrade command follows below)." 100 | ] 101 | },
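In a managed environment the upgrade can be run from a notebook cell; a sketch (the `automl`, `notebooks` and `explain` extras match what this workshop uses):

```python
# shell command executed from a notebook cell
!pip install --upgrade azureml-sdk[automl,notebooks,explain]
```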
100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "## Configure your Azure ML workspace\n", 107 | "\n", 108 | "### Workspace parameters\n", 109 | "\n", 110 | "To use an AML Workspace, you will need to import the Azure ML SDK and supply the following information:\n", 111 | "* Your subscription ID\n", 112 | "* A resource group name\n", 113 | "* (optional) The region that will host your workspace\n", 114 | "* A name for your workspace\n", 115 | "\n", 116 | "You can get your subscription ID from the [Azure portal](https://portal.azure.com).\n", 117 | "\n", 118 | "You will also need access to a [_resource group_](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-overview#resource-groups), which organizes Azure resources and provides a default region for the resources in a group. You can see which resource groups you have access to, or create a new one, in the [Azure portal](https://portal.azure.com). If you don't have a resource group, the create workspace command will create one for you using the name you provide.\n", 119 | "\n", 120 | "The region to host your workspace will be used if you are creating a new workspace. You do not need to specify this if you are using an existing workspace. You can find the list of supported regions [here](https://azure.microsoft.com/en-us/global-infrastructure/services/?products=machine-learning-service). You should pick a region that is close to your location or that contains your data.\n", 121 | "\n", 122 | "The name of your workspace must be unique within the subscription and should be descriptive enough to distinguish it from other AML Workspaces. The subscription may be used only by you, or it may be used by your department or your entire enterprise, so choose a name that makes sense for your situation.\n", 123 | "\n", 124 | "The following cell allows you to specify your workspace parameters. This cell uses the Python function `os.getenv` to read values from environment variables, which is useful for automation. If no environment variable exists, the parameters will be set to the specified default values.\n", 125 | "\n", 126 | "If you ran the Azure Machine Learning [quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) in Azure Notebooks, you already have a configured workspace! 
You can go to your Azure Machine Learning Getting Started library, view the *config.json* file, and copy-paste the values for subscription ID, resource group, and workspace name below.\n", 127 | "\n", 128 | "Replace the default values in the cell below with your workspace parameters." 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": null, 134 | "metadata": {}, 135 | "outputs": [], 136 | "source": [ 137 | "import os\n", 138 | "\n", 139 | "subscription_id = os.getenv(\"SUBSCRIPTION_ID\", default=\"\")\n", 140 | "print(subscription_id)\n", 141 | "resource_group = os.getenv(\"RESOURCE_GROUP\", default=\"my-automl-workshop-rg\")\n", 142 | "print(resource_group)\n", 143 | "workspace_name = os.getenv(\"WORKSPACE_NAME\", default=\"my-automl-workshop-ws\")\n", 144 | "print(workspace_name)\n", 145 | "workspace_region = os.getenv(\"WORKSPACE_REGION\", default=\"eastus2\")\n", 146 | "print(workspace_region)" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "### Create a new workspace\n", 154 | "\n", 155 | "If you don't have an existing workspace and are the owner of the subscription or resource group, you can create a new workspace. If you don't have a resource group, the create workspace command will create one for you using the name you provide.\n", 156 | "\n", 157 | "**Note**: As with other Azure services, there are limits on certain resources (for example AmlCompute quota) associated with the Azure ML service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.\n", 158 | "\n", 159 | "This cell will create an Azure ML workspace for you in the subscription, provided you have the correct permissions.\n", 160 | "\n", 161 | "This will fail if:\n", 162 | "* You do not have permission to create a workspace in the resource group\n", 163 | "* You do not have permission to create a resource group and the specified one does not exist\n", 164 | "* You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n", 165 | "\n", 166 | "If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources."
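, "\n", "\n", "If you already have a workspace, you can attach to it instead of creating one. A minimal sketch, reusing the parameters defined above (note that `Workspace.get` only retrieves an existing workspace and will not create one):\n", "\n", "```python\n", "from azureml.core import Workspace\n", "\n", "# Look up an existing workspace by name in the given subscription and resource group\n", "ws = Workspace.get(name = workspace_name,\n", " subscription_id = subscription_id,\n", " resource_group = resource_group)\n", "\n", "# Persist the details so other notebooks can use Workspace.from_config()\n", "ws.write_config()\n", "```"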
167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "metadata": { 173 | "tags": [ 174 | "create workspace" 175 | ] 176 | }, 177 | "outputs": [], 178 | "source": [ 179 | "from azureml.core import Workspace\n", 180 | "\n", 181 | "# Create the workspace using the specified parameters\n", 182 | "ws = Workspace.create(name = workspace_name,\n", 183 | " subscription_id = subscription_id,\n", 184 | " resource_group = resource_group, \n", 185 | " location = workspace_region,\n", 186 | " create_resource_group = True,\n", 187 | " exist_ok = True)\n", 188 | "ws.get_details()\n", 189 | "\n", 190 | "# Write the details of the workspace to a configuration file in the notebook library\n", 191 | "ws.write_config()" 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": {}, 197 | "source": [ 198 | "### Verify the config file" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": null, 204 | "metadata": {}, 205 | "outputs": [], 206 | "source": [ 207 | "import azureml.core\n", 208 | "import pandas as pd\n", 209 | "from azureml.core.workspace import Workspace\n", 210 | "\n", 211 | "ws = Workspace.from_config()\n", 212 | "\n", 213 | "output = {}\n", 214 | "output['SDK version'] = azureml.core.VERSION\n", 215 | "output['Subscription ID'] = ws.subscription_id\n", 216 | "output['Workspace'] = ws.name\n", 217 | "output['Resource Group'] = ws.resource_group\n", 218 | "output['Location'] = ws.location\n", 219 | "pd.set_option('display.max_colwidth', -1)\n", 220 | "outputDf = pd.DataFrame(data = output, index = [''])\n", 221 | "outputDf.T" 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "metadata": {}, 227 | "source": [ 228 | "### [OPTIONAL] Install additional packages" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": null, 234 | "metadata": {}, 235 | "outputs": [], 236 | "source": [ 237 | "# Automated ML core packages\n", 238 | "# !pip install --upgrade azureml-sdk[automl,notebooks,explain]" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": null, 244 | "metadata": {}, 245 | "outputs": [], 246 | "source": [ 247 | "# Webservice schema for Swagger compatibility\n", 248 | "# !pip install --upgrade azureml-sdk[webservice-schema]" 249 | ] 250 | } 251 | ], 252 | "metadata": { 253 | "authors": [ 254 | { 255 | "name": "roastala" 256 | } 257 | ], 258 | "kernelspec": { 259 | "display_name": "Python (myenv2)", 260 | "language": "python", 261 | "name": "myenv2" 262 | }, 263 | "language_info": { 264 | "codemirror_mode": { 265 | "name": "ipython", 266 | "version": 3 267 | }, 268 | "file_extension": ".py", 269 | "mimetype": "text/x-python", 270 | "name": "python", 271 | "nbconvert_exporter": "python", 272 | "pygments_lexer": "ipython3", 273 | "version": "3.6.6" 274 | } 275 | }, 276 | "nbformat": 4, 277 | "nbformat_minor": 2 278 | } 279 | --------------------------------------------------------------------------------
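
Once the ACI service deployed in the forecasting notebook above is running, it can be called directly from the SDK. A rough sketch, assuming `ws` is the workspace object from the configuration notebook and the service name used at deployment time; the payload shape below is hypothetical, since the real input schema is defined by the scoring script baked into the container image:

    import json
    from azureml.core.webservice import Webservice

    # Retrieve the running ACI service by the name used at deployment time
    aci_service = Webservice(workspace=ws, name='automlenergyforecasting')
    print(aci_service.scoring_uri)

    # Hypothetical payload; replace with the schema your scoring script expects
    sample = json.dumps({'data': [['2017-08-10 00:00:00']]})
    print(aci_service.run(input_data=sample))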