├── .github ├── CONTRIBUTING.md ├── ISSUE_TEMPLATE.md └── PULL_REQUEST_TEMPLATE.md ├── .vscode └── settings.json ├── Allfiles └── Labs │ ├── 01 │ ├── basic-env.yml │ ├── conda-envs │ │ └── basic-env-cpu.yml │ ├── data-local-path.yml │ └── data │ │ └── diabetes.csv │ ├── 02 │ ├── basic-job │ │ ├── basic-job.yml │ │ └── src │ │ │ ├── diabetes.csv │ │ │ └── main.py │ ├── input-data-job │ │ ├── data-job.yml │ │ └── src │ │ │ └── main.py │ └── sweep-job │ │ ├── src │ │ ├── diabetes.csv │ │ └── main.py │ │ └── sweep-job.yml │ ├── 03 │ └── mlflow-job │ │ ├── mlflow-job.yml │ │ └── src │ │ ├── custom-mlflow.py │ │ └── mlflow-autolog.py │ ├── 04 │ └── mlflow-endpoint │ │ ├── create-endpoint.yml │ │ ├── mlflow-deployment.yml │ │ ├── model │ │ ├── MLmodel │ │ ├── conda.yaml │ │ ├── model.pkl │ │ └── requirements.txt │ │ └── sample-data.json │ └── 05 │ ├── fix-missing-data.yml │ ├── job.yml │ ├── normalize-data.yml │ ├── src │ ├── fix-missing-data.py │ ├── normalize-data.py │ ├── summary-stats.py │ ├── train-decision-tree.py │ └── train-logistic-regression.py │ ├── summary-stats.yml │ ├── train-decision-tree.yml │ └── train-logistic-regression.yml ├── Instructions └── Labs │ ├── 01-create-workspace.md │ ├── 02-run-python-job.md │ ├── 03-run-sweep-job.md │ ├── 04-use-mlflow-jobs.md │ ├── 05-deploy-managed-endpoint.md │ ├── 06-create-pipeline.md │ └── media │ ├── designer-pipeline-decision.png │ └── designer-pipeline-regression.png ├── LICENSE ├── _build.yml ├── _config.yml ├── index.md └── readme.md /.github/CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to Microsoft Learning Repositories 2 | 3 | MCT contributions are a key part of keeping the lab and demo content current as the Azure platform changes. We want to make it as easy as possible for you to contribute changes to the lab files. Here are a few guidelines to keep in mind as you contribute changes. 4 | 5 | ## GitHub Use & Purpose 6 | 7 | Microsoft Learning is using GitHub to publish the lab steps and lab scripts for courses that cover cloud services like Azure. Using GitHub allows the course’s authors and MCTs to keep the lab content current with Azure platform changes. Using GitHub allows the MCTs to provide feedback and suggestions for lab changes, and then the course authors can update lab steps and scripts quickly and relatively easily. 8 | 9 | > When you prepare to teach these courses, you should ensure that you are using the latest lab steps and scripts by downloading the appropriate files from GitHub. GitHub should not be used to discuss technical content in the course, or how to prep. It should only be used to address changes in the labs. 10 | 11 | It is strongly recommended that MCTs and Partners access these materials and in turn, provide them separately to students. Pointing students directly to GitHub to access Lab steps as part of an ongoing class will require them to access yet another UI as part of the course, contributing to a confusing experience for the student. An explanation to the student regarding why they are receiving separate Lab instructions can highlight the nature of an always-changing cloud-based interface and platform. Microsoft Learning support for accessing files on GitHub and support for navigation of the GitHub site is limited to MCTs teaching this course only. 12 | 13 | > As an alternative to pointing students directly to the GitHub repository, you can point students to the GitHub Pages website to view the lab instructions. 
The URL for the GitHub Pages website can be found at the top of the repository. 14 | 15 | To address general comments about the course and demos, or how to prepare for a course delivery, please use the existing MCT forums. 16 | 17 | ## Additional Resources 18 | 19 | A user guide has been provided for MCTs who are new to GitHub. It provides steps for connecting to GitHub, downloading and printing course materials, updating the scripts that students use in labs, and explaining how you can help ensure that this course’s content remains current. 20 | 21 | 22 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | # Module: 00 2 | ## Lab/Demo: 00 3 | ### Task: 00 4 | #### Step: 00 5 | 6 | Description of issue 7 | 8 | Repro steps: 9 | 10 | 1. 11 | 1. 12 | 1. -------------------------------------------------------------------------------- /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | # Module: 00 2 | ## Lab/Demo: 00 3 | 4 | Fixes # . 5 | 6 | Changes proposed in this pull request: 7 | 8 | - 9 | - 10 | - -------------------------------------------------------------------------------- /.vscode/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "markdownlint.config": { 3 | "MD028": false, 4 | "MD025": { 5 | "front_matter_title": "" 6 | } 7 | } 8 | } -------------------------------------------------------------------------------- /Allfiles/Labs/01/basic-env.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json 2 | name: basic-env-scikit 3 | image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04 4 | conda_file: conda-envs/basic-env-cpu.yml 5 | description: Environment created from a Docker image plus Conda environment. -------------------------------------------------------------------------------- /Allfiles/Labs/01/conda-envs/basic-env-cpu.yml: -------------------------------------------------------------------------------- 1 | name: basic-env-cpu 2 | channels: 3 | - conda-forge 4 | dependencies: 5 | - python=3.7 6 | - scikit-learn 7 | - pandas 8 | - numpy 9 | - matplotlib 10 | - pip 11 | - pip: 12 | - azureml-defaults 13 | - azureml-core 14 | - azureml-mlflow 15 | - mlflow 16 | -------------------------------------------------------------------------------- /Allfiles/Labs/01/data-local-path.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/data.schema.json 2 | name: diabetes-data 3 | version: 1 4 | path: data 5 | description: Dataset pointing to diabetes data stored as CSV on local computer. Data is uploaded to default datastore. 6 | -------------------------------------------------------------------------------- /Allfiles/Labs/02/basic-job/basic-job.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json 2 | code: src 3 | command: >- 4 | python main.py 5 | environment: azureml:basic-env-scikit@latest 6 | compute: azureml: 7 | experiment_name: diabetes-example 8 | description: Train a Logistic Regression classification model on the diabetes dataset that is stored locally. 
-------------------------------------------------------------------------------- /Allfiles/Labs/02/basic-job/src/main.py: -------------------------------------------------------------------------------- 1 | # Import libraries 2 | import mlflow 3 | import argparse 4 | 5 | import pandas as pd 6 | from sklearn.model_selection import train_test_split 7 | from sklearn.linear_model import LogisticRegression 8 | 9 | # define functions 10 | def main(args): 11 | # enable auto logging 12 | mlflow.autolog() 13 | 14 | # read data 15 | df = pd.read_csv('diabetes.csv') 16 | 17 | # process data 18 | X_train, X_test, y_train, y_test = process_data(df) 19 | 20 | # train model 21 | model = train_model(args.reg_rate, X_train, X_test, y_train, y_test) 22 | 23 | def process_data(df): 24 | # split dataframe into X and y 25 | X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values 26 | 27 | # train/test split 28 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0) 29 | 30 | # return splits and encoder 31 | return X_train, X_test, y_train, y_test 32 | 33 | def train_model(reg_rate, X_train, X_test, y_train, y_test): 34 | # train model 35 | model = LogisticRegression(C=1/reg_rate, solver="liblinear").fit(X_train, y_train) 36 | 37 | # return model 38 | return model 39 | 40 | def parse_args(): 41 | # setup arg parser 42 | parser = argparse.ArgumentParser() 43 | 44 | # add arguments 45 | parser.add_argument("--reg-rate", dest="reg_rate", type=float, default=0.01) 46 | 47 | # parse args 48 | args = parser.parse_args() 49 | 50 | # return args 51 | return args 52 | 53 | # run script 54 | if __name__ == "__main__": 55 | # add space in logs 56 | print("\n\n") 57 | print("*" * 60) 58 | 59 | # parse args 60 | args = parse_args() 61 | 62 | # run main function 63 | main(args) 64 | 65 | # add space in logs 66 | print("*" * 60) 67 | print("\n\n") -------------------------------------------------------------------------------- /Allfiles/Labs/02/input-data-job/data-job.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json 2 | code: src 3 | command: >- 4 | python main.py 5 | --diabetes-csv ${{inputs.diabetes}} 6 | inputs: 7 | diabetes: 8 | path: azureml:diabetes-data:1 9 | mode: ro_mount 10 | environment: azureml:basic-env-scikit@latest 11 | compute: azureml: 12 | experiment_name: diabetes-data-example 13 | description: Train a classification model on diabetes data using a registered dataset as input. 
-------------------------------------------------------------------------------- /Allfiles/Labs/02/input-data-job/src/main.py: -------------------------------------------------------------------------------- 1 | # Import libraries 2 | import mlflow 3 | import argparse 4 | import glob 5 | 6 | import pandas as pd 7 | from sklearn.model_selection import train_test_split 8 | from sklearn.linear_model import LogisticRegression 9 | 10 | # define functions 11 | def main(args): 12 | # enable auto logging 13 | mlflow.autolog() 14 | 15 | # read data 16 | data_path = args.diabetes_csv 17 | all_files = glob.glob(data_path + "/*.csv") 18 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 19 | 20 | # process data 21 | X_train, X_test, y_train, y_test = process_data(df) 22 | 23 | # train model 24 | model = train_model(args.reg_rate, X_train, X_test, y_train, y_test) 25 | 26 | def process_data(df): 27 | # split dataframe into X and y 28 | X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values 29 | 30 | # train/test split 31 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0) 32 | 33 | # return splits and encoder 34 | return X_train, X_test, y_train, y_test 35 | 36 | def train_model(reg_rate, X_train, X_test, y_train, y_test): 37 | # train model 38 | model = LogisticRegression(C=1/reg_rate, solver="liblinear").fit(X_train, y_train) 39 | 40 | # return model 41 | return model 42 | 43 | def parse_args(): 44 | # setup arg parser 45 | parser = argparse.ArgumentParser() 46 | 47 | # add arguments 48 | parser.add_argument("--diabetes-csv", dest='diabetes_csv', type=str) 49 | parser.add_argument("--reg-rate", dest='reg_rate', type=float, default=0.01) 50 | 51 | # parse args 52 | args = parser.parse_args() 53 | 54 | # return args 55 | return args 56 | 57 | # run script 58 | if __name__ == "__main__": 59 | # add space in logs 60 | print("\n\n") 61 | print("*" * 60) 62 | 63 | # parse args 64 | args = parse_args() 65 | 66 | # run main function 67 | main(args) 68 | 69 | # add space in logs 70 | print("*" * 60) 71 | print("\n\n") -------------------------------------------------------------------------------- /Allfiles/Labs/02/sweep-job/src/main.py: -------------------------------------------------------------------------------- 1 | # Import libraries 2 | import mlflow 3 | import argparse 4 | import glob 5 | 6 | import pandas as pd 7 | from sklearn.model_selection import train_test_split 8 | from sklearn.ensemble import GradientBoostingClassifier 9 | 10 | # define functions 11 | def main(args): 12 | # enable auto logging 13 | mlflow.autolog() 14 | 15 | params = { 16 | "learning_rate": args.learning_rate, 17 | "n_estimators": args.n_estimators, 18 | } 19 | 20 | # read data 21 | data_path = args.diabetes_csv 22 | all_files = glob.glob(data_path + "/*.csv") 23 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 24 | 25 | # process data 26 | X_train, X_test, y_train, y_test = process_data(df) 27 | 28 | # train model 29 | model = train_model(params, X_train, X_test, y_train, y_test) 30 | 31 | def process_data(df): 32 | # split dataframe into X and y 33 | X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values 34 | 35 | # train/test split 36 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0) 37 | 38 | # 
return splits and encoder 39 | return X_train, X_test, y_train, y_test 40 | 41 | def train_model(params, X_train, X_test, y_train, y_test): 42 | # train model 43 | model = GradientBoostingClassifier(**params) 44 | model = model.fit(X_train, y_train) 45 | 46 | # return model 47 | return model 48 | 49 | def parse_args(): 50 | # setup arg parser 51 | parser = argparse.ArgumentParser() 52 | 53 | # add arguments 54 | parser.add_argument("--diabetes-csv", type=str) 55 | parser.add_argument("--learning-rate", dest='learning_rate', type=float, default=0.1) 56 | parser.add_argument("--n-estimators", dest='n_estimators', type=int, default=100) 57 | 58 | # parse args 59 | args = parser.parse_args() 60 | 61 | # return args 62 | return args 63 | 64 | # run script 65 | if __name__ == "__main__": 66 | # add space in logs 67 | print("\n\n") 68 | print("*" * 60) 69 | 70 | # parse args 71 | args = parse_args() 72 | 73 | # run main function 74 | main(args) 75 | 76 | # add space in logs 77 | print("*" * 60) 78 | print("\n\n") -------------------------------------------------------------------------------- /Allfiles/Labs/02/sweep-job/sweep-job.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/sweepJob.schema.json 2 | type: sweep 3 | sampling_algorithm: grid 4 | trial: 5 | code: src 6 | command: >- 7 | python main.py 8 | --diabetes-csv ${{inputs.diabetes}} 9 | --learning-rate ${{search_space.learning_rate}} 10 | --n-estimators ${{search_space.n_estimators}} 11 | environment: azureml:basic-env-scikit@latest 12 | inputs: 13 | diabetes: 14 | path: azureml:diabetes-data:1 15 | mode: ro_mount 16 | compute: azureml:aml-cluster 17 | search_space: 18 | learning_rate: 19 | type: choice 20 | values: [0.01, 0.1, 1.0] 21 | n_estimators: 22 | type: choice 23 | values: [10, 100] 24 | objective: 25 | primary_metric: training_roc_auc_score 26 | goal: maximize 27 | limits: 28 | max_total_trials: 6 29 | max_concurrent_trials: 3 30 | timeout: 3600 31 | experiment_name: diabetes-sweep-example 32 | description: Run a hyperparameter sweep job for classification on diabetes dataset. -------------------------------------------------------------------------------- /Allfiles/Labs/03/mlflow-job/mlflow-job.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json 2 | code: src 3 | command: >- 4 | python mlflow-autolog.py 5 | --diabetes-csv ${{inputs.diabetes}} 6 | inputs: 7 | diabetes: 8 | path: azureml:diabetes-data:1 9 | mode: ro_mount 10 | environment: azureml:basic-env-scikit@latest 11 | compute: azureml: 12 | experiment_name: diabetes-mlflow-example 13 | description: Train a classification model on diabetes data using a registered dataset as input. Use MLflow to track parameter, metric, and artifact. 
-------------------------------------------------------------------------------- /Allfiles/Labs/03/mlflow-job/src/custom-mlflow.py: -------------------------------------------------------------------------------- 1 | # Import libraries 2 | import mlflow 3 | import argparse 4 | import glob 5 | import joblib 6 | 7 | import pandas as pd 8 | import numpy as np 9 | import matplotlib.pyplot as plt 10 | from sklearn.model_selection import train_test_split 11 | from sklearn.linear_model import LogisticRegression 12 | from sklearn.metrics import confusion_matrix 13 | 14 | # define functions 15 | def main(args): 16 | # read data 17 | data_path = args.diabetes_csv 18 | all_files = glob.glob(data_path + "/*.csv") 19 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 20 | 21 | # process data 22 | X_train, X_test, y_train, y_test = process_data(df) 23 | 24 | # train model 25 | reg_rate = args.reg_rate 26 | mlflow.log_param("Regularization rate", reg_rate) 27 | model = train_model(reg_rate, X_train, X_test, y_train, y_test) 28 | 29 | def process_data(df): 30 | # split dataframe into X and y 31 | X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values 32 | 33 | # train/test split 34 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0) 35 | 36 | # return splits and encoder 37 | return X_train, X_test, y_train, y_test 38 | 39 | def train_model(reg_rate, X_train, X_test, y_train, y_test): 40 | # train model 41 | model = LogisticRegression(C=1/reg_rate, solver="liblinear").fit(X_train, y_train) 42 | 43 | # calculate accuracy 44 | y_pred = model.predict(X_test) 45 | acc = np.average(y_pred == y_test) 46 | mlflow.log_metric("Accuracy", np.float(acc)) 47 | 48 | # create confusion matrix 49 | conf_matrix = confusion_matrix(y_true=y_test, y_pred=y_pred) 50 | fig, ax = plt.subplots(figsize=(7.5, 7.5)) 51 | ax.matshow(conf_matrix, cmap=plt.cm.Blues, alpha=0.3) 52 | for i in range(conf_matrix.shape[0]): 53 | for j in range(conf_matrix.shape[1]): 54 | ax.text(x=j, y=i,s=conf_matrix[i, j], va='center', ha='center', size='xx-large') 55 | 56 | plt.xlabel('Predictions', fontsize=18) 57 | plt.ylabel('Actuals', fontsize=18) 58 | plt.title('Confusion Matrix', fontsize=18) 59 | plt.savefig("ConfusionMatrix.png") 60 | mlflow.log_artifact("ConfusionMatrix.png") 61 | 62 | # return model 63 | return model 64 | 65 | def parse_args(): 66 | # setup arg parser 67 | parser = argparse.ArgumentParser() 68 | 69 | # add arguments 70 | parser.add_argument("--diabetes-csv", dest='diabetes_csv', type=str) 71 | parser.add_argument("--reg-rate", dest='reg_rate', type=float, default=0.01) 72 | 73 | # parse args 74 | args = parser.parse_args() 75 | 76 | # return args 77 | return args 78 | 79 | # run script 80 | if __name__ == "__main__": 81 | # add space in logs 82 | print("\n\n") 83 | print("*" * 60) 84 | 85 | # parse args 86 | args = parse_args() 87 | 88 | # run main function 89 | main(args) 90 | 91 | # add space in logs 92 | print("*" * 60) 93 | print("\n\n") -------------------------------------------------------------------------------- /Allfiles/Labs/03/mlflow-job/src/mlflow-autolog.py: -------------------------------------------------------------------------------- 1 | # Import libraries 2 | import mlflow 3 | import argparse 4 | import glob 5 | 6 | import pandas as pd 7 | from sklearn.model_selection import train_test_split 8 | from sklearn.linear_model import LogisticRegression 9 
| 10 | # define functions 11 | def main(args): 12 | # enable auto logging 13 | mlflow.autolog() 14 | 15 | # read data 16 | data_path = args.diabetes_csv 17 | all_files = glob.glob(data_path + "/*.csv") 18 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 19 | 20 | # process data 21 | X_train, X_test, y_train, y_test = process_data(df) 22 | 23 | # train model 24 | model = train_model(args.reg_rate, X_train, X_test, y_train, y_test) 25 | 26 | def process_data(df): 27 | # split dataframe into X and y 28 | X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values 29 | 30 | # train/test split 31 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0) 32 | 33 | # return splits and encoder 34 | return X_train, X_test, y_train, y_test 35 | 36 | def train_model(reg_rate, X_train, X_test, y_train, y_test): 37 | # train model 38 | model = LogisticRegression(C=1/reg_rate, solver="liblinear").fit(X_train, y_train) 39 | 40 | # return model 41 | return model 42 | 43 | def parse_args(): 44 | # setup arg parser 45 | parser = argparse.ArgumentParser() 46 | 47 | # add arguments 48 | parser.add_argument("--diabetes-csv", dest='diabetes_csv', type=str) 49 | parser.add_argument("--reg-rate", dest='reg_rate', type=float, default=0.01) 50 | 51 | # parse args 52 | args = parser.parse_args() 53 | 54 | # return args 55 | return args 56 | 57 | # run script 58 | if __name__ == "__main__": 59 | # add space in logs 60 | print("\n\n") 61 | print("*" * 60) 62 | 63 | # parse args 64 | args = parse_args() 65 | 66 | # run main function 67 | main(args) 68 | 69 | # add space in logs 70 | print("*" * 60) 71 | print("\n\n") -------------------------------------------------------------------------------- /Allfiles/Labs/04/mlflow-endpoint/create-endpoint.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json 2 | auth_mode: key -------------------------------------------------------------------------------- /Allfiles/Labs/04/mlflow-endpoint/mlflow-deployment.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json 2 | model: 3 | name: sample-mlflow-sklearn-model 4 | version: 1 5 | path: model 6 | type: mlflow_model 7 | instance_type: Standard_F4s_v2 8 | instance_count: 1 -------------------------------------------------------------------------------- /Allfiles/Labs/04/mlflow-endpoint/model/MLmodel: -------------------------------------------------------------------------------- 1 | artifact_path: model 2 | flavors: 3 | python_function: 4 | env: conda.yaml 5 | loader_module: mlflow.sklearn 6 | model_path: model.pkl 7 | python_version: 3.7.13 8 | sklearn: 9 | pickled_model: model.pkl 10 | serialization_format: cloudpickle 11 | sklearn_version: 0.24.1 12 | run_id: 3cdc6ac1-76e3-47e3-a104-5599c222846e 13 | signature: 14 | inputs: '[{"type": "tensor", "tensor-spec": {"dtype": "float64", "shape": [-1, 8]}}]' 15 | outputs: '[{"type": "tensor", "tensor-spec": {"dtype": "int64", "shape": [-1]}}]' 16 | utc_time_created: '2021-11-18 10:39:04.318176' 17 | -------------------------------------------------------------------------------- /Allfiles/Labs/04/mlflow-endpoint/model/conda.yaml: 
-------------------------------------------------------------------------------- 1 | channels: 2 | - conda-forge 3 | dependencies: 4 | - python=3.7.13 5 | - pip<=20.2.4 6 | - pip: 7 | - mlflow 8 | - cloudpickle==2.2.0 9 | - psutil==5.8.0 10 | - scikit-learn==0.24.1 11 | name: mlflow-env -------------------------------------------------------------------------------- /Allfiles/Labs/04/mlflow-endpoint/model/model.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MicrosoftLearning/mslearn-aml-cli/191e8804f8005b2cad8eb4230ffb7cf5ada14068/Allfiles/Labs/04/mlflow-endpoint/model/model.pkl -------------------------------------------------------------------------------- /Allfiles/Labs/04/mlflow-endpoint/model/requirements.txt: -------------------------------------------------------------------------------- 1 | mlflow 2 | cloudpickle==2.2.0 3 | scikit-learn==0.24.2 -------------------------------------------------------------------------------- /Allfiles/Labs/04/mlflow-endpoint/sample-data.json: -------------------------------------------------------------------------------- 1 | { 2 | "input_data": [ 3 | [2,180,74,24,21,23.9091702,1.488172308,60], 4 | [0,148,58,11,179,39.19207553,0.160829008,45] 5 | ] 6 | } -------------------------------------------------------------------------------- /Allfiles/Labs/05/fix-missing-data.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json 2 | name: remove_empty_rows 3 | display_name: Remove Empty Rows 4 | version: 1 5 | type: command 6 | inputs: 7 | input_data: 8 | type: uri_folder 9 | outputs: 10 | output_data: 11 | type: uri_folder 12 | code: ./src 13 | environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest 14 | command: >- 15 | python fix-missing-data.py 16 | --input_data ${{inputs.input_data}} 17 | --output_data ${{outputs.output_data}} 18 | 19 | -------------------------------------------------------------------------------- /Allfiles/Labs/05/job.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json 2 | type: pipeline 3 | experiment_name: diabetes-pipeline-example 4 | 5 | compute: azureml: 6 | settings: 7 | datastore: azureml:workspaceblobstore 8 | 9 | outputs: 10 | pipeline_job_trained_model: 11 | mode: upload 12 | 13 | jobs: 14 | stats_job: 15 | type: command 16 | component: file:./summary-stats.yml 17 | inputs: 18 | input_data: 19 | type: uri_folder 20 | path: azureml:diabetes-data:1 21 | 22 | fix_missing_job: 23 | type: command 24 | component: file:./fix-missing-data.yml 25 | inputs: 26 | input_data: 27 | type: uri_folder 28 | path: azureml:diabetes-data:1 29 | outputs: 30 | output_data: 31 | mode: upload 32 | 33 | normalize_job: 34 | type: command 35 | component: file:./normalize-data.yml 36 | inputs: 37 | input_data: ${{parent.jobs.fix_missing_job.outputs.output_data}} 38 | outputs: 39 | output_data: 40 | mode: upload 41 | 42 | train_job: 43 | type: command 44 | component: file:./train-decision-tree.yml 45 | inputs: 46 | training_data: ${{parent.jobs.normalize_job.outputs.output_data}} 47 | outputs: 48 | model_output: ${{parent.outputs.pipeline_job_trained_model}} -------------------------------------------------------------------------------- /Allfiles/Labs/05/normalize-data.yml: 
-------------------------------------------------------------------------------- 1 | # 2 | $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json 3 | name: normalize_data 4 | display_name: Normalize Numerical Columns 5 | version: 1 6 | type: command 7 | inputs: 8 | input_data: 9 | type: uri_folder 10 | outputs: 11 | output_data: 12 | type: uri_folder 13 | code: ./src 14 | environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest 15 | command: >- 16 | python normalize-data.py 17 | --input_data ${{inputs.input_data}} 18 | --output_data ${{outputs.output_data}} 19 | # 20 | -------------------------------------------------------------------------------- /Allfiles/Labs/05/src/fix-missing-data.py: -------------------------------------------------------------------------------- 1 | # import libraries 2 | import argparse 3 | import glob 4 | from pathlib import Path 5 | import pandas as pd 6 | import mlflow 7 | 8 | # get parameters 9 | parser = argparse.ArgumentParser() 10 | parser.add_argument("--input_data", type=str, help='Path to input data') 11 | parser.add_argument('--output_data', type=str, help='Path of output data') 12 | args = parser.parse_args() 13 | 14 | # load the data (passed as an input dataset) 15 | data_path = args.input_data 16 | all_files = glob.glob(data_path + "/*.csv") 17 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 18 | 19 | # log row count input data 20 | row_count = (len(df)) 21 | mlflow.log_metric('row count input data', row_count) 22 | 23 | # remove nulls 24 | df = df.dropna() 25 | 26 | # log processed rows 27 | row_count_processed = (len(df)) 28 | mlflow.log_metric('row count output data', row_count_processed) 29 | 30 | # set the processed data as output 31 | output_df = df.to_csv((Path(args.output_data) / "output_data.csv")) -------------------------------------------------------------------------------- /Allfiles/Labs/05/src/normalize-data.py: -------------------------------------------------------------------------------- 1 | # import libraries 2 | import argparse 3 | import os 4 | import glob 5 | from pathlib import Path 6 | import pandas as pd 7 | import mlflow 8 | from sklearn.preprocessing import MinMaxScaler 9 | 10 | # get parameters 11 | parser = argparse.ArgumentParser() 12 | parser.add_argument("--input_data", type=str, help='Path to input data') 13 | parser.add_argument('--output_data', type=str, help='Path of output data') 14 | args = parser.parse_args() 15 | 16 | # load the data (passed as an input dataset) 17 | print("files in input_data path: ") 18 | arr = os.listdir(args.input_data) 19 | print(arr) 20 | 21 | for filename in arr: 22 | print("reading file: %s ..." 
% filename) 23 | with open(os.path.join(args.input_data, filename), "r") as handle: 24 | print(handle.read()) 25 | 26 | data_path = args.input_data 27 | all_files = glob.glob(data_path + "/*.csv") 28 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 29 | 30 | # log row count input data 31 | row_count = (len(df)) 32 | mlflow.log_metric('row count input data', row_count) 33 | 34 | # normalize the numeric columns 35 | scaler = MinMaxScaler() 36 | num_cols = ['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree'] 37 | df[num_cols] = scaler.fit_transform(df[num_cols]) 38 | 39 | # log processed rows 40 | row_count_processed = (len(df)) 41 | mlflow.log_metric('row count output data', row_count_processed) 42 | 43 | # set the processed data as output 44 | output_df = df.to_csv((Path(args.output_data) / "output_data.csv")) -------------------------------------------------------------------------------- /Allfiles/Labs/05/src/summary-stats.py: -------------------------------------------------------------------------------- 1 | # import libraries 2 | import argparse 3 | import glob 4 | from pathlib import Path 5 | import pandas as pd 6 | import mlflow 7 | 8 | # get parameters 9 | parser = argparse.ArgumentParser() 10 | parser.add_argument("--input_data", type=str, help='Path to input data') 11 | args = parser.parse_args() 12 | 13 | # read data 14 | data_path = args.input_data 15 | all_files = glob.glob(data_path + "/*.csv") 16 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 17 | 18 | # log row count 19 | row_count = (len(df)) 20 | mlflow.log_metric('row count', row_count) 21 | 22 | # get summary statistics 23 | stats = df.describe() 24 | stats.to_csv('summary_statistics.csv') 25 | mlflow.log_artifact('summary_statistics.csv') 26 | -------------------------------------------------------------------------------- /Allfiles/Labs/05/src/train-decision-tree.py: -------------------------------------------------------------------------------- 1 | # Import libraries 2 | import argparse 3 | import glob 4 | import pandas as pd 5 | import numpy as np 6 | import mlflow 7 | from sklearn.model_selection import train_test_split 8 | from sklearn.tree import DecisionTreeClassifier 9 | from sklearn.metrics import roc_auc_score 10 | from sklearn.metrics import roc_curve 11 | import matplotlib.pyplot as plt 12 | import pickle 13 | from pathlib import Path 14 | 15 | # get parameters 16 | parser = argparse.ArgumentParser("train") 17 | parser.add_argument("--training_data", type=str, help="Path to training data") 18 | parser.add_argument("--model_output", type=str, help="Path of output model") 19 | 20 | args = parser.parse_args() 21 | 22 | training_data = args.training_data 23 | 24 | # load the prepared data file in the training folder 25 | print("Loading Data...") 26 | data_path = args.training_data 27 | all_files = glob.glob(data_path + "/*.csv") 28 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 29 | 30 | # Separate features and labels 31 | X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values 32 | 33 | # Split data into training set and test set 34 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0) 35 | 36 | # Train a decision tree model 37 | print('Training a decision tree model...') 38 | model = DecisionTreeClassifier().fit(X_train, y_train) 39 | 40 | # calculate 
accuracy 41 | y_hat = model.predict(X_test) 42 | acc = np.average(y_hat == y_test) 43 | print('Accuracy:', acc) 44 | mlflow.log_metric('Accuracy', np.float(acc)) 45 | 46 | # calculate AUC 47 | y_scores = model.predict_proba(X_test) 48 | auc = roc_auc_score(y_test,y_scores[:,1]) 49 | print('AUC: ' + str(auc)) 50 | mlflow.log_metric('AUC', np.float(auc)) 51 | 52 | # plot ROC curve 53 | fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1]) 54 | fig = plt.figure(figsize=(6, 4)) 55 | # Plot the diagonal 50% line 56 | plt.plot([0, 1], [0, 1], 'k--') 57 | # Plot the FPR and TPR achieved by our model 58 | plt.plot(fpr, tpr) 59 | plt.xlabel('False Positive Rate') 60 | plt.ylabel('True Positive Rate') 61 | plt.title('ROC Curve') 62 | plt.savefig("ROCcurve.png") 63 | mlflow.log_artifact("ROCcurve.png") 64 | 65 | # Output the model and test data 66 | pickle.dump(model, open((Path(args.model_output) / "model.sav"), "wb")) -------------------------------------------------------------------------------- /Allfiles/Labs/05/src/train-logistic-regression.py: -------------------------------------------------------------------------------- 1 | # Import libraries 2 | import argparse 3 | import glob 4 | import pandas as pd 5 | import numpy as np 6 | import mlflow 7 | from sklearn.model_selection import train_test_split 8 | from sklearn.linear_model import LogisticRegression 9 | from sklearn.metrics import confusion_matrix 10 | import matplotlib.pyplot as plt 11 | import pickle 12 | from pathlib import Path 13 | 14 | # get parameters 15 | parser = argparse.ArgumentParser("train") 16 | parser.add_argument("--training_data", type=str, help="Path to training data") 17 | parser.add_argument("--reg_rate", type=float, default=0.01) 18 | parser.add_argument("--model_output", type=str, help="Path of output model") 19 | 20 | args = parser.parse_args() 21 | 22 | training_data = args.training_data 23 | 24 | # load the prepared data file in the training folder 25 | print("Loading Data...") 26 | data_path = args.training_data 27 | all_files = glob.glob(data_path + "/*.csv") 28 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 29 | 30 | # Separate features and labels 31 | X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values 32 | 33 | # Split data into training set and test set 34 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0) 35 | 36 | # Train a logistic regression model 37 | print('Training a logistic regression model...') 38 | model = LogisticRegression(C=1/args.reg_rate, solver="liblinear").fit(X_train, y_train) 39 | 40 | # calculate accuracy 41 | y_pred = model.predict(X_test) 42 | acc = np.average(y_pred == y_test) 43 | mlflow.log_metric("Accuracy", np.float(acc)) 44 | 45 | # create confusion matrix 46 | conf_matrix = confusion_matrix(y_true=y_test, y_pred=y_pred) 47 | fig, ax = plt.subplots(figsize=(7.5, 7.5)) 48 | ax.matshow(conf_matrix, cmap=plt.cm.Blues, alpha=0.3) 49 | for i in range(conf_matrix.shape[0]): 50 | for j in range(conf_matrix.shape[1]): 51 | ax.text(x=j, y=i,s=conf_matrix[i, j], va='center', ha='center', size='xx-large') 52 | 53 | plt.xlabel('Predictions', fontsize=18) 54 | plt.ylabel('Actuals', fontsize=18) 55 | plt.title('Confusion Matrix', fontsize=18) 56 | plt.savefig("ConfusionMatrix.png") 57 | mlflow.log_artifact("ConfusionMatrix.png") 58 | 59 | # Output the model and test data 60 | pickle.dump(model, 
open((Path(args.model_output) / "model.sav"), "wb")) -------------------------------------------------------------------------------- /Allfiles/Labs/05/summary-stats.yml: -------------------------------------------------------------------------------- 1 | # 2 | $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json 3 | name: get_summary_statistics 4 | display_name: Get Summary Statistics 5 | version: 1 6 | type: command 7 | inputs: 8 | input_data: 9 | type: uri_folder 10 | code: ./src 11 | environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest 12 | command: >- 13 | python summary-stats.py 14 | --input_data ${{inputs.input_data}} 15 | # 16 | -------------------------------------------------------------------------------- /Allfiles/Labs/05/train-decision-tree.yml: -------------------------------------------------------------------------------- 1 | # 2 | $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json 3 | name: train_decision_tree_classifier_model 4 | display_name: Train a Decision Tree Classifier Model 5 | version: 1 6 | type: command 7 | inputs: 8 | training_data: 9 | type: uri_folder 10 | outputs: 11 | model_output: 12 | type: uri_folder 13 | code: ./src 14 | environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest 15 | command: >- 16 | python train-decision-tree.py 17 | --training_data ${{inputs.training_data}} 18 | --model_output ${{outputs.model_output}} 19 | # 20 | -------------------------------------------------------------------------------- /Allfiles/Labs/05/train-logistic-regression.yml: -------------------------------------------------------------------------------- 1 | # 2 | $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json 3 | name: train_logistic_regression_classifier_model 4 | display_name: Train a Logistic Regression Classifier Model 5 | version: 1 6 | type: command 7 | inputs: 8 | training_data: 9 | type: uri_folder 10 | regularization_rate: 11 | type: number 12 | outputs: 13 | model_output: 14 | type: uri_folder 15 | code: ./src 16 | environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest 17 | command: >- 18 | python train-logistic-regression.py 19 | --training_data ${{inputs.training_data}} 20 | --reg_rate ${{inputs.regularization_rate}} 21 | --model_output ${{outputs.model_output}} 22 | # 23 | -------------------------------------------------------------------------------- /Instructions/Labs/01-create-workspace.md: -------------------------------------------------------------------------------- 1 | --- 2 | lab: 3 | title: 'Lab: Create an Azure Machine Learning workspace and assets with the CLI (v2)' 4 | module: 'Module: Create Azure Machine Learning resources with the CLI (v2)' 5 | --- 6 | 7 | # Create an Azure Machine Learning workspace and assets with the CLI (v2) 8 | 9 | In this exercise, you will create and explore an Azure Machine Learning workspace using the Azure Cloud Shell. 10 | 11 | ## Set up the Azure Cloud Shell and install the Azure Machine Learning extension 12 | 13 | To start, open the Azure Cloud Shell, install the Azure Machine Learning extension and clone the Git repo. 14 | 15 | 1. In a browser, open the Azure portal at [http://portal.azure.com](https://portal.azure.com/?azure-portal=true), signing in with your Microsoft account. 16 | 1. Select the [>_] (*Cloud Shell*) button at the top of the page to the right of the search box. This opens a Cloud Shell pane at the bottom of the portal. 17 | 1. 
The first time you open the cloud shell, you will be asked to choose the type of shell you want to use (*Bash* or *PowerShell*). Select **Bash**. 18 | 1. If you are asked to create storage for your cloud shell, check that the correct subscription is specified and select **Create storage**. Wait for the storage to be created. 19 | 1. Check to see if the Azure Machine Learning extension is installed with the following command: 20 | 21 | ```azurecli 22 | az extension list 23 | ``` 24 | 25 | > **Tip:** Adding **-o table** at the end of the command will format the output in a table, making it easier to read. The command would then be: `az extension list -o table` 26 | 27 | 1. If it is not installed, use the following command to install the Azure Machine Learning extension: 28 | 29 | ```azurecli 30 | az extension add -n ml -y 31 | ``` 32 | 33 | 1. In the command shell, clone this GitHub repository to download all the necessary files, which are stored in the *Allfiles* folder. 34 | 35 | ```azurecli 36 | git clone https://github.com/MicrosoftLearning/mslearn-aml-cli.git mslearn-aml-cli 37 | ``` 38 | 39 | 1. The files are downloaded to a folder named **mslearn-aml-cli**. To see the files in your Cloud Shell storage and work with them, type the following command in the shell: 40 | 41 | ```azurecli 42 | code . 43 | ``` 44 | 45 | ## Create an Azure resource group and set as default 46 | 47 | To create a workspace with the CLI (v2), you need a resource group. You can create a new one with the CLI or use an existing resource group. Either way, make sure to set a resource group as the default to complete this exercise. 48 | 49 | > **Tip:** You can get a list of available locations with the `az account list-locations -o table` command. Use the **name** column for the location name. 50 | 51 | 1. Run the following command to create a resource group and use a location close to you: 52 | 53 | ```azurecli 54 | az group create --name "diabetes-dev-rg" --location "eastus" 55 | ``` 56 | 57 | 1. Set the resource group as the default to avoid having to specify it on every command going forward: 58 | 59 | ```azurecli 60 | az configure --defaults group="diabetes-dev-rg" 61 | ``` 62 | 63 | ## Create an Azure Machine Learning workspace and set as default 64 | 65 | As its name suggests, a workspace is a centralized place to manage all of the Azure ML assets you need to work on a machine learning project. 66 | 67 | 1. Create a workspace: 68 | 69 | ```azurecli 70 | az ml workspace create --name "aml-diabetes-dev" 71 | ``` 72 | 73 | 1. Set the workspace as the default: 74 | 75 | ```azurecli 76 | az configure --defaults workspace="aml-diabetes-dev" 77 | ``` 78 | 79 | 1. Check your work by signing in to the [Azure Machine Learning Studio](https://ml.azure.com). After you sign in, choose the *aml-diabetes-dev* workspace to open it. 80 | 
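If you prefer to verify your setup without leaving the shell, a quick check like the following should work — a sketch that only reads back what the commands above configured:

```azurecli
# list the defaults set with `az configure` (resource group and workspace)
az configure --list-defaults -o table

# show the workspace the CLI will now target by default
az ml workspace show --query "{name:name, location:location}" -o table
```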
81 | ## Create a Compute Instance 82 | 83 | To run a notebook, you'll need a compute instance. 84 | 85 | In this exercise, you'll create a compute instance with the following settings: 86 | 87 | - `--name`: *Name of compute instance. Has to be unique and fewer than 24 characters.* 88 | - `--size`: STANDARD_DS11_V2 89 | - `--type`: ComputeInstance 90 | - `--workspace-name`: *Will use the default workspace you've configured so you don't need to specify.* 91 | - `--resource-group`: *Will use the default resource group you've configured so you don't need to specify.* 92 | 93 | 1. Run the `az ml compute create` command with the settings listed above. Change the name to make it unique in your region. It should look something like this: 94 | 95 | ```azurecli 96 | az ml compute create --name "testdev-vm" --size STANDARD_DS11_V2 --type ComputeInstance 97 | ``` 98 | 99 | > **Note:** If a compute instance with the name "testdev-vm" already exists, change the name to make it unique within your Azure region, with a maximum of 24 characters. If you get an error because the name is not unique, delete the partially created compute instance with `az ml compute delete --name "compute-instance-name"`. 100 | 101 | 1. The command will take 2 to 5 minutes to complete. After that, switch to [Azure Machine Learning Studio](https://ml.azure.com), open the **Compute** tab and confirm that the instance has been created and is running. 102 | 103 | ## Create an environment 104 | 105 | To execute a Python script, you'll need to install any necessary libraries and packages. To automate the installation of packages, you can use an environment. 106 | 107 | To create an environment from a Docker image plus a Conda environment with the CLI (v2), you need two files: 108 | 109 | - The specification YAML file, including the environment name, version, and base Docker image. 110 | - The Conda environment file, including the libraries and packages you want installed. 111 | 112 | The necessary YAML files have already been created for you and are part of the **mslearn-aml-cli** repo you cloned in the Azure Cloud Shell. 113 | 114 | 1. To navigate to the YAML files, run the following command in the Cloud Shell: 115 | 116 | ```azurecli 117 | code . 118 | ``` 119 | 120 | 1. Navigate to the **mslearn-aml-cli/Allfiles/Labs/01** folder. 121 | 1. Select the **basic-env.yml** file to open it. Explore its contents, which describe how the environment should be created within the Azure ML workspace. 122 | 1. Select the **conda-envs/basic-env-cpu.yml** file to open it. Explore its contents, which list the libraries that need to be installed on the compute. 123 | 1. Run the following command to create the environment: 124 | 125 | ```azurecli 126 | az ml environment create --file ./mslearn-aml-cli/Allfiles/Labs/01/basic-env.yml 127 | ``` 128 | 129 | 1. Once the environment is created, a summary is shown in the prompt. You can also view the environment in the **Azure Machine Learning Studio** in the **Environments** tab, under *Custom environments*. 130 | 131 | ## Create a dataset 132 | 133 | To create a dataset in the workspace from a local CSV, you need two files: 134 | 135 | - The specification YAML file, including the dataset name, version, and local path of the CSV file. Navigate to **Allfiles/Labs/01/data-local-path.yml** to explore the contents of this file. 136 | - The CSV file containing data. In this exercise, you'll work with diabetes data. Navigate to **Allfiles/Labs/01/data/diabetes.csv** to explore the contents of this file. 137 | 138 | Before you create a dataset, you can explore the files by using the `code .` command in the Cloud Shell. 139 | 140 | 1. Run the following command to create a dataset from the configuration described in `data-local-path.yml`: 141 | 142 | ```azurecli 143 | az ml data create --file ./mslearn-aml-cli/Allfiles/Labs/01/data-local-path.yml 144 | ``` 145 | 146 | >**Note:** When you create a dataset from a local path, the workspace will automatically upload the dataset to the default datastore. In this case, it will be uploaded to the storage account which was created when you created the workspace. 147 | 148 | 2. Once the dataset is created, a summary is shown in the prompt. You can also view the data asset in the **Azure Machine Learning Studio** in the **Data** tab, under *Data assets*. 149 | 150 | 
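As a final check from the shell, you can also list the assets you registered in this lab — a sketch; both commands simply read from the default workspace you configured earlier:

```azurecli
# the environment created from basic-env.yml
az ml environment list -o table

# the data asset created from data-local-path.yml
az ml data list -o table
```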
## Clean up resources 151 | 152 | Once you've finished exploring Azure Machine Learning, shut down the compute instance to avoid unnecessary charges in your Azure subscription. 153 | 154 | You can stop a compute instance with the following command. Change `"testdev-vm"` to the name of your compute instance if necessary. 155 | 156 | ```azurecli 157 | az ml compute stop --name "testdev-vm" --no-wait 158 | ``` 159 | 160 | > **Note:** Stopping your compute ensures your subscription won't be charged for compute resources. You will, however, be charged a small amount for data storage as long as the Azure Machine Learning workspace exists in your subscription. If you have finished exploring Azure Machine Learning, you can delete the Azure Machine Learning workspace and associated resources. However, if you plan to complete any other labs in this series, you will need to repeat this lab to create the workspace and prepare the environment first. 161 | 162 | To delete the complete Azure Machine Learning workspace and all assets you created, you can use the following command in the CLI: 163 | 164 | ```azurecli 165 | az ml workspace delete 166 | ``` 167 | -------------------------------------------------------------------------------- /Instructions/Labs/02-run-python-job.md: -------------------------------------------------------------------------------- 1 | --- 2 | lab: 3 | title: 'Lab: Run a basic Python training job' 4 | module: 'Module: Run jobs in Azure Machine Learning with CLI (v2)' 5 | --- 6 | 7 | # Run a basic Python training job 8 | 9 | In this exercise, you will train a model with a Python script. The model training will be submitted with the CLI (v2). First, you'll train a model based on a local CSV dataset. Next, you'll train a model using a dataset registered in the Azure Machine Learning workspace. 10 | 11 | ## Prerequisites 12 | 13 | Before you continue, complete the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab to set up your cloud shell environment and your Azure Machine Learning environment. 14 | 15 | 1. Open the Cloud Shell by navigating to [http://shell.azure.com](https://shell.azure.com/?azure-portal=true) and signing in with your Microsoft account. 16 | 1. The repo [https://github.com/MicrosoftLearning/mslearn-aml-cli](https://github.com/MicrosoftLearning/mslearn-aml-cli) should be cloned. You can explore the repo and its contents by using the `code .` command in the Cloud Shell. 17 | 1. If your compute instance is stopped, start it again by using the following command. Change `<compute-instance-name>` to your compute instance name before running the code: 18 | 19 | ```azurecli 20 | az ml compute start --name "<compute-instance-name>" 21 | ``` 22 | 23 | 1. To confirm that the instance is now in a **Running** state, open another tab in your browser and navigate to the [Azure Machine Learning Studio](https://ml.azure.com). Open the **Compute** tab and select **Compute instances**. 24 | 
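Alternatively, you can check the instance state without leaving the shell — a sketch, assuming recent versions of the `ml` extension, which expose a top-level `state` property for compute instances (verify the property name on your version):

```azurecli
# should print "Running" once the instance is up
az ml compute show --name "<compute-instance-name>" --query state -o tsv
```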
25 | ## Train a model 26 | 27 | To track a machine learning workflow, you can run the training script using a **job**. The configuration of the job can be described in a YAML file. 28 | 29 | In this exercise, you'll train a Logistic Regression model. Explore the training script **main.py** by navigating to **mslearn-aml-cli/Allfiles/Labs/02/basic-job/src/main.py**. The dataset used is in the same folder and stored as **diabetes.csv**. 30 | 31 | 1. Run the following command in the Cloud Shell to open the files of the cloned repo: 32 | 33 | ```azurecli 34 | code . 35 | ``` 36 | 37 | 1. Navigate to **mslearn-aml-cli/Allfiles/Labs/02/basic-job** and open **basic-job.yml** by selecting the file. 38 | 1. Change the **compute** value by adding the name of your compute instance after `azureml:`, so that it reads like `compute: azureml:testdev-vm`. 39 | 1. Save the file by selecting the top right corner of the text editor and then selecting **Save**. 40 | 1. Run the job by using the following command: 41 | 42 | ```azurecli 43 | az ml job create --file ./mslearn-aml-cli/Allfiles/Labs/02/basic-job/basic-job.yml 44 | ``` 45 | 46 | 1. Return to your Azure Machine Learning Studio browser tab, go to the **Jobs** page and locate the **diabetes-example** experiment in the **All experiments** tab. 47 | 1. Open the run to monitor the job and refresh the view if necessary. Once completed, you can explore the details of the job, which are stored in the experiment run. 48 | 49 | ## Train a model with dataset from datastore 50 | 51 | In the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab, you created a dataset named **diabetes-data**. To check that the dataset exists within your workspace, you can navigate to the Azure Machine Learning Studio and select the **Data** item from the left menu and then the **Data assets** tab. 52 | 53 | Instead of storing a CSV file in the same folder as the training script, you can also train a model using a registered dataset as input. 54 | 55 | 1. Navigate to **mslearn-aml-cli/Allfiles/Labs/02/input-data-job** and open **data-job.yml** by selecting the file. 56 | 1. Change the **compute** value to the name of your compute instance and save the file. 57 | 58 | > **Note** that the command now runs the **main.py** script with the parameter **--diabetes-csv**. The input of that parameter is defined in the **inputs.diabetes** value. It takes version 1 of the **diabetes-data** dataset from the Azure ML workspace. 59 | 60 | 1. Use the following command to run the job: 61 | 62 | ```azurecli 63 | az ml job create --file ./mslearn-aml-cli/Allfiles/Labs/02/input-data-job/data-job.yml 64 | ``` 65 | 66 | 1. Go to the Azure Machine Learning Studio and locate the **diabetes-data-example** experiment. Open the run to monitor the job. Refresh the view if necessary. 67 | 1. Once completed, you can explore the details of the job, which are stored in the experiment run. Note that now, it lists the input dataset **diabetes-data**. 68 | 
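Either run can also be followed from the shell instead of the Studio. A sketch — `<job-name>` below is a hypothetical placeholder for the `name` value that `az ml job create` prints when it submits the job:

```azurecli
# stream the job's logs to the terminal until it finishes
az ml job stream --name <job-name>

# print the job's final status (for example, Completed)
az ml job show --name <job-name> --query status -o tsv
```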
69 | ## Clean up resources 70 | 71 | When you're finished exploring Azure Machine Learning, shut down the compute instance to avoid unnecessary charges in your Azure subscription. 72 | 73 | You can stop a compute instance with the following command. Change `"testdev-vm"` to the name of your compute instance if necessary. 74 | 75 | ```azurecli 76 | az ml compute stop --name "testdev-vm" --no-wait 77 | ``` 78 | 79 | > **Note:** Stopping your compute ensures your subscription won't be charged for compute resources. You will, however, be charged a small amount for data storage as long as the Azure Machine Learning workspace exists in your subscription. If you have finished exploring Azure Machine Learning, you can delete the Azure Machine Learning workspace and associated resources. However, if you plan to complete any other labs in this series, you will need to repeat the set-up to create the workspace and prepare the environment first. 80 | 81 | To delete the Azure Machine Learning workspace, you can use the following command in the CLI: 82 | 83 | ```azurecli 84 | az ml workspace delete 85 | ``` 86 | -------------------------------------------------------------------------------- /Instructions/Labs/03-run-sweep-job.md: -------------------------------------------------------------------------------- 1 | --- 2 | lab: 3 | title: 'Lab: Perform hyperparameter tuning with a sweep job' 4 | module: 'Module: Run jobs in Azure Machine Learning with CLI (v2)' 5 | --- 6 | 7 | # Run a sweep job to tune hyperparameters 8 | 9 | In this exercise, you will perform hyperparameter tuning when training a model with a Python script. The model training will be submitted with the CLI (v2). 10 | 11 | ## Prerequisites 12 | 13 | Before you continue, complete the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab to set up your Azure Machine Learning environment. 14 | 15 | You'll run all commands in this lab from the Azure Cloud Shell. If this is your first time using the cloud shell, complete the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab to set up the cloud shell environment. 16 | 17 | 1. Open the Cloud Shell by navigating to [http://shell.azure.com](https://shell.azure.com/?azure-portal=true) and signing in with your Microsoft account. 18 | 1. The repo [https://github.com/MicrosoftLearning/mslearn-aml-cli](https://github.com/MicrosoftLearning/mslearn-aml-cli) should be cloned. You can explore the repo and its contents by using the `code .` command in the Cloud Shell. 19 | 1. To train multiple models in parallel, you'll use a compute cluster. To create a compute cluster, use the following command: 20 | 21 | ```azurecli 22 | az ml compute create --name "aml-cluster" --size STANDARD_DS11_V2 --max-instances 2 --type AmlCompute 23 | ``` 24 | 25 | 1. To confirm that the cluster has been created, open another tab in your browser and navigate to the [Azure Machine Learning Studio](https://ml.azure.com). Open the **Compute** tab and select **Compute clusters**. You should see a cluster named **aml-cluster**. 26 | > **Note:** Creating a compute cluster with two maximum instances means you can train two models in parallel. If you want to train more models in parallel, increase the value of `--max-instances`. You can also change this after the cluster is created, as shown in the sketch below. 27 | 
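For instance, scaling the existing cluster out to allow more parallel trials might look like this — a sketch that reuses the cluster name from this lab; the new maximum applies to jobs submitted afterwards:

```azurecli
az ml compute update --name "aml-cluster" --max-instances 4
```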
Recall that each individual model will be listed as a child run, and the overview and details of the sweep job will be stored with the main experiment run. 40 | 41 | To run the sweep job: 42 | 43 | 1. Run the following command in the Cloud Shell to open the files of the cloned repo, if they're not already open: 44 | 45 | ```azurecli 46 | code . 47 | ``` 48 | 49 | 1. Navigate to **mslearn-aml-cli/Allfiles/Labs/02/sweep-job** and open **sweep-job.yml** by selecting the file. 50 | 51 | 1. Run the job by using the following command: 52 | 53 | ```azurecli 54 | az ml job create --file ./mslearn-aml-cli/Allfiles/Labs/02/sweep-job/sweep-job.yml 55 | ``` 56 | 57 | 1. Switch to the browser tab with Azure Machine Learning Studio. Go to the **Jobs** page and select the **diabetes-sweep-example** experiment. 58 | 1. Monitor the job and refresh the view if necessary. Once completed, you can explore the details of the job, which are stored in the experiment run. 59 | 60 | ## Clean up resources 61 | 62 | The compute cluster will automatically scale down to 0 nodes, so there is no need to stop the cluster. 63 | 64 | > **Note:** Stopping your compute ensures your subscription won't be charged for compute resources. You will, however, be charged a small amount for data storage as long as the Azure Machine Learning workspace exists in your subscription. If you have finished exploring Azure Machine Learning, you can delete the Azure Machine Learning workspace and associated resources. However, if you plan to complete any other labs in this series, you will need to repeat the set-up to create the workspace and prepare the environment first. 65 | 66 | To delete the Azure Machine Learning workspace, you can use the following command in the CLI: 67 | 68 | ```azurecli 69 | az ml workspace delete 70 | ``` 71 | -------------------------------------------------------------------------------- /Instructions/Labs/04-use-mlflow-jobs.md: -------------------------------------------------------------------------------- 1 | --- 2 | lab: 3 | title: 'Lab: Track Azure ML jobs with MLflow' 4 | module: 'Module: Use MLflow with Azure ML jobs submitted with CLI (v2)' 5 | --- 6 | 7 | # Track Azure ML jobs with MLflow 8 | 9 | In this exercise, you will train a model with a Python script. The Python script uses **MLflow** to track parameters, metrics, and artifacts. 10 | 11 | ## Prerequisites 12 | 13 | Before you continue, complete the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab to set up your Azure Machine Learning environment. 14 | 15 | You'll run all commands in this lab from the Azure Cloud Shell. If this is your first time using the cloud shell, complete the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab to set up the cloud shell environment. 16 | 17 | 1. Open the Cloud Shell by navigating to [http://shell.azure.com](https://shell.azure.com/?azure-portal=true) and signing in with your Microsoft account. 18 | 1. The repo [https://github.com/MicrosoftLearning/mslearn-aml-cli](https://github.com/MicrosoftLearning/mslearn-aml-cli) should be cloned. You can explore the repo and its contents by using the `code .` command in the Cloud Shell. 19 | 1. If your compute instance is stopped, start it again by using the following command. Change `<your-compute-instance-name>` to the name of your compute instance before running the code: 20 | 21 | ```azurecli 22 | az ml compute start --name "<your-compute-instance-name>" 23 | ``` 24 | 25 | 1. 
To confirm that the instance is now in a **Running** state, open another tab in your browser and navigate to the [Azure Machine Learning Studio](https://ml.azure.com). Open the **Compute** tab and select **Compute instances**. 26 | 27 | ## Track a model 28 | 29 | You'll train a Logistic Regression model to classify whether someone has diabetes. You can track input parameters, such as the regularization rate used to train the model. You'll also want to know the accuracy of the model and store a confusion matrix to explain that accuracy. 30 | 31 | To track the input and output of a model, we can: 32 | 33 | - Enable autologging using `mlflow.autolog()` 34 | - Use logging functions to track custom metrics using `mlflow.log_*` 35 | 36 | Note that to do either, we have to include the `mlflow` and `azureml-mlflow` packages in the environment used during training. The registered environment **basic-env-scikit** includes these two packages. 37 | 38 | ### Enable autologging 39 | 40 | You'll submit a job from the Azure Cloud Shell with the CLI (v2), using a Python script. 41 | 42 | 1. Run the following command in the Cloud Shell to open the files of the cloned repo: 43 | 44 | ```azurecli 45 | code . 46 | ``` 47 | 48 | 1. Navigate to **mslearn-aml-cli/Allfiles/Labs/03/mlflow-job** and open **mlflow-job.yml** by selecting the file. 49 | 1. Change the **compute** value by replacing `<your-compute-instance-name>` with the name of your compute instance. 50 | 1. Save the file by selecting the top right corner of the text editor and then selecting **Save**. 51 | 1. Note that you'll run the **mlflow-autolog.py** script that is located in the **src** folder. Navigate to that folder and open the file to explore it. Find the `mlflow.autolog()` method. 52 | 1. Run the job by using the following command: 53 | 54 | ```azurecli 55 | az ml job create --file ./mslearn-aml-cli/Allfiles/Labs/03/mlflow-job/mlflow-job.yml 56 | ``` 57 | 58 | 1. Switch to your Azure Machine Learning Studio browser tab. Go to the **Jobs** page, open the **All experiments** tab, and select the **diabetes-mlflow-example** experiment. 59 | 1. Monitor the job and refresh the view if necessary. Once completed, you can explore the details of the job, which are stored in the experiment run. 60 | 61 | ### Use logging functions to track custom metrics 62 | 63 | Instead of using the autologging feature of MLflow, you can also create and track your own parameters, metrics, and artifacts. For this, we'll use another training script. 64 | 65 | 1. Navigate to **mslearn-aml-cli/Allfiles/Labs/03/mlflow-job** and open **mlflow-job.yml** by selecting the file. 66 | 1. Now, you want to run the **custom-mlflow.py** script that is located in the **src** folder. In the **mlflow-job.yml** file, replace the **mlflow-autolog.py** file name with **custom-mlflow.py**. Don't forget to save the YAML file! 67 | 1. To explore the training script, navigate to the **src** folder and open the file. Find the `mlflow.log_param()`, `mlflow.log_metric()`, and `mlflow.log_artifact()` methods. 68 | 1. Run the job by using the following command: 69 | 70 | ```azurecli 71 | az ml job create --file ./mslearn-aml-cli/Allfiles/Labs/03/mlflow-job/mlflow-job.yml 72 | ``` 73 | 74 | 1. Go to the Azure Machine Learning Studio and locate the **diabetes-mlflow-example** experiment again. Open the newest run to monitor the job. Once completed, you'll find the regularization rate in the **Overview** tab under **Parameters**.
The accuracy score is listed under **Metrics**, and the confusion matrix can be found under **Images**. 75 | 76 | ## Clean up resources 77 | 78 | When you're finished exploring Azure Machine Learning, shut down the compute instance to avoid unnecessary charges in your Azure subscription. 79 | 80 | You can stop a compute instance with the following command. Change `"testdev-vm"` to the name of your compute instance if necessary. 81 | 82 | ```azurecli 83 | az ml compute stop --name "testdev-vm" --no-wait 84 | ``` 85 | 86 | > **Note:** Stopping your compute ensures your subscription won't be charged for compute resources. You will, however, be charged a small amount for data storage as long as the Azure Machine Learning workspace exists in your subscription. If you have finished exploring Azure Machine Learning, you can delete the Azure Machine Learning workspace and associated resources. However, if you plan to complete any other labs in this series, you will need to repeat the set-up to create the workspace and prepare the environment first. 87 | 88 | To delete the Azure Machine Learning workspace, you can use the following command in the CLI: 89 | 90 | ```azurecli 91 | az ml workspace delete 92 | ``` 93 | -------------------------------------------------------------------------------- /Instructions/Labs/05-deploy-managed-endpoint.md: -------------------------------------------------------------------------------- 1 | --- 2 | lab: 3 | title: 'Lab: Deploy an MLflow model to a managed online endpoint' 4 | module: 'Module: Deploy an Azure Machine Learning model to a managed endpoint with CLI (v2)' 5 | --- 6 | 7 | # Deploy a model to a managed online endpoint 8 | 9 | In this exercise, you will deploy an MLflow model to a managed online endpoint. 10 | 11 | ## Prerequisites 12 | 13 | Before you continue, complete the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab to set up your Azure Machine Learning environment. 14 | 15 | ## Deploy a model 16 | 17 | A model has been trained to predict whether someone has diabetes. To consume the model, you want to deploy it to a managed online endpoint. The endpoint can be called from an application where a patient's information can be entered, after which the model can decide whether the patient is likely to have diabetes. 18 | 19 | To deploy the model using the CLI (v2), you first create an endpoint. 20 | 21 | 1. Run the following command in the Cloud Shell to open the files of the cloned repo: 22 | 23 | ```azurecli 24 | code . 25 | ``` 26 | 27 | 1. Navigate to **mslearn-aml-cli/Allfiles/Labs/04/mlflow-endpoint** and open **create-endpoint.yml** by selecting the file. 28 | 1. Explore the contents of the file. Note that your endpoint will use key-based authentication. 29 | 1. Use the following command to create the new endpoint. Before you run the command, replace `<endpoint-name>` with a name that is unique in the Azure region: 30 | 31 | ```azurecli 32 | az ml online-endpoint create --name <endpoint-name> -f ./mslearn-aml-cli/Allfiles/Labs/04/mlflow-endpoint/create-endpoint.yml 33 | ``` 34 | 35 | 1. Next, you'll create the deployment. In the same folder **mslearn-aml-cli/Allfiles/Labs/04/mlflow-endpoint**, find and open the YAML configuration file **mlflow-deployment.yml** for the deployment. 36 | 1. The deployment configuration refers to the endpoint configuration. In addition, it specifies how the model should be registered, what kind of compute should be used for inferencing, and where it can find the model assets; a hypothetical sketch of such a file is shown below.
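A managed online deployment for an MLflow model generally follows this shape. The sketch is illustrative rather than the repo's actual file: the model name, instance type, and instance count are assumptions, and **mlflow-deployment.yml** may differ.

```yml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: mlflow-deployment
endpoint_name: <endpoint-name>   # must match the endpoint created earlier
model:
  name: mlflow-diabetes-model    # assumed registration name
  path: model                    # local folder holding the MLflow model assets
  type: mlflow_model
instance_type: Standard_F2s_v2   # assumed compute SKU for inferencing
instance_count: 1
```

Because the model is deployed as an MLflow model (`type: mlflow_model`), Azure Machine Learning can generate the scoring script and environment for you, which is why the sketch doesn't specify any scoring code.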
The MLflow model assets are stored in the **model** folder. 37 | 1. To deploy the model, run the following command. Before you run the command, replace `<endpoint-name>` with the name you previously created and choose a new `<deployment-name>`: 38 | 39 | ```azurecli 40 | az ml online-deployment create --name <deployment-name> --endpoint <endpoint-name> -f ./mslearn-aml-cli/Allfiles/Labs/04/mlflow-endpoint/mlflow-deployment.yml --all-traffic 41 | ``` 42 | 43 | 1. Deployment may take some time, and progress will be visible in the Azure Cloud Shell. You can also view the endpoint in the Azure Machine Learning Studio, in the **Endpoints** tab, under **Real-time endpoints**. 44 | 45 | ## Test the endpoint 46 | 47 | Once deployment is completed, you can test and consume the endpoint. Let's try testing it with two data points. 48 | 49 | 1. In the **mslearn-aml-cli/Allfiles/Labs/04/mlflow-endpoint** folder, you can find **sample-data.json**, which contains two data points. 50 | 1. Run the following command to invoke the endpoint to predict, for these two patients, whether they have diabetes. Replace `<endpoint-name>` with the name you previously created before you run the command: 51 | 52 | ```azurecli 53 | az ml online-endpoint invoke --name <endpoint-name> --request-file ./mslearn-aml-cli/Allfiles/Labs/04/mlflow-endpoint/sample-data.json 54 | ``` 55 | 56 | 1. As a result, you will see either a 1 or a 0 for each data point. A 1 means the patient is likely to have diabetes; a 0 means the patient is likely not to have diabetes. 57 | 1. Feel free to play around with the sample data and run the command again to see different results! 58 | 59 | ## Clean up resources 60 | 61 | When you're finished exploring Azure Machine Learning, delete your endpoint to avoid unnecessary charges in your Azure subscription. 62 | 63 | You can delete an endpoint and all underlying deployments by using the following command. Remember to replace `<endpoint-name>` with the name you previously created before you run the command: 64 | 65 | ```azurecli 66 | az ml online-endpoint delete --name <endpoint-name> --yes --no-wait 67 | ``` 68 | 69 | To delete the Azure Machine Learning workspace, you can use the following command in the CLI: 70 | 71 | ```azurecli 72 | az ml workspace delete 73 | ``` 74 | -------------------------------------------------------------------------------- /Instructions/Labs/06-create-pipeline.md: -------------------------------------------------------------------------------- 1 | --- 2 | lab: 3 | title: 'Lab: Run a pipeline with components' 4 | module: 'Module: Run component-based pipelines in Azure Machine Learning with CLI (v2)' 5 | --- 6 | 7 | # Run a pipeline with components 8 | 9 | In this exercise, you will build a pipeline with components. The pipeline will be submitted with the CLI (v2). First, you'll run a pipeline. Next, you'll create components in the Azure Machine Learning workspace so that they can be reused. Finally, you'll create a pipeline with the Designer in the Azure Machine Learning Studio to experience how you can reuse components to create new pipelines. 10 | 11 | ## Prerequisites 12 | 13 | Before you continue, complete the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab to set up your Azure Machine Learning environment. 14 | 15 | You'll run all commands in this lab from the Azure Cloud Shell. 16 | 17 | 1. Open the Cloud Shell by navigating to [http://shell.azure.com](https://shell.azure.com/?azure-portal=true) and signing in with your Microsoft account. 18 | 1. 
The repo [https://github.com/MicrosoftLearning/mslearn-aml-cli](https://github.com/MicrosoftLearning/mslearn-aml-cli) should be cloned. You can explore the repo and its contents by using the `code .` command in the Cloud Shell. 19 | 1. If your compute instance is stopped, start it again by using the following command. Change `<your-compute-instance-name>` to the name of your compute instance before running the code: 20 | ```azurecli 21 | az ml compute start --name "<your-compute-instance-name>" 22 | ``` 23 | 24 | ## Run a pipeline 25 | 26 | You can train a model by running a job that refers to one training script. To train a model as part of a pipeline, you can use Azure Machine Learning to run multiple scripts. The configuration of the pipeline is defined in a YAML file. 27 | 28 | In this exercise, you'll start by preprocessing the data and training a Decision Tree model. To explore the pipeline job definition **job.yml**, navigate to **mslearn-aml-cli/Allfiles/Labs/05/job.yml**. The dataset used is the **diabetes-data** dataset registered to the Azure Machine Learning workspace in the set-up. 29 | 30 | 1. Run the following command in the Cloud Shell to open the files of the cloned repo: 31 | 32 | ```azurecli 33 | code . 34 | ``` 35 | 36 | 2. Navigate to **mslearn-aml-cli/Allfiles/Labs/05/** and open **job.yml** by selecting the file. 37 | 3. Change the **compute** value: replace `<your-compute-instance-name>` with the name of your compute instance. 38 | 4. Run the job by using the following command: 39 | 40 | ```azurecli 41 | az ml job create --file ./mslearn-aml-cli/Allfiles/Labs/05/job.yml 42 | ``` 43 | 44 | 5. Open another tab in your browser and open the Azure Machine Learning Studio. Go to the **Jobs** page and locate the **diabetes-pipeline-example** experiment. Open the run to monitor the job. Refresh the view if necessary. Once completed, you can explore the details of the job and of each component by expanding the **Child runs**. 45 | 46 | ## Create components 47 | 48 | To reuse the pipeline's components, you can create the components in the Azure Machine Learning workspace. In addition to the components that were part of the pipeline you've just run, you'll create a new component you haven't used before. You'll use the new component in the next part. 49 | 50 | 1. Each component is created separately. Run the following code to create the components: 51 | 52 | ```azurecli 53 | az ml component create --file ./mslearn-aml-cli/Allfiles/Labs/05/summary-stats.yml 54 | az ml component create --file ./mslearn-aml-cli/Allfiles/Labs/05/fix-missing-data.yml 55 | az ml component create --file ./mslearn-aml-cli/Allfiles/Labs/05/normalize-data.yml 56 | az ml component create --file ./mslearn-aml-cli/Allfiles/Labs/05/train-decision-tree.yml 57 | az ml component create --file ./mslearn-aml-cli/Allfiles/Labs/05/train-logistic-regression.yml 58 | ``` 59 | 60 | 2. Navigate to the **Components** page in the Azure Machine Learning Studio. All created components should appear in the list there. 61 | 62 | ## Create a new pipeline with the Designer 63 | 64 | You can reuse the components by creating a pipeline with the Designer. You can recreate the same pipeline, or change the training algorithm by replacing the component used to train the model. 65 | 66 | 1. Navigate to the **Designer** page in the Azure Machine Learning Studio. 67 | 2. Select the **Custom** tab at the top of the page. 68 | 3. Create a new empty pipeline using custom components. 69 | 4. Rename the pipeline to *Train-Diabetes-Classifier*. 70 | 5. In the left menu, select the **Data** tab.
71 | 6. Drag and drop the **diabetes-data** component onto the canvas. 72 | 7. In the left menu, select the **Component** tab. 73 | 8. Drag and drop the **Remove Empty Rows** component onto the canvas, below **diabetes-data**. Connect the output of the data to the input of the new component. 74 | 9. Drag and drop the **Normalize numerical columns** component onto the canvas, below **Remove empty rows**. Connect the output of the previous component to the input of the new component. 75 | 10. Drag and drop the **Train a Decision Tree Classifier Model** component onto the canvas, below **Normalize numerical columns**. Connect the output of the previous component to the input of the new component. Your pipeline should look like this: 76 | ![Decision Tree Pipeline in Designer](media/designer-pipeline-decision.png) 77 | 78 | 11. Select **Configure & Submit** to set up the pipeline job. 79 | 12. On the **Basics** page, create a new experiment, name it *diabetes-designer-pipeline*, and select **Next**. 80 | 13. On the **Inputs & outputs** page, select **Next**. 81 | 14. On the **Runtime settings** page, set the default compute target to the compute instance you created and select **Next**. 82 | 15. On the **Review + Submit** page, select **Submit** and wait for the job to complete. 83 | 84 | ## Update the pipeline with the Designer 85 | 86 | You have now trained the model with a pipeline similar to the one before (only omitting the calculation of the summary statistics). You can change the algorithm you use to train the model by replacing the last component: 87 | 88 | 1. Remove the **Train a Decision Tree Classifier Model** component from the pipeline. 89 | 2. Drag and drop the **Train a Logistic Regression Classifier Model** component onto the canvas, below **Normalize numerical columns**. Connect the output of the previous component to the input of the new component. 90 | 91 | The new model training component expects a numeric input, namely the regularization rate. 92 | 93 | 3. Select the **Train a Logistic Regression Model** component and enter **1** for the **regularization_rate**. Your pipeline should look like this: 94 | ![Logistic Regression Pipeline in Designer](media/designer-pipeline-regression.png) 95 | 4. Submit the pipeline. Select the existing experiment named *diabetes-designer-pipeline*. Once completed, you can review the metrics and compare them with those of the previous pipeline to see whether the model's performance has improved. 96 | 97 | ## Clean up resources 98 | 99 | When you're finished exploring Azure Machine Learning, shut down the compute instance to avoid unnecessary charges in your Azure subscription. 100 | 101 | You can stop a compute instance with the following command. Change `"testdev-vm"` to the name of your compute instance if necessary. 102 | 103 | ```azurecli 104 | az ml compute stop --name "testdev-vm" --no-wait 105 | ``` 106 | 107 | > **Note:** Stopping your compute ensures your subscription won't be charged for compute resources. You will, however, be charged a small amount for data storage as long as the Azure Machine Learning workspace exists in your subscription. If you have finished exploring Azure Machine Learning, you can delete the Azure Machine Learning workspace and associated resources. However, if you plan to complete any other labs in this series, you will need to repeat the set-up to create the workspace and prepare the environment first.
108 | 109 | To completely delete the Azure Machine Learning workspace, you can use the following command in the CLI: 110 | 111 | ```azurecli 112 | az ml workspace delete 113 | ``` 114 | -------------------------------------------------------------------------------- /Instructions/Labs/media/designer-pipeline-decision.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MicrosoftLearning/mslearn-aml-cli/191e8804f8005b2cad8eb4230ffb7cf5ada14068/Instructions/Labs/media/designer-pipeline-decision.png -------------------------------------------------------------------------------- /Instructions/Labs/media/designer-pipeline-regression.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MicrosoftLearning/mslearn-aml-cli/191e8804f8005b2cad8eb4230ffb7cf5ada14068/Instructions/Labs/media/designer-pipeline-regression.png -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Sidney Andrews 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /_build.yml: -------------------------------------------------------------------------------- 1 | name: '$(Date:yyyyMMdd)$(Rev:.rr)' 2 | jobs: 3 | - job: build_markdown_content 4 | displayName: 'Build Markdown Content' 5 | workspace: 6 | clean: all 7 | pool: 8 | vmImage: 'Ubuntu 16.04' 9 | container: 10 | image: 'microsoftlearning/markdown-build:latest' 11 | steps: 12 | - task: Bash@3 13 | displayName: 'Build Content' 14 | inputs: 15 | targetType: inline 16 | script: | 17 | cp /{attribution.md,template.docx,package.json,package.js} . 
18 | npm install 19 | node package.js --version $(Build.BuildNumber) 20 | - task: GitHubRelease@0 21 | displayName: 'Create GitHub Release' 22 | inputs: 23 | gitHubConnection: 'github-microsoftlearning-organization' 24 | repositoryName: '$(Build.Repository.Name)' 25 | tagSource: manual 26 | tag: 'v$(Build.BuildNumber)' 27 | title: 'Version $(Build.BuildNumber)' 28 | releaseNotesSource: input 29 | releaseNotes: '# Version $(Build.BuildNumber) Release' 30 | assets: '$(Build.SourcesDirectory)/out/*.zip' 31 | assetUploadMode: replace 32 | - task: PublishBuildArtifacts@1 33 | displayName: 'Publish Output Files' 34 | inputs: 35 | pathtoPublish: '$(Build.SourcesDirectory)/out/' 36 | artifactName: 'Lab Files' 37 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | remote_theme: MicrosoftLearning/Jekyll-Theme 2 | exclude: 3 | - readme.md 4 | - .github/ 5 | header_pages: 6 | - index.html 7 | author: Microsoft Learning 8 | twitter_username: mslearning 9 | github_username: MicrosoftLearning 10 | plugins: 11 | - jekyll-sitemap 12 | - jekyll-mentions 13 | - jemoji 14 | markdown: kramdown 15 | kramdown: 16 | syntax_highlighter_opts: 17 | disable : true 18 | -------------------------------------------------------------------------------- /index.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Online Hosted Instructions 3 | permalink: index.html 4 | layout: home 5 | --- 6 | 7 | # Content Directory 8 | 9 | Hyperlinks to each of the lab exercises for the Learn modules are listed below. 10 | 11 | ## Labs 12 | 13 | {% assign labs = site.pages | where_exp:"page", "page.url contains '/Instructions/Labs'" %} 14 | | Module | Lab | 15 | | --- | --- | 16 | {% for activity in labs %}| {{ activity.lab.module }} | [{{ activity.lab.title }}{% if activity.lab.type %} - {{ activity.lab.type }}{% endif %}]({{ site.github.url }}{{ activity.url }}) | 17 | {% endfor %} 18 | -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | # Train models in Azure Machine Learning with the CLI (v2) 2 | 3 | This repository contains the hands-on lab exercises for the Microsoft Learning Path Train models in Azure Machine Learning with the CLI (v2). The Learning Path consists of self-paced modules on Microsoft Learn. The labs are designed to accompany the learning materials and enable you to practice using the technologies described in them. 4 | 5 | You can view the instructions for the lab exercises at https://aka.ms/aml-cli2. 6 | 7 | ## What are we doing? 8 | 9 | - To support this course, we will need to make frequent updates to the course content to keep it current with the Azure services used in the course. We are publishing the lab instructions and lab files on GitHub to allow for open contributions between the course authors and MCTs to keep the content current with changes in the Azure platform. 10 | 11 | - We hope that this brings a sense of collaboration to the labs like we've never had before - when Azure changes and you find it first during a live delivery, go ahead and make an enhancement right in the lab source. 12 | 13 | ## How do I contribute? 14 | 15 | - Anyone can submit a pull request to the code or content in the GitHub repo; Microsoft and the course author will triage and include content and lab code changes as needed. 
16 | 17 | - You can submit bugs, changes, improvements, and ideas. Found a new Azure feature before we have? Submit a new demo! 18 | --------------------------------------------------------------------------------