├── .github ├── CONTRIBUTING.md ├── ISSUE_TEMPLATE.md └── PULL_REQUEST_TEMPLATE.md ├── .vscode └── settings.json ├── Allfiles └── Labs │ ├── 01 │ ├── basic-env.yml │ ├── conda-envs │ │ └── basic-env-cpu.yml │ ├── data-local-path.yml │ └── data │ │ └── diabetes.csv │ ├── 02 │ ├── basic-job │ │ ├── basic-job.yml │ │ └── src │ │ │ ├── diabetes.csv │ │ │ └── main.py │ ├── input-data-job │ │ ├── data-job.yml │ │ └── src │ │ │ └── main.py │ └── sweep-job │ │ ├── src │ │ ├── diabetes.csv │ │ └── main.py │ │ └── sweep-job.yml │ ├── 03 │ └── mlflow-job │ │ ├── mlflow-job.yml │ │ └── src │ │ ├── custom-mlflow.py │ │ └── mlflow-autolog.py │ ├── 04 │ └── mlflow-endpoint │ │ ├── create-endpoint.yml │ │ ├── mlflow-deployment.yml │ │ ├── model │ │ ├── MLmodel │ │ ├── conda.yaml │ │ ├── model.pkl │ │ └── requirements.txt │ │ └── sample-data.json │ └── 05 │ ├── fix-missing-data.yml │ ├── job.yml │ ├── normalize-data.yml │ ├── src │ ├── fix-missing-data.py │ ├── normalize-data.py │ ├── summary-stats.py │ ├── train-decision-tree.py │ └── train-logistic-regression.py │ ├── summary-stats.yml │ ├── train-decision-tree.yml │ └── train-logistic-regression.yml ├── Instructions └── Labs │ ├── 01-create-workspace.md │ ├── 02-run-python-job.md │ ├── 03-run-sweep-job.md │ ├── 04-use-mlflow-jobs.md │ ├── 05-deploy-managed-endpoint.md │ ├── 06-create-pipeline.md │ └── media │ ├── designer-pipeline-decision.png │ └── designer-pipeline-regression.png ├── LICENSE ├── _build.yml ├── _config.yml ├── index.md └── readme.md /.github/CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to Microsoft Learning Repositories 2 | 3 | MCT contributions are a key part of keeping the lab and demo content current as the Azure platform changes. We want to make it as easy as possible for you to contribute changes to the lab files. Here are a few guidelines to keep in mind as you contribute changes. 4 | 5 | ## GitHub Use & Purpose 6 | 7 | Microsoft Learning is using GitHub to publish the lab steps and lab scripts for courses that cover cloud services like Azure. Using GitHub allows the course’s authors and MCTs to keep the lab content current with Azure platform changes. Using GitHub allows the MCTs to provide feedback and suggestions for lab changes, and then the course authors can update lab steps and scripts quickly and relatively easily. 8 | 9 | > When you prepare to teach these courses, you should ensure that you are using the latest lab steps and scripts by downloading the appropriate files from GitHub. GitHub should not be used to discuss technical content in the course, or how to prep. It should only be used to address changes in the labs. 10 | 11 | It is strongly recommended that MCTs and Partners access these materials and in turn, provide them separately to students. Pointing students directly to GitHub to access Lab steps as part of an ongoing class will require them to access yet another UI as part of the course, contributing to a confusing experience for the student. An explanation to the student regarding why they are receiving separate Lab instructions can highlight the nature of an always-changing cloud-based interface and platform. Microsoft Learning support for accessing files on GitHub and support for navigation of the GitHub site is limited to MCTs teaching this course only. 12 | 13 | > As an alternative to pointing students directly to the GitHub repository, you can point students to the GitHub Pages website to view the lab instructions. 
The URL for the GitHub Pages website can be found at the top of the repository. 14 | 15 | To address general comments about the course and demos, or how to prepare for a course delivery, please use the existing MCT forums. 16 | 17 | ## Additional Resources 18 | 19 | A user guide has been provided for MCTs who are new to GitHub. It provides steps for connecting to GitHub, downloading and printing course materials, updating the scripts that students use in labs, and explaining how you can help ensure that this course’s content remains current. 20 | 21 | 22 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | # Module: 00 2 | ## Lab/Demo: 00 3 | ### Task: 00 4 | #### Step: 00 5 | 6 | Description of issue 7 | 8 | Repro steps: 9 | 10 | 1. 11 | 1. 12 | 1. -------------------------------------------------------------------------------- /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | # Module: 00 2 | ## Lab/Demo: 00 3 | 4 | Fixes # . 5 | 6 | Changes proposed in this pull request: 7 | 8 | - 9 | - 10 | - -------------------------------------------------------------------------------- /.vscode/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "markdownlint.config": { 3 | "MD028": false, 4 | "MD025": { 5 | "front_matter_title": "" 6 | } 7 | } 8 | } -------------------------------------------------------------------------------- /Allfiles/Labs/01/basic-env.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json 2 | name: basic-env-scikit 3 | image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04 4 | conda_file: conda-envs/basic-env-cpu.yml 5 | description: Environment created from a Docker image plus Conda environment. -------------------------------------------------------------------------------- /Allfiles/Labs/01/conda-envs/basic-env-cpu.yml: -------------------------------------------------------------------------------- 1 | name: basic-env-cpu 2 | channels: 3 | - conda-forge 4 | dependencies: 5 | - python=3.7 6 | - scikit-learn 7 | - pandas 8 | - numpy 9 | - matplotlib 10 | - pip 11 | - pip: 12 | - azureml-defaults 13 | - azureml-core 14 | - azureml-mlflow 15 | - mlflow 16 | -------------------------------------------------------------------------------- /Allfiles/Labs/01/data-local-path.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/data.schema.json 2 | name: diabetes-data 3 | version: 1 4 | path: data 5 | description: Dataset pointing to diabetes data stored as CSV on local computer. Data is uploaded to default datastore. 6 | -------------------------------------------------------------------------------- /Allfiles/Labs/02/basic-job/basic-job.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json 2 | code: src 3 | command: >- 4 | python main.py 5 | environment: azureml:basic-env-scikit@latest 6 | compute: azureml: 7 | experiment_name: diabetes-example 8 | description: Train a Logistic Regression classification model on the diabetes dataset that is stored locally. 
-------------------------------------------------------------------------------- /Allfiles/Labs/02/basic-job/src/main.py: -------------------------------------------------------------------------------- 1 | # Import libraries 2 | import mlflow 3 | import argparse 4 | 5 | import pandas as pd 6 | from sklearn.model_selection import train_test_split 7 | from sklearn.linear_model import LogisticRegression 8 | 9 | # define functions 10 | def main(args): 11 | # enable auto logging 12 | mlflow.autolog() 13 | 14 | # read data 15 | df = pd.read_csv('diabetes.csv') 16 | 17 | # process data 18 | X_train, X_test, y_train, y_test = process_data(df) 19 | 20 | # train model 21 | model = train_model(args.reg_rate, X_train, X_test, y_train, y_test) 22 | 23 | def process_data(df): 24 | # split dataframe into X and y 25 | X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values 26 | 27 | # train/test split 28 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0) 29 | 30 | # return splits and encoder 31 | return X_train, X_test, y_train, y_test 32 | 33 | def train_model(reg_rate, X_train, X_test, y_train, y_test): 34 | # train model 35 | model = LogisticRegression(C=1/reg_rate, solver="liblinear").fit(X_train, y_train) 36 | 37 | # return model 38 | return model 39 | 40 | def parse_args(): 41 | # setup arg parser 42 | parser = argparse.ArgumentParser() 43 | 44 | # add arguments 45 | parser.add_argument("--reg-rate", dest="reg_rate", type=float, default=0.01) 46 | 47 | # parse args 48 | args = parser.parse_args() 49 | 50 | # return args 51 | return args 52 | 53 | # run script 54 | if __name__ == "__main__": 55 | # add space in logs 56 | print("\n\n") 57 | print("*" * 60) 58 | 59 | # parse args 60 | args = parse_args() 61 | 62 | # run main function 63 | main(args) 64 | 65 | # add space in logs 66 | print("*" * 60) 67 | print("\n\n") -------------------------------------------------------------------------------- /Allfiles/Labs/02/input-data-job/data-job.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json 2 | code: src 3 | command: >- 4 | python main.py 5 | --diabetes-csv ${{inputs.diabetes}} 6 | inputs: 7 | diabetes: 8 | path: azureml:diabetes-data:1 9 | mode: ro_mount 10 | environment: azureml:basic-env-scikit@latest 11 | compute: azureml: 12 | experiment_name: diabetes-data-example 13 | description: Train a classification model on diabetes data using a registered dataset as input. 
-------------------------------------------------------------------------------- /Allfiles/Labs/02/input-data-job/src/main.py: -------------------------------------------------------------------------------- 1 | # Import libraries 2 | import mlflow 3 | import argparse 4 | import glob 5 | 6 | import pandas as pd 7 | from sklearn.model_selection import train_test_split 8 | from sklearn.linear_model import LogisticRegression 9 | 10 | # define functions 11 | def main(args): 12 | # enable auto logging 13 | mlflow.autolog() 14 | 15 | # read data 16 | data_path = args.diabetes_csv 17 | all_files = glob.glob(data_path + "/*.csv") 18 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 19 | 20 | # process data 21 | X_train, X_test, y_train, y_test = process_data(df) 22 | 23 | # train model 24 | model = train_model(args.reg_rate, X_train, X_test, y_train, y_test) 25 | 26 | def process_data(df): 27 | # split dataframe into X and y 28 | X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values 29 | 30 | # train/test split 31 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0) 32 | 33 | # return splits and encoder 34 | return X_train, X_test, y_train, y_test 35 | 36 | def train_model(reg_rate, X_train, X_test, y_train, y_test): 37 | # train model 38 | model = LogisticRegression(C=1/reg_rate, solver="liblinear").fit(X_train, y_train) 39 | 40 | # return model 41 | return model 42 | 43 | def parse_args(): 44 | # setup arg parser 45 | parser = argparse.ArgumentParser() 46 | 47 | # add arguments 48 | parser.add_argument("--diabetes-csv", dest='diabetes_csv', type=str) 49 | parser.add_argument("--reg-rate", dest='reg_rate', type=float, default=0.01) 50 | 51 | # parse args 52 | args = parser.parse_args() 53 | 54 | # return args 55 | return args 56 | 57 | # run script 58 | if __name__ == "__main__": 59 | # add space in logs 60 | print("\n\n") 61 | print("*" * 60) 62 | 63 | # parse args 64 | args = parse_args() 65 | 66 | # run main function 67 | main(args) 68 | 69 | # add space in logs 70 | print("*" * 60) 71 | print("\n\n") -------------------------------------------------------------------------------- /Allfiles/Labs/02/sweep-job/src/main.py: -------------------------------------------------------------------------------- 1 | # Import libraries 2 | import mlflow 3 | import argparse 4 | import glob 5 | 6 | import pandas as pd 7 | from sklearn.model_selection import train_test_split 8 | from sklearn.ensemble import GradientBoostingClassifier 9 | 10 | # define functions 11 | def main(args): 12 | # enable auto logging 13 | mlflow.autolog() 14 | 15 | params = { 16 | "learning_rate": args.learning_rate, 17 | "n_estimators": args.n_estimators, 18 | } 19 | 20 | # read data 21 | data_path = args.diabetes_csv 22 | all_files = glob.glob(data_path + "/*.csv") 23 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 24 | 25 | # process data 26 | X_train, X_test, y_train, y_test = process_data(df) 27 | 28 | # train model 29 | model = train_model(params, X_train, X_test, y_train, y_test) 30 | 31 | def process_data(df): 32 | # split dataframe into X and y 33 | X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values 34 | 35 | # train/test split 36 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0) 37 | 38 | # 
return splits and encoder 39 | return X_train, X_test, y_train, y_test 40 | 41 | def train_model(params, X_train, X_test, y_train, y_test): 42 | # train model 43 | model = GradientBoostingClassifier(**params) 44 | model = model.fit(X_train, y_train) 45 | 46 | # return model 47 | return model 48 | 49 | def parse_args(): 50 | # setup arg parser 51 | parser = argparse.ArgumentParser() 52 | 53 | # add arguments 54 | parser.add_argument("--diabetes-csv", type=str) 55 | parser.add_argument("--learning-rate", dest='learning_rate', type=float, default=0.1) 56 | parser.add_argument("--n-estimators", dest='n_estimators', type=int, default=100) 57 | 58 | # parse args 59 | args = parser.parse_args() 60 | 61 | # return args 62 | return args 63 | 64 | # run script 65 | if __name__ == "__main__": 66 | # add space in logs 67 | print("\n\n") 68 | print("*" * 60) 69 | 70 | # parse args 71 | args = parse_args() 72 | 73 | # run main function 74 | main(args) 75 | 76 | # add space in logs 77 | print("*" * 60) 78 | print("\n\n") -------------------------------------------------------------------------------- /Allfiles/Labs/02/sweep-job/sweep-job.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/sweepJob.schema.json 2 | type: sweep 3 | sampling_algorithm: grid 4 | trial: 5 | code: src 6 | command: >- 7 | python main.py 8 | --diabetes-csv ${{inputs.diabetes}} 9 | --learning-rate ${{search_space.learning_rate}} 10 | --n-estimators ${{search_space.n_estimators}} 11 | environment: azureml:basic-env-scikit@latest 12 | inputs: 13 | diabetes: 14 | path: azureml:diabetes-data:1 15 | mode: ro_mount 16 | compute: azureml:aml-cluster 17 | search_space: 18 | learning_rate: 19 | type: choice 20 | values: [0.01, 0.1, 1.0] 21 | n_estimators: 22 | type: choice 23 | values: [10, 100] 24 | objective: 25 | primary_metric: training_roc_auc_score 26 | goal: maximize 27 | limits: 28 | max_total_trials: 6 29 | max_concurrent_trials: 3 30 | timeout: 3600 31 | experiment_name: diabetes-sweep-example 32 | description: Run a hyperparameter sweep job for classification on diabetes dataset. -------------------------------------------------------------------------------- /Allfiles/Labs/03/mlflow-job/mlflow-job.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json 2 | code: src 3 | command: >- 4 | python mlflow-autolog.py 5 | --diabetes-csv ${{inputs.diabetes}} 6 | inputs: 7 | diabetes: 8 | path: azureml:diabetes-data:1 9 | mode: ro_mount 10 | environment: azureml:basic-env-scikit@latest 11 | compute: azureml: 12 | experiment_name: diabetes-mlflow-example 13 | description: Train a classification model on diabetes data using a registered dataset as input. Use MLflow to track parameter, metric, and artifact. 
-------------------------------------------------------------------------------- /Allfiles/Labs/03/mlflow-job/src/custom-mlflow.py: -------------------------------------------------------------------------------- 1 | # Import libraries 2 | import mlflow 3 | import argparse 4 | import glob 5 | import joblib 6 | 7 | import pandas as pd 8 | import numpy as np 9 | import matplotlib.pyplot as plt 10 | from sklearn.model_selection import train_test_split 11 | from sklearn.linear_model import LogisticRegression 12 | from sklearn.metrics import confusion_matrix 13 | 14 | # define functions 15 | def main(args): 16 | # read data 17 | data_path = args.diabetes_csv 18 | all_files = glob.glob(data_path + "/*.csv") 19 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 20 | 21 | # process data 22 | X_train, X_test, y_train, y_test = process_data(df) 23 | 24 | # train model 25 | reg_rate = args.reg_rate 26 | mlflow.log_param("Regularization rate", reg_rate) 27 | model = train_model(reg_rate, X_train, X_test, y_train, y_test) 28 | 29 | def process_data(df): 30 | # split dataframe into X and y 31 | X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values 32 | 33 | # train/test split 34 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0) 35 | 36 | # return splits and encoder 37 | return X_train, X_test, y_train, y_test 38 | 39 | def train_model(reg_rate, X_train, X_test, y_train, y_test): 40 | # train model 41 | model = LogisticRegression(C=1/reg_rate, solver="liblinear").fit(X_train, y_train) 42 | 43 | # calculate accuracy 44 | y_pred = model.predict(X_test) 45 | acc = np.average(y_pred == y_test) 46 | mlflow.log_metric("Accuracy", np.float(acc)) 47 | 48 | # create confusion matrix 49 | conf_matrix = confusion_matrix(y_true=y_test, y_pred=y_pred) 50 | fig, ax = plt.subplots(figsize=(7.5, 7.5)) 51 | ax.matshow(conf_matrix, cmap=plt.cm.Blues, alpha=0.3) 52 | for i in range(conf_matrix.shape[0]): 53 | for j in range(conf_matrix.shape[1]): 54 | ax.text(x=j, y=i,s=conf_matrix[i, j], va='center', ha='center', size='xx-large') 55 | 56 | plt.xlabel('Predictions', fontsize=18) 57 | plt.ylabel('Actuals', fontsize=18) 58 | plt.title('Confusion Matrix', fontsize=18) 59 | plt.savefig("ConfusionMatrix.png") 60 | mlflow.log_artifact("ConfusionMatrix.png") 61 | 62 | # return model 63 | return model 64 | 65 | def parse_args(): 66 | # setup arg parser 67 | parser = argparse.ArgumentParser() 68 | 69 | # add arguments 70 | parser.add_argument("--diabetes-csv", dest='diabetes_csv', type=str) 71 | parser.add_argument("--reg-rate", dest='reg_rate', type=float, default=0.01) 72 | 73 | # parse args 74 | args = parser.parse_args() 75 | 76 | # return args 77 | return args 78 | 79 | # run script 80 | if __name__ == "__main__": 81 | # add space in logs 82 | print("\n\n") 83 | print("*" * 60) 84 | 85 | # parse args 86 | args = parse_args() 87 | 88 | # run main function 89 | main(args) 90 | 91 | # add space in logs 92 | print("*" * 60) 93 | print("\n\n") -------------------------------------------------------------------------------- /Allfiles/Labs/03/mlflow-job/src/mlflow-autolog.py: -------------------------------------------------------------------------------- 1 | # Import libraries 2 | import mlflow 3 | import argparse 4 | import glob 5 | 6 | import pandas as pd 7 | from sklearn.model_selection import train_test_split 8 | from sklearn.linear_model import LogisticRegression 9 
| 10 | # define functions 11 | def main(args): 12 | # enable auto logging 13 | mlflow.autolog() 14 | 15 | # read data 16 | data_path = args.diabetes_csv 17 | all_files = glob.glob(data_path + "/*.csv") 18 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 19 | 20 | # process data 21 | X_train, X_test, y_train, y_test = process_data(df) 22 | 23 | # train model 24 | model = train_model(args.reg_rate, X_train, X_test, y_train, y_test) 25 | 26 | def process_data(df): 27 | # split dataframe into X and y 28 | X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values 29 | 30 | # train/test split 31 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0) 32 | 33 | # return splits and encoder 34 | return X_train, X_test, y_train, y_test 35 | 36 | def train_model(reg_rate, X_train, X_test, y_train, y_test): 37 | # train model 38 | model = LogisticRegression(C=1/reg_rate, solver="liblinear").fit(X_train, y_train) 39 | 40 | # return model 41 | return model 42 | 43 | def parse_args(): 44 | # setup arg parser 45 | parser = argparse.ArgumentParser() 46 | 47 | # add arguments 48 | parser.add_argument("--diabetes-csv", dest='diabetes_csv', type=str) 49 | parser.add_argument("--reg-rate", dest='reg_rate', type=float, default=0.01) 50 | 51 | # parse args 52 | args = parser.parse_args() 53 | 54 | # return args 55 | return args 56 | 57 | # run script 58 | if __name__ == "__main__": 59 | # add space in logs 60 | print("\n\n") 61 | print("*" * 60) 62 | 63 | # parse args 64 | args = parse_args() 65 | 66 | # run main function 67 | main(args) 68 | 69 | # add space in logs 70 | print("*" * 60) 71 | print("\n\n") -------------------------------------------------------------------------------- /Allfiles/Labs/04/mlflow-endpoint/create-endpoint.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json 2 | auth_mode: key -------------------------------------------------------------------------------- /Allfiles/Labs/04/mlflow-endpoint/mlflow-deployment.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json 2 | model: 3 | name: sample-mlflow-sklearn-model 4 | version: 1 5 | path: model 6 | type: mlflow_model 7 | instance_type: Standard_F4s_v2 8 | instance_count: 1 -------------------------------------------------------------------------------- /Allfiles/Labs/04/mlflow-endpoint/model/MLmodel: -------------------------------------------------------------------------------- 1 | artifact_path: model 2 | flavors: 3 | python_function: 4 | env: conda.yaml 5 | loader_module: mlflow.sklearn 6 | model_path: model.pkl 7 | python_version: 3.7.13 8 | sklearn: 9 | pickled_model: model.pkl 10 | serialization_format: cloudpickle 11 | sklearn_version: 0.24.1 12 | run_id: 3cdc6ac1-76e3-47e3-a104-5599c222846e 13 | signature: 14 | inputs: '[{"type": "tensor", "tensor-spec": {"dtype": "float64", "shape": [-1, 8]}}]' 15 | outputs: '[{"type": "tensor", "tensor-spec": {"dtype": "int64", "shape": [-1]}}]' 16 | utc_time_created: '2021-11-18 10:39:04.318176' 17 | -------------------------------------------------------------------------------- /Allfiles/Labs/04/mlflow-endpoint/model/conda.yaml: 
-------------------------------------------------------------------------------- 1 | channels: 2 | - conda-forge 3 | dependencies: 4 | - python=3.7.13 5 | - pip<=20.2.4 6 | - pip: 7 | - mlflow 8 | - cloudpickle==2.2.0 9 | - psutil==5.8.0 10 | - scikit-learn==0.24.1 11 | name: mlflow-env -------------------------------------------------------------------------------- /Allfiles/Labs/04/mlflow-endpoint/model/model.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MicrosoftLearning/mslearn-aml-cli/191e8804f8005b2cad8eb4230ffb7cf5ada14068/Allfiles/Labs/04/mlflow-endpoint/model/model.pkl -------------------------------------------------------------------------------- /Allfiles/Labs/04/mlflow-endpoint/model/requirements.txt: -------------------------------------------------------------------------------- 1 | mlflow 2 | cloudpickle==2.2.0 3 | scikit-learn==0.24.2 -------------------------------------------------------------------------------- /Allfiles/Labs/04/mlflow-endpoint/sample-data.json: -------------------------------------------------------------------------------- 1 | { 2 | "input_data": [ 3 | [2,180,74,24,21,23.9091702,1.488172308,60], 4 | [0,148,58,11,179,39.19207553,0.160829008,45] 5 | ] 6 | } -------------------------------------------------------------------------------- /Allfiles/Labs/05/fix-missing-data.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json 2 | name: remove_empty_rows 3 | display_name: Remove Empty Rows 4 | version: 1 5 | type: command 6 | inputs: 7 | input_data: 8 | type: uri_folder 9 | outputs: 10 | output_data: 11 | type: uri_folder 12 | code: ./src 13 | environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest 14 | command: >- 15 | python fix-missing-data.py 16 | --input_data ${{inputs.input_data}} 17 | --output_data ${{outputs.output_data}} 18 | 19 | -------------------------------------------------------------------------------- /Allfiles/Labs/05/job.yml: -------------------------------------------------------------------------------- 1 | $schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json 2 | type: pipeline 3 | experiment_name: diabetes-pipeline-example 4 | 5 | compute: azureml: 6 | settings: 7 | datastore: azureml:workspaceblobstore 8 | 9 | outputs: 10 | pipeline_job_trained_model: 11 | mode: upload 12 | 13 | jobs: 14 | stats_job: 15 | type: command 16 | component: file:./summary-stats.yml 17 | inputs: 18 | input_data: 19 | type: uri_folder 20 | path: azureml:diabetes-data:1 21 | 22 | fix_missing_job: 23 | type: command 24 | component: file:./fix-missing-data.yml 25 | inputs: 26 | input_data: 27 | type: uri_folder 28 | path: azureml:diabetes-data:1 29 | outputs: 30 | output_data: 31 | mode: upload 32 | 33 | normalize_job: 34 | type: command 35 | component: file:./normalize-data.yml 36 | inputs: 37 | input_data: ${{parent.jobs.fix_missing_job.outputs.output_data}} 38 | outputs: 39 | output_data: 40 | mode: upload 41 | 42 | train_job: 43 | type: command 44 | component: file:./train-decision-tree.yml 45 | inputs: 46 | training_data: ${{parent.jobs.normalize_job.outputs.output_data}} 47 | outputs: 48 | model_output: ${{parent.outputs.pipeline_job_trained_model}} -------------------------------------------------------------------------------- /Allfiles/Labs/05/normalize-data.yml: 
-------------------------------------------------------------------------------- 1 | # 2 | $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json 3 | name: normalize_data 4 | display_name: Normalize Numerical Columns 5 | version: 1 6 | type: command 7 | inputs: 8 | input_data: 9 | type: uri_folder 10 | outputs: 11 | output_data: 12 | type: uri_folder 13 | code: ./src 14 | environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest 15 | command: >- 16 | python normalize-data.py 17 | --input_data ${{inputs.input_data}} 18 | --output_data ${{outputs.output_data}} 19 | # 20 | -------------------------------------------------------------------------------- /Allfiles/Labs/05/src/fix-missing-data.py: -------------------------------------------------------------------------------- 1 | # import libraries 2 | import argparse 3 | import glob 4 | from pathlib import Path 5 | import pandas as pd 6 | import mlflow 7 | 8 | # get parameters 9 | parser = argparse.ArgumentParser() 10 | parser.add_argument("--input_data", type=str, help='Path to input data') 11 | parser.add_argument('--output_data', type=str, help='Path of output data') 12 | args = parser.parse_args() 13 | 14 | # load the data (passed as an input dataset) 15 | data_path = args.input_data 16 | all_files = glob.glob(data_path + "/*.csv") 17 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 18 | 19 | # log row count input data 20 | row_count = (len(df)) 21 | mlflow.log_metric('row count input data', row_count) 22 | 23 | # remove nulls 24 | df = df.dropna() 25 | 26 | # log processed rows 27 | row_count_processed = (len(df)) 28 | mlflow.log_metric('row count output data', row_count_processed) 29 | 30 | # set the processed data as output 31 | output_df = df.to_csv((Path(args.output_data) / "output_data.csv")) -------------------------------------------------------------------------------- /Allfiles/Labs/05/src/normalize-data.py: -------------------------------------------------------------------------------- 1 | # import libraries 2 | import argparse 3 | import os 4 | import glob 5 | from pathlib import Path 6 | import pandas as pd 7 | import mlflow 8 | from sklearn.preprocessing import MinMaxScaler 9 | 10 | # get parameters 11 | parser = argparse.ArgumentParser() 12 | parser.add_argument("--input_data", type=str, help='Path to input data') 13 | parser.add_argument('--output_data', type=str, help='Path of output data') 14 | args = parser.parse_args() 15 | 16 | # load the data (passed as an input dataset) 17 | print("files in input_data path: ") 18 | arr = os.listdir(args.input_data) 19 | print(arr) 20 | 21 | for filename in arr: 22 | print("reading file: %s ..." 
% filename) 23 | with open(os.path.join(args.input_data, filename), "r") as handle: 24 | print(handle.read()) 25 | 26 | data_path = args.input_data 27 | all_files = glob.glob(data_path + "/*.csv") 28 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 29 | 30 | # log row count input data 31 | row_count = (len(df)) 32 | mlflow.log_metric('row count input data', row_count) 33 | 34 | # normalize the numeric columns 35 | scaler = MinMaxScaler() 36 | num_cols = ['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree'] 37 | df[num_cols] = scaler.fit_transform(df[num_cols]) 38 | 39 | # log processed rows 40 | row_count_processed = (len(df)) 41 | mlflow.log_metric('row count output data', row_count_processed) 42 | 43 | # set the processed data as output 44 | output_df = df.to_csv((Path(args.output_data) / "output_data.csv")) -------------------------------------------------------------------------------- /Allfiles/Labs/05/src/summary-stats.py: -------------------------------------------------------------------------------- 1 | # import libraries 2 | import argparse 3 | import glob 4 | from pathlib import Path 5 | import pandas as pd 6 | import mlflow 7 | 8 | # get parameters 9 | parser = argparse.ArgumentParser() 10 | parser.add_argument("--input_data", type=str, help='Path to input data') 11 | args = parser.parse_args() 12 | 13 | # read data 14 | data_path = args.input_data 15 | all_files = glob.glob(data_path + "/*.csv") 16 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 17 | 18 | # log row count 19 | row_count = (len(df)) 20 | mlflow.log_metric('row count', row_count) 21 | 22 | # get summary statistics 23 | stats = df.describe() 24 | stats.to_csv('summary_statistics.csv') 25 | mlflow.log_artifact('summary_statistics.csv') 26 | -------------------------------------------------------------------------------- /Allfiles/Labs/05/src/train-decision-tree.py: -------------------------------------------------------------------------------- 1 | # Import libraries 2 | import argparse 3 | import glob 4 | import pandas as pd 5 | import numpy as np 6 | import mlflow 7 | from sklearn.model_selection import train_test_split 8 | from sklearn.tree import DecisionTreeClassifier 9 | from sklearn.metrics import roc_auc_score 10 | from sklearn.metrics import roc_curve 11 | import matplotlib.pyplot as plt 12 | import pickle 13 | from pathlib import Path 14 | 15 | # get parameters 16 | parser = argparse.ArgumentParser("train") 17 | parser.add_argument("--training_data", type=str, help="Path to training data") 18 | parser.add_argument("--model_output", type=str, help="Path of output model") 19 | 20 | args = parser.parse_args() 21 | 22 | training_data = args.training_data 23 | 24 | # load the prepared data file in the training folder 25 | print("Loading Data...") 26 | data_path = args.training_data 27 | all_files = glob.glob(data_path + "/*.csv") 28 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 29 | 30 | # Separate features and labels 31 | X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values 32 | 33 | # Split data into training set and test set 34 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0) 35 | 36 | # Train a decision tree model 37 | print('Training a decision tree model...') 38 | model = DecisionTreeClassifier().fit(X_train, y_train) 39 | 40 | # calculate 
accuracy 41 | y_hat = model.predict(X_test) 42 | acc = np.average(y_hat == y_test) 43 | print('Accuracy:', acc) 44 | mlflow.log_metric('Accuracy', np.float(acc)) 45 | 46 | # calculate AUC 47 | y_scores = model.predict_proba(X_test) 48 | auc = roc_auc_score(y_test,y_scores[:,1]) 49 | print('AUC: ' + str(auc)) 50 | mlflow.log_metric('AUC', np.float(auc)) 51 | 52 | # plot ROC curve 53 | fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1]) 54 | fig = plt.figure(figsize=(6, 4)) 55 | # Plot the diagonal 50% line 56 | plt.plot([0, 1], [0, 1], 'k--') 57 | # Plot the FPR and TPR achieved by our model 58 | plt.plot(fpr, tpr) 59 | plt.xlabel('False Positive Rate') 60 | plt.ylabel('True Positive Rate') 61 | plt.title('ROC Curve') 62 | plt.savefig("ROCcurve.png") 63 | mlflow.log_artifact("ROCcurve.png") 64 | 65 | # Output the model and test data 66 | pickle.dump(model, open((Path(args.model_output) / "model.sav"), "wb")) -------------------------------------------------------------------------------- /Allfiles/Labs/05/src/train-logistic-regression.py: -------------------------------------------------------------------------------- 1 | # Import libraries 2 | import argparse 3 | import glob 4 | import pandas as pd 5 | import numpy as np 6 | import mlflow 7 | from sklearn.model_selection import train_test_split 8 | from sklearn.linear_model import LogisticRegression 9 | from sklearn.metrics import confusion_matrix 10 | import matplotlib.pyplot as plt 11 | import pickle 12 | from pathlib import Path 13 | 14 | # get parameters 15 | parser = argparse.ArgumentParser("train") 16 | parser.add_argument("--training_data", type=str, help="Path to training data") 17 | parser.add_argument("--reg_rate", type=float, default=0.01) 18 | parser.add_argument("--model_output", type=str, help="Path of output model") 19 | 20 | args = parser.parse_args() 21 | 22 | training_data = args.training_data 23 | 24 | # load the prepared data file in the training folder 25 | print("Loading Data...") 26 | data_path = args.training_data 27 | all_files = glob.glob(data_path + "/*.csv") 28 | df = pd.concat((pd.read_csv(f) for f in all_files), sort=False) 29 | 30 | # Separate features and labels 31 | X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values 32 | 33 | # Split data into training set and test set 34 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0) 35 | 36 | # Train a logistic regression model 37 | print('Training a logistic regression model...') 38 | model = LogisticRegression(C=1/args.reg_rate, solver="liblinear").fit(X_train, y_train) 39 | 40 | # calculate accuracy 41 | y_pred = model.predict(X_test) 42 | acc = np.average(y_pred == y_test) 43 | mlflow.log_metric("Accuracy", np.float(acc)) 44 | 45 | # create confusion matrix 46 | conf_matrix = confusion_matrix(y_true=y_test, y_pred=y_pred) 47 | fig, ax = plt.subplots(figsize=(7.5, 7.5)) 48 | ax.matshow(conf_matrix, cmap=plt.cm.Blues, alpha=0.3) 49 | for i in range(conf_matrix.shape[0]): 50 | for j in range(conf_matrix.shape[1]): 51 | ax.text(x=j, y=i,s=conf_matrix[i, j], va='center', ha='center', size='xx-large') 52 | 53 | plt.xlabel('Predictions', fontsize=18) 54 | plt.ylabel('Actuals', fontsize=18) 55 | plt.title('Confusion Matrix', fontsize=18) 56 | plt.savefig("ConfusionMatrix.png") 57 | mlflow.log_artifact("ConfusionMatrix.png") 58 | 59 | # Output the model and test data 60 | pickle.dump(model, 
open((Path(args.model_output) / "model.sav"), "wb")) -------------------------------------------------------------------------------- /Allfiles/Labs/05/summary-stats.yml: -------------------------------------------------------------------------------- 1 | # 2 | $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json 3 | name: get_summary_statistics 4 | display_name: Get Summary Statistics 5 | version: 1 6 | type: command 7 | inputs: 8 | input_data: 9 | type: uri_folder 10 | code: ./src 11 | environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest 12 | command: >- 13 | python summary-stats.py 14 | --input_data ${{inputs.input_data}} 15 | # 16 | -------------------------------------------------------------------------------- /Allfiles/Labs/05/train-decision-tree.yml: -------------------------------------------------------------------------------- 1 | # 2 | $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json 3 | name: train_decision_tree_classifier_model 4 | display_name: Train a Decision Tree Classifier Model 5 | version: 1 6 | type: command 7 | inputs: 8 | training_data: 9 | type: uri_folder 10 | outputs: 11 | model_output: 12 | type: uri_folder 13 | code: ./src 14 | environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest 15 | command: >- 16 | python train-decision-tree.py 17 | --training_data ${{inputs.training_data}} 18 | --model_output ${{outputs.model_output}} 19 | # 20 | -------------------------------------------------------------------------------- /Allfiles/Labs/05/train-logistic-regression.yml: -------------------------------------------------------------------------------- 1 | # 2 | $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json 3 | name: train_logistic_regression_classifier_model 4 | display_name: Train a Logistic Regression Classifier Model 5 | version: 1 6 | type: command 7 | inputs: 8 | training_data: 9 | type: uri_folder 10 | regularization_rate: 11 | type: number 12 | outputs: 13 | model_output: 14 | type: uri_folder 15 | code: ./src 16 | environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest 17 | command: >- 18 | python train-logistic-regression.py 19 | --training_data ${{inputs.training_data}} 20 | --reg_rate ${{inputs.regularization_rate}} 21 | --model_output ${{outputs.model_output}} 22 | # 23 | -------------------------------------------------------------------------------- /Instructions/Labs/01-create-workspace.md: -------------------------------------------------------------------------------- 1 | --- 2 | lab: 3 | title: 'Lab: Create an Azure Machine Learning workspace and assets with the CLI (v2)' 4 | module: 'Module: Create Azure Machine Learning resources with the CLI (v2)' 5 | --- 6 | 7 | # Create an Azure Machine Learning workspace and assets with the CLI (v2) 8 | 9 | In this exercise, you will create and explore an Azure Machine Learning workspace using the Azure Cloud Shell. 10 | 11 | ## Set up the Azure Cloud Shell and install the Azure Machine Learning extension 12 | 13 | To start, open the Azure Cloud Shell, install the Azure Machine Learning extension and clone the Git repo. 14 | 15 | 1. In a browser, open the Azure portal at [http://portal.azure.com](https://portal.azure.com/?azure-portal=true), signing in with your Microsoft account. 16 | 1. Select the [>_] (*Cloud Shell*) button at the top of the page to the right of the search box. This opens a Cloud Shell pane at the bottom of the portal. 17 | 1. 
The first time you open the cloud shell, you will be asked to choose the type of shell you want to use (*Bash* or *PowerShell*). Select **Bash**. 18 | 1. If you are asked to create storage for your cloud shell, check that the correct subscription is specified and select **Create storage**. Wait for the storage to be created. 19 | 1. Check to see if the Azure Machine Learning extension is installed with the following command: 20 | 21 | ```azurecli 22 | az extension list 23 | ``` 24 | 25 | > **Tip:** Adding **-o table** at the end of the command will format the output in a table, making it easier to read. The command would then be: `az extension list -o table` 26 | 27 | 1. If it is not installed, use the following command to install the Azure Machine Learning extension: 28 | 29 | ```azurecli 30 | az extension add -n ml -y 31 | ``` 32 | 33 | 1. In the command shell, clone this GitHub repository to download all the necessary files, which are stored in the *Allfiles* folder. 34 | 35 | ```azurecli 36 | git clone https://github.com/MicrosoftLearning/mslearn-aml-cli.git mslearn-aml-cli 37 | ``` 38 | 39 | 1. The files are downloaded to a folder named **mslearn-aml-cli**. To see the files in your Cloud Shell storage and work with them, type the following command in the shell: 40 | 41 | ```azurecli 42 | code . 43 | ``` 44 | 45 | ## Create an Azure resource group and set as default 46 | 47 | To create a workspace with the CLI (v2), you need a resource group. You can create a new one with the CLI or use an existing resource group. Either way, make sure to set a resource group as the default to complete this exercise. 48 | 49 | > **Tip:** You can get a list of available locations with the `az account list-locations -o table` command. Use the **name** column for the location name. 50 | 51 | 1. Run the following command to create a resource group and use a location close to you: 52 | 53 | ```azurecli 54 | az group create --name "diabetes-dev-rg" --location "eastus" 55 | ``` 56 | 57 | 1. Set the resource group as the default to avoid having to specify it on every command going forward: 58 | 59 | ```azurecli 60 | az configure --defaults group="diabetes-dev-rg" 61 | ``` 62 | 63 | ## Create an Azure Machine Learning workspace and set as default 64 | 65 | As its name suggests, a workspace is a centralized place to manage all of the Azure ML assets you need to work on a machine learning project. 66 | 67 | 1. Create a workspace: 68 | 69 | ```azurecli 70 | az ml workspace create --name "aml-diabetes-dev" 71 | ``` 72 | 73 | 1. Set the workspace as the default: 74 | 75 | ```azurecli 76 | az configure --defaults workspace="aml-diabetes-dev" 77 | ``` 78 | 79 | 1. Check your work by signing in to the [Azure Machine Learning Studio](https://ml.azure.com). After you sign in, choose the *aml-diabetes-dev* workspace to open it. 80 | 
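If you prefer to verify your setup without leaving the shell, a quick check like the following should work — a sketch that only reads back what the commands above configured:

```azurecli
# list the defaults set with `az configure` (resource group and workspace)
az configure --list-defaults -o table

# show the workspace the CLI will now target by default
az ml workspace show --query "{name:name, location:location}" -o table
```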
81 | ## Create a Compute Instance 82 | 83 | To run a notebook, you'll need a compute instance. 84 | 85 | In this exercise, you'll create a compute instance with the following settings: 86 | 87 | - `--name`: *Name of compute instance. Has to be unique and fewer than 24 characters.* 88 | - `--size`: STANDARD_DS11_V2 89 | - `--type`: ComputeInstance 90 | - `--workspace-name`: *Will use the default workspace you've configured so you don't need to specify.* 91 | - `--resource-group`: *Will use the default resource group you've configured so you don't need to specify.* 92 | 93 | 1. Run the `az ml compute create` command with the settings listed above. Change the name to make it unique in your region. It should look something like this: 94 | 95 | ```azurecli 96 | az ml compute create --name "testdev-vm" --size STANDARD_DS11_V2 --type ComputeInstance 97 | ``` 98 | 99 | > **Note:** If a compute instance with the name "testdev-vm" already exists, change the name to make it unique within your Azure region, with a maximum of 24 characters. If you get an error because the name is not unique, delete the partially created compute instance with `az ml compute delete --name "compute-instance-name"`. 100 | 101 | 1. The command will take 2 to 5 minutes to complete. After that, switch to [Azure Machine Learning Studio](https://ml.azure.com), open the **Compute** tab and confirm that the instance has been created and is running. 102 | 103 | ## Create an environment 104 | 105 | To execute a Python script, you'll need to install any necessary libraries and packages. To automate the installation of packages, you can use an environment. 106 | 107 | To create an environment from a Docker image plus a Conda environment with the CLI (v2), you need two files: 108 | 109 | - The specification YAML file, including the environment name, version, and base Docker image. 110 | - The Conda environment file, including the libraries and packages you want installed. 111 | 112 | The necessary YAML files have already been created for you and are part of the **mslearn-aml-cli** repo you cloned in the Azure Cloud Shell. 113 | 114 | 1. To navigate to the YAML files, run the following command in the Cloud Shell: 115 | 116 | ```azurecli 117 | code . 118 | ``` 119 | 120 | 1. Navigate to the **mslearn-aml-cli/Allfiles/Labs/01** folder. 121 | 1. Select the **basic-env.yml** file to open it. Explore its contents, which describe how the environment should be created within the Azure ML workspace. 122 | 1. Select the **conda-envs/basic-env-cpu.yml** file to open it. Explore its contents, which list the libraries that need to be installed on the compute. 123 | 1. Run the following command to create the environment: 124 | 125 | ```azurecli 126 | az ml environment create --file ./mslearn-aml-cli/Allfiles/Labs/01/basic-env.yml 127 | ``` 128 | 129 | 1. Once the environment is created, a summary is shown in the prompt. You can also view the environment in the **Azure Machine Learning Studio** in the **Environments** tab, under *Custom environments*. 130 | 131 | ## Create a dataset 132 | 133 | To create a dataset in the workspace from a local CSV, you need two files: 134 | 135 | - The specification YAML file, including the dataset name, version, and local path of the CSV file. Navigate to **Allfiles/Labs/01/data-local-path.yml** to explore the contents of this file. 136 | - The CSV file containing data. In this exercise, you'll work with diabetes data. Navigate to **Allfiles/Labs/01/data/diabetes.csv** to explore the contents of this file. 137 | 138 | Before you create a dataset, you can explore the files by using the `code .` command in the Cloud Shell. 139 | 140 | 1. Run the following command to create a dataset from the configuration described in `data-local-path.yml`: 141 | 142 | ```azurecli 143 | az ml data create --file ./mslearn-aml-cli/Allfiles/Labs/01/data-local-path.yml 144 | ``` 145 | 146 | >**Note:** When you create a dataset from a local path, the workspace will automatically upload the dataset to the default datastore. In this case, it will be uploaded to the storage account which was created when you created the workspace. 147 | 148 | 2. Once the dataset is created, a summary is shown in the prompt. You can also view the data asset in the **Azure Machine Learning Studio** in the **Data** tab, under *Data assets*. 149 | 150 | 
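As a final check from the shell, you can also list the assets you registered in this lab — a sketch; both commands simply read from the default workspace you configured earlier:

```azurecli
# the environment created from basic-env.yml
az ml environment list -o table

# the data asset created from data-local-path.yml
az ml data list -o table
```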
## Clean up resources 151 | 152 | Once you've finished exploring Azure Machine Learning, shut down the compute instance to avoid unnecessary charges in your Azure subscription. 153 | 154 | You can stop a compute instance with the following command. Change `"testdev-vm"` to the name of your compute instance if necessary. 155 | 156 | ```azurecli 157 | az ml compute stop --name "testdev-vm" --no-wait 158 | ``` 159 | 160 | > **Note:** Stopping your compute ensures your subscription won't be charged for compute resources. You will, however, be charged a small amount for data storage as long as the Azure Machine Learning workspace exists in your subscription. If you have finished exploring Azure Machine Learning, you can delete the Azure Machine Learning workspace and associated resources. However, if you plan to complete any other labs in this series, you will need to repeat this lab to create the workspace and prepare the environment first. 161 | 162 | To delete the complete Azure Machine Learning workspace and all assets you created, you can use the following command in the CLI: 163 | 164 | ```azurecli 165 | az ml workspace delete 166 | ``` 167 | -------------------------------------------------------------------------------- /Instructions/Labs/02-run-python-job.md: -------------------------------------------------------------------------------- 1 | --- 2 | lab: 3 | title: 'Lab: Run a basic Python training job' 4 | module: 'Module: Run jobs in Azure Machine Learning with CLI (v2)' 5 | --- 6 | 7 | # Run a basic Python training job 8 | 9 | In this exercise, you will train a model with a Python script. The model training will be submitted with the CLI (v2). First, you'll train a model based on a local CSV dataset. Next, you'll train a model using a dataset registered in the Azure Machine Learning workspace. 10 | 11 | ## Prerequisites 12 | 13 | Before you continue, complete the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab to set up your cloud shell environment and your Azure Machine Learning environment. 14 | 15 | 1. Open the Cloud Shell by navigating to [http://shell.azure.com](https://shell.azure.com/?azure-portal=true) and signing in with your Microsoft account. 16 | 1. The repo [https://github.com/MicrosoftLearning/mslearn-aml-cli](https://github.com/MicrosoftLearning/mslearn-aml-cli) should be cloned. You can explore the repo and its contents by using the `code .` command in the Cloud Shell. 17 | 1. If your compute instance is stopped, start it again by using the following command. Change `<compute-instance-name>` to your compute instance name before running the code: 18 | 19 | ```azurecli 20 | az ml compute start --name "<compute-instance-name>" 21 | ``` 22 | 23 | 1. To confirm that the instance is now in a **Running** state, open another tab in your browser and navigate to the [Azure Machine Learning Studio](https://ml.azure.com). Open the **Compute** tab and select **Compute instances**. 24 | 
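Alternatively, you can check the instance state without leaving the shell — a sketch, assuming recent versions of the `ml` extension, which expose a top-level `state` property for compute instances (verify the property name on your version):

```azurecli
# should print "Running" once the instance is up
az ml compute show --name "<compute-instance-name>" --query state -o tsv
```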
25 | ## Train a model 26 | 27 | To track a machine learning workflow, you can run the training script using a **job**. The configuration of the job can be described in a YAML file. 28 | 29 | In this exercise, you'll train a Logistic Regression model. Explore the training script **main.py** by navigating to **mslearn-aml-cli/Allfiles/Labs/02/basic-job/src/main.py**. The dataset used is in the same folder and stored as **diabetes.csv**. 30 | 31 | 1. Run the following command in the Cloud Shell to open the files of the cloned repo: 32 | 33 | ```azurecli 34 | code . 35 | ``` 36 | 37 | 1. Navigate to **mslearn-aml-cli/Allfiles/Labs/02/basic-job** and open **basic-job.yml** by selecting the file. 38 | 1. Change the **compute** value by adding the name of your compute instance after `azureml:`, so that it reads like `compute: azureml:testdev-vm`. 39 | 1. Save the file by selecting the top right corner of the text editor and then selecting **Save**. 40 | 1. Run the job by using the following command: 41 | 42 | ```azurecli 43 | az ml job create --file ./mslearn-aml-cli/Allfiles/Labs/02/basic-job/basic-job.yml 44 | ``` 45 | 46 | 1. Return to your Azure Machine Learning Studio browser tab, go to the **Jobs** page and locate the **diabetes-example** experiment in the **All experiments** tab. 47 | 1. Open the run to monitor the job and refresh the view if necessary. Once completed, you can explore the details of the job, which are stored in the experiment run. 48 | 49 | ## Train a model with dataset from datastore 50 | 51 | In the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab, you created a dataset named **diabetes-data**. To check that the dataset exists within your workspace, you can navigate to the Azure Machine Learning Studio and select the **Data** item from the left menu and then the **Data assets** tab. 52 | 53 | Instead of storing a CSV file in the same folder as the training script, you can also train a model using a registered dataset as input. 54 | 55 | 1. Navigate to **mslearn-aml-cli/Allfiles/Labs/02/input-data-job** and open **data-job.yml** by selecting the file. 56 | 1. Change the **compute** value to the name of your compute instance and save the file. 57 | 58 | > **Note** that the command now runs the **main.py** script with the parameter **--diabetes-csv**. The input of that parameter is defined in the **inputs.diabetes** value. It takes version 1 of the **diabetes-data** dataset from the Azure ML workspace. 59 | 60 | 1. Use the following command to run the job: 61 | 62 | ```azurecli 63 | az ml job create --file ./mslearn-aml-cli/Allfiles/Labs/02/input-data-job/data-job.yml 64 | ``` 65 | 66 | 1. Go to the Azure Machine Learning Studio and locate the **diabetes-data-example** experiment. Open the run to monitor the job. Refresh the view if necessary. 67 | 1. Once completed, you can explore the details of the job, which are stored in the experiment run. Note that now, it lists the input dataset **diabetes-data**. 68 | 
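Either run can also be followed from the shell instead of the Studio. A sketch — `<job-name>` below is a hypothetical placeholder for the `name` value that `az ml job create` prints when it submits the job:

```azurecli
# stream the job's logs to the terminal until it finishes
az ml job stream --name <job-name>

# print the job's final status (for example, Completed)
az ml job show --name <job-name> --query status -o tsv
```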
69 | ## Clean up resources 70 | 71 | When you're finished exploring Azure Machine Learning, shut down the compute instance to avoid unnecessary charges in your Azure subscription. 72 | 73 | You can stop a compute instance with the following command. Change `"testdev-vm"` to the name of your compute instance if necessary. 74 | 75 | ```azurecli 76 | az ml compute stop --name "testdev-vm" --no-wait 77 | ``` 78 | 79 | > **Note:** Stopping your compute ensures your subscription won't be charged for compute resources. You will, however, be charged a small amount for data storage as long as the Azure Machine Learning workspace exists in your subscription. If you have finished exploring Azure Machine Learning, you can delete the Azure Machine Learning workspace and associated resources. However, if you plan to complete any other labs in this series, you will need to repeat the set-up to create the workspace and prepare the environment first. 80 | 81 | To delete the Azure Machine Learning workspace, you can use the following command in the CLI: 82 | 83 | ```azurecli 84 | az ml workspace delete 85 | ``` 86 | -------------------------------------------------------------------------------- /Instructions/Labs/03-run-sweep-job.md: -------------------------------------------------------------------------------- 1 | --- 2 | lab: 3 | title: 'Lab: Perform hyperparameter tuning with a sweep job' 4 | module: 'Module: Run jobs in Azure Machine Learning with CLI (v2)' 5 | --- 6 | 7 | # Run a sweep job to tune hyperparameters 8 | 9 | In this exercise, you will perform hyperparameter tuning when training a model with a Python script. The model training will be submitted with the CLI (v2). 10 | 11 | ## Prerequisites 12 | 13 | Before you continue, complete the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab to set up your Azure Machine Learning environment. 14 | 15 | You'll run all commands in this lab from the Azure Cloud Shell. If this is your first time using the cloud shell, complete the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab to set up the cloud shell environment. 16 | 17 | 1. Open the Cloud Shell by navigating to [http://shell.azure.com](https://shell.azure.com/?azure-portal=true) and signing in with your Microsoft account. 18 | 1. The repo [https://github.com/MicrosoftLearning/mslearn-aml-cli](https://github.com/MicrosoftLearning/mslearn-aml-cli) should be cloned. You can explore the repo and its contents by using the `code .` command in the Cloud Shell. 19 | 1. To train multiple models in parallel, you'll use a compute cluster. To create a compute cluster, use the following command: 20 | 21 | ```azurecli 22 | az ml compute create --name "aml-cluster" --size STANDARD_DS11_V2 --max-instances 2 --type AmlCompute 23 | ``` 24 | 25 | 1. To confirm that the cluster has been created, open another tab in your browser and navigate to the [Azure Machine Learning Studio](https://ml.azure.com). Open the **Compute** tab and select **Compute clusters**. You should see a cluster named **aml-cluster**. 26 | > **Note:** Creating a compute cluster with two maximum instances means you can train two models in parallel. If you want to train more models in parallel, increase the value of `--max-instances`. You can also change this after the cluster is created, as shown in the sketch below. 27 | 
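For instance, scaling the existing cluster out to allow more parallel trials might look like this — a sketch that reuses the cluster name from this lab; the new maximum applies to jobs submitted afterwards:

```azurecli
az ml compute update --name "aml-cluster" --max-instances 4
```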
Recall that each individual model will be listed as a child run, and the overview and details of the sweep job will be stored with the main experiment run. 40 | 41 | To run the sweep job: 42 | 43 | 1. Run the following command in the Cloud Shell to open the files of the cloned repo, if they're not already open: 44 | 45 | ```azurecli 46 | code . 47 | ``` 48 | 49 | 1. Navigate to **mslearn-aml-cli/Allfiles/Labs/02/sweep-job** and open **sweep-job.yml** by selecting the file. 50 | 51 | 1. Run the job by using the following command: 52 | 53 | ```azurecli 54 | az ml job create --file ./mslearn-aml-cli/Allfiles/Labs/02/sweep-job/sweep-job.yml 55 | ``` 56 | 57 | 1. Switch to the browser tab with Azure Machine Learning Studio. Go to the **Jobs** page and select the **diabetes-sweep-example** experiment. 58 | 1. Monitor the job and refresh the view if necessary. Once completed, you can explore the details of the job, which are stored in the experiment run. 59 | 60 | ## Clean up resources 61 | 62 | The compute cluster will automatically scale down to 0 nodes, so there is no need to stop the cluster. 63 | 64 | > **Note:** Stopping your compute ensures your subscription won't be charged for compute resources. You will, however, be charged a small amount for data storage as long as the Azure Machine Learning workspace exists in your subscription. If you have finished exploring Azure Machine Learning, you can delete the Azure Machine Learning workspace and associated resources. However, if you plan to complete any other labs in this series, you will need to repeat the set-up to create the workspace and prepare the environment first. 65 | 66 | To delete the Azure Machine Learning workspace, you can use the following command in the CLI: 67 | 68 | ```azurecli 69 | az ml workspace delete 70 | ``` 71 | -------------------------------------------------------------------------------- /Instructions/Labs/04-use-mlflow-jobs.md: -------------------------------------------------------------------------------- 1 | --- 2 | lab: 3 | title: 'Lab: Track Azure ML jobs with MLflow' 4 | module: 'Module: Use MLflow with Azure ML jobs submitted with CLI (v2)' 5 | --- 6 | 7 | # Track Azure ML jobs with MLflow 8 | 9 | In this exercise, you will train a model with a Python script. The Python script uses **MLflow** to track parameters, metrics, and artifacts. 10 | 11 | ## Prerequisites 12 | 13 | Before you continue, complete the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab to set up your Azure Machine Learning environment. 14 | 15 | You'll run all commands in this lab from the Azure Cloud Shell. If this is your first time using the cloud shell, complete the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab to set up the cloud shell environment. 16 | 17 | 1. Open the Cloud Shell by navigating to [http://shell.azure.com](https://shell.azure.com/?azure-portal=true) and signing in with your Microsoft account. 18 | 1. The repo [https://github.com/MicrosoftLearning/mslearn-aml-cli](https://github.com/MicrosoftLearning/mslearn-aml-cli) should be cloned. You can explore the repo and its contents by using the `code .` command in the Cloud Shell. 19 | 1. If your compute instance is stopped, start it again by using the following command. Change `<your-compute-instance-name>` to the name of your compute instance before running the code: 20 | 21 | ```azurecli 22 | az ml compute start --name "<your-compute-instance-name>" 23 | ``` 24 | 25 | 1. 
To confirm that the instance is now in a **Running** state, open another tab in your browser and navigate to the [Azure Machine Learning Studio](https://ml.azure.com). Open the **Compute** tab and select **Compute instances**. 26 | 27 | ## Track a model 28 | 29 | You'll train a Logistic Regression model to classify whether someone has diabetes. You can track input parameters, such as the regularization rate used to train the model. You'll also want to know the accuracy of the model and store a confusion matrix to explain that accuracy. 30 | 31 | To track the input and output of a model, we can: 32 | 33 | - Enable autologging using `mlflow.autolog()` 34 | - Use logging functions to track custom metrics using `mlflow.log_*` 35 | 36 | Note that to do either, we have to include the `mlflow` and `azureml-mlflow` packages in the environment used during training. The registered environment **basic-env-scikit** includes these two packages. 37 | 38 | ### Enable autologging 39 | 40 | You'll submit a job from the Azure Cloud Shell with the CLI (v2), using a Python script. 41 | 42 | 1. Run the following command in the Cloud Shell to open the files of the cloned repo: 43 | 44 | ```azurecli 45 | code . 46 | ``` 47 | 48 | 1. Navigate to **mslearn-aml-cli/Allfiles/Labs/03/mlflow-job** and open **mlflow-job.yml** by selecting the file. 49 | 1. Change the **compute** value by replacing `<your-compute-instance-name>` with the name of your compute instance. 50 | 1. Save the file by selecting the top right corner of the text editor and then selecting **Save**. 51 | 1. Note that you'll run the **mlflow-autolog.py** script that is located in the **src** folder. Navigate to that folder and open the file to explore it. Find the `mlflow.autolog()` method. 52 | 1. Run the job by using the following command: 53 | 54 | ```azurecli 55 | az ml job create --file ./mslearn-aml-cli/Allfiles/Labs/03/mlflow-job/mlflow-job.yml 56 | ``` 57 | 58 | 1. Switch to your Azure Machine Learning Studio browser tab. Go to the **Jobs** page, open the **All experiments** tab, and select the **diabetes-mlflow-example** experiment. 59 | 1. Monitor the job and refresh the view if necessary. Once completed, you can explore the details of the job, which are stored in the experiment run. 60 | 61 | ### Use logging functions to track custom metrics 62 | 63 | Instead of using the autologging feature of MLflow, you can also create and track your own parameters, metrics, and artifacts. For this, we'll use another training script. 64 | 65 | 1. Navigate to **mslearn-aml-cli/Allfiles/Labs/03/mlflow-job** and open **mlflow-job.yml** by selecting the file. 66 | 1. Now, you want to run the **custom-mlflow.py** script that is located in the **src** folder. In the **mlflow-job.yml** file, replace the **mlflow-autolog.py** file name with **custom-mlflow.py**. Don't forget to save the YAML file! 67 | 1. To explore the training script, navigate to the **src** folder and open the file. Find the `mlflow.log_param()`, `mlflow.log_metric()`, and `mlflow.log_artifact()` methods. 68 | 1. Run the job by using the following command: 69 | 70 | ```azurecli 71 | az ml job create --file ./mslearn-aml-cli/Allfiles/Labs/03/mlflow-job/mlflow-job.yml 72 | ``` 73 | 74 | 1. Go to the Azure Machine Learning Studio and locate the **diabetes-mlflow-example** experiment again. Open the newest run to monitor the job. Once completed, you'll find the regularization rate in the **Overview** tab under **Parameters**.
The accuracy score is listed under **Metrics**, and the confusion matrix can be found under **Images**. 75 | 76 | ## Clean up resources 77 | 78 | When you're finished exploring Azure Machine Learning, shut down the compute instance to avoid unnecessary charges in your Azure subscription. 79 | 80 | You can stop a compute instance with the following command. Change `"testdev-vm"` to the name of your compute instance if necessary. 81 | 82 | ```azurecli 83 | az ml compute stop --name "testdev-vm" --no-wait 84 | ``` 85 | 86 | > **Note:** Stopping your compute ensures your subscription won't be charged for compute resources. You will, however, be charged a small amount for data storage as long as the Azure Machine Learning workspace exists in your subscription. If you have finished exploring Azure Machine Learning, you can delete the Azure Machine Learning workspace and associated resources. However, if you plan to complete any other labs in this series, you will need to repeat the set-up to create the workspace and prepare the environment first. 87 | 88 | To delete the Azure Machine Learning workspace, you can use the following command in the CLI: 89 | 90 | ```azurecli 91 | az ml workspace delete 92 | ``` 93 | -------------------------------------------------------------------------------- /Instructions/Labs/05-deploy-managed-endpoint.md: -------------------------------------------------------------------------------- 1 | --- 2 | lab: 3 | title: 'Lab: Deploy an MLflow model to a managed online endpoint' 4 | module: 'Module: Deploy an Azure Machine Learning model to a managed endpoint with CLI (v2)' 5 | --- 6 | 7 | # Deploy a model to a managed online endpoint 8 | 9 | In this exercise, you will deploy an MLflow model to a managed online endpoint. 10 | 11 | ## Prerequisites 12 | 13 | Before you continue, complete the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab to set up your Azure Machine Learning environment. 14 | 15 | ## Deploy a model 16 | 17 | A model has been trained to predict whether someone has diabetes. To consume the model, you want to deploy it to a managed online endpoint. The endpoint can be called from an application where a patient's information can be entered, after which the model can decide whether the patient is likely to have diabetes. 18 | 19 | To deploy the model using the CLI (v2), you first create an endpoint. 20 | 21 | 1. Run the following command in the Cloud Shell to open the files of the cloned repo: 22 | 23 | ```azurecli 24 | code . 25 | ``` 26 | 27 | 1. Navigate to **mslearn-aml-cli/Allfiles/Labs/04/mlflow-endpoint** and open **create-endpoint.yml** by selecting the file. 28 | 1. Explore the contents of the file. Note that your endpoint will use key-based authentication. 29 | 1. Use the following command to create the new endpoint. Before you run the command, replace `<endpoint-name>` with a name that is unique in the Azure region: 30 | 31 | ```azurecli 32 | az ml online-endpoint create --name <endpoint-name> -f ./mslearn-aml-cli/Allfiles/Labs/04/mlflow-endpoint/create-endpoint.yml 33 | ``` 34 | 35 | 1. Next, you'll create the deployment. In the same folder **mslearn-aml-cli/Allfiles/Labs/04/mlflow-endpoint**, find and open the YAML configuration file **mlflow-deployment.yml** for the deployment. 36 | 1. The deployment configuration refers to the endpoint configuration. In addition, it specifies how the model should be registered, what kind of compute should be used for inferencing, and where it can find the model assets; a hypothetical sketch of such a file is shown below.
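A managed online deployment for an MLflow model generally follows this shape. The sketch is illustrative rather than the repo's actual file: the model name, instance type, and instance count are assumptions, and **mlflow-deployment.yml** may differ.

```yml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: mlflow-deployment
endpoint_name: <endpoint-name>   # must match the endpoint created earlier
model:
  name: mlflow-diabetes-model    # assumed registration name
  path: model                    # local folder holding the MLflow model assets
  type: mlflow_model
instance_type: Standard_F2s_v2   # assumed compute SKU for inferencing
instance_count: 1
```

Because the model is deployed as an MLflow model (`type: mlflow_model`), Azure Machine Learning can generate the scoring script and environment for you, which is why the sketch doesn't specify any scoring code.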
The MLflow model assets are stored in the **model** folder. 37 | 1. To deploy the model, run the following command. Before you run the command, replace `<endpoint-name>` with the name you previously created and choose a new `<deployment-name>`: 38 | 39 | ```azurecli 40 | az ml online-deployment create --name <deployment-name> --endpoint <endpoint-name> -f ./mslearn-aml-cli/Allfiles/Labs/04/mlflow-endpoint/mlflow-deployment.yml --all-traffic 41 | ``` 42 | 43 | 1. Deployment may take some time, and progress will be visible in the Azure Cloud Shell. You can also view the endpoint in the Azure Machine Learning Studio, in the **Endpoints** tab, under **Real-time endpoints**. 44 | 45 | ## Test the endpoint 46 | 47 | Once deployment is completed, you can test and consume the endpoint. Let's try testing it with two data points. 48 | 49 | 1. In the **mslearn-aml-cli/Allfiles/Labs/04/mlflow-endpoint** folder, you can find **sample-data.json**, which contains two data points. 50 | 1. Run the following command to invoke the endpoint to predict, for these two patients, whether they have diabetes. Replace `<endpoint-name>` with the name you previously created before you run the command: 51 | 52 | ```azurecli 53 | az ml online-endpoint invoke --name <endpoint-name> --request-file ./mslearn-aml-cli/Allfiles/Labs/04/mlflow-endpoint/sample-data.json 54 | ``` 55 | 56 | 1. As a result, you will see either a 1 or a 0 for each data point. A 1 means the patient is likely to have diabetes; a 0 means the patient is likely not to have diabetes. 57 | 1. Feel free to play around with the sample data and run the command again to see different results! 58 | 59 | ## Clean up resources 60 | 61 | When you're finished exploring Azure Machine Learning, delete your endpoint to avoid unnecessary charges in your Azure subscription. 62 | 63 | You can delete an endpoint and all underlying deployments by using the following command. Remember to replace `<endpoint-name>` with the name you previously created before you run the command: 64 | 65 | ```azurecli 66 | az ml online-endpoint delete --name <endpoint-name> --yes --no-wait 67 | ``` 68 | 69 | To delete the Azure Machine Learning workspace, you can use the following command in the CLI: 70 | 71 | ```azurecli 72 | az ml workspace delete 73 | ``` 74 | -------------------------------------------------------------------------------- /Instructions/Labs/06-create-pipeline.md: -------------------------------------------------------------------------------- 1 | --- 2 | lab: 3 | title: 'Lab: Run a pipeline with components' 4 | module: 'Module: Run component-based pipelines in Azure Machine Learning with CLI (v2)' 5 | --- 6 | 7 | # Run a pipeline with components 8 | 9 | In this exercise, you will build a pipeline with components. The pipeline will be submitted with the CLI (v2). First, you'll run a pipeline. Next, you'll create components in the Azure Machine Learning workspace so that they can be reused. Finally, you'll create a pipeline with the Designer in the Azure Machine Learning Studio to experience how you can reuse components to create new pipelines. 10 | 11 | ## Prerequisites 12 | 13 | Before you continue, complete the [Create an Azure Machine Learning Workspace and assets with the CLI (v2)](01-create-workspace.md) lab to set up your Azure Machine Learning environment. 14 | 15 | You'll run all commands in this lab from the Azure Cloud Shell. 16 | 17 | 1. Open the Cloud Shell by navigating to [http://shell.azure.com](https://shell.azure.com/?azure-portal=true) and signing in with your Microsoft account. 18 | 1. 
The repo [https://github.com/MicrosoftLearning/mslearn-aml-cli](https://github.com/MicrosoftLearning/mslearn-aml-cli) should be cloned. You can explore the repo and its contents by using the `code .` command in the Cloud Shell. 19 | 1. If your compute instance is stopped, start it again by using the following command. Change `<your-compute-instance-name>` to the name of your compute instance before running the code: 20 | ```azurecli 21 | az ml compute start --name "<your-compute-instance-name>" 22 | ``` 23 | 24 | ## Run a pipeline 25 | 26 | You can train a model by running a job that refers to one training script. To train a model as part of a pipeline, you can use Azure Machine Learning to run multiple scripts. The configuration of the pipeline is defined in a YAML file. 27 | 28 | In this exercise, you'll start by preprocessing the data and training a Decision Tree model. To explore the pipeline job definition **job.yml**, navigate to **mslearn-aml-cli/Allfiles/Labs/05/job.yml**. The dataset used is the **diabetes-data** dataset registered to the Azure Machine Learning workspace in the set-up. 29 | 30 | 1. Run the following command in the Cloud Shell to open the files of the cloned repo: 31 | 32 | ```azurecli 33 | code . 34 | ``` 35 | 36 | 2. Navigate to **mslearn-aml-cli/Allfiles/Labs/05/** and open **job.yml** by selecting the file. 37 | 3. Change the **compute** value: replace `<your-compute-instance-name>` with the name of your compute instance. 38 | 4. Run the job by using the following command: 39 | 40 | ```azurecli 41 | az ml job create --file ./mslearn-aml-cli/Allfiles/Labs/05/job.yml 42 | ``` 43 | 44 | 5. Open another tab in your browser and open the Azure Machine Learning Studio. Go to the **Jobs** page and locate the **diabetes-pipeline-example** experiment. Open the run to monitor the job. Refresh the view if necessary. Once completed, you can explore the details of the job and of each component by expanding the **Child runs**. 45 | 46 | ## Create components 47 | 48 | To reuse the pipeline's components, you can create the components in the Azure Machine Learning workspace. In addition to the components that were part of the pipeline you've just run, you'll create a new component you haven't used before. You'll use the new component in the next part. 49 | 50 | 1. Each component is created separately. Run the following code to create the components: 51 | 52 | ```azurecli 53 | az ml component create --file ./mslearn-aml-cli/Allfiles/Labs/05/summary-stats.yml 54 | az ml component create --file ./mslearn-aml-cli/Allfiles/Labs/05/fix-missing-data.yml 55 | az ml component create --file ./mslearn-aml-cli/Allfiles/Labs/05/normalize-data.yml 56 | az ml component create --file ./mslearn-aml-cli/Allfiles/Labs/05/train-decision-tree.yml 57 | az ml component create --file ./mslearn-aml-cli/Allfiles/Labs/05/train-logistic-regression.yml 58 | ``` 59 | 60 | 2. Navigate to the **Components** page in the Azure Machine Learning Studio. All created components should appear in the list there. 61 | 62 | ## Create a new pipeline with the Designer 63 | 64 | You can reuse the components by creating a pipeline with the Designer. You can recreate the same pipeline, or change the training algorithm by replacing the component used to train the model. 65 | 66 | 1. Navigate to the **Designer** page in the Azure Machine Learning Studio. 67 | 2. Select the **Custom** tab at the top of the page. 68 | 3. Create a new empty pipeline using custom components. 69 | 4. Rename the pipeline to *Train-Diabetes-Classifier*. 70 | 5. In the left menu, select the **Data** tab.
71 | 6. Drag and drop the **diabetes-data** component onto the canvas. 72 | 7. In the left menu, select the **Component** tab. 73 | 8. Drag and drop the **Remove Empty Rows** component onto the canvas, below **diabetes-data**. Connect the output of the data to the input of the new component. 74 | 9. Drag and drop the **Normalize numerical columns** component onto the canvas, below **Remove empty rows**. Connect the output of the previous component to the input of the new component. 75 | 10. Drag and drop the **Train a Decision Tree Classifier Model** component onto the canvas, below **Normalize numerical columns**. Connect the output of the previous component to the input of the new component. Your pipeline should look like this: 76 | ![Decision Tree Pipeline in Designer](media/designer-pipeline-decision.png) 77 | 78 | 11. Select **Configure & Submit** to set up the pipeline job. 79 | 12. On the **Basics** page, create a new experiment, name it *diabetes-designer-pipeline*, and select **Next**. 80 | 13. On the **Inputs & outputs** page, select **Next**. 81 | 14. On the **Runtime settings** page, set the default compute target to the compute instance you created and select **Next**. 82 | 15. On the **Review + Submit** page, select **Submit** and wait for the job to complete. 83 | 84 | ## Update the pipeline with the Designer 85 | 86 | You have now trained the model with a pipeline similar to the one before (only omitting the calculation of the summary statistics). You can change the algorithm you use to train the model by replacing the last component: 87 | 88 | 1. Remove the **Train a Decision Tree Classifier Model** component from the pipeline. 89 | 2. Drag and drop the **Train a Logistic Regression Classifier Model** component onto the canvas, below **Normalize numerical columns**. Connect the output of the previous component to the input of the new component. 90 | 91 | The new model training component expects a numeric input, namely the regularization rate. 92 | 93 | 3. Select the **Train a Logistic Regression Model** component and enter **1** for the **regularization_rate**. Your pipeline should look like this: 94 | ![Logistic Regression Pipeline in Designer](media/designer-pipeline-regression.png) 95 | 4. Submit the pipeline. Select the existing experiment named *diabetes-designer-pipeline*. Once completed, you can review the metrics and compare them with those of the previous pipeline to see whether the model's performance has improved. 96 | 97 | ## Clean up resources 98 | 99 | When you're finished exploring Azure Machine Learning, shut down the compute instance to avoid unnecessary charges in your Azure subscription. 100 | 101 | You can stop a compute instance with the following command. Change `"testdev-vm"` to the name of your compute instance if necessary. 102 | 103 | ```azurecli 104 | az ml compute stop --name "testdev-vm" --no-wait 105 | ``` 106 | 107 | > **Note:** Stopping your compute ensures your subscription won't be charged for compute resources. You will, however, be charged a small amount for data storage as long as the Azure Machine Learning workspace exists in your subscription. If you have finished exploring Azure Machine Learning, you can delete the Azure Machine Learning workspace and associated resources. However, if you plan to complete any other labs in this series, you will need to repeat the set-up to create the workspace and prepare the environment first.
108 | 109 | To completely delete the Azure Machine Learning workspace, you can use the following command in the CLI: 110 | 111 | ```azurecli 112 | az ml workspace delete 113 | ``` 114 | -------------------------------------------------------------------------------- /Instructions/Labs/media/designer-pipeline-decision.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MicrosoftLearning/mslearn-aml-cli/191e8804f8005b2cad8eb4230ffb7cf5ada14068/Instructions/Labs/media/designer-pipeline-decision.png -------------------------------------------------------------------------------- /Instructions/Labs/media/designer-pipeline-regression.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MicrosoftLearning/mslearn-aml-cli/191e8804f8005b2cad8eb4230ffb7cf5ada14068/Instructions/Labs/media/designer-pipeline-regression.png -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Sidney Andrews 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /_build.yml: -------------------------------------------------------------------------------- 1 | name: '$(Date:yyyyMMdd)$(Rev:.rr)' 2 | jobs: 3 | - job: build_markdown_content 4 | displayName: 'Build Markdown Content' 5 | workspace: 6 | clean: all 7 | pool: 8 | vmImage: 'Ubuntu 16.04' 9 | container: 10 | image: 'microsoftlearning/markdown-build:latest' 11 | steps: 12 | - task: Bash@3 13 | displayName: 'Build Content' 14 | inputs: 15 | targetType: inline 16 | script: | 17 | cp /{attribution.md,template.docx,package.json,package.js} . 
18 | npm install 19 | node package.js --version $(Build.BuildNumber) 20 | - task: GitHubRelease@0 21 | displayName: 'Create GitHub Release' 22 | inputs: 23 | gitHubConnection: 'github-microsoftlearning-organization' 24 | repositoryName: '$(Build.Repository.Name)' 25 | tagSource: manual 26 | tag: 'v$(Build.BuildNumber)' 27 | title: 'Version $(Build.BuildNumber)' 28 | releaseNotesSource: input 29 | releaseNotes: '# Version $(Build.BuildNumber) Release' 30 | assets: '$(Build.SourcesDirectory)/out/*.zip' 31 | assetUploadMode: replace 32 | - task: PublishBuildArtifacts@1 33 | displayName: 'Publish Output Files' 34 | inputs: 35 | pathtoPublish: '$(Build.SourcesDirectory)/out/' 36 | artifactName: 'Lab Files' 37 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | remote_theme: MicrosoftLearning/Jekyll-Theme 2 | exclude: 3 | - readme.md 4 | - .github/ 5 | header_pages: 6 | - index.html 7 | author: Microsoft Learning 8 | twitter_username: mslearning 9 | github_username: MicrosoftLearning 10 | plugins: 11 | - jekyll-sitemap 12 | - jekyll-mentions 13 | - jemoji 14 | markdown: kramdown 15 | kramdown: 16 | syntax_highlighter_opts: 17 | disable : true 18 | -------------------------------------------------------------------------------- /index.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Online Hosted Instructions 3 | permalink: index.html 4 | layout: home 5 | --- 6 | 7 | # Content Directory 8 | 9 | Hyperlinks to each of the lab exercises for the Learn modules are listed below. 10 | 11 | ## Labs 12 | 13 | {% assign labs = site.pages | where_exp:"page", "page.url contains '/Instructions/Labs'" %} 14 | | Module | Lab | 15 | | --- | --- | 16 | {% for activity in labs %}| {{ activity.lab.module }} | [{{ activity.lab.title }}{% if activity.lab.type %} - {{ activity.lab.type }}{% endif %}]({{ site.github.url }}{{ activity.url }}) | 17 | {% endfor %} 18 | -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | # Train models in Azure Machine Learning with the CLI (v2) 2 | 3 | This repository contains the hands-on lab exercises for the Microsoft Learning Path Train models in Azure Machine Learning with the CLI (v2). The Learning Path consists of self-paced modules on Microsoft Learn. The labs are designed to accompany the learning materials and enable you to practice using the technologies described in them. 4 | 5 | You can view the instructions for the lab exercises at https://aka.ms/aml-cli2. 6 | 7 | ## What are we doing? 8 | 9 | - To support this course, we will need to make frequent updates to the course content to keep it current with the Azure services used in the course. We are publishing the lab instructions and lab files on GitHub to allow for open contributions between the course authors and MCTs to keep the content current with changes in the Azure platform. 10 | 11 | - We hope that this brings a sense of collaboration to the labs like we've never had before - when Azure changes and you find it first during a live delivery, go ahead and make an enhancement right in the lab source. 12 | 13 | ## How do I contribute? 14 | 15 | - Anyone can submit a pull request to the code or content in the GitHub repo; Microsoft and the course author will triage and include content and lab code changes as needed. 
16 | 17 | - You can submit bugs, changes, improvements, and ideas. Found a new Azure feature before we have? Submit a new demo! 18 | --------------------------------------------------------------------------------