├── .github
│   ├── ISSUE_TEMPLATE.md
│   └── PULL_REQUEST_TEMPLATE.md
├── .gitignore
├── CHANGELOG.md
├── CONTRIBUTING.md
├── HttpTrigger
│   ├── __init__.py
│   ├── function.json
│   ├── host.json
│   └── project
│       ├── .gitignore
│       └── pytorch_train.py
├── LICENSE.md
├── README.md
├── azuredeploy.json
├── host.json
├── images
│   ├── arch_diagram.png
│   ├── data_file_structure.png
│   └── testing_locally.png
├── local.settings.json
└── requirements.txt
/.github/ISSUE_TEMPLATE.md:
--------------------------------------------------------------------------------
1 |
4 | > Please provide us with the following information:
5 | > ---------------------------------------------------------------
6 |
7 | ### This issue is for a: (mark with an `x`)
8 | ```
9 | - [ ] bug report -> please search issues before submitting
10 | - [ ] feature request
11 | - [ ] documentation issue or request
12 | - [ ] regression (a behavior that used to work and stopped in a new release)
13 | ```
14 |
15 | ### Minimal steps to reproduce
16 | >
17 |
18 | ### Any log messages given by the failure
19 | >
20 |
21 | ### Expected/desired behavior
22 | >
23 |
24 | ### OS and Version?
25 | > Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)
26 |
27 | ### Versions
28 | >
29 |
30 | ### Mention any other details that might be useful
31 |
32 | > ---------------------------------------------------------------
33 | > Thanks! We'll be in touch soon.
34 |
--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | ## Purpose
2 |
3 | * ...
4 |
5 | ## Does this introduce a breaking change?
6 |
7 | ```
8 | [ ] Yes
9 | [ ] No
10 | ```
11 |
12 | ## Pull Request Type
13 | What kind of change does this Pull Request introduce?
14 |
15 |
16 | ```
17 | [ ] Bugfix
18 | [ ] Feature
19 | [ ] Code style update (formatting, local variables)
20 | [ ] Refactoring (no functional changes, no api changes)
21 | [ ] Documentation content changes
22 | [ ] Other... Please describe:
23 | ```
24 |
25 | ## How to Test
26 | * Get the code
27 |
28 | ```
29 | git clone [repo-address]
30 | cd [repo-name]
31 | git checkout [branch-name]
32 | npm install
33 | ```
34 |
35 | * Test the code
36 |
37 | ```
38 | ```
39 |
40 | ## What to Check
41 | Verify that the following are valid
42 | * ...
43 |
44 | ## Other Information
45 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | .vars
3 | bin
4 | obj
5 | csx
6 | .vs
7 | edge
8 | Publish
9 | config.json
10 | data
11 | extensions.*
12 | .vscode
13 |
14 | *.user
15 | *.suo
16 | *.cscfg
17 | *.Cache
18 | project.lock.json
19 |
20 | /packages
21 | /TestResults
22 |
23 | /tools/NuGet.exe
24 | /App_Data
25 | /secrets
26 | /data
27 | .secrets
28 | appsettings.json
29 |
30 | node_modules
31 |
32 | # Local python packages
33 | .python_packages/
34 |
35 | # Python Environments
36 | .env
37 | .venv
38 | env/
39 | venv/
40 | ENV/
41 | env.bak/
42 | venv.bak/
43 |
44 | # Byte-compiled / optimized / DLL files
45 | __pycache__/
46 | *.py[cod]
47 | *$py.class
48 |
--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
1 | ## [project-title] Changelog
2 |
3 |
4 | # x.y.z (yyyy-mm-dd)
5 |
6 | *Features*
7 | * ...
8 |
9 | *Bug Fixes*
10 | * ...
11 |
12 | *Breaking Changes*
13 | * ...
14 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing to [project-title]
2 |
3 | This project welcomes contributions and suggestions. Most contributions require you to agree to a
4 | Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
5 | the rights to use your contribution. For details, visit https://cla.microsoft.com.
6 |
7 | When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
8 | a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions
9 | provided by the bot. You will only need to do this once across all repos using our CLA.
10 |
11 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
12 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
13 | contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
14 |
15 | - [Code of Conduct](#coc)
16 | - [Issues and Bugs](#issue)
17 | - [Feature Requests](#feature)
18 | - [Submission Guidelines](#submit)
19 |
20 | ## Code of Conduct
21 | Help us keep this project open and inclusive. Please read and follow our [Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
22 |
23 | ## Found an Issue?
24 | If you find a bug in the source code or a mistake in the documentation, you can help us by
25 | [submitting an issue](#submit-issue) to the GitHub Repository. Even better, you can
26 | [submit a Pull Request](#submit-pr) with a fix.
27 |
28 | ## Want a Feature?
29 | You can *request* a new feature by [submitting an issue](#submit-issue) to the GitHub
30 | Repository. If you would like to *implement* a new feature, please submit an issue with
31 | a proposal for your work first, to be sure that we can use it.
32 |
33 | * **Small Features** can be crafted and directly [submitted as a Pull Request](#submit-pr).
34 |
35 | ## Submission Guidelines
36 |
37 | ### Submitting an Issue
38 | Before you submit an issue, search the archive; your question may already have been answered.
39 |
40 | If your issue appears to be a bug, and hasn't been reported, open a new issue.
41 | Help us to maximize the effort we can spend fixing issues and adding new
42 | features, by not reporting duplicate issues. Providing the following information will increase the
43 | chances of your issue being dealt with quickly:
44 |
45 | * **Overview of the Issue** - if an error is being thrown a non-minified stack trace helps
46 | * **Version** - what version is affected (e.g. 0.1.2)
47 | * **Motivation for or Use Case** - explain what you are trying to do and why the current behavior is a bug for you
48 | * **Browsers and Operating System** - is this a problem with all browsers?
49 | * **Reproduce the Error** - provide a live example or an unambiguous set of steps
50 | * **Related Issues** - has a similar issue been reported before?
51 | * **Suggest a Fix** - if you can't fix the bug yourself, perhaps you can point to what might be
52 | causing the problem (line of code or commit)
53 |
54 | You can file new issues by providing the above information at the corresponding repository's issues link: https://github.com/[organization-name]/[repository-name]/issues/new.
55 |
56 | ### Submitting a Pull Request (PR)
57 | Before you submit your Pull Request (PR) consider the following guidelines:
58 |
59 | * Search the repository (https://github.com/[organization-name]/[repository-name]/pulls) for an open or closed PR
60 | that relates to your submission. You don't want to duplicate effort.
61 |
62 | * Make your changes in a new git fork:
63 |
64 | * Commit your changes using a descriptive commit message
65 | * Push your fork to GitHub:
66 | * In GitHub, create a pull request
67 | * If we suggest changes then:
68 | * Make the required updates.
69 | * Rebase your fork and force push to your GitHub repository (this will update your Pull Request):
70 |
71 | ```shell
72 | git rebase master -i
73 | git push -f
74 | ```
75 |
76 | That's it! Thank you for your contribution!
77 |
--------------------------------------------------------------------------------
/HttpTrigger/__init__.py:
--------------------------------------------------------------------------------
1 | import logging
2 | import azure.functions as func
3 | from azureml.core.authentication import ServicePrincipalAuthentication
4 | from azureml.core import Workspace, Experiment, Datastore
5 | from azureml.exceptions import ProjectSystemException
6 | from azureml.core.compute import ComputeTarget, AmlCompute
7 | from azureml.core.compute_target import ComputeTargetException
8 | from azureml.train.dnn import PyTorch
9 | import shutil
10 | import os
11 | import json
12 | import time
13 |
14 |
15 | def main(req: func.HttpRequest) -> (func.HttpResponse):
16 | logging.info('Python HTTP trigger function processed a request.')
17 |
18 | # For now this can be a POST where we have /api/HttpTrigger?start=
19 | image_url = req.params.get('start')
20 | logging.info(type(image_url))
21 |
22 | # Use service principal secrets to create authentication vehicle and
23 | # define workspace object
24 | try:
25 | svc_pr = ServicePrincipalAuthentication(
26 | tenant_id=os.getenv('TENANT_ID', ''),
27 | service_principal_id=os.getenv('APP_ID', ''),
28 | service_principal_password=os.getenv('PRINCIPAL_PASSWORD', ''))
29 |
30 | ws = Workspace(subscription_id=os.getenv('AZURE_SUB', ''),
31 | resource_group=os.getenv('RESOURCE_GROUP', ''),
32 | workspace_name=os.getenv('WORKSPACE_NAME',''),
33 | auth=svc_pr)
34 | print("Found workspace {} at location {} using service principal "
35 | "authentication".format(ws.name, ws.location))
36 | # Usually because authentication didn't work
37 | except ProjectSystemException as err:
38 | print('Authentication did not work.')
39 | return json.dumps('ProjectSystemException')
40 | # Need to create the workspace
41 | except Exception as err:
42 | ws = Workspace.create(name=os.getenv('WORKSPACE_NAME', ''),
43 | subscription_id=os.getenv('AZURE_SUB', ''),
44 | resource_group=os.getenv('RESOURCE_GROUP', ''),
45 | create_resource_group=True,
46 | location='westus', # Or other supported Azure region
47 | auth=svc_pr)
48 | print("Created workspace {} at location {}".format(ws.name, ws.location))
49 |
50 |
51 |
52 | # choose a name for your cluster - under 16 characters
53 | cluster_name = "gpuforpytorch"
54 |
55 | try:
56 | compute_target = ComputeTarget(workspace=ws, name=cluster_name)
57 | print('Found existing compute target.')
58 | except ComputeTargetException:
59 | print('Creating a new compute target...')
60 | # AML Compute config - if max_nodes are set, it becomes persistent storage that scales
61 | compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
62 | min_nodes=0,
63 | max_nodes=2)
64 | # create the cluster
65 | compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
66 | compute_target.wait_for_completion(show_output=True)
67 |
68 | # use get_status() to get a detailed status for the current cluster.
69 | # print(compute_target.get_status().serialize())
70 |
71 | # # Create a project directory and copy training script to it
72 | project_folder = os.path.join(os.getcwd(), 'HttpTrigger', 'project')
73 | # os.makedirs(project_folder, exist_ok=True)
74 | # shutil.copy(os.path.join(os.getcwd(), 'HttpTrigger', 'pytorch_train.py'), project_folder)
75 |
76 | # Create an experiment
77 | experiment_name = 'fish-no-fish'
78 | experiment = Experiment(ws, name=experiment_name)
79 |
80 | # Use an AML Data Store for training data
81 | ds = Datastore.register_azure_blob_container(workspace=ws,
82 | datastore_name='funcdefaultdatastore',
83 | container_name=os.getenv('STORAGE_CONTAINER_NAME_TRAINDATA', ''),
84 | account_name=os.getenv('STORAGE_ACCOUNT_NAME', ''),
85 | account_key=os.getenv('STORAGE_ACCOUNT_KEY', ''),
86 | create_if_not_exists=True)
87 |
88 | # Use an AML Data Store to save models back up to
89 | ds_models = Datastore.register_azure_blob_container(workspace=ws,
90 | datastore_name='modelsdatastorage',
91 | container_name=os.getenv('STORAGE_CONTAINER_NAME_MODELS', ''),
92 | account_name=os.getenv('STORAGE_ACCOUNT_NAME', ''),
93 | account_key=os.getenv('STORAGE_ACCOUNT_KEY', ''),
94 | create_if_not_exists=True)
95 |
96 | # Set up for training ("trans" flag means - use transfer learning and
97 | # this should download a model on compute)
98 | # Using /tmp to store model and info due to the fact that
99 | # creating new folders and files on the Azure Function host
100 | # will trigger the function to restart.
101 | script_params = {
102 | '--data_dir': ds.as_mount(),
103 | '--num_epochs': 30,
104 | '--learning_rate': 0.01,
105 | '--output_dir': '/tmp/outputs',
106 | '--trans': 'True'
107 | }
108 |
109 | # Instantiate PyTorch estimator with upload of final model to
110 | # a specified blob storage container (this can be anything)
111 | estimator = PyTorch(source_directory=project_folder,
112 | script_params=script_params,
113 | compute_target=compute_target,
114 | entry_script='pytorch_train.py',
115 | use_gpu=True,
116 | inputs=[ds_models.as_upload(path_on_compute='./outputs/model_finetuned.pth')])
117 |
118 | run = experiment.submit(estimator)
119 | print(run.get_details())
120 |
121 | # # The following would certainly be blocking, but that's ok for debugging
122 | # while run.get_status() not in ['Completed', 'Failed']: # For example purposes only, not exhaustive
123 | # print('Run {} not in terminal state'.format(run.id))
124 | # time.sleep(10)
125 |
126 | return json.dumps(run.get_status())
127 |
--------------------------------------------------------------------------------
/HttpTrigger/function.json:
--------------------------------------------------------------------------------
1 | {
2 | "scriptFile": "__init__.py",
3 | "bindings": [
4 | {
5 | "authLevel": "anonymous",
6 | "type": "httpTrigger",
7 | "direction": "in",
8 | "name": "req",
9 | "methods": [
10 | "get",
11 | "post"
12 | ]
13 | },
14 | {
15 | "type": "http",
16 | "direction": "out",
17 | "name": "$return"
18 | }
19 | ]
20 | }
--------------------------------------------------------------------------------
/HttpTrigger/host.json:
--------------------------------------------------------------------------------
1 | {
2 | "version": "2.0"
3 | }
--------------------------------------------------------------------------------
/HttpTrigger/project/.gitignore:
--------------------------------------------------------------------------------
1 | assets
2 | aml_config
3 | .amlignore
--------------------------------------------------------------------------------
/HttpTrigger/project/pytorch_train.py:
--------------------------------------------------------------------------------
1 | """
2 | Based on: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
3 |
4 | In this tutorial, you will learn how to train your network using
5 | transfer learning. You can read more about transfer learning in the
6 | cs231n notes.
7 |
8 | Quoting these notes,
9 |
10 | In practice, very few people train an entire Convolutional Network
11 | from scratch (with random initialization), because it is relatively
12 | rare to have a dataset of sufficient size. Instead, it is common to
13 | pretrain a ConvNet on a very large dataset (e.g. ImageNet, which
14 | contains 1.2 million images with 1000 categories), and then use the
15 | ConvNet either as an initialization or a fixed feature extractor for
16 | the task of interest.
17 |
18 | These two major transfer learning scenarios look as follows:
19 |
20 | - **Finetuning the convnet**: Instead of random initialization, we
21 | initialize the network with a pretrained network, like the one that is
22 | trained on imagenet 1000 dataset. Rest of the training looks as
23 | usual.
24 | - **ConvNet as fixed feature extractor**: Here, we will freeze the weights
25 | for all of the network except that of the final fully connected
26 | layer. This last fully connected layer is replaced with a new one
27 | with random weights and only this layer is trained.
28 |
29 | **Original Author**: Sasank Chilamkurthy
30 | """
31 |
32 | from __future__ import print_function, division
33 | import torch
34 | import torch.nn as nn
35 | import torch.optim as optim
36 | from torch.optim import lr_scheduler
37 | from torchvision import datasets, models, transforms
38 | import numpy as np
39 | import time
40 | import os
41 | import copy
42 | import argparse
43 |
44 |
45 | from azureml.core.run import Run
46 | from azureml.core import Datastore
47 |
48 | # get the Azure ML run object
49 | run = Run.get_context()
50 |
51 | def load_data(data_dir):
52 | """Load the train/val data."""
53 |
54 | # Data augmentation and normalization for training
55 | # Just normalization for validation
56 | data_transforms = {
57 | 'train': transforms.Compose([
58 | transforms.RandomResizedCrop(224),
59 | transforms.RandomHorizontalFlip(),
60 | transforms.ToTensor(),
61 | transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
62 | ]),
63 | 'val': transforms.Compose([
64 | transforms.Resize(256),
65 | transforms.CenterCrop(224),
66 | transforms.ToTensor(),
67 | transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
68 | ]),
69 | }
70 |
71 | image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, 'data', x),
72 | data_transforms[x])
73 | for x in ['train', 'val']}
74 | dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
75 | shuffle=True, num_workers=4)
76 | for x in ['train', 'val']}
77 | dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
78 | class_names = image_datasets['train'].classes
79 |
80 | return dataloaders, dataset_sizes, class_names
81 |
82 |
83 | def train_model(model, criterion, optimizer, scheduler, num_epochs, data_dir):
84 | """Train the model."""
85 |
86 | # load training/validation data
87 | dataloaders, dataset_sizes, class_names = load_data(data_dir)
88 |
89 | device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
90 |
91 | since = time.time()
92 |
93 | best_model_wts = copy.deepcopy(model.state_dict())
94 | best_acc = 0.0
95 |
96 | for epoch in range(num_epochs):
97 | print('Epoch {}/{}'.format(epoch, num_epochs - 1))
98 | print('-' * 10)
99 |
100 | # Each epoch has a training and validation phase
101 | for phase in ['train', 'val']:
102 | if phase == 'train':
103 | scheduler.step()
104 | model.train() # Set model to training mode
105 | else:
106 | model.eval() # Set model to evaluate mode
107 |
108 | running_loss = 0.0
109 | running_corrects = 0
110 |
111 | # Iterate over data.
112 | for inputs, labels in dataloaders[phase]:
113 | inputs = inputs.to(device)
114 | labels = labels.to(device)
115 |
116 | # zero the parameter gradients
117 | optimizer.zero_grad()
118 |
119 | # forward
120 | # track history if only in train
121 | with torch.set_grad_enabled(phase == 'train'):
122 | outputs = model(inputs)
123 | _, preds = torch.max(outputs, 1)
124 | loss = criterion(outputs, labels)
125 |
126 | # backward + optimize only if in training phase
127 | if phase == 'train':
128 | loss.backward()
129 | optimizer.step()
130 |
131 | # statistics
132 | running_loss += loss.item() * inputs.size(0)
133 | running_corrects += torch.sum(preds == labels.data)
134 |
135 | epoch_loss = running_loss / dataset_sizes[phase]
136 | epoch_acc = running_corrects.double() / dataset_sizes[phase]
137 |
138 | print('{} Loss: {:.4f} Acc: {:.4f}'.format(
139 | phase, epoch_loss, epoch_acc))
140 |
141 | # deep copy the model
142 | if phase == 'val' and epoch_acc > best_acc:
143 | best_acc = epoch_acc
144 | best_model_wts = copy.deepcopy(model.state_dict())
145 |
146 | # log the best val accuracy to AML run
147 | run.log('best_val_acc', float(best_acc))  # np.float was removed in NumPy 1.24
148 |
149 | print()
150 |
151 | time_elapsed = time.time() - since
152 | print('Training complete in {:.0f}m {:.0f}s'.format(
153 | time_elapsed // 60, time_elapsed % 60))
154 | print('Best val Acc: {:4f}'.format(best_acc))
155 |
156 | # load best model weights
157 | model.load_state_dict(best_model_wts)
158 | return model
159 |
160 |
161 | def fine_tune_model(num_epochs, data_dir, learning_rate, momentum, transfer_learn):
162 | """Load a pretrained model and reset the final fully connected layer."""
163 |
164 | # log the hyperparameter metrics to the AML run
165 | run.log('lr', float(learning_rate))  # np.float was removed in NumPy 1.24
166 | run.log('momentum', float(momentum))
167 |
168 | model_ft = models.resnet18(pretrained=transfer_learn)
169 | num_ftrs = model_ft.fc.in_features
170 | model_ft.fc = nn.Linear(num_ftrs, 2) # only 2 classes to predict
171 |
172 | device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
173 | model_ft = model_ft.to(device)
174 |
175 | criterion = nn.CrossEntropyLoss()
176 |
177 | # Observe that all parameters are being optimized
178 | optimizer_ft = optim.SGD(model_ft.parameters(),
179 | lr=learning_rate, momentum=momentum)
180 |
181 | # Decay LR by a factor of 0.1 every 7 epochs
182 | exp_lr_scheduler = lr_scheduler.StepLR(
183 | optimizer_ft, step_size=7, gamma=0.1)
184 |
185 | model = train_model(model_ft, criterion, optimizer_ft,
186 | exp_lr_scheduler, num_epochs, data_dir)
187 |
188 | # Complete the run
189 | run.complete()
190 |
191 | return model
192 |
193 | def main():
194 | print("PyTorch version:", torch.__version__)
195 |
196 | # get command-line arguments
197 | parser = argparse.ArgumentParser()
198 | parser.add_argument('--data_dir', type=str, help='data directory',
199 | default='data')
200 | parser.add_argument('--num_epochs', type=int, default=25,
201 | help='number of epochs to train')
202 | parser.add_argument('--output_dir', type=str, help='output directory',
203 | default='models')
204 | parser.add_argument('--learning_rate', type=float,
205 | default=0.001, help='learning rate')
206 | parser.add_argument('--trans', type=str, default='True',
207 | help='Set to True if wishing to use transfer learning')
208 | parser.add_argument('--momentum', type=float, default=0.9, help='momentum')
209 | args = parser.parse_args()
210 |
211 | print("data directory is: {}".format(args.data_dir))
212 | print("using transfer learning: {}".format(args.trans))
213 | model = fine_tune_model(args.num_epochs, args.data_dir, args.learning_rate,
214 | args.momentum, args.trans.strip().lower() == 'true')
215 | os.makedirs(args.output_dir, exist_ok=True)
216 | torch.save(model, os.path.join(args.output_dir, 'model_finetuned.pth'))
217 |
218 | if __name__ == "__main__":
219 | main()
--------------------------------------------------------------------------------
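The `--trans` argument above is parsed as a string, and in Python `bool('False')` is `True` because any non-empty string is truthy. One way to handle this, shown here as a minimal sketch rather than the sample's own approach, is an explicit converter passed to `argparse`. The helper name `str2bool` and the accepted spellings are assumptions made for illustration.

```python
import argparse


def str2bool(value):
    """Convert common true/false spellings to a bool for argparse."""
    text = str(value).strip().lower()
    if text in ("true", "t", "yes", "y", "1"):
        return True
    if text in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got {!r}".format(value))


parser = argparse.ArgumentParser()
# Hypothetical variant of the --trans flag that yields a real bool.
parser.add_argument("--trans", type=str2bool, default=True,
                    help="Set to True if wishing to use transfer learning")

print(parser.parse_args(["--trans", "False"]).trans)
```

With a converter like this, a value such as `'False'` sent by the caller disables the pretrained weights instead of silently enabling them.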
/LICENSE.md:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) Microsoft Corporation. All rights reserved.
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ---
2 | page_type: sample
3 | languages:
4 | - python
5 | products:
6 | - azure
7 | - azure-functions
8 | description: "Communicate the process of training a model using a Python-based Azure Function and the Azure ML Python SDK."
9 | urlFragment: training-a-model-with-azureml-and-azure-functions
10 | ---
11 |
12 | # Training a Model with AzureML and Azure Functions
13 |
14 | Automating the training of a new ML model when code is updated or when a data scientist provides new labeled data can be a challenge within a DevOps or app development process, because data science work is largely manual. One solution is to pair a training script (written by the data scientist) with the Azure Machine Learning SDK for Python (Azure ML Python SDK), run from a lightweight Azure Function that sends the training script to larger compute (a process managed by the DevOps professional) to train an ML model automatically when new data appears. This, in turn, triggers the build and release of an intelligent application (managed by the app developer).
15 |
16 | The intent of this repository is to communicate the process of training a model using a Python-based Azure Function and the Azure ML Python SDK, as well as to provide a code sample for doing so. Training a model with the Azure ML Python SDK uses an Azure compute option (e.g. an N-Series AML Compute) - the model **_is not_** trained within the Azure Function Consumption Plan. Triggers for the Azure Function could be HTTP requests, Event Grid events or some other trigger.
17 |
18 | The motivation behind this process was to provide a way to automate ML model training/retraining in a lightweight, serverless fashion once new data, _and potentially_ labels, are provided and stored in Azure Storage Blob containers. This solution is especially attractive for people familiar with Azure Functions.
19 |
20 | The idea is that once new data is provided, it would signal training a new model via an Event Grid - note that the _training is actually done on a separate compute_, not in the Function. The Azure ML SDK can be set up to output a model to a user-specified Storage option, here shown as the "Trained Models" Blob Storage Container. Another lightweight Function, triggered by Event Grid, may be used to send an HTTP request to an Azure DevOps build Pipeline to create the final application.
21 |
22 | The following diagram represents an example process as part of a larger deployment. The end product or application could be an IoT Edge module, web service or any other application a DevOps build/release can produce.
23 |
24 | 
25 |
26 | ## Getting started
27 |
28 | The instructions below are an example - they follow [this Azure Docs tutorial](https://docs.microsoft.com/en-us/azure/azure-functions/functions-create-first-function-python), which should be referenced as needed.
29 |
30 | The commands are listed here for quick reference (but if errors are encountered, check the docs link above for troubleshooting, as it may have been updated).
31 |
32 | ## Prerequisites
33 |
34 | - Install Python 3.6
35 | - Install [Functions Core Tools](https://docs.microsoft.com/en-us/azure/azure-functions/functions-run-local#v2)
36 | - Install Docker
37 | - Note: If run on Windows you can use Ubuntu WSL to run the scripts
38 |
39 | ## Steps
40 |
41 | In summary, the steps include the following sections.
42 |
43 | - [Deploy locally](#deploy-locally)
44 | - [Set up a Service Principal for AzureML](#set-up-a-service-principal-for-azureml)
45 | - [Data setup](#data-setup)
46 | - [Test locally](#test-locally)
47 | - [Deploy function to Azure](#deploy-function-to-azure)
48 |
49 | ### Deploy locally
50 |
51 | #### Set up virtual environment
52 |
53 | Important notes:
54 |
55 | * It is good practice to create a fresh virtual environment for each function
56 | * Make sure `.env`, which holds the virtual environment, once created, resides in main folder (same place as `requirements.txt`)
57 | * Make sure to use the `pip` installer from the virtual environment
58 |
59 | **Create the virtual environment**
60 |
61 | The `venv` command is part of the Python standard library as of version 3.3. Python 3.6 is being used in this sample.
62 |
63 | ```
64 | python3.6 -m venv .env
65 | ```
66 |
67 | **Activate the virtual environment**
68 |
69 | On Windows, the command is:
70 |
71 | ```
72 | .env\Scripts\activate
73 |
74 | ```
75 |
76 | On Unix systems (including macOS), the command is:
77 | ```
78 | source .env/bin/activate
79 | ```
80 |
81 | **Install the required Python packages**
82 |
83 | Please check the `requirements.txt` file for versions of packages used.
84 |
85 | IMPORTANT NOTE: If a more recent version of a package is available it is ok to update after testing locally and in a staging environment.
86 |
87 | On Windows, the command to install packages is as follows.
88 |
89 | ```
90 | .env\Scripts\pip install -r requirements.txt
91 | ```
92 |
93 | On unix systems (including macOS), the command is as follows.
94 |
95 | ```
96 | .env/bin/pip install -r requirements.txt
97 | ```
98 |
99 | ### Set up a Service Principal for AzureML
100 |
101 | This will set up an Azure Active Directory registered application so that we can authenticate in Azure in code, as we will do for the AzureML workspace, without a requirement for interactive or CLI-based login.
102 |
103 | Follow the brief instructions under "Service Principal Authentication" in this doc for setting up the application that will allow authentication more easily.
104 |
105 | Take note of the variables mentioned in the doc for the next section.
106 |
107 | #### Set up environment variables
108 |
109 | This is so that AzureML, Service Principal and correct storage accounts may be accessed.
110 |
111 | **For local testing**
112 |
113 | On Unix systems, create a shell script `set_vars.sh` that sets the environment variables. Run it in the bash terminal (for example with `source set_vars.sh`, so that the variables persist in the current shell).
114 |
115 | ```
116 | export AZURE_SUB=
117 | export RESOURCE_GROUP=
118 | export WORKSPACE_NAME=
119 | export STORAGE_CONTAINER_NAME_TRAINDATA=
120 | export STORAGE_CONTAINER_NAME_MODELS=
121 | export STORAGE_ACCOUNT_NAME=
122 | export STORAGE_ACCOUNT_KEY=
123 | export TENANT_ID=
124 | export APP_ID=
125 | export PRINCIPAL_PASSWORD=
126 | ```
127 |
128 | On Windows create a script called `set_vars.cmd` and run it in the shell where the work is being done.
129 |
130 | ```
131 | set AZURE_SUB=
132 | set RESOURCE_GROUP=
133 | set WORKSPACE_NAME=
134 | set STORAGE_CONTAINER_NAME_TRAINDATA=
135 | set STORAGE_CONTAINER_NAME_MODELS=
136 | set STORAGE_ACCOUNT_NAME=
137 | set STORAGE_ACCOUNT_KEY=
138 | set TENANT_ID=
139 | set APP_ID=
140 | set PRINCIPAL_PASSWORD=
141 | ```
142 |
143 | Run the file for your platform in the shell/terminal window where the function will be started.
144 |
145 | Further descriptions of the environment variables are as follows.
146 |
147 | 1. `AZURE_SUB` - the Azure Subscription id
148 | 2. `RESOURCE_GROUP` - the resource group in which AzureML Workspace is found
149 | 3. `WORKSPACE_NAME` - the AzureML Workspace name (create this if it doesn't exist - [with code](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python) or [in Azure Portal](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started))
150 | 4. `STORAGE_CONTAINER_NAME_TRAINDATA` - the Blob Storage container name containing the training data (in this sample it was fish images - see [Data setup](#data-setup) below)
151 | 5. `STORAGE_CONTAINER_NAME_MODELS` - the specific Blob Storage container where the output model should go (this could be the same as the `STORAGE_CONTAINER_NAME_TRAINDATA`)
152 | 6. `STORAGE_ACCOUNT_NAME` - the Storage Account name for the training and model blobs
153 | 7. `STORAGE_ACCOUNT_KEY` - the Storage Account access key for the training and model blobs (Note: these must be in the same Storage Account)
154 | 8. `TENANT_ID` - the AAD tenant ID for the subscription, from the Service Principal step
155 | 9. `APP_ID` - the AAD registered application ID from the Service Principal step
156 | 10. `PRINCIPAL_PASSWORD` - the AAD registered app password from the Service Principal step
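The Function code reads each of these values with `os.getenv(..., '')`, so a missing variable becomes an empty string rather than an immediate error. A small pre-flight check can surface that earlier. The following is a sketch under that assumption; the names `REQUIRED_SETTINGS` and `missing_settings` are hypothetical helpers, not part of the sample.

```python
import os

# Variable names taken from the list above.
REQUIRED_SETTINGS = [
    "AZURE_SUB", "RESOURCE_GROUP", "WORKSPACE_NAME",
    "STORAGE_CONTAINER_NAME_TRAINDATA", "STORAGE_CONTAINER_NAME_MODELS",
    "STORAGE_ACCOUNT_NAME", "STORAGE_ACCOUNT_KEY",
    "TENANT_ID", "APP_ID", "PRINCIPAL_PASSWORD",
]


def missing_settings(environ=None):
    """Return the required setting names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_SETTINGS if not environ.get(name)]


missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All settings present.")
```

Running this in the same shell that will start the function shows whether the `set_vars` script took effect.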
157 |
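Before starting the function, it can help to confirm that every variable in the list above is actually set. The following is a minimal sketch of such a check; the helper name `missing_vars` is hypothetical and is not part of this repo.

```python
import os

# Variable names are taken from the list above.
REQUIRED_VARS = [
    "AZURE_SUB",
    "RESOURCE_GROUP",
    "WORKSPACE_NAME",
    "STORAGE_CONTAINER_NAME_TRAINDATA",
    "STORAGE_CONTAINER_NAME_MODELS",
    "STORAGE_ACCOUNT_NAME",
    "STORAGE_ACCOUNT_KEY",
    "TENANT_ID",
    "APP_ID",
    "PRINCIPAL_PASSWORD",
]


def missing_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper: `environ` defaults to the current process
    environment, but any dict can be passed in for testing.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_vars()` in the same shell session where `set_vars` was run should return an empty list if all values were provided.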
158 | **Note: when moving on to Azure deployments**
159 |
160 | When deploying to Azure, add the environment variables described above as key/value pairs under **Application settings** in the configuration tab of the published Azure Function App in the Azure Portal.
161 |
162 | ### Data setup
163 |
164 | This example uses the PyTorch framework for a binary classification scenario with the labels `fish` and `not_fish`. The data structure in this repo is shown in the following image. When adding training data, use this structure so that the Python scripts can find the data.
165 |
166 | Both a `train` and a `val` folder are required for training. The folders under `train` and `val` are read by torchvision's `datasets.ImageFolder` class, which derives the labels from the folder names, a common pattern for classification.
167 |
168 | Notice that the `data` folder is under the Function's `HttpTrigger` folder.
169 |
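As a rough illustration of the layout described above, the sketch below checks that a data folder contains `train` and `val`, and lists the label folders found under each. It uses only the standard library; the function name `check_data_layout` is hypothetical and is not part of this repo.

```python
from pathlib import Path


def check_data_layout(data_dir):
    """Return {split: [label folder names]} for the train and val splits.

    Hypothetical helper. Raises FileNotFoundError if either required
    split folder is missing.
    """
    layout = {}
    for split in ("train", "val"):
        split_dir = Path(data_dir) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"Missing required folder: {split_dir}")
        # Each sub-folder name is treated as a class label.
        layout[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    return layout
```

For this sample, `check_data_layout("HttpTrigger/data")` would be expected to report `fish` and `not_fish` under both splits, assuming the data was placed as described.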
170 |
171 |
172 |
173 | ### Test locally
174 |
175 | **Start the function**
176 |
177 | From the base of the repo run the following.
178 |
179 | ```
180 | func host start
181 | ```
182 |
183 | Information similar to the following should appear.
184 |
185 | 
186 |
187 | This provides the URL to use for the HTTP POST call.
188 |
189 | **Call the function**
190 |
191 | For now this can be a POST request using `https:///api/HttpTrigger?start=`, where `start` is specified as the parameter in the Azure Function `__init__.py` code and the value is any string for this sample (note: this is a potential entrypoint for passing a variable to the Azure Function in the future).
192 |
193 |
194 | One way to call the Function App is with the `curl` command.
195 |
196 | ```
197 | curl http://localhost:7071/api/HttpTrigger?start=foo
198 | ```
199 |
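The same call can be made from Python. The sketch below only builds the request URL with the standard library; the helper name `trigger_url` is hypothetical, and the base URL assumes the default local host and port shown in the `curl` example above.

```python
from urllib.parse import urlencode


def trigger_url(base_url, start):
    """Build the HttpTrigger URL with the `start` query parameter.

    Hypothetical helper; `base_url` is the host printed by `func host start`
    (or the deployed Function App host).
    """
    query = urlencode({"start": start})
    return f"{base_url.rstrip('/')}/api/HttpTrigger?{query}"


url = trigger_url("http://localhost:7071", "foo")
print(url)  # http://localhost:7071/api/HttpTrigger?start=foo

# To send the request, one option (assuming the `requests` package from
# requirements.txt is installed) is:
#   import requests
#   response = requests.post(url)
```

Using `urlencode` keeps the `start` value safely escaped if it ever contains spaces or other special characters.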
200 |
201 | ### Deploy function to Azure
202 |
203 | Use the following commands to deploy the function to Azure from a local machine that _has this repo cloned locally_.
204 |
205 | Here is an example of deploying this sample to the `westus` region. Update the `--location`, `--name`, `--resource-group`, and `--sku` values as needed.
206 |
207 | Use the Azure CLI to log in.
208 |
209 | ```
210 | az login
211 | ```
212 |
213 | Create a resource group for the Azure Function.
214 |
215 | ```
216 | az group create --name azfunc --location westus
217 | ```
218 |
219 | Create a storage account for the Azure Function.
220 |
221 | ```
222 | az storage account create --name azfuncstorage123 --location westus --resource-group azfunc --sku Standard_LRS
223 |
224 | ```
225 |
226 | Create the Azure Function.
227 |
228 | ```
229 | az functionapp create --resource-group azfunc --os-type Linux --consumption-plan-location westus --runtime python --name dnnfuncapp --storage-account azfuncstorage123
230 |
231 | ```
232 |
233 | Publish the Azure Function.
234 |
235 | ```
236 | func azure functionapp publish dnnfuncapp --build-native-deps
237 | ```
238 |
239 | IMPORTANT NOTE: When deploying to Azure, don't forget to add the environment variables (from above) as key/value pairs under **Application settings** in the configuration tab of the published Azure Function App in the Azure Portal.
240 |
241 | #### Test deployment
242 |
243 | For now this can be a POST request using `https:///api/HttpTrigger?start=`, where `start` is specified as the parameter in the Azure Function `__init__.py` code and the value is any string (this is a potential entrypoint for passing a variable to the Azure Function in the future).
244 |
245 | One way to call the Function App, for example, is:
246 |
247 | ```
248 | curl https://dnnfuncapp.azurewebsites.net/api/HttpTrigger?start=foo
249 | ```
250 |
251 | Alternatively, enter the same URL in a browser.
252 |
253 | This may time out, but don't worry if it does. To confirm successful execution, check for a completed AzureML Experiment run in the Azure Portal ML Workspace and look for the model in blob storage (in the container specified earlier in `STORAGE_CONTAINER_NAME_MODELS`).
254 |
255 | ## References
256 |
257 | 1. Queue an Azure DevOps Build with HTTP request
258 | 2. Azure Machine Learning Services Overview
259 | 3. Using Azure Storage with Azure ML Python SDK
260 | 4. Python Azure Functions
261 | 5. [How to call another function with in an Azure function (StackOverflow)](https://stackoverflow.com/questions/46315734/how-to-call-another-function-with-in-an-azure-function)
262 | 6. [Creation of virtual environments](https://docs.python.org/3/library/venv.html)
263 | 7. [How to access data in Blob and elsewhere with the AzureML Python SDK](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data)
264 |
265 | ## Troubleshooting
266 |
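267 | * Function using the wrong Python: if there are multiple versions of Python on the system, activate the virtual environment (for example, `source .env/bin/activate`) before running any `func` commands so that the intended Python interpreter is used. Note that `PYTHONPATH` sets the module search path and does not select the interpreter.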
268 |
269 |
--------------------------------------------------------------------------------
/azuredeploy.json:
--------------------------------------------------------------------------------
1 | {
2 | "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
3 | "contentVersion": "1.0.0.0",
4 | "parameters": {
5 | "uniqueResourceNameSuffix" : {
6 | "type" : "string",
7 | "defaultValue" : "[uniqueString(subscription().subscriptionId, resourceGroup().id)]"
8 | },
9 | "config_web_name": {
10 | "defaultValue": "web",
11 | "type": "String"
12 | },
13 | "storageName": {
14 | "defaultValue": "[uniqueString(subscription().subscriptionId, resourceGroup().id)]",
15 | "type": "String"
16 | },
17 | "linuxConsumptionAppName": {
18 | "defaultValue": "WestUSLinuxDynamicPlan",
19 | "type": "String"
20 | }
21 | },
22 | "variables": {
23 | "storageAccountid": "[concat(resourceGroup().id,'/providers/','Microsoft.Storage/storageAccounts/', parameters('storageName'))]",
24 | "functionapp" : "[concat('incv3',parameters('uniqueResourceNameSuffix'))]",
25 | "siteName1" : "[concat(variables('functionapp'),'.azurewebsites.net')]"
26 | },
27 | "resources": [
28 | {
29 | "name": "[parameters('storageName')]",
30 | "type": "Microsoft.Storage/storageAccounts",
31 | "apiVersion": "2017-10-01",
32 | "sku": {
33 | "name": "Standard_LRS"
34 | },
35 | "kind": "StorageV2",
36 | "location": "West US",
37 | "tags": {},
38 | "properties": {
39 | "accessTier": "Hot"
40 | }
41 | },
42 | {
43 | "type": "Microsoft.Web/serverfarms",
44 | "sku": {
45 | "name": "Y1",
46 | "tier": "Dynamic",
47 | "size": "Y1",
48 | "family": "Y",
49 | "capacity": 0
50 | },
51 | "kind": "functionapp",
52 | "name": "[parameters('linuxConsumptionAppName')]",
53 | "apiVersion": "2016-09-01",
54 | "location": "West US",
55 | "properties": {
56 | "name": "[parameters('linuxConsumptionAppName')]",
57 | "perSiteScaling": false,
58 | "reserved": true
59 | },
60 | "dependsOn": []
61 | },
62 | {
63 | "type": "Microsoft.Web/sites",
64 | "kind": "functionapp,linux",
65 | "name": "[variables('functionapp')]",
66 | "apiVersion": "2016-08-01",
67 | "location": "West US",
68 | "properties": {
69 | "enabled": true,
70 | "hostNameSslStates": [
71 | {
72 | "name": "[concat(variables('functionapp'),'.azurewebsites.net')]",
73 | "sslState": "Disabled",
74 | "hostType": "Standard"
75 | }
76 | ],
77 | "serverFarmId": "[resourceId('Microsoft.Web/serverfarms', parameters('linuxConsumptionAppName'))]",
78 | "reserved": true,
79 | "siteConfig": {
80 | "appSettings": [
81 | {
82 | "name": "AzureWebJobsDashboard",
83 | "value": "[concat('DefaultEndpointsProtocol=https;AccountName=', parameters('storageName'), ';AccountKey=', listKeys(variables('storageAccountid'),'2015-05-01-preview').key1)]"
84 | },
85 | {
86 | "name": "AzureWebJobsStorage",
87 | "value": "[concat('DefaultEndpointsProtocol=https;AccountName=', parameters('storageName'), ';AccountKey=', listKeys(variables('storageAccountid'),'2015-05-01-preview').key1)]"
88 | },
89 | {
90 | "name": "WEBSITE_CONTENTAZUREFILECONNECTIONSTRING",
91 | "value": "[concat('DefaultEndpointsProtocol=https;AccountName=', parameters('storageName'), ';AccountKey=', listKeys(variables('storageAccountid'),'2015-05-01-preview').key1)]"
92 | },
93 | {
94 | "name": "WEBSITE_CONTENTSHARE",
95 | "value": "[variables('functionapp')]"
96 | },
97 | {
98 | "name": "FUNCTIONS_EXTENSION_VERSION",
99 | "value": "~2"
100 | },
101 | {
102 | "name": "WEBSITE_NODE_DEFAULT_VERSION",
103 | "value": "8.11.1"
104 | },
105 | {
106 | "name": "FUNCTIONS_WORKER_RUNTIME",
107 | "value": "python"
108 | }
109 | ]
110 | }
111 | },
112 | "dependsOn": [
113 | "[resourceId('Microsoft.Web/serverfarms', parameters('linuxConsumptionAppName'))]"
114 | ]
115 | },
116 | {
117 | "type": "Microsoft.Web/sites/config",
118 | "name": "[concat(variables('functionapp'), '/', parameters('config_web_name'))]",
119 | "apiVersion": "2016-08-01",
120 | "location": "West US",
121 | "properties": {
122 | "netFrameworkVersion": "v4.0",
123 | "scmType": "None",
124 | "use32BitWorkerProcess": true,
125 | "webSocketsEnabled": false,
126 | "alwaysOn": false,
127 | "appCommandLine": "",
128 | "managedPipelineMode": "Integrated",
129 | "virtualApplications": [
130 | {
131 | "virtualPath": "/",
132 | "physicalPath": "site\\wwwroot",
133 | "preloadEnabled": false
134 | }
135 | ],
136 | "customAppPoolIdentityAdminState": false,
137 | "customAppPoolIdentityTenantState": false,
138 | "loadBalancing": "LeastRequests",
139 | "routingRules": [],
140 | "experiments": {
141 | "rampUpRules": []
142 | },
143 | "autoHealEnabled": false,
144 | "vnetName": ""
145 | },
146 | "dependsOn": [
147 | "[resourceId('Microsoft.Web/sites', variables('functionapp'))]"
148 | ]
149 | },
150 | {
151 | "type": "Microsoft.Web/sites/hostNameBindings",
152 | "name": "[concat(variables('functionapp'), '/', variables('siteName1'))]",
153 | "apiVersion": "2016-08-01",
154 | "location": "West US",
155 | "properties": {
156 | "siteName": "[variables('functionapp')]",
157 | "hostNameType": "Verified"
158 | },
159 | "dependsOn": [
160 | "[resourceId('Microsoft.Web/sites', variables('functionapp'))]"
161 | ]
162 | }
163 | ]
164 | }
165 |
--------------------------------------------------------------------------------
/host.json:
--------------------------------------------------------------------------------
1 | {
2 | "version": "2.0"
3 | }
--------------------------------------------------------------------------------
/images/arch_diagram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/functions-python-azureml-azurefunctions-deeplearning/5de8d3752b92411c6ededb0cb41c3d01f217dc13/images/arch_diagram.png
--------------------------------------------------------------------------------
/images/data_file_structure.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/functions-python-azureml-azurefunctions-deeplearning/5de8d3752b92411c6ededb0cb41c3d01f217dc13/images/data_file_structure.png
--------------------------------------------------------------------------------
/images/testing_locally.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/functions-python-azureml-azurefunctions-deeplearning/5de8d3752b92411c6ededb0cb41c3d01f217dc13/images/testing_locally.png
--------------------------------------------------------------------------------
/local.settings.json:
--------------------------------------------------------------------------------
1 | {
2 | "IsEncrypted": false,
3 | "Values": {
4 | "FUNCTIONS_WORKER_RUNTIME": "python",
5 | "AzureWebJobsStorage": "{AzureWebJobsStorage}"
6 | }
7 | }
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | azure-functions==1.0.0b4
2 | azure-functions-worker==1.0.0b6
3 | grpcio==1.20.1
4 | grpcio-tools==1.20.1
5 | protobuf==3.6.1
6 | six==1.12.0
7 | azureml-sdk==1.0.39
8 | requests==2.20.1
9 | scikit-image==0.14.2
10 | scikit-learn==0.20.2
11 |
--------------------------------------------------------------------------------