├── pytest.ini
├── pyproject.toml
├── src
│   ├── data
│   │   └── .gitkeep
│   ├── rai
│   │   └── .gitkeep
│   ├── deploy
│   │   ├── .gitkeep
│   │   ├── batch
│   │   │   └── score.py
│   │   └── online
│   │       └── score.py
│   ├── features
│   │   └── .gitkeep
│   ├── monitor
│   │   └── .gitkeep
│   └── model
│       ├── register_model.py
│       └── train.py
├── environments
│   ├── Dockerfile
│   ├── code_quality.txt
│   ├── requirements.txt
│   └── conda_train.yml
├── tests
│   ├── unit
│   │   └── .gitkeep
│   └── data_validation
│       └── .gitkeep
├── tox.ini
├── .env.example
├── NOTICE.md
├── .vscode
│   └── settings.json
├── cli
│   ├── endpoints
│   │   ├── batch_endpoint.yml
│   │   ├── online_endpoint.yml
│   │   ├── online_deployment_mlflow.yml
│   │   ├── online_deployment.yml
│   │   ├── batch_deployment_mlflow.yml
│   │   └── batch_deployment.yml
│   ├── assets
│   │   ├── create-data.yml
│   │   ├── create-compute.yml
│   │   ├── register_model.yml
│   │   ├── create-environment.yml
│   │   └── register_model_mlflow.yml
│   └── jobs
│       └── train.yml
├── .devcontainer
│   ├── noop.txt
│   ├── Dockerfile
│   ├── devcontainer.json
│   └── add-notice.sh
├── scripts
│   ├── jobs
│   │   └── train.sh
│   ├── prototyping
│   │   └── run-notebooks.sh
│   ├── endpoints
│   │   ├── deploy-batch-endpoint-custom.sh
│   │   ├── deploy-online-endpoint-custom.sh
│   │   ├── deploy-batch-endpoint-mlflow.sh
│   │   └── deploy-online-endpoint-mlflow.sh
│   ├── configure-workspace.sh
│   ├── assets
│   │   ├── create-data.sh
│   │   ├── create-compute.sh
│   │   ├── create-environment.sh
│   │   └── register-model.sh
│   └── setup.sh
├── data
│   └── samples
│       └── nyc_taxi_sample.json
├── CODE_OF_CONDUCT.md
├── SUPPORT.md
├── .github
│   ├── dependabot.yml
│   └── workflows
│       ├── smoke-testing.yml
│       ├── smoke-testing-azureml.yml
│       ├── smoke-testing-python-script.yml
│       ├── microsoft-security-devops-analysis.yml
│       ├── smoke-testing-notebook.yml
│       ├── code-quality.yml
│       └── codeql-analysis.yml
├── pipelines
│   ├── train.yml
│   ├── eval.yml
│   ├── prep.yml
│   ├── score.yml
│   ├── pipeline.yml
│   ├── train
│   │   └── train.py
│   ├── prep
│   │   └── prep.py
│   ├── score
│   │   └── score.py
│   └── eval
│       └── eval.py
├── CONTRIBUTING.md
├── docs
│   ├── images
│   │   └── azureml-icon.svg
│   ├── quickstart.md
│   └── coding-guidelines.md
├── .pre-commit-config.yaml
├── LICENSE
├── .gitignore
├── SECURITY.md
├── utils
│   └── prepare_data.py
├── notebooks
│   ├── train-experiment.ipynb
│   ├── train-mlflow-local.ipynb
│   └── train-model-debugging.ipynb
└── README.md

/pytest.ini:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/src/data/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/src/rai/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/environments/Dockerfile:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/src/deploy/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/src/features/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/src/monitor/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/tests/unit/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/tests/data_validation/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/tox.ini:
--------------------------------------------------------------------------------
[flake8]
max-line-length = 120
--------------------------------------------------------------------------------
/.env.example:
--------------------------------------------------------------------------------
GROUP=""
LOCATION=""
WORKSPACE=""
SUBSCRIPTION=""
--------------------------------------------------------------------------------
/NOTICE.md:
--------------------------------------------------------------------------------
NOTICES
This repository incorporates material as listed below or described in the code.
--------------------------------------------------------------------------------
/environments/code_quality.txt:
--------------------------------------------------------------------------------
black[jupyter]==23.1.0
isort[color]==5.12.0
flake8==6.0.0
mypy==0.991
--------------------------------------------------------------------------------
/environments/requirements.txt:
--------------------------------------------------------------------------------
scikit-learn
pandas
ipykernel
matplotlib

black
flake8
pytest
pre-commit
--------------------------------------------------------------------------------
/.vscode/settings.json:
--------------------------------------------------------------------------------
{
    "python.linting.flake8Enabled": true,
    "python.linting.pylintEnabled": false,
    "python.linting.enabled": true
}
--------------------------------------------------------------------------------
/cli/endpoints/batch_endpoint.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.json
description: endpoint for batch deployment
auth_mode: aad_token
--------------------------------------------------------------------------------
/cli/endpoints/online_endpoint.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
description: endpoint for online deployment
auth_mode: key
--------------------------------------------------------------------------------
/.devcontainer/noop.txt:
--------------------------------------------------------------------------------
This file is copied into the container along with environment.yml* from the
parent folder. This is done to prevent the Dockerfile COPY instruction from
failing if no environment.yml is found.
--------------------------------------------------------------------------------
/cli/assets/create-data.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: nyc_taxi_dataset
description: nyc_taxi_dataset
type: uri_file
path: ../../data/raw/nyc_taxi_dataset.csv
--------------------------------------------------------------------------------
/scripts/jobs/train.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# move to the directory of this shell script
exec_path=$(readlink -f "$0")
exec_dir=$(dirname "$exec_path")
cd "$exec_dir/../../"

az ml job create -f ./cli/jobs/train.yml --stream
--------------------------------------------------------------------------------
/cli/assets/create-compute.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: cpu-cluster
type: amlcompute
size: STANDARD_DS3_v2
min_instances: 0
max_instances: 4
idle_time_before_scale_down: 300
--------------------------------------------------------------------------------
/cli/endpoints/online_deployment_mlflow.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue-mlflow
model: azureml:nyc_taxi_mlflow@latest
instance_type: Standard_DS4_v2
instance_count: 1
--------------------------------------------------------------------------------
/cli/assets/register_model.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: nyc_taxi
type: custom_model
path: azureml://datastores/workspaceblobstore/paths/nyc-taxi/models
description: Registers the model files on the datastore as a custom model.
--------------------------------------------------------------------------------
/cli/assets/create-environment.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: nyc-taxi-env
image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04
conda_file: ../../environments/conda_train.yml
description: nyc-taxi-env
--------------------------------------------------------------------------------
/cli/assets/register_model_mlflow.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: nyc_taxi_mlflow
type: mlflow_model
path: azureml://datastores/workspaceblobstore/paths/nyc-taxi/models
description: Registers the model files on the datastore as an MLflow model.
--------------------------------------------------------------------------------
/data/samples/nyc_taxi_sample.json:
--------------------------------------------------------------------------------
{
    "data": [
        [
            2,
            1421107273.0,
            1,
            2.99,
            -73.82838439941406,
            40.75553512573242,
            -73.78858184814453,
            40.74454879760742
        ]
    ]
}
--------------------------------------------------------------------------------
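Note: the eight values above line up with the feature columns left after the filtering in
utils/prepare_data.py (vendorID, lpepPickupDatetime as a Unix timestamp, passengerCount,
tripDistance, then the pickup and dropoff longitude/latitude pairs). A minimal client sketch for
posting this payload to the deployed online endpoint — the scoring URI and key below are
placeholders, which you would fetch with `az ml online-endpoint show` and
`az ml online-endpoint get-credentials`:

    import json
    import urllib.request

    scoring_uri = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"  # placeholder
    api_key = "<endpoint-key>"  # placeholder; the endpoint is configured with auth_mode: key

    # reuse the checked-in sample payload as the request body
    with open("data/samples/nyc_taxi_sample.json") as f:
        body = f.read().encode("utf-8")

    req = urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read()))  # a list of predicted totalAmount values
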
/cli/endpoints/online_deployment.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
model: azureml:nyc_taxi:1
code_configuration:
  code: ../../src/deploy/online/
  scoring_script: score.py
environment: azureml:nyc-taxi-env@latest
instance_type: Standard_DS4_v2
instance_count: 1
--------------------------------------------------------------------------------
/scripts/prototyping/run-notebooks.sh:
--------------------------------------------------------------------------------
#!/bin/bash
source /anaconda/etc/profile.d/conda.sh
conda activate mlops-train

# move to the notebooks directory relative to this shell script
exec_path=$(readlink -f "$0")
exec_dir=$(dirname "$exec_path")
cd "$exec_dir/../../notebooks"

papermill train-experiment.ipynb out.ipynb -k mlops-train
papermill train-mlflow-local.ipynb out.ipynb -k mlops-train
papermill train-model-debugging.ipynb out.ipynb -k mlops-train
--------------------------------------------------------------------------------
/scripts/endpoints/deploy-batch-endpoint-custom.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# move to the directory of this shell script
exec_path=$(readlink -f "$0")
exec_dir=$(dirname "$exec_path")
cd "$exec_dir/../../"

export ENDPOINT_NAME=batch-endpoint-custom-$RANDOM

az ml batch-endpoint create --name $ENDPOINT_NAME -f ./cli/endpoints/batch_endpoint.yml

az ml batch-deployment create --endpoint-name $ENDPOINT_NAME -f ./cli/endpoints/batch_deployment.yml
--------------------------------------------------------------------------------
/scripts/endpoints/deploy-online-endpoint-custom.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# move to the directory of this shell script
exec_path=$(readlink -f "$0")
exec_dir=$(dirname "$exec_path")
cd "$exec_dir/../../"

export ENDPOINT_NAME=online-endpoint-custom-$RANDOM

az ml online-endpoint create --name $ENDPOINT_NAME -f ./cli/endpoints/online_endpoint.yml

az ml online-deployment create --endpoint-name $ENDPOINT_NAME -f ./cli/endpoints/online_deployment.yml
--------------------------------------------------------------------------------
/scripts/endpoints/deploy-batch-endpoint-mlflow.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# move to the directory of this shell script
exec_path=$(readlink -f "$0")
exec_dir=$(dirname "$exec_path")
cd "$exec_dir/../../"

export ENDPOINT_NAME=batch-endpoint-mlflow-$RANDOM

az ml batch-endpoint create --name $ENDPOINT_NAME -f ./cli/endpoints/batch_endpoint.yml

az ml batch-deployment create --endpoint-name $ENDPOINT_NAME -f ./cli/endpoints/batch_deployment_mlflow.yml
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
# Microsoft Open Source Code of Conduct

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).

Resources:

- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
- Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns
--------------------------------------------------------------------------------
/scripts/endpoints/deploy-online-endpoint-mlflow.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# move to the directory of this shell script
exec_path=$(readlink -f "$0")
exec_dir=$(dirname "$exec_path")
cd "$exec_dir/../../"

export ENDPOINT_NAME=online-endpoint-mlflow-$RANDOM

az ml online-endpoint create --name $ENDPOINT_NAME -f ./cli/endpoints/online_endpoint.yml

az ml online-deployment create --endpoint-name $ENDPOINT_NAME -f ./cli/endpoints/online_deployment_mlflow.yml
--------------------------------------------------------------------------------
/cli/endpoints/batch_deployment_mlflow.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
name: batch-deployment-mlflow
model: azureml:nyc_taxi_mlflow@latest
compute: azureml:cpu-cluster
resources:
  instance_count: 2
max_concurrency_per_instance: 2
mini_batch_size: 10
output_action: append_row
output_file_name: predictions.csv
retry_settings:
  max_retries: 3
  timeout: 30
error_threshold: -1
logging_level: info
--------------------------------------------------------------------------------
/SUPPORT.md:
--------------------------------------------------------------------------------
# Support

## How to file issues and get help

This project uses GitHub Issues to track bugs and feature requests. Please search the existing
issues before filing new issues to avoid duplicates. For new issues, file your bug or
feature request as a new Issue.

For help and questions about using this project, please contact our team via GitHub Issues.

## Microsoft Support Policy

Support for this project is limited to the resources listed above.
--------------------------------------------------------------------------------
/environments/conda_train.yml:
--------------------------------------------------------------------------------
name: mlops-train
channels:
  - conda-forge
dependencies:
  - python=3.8.5
  - pip=22.3.1
  - pip:
      - azureml-mlflow==1.45.0
      - mlflow==1.30.0
      - scikit-learn==1.0.2
      - pandas==1.1.5
      - joblib==1.0.0
      - matplotlib==3.3.3
      - azureml-defaults==1.47.0
      - black[jupyter]==22.8.0
      - pre-commit==2.20.0
      - papermill==2.4.0
      - ipykernel==6.6.0
      - raiwidgets==0.23.0
      - numpy<1.24.0
--------------------------------------------------------------------------------
/.github/dependabot.yml:
--------------------------------------------------------------------------------
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates

version: 2
updates:
  - package-ecosystem: "pip" # See the documentation for other possible values
    directory: "/" # Location of package manifests
    schedule:
      interval: "weekly"
--------------------------------------------------------------------------------
/scripts/configure-workspace.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# move to the directory of this shell script
exec_path=$(readlink -f "$0")
exec_dir=$(dirname "$exec_path")
cd "$exec_dir/../"

# Load environment variables
source .env

# Set the shared subscription
az account set --subscription $SUBSCRIPTION
# Configure the default workspace for az ml
az configure --defaults group=$GROUP workspace=$WORKSPACE location=$LOCATION

az configure -l -o table

echo "Note: the Azure Machine Learning workspace $WORKSPACE in resource group $GROUP (region $LOCATION) is now set as the default resource"
--------------------------------------------------------------------------------
/pipelines/train.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: train_linear_regression_model

display_name: TrainLinearRegressionModel

version: 1

type: command

inputs:
  training_data:
    type: uri_folder

outputs:
  model_output:
    type: mlflow_model

code: ./train

environment: azureml:nyc-taxi-env@latest

command: >-
  python train.py
  --training_data ${{inputs.training_data}}
  --model_output ${{outputs.model_output}}
--------------------------------------------------------------------------------
/.github/workflows/smoke-testing.yml:
--------------------------------------------------------------------------------
name: smoke-testing
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
  schedule:
    - cron: "0 0 * * *"
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Setup python
        uses: actions/setup-python@v4.2.0
        with:
          python-version: "3.9"
      - name: pip install
        run: pip install black[jupyter]==22.8.0
      - name: Check code format
        run: black --check .
--------------------------------------------------------------------------------
/scripts/assets/create-data.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# move to the directory of this shell script
exec_path=$(readlink -f "$0")
exec_dir=$(dirname "$exec_path")
cd "$exec_dir/../../"

# Get the dataset name from the yaml file
dataset_name=$(grep name ./cli/assets/create-data.yml | awk '{print $2}')

# Check if the dataset exists
dataset_exists=$(az ml data list --query "[?name=='$dataset_name']" | jq 'length')

# If the dataset exists, do not create it again
if [ $dataset_exists -gt 0 ]; then
    echo "Dataset already exists"
else
    # Create a new dataset
    az ml data create -f ./cli/assets/create-data.yml
fi
--------------------------------------------------------------------------------
/cli/jobs/train.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
experiment_name: train_nyc_taxi
description: train_nyc_taxi
type: command
code: ../../src/model
command: >-
  python train.py
  --input_data ${{inputs.nyc_taxi_data}}
  --output_dir ${{outputs.output_dir}}
environment: azureml:nyc-taxi-env@latest
inputs:
  nyc_taxi_data:
    type: uri_file
    path: azureml:nyc_taxi_dataset@latest
outputs:
  output_dir:
    type: uri_folder
    path: azureml://datastores/workspaceblobstore/paths/nyc-taxi/
    mode: mount
compute: azureml:cpu-cluster
--------------------------------------------------------------------------------
/cli/endpoints/batch_deployment.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
name: batch-deployment
description: custom batch deployment
model: azureml:nyc_taxi@latest
code_configuration:
  code: ../../src/deploy/batch/
  scoring_script: score.py
environment: azureml:nyc-taxi-env@latest
compute: azureml:cpu-cluster
resources:
  instance_count: 1
max_concurrency_per_instance: 2
mini_batch_size: 10
output_action: append_row
output_file_name: predictions.csv
retry_settings:
  max_retries: 3
  timeout: 30
error_threshold: -1
logging_level: info
--------------------------------------------------------------------------------
/scripts/assets/create-compute.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# move to the directory of this shell script
exec_path=$(readlink -f "$0")
exec_dir=$(dirname "$exec_path")
cd "$exec_dir/../../"

# Get the cluster name from the yaml file
cluster_name=$(grep name ./cli/assets/create-compute.yml | awk '{print $2}')

# Check if the cluster exists
cluster_exists=$(az ml compute list --query "[?name=='$cluster_name']" | jq 'length')

# If the cluster exists, do not create it again
if [ $cluster_exists -gt 0 ]; then
    echo "Cluster already exists"
else
    # Create a new cluster
    az ml compute create -f ./cli/assets/create-compute.yml
fi
--------------------------------------------------------------------------------
/scripts/assets/create-environment.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# move to the directory of this shell script
exec_path=$(readlink -f "$0")
exec_dir=$(dirname "$exec_path")
cd "$exec_dir/../../"

# Get the environment name from the yaml file
env_name=$(grep name ./cli/assets/create-environment.yml | awk '{print $2}')

# Check if the environment exists
env_exists=$(az ml environment list --query "[?name=='$env_name']" | jq 'length')

# If the environment exists, do not create it again
if [ $env_exists -gt 0 ]; then
    echo "Environment already exists"
else
    # Create a new environment
    az ml environment create -f ./cli/assets/create-environment.yml
fi
--------------------------------------------------------------------------------
/pipelines/eval.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: evaluate_linear_regression_model

display_name: EvaluateLinearRegressionModel

version: 1

type: command

inputs:
  predicted_data:
    type: uri_folder
  label_data:
    type: uri_folder

outputs:
  model_performance_report:
    type: uri_folder

code: ./eval

environment: azureml:nyc-taxi-env@latest

command: >-
  python eval.py
  --predicted_data ${{inputs.predicted_data}}
  --label_data ${{inputs.label_data}}
  --model_performance_report ${{outputs.model_performance_report}}
--------------------------------------------------------------------------------
/pipelines/prep.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: prep_taxi_data

display_name: PrepTaxiData

version: 1

type: command

inputs:
  nyc_taxi_data:
    type: uri_file
  test_split_ratio:
    type: number
    min: 0
    max: 1
    default: 0.2

outputs:
  training_data:
    type: uri_folder
  testing_data:
    type: uri_folder

code: ./prep

environment: azureml:nyc-taxi-env@latest

command: >-
  python prep.py
  --input_data ${{inputs.nyc_taxi_data}}
  --test_split_ratio ${{inputs.test_split_ratio}}
  --training_data ${{outputs.training_data}}
  --testing_data ${{outputs.testing_data}}
--------------------------------------------------------------------------------
/pipelines/score.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: score_linear_regression_model

display_name: ScoreLinearRegressionModel

version: 1

type: command

inputs:
  model_input:
    type: mlflow_model
  testing_data:
    type: uri_folder

outputs:
  predicted_data:
    type: uri_folder
  label_data:
    type: uri_folder

code: ./score

environment: azureml:nyc-taxi-env@latest

command: >-
  python score.py
  --testing_data ${{inputs.testing_data}}
  --model_input ${{inputs.model_input}}
  --predicted_data ${{outputs.predicted_data}}
  --label_data ${{outputs.label_data}}
--------------------------------------------------------------------------------
/scripts/assets/register-model.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# move to the directory of this shell script
exec_path=$(readlink -f "$0")
exec_dir=$(dirname "$exec_path")
cd "$exec_dir/../../"

# get the run id of the latest completed job
run_id=$(az ml job list --query "reverse(sort_by([?status=='Completed'].{experiment_name:experiment_name, run_id:name, status:status, date:creation_context.created_at}, &date))[0].run_id" | sed 's/"//g')
echo $run_id


# register the model that was trained in the latest job
az ml model create -f ./cli/assets/register_model.yml --path azureml://datastores/workspaceblobstore/paths/nyc-taxi/$run_id/models
az ml model create -f ./cli/assets/register_model_mlflow.yml --path azureml://datastores/workspaceblobstore/paths/nyc-taxi/$run_id/models
--------------------------------------------------------------------------------
/.devcontainer/Dockerfile:
--------------------------------------------------------------------------------
FROM mcr.microsoft.com/vscode/devcontainers/miniconda:0-3
SHELL ["/bin/bash", "-c"]

# Create the conda environment
COPY ./environments/conda_train.yml /environments/
RUN conda env create -n mlops-train --file /environments/conda_train.yml && \
    conda init bash

# Register the Jupyter kernel
RUN source ~/.bashrc && conda activate mlops-train && \
    ipython kernel install --name=mlops-train --display-name=mlops-train

# Set up pre-commit
COPY .pre-commit-config.yaml .
RUN source ~/.bashrc && conda activate mlops-train && \
    git init . && \
    pre-commit install-hooks

# Switch to the vscode user
USER vscode

# Install the Azure CLI and the ml extension
RUN curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash - && az extension add --name ml

# Configure conda
RUN conda init bash
--------------------------------------------------------------------------------
/.devcontainer/devcontainer.json:
--------------------------------------------------------------------------------
{
    "name": "Miniconda (Python 3)",
    "build": {
        "context": "..",
        "dockerfile": "Dockerfile"
    },

    "customizations": {
        "vscode": {
            "settings": {
                "terminal.integrated.profiles.linux": {
                    "bash": {
                        "path": "/bin/bash"
                    }
                },
                "[python]": {
                    "editor.defaultFormatter": "ms-python.black-formatter",
                    "editor.formatOnSave": true,
                    "editor.codeActionsOnSave": {
                        "source.organizeImports": true
                    }
                },
                "isort.args": [
                    "--profile", "black"
                ]
            },
            "extensions": [
                "ms-python.python",
                "ms-python.vscode-pylance",
                "ms-toolsai.vscode-ai",
                "ms-python.black-formatter",
                "ms-python.isort"
            ]
        }
    }
}
--------------------------------------------------------------------------------
/src/deploy/batch/score.py:
--------------------------------------------------------------------------------
import os

import numpy as np
from mlflow.pyfunc import load_model


# Called when the service is loaded
def init():
    global model

    model_path = os.path.join(
        os.environ["AZUREML_MODEL_DIR"],
        "nyc_taxi",
    )
    model = load_model(model_path)


def run(mini_batch):
    print(f"run method start: {__file__}, run({mini_batch})")

    results = []
    for input_file in mini_batch:
        # each element of the mini-batch is a path to an input file
        input_np = np.loadtxt(input_file, delimiter=",", skiprows=1)
        predictions = model.predict(input_np)
        log_txt = "Predictions:" + str(predictions)
        print(log_txt)
        # accumulate results across all files in the mini-batch
        results.extend([row, pred] for row, pred in enumerate(predictions))

    return results  # return a dataframe or a list
--------------------------------------------------------------------------------
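Note: because run() receives a mini-batch as a plain list of file paths, the batch scoring logic can
be smoke-tested locally without deploying anything. A rough harness under stated assumptions — the
model folder and the input CSV below are hypothetical paths (an MLflow model saved under
./outputs/nyc_taxi and a headered CSV of feature rows):

    import os

    import score  # src/deploy/batch/score.py

    # hypothetical: the folder must contain ./outputs/nyc_taxi (an MLflow model)
    os.environ["AZUREML_MODEL_DIR"] = "./outputs"
    score.init()
    # hypothetical headered CSV of feature rows; prints [row, prediction] pairs
    print(score.run(["./data/samples/batch_input.csv"]))
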
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
--------------------------------------------------------------------------------
/.devcontainer/add-notice.sh:
--------------------------------------------------------------------------------
# Display a notice when not running in GitHub Codespaces

cat << 'EOF' > /usr/local/etc/vscode-dev-containers/conda-notice.txt
When using "conda" from outside of GitHub Codespaces, note the Anaconda repository
contains restrictions on commercial use that may impact certain organizations. See
https://aka.ms/vscode-remote/conda/miniconda

EOF

notice_script="$(cat << 'EOF'
if [ -t 1 ] && [ "${IGNORE_NOTICE}" != "true" ] && [ "${TERM_PROGRAM}" = "vscode" ] && [ "${CODESPACES}" != "true" ] && [ ! -f "$HOME/.config/vscode-dev-containers/conda-notice-already-displayed" ]; then
    cat "/usr/local/etc/vscode-dev-containers/conda-notice.txt"
    mkdir -p "$HOME/.config/vscode-dev-containers"
    ((sleep 10s; touch "$HOME/.config/vscode-dev-containers/conda-notice-already-displayed") &)
fi
EOF
)"

echo "${notice_script}" | tee -a /etc/bash.bashrc >> /etc/zsh/zshrc
--------------------------------------------------------------------------------
/.github/workflows/smoke-testing-azureml.yml:
--------------------------------------------------------------------------------
name: Smoke Testing for Azure ML
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
  schedule:
    - cron: "0 0 * * *"
  workflow_dispatch:


jobs:
  smoke-testing-azureml-training:
    name: Smoke Testing for Azure ML (Training)
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Install az ml extension
        run: az extension add -n ml -y
      - name: Azure login
        uses: azure/login@v1
        with:
          creds: ${{secrets.AZURE_CREDENTIALS}}
      - name: Configure default azureml workspace
        run: |
          az configure --defaults group=${{secrets.GROUP}} workspace=${{secrets.WORKSPACE}} location=${{secrets.LOCATION}}
      - name: Job for model training
        run: |
          az ml job create -f train.yml --stream
        working-directory: cli/jobs
--------------------------------------------------------------------------------
/docs/images/azureml-icon.svg:
--------------------------------------------------------------------------------
(binary SVG asset; only its title, "Icon-166Artboard 1", is recoverable from this dump)
--------------------------------------------------------------------------------
/.pre-commit-config.yaml:
--------------------------------------------------------------------------------
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
  # Generated from the sample config (pre-commit sample-config > .pre-commit-config.yaml)
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: no-commit-to-branch
        args: [--branch, main]
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files

  # Python auto-formatter
  - repo: https://github.com/ambv/black
    rev: 23.1.0
    hooks:
      - id: black
      - id: black-jupyter
        language_version: python3

  # Sort imports
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        name: isort (python)
        args: ["--profile", "black"] # avoid conflicts with black (alternatively, set profile = "black" under [tool.isort] in .isort.cfg)

  # Python static analysis (linter)
  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/.github/workflows/smoke-testing-python-script.yml:
--------------------------------------------------------------------------------
name: Smoke Testing for Python Script
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
  schedule:
    - cron: "0 0 * * *"
  workflow_dispatch:


jobs:
  smoke-testing-python-script:
    name: Smoke Testing for Python Script
    runs-on: ubuntu-latest
    env:
      INPUT_DATA: './data/raw/nyc_taxi_dataset.csv'
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Setup python
        uses: actions/setup-python@v4.2.0
        with:
          python-version: "3.9"
      - name: Create conda environment
        uses: conda-incubator/setup-miniconda@v2
        with:
          activate-environment: mlops-train
          environment-file: environments/conda_train.yml
      - name: Kernel configuration
        run: |
          python -m ipykernel install --user --name mlops-train
        shell: bash -el {0}
      - name: Run Python script
        run: |
          python src/model/train.py --input_data $INPUT_DATA
        shell: bash -el {0}
        env:
          AZUREML_RUN_ID: ${{ github.run_id }}
--------------------------------------------------------------------------------
/src/deploy/online/score.py:
--------------------------------------------------------------------------------
import json
import logging
import os

import joblib
import numpy


def init():
    """
    This function is called when the container is initialized/started, typically after create/update of the deployment.
    You can write the logic here to perform init operations like caching the model in memory.
    """
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION).
    # Please provide your model's folder name if there is one.
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "models/model.pkl")
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)
    logging.info("Init complete")


def run(raw_data):
    """
    This function is called for every invocation of the endpoint to perform the actual scoring/prediction.
    In this example we extract the data from the JSON input, call the scikit-learn model's predict()
    method, and return the result.
    """
    logging.info("model: request received")
    data = json.loads(raw_data)["data"]
    data = numpy.array(data)
    result = model.predict(data)
    logging.info("Request processed")
    return result.tolist()
--------------------------------------------------------------------------------
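Note: the same local-harness trick works for the online scoring script, since init() and run() are
plain module-level functions. A sketch, assuming a joblib-serialized model sits at the hypothetical
local path ./outputs/models/model.pkl (mirroring the deployed layout):

    import json
    import os

    import score  # src/deploy/online/score.py

    # hypothetical: the folder must contain models/model.pkl
    os.environ["AZUREML_MODEL_DIR"] = "./outputs"
    score.init()
    # same request contract as the deployed endpoint: {"data": [[...feature rows...]]}
    payload = json.dumps({"data": [[2, 1421107273.0, 1, 2.99, -73.83, 40.76, -73.79, 40.74]]})
    print(score.run(payload))
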
/.github/workflows/microsoft-security-devops-analysis.yml:
--------------------------------------------------------------------------------
name: Microsoft Security DevOps Analysis
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
  schedule:
    - cron: "0 0 * * *"
  workflow_dispatch:

jobs:
  scan-code:
    name: Microsoft Security DevOps Analysis

    runs-on: ubuntu-latest

    steps:

      # Checkout your code repository to scan
      - uses: actions/checkout@v3

      # Install dotnet, used by MSDO
      - uses: actions/setup-dotnet@v3
        with:
          dotnet-version: |
            3.1.x
            5.0.x
            6.0.x

      # Run analyzers
      - name: Run Microsoft Security DevOps Analysis
        uses: microsoft/security-devops-action@preview
        id: msdo

      # Upload alerts to the Security tab
      - name: Upload alerts to Security tab
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: ${{ steps.msdo.outputs.sarifFile }}

      # Upload alerts file as a workflow artifact
      - name: Upload alerts file as a workflow artifact
        uses: actions/upload-artifact@v3
        with:
          name: alerts
          path: ${{ steps.msdo.outputs.sarifFile }}
--------------------------------------------------------------------------------
/scripts/setup.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# move to the directory of this shell script
exec_path=$(readlink -f "$0")
exec_dir=$(dirname "$exec_path")
cd "$exec_dir/../"

# conda virtual environment
# name of the conda environment
CONDA_ENV_NAME="mlops-train"

# Check whether the environment already exists
if conda env list | grep -Eq "\s*$CONDA_ENV_NAME\s"; then
    echo "Conda environment '$CONDA_ENV_NAME' already exists. Skipping creation."
    source /anaconda/etc/profile.d/conda.sh
    conda activate $CONDA_ENV_NAME
else
    # Create the environment if it does not exist
    conda env create -n $CONDA_ENV_NAME --file environments/conda_train.yml

    conda init bash
    # check if the command was successful
    if [ $? -eq 0 ]; then
        echo "'conda init' command was successful."
    fi

    source /anaconda/etc/profile.d/conda.sh
    conda activate $CONDA_ENV_NAME

    # check if the command was successful
    if [ $? -eq 0 ]; then
        echo "'conda activate $CONDA_ENV_NAME' command was successful."
    fi
fi


# ipykernel
ipython kernel install --user --name=$CONDA_ENV_NAME --display-name=$CONDA_ENV_NAME

# pre-commit
git init .
git config --global --add safe.directory .
pre-commit install-hooks

# Azure CLI & ml extension
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash - && az extension add --name ml
--------------------------------------------------------------------------------
/.github/workflows/smoke-testing-notebook.yml:
--------------------------------------------------------------------------------
name: Smoke Testing for Notebook
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
  schedule:
    - cron: "0 0 * * *"
  workflow_dispatch:


jobs:
  smoke-testing-notebook:
    name: Smoke Testing for Notebook
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Setup python
        uses: actions/setup-python@v4.2.0
        with:
          python-version: "3.9"
      - name: Create conda environment
        uses: conda-incubator/setup-miniconda@v2
        with:
          activate-environment: mlops-train
          environment-file: environments/conda_train.yml
      - name: Kernel configuration
        run: |
          python -m ipykernel install --user --name mlops-train
        shell: bash -el {0}
      - name: Run Notebook for experiment
        run: |
          papermill train-experiment.ipynb output.ipynb -k mlops-train
        working-directory: notebooks
        shell: bash -el {0}
      - name: Run Notebook for mlflow
        run: |
          papermill train-mlflow-local.ipynb output.ipynb -k mlops-train
        working-directory: notebooks
        shell: bash -el {0}
      - name: Run Notebook for responsible ai debugging
        run: |
          papermill train-model-debugging.ipynb output.ipynb -k mlops-train
        working-directory: notebooks
        shell: bash -el {0}
--------------------------------------------------------------------------------
/pipelines/pipeline.yml:
--------------------------------------------------------------------------------
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: train_nyc_taxi_pipeline
description: train_nyc_taxi

outputs:
  pipeline_job_trained_model:
    type: mlflow_model
    mode: upload

settings:
  default_datastore: azureml:workspaceblobstore
  default_compute: azureml:cpu-cluster
  continue_on_step_failure: false


jobs:
  prep_job:
    type: command
    component: ./prep.yml
    inputs:
      nyc_taxi_data: # registered data asset
        type: uri_file
        path: azureml:nyc_taxi_dataset@latest
    outputs:
      training_data:
      testing_data:


  train_job:
    type: command
    component: ./train.yml
    inputs:
      training_data: ${{parent.jobs.prep_job.outputs.training_data}}
    outputs:
      model_output: ${{parent.outputs.pipeline_job_trained_model}}


  score_job:
    type: command
    component: ./score.yml
    inputs:
      testing_data: ${{parent.jobs.prep_job.outputs.testing_data}}
      model_input: ${{parent.jobs.train_job.outputs.model_output}}
    outputs:
      predicted_data:
      label_data:


  eval_job:
    type: command
    component: ./eval.yml
    inputs:
      predicted_data: ${{parent.jobs.score_job.outputs.predicted_data}}
      label_data: ${{parent.jobs.score_job.outputs.label_data}}
    outputs:
      model_performance_report:
--------------------------------------------------------------------------------
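Note: this repository submits the pipeline above through the CLI, but the same YAML can also be
submitted from Python. A sketch using the Azure ML Python SDK v2 — assuming the azure-ai-ml and
azure-identity packages are installed and the placeholders are filled in:

    from azure.ai.ml import MLClient, load_job
    from azure.identity import DefaultAzureCredential

    # connect to the workspace (all three identifiers are placeholders)
    ml_client = MLClient(
        DefaultAzureCredential(),
        subscription_id="<subscription-id>",
        resource_group_name="<resource-group>",
        workspace_name="<workspace>",
    )

    # load the pipeline job definition from the YAML and submit it
    pipeline_job = load_job(source="pipelines/pipeline.yml")
    run = ml_client.jobs.create_or_update(pipeline_job)
    print(run.studio_url)  # link to the run in Azure ML studio
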
/.github/workflows/code-quality.yml:
--------------------------------------------------------------------------------
name: Code Quality (linter, formatter, pre-commit)
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
  schedule:
    - cron: "0 0 * * *"
  workflow_dispatch:


jobs:
  lint-python:
    name: Python Lint
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Setup python
        uses: actions/setup-python@v4.2.0
        with:
          python-version: "3.9"
      - name: Set up python
        run: |
          pip install -r environments/code_quality.txt
          pip list
      - name: Lint with flake8
        run: |
          flake8 .
      # - name: Typecheck with mypy
      #   run: |
      #     mypy .


  format-python:
    name: Python Format
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Setup python
        uses: actions/setup-python@v4.2.0
        with:
          python-version: "3.9"
      - name: Set up python
        run: |
          pip install -r environments/code_quality.txt
          pip list
      - name: Check format with isort
        run: |
          isort . --check --diff
      - name: Check format with black
        run: |
          black . --check --diff


  pre-commit:
    name: pre-commit
    runs-on: ubuntu-latest
    env:
      SKIP: no-commit-to-branch
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v3
      - uses: pre-commit/action@v3.0.0
--------------------------------------------------------------------------------
/pipelines/train/train.py:
--------------------------------------------------------------------------------
import argparse
from pathlib import Path

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.linear_model import LinearRegression


def parse_args():
    # Parse arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--training_data", type=str, help="Path of training_data")
    parser.add_argument("--model_output", type=str, help="Path of model_output")

    args = parser.parse_args()
    return args


def split_label(df):
    X = df.drop(columns="totalAmount")
    y = df["totalAmount"]
    return X, y


def train_model(X_train, y_train):
    # Log the number of training samples
    mlflow.log_metric("Train samples", len(X_train))

    # Train the model
    model = LinearRegression().fit(X_train, y_train)

    return model


def save_model(model, output_dir):
    # Save the model
    mlflow.sklearn.save_model(model, output_dir)


def main(args):
    # Enable autologging
    mlflow.autolog(log_models=False)

    # Echo the arguments
    lines = [
        f"Path of training_data: {args.training_data}",
        f"Path of the model output folder: {args.model_output}",
    ]
    [print(line) for line in lines]

    # Load the training data
    df = pd.read_csv(Path(args.training_data) / "train.csv")

    # Preprocess the data
    X_train, y_train = split_label(df)

    # Train the model
    model = train_model(X_train, y_train)

    # Save the model
    save_model(model, args.model_output)


if __name__ == "__main__":
    # Parse arguments
    args = parse_args()

    # Run main
    main(args)
--------------------------------------------------------------------------------
/pipelines/prep/prep.py:
--------------------------------------------------------------------------------
import argparse
from pathlib import Path

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.model_selection import train_test_split


def parse_args():
    # Parse arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_data", type=str, help="input data")
    parser.add_argument(
        "--test_split_ratio", type=float, help="Ratio of train test split"
    )
    parser.add_argument("--training_data", type=str, help="Path of training_data")
    parser.add_argument("--testing_data", type=str, help="Path of testing_data")

    args = parser.parse_args()
    return args


def process_data(df, test_split_ratio):
    training_data, testing_data = train_test_split(
        df, test_size=test_split_ratio, random_state=0
    )
    mlflow.log_metric("Train samples", len(training_data))
    mlflow.log_metric("Test samples", len(testing_data))

    # Return the split data
    return training_data, testing_data


def main(args):
    # Echo the arguments
    lines = [
        f"Path of the input data: {args.input_data}",
        f"Path of the split data (training_data): {args.training_data}",
        f"Path of the split data (testing_data): {args.testing_data}",
    ]
    [print(line) for line in lines]

    # Load the input data
    df = pd.read_csv(args.input_data)

    # Preprocess the data
    training_data, testing_data = process_data(df, args.test_split_ratio)
    training_data.to_csv(Path(args.training_data) / "train.csv", index=False)
    testing_data.to_csv(Path(args.testing_data) / "test.csv", index=False)


if __name__ == "__main__":
    # Parse arguments
    args = parse_args()

    # Run main
    main(args)
--------------------------------------------------------------------------------
/pipelines/score/score.py:
--------------------------------------------------------------------------------
import argparse
from pathlib import Path

import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd


def parse_args():
    # Parse arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_input", type=str, help="Path of model input")
    parser.add_argument("--testing_data", type=str, help="Path of testing data")
    parser.add_argument("--predicted_data", type=str, help="Path of predicted data")
    parser.add_argument("--label_data", type=str, help="Path of label data")

    args = parser.parse_args()
    return args


def split_label(df):
    X = df.drop(columns="totalAmount")
    y = df["totalAmount"]
    return X, y


def get_model(model_input):
    return mlflow.sklearn.load_model(model_input)


def score_model(X_test, model):
    pred = model.predict(X_test)
    return pred


def save_data(pred, data_path, filename):
    np.savetxt(Path(data_path) / filename, pred, delimiter=",")


def main(args):
    # Echo the arguments
    lines = [
        f"Path of the model input: {args.model_input}",
        f"Path of testing_data: {args.testing_data}",
    ]
    [print(line) for line in lines]

    # Load the test data
    df = pd.read_csv(Path(args.testing_data) / "test.csv")

    # Preprocess the data
    X_test, y_test = split_label(df)

    # Load the model
    model = get_model(args.model_input)

    # Predict
    pred = score_model(X_test, model)

    # Save the predictions
    save_data(pred, args.predicted_data, "pred.csv")

    # Save the label data
    save_data(y_test, args.label_data, "label.csv")


if __name__ == "__main__":
    # Parse arguments
    args = parse_args()

    # Run main
    main(args)
--------------------------------------------------------------------------------
/src/model/register_model.py:
--------------------------------------------------------------------------------
import argparse
import os
from pathlib import Path

import mlflow
from mlflow.pyfunc import load_model


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name",
        type=str,
        help="Name under which the model will be registered",
    )
    parser.add_argument(
        "--model_path",
        type=str,
        help="Model directory",
    )
    parser.add_argument(
        "--deploy_flag",
        type=str,
        help="A flag that indicates whether to deploy the model or not",
    )

    args, _ = parser.parse_known_args()
    print(f"Arguments: {args}")

    return args


def main():
    # Get the run
    run = mlflow.start_run()
    run_id = run.info.run_id
    print("run_id: ", run_id)

    args = parse_args()

    model_name = args.model_name
    model_path = args.model_path

    if len(args.deploy_flag) == 1:  # this is the case where deploy_flag is a digit
        deploy_flag = int(args.deploy_flag)
    else:  # this is the case where deploy_flag is a path name
        with open(
            (Path(args.deploy_flag) / "deploy_flag"),
            "rb",
        ) as f:
            deploy_flag = int(f.read())

    if deploy_flag == 1:
        print("Registering ", model_name)

        model = load_model(os.path.join(model_path, "models"))
        # log the model using mlflow
        mlflow.sklearn.log_model(model, model_name)

        # register the model using the mlflow model registry
        model_uri = f"runs:/{run_id}/{args.model_name}"
        mlflow.register_model(model_uri, model_name)

    else:
        print("Model will not be registered!")

    mlflow.end_run()


if __name__ == "__main__":
    main()
--------------------------------------------------------------------------------
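Note: once register_model.py has run, the registered model can be pulled back out of the registry
by name and version through MLflow. A minimal sketch, assuming the MLflow tracking URI already
points at the workspace and that version 1 of the model exists (both are assumptions):

    import mlflow.pyfunc

    model = mlflow.pyfunc.load_model("models:/nyc_taxi/1")  # "1" is an assumed version
    # feature order: vendorID, pickup timestamp, passengerCount, tripDistance,
    # pickup lon/lat, dropoff lon/lat (same as data/samples/nyc_taxi_sample.json)
    print(model.predict([[2, 1421107273.0, 1, 2.99, -73.83, 40.76, -73.79, 40.74]]))
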
plt.savefig(output_path) 51 | 52 | # プロット画像のロギング 53 | mlflow.log_artifact(output_path) 54 | 55 | 56 | def main(args): 57 | # 引数の確認 58 | lines = [ 59 | f"予測値データのパス: {args.predicted_data}", 60 | f"ラベルデータのパス: {args.label_data}", 61 | f"モデルパフォーマンスレポートのパス: {args.model_performance_report}", 62 | ] 63 | [print(line) for line in lines] 64 | 65 | # 予測値データの読み込み 66 | y_pred = pd.read_csv(Path(args.predicted_data) / "pred.csv") 67 | 68 | # ラベルデータの読み込み 69 | y_test = pd.read_csv(Path(args.label_data) / "label.csv") 70 | 71 | # モデル評価指標の算出 72 | evaluate_model(y_test, y_pred) 73 | 74 | # 実測値と予測値のプロット 75 | plot_actuals_predictions(y_test, y_pred, args.model_performance_report) 76 | 77 | 78 | if __name__ == "__main__": 79 | # 引数の処理 80 | args = parse_args() 81 | 82 | # main 関数の実行 83 | main(args) 84 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | 131 | ##AML 132 | config.json 133 | 134 | # Mlflow 135 | mlruns/ 136 | mlartifacts/ 137 | outputs/ 138 | mlruns.db 139 | -------------------------------------------------------------------------------- /SECURITY.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | ## Security 4 | 5 | Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/). 6 | 7 | If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://aka.ms/opensource/security/definition), please report it to us as described below. 8 | 9 | ## Reporting Security Issues 10 | 11 | **Please do not report security vulnerabilities through public GitHub issues.** 12 | 13 | Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://aka.ms/opensource/security/create-report). 14 | 15 | If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/opensource/security/pgpkey). 16 | 17 | You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc). 18 | 19 | Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue: 20 | 21 | * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.) 22 | * Full paths of source file(s) related to the manifestation of the issue 23 | * The location of the affected source code (tag/branch/commit or direct URL) 24 | * Any special configuration required to reproduce the issue 25 | * Step-by-step instructions to reproduce the issue 26 | * Proof-of-concept or exploit code (if possible) 27 | * Impact of the issue, including how an attacker might exploit the issue 28 | 29 | This information will help us triage your report more quickly. 30 | 31 | If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://aka.ms/opensource/security/bounty) page for more details about our active programs. 
32 | 33 | ## Preferred Languages 34 | 35 | We prefer all communications to be in English. 36 | 37 | ## Policy 38 | 39 | Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://aka.ms/opensource/security/cvd). 40 | 41 | 42 | -------------------------------------------------------------------------------- /utils/prepare_data.py: -------------------------------------------------------------------------------- 1 | # Script that generates the data used in this repository 2 | import copy 3 | import os 4 | from datetime import datetime 5 | 6 | import pandas as pd 7 | from azureml.core import Dataset, Datastore, Workspace 8 | from azureml.opendatasets import NycTlcGreen 9 | from dateutil.relativedelta import relativedelta 10 | 11 | 12 | def register_dataset(ws: Workspace) -> None: 13 | dataset_name = "nyc_taxi_dataset" 14 | try: 15 | dataset = Dataset.get_by_name(ws, dataset_name) 16 | df = dataset.to_pandas_dataframe() 17 | except Exception: 18 | raw_df = pd.DataFrame([]) 19 | start = datetime.strptime("1/1/2015", "%m/%d/%Y") 20 | end = datetime.strptime("1/31/2015", "%m/%d/%Y") 21 | 22 | for sample_month in range(3): 23 | temp_df_green = NycTlcGreen( 24 | start + relativedelta(months=sample_month), 25 | end + relativedelta(months=sample_month), 26 | ).to_pandas_dataframe() 27 | raw_df = pd.concat([raw_df, temp_df_green.sample(2000)])  # DataFrame.append was removed in pandas 2.0 28 | 29 | print(raw_df.head(10)) 30 | 31 | df = copy.deepcopy(raw_df) 32 | 33 | columns_to_remove = [ 34 | "lpepDropoffDatetime", 35 | "puLocationId", 36 | "doLocationId", 37 | "extra", 38 | "mtaTax", 39 | "improvementSurcharge", 40 | "tollsAmount", 41 | "ehailFee", 42 | "tripType", 43 | "rateCodeID", 44 | "storeAndFwdFlag", 45 | "paymentType", 46 | "fareAmount", 47 | "tipAmount", 48 | ] 49 | for col in columns_to_remove: 50 | df.pop(col) 51 | 52 | df = df.query("pickupLatitude>=40.53 and pickupLatitude<=40.88") 53 | df = df.query("pickupLongitude>=-74.09 and pickupLongitude<=-73.72") 54 | df = df.query("tripDistance>=0.25 and tripDistance<31") 55 | df = df.query("passengerCount>0 and totalAmount>0") 56 | 57 | df["lpepPickupDatetime"] = df["lpepPickupDatetime"].map(lambda x: x.timestamp()) 58 | 59 | datastore = Datastore.get_default(ws) 60 | dataset = Dataset.Tabular.register_pandas_dataframe(df, datastore, dataset_name) 61 | 62 | print(df.head(5)) 63 | df.to_csv( 64 | "./data/raw/nyc_taxi_dataset.csv", 65 | header=True, 66 | index=False, 67 | ) 68 | 69 | 70 | def main() -> None: 71 | subscription_id = os.getenv("subscription_id") 72 | resource_group = os.getenv("resource_group") 73 | workspace_name = os.getenv("workspace") 74 | 75 | ws = Workspace( 76 | workspace_name=workspace_name, 77 | subscription_id=subscription_id, 78 | resource_group=resource_group, 79 | ) 80 | 81 | # The environment is created inline; to reuse a registered Environment, use the line below 82 | # create_environment(ws) 83 | register_dataset(ws) 84 | 85 | 86 | if __name__ == "__main__": 87 | main() 88 | -------------------------------------------------------------------------------- /.github/workflows/codeql-analysis.yml: -------------------------------------------------------------------------------- 1 | # For most projects, this workflow file will not need changing; you simply need 2 | # to commit it to your repository. 3 | # 4 | # You may wish to alter this file to override the set of languages analyzed, 5 | # or to provide custom queries or build logic. 6 | # 7 | # ******** NOTE ******** 8 | # We have attempted to detect the languages in your repository.
Please check 9 | # the `language` matrix defined below to confirm you have the correct set of 10 | # supported CodeQL languages. 11 | # 12 | name: "CodeQL" 13 | 14 | on: 15 | push: 16 | branches: [ "main" ] 17 | pull_request: 18 | # The branches below must be a subset of the branches above 19 | branches: [ "main" ] 20 | schedule: 21 | - cron: '45 21 * * 5' 22 | 23 | jobs: 24 | analyze: 25 | name: Analyze 26 | runs-on: ubuntu-latest 27 | permissions: 28 | actions: read 29 | contents: read 30 | security-events: write 31 | 32 | strategy: 33 | fail-fast: false 34 | matrix: 35 | language: ['python'] 36 | # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ] 37 | # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support 38 | 39 | steps: 40 | - name: Checkout repository 41 | uses: actions/checkout@v3 42 | 43 | # Initializes the CodeQL tools for scanning. 44 | - name: Initialize CodeQL 45 | uses: github/codeql-action/init@v2 46 | with: 47 | languages: ${{ matrix.language }} 48 | # If you wish to specify custom queries, you can do so here or in a config file. 49 | # By default, queries listed here will override any specified in a config file. 50 | # Prefix the list here with "+" to use these queries and those in the config file. 51 | 52 | # For details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs 53 | # queries: security-extended,security-and-quality 54 | 55 | 56 | # Autobuild attempts to build any compiled languages (C/C++, C#, or Java). 57 | # If this step fails, then you should remove it and run the build manually (see below) 58 | - name: Autobuild 59 | uses: github/codeql-action/autobuild@v2 60 | 61 | # ℹ️ Command-line programs to run using the OS shell. 62 | # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun 63 | 64 | # If the Autobuild fails above, remove it and uncomment the following three lines, 65 | # modifying them (or adding more) to build your code if your project uses a compiled language; please refer to the EXAMPLE below for guidance.
66 | 67 | # - run: | 68 | # echo "Run, Build Application using script" 69 | # ./location_of_script_within_repo/buildscript.sh 70 | 71 | - name: Perform CodeQL Analysis 72 | uses: github/codeql-action/analyze@v2 73 | -------------------------------------------------------------------------------- /src/model/train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | 4 | import matplotlib.pyplot as plt 5 | import mlflow 6 | import mlflow.sklearn 7 | import numpy as np 8 | import pandas as pd 9 | from sklearn.linear_model import LinearRegression 10 | from sklearn.metrics import mean_squared_error, r2_score 11 | from sklearn.model_selection import train_test_split 12 | 13 | 14 | def parse_args(): 15 | # Parse arguments 16 | parser = argparse.ArgumentParser() 17 | parser.add_argument("--input_data", type=str, help="input data") 18 | parser.add_argument( 19 | "--output_dir", type=str, help="output dir", default="./outputs" 20 | ) 21 | args = parser.parse_args() 22 | return args 23 | 24 | 25 | def process_data(df): 26 | # Create X and y 27 | X = df.drop(columns="totalAmount") 28 | y = df["totalAmount"] 29 | 30 | # Split into training and test data 31 | X_train, X_test, y_train, y_test = train_test_split( 32 | X, y, test_size=0.30, random_state=0 33 | ) 34 | 35 | # Return the split data 36 | return X_train, X_test, y_train, y_test 37 | 38 | 39 | def train_model(X_train, y_train): 40 | # Log the number of samples 41 | mlflow.log_metric("Train samples", len(X_train)) 42 | 43 | # Train the model 44 | model = LinearRegression().fit(X_train, y_train) 45 | 46 | return model 47 | 48 | 49 | def evaluate_model(model, X_test, y_test): 50 | # Log the number of samples 51 | mlflow.log_metric("Test samples", len(X_test)) 52 | 53 | # Evaluate the model 54 | y_pred = model.predict(X_test) 55 | mse = mean_squared_error(y_test, y_pred) 56 | rmse = np.sqrt(mse) 57 | r2 = r2_score(y_test, y_pred) 58 | 59 | # Log accuracy metrics 60 | mlflow.log_metric("mse", mse) 61 | mlflow.log_metric("rmse", rmse) 62 | mlflow.log_metric("r2", r2) 63 | 64 | # Plot actual vs. predicted values 65 | plot_actuals_predictions(y_test, y_pred) 66 | 67 | 68 | def plot_actuals_predictions(y_test, y_pred): 69 | # Plot actual vs. predicted values 70 | plt.figure(figsize=(10, 7)) 71 | plt.scatter(y_test, y_pred) 72 | plt.plot(y_test, y_test, color="r") 73 | plt.title("Actual VS Predicted Values (Test set)") 74 | plt.xlabel("Actual Values") 75 | plt.ylabel("Predicted Values") 76 | plt.savefig("actuals_vs_predictions.png") 77 | 78 | # Log the plot image 79 | mlflow.log_artifact("actuals_vs_predictions.png") 80 | 81 | 82 | def save_model(model, output_dir): 83 | # Save the model 84 | os.makedirs(os.path.join(output_dir, "models"), exist_ok=True) 85 | mlflow.sklearn.save_model(model, os.path.join(output_dir, "models")) 86 | 87 | 88 | def main(args): 89 | # Enable autologging 90 | mlflow.autolog(log_models=False) 91 | 92 | # Print the arguments 93 | lines = [ 94 | f"Training data path: {args.input_data}", 95 | f"Output folder path: {args.output_dir}", 96 | ] 97 | print("\n".join(lines)) 98 | 99 | # Load the training data 100 | df = pd.read_csv(args.input_data) 101 | 102 | # Preprocess the data 103 | X_train, X_test, y_train, y_test = process_data(df) 104 | 105 | # Train the model 106 | model = train_model(X_train, y_train) 107 | 108 | # Evaluate the model 109 | evaluate_model(model, X_test, y_test) 110 | 111 | # Save the model (use model_dir to avoid shadowing the built-in `dir`) 112 | model_dir = os.path.join(args.output_dir, os.environ["AZUREML_RUN_ID"]) 113 | save_model(model, model_dir) 114 | 115 | 116 | if __name__ == "__main__": 117 | # Parse arguments 118 | args = parse_args() 119 | 120 | # Run the main function 121 | main(args) 122 |
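The `tests/unit` folder ships empty. As a starting point, a minimal pytest sketch for `process_data` above might look like the following; the file name and the `src.model.train` import path are assumptions (they depend on how the test path is configured in `pytest.ini`), not part of the repository:

```python
# tests/unit/test_process_data.py -- hypothetical example, not part of the repo
import pandas as pd

from src.model.train import process_data  # assumes src/ is importable


def test_process_data_splits_70_30():
    df = pd.DataFrame(
        {
            "tripDistance": range(10),
            "passengerCount": [1] * 10,
            "totalAmount": [float(i) for i in range(10)],
        }
    )

    X_train, X_test, y_train, y_test = process_data(df)

    # test_size=0.30 on 10 rows -> 3 test samples, 7 train samples
    assert len(X_test) == 3 and len(y_test) == 3
    assert len(X_train) == 7 and len(y_train) == 7
    # the target column must not leak into the features
    assert "totalAmount" not in X_train.columns
```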
-------------------------------------------------------------------------------- /notebooks/train-experiment.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import argparse\n", 10 | "import os\n", 11 | "import shutil\n", 12 | "\n", 13 | "import matplotlib.pyplot as plt\n", 14 | "import numpy as np\n", 15 | "import pandas as pd\n", 16 | "from pathlib import Path\n", 17 | "from sklearn.linear_model import LinearRegression\n", 18 | "from sklearn.metrics import mean_squared_error, r2_score\n", 19 | "from sklearn.model_selection import train_test_split" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": null, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "input_data = \"../data/raw/nyc_taxi_dataset.csv\"\n", 29 | "output_dir = \"outputs\"\n", 30 | "\n", 31 | "lines = [f\"Training data path: {input_data}\", f\"Output folder path: {output_dir}\"]\n", 32 | "\n", 33 | "for line in lines:\n", 34 | " print(line)" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": null, 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "# Load the training data\n", 44 | "df = pd.read_csv(input_data)" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": null, 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "df.head()" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": null, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "# Create X and y\n", 63 | "X = df.drop(columns=\"totalAmount\")\n", 64 | "y = df[\"totalAmount\"]\n", 65 | "\n", 66 | "# Split into training and test data\n", 67 | "X_train, X_test, y_train, y_test = train_test_split(\n", 68 | " X, y, test_size=0.30, random_state=0\n", 69 | ")" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": null, 75 | "metadata": {}, 76 | "outputs": [], 77 | "source": [ 78 | "# Train the model\n", 79 | "model = LinearRegression().fit(X_train, y_train)" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": null, 85 | "metadata": {}, 86 | "outputs": [], 87 | "source": [ 88 | "# Evaluate the model\n", 89 | "y_pred = model.predict(X_test)\n", 90 | "mse = mean_squared_error(y_test, y_pred)\n", 91 | "rmse = np.sqrt(mse)\n", 92 | "r2 = r2_score(y_test, y_pred)" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": {}, 99 | "outputs": [], 100 | "source": [ 101 | "# Create the outputs folder\n", 102 | "os.makedirs(\"./outputs\", exist_ok=True)\n", 103 | "\n", 104 | "# Plot actual vs. predicted values\n", 105 | "plt.figure(figsize=(10, 7))\n", 106 | "plt.scatter(y_test, y_pred)\n", 107 | "plt.plot(y_test, y_test, color=\"r\")\n", 108 | "plt.title(\"Actual VS Predicted Values (Test set)\")\n", 109 | "plt.xlabel(\"Actual Values\")\n", 110 | "plt.ylabel(\"Predicted Values\")\n", 111 | "plt.savefig(\"./outputs/actuals_vs_predictions.png\")" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": null, 117 | "metadata": {}, 118 | "outputs": [], 119 | "source": [ 120 | "# Prepare the model output folder\n", 121 | "model_path = os.path.join(output_dir, \"models\")\n", 122 | "\n", 123 | "# Remove any previous model, then recreate the folder empty\n", 124 | "if Path(model_path).exists():\n", 125 | " shutil.rmtree(model_path)\n", 126 | "os.makedirs(model_path, exist_ok=True)" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": null, 132 | "metadata": {}, 133 | "outputs": [], 134 | "source": [] 135 | } 136 | ], 137 | "metadata": { 138 |
"kernelspec": { 139 | "display_name": "Python 3.8.10 64-bit", 140 | "language": "python", 141 | "name": "python3" 142 | }, 143 | "language_info": { 144 | "codemirror_mode": { 145 | "name": "ipython", 146 | "version": 3 147 | }, 148 | "file_extension": ".py", 149 | "mimetype": "text/x-python", 150 | "name": "python", 151 | "nbconvert_exporter": "python", 152 | "pygments_lexer": "ipython3", 153 | "version": "3.8.10" 154 | }, 155 | "orig_nbformat": 4, 156 | "vscode": { 157 | "interpreter": { 158 | "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" 159 | } 160 | } 161 | }, 162 | "nbformat": 4, 163 | "nbformat_minor": 2 164 | } 165 | -------------------------------------------------------------------------------- /docs/quickstart.md: -------------------------------------------------------------------------------- 1 | # クイックスタート 2 | ## 1. コード実行 3 | サンプルコードを動かす手順を紹介します。 4 | 5 | ### Azure Machine Learning の環境準備 6 | - Azure の Subscription を準備します。 7 | - Azure のリソースグループに対する所有者権限を持っていることが前提です。 8 | - [クイックスタート: Azure Machine Learning の利用を開始するために必要なワークスペース リソースを作成する](https://learn.microsoft.com/ja-jp/azure/machine-learning/quickstart-create-resources) の手順に従って、Azure Machine Learning の `ワークスペース` と `コンピューティングインスタンス` を作成します。 9 | - [Visual Studio Code で Azure Machine Learning コンピューティング インスタンスに接続する (プレビュー)](https://learn.microsoft.com/ja-jp/azure/machine-learning/how-to-set-up-vs-code-remote?tabs=studio#configure-a-remote-compute-instance) の手順に従って、Azure Machine Learning の `コンピューティングインスタンス` にアクセス可能なことを確認します。 10 | 11 | 12 |
13 | 14 | ### Preparing the GitHub environment 15 | - Prepare a GitHub account. 16 | - If you only use public repositories, the Free plan (the basic plan for individuals and organizations) in the [pricing plans](https://github.com/pricing) is sufficient, but we recommend the Team or Enterprise plan, which offer richer security features. 17 | - Fork this repository, [Azure/mlops-starter-sklearn](https://github.com/Azure/mlops-starter-sklearn), into your own account or organization. 18 | - In the terminal of the `compute instance`, clone the forked repository into your personal folder under the Users folder. 19 | 20 | ```bash 21 | cd Users/ 22 | git clone https://github.com/<your-account>/mlops-starter-sklearn # point this at your fork 23 | ``` 24 | 25 |
26 | 27 | ### Setting environment variables on Azure Machine Learning 28 | Now run the code you forked earlier. 29 | 30 | - Rename the `.env.example` file to `.env`. 31 | ```bash 32 | mv .env.example .env 33 | ``` 34 | - Open the `.env` file and set the environment variables. 35 | - GROUP: resource group name of the Azure Machine Learning workspace 36 | - WORKSPACE: name of the Azure Machine Learning workspace 37 | - LOCATION: region of the Azure Machine Learning workspace 38 | - Run the command `az account list-locations -o table` in Azure Cloud Shell or on an Azure ML compute instance (after Azure authentication) and use the Name column, not DisplayName. For example, the Japan East region is japaneast, not Japan East. 39 | - SUBSCRIPTION: Azure subscription ID 40 | 41 | _Example .env file_ 42 | ``` 43 | GROUP="azureml" 44 | WORKSPACE="azureml" 45 | LOCATION="japaneast" 46 | SUBSCRIPTION="xxxxxxxxxxx" 47 | ``` 48 | 49 | ### Running the shell scripts 50 | - In the terminal of the `compute instance`, run the shell scripts in the [scripts](../scripts) folder. 51 | - Azure CLI login 52 | - Authenticate the Azure CLI with the `az login --use-device-code` command. 53 | - [setup.sh](../scripts/setup.sh) 54 | - Creates the conda virtual environment 55 | - Configures pre-commit 56 | - Installs the Azure CLI and the ML extension 57 | - [configure-workspace.sh](../scripts/configure-workspace.sh) 58 | - Configures the Azure Machine Learning workspace used by the Azure CLI 59 | - Notebooks 60 | - [run-notebooks.sh](../scripts/prototyping/run-notebooks.sh): runs the experiment notebooks 61 | - Asset creation (compute, data, environment) 62 | - [create-compute.sh](../scripts/assets/create-compute.sh): creates the compute cluster 63 | - [create-data.sh](../scripts/assets/create-data.sh): creates the data asset 64 | - [create-environment.sh](../scripts/assets/create-environment.sh): creates the environment 65 | - Running jobs (model training) 66 | - [train.sh](../scripts/jobs/train.sh): trains the model as an Azure ML job 67 | - Asset creation (model registration) 68 | - [register-model.sh](../scripts/assets/register-model.sh): registers the trained model 69 | - Endpoint creation 70 | - [deploy-online-endpoint-custom.sh](../scripts/endpoints/deploy-online-endpoint-custom.sh): deploys a custom model to an online endpoint 71 | - [deploy-online-endpoint-mlflow.sh](../scripts/endpoints/deploy-online-endpoint-mlflow.sh): deploys an MLflow model to an online endpoint 72 | - [deploy-batch-endpoint-custom.sh](../scripts/endpoints/deploy-batch-endpoint-custom.sh): deploys a custom model to a batch endpoint 73 | - [deploy-batch-endpoint-mlflow.sh](../scripts/endpoints/deploy-batch-endpoint-mlflow.sh): deploys an MLflow model to a batch endpoint 74 | 75 | #### Example end-to-end script run 76 | 77 | ```bash 78 | # Azure login 79 | az login --use-device-code 80 | 81 | # Build the Python environment, set up the Jupyter kernel, configure pre-commit, install the Azure CLI 82 | bash ./scripts/setup.sh 83 | 84 | # Load the environment variables and configure the Azure CLI 85 | bash ./scripts/configure-workspace.sh 86 | 87 | # Run the notebooks 88 | bash ./scripts/prototyping/run-notebooks.sh 89 | 90 | # Create assets (compute, data asset, environment) 91 | bash ./scripts/assets/create-compute.sh 92 | bash ./scripts/assets/create-data.sh 93 | bash ./scripts/assets/create-environment.sh 94 | 95 | # Run the job 96 | bash ./scripts/jobs/train.sh 97 | 98 | # Register the model 99 | bash ./scripts/assets/register-model.sh 100 | 101 | # Build the inference environment 102 | bash ./scripts/endpoints/deploy-online-endpoint-custom.sh 103 | bash ./scripts/endpoints/deploy-online-endpoint-mlflow.sh 104 | bash ./scripts/endpoints/deploy-batch-endpoint-custom.sh 105 | bash ./scripts/endpoints/deploy-batch-endpoint-mlflow.sh 106 | ``` 107 | 108 | --- 109 | 110 | ## 2. Running CI/CD
111 | 112 | ### Creating GitHub Actions secrets 113 | - Create the following GitHub Actions secrets. 114 | - GROUP: resource group name of the Azure Machine Learning workspace 115 | - WORKSPACE: name of the Azure Machine Learning workspace 116 | - SUBSCRIPTION: Azure subscription ID 117 | - AZURE_CREDENTIALS: Azure connection information 118 | - This is written assuming the use of an Azure service principal. OpenID Connect is technically possible, but this documentation and code assume an Azure service principal. 119 | - For details on the credentials and how to set them in the AZURE_CREDENTIALS secret, see [Use GitHub Actions with Azure Machine Learning - Step 2. Authenticate with Azure](https://learn.microsoft.com/ja-JP/azure/machine-learning/how-to-github-actions-machine-learning?tabs=userlevel#step-2-authenticate-with-azure). 120 | 121 | ### Enabling and running GitHub Actions 122 | Open the `Actions` tab of your forked repository on GitHub and enable GitHub Actions. For details, see [GitHub Actions - Disabling and enabling a workflow](https://docs.github.com/ja/actions/managing-workflow-runs/disabling-and-enabling-a-workflow). 123 | -------------------------------------------------------------------------------- /notebooks/train-mlflow-local.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import argparse\n", 10 | "import os\n", 11 | "import shutil\n", 12 | "\n", 13 | "import matplotlib.pyplot as plt\n", 14 | "import mlflow\n", 15 | "import mlflow.sklearn\n", 16 | "import numpy as np\n", 17 | "import pandas as pd\n", 18 | "from pathlib import Path\n", 19 | "from sklearn.linear_model import LinearRegression\n", 20 | "from sklearn.metrics import mean_squared_error, r2_score\n", 21 | "from sklearn.model_selection import train_test_split" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "input_data = \"../data/raw/nyc_taxi_dataset.csv\"\n", 31 | "output_dir = \"outputs\"\n", 32 | "\n", 33 | "lines = [f\"Training data path: {input_data}\", f\"Output folder path: {output_dir}\"]\n", 34 | "\n", 35 | "for line in lines:\n", 36 | " print(line)" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": null, 42 | "metadata": {}, 43 | "outputs": [], 44 | "source": [ 45 | "# Enable autologging\n", 46 | "mlflow.autolog(log_models=False)" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "# Load the training data\n", 56 | "df = pd.read_csv(input_data)" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": null, 62 | "metadata": {}, 63 | "outputs": [], 64 | "source": [ 65 | "df.head()" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": null, 71 | "metadata": {}, 72 | "outputs": [], 73 | "source": [ 74 | "# Create X and y\n", 75 | "X = df.drop(columns=\"totalAmount\")\n", 76 | "y = df[\"totalAmount\"]\n", 77 | "\n", 78 | "# Split into training and test data\n", 79 | "X_train, X_test, y_train, y_test = train_test_split(\n", 80 | " X, y, test_size=0.30, random_state=0\n", 81 | ")" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": null, 87 | "metadata": {}, 88 | "outputs": [], 89 | "source": [ 90 | "# Log the number of samples\n", 91 | "mlflow.log_metric(\"Train samples\", len(X_train))\n", 92 | "\n", 93 | "# Train the model\n", 94 | "model = LinearRegression().fit(X_train, y_train)" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": null, 100 | "metadata": {}, 101 | "outputs": [], 102 | "source": [ 103 | "# Log the number of samples\n", 104 |
"mlflow.log_metric(\"Test samples\", len(X_test))\n", 105 | "\n", 106 | "# モデル評価\n", 107 | "y_pred = model.predict(X_test)\n", 108 | "mse = mean_squared_error(y_test, y_pred)\n", 109 | "rmse = np.sqrt(mse)\n", 110 | "r2 = r2_score(y_test, y_pred)\n", 111 | "\n", 112 | "# 精度メトリックのロギング\n", 113 | "mlflow.log_metric(\"mse\", mse)\n", 114 | "mlflow.log_metric(\"rmse\", rmse)\n", 115 | "mlflow.log_metric(\"r2\", r2)" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": null, 121 | "metadata": {}, 122 | "outputs": [], 123 | "source": [ 124 | "# outputs フォルダの作成\n", 125 | "os.makedirs(\"./outputs\", exist_ok=True)\n", 126 | "\n", 127 | "# 実測値と予測値のプロット\n", 128 | "plt.figure(figsize=(10, 7))\n", 129 | "plt.scatter(y_test, y_pred)\n", 130 | "plt.plot(y_test, y_test, color=\"r\")\n", 131 | "plt.title(\"Actual VS Predicted Values (Test set)\")\n", 132 | "plt.xlabel(\"Actual Values\")\n", 133 | "plt.ylabel(\"Predicted Values\")\n", 134 | "plt.savefig(\"./outputs/actuals_vs_predictions.png\")\n", 135 | "\n", 136 | "# プロット画像のロギング\n", 137 | "mlflow.log_artifact(\"./outputs/actuals_vs_predictions.png\")" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "# モデルの保存\n", 147 | "model_path = os.path.join(output_dir, \"models\")\n", 148 | "\n", 149 | "if Path(model_path).exists():\n", 150 | " shutil.rmtree(model_path)\n", 151 | "else:\n", 152 | " os.makedirs(model_path, exist_ok=True)\n", 153 | "\n", 154 | "mlflow.sklearn.save_model(model, model_path)" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": null, 160 | "metadata": {}, 161 | "outputs": [], 162 | "source": [ 163 | "# MLflow UI の起動\n", 164 | "#!mlflow ui --backend-store-uri ./mlruns" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": null, 170 | "metadata": {}, 171 | "outputs": [], 172 | "source": [] 173 | } 174 | ], 175 | "metadata": { 176 | "kernelspec": { 177 | "display_name": "Python 3.8.13 ('mlops-train')", 178 | "language": "python", 179 | "name": "python3" 180 | }, 181 | "language_info": { 182 | "codemirror_mode": { 183 | "name": "ipython", 184 | "version": 3 185 | }, 186 | "file_extension": ".py", 187 | "mimetype": "text/x-python", 188 | "name": "python", 189 | "nbconvert_exporter": "python", 190 | "pygments_lexer": "ipython3", 191 | "version": "3.8.5" 192 | }, 193 | "orig_nbformat": 4, 194 | "vscode": { 195 | "interpreter": { 196 | "hash": "74419d3d9274bcbfe6ecb9acd0596b867bc1ac63effdfbb8a6e0b958ebbd5c34" 197 | } 198 | } 199 | }, 200 | "nbformat": 4, 201 | "nbformat_minor": 2 202 | } 203 | -------------------------------------------------------------------------------- /notebooks/train-model-debugging.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import argparse\n", 10 | "import os\n", 11 | "import shutil\n", 12 | "\n", 13 | "import matplotlib.pyplot as plt\n", 14 | "import mlflow\n", 15 | "import mlflow.sklearn\n", 16 | "import numpy as np\n", 17 | "import pandas as pd\n", 18 | "from pathlib import Path\n", 19 | "from sklearn.linear_model import LinearRegression\n", 20 | "from sklearn.metrics import mean_squared_error, r2_score\n", 21 | "from sklearn.model_selection import train_test_split" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": {}, 28 | 
"outputs": [], 29 | "source": [ 30 | "input_data = \"../data/raw/nyc_taxi_dataset.csv\"\n", 31 | "output_dir = \"outputs\"\n", 32 | "\n", 33 | "lines = [f\"学習データのパス: {input_data}\", f\"出力フォルダのパス: {output_dir}\"]\n", 34 | "\n", 35 | "for line in lines:\n", 36 | " print(line)" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": null, 42 | "metadata": {}, 43 | "outputs": [], 44 | "source": [ 45 | "# 自動ロギングの有効化\n", 46 | "mlflow.autolog(log_models=False)" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "# 学習データの読み込み\n", 56 | "df = pd.read_csv(input_data)" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": null, 62 | "metadata": {}, 63 | "outputs": [], 64 | "source": [ 65 | "df.head()" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": null, 71 | "metadata": {}, 72 | "outputs": [], 73 | "source": [ 74 | "# X, y の作成\n", 75 | "X = df.drop(columns=\"totalAmount\")\n", 76 | "y = df[\"totalAmount\"]\n", 77 | "\n", 78 | "# 学習データ、テストデータの分割\n", 79 | "X_train, X_test, y_train, y_test = train_test_split(\n", 80 | " X, y, test_size=0.30, random_state=0\n", 81 | ")" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": null, 87 | "metadata": {}, 88 | "outputs": [], 89 | "source": [ 90 | "# データのサンプル数のロギング\n", 91 | "mlflow.log_metric(\"Train samples\", len(X_train))\n", 92 | "\n", 93 | "# モデル学習\n", 94 | "model = LinearRegression().fit(X_train, y_train)" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": null, 100 | "metadata": {}, 101 | "outputs": [], 102 | "source": [ 103 | "# データのサンプル数のロギング\n", 104 | "mlflow.log_metric(\"Test samples\", len(X_test))\n", 105 | "\n", 106 | "# モデル評価\n", 107 | "y_pred = model.predict(X_test)\n", 108 | "mse = mean_squared_error(y_test, y_pred)\n", 109 | "rmse = np.sqrt(mse)\n", 110 | "r2 = r2_score(y_test, y_pred)\n", 111 | "\n", 112 | "# 精度メトリックのロギング\n", 113 | "mlflow.log_metric(\"mse\", mse)\n", 114 | "mlflow.log_metric(\"rmse\", rmse)\n", 115 | "mlflow.log_metric(\"r2\", r2)" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": null, 121 | "metadata": {}, 122 | "outputs": [], 123 | "source": [ 124 | "# outputs フォルダの作成\n", 125 | "os.makedirs(\"./outputs\", exist_ok=True)\n", 126 | "\n", 127 | "# 実測値と予測値のプロット\n", 128 | "plt.figure(figsize=(10, 7))\n", 129 | "plt.scatter(y_test, y_pred)\n", 130 | "plt.plot(y_test, y_test, color=\"r\")\n", 131 | "plt.title(\"Actual VS Predicted Values (Test set)\")\n", 132 | "plt.xlabel(\"Actual Values\")\n", 133 | "plt.ylabel(\"Predicted Values\")\n", 134 | "plt.savefig(\"./outputs/actuals_vs_predictions.png\")\n", 135 | "\n", 136 | "# プロット画像のロギング\n", 137 | "mlflow.log_artifact(\"./outputs/actuals_vs_predictions.png\")" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "# モデルの保存\n", 147 | "model_path = os.path.join(output_dir, \"models\")\n", 148 | "\n", 149 | "if Path(model_path).exists():\n", 150 | " shutil.rmtree(model_path)\n", 151 | "else:\n", 152 | " os.makedirs(model_path, exist_ok=True)\n", 153 | "\n", 154 | "mlflow.sklearn.save_model(model, model_path)" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": null, 160 | "metadata": {}, 161 | "outputs": [], 162 | "source": [ 163 | "# Responsible AI Toolbox ライブラリのインポート\n", 164 | "from raiwidgets import ResponsibleAIDashboard\n", 165 | "from responsibleai 
import RAIInsights" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": null, 171 | "metadata": {}, 172 | "outputs": [], 173 | "source": [ 174 | "# Prepare the data\n", 175 | "train_data = X_train.copy()\n", 176 | "train_data[\"totalAmount\"] = y_train\n", 177 | "\n", 178 | "test_data = X_test.copy()\n", 179 | "test_data[\"totalAmount\"] = y_test\n", 180 | "\n", 181 | "target_feature = \"totalAmount\"" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "# Configure RAI insights\n", 191 | "rai_insights = RAIInsights(\n", 192 | " model, train_data, test_data, target_feature, \"regression\", categorical_features=[]\n", 193 | ")" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": null, 199 | "metadata": {}, 200 | "outputs": [], 201 | "source": [ 202 | "# Model explainability\n", 203 | "rai_insights.explainer.add()\n", 204 | "# Error analysis\n", 205 | "rai_insights.error_analysis.add()\n", 206 | "# Generate counterfactual examples\n", 207 | "rai_insights.counterfactual.add(total_CFs=20, desired_range=[50, 250])" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": null, 213 | "metadata": {}, 214 | "outputs": [], 215 | "source": [ 216 | "# Run the computations\n", 217 | "rai_insights.compute()" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": null, 223 | "metadata": {}, 224 | "outputs": [], 225 | "source": [ 226 | "# Generate the dashboard\n", 227 | "ResponsibleAIDashboard(rai_insights)" 228 | ] 229 | } 230 | ], 231 | "metadata": { 232 | "kernelspec": { 233 | "display_name": "Python 3.8.10 64-bit", 234 | "language": "python", 235 | "name": "python3" 236 | }, 237 | "language_info": { 238 | "codemirror_mode": { 239 | "name": "ipython", 240 | "version": 3 241 | }, 242 | "file_extension": ".py", 243 | "mimetype": "text/x-python", 244 | "name": "python", 245 | "nbconvert_exporter": "python", 246 | "pygments_lexer": "ipython3", 247 | "version": "3.8.10" 248 | }, 249 | "orig_nbformat": 4, 250 | "vscode": { 251 | "interpreter": { 252 | "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" 253 | } 254 | } 255 | }, 256 | "nbformat": 4, 257 | "nbformat_minor": 2 258 | } 259 | -------------------------------------------------------------------------------- /docs/coding-guidelines.md: -------------------------------------------------------------------------------- 1 | # Coding guidelines 2 | 3 | This document gives an overview of the tools this repository uses to improve code quality and describes how to introduce them into a machine learning system. 4 | 5 | 6 | ## Concepts 7 | 8 | In collaborative development with multiple engineers, keeping the project or repository consistent is important for reducing differences in interpretation, improving readability, and reducing handover effort. 9 | One way to achieve this is to use linters and text analysis/formatting tools. 10 | 11 | This repository recommends the following tools. 12 | 13 | - [Linter](#linter) 14 | - [Flake8](#flake8) 15 | - [Formatter](#formatter) 16 | - [black](#black) 17 | - [Type hints](#type-hints) 18 | - [mypy](#mypy) 19 | - [Git hook](#git-hook) 20 | - [pre-commit](#pre-commit) 21 | 22 | ## Install 23 | When using this template, first set up the pre-commit, conda, and Azure CLI v2 environments. 24 | The Flake8, black, and isort settings are already described in `/.pre-commit-config.yaml`, so apply them as follows. 25 | 26 | ※ When using VSCode, configure black, flake8, and isort in `.vscode/settings.json`. 27 | For details, see the VSCode documentation on [Editing](https://code.visualstudio.com/docs/python/editing) and [Linting](https://code.visualstudio.com/docs/python/linting). 28 | 29 | 30 | Next, apply the pre-commit settings and set up the conda/Azure CLI environment. 31 | 32 | **When using the devcontainer**
33 | The installation and configuration of pre-commit are applied automatically. 34 | - [.devcontainer/Dockerfile](.devcontainer/Dockerfile) : Dockerfile that builds the devcontainer 35 | - [.pre-commit-config.yaml](.pre-commit-config.yaml) : pre-commit configuration 36 | 37 | **When not using the devcontainer**
38 | Run the shell script [scripts/setup.sh](scripts/setup.sh). 39 | 40 | ```sh 41 | chmod +x ./scripts/setup.sh # if necessary 42 | bash ./scripts/setup.sh 43 | ``` 44 | 45 | After that, verify that pre-commit runs when you git commit. 46 | 47 | ## CI/CD pipeline (GitHub Actions) 48 | 49 | The code is checked on GitHub Actions as soon as it is pushed to GitHub. This catches anything that slipped through on the development machine. 50 | 51 | **References** 52 | - [Black with GitHub Actions integration](https://black.readthedocs.io/en/stable/integrations/github_actions.html) : sample GitHub Actions implementation for Black 53 | - [pre-commit action](https://github.com/pre-commit/action) : sample GitHub Actions implementation for pre-commit 54 | 55 | 56 | ## Brief description of each tool 57 | ### Linter 58 | 59 | A linter checks source code more strictly than a compiler or interpreter, detecting and warning about constructs that can cause bugs rather than just syntax errors, for example unused or uninitialized variables. An example of what Flake8 reports is shown right after the setup details below. 60 | 61 | #### ◼︎ Flake8 62 | [Flake8](https://flake8.pycqa.org/en/latest/#) is a static analysis tool for Python code. It is a wrapper around the following three tools and runs all of them by invoking a single script. 63 | 64 | - PyFlakes: checks the code for logical errors. 65 | - pep8: checks that the code conforms to the coding conventions ([PEP8](https://pep8.readthedocs.io/en/latest/)) 66 | - Ned Batchelder’s McCabe script: checks cyclomatic complexity. 67 | 68 |
69 | **Setup details** 70 |
71 | 72 | 1. Install flake8 73 | ```sh 74 | pip install flake8 75 | ``` 76 | 2. Run a check with flake8 77 | ```sh 78 | flake8 <directory or Python file> # run against the target you want to check 79 | ``` 80 | 3. Show the offending source lines (the show-source option) 81 | ```sh 82 | flake8 --show-source <directory or Python file> # run against the file you want to check 83 | ``` 84 | 85 |
86 | 87 |
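For instance, Flake8 reports a file, position, and rule code for each finding. Here is a small hypothetical `sample.py` (not part of this repository) with two typical violations:

```python
# sample.py -- hypothetical file with typical Flake8 findings
import os  # F401: imported but unused


def add(a, b):
    result=a + b  # E225: missing whitespace around operator
    return result
```

Running `flake8 sample.py` on a file like this would report something along the lines of `sample.py:2:1: F401 'os' imported but unused` and `sample.py:6:11: E225 missing whitespace around operator` (the exact columns and wording depend on the Flake8 version).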
88 | 89 | ### Formatter 90 | 91 | A formatter checks the style of source code (number of spaces, line-break positions, comment style, and so on) and automatically fixes and formats it. A small before/after example follows the setup details below. 92 | 93 | #### ◼︎ black 94 | [black](https://black.readthedocs.io/en/stable/index.html) is a formatter that pursues consistency, generality, readability, and smaller git diffs. Black's code style is described in [this](https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html) document. 95 | 96 |
97 | **Setup details** 98 |
99 | 100 | 1. Install black 101 | 102 | ```sh 103 | # standard 104 | pip install black 105 | 106 | # when targeting Jupyter notebooks as well 107 | pip install black[jupyter] 108 | ``` 109 | 110 | 2. Format with black 111 | 112 | ```sh 113 | black <directory or Python file> # run against the target you want to format 114 | ``` 115 | ※ Setting up a git hook (git hooks are explained further down this page) 116 | To run black automatically before each git commit, add the following to the `.git/hooks/pre-commit` file in the project directory managed by Git. 117 | 118 | ```sh:pre-commit 119 | #!/bin/bash 120 | black . 121 | ``` 122 | 123 | Grant execute permission to the file. 124 | 125 | ```sh 126 | chmod +x .git/hooks/pre-commit 127 | ``` 128 | 129 | 130 | ※ How to show a badge in README.md indicating that black is used 131 | 132 | [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) 133 | 134 | ▼ Add this: 135 | ```md 136 | [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) 137 | ``` 138 |
139 |
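As a quick illustration of what Black changes, here is a hypothetical, deliberately unformatted function (not from this repository):

```python
# before_black.py -- deliberately unformatted, hypothetical example
def greet(name,excited =False):
    message={'plain':"Hello, "+name,'excited':"Hello, "+name+"!!!"}
    return message[ 'excited' if excited else 'plain' ]
```

Running `black` on it would produce roughly the following: normalized spacing, double quotes, and no cosmetic choices left to the author (the exact layout can vary with the Black version and line-length setting):

```python
# after running `black before_black.py`
def greet(name, excited=False):
    message = {"plain": "Hello, " + name, "excited": "Hello, " + name + "!!!"}
    return message["excited" if excited else "plain"]
```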
140 | 141 | ### Type hints 142 | 143 | Python optionally supports type hints. 144 | 145 | #### ◼︎ mypy 146 | 147 | [mypy](https://mypy.readthedocs.io/en/stable/index.html#) is a static checker for type hints. Because Python does not enforce types on functions or variables, implementations need to pay attention to types. mypy detects bugs in the code based on the type annotations. An example follows the setup details below. 148 | 149 |
150 | **Setup details** 151 |
152 | 153 | 1. Install mypy 154 | ```sh 155 | pip install mypy 156 | ``` 157 | 158 | 2. Configuration 159 | To suppress errors for packages that do not ship stub files with type information, add ignore_missing_imports = True to _mypy.ini_ as follows. 160 | ``` 161 | [mypy-numpy] 162 | ignore_missing_imports = True 163 | 164 | [mypy-pandas.*] 165 | ignore_missing_imports = True 166 | 167 | [mypy-sklearn.*] 168 | ignore_missing_imports = True 169 | 170 | [mypy-matplotlib.*] 171 | ignore_missing_imports = True 172 | 173 | [mypy-mlflow.*] 174 | ignore_missing_imports = True 175 | 176 | [mypy-azureml.*] 177 | ignore_missing_imports = True 178 | 179 | [mypy-dateutil.*] 180 | ignore_missing_imports = True 181 | ``` 182 | 183 | 3. Run a type check with mypy 184 | ```bash 185 | $ mypy train.py 186 | Success: no issues found in 1 source file 187 | ``` 188 | 189 | 190 |
191 |
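To see mypy catch an actual mistake, consider this hypothetical snippet (not from this repository):

```python
# typed_example.py -- hypothetical snippet for a mypy check
from typing import List

fares: List[float] = [10.0, 12.5]

# Accepted at runtime (a plain list holds anything), but the annotation
# above tells mypy that only floats belong in this list.
fares.append("2.5")
```

`mypy typed_example.py` reports something like `error: Argument 1 to "append" of "list" has incompatible type "str"; expected "float"` (the exact message depends on the mypy version), even though the script itself runs without raising.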
192 | 193 | ### Git hook 194 | #### ◼︎ pre-commit 195 | `pre-commit` is a Python wrapper around Git hooks. 196 | 197 |
198 | **Setup details** 199 |
200 | 201 | 1. Install pre-commit 202 | 203 | ```bash 204 | $ pip install pre-commit 205 | ``` 206 | 207 | 2. Generate a sample configuration file 208 | 209 | ```bash 210 | $ pre-commit sample-config > .pre-commit-config.yaml 211 | ``` 212 | 213 | 3. Install it into the git hooks 214 | 215 | ```bash 216 | $ pre-commit install 217 | ``` 218 | 219 | 4. Configuration (.pre-commit-config.yaml) 220 | 221 | ```yml 222 | repos: 223 | # generated by the sample config (pre-commit sample-config > .pre-commit-config.yaml) 224 | - repo: https://github.com/pre-commit/pre-commit-hooks 225 | rev: v4.3.0 226 | hooks: 227 | - id: trailing-whitespace 228 | - id: no-commit-to-branch 229 | args: [--branch, main] 230 | - id: end-of-file-fixer 231 | - id: check-yaml 232 | - id: check-added-large-files 233 | ``` 234 | 235 | 5. Run pre-commit 236 | 237 | ```bash 238 | $ git commit -m "pre-commit demo" 239 | [WARNING] Unstaged files detected. 240 | [INFO] Stashing unstaged files to /home/vscode/.cache/pre-commit/patch1666333249-14074. 241 | trim trailing whitespace.................................................Passed 242 | don't commit to branch...................................................Passed 243 | fix end of files.........................................................Passed 244 | check yaml...............................................................Passed 245 | check for added large files..............................................Passed 246 | [INFO] Restored changes from /home/vscode/.cache/pre-commit/patch1666333249-14074. 247 | [coding-guideline-v1 c101751] pre-commit demo 248 | 2 files changed, 19 insertions(+), 20 deletions(-) 249 | ``` 250 | #### References 251 | 252 | - [Git hooks](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks) 253 | - [pre-commit](https://pre-commit.com/) 254 | 255 |
256 | 257 | 258 | ## Inspiration 259 | The following resources are the main references that provided a great deal of the inspiration for this document. 260 | - [Code with Engineering](https://microsoft.github.io/code-with-engineering-playbook/) 261 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |
2 |

3 | 4 | 5 | # MLOps with Azure Machine Learning

7 | Sample code for implementing MLOps with Azure Machine Learning + GitHub 8 | 9 | [![MIT licensed](https://img.shields.io/badge/license-MIT-brightgreen.svg)](LICENSE) 10 | [![](https://img.shields.io/github/contributors-anon/Azure/mlops-starter-sklearn)](https://github.com/Azure/mlops-starter-sklearn/graphs/contributors) 11 | [![Star](https://img.shields.io/github/stars/Azure/mlops-starter-sklearn.svg)](https://github.com/Azure/mlops-starter-sklearn) 12 | [![Open in VSCode](https://img.shields.io/static/v1?logo=visualstudiocode&label=&message=Open%20in%20VSCode&labelColor=2c2c32&color=007acc&logoColor=007acc)](https://open.vscode.dev/Azure/mlops-starter-sklearn) 13 | 14 |
15 | 16 | --- 17 | 18 | ## 👋 Overview 19 | This repository was created so that MLOps sample code can be put to use quickly. It assumes the use of Azure Machine Learning and GitHub Actions. 20 | 21 | 22 | ## 🚀 Usage 23 | - Prepare the Azure Machine Learning and GitHub environments. 24 | - Access one of the following as the client environment. 25 | - An Azure Machine Learning compute instance 26 | - A DevContainer environment 27 | - Installing packages with Conda consumes memory, so a reasonably large machine is required. For Codespaces, choose a machine type of at least 4-core / 8GB RAM / 32GB storage. 28 | - Set the environment variables in the .env file. 29 | - Run the shell scripts in the [./scripts](./scripts) folder. 30 | - Create the GitHub secrets, then enable and run GitHub Actions. 31 | 32 | :point_right: **How to run the code and CI/CD when using an Azure Machine Learning compute instance as the client environment is described in the [Quickstart](./docs/quickstart.md) document.** 33 | 34 | 35 | ## 📝 Technical conditions 36 | - GitHub 37 | - Source code management, CI/CD pipelines 38 | - Data 39 | - [NYC Taxi & Limousine Commission - green taxi trip records](https://learn.microsoft.com/ja-jp/azure/open-datasets/dataset-taxi-green?tabs=azureml-opendatasets) 40 | - Azure Machine Learning 41 | - A machine learning platform shared across teams and organizations 42 | - Compute Instance : CPU type, client machine 43 | - Or a GitHub Codespace or other Dev Container-compatible environment 44 | - Compute Cluster : shared cluster environment 45 | - API : Azure Machine Learning CLI (v2) 46 | - IDE/Editor 47 | - Visual Studio Code 48 | 49 | ## 📁 Contents 50 | ### Assets 51 | **CLI v2 + YAML** 52 | 53 | |Scenario |YAML file|Shell script|Details | 54 | |--------------------|---------|-----------|-----------| 55 | |Create Data asset |[cli/assets/create-data.yml](cli/assets/create-data.yml)|[scripts/assets/create-data.sh](scripts/assets/create-data.sh)|Creates the data asset| 56 | |Create Compute Cluster|[cli/assets/create-compute.yml](cli/assets/create-compute.yml)|[scripts/assets/create-compute.sh](scripts/assets/create-compute.sh)|Creates the compute| 57 | |Create Environment for training|[cli/assets/create-environment.yml](cli/assets/create-environment.yml)|[scripts/assets/create-environment.sh](scripts/assets/create-environment.sh)|Creates the environment| 58 | 59 | ### Prototyping 60 | **Notebook** 61 | 62 | |Scenario |Notebook|Shell script|Details | 63 | |--------------------|---------|-----------|-----------| 64 | |Baseline Notebook |[notebooks/train-experiment.ipynb](notebooks/train-experiment.ipynb)|[scripts/prototyping/run-notebooks.sh](scripts/prototyping/run-notebooks.sh)|Notebook for experimentation| 65 | 66 | 67 | ### Training 68 | **CLI v2 + YAML** 69 | 70 | |Scenario |YAML file|Shell script|Details | 71 | |--------------------|---------|-----------|-----------| 72 | |Job for training model |[cli/jobs/train.yml](cli/jobs/train.yml) |[scripts/jobs/train.sh](scripts/jobs/train.sh)| Runs the Python script as an Azure ML job | 73 | 74 | 75 | **CI/CD Pipeline** 76 | |Scenario |YAML file|Status |Details | 77 | |--------------------|---------|-----------|-----------| 78 | |Smoke Test |[.github/workflows/smoke-testing.yml](.github/workflows/smoke-testing.yml)|[![smoke-testing](https://github.com/Azure/MLInsider-MLOps/actions/workflows/smoke-testing.yml/badge.svg)](https://github.com/Azure/MLInsider-MLOps/actions/workflows/smoke-testing.yml)|Smoke test pipeline| 79 | 80 | 81 | ### Operationalizing 82 | **CLI v2 + YAML** 83 | 84 | |Scenario |YAML file |Shell script|Details | 85 | |----------------------------------|---------|-----------|-----------| 86 | |Create Batch Endpoint (custom) |[cli/endpoints/batch_deployment.yml](cli/endpoints/batch_deployment.yml)|[scripts/endpoints/deploy-batch-endpoint-custom.sh](scripts/endpoints/deploy-batch-endpoint-custom.sh) |Deploys a custom model to a batch endpoint| 87 | |Create Batch Endpoint (mlflow)
|[cli/endpoints/batch_deployment_mlflow.yml](cli/endpoints/batch_deployment_mlflow.yml)|[scripts/endpoints/deploy-batch-endpoint-mlflow.sh](scripts/endpoints/deploy-batch-endpoint-mlflow.sh)|Deploys an MLflow model to a batch endpoint| 88 | |Create Online Endpoint (custom) |[cli/endpoints/online_deployment.yml](cli/endpoints/online_deployment.yml)|[scripts/endpoints/deploy-online-endpoint-custom.sh](scripts/endpoints/deploy-online-endpoint-custom.sh)|Deploys a custom model to an online endpoint| 89 | |Create Online Endpoint (mlflow) |[cli/endpoints/online_deployment_mlflow.yml](cli/endpoints/online_deployment_mlflow.yml)|[scripts/endpoints/deploy-online-endpoint-mlflow.sh](scripts/endpoints/deploy-online-endpoint-mlflow.sh)|Deploys an MLflow model to an online endpoint| 90 | 91 | 92 | ### CI/CD Pipeline 93 | 94 | >TODO 95 | 96 | ## 🗒️ Documentation 97 | - [Quickstart](./docs/quickstart.md) 98 | - [Coding Guideline](./docs/coding-guidelines.md) 99 | 100 | ## 📄 Directory structure 101 | 102 | ``` 103 | . 104 | ├── .devcontainer # Configuration files for DevContainer 105 | ├── .github 106 | │ └── workflows # YAML files for GitHub Actions 107 | ├── .vscode 108 | ├── cli # YAML files for Azure ML CLI v2 109 | │ ├── assets 110 | │ ├── endpoints 111 | │ └── jobs 112 | ├── data # Sample data 113 | │ ├── raw 114 | │ └── samples 115 | ├── docs # Documentation such as the quickstart and coding style guide 116 | ├── environments # Python libraries 117 | ├── notebooks # Jupyter Notebook 118 | ├── pipelines # Azure ML Pipeline CLI v2 119 | │ ├── eval 120 | │ ├── prep 121 | │ ├── score 122 | │ └── train 123 | ├── scripts 124 | │ ├── assets # Shell scripts for creating assets like data, compute, environment 125 | │ ├── endpoints # Shell scripts for scoring model 126 | │ ├── jobs # Shell scripts for model training 127 | │ └── prototyping # Shell scripts for experimentation 128 | ├── src 129 | │ ├── data # Code for data preparation 130 | │ ├── deploy # Code for scoring model 131 | │ ├── features # Code for feature engineering 132 | │ ├── model # Code for model training 133 | │ ├── monitor # Code for monitoring data and model 134 | │ └── rai # Code for responsible ai 135 | ├── tests 136 | │ ├── data_validation # Code for validating data 137 | │ └── unit # Code for unit testing 138 | └── utils # Code for utilities 139 | ``` 140 | 141 | --- 142 | 143 | ## Related repositories and resources 144 | 145 | ### Comparison with major repositories/resources 146 | | Repository/resource | Overview and purpose | Difference from this repository | 147 | | --- | --- | --- | 148 | | [microsoft/MLOps](https://github.com/microsoft/MLOps) | From an overview of MLOps to how to realize MLOps with Microsoft products, it provides sample code per tool (Azure DevOps, GitHub Actions, IaC, etc.) and per scenario. | This repository narrows the tooling down to a single scenario and provides an MLOps template that can be put to use quickly. | 149 | | [Azure/mlops-v2](https://github.com/Azure/mlops-v2) | A broader and more abstract MLOps template. | This repository aims to be a runnable sample that includes more concrete data and code. | 150 | | [Azure/azureml-examples](https://github.com/Azure/azureml-examples) | A collection of AzureML samples with well-maintained test code. | Rather than a set of standalone samples, this repository aims to cover the entire ML lifecycle end to end. | 151 | | [Tutorial: Azure Machine Learning in a day](https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-azure-ml-in-a-day) | A tutorial page for learning AzureML end to end. | This repository also provides tips for designing and operating ML systems that are not covered by the compared resource. | 152 | 153 | ### Others 154 | - https://github.com/dslp/dslp-repo-template 155 | 156 | ## 🛡 Disclaimer 157 | We assume no responsibility whatsoever for the content of external websites linked from this repository. Please use these links at your own risk. We accept no liability for any damage or loss you may incur as a result of, or in connection with, your use of these links. 158 | 159 | ## 🤝 Contributing 160 | We welcome contributions from customers and internal
Microsoft employees. Please see [CONTRIBUTING](./CONTRIBUTING.md). We appreciate all contributions from Microsoft employees and the community that help this repository thrive. 161 | 162 | 163 | 164 | 165 | ## Trademarks 166 | 167 | This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft 168 | trademarks or logos is subject to and must follow 169 | [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). 170 | Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. 171 | Any use of third-party trademarks or logos is subject to those third parties' policies. 172 | --------------------------------------------------------------------------------