├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── app.py ├── cdk.json ├── container └── Dockerfile ├── deploy_stack.sh ├── lab ├── 1_track_experiments.ipynb ├── 2_track_experiments_hpo.ipynb ├── 3_deploy_model.ipynb └── source_dir │ ├── requirements.txt │ ├── setup.py │ └── train.py ├── media ├── architecture-experiments.png ├── architecture-mlflow.png ├── load-balancer.png └── mlflow-interface.png └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | # Created by .ignore support plugin (hsz.mobi) 2 | ### Python template 3 | # Byte-compiled / optimized / DLL files 4 | __pycache__/ 5 | *.py[cod] 6 | *$py.class 7 | 8 | # C extensions 9 | #*.so 10 | 11 | # Distribution / packaging 12 | .Python 13 | build/ 14 | develop-eggs/ 15 | dist/ 16 | downloads/ 17 | eggs/ 18 | .eggs/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | .hypothesis/ 51 | .pytest_cache/ 52 | 53 | # Translations 54 | *.mo 55 | *.pot 56 | 57 | # Django stuff: 58 | *.log 59 | local_settings.py 60 | db.sqlite3 61 | db.sqlite3-journal 62 | 63 | # Flask stuff: 64 | instance/ 65 | .webassets-cache 66 | 67 | # Scrapy stuff: 68 | .scrapy 69 | 70 | # Sphinx documentation 71 | docs/_build/ 72 | 73 | # PyBuilder 74 | target/ 75 | 76 | # Jupyter Notebook 77 | .ipynb_checkpoints 78 | 79 | # IPython 80 | profile_default/ 81 | ipython_config.py 82 | 83 | # pyenv 84 | .python-version 85 | 86 | # pipenv 87 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 88 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 89 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 90 | # install all needed dependencies. 91 | #Pipfile.lock 92 | 93 | # celery beat schedule file 94 | celerybeat-schedule 95 | 96 | # SageMath parsed files 97 | *.sage.py 98 | 99 | # Environments 100 | .env 101 | .venv 102 | env/ 103 | venv/ 104 | ENV/ 105 | env.bak/ 106 | venv.bak/ 107 | 108 | # Spyder project settings 109 | .spyderproject 110 | .spyproject 111 | 112 | # Rope project settings 113 | .ropeproject 114 | 115 | # mkdocs documentation 116 | /site 117 | 118 | # mypy 119 | .mypy_cache/ 120 | .dmypy.json 121 | dmypy.json 122 | 123 | # Pyre type checker 124 | .pyre/ 125 | 126 | .idea/ 127 | deploy/ 128 | test/ 129 | **/.DS_Store 130 | 131 | # direnv 132 | .envrc 133 | # AWS CDK 134 | cdk.out/ -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 
3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 
51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 15 | 16 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Manage your machine learning lifecycle with MLflow and Amazon SageMaker 2 | 3 | ### Overview 4 | 5 | In this repository we show how to deploy MLflow on AWS Fargate and how to use it during your ML project 6 | with [Amazon SageMaker](https://aws.amazon.com/sagemaker). You will use Amazon SageMaker to develop, train, tune and 7 | deploy a Scikit-Learn based ML model (Random Forest) and track experiment runs and models with MLflow. 8 | 9 | This implementation shows how to do the following: 10 | 11 | * Host a serverless MLflow server on AWS Fargate, with S3 as the artifact store and RDS as the backend store 12 | * Track experiment runs running on SageMaker with MLflow 13 | * Register models trained in SageMaker in the MLflow model registry 14 | * Deploy an MLflow model into a SageMaker endpoint 15 | 16 | ### MLflow tracking server 17 | You can set up a central MLflow tracking server for your ML project. By using this remote MLflow server, data scientists 18 | will be able to manage experiments and models in a collaborative manner. 19 | An MLflow tracking server also has two components for storage: a ```backend store``` and an ```artifact store```. This 20 | implementation uses an Amazon S3 bucket as the artifact store and an Amazon RDS for MySQL instance as the backend store. 21 | 22 | ![](media/architecture-mlflow.png) 23 | 24 | ### Prerequisites 25 | 26 | We will use [the AWS CDK](https://cdkworkshop.com/) to deploy the MLflow server. 27 | 28 | To go through this example, make sure you have the following: 29 | * An AWS account where the service will be deployed 30 | * [AWS CDK installed and configured](https://docs.aws.amazon.com/cdk/latest/guide/getting_started.html). 
Make sure to have the credentials and permissions to deploy the stack into your account 31 | * [Docker](https://www.docker.com) to build and push the MLflow container image to ECR 32 | * This [GitHub repository](https://github.com/aws-samples/amazon-sagemaker-mlflow-fargate) cloned into your environment to follow the steps 33 | 34 | ### Deploying the stack 35 | 36 | You can view the CDK stack details in [app.py](https://github.com/aws-samples/amazon-sagemaker-mlflow-fargate/blob/main/app.py). 37 | Execute the following commands to install the AWS CDK and make sure you have the right dependencies: 38 | 39 | ``` 40 | npm install -g aws-cdk@2.92.0 41 | python3 -m venv .venv 42 | source .venv/bin/activate 43 | pip3 install -r requirements.txt 44 | ``` 45 | 46 | Once this is installed, you can execute the following commands to deploy the MLflow service into your account: 47 | 48 | ``` 49 | ACCOUNT_ID=$(aws sts get-caller-identity --query Account | tr -d '"') 50 | AWS_REGION=$(aws configure get region) 51 | cdk bootstrap aws://${ACCOUNT_ID}/${AWS_REGION} 52 | cdk deploy --parameters ProjectName=mlflow --require-approval never 53 | ``` 54 | 55 | The first two commands get your account ID and current AWS region using the AWS CLI on your computer. ```cdk 56 | bootstrap``` and ```cdk deploy``` will build the container image locally, push it to ECR, and deploy the stack. 57 | 58 | The stack will take a few minutes to launch the MLflow server on AWS Fargate, with an S3 bucket and a MySQL database on 59 | RDS. You can then use the load balancer URI present in the stack outputs to access the MLflow UI: 60 | ![](media/load-balancer.png) 61 | ![](media/mlflow-interface.png) 62 | 63 | **N.B.:** In this illustrative example stack, the load balancer is launched on a public subnet and is internet facing. 64 | For security purposes, you may want to provision an internal load balancer in your VPC private subnets where there is no 65 | direct connectivity from the outside world. Here is a blog post explaining how to achieve 66 | this: [Access Private applications on AWS Fargate using Amazon API Gateway PrivateLink](https://aws.amazon.com/blogs/compute/access-private-applications-on-aws-fargate-using-amazon-api-gateway-privatelink/) 67 | 68 | ### Managing an ML lifecycle with Amazon SageMaker and MLflow 69 | 70 | You now have a remote MLflow tracking server running and accessible through 71 | a [REST API](https://mlflow.org/docs/latest/rest-api.html#rest-api) via 72 | the [load balancer URI](https://mlflow.org/docs/latest/quickstart.html#quickstart-logging-to-remote-server). 73 | You can use the MLflow Tracking API to log parameters, metrics, and models when running your machine learning project with Amazon 74 | SageMaker. For this, you will need to install the MLflow library when running your code on Amazon SageMaker and set the 75 | remote tracking URI to your load balancer address. 76 | 77 | The following Python API call points your code executing on SageMaker to your remote MLflow server: 78 | 79 | ``` 80 | import mlflow 81 | mlflow.set_tracking_uri('') 82 | ``` 83 | 84 | Connect to your notebook instance and set the remote tracking URI. A minimal end-to-end example is sketched below. 85 | ![](media/architecture-experiments.png) 86 | 
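The following is a minimal sketch of that workflow from a SageMaker notebook. It assumes the stack name ```MLflowStack``` and the ```LoadBalancerDNS``` output defined in [app.py](https://github.com/aws-samples/amazon-sagemaker-mlflow-fargate/blob/main/app.py); the experiment name and logged values are purely illustrative.

```
import boto3
import mlflow

# Look up the MLflow server address from the CloudFormation stack outputs
# (the "LoadBalancerDNS" output defined in app.py).
cfn = boto3.client("cloudformation")
outputs = cfn.describe_stacks(StackName="MLflowStack")["Stacks"][0]["Outputs"]
lb_dns = next(o["OutputValue"] for o in outputs if o["OutputKey"] == "LoadBalancerDNS")

# The stack's load balancer forwards incoming traffic to the MLflow container (port 5000)
mlflow.set_tracking_uri(f"http://{lb_dns}")
mlflow.set_experiment("boston-housing")  # created on the remote server if it does not exist

# Log a quick test run; it should then appear in the MLflow UI behind the load balancer
with mlflow.start_run(run_name="connectivity-test"):
    mlflow.log_param("n-estimators", 100)
    mlflow.log_metric("AE-at-50th-percentile", 2.5)
```

The lab notebooks follow the same pattern, except that the tracking URI is passed to the SageMaker training job as a hyperparameter and the logging happens inside the training script.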
87 | ### Running an example lab 88 | 89 | This section describes how to develop, train, tune and deploy a Random Forest model using Scikit-learn with 90 | the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/using_sklearn.html). We use 91 | the [Boston Housing dataset](https://scikit-learn.org/stable/datasets/index.html#boston-dataset), present 92 | in [Scikit-Learn](https://scikit-learn.org/stable/index.html), and log our machine learning runs into MLflow. You can 93 | find the original lab in 94 | the [SageMaker Examples](https://github.com/aws/amazon-sagemaker-examples/tree/fb04396d2e7ceeb135b0b0a516e54c97922ca0d8/sagemaker-python-sdk/scikit_learn_randomforest) 95 | repository for more details on using custom Scikit-learn scripts with Amazon SageMaker. 96 | 97 | Follow the step-by-step guide by executing the notebooks in the following folders: 98 | 99 | * lab/1_track_experiments.ipynb 100 | * lab/2_track_experiments_hpo.ipynb 101 | * lab/3_deploy_model.ipynb 102 | 103 | ### Current limitation on user access control 104 | 105 | The [open source version](https://github.com/mlflow/mlflow) of MLflow does not currently provide user access control 106 | features in case you have multiple tenants on your MLflow server. This means any user with access to the MLflow server 107 | can modify experiments, model versions, and stages. This can be a challenge for enterprises in regulated industries that 108 | need to keep strong model governance for audit purposes. 109 | 110 | ### Security 111 | 112 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 113 | 114 | ### License 115 | 116 | This library is licensed under the MIT-0 License. See the LICENSE file. 117 | 118 | -------------------------------------------------------------------------------- /app.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | from aws_cdk import ( 5 | aws_ec2 as ec2, 6 | aws_s3 as s3, 7 | aws_ecs as ecs, 8 | aws_rds as rds, 9 | aws_iam as iam, 10 | aws_secretsmanager as sm, 11 | aws_ecs_patterns as ecs_patterns, 12 | App, 13 | Stack, 14 | CfnParameter, 15 | CfnOutput, 16 | Aws, 17 | RemovalPolicy, 18 | Duration, 19 | ) 20 | from constructs import Construct 21 | 22 | 23 | class MLflowStack(Stack): 24 | def __init__(self, scope: Construct, id: str, **kwargs) -> None: 25 | super().__init__(scope, id, **kwargs) 26 | # ============================== 27 | # ======= CFN PARAMETERS ======= 28 | # ============================== 29 | project_name_param = CfnParameter(scope=self, id="ProjectName", type="String") 30 | db_name = "mlflowdb" 31 | port = 3306 32 | username = "master" 33 | bucket_name = f"{project_name_param.value_as_string}-artifacts-{Aws.ACCOUNT_ID}" 34 | container_repo_name = "mlflow-containers" 35 | cluster_name = "mlflow" 36 | service_name = "mlflow" 37 | 38 | # ================================================== 39 | # ================= IAM ROLE ======================= 40 | # ================================================== 41 | role = iam.Role( 42 | scope=self, 43 | id="TASKROLE", 44 | assumed_by=iam.ServicePrincipal(service="ecs-tasks.amazonaws.com"), 45 | ) 46 | role.add_managed_policy( 47 | iam.ManagedPolicy.from_aws_managed_policy_name("AmazonS3FullAccess") 48 | ) 49 | role.add_managed_policy( 50 | iam.ManagedPolicy.from_aws_managed_policy_name("AmazonECS_FullAccess") 51 | ) 52 | 53 | # ================================================== 54 | # ================== SECRET ======================== 55 | # ================================================== 56 | db_password_secret = sm.Secret( 57 | scope=self, 58 | id="DBSECRET", 59 | 
secret_name="dbPassword", 60 | generate_secret_string=sm.SecretStringGenerator( 61 | password_length=20, exclude_punctuation=True 62 | ), 63 | ) 64 | 65 | # ================================================== 66 | # ==================== VPC ========================= 67 | # ================================================== 68 | public_subnet = ec2.SubnetConfiguration( 69 | name="Public", subnet_type=ec2.SubnetType.PUBLIC, cidr_mask=28 70 | ) 71 | private_subnet = ec2.SubnetConfiguration( 72 | name="Private", subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS, cidr_mask=28 73 | ) 74 | isolated_subnet = ec2.SubnetConfiguration( 75 | name="DB", subnet_type=ec2.SubnetType.PRIVATE_ISOLATED, cidr_mask=28 76 | ) 77 | 78 | vpc = ec2.Vpc( 79 | scope=self, 80 | id="VPC", 81 | ip_addresses=ec2.IpAddresses.cidr("10.0.0.0/24"), 82 | max_azs=2, 83 | nat_gateway_provider=ec2.NatProvider.gateway(), 84 | nat_gateways=1, 85 | subnet_configuration=[public_subnet, private_subnet, isolated_subnet], 86 | ) 87 | vpc.add_gateway_endpoint( 88 | "S3Endpoint", service=ec2.GatewayVpcEndpointAwsService.S3 89 | ) 90 | # ================================================== 91 | # ================= S3 BUCKET ====================== 92 | # ================================================== 93 | artifact_bucket = s3.Bucket( 94 | scope=self, 95 | id="ARTIFACTBUCKET", 96 | bucket_name=bucket_name, 97 | public_read_access=False, 98 | ) 99 | # # ================================================== 100 | # # ================== DATABASE ===================== 101 | # # ================================================== 102 | # Creates a security group for AWS RDS 103 | sg_rds = ec2.SecurityGroup( 104 | scope=self, id="SGRDS", vpc=vpc, security_group_name="sg_rds" 105 | ) 106 | # Adds an ingress rule which allows resources in the VPC's CIDR to access the database. 
107 | sg_rds.add_ingress_rule( 108 | peer=ec2.Peer.ipv4("10.0.0.0/24"), connection=ec2.Port.tcp(port) 109 | ) 110 | 111 | database = rds.DatabaseInstance( 112 | scope=self, 113 | id="MYSQL", 114 | database_name=db_name, 115 | port=port, 116 | credentials=rds.Credentials.from_username( 117 | username=username, password=db_password_secret.secret_value 118 | ), 119 | engine=rds.DatabaseInstanceEngine.mysql( 120 | version=rds.MysqlEngineVersion.VER_8_0_34 121 | ), 122 | instance_type=ec2.InstanceType.of( 123 | ec2.InstanceClass.M5, ec2.InstanceSize.LARGE 124 | ), 125 | vpc=vpc, 126 | security_groups=[sg_rds], 127 | vpc_subnets=ec2.SubnetSelection( 128 | subnet_type=ec2.SubnetType.PRIVATE_ISOLATED 129 | ), 130 | # multi_az=True, 131 | removal_policy=RemovalPolicy.DESTROY, 132 | deletion_protection=False, 133 | ) 134 | # ================================================== 135 | # =============== FARGATE SERVICE ================== 136 | # ================================================== 137 | cluster = ecs.Cluster( 138 | scope=self, id="CLUSTER", cluster_name=cluster_name, vpc=vpc 139 | ) 140 | 141 | task_definition = ecs.FargateTaskDefinition( 142 | scope=self, 143 | id="MLflow", 144 | task_role=role, 145 | cpu=4 * 1024, 146 | memory_limit_mib=8 * 1024, 147 | ) 148 | 149 | container = task_definition.add_container( 150 | id="Container", 151 | image=ecs.ContainerImage.from_asset(directory="container"), 152 | environment={ 153 | "BUCKET": f"s3://{artifact_bucket.bucket_name}", 154 | "HOST": database.db_instance_endpoint_address, 155 | "PORT": str(port), 156 | "DATABASE": db_name, 157 | "USERNAME": username, 158 | }, 159 | secrets={"PASSWORD": ecs.Secret.from_secrets_manager(db_password_secret)}, 160 | logging=ecs.LogDriver.aws_logs(stream_prefix="mlflow"), 161 | ) 162 | port_mapping = ecs.PortMapping( 163 | container_port=5000, host_port=5000, protocol=ecs.Protocol.TCP 164 | ) 165 | container.add_port_mappings(port_mapping) 166 | 167 | fargate_service = ecs_patterns.NetworkLoadBalancedFargateService( 168 | scope=self, 169 | id="MLFLOW", 170 | service_name=service_name, 171 | cluster=cluster, 172 | task_definition=task_definition, 173 | ) 174 | 175 | # Setup security group 176 | fargate_service.service.connections.security_groups[0].add_ingress_rule( 177 | peer=ec2.Peer.ipv4(vpc.vpc_cidr_block), 178 | connection=ec2.Port.tcp(5000), 179 | description="Allow inbound from VPC for mlflow", 180 | ) 181 | 182 | # Setup autoscaling policy 183 | scaling = fargate_service.service.auto_scale_task_count(max_capacity=2) 184 | scaling.scale_on_cpu_utilization( 185 | id="AUTOSCALING", 186 | target_utilization_percent=70, 187 | scale_in_cooldown=Duration.seconds(60), 188 | scale_out_cooldown=Duration.seconds(60), 189 | ) 190 | # ================================================== 191 | # =================== OUTPUTS ====================== 192 | # ================================================== 193 | CfnOutput( 194 | scope=self, 195 | id="LoadBalancerDNS", 196 | value=fargate_service.load_balancer.load_balancer_dns_name, 197 | ) 198 | 199 | 200 | app = App() 201 | MLflowStack(app, "MLflowStack") 202 | app.synth() 203 | -------------------------------------------------------------------------------- /cdk.json: -------------------------------------------------------------------------------- 1 | { 2 | "app": "python3 app.py" 3 | } 4 | -------------------------------------------------------------------------------- /container/Dockerfile: 
-------------------------------------------------------------------------------- 1 | FROM python:3.10.12 2 | 3 | RUN pip install \ 4 | mlflow==2.6.0 \ 5 | pymysql==1.0.2 \ 6 | boto3 && \ 7 | mkdir /mlflow/ 8 | 9 | EXPOSE 5000 10 | 11 | CMD mlflow server \ 12 | --host 0.0.0.0 \ 13 | --port 5000 \ 14 | --default-artifact-root ${BUCKET} \ 15 | --backend-store-uri mysql+pymysql://${USERNAME}:${PASSWORD}@${HOST}:${PORT}/${DATABASE} 16 | -------------------------------------------------------------------------------- /deploy_stack.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # INSTALL NODE 4 | # sudo yum install -y gcc-c++ make 5 | # curl -sL https://rpm.nodesource.com/setup_16.x | sudo bash - 6 | # sudo yum install -y nodejs 7 | 8 | 9 | # INSTALL CDK 10 | # npm install -g aws-cdk@2.92.0 11 | # python3 -m venv .venv 12 | # source .venv/bin/activate 13 | # pip3 install -r requirements.txt 14 | 15 | 16 | # DEPLOY STACK 17 | ACCOUNT_ID=$(aws sts get-caller-identity --query Account | tr -d '"') 18 | AWS_REGION=$(aws configure get region) 19 | cdk bootstrap aws://${ACCOUNT_ID}/${AWS_REGION} 20 | cdk deploy --parameters ProjectName=mlflow --require-approval never -------------------------------------------------------------------------------- /lab/1_track_experiments.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Train a Scikit-Learn model in SageMaker and track with MLFlow" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Setup Environment" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "!pip install -q --upgrade pip\n", 24 | "!pip install -q --upgrade sagemaker==2.117.0" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": null, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "import sagemaker\n", 34 | "import pandas as pd\n", 35 | "from sklearn.datasets import load_boston\n", 36 | "from sagemaker.sklearn.estimator import SKLearn\n", 37 | "from sklearn.model_selection import train_test_split\n", 38 | "\n", 39 | "sess = sagemaker.Session()\n", 40 | "role = sagemaker.get_execution_role()\n", 41 | "bucket = sess.default_bucket()\n", 42 | "\n", 43 | "# uri of your remote mlflow server\n", 44 | "tracking_uri = '' " 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "## Prepare data\n", 52 | "We load a dataset from sklearn, split it and send it to S3" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": null, 58 | "metadata": {}, 59 | "outputs": [], 60 | "source": [ 61 | "# we use the Boston housing dataset \n", 62 | "data = load_boston()\n", 63 | "\n", 64 | "X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.25, random_state=42)\n", 65 | "\n", 66 | "trainX = pd.DataFrame(X_train, columns=data.feature_names)\n", 67 | "trainX['target'] = y_train\n", 68 | "\n", 69 | "testX = pd.DataFrame(X_test, columns=data.feature_names)\n", 70 | "testX['target'] = y_test\n", 71 | "\n", 72 | "trainX.to_csv('boston_train.csv')\n", 73 | "testX.to_csv('boston_test.csv')" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": null, 79 | "metadata": {}, 80 | "outputs": [], 81 | "source": [ 82 | "# send data to S3. 
SageMaker will take training data from s3\n", 83 | "train_path = sess.upload_data(path='boston_train.csv', bucket=bucket, key_prefix='sagemaker/sklearncontainer')\n", 84 | "test_path = sess.upload_data(path='boston_test.csv', bucket=bucket, key_prefix='sagemaker/sklearncontainer')" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "## Train" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": null, 97 | "metadata": {}, 98 | "outputs": [], 99 | "source": [ 100 | "hyperparameters = {\n", 101 | " 'tracking_uri': tracking_uri,\n", 102 | " 'experiment_name': 'boston-housing',\n", 103 | " 'n-estimators': 100,\n", 104 | " 'min-samples-leaf': 3,\n", 105 | " 'features': 'CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT',\n", 106 | " 'target': 'target'\n", 107 | "}\n", 108 | "\n", 109 | "metric_definitions = [{'Name': 'median-AE', 'Regex': \"AE-at-50th-percentile: ([0-9.]+).*$\"}]\n", 110 | "\n", 111 | "estimator = SKLearn(\n", 112 | " entry_point='train.py',\n", 113 | " source_dir='source_dir',\n", 114 | " role=role,\n", 115 | " metric_definitions=metric_definitions,\n", 116 | " hyperparameters=hyperparameters,\n", 117 | " instance_count=1,\n", 118 | " instance_type='ml.m5.xlarge',\n", 119 | " framework_version='1.0-1',\n", 120 | " base_job_name='mlflow',\n", 121 | ")" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "metadata": { 128 | "scrolled": true 129 | }, 130 | "outputs": [], 131 | "source": [ 132 | "estimator.fit({'train':train_path, 'test': test_path})" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": null, 138 | "metadata": {}, 139 | "outputs": [], 140 | "source": [] 141 | } 142 | ], 143 | "metadata": { 144 | "kernelspec": { 145 | "display_name": "conda_python3", 146 | "language": "python", 147 | "name": "conda_python3" 148 | }, 149 | "language_info": { 150 | "codemirror_mode": { 151 | "name": "ipython", 152 | "version": 3 153 | }, 154 | "file_extension": ".py", 155 | "mimetype": "text/x-python", 156 | "name": "python", 157 | "nbconvert_exporter": "python", 158 | "pygments_lexer": "ipython3", 159 | "version": "3.8.12" 160 | }, 161 | "vscode": { 162 | "interpreter": { 163 | "hash": "3b41de70bedc0e302a3aeb58a0c77b854f2e56c8930e61a4aaa3340c96b01f1d" 164 | } 165 | } 166 | }, 167 | "nbformat": 4, 168 | "nbformat_minor": 2 169 | } 170 | -------------------------------------------------------------------------------- /lab/2_track_experiments_hpo.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Tune a Scikit-Learn model in SageMaker and track with MLFlow" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Setup environment" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import sagemaker\n", 24 | "import pandas as pd\n", 25 | "from sklearn.datasets import load_boston\n", 26 | "from sagemaker.sklearn.estimator import SKLearn\n", 27 | "from sklearn.model_selection import train_test_split\n", 28 | "from sagemaker.tuner import IntegerParameter, HyperparameterTuner\n", 29 | "\n", 30 | "sess = sagemaker.Session()\n", 31 | "role = sagemaker.get_execution_role()\n", 32 | "bucket = sess.default_bucket()\n", 33 | "\n", 34 | "# uri of your remote mlflow server\n", 35 | "tracking_uri = '' " 36 | 
] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "## Prepare data\n", 43 | "We load a dataset from sklearn, split it and send it to S3" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": null, 49 | "metadata": {}, 50 | "outputs": [], 51 | "source": [ 52 | "# we use the Boston housing dataset \n", 53 | "data = load_boston()\n", 54 | "\n", 55 | "X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.25, random_state=42)\n", 56 | "\n", 57 | "trainX = pd.DataFrame(X_train, columns=data.feature_names)\n", 58 | "trainX['target'] = y_train\n", 59 | "\n", 60 | "testX = pd.DataFrame(X_test, columns=data.feature_names)\n", 61 | "testX['target'] = y_test\n", 62 | "\n", 63 | "trainX.to_csv('boston_train.csv')\n", 64 | "testX.to_csv('boston_test.csv')" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [ 73 | "# send data to S3. SageMaker will take training data from s3\n", 74 | "train_path = sess.upload_data(path='boston_train.csv', bucket=bucket, key_prefix='sagemaker/sklearncontainer')\n", 75 | "test_path = sess.upload_data(path='boston_test.csv', bucket=bucket, key_prefix='sagemaker/sklearncontainer')" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "## Tune" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": null, 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [ 91 | "hyperparameters = {\n", 92 | " 'tracking_uri': tracking_uri,\n", 93 | " 'experiment_name': 'boston-housing',\n", 94 | " 'features': 'CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT',\n", 95 | " 'target': 'target'\n", 96 | "}\n", 97 | "\n", 98 | "metric_definitions = [{'Name': 'median-AE', 'Regex': \"AE-at-50th-percentile: ([0-9.]+).*$\"}]" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [ 107 | "estimator = SKLearn(\n", 108 | " entry_point='train.py',\n", 109 | " source_dir='source_dir',\n", 110 | " role=role,\n", 111 | " instance_count=1,\n", 112 | " instance_type='ml.m5.xlarge',\n", 113 | " hyperparameters=hyperparameters,\n", 114 | " metric_definitions=metric_definitions,\n", 115 | " framework_version='1.0-1',\n", 116 | " py_version='py3'\n", 117 | ")\n", 118 | "\n", 119 | "hyperparameter_ranges = {\n", 120 | " 'n-estimators': IntegerParameter(50, 200),\n", 121 | " 'min-samples-leaf': IntegerParameter(1, 10)\n", 122 | "}\n", 123 | "\n", 124 | "objective_metric_name = 'median-AE'\n", 125 | "objective_type = 'Minimize'" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": null, 131 | "metadata": {}, 132 | "outputs": [], 133 | "source": [ 134 | "tuner = HyperparameterTuner(estimator,\n", 135 | " objective_metric_name,\n", 136 | " hyperparameter_ranges,\n", 137 | " metric_definitions,\n", 138 | " max_jobs=20,\n", 139 | " max_parallel_jobs=2,\n", 140 | " objective_type=objective_type,\n", 141 | " base_tuning_job_name='mlflow')\n", 142 | "\n", 143 | "tuner.fit({'train':train_path, 'test': test_path})" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": null, 149 | "metadata": {}, 150 | "outputs": [], 151 | "source": [] 152 | } 153 | ], 154 | "metadata": { 155 | "kernelspec": { 156 | "display_name": "conda_python3", 157 | "language": "python", 158 | "name": "conda_python3" 159 | }, 160 | "language_info": { 161 | "codemirror_mode": { 162 | 
"name": "ipython", 163 | "version": 3 164 | }, 165 | "file_extension": ".py", 166 | "mimetype": "text/x-python", 167 | "name": "python", 168 | "nbconvert_exporter": "python", 169 | "pygments_lexer": "ipython3", 170 | "version": "3.8.12" 171 | }, 172 | "notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.", 173 | "vscode": { 174 | "interpreter": { 175 | "hash": "3b41de70bedc0e302a3aeb58a0c77b854f2e56c8930e61a4aaa3340c96b01f1d" 176 | } 177 | } 178 | }, 179 | "nbformat": 4, 180 | "nbformat_minor": 2 181 | } 182 | -------------------------------------------------------------------------------- /lab/3_deploy_model.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Deploy an MLflow model with SageMaker" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Install MLflow" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "!pip install -q mlflow==2.6.0" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "## Setup environment" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": null, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "import json\n", 40 | "import boto3\n", 41 | "import mlflow\n", 42 | "import sagemaker\n", 43 | "import pandas as pd\n", 44 | "import mlflow.sagemaker\n", 45 | "from sklearn.datasets import load_boston\n", 46 | "from mlflow.deployments import get_deploy_client\n", 47 | "\n", 48 | "# name of the AWS region to which to deploy the application\n", 49 | "region = sagemaker.Session().boto_region_name\n", 50 | "# we are using the notebook instance role for training in this example\n", 51 | "role = sagemaker.get_execution_role() \n", 52 | "# uri of your remote mlflow server\n", 53 | "tracking_uri = '' \n", 54 | "# set remote mlflow server\n", 55 | "mlflow.set_tracking_uri(tracking_uri)" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "## Build MLflow docker image to serve the model with SageMaker " 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "!mlflow sagemaker build-and-push-container" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": null, 77 | "metadata": {}, 78 | "outputs": [], 79 | "source": [ 80 | "# URL of the ECR-hosted Docker image the model should be deployed into\n", 81 | "image_uri = ''" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "## Deploy a SageMaker endpoint with our scikit-learn model" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": null, 94 | "metadata": {}, 95 | "outputs": [], 96 | "source": [ 97 | "endpoint_name = 'boston-housing'\n", 98 | "# The location, in URI format, of the MLflow model 
to deploy to SageMaker.\n", 99 | "model_uri = 'models:/boston/1'" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": null, 105 | "metadata": {}, 106 | "outputs": [], 107 | "source": [ 108 | "config={\n", 109 | " 'execution_role_arn': role,\n", 110 | " 'image_url': image_uri,\n", 111 | " 'instance_type': 'ml.m5.xlarge',\n", 112 | " 'instance_count': 1, \n", 113 | " 'region_name': region\n", 114 | "}\n", 115 | "\n", 116 | "client = get_deploy_client(\"sagemaker\")\n", 117 | "\n", 118 | "client.create_deployment(\n", 119 | " name=endpoint_name,\n", 120 | " model_uri=model_uri,\n", 121 | " flavor='python_function',\n", 122 | " config=config\n", 123 | ")" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "## Predict" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": null, 136 | "metadata": { 137 | "scrolled": true 138 | }, 139 | "outputs": [], 140 | "source": [ 141 | "# load boston dataset\n", 142 | "data = load_boston()\n", 143 | "df = pd.DataFrame(data.data, columns=data.feature_names)" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": null, 149 | "metadata": {}, 150 | "outputs": [], 151 | "source": [ 152 | "client = get_deploy_client(f\"sagemaker:/{region}\")\n", 153 | "\n", 154 | "payload = df.iloc[[0]]\n", 155 | "prediction = client.predict(endpoint_name, df.iloc[[0]])\n", 156 | "\n", 157 | "print(f'Payload: {payload}')\n", 158 | "print(f'Prediction: {prediction}')" 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "## Delete endpoint" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": null, 171 | "metadata": {}, 172 | "outputs": [], 173 | "source": [ 174 | "client.delete_deployment(endpoint_name, config=config)" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": null, 180 | "metadata": {}, 181 | "outputs": [], 182 | "source": [] 183 | } 184 | ], 185 | "metadata": { 186 | "kernelspec": { 187 | "display_name": "conda_python3", 188 | "language": "python", 189 | "name": "conda_python3" 190 | }, 191 | "language_info": { 192 | "codemirror_mode": { 193 | "name": "ipython", 194 | "version": 3 195 | }, 196 | "file_extension": ".py", 197 | "mimetype": "text/x-python", 198 | "name": "python", 199 | "nbconvert_exporter": "python", 200 | "pygments_lexer": "ipython3", 201 | "version": "3.8.12" 202 | }, 203 | "notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." 204 | }, 205 | "nbformat": 4, 206 | "nbformat_minor": 2 207 | } 208 | -------------------------------------------------------------------------------- /lab/source_dir/requirements.txt: -------------------------------------------------------------------------------- 1 | mlflow==2.16.0 2 | -------------------------------------------------------------------------------- /lab/source_dir/setup.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. 
All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | from setuptools import setup, find_packages 5 | 6 | setup(name='sagemaker-example', 7 | version='1.0', 8 | description='SageMaker MLFlow Example.', 9 | author='sofian', 10 | author_email='hamitis@amazon.com', 11 | packages=find_packages(exclude=('tests', 'docs'))) -------------------------------------------------------------------------------- /lab/source_dir/train.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import os 5 | import logging 6 | import argparse 7 | import numpy as np 8 | import pandas as pd 9 | from sklearn.ensemble import RandomForestRegressor 10 | import mlflow 11 | import mlflow.sklearn 12 | 13 | logging.basicConfig(level=logging.INFO) 14 | 15 | 16 | if __name__ =='__main__': 17 | parser = argparse.ArgumentParser() 18 | # MLflow related parameters 19 | parser.add_argument("--tracking_uri", type=str) 20 | parser.add_argument("--experiment_name", type=str) 21 | # hyperparameters sent by the client are passed as command-line arguments to the script. 22 | # to simplify the demo we don't use all sklearn RandomForest hyperparameters 23 | parser.add_argument('--n-estimators', type=int, default=10) 24 | parser.add_argument('--min-samples-leaf', type=int, default=3) 25 | 26 | # Data, model, and output directories 27 | parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR')) 28 | parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN')) 29 | parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST')) 30 | parser.add_argument('--train-file', type=str, default='boston_train.csv') 31 | parser.add_argument('--test-file', type=str, default='boston_test.csv') 32 | parser.add_argument('--features', type=str) # we ask user to explicitly name features 33 | parser.add_argument('--target', type=str) # we ask user to explicitly name the target 34 | 35 | args, _ = parser.parse_known_args() 36 | 37 | logging.info('reading data') 38 | train_df = pd.read_csv(os.path.join(args.train, args.train_file)) 39 | test_df = pd.read_csv(os.path.join(args.test, args.test_file)) 40 | 41 | logging.info('building training and testing datasets') 42 | X_train = train_df[args.features.split()] 43 | X_test = test_df[args.features.split()] 44 | y_train = train_df[args.target] 45 | y_test = test_df[args.target] 46 | 47 | 48 | # set remote mlflow server 49 | mlflow.set_tracking_uri(args.tracking_uri) 50 | mlflow.set_experiment(args.experiment_name) 51 | 52 | with mlflow.start_run(): 53 | params = { 54 | "n-estimators": args.n_estimators, 55 | "min-samples-leaf": args.min_samples_leaf, 56 | "features": args.features 57 | } 58 | mlflow.log_params(params) 59 | 60 | # TRAIN 61 | logging.info('training model') 62 | model = RandomForestRegressor( 63 | n_estimators=args.n_estimators, 64 | min_samples_leaf=args.min_samples_leaf, 65 | n_jobs=-1 66 | ) 67 | 68 | model.fit(X_train, y_train) 69 | 70 | # ABS ERROR AND LOG COUPLE PERF METRICS 71 | logging.info('evaluating model') 72 | abs_err = np.abs(model.predict(X_test) - y_test) 73 | 74 | for q in [10, 50, 90]: 75 | logging.info(f'AE-at-{q}th-percentile: {np.percentile(a=abs_err, q=q)}') 76 | mlflow.log_metric(f'AE-at-{str(q)}th-percentile', np.percentile(a=abs_err, q=q)) 77 | 78 | # SAVE MODEL 79 | logging.info('saving model in MLflow') 80 | mlflow.sklearn.log_model(model, "model") 
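Note: the deployment notebook (```lab/3_deploy_model.ipynb```) reads the model from the registry via the URI ```models:/boston/1```, so a run logged by ```train.py``` must first be registered under that name. A minimal sketch, assuming the placeholder tracking URI and run ID are replaced with your own values:

```
import mlflow

mlflow.set_tracking_uri('<load balancer uri>')  # same remote server used by the labs

# ID of the training run whose "model" artifact should be promoted;
# you can copy it from the run page in the MLflow UI.
run_id = '<run id>'

# Register the run's logged model under the name expected by lab/3_deploy_model.ipynb
model_version = mlflow.register_model(f"runs:/{run_id}/model", "boston")
print(model_version.name, model_version.version)  # e.g. 'boston', 1 -> models:/boston/1
```

Alternatively, passing ```registered_model_name='boston'``` to ```mlflow.sklearn.log_model``` in ```train.py``` registers the model as part of the training run itself.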
-------------------------------------------------------------------------------- /media/architecture-experiments.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-mlflow-fargate/6e0e305d546e53aa6c580a71731a94c1df331872/media/architecture-experiments.png -------------------------------------------------------------------------------- /media/architecture-mlflow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-mlflow-fargate/6e0e305d546e53aa6c580a71731a94c1df331872/media/architecture-mlflow.png -------------------------------------------------------------------------------- /media/load-balancer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-mlflow-fargate/6e0e305d546e53aa6c580a71731a94c1df331872/media/load-balancer.png -------------------------------------------------------------------------------- /media/mlflow-interface.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-mlflow-fargate/6e0e305d546e53aa6c580a71731a94c1df331872/media/mlflow-interface.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aws-cdk-lib==2.92.0 2 | constructs==10.2.69 3 | --------------------------------------------------------------------------------