├── .gitignore
├── 05-experiment-tracking
│   ├── 06-MLflow-and-DVC-project.md
│   ├── 03-basic-MLflow-installation.md
│   ├── 04-basic-MLflow-on-Kubernetes.md
│   ├── 05-MLflow-prod-setup.md
│   ├── 01-what-is-experiment-tracking.md
│   └── 02-what-is-MLflow.md
├── 03-role-of-mlops
│   ├── 01-introduction.md
│   ├── 04-ml-engineers-without-mlops.md
│   ├── 05-how-mlops-engineers-help-ml-engineers.md
│   ├── 03-how-mlops-help-datascientists.md
│   └── 02-data-scientists-without-mlops.md
├── 06-fundamentals-of-model-deployment.md
│   ├── 03-project-for-deployment.md
│   ├── 01-introduction-to-deployment-and-serving.md
│   └── 02-popular-ways.md
├── 07-deploy-and-serving-using-vms
│   ├── 02-implementing-wsgi.md
│   ├── 00-IMPORTANT.md
│   └── 01-architecture.md
├── 04-versioning-and-experiment-tracking
│   ├── 03-DVC-hands-on.md
│   ├── 02-introduction-to-dvc.md
│   └── 01-what-is-data-versioning.md
├── README.md
├── 08-kserve
│   ├── 02-architecture.md
│   ├── 01-Introduction.md
│   └── 03-end-to-end-demo.md
├── 09-SageMaker
│   ├── 01-introduction.md
│   └── 02-production-setup.md
└── 02-introduction-to-mlops
    ├── 03-what-is-mlops.md
    ├── 01-what-is-machine-learning-and-model.md
    ├── 05-ds-vs-ml-vs-mlops.md
    ├── 02-steps-to-create-a-model.md
    └── 04-machine-learning-lifecycle-overview.md
/.gitignore:
--------------------------------------------------------------------------------
1 | .venv/
2 | .vscode/
--------------------------------------------------------------------------------
/05-experiment-tracking/06-MLflow-and-DVC-project.md:
--------------------------------------------------------------------------------
1 | Please refer to the repository below for this lecture.
2 |
3 | https://github.com/iam-veeramalla/Wine-Prediction-Model
--------------------------------------------------------------------------------
/05-experiment-tracking/03-basic-MLflow-installation.md:
--------------------------------------------------------------------------------
1 | Please refer to the documentation below for this lecture.
2 |
3 | https://mlflow.org/docs/2.4.2/quickstart.html#install-mlflow
--------------------------------------------------------------------------------
/03-role-of-mlops/01-introduction.md:
--------------------------------------------------------------------------------
1 | # Introduction
2 |
3 | Please refer to the repository below for all the project files and notes.
4 |
5 | https://github.com/iam-veeramalla/hello-world-mlops
--------------------------------------------------------------------------------
/05-experiment-tracking/04-basic-MLflow-on-Kubernetes.md:
--------------------------------------------------------------------------------
1 | Please refer to the documentation below for this lecture.
2 |
3 | https://community-charts.github.io/docs/charts/mlflow/basic-installation
--------------------------------------------------------------------------------
/05-experiment-tracking/05-MLflow-prod-setup.md:
--------------------------------------------------------------------------------
1 | Please refer to the document below for the next lecture.
2 |
3 | https://community-charts.github.io/docs/charts/mlflow/postgresql-backend-installation
--------------------------------------------------------------------------------
/06-fundamentals-of-model-deployment.md/03-project-for-deployment.md:
--------------------------------------------------------------------------------
1 | Please refer to the GitHub repository below for this lecture.
2 |
3 | https://github.com/iam-veeramalla/Intent-classifier-model
--------------------------------------------------------------------------------
/07-deploy-and-serving-using-vms/02-implementing-wsgi.md:
--------------------------------------------------------------------------------
1 | Please refer to the repository below for the complete project files and notes.
2 |
3 | https://github.com/iam-veeramalla/Intent-classifier-model/tree/virtual-machines
--------------------------------------------------------------------------------
/04-versioning-and-experiment-tracking/03-DVC-hands-on.md:
--------------------------------------------------------------------------------
1 | # Learn DVC using a project
2 |
3 | Please refer to the repository below for this lecture.
4 |
5 | https://github.com/iam-veeramalla/Wine-Prediction-Model
6 |
--------------------------------------------------------------------------------
/07-deploy-and-serving-using-vms/00-IMPORTANT.md:
--------------------------------------------------------------------------------
1 | # Important Note
2 |
3 | Please refer to the virtual-machines branch of the Intent Classifier repo for this section.
4 |
5 | Link:
6 |
7 | https://github.com/iam-veeramalla/Intent-classifier-model/tree/virtual-machines
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # MLOps Zero to Hero
2 |
3 | Notes for my Udemy course - MLOps Zero to Hero
4 |
5 | https://www.udemy.com/user/abhishek-veeramalla/?srsltid=AfmBOopEdZFhCNtWblrQcXgZa3LAzdW2Zg7b31Tu6ruW5TQ_GdD0qdOe
6 |
7 |
8 |
--------------------------------------------------------------------------------
/08-kserve/02-architecture.md:
--------------------------------------------------------------------------------
1 | # KServe Architecture
2 |
3 |
4 |
5 |
6 |
7 |
--------------------------------------------------------------------------------
/07-deploy-and-serving-using-vms/01-architecture.md:
--------------------------------------------------------------------------------
1 | # Architecture
2 |
3 | Internet (client)
4 | │
5 | ▼
6 | Internet Gateway (IGW) attached to VPC
7 | │
8 | ▼
9 | Application Load Balancer (ALB) — internet-facing (ENIs in Public Subnets A & B)
10 | │ (Listener: HTTP 80)
11 | ▼
12 | Target Group (HTTP: 80, Health-check: /predict)
13 | │
14 | ▼
15 | Auto Scaling Group (ASG)
16 | │
17 | ▼
18 | EC2 Instance (in a Public Subnet) ──> Nginx (listen :80) ── proxy_pass──> Gunicorn (127.0.0.1:6000) ──> WSGI app (/predict)
19 |
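To make the last hop concrete, here is a minimal sketch of the WSGI app that Gunicorn serves on 127.0.0.1:6000 and that Nginx proxies to. This is illustrative only (the real code lives in the Intent Classifier repo); the model path and input field are placeholders.

```
# app.py - minimal Flask/WSGI app sitting behind Gunicorn and Nginx
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the model once at import time, so each Gunicorn worker reuses it
# instead of reloading it on every request.
model = joblib.load("model.joblib")  # placeholder artifact path

@app.route("/predict", methods=["GET", "POST"])
def predict():
    # GET is answered with a simple OK so the ALB target-group health check
    # on /predict (see the diagram above) passes.
    if request.method == "GET":
        return jsonify({"status": "ok"})
    payload = request.get_json(silent=True) or {}
    text = payload.get("text")  # placeholder input field
    if text is None:
        return jsonify({"error": "missing 'text' field"}), 400
    return jsonify({"prediction": str(model.predict([text])[0])})

# Run with Gunicorn bound to the loopback address from the diagram:
#   gunicorn --workers 2 --bind 127.0.0.1:6000 app:app
# Nginx listens on :80 and forwards to 127.0.0.1:6000 via proxy_pass.
```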
--------------------------------------------------------------------------------
/03-role-of-mlops/04-ml-engineers-without-mlops.md:
--------------------------------------------------------------------------------
1 | # Role of an ML Engineer in a Project
2 |
3 | Once a Data Scientist builds a working model, the question becomes:
4 |
5 | “How do we let real users or applications use this model?”
6 |
7 | This is where the ML Engineer steps in.
8 |
9 | ### Turn the Model into an API
10 |
11 | A trained model by itself is just a file.
12 | An ML Engineer’s first responsibility is to wrap the model with an API (a minimal sketch appears at the end of this note).
13 |
14 | They:
15 |
16 | - Load the trained model
17 | - Accept input from users or applications (usually JSON)
18 | - Run predictions using the model
19 | - Return the result as a response
20 |
21 | Now the model can be:
22 |
23 | - Called by a frontend
24 | - Used by backend services
25 | - Integrated into real applications
26 |
27 | The model becomes usable, not just theoretical.
28 |
29 | ### Handle Input and Output Safely
30 |
31 | In the real world, users can send bad data.
32 |
33 | An ML Engineer ensures:
34 |
35 | - Inputs are validated
36 | - Missing or incorrect fields are handled
37 | - Errors don’t crash the service
38 |
39 | This prevents:
40 |
41 | - Application failures
42 | - Incorrect predictions
43 | - Production incidents
44 |
45 | ### Make the Model Fast and Efficient
46 |
47 | A model that works in a notebook may be:
48 |
49 | slow, memory-heavy, and not optimized for repeated requests.
50 |
51 | ML Engineers:
52 |
53 | - Optimize how the model is loaded
54 | - Avoid reloading the model for every request
55 | - Ensure predictions are fast enough for real users
56 |
57 |
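Put together, the three responsibilities above fit in a short service. Below is a minimal sketch (illustrative, not the course project code) using FastAPI and a hypothetical saved scikit-learn model; the file name and input fields are placeholders.

```
# serve.py - wrap a trained model with an API, validate input, load once
import joblib
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

# Load the trained model once at startup, not on every request.
model = joblib.load("model.joblib")  # placeholder path

class PredictRequest(BaseModel):
    # Input validation: missing or wrongly typed fields are rejected with a
    # 422 response instead of crashing the service.
    features: list[float]

@app.post("/predict")
def predict(req: PredictRequest):
    try:
        prediction = model.predict([req.features])[0]
    except Exception as exc:
        # Unexpected-but-parseable input should return an error, not a crash.
        raise HTTPException(status_code=400, detail=str(exc))
    return {"prediction": str(prediction)}

# Run with: uvicorn serve:app
# Example request: POST /predict  {"features": [7.4, 0.7, 0.0, 1.9]}
```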
--------------------------------------------------------------------------------
/08-kserve/01-Introduction.md:
--------------------------------------------------------------------------------
1 | # Introduction to KServe
2 |
3 | Imagine you’ve trained a machine learning model, maybe a classifier, a recommender, or anything else.
4 | The next big question is: How do you deploy this model so real users or applications can send requests and get predictions?
5 |
6 | KServe is a tool that solves exactly this problem.
7 |
8 | ### What is KServe?
9 |
10 | KServe is a Kubernetes-native platform designed to deploy and serve ML models easily, reliably, and at scale.
11 |
12 | In even simpler words:
13 | KServe takes your ML model and turns it into a production-ready API running on Kubernetes without you writing a lot of server code.
14 |
15 | ### Why KServe Exists
16 |
17 | Traditional model deployment is painful:
18 | - You need to write Flask or FastAPI code
19 | - You need to containerize the app
20 | - You need to expose endpoints
21 | - You need to manage scaling, logging, networking
22 | - You need to monitor and version your models
23 |
24 | KServe removes most of this effort by providing standardized, ready-to-use model servers.
25 |
26 | ### What KServe Actually Does
27 |
28 | KServe provides:
29 |
30 | ### Standard Model Servers
31 |
32 | For popular frameworks like:
33 | - TensorFlow
34 | - PyTorch
35 | - Scikit-learn
36 | - XGBoost
37 | - ONNX
38 |
39 | You simply point KServe to your model file (a storage URI), and it deploys everything automatically.
40 |
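For example, once KServe has deployed the model, clients call it with a plain HTTP request using KServe's V1 prediction protocol. The hostname and model name below are placeholders (the sklearn-iris quickstart model is assumed).

```
# Call a deployed InferenceService over KServe's V1 REST protocol.
import requests

url = "http://sklearn-iris.ml.example.com/v1/models/sklearn-iris:predict"
payload = {"instances": [[5.1, 3.5, 1.4, 0.2]]}  # one iris flower sample

resp = requests.post(url, json=payload, timeout=10)
print(resp.json())  # e.g. {"predictions": [0]}
```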
41 | ### Automatic Scaling
42 |
43 | Your model API can:
44 | - Scale up when traffic increases
45 | - Scale down to zero when idle (saving huge costs)
46 |
47 | This is powered by Knative under the hood.
48 |
--------------------------------------------------------------------------------
/09-SageMaker/01-introduction.md:
--------------------------------------------------------------------------------
1 | # What is SageMaker?
2 |
3 | AWS SageMaker is Amazon’s fully managed platform for building, training, and deploying machine learning models at scale.
4 |
5 | In real-world ML systems, the actual training code is only 5–10% of the work.
6 |
7 | MLOps challenges include:
8 |
9 | - Environment and dependency management
10 | - Scalable training workloads
11 | - Handling large datasets
12 | - Model versioning
13 | - Model registry
14 | - Automated deployments
15 | - Monitoring predictions & model drift
16 | - Cost control for GPUs/instances
17 |
18 | SageMaker bundles these into managed services so that MLOps engineers can avoid building the entire ML control plane from scratch.
19 |
20 | ### What MLOps Engineers Actually Do With SageMaker
21 |
22 | A) Build ML Environments
23 |
24 | - Prepare Docker images with Python/ML libraries
25 | - Manage dependency consistency
26 | - Use CDK/Terraform to provision infrastructure
27 |
28 | B) Automate Training
29 |
30 | - Use SageMaker Training Jobs with CI pipelines
31 | - Configure distributed training
32 | - Use Spot instances to control cost
33 |
34 | C) Manage Model Registry
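As an illustration of (B), launching a training job with the SageMaker Python SDK might look like the sketch below. The container image, role ARN, S3 paths, and instance type are placeholders, not values from this course.

```
# Launch a managed SageMaker training job on Spot capacity.
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/my-training:latest",
    role="arn:aws:iam::<ACCOUNT_ID>:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/models/",
    use_spot_instances=True,  # cost control via Spot capacity
    max_run=3600,             # cap the training time (seconds)
    max_wait=7200,            # how long to wait for Spot capacity
)

# A CI pipeline (e.g. GitHub Actions) can trigger this on every approved change.
estimator.fit({"train": "s3://my-bucket/datasets/train/"})
```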
35 |
36 | - Store versioned models
37 | - Integrate approval workflows (“manual gate” for prod)
38 |
39 | D) Automate Deployments
40 |
41 | - Blue/Green deployments
42 | - Canary deployments
43 | - Event-driven retraining
44 | - Update production endpoints with zero downtime
45 |
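And a hedged sketch of (D): deploying a model artifact to a real-time endpoint with the SageMaker Python SDK. The image, model artifact, role, and endpoint name are placeholders; blue/green and canary strategies are configured on top of endpoints like this.

```
# Deploy a trained model artifact to a managed real-time endpoint.
from sagemaker.model import Model

model = Model(
    image_uri="<ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/my-inference:latest",
    model_data="s3://my-bucket/models/model.tar.gz",
    role="arn:aws:iam::<ACCOUNT_ID>:role/SageMakerExecutionRole",
)

# SageMaker provisions the instances, health-checks them, and exposes an
# HTTPS endpoint; deploying a new model version updates it in place.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="my-model-prod",  # placeholder endpoint name
)
```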
46 | E) Observability & Monitoring
47 |
48 | - CloudWatch for logs/metrics
49 | - SageMaker Model Monitor for:
50 | - Data drift
51 | - Feature drift
52 | - Prediction drift
53 | - Outlier detection
54 |
55 | F) Cost Optimization
56 |
57 | - Spot training
58 | - Multi-model endpoints
59 | - Serverless endpoints
60 | - Endpoint autoscaling
61 |
62 |
63 |
--------------------------------------------------------------------------------
/04-versioning-and-experiment-tracking/02-introduction-to-dvc.md:
--------------------------------------------------------------------------------
1 | # What is DVC?
2 |
3 | Think of DVC (Data Version Control) as Git for your data.
4 |
5 | Git works great for: code and small text files.
6 |
7 | But Git cannot handle:
8 |
9 | - Large datasets
10 | - Model files
11 | - Data stored in cloud storage
12 |
13 | This is where DVC helps.
14 | DVC lets you:
15 |
16 | - Track versions of datasets
17 | - Store large files outside Git (S3, GCS, Azure, local storage)
18 | - Keep your Git repo clean and lightweight
19 | - Reproduce your ML project anytime
20 |
21 | ### Wine Prediction Example
22 |
23 | Imagine you're building a simple Wine Quality Prediction ML model.
24 |
25 | You have:
26 |
27 | - A CSV file → wine_data_sample.csv
28 | - A training script → train.py
29 | - A Git repo
30 |
31 | Your dataset may change over time:
32 |
33 | - You add more rows
34 | - You clean the data
35 | - You update features
36 |
37 | DVC allows you to version these dataset changes without storing the actual data inside Git.
38 |
39 | Without DVC → your CSV sits in your repo → Git becomes slow & heavy.
40 |
41 | With DVC → Git stores only a small metadata file:
42 |
43 | - wine_data_sample.csv.dvc
44 | - The actual data is stored in external storage (e.g., an S3 bucket)
45 |
46 | You pull/push data similar to git pull / git push.
47 |
48 | ### How DVC Works (Very Simple Flow)
49 |
50 | Add your dataset to DVC
51 |
52 | `dvc add wine_data_sample.csv`
53 |
54 | Commit the .dvc file to Git
55 |
56 | `git add wine_data_sample.csv.dvc`
57 | `git commit -m "Track dataset with DVC"`
58 |
59 | Configure remote storage (e.g., S3)
60 |
61 | `dvc remote add -d myremote s3://mybucket/dvcstore`
62 |
63 | Push data to S3
64 |
65 | `dvc push`
66 |
67 | Anyone with your Git repo simply runs:
68 |
69 | `dvc pull`
70 |
71 | …and they get the exact same dataset version.
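Besides the CLI, DVC also exposes a small Python API, so training code can read whichever dataset version the current Git commit points to. A minimal sketch, assuming the repo layout above:

```
# Read the DVC-tracked dataset from Python; DVC fetches it from the remote
# if it is not already in the local cache.
import pandas as pd
import dvc.api

with dvc.api.open("wine_data_sample.csv", repo=".", mode="r") as f:
    df = pd.read_csv(f)

print(df.shape)
```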
--------------------------------------------------------------------------------
/02-introduction-to-mlops/03-what-is-mlops.md:
--------------------------------------------------------------------------------
1 | # What is MLOps?
2 |
3 | Before understanding MLOps, it’s important to understand **where it comes from**.
4 |
5 | MLOps is **directly inspired by DevOps**.
6 |
7 | Just like DevOps transformed how we build and operate software, **MLOps brings those same principles into the Machine Learning world**.
8 |
9 | ---
10 |
11 | ## How DevOps Inspired MLOps
12 |
13 | ### What DevOps Solved
14 |
15 | Before DevOps:
16 | - Developers wrote code
17 | - Ops teams deployed and maintained it
18 | - Deployments were slow, manual, and risky
19 | - Failures were hard to debug
20 |
21 | DevOps introduced:
22 | - Automation
23 | - CI/CD pipelines
24 | - Infrastructure as Code
25 | - Monitoring and feedback loops
26 | - Shared ownership between Dev and Ops
27 |
28 | The result:
29 | - Faster releases
30 | - More reliable systems
31 | - Continuous improvement
32 |
33 | ---
34 |
35 | ## The Same Problem Happened in Machine Learning
36 |
37 | In ML, a similar gap appeared:
38 |
39 | - Data Scientists trained models in notebooks
40 | - Models worked locally
41 | - Production teams struggled to deploy them
42 | - No clear ownership after deployment
43 | - Models degraded silently over time
44 |
45 | Just like Dev vs Ops, ML had a gap between:
46 | - **Model development**
47 | - **Model operations**
48 |
49 | That gap is what **MLOps** was created to solve.
50 |
51 | ---
52 |
53 | ## MLOps = DevOps Practices for Machine Learning
54 |
55 | MLOps takes proven DevOps ideas and applies them to ML systems.
56 |
57 | | DevOps Concept | MLOps Equivalent |
58 | |----------------|------------------|
59 | | Source code versioning | Data + model versioning |
60 | | CI pipelines | Model training pipelines |
61 | | CD pipelines | Automated model deployment |
62 | | Monitoring services | Monitoring model performance |
63 | | Rollbacks | Model version rollback |
64 | | Automation | End-to-end ML lifecycle automation |
65 |
--------------------------------------------------------------------------------
/08-kserve/03-end-to-end-demo.md:
--------------------------------------------------------------------------------
1 | # KServe Demonstration for the Iris Model
2 |
3 | ### Install Cert Manager
4 |
5 | ```
6 | kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml
7 | ```
8 |
9 | ### Install KServe CRDs
10 |
11 | ```
12 | kubectl create namespace kserve
13 |
14 | helm install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd \
15 | --version v0.16.0 \
16 | -n kserve \
17 | --wait
18 | ```
19 |
20 | ### Install KServe controller
21 |
22 | ```
23 | helm install kserve oci://ghcr.io/kserve/charts/kserve \
24 | --version v0.16.0 \
25 | -n kserve \
26 | --set kserve.controller.deploymentMode=RawDeployment \
27 | --wait
28 | ```
29 |
30 | ### Deploy the sklearn iris model
31 |
32 | ```
33 | kubectl create namespace ml
34 |
35 | cat <
--------------------------------------------------------------------------------
/05-experiment-tracking/02-what-is-MLflow.md:
--------------------------------------------------------------------------------
7 | > MLflow helps you answer the question:
8 | > **“Which model did we train, with what parameters, and how good was it?”**
9 |
10 | ---
11 |
12 | ## Why MLflow Exists
13 |
14 | Once you start building real ML projects, common problems appear:
15 |
16 | - Multiple experiments with different parameters
17 | - No clear record of which model performed best
18 | - Difficult to reproduce results
19 | - Models stored locally with no versioning
20 | - Hard to move models from training to deployment
21 |
22 | **MLflow solves these problems by acting as a central system of record for ML work.**
23 |
24 | ---
25 |
26 | ## Core Components of MLflow
27 |
28 | MLflow has four main components. Beginners should focus mainly on the first two.
29 |
30 | ---
31 |
32 | ### MLflow Tracking
33 |
34 | Used to **track experiments**.
35 |
36 | You can log:
37 | - Parameters (learning rate, epochs, etc.)
38 | - Metrics (accuracy, loss, F1-score)
39 | - Artifacts (model files, plots, datasets)
40 | - Source code version
41 |
42 | Each training run is stored as a **Run**.
43 |
44 | Example:
45 |
46 | import mlflow
47 |
48 | with mlflow.start_run():
49 | mlflow.log_param("learning_rate", 0.01)
50 | mlflow.log_metric("accuracy", 0.92)
51 |
52 | To view runs:
53 |
54 | mlflow ui
55 |
56 | This opens a UI where you can compare experiments.
57 |
58 | ---
59 |
60 | ### MLflow Models
61 |
62 | MLflow provides a **standard way to package models**.
63 |
64 | This allows the same model to be:
65 | - Loaded in Python
66 | - Served via REST API
67 | - Containerized using Docker
68 | - Deployed to cloud or Kubernetes
69 |
70 | This makes models **portable and production-ready**.
71 |
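For example, a scikit-learn model can be logged in the MLflow format and loaded back (or served) without any custom packaging code. A minimal sketch:

```
import mlflow
import mlflow.pyfunc
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run() as run:
    # Package the model in the standard MLflow format under this run.
    mlflow.sklearn.log_model(model, artifact_path="model")

# Load it back anywhere as a generic python_function model.
loaded = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")
print(loaded.predict(X[:2]))

# The same artifact can be served as a REST API:
#   mlflow models serve -m runs:/<run_id>/model -p 5001
```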
72 | ---
73 |
74 | ### MLflow Projects
75 |
76 | A way to package ML code with:
77 | - Environment details
78 | - Entry points
79 | - Reproducible execution
80 |
81 | Mostly useful for larger teams and advanced workflows.
82 |
83 | ---
84 |
85 | ### MLflow Model Registry
86 |
87 | A centralized place to manage models:
88 | - Model versions
89 | - Stages (Staging, Production, Archived)
90 | - Metadata and approvals
91 |
92 | Very useful in enterprise MLOps setups.
93 |
94 | ---
95 |
96 | ## Simple Real-World Example
97 |
98 | Imagine a **Wine Quality Prediction** model.
99 |
100 | You try:
101 | - Run 1: learning_rate = 0.01 → accuracy = 0.86
102 | - Run 2: learning_rate = 0.1 → accuracy = 0.89
103 | - Run 3: learning_rate = 0.001 → accuracy = 0.82
104 |
105 | Without MLflow:
106 | - You forget results
107 | - You overwrite models
108 | - You guess which model to deploy
109 |
110 | With MLflow:
111 | - Every run is logged
112 | - Metrics are compared visually
113 | - Best model is clearly identified
114 | - Model file is stored and versioned
115 |
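A sketch of what logging those three runs looks like in practice (the training function is a placeholder that simply returns the accuracies above):

```
import mlflow

mlflow.set_experiment("wine-quality")

def train_model(learning_rate: float) -> float:
    # Placeholder for real training; returns the accuracy for this run.
    return {0.01: 0.86, 0.1: 0.89, 0.001: 0.82}[learning_rate]

for lr in [0.01, 0.1, 0.001]:
    with mlflow.start_run():
        mlflow.log_param("learning_rate", lr)
        mlflow.log_metric("accuracy", train_model(lr))

# Query the runs and sort by accuracy to find the best one.
best = mlflow.search_runs(order_by=["metrics.accuracy DESC"]).iloc[0]
print(best["params.learning_rate"], best["metrics.accuracy"])
```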
116 | ---
117 |
118 | ## How MLflow Fits into MLOps
119 |
120 | MLflow helps MLOps Engineers with:
121 |
122 | - Experiment tracking
123 | - Reproducibility
124 | - Model lineage
125 | - Model packaging
126 | - Deployment readiness
127 |
128 | In real projects, MLflow is often combined with:
129 | - Git (code versioning)
130 | - DVC (data versioning)
131 | - GitHub Actions (CI/CD)
132 | - Docker (containerization)
133 | - Kubernetes (serving)
134 | - Cloud storage like S3 (remote tracking)
135 |
136 |
137 |
138 |
--------------------------------------------------------------------------------
/06-fundamentals-of-model-deployment.md/01-introduction-to-deployment-and-serving.md:
--------------------------------------------------------------------------------
1 | # Introduction to Model Deployment and Model Serving
2 |
3 | Machine learning models create value only when they can be used by real users or systems. Training a model is just one step. To make it useful, the model must be deployed and served so that applications can request predictions from it.
4 |
5 | ---
6 |
7 | ## What Is Model Deployment
8 |
9 | Model deployment is the process of taking a trained machine learning model and making it available in a production environment.
10 |
11 | In simple terms, deployment means:
12 | - Moving the model out of a local machine
13 | - Packaging it with required code and dependencies
14 | - Making it accessible to other systems or users
15 |
16 | A deployed model can be accessed by:
17 | - Web applications
18 | - Backend services
19 | - Mobile apps
20 | - Batch jobs or data pipelines
21 |
22 | ---
23 |
24 | ## What Is Model Serving
25 |
26 | Model serving is how the deployed model **runs in production** and responds to prediction requests.
27 |
28 | A model serving system typically:
29 | - Loads the trained model into memory
30 | - Accepts input data (JSON, text, images, numbers)
31 | - Runs inference on the model
32 | - Returns predictions to the caller
33 |
34 | Model serving focuses on runtime behavior such as:
35 | - Response time (latency)
36 | - Number of requests handled (throughput)
37 | - Reliability and availability
38 |
39 | ---
40 |
41 | ## Types of Model Serving
42 |
43 | ### Real-time Serving
44 | - Predictions are returned instantly
45 | - Used when low latency is required
46 | - Examples:
47 | - Fraud detection
48 | - Recommendation systems
49 | - Intent classification APIs
50 |
51 | ### Batch Serving
52 | - Predictions run on large datasets at scheduled intervals
53 | - Used for offline analytics and reports
54 | - Examples:
55 | - Nightly churn prediction
56 | - Weekly risk scoring
57 | - Bulk data enrichment
58 |
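As a sketch of batch serving (the model, file names, and feature columns are placeholders), a nightly scoring job might look like this:

```
# batch_score.py - run on a schedule (cron, Airflow, etc.)
import joblib
import pandas as pd

model = joblib.load("model.joblib")       # placeholder trained classifier
customers = pd.read_csv("customers.csv")  # placeholder input dataset

# Score the whole dataset in one pass and persist the results for reports.
features = customers[["tenure", "monthly_spend"]]  # placeholder columns
customers["churn_score"] = model.predict_proba(features)[:, 1]
customers.to_csv("churn_scores.csv", index=False)
```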
59 | ---
60 |
61 | ## Why Model Deployment and Serving Matter
62 |
63 | A model that is not deployed is just an experiment.
64 |
65 | Production systems require:
66 | - Consistent availability
67 | - Fast responses
68 | - Ability to scale with traffic
69 | - Logging and monitoring
70 | - Safe updates and rollbacks
71 |
72 | Deployment and serving help ensure:
73 | - The model can handle real user traffic
74 | - Predictions remain reliable over time
75 | - Issues can be detected and fixed quickly
76 |
77 | ---
78 |
79 | ## Common Ways to Deploy and Serve Models
80 |
81 | ### Python API-Based Serving
82 | - Flask
83 | - FastAPI
84 | - Django
85 |
86 | Simple and ideal for learning and small-scale use cases.
87 |
88 | ---
89 |
90 | ### Container-Based Deployment
91 | - Docker for packaging
92 | - Kubernetes for orchestration
93 | - Load balancers for traffic distribution
94 |
95 | Used in production environments for scalability and reliability.
96 |
97 | ---
98 |
99 | ### MLOps and Model Serving Platforms
100 | - MLflow Model Serving
101 | - KServe
102 | - Seldon Core
103 | - TensorFlow Serving
104 | - TorchServe
105 |
106 | These tools provide built-in features like:
107 | - Model versioning
108 | - Auto-scaling
109 | - Canary deployments
110 | - Metrics and monitoring
111 |
112 | ---
113 |
114 | ### Serverless Deployment
115 | - AWS Lambda
116 | - GCP Cloud Run
117 | - Azure Functions
118 |
119 | Best suited for lightweight models and variable traffic patterns.
120 |
121 | ---
122 |
123 | ## Simple Example: Model Serving Flow
124 |
125 | For a basic prediction model:
126 |
127 | - Train a model locally
128 | - Save the trained model as a file
129 | - Load the model inside an API service
130 | - Expose a `/predict` endpoint
131 | - Deploy the service to a server or container platform
132 | - Client applications send requests and receive predictions
133 |
134 | ---
135 |
136 | ## Model Deployment vs Model Serving
137 |
138 | | Concept | Description |
139 | |-------|-------------|
140 | | Model Deployment | The process of releasing the model into a production environment |
141 | | Model Serving | The system that handles prediction requests at runtime |
142 |
143 | Deployment is a one-time or versioned activity, while serving is continuous and always running.
144 |
--------------------------------------------------------------------------------
/09-SageMaker/02-production-setup.md:
--------------------------------------------------------------------------------
1 | # Set up a SageMaker Domain using the AWS CLI
2 |
3 | ### Get the Default VPC ID
4 |
5 | ```
6 | aws ec2 describe-vpcs \
7 | --filters "Name=isDefault,Values=true" \
8 | --query "Vpcs[0].VpcId" \
9 | --output text \
10 |   --region <REGION>
11 | ```
12 |
13 | ### List Subnets Under the Default VPC
14 |
15 | ```
16 | aws ec2 describe-subnets \
17 |   --filters "Name=vpc-id,Values=<VPC_ID>" \
18 |   --query "Subnets[].SubnetId" \
19 |   --output text \
20 |   --region <REGION>
21 | ```
22 |
23 | ### Create an Execution Role for SageMaker Domain
24 |
25 | Create a simple trust policy
26 |
27 | Save as trust.json:
28 |
29 | ```
30 | {
31 | "Version": "2012-10-17",
32 | "Statement": [
33 | {
34 | "Effect": "Allow",
35 | "Principal": { "Service": "sagemaker.amazonaws.com" },
36 | "Action": "sts:AssumeRole"
37 | }
38 | ]
39 | }
40 | ```
41 |
42 | Create the role
43 |
44 | ```
45 | aws iam create-role \
46 | --role-name SageMakerDomainExecutionRole \
47 | --assume-role-policy-document file://trust.json
48 | ```
49 |
50 | Attach a basic policy (beginner friendly)
51 |
52 | ```
53 | aws iam attach-role-policy \
54 | --role-name SageMakerDomainExecutionRole \
55 | --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
56 | ```
57 |
58 | Save the role ARN from:
59 |
60 | `aws iam get-role --role-name SageMakerDomainExecutionRole --query "Role.Arn" --output text`
61 |
62 | ### Create the SageMaker Domain (Using Default VPC)
63 |
64 | This is the core step.
65 |
66 | ```
67 | aws sagemaker create-domain \
68 | --domain-name my-sagemaker-domain \
69 | --auth-mode IAM \
70 |   --vpc-id <VPC_ID> \
71 |   --subnet-ids <SUBNET_IDS> \
72 |   --app-network-access-type VpcOnly \
73 |   --default-user-settings "{
74 |       \"ExecutionRole\": \"<ROLE_ARN>\"
75 |   }" \
76 |   --region <REGION>
77 | ```
78 |
79 | This returns a DomainId. If you lose it, list domains:
80 |
81 | `aws sagemaker list-domains --region <REGION>`
82 |
83 | ### Create a SageMaker UserProfile + Tag It
84 |
85 | ABAC (attribute-based access control) depends on tags.
86 |
87 | ```
88 | aws sagemaker create-user-profile \
89 |   --domain-id <DOMAIN_ID> \
90 |   --user-profile-name alice-profile \
91 |   --tags Key=studiouserid,Value=alice123 \
92 |   --region <REGION>
93 | ```
94 |
95 | ### Create the IAM User and Tag the User
96 |
97 | The IAM user must have the same tag for ABAC matching.
98 |
99 | ```
100 | aws iam create-user --user-name alice-iam-user
101 | ```
102 |
103 | Add ABAC tag
104 |
105 | ```
106 | aws iam tag-user \
107 | --user-name alice-iam-user \
108 | --tags Key=studiouserid,Value=alice123
109 | ```
110 |
111 | ### Create the ABAC Policy
112 |
113 | This policy enforces two things:
114 |
115 | The IAM user can only generate a presigned URL for a user profile whose tag matches their own (studiouserid).
116 |
117 | The IAM user can view the domain and user profile in the SageMaker console.
118 |
119 | Save this as sagemaker-abac.json:
120 |
121 | ```
122 | {
123 | "Version": "2012-10-17",
124 | "Statement": [
125 | {
126 | "Sid": "AllowConsoleListAndDescribe",
127 | "Effect": "Allow",
128 | "Action": [
129 | "sagemaker:ListDomains",
130 | "sagemaker:ListUserProfiles",
131 | "sagemaker:ListApps",
132 | "sagemaker:DescribeDomain",
133 | "sagemaker:DescribeUserProfile",
134 | "sagemaker:ListTags"
135 | ],
136 | "Resource": "*"
137 | },
138 | {
139 | "Sid": "AllowPresignedUrlWhenTagMatches",
140 | "Effect": "Allow",
141 | "Action": [
142 | "sagemaker:CreatePresignedDomainUrl"
143 | ],
144 | "Resource": "*",
145 | "Condition": {
146 | "StringEquals": {
147 | "sagemaker:ResourceTag/studiouserid": "${aws:PrincipalTag/studiouserid}"
148 | }
149 | }
150 | }
151 | ]
152 | }
153 | ```
154 |
155 | ### Create the IAM policy
156 |
157 | ```
158 | aws iam create-policy \
159 | --policy-name SageMaker-Studio-ABAC \
160 | --policy-document file://sagemaker-abac.json
161 | ```
162 |
163 | ### Attach the Policy to the IAM User
164 |
165 | ```
166 | aws iam attach-user-policy \
167 | --user-name alice-iam-user \
168 |   --policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/SageMaker-Studio-ABAC
169 | ```
170 |
171 | ### How the IAM User Opens SageMaker Studio
172 |
173 | There are two ways now:
174 |
175 | Using the SageMaker Console (now works due to list permissions)
176 |
177 | - IAM user signs in → goes to:
178 | - Amazon SageMaker → Studio → Domains
179 | - They can now see: The domain -> The user profile
180 |
181 | Using a Presigned URL (ABAC-restricted)
182 |
183 | The user (or admin) runs:
184 |
185 | ```
186 | aws sagemaker create-presigned-domain-url \
187 |   --domain-id <DOMAIN_ID> \
188 |   --user-profile-name alice-profile \
189 |   --session-expiration-duration-in-seconds 3600 \
190 |   --region <REGION>
191 | ```
192 |
193 | This returns a URL that opens SageMaker Studio only for this UserProfile.
194 |
195 | If an IAM user tries to open another user’s profile → access denied because the tags won't match.
196 |
--------------------------------------------------------------------------------
/06-fundamentals-of-model-deployment.md/02-popular-ways.md:
--------------------------------------------------------------------------------
1 | # High-level overview of popular model serving implementations
2 |
3 | Below are concise, high-level descriptions, architectures, trade-offs, and best-practices for four common model serving approaches: Flask on VM (WSGI + autoscaling), Containerized on Kubernetes (Ingress), Amazon SageMaker, and KServe.
4 |
5 | ---
6 |
7 | ## 1) Flask app deployment on VM with WSGI and autoscaling
8 |
9 | **What it is (short):**
10 | Run a Python Flask app that loads a model and exposes prediction endpoints (REST). Serve it via a production WSGI server (e.g., Gunicorn, uWSGI) on virtual machines. Autoscale by adding/removing VM instances (manual, cloud ASG, or autoscaler).
11 |
12 | **Architecture (high level):**
13 | - Model artifact stored on disk or fetched at startup (S3, artifact store).
14 | - Flask app exposes `/predict` (REST).
15 | - WSGI server (Gunicorn/uWSGI) runs multiple worker processes/threads.
16 | - Fronted by a load-balancer (cloud LB or HAProxy/Nginx).
17 | - Autoscaling group / VM scale set increases instances based on metrics (CPU, latency, queue length).
18 |
19 | **When to use:**
20 | - Small teams or POCs.
21 | - Low to moderate traffic; simple deployment requirements.
22 | - When you need direct control over the host environment or have non-containerized infra constraints.
23 |
24 | **Pros:**
25 | - Simple and easy to debug.
26 | - Minimal infra complexity; direct control of system packages and drivers (GPU drivers on VM).
27 | - Quick to prototype.
28 |
29 | **Cons / Risks:**
30 | - Operational overhead: patching, OS maintenance, scaling logic.
31 | - Harder to achieve fast, fine-grained autoscaling (startup time of VM can be high).
32 | - Less portable and reproducible than container-based deployments.
33 | - Concurrency limited by WSGI worker model; can be CPU-bound with Python GIL for single-process workers.
34 |
35 | **Best practices:**
36 | - Use a process manager and WSGI server with multiple workers.
37 | - Load model once per process; use batching if needed.
38 | - Use health checks and graceful shutdown to avoid dropping in-flight requests.
39 | - Autoscale on application-level metrics (latency, queue length) and keep warm instances or fast startup containers/VM images.
40 | - Add logging, metrics (Prometheus, StatsD), and tracing (OpenTelemetry).
41 |
42 | ---
43 |
44 | ## 2) Containerize and deploy to Kubernetes with Ingress
45 |
46 | **What it is (short):**
47 | Package model server (Flask/FastAPI/TorchServe/Triton or custom) into a container image, run it as pods on Kubernetes. Expose via Ingress (Ingress Controller / LB). Use Horizontal Pod Autoscaler (HPA) and potentially custom autoscalers (KEDA) for scaling.
48 |
49 | **Architecture (high level):**
50 | - Container image contains model server and dependencies.
51 | - Kubernetes Deployment or StatefulSet runs pods; ConfigMaps/Secrets for config.
52 | - Service exposes pod set; Ingress routes external traffic (nginx-ingress/ingress-nginx/traefik/ALB).
53 | - Autoscaling: HPA (CPU/RPS/Custom metrics), KEDA for event-driven scaling.
54 | - Optional: GPU nodes via nodePools; use device plugin for GPU scheduling.
55 | - Observability: Prometheus, Grafana, Loki, OpenTelemetry.
56 |
57 | **When to use:**
58 | - Production-grade systems requiring elasticity, multi-tenancy, and resilience.
59 | - Teams already using Kubernetes for infra.
60 | - Need for Canary/Blue-Green deployments, rollout strategies.
61 |
62 | **Pros:**
63 | - Portability: same image across environments.
64 | - Rich ecosystem: autoscaling, service discovery, network policies, RBAC.
65 | - Easy to integrate CI/CD, rollout strategies, canary testing.
66 | - Fast horizontal scaling of pods compared to VMs (container startup faster).
67 |
68 | **Cons / Risks:**
69 | - Operational complexity (K8s cluster management).
70 | - Resource fragmentation and noisy-neighbor issues without careful resource requests/limits.
71 | - Requires solid observability and cost control.
72 |
73 | **Best practices:**
74 | - Use readiness/liveness probes; graceful termination.
75 | - Keep container images small and immutable; load model from external artifact store or use initContainers to fetch large models.
76 | - Use resource requests/limits; tune HPA using meaningful metrics (latency, queue length rather than just CPU).
77 | - Secure with network policies, PodSecurity, and RBAC.
78 | - Use multi-stage builds and CI to test image, run model-smoke tests in CI.
79 | - Consider model warm-up or preloading to avoid cold-start latency.
80 | - For high-throughput or low-latency needs, use specialized servers (Triton, TorchServe) rather than general web frameworks.
81 |
82 | ---
83 |
84 | ## 3) Amazon SageMaker
85 |
86 | **What it is (short):**
87 | A fully managed AWS service for training and serving ML models. SageMaker provides hosted endpoints (real-time), multi-model endpoints, batch transform, and serverless inference options.
88 |
89 | **Architecture (high level):**
90 | - Models are registered in SageMaker Model registry or stored in S3.
91 | - Create an Endpoint (single-model or multi-model) backed by endpoint instances (EC2) or serverless compute.
92 | - Autoscaling via SageMaker Endpoint Auto Scaling (target tracking policies).
93 | - Integration with CI/CD (SageMaker Pipelines), Model Monitor for drift detection, and Experiments for lineage.
94 |
95 | **When to use:**
96 | - Teams on AWS wanting managed end-to-end MLOps: training, deployment, monitoring.
97 | - Need for simplified operational burden and built-in features like model monitoring, A/B testing, and built-in containers for popular frameworks.
98 |
99 | **Pros:**
100 | - Managed: reduces infra ops (patching, scaling, provisioning).
101 | - Feature-rich: model registry, batch inference, built-in monitoring and explainability features.
102 | - Tight integration with other AWS services (IAM, CloudWatch, S3, ECR).
103 | - Serverless inference option reduces need to manage instance fleets for sporadic traffic.
104 |
105 | **Cons / Risks:**
106 | - Cost can increase for always-on heavy workloads unless optimized (multi-model endpoints, serverless).
107 | - Vendor lock-in to AWS APIs and patterns.
108 | - Less flexibility for very custom runtime environments (though custom containers are supported).
109 |
110 | **Best practices:**
111 | - Use multi-model endpoints for many small models to save instances, or serverless endpoints for intermittent traffic.
112 | - Enable Model Monitor for data/label drift and alarms.
113 | - Use SageMaker Model Registry for versioning and lineage.
114 | - Automate deployment with SageMaker Pipelines or Terraform/CDK and integrate CI/CD for model promotion.
115 | - Control costs by right-sizing instance types and using autoscaling policies.
116 |
117 | ---
118 |
119 | ## 4) KServe (previously KFServing)
120 |
121 | **What it is (short):**
122 | An open-source Kubernetes-native model serving framework built for cloud-native MLOps. KServe provides standardized CRDs to deploy model servers with autoscaling, multi-framework support, canary rollouts, inference graphing, and explainability.
123 |
124 | **Architecture (high level):**
125 | - KServe runs on Kubernetes and exposes an `InferenceService` CRD per model.
126 | - Behind the scenes it orchestrates a server (predictor) with autoscaling (Knative/KEDA), transformer, and explainer components.
127 | - Supports many frameworks out-of-the-box: TensorFlow, PyTorch, ONNX, Triton; can use custom containers.
128 | - Integrates with Knative Serving for request autoscaling (including scale-to-zero) and with istio/other service meshes for networking.
129 |
130 | **When to use:**
131 | - Teams standardizing on Kubernetes and wanting a declarative, extensible model-serving platform.
132 | - When you need multi-framework support, canary deployments for models, and scale-to-zero support to save costs.
133 |
134 | **Pros:**
135 | - Declarative model lifecycle using Kubernetes CRDs.
136 | - Rich features: canary rollouts, autoscale-to-zero, explainers, transformers, batch/streaming integration.
137 | - Framework abstraction: swap underlying predictor implementation without changing higher-level config.
138 | - Extensible and vendor-neutral; integrates with Kubeflow and other tools.
139 |
140 | **Cons / Risks:**
141 | - Requires Kubernetes and some maturity in K8s operations.
142 | - Complexity in advanced features (Knative setup, autoscaling tuning).
143 | - Ecosystem maturity varies; sometimes upgrades or custom integrations needed.
144 |
145 | **Best practices:**
146 | - Use `InferenceService` to standardize deployments; use canary rollouts for model updates.
147 | - Leverage Knative or KEDA for event-driven or scale-to-zero behavior to reduce cost.
148 | - Use model explainers and monitors (prometheus exporters) offered by KServe.
149 | - Manage model artifacts outside the cluster (object storage) and use init containers or model loaders.
150 | - Integrate CI/CD to create and update InferenceService resources programmatically.
151 |
152 |
--------------------------------------------------------------------------------