├── .gitignore
├── 05-experiment-tracking
│   ├── 06-MLflow-and-DVC-project.md
│   ├── 03-basic-MLflow-installation.md
│   ├── 04-basic-MLflow-on-Kubernetes.md
│   ├── 05-MLflow-prod-setup.md
│   ├── 01-what-is-experiment-tracking.md
│   └── 02-what-is-MLflow.md
├── 03-role-of-mlops
│   ├── 01-introduction.md
│   ├── 04-ml-engineers-without-mlops.md
│   ├── 05-how-mlops-engineers-help-ml-engineers.md
│   ├── 03-how-mlops-help-datascientists.md
│   └── 02-data-scientists-without-mlops.md
├── 06-fundamentals-of-model-deployment.md
│   ├── 03-project-for-deployment.md
│   ├── 01-introduction-to-deployment-and-serving.md
│   └── 02-popular-ways.md
├── 07-deploy-and-serving-using-vms
│   ├── 02-implementing-wsgi.md
│   ├── 00-IMPORTANT.md
│   └── 01-architecture.md
├── 04-versioning-and-experiment-tracking
│   ├── 03-DVC-hands-on.md
│   ├── 02-introduction-to-dvc.md
│   └── 01-what-is-data-versioning.md
├── README.md
├── 08-kserve
│   ├── 02-architecture.md
│   ├── 01-Introduction.md
│   └── 03-end-to-end-demo.md
├── 09-SageMaker
│   ├── 01-introduction.md
│   └── 02-production-setup.md
└── 02-introduction-to-mlops
    ├── 03-what-is-mlops.md
    ├── 01-what-is-machine-learning-and-model.md
    ├── 05-ds-vs-ml-vs-mlops.md
    ├── 02-steps-to-create-a-model.md
    └── 04-machine-learning-lifecycle-overview.md

/.gitignore:
--------------------------------------------------------------------------------
1 | .venv/
2 | .vscode/
--------------------------------------------------------------------------------
/05-experiment-tracking/06-MLflow-and-DVC-project.md:
--------------------------------------------------------------------------------
1 | Please refer to the below repository for this lecture.
2 |
3 | https://github.com/iam-veeramalla/Wine-Prediction-Model
--------------------------------------------------------------------------------
/05-experiment-tracking/03-basic-MLflow-installation.md:
--------------------------------------------------------------------------------
1 | Please refer to the below documentation for this lecture.
2 |
3 | https://mlflow.org/docs/2.4.2/quickstart.html#install-mlflow
--------------------------------------------------------------------------------
/03-role-of-mlops/01-introduction.md:
--------------------------------------------------------------------------------
1 | # Introduction
2 |
3 | Please refer to the below repository for all the project files and notes.
4 |
5 | https://github.com/iam-veeramalla/hello-world-mlops
--------------------------------------------------------------------------------
/05-experiment-tracking/04-basic-MLflow-on-Kubernetes.md:
--------------------------------------------------------------------------------
1 | Please refer to the below documentation for this lecture.
2 |
3 | https://community-charts.github.io/docs/charts/mlflow/basic-installation
--------------------------------------------------------------------------------
/05-experiment-tracking/05-MLflow-prod-setup.md:
--------------------------------------------------------------------------------
1 | Please refer to the below document for the next lecture.
2 |
3 | https://community-charts.github.io/docs/charts/mlflow/postgresql-backend-installation
--------------------------------------------------------------------------------
/06-fundamentals-of-model-deployment.md/03-project-for-deployment.md:
--------------------------------------------------------------------------------
1 | Please refer to the below GitHub repository for this lecture.
2 |
3 | https://github.com/iam-veeramalla/Intent-classifier-model
--------------------------------------------------------------------------------
/07-deploy-and-serving-using-vms/02-implementing-wsgi.md:
--------------------------------------------------------------------------------
1 | Please refer to the below repository for complete project files and notes.
2 |
3 | https://github.com/iam-veeramalla/Intent-classifier-model/tree/virtual-machines
--------------------------------------------------------------------------------
/04-versioning-and-experiment-tracking/03-DVC-hands-on.md:
--------------------------------------------------------------------------------
1 | # Learn DVC using a project
2 |
3 | Please refer to the below repository for this lecture.
4 |
5 | https://github.com/iam-veeramalla/Wine-Prediction-Model
6 |
--------------------------------------------------------------------------------
/07-deploy-and-serving-using-vms/00-IMPORTANT.md:
--------------------------------------------------------------------------------
1 | # Important Note
2 |
3 | Please refer to the Virtual Machines branch of the Intent Classifier repo for this section.
4 |
5 | Link:
6 |
7 | https://github.com/iam-veeramalla/Intent-classifier-model/tree/virtual-machines
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # MLOps Zero to Hero
2 |
3 | Notes for my Udemy course - MLOps Zero to Hero
4 |
5 | https://www.udemy.com/user/abhishek-veeramalla/?srsltid=AfmBOopEdZFhCNtWblrQcXgZa3LAzdW2Zg7b31Tu6ruW5TQ_GdD0qdOe
6 |
--------------------------------------------------------------------------------
/08-kserve/02-architecture.md:
--------------------------------------------------------------------------------
1 | # KServe Architecture
2 |
3 |
4 | (architecture diagram screenshot; image not included in these text notes)
5 |
6 | (architecture diagram screenshot; image not included in these text notes)
7 |
--------------------------------------------------------------------------------
/07-deploy-and-serving-using-vms/01-architecture.md:
--------------------------------------------------------------------------------
1 | # Architecture
2 |
3 | Internet (client)
4 |         │
5 |         ▼
6 | Internet Gateway (IGW) attached to VPC
7 |         │
8 |         ▼
9 | Application Load Balancer (ALB) — internet-facing (ENIs in Public Subnets A & B)
10 |         │ (Listener: HTTP 80)
11 |         ▼
12 | Target Group (HTTP: 80, Health-check: /predict)
13 |         │
14 |         ▼
15 | Auto Scaling Group (ASG)
16 |         │
17 |         ▼
18 | EC2 Instance (in a Public Subnet) ──> Nginx (listen :80) ── proxy_pass ──> Gunicorn (127.0.0.1:6000) ──> WSGI app (/predict)
19 |
--------------------------------------------------------------------------------
/03-role-of-mlops/04-ml-engineers-without-mlops.md:
--------------------------------------------------------------------------------
1 | # Role of an ML Engineer in a Project
2 |
3 | Once a Data Scientist builds a working model, the question becomes:
4 |
5 | “How do we let real users or applications use this model?”
6 |
7 | This is where the ML Engineer steps in.
8 |
9 | ### Turn the Model into an API
10 |
11 | A trained model by itself is just a file.
12 | An ML Engineer’s first responsibility is to wrap the model with an API.
13 |
14 | They:
15 |
16 | - Load the trained model
17 | - Accept input from users or applications (usually JSON)
18 | - Run predictions using the model
19 | - Return the result as a response
20 |
21 | Now the model can be:
22 |
23 | - Called by a frontend
24 | - Used by backend services
25 | - Integrated into real applications
26 |
27 | The model becomes usable, not just theoretical.
28 |
29 | ### Handle Input and Output Safely
30 |
31 | In the real world, users can send bad data.
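As a concrete sketch of the two responsibilities above (wrapping a model in an API and guarding against bad input), here is a minimal WSGI `/predict` handler of the kind Gunicorn runs in the VM architecture; the `text` field and the stubbed prediction are illustrative assumptions, not the actual course model code.

```python
import json

def application(environ, start_response):
    """Minimal WSGI app sketching a /predict endpoint with input validation.
    The 'text' field and the hard-coded prediction are placeholders."""
    # Only accept POST /predict; everything else gets a 404
    if environ.get("PATH_INFO") != "/predict" or environ.get("REQUEST_METHOD") != "POST":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "use POST /predict"}']
    try:
        # Validate the body so bad data returns an error instead of crashing the service
        size = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(size))
        text = payload["text"]  # assumed input schema
    except (ValueError, KeyError, TypeError):
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [b'{"error": "expected a JSON body with a \'text\' field"}']
    # A real service would call model.predict(text) here; stubbed for the sketch
    result = {"prediction": "greeting", "input_length": len(text)}
    start_response("200 OK", [("Content-Type", "application/json")])
    return [json.dumps(result).encode()]
```

Because it is plain WSGI, the same `application` callable can be served locally with `gunicorn module:application` behind Nginx, as in the diagram above.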
32 |
33 | An ML Engineer ensures:
34 |
35 | - Inputs are validated
36 | - Missing or incorrect fields are handled
37 | - Errors don’t crash the service
38 |
39 | This prevents:
40 |
41 | - Application failures
42 | - Incorrect predictions
43 | - Production incidents
44 |
45 | ### Make the Model Fast and Efficient
46 |
47 | A model that works in a notebook may be:
48 |
49 | slow, memory-heavy, and not optimized for repeated requests.
50 |
51 | ML Engineers:
52 |
53 | - Optimize how the model is loaded
54 | - Avoid reloading the model for every request
55 | - Ensure predictions are fast enough for real users
56 |
--------------------------------------------------------------------------------
/08-kserve/01-Introduction.md:
--------------------------------------------------------------------------------
1 | # Introduction to KServe
2 |
3 | Imagine you’ve trained a machine learning model, maybe a classifier, a recommender, or anything else.
4 | The next big question is: How do you deploy this model so real users or applications can send requests and get predictions?
5 |
6 | KServe is a tool that solves exactly this problem.
7 |
8 | ### What is KServe?
9 |
10 | KServe is a Kubernetes-native platform designed to deploy and serve ML models easily, reliably, and at scale.
11 |
12 | In even simpler words:
13 | KServe takes your ML model and turns it into a production-ready API running on Kubernetes, without you writing a lot of server code.
14 |
15 | ### Why KServe Exists
16 |
17 | Traditional model deployment is painful:
18 | - You need to write Flask or FastAPI code
19 | - You need to containerize the app
20 | - You need to expose endpoints
21 | - You need to manage scaling, logging, networking
22 | - You need to monitor and version your models
23 |
24 | KServe removes most of this effort by providing standardized, ready-to-use model servers.
25 |
26 | ### What KServe Actually Does
27 |
28 | KServe provides:
29 |
30 | 1. Standard Model Servers
31 |
32 | For popular frameworks like:
33 | - TensorFlow
34 | - PyTorch
35 | - Scikit-learn
36 | - XGBoost
37 | - ONNX
38 |
39 | You simply point KServe to your model file (a storage URI), and it deploys everything automatically.
40 |
41 | ### Automatic Scaling
42 |
43 | Your model API can:
44 | - Scale up when traffic increases
45 | - Scale down to zero when idle (saving huge costs)
46 |
47 | This is powered by Knative under the hood.
48 |
--------------------------------------------------------------------------------
/09-SageMaker/01-introduction.md:
--------------------------------------------------------------------------------
1 | # What is SageMaker?
2 |
3 | AWS SageMaker is Amazon’s fully managed platform for building, training, and deploying machine learning models at scale.
4 |
5 | In real-world ML systems, the actual training code is only 5–10% of the work.
6 |
7 | MLOps challenges include:
8 |
9 | - Environment and dependency management
10 | - Scalable training workloads
11 | - Handling large datasets
12 | - Model versioning
13 | - Model registry
14 | - Automated deployments
15 | - Monitoring predictions & model drift
16 | - Cost control for GPUs/instances
17 |
18 | SageMaker bundles these into managed services so that MLOps engineers can avoid building the entire ML control plane from scratch.
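Those managed services are driven by plain API calls that pipelines can script. As a hedged sketch (not SageMaker's SDK itself), here is a helper that assembles the kind of request boto3's `create_training_job` accepts; every name, ARN, and S3 URI below is a placeholder.

```python
# Sketch: build a CreateTrainingJob request dict for SageMaker.
# All identifiers (image URI, role ARN, bucket paths) are placeholders.
def build_training_job_request(job_name, image_uri, role_arn, s3_train, s3_output):
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,        # Docker image with your training code
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,                   # IAM role SageMaker assumes
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_train,
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": s3_output},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        # Spot training caps cost; MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": 3600,
            "MaxWaitTimeInSeconds": 7200,
        },
        "CheckpointConfig": {"S3Uri": s3_output.rstrip("/") + "/checkpoints"},
    }

# A CI pipeline step would then call something like:
# boto3.client("sagemaker").create_training_job(**build_training_job_request(...))
```

Keeping this request shape in a reviewed helper is what makes CI-driven, reproducible training jobs practical.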
19 |
20 | ### What MLOps Engineers Actually Do With SageMaker
21 |
22 | A) Build ML Environments
23 |
24 | - Prepare Docker images with Python/ML libraries
25 | - Manage dependency consistency
26 | - Use CDK/Terraform to provision infrastructure
27 |
28 | B) Automate Training
29 |
30 | - Use SageMaker Training Jobs with CI pipelines
31 | - Configure distributed training
32 | - Use Spot instances to control cost
33 |
34 | C) Manage Model Registry
35 |
36 | - Store versioned models
37 | - Integrate approval workflows (“manual gate” for prod)
38 |
39 | D) Automate Deployments
40 |
41 | - Blue/Green deployments
42 | - Canary deployments
43 | - Event-driven retraining
44 | - Update production endpoints with zero downtime
45 |
46 | E) Observability & Monitoring
47 |
48 | - CloudWatch for logs/metrics
49 | - SageMaker Model Monitor for:
50 |   - Data drift
51 |   - Feature drift
52 |   - Prediction drift
53 |   - Outlier detection
54 |
55 | F) Cost Optimization
56 |
57 | - Spot training
58 | - Multi-model endpoints
59 | - Serverless endpoints
60 | - Endpoint autoscaling
61 |
--------------------------------------------------------------------------------
/04-versioning-and-experiment-tracking/02-introduction-to-dvc.md:
--------------------------------------------------------------------------------
1 | # What is DVC?
2 |
3 | Think of DVC (Data Version Control) as Git for your data.
4 |
5 | Git works great for code and small text files.
6 |
7 | But Git cannot handle:
8 |
9 | - Large datasets
10 | - Model files
11 | - Data stored in cloud storage
12 |
13 | This is where DVC helps.
14 | DVC lets you:
15 |
16 | - Track versions of datasets
17 | - Store large files outside Git (S3, GCS, Azure, local storage)
18 | - Keep your Git repo clean and lightweight
19 | - Reproduce your ML project anytime
20 |
21 | ### Wine Prediction Example
22 |
23 | Imagine you're building a simple Wine Quality Prediction ML model.
24 |
25 | You have:
26 |
27 | - A CSV file → wine_data_sample.csv
28 | - A training script → train.py
29 | - A Git repo
30 |
31 | Your dataset may change over time:
32 |
33 | - You add more rows
34 | - You clean the data
35 | - You update features
36 |
37 | DVC allows you to version these dataset changes without storing the actual data inside Git.
38 |
39 | Without DVC → your CSV sits in your repo → Git becomes slow & heavy.
40 |
41 | With DVC → Git stores only a small metadata file:
42 |
43 | - wine_data_sample.csv.dvc
44 | - Actual data is stored in an S3 bucket or any other external storage
45 |
46 | You pull/push data similar to git pull / git push.
47 |
48 | ### How DVC Works (Very Simple Flow)
49 |
50 | Add your dataset to DVC
51 |
52 | `dvc add wine_data_sample.csv`
53 |
54 | Commit the .dvc file to Git
55 |
56 | `git add wine_data_sample.csv.dvc`
57 | `git commit -m "Track dataset with DVC"`
58 |
59 | Configure remote storage (e.g., S3)
60 |
61 | `dvc remote add -d myremote s3://mybucket/dvcstore`
62 |
63 | Push data to S3
64 |
65 | `dvc push`
66 |
67 | Anyone with your Git repo simply runs:
68 |
69 | `dvc pull`
70 |
71 | …and they get the exact same dataset version.
--------------------------------------------------------------------------------
/02-introduction-to-mlops/03-what-is-mlops.md:
--------------------------------------------------------------------------------
1 | # What is MLOps?
2 |
3 | Before understanding MLOps, it’s important to understand **where it comes from**.
4 |
5 | MLOps is **directly inspired by DevOps**.
6 |
7 | Just like DevOps transformed how we build and operate software, **MLOps brings those same principles into the Machine Learning world**.
8 |
9 | ---
10 |
11 | ## How DevOps Inspired MLOps
12 |
13 | ### What DevOps Solved
14 |
15 | Before DevOps:
16 | - Developers wrote code
17 | - Ops teams deployed and maintained it
18 | - Deployments were slow, manual, and risky
19 | - Failures were hard to debug
20 |
21 | DevOps introduced:
22 | - Automation
23 | - CI/CD pipelines
24 | - Infrastructure as Code
25 | - Monitoring and feedback loops
26 | - Shared ownership between Dev and Ops
27 |
28 | The result:
29 | - Faster releases
30 | - More reliable systems
31 | - Continuous improvement
32 |
33 | ---
34 |
35 | ## The Same Problem Happened in Machine Learning
36 |
37 | In ML, a similar gap appeared:
38 |
39 | - Data Scientists trained models in notebooks
40 | - Models worked locally
41 | - Production teams struggled to deploy them
42 | - No clear ownership after deployment
43 | - Models degraded silently over time
44 |
45 | Just like Dev vs Ops, ML had a gap between:
46 | - **Model development**
47 | - **Model operations**
48 |
49 | That gap is what **MLOps** was created to solve.
50 |
51 | ---
52 |
53 | ## MLOps = DevOps Practices for Machine Learning
54 |
55 | MLOps takes proven DevOps ideas and applies them to ML systems.
56 |
57 | | DevOps Concept | MLOps Equivalent |
58 | |----------------|------------------|
59 | | Source code versioning | Data + model versioning |
60 | | CI pipelines | Model training pipelines |
61 | | CD pipelines | Automated model deployment |
62 | | Monitoring services | Monitoring model performance |
63 | | Rollbacks | Model version rollback |
64 | | Automation | End-to-end ML lifecycle automation |
65 |
--------------------------------------------------------------------------------
/08-kserve/03-end-to-end-demo.md:
--------------------------------------------------------------------------------
1 | # KServe Demonstration for the Iris Model
2 |
3 | ### Install Cert Manager
4 |
5 | ```
6 | kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml
7 | ```
8 |
9 | ### Install KServe CRDs
10 |
11 | ```
12 | kubectl create namespace kserve
13 |
14 | helm install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd \
15 |   --version v0.16.0 \
16 |   -n kserve \
17 |   --wait
18 | ```
19 |
20 | ### Install KServe controller
21 |
22 | ```
23 | helm install kserve oci://ghcr.io/kserve/charts/kserve \
24 |   --version v0.16.0 \
25 |   -n kserve \
26 |   --set kserve.controller.deploymentMode=RawDeployment \
27 |   --wait
28 | ```
29 |
30 | ### Deploy the sklearn iris model
31 |
32 | ```
33 | kubectl create namespace ml
34 |
35 | cat <
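The heredoc in the last step is cut off in these notes. For reference, a hedged reconstruction based on the sklearn-iris InferenceService from the upstream KServe quickstart; the storage URI is KServe's public example model, and the `ml` namespace matches the one created above:

```
cat <<EOF | kubectl apply -n ml -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
```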