├── assets
│   └── images
│       ├── GitOps-Loop.png
│       ├── flux-architecture.png
│       ├── Pull-vs-Push-Model.png
│       ├── What-Exactly-Is-GitOps.png
│       ├── CI-CD-vs-GitOps-Comparison.png
│       └── The-Four-GitOps-Principles.png
├── examples
│   ├── day3
│   │   └── clusters
│   │       └── aks
│   │           └── apps
│   │               └── hello
│   │                   ├── namespace.yaml
│   │                   ├── kustomization.yaml
│   │                   ├── service.yaml
│   │                   └── deployment.yaml
│   └── day2
│       └── clusters
│           └── local
│               └── apps
│                   └── hello
│                       ├── namespace.yaml
│                       ├── service.yaml
│                       └── deployment.yaml
├── README.md
├── Day-1-What-really-is-GitOps.md
├── Day-2-Building-Your-First-Self-Healing-System.md
└── Day-3-GitOps-on-AKS-Self-Healing-Cloud-Scale.md

/assets/images/GitOps-Loop.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedmuhi/GitOps-Days/HEAD/assets/images/GitOps-Loop.png
--------------------------------------------------------------------------------
/examples/day3/clusters/aks/apps/hello/namespace.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: Namespace
metadata:
  name: hello
--------------------------------------------------------------------------------
/examples/day2/clusters/local/apps/hello/namespace.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: Namespace
metadata:
  name: hello
--------------------------------------------------------------------------------
/assets/images/flux-architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedmuhi/GitOps-Days/HEAD/assets/images/flux-architecture.png
--------------------------------------------------------------------------------
/assets/images/Pull-vs-Push-Model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedmuhi/GitOps-Days/HEAD/assets/images/Pull-vs-Push-Model.png
--------------------------------------------------------------------------------
/assets/images/What-Exactly-Is-GitOps.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedmuhi/GitOps-Days/HEAD/assets/images/What-Exactly-Is-GitOps.png
--------------------------------------------------------------------------------
/assets/images/CI-CD-vs-GitOps-Comparison.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedmuhi/GitOps-Days/HEAD/assets/images/CI-CD-vs-GitOps-Comparison.png
--------------------------------------------------------------------------------
/assets/images/The-Four-GitOps-Principles.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedmuhi/GitOps-Days/HEAD/assets/images/The-Four-GitOps-Principles.png
--------------------------------------------------------------------------------
/examples/day3/clusters/aks/apps/hello/kustomization.yaml:
--------------------------------------------------------------------------------
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - deployment.yaml
  - service.yaml
--------------------------------------------------------------------------------
/examples/day2/clusters/local/apps/hello/service.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: Service
metadata:
  name: hello
  namespace: hello
spec:
  selector:
    app: hello
  ports:
    - port: 80
      targetPort: 80
--------------------------------------------------------------------------------
/examples/day3/clusters/aks/apps/hello/service.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: Service
metadata:
  name: hello
  namespace: hello
spec:
  type: LoadBalancer  # ← this is the only required change on AKS
  selector:
    app: hello
  ports:
    - port: 80
      targetPort: 80
--------------------------------------------------------------------------------
/examples/day3/clusters/aks/apps/hello/deployment.yaml:
--------------------------------------------------------------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
  namespace: hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: nginxdemos/hello:plain-text
          ports:
            - containerPort: 80
--------------------------------------------------------------------------------
/examples/day2/clusters/local/apps/hello/deployment.yaml:
--------------------------------------------------------------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
  namespace: hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: nginxdemos/hello:plain-text
          ports:
            - containerPort: 80
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# GitOps-Days 🔄🚀

Commit, reconcile, ship.
*Learn the Git-first, pull-based way to run Kubernetes - and let your clusters look after themselves.*

## Welcome

If you’ve ever fixed something in production at the last minute… and then spent the next week wondering what else had changed, you’re not alone.
GitOps gives you a different way: changes go into Git, the cluster follows, and if things drift, they’re put back automatically. When trouble shows up, you can roll back cleanly to a known good state - no guesswork.

That’s the predictability you’ll build in this series.
Let’s start by choosing how you want to begin.

## 🗺️ Start here

There’s more than one way to get into GitOps - pick the path that fits how you like to learn.

* **Want the big picture first?** Start with **[Day 1: What really is GitOps?](./Day-1-What-really-is-GitOps.md)** and build a clear mental model.
* **Learn best by doing?** Jump into **[Day 2: Building Your First Self-Healing System](./Day-2-Building-Your-First-Self-Healing-System.md)** and see it in action.
* **Ready for cloud from the start?** Head to **[Day 3: Production GitOps on AKS with GitHub Actions](./Day-3-GitOps-on-AKS-Self-Healing-Cloud-Scale.md)** and go straight to a production-grade setup.

> **You’ll need**: nothing for Day 1, which is conceptual. Day 2 needs Git, Docker, kind, `kubectl`, and the Flux CLI. Day 3 also needs an Azure account and the Azure CLI.

### Quick cheat sheet before you dive in

* **Desired state**: what Git says your cluster should look like.
* **Drift**: when your cluster no longer matches Git.
* **Controller**: software in the cluster (like [Flux](https://fluxcd.io/) or [Argo CD](https://argo-cd.readthedocs.io/)) that keeps it matching Git.
* **Self-healing**: the controller restores the cluster to match Git automatically.

## 🎯 What you will achieve

By the end of this series, you will be able to:

* Keep clusters aligned with Git and recover fast when things change.
* Build and run a self-healing loop locally.
* Take the same loop to Azure Kubernetes Service with production patterns.
* Extend GitOps with CI pipelines, multi-environment setups, and secrets management *(Days 4-5 planned)*.

## 🛠️ What you need

For the local lab (Day 2):

* Docker Desktop or Docker Engine
* [kind](https://kind.sigs.k8s.io/)
* `kubectl`
* Git and a GitHub account
* [Flux CLI](https://fluxcd.io/)

For cloud labs (Day 3), add:

* Azure CLI
* An Azure subscription

> **Tested with**: current stable releases of Docker, `kubectl`, kind, Flux, and Azure CLI. If you hit a version snag, open an issue and we’ll help.

## 🗺️ Your learning path

**Status:** Days 1-3 ready now. Days 4-5 planned.

| Day                                                             | Focus                                                | What you'll build                    |
| --------------------------------------------------------------- | ---------------------------------------------------- | ------------------------------------ |
| [**Day 1**](./Day-1-What-really-is-GitOps.md)                   | **Understand**: what GitOps is and why it matters    | A clear mental model                 |
| [**Day 2**](./Day-2-Building-Your-First-Self-Healing-System.md) | **Build**: your first self-healing system with Flux  | Local GitOps loop that auto-corrects |
| [**Day 3**](./Day-3-GitOps-on-AKS-Self-Healing-Cloud-Scale.md)  | **Scale**: take the same loop to AKS                 | Production-ready cloud deployment    |
| **Day 4**                                                       | **Operate**: real-world patterns and troubleshooting | Robust GitOps workflows              |
| **Day 5**                                                       | **Advance**: tool choices and next steps             | Your GitOps roadmap                  |

## 🔄 Stay up to date (sync your fork)

**Using GitHub (no local clone):**

1. Open *your fork* on GitHub.
2. Click **Sync fork** (or **Fetch upstream**) → **Update branch**.
3. Your fork is now in sync with the latest changes.

**Using the CLI (optional):**

```bash
# inside your local clone
# one-time setup: register the original repo as "upstream" (skip if it already exists)
git remote add upstream https://github.com/ahmedmuhi/GitOps-Days.git
git fetch upstream
git checkout main
git merge upstream/main   # or: git rebase upstream/main
# if a rebase rewrote commits you had already pushed, use: git push --force-with-lease origin main
git push origin main
```


## 🗂️ Repo map

Everything you need is linked in the learning path above.
If you’re browsing the source, the `/examples/` folder contains the manifests for the Day 2 and Day 3 labs.

## ❓ Quick FAQ

**Do I need to re-fork to get updates?**
No - use the sync steps above.

**I broke my local lab - now what?**
Reconciliation will usually fix it. If not, recreate the kind cluster and reinstall Flux.

**I have a question that’s not listed here.**
[Open an issue](https://github.com/ahmedmuhi/GitOps-Days/issues) - we’ll add it to the FAQ.

## 💬 Join the conversation

GitOps-Days is evolving, and your feedback matters.

* Found a bug or typo? [Open an issue](https://github.com/ahmedmuhi/GitOps-Days/issues)
* Have ideas or improvements? Send a pull request
* Sharing your journey? Post with `#GitOpsDays`

## 🧭 Roadmap

* **Day 4** - CI pipelines that feed the loop, image promotion, rollout gates
* **Day 5** - Multi-environment layouts and secrets patterns
* Future topics will follow learner needs

## 📚 Additional resources

* [Flux documentation](https://fluxcd.io/flux/)
* [OpenGitOps / CNCF GitOps Working Group](https://opengitops.dev/)
* [Kubernetes tutorials](https://kubernetes.io/docs/tutorials/)

🚀 **Ready to start?** Jump into **[Day 1: What really is GitOps?](./Day-1-What-really-is-GitOps.md)** and begin your GitOps journey.

--------------------------------------------------------------------------------
/Day-1-What-really-is-GitOps.md:
--------------------------------------------------------------------------------
# 🌟 Day 1 – What really is GitOps?

If you’ve worked with Kubernetes for more than a few weeks, you’ve seen it:
what’s running in the cluster drifts away from what you thought was deployed.
A manual fix here, an emergency tweak there… and before long, the state in your cluster and the state in your Git repo tell two different stories.

That gap is where incidents start and trust in your deployments erodes.

Today we’re going to close that gap - permanently.
Not with more scripts or a bigger CI pipeline, but by changing **where** your source of truth lives and **how** your cluster keeps itself aligned with it.

By the end of this session, you’ll be able to:

* State GitOps in one clear sentence.
* Trace the Git → controller → cluster loop.
* Explain why pull beats push for Kubernetes - and why it changes the game for drift, security, and recovery.

Let’s start with the most important question: **what exactly is GitOps?**

## What exactly is GitOps?

So what *is* GitOps, really? Let’s start with how most teams try to fight drift - and why those fixes don’t quite close the gap.

You’ve probably seen this: configs stored in Git but applied manually, deployment scripts that push updates, or CI/CD pipelines that run after every commit. They help, sure, but they still leave a gap between what’s in your repo and what’s running in your cluster. Nothing stops a manual edit or another system from changing the cluster afterwards, and that’s where drift and configuration rot take hold.

This isn’t just your team’s problem. Back in 2017, engineers at Weaveworks popularised a different way: treat Git as the single source of truth, and let the cluster enforce it for itself. The idea caught on quickly. Tools like Flux and Argo CD put it into practice, and the CNCF’s [OpenGitOps project](https://opengitops.dev) formalised the core principles so teams everywhere could work from the same playbook.

Put simply:

**GitOps means storing the desired state of your system in Git, and running a controller inside each cluster that continuously pulls that state and reconciles the cluster to match it.**

> If it’s not in Git, it shouldn’t exist in the cluster.
> If it’s in Git, the cluster should match it.

GitOps loop comic
The loop is simple: declare your infrastructure as code, commit it to Git, the controller pulls and enforces it, and the cluster stays aligned automatically.

In practice, that looks like this:

1. You declare your infrastructure and application configs as code.
2. You commit the changes to Git - now they’re versioned and visible.
3. The in-cluster controller compares Git (desired) with the cluster (actual) and fixes any differences.
4. The cluster stays aligned, and drift never builds up.

Storing YAML in Git is a good first step. GitOps goes further - your cluster pulls from Git and enforces it, continuously. Let’s see what that changes.

## Beyond Just Storing YAML in Git

You might be thinking, *“Hang on - we already keep our manifests in Git. Isn’t that GitOps?”*
That’s a good start… but it’s not the whole story.

Here’s why: storing YAML in Git is like writing down the rules but never checking if anyone’s following them. Without something constantly making sure your cluster matches those files - and fixing it when it doesn’t - drift will sneak back in.

When you go from “YAML in Git” to **GitOps**, a few key things change:

* **Authority** - You (or a CI job) are no longer the one applying changes; an in-cluster controller does it, all the time, without forgetting.
* **Direction** - Instead of pushing changes into the cluster, the cluster pulls its own configuration from Git.
* **Timing** - Instead of waiting until you remember to run a command, reconciliation happens automatically on a short, predictable cycle.
* **Evidence** - Instead of ad-hoc CLI changes, every change leaves a commit and a review trail.

Here’s how that plays out:

You merge a PR that sets `replicas: 3`.
Later, someone bumps it to 5 by hand.
In the next reconciliation cycle, the controller spots the mismatch and sets it back to 3.
No Slack pings. No late-night debugging. If you *really* want 5, you change Git. If you need to undo a bad change, you revert the commit.

The takeaway?
YAML in Git tells you what *should* happen. GitOps makes sure it *does* happen - automatically, continuously, and without you chasing it.

## How the GitOps Engine Works

So far, we’ve been talking about *what* GitOps is. Let’s talk about *how* it actually works.

At the heart of GitOps is a small piece of software called a **GitOps controller** that runs inside your cluster. Its job is simple but relentless:

1. **Watch** your Git repository for changes (typically by polling it on an interval you configure).
2. **Compare** what’s in Git (desired state) with what’s running in the cluster (actual state).
3. **Reconcile** any differences by updating the cluster to match Git.

That’s it: **watch → compare → reconcile**.
This is the GitOps loop, and it runs continuously - on a short, predictable cycle - like a steady heartbeat keeping your cluster healthy.

Here’s what that means in real life:
If someone makes a direct change in the cluster - maybe tweaks an environment variable - the controller spots it in the next cycle and flips it back to match Git. No drama. No surprises. And no chasing down mysterious changes later.

With that foundation in place, we can look at how this loop shapes your day-to-day workflow.
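The watch → compare → reconcile loop, including the `replicas: 3` versus 5 scenario above, can be sketched as a toy shell script. Treat it as an illustration under heavy simplifying assumptions, not as how Flux or Argo CD are implemented. "Git" and "the cluster" are just two shell variables (`git_replicas` and `cluster_replicas`, hypothetical names) holding a replica count, and the hypothetical `reconcile_once` function stands in for one controller cycle. Real controllers fetch manifests from the repository and compare whole Kubernetes objects through the API server.

```shell
# Toy model of one GitOps controller, for illustration only.
# Assumption: "Git" and "the cluster" are plain variables holding a replica count.

git_replicas=3       # desired state: what the merged PR declared
cluster_replicas=3   # actual state: what is running right now

reconcile_once() {
  # 1. Watch: read the latest desired state from "Git".
  desired=$git_replicas
  # 2. Compare: desired (Git) vs actual (cluster).
  if [ "$cluster_replicas" -ne "$desired" ]; then
    echo "out of sync: cluster=$cluster_replicas git=$desired -> applying Git"
    # 3. Reconcile: make the cluster match Git.
    cluster_replicas=$desired
  else
    echo "in sync: $cluster_replicas replicas"
  fi
}

cluster_replicas=5   # someone scales by hand, outside Git
reconcile_once       # next cycle: the manual change is reverted to 3
reconcile_once       # following cycle: nothing to do

git_replicas=5       # the GitOps way: change Git (merge a PR)
reconcile_once       # the cluster follows Git up to 5
```

A real controller would run each cycle on a timer, for example inside a `while true; do reconcile_once; sleep 60; done` loop. The sketch calls it by hand so each cycle is easy to follow.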

## GitOps Workflows in Practice

Now that you understand *how* the GitOps engine works, let’s zoom out and see how it fits into your everyday workflow.

Here’s the big picture at a glance:

![GitOps Workflow Diagram](assets/images/GitOps-Loop.png)

### The Two-Repository Pattern

A common GitOps pattern splits the work into two repos:

1. **Application repo** – Your source code, tests, and Dockerfiles.
2. **Configuration repo** – Your Kubernetes manifests, Helm charts, or Kustomize configs.

Why split them?
It keeps code changes and deployment settings independent. Developers focus on building and testing code. Platform teams focus on how and where it runs. Each can evolve separately, with its own review and approval process.

### From Commit to Cluster

Here’s what it looks like in motion:
You push code to the **application repo**.
CI picks it up, builds and tests it, creates a container image, and pushes that image to a registry.
Then - and this is the only deployment-related step CI performs - it updates the **config repo** with the new image tag.

The GitOps controller, always watching the config repo, spots the change. It pulls the new config, applies it to the cluster, and if anything drifts later, it quietly puts things back in place.

### The Key Shift

Notice what never happens?
The CI pipeline never talks to your cluster. No cluster credentials stored in external systems. No one-off manual `kubectl` commands.

The cluster follows what’s in Git, and only what’s in Git. That’s not just cleaner - it’s more secure, more auditable, and more reliable. If you want to change something, you change Git. If you want to undo something, you revert Git.

## GitOps vs Traditional CI/CD

You’ve seen how GitOps works inside a team’s workflow.
But how does it stack up against the way most teams deploy today? Let’s put them side by side.

CI/CD vs GitOps comic
![Push vs Pull comic](assets/images/Pull-vs-Push-Model.png)

The difference comes down to who makes the change, and how it gets into your cluster.

### Push vs Pull at a Glance

| Aspect              | Traditional CI/CD                         | GitOps                                      |
| ------------------- | ----------------------------------------- | ------------------------------------------- |
| **Who deploys**     | CI pipeline pushes to cluster             | Controller inside cluster pulls from Git    |
| **Access model**    | External systems need cluster credentials | Only the in-cluster controller needs access |
| **When it happens** | On demand when the pipeline runs          | Continuously, on a short, regular cycle     |
| **Drift handling**  | Manual intervention required              | Automatically detected and fixed            |
| **Rollback**        | Re-run pipeline with an old version       | `git revert` the change and push            |
| **Audit trail**     | Spread across CI logs                     | All in Git history                          |
| **Source of truth** | Could be CI, could be the cluster         | Always Git                                  |

### Why the Pull Model Changes the Game

Security is one of the biggest wins.

In the push model:

* Anyone who can use the cluster credentials stored in CI can make direct changes to your cluster.
* Those changes might leave minimal traces outside the CI system.
* Finding them later means digging through multiple systems.

In the pull model:

* Every change must go through Git.
* That means commits, branches, and file changes - all logged, reviewable, and easy to trace.
* Rolling back is as simple as reverting a commit.

Push model = a quiet side door.
Pull model = the front door, where everyone sees who’s coming and going.
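To make "rolling back is as simple as reverting a commit" concrete, here is a small sketch you can run in a throwaway repository. It is not part of this repo's lab files: the `replicas.yaml` file name and the commit messages are illustrative assumptions. The point is that a rollback is an ordinary, reviewable commit; a GitOps controller watching such a repository would apply the reverted state on its next cycle once the revert is pushed.

```shell
# Rollback in the pull model is just Git. Runs in a temporary directory.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "GitOps Learner"
git config user.email "learner@example.com"

# Known-good state, reviewed and merged.
printf 'replicas: 3\n' > replicas.yaml
git add replicas.yaml
git commit -q -m "Run 3 replicas"

# A bad change lands in Git.
printf 'replicas: 50\n' > replicas.yaml
git commit -q -am "Scale to 50 (mistake)"

# Roll back: a new commit that undoes the bad one. History keeps all three.
git revert --no-edit HEAD > /dev/null

cat replicas.yaml      # prints: replicas: 3
git log --oneline      # the revert, the mistake, and the original commit
```

Nothing is deleted: the mistake and its revert both stay in history, which is exactly the audit trail the table above credits to GitOps.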

### What This Means Day to Day

**Traditional CI/CD**: You run a pipeline and hope it finishes cleanly. Maybe the cluster got updated. Maybe someone changed something in between.

**GitOps**: If it’s in Git, it’s in the cluster. If someone changes the cluster by hand, it’s corrected automatically in the next reconciliation cycle.

With GitOps, deployments aren’t one-off events. They’re a state your system actively maintains - secure, auditable, and resistant to drift.

## The Four Principles That Make GitOps Work

Everything you’ve seen today - the self-healing, the security, the simplicity - comes down to four core ideas. They’re not mine, and they’re not specific to Flux. The CNCF’s [OpenGitOps](https://opengitops.dev) project pulled these patterns from real-world teams and wrote them down so everyone is speaking the same language.

Here they are, plain and simple:

1. **Declarative** - Describe the end state you want, not the steps to get there. For example, `replicas: 3` instead of running `kubectl scale`.
2. **Versioned & Immutable** - Keep the desired state in Git, where each commit is an immutable snapshot, so every change is tracked, reviewed, and reversible.
3. **Pulled Automatically** - The cluster fetches its own configuration from Git - you never push changes into it.
4. **Continuously Reconciled** - The system keeps reality matched to Git and fixes drift whenever it appears.

If someone tells you they “do GitOps,” these are the four things you should be able to see in action. And when you use Flux, Argo CD, or other GitOps tools, this is exactly what they’re implementing for you.

The Four GitOps Principles

## GitOps in Context: Your Journey Forward

### From Drift to Control

We began today with a simple truth: what’s running in your cluster often drifts from what’s in Git. Quick fixes, manual changes, and ad-hoc scripts make it worse.

Now you’ve seen the alternative - a model where Git is the single source of truth, and your cluster keeps itself aligned with it.

By the end of this first session, you can:

* Explain the GitOps loop: **watch → compare → reconcile**
* Show why **pull beats push** for security and reliability
* Identify the four OpenGitOps principles that make self-healing possible

### Tomorrow: From Knowledge to Power

Tomorrow we go from concept to action. In about an hour, you will:

* **Build** a local Kubernetes cluster
* **Install** Flux and watch it take control
* **Deploy** an app using only Git commits
* **Break** something on purpose - and watch it heal itself

No cloud accounts. No complex setup. Just your laptop and the full GitOps loop in action.

And here’s the real win: with GitOps, drift isn’t something you scramble to detect - it’s something that simply can’t persist. Your clusters will sync themselves, your audit trail will live in Git, and your role will shift from firefighting to guiding intent.

By Day 5, you’ll be confident running GitOps in production - with the peace of mind that comes from knowing your system is always in the state you declared.

**Ready to build your first self-healing system?**
[Continue to Day 2 →](./Day-2-Building-Your-First-Self-Healing-System.md)

--------------------------------------------------------------------------------
/Day-2-Building-Your-First-Self-Healing-System.md:
--------------------------------------------------------------------------------
# 🚀 Day 2 – Building Your First Self-Healing Kubernetes System with Flux

Yesterday we explored the antidote to one of the most frustrating problems in Kubernetes operations: **configuration drift**.
It’s that moment when the running system doesn’t match what you think is deployed - maybe because someone made a “quick fix” in production and never committed it, or a setting silently changed without explanation. In the old world, that meant 3 a.m. firefights, scrambling through logs, and hoping you could piece the cluster back together.

GitOps changes that story. When Git is your single source of truth and the cluster reconciles itself, drift can’t persist - and today, you’ll see that in action.

## 🏁 Welcome Back to GitOps-Days

If any of the terms from Day 1 - *desired state*, *pull model*, or the *four GitOps principles* - aren’t fresh in your mind, you can [review them here](./Day-1-What-really-is-GitOps.md) before diving in.

Today, we’ll use **[Flux](https://fluxcd.io/)** - a GitOps operator - to build a live, breathing system that syncs with your repo, detects drift, and heals itself automatically. You’ll turn yesterday’s concepts into something you can watch work in real time.

## 🗺️ Your Hands-On Journey (\~1 hour)

You’ll:

1. Set up a local Kubernetes lab environment (\~15 min)
2. Fork and clone the Git repository that acts as your single source of truth (\~5 min)
3. Install Flux in your cluster and connect it to your repo (\~10 min)
4. Deploy an application entirely through Git commits (\~10 min)
5. Break things on purpose and watch Flux fix them (\~15 min)

No cloud accounts. No complex setup. Just Docker and a few free tools.

## 🎯 By the End, You’ll Have

* A real Kubernetes cluster with Flux keeping it in sync with your repo
* Automated deployments triggered by pushes to Git
* Continuous drift detection and correction in under a minute
* A system that can recover from mistakes without your intervention

**Ready to turn the pain of drift into the peace of self-healing?**
Let’s get started.

## 🧰 Preparing Your Workspace

Before we dive into building your self-healing system, let’s set up the **essential building blocks** it relies on. These tools form the foundation for everything you’ll do today - once they’re in place, Flux will be able to work its GitOps magic.

Later in this lab, we’ll install **[Flux](https://fluxcd.io/)** - the GitOps operator that keeps your cluster in sync with Git - but first, we need this solid groundwork.

> [!TIP]
> Already have Docker, kind, kubectl, Git, and the Flux CLI installed?
> [Skip ahead to “Set Up Your Git Repository”](#-set-up-your-git-repository).

> [!IMPORTANT]
> You'll need at least **4 GB of free RAM** and **2 GB of disk space** for the local cluster.

### The Tools You’ll Need

**1. Docker** - Runs the containers that become your Kubernetes nodes.
[Install Docker](https://docs.docker.com/get-docker/)
**Minimum version:** **24.0**

**2. kind** - Creates your local “playground” by spinning up a Kubernetes cluster inside Docker.
[Install kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installation)
**Minimum version:** **0.25.0**

**3. kubectl** - Your command-line tool for inspecting and managing Kubernetes resources.
[Install kubectl](https://kubernetes.io/docs/tasks/tools/)
**Minimum version:** **1.32**

**4. Git** - The version control tool behind your single source of truth, where you declare what should be running in your cluster.
[Install Git](https://git-scm.com/downloads)
**Minimum version:** **2.40**

**5. Flux CLI** - The command-line tool you’ll use to install Flux’s controllers and connect them to your repo.
[Install the Flux CLI](https://fluxcd.io/)

### ✅ Checkpoint: Verify Your Setup

**Why this matters:**
Each lab step includes a checkpoint like this. If something fails later, knowing exactly which step last worked makes troubleshooting easier - without retracing your entire setup.

Run the following commands to confirm your tools are ready:

```shell
docker --version
kind --version
kubectl version --client   # the old --short flag was removed in kubectl 1.28
git --version
flux --version
```

**Pass criteria:**
You should see version numbers that match or exceed the minimums above.

With your tools confirmed, you’re ready to set up the Git repository that will control everything in your self-healing Kubernetes system.

## 📂 Set Up Your Git Repository

Your Git repository is the **single source of truth** Flux will watch. Every change you push to the paths Flux watches will be applied to your Kubernetes cluster automatically - which is why this step is critical to building your self-healing system.

⏱️ **Time needed:** \~3 minutes

### 1️⃣ Fork this Repository

Go to [`https://github.com/ahmedmuhi/GitOps-Days`](https://github.com/ahmedmuhi/GitOps-Days) and click **Fork** in the top-right corner.
On the next screen, click **Create fork**.

💡 *Why this matters:*
In GitOps, Flux regularly pulls from a Git repository to know what your cluster should look like. To make changes that Flux will actually see and apply, you need **write access** to that repository. You can't push to someone else's repo, so forking creates your own copy in your GitHub account - one you control completely.

Your fork will live at:

```
https://github.com/YOUR-USERNAME/GitOps-Days
```

### 2️⃣ Clone Your Fork Locally

```shell
git clone https://github.com/YOUR-USERNAME/GitOps-Days.git
cd GitOps-Days
```

> [!IMPORTANT]
> Replace `YOUR-USERNAME` with your actual GitHub username before running the command.
> If you copy it exactly as shown above without replacing it, the command will fail.
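If you'd rather not edit the URL by hand, a small shell function can build it from your username and refuse to run while the placeholder is still there. `clone_fork` is a hypothetical helper written for this note, not something the repository ships; it only wraps the same `git clone` command shown above.

```shell
# Hypothetical helper: clone your fork by username, and stop early if the
# YOUR-USERNAME placeholder (or an empty value) was passed in.
clone_fork() {
  if [ -z "$1" ] || [ "$1" = "YOUR-USERNAME" ]; then
    echo "clone_fork: pass your real GitHub username" >&2
    return 1
  fi
  git clone "https://github.com/$1/GitOps-Days.git"
}
```

Usage: run `clone_fork <your-username> && cd GitOps-Days`, with your GitHub username in place of `<your-username>`. The guard only catches the literal placeholder and empty input; a misspelled username still fails at the `git clone` step.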

### 3️⃣ Checkpoint: Confirm Your Local Repo Is Linked to Your Fork

Run:

```shell
git remote -v
```

Expected output:

```
origin  https://github.com/YOUR-USERNAME/GitOps-Days.git (fetch)
origin  https://github.com/YOUR-USERNAME/GitOps-Days.git (push)
```

✅ This means:

* You have a fork in your GitHub account.
* You've cloned it locally.
* Your local repo is connected to your fork, so pushes will go to a place you control.

❌ If you see `ahmedmuhi` instead of your username, you cloned the original repo by mistake. Your pushes would be rejected, so Flux would never see your changes. Delete the folder and clone your fork instead, or point the existing clone at your fork with `git remote set-url origin https://github.com/YOUR-USERNAME/GitOps-Days.git`.

**Checkpoint complete:** You now have a GitOps-ready repository with full write access, linked locally, and ready for Flux to watch.

Next: we’ll create the Kubernetes cluster where your self-healing system will run.

## 🔄 Creating Your Kubernetes Environment

Your Git repository is ready - now it's time to prepare the **stage** where GitOps will actually perform its magic.
We'll spin up a local Kubernetes cluster using **[kind](https://kind.sigs.k8s.io/)** (*Kubernetes in Docker*), which runs an entire cluster inside Docker containers.

Why kind? For GitOps experiments, it's perfect:

* ⚡ **Fast** - launches in about a minute
* 🔄 **Rebuildable** - tear down and recreate easily to start fresh
* 🧪 **Safe** - keeps your work local so you can experiment without touching production

This gives you a fast feedback loop: test your GitOps setup locally, confirm it works, and only then consider running it in production.

⏱️ **Time needed:** \~2 minutes

### Creating the Cluster

Run the following command. It's the same on Windows (PowerShell or Windows Terminal), macOS, and Linux:

```shell
kind create cluster --name gitops-loop-demo
```

This spins up a fully functional Kubernetes cluster in about a minute, using the default Kubernetes version for your release of kind.

### ✅ Checkpoint: Confirm Your Cluster Is Ready

**Why this matters:**
Before we bring in Flux, we need to be sure the cluster is running and ready to accept workloads. If the cluster isn’t healthy, you'll run into errors later.

Run:

```shell
kubectl get nodes
```

**Pass criteria:**

* `STATUS` shows **Ready**.
* The `AGE` is just a few minutes (freshly created).

Example output:

```
NAME                             STATUS   ROLES           AGE   VERSION
gitops-loop-demo-control-plane   Ready    control-plane   1m    v1.32.x
```

If you see `NotReady`:

1. Wait a few seconds and try again.
2. Make sure Docker is running:

   ```shell
   docker ps
   ```

3. Ensure you have at least **4 GB RAM** and **2 GB disk** free.
4. If still stuck, recreate the cluster:

   ```shell
   kind delete cluster --name gitops-loop-demo
   kind create cluster --name gitops-loop-demo
   ```

### What You’ve Just Built

* ✅ A fully functional Kubernetes cluster running locally
* ✅ A safe, disposable playground for your GitOps experiments
* ✅ The environment Flux will soon manage, keeping it in sync with your Git repo

> [!TIP]
> When you’re done with the lab, you can delete the cluster to free resources:
>
> ```shell
> kind delete cluster --name gitops-loop-demo
> ```

## 🚀 Deploy Flux to Your Cluster and Connect It to Your Repo

Your cluster is up and running - now it's time to bring in the **director** of our GitOps play: **Flux**.

Flux is not a single binary that “just runs” - when you install it, it deploys several **specialized controllers** in your cluster.
These controllers live together in their own namespace (`flux-system`) and work like a team of automation agents:

* **Source Controller** - pulls manifests from your Git repository
* **Kustomize Controller** - applies those manifests to your cluster
* **Helm Controller** - manages Helm releases (if you use them)
* **Notification Controller** - sends alerts and status events

Together, they ensure that what's running in your cluster always matches what's in Git - continuously and automatically.

⏱️ **Time needed:** \~3-4 minutes

### Install the Flux controllers

Run:

```shell
flux install
```

This sets up all the core Flux components inside the `flux-system` namespace. The controllers don't deploy anything yet. Once you tell them which repository to watch and what to apply, they'll keep applying your declared state and fixing any drift.

### Connect Flux to your Git repository

Next, we tell Flux's **Source Controller** which Git repository to watch.
This must be **your fork on GitHub**, not the original.

```shell
flux create source git gitops-loop-demo \
  --url=https://github.com/YOUR-USERNAME/GitOps-Days.git \
  --branch=main \
  --interval=30s
```

> [!IMPORTANT]
> Replace `YOUR-USERNAME` with your actual GitHub username.
> We use the HTTPS URL because your fork is public, so Flux can pull directly from GitHub without credentials.
> The `--interval=30s` means Flux will check for changes every 30 seconds.

### ✅ Checkpoint: Confirm Flux is healthy and watching your repo

1. **Check that Flux components are healthy:**

   ```shell
   flux check
   ```

   Example when all is healthy:

   ```
   ► checking controllers
   ✔ source-controller: deployment ready
   ✔ kustomize-controller: deployment ready
   ✔ helm-controller: deployment ready
   ✔ notification-controller: deployment ready
   ✔ all checks passed
   ```

2. **Verify your Git source is registered and ready:**

   ```shell
   flux get sources git
   ```

   Example output when ready:

   ```
   NAME               URL                                            READY   STATUS                                                      AGE
   gitops-loop-demo   https://github.com/YOUR-USERNAME/GitOps-Days   True    stored artifact for revision 'main@sha1:123abc456def...'   1m
   ```

**Pass criteria:**

* `flux check` shows all controllers as `✔ ... ready`
* `flux get sources git` shows your source with `READY` = `True`

**What this means:**
The Source Controller has fetched a copy of your GitHub fork and stored it as an artifact inside your cluster. It refreshes this cached copy every 30 seconds (or whatever interval you set), so your cluster always has the latest version of your repo ready to use.
Later, when we tell Flux *what* to deploy, the Kustomize Controller will read those files from this local cache - not directly from GitHub - and deploy them to your cluster.
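You can also script this check instead of reading the table by eye. A short awk filter can pull the `READY` value for one object out of `flux get` output. This is a sketch under stated assumptions:

* `flux_ready` is a hypothetical helper.
* It expects a header row containing `READY`.
* It expects the columns before `READY` to contain no spaces, which is true of the examples in this lesson.

It finds the column by its header name because the column order may differ between Flux CLI versions.

```shell
# flux_ready: hypothetical helper. Reads `flux get ...` table output on stdin
# and prints the READY value for the object named in $1. Exits 1 if that
# object isn't listed.
flux_ready() {
  awk -v name="$1" '
    # Header row: remember which column is titled READY.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "READY") col = i; next }
    # Data rows: print that column for the matching NAME.
    $1 == name && col { print $col; found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Usage:
#   flux get sources git | flux_ready gitops-loop-demo    # prints True when ready
```

The same filter works on `flux get kustomizations` output, which you'll meet shortly.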

**Checkpoint complete:**
Flux is now **installed in your cluster** and **watching your GitHub fork**.
Before we tell Flux *what* to deploy from that repository, there's one last bit of setup to keep your work safe.

## 💡 Before You Copy the Example Files

I update this repository frequently - adding lessons, improving examples, and fixing issues.
If you make changes directly in the shared `examples/` folders and later *sync your fork* with the upstream repo, those changes could be overwritten.

To prevent that:

* **Create your own workspace folder** under `student-work/YOUR-USERNAME`
* **Copy** the example files you'll be working on into that folder
  (e.g., copy the Day 2 `hello` app example there)
* Make changes **only** inside your workspace folder
* Later, when we tell Flux what to deploy, you'll point it to *your* folder - not the shared examples

This way, syncing upstream changes will never overwrite your personal work.

### 📋 Commands to Create Your Workspace and Copy the Example

**macOS/Linux (Terminal):**

```shell
mkdir -p student-work/YOUR-USERNAME/day2
cp -r examples/day2/clusters/local/apps/hello student-work/YOUR-USERNAME/day2/
```

**Windows (PowerShell):**

```powershell
New-Item -ItemType Directory -Path "student-work\YOUR-USERNAME\day2" -Force
Copy-Item -Recurse examples\day2\clusters\local\apps\hello student-work\YOUR-USERNAME\day2\
```

> [!IMPORTANT]
> Replace `YOUR-USERNAME` with your actual GitHub username before running these commands.

### ⚠️ A Note About Re-copying

If you repeat a lesson in the future and re-copy files from `examples/` into a folder that **already exists**, files with the same names will be overwritten and your previous work in them will be lost.
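One way to make accidental overwrites impossible is to wrap the copy in a function that refuses to touch an existing destination. This is a minimal sketch for macOS/Linux shells. `safe_copy_example` is a hypothetical helper and is not shipped in this repository.

```shell
# safe_copy_example: hypothetical helper. Copies an example folder into your
# workspace, but refuses if the destination folder already exists.
safe_copy_example() {
  local src="$1"          # e.g. examples/day2/clusters/local/apps/hello
  local dest_parent="$2"  # e.g. student-work/YOUR-USERNAME/day2
  local dest="$dest_parent/$(basename "$src")"

  if [ ! -d "$src" ]; then
    echo "error: source folder '$src' not found" >&2
    return 1
  fi
  if [ -e "$dest" ]; then
    echo "refusing to overwrite existing '$dest' (rename or delete it first)" >&2
    return 1
  fi
  mkdir -p "$dest_parent"
  cp -r "$src" "$dest_parent/"
  echo "copied to $dest"
}

# Usage:
#   safe_copy_example examples/day2/clusters/local/apps/hello student-work/YOUR-USERNAME/day2
```

If you really do want a fresh copy, rename or delete the old folder first, then run the function again.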

**How to avoid overwriting yourself:**

* If you want a fresh start, delete or rename your old folder first
* Or create a new subfolder (e.g., `student-work/YOUR-USERNAME/day2-v2`) and copy into that

**Next:** With the Day 2 example in your workspace, we'll tell Flux what to deploy. When we do, something will happen immediately that surprises most people on their first run.

## 📦 Tell Flux What to Deploy

You've got Flux installed, your repository linked, and you know how to keep your work safe.
Now it's time to give Flux a very specific instruction: **what to deploy** from your Git repo into the cluster.

This is done by creating a **Kustomization** - a Flux custom resource (not to be confused with a Kustomize `kustomization.yaml` file) that Flux treats like marching orders.
It tells Flux:

* **Where** in your Git repository to find your application's manifests
* **How often** to check them
* **What to do** when something changes (or is removed)

⏱️ **Time needed:** \~2 minutes

### 🛑 Important: Point to Your Own Workspace

Do **not** point Flux at the shared `examples/` folder.
If you do, the next time you sync your fork with upstream changes, your work could be overwritten.

Instead:

* Work only inside your **own** folder under `./student-work/YOUR-USERNAME/`
* Point Flux to *your* folder, not the shared examples

For example:

```
./student-work/YOUR-USERNAME/day2/hello
```

### 🛠️ Create a Kustomization

Flux reads from your fork on GitHub, not from your laptop. Before creating the Kustomization, commit and push the workspace you just copied (`git add student-work/YOUR-USERNAME`, then `git commit -m "Add my Day 2 workspace"` and `git push`).

Then tell Flux to deploy from your workspace folder (replace `YOUR-USERNAME`):

```bash
flux create kustomization hello-app \
  --source=GitRepository/gitops-loop-demo \
  --path="./student-work/YOUR-USERNAME/day2/hello" \
  --prune=true \
  --interval=1m
```

**Flags explained:**

* `--source` → The `GitRepository` you registered earlier (`gitops-loop-demo`)
* `--path` → The folder in your repo where your manifests live
* `--prune=true` → Remove cluster resources when their files are deleted from Git
* `--interval=1m` → Check for drift and reconcile every minute

### ✅ Checkpoint: Confirm the Kustomization is Ready

```bash
flux get kustomizations
```

Example when ready:

```
NAME        READY   MESSAGE                                        REVISION              SUSPENDED
hello-app   True    Applied revision: main@sha1:123abc456def...   main@sha1:123abc...   False
```

**Checkpoint complete:** You’ve just given Flux its marching orders.

Here’s the part that surprises most people: the moment you created that Kustomization, Flux didn’t wait for you to push another commit. It immediately pulled the manifests from your folder and applied them to the cluster.
Your very first GitOps-powered deployment has already happened in the background.

**Next:** Let’s go see what Flux just installed for you.

## 👀 See Your First Flux Deployment

Flux got to work the moment your Kustomization existed, pulling the manifests from your workspace folder and applying them to your cluster.

Let's see exactly what happened.

⏱️ **Time needed:** \~5 minutes

### 🔍 What Flux Discovered in Your Repository

When you created the Kustomization, you told Flux to watch your workspace folder. Inside that folder, Flux found:

```
student-work/YOUR-USERNAME/day2/hello/
├── namespace.yaml    # Creates the 'hello' namespace
├── deployment.yaml   # Defines pods running the web server
└── service.yaml      # Exposes the app on port 80
```

These YAML files define a simple “Hello World” web application. The Kustomize Controller read them from the cached copy of your repository that Flux's Source Controller fetched from GitHub, and **deployed them automatically** to your cluster. This folder has no `kustomization.yaml`, so the Kustomize Controller generates one on the fly that includes every manifest it finds.

### See What Flux Created

Check your cluster for these resources:

```bash
kubectl get pods,svc -n hello
```

You should see something like:

```
NAME                         READY   STATUS    RESTARTS   AGE
pod/hello-65d4c4d5c9-xz7vp   1/1     Running   0          2m

NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/hello   ClusterIP   10.96.x.x    <none>        80/TCP    2m
```

> **Note:** The `AGE` column should show only a few minutes - that’s how you know Flux just created them.

### The Automatic Deployment in Action

Think about this:

* ❌ You didn’t run `kubectl apply`
* ❌ You didn’t manually trigger a deployment
* ✅ Yet your application is running in the cluster

That's because Flux:

1. Pulled your manifests from GitHub
2. Applied them to the cluster
3. Got everything running without any manual steps

### Access Your Running Application

Forward the service port to your local machine:

```bash
kubectl port-forward -n hello svc/hello 8080:80
```

Then open [http://localhost:8080](http://localhost:8080) in your browser. The port-forward runs in the foreground, so leave that terminal open while you browse and press `Ctrl+C` when you're done.

🎉 **There it is!** Your Hello World app, running in Kubernetes, deployed entirely by Flux.

### ✅ Checkpoint Complete

Flux is actively deploying workloads from your workspace folder in your GitHub fork.
Next, let's go beyond simply watching Flux deploy what's already there. We'll make a real change in Git, push it, and watch Flux pick it up and apply it to the cluster automatically.

## ✏️ Make Your First GitOps-Driven Change

So far, Flux has deployed what was already in your workspace folder.
Now, let's prove that **pushing a change to Git** is all it takes to update your cluster.

⏱️ **Time needed:** \~5 minutes

### 1️⃣ Edit Your Deployment

Open:

```
student-work/YOUR-USERNAME/day2/hello/deployment.yaml
```

Find:

```yaml
spec:
  replicas: 1
```

... and change it to:

```yaml
spec:
  replicas: 3
```

This tells Kubernetes to run three pods instead of one.

### 2️⃣ Commit and Push Your Change

> [!TIP]
> You can do this because you forked the repository earlier.
> If you had cloned the original repo, the `git push` command below would fail - and Flux wouldn't see your change.

Run:

```shell
git add student-work/YOUR-USERNAME/day2/hello/deployment.yaml
git commit -m "Scale hello app to 3 replicas"
git push
```

### 3️⃣ Watch Flux Reconcile

Flux checks your Git source every 30 seconds.
Let's watch it notice the new commit and apply the update to your cluster:

```shell
kubectl get deployment hello -n hello -w
```

You'll see the replica count change from 1 → 3, usually within a minute. Press `Ctrl+C` to stop watching.

### ✅ Checkpoint: Git Changes = Live Changes

Flux just:

1. Pulled your updated repo from GitHub (via the Source Controller cache)
2. Saw that the desired state now had `replicas: 3`
3. Applied the change (via the Kustomize Controller) to your cluster automatically

No `kubectl apply`. No manual deploys.
Just Git → Flux → Cluster, exactly as GitOps promises.

Next, we’ll put that self-healing promise to the test: we’ll *break* something in the cluster on purpose and watch Flux notice the drift - and put it back automatically.

## 🔨 Breaking Things (For Science!)

Seeing Flux deploy your app automatically is great - but the **real proof** of a self-healing system is how it responds when things inevitably go wrong.
Let's test that resilience with **two real-world drift simulations**.

### 🧪 Mini-Lab 1: The Emergency Scale

**Scenario:**
A teammate is in the middle of an incident. Under pressure, they bypass Git and run a quick fix in the cluster - scaling the app manually.

**Action:**
Scale the deployment from 3 to 5:

```bash
kubectl scale deployment hello -n hello --replicas=5
```

Now watch what happens live:

```bash
kubectl get deployment hello -n hello -w
```

> [!TIP]
> The `-w` flag means “watch” - you’ll see updates as they happen.

**Expected Result:**
Within about a minute (at the next reconciliation), you’ll see something like:

```
hello   3/3   3     3     15m
hello   5/5   5     5     15m15s   # Manual change applied
hello   3/3   3     3     15m40s   # Flux detects drift and fixes it
```

**Why it happens:**
Flux's Kustomize Controller compared the cluster's live state to your Git-defined desired state (3 replicas).
Seeing a mismatch, it reconciled the deployment back to what's in Git - no questions asked.

**Checkpoint ✅**
Your deployment is back at **3/3 replicas**.
Manual changes didn't stick - **Git's declared state wins**.

Press `Ctrl+C` to stop watching.

### 🧪 Mini-Lab 2: The Catastrophic Delete

**Scenario:**
A worst-case accident - the entire `hello` namespace is deleted.

**Action:**
Run:

```shell
kubectl delete namespace hello
```

Then watch Flux rebuild everything:

```shell
watch kubectl get all -n hello
```

`watch` isn't available by default on Windows or macOS. If you don't have it, rerun `kubectl get all -n hello` every few seconds instead.

**Expected Result:**
Over the next \~60 seconds you’ll see:

* Namespace recreated
* Deployment, service, and pods restored
* App fully back online

**Why it happens:**
The namespace and its resources are still defined in Git.
When the Kustomize Controller notices they're missing, it reapplies the manifests from the Source Controller's latest copy of your repo until the cluster matches Git again.

**Checkpoint ✅**
Namespace **hello** and all resources are running exactly as defined in your repo. Drift is gone.

Press `Ctrl+C` when you see everything running again.

### 📊 Why It Took \~60 Seconds

When you set up Flux, you configured:

* **Source check interval** → every 30 s (updates the cached Git copy)
* **Reconciliation interval** → every 60 s (compares cluster vs. Git and applies fixes)

Drift in the cluster is corrected at the next reconciliation. A manual change can therefore survive for up to one reconciliation interval (1 minute here) before Flux reverts it.

See the timing in the events log:

```bash
flux events --for Kustomization/hello-app
```

Look for a message like:

```
Reconciliation finished in 1.2s, next run in 1m0s
```

### 💡 The Takeaway

With GitOps:

* Accidental deletions → **rebuilt**
* Manual scaling → **reverted**
* Any config drift → **corrected**

**Manual fixes that don't exist in Git cannot persist.**
The cluster is always brought back to the declared state - automatically, continuously, and reliably.

## 🏆 Day 2 Complete: You Built a Self-Healing Cluster

Today, you didn't just learn about GitOps - you proved it works.

### What You Achieved

* **A local Kubernetes cluster** running on kind + Docker
* **Flux installed** and watching your personal GitHub fork
* **Automatic deployments** from Git commits
* **Drift detection and correction** in about a minute
* Confidence that quick fixes and accidental deletes **can't persist**

### Your Journey Today

1. Prepared your workspace with Docker, kind, kubectl, and Git
2. Forked and cloned your own GitOps repo
3. Installed Flux and connected it to your fork
4. Deployed your first app with a Kustomization
5. Made a Git change and watched Flux apply it automatically
6. Broke things on purpose, and watched Flux heal them

### Why It Matters

You now have a working Git → Flux → Cluster pipeline.
Your cluster state is no longer guesswork; it's **enforced reality**.

### Up Next - Day 3: GitOps in the Cloud

Tomorrow, we take your local success to **Azure Kubernetes Service (AKS)**:

* Cloud-specific GitOps patterns
* Managing secrets securely
* Production-ready deployment flows

The core GitOps loop stays the same (Git defines, Flux enforces), but the stage gets bigger.

> [!TIP]
>
> Between now and Day 3, try:
>
> * Changing the `replicas` in your Git repo and watching Flux update the cluster
> * Breaking things in new ways - Flux will keep fixing them

**See you tomorrow for Day 3!**

[Continue to Day 3: GitOps in the Cloud →](https://github.com/ahmedmuhi/GitOps-Days/blob/main/Day-3-GitOps-on-AKS-Self-Healing-Cloud-Scale.md)

> [!NOTE]
> *Proud of what you built? Share your success! #GitOpsDays*

--------------------------------------------------------------------------------
/Day-3-GitOps-on-AKS-Self-Healing-Cloud-Scale.md:
--------------------------------------------------------------------------------
# 🚀 Day 3 – GitOps in the Cloud with AKS: Same Magic, Bigger Stage

In [GitOps Day 2](./Day-2-Building-Your-First-Self-Healing-System.md) you built something remarkable on your laptop:
a Kubernetes system that was truly **resilient and self-healing**. Flux kept every piece in its proper place, automatically repairing whatever chaos you threw at it - scaling changes, deletions, drift. What you saw wasn't just a neat demo; it was the core promise of GitOps delivered: **the cluster always returns to its declared state.**

Now let's ask a bigger question: *what if that same architectural elegance, that precise reliability, translated identically to the cloud - no matter how sprawling or complex the environment?*

That's exactly what we'll prove today.

Welcome back to GitOps Days, where **Git is not just for code anymore**. It's the single source of truth for everything - infrastructure, application configuration, and the glue that holds them together.
Yesterday, Git defined your laptop cluster. Today, Git defines a **production-grade Azure Kubernetes Service (AKS) cluster** in the cloud.

Here's the surprising insight: GitOps doesn't fundamentally care where it runs. Local or cloud, small or enterprise-scale, the rules don't change. Same Git. Same Flux. Same self-healing loop. Only the stage beneath it grows larger.

Today we'll take those principles into the real world:

* Provision an AKS cluster (using Azure's free credits and low-cost resources for this tutorial).
* Bootstrap Flux the production way - one command, end-to-end.
* Deploy the very same Hello app you ran yesterday, now reachable from the internet with a single YAML tweak.
* Break things on purpose, and watch Flux restore order - even the Azure load balancer and public IP.

By the end, you'll see your local success story scale seamlessly into the cloud. The GitOps loop remains unchanged; the only difference is the size of the playground.

## ☁️ Preparing Your Cloud Workspace

Before we build your first AKS cluster, let's pause and get the **essentials** in place. Moving from your laptop to Azure isn't a huge leap - the setup looks almost the same, with one extra wait while Azure spins things up.

The key difference: this tutorial won't be completely free.
Don't worry, though - it's very inexpensive. Running a one-node AKS cluster with a load balancer costs **under $1 if you leave it up for a full workday**. If you shut it down after this tutorial, you'll spend less than the price of a coffee. And if you're new to Azure, Microsoft gives you **$200 in free credits** when you sign up.

### 💰 Cost Snapshot

* **Control plane**: free on the *Free* tier
* **Node (Standard_B2s)**: ~$0.04-$0.05 per hour
* **OS disk + Load Balancer + IP**: ~$0.04 per hour
* **Total**: ~$0.08-$0.09 per hour (≈$0.70 for 8 hours)
* **Cleanup**: We'll delete everything at the end so there are no ongoing costs

### ✅ What You'll Need

1. **Azure account**

   * Sign up at [azure.microsoft.com/free](https://azure.microsoft.com/free) if you don't have one
   * Comes with **$200 in credits**

2. **Azure CLI** (version 2.76.0 or later)

   ```bash
   # Check your version
   az --version

   # Install/update if needed:
   # Windows: winget install Microsoft.AzureCLI
   # macOS: brew install azure-cli
   # Linux (Debian/Ubuntu): curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
   ```

3. **Tools from Day 2**

   * `kubectl` - for cluster verification
   * `flux` CLI - for bootstrapping GitOps
   * `git` - your single source of truth

### 🔑 Logging Into Azure

1. Open your terminal and run:

   ```bash
   az login
   ```

   * A browser window will open for you to authenticate with your Azure account.
   * Once signed in, the CLI retrieves your tenants and subscriptions.

2. If you have access to more than one Azure subscription, you'll see a prompt to **select which subscription** you want to use. The output will look like this:

   ```
   [Tenant and subscription selection]

   No     Subscription name                     Subscription ID                       Tenant
   -----  ------------------------------------  ------------------------------------  -----------------
   [1]    Pay-As-You-Go Dev/Test                xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  Default Directory
   [2]*   Visual Studio Enterprise Subscrip...  xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  Default Directory

   The default is marked with an *.
   Select a subscription and tenant (Type a number or Enter for no changes):
   ```

   Pick the subscription where you're comfortable creating temporary test resources.

3. Double-check that the correct subscription is active:

   ```bash
   az account show --output table
   ```

### 📦 A Quick Word on Kubernetes Types

Yesterday, with kind (Kubernetes in Docker), you ran Kubernetes **locally**: genuine Kubernetes components, but with each "node" running as a Docker container on your laptop. It's perfect for quick experiments.

Today with AKS, Microsoft provides a **managed, conformant Kubernetes cluster** - meaning it passes the CNCF's official conformance tests and behaves like upstream Kubernetes. The manifests you wrote yesterday will run here too, but now on real cloud infrastructure managed by Azure.

✅ That's it. You've set up your workspace, logged into Azure, and know what costs to expect.
👉 In the next section, we'll actually **create your AKS cluster** and see your GitOps loop come alive in the cloud.

## ⚙️ Create Your AKS Cluster

With your cloud workspace ready, it's time for the fun part: creating your first **Azure Kubernetes Service (AKS)** cluster.

This is the same GitOps loop you built yesterday, just running on **real, cloud-hosted infrastructure**. The steps look familiar: we'll create a logical container (called a *resource group*) to hold your cluster, then provision the cluster itself. Once it's up, you'll connect `kubectl` to it so you can talk to it from your laptop.

⏱️ **Time needed:** about 7-10 minutes (most of it waiting for Azure to spin up the nodes).

### Pick a Region

Choose a region that is close to you to minimize latency. For me, that's **Australia East (Sydney)**, but you can use whatever region works best.

Example regions often used for learning:

* **Australia East** (Sydney) - good for New Zealand/Australia
* **East US 2** or **West US 2** - good for the Americas
* **West Europe** - good for Europe

We'll set a shell variable for the location so every command uses the same value:

```bash
LOCATION="australiaeast"
```

### Create a Resource Group

A **resource group** in Azure is just a logical container that keeps related resources together (cluster, disks, load balancer).

```bash
RESOURCE_GROUP="gitops-prod-rg"

az group create \
  --name $RESOURCE_GROUP \
  --location $LOCATION
```

### Create Your AKS Cluster

Now let's create a small AKS cluster - one node is enough for this tutorial.

```bash
az aks create \
  --resource-group $RESOURCE_GROUP \
  --name gitops-prod-aks \
  --location $LOCATION \
  --kubernetes-version 1.33 \
  --node-count 1 \
  --node-vm-size Standard_B2s \
  --tier free \
  --enable-managed-identity \
  --generate-ssh-keys
```

**What this does:**

* Spins up an AKS cluster running **Kubernetes 1.33**. Available versions change over time, so run `az aks get-versions --location $LOCATION --output table` to see what your region offers, and pick a listed version if 1.33 isn't there.
* Uses **1x Standard_B2s VM** for your node (low cost, enough for learning)
* Runs in the **Free tier** (no control-plane cost; only the node, disk, and load balancer/IP are billed)
* Uses a **managed identity** for the cluster instead of a service principal you'd have to manage yourself
* Generates SSH key files locally if you don't already have them, so you can SSH into the node if needed

> [!TIP]
> **Cost reminder:** At this size, running the cluster for 8 hours costs about **$0.70**. We'll delete it at the end to avoid ongoing costs.
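The cost figures above are simple multiplication: hours × (node rate + extras rate). A tiny shell function makes it easy to re-estimate if you keep the cluster up longer. This is a sketch: `estimate_cost` is a hypothetical helper. Its default rates are the approximate numbers from the cost snapshot, not live Azure prices, which vary by region and change over time.

```shell
# estimate_cost: hypothetical helper (not an Azure tool).
# Prints an approximate USD cost for running the lab cluster for $1 hours.
estimate_cost() {
  local hours="$1"
  local node_rate="${2:-0.045}"   # Standard_B2s node, USD/hour (approx.)
  local extras_rate="${3:-0.04}"  # OS disk + load balancer + IP, USD/hour (approx.)
  awk -v h="$hours" -v n="$node_rate" -v e="$extras_rate" \
    'BEGIN { printf "%.2f\n", h * (n + e) }'
}

# Usage:
#   estimate_cost 8                 # one workday at the default rates
#   estimate_cost 24 0.05 0.04      # a full day with your region's node price
```

For current prices in your region, check the Azure pricing calculator and pass them in as the second and third arguments.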

### Connect to Your Cluster

Once creation completes (≈7-10 minutes), download the credentials so `kubectl` can talk to your cluster:

```bash
az aks get-credentials \
  --resource-group $RESOURCE_GROUP \
  --name gitops-prod-aks \
  --overwrite-existing
```

This merges the cluster's credentials into your `~/.kube/config` file and makes AKS your current context. From now on, `kubectl` commands will talk to AKS instead of your local kind cluster. To switch back later, run `kubectl config use-context kind-gitops-loop-demo`.

### Verify Your Cluster

Check that your cluster is ready:

```bash
kubectl get nodes
```

You should see one node with `STATUS` = `Ready`. For example:

```
NAME                                STATUS   ROLES   AGE   VERSION
aks-nodepool1-12345678-vmss000000   Ready    agent   2m    v1.33.2
```

✅ Congratulations - you now have a real, conformant Kubernetes cluster running in Azure!

👉 Before we install Flux, let's first prepare your **student workspace** so you have a safe place in your fork to work in. This ensures your changes won't be overwritten if the upstream examples are updated.

## 📂 Preparing Your Student Workspace

Your AKS cluster is live - now it's time to set up the **Git side** of your GitOps loop. This step makes sure you're working in a safe copy of the repo that belongs to you, so nothing you do later gets lost when the examples are updated.

### 1️⃣ Check and Sync Your Fork

All your changes will live in *your fork* of the repository. Before starting Day 3:

* Open your fork on GitHub (`https://github.com/YOUR-USERNAME/GitOps-Days`)
* If GitHub shows a banner that says **“This branch is behind ...”**, click **Sync fork → Update branch**
* If it says you're already up to date, no action is needed

👉 This ensures your fork contains the latest Day 3 examples before you clone it locally.

### 2️⃣ Clone Your Fork Locally

Always clone **your fork's URL**, not the original repo:

```bash
git clone https://github.com//GitOps-Days.git
cd GitOps-Days
```

This guarantees that `origin` in your local repo points to your fork (the one Flux will use later).

### 3️⃣ Copy Day 3 Examples Into Your Workspace

Never work directly in the shared `examples/` folder - it may change in future updates. Instead, create your personal workspace:

```bash
mkdir -p student-work//day3
cp -r examples/day3/clusters student-work//day3/
```

You now have:

```
student-work//day3/clusters/aks/apps/hello
├── deployment.yaml
├── namespace.yaml
├── service.yaml
└── kustomization.yaml
```

This is **your safe sandbox**. Any changes you make will live here, untouched by upstream syncs.

### 4️⃣ Set Up GitHub Credentials

When you run bootstrap later, Flux will commit its own configuration into your fork, so it needs your GitHub username and a personal access token (PAT).

#### 1. Create a PAT in GitHub

* Go to **Settings → Developer settings → Personal access tokens**
* Choose **Fine-grained token** (recommended)
* Scope it to **your fork**, with at least `contents: read/write`
* Give it a friendly **name** like `gitops-days` so you can recognize it later in your GitHub account

> [!IMPORTANT]
> GitHub shows you the **long random secret string** only once, when the token is created (fine-grained tokens start with `github_pat_`, classic tokens with `ghp_`).
> Copy this secret immediately - this is the actual token you'll provide to Flux later.
> The name you gave the token is just a label for your GitHub dashboard; Flux never sees it.

#### 2. Export Your Credentials

Paste your GitHub username and the **secret token string** into your shell as environment variables:

```bash
# macOS/Linux
export GITHUB_USER=
export GITHUB_TOKEN=

# Windows PowerShell
$env:GITHUB_USER=""
$env:GITHUB_TOKEN=""
```

`flux bootstrap github` picks up the token from `GITHUB_TOKEN` automatically, and the bootstrap command in the next section passes `$GITHUB_USER` as the repo owner, so you won't need to paste either value into a command.

✅ At this point you have:

* A GitHub fork up to date with Day 3
* A safe student workspace folder in your fork
* Credentials exported so Flux can authenticate with GitHub

👉 Next, we'll install Flux into the cluster using the **bootstrap method** and connect it to your Git repository. That's where the GitOps magic comes alive at cloud scale.

## 🤖 Installing Flux on Your Cloud Cluster

**Yesterday you installed Flux piece by piece. Today, you'll do it the production way - with one command that does everything.**

On Day 2 you ran `flux install`, then created a `GitRepository`, then a `Kustomization`. That was great for learning the moving parts.
In production, though, teams want one clean step. That's what `flux bootstrap` gives you - and that's what you'll use now.
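
Before you run bootstrap, it can help to fail fast if something from the workspace prep is missing. Here's a minimal pre-flight sketch. The function name `flux_preflight` is our own (it is not part of the Flux CLI), and it assumes you run it from the root of your cloned fork, with your workspace at `student-work/$GITHUB_USER/day3/clusters/aks`, the same folder the bootstrap command passes to `--path`.

```bash
#!/usr/bin/env bash
# flux_preflight: hypothetical helper (not part of the Flux CLI).
# Succeeds only if both GitHub variables are set and the Hello app folder
# exists in your student workspace. Run it from the root of your fork.
flux_preflight() {
  local status=0
  if [ -z "${GITHUB_USER:-}" ]; then
    echo "✗ GITHUB_USER is not set" >&2
    status=1
  fi
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "✗ GITHUB_TOKEN is not set" >&2
    status=1
  fi
  # Assumed workspace layout: the folder bootstrap's --path will point at
  local workspace="student-work/${GITHUB_USER:-}/day3/clusters/aks"
  if [ ! -d "$workspace/apps/hello" ]; then
    echo "✗ $workspace/apps/hello not found (did you copy the Day 3 examples?)" >&2
    status=1
  fi
  return "$status"
}

# Usage:
#   flux_preflight && echo "✓ ready to bootstrap"
```

It only checks that the values are present, not that the token is valid; GitHub is the final judge of that when bootstrap runs.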

### 1️⃣ Run the Bootstrap Command

With your AKS cluster ready and your student workspace set up, install Flux into the cluster and connect it to your fork:

```bash
flux bootstrap github \
  --owner=$GITHUB_USER \
  --repository=GitOps-Days \
  --branch=main \
  --path=student-work/$GITHUB_USER/day3/clusters/aks \
  --personal
```

* `--owner` → your GitHub username (the one you exported as `$GITHUB_USER`)
* `--repository` → the name of your forked repo (`GitOps-Days`)
* `--branch` → the branch Flux will track (`main` in our case)
* `--path` → the folder inside your repo that Flux will watch (`student-work/your-username/day3/clusters/aks`)
* `--personal` → tells Flux the repository belongs to your personal account, not an organization

This one command does all the heavy lifting: it installs the Flux controllers in your cluster **and** pushes Flux's own configuration into your fork. That's why Flux needs your GitHub username and token: it has to commit to your repo.

⏱️ **How long will this take?** Usually 2-3 minutes. You'll know it's working when you see new commits appear in your fork under `student-work/your-username/day3/clusters/aks/flux-system`.

### 2️⃣ Verify Flux is Healthy

Once the bootstrap finishes, check that everything came up cleanly:

```bash
flux check
flux get sources git
flux get kustomizations
```

You should see:

* Controllers in the `flux-system` namespace are **ready**
* A `GitRepository/flux-system` pointing at your fork
* A `Kustomization/flux-system` applying your root folder

If you see all green checks ✅, congratulations - Flux is now installed in AKS, connected to your fork, and watching your repo.
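
Bootstrap pushed its commits straight to GitHub, so your local clone is now behind your fork. A hedged sketch for checking that the scaffold arrived: pull first, then confirm the three files bootstrap writes into `flux-system/` are present locally. `check_flux_scaffold` is a hypothetical helper name, not a Flux command.

```bash
#!/usr/bin/env bash
# check_flux_scaffold DIR: hypothetical helper (not a Flux command).
# Succeeds only if DIR/flux-system contains the three files that
# `flux bootstrap` commits: gotk-components.yaml, gotk-sync.yaml, kustomization.yaml.
check_flux_scaffold() {
  local dir="$1" f missing=0
  for f in gotk-components.yaml gotk-sync.yaml kustomization.yaml; do
    if [ ! -f "$dir/flux-system/$f" ]; then
      echo "missing: $dir/flux-system/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (from the root of your fork):
#   git pull origin main
#   check_flux_scaffold "student-work/$GITHUB_USER/day3/clusters/aks" && echo "✓ bootstrap files are in your clone"
```

Pulling now also pays off later: a `git push` from a clone that is missing the bootstrap commits would be rejected as non-fast-forward.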

👉 Next, we'll slow down and unpack *what actually happened* during bootstrap - the files Flux created, the `flux-system` namespace, and how Flux now manages itself.

## 🧩 Unpacking Flux's Bootstrap

You just ran `flux bootstrap` and confirmed Flux is alive in your AKS cluster. From the outside it looked like one simple step - but behind the scenes, Flux quietly set up quite a bit of scaffolding for you.
Let's slow down and see what actually happened.

### How your repo changed

**Before bootstrap**, your Day 3 workspace was simple - it only contained your Hello app:

```
student-work//day3/clusters/aks/
└── apps/
    └── hello/
        ├── deployment.yaml
        ├── namespace.yaml
        ├── service.yaml
        └── kustomization.yaml
```

**After bootstrap**, your fork contains new files that Flux committed so it could manage itself:

```
student-work//day3/clusters/aks/
├── kustomization.yaml          # Root recipe (entry point for Flux)
├── flux-system/                # Flux's own configuration
│   ├── gotk-components.yaml
│   ├── gotk-sync.yaml
│   └── kustomization.yaml      # Flux system recipe
└── apps/
    └── hello/
        ├── deployment.yaml
        ├── namespace.yaml
        ├── service.yaml
        └── kustomization.yaml  # Hello app recipe
```

#### What This Means

* The **new `flux-system` folder** is Flux adding its own configuration to Git.
* The files inside tell Flux *what to install* and *where to look*.
* You don't need to edit these files right now - later we'll make one small, safe change here to speed up reconciliation.

For now, just remember: **Flux manages your cluster by following the recipes (`kustomization.yaml` files) it finds in your repo.**

👉 But there's one detail that trips most people up at first: *why are there so many `kustomization.yaml` files now, and what's the difference between them?*
We'll clear that up in the next section before you make your first change.

## ❓ Why Does Flux Have So Many `kustomization.yaml` Files?

If you looked closely at your repo after bootstrap, you may have noticed something surprising: suddenly there are **three different `kustomization.yaml` files**.

They look almost identical at first glance, but each one plays a **different role**. Understanding this now will make the next step - deploying your Hello app - much clearer.

### 1️⃣ Root recipe (top level)

* **Where:** `student-work//day3/clusters/aks/kustomization.yaml`
* **Role:** The **entry point** for Flux in this repo.
* Think of it as the master to-do list: “apply this folder, then that folder.”
* Right now it only asks Flux to apply `./flux-system`. Soon, you'll add `./apps/hello`.

### 2️⃣ Flux system recipe

* **Where:** `student-work//day3/clusters/aks/flux-system/kustomization.yaml`
* **Role:** Applies Flux's own building blocks:
  * `gotk-components.yaml` → installs the Flux controllers and CRDs.
  * `gotk-sync.yaml` → wires Flux to your repo (`GitRepository` + `Kustomization` objects).
* This is how Flux keeps itself running.

### 3️⃣ Hello app recipe

* **Where:** `student-work//day3/clusters/aks/apps/hello/kustomization.yaml`
* **Role:** Bundles your app's resources together. It includes:
  * `namespace.yaml` (the `hello` namespace)
  * `deployment.yaml` (the pods)
  * `service.yaml` (the LoadBalancer on AKS)
* When Flux sees this file, it applies all three resources as one unit.
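
To see the layering for yourself, you can list every recipe under your cluster folder. A small sketch, assuming you've already pulled the bootstrap commits and run it from the root of your fork; `list_recipes` is just an illustrative name.

```bash
#!/usr/bin/env bash
# list_recipes DIR: illustrative helper.
# Prints every kustomization.yaml under DIR (paths relative to DIR), sorted,
# so each line corresponds to one layer of the recipe structure.
list_recipes() {
  (cd "$1" && find . -name kustomization.yaml | LC_ALL=C sort)
}

# Usage:
#   list_recipes "student-work/$GITHUB_USER/day3/clusters/aks"
```

For the post-bootstrap layout shown earlier, that prints three paths: `./apps/hello/kustomization.yaml`, `./flux-system/kustomization.yaml`, and `./kustomization.yaml`, i.e. the Hello app, Flux system, and root recipes.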

### ✨ How they connect

* The **root recipe** tells Flux to apply what's inside the `./flux-system` folder.
* The **flux-system recipe** tells Flux to apply its own configuration YAML files so it can run.
* Soon, you'll add `./apps/hello` to the **root recipe**. When you do, Flux will open the Hello app folder, find the **Hello app recipe**, and apply your app's resources as the recipe describes.

So the structure is **layered, not duplicated**:

* Root → flux-system → Flux controllers
* Root → hello → your app resources

👉 With that clear, you're ready for the fun part: editing the root recipe to add your Hello app and watching Flux deploy it automatically. That's next.

## ☁️ Your First GitOps Cloud Deployment

**Time to deploy to the cloud using pure GitOps. No `kubectl apply`. Just Git.**

Remember the Hello app from Day 2 - the one that healed itself when Flux noticed drift? You're about to deploy the exact same app to AKS. The only difference is that now it will be reachable on the internet through a real Azure Load Balancer.

### 1️⃣ Verify Your App Folder

You should already have the Hello app folder in your repo (you copied it during the workspace prep):

```
student-work//day3/clusters/aks/apps/hello/
├── deployment.yaml
├── namespace.yaml
├── service.yaml
└── kustomization.yaml
```

This folder contains everything Flux needs to deploy the app, including the `kustomization.yaml` file that lists the three resources above so they are applied together.

If you don't see this folder, go back to the workspace prep step and copy it again from the `examples/` folder (or recreate it from your GitHub fork).

### 2️⃣ Edit the Root Recipe

Now we'll tell Flux to include your Hello app in its reconciliation loop.
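
The edit below is a one-line change, and an editor is all you need. If you'd like to script it so that re-running it is harmless, here's a minimal sketch, assuming you've already pulled the bootstrap commits (`git pull origin main`). `add_resource` is a hypothetical helper, and it assumes `resources:` is the last key in the file and that list items use two-space indentation, as in the root recipe shown here; it is not a general YAML editor.

```bash
#!/usr/bin/env bash
# add_resource FILE ENTRY: hypothetical helper.
# Appends "  - ENTRY" to FILE unless ENTRY is already listed.
# Assumes `resources:` is the LAST key in FILE and uses two-space list
# indentation, so appending at the end extends that list.
add_resource() {
  local file="$1" entry="$2"
  # awk splits on whitespace: a line like "  - ./apps/hello  # comment"
  # gives $1 == "-" and $2 == "./apps/hello"
  if awk -v e="$entry" '$1 == "-" && $2 == e { found = 1 } END { exit !found }' "$file"; then
    echo "already listed: $entry"
    return 0
  fi
  # Make sure the file ends with a newline before appending
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '  - %s\n' "$entry" >> "$file"
}

# Usage:
#   add_resource "student-work/$GITHUB_USER/day3/clusters/aks/kustomization.yaml" ./apps/hello
```

Checking before appending keeps the recipe free of duplicate entries if you run the script twice.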

First, run `git pull origin main` so your local clone has the files Flux committed during bootstrap (otherwise the root recipe won't be there locally, and your push will be rejected). Then open the root `kustomization.yaml`:

```
student-work//day3/clusters/aks/kustomization.yaml
```

Add the Hello app folder under `resources:`:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ./flux-system
  - ./apps/hello   # ← add this line
```

This single line “wires” your Hello app into the GitOps loop.

### 3️⃣ Commit and Push

Save your change, then commit and push it to your fork:

```bash
git add .
git commit -m "Add Hello app to root recipe"
git push origin main
```

Once the commit reaches GitHub, Flux will notice it and apply the Hello app to your cluster automatically.

### 4️⃣ Watch Flux Reconcile

Within about a minute, Flux will detect the commit, fetch it, and reconcile.
You can watch it happen live:

```bash
flux logs --follow --tail 20
```

You'll see messages about the `GitRepository` fetching the new revision and the root `Kustomization` applying it.

Press `Ctrl+C` once you see the reconciliation complete.

### 5️⃣ Verify Hello App Resources in AKS

Check that the Hello app is now up and running in AKS:

```bash
kubectl get pods -n hello
kubectl get svc -n hello
```

For the Service, you should see something like:

```
NAME    TYPE           CLUSTER-IP    EXTERNAL-IP      PORT(S)   AGE
hello   LoadBalancer   10.0.123.45   20.248.xxx.xxx   80/TCP    2m
```

* The `Pod` is running inside the `hello` namespace.
* The `Service` is of type `LoadBalancer`, so AKS provisions an Azure Load Balancer and assigns it a public IP address.

> [!TIP]
> ⏳ If `EXTERNAL-IP` shows ``, wait 30-60 seconds and try again; Azure is still setting up the load balancer.
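
Rather than re-running `kubectl get svc` by hand while the IP is pending, you can poll for it. A minimal sketch: `wait_for_output` is a hypothetical helper that re-runs any command until it prints something, and the attempt count and delay in the usage line are only suggestions.

```bash
#!/usr/bin/env bash
# wait_for_output ATTEMPTS DELAY CMD [ARGS...]: hypothetical helper.
# Re-runs CMD up to ATTEMPTS times, sleeping DELAY seconds between tries,
# until it prints non-empty output. Prints that output and returns 0,
# or returns 1 if every attempt came back empty (errors count as empty).
wait_for_output() {
  local attempts="$1" delay="$2" i=1 out
  shift 2
  while [ "$i" -le "$attempts" ]; do
    out=$("$@" 2>/dev/null)
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage: poll every 10 s for roughly 3 minutes, then print the URL
#   ip=$(wait_for_output 18 10 kubectl get svc hello -n hello \
#          -o jsonpath='{.status.loadBalancer.ingress[0].ip}') && echo "http://$ip"
```

Because the helper takes the command as arguments, the same loop works for any other "wait until it shows up" check in this guide.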

### 6️⃣ Access Your App in the Browser

Once the external IP is assigned, get the URL:

```bash
echo "http://$(kubectl get svc hello -n hello -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
```

Copy the printed URL into your browser.

🎉 **Boom!** Your Hello World app is live on the internet, deployed entirely through GitOps.

### 7️⃣ What Just Happened

Let's recap:

1. You edited Git - not the cluster.
2. Flux noticed the change and reconciled the root recipe.
3. Flux read the Hello app recipe and applied the app resources into the `hello` namespace.
4. AKS provisioned a real Azure Load Balancer with a public IP.
5. Your app became accessible from the internet.

No manual `kubectl apply`. Just Git → Flux → Cloud.

**Same GitOps loop as Day 2 - just running at cloud scale!**

👉 Next, we'll **update the Hello app using GitOps** by changing the replica count in Git and watching Flux reconcile the cluster to match.

## 🔄 Updating Your App with GitOps

You've just deployed your Hello app to AKS using Flux. That's powerful on its own - but the real beauty of GitOps is that **every change flows the same way**: edit Git → commit → push → Flux reconciles. Let's try that now.

### 1️⃣ Make a Change in Git

Right now your app is running with a single replica:

```yaml
# student-work//day3/clusters/aks/apps/hello/deployment.yaml
spec:
  replicas: 1
```

Change `replicas: 1` to `replicas: 3` and save the file.

### 2️⃣ Commit and Push

```bash
git add .
git commit -m "Scale Hello app to 3 replicas"
git push origin main
```

### 3️⃣ Watch Flux Reconcile

Flux checks for new commits about every minute. Once it sees your change, it will reconcile the cluster to match Git.
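
If you want a quick yes/no alongside the live watch, you can compare what Git declares with what the cluster reports. A hedged sketch: `declared_replicas` is a hypothetical helper that reads the first `replicas:` line with `awk`, so it assumes a simple single-Deployment file like the Hello app's `deployment.yaml` and is not a real YAML parser.

```bash
#!/usr/bin/env bash
# declared_replicas FILE: hypothetical helper.
# Prints the value of the first `replicas:` line in FILE.
# Assumes one Deployment per file with `replicas: N` on its own line.
declared_replicas() {
  awk '$1 == "replicas:" { print $2; exit }' "$1"
}

# Usage: the two numbers should match once Flux has applied your commit
#   declared_replicas "student-work/$GITHUB_USER/day3/clusters/aks/apps/hello/deployment.yaml"
#   kubectl get deployment hello -n hello -o jsonpath='{.spec.replicas}'
```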

You can watch the rollout live:

```bash
kubectl get deployment hello -n hello -w
```

You'll see the replica count go from **1 → 3**:

```
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
hello   1/1     1            1           5m
hello   2/3     2            2           5m
hello   3/3     3            3           5m
```

Press `Ctrl+C` to stop watching once all 3 replicas are up.

### 4️⃣ What Just Happened

You changed a single line in Git (`replicas: 1 → 3`):

* Flux noticed the new commit and pulled it.
* Flux reconciled your cluster so the Deployment matched what Git declared.
* Kubernetes spun up two more pods until the cluster had 3 replicas running.

✅ That's GitOps in action: deployments, updates, and changes all flow through the same loop.

👉 Next, we'll push this one step further: what happens if we “break” things manually in the cluster?
You'll see Flux heal them back into shape automatically.

## 💥 Cloud-Scale Self-Healing

**Now that you've deployed and updated your Hello app the GitOps way, let's see what happens when the cluster drifts without a Git commit.**

This is where GitOps really shines - Flux continuously reconciles the cluster against what's in Git, repairing anything that drifts.

> [!NOTE]
> To get fast feedback in this demo, we'll first shorten Flux's reconcile interval.

### 1️⃣ Speed Up Self-Healing

By default, the `flux-system` Kustomization reconciles every **10 minutes** (the `GitRepository` already checks for new commits every minute). For this demo, we'll set the Kustomization interval to **1 minute** so you don't have to wait long.
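
`gotk-sync.yaml` holds two objects separated by `---`, a `GitRepository` and a `Kustomization`, and both have an `interval:` field, so a blind find-and-replace could change the wrong one. If you'd rather script the edit, here's a minimal `awk` sketch. `set_kustomization_interval` is a hypothetical helper, and it assumes the layout bootstrap generates: top-level `kind:` lines at column 0 and `interval:` indented under `spec:`.

```bash
#!/usr/bin/env bash
# set_kustomization_interval FILE NEW_INTERVAL: hypothetical helper.
# Rewrites the `interval:` line only inside the YAML document whose
# top-level kind is Kustomization; the GitRepository's interval is untouched.
set_kustomization_interval() {
  local file="$1" new="$2" tmp
  tmp=$(mktemp) || return 1
  if ! awk -v new="$new" '
    /^---/             { kind = "" }       # a new YAML document starts
    /^kind:[ \t]/      { kind = $2 }       # top-level kind (nested kinds are indented)
    kind == "Kustomization" && /^[ \t]+interval:/ {
      match($0, /^[ \t]+/)                 # keep the original indentation
      print substr($0, 1, RLENGTH) "interval: " new
      next
    }
    { print }
  ' "$file" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  cat "$tmp" > "$file"   # overwrite in place, keeping the file's permissions
  rm -f "$tmp"
}

# Usage:
#   set_kustomization_interval "student-work/$GITHUB_USER/day3/clusters/aks/flux-system/gotk-sync.yaml" 1m0s
#   git diff   # review the one-line change before committing
```

Editing by `kind` rather than with `sed -i` also sidesteps the flag differences between GNU `sed` (Linux) and BSD `sed` (macOS).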

Open the file:

```
student-work//day3/clusters/aks/flux-system/gotk-sync.yaml
```

Find the `Kustomization` spec and change its interval:

```yaml
spec:
  interval: 1m0s   # ← change from 10m0s
```

Commit and push:

```bash
git add student-work//day3/clusters/aks/flux-system/gotk-sync.yaml
git commit -m "Speed up Flux reconciliation to 1 minute"
git push origin main
```

Flux now reconciles once a minute.

### 2️⃣ Test 1: Manual Scale Up

Imagine a teammate scaled the deployment directly in the cluster to cope with demand, but forgot to commit the change to Git. Flux will notice and correct it.

```bash
# Scale to 5 replicas manually (bypassing GitOps)
kubectl scale deployment hello -n hello --replicas=5

# Watch the deployment in real time
kubectl get deployment hello -n hello -w
```

You should see something like this:

```
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
hello   3/3     3            3           12m
hello   5/5     5            5           12m30s   # Manual change applied
hello   3/3     3            3           13m      # Flux detected the drift and reconciled it back to the Git state
```

Press `Ctrl+C` to stop watching.
✅ Git's declared state (3 replicas) wins. Manual changes don't stick.

### 3️⃣ Test 2: The Nuclear Option

Let's go further: delete the entire namespace. This wipes out the app, the Service, and the load balancer and public IP that were created for it.

```bash
kubectl delete namespace hello
```

Now watch Flux rebuild it:

```bash
# Watch until the namespace reappears
kubectl get ns -w
```

Within **1-3 minutes** you'll see:

* Namespace recreated
* Deployment and pods spun up
* Service recreated
* Azure provisioning a new load balancer and public IP

Press `Ctrl+C` once everything is back.

### 4️⃣ Verify Your App is Back

Get the new public IP:

```bash
kubectl get svc hello -n hello
```

Print the new URL:

```bash
echo "http://$(kubectl get svc hello -n hello -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
```

Copy the URL and open it in your browser. 🎉 Your app is back online, now with a fresh public IP.

> [!NOTE]
> Because the Service was re-created, Azure allocated a new public IP for it, so the external IP may change each time this happens.

### 5️⃣ Check Out Flux's Records

See what Flux recorded during this healing:

```bash
flux events -n flux-system --for Kustomization/flux-system
```

You'll see entries like:

```
Reconciliation finished in 1.3s, next run in 1m0s
Applied 3 resources
Namespace hello created
Deployment hello created
Service hello created
```

### 6️⃣ What We Learned

* **Drift is temporary** - Flux brings the cluster back to what is declared in Git
* **Manual changes don't persist** - Git is always the source of truth
* **Deletions are repaired** - automation prevails

What you saw on your laptop on Day 2 now holds true at cloud scale. Flux healed not just pods: by re-creating the Service, it also brought back the Azure load balancer and public IP behind it.

🎉 **Congratulations!** You've now seen GitOps enforce and heal workloads in the cloud, end to end.

👉 Next, let's clean up your resources and wrap up Day 3.

## 🧹 Cleanup & Next Steps

**Before you log off, let's make sure you delete everything you created today.**
This step is important because cloud resources (like load balancers and node VMs) keep incurring costs for as long as they run.

### Delete the Resource Group

All your AKS resources were created through a single Azure resource group (`gitops-prod-rg`). Deleting the group removes **everything inside it** in one go. AKS keeps the nodes, load balancers, and public IPs in a separate node resource group that it manages for you, and Azure deletes that group along with the cluster.

```bash
# Delete the entire resource group (and all resources it contains)
az group delete --name gitops-prod-rg --yes --no-wait
```

This will remove:

* Your AKS cluster
* All worker nodes
* Any load balancers
* Public IP addresses
* Disks, NICs, and other linked resources

The `--no-wait` flag tells Azure to start deletion in the background so you don't have to sit around watching it.

### Clean Up Local Kubeconfig (Optional)

Even after the Azure resources are gone, your kubeconfig still has a context pointing to the deleted cluster.
You can safely remove it:

```bash
kubectl config delete-context gitops-prod-aks || true
kubectl config delete-cluster gitops-prod-aks || true
```

This prevents confusion later if you create a new cluster with the same name.

### Double-Check Azure Resources are Deleted

If you want to confirm the resource group is gone:

```bash
az group list --output table
```

You shouldn't see `gitops-prod-rg` anymore. Because of `--no-wait`, deletion can take several minutes to finish, so check again later if it's still listed.

✅ That's it! Your cloud environment is fully cleaned up and won't incur any ongoing costs.

👉 Next, we'll **wrap up** by reflecting on what you achieved on Day 3, followed by a **Day 4 preview** of production GitOps patterns.

## 🎯 Day 3 Wrap-Up

**Take a step back: you just ran GitOps at cloud scale!**

### What You Did Today

* ✅ **Provisioned AKS** with a single `az` command
* ✅ **Bootstrapped Flux** into the cluster with one command
* ✅ **Understood** how Flux wires itself into Git (root + flux-system + app recipes)
* ✅ **Deployed** your Hello app to the internet using GitOps
* ✅ **Updated** your app by changing replicas in Git (no `kubectl apply`)
* ✅ **Stress-tested drift** by scaling manually and deleting the namespace, and watched Flux heal everything
* ✅ **Cleaned up** your resources safely

That's a huge amount of ground to cover in one day.

### Why It Matters

You proved that:

* GitOps is **infrastructure agnostic**: same workflow, different clusters
* Cloud infrastructure is no different from local: Flux reconciles both
* Self-healing is real, not just a demo trick: even the Azure load balancer came back automatically
* Git is now your **single source of truth** for both apps and cluster config

This is the promise of GitOps: consistent, reliable, drift-free deployments, no matter the environment.

👉 Next, in **Day 4**, we'll take the leap from “it works” to **production-ready patterns**: multiple environments, image automation, and secrets. That's when you'll see how real-world teams apply these principles at scale.

## 🚀 Coming Up Next (Day 4)

Tomorrow, you'll see how to take your GitOps workflow from **capable** to **production-ready**:

* **Multi-environment GitOps**
  Manage dev, staging, and production clusters cleanly from Git

* **Image automation with GitHub Actions**
  Build and push new app images automatically, with GitOps handling deployment

* **Azure Container Registry (ACR)**
  Store your container images securely and integrate them into your GitOps loop

* **Secret management**
  Handle sensitive data (like API keys) safely in a GitOps workflow

🎉 Congratulations again on completing Day 3! Take a well-deserved break; you've earned it.

See you on **Day 4**, where we'll add the patterns that make GitOps ready for real-world production.

--------------------------------------------------------------------------------