├── assets
│   └── images
│       ├── GitOps-Loop.png
│       ├── flux-architecture.png
│       ├── Pull-vs-Push-Model.png
│       ├── What-Exactly-Is-GitOps.png
│       ├── CI-CD-vs-GitOps-Comparison.png
│       └── The-Four-GitOps-Principles.png
├── examples
│   ├── day3
│   │   └── clusters
│   │       └── aks
│   │           └── apps
│   │               └── hello
│   │                   ├── namespace.yaml
│   │                   ├── kustomization.yaml
│   │                   ├── service.yaml
│   │                   └── deployment.yaml
│   └── day2
│       └── clusters
│           └── local
│               └── apps
│                   └── hello
│                       ├── namespace.yaml
│                       ├── service.yaml
│                       └── deployment.yaml
├── README.md
├── Day-1-What-really-is-GitOps.md
├── Day-2-Building-Your-First-Self-Healing-System.md
└── Day-3-GitOps-on-AKS-Self-Healing-Cloud-Scale.md


/assets/images/GitOps-Loop.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedmuhi/GitOps-Days/HEAD/assets/images/GitOps-Loop.png


--------------------------------------------------------------------------------
/examples/day3/clusters/aks/apps/hello/namespace.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: v1
2 | kind: Namespace
3 | metadata:
4 |   name: hello


--------------------------------------------------------------------------------
/examples/day2/clusters/local/apps/hello/namespace.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: v1
2 | kind: Namespace
3 | metadata:
4 |   name: hello


--------------------------------------------------------------------------------
/assets/images/flux-architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedmuhi/GitOps-Days/HEAD/assets/images/flux-architecture.png


--------------------------------------------------------------------------------
/assets/images/Pull-vs-Push-Model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedmuhi/GitOps-Days/HEAD/assets/images/Pull-vs-Push-Model.png


--------------------------------------------------------------------------------
/assets/images/What-Exactly-Is-GitOps.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedmuhi/GitOps-Days/HEAD/assets/images/What-Exactly-Is-GitOps.png


--------------------------------------------------------------------------------
/assets/images/CI-CD-vs-GitOps-Comparison.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedmuhi/GitOps-Days/HEAD/assets/images/CI-CD-vs-GitOps-Comparison.png


--------------------------------------------------------------------------------
/assets/images/The-Four-GitOps-Principles.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedmuhi/GitOps-Days/HEAD/assets/images/The-Four-GitOps-Principles.png


--------------------------------------------------------------------------------
/examples/day3/clusters/aks/apps/hello/kustomization.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: kustomize.config.k8s.io/v1beta1
2 | kind: Kustomization
3 | resources:
4 |   - namespace.yaml
5 |   - deployment.yaml
6 |   - service.yaml
7 | 


--------------------------------------------------------------------------------
/examples/day2/clusters/local/apps/hello/service.yaml:
--------------------------------------------------------------------------------
 1 | apiVersion: v1
 2 | kind: Service
 3 | metadata:
 4 |   name: hello
 5 |   namespace: hello
 6 | spec:
 7 |   selector:
 8 |     app: hello
 9 |   ports:
10 |     - port: 80
11 |       targetPort: 80
12 | 


--------------------------------------------------------------------------------
/examples/day3/clusters/aks/apps/hello/service.yaml:
--------------------------------------------------------------------------------
 1 | apiVersion: v1
 2 | kind: Service
 3 | metadata:
 4 |   name: hello
 5 |   namespace: hello
 6 | spec:
 7 |   type: LoadBalancer        # ← this is the only required change on AKS
 8 |   selector:
 9 |     app: hello
10 |   ports:
11 |     - port: 80
12 |       targetPort: 80
13 | 


--------------------------------------------------------------------------------
/examples/day3/clusters/aks/apps/hello/deployment.yaml:
--------------------------------------------------------------------------------
 1 | apiVersion: apps/v1
 2 | kind: Deployment
 3 | metadata:
 4 |   name: hello
 5 |   namespace: hello
 6 | spec:
 7 |   replicas: 1
 8 |   selector:
 9 |     matchLabels:
10 |       app: hello
11 |   template:
12 |     metadata:
13 |       labels:
14 |         app: hello
15 |     spec:
16 |       containers:
17 |         - name: hello
18 |           image: nginxdemos/hello:plain-text
19 |           ports:
20 |             - containerPort: 80
21 | 


--------------------------------------------------------------------------------
/examples/day2/clusters/local/apps/hello/deployment.yaml:
--------------------------------------------------------------------------------
 1 | apiVersion: apps/v1
 2 | kind: Deployment
 3 | metadata:
 4 |   name: hello
 5 |   namespace: hello
 6 | spec:
 7 |   replicas: 1
 8 |   selector:
 9 |     matchLabels:
10 |       app: hello
11 |   template:
12 |     metadata:
13 |       labels:
14 |         app: hello
15 |     spec:
16 |       containers:
17 |         - name: hello
18 |           image: nginxdemos/hello:plain-text
19 |           ports:
20 |             - containerPort: 80
21 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | 
  2 | # GitOps-Days 🔄🚀
  3 | 
  4 | Commit, reconcile, ship.
  5 | *Learn the Git-first, pull-based way to run Kubernetes - and let your clusters look after themselves.*
  6 | 
  7 | ## Welcome
  8 | 
  9 | If you’ve ever fixed something in production at the last minute… and then spent the next week wondering what else had changed, you’re not alone.
 10 | GitOps gives you a different way: changes go into Git, the cluster follows, and if things drift, they’re put back automatically. When trouble shows up, you can roll back cleanly to a known good state - no guesswork.
 11 | 
 12 | That’s the predictability you’ll build in this series.
 13 | Let’s start by choosing how you want to begin.
 14 | 
 15 | ## 🗺️ Start here
 16 | 
 17 | There’s more than one way to get into GitOps - pick the path that fits how you like to learn.
 18 | 
 19 | * **Want the big picture first?** Start with **[Day 1: What really is GitOps?](./Day-1-What-really-is-GitOps.md)** and build a clear mental model.
 20 | * **Learn best by doing?** Jump into **[Day 2: Building Your First Self-Healing System](./Day-2-Building-Your-First-Self-Healing-System.md)** and see it in action.
 21 | * **Ready for cloud from the start?** Head to **[Day 3: Production GitOps on AKS with GitHub Actions](./Day-3-GitOps-on-AKS-Self-Healing-Cloud-Scale.md)** and go straight to a production-grade setup.
 22 | 
 23 | > **You’ll need**: Git, Docker, and `kubectl` for Days 1-2. Day 3 also needs an Azure account and the Azure CLI.
 24 | 
 25 | ### Quick cheat sheet before you dive in
 26 | 
 27 | * **Desired state**: what Git says your cluster should look like.
 28 | * **Drift**: when your cluster no longer matches Git.
 29 | * **Controller**: software in the cluster (like [Flux](https://fluxcd.io/) or [Argo CD](https://argo-cd.readthedocs.io/)) that keeps it matching Git.
 30 | * **Self-healing**: the controller restores the cluster to match Git automatically.
 31 | 
 32 | ## 🎯 What you will achieve
 33 | 
 34 | By the end of this series, you will be able to:
 35 | 
 36 | * Keep clusters aligned with Git and recover fast when things change.
 37 | * Build and run a self-healing loop locally.
 38 | * Take the same loop to Azure Kubernetes Service with production patterns.
 39 | * Extend GitOps with CI pipelines, multi-environment setups, and secrets management. *(Days 4-5 planned)*
 40 | 
 41 | ## 🛠️ What you need
 42 | 
 43 | For local labs (Days 1-2):
 44 | 
 45 | * Docker Desktop or Docker Engine, plus [kind](https://kind.sigs.k8s.io/) to run the local cluster
 46 | * `kubectl`
 47 | * Git and a GitHub account
 48 | * [Flux CLI](https://fluxcd.io/)
 49 | 
 50 | For cloud labs (Day 3):
 51 | 
 52 | * Azure CLI
 53 | * An Azure subscription
 54 | 
 55 | > **Tested with**: current stable releases of Docker, `kubectl`, kind, Flux, and Azure CLI. If you hit a version snag, open an issue and we’ll help.
 56 | 
 57 | ## 🗺️ Your learning path
 58 | 
 59 | **Status:** Days 1-3 ready now. Days 4-5 planned.
 60 | 
 61 | | Day                                                                  | Focus                                                | What you'll build                    |
 62 | | -------------------------------------------------------------------- | ---------------------------------------------------- | ------------------------------------ |
 63 | | [**Day 1**](./Day-1-What-really-is-GitOps.md)                        | **Understand**: what GitOps is and why it matters    | A clear mental model                 |
 64 | | [**Day 2**](./Day-2-Building-Your-First-Self-Healing-System.md)      | **Build**: your first self-healing system with Flux  | Local GitOps loop that auto-corrects |
 65 | | [**Day 3**](./Day-3-GitOps-on-AKS-Self-Healing-Cloud-Scale.md) | **Scale**: take the same loop to AKS                 | Production-ready cloud deployment    |
 66 | | **Day 4**                                                            | **Operate**: real-world patterns and troubleshooting | Robust GitOps workflows              |
 67 | | **Day 5**                                                            | **Advance**: tool choices and next steps             | Your GitOps roadmap                  |
 68 | 
 69 | ## 🔄 Stay up to date (sync your fork)
 70 | 
 71 | **Using GitHub (no local clone):**
 72 | 
 73 | 1. Open *your fork* on GitHub.
 74 | 2. Click **Sync fork** (or **Fetch upstream**) → **Update branch**.
 75 | 3. Your fork is now in sync with the latest changes.
 76 | 
 77 | <details><summary>Using the CLI (optional)</summary>
 78 | 
 79 | ```bash
 80 | # inside your local clone
 81 | git remote add upstream https://github.com/ahmedmuhi/GitOps-Days.git
 82 | git fetch upstream
 83 | git checkout main
 84 | git merge upstream/main    # or: git rebase upstream/main
 85 | git push origin main
 86 | ```
 87 | 
 88 | </details>
 89 | 
 90 | ## 🗂️ Repo map
 91 | 
 92 | Everything you need is linked in the learning path above.
 93 | If you’re browsing the source, the `/examples/` folder contains the manifests for Day 2 and Day 3 labs.
 94 | 
 95 | ## ❓ Quick FAQ
 96 | 
 97 | **Do I need to re-fork to get updates?**
 98 | No - use the sync steps above.
 99 | 
100 | **I broke my local lab - now what?**
101 | Reconciliation will usually fix it. If not, recreate the kind cluster and reapply Flux.
102 | 
103 | **I have a question that’s not listed here.**
104 | [Open an issue](https://github.com/ahmedmuhi/GitOps-Days/issues) - we’ll add it to the FAQ.
105 | 
106 | ## 💬 Join the conversation
107 | 
108 | GitOps-Days is evolving, and your feedback matters.
109 | 
110 | * Found a bug or typo? [Open an issue](https://github.com/ahmedmuhi/GitOps-Days/issues)
111 | * Have ideas or improvements? Send a pull request
112 | * Sharing your journey? Post with `#GitOpsDays`
113 | 
114 | ## 🧭 Roadmap
115 | 
116 | * **Day 4** - CI pipelines that feed the loop, image promotion, rollout gates
117 | * **Day 5** - Multi-environment layouts and secrets patterns
118 | * Future topics will follow learner needs
119 | 
120 | ## 📚 Additional resources
121 | 
122 | * [Flux documentation](https://fluxcd.io/flux/)
123 | * [OpenGitOps / CNCF GitOps Working Group](https://opengitops.dev/)
124 | * [Kubernetes tutorials](https://kubernetes.io/docs/tutorials/)
125 | 
126 | 🚀 **Ready to start?** Jump into **[Day 1: What really is GitOps?](./Day-1-What-really-is-GitOps.md)** and begin your GitOps journey.
127 | 


--------------------------------------------------------------------------------
/Day-1-What-really-is-GitOps.md:
--------------------------------------------------------------------------------
  1 | # 🌟 Day 1 – What really is GitOps?
  2 | 
  3 | If you’ve worked with Kubernetes for more than a few weeks, you’ve seen it:
  4 | what’s running in the cluster drifts away from what you thought was deployed.
  5 | A manual fix here, an emergency tweak there… and before long, the state in your cluster and the state in your Git repo tell two different stories.
  6 | 
  7 | That gap is where incidents start and trust in your deployments erodes.
  8 | 
  9 | Today we’re going to close that gap - permanently.
 10 | Not with more scripts or a bigger CI pipeline, but by changing **where** your source of truth lives and **how** your cluster keeps itself aligned with it.
 11 | 
 12 | By the end of this session, you’ll be able to:
 13 | 
 14 | * State GitOps in one clear sentence.
 15 | * Trace the Git → controller → cluster loop.
 16 | * Explain why pull beats push for Kubernetes - and why it changes the game for drift, security, and recovery.
 17 | 
 18 | Let’s start with the most important question: **what exactly is GitOps?**
 19 | 
 20 | ## What exactly is GitOps?
 21 | 
 22 | So what *is* GitOps, really? Let’s start with how most teams try to fight drift - and why those fixes don’t quite close the gap.
 23 | 
 24 | You’ve probably seen this: configs stored in Git but applied manually, deployment scripts that push updates, or CI/CD pipelines that run after every commit. They help, sure, but they still leave a gap between what’s in your repo and what’s running in your cluster. Manual changes or external systems can sneak in, and that’s where drift and configuration rot take hold.
 25 | 
 26 | This isn’t just your team’s problem. Back in 2017, engineers at Weaveworks popularised a different way: treat Git as the single source of truth, and let the cluster enforce it for itself. The idea caught on quickly. Tools like Flux and Argo CD put it into practice, and the CNCF’s [OpenGitOps project](https://opengitops.dev) formalised the core principles so teams everywhere could work from the same playbook.
 27 | 
 28 | Put simply:
 29 | 
 30 | **GitOps means storing the desired state of your system in Git, and running a controller inside each cluster that continuously pulls that state and reconciles the cluster to match it.**
 31 | 
 32 | > If it’s not in Git, it shouldn’t exist in the cluster.
 33 | > If it’s in Git, the cluster should match it.
 34 | 
 35 | <img src="assets/images/What-Exactly-Is-GitOps.png" width="50%" alt="GitOps loop comic">
 36 | The loop is simple: declare your infrastructure as code, commit it to Git, the controller pulls and enforces it, and the cluster stays aligned automatically.
 37 | 
 38 | In practice, that looks like this:
 39 | 
 40 | 1. You declare your infrastructure and application configs as code.
 41 | 2. You commit the changes to Git - now it’s versioned and visible.
 42 | 3. The in-cluster controller compares Git (desired) with the cluster (actual) and fixes any differences.
 43 | 4. The cluster stays aligned, and drift never builds up.
 44 | 
 45 | Storing YAML in Git is a good first step. GitOps goes further - your cluster pulls from Git and enforces it, continuously. Let’s see what that changes.
 46 | 
 47 | ## Beyond Just Storing YAML in Git
 48 | 
 49 | You might be thinking, *“Hang on - we already keep our manifests in Git. Isn’t that GitOps?”*
 50 | That’s a good start… but it’s not the whole story.
 51 | 
 52 | Here’s why: storing YAML in Git is like writing down the rules but never checking if anyone’s following them. Without something constantly making sure your cluster matches those files - and fixing it when it doesn’t - drift will sneak back in.
 53 | 
 54 | When you go from “YAML in Git” to **GitOps**, a few key things change:
 55 | 
 56 | * **Authority** - You (or a CI job) are no longer the one applying changes; an in-cluster controller does it, all the time, without forgetting.
 57 | * **Direction** - Instead of pushing changes into the cluster, the cluster pulls its own configuration from Git.
 58 | * **Timing** - Instead of waiting until you remember to run a command, reconciliation happens automatically on a short, predictable cycle.
 59 | * **Evidence** - Instead of ad-hoc CLI changes, every change leaves a commit and a review trail.
 60 | 
 61 | Here’s how that plays out:
 62 | 
 63 | You merge a PR that sets `replicas: 3`.
 64 | Later, someone bumps it to 5 by hand.
 65 | In the next reconciliation cycle, the controller spots the mismatch and sets it back to 3.
 66 | No Slack pings. No late-night debugging. If you *really* want 5, you change Git. If you need to undo a bad change, you revert the commit.
 67 | 
 68 | The takeaway?
 69 | YAML in Git tells you what *should* happen. GitOps makes sure it *does* happen - automatically, continuously, and without you chasing it.
 70 | 
 71 | ## How the GitOps Engine Works
 72 | 
 73 | So far, we’ve been talking about *what* GitOps is. Let’s talk about *how* it actually works.
 74 | 
 75 | At the heart of GitOps is a small piece of software called a **GitOps controller** that runs inside your cluster. Its job is simple but relentless:
 76 | 
 77 | 1. **Watch** your Git repository for changes.
 78 | 2. **Compare** what’s in Git (desired state) with what’s running in the cluster (actual state).
 79 | 3. **Reconcile** any differences by updating the cluster to match Git.
 80 | 
 81 | That’s it: **watch → compare → reconcile**.
 82 | This is the GitOps loop, and it runs continuously - on a short, predictable cycle - like a steady heartbeat keeping your cluster healthy.
 83 | 
 84 | Here’s what that means in real life:
 85 | If someone makes a direct change in the cluster - maybe tweaks an environment variable - the controller spots it in the next cycle and flips it back to match Git. No drama. No surprises. And no chasing down mysterious changes later.
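
The watch → compare → reconcile loop can be illustrated with a toy model. This is only a sketch under loud assumptions: two plain files stand in for Git and the cluster (the names `desired` and `actual` are invented for this example), whereas a real controller such as Flux reads a Git repository and talks to the Kubernetes API.

```shell
# Toy model of the reconcile loop: files stand in for Git and the cluster.
set -eu

workdir="$(mktemp -d)"
desired="$workdir/desired"   # assumption: stands in for the value committed to Git
actual="$workdir/actual"     # assumption: stands in for the live cluster state

echo 3 > "$desired"
echo 3 > "$actual"

reconcile() {
  # Compare desired with actual; overwrite actual only when they differ.
  if ! cmp -s "$desired" "$actual"; then
    echo "drift detected: actual=$(cat "$actual") desired=$(cat "$desired")"
    cp "$desired" "$actual"
  fi
  echo "in sync: replicas=$(cat "$actual")"
}

reconcile              # first cycle: nothing to fix
echo 5 > "$actual"     # someone changes the "cluster" by hand
reconcile              # next cycle: the change is reverted to match "Git"
```

The first call reports `in sync: replicas=3`; the second reports the drift and then the same in-sync line. The point of the design is that the fix never depends on anyone noticing the manual change: the next cycle always starts from what Git says.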
 86 | 
 87 | With that foundation in place, we can look at how this loop shapes your day-to-day workflow.
 88 | 
 89 | ## GitOps Workflows in Practice
 90 | 
 91 | Now that you understand *how* the GitOps engine works, let’s zoom out and see how it fits into your everyday workflow.
 92 | 
 93 | Here’s the big picture at a glance:
 94 | 
 95 | ![GitOps Workflow Diagram](assets/images/GitOps-Loop.png)
 96 | 
 97 | ### The Two-Repository Pattern
 98 | 
 99 | In GitOps, we usually split work into two repos:
100 | 
101 | 1. **Application repo** – Your source code, tests, and Dockerfiles.
102 | 2. **Configuration repo** – Your Kubernetes manifests, Helm charts, or Kustomize configs.
103 | 
104 | Why split them?
105 | It keeps code changes and deployment settings independent. Developers focus on building and testing code. Platform teams focus on how and where it runs. Each can evolve separately, with its own review and approval process.
106 | 
107 | ### From Commit to Cluster
108 | 
109 | Here’s what it looks like in motion:
110 | You push code to the **application repo**.
111 | CI picks it up, builds and tests it, creates a container image, and pushes that image to a registry.
112 | Then - and this is the only deployment step CI does - it updates the **config repo** with the new image tag.
113 | 
114 | The GitOps controller, always watching the config repo, spots the change. It pulls the new config, applies it to the cluster, and if anything drifts later, it quietly puts things back in place.
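
The one deployment-related step CI performs, updating the image tag in the config repo, might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the registry name `registry.example.com/hello`, the tags `1.0.0` and `1.0.1`, and the temporary directory standing in for a checkout of the config repo.

```shell
# Sketch of the CI step that bumps an image tag in the config repo.
set -eu

config_repo="$(mktemp -d)"              # assumption: stand-in for a config repo checkout
manifest="$config_repo/deployment.yaml"
new_tag="1.0.1"                         # hypothetical tag produced by the CI build

# A fragment of a Deployment manifest, enough to show the edit.
cat > "$manifest" <<'EOF'
    spec:
      containers:
        - name: hello
          image: registry.example.com/hello:1.0.0
EOF

# Rewrite only the tag on the image line; the rest of the file is untouched.
sed "s|\(image: registry.example.com/hello\):.*|\1:${new_tag}|" "$manifest" > "$manifest.new"
mv "$manifest.new" "$manifest"

grep 'image:' "$manifest"
```

In a real pipeline this edit would be followed by `git commit` and `git push` to the config repo. CI still never contacts the cluster; the controller picks up the new commit on its own.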
115 | 
116 | ### The Key Shift
117 | 
118 | Notice what never happens?
119 | The CI pipeline never talks to your cluster. No API tokens floating around in external systems. No one-off manual kubectl commands.
120 | 
121 | The cluster follows what’s in Git, and only what’s in Git. That’s not just cleaner - it’s more secure, more auditable, and more reliable. If you want to change something, you change Git. If you want to undo something, you revert Git.
122 | 
123 | ## GitOps vs Traditional CI/CD
124 | 
125 | You’ve seen how GitOps works inside a team’s workflow. But how does it stack up against the way most teams deploy today? Let’s put them side by side.
126 | 
127 | <img src="assets/images/CI-CD-vs-GitOps-Comparison.png" width="50%" alt="CI/CD vs GitOps comic">
128 | ![Push vs Pull comic](assets/images/Pull-vs-Push-Model.png)
129 | 
130 | The difference comes down to who makes the change, and how it gets into your cluster.
131 | 
132 | ### Push vs Pull at a Glance
133 | 
134 | | Aspect              | Traditional CI/CD                         | GitOps                                      |
135 | | ------------------- | ----------------------------------------- | ------------------------------------------- |
136 | | **Who deploys**     | CI pipeline pushes to cluster             | Controller inside cluster pulls from Git    |
137 | | **Access model**    | External systems need cluster credentials | Only the in-cluster controller needs access |
138 | | **When it happens** | On demand when the pipeline runs          | Continuously, on a short, regular cycle     |
139 | | **Drift handling**  | Manual intervention required              | Automatically detected and fixed            |
140 | | **Rollback**        | Re-run pipeline with an old version       | `git revert` and commit                     |
141 | | **Audit trail**     | Spread across CI logs                     | All in Git history                          |
142 | | **Source of truth** | Could be CI, could be the cluster         | Always Git                                  |
143 | 
144 | ### Why the Pull Model Changes the Game
145 | 
146 | Security is one of the biggest wins.
147 | 
148 | In the push model:
149 | 
150 | * Anyone with CI credentials can make direct changes to your cluster.
151 | * Those changes might leave minimal traces outside the CI system.
152 | * Finding them later means digging through multiple systems.
153 | 
154 | In the pull model:
155 | 
156 | * Every change must go through Git.
157 | * That means commits, branches, and file changes - all logged, reviewable, and easy to trace.
158 | * Rolling back is as simple as reverting a commit.
159 | 
160 | Push model = a quiet side door.
161 | Pull model = the front door, where everyone sees who’s coming and going.
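
To see why a rollback is just another commit, here is a small sketch that runs entirely in a throwaway repository. The file name, commit messages, and demo identity are made up for the example; in a real setup the controller would then reconcile the cluster to the reverted state.

```shell
# Sketch: rolling back a bad change with git revert in a scratch repo.
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"          # local identity for this scratch repo only
git config user.email "demo@example.com"

echo "replicas: 3" > deployment.yaml
git add deployment.yaml
git commit -q -m "Set replicas to 3"

echo "replicas: 5" > deployment.yaml
git commit -q -am "Scale to 5"

# Undo the last change by adding a new commit; history is preserved.
git revert --no-edit HEAD >/dev/null

cat deployment.yaml
```

The file is back to `replicas: 3`, and the history now holds three commits: the original, the change, and the revert. That record of what changed and when it was undone is the audit trail the pull model gives you.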
162 | 
163 | ### What This Means Day to Day
164 | 
165 | **Traditional CI/CD**: You run a pipeline and hope it finishes cleanly. Maybe the cluster updated. Maybe someone changed something in between.
166 | 
167 | **GitOps**: If it’s in Git, it’s in the cluster. If someone changes the cluster by hand, it’s corrected automatically in the next reconciliation cycle.
168 | 
169 | With GitOps, deployments aren’t one-off events. They’re a state your system actively maintains - secure, auditable, and resistant to drift.
170 | 
171 | ## The Four Principles That Make GitOps Work
172 | 
173 | Everything you’ve seen today - the self-healing, the security, the simplicity - comes down to four core ideas. They’re not mine, and they’re not specific to Flux. The CNCF’s [OpenGitOps](https://opengitops.dev) project pulled these patterns from real-world teams and wrote them down so everyone is speaking the same language.
174 | 
175 | Here they are, plain and simple:
176 | 
177 | 1. **Declarative** - Describe the end state you want, not the steps to get there. For example, `replicas: 3` instead of running `kubectl scale`.
178 | 2. **Versioned & Immutable** - Keep the desired state in Git, so every change is tracked, reviewed, and reversible.
179 | 3. **Pulled Automatically** - The cluster fetches its own configuration from Git - you never push changes into it.
180 | 4. **Continuously Reconciled** - The system keeps reality matched to Git and fixes drift whenever it appears.
181 | 
182 | If someone tells you they “do GitOps,” these are the four things you should be able to see in action. And when you use Flux, Argo CD, or other GitOps tools, this is exactly what they’re implementing for you.
183 | 
184 | <img src="assets/images/The-Four-GitOps-Principles.png" width="50%" alt="The Four GitOps Principles">
185 | 
186 | ## GitOps in Context: Your Journey Forward
187 | 
188 | ### From Drift to Control
189 | 
190 | We began today with a simple truth: what’s in Git often drifts from what’s running in your cluster. Quick fixes, manual changes, and ad-hoc scripts make it worse.
191 | 
192 | Now you’ve seen the alternative - a model where Git is the single source of truth, and your cluster keeps itself aligned with it.
193 | 
194 | By the end of this first session, you can:
195 | 
196 | * Explain the GitOps loop: **watch → compare → reconcile**
197 | * Show why **pull beats push** for security and reliability
198 | * Identify the four CNCF-endorsed principles that make self-healing possible
199 | 
200 | ### Tomorrow: From Knowledge to Power
201 | 
202 | Tomorrow we go from concept to action. In about an hour, you will:
203 | 
204 | * **Build** a local Kubernetes cluster
205 | * **Install** Flux and watch it take control
206 | * **Deploy** an app using only Git commits
207 | * **Break** something on purpose - and watch it heal itself
208 | 
209 | No cloud accounts. No complex setup. Just your laptop and the full GitOps loop in action.
210 | 
211 | And here’s the real win: with GitOps, drift isn’t something you scramble to detect - it’s something that simply can’t persist. Your clusters will sync themselves, your audit trail will live in Git, and your role will shift from firefighting to guiding intent.
212 | 
213 | By Day 5, you’ll be confident running GitOps in production - with the peace of mind that comes from knowing your system is always in the state you declared.
214 | 
215 | **Ready to build your first self-healing system?**
216 | [Continue to Day 2 →](./Day-2-Building-Your-First-Self-Healing-System.md)
217 | 


--------------------------------------------------------------------------------
/Day-2-Building-Your-First-Self-Healing-System.md:
--------------------------------------------------------------------------------
  1 | # 🚀 Day 2 – Building Your First Self-Healing Kubernetes System with Flux
  2 | 
  3 | Yesterday we explored the antidote to one of the most frustrating problems in Kubernetes operations: **configuration drift**.
  4 | That moment when the running system doesn’t match what you think is deployed - maybe because someone made a “quick fix” in production and never committed it, or a setting silently changed without explanation. In the old world, that meant 3 a.m. firefights, scrambling through logs, and hoping you could piece the cluster back together.
  5 | 
  6 | GitOps changes that story. By making Git your single source of truth and letting the cluster reconcile itself, drift can’t persist - and today, you’ll see that in action.
  7 | 
  8 | ## 🏁 Welcome Back to GitOps-Days
  9 | 
 10 | If any of the terms from Day 1 - *desired state*, *pull model*, or the *four GitOps principles* - aren’t fresh in your mind, you can [review them here](./Day-1-What-really-is-GitOps.md) before diving in.
 11 | 
 12 | Today, we’ll use **[Flux](https://fluxcd.io/)** - a GitOps operator - to build a live, breathing system that syncs with your repo, detects drift, and heals itself automatically. You’ll turn yesterday’s concepts into something you can watch work in real time.
 13 | 
 14 | ## 🗺️ Your Hands-On Journey (\~1 hour)
 15 | 
 16 | You’ll:
 17 | 
 18 | 1. Set up a local Kubernetes lab environment (\~15 min)
 19 | 2. Create a Git repository as your single source of truth (\~5 min)
 20 | 3. Install Flux and connect it to your cluster (\~10 min)
 21 | 4. Deploy an application entirely through Git commits (\~10 min)
 22 | 5. Break things on purpose and watch Flux fix them (\~15 min)
 23 | 
 24 | No cloud accounts. No complex setup. Just Docker and a few free tools.
 25 | 
 26 | ## 🎯 By the End, You’ll Have
 27 | 
 28 | * A real Kubernetes cluster with Flux keeping it in sync with your repo
 29 | * Automated deployments triggered by pushes to Git
 30 | * Continuous drift detection and correction in under a minute
 31 | * A system that can recover from mistakes without your intervention
 32 | 
 33 | **Ready to turn the pain of drift into the peace of self-healing?**
 34 | Let’s get started.
 35 | 
 36 | ## 🧰 Preparing Your Workspace
 37 | 
 38 | Before we dive into building your self-healing system, let’s set up the **essential building blocks** it relies on. These tools form the foundation for everything you’ll do today - once they’re in place, Flux will be able to work its GitOps magic.
 39 | 
 40 | Later in this lab, we’ll install **[Flux](https://fluxcd.io/)** - the GitOps operator that keeps your cluster in sync with Git - but first, we need this solid groundwork.
 41 | 
 42 | > [!TIP]
 43 | > Already have Docker, kind, kubectl, and Git installed?
 44 | > [Skip ahead to “Set Up Your Git Repository”](#-set-up-your-git-repository).
 45 | 
 46 | > [!IMPORTANT]
 47 | > You'll need at least **4 GB of free RAM** and **2 GB of disk space** for the local cluster.
 48 | 
 49 | ### The Tools You’ll Need
 50 | 
 51 | **1. Docker** - Runs the containers that become your Kubernetes nodes.
 52 | [Install Docker](https://docs.docker.com/get-docker/)
 53 | **Minimum version:** **24.0**
 54 | 
 55 | **2. kind** - Creates your local “playground” by spinning up a Kubernetes cluster inside Docker.
 56 | [Install kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installation)
 57 | **Minimum version:** **0.25.0**
 58 | 
 59 | **3. kubectl** - Your command-line tool for inspecting and managing Kubernetes resources.
 60 | [Install kubectl](https://kubernetes.io/docs/tasks/tools/)
 61 | **Minimum version:** **1.32**
 62 | 
 63 | **4. Git** - Your single source of truth, where you declare what should be running in your cluster.
 64 | [Install Git](https://git-scm.com/downloads)
 65 | **Minimum version:** **2.40**
 66 | 
 67 | ### ✅ Checkpoint: Verify Your Setup
 68 | 
 69 | **Why this matters:**
 70 | Each lab step includes a checkpoint like this. If something fails later, knowing exactly which step last worked makes troubleshooting easier - without retracing your entire setup.
 71 | 
 72 | Run the following commands to confirm your tools are ready:
 73 | 
 74 | ```shell
 75 | docker --version
 76 | kind --version
 77 | kubectl version --client
 78 | git --version
 79 | ```
 80 | 
 81 | **Pass criteria:**
 82 | You should see version numbers that match or exceed the minimums above.
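
If you would rather see every missing tool in one pass, a small loop like the one below can help. It is a convenience sketch, not part of the lab: it only checks that each tool is on your `PATH` and does not compare versions against the minimums above.

```shell
# Sketch: report which lab tools are on PATH without stopping at the first gap.
missing=0
for tool in docker kind kubectl git; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
    missing=$((missing + 1))
  fi
done
echo "$missing tool(s) missing"
```

A result of `0 tool(s) missing` means all four are installed; you still need to check the version numbers yourself.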
 83 | 
 84 | With your tools confirmed, you’re ready to create the Git repository that will control everything in your self-healing Kubernetes system.
 85 | 
 86 | ## 📂 Set Up Your Git Repository
 87 | 
 88 | Your Git repository is the **single source of truth** Flux will watch. Every change you push here will be automatically applied to your Kubernetes cluster - which is why this step is critical to building your self-healing system.
 89 | 
 90 | ⏱️ **Time needed:** \~3 minutes
 91 | 
 92 | ### 1️⃣ Fork this Repository
 93 | 
 94 | Go to [`https://github.com/ahmedmuhi/GitOps-Days`](https://github.com/ahmedmuhi/GitOps-Days) and click **Fork** in the top-right corner.
 95 | On the next screen, click **Create fork**.
 96 | 
 97 | 💡 *Why this matters:*
 98 | In GitOps, Flux constantly pulls from a Git repository to know what your cluster should look like. To make changes that Flux will actually see and apply, you need **write access** to that repository. You can't commit to someone else's repo, so forking creates your own copy in your GitHub account - one you control completely.
 99 | 
100 | Your fork will live at:
101 | 
102 | ```
103 | https://github.com/YOUR-USERNAME/GitOps-Days
104 | ```
105 | 
106 | ### 2️⃣ Clone Your Fork Locally
107 | 
108 | ```shell
109 | git clone https://github.com/YOUR-USERNAME/GitOps-Days.git
110 | cd GitOps-Days
111 | ```
112 | 
113 | > [!IMPORTANT]
114 | > Replace `YOUR-USERNAME` with your actual GitHub username before running the command.
115 | > If you copy it exactly as shown above without replacing it, the command will fail.
116 | 
117 | ### 3️⃣ Checkpoint: Confirm Your Local Repo Is Linked to Your Fork
118 | 
119 | Run:
120 | 
121 | ```shell
122 | git remote -v
123 | ```
124 | 
125 | Expected output:
126 | 
127 | ```
128 | origin  https://github.com/YOUR-USERNAME/GitOps-Days.git (fetch)
129 | origin  https://github.com/YOUR-USERNAME/GitOps-Days.git (push)
130 | ```
131 | 
132 | ✅ This means:
133 | 
134 | * You have a fork in your GitHub account.
135 | * You've cloned it locally.
136 | * Your local repo is connected to your fork, so pushes will go to a place you control.
137 | 
138 | ❌ If you see `ahmedmuhi` instead of your username, you cloned the original repo by mistake. Delete the folder and clone your fork instead - Flux won't work otherwise.
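
The same check can be expressed as a small script. This is a sketch under assumptions: `is_own_fork` is a hypothetical helper name, and it only looks for the upstream account name in the URL. It does not verify that the fork really belongs to you.

```shell
# is_own_fork URL: succeeds when the remote URL does not point at the upstream account.
is_own_fork() {
  case "$1" in
    *github.com/ahmedmuhi/*|*github.com:ahmedmuhi/*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example with the upstream URL, which should be rejected:
if is_own_fork "https://github.com/ahmedmuhi/GitOps-Days.git"; then
  echo "origin is your fork"
else
  echo "origin points at the original repo - delete the folder and clone your fork"
fi
```

Inside your clone, you could pass it the real remote with `is_own_fork "$(git remote get-url origin)"`.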
139 | 
140 | **Checkpoint complete:** You now have a GitOps-ready repository with full write access, linked locally, and ready for Flux to watch.
141 | 
142 | Next: we’ll create the Kubernetes cluster where your self-healing system will run.
143 | 
144 | ## 🔄 Creating Your Kubernetes Environment
145 | 
146 | Your Git repository is ready - now it's time to prepare the **stage** where GitOps will actually perform its magic.
147 | We'll spin up a local Kubernetes cluster using **[kind](https://kind.sigs.k8s.io/)** (*Kubernetes in Docker*), which runs an entire cluster inside Docker containers.
148 | 
149 | Why kind? For GitOps experiments, it's perfect:
150 | 
151 | * ⚡ **Fast** - launches in about a minute
152 | * 🔄 **Rebuildable** - tear down and recreate easily to start fresh
153 | * 🧪 **Safe** - keeps your work local so you can experiment without touching production
154 | 
155 | This gives you a fast feedback loop: test your GitOps setup locally, confirm it works, and only then consider running it in production.
156 | 
157 | ⏱️ **Time needed:** \~2 minutes
158 | 
159 | ### Creating the Cluster
160 | 
161 | **Choose one** of the following commands based on your operating system:
162 | 
163 | **Windows (PowerShell or Windows Terminal)**
164 | 
165 | ```shell
166 | kind create cluster --name gitops-loop-demo
167 | ```
168 | 
169 | **macOS/Linux (Terminal)**
170 | 
171 | ```shell
172 | kind create cluster \
173 |   --name gitops-loop-demo
174 | ```
175 | 
176 | This spins up a fully functional Kubernetes cluster in about a minute, using the latest stable Kubernetes version supported by kind.
177 | 
178 | ### ✅ Checkpoint: Confirm Your Cluster Is Ready
179 | 
180 | **Why this matters:**
181 | Before we bring in Flux, we need to be sure the cluster is running and ready to accept workloads. If the cluster isn’t healthy, you'll run into errors later.
182 | 
183 | Run:
184 | 
185 | ```shell
186 | kubectl get nodes
187 | ```
188 | 
189 | **Pass criteria:**
190 | 
191 | * `STATUS` shows **Ready**.
192 | * The `AGE` is just a few minutes (freshly created).
193 | 
194 | Example output:
195 | 
196 | ```
197 | NAME                             STATUS   ROLES           AGE   VERSION
198 | gitops-loop-demo-control-plane   Ready    control-plane   1m    v1.32.x
199 | ```
200 | 
201 | If you see `NotReady`:
202 | 
203 | 1. Wait a few seconds and try again.
204 | 2. Make sure Docker is running:
205 |     ```shell
206 |     docker ps
207 |     ```
208 | 3. Ensure you have at least **4 GB RAM** and **2 GB disk** free.
209 | 4. If still stuck, recreate the cluster:
210 | 
211 |     ```shell
212 |     kind delete cluster --name gitops-loop-demo
213 |     kind create cluster --name gitops-loop-demo
214 |     ```
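
Instead of re-running `kubectl get nodes` by hand, you can let `kubectl wait` block until the node is Ready. A minimal sketch: the wrapper name `wait_for_nodes` and the 120-second default are illustrative choices, not something the lab requires.

```shell
# wait_for_nodes [TIMEOUT]: block until every node reports Ready,
# or fail once TIMEOUT (default 120s) has passed.
wait_for_nodes() {
  kubectl wait --for=condition=Ready nodes --all --timeout="${1:-120s}"
}
```

Call it as `wait_for_nodes` or `wait_for_nodes 60s`. A non-zero exit status means at least one node was not Ready in time.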
215 | 
216 | ### What You’ve Just Built
217 | 
218 | * ✅ A fully functional Kubernetes cluster running locally
219 | * ✅ A safe, disposable playground for your GitOps experiments
220 | * ✅ The environment Flux will soon manage, keeping it in sync with your Git repo
221 | 
222 | > [!TIP]
223 | > When you’re done with the lab, you can delete the cluster to free resources:
224 | >
225 | > ```shell
226 | > kind delete cluster --name gitops-loop-demo
227 | > ```
228 | 
229 | ## 🚀 Deploy Flux to Your Cluster and Connect It to Your Repo
230 | 
231 | Your cluster is up and running - now it's time to bring in the **director** of our GitOps play: **Flux**.
232 | 
233 | Flux is not a single binary that “just runs” - when you install it, it deploys several **specialized controllers** in your cluster.
234 | These controllers live together in their own namespace (`flux-system`) and work like a team of automation agents:
235 | 
236 | * **Source Controller** - pulls manifests from your Git repository
237 | * **Kustomize Controller** - applies those manifests to your cluster
238 | * **Helm Controller** - manages Helm releases (if you use them)
239 | * **Notification Controller** - sends alerts and status events
240 | 
241 | Together, they ensure that what's running in your cluster always matches what's in Git - continuously and automatically.
242 | 
243 | ⏱️ **Time needed:** \~3-4 minutes
244 | 
245 | ### Install the Flux controllers
246 | 
247 | Run:
248 | 
249 | ```shell
250 | flux install
251 | ```
252 | 
253 | This sets up all the core Flux components inside the `flux-system` namespace. Once we connect it to your repository and tell it what to deploy, Flux will apply your declared state and fix any drift.
254 | 
255 | ### Connect Flux to your Git repository
256 | 
257 | Next, we tell Flux's **Source Controller** which Git repository to watch. This must be **your fork on GitHub**, not the original.
258 | 
259 | ```shell
260 | flux create source git gitops-loop-demo \
261 |   --url=https://github.com/YOUR-USERNAME/GitOps-Days.git \
262 |   --branch=main \
263 |   --interval=30s
264 | ```
265 | 
266 | > [!IMPORTANT]
267 | > Replace `YOUR-USERNAME` with your actual GitHub username.
268 | > We use the HTTPS URL so Flux can pull from your public fork without any credentials.
269 | > The `--interval=30s` means Flux will check for changes every 30 seconds.
270 | 
271 | ### ✅ Checkpoint: Confirm Flux is healthy and watching your repo
272 | 
273 | 1. **Check that Flux components are healthy:**
274 | 
275 |    ```shell
276 |    flux check
277 |    ```
278 | 
279 |    Example when all is healthy:
280 | 
281 |    ```shell
282 |    ► checking controllers
283 |    ✔ source-controller: deployment ready
284 |    ✔ kustomize-controller: deployment ready
285 |    ✔ helm-controller: deployment ready
286 |    ✔ notification-controller: deployment ready
287 |    ✔ all checks passed
288 |    ```
289 | 
290 | 2. **Verify your Git source is registered and ready:**
291 | 
292 |    ```shell
293 |    flux get sources git
294 |    ```
295 | 
296 |    Example output when ready:
297 | 
298 |    ```shell
299 |    NAME                URL                                              READY   STATUS                                                              AGE
300 |    gitops-loop-demo    https://github.com/YOUR-USERNAME/GitOps-Days     True    stored artifact for revision 'main@sha1:123abc456def...'             1m
301 |    ```
302 | 
303 | **Pass criteria:**
304 | 
305 | * `flux check` shows all controllers as `✔ ... ready`
306 | * `flux get sources git` shows your source with `READY` = `True`
307 | 
308 | **What this means:**
309 | The Source Controller has made a copy of your GitHub fork and stored it locally inside your cluster. It will refresh this cached copy every 30 seconds (or whatever interval you set), so your cluster always has the latest version of your repo ready to use.
310 | Later, when we tell Flux *what* to deploy, the Kustomize Controller will read those files from this local cache - not directly from GitHub - and deploy them to your cluster.
311 | 
312 | **Checkpoint complete:** 
313 | Flux is now **installed in your cluster** and **watching your GitHub fork**.
314 | Before we tell Flux *what* to deploy from that repository, there's one last bit of setup to keep your work safe.
315 | 
316 | ## 💡 Before You Copy the Example Files
317 | 
318 | I update this repository frequently - adding lessons, improving examples, and fixing issues.
319 | If you make changes directly in the shared `examples/` folders and later *sync your fork* with the upstream repo, those changes could be overwritten.
320 | 
321 | To prevent that:
322 | 
323 | * **Create your own workspace folder** under `student-work/YOUR-USERNAME`
324 | * **Copy** the example files you'll be working on into that folder
325 |     (e.g., copy the Day 2 `hello` app example there)
326 | * Make changes **only** inside your workspace folder
327 | * Later, when we tell Flux what to deploy, you'll point it to *your* folder - not the shared examples
328 | 
329 | This way, syncing upstream changes will never overwrite your personal work.
330 | 
331 | ### 📋 Commands to Create Your Workspace and Copy the Example
332 | 
333 | **macOS/Linux (Terminal):**
334 | 
335 | ```shell
336 | mkdir -p student-work/YOUR-USERNAME/day2
337 | cp -r examples/day2/clusters/local/apps/hello student-work/YOUR-USERNAME/day2/
338 | ```
339 | 
340 | **Windows (PowerShell):**
341 | 
342 | ```shell
343 | New-Item -ItemType Directory -Path "student-work\YOUR-USERNAME\day2" -Force
344 | Copy-Item -Recurse examples\day2\clusters\local\apps\hello student-work\YOUR-USERNAME\day2\
345 | ```
346 | 
347 | > [!IMPORTANT]
348 | > Replace `YOUR-USERNAME` with your actual GitHub username before running these commands.
349 | 
350 | ### ⚠️ A Note About Re-copying
351 | 
352 | If you repeat a lesson in the future and re-copy files from `examples/` into a folder that **already exists**, your previous work will be overwritten.
353 | 
354 | **How to avoid overwriting yourself:**
355 | 
356 | * If you want a fresh start, delete or rename your old folder first
357 | * Or create a new subfolder (e.g., `student-work/YOUR-USERNAME/day2-v2`) and copy into that
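
One way to make the "don't overwrite yourself" rule automatic is a small guard around the copy. This is a sketch for macOS/Linux shells: `copy_example` is a hypothetical helper, not something shipped with this repository.

```shell
# copy_example SRC DEST: copy SRC to DEST only when DEST does not exist yet.
copy_example() {
  if [ -e "$2" ]; then
    echo "refusing to overwrite existing folder: $2" >&2
    return 1
  fi
  # Create the parent folder if needed, then copy the example as DEST.
  mkdir -p "$(dirname "$2")" && cp -r "$1" "$2"
}
```

For example, `copy_example examples/day2/clusters/local/apps/hello student-work/YOUR-USERNAME/day2/hello` copies the files on the first run and refuses on every later run, leaving your edits intact.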
358 | 
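359 | **Next:** Commit and push your workspace folder (`git add student-work`, `git commit`, `git push`) so it exists in your fork on GitHub - Flux reads from GitHub, not from your laptop. Then we'll tell Flux what to deploy - and when we do that, something will happen immediately that surprises most people on their first run.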
360 | 
361 | ## 📦 Tell Flux What to Deploy
362 | 
363 | You've got Flux installed, your repository linked, and you know how to keep your work safe.
364 | Now it's time to give Flux a very specific instruction: **what to deploy** from your Git repo into the cluster.
365 | 
366 | This is done by creating a **Kustomization** - a Kubernetes object that Flux treats like marching orders.
367 | It tells Flux:
368 | 
369 | * **Where** in your Git repository to find your application configuration manifests
370 | * **How often** to check them
371 | * **What to do** when something changes (or is removed)
372 | 
373 | **⏱️ Time needed:** \~2 minutes
374 | 
375 | ### 🛑 Important: Point to Your Own Workspace
376 | 
377 | Do **not** point Flux at the shared `examples/` folder.
378 | If you do, the next time you sync your fork with upstream changes, your work could be overwritten.
379 | 
380 | Instead:
381 | 
382 | * Work only inside your **own** folder under `./student-work/YOUR-USERNAME/`
383 | * Point Flux to *your* folder, not the shared examples
384 | 
385 | For example:
386 | 
387 | ```shell
388 | ./student-work/YOUR-USERNAME/day2/hello
389 | ```
390 | 
391 | ### 🛠️ Create a Kustomization
392 | 
393 | Now tell Flux to deploy from your workspace folder:
394 | 
395 | Run:
396 | 
397 | ```bash
398 | flux create kustomization hello-app \
399 |   --source=GitRepository/gitops-loop-demo \
400 |   --path="./student-work/YOUR-USERNAME/day2/hello" \
401 |   --prune=true \
402 |   --interval=1m
403 | ```
404 | 
405 | **Flags explained:**
406 | 
407 | * `--path` → The folder in your repo where your manifests live
408 | * `--prune=true` → Remove cluster resources when their manifests are deleted from Git
409 | * `--interval=1m` → Check for drift and reconcile every minute
410 | 
411 | ### ✅ Checkpoint: Confirm the Kustomization is Ready
412 | 
413 | ```bash
414 | flux get kustomizations
415 | ```
416 | 
417 | Example when ready:
418 | 
419 | ```
420 | NAME        READY  MESSAGE                                      REVISION               SUSPENDED
421 | hello-app   True   Applied revision: main@sha1:123abc456def...  main@sha1:123abc...    False
422 | ```
423 | 
424 | **Checkpoint complete:** You’ve just given Flux its marching orders.
425 | 
426 | Here’s the part that surprises most people: the moment you created that Kustomization, Flux didn’t wait for you to push a commit. It immediately pulled the manifests from your folder and applied them to the cluster.
427 | Your very first GitOps-powered deployment has already happened in the background.
428 | 
429 | **Next:** Let’s go see what Flux just installed for you.
430 | 
431 | ## 👀 See Your First Flux Deployment
432 | 
433 | The moment you created your Kustomization, Flux got to work.
434 | It didn't wait for a new commit - it immediately pulled the manifests from your workspace folder and applied them to your cluster.
435 | 
436 | Let's see exactly what happened.
437 | 
438 | **⏱️ Time needed:** \~5 minutes
439 | 
440 | ### 🔍 What Flux Discovered in Your Repository
441 | 
442 | When you created the Kustomization, you told Flux to watch your workspace folder. Inside that folder, Flux found:
443 | 
444 | ```
445 | student-work/YOUR-USERNAME/day2/hello/
446 | ├── namespace.yaml      # Creates the 'hello' namespace
447 | ├── deployment.yaml     # Defines pods running the web server
448 | └── service.yaml        # Exposes the app on port 80
449 | ```
450 | 
451 | These YAML files define a simple “Hello World” web application. The Kustomize Controller read them from the cached copy of your repository that Flux's Source Controller originally fetched from your GitHub repository, and **deployed them automatically** to your cluster.
452 | 
453 | ### See What Flux Created
454 | 
455 | Check your cluster for these resources:
456 | 
457 | ```bash
458 | kubectl get pods,svc -n hello
459 | ```
460 | 
461 | You should see something like:
462 | 
463 | ```
464 | NAME                         READY   STATUS    RESTARTS   AGE
465 | pod/hello-65d4c4d5c9-xz7vp   1/1     Running   0          2m
466 | 
467 | NAME            TYPE        CLUSTER-IP      PORT(S)   AGE
468 | service/hello   ClusterIP   10.96.x.x       80/TCP    2m
469 | ```
470 | 
471 | > **Note:** The `AGE` column should show only a few minutes - that’s how you know Flux just created them.
472 | 
473 | ### The Automatic Deployment in Action
474 | 
475 | Think about this:
476 | 
477 | * ❌ You didn’t run `kubectl apply`
478 | * ❌ You didn’t manually trigger a deployment
479 | * ✅ Yet your application is running in the cluster
480 | 
481 | That's because Flux:
482 | 
483 | 1. Pulled your manifests from GitHub
484 | 2. Applied them to the cluster
485 | 3. Got everything running without any manual steps
486 | 
487 | ### Access Your Running Application
488 | 
489 | Forward the service port to your local machine:
490 | 
491 | ```bash
492 | kubectl port-forward -n hello svc/hello 8080:80
493 | ```
494 | 
495 | Then open [http://localhost:8080](http://localhost:8080) in your browser.
496 | 
497 | 🎉 **There it is!** Your Hello World app, running in Kubernetes, deployed entirely by Flux.
498 | 
499 | ### ✅ Checkpoint complete:
500 | 
501 | Flux is actively deploying workloads from your workspace folder in your GitHub fork.
502 | Next, let's go beyond simply watching Flux deploy what's already there: we'll make a real change in Git, push it, and watch Flux pick it up and apply it to the cluster automatically.
503 | 
504 | ## ✏️ Make Your First GitOps-Driven Change
505 | 
506 | So far, Flux has deployed what was already in your workspace folder.
507 | Now, let's prove that **pushing a change to Git** is all it takes to update your cluster.
508 | 
509 | **⏱️ Time needed:** \~5 minutes
510 | 
511 | ### 1️⃣ Edit Your Deployment
512 | 
513 | Open:
514 | 
515 | ```shell
516 | student-work/YOUR-USERNAME/day2/hello/deployment.yaml
517 | ```
518 | 
519 | Find:
520 | 
521 | ```yaml
522 | spec:
523 |   replicas: 1
524 | ```
525 | 
526 | ... and change it to:
527 | 
528 | ```yaml
529 | spec:
530 |   replicas: 3
531 | ```
532 | 
533 | This tells Kubernetes to run three pods instead of one.
534 | 
535 | ### 2️⃣ Commit and Push Your Change
536 | 
537 | > [!TIP]
538 | > You can do this because you forked the repository earlier.
539 | > If you had cloned the original repo, the `git push` command below would fail - and Flux wouldn't see your change.
540 | 
541 | Run:
542 | 
543 | ```shell
544 | git add student-work/YOUR-USERNAME/day2/hello/deployment.yaml
545 | git commit -m "Scale hello app to 3 replicas"
546 | git push
547 | ```
548 | 
549 | ### 3️⃣ Watch Flux Reconcile
550 | 
551 | Flux checks your Git source every 30 seconds.
552 | Let's watch Flux notice the new commit and apply the update to your cluster:
553 | 
554 | ```shell
555 | kubectl get deployment hello -n hello -w
556 | ```
557 | 
558 | You'll see the replica count change from 1 → 3 within a minute.
559 | 
560 | ### ✅ Checkpoint: Git Changes = Live Changes
561 | 
562 | Flux just:
563 | 
564 | 1. Pulled your updated repo from GitHub (via the Source Controller cache)
565 | 2. Saw that the desired state now had `replicas: 3`
566 | 3. Applied the change (via the Kustomize Controller) to your cluster automatically
567 | 
568 | No `kubectl apply`. No manual deploys.
569 | Just Git → Flux → Cluster, exactly as GitOps promises.
570 | 
571 | Next, we’ll put that self-healing promise to the test: we’ll *break* something in the cluster on purpose and watch Flux notice the drift - and put it back automatically.
572 | 
573 | ## 🔨 Breaking Things (For Science!)
574 | 
575 | Seeing Flux deploy your app automatically is great - but the **real proof** of a self-healing system is how it responds when things inevitably go wrong.
576 | Let's test that resilience with **two real-world drift simulations**.
577 | 
578 | ### 🧪 Mini-Lab 1: The Emergency Scale
579 | 
580 | **Scenario:**
581 | A teammate is in the middle of an incident. Under pressure, they bypass Git and run a quick fix in the cluster - scaling the app manually.
582 | 
583 | **Action:**
584 | Scale the deployment from 3 to 5:
585 | 
586 | ```bash
587 | kubectl scale deployment hello -n hello --replicas=5
588 | ```
589 | 
590 | Now watch what happens live:
591 | 
592 | ```bash
593 | kubectl get deployment hello -n hello -w
594 | ```
595 | 
596 | > [!TIP]
597 | > The `-w` flag means “watch” - you’ll see updates as they happen.
598 | 
599 | **Expected Result:**
600 | Within \~30-60 seconds, you’ll see something like:
601 | 
602 | ```
603 | hello   3/3   3   3   15m
604 | hello   5/5   5   5   15m15s   # Manual change applied
605 | hello   3/3   3   3   15m40s   # Flux detects drift and fixes it
606 | ```
607 | 
608 | **Why it happens:**
609 | Flux's Kustomize Controller compared the cluster's live state to your Git-defined desired state (3 replicas).
610 | Seeing a mismatch, it reconciled the deployment back to what's in Git - no questions asked.
611 | 
612 | **Checkpoint ✅**
613 | Your deployment is back at **3/3 replicas**.
614 | Manual changes didn't stick - **Git's declared state wins**.
615 | 
616 | Press `Ctrl+C` to stop watching.
617 | 
618 | ### 🧪 Mini-Lab 2: The Catastrophic Delete
619 | 
620 | **Scenario:**
621 | A worst-case accident - the entire `hello` namespace is deleted.
622 | 
623 | **Action:**
624 | Run:
625 | 
626 | ```shell
627 | kubectl delete namespace hello
628 | ```
629 | 
630 | Then watch Flux rebuild everything:
631 | 
632 | ```shell
633 | watch kubectl get all -n hello
634 | ```
635 | 
636 | **Expected Result:**
637 | Over the next \~60 seconds you’ll see:
638 | 
639 | * Namespace recreated
640 | * Deployment, service, and pods restored
641 | * App fully back online
642 | 
643 | **Why it happens:**
644 | The namespace and its resources are still defined in Git.
645 | When the Kustomize Controller notices they're missing, it reapplies the manifests from the Source Controller's latest copy of your repo until the cluster matches Git again.
646 | 
647 | **Checkpoint ✅**
648 | Namespace **hello** and all resources are running exactly as defined in your repo. Drift is gone.
649 | 
650 | Press `Ctrl+C` when you see everything running again.
651 | 
652 | ### 📊 Why It Took \~60 Seconds
653 | 
654 | When you set up Flux, you configured:
655 | 
656 | * **Source check interval** → every 30 s (updates the cached Git copy)
657 | * **Reconciliation interval** → every 60 s (compares cluster vs. Git and applies fixes)
658 | 
659 | See the timing in the events log:
660 | 
661 | ```bash
662 | flux events --for Kustomization/hello-app
663 | ```
664 | 
665 | Sample output:
666 | 
667 | ```
668 | Reconciliation finished in 1.2s, next run in 1m0s
669 | ```
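
The two intervals translate into rough upper bounds on how long you wait. The arithmetic can be sketched as below. `to_seconds` is a hypothetical helper that handles only a single integer followed by `s`, `m`, or `h`, and the bounds ignore the time the fetch and apply themselves take.

```shell
# to_seconds DURATION: convert a simple duration such as 30s, 1m or 2h to seconds.
# Hypothetical helper; only handles one integer followed by s, m or h.
to_seconds() {
  n=${1%[smh]}
  case "$1" in
    *s) echo "$n" ;;
    *m) echo $((n * 60)) ;;
    *h) echo $((n * 3600)) ;;
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

SOURCE_INTERVAL=30s      # --interval on the Git source
RECONCILE_INTERVAL=1m    # --interval on the Kustomization

echo "cached Git copy is refreshed every $(to_seconds "$SOURCE_INTERVAL")s"
echo "drift waits at most about $(to_seconds "$RECONCILE_INTERVAL")s for the next reconciliation"
```

Shorter intervals mean faster healing, at the cost of more frequent polling of GitHub and the cluster API.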
670 | 
671 | ### 💡 The Takeaway
672 | 
673 | With GitOps:
674 | 
675 | * Accidental deletions → **rebuilt**
676 | * Manual scaling → **reverted**
677 | * Any config drift → **corrected**
678 | 
679 | **Manual fixes that don't exist in Git literally cannot persist.**
680 | The cluster is always brought back to the declared state - automatically, continuously, and reliably.
681 | 
682 | ## 🏆 Day 2 Complete: You Built a Self-Healing Cluster
683 | 
684 | Today, you didn't just learn about GitOps - you proved it works.
685 | 
686 | ### What You Achieved
687 | 
688 | * **A local Kubernetes cluster** running on kind + Docker
689 | * **Flux installed** and watching your personal GitHub fork
690 | * **Automatic deployments** from Git commits
691 | * **Drift detection and correction** in under a minute
692 | * Confidence that quick fixes and accidental deletes **can't persist**
693 | 
694 | ### Your Journey Today
695 | 
696 | 1. Prepared your workspace with Docker, kind, kubectl, and Git
697 | 2. Forked and cloned your own GitOps repo
698 | 3. Installed Flux and connected it to your fork
699 | 4. Deployed your first app with a Kustomization
700 | 5. Made a Git change and watched Flux apply it automatically
701 | 6. Broke things on purpose, and watched Flux heal them
702 | 
703 | ### Why It Matters
704 | 
705 | You now have a working Git → Flux → Cluster pipeline.
706 | Your cluster state is no longer guesswork; it's **enforced reality**.
707 | 
708 | ### Up Next - Day 3: GitOps in the Cloud
709 | 
710 | Tomorrow, we take your local success to **Azure Kubernetes Service (AKS)**:
711 | 
712 | * Cloud-specific GitOps patterns
713 | * Managing secrets securely
714 | * Production-ready deployment flows
715 | 
716 | The core GitOps loop stays the same - Git defines, Flux enforces - but the stage gets bigger.
717 | 
718 | > [!TIP]
719 | > 
720 | > Between now and Day 3, try:
721 | >
722 | > * Changing the `replicas` in your Git repo and watching Flux update it
723 | > * Breaking things in new ways - Flux will keep fixing them
724 | 
725 | **See you tomorrow for Day 3!**
726 | 
727 | [Continue to Day 3: GitOps in the Cloud →](https://github.com/ahmedmuhi/GitOps-Days/blob/main/Day-3-GitOps-on-AKS-Self-Healing-Cloud-Scale.md)
728 | 
729 | > [!NOTE]
730 | > *Proud of what you built? Share your success! #GitOpsDays*
731 | 


--------------------------------------------------------------------------------
/Day-3-GitOps-on-AKS-Self-Healing-Cloud-Scale.md:
--------------------------------------------------------------------------------
  1 | # 🚀 Day 3 – GitOps in the Cloud with AKS: Same Magic, Bigger Stage
  2 | 
  3 | In [GitOps Day 2](./Day-2-Building-Your-First-Self-Healing-System.md) you built something remarkable on your laptop:
  4 | a Kubernetes system that was truly **resilient and self-healing**. Flux kept every piece in its proper place, automatically repairing whatever chaos you threw at it - scaling changes, deletions, drift. What you saw wasn't just a neat demo; it was the core promise of GitOps delivered: **the cluster always returns to its declared state.**
  5 | 
  6 | Now let's ask a bigger question: *what if that same architectural elegance, that precise reliability, translated identically to the cloud - no matter how sprawling or complex the environment?*
  7 | 
  8 | That's exactly what we'll prove today.
  9 | 
 10 | Welcome back to GitOps Days, where **Git is not just for code anymore**. It's the single source of truth for everything - infrastructure, application configuration, and the glue that holds them together.
 11 | Yesterday, Git defined your laptop cluster. Today, Git defines a **production-grade Azure Kubernetes Service (AKS) cluster** in the cloud.
 12 | 
 13 | Here's the surprising insight: GitOps doesn't fundamentally care where it runs. Local or cloud, small or enterprise-scale, the rules don't change. Same Git. Same Flux. Same self-healing loop. Only the stage beneath it grows larger.
 14 | 
 15 | Today we'll take those principles into the real world:
 16 | 
 17 | * Provision an AKS cluster (using Azure's free credits and low-cost resources for this tutorial).
 18 | * Bootstrap Flux the production way - one command, end-to-end.
 19 | * Deploy the very same Hello app you ran yesterday, now Internet accessible with a single YAML tweak.
 20 | * Break things on purpose, and watch Flux restore order - even Azure load balancers and public IPs.
 21 | 
 22 | By the end, you'll see your local success story scale seamlessly into the cloud. The GitOps loop remains unchanged; the only difference is the size of the playground.
 23 | 
 24 | ## ☁️ Preparing Your Cloud Workspace
 25 | 
 26 | Before we build your first AKS cluster, let's pause and get the **essentials** in place. Moving from your laptop to Azure isn't a huge leap - the setup looks almost the same, just one extra wait while Azure spins things up.
 27 | 
 28 | The key difference: this tutorial won't be completely free.
 29 | Don't worry, though - it's very inexpensive. Running a one-node AKS cluster with a load balancer costs **less than $1 if you leave it up for a full workday**. If you shut it down after this tutorial, you'll spend less than the price of a coffee. And if you're new to Azure, Microsoft gives you **$200 in free credits** when you sign up.
 30 | 
 31 | ### 💰 Cost Snapshot
 32 | 
 33 | * **Control plane**: free on the *Free* tier
 34 | * **Node (Standard_B2s)**: ~$0.04-$0.05 per hour
 35 | * **OS disk + Load Balancer + IP**: ~$0.04 per hour
 36 | * **Total**: ~$0.08-$0.09 per hour (≈$0.70 for 8 hours)
 37 | * **Cleanup**: We'll delete everything at the end so there are no ongoing costs
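
To see where the 8-hour figure comes from, or to estimate a different running time, the arithmetic can be sketched like this. `estimate_cost` is a hypothetical helper, and the hourly rates are the approximate figures from the list above, not live Azure prices.

```shell
# estimate_cost RATE_PER_HOUR HOURS: print the estimated cost in dollars, two decimals.
estimate_cost() {
  awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

echo "8 hours, low estimate:  \$$(estimate_cost 0.08 8)"
echo "8 hours, high estimate: \$$(estimate_cost 0.09 8)"
```

The two results bracket the ≈$0.70 quoted above. Change the second argument to estimate what a forgotten cluster would cost over a longer period.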
 38 | 
 39 | ### ✅ What You'll Need
 40 | 
 41 | 1. **Azure account**
 42 | 
 43 |    * Sign up at [azure.microsoft.com/free](https://azure.microsoft.com/free) if you don't have one
 44 |    * Comes with **$200 in credits**
 45 | 
 46 | 2. **Azure CLI** (version 2.76.0 or later)
 47 | 
 48 |    ```bash
 49 |    # Check your version
 50 |    az --version
 51 | 
 52 |    # Install/update if needed:
 53 |    # Windows: winget install Microsoft.AzureCLI
 54 |    # macOS: brew install azure-cli
 55 |    # Linux: curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
 56 |    ```
 57 |   
 58 | 3. **Tools from Day 2**
 59 | 
 60 |    * `kubectl` - for cluster verification
 61 |    * `flux` CLI - for bootstrapping GitOps
 62 |    * `git` - your single source of truth
 63 | 
 64 | ### 🔑 Logging Into Azure
 65 | 
 66 | 1. Open your terminal and run:
 67 | 
 68 |    ```bash
 69 |    az login
 70 |    ```
 71 | 
 72 |    * A browser window will open for you to authenticate with your Azure account.
 73 |    * Once signed in, the CLI retrieves your tenants and subscriptions.
 74 | 
 75 | 2. If you have access to more than one Azure subscription, you'll see a prompt to **select which subscription** you want to use. The output looks like this:
 76 | 
 77 |    ```
 78 |    [Tenant and subscription selection]
 79 | 
 80 |    No   Subscription name                     Subscription ID                       Tenant
 81 |    ---  ------------------------------------  ------------------------------------  -----------------
 82 |    [1]  Pay-As-You-Go Dev/Test                xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  Default Directory
 83 |    [2]* Visual Studio Enterprise Subscrip...  xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  Default Directory
 84 | 
 85 |    The default is marked with an *.
 86 |    Select a subscription and tenant (Type a number or Enter for no changes):
 87 |    ```
 88 | 
 89 |    Pick the subscription where you're comfortable creating temporary test resources.
 90 | 
 91 | 3. Double-check that the correct subscription is active:
 92 | 
 93 |    ```bash
 94 |    az account show --output table
 95 |    ```
 96 | 
 97 | ### 📦 A Quick Word on Kubernetes Types
 98 | 
 99 | Yesterday, with kind (Kubernetes in Docker), you ran a **local simulation of Kubernetes**. It mimics the control plane and nodes inside containers, perfect for quick experiments.
100 | 
101 | Today with AKS, Microsoft provides a **conformant Kubernetes cluster** - meaning it passes the CNCF's official tests and behaves like upstream Kubernetes. The manifests you wrote yesterday will run here too, but now on real cloud infrastructure managed by Azure.
102 | 
103 | ✅ That's it. You've set up your workspace, logged into Azure, and know what costs to expect.
104 | 👉 In the next section, we'll actually **create your AKS cluster** and see your GitOps loop come alive in the cloud.
105 | 
106 | ## ⚙️ Create Your AKS Cluster
107 | 
108 | With your cloud workspace ready, it's time for the fun part: creating your first **Azure Kubernetes Service (AKS)** cluster.
109 | 
110 | This is the same GitOps loop you built yesterday, just running on **real, cloud-hosted infrastructure**. The steps look familiar: we'll create a logical container (called a *resource group*) to hold your cluster, then provision the cluster itself. Once it's up, you'll connect `kubectl` to it so you can talk to it from your laptop.
111 | 
112 | ⏱️ **Time needed:** about 7-10 minutes (most of it waiting for Azure to spin up the nodes).
113 | 
114 | ### Pick a Region
115 | 
116 | Choose a region that is close to you to minimize latency. For me, that's **Australia East (Sydney)**, but you can use whatever region works best.
117 | 
118 | Example regions often used for learning:
119 | 
120 | * **Australia East** (Sydney) - good from New Zealand/Australia
121 | * **East US 2** or **West US 2** - good for the Americas
122 | * **West Europe** - good for Europe
123 | 
124 | We'll set a shell variable for the location for consistency:
125 | 
126 | ```bash
127 | LOCATION="australiaeast"
128 | ```
129 | 
130 | ### Create a Resource Group
131 | 
132 | A **resource group** in Azure is just a logical container to keep related resources together (cluster, disks, load balancer).
133 | 
134 | ```bash
135 | RESOURCE_GROUP="gitops-prod-rg"
136 | 
137 | az group create \
138 |   --name $RESOURCE_GROUP \
139 |   --location $LOCATION
140 | ```
141 | 
142 | ### Create Your AKS Cluster
143 | 
144 | Now let's create a small AKS cluster - one node is enough for this tutorial.
145 | 
146 | ```bash
147 | az aks create \
148 |   --resource-group $RESOURCE_GROUP \
149 |   --name gitops-prod-aks \
150 |   --location $LOCATION \
151 |   --kubernetes-version 1.33 \
152 |   --node-count 1 \
153 |   --node-vm-size Standard_B2s \
154 |   --tier free \
155 |   --enable-managed-identity \
156 |   --generate-ssh-keys
157 | ```
158 | 
159 | **What this does:**
160 | 
161 | * Spins up an AKS cluster with **Kubernetes 1.33** (supported by AKS at the time of writing; run `az aks get-versions --location $LOCATION --output table` to see what your region offers)
162 | * Uses **1x Standard_B2s VM** for your node (lowest cost, enough for learning)
163 | * Runs in the **Free tier** (no control-plane cost; only the node, disk, and LB/IP billed)
164 | * Generates SSH keys automatically so Azure can access the node if needed
165 | 
166 | > [!TIP]
167 | > **Cost reminder:** At this size, running the cluster for 8 hours costs about **$0.70**. We'll delete it at the end to avoid ongoing costs.
168 | 
169 | ### Connect to Your Cluster
170 | 
171 | Once creation completes (≈7-10 minutes), download the credentials so `kubectl` can talk to your cluster:
172 | 
173 | ```bash
174 | az aks get-credentials \
175 |   --resource-group $RESOURCE_GROUP \
176 |   --name gitops-prod-aks \
177 |   --overwrite-existing
178 | ```
179 | 
180 | This merges the cluster context into your `~/.kube/config` file. From now on, `kubectl` commands will talk to AKS instead of your local kind cluster.
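
If you want to be sure `kubectl` really switched clusters before going further, a small guard like the one below can help. It's a minimal sketch: `assert_context` is a hypothetical helper name, and it assumes `az aks get-credentials` named the new context after the cluster (`gitops-prod-aks`), which is the usual default.

```bash
# Hypothetical helper: confirm kubectl's active context before changing anything.
assert_context() {
  expected=$1
  # `kubectl config current-context` prints the name of the active context.
  current=$(kubectl config current-context 2>/dev/null)
  if [ "$current" = "$expected" ]; then
    echo "kubectl is pointing at $expected"
  else
    echo "Expected context $expected but found: ${current:-none}" >&2
    return 1
  fi
}
```

Run `assert_context gitops-prod-aks`. If it fails, `kubectl config get-contexts` lists the contexts you have available.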
181 | 
182 | ### Verify Your Cluster
183 | 
184 | Check that your cluster is ready:
185 | 
186 | ```bash
187 | kubectl get nodes
188 | ```
189 | 
190 | You should see one node with `STATUS = Ready`. For example:
191 | 
192 | ```bash
193 | NAME                                STATUS   ROLES   AGE   VERSION
194 | aks-nodepool1-12345678-vmss000000   Ready    <none>  2m    v1.33.2
195 | ```
196 | 
197 | ✅ Congratulations - you now have a real, conformant Kubernetes cluster running in Azure!
198 | 
199 | 👉 Before we install Flux, let's first prepare your **student workspace** so you have a safe place in your fork to work in. This ensures your changes won't be overwritten if the upstream examples are updated.
200 | 
201 | ## 📂 Preparing Your Student Workspace
202 | 
203 | Your AKS cluster is live - now it's time to set up the **Git side** of your GitOps loop. This step makes sure you're working in a safe copy of the repo that belongs to you, so nothing you do later gets lost when the examples are updated.
204 | 
205 | ### 1️⃣ Check and Sync Your Fork
206 | 
207 | All your changes will live in *your fork* of the repository. Before starting Day 3:
208 | 
209 | * Open your fork on GitHub (`https://github.com/<your-username>/GitOps-Days`)
210 | * If GitHub shows a banner that says **“This branch is behind ...”**, click **Sync fork → Update branch**
211 | * If it says you're already up to date, no action is needed
212 | 
213 | 👉 This ensures your fork contains the latest Day 3 examples before you clone it locally.
214 | 
215 | ### 2️⃣ Clone Your Fork Locally
216 | 
217 | Always clone **your fork's URL**, not the original repo:
218 | 
219 | ```bash
220 | git clone https://github.com/<your-username>/GitOps-Days.git
221 | cd GitOps-Days
222 | ```
223 | 
224 | This guarantees that your local repo's `origin` remote points to your fork (the one Flux will use later).
225 | 
226 | ### 3️⃣ Copy Day 3 Examples Into Your Workspace
227 | 
228 | Never work directly in the shared `examples/` folder - it may change in future updates. Instead, create your personal workspace:
229 | 
230 | ```bash
231 | mkdir -p student-work/<your-username>/day3
232 | cp -r examples/day3/clusters student-work/<your-username>/day3/
233 | ```
234 | 
235 | You now have:
236 | 
237 | ```
238 | student-work/<your-username>/day3/clusters/aks/apps/hello
239 | ├── deployment.yaml
240 | ├── namespace.yaml
241 | ├── service.yaml
242 | └── kustomization.yaml
243 | ```
244 | 
245 | This is **your safe sandbox**. Any changes you make will live here, untouched by upstream syncs.
246 | 
247 | ### 4️⃣ Set Up GitHub Credentials
248 | 
249 | Flux will commit its own config into your fork when we set it up later, so it needs your GitHub username and a personal access token (PAT).
250 | 
251 | #### 1. Create a PAT in GitHub
252 | 
253 | * Go to **Settings → Developer settings → Personal access tokens**
254 | * Choose **Fine-grained token** (recommended)
255 | * Scope it to **your fork**, with at least `contents: read/write`
256 | * Give it a friendly **name/label** like `gitops-days` so you can recognize it later in your GitHub account.
257 | 
258 | > [!IMPORTANT]
259 | > GitHub will show you a **long random secret string** only once when the token is created (fine-grained tokens start with `github_pat_`, classic ones with `ghp_`).
260 | > Copy this secret immediately - this is the actual token you'll provide to Flux later.
261 | > The name/label you gave the token is just for your GitHub dashboard; Flux never sees that.
262 | 
263 | #### 2. Export Your Credentials
264 | 
265 | Paste your GitHub username and the **secret token string** into your shell as environment variables:
266 | 
267 | ```bash
268 | # macOS/Linux
269 | export GITHUB_USER=<your-username>
270 | export GITHUB_TOKEN=<your-token-from-earlier>
271 | 
272 | # Windows PowerShell
273 | $env:GITHUB_USER="<your-username>"
274 | $env:GITHUB_TOKEN="<your-token-from-earlier>"
275 | ```
276 | 
277 | Later, `flux bootstrap github` will read `GITHUB_TOKEN` from the environment automatically, so you won't be prompted for the token.
278 | You'll pass `$GITHUB_USER` to it explicitly with `--owner`.
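
Before moving on, it can be worth checking that both variables are actually set, since an empty `GITHUB_TOKEN` is easy to miss. A minimal sketch for macOS/Linux shells follows; the `require_env` helper is hypothetical, not part of Flux or the GitHub tooling, and it deliberately never prints the values themselves.

```bash
# Hypothetical helper: fail early if any named variable is unset or empty.
require_env() {
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      return 1
    fi
  done
  echo "All required variables are set."
}
```

Call it as `require_env GITHUB_USER GITHUB_TOKEN`. A non-zero exit status means you should re-run the exports above in the same terminal session.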
279 | 
280 | ✅ At this point you have:
281 | 
282 | * A GitHub fork up to date with Day 3
283 | * A safe student workspace folder in your fork
284 | * Credentials exported so Flux can authenticate with GitHub
285 | 
286 | 👉 Next, we'll install Flux into the cluster using the **bootstrap method** and connect it to your Git repository. That's where the GitOps magic comes alive at cloud scale.
287 | 
288 | ## 🤖 Installing Flux on Your Cloud Cluster
289 | 
290 | **Yesterday you installed Flux piece by piece. Today, you'll do it the production way - with one command that does everything.**
291 | 
292 | On Day 2 you ran `flux install`, then created a `GitRepository`, then a `Kustomization`. That was great for learning the moving parts.
293 | In production, though, teams want one clean step. That's what `flux bootstrap` gives you - and that's what you'll use now.
294 | 
295 | ### 1️⃣ Run the Bootstrap Command
296 | 
297 | With your AKS cluster ready and your student workspace set up, install Flux into the cluster and connect it to your fork:
298 | 
299 | ```bash
300 | flux bootstrap github \
301 |   --owner=$GITHUB_USER \
302 |   --repository=GitOps-Days \
303 |   --branch=main \
304 |   --path=student-work/$GITHUB_USER/day3/clusters/aks \
305 |   --personal
306 | ```
307 | 
308 | * `--owner` → your GitHub username (the one you exported as `$GITHUB_USER`)
309 | * `--repository` → the name of your forked repo (`GitOps-Days`)
310 | * `--branch` → the branch Flux will track (`main` in our case)
311 | * `--path` → the folder inside your repo that Flux will watch (`student-work/your-username/day3/clusters/aks`)
312 | * `--personal` → tells Flux this is your personal fork, not an org repo
313 | 
314 | This one command does all the heavy lifting: it installs the Flux controllers in your cluster *and* pushes Flux's own configuration into your fork. That's why Flux needs your GitHub username and token: it has to be able to commit to your repo.
315 | 
316 | ⏱️ **How long will this take?** Usually 2-3 minutes. You'll know it's working when you see new commits appear in your fork under `student-work/your-username/day3/clusters/aks/flux-system`.
317 | 
318 | ### 2️⃣ Verify Flux is Healthy
319 | 
320 | Once the bootstrap finishes, check that everything came up cleanly:
321 | 
322 | ```bash
323 | flux check
324 | flux get sources git
325 | flux get kustomizations
326 | ```
327 | 
328 | You should see:
329 | 
330 | * Controllers in the `flux-system` namespace are **ready**
331 | * A `GitRepository/flux-system` pointing at your fork
332 | * A `Kustomization/flux-system` applying your root folder
333 | 
334 | If you see all green checks ✅, congratulations - Flux is now installed in AKS, connected to your fork, and running GitOps.
335 | 
336 | That's it. Flux is now alive and watching your repo.
337 | 
338 | 👉 Next, we'll slow down and unpack *what actually happened* during bootstrap - the files Flux created, the `flux-system` namespace, and how Flux now manages itself.
339 | 
340 | ## 🧩 Unpacking Flux's Bootstrap
341 | 
342 | You just ran `flux bootstrap` and confirmed Flux is alive in your AKS cluster. From the outside it looks like one simple step - but behind the scenes, Flux quietly set up quite a bit of scaffolding for you.
343 | Let's slow down and see what actually happened.
344 | 
345 | ### How your repo changed
346 | 
347 | **Before bootstrap**, your Day 3 workspace was simple - it only contained your Hello app:
348 | 
349 | ```
350 | student-work/<your-username>/day3/clusters/aks/
351 | └── apps/
352 |     └── hello/
353 |         ├── deployment.yaml
354 |         ├── namespace.yaml
355 |         ├── service.yaml
356 |         └── kustomization.yaml
357 | ```
358 | 
359 | **After bootstrap**, Flux committed new files into your fork so it could manage itself:
360 | 
361 | ```
362 | student-work/<your-username>/day3/clusters/aks/
363 | ├── kustomization.yaml            # Root recipe (entry point for Flux)
364 | ├── flux-system/                  # Flux's own configuration
365 | │   ├── gotk-components.yaml
366 | │   ├── gotk-sync.yaml
367 | │   └── kustomization.yaml        # Flux system recipe
368 | └── apps/
369 |     └── hello/
370 |         ├── deployment.yaml
371 |         ├── namespace.yaml
372 |         ├── service.yaml
373 |         └── kustomization.yaml    # Hello app recipe
374 | ```
375 | 
376 | #### What This Means
377 | 
378 | * The **new `flux-system` folder** is Flux adding its own configuration to Git.
379 | * The files inside tell Flux *what to install* and *where to look*.
380 | * You don't need to edit these files right now - we'll make a small, safe change soon to connect your app.
381 | 
382 | For now, just remember: **Flux manages your cluster by following recipes (`kustomization.yaml` files) it finds in your repo.**
383 | 
384 | 👉 But there's one detail that trips most people up at first: *why are there so many `kustomization.yaml` files now, and what's the difference between them?*
385 | We'll clear that up in the next section before you make your first change.
386 | 
387 | ## ❓ Why Does Flux Have So Many `kustomization.yaml` Files?
388 | 
389 | If you looked closely at your repo after bootstrap, you may have noticed something surprising: suddenly there are **three different `kustomization.yaml` files.**
390 | 
391 | They look almost identical at first glance, but each one plays a **different role**. Understanding this now will make the next step - deploying your Hello app - much clearer.
392 | 
393 | ### 1️⃣ Root recipe (top level)
394 | 
395 | * **Where:** `student-work/<your-username>/day3/clusters/aks/kustomization.yaml`
396 | * **Role:** The **entry point** for Flux in this repo.
397 | * Think of it as the master to-do list: “apply this folder, then that folder.”
398 | * Right now it's only asking Flux to apply `./flux-system`. Soon, you'll add `./apps/hello`.
399 | 
400 | ### 2️⃣ Flux system recipe
401 | 
402 | * **Where:** `student-work/<your-username>/day3/clusters/aks/flux-system/kustomization.yaml`
403 | * **Role:** Applies Flux's own building blocks.
404 | 
405 |   * `gotk-components.yaml` → installs the Flux controllers and CRDs.
406 |   * `gotk-sync.yaml` → wires Flux to your repo (GitRepository + Kustomization objects).
407 | * This is how Flux keeps itself running.
408 | 
409 | ### 3️⃣ Hello app recipe
410 | 
411 | * **Where:** `student-work/<your-username>/day3/clusters/aks/apps/hello/kustomization.yaml`
412 | * **Role:** Bundles your app's resources together.
413 | * It includes:
414 | 
415 |   * `namespace.yaml` (the Hello namespace)
416 |   * `deployment.yaml` (the pods)
417 |   * `service.yaml` (the LoadBalancer on AKS)
418 | * When Flux sees this file, it will apply all three resources as one unit.
419 | 
420 | ### ✨ How they connect
421 | 
422 | * The **root recipe** tells Flux to apply what's inside the `./flux-system` folder.
423 | * The **flux-system recipe** tells Flux to apply its own configuration YAML files so it can run.
424 | * Soon, you'll add `./apps/hello` to the **root recipe**. When you do, Flux will open the Hello app folder, find the **Hello app recipe**, and apply your app's resources as the recipe describes.
425 | 
426 | So the structure is **layered, not duplicated**:
427 | 
428 | * Root → flux-system → Flux controllers
429 | * Root → hello → your app resources
430 | 
431 | 👉 With that clear, you're ready for the fun part: edit the root recipe to add your Hello app and watch Flux deploy it automatically. That's next.
432 | 
433 | ## ☁️ Your First GitOps Cloud Deployment
434 | 
435 | **Time to deploy to the cloud using pure GitOps. No `kubectl apply`. Just Git.**
436 | 
437 | Remember that Hello app from Day 2 - the one that healed itself when Flux noticed drift? You're about to deploy the exact same app to AKS. The only difference is that now it will be reachable on the internet through a real Azure Load Balancer.
438 | 
439 | ### 1️⃣ Verify Your App Folder
440 | 
441 | You should already have the Hello app folder in your repo (created for you during prep):
442 | 
443 | ```
444 | student-work/<your-username>/day3/clusters/aks/apps/hello/
445 | ├── deployment.yaml
446 | ├── namespace.yaml
447 | ├── service.yaml
448 | └── kustomization.yaml
449 | ```
450 | 
451 | This folder contains everything Flux needs to deploy the app, including the `kustomization.yaml` file that lists the three resources above so they are applied together. 
452 | 
453 | If you don't see this folder, go back to the prep step and copy it again from the examples folder, or recreate it from your GitHub fork.
454 | 
455 | ### 2️⃣ Edit the Root Recipe
456 | 
457 | Now we'll tell Flux to include your Hello app in its reconciliation loop.
458 | 
459 | Open the root `kustomization.yaml`:
460 | 
461 | ```
462 | student-work/<your-username>/day3/clusters/aks/kustomization.yaml
463 | ```
464 | 
465 | Add the Hello app folder under `resources:`:
466 | 
467 | ```yaml
468 | apiVersion: kustomize.config.k8s.io/v1beta1
469 | kind: Kustomization
470 | resources:
471 |   - ./flux-system
472 |   - ./apps/hello       # ← add this line
473 | ```
474 | 
475 | This is the single line that “wires” your hello app into the GitOps loop.
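
If you'd like to see what the Hello app recipe expands to before you push, `kubectl kustomize` can render it locally without touching the cluster. The sketch below assumes you run it from the root of your clone; `preview_recipe` is a hypothetical wrapper that only adds a check for the `kustomization.yaml` file.

```bash
# Hypothetical helper: render a recipe locally without applying anything.
preview_recipe() {
  dir=$1
  if [ ! -f "$dir/kustomization.yaml" ]; then
    echo "No kustomization.yaml found in $dir" >&2
    return 1
  fi
  # `kubectl kustomize` builds the manifests and prints them to stdout.
  kubectl kustomize "$dir"
}
```

For example, `preview_recipe "student-work/$GITHUB_USER/day3/clusters/aks/apps/hello"` should print the rendered Namespace, Deployment, and Service manifests if the recipe is wired correctly. A build error here is cheaper to fix than a failed reconciliation later.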
476 | 
477 | ### 3️⃣ Commit and Push
478 | 
479 | Save your change, then commit and push it to your fork:
480 | 
481 | ```bash
482 | git add .
483 | git commit -m "Add Hello app to root recipe"
484 | git push origin main
485 | ```
486 | 
487 | From now on, Flux will notice this change in Git and apply the Hello app to your cluster automatically.
488 | 
489 | ### 4️⃣ Watch Flux Reconcile
490 | 
491 | Within about a minute, Flux will detect the commit, pull it down, and reconcile.
492 | You can watch it happen live:
493 | 
494 | ```bash
495 | flux logs --follow --tail 20
496 | ```
497 | 
498 | You'll see messages about GitRepository updating and the root Kustomization applying.
499 | 
500 | Press `Ctrl+c` once you see the reconciliation complete.
501 | 
502 | ### 5️⃣ Verify Hello App Resources in AKS
503 | 
504 | Check that the Hello app is now up and running in AKS:
505 | 
506 | ```bash
507 | kubectl get pods -n hello
508 | kubectl get svc -n hello
509 | ```
510 | 
511 | You should see something like:
512 | 
513 | ```
514 | NAME    TYPE           CLUSTER-IP    EXTERNAL-IP      PORT(S)   AGE
515 | hello   LoadBalancer   10.0.123.45   20.248.xxx.xxx   80/TCP    2m
516 | ```
517 | 
518 | * The `Pod` is running inside the `hello` namespace.
519 | * The `Service` is of type `LoadBalancer`. AKS will provision an Azure Load Balancer and assign it a public IP address.
520 | 
521 | >[!TIP]
522 | > ⏳ If `EXTERNAL-IP` shows `<pending>`, wait 30-60 seconds and try again. Azure is still creating the load balancer.
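
Rather than re-running the command by hand, you could poll until the address shows up. This is a minimal sketch, assuming the Service is named `hello` in the `hello` namespace as above; `wait_for_external_ip` and its retry defaults (30 attempts, 10 seconds apart) are illustrative choices, not AKS settings.

```bash
# Hypothetical helper: poll until the Service reports an external IP, or give up.
# Usage: wait_for_external_ip <service> <namespace> [attempts] [delay-seconds]
wait_for_external_ip() {
  svc=$1
  ns=$2
  attempts=${3:-30}
  delay=${4:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # The jsonpath yields an empty string while the IP is still pending.
    ip=$(kubectl get svc "$svc" -n "$ns" \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Timed out waiting for an external IP on $svc" >&2
  return 1
}
```

Calling `wait_for_external_ip hello hello` prints the IP once Azure has assigned it and returns a non-zero status if the attempts run out.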
523 | 
524 | ### 6️⃣ Access Your App in the Browser
525 | 
526 | Once the external IP is assigned, get the URL:
527 | 
528 | ```bash
529 | echo "http://$(kubectl get svc hello -n hello -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
530 | ```
531 | 
532 | Copy the printed URL into your browser.
533 | 
534 | 🎉 **Boom!** Your Hello World app is live on the internet, deployed entirely through GitOps.
535 | 
536 | ### 7️⃣ What Just Happened
537 | 
538 | Let's recap:
539 | 
540 | 1. You edited Git - not the cluster.
541 | 2. Flux noticed the change and reconciled the root recipe.
542 | 3. Flux read the Hello app recipe and applied the app resources into the `hello` namespace.
543 | 4. AKS provisioned a real Azure Load Balancer with a public IP.
544 | 5. Your app became internet-accessible.
545 | 
546 | No manual `kubectl apply`. Just Git → Flux → Cloud.
547 | 
548 | **Same GitOps loop as Day 2 - just running at cloud scale!**
549 | 
550 | 👉 Next, we'll **update the Hello app using GitOps** by changing the replica count in Git and watch Flux reconcile the cluster to match Git.
551 | 
552 | ## 🔄 Updating Your App with GitOps
553 | 
554 | You've just deployed your Hello app to AKS using Flux. That's powerful on its own - but the real beauty of GitOps is how **every change flows the same way**: edit Git → commit → push → Flux reconciles. Let's try that now.
555 | 
556 | ### 1️⃣ Make a Change in Git
557 | 
558 | Right now your app is running with a single replica:
559 | 
560 | ```yaml
561 | # student-work/<your-username>/day3/clusters/aks/apps/hello/deployment.yaml
562 | spec:
563 |   replicas: 1
564 | ```
565 | 
566 | Open this file and change the replicas from **1** to **3**:
567 | 
568 | ```yaml
569 | spec:
570 |   replicas: 3
571 | ```
572 | 
573 | This tells Kubernetes (via Flux) that you want three pods running instead of one.
574 | 
575 | ### 2️⃣ Commit and Push
576 | 
577 | Save the change and commit it to your fork:
578 | 
579 | ```bash
580 | git add student-work/<your-username>/day3/clusters/aks/apps/hello/deployment.yaml
581 | git commit -m "Scale Hello app to 3 replicas"
582 | git push origin main
583 | ```
584 | 
585 | ### 3️⃣ Watch Flux Reconcile
586 | 
587 | Flux checks for new commits about every minute. Once it sees your change, it will reconcile the cluster to match Git.
588 | 
589 | You can watch the rollout live:
590 | 
591 | ```bash
592 | kubectl get deployment hello -n hello -w
593 | ```
594 | 
595 | You'll see the replica count go from **1 → 3**:
596 | 
597 | ```
598 | NAME    READY   UP-TO-DATE   AVAILABLE   AGE
599 | hello   1/1     1            1           5m
600 | hello   2/3     2            2           5m
601 | hello   3/3     3            3           5m
602 | ```
603 | 
604 | Press `Ctrl+c` to stop watching once all 3 replicas are up.
605 | 
606 | ### 4️⃣ What Just Happened
607 | 
608 | * You changed a single line in Git (`replicas: 1 → 3`).
609 | * Flux noticed the new commit and pulled it.
610 | * Flux reconciled your cluster so the deployment matched what Git declared.
611 | * Kubernetes spun up two more pods until the cluster had 3 replicas running.
612 | 
613 | ✅ That's GitOps in action: deployments, updates, and changes all flow through the same loop.
614 | 
615 | 👉 Next, we'll push this one step further: what happens if we “break” things manually in the cluster?
616 | You'll see Flux heal them back into shape automatically.
617 | 
618 | ## 💥 Cloud-Scale Self-Healing
619 | 
620 | **Now that you've deployed and updated your Hello app the GitOps way, let's see what happens when things drift in the cluster without a Git commit.**
621 | 
622 | This is where GitOps really shines - Flux continuously reconciles the cluster against what's in Git, repairing anything that drifts.
623 | 
624 | >[!NOTE]
625 | >To keep the demo feedback fast, we'll first shorten Flux's reconcile interval.
626 | 
627 | ### 1️⃣ Speed Up Self-Healing
628 | 
629 | By default, Flux reconciles every **10 minutes**. For this demo, we'll set it to **1 minute** so you don't have to wait long.
630 | 
631 | Open the file:
632 | 
633 | ```
634 | student-work/<your-username>/day3/clusters/aks/flux-system/gotk-sync.yaml
635 | ```
636 | 
637 | Find the `Kustomization` spec and change:
638 | 
639 | ```yaml
640 | spec:
641 |   interval: 1m0s   # ← Change from 10m0s
642 | ```
643 | 
644 | Commit and push:
645 | 
646 | ```bash
647 | git add student-work/<your-username>/day3/clusters/aks/flux-system/gotk-sync.yaml
648 | git commit -m "Speed up Flux reconciliation to 1 minute"
649 | git push origin main
650 | ```
651 | 
652 | Flux now reconciles once a minute.
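
If you'd rather not wait for the next sync to pick up the new interval, you can ask Flux to reconcile right away and then read the value back from the cluster. This is a sketch that assumes the bootstrap defaults (a `Kustomization` named `flux-system` in the `flux-system` namespace); `sync_and_show_interval` is a hypothetical helper.

```bash
# Hypothetical helper: trigger a sync now, then print the live reconcile interval.
sync_and_show_interval() {
  # Fetch the latest commit and re-apply the root Kustomization immediately.
  flux reconcile kustomization flux-system --with-source || return 1
  # Read the interval from the Kustomization object in the cluster.
  kubectl get kustomization flux-system -n flux-system \
    -o jsonpath='{.spec.interval}'
  echo
}
```

Once your commit has been applied, the value printed by `sync_and_show_interval` should be `1m0s`. If it still shows the old interval, check that the push reached the `main` branch of your fork.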
653 | 
654 | ### 2️⃣ Test 1: Manual Scale Up
655 | 
656 | Imagine a teammate scaled the deployment directly in the cluster to cope with demand, but forgot to commit the change to Git. Flux will notice and correct it.
657 | 
658 | ```bash
659 | # Scale to 5 replicas manually (bypassing GitOps)
660 | kubectl scale deployment hello -n hello --replicas=5
661 | 
662 | # Watch the deployment in real-time
663 | kubectl get deployment hello -n hello -w
664 | ```
665 | 
666 | You should see something like this:
667 | 
668 | ```
669 | NAME    READY   UP-TO-DATE   AVAILABLE   AGE
670 | hello   3/3     3            3           12m
671 | hello   5/5     5            5           12m30s   # Manual change applied
672 | hello   3/3     3            3           13m      # Flux detected the drift and reconciled it back to the Git state
673 | ```
674 | 
675 | Press `Ctrl+c` to stop watching.
676 | ✅ Git's declared state (3 replicas) wins. Manual changes don't stick.
677 | 
678 | ### 3️⃣ Test 2: The Nuclear Option
679 | 
680 | Let's go further: delete the entire namespace. This wipes out the app, service, load balancer, and public IP.
681 | 
682 | ```bash
683 | kubectl delete namespace hello
684 | ```
685 | 
686 | Now watch Flux rebuild it:
687 | 
688 | ```bash
689 | # Watch until the namespace reappears
690 | kubectl get ns -w
691 | ```
692 | 
693 | Within **1-3 minutes** you'll see:
694 | 
695 | * Namespace recreated
696 | * Deployment and pods spun up
697 | * Service re-created
698 | * Azure provisioned a new load balancer and public IP
699 | 
700 | Press `Ctrl+c` once everything is back.
701 | 
702 | ### 4️⃣ Verify Your App is Back
703 | 
704 | Get the new public IP:
705 | 
706 | ```bash
707 | kubectl get svc hello -n hello
708 | ```
709 | 
710 | Print the new URL:
711 | 
712 | ```bash
713 | echo "http://$(kubectl get svc hello -n hello -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
714 | ```
715 | 
716 | Copy the URL and open it in your browser. 🎉 Your app is back online again! Now with a fresh public IP.
717 | 
718 | >[!NOTE]
719 | >Because Azure created a new Load Balancer, the external IP may change each time the service is re-created.
720 | 
721 | ### 5️⃣ Check Out Flux's Records
722 | 
723 | See what Flux recorded during this healing:
724 | 
725 | ```bash
726 | flux events -n flux-system --for Kustomization/flux-system
727 | ```
728 | 
729 | You'll see entries like:
730 | 
731 | ```
732 | Reconciliation finished in 1.3s, next run in 1m0s
733 | Applied 3 resources
734 | Namespace hello created
735 | Deployment hello created
736 | Service hello created
737 | ```
738 | 
739 | ### 6️⃣ What We Learned
740 | 
741 | * **Drift is temporary** - Flux brings the cluster back to what is declared in Git.
742 | * **Manual changes don't persist** - Git is always the source of truth.
743 | * **Deletions are repaired** - automation prevails.
744 | 
745 | What we saw on your laptop in Day 2 now holds true at cloud scale. Flux healed not just pods, but also Azure infrastructure like load balancers and public IPs.
746 | 
747 | 🎉 **Congratulations!** You've now seen GitOps enforce and heal workloads in the cloud, end-to-end.
748 | 
749 | 👉 Next, let's clean up your resources and wrap up Day 3.
750 | 
751 | ## 🧹 Cleanup & Next Steps
752 | 
753 | **Before you log off, let's make sure you delete everything you created today.**
754 | This step is important because cloud resources (like load balancers and node VMs) can continue incurring costs if left running.
755 | 
756 | ### Delete the Resource Group
757 | 
758 | All your AKS resources were created inside a single Azure resource group (`gitops-prod-rg`). Deleting the group removes **everything inside it** in one go.
759 | 
760 | ```bash
761 | # Delete the entire resource group (and all resources it contains)
762 | az group delete --name gitops-prod-rg --yes --no-wait
763 | ```
764 | 
765 | This will remove:
766 | 
767 | * Your AKS cluster
768 | * All worker nodes
769 | * Any load balancers
770 | * Public IP addresses
771 | * Disks, NICs, and other linked resources
772 | 
773 | The `--no-wait` flag tells Azure to start deletion in the background so you don't have to sit around watching it.
774 | 
775 | ### Clean Up Local Kubeconfig (Optional)
776 | 
777 | The Azure resources are gone, but your kubeconfig may still have a context pointing to the deleted cluster.
778 | You can safely remove it:
779 | 
780 | ```bash
781 | kubectl config delete-context gitops-prod-aks || true
782 | kubectl config delete-cluster gitops-prod-aks || true
783 | ```
784 | 
785 | This prevents confusion later if you create a new cluster with the same name.
786 | 
787 | ### Double-Check Azure Resources are Deleted
788 | 
789 | If you want to confirm the resource group is gone:
790 | 
791 | ```bash
792 | az group list --output table
793 | ```
794 | 
795 | You shouldn't see `gitops-prod-rg` anymore.
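
Because the delete ran with `--no-wait`, the group can linger in the list for a few minutes. For a direct yes/no answer, `az group exists` prints `true` or `false`. A minimal sketch follows; `check_group_deleted` is a hypothetical helper wrapped around that command.

```bash
# Hypothetical helper: report whether a resource group has finished deleting.
check_group_deleted() {
  rg=$1
  # `az group exists` prints "true" or "false" for the given group name.
  if [ "$(az group exists --name "$rg")" = "false" ]; then
    echo "$rg is gone"
  else
    echo "$rg still exists (deletion may still be in progress)"
    return 1
  fi
}
```

Run `check_group_deleted gitops-prod-rg` and repeat it after a few minutes if the group is still reported as existing.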
796 | 
797 | ✅ That's it! Your cloud environment is fully cleaned up and won't incur any ongoing costs.
798 | 
799 | 👉 Next, we'll **wrap up** by reflecting on what you achieved in Day 3, followed by a **Day 4 preview** to tee up production GitOps patterns.
800 | 
801 | ## 🎯 Day 3 Wrap-Up
802 | 
803 | **Take a step back: you just ran GitOps at cloud scale!**
804 | 
805 | ### What You Did Today
806 | 
807 | * ✅ **Provisioned AKS** with a single `az` command
808 | * ✅ **Bootstrapped Flux** into the cluster with one command
809 | * ✅ **Understood** how Flux wires itself into Git (root + flux-system + app recipes)
810 | * ✅ **Deployed** your Hello app to the internet using GitOps
811 | * ✅ **Updated** your app by changing replicas in Git (no `kubectl apply`)
812 | * ✅ **Stress-tested drift** by scaling manually and deleting the namespace; Flux healed everything
813 | * ✅ **Cleaned up** your resources safely
814 | 
815 | That's a huge amount of ground to cover in one day.
816 | 
817 | ### Why It Matters
818 | 
819 | You proved that:
820 | 
821 | * GitOps is **infrastructure agnostic**: same workflow, different clusters
822 | * Cloud infrastructure is no different from local: Flux reconciles both
823 | * Self-healing is real, not just a demo trick: even Azure load balancers came back automatically
824 | * Git is now your **single source of truth** for both apps and cluster config
825 | 
826 | This is the promise of GitOps: consistent, reliable, drift-free deployments, no matter the environment.
827 | 
828 | 👉 Next, in **Day 4**, we'll take the leap from “it works” to **production-ready patterns**: multiple environments, image automation, and secrets. That's when you'll see how real-world teams apply these principles at scale.
829 | 
830 | ## 🚀 Coming Up Next (Day 4)
831 | 
832 | Tomorrow, you'll see how to take your GitOps workflow from **capable** to **production-ready**:
833 | 
834 | * **Multi-environment GitOps**
835 |   Manage dev, staging, and production clusters cleanly from Git
836 | 
837 | * **Image automation with GitHub Actions**
838 |   Build and push new app images automatically, with GitOps handling deployment
839 | 
840 | * **Azure Container Registry (ACR)**
841 |   Store your container images securely and integrate them into your GitOps loop
842 | 
843 | * **Secret management**
844 |   Handle sensitive data (like API keys) safely in a GitOps workflow
845 | 
846 | 🎉 Congratulations again on completing Day 3! Take a well-deserved break; you've earned it.
847 | 
848 | See you in **Day 4**, where we'll add the patterns that make GitOps ready for real-world production.


--------------------------------------------------------------------------------