├── .gitignore
├── LICENSE
├── questions
│   ├── 07_secrets.md
│   ├── 08_security.md
│   ├── 10_general.md
│   ├── 09_shift_left.md
│   ├── 02_cd.md
│   ├── 03_gitops.md
│   ├── 06_helm.md
│   ├── 04_docker.md
│   ├── 01_ci.md
│   └── 05_kubernetes.md
├── CONTRIBUTING.md
├── README.md
└── home-assignments
    └── assignment_1.md
/.gitignore:
--------------------------------------------------------------------------------
1 | # ---- Basic Ignored Files ----
2 | .DS_Store
3 | *.log
4 | *.tmp
5 |
6 | # ---- Node / Java / etc. (Add or remove as needed) ----
7 | node_modules/
8 | target/
9 | out/
10 |
11 | # ---- Python ----
12 | __pycache__/
13 | *.pyc
14 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2024 Moran Weissman
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/questions/07_secrets.md:
--------------------------------------------------------------------------------
1 | # 07 – Secrets Management
2 |
3 | Deals with handling sensitive data like passwords, tokens, and keys in DevOps.
4 |
5 | ## Table of Contents
6 | 1. What Are Secrets & Why
7 | 2. Storing and Managing Secrets
8 |
9 | ---
10 |
11 | ## 1) What Are Secrets & Why
12 | **Question:**
13 | Organizations always talk about storing things like passwords or tokens as “secrets.” What’s the big deal?
14 |
15 |
16 | Hints / Key Points
17 |
18 | - It keeps sensitive data out of plain text in code or config.
19 | - Minimizes risk if repos or logs get exposed.
20 | - In Kubernetes, a Secret is only base64-encoded (not encrypted) by default, so a real secrets manager provides better security.
21 |
22 |
23 | ---
24 |
25 | ## 2) Storing and Managing Secrets
26 | **Question (Scenario):**
27 | Your team has many credentials for different microservices. How can you safely store and use them without embedding them in code?
28 |
29 |
30 | Hints / Key Points
31 |
32 | - Use a **secrets manager** (like HashiCorp Vault, AWS Secrets Manager, Azure Key Vault).
33 | - Give each service controlled access.
34 | - Automate secret rotation and auditing, especially if you handle sensitive data.
35 |
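A minimal sketch of the "use them without embedding them in code" idea, assuming the platform injects each secret either as an environment variable or as a file in a mounted directory. The function name `load_secret` and the default directory `/run/secrets` are illustrative assumptions, not part of any specific secrets manager's API.

```python
import os
from pathlib import Path


def load_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Return a secret injected at runtime; never read it from source code."""
    # 1) Prefer an environment variable set by the CI system or orchestrator.
    value = os.environ.get(name)
    if value:
        return value
    # 2) Fall back to a file mounted by the platform (hypothetical location).
    secret_file = Path(secrets_dir) / name.lower()
    if secret_file.is_file():
        return secret_file.read_text(encoding="utf-8").strip()
    # 3) Fail loudly instead of silently using a default credential.
    raise RuntimeError(f"Secret {name!r} was not provided at runtime")
```

A service would call something like `load_secret("DB_PASSWORD")` at startup, so the same image can run in every environment with different credentials.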
36 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing
2 |
3 | Thank you for your interest in improving **DevOps Interview Questions**!
4 |
5 | ## Contribution Steps
6 |
7 | 1. **Fork this repository** on GitHub.
8 | 2. **Create a new branch** in your fork, e.g. `feature/add-docker-questions`.
9 | 3. Add or modify files:
10 | - To add new questions, pick the most relevant file (e.g., `questions/04_docker.md`) or create a new file if needed.
11 | - Follow the question format shown below.
12 | 4. **Commit** and **push** your changes to your branch.
13 | 5. Open a **Pull Request** (PR) against the `main` branch of this repo. Include a short description of your changes.
14 |
15 | ## Question Format
16 |
17 | Here’s a suggested format for new questions in the `questions/` files:
18 |
19 | ```md
20 | ## [Short Title or ID]
21 | **Question:**
22 | (Describe the question, scenario, or challenge)
23 |
24 | **Answer Hints / Key Points:**
25 | - (List possible solution approaches or facts)
26 |
27 | **Why This Matters (Optional):**
28 | - (Explain the real-world relevance)
29 | ```
30 | **Style Guidelines:**
31 | - Keep questions concise and clear.
32 | - Avoid duplicates. If a question overlaps with an existing one, consider merging them or adding details.
33 | - Use neutral or professional language.
34 | - For home assignments, place them in `home-assignments/` with a clear name (e.g., `assignment_2.md`).
35 |
36 | **License**
37 | By contributing, you agree your contributions are licensed under the project’s MIT License.
38 |
39 | Thank you for helping us build a great DevOps interview resource!
--------------------------------------------------------------------------------
/questions/08_security.md:
--------------------------------------------------------------------------------
1 | # 08 – Security
2 |
3 | Focuses on adding security checks to CI/CD, avoiding hardcoded secrets, and building secure images.
4 |
5 | ## Table of Contents
6 | 1. Pipeline Security
7 | 2. Avoiding Hardcoded Secrets
8 | 3. Secure Docker Builds
9 |
10 | ---
11 |
12 | ## 1) Pipeline Security
13 | **Question (Scenario):**
14 | You want to catch vulnerabilities early. How can you add security steps to your CI/CD pipeline?
15 |
16 |
17 | Hints / Key Points
18 |
19 | - **Static code analysis** (SAST) to look for known flaws.
20 | - **Image scanning** for Docker containers.
21 | - **SCA scanning**: analyzes open-source and third-party components in the application for vulnerabilities and licensing issues.
22 | - Dependency checks to flag libraries with known CVEs.
23 |
24 |
25 | ---
26 |
27 | ## 2) Avoiding Hardcoded Secrets
28 | **Question:**
29 | You found actual passwords in your pipeline scripts. How do you clean that up and prevent it from happening again?
30 |
31 |
32 | Hints / Key Points
33 |
34 | - Store secrets in a secure variable store or a secrets manager.
35 | - Don’t commit them to Git.
36 | - Use environment variables or injected secrets at runtime.
37 |
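As a rough illustration of the "prevent it from happening again" part, a CI step could scan scripts for values that look like hardcoded credentials. The patterns below are hypothetical and deliberately small; dedicated secret scanners ship far larger, tuned rule sets.

```python
import re

# Hypothetical patterns for illustration only.
SECRET_PATTERNS = [
    # key = "literal value" (variable references starting with $ are ignored)
    re.compile(r"(?i)\b(password|passwd|token|secret|api_key)\b\s*[:=]\s*['\"]?([^\s'\"$]{6,})"),
    # Shape of an AWS access key ID
    re.compile(r"AKIA[0-9A-Z]{16}"),
]


def find_hardcoded_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that look like hardcoded secrets."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            findings.append((line_no, line.strip()))
    return findings
```

A pipeline could fail the build when `find_hardcoded_secrets` returns anything, while lines such as `export TOKEN=${CI_TOKEN}` pass because the value is injected rather than written out.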
38 |
39 | ---
40 |
41 | ## 3) Secure Docker Builds
42 | **Question:**
43 | Your production app runs in Docker containers. What steps can you take to make sure those containers are secure?
44 |
45 |
46 | Hints / Key Points
47 |
48 | - Use **minimal base images**, patch them regularly.
49 | - Don’t run as root if you can avoid it.
50 | - Scan images for vulnerabilities before deploying.
51 | - Sign images for verification (e.g., with Cosign or Notary).
52 |
53 |
--------------------------------------------------------------------------------
/questions/10_general.md:
--------------------------------------------------------------------------------
1 | # 10 – General / Architecture
2 |
3 | Covers broader DevOps or architecture topics.
4 |
5 | ## Table of Contents
6 | 1. Stateless vs Stateful
7 | 2. DevOps Culture
8 | 3. Scenario: Handling a Critical Outage
9 | 4. Additional General Questions
10 |
11 | ---
12 |
13 | ## 1) Stateless vs Stateful
14 | **Question:**
15 | What’s the main difference between stateless and stateful apps, and how does this affect scaling?
16 |
17 |
18 | Hints / Key Points
19 |
20 | - **Stateless**: Doesn’t keep data in memory across requests; easier to scale horizontally.
21 | - **Stateful**: Maintains sessions or data that might need external storage. Harder to scale.
22 | - Many modern microservices aim to be stateless for simplicity.
23 |
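The scaling difference can be sketched with two toy request handlers. Both class names are made up for this example, and the plain `dict` is only a stand-in for an external store such as a database or cache.

```python
class StatefulCounter:
    """Keeps its count in process memory, so every replica has its own view."""

    def __init__(self):
        self.count = 0

    def handle(self):
        self.count += 1
        return self.count


class StatelessCounter:
    """Holds no state itself; every replica reads and writes a shared store."""

    def __init__(self, store):
        self.store = store  # stand-in for an external store

    def handle(self):
        # Not atomic; a real store would need an atomic increment.
        self.store["count"] = self.store.get("count", 0) + 1
        return self.store["count"]


# Two replicas behind a load balancer, requests alternating between them.
a, b = StatefulCounter(), StatefulCounter()
stateful_results = [a.handle(), b.handle(), a.handle()]    # replicas disagree

shared = {}
c, d = StatelessCounter(shared), StatelessCounter(shared)
stateless_results = [c.handle(), d.handle(), c.handle()]   # consistent view
```

With the stateless variant, any replica can serve any request, which is what makes adding or removing replicas straightforward.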
24 |
25 | ---
26 |
27 | ## 2) DevOps Culture
28 | **Question:**
29 | How would you describe “DevOps culture,” and what does it change compared to older dev-and-ops silos?
30 |
31 |
32 | Hints / Key Points
33 |
34 | - Focus on **collaboration** and **automation**.
35 | - Shared responsibility for stability, performance, and delivery.
36 | - Closer feedback loops, continuous integration, continuous delivery.
37 |
38 |
39 | ---
40 |
41 | ## 3) Scenario: Handling a Critical Outage
42 | **Question (Scenario):**
43 | Production goes down during peak hours. Describe how you’d handle the incident, from detecting the problem to resolving it.
44 |
45 |
46 | Hints / Key Points
47 |
48 | - **Quick triage**: check logs, monitoring, recent changes, and alerts.
49 | - Possibly roll back the last deployment if that caused it.
50 | - Communicate clearly with stakeholders.
51 | - After fixing, do a post-mortem to learn from it.
52 |
53 |
54 | ---
55 |
56 | ## 4) Additional General Questions
57 | - *(Placeholder for any other broad DevOps or architecture topics.)*
58 |
--------------------------------------------------------------------------------
/questions/09_shift_left.md:
--------------------------------------------------------------------------------
1 | # 09 – Shift Left
2 |
3 | The idea of doing testing and security checks earlier in the development process.
4 |
5 | ## Table of Contents
6 | 1. Shift Left Concept
7 | 2. Why Shift Left
8 | 3. CI/CD Tools for Shift Left
9 | 4. Empowering Developers
10 |
11 | ---
12 |
13 | ## 1) Shift Left Concept
14 | **Question:**
15 | What does “Shift Left” mean in software development, and why do we hear about it a lot now?
16 |
17 |
18 | Hints / Key Points
19 |
20 | - Do QA/testing/security as early as possible, not at the end.
21 | - Helps catch problems sooner, which is cheaper to fix.
22 | - Encourages developers to take ownership of quality from the start.
23 |
24 |
25 | ---
26 |
27 | ## 2) Why Shift Left
28 | **Question (Scenario):**
29 | Your testers always find big problems right before release. How could a Shift Left approach help?
30 |
31 |
32 | Hints / Key Points
33 |
34 | - If devs run tests and checks on each commit, issues are spotted earlier.
35 | - Fewer surprises at the end.
36 | - Faster feedback loops and more stable releases.
37 |
38 |
39 | ---
40 |
41 | ## 3) CI/CD Tools for Shift Left
42 | **Question:**
43 | How do tools like Jenkins, GitHub Actions, or Azure Pipelines help a Shift Left approach?
44 |
45 |
46 | Hints / Key Points
47 |
48 | - They let you run automated tests and scans on every push or pull request.
49 | - If something fails, devs see it right away.
50 | - Can even spin up test environments automatically for quick integration checks.
51 |
52 |
53 | ---
54 |
55 | ## 4) Empowering Developers
56 | **Question:**
57 | Some companies give developers full power to define infrastructure (e.g., Helm charts). How does this fit into Shift Left?
58 |
59 |
60 | Hints / Key Points
61 |
62 | - Developers can fix both code and deployment configs early on.
63 | - No waiting for ops teams to fix environment issues.
64 | - Still need guardrails, but it speeds up delivery and fosters more responsibility.
65 |
66 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # DevOps Interview Questions
2 |
3 | A comprehensive collection of DevOps-related interview questions and scenario-based exercises.
4 | This repository is designed to help both **interviewers** and **candidates** prepare for a wide variety of DevOps topics.
5 |
6 | ---
7 |
8 | ## Table of Contents
9 |
10 | 1. [Continuous Integration (CI)](./questions/01_ci.md)
11 | 2. [Continuous Deployment (CD)](./questions/02_cd.md)
12 | 3. [GitOps & ArgoCD](./questions/03_gitops.md)
13 | 4. [Docker](./questions/04_docker.md)
14 | 5. [Kubernetes](./questions/05_kubernetes.md)
15 | 6. [Helm](./questions/06_helm.md)
16 | 7. [Secrets](./questions/07_secrets.md)
17 | 8. [Security](./questions/08_security.md)
18 | 9. [Shift Left](./questions/09_shift_left.md)
19 | 10. [General / Architecture](./questions/10_general.md)
20 | 11. [Home Assignments](./home-assignments/assignment_1.md)
21 |
22 | ---
23 |
24 | ## How to Use
25 |
26 | - **Interviewers**:
27 | - Pick relevant topics (e.g., Docker, Kubernetes) and select scenario-based questions to evaluate practical experience.
28 | - Use the [home assignments](./home-assignments/assignment_1.md) to assess coding and configuration skills under real-world conditions.
29 |
30 | - **Candidates**:
31 | - Start by reviewing questions in your chosen topic (e.g., `01_ci.md`) and try answering on your own.
32 | - Expand your knowledge by reading the hidden hints in the `<details>` blocks.
33 | - Check scenario-based questions for real-world problem-solving practice.
34 |
35 | ---
36 |
37 | ## Contributing
38 |
39 | Contributions are always welcome!
40 | - To add or edit questions, please see our [CONTRIBUTING.md](./CONTRIBUTING.md) for the recommended structure and style guidelines.
41 | - You can also submit additional scenario-based exercises or home assignments that fit the existing format.
42 |
43 | ---
44 |
45 | ## License
46 |
47 | This project is licensed under the [MIT License](./LICENSE).
48 | Feel free to use it as a reference or foundation for your own DevOps interview prep or training material.
49 |
50 | ---
51 |
52 | Enjoy and good luck with your interviews!
53 |
--------------------------------------------------------------------------------
/home-assignments/assignment_1.md:
--------------------------------------------------------------------------------
1 | # Home Assignment 1
2 |
3 | **Objective**: Demonstrate end-to-end DevOps skills, from setting up a local environment to configuring CI/CD, containers, and Kubernetes.
4 |
5 | ---
6 |
7 | ## Topics / Instructions
8 |
9 | ### Prerequisites
10 | - Use a web-server application where the config file path is passed via an environment variable.
11 | - Avoid Terraform for infra creation; everything can be local or in kind/minikube.
12 |
13 | ---
14 |
15 | ### 1. CI
16 |
17 | - **Branch policy**: Explain how you’d set up and enforce a branch policy.
18 | - **Merge to main**: Why require pull requests? Outline best practices (e.g., code reviews, checks).
19 | - **DRY Pipelines**: Use parameterization or templates to avoid repeating code.
20 | - **Versioning**:
21 | - How do you manage application versions after merges?
22 | - How do you let developers work on branches without bumping the version prematurely?
23 | - **Dockerfile**:
24 | - Optimize the Dockerfile; possibly integrate CI steps (tests, lint) within it.
25 | - On PRs, run at least one test (lint/code coverage/unit tests) inside the Docker build.
26 |
27 | ---
28 |
29 | ### 2. CD
30 |
31 | - **ArgoCD App-of-Apps**:
32 | - Use something like kind or minikube. No need for Terraform.
33 | - Deploy and manage your Helm chart via an ArgoCD Application.
34 | - The ArgoCD Application is itself managed by a root “app-of-apps.”
35 | - The application should track the latest commit and be managed by the root app.
36 | - Explain the folder structure and how it all ties together.
37 |
38 | ---
39 |
40 | ### 3. Helm / Kubernetes
41 |
42 | - **Handling API Deprecations**: Provide examples of how you’d manage deprecations when upgrading K8s.
43 | - **Create a Job**: In your Helm chart, include a “hello world” job that must run first during ArgoCD deployment.
44 | - **Secure Environment Variables**: Ensure certain env vars do not appear in the raw Helm chart for security reasons.
45 | - **Config in Git**: The web server’s configuration is managed from Git/Helm, not stored in the image.
46 |
47 | ---
48 |
49 | ## Deliverables
50 |
51 | 1. A small Git repository showing:
52 | - The sample application code or Dockerfile
53 | - A CI pipeline definition (GitHub Actions, Azure Pipelines, or your choice)
54 | - A Helm chart for deployment
55 | - ArgoCD manifests if applicable
56 |
57 | 2. **README** in your repo explaining:
58 | - How to run or test it locally
59 | - How to access the deployed app in kind/minikube
60 |
61 | 3. Explanations of your design decisions in short markdown notes:
62 | - How you tackled versioning
63 | - Why you used certain Docker/Helm features
64 | - How the ArgoCD “app-of-apps” setup is structured
65 |
66 | ---
67 |
68 | ## Tips
69 |
70 | - Demonstrate best practices (small Docker images, secrets management, no sensitive info in code).
71 | - Keep it simple. This is a local assignment, so ephemeral clusters or local containers are fine.
72 | - Focus on clarity, automation, and an end-to-end approach.
73 |
74 | Good luck!
--------------------------------------------------------------------------------
/questions/02_cd.md:
--------------------------------------------------------------------------------
1 | # 02 – Continuous Deployment (CD)
2 |
3 | Automating deployments, promotions across environments, and using different strategies like rolling or canary.
4 |
5 | ## Table of Contents
6 | 1. Promotion Across Environments
7 | 2. K8s Deployment Strategies (Rolling, Blue/Green, Canary)
8 | 3. Examples of Blue/Green & Canary
9 | 4. Bonus: Canary/Blue-Green in Kubernetes
10 | 5. Argo Rollouts Explanation
11 | 6. **Scenario: Slow Canary Deployments with Argo Rollouts**
12 |
13 | ---
14 |
15 | ## 1) Promotion Across Environments
16 | **Question:**
17 | You have `dev`, `staging`, and `prod` environments. How would you automate moving a service from dev to staging, then on to prod, with proper checks along the way?
18 |
19 |
20 | Hints / Key Points
21 |
22 | - **Multi-stage pipeline** with approvals or gating (manual or automated).
23 | - Same config, but different overrides for each environment (Helm or plain YAML).
24 | - Possibly a final manual step for production if your org requires it.
25 |
26 |
27 | ---
28 |
29 | ## 2) K8s Deployment Strategies (Rolling, Blue/Green, Canary)
30 | **Question:**
31 | You want minimal downtime and easy rollback. Compare rolling updates, blue/green deployments, and canary releases.
32 |
33 |
34 | Hints / Key Points
35 |
36 | - **Rolling**: Replaces pods incrementally; simplest to set up, but if the new version is faulty, some requests hit it mid-roll and rolling back takes time.
37 | - **Blue/Green**: Parallel environments; easy rollback by flipping traffic back.
38 | - **Canary**: Gradual traffic shift, letting you observe performance in real time with a fraction of traffic.
39 |
40 |
41 | ---
42 |
43 | ## 3) Examples of Blue/Green & Canary
44 | **Question (Scenario):**
45 | Your new microservice is critical, and you’re debating between blue/green and canary. Give a quick example of each strategy to show how you’d roll out a new version.
46 |
47 |
48 | Hints / Key Points
49 |
50 | - **Blue/Green**:
51 | - Deploy new version in parallel (green).
52 | - Test it, switch traffic once stable.
53 | - Rollback by switching to old (blue) if needed.
54 |
55 | - **Canary**:
56 | - Send small % of traffic to new version.
57 | - Watch metrics, gradually increase if stable.
58 | - Roll back if problems arise.
59 |
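The canary steps above can be sketched as a small decision loop. This is a simplified model, not how any particular tool implements it: the function name, the weight list, and the error-rate threshold are all assumptions made for illustration.

```python
def run_canary(steps, observed_error_rates, max_error_rate=0.01):
    """Walk through canary traffic weights and stop at the first unhealthy step.

    steps: traffic percentages sent to the new version, e.g. [5, 25, 50].
    observed_error_rates: error rate measured at each step (same length).
    """
    if len(steps) != len(observed_error_rates):
        raise ValueError("need one observed error rate per step")
    for weight, error_rate in zip(steps, observed_error_rates):
        if error_rate > max_error_rate:
            # Unhealthy: send all traffic back to the old version.
            return f"rolled back at {weight}%"
    # Every step stayed healthy, so the new version takes all traffic.
    return "promoted to 100%"
```

For example, `run_canary([5, 25, 50], [0.001, 0.05, 0.0])` stops at the 25% step, so only a fraction of users ever saw the faulty version.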
60 |
61 | ---
62 |
63 | ## 4) Bonus: Canary/Blue-Green in Kubernetes
64 | **Question:**
65 | Can you do canary or blue-green deployments directly in Kubernetes without extra tools?
66 |
67 |
68 | Hints / Key Points
69 |
70 | - Yes, but you have to handle traffic splits, labels, or separate services yourself.
71 | - Tools like **Argo Rollouts** or a service mesh (Istio) make it easier.
72 |
73 |
74 | ---
75 |
76 | ## 5) Argo Rollouts Explanation
77 | **Question:**
78 | What is Argo Rollouts, and how does it help with advanced deployment strategies?
79 |
80 |
81 | Hints / Key Points
82 |
83 | - **Kubernetes controller** with its own CRDs; a `Rollout` resource takes the place of a `Deployment`.
84 | - Supports canary, blue-green, progressive rollouts with health checks.
85 | - Integrates with ingress controllers or service meshes for traffic shaping.
86 |
87 |
88 | ---
89 |
90 | ## 6) Scenario: Slow Canary Deployments with Argo Rollouts
91 | **Question:**
92 | You set up Argo Rollouts for canary deployments, but traffic shifting is slower than expected, delaying the rollout.
93 |
94 | - What steps do you take to debug this behavior?
95 | - How can you ensure canary traffic shifts happen faster?
96 |
97 |
98 | Hints / Key Points
99 |
100 | - Check the **Rollout object** for correct strategy, weights, and health checks.
101 | - Make sure the ingress or service mesh is applying traffic splits properly.
102 | - Validate that health checks or success criteria aren’t too strict, causing slow or paused progress.
103 | - Monitor logs or metrics to see if pods are failing readiness or not meeting thresholds.
104 |
105 |
--------------------------------------------------------------------------------
/questions/03_gitops.md:
--------------------------------------------------------------------------------
1 | # 03 – GitOps & ArgoCD
2 |
3 | Focuses on GitOps concepts, ArgoCD usage, the app-of-apps pattern, and ApplicationSets.
4 |
5 | ## Table of Contents
6 | 1. What is GitOps?
7 | 2. ArgoCD as a GitOps Tool
8 | 3. App-of-Apps Pattern
9 | 4. Adding a New App Under a Root “Umbrella”
10 | 5. ApplicationSet (Uses & Generators)
11 | 6. **Scenario: Deprecated APIs Causing ArgoCD Sync Failures**
12 | 7. **Scenario: Namespace Conflicts in ArgoCD Applications**
13 |
14 | ---
15 |
16 | ## 1) What is GitOps?
17 | **Question:**
18 | Your manager says, “We should do GitOps!” How do you explain the core idea of GitOps?
19 |
20 |
21 | Hints / Key Points
22 |
23 | - Infrastructure and app config in Git as the single source of truth.
24 | - A tool (e.g., ArgoCD) automatically updates the cluster to match Git.
25 | - All changes go through PRs, so everything is tracked and auditable.
26 |
27 |
28 | ---
29 |
30 | ## 2) ArgoCD as a GitOps Tool
31 | **Question:**
32 | Why is ArgoCD considered a GitOps solution and not just another deployment tool?
33 |
34 |
35 | Hints / Key Points
36 |
37 | - Watches a Git repo for changes, automatically syncing them into the cluster.
38 | - Declarative approach: no manual clicks or imperative commands.
39 | - Integrates with Helm, Kustomize, or plain YAML.
40 |
41 |
42 | ---
43 |
44 | ## 3) App-of-Apps Pattern
45 | **Question (Scenario):**
46 | You have a bunch of microservices, each with its own repo or chart. You want a single place to manage them. How would you do this in ArgoCD?
47 |
48 |
49 | Hints / Key Points
50 |
51 | - Root “umbrella” application referencing multiple child apps.
52 | - Each child is a separate Helm chart or folder.
53 | - App-of-apps keeps everything organized under one main config.
54 |
55 |
56 | ---
57 |
58 | ## 4) Adding a New App Under a Root “Umbrella”
59 | **Question:**
60 | If you already have a root “app-of-apps” that manages several services, how do you add a brand-new microservice?
61 |
62 |
63 | Hints / Key Points
64 |
65 | - Create a new ArgoCD “child” Application in the same or separate repo.
66 | - Reference it in the root app’s YAML so ArgoCD picks it up.
67 |
68 |
69 | ---
70 |
71 | ## 5) ApplicationSet (Uses & Generators)
72 | **Question:**
73 | What is an ApplicationSet in ArgoCD, and when might you use it? Name at least one generator type.
74 |
75 |
76 | Hints / Key Points
77 |
78 | - **ApplicationSet** is a CRD that can create multiple ArgoCD Applications automatically.
79 | - Used for deploying the same app across multiple clusters or generating apps for each folder/branch.
80 | - Generators: List, Git directory, Cluster, or SCM provider.
81 |
82 |
83 | ---
84 |
85 | ## 6) Scenario: Deprecated APIs Causing ArgoCD Sync Failures
86 | **Question:**
87 | After upgrading Kubernetes, an ArgoCD application fails to sync due to deprecated APIs in its Helm chart (e.g., `extensions/v1beta1` → `apps/v1`).
88 |
89 | - How do you identify which APIs are outdated?
90 | - How do you fix them?
91 |
92 |
93 | Hints / Key Points
94 |
95 | - Check chart templates for old API references.
96 | - Replace them with newer versions (`apps/v1`).
97 | - Tools like **Pluto** can flag deprecated APIs; `helm template` renders the manifests so they can be scanned or reviewed.
98 | - Keep charts updated and test in a staging cluster before upgrading production.
99 |
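To make the "identify which APIs are outdated" step concrete, here is a minimal sketch that scans rendered manifests (for instance, the text output of `helm template`) for outdated `apiVersion` values. The mapping is an assumption limited to two well-known cases; a dedicated tool such as Pluto tracks many more.

```python
import re

# (apiVersion, kind) -> replacement apiVersion. Intentionally incomplete.
REPLACEMENTS = {
    ("extensions/v1beta1", "Deployment"): "apps/v1",
    ("extensions/v1beta1", "Ingress"): "networking.k8s.io/v1",
}


def find_deprecated(manifests: str):
    """Return (kind, old_api, new_api) for each outdated top-level resource."""
    findings = []
    # Rendered output separates YAML documents with a line containing '---'.
    for doc in re.split(r"(?m)^---\s*$", manifests):
        api = re.search(r"(?m)^apiVersion:\s*(\S+)", doc)
        kind = re.search(r"(?m)^kind:\s*(\S+)", doc)
        if api and kind:
            key = (api.group(1), kind.group(1))
            if key in REPLACEMENTS:
                findings.append((kind.group(1), api.group(1), REPLACEMENTS[key]))
    return findings
```

Changing `apiVersion` alone is not always enough, since the newer API may require fields the old one did not, so the chart should still be tested against a staging cluster.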
100 |
101 | ---
102 |
103 | ## 7) Scenario: Namespace Conflicts in ArgoCD Applications
104 | **Question:**
105 | Two ArgoCD applications deploy into the same namespace, causing resource conflicts (e.g., overlapping ConfigMaps). One of them fails to sync.
106 |
107 | - How would you troubleshoot and resolve it?
108 | - How do you prevent it in the future?
109 |
110 |
111 | Hints / Key Points
112 |
113 | - Identify which resources clash by checking logs or ArgoCD sync errors.
114 | - Move each app into its own namespace for isolation, or rename the conflicting resources.
115 | - Have a clear naming/namespace strategy so multiple apps don’t step on each other.
116 |
117 |
--------------------------------------------------------------------------------
/questions/06_helm.md:
--------------------------------------------------------------------------------
1 | # 06 – Helm
2 |
3 | Helm chart usage, templating, hooks, etc.
4 |
5 | ## Table of Contents
6 | 1. Chart Resource Requests/Limits
7 | 2. Requests vs Limits Differences
8 | 3. Sequence of Operations (Hooks)
9 | 4. Multiple Jobs Order
10 | 5. Dynamic Resource Generation
11 | 6. Handling API Deprecations
12 | 7. TPL Helper File
13 | 8. (Optional) Scenario: Deprecated APIs Causing ArgoCD Sync Failures w/ Helm
14 |
15 | ---
16 |
17 | ## 1) Chart Resource Requests/Limits
18 | **Question:**
19 | Why do we set resource requests and limits in a Helm chart, and how do we manage them across environments?
20 |
21 |
22 | Hints / Key Points
23 |
24 | - Ensures pods have enough CPU/memory, prevents resource hogging.
25 | - Helm `values.yaml` can differ for dev vs prod.
26 | - Good for cost control and stability.
27 |
28 |
29 | ---
30 |
31 | ## 2) Requests vs Limits Differences
32 | **Question:**
33 | What’s the difference between resource requests and limits in Kubernetes, and how does Helm simplify handling them?
34 |
35 |
36 | Hints / Key Points
37 |
38 | - **Requests**: minimum guaranteed resources; the scheduler uses them to place pods.
39 | - **Limits**: maximum allowed; exceeding a CPU limit causes throttling, exceeding a memory limit causes an OOMKill.
40 | - Helm: store these in `values.yaml` for easy environment overrides.
41 |
42 |
43 | ---
44 |
45 | ## 3) Sequence of Operations (Hooks)
46 | **Question:**
47 | You want a job to run before your main app starts. How do you do that in Helm?
48 |
49 |
50 | Hints / Key Points
51 |
52 | - Use a **Helm hook** annotation on that job (`pre-install`, and `pre-upgrade` if it must also run on upgrades).
53 | - The job runs first; if it succeeds, Helm proceeds to install the rest.
54 | - Weights can fine-tune the order of multiple hooks.
55 |
56 |
57 | ---
58 |
59 | ## 4) Multiple Jobs Order
60 | **Question:**
61 | If you have multiple jobs that need to run in a certain sequence, how can Helm handle that?
62 |
63 |
64 | Hints / Key Points
65 |
66 | - Hooks with **weights** (lower weight runs first).
67 | - Or a single job that does tasks in order.
68 | - Sometimes separate subcharts if they’re truly independent.
69 |
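The weight rule can be illustrated with a short sort. This sketch assumes each hook is described by a name and the string value of its `helm.sh/hook-weight` annotation; it orders by weight and then name, and leaves out the resource-kind tiebreak that Helm also applies.

```python
def hook_execution_order(hooks):
    """Return hook names in the order their weights imply (lowest first).

    hooks: list of dicts with 'name' and an optional 'weight' string,
    mirroring the helm.sh/hook-weight annotation (treated as 0 if absent).
    """
    ordered = sorted(hooks, key=lambda h: (int(h.get("weight", "0")), h["name"]))
    return [h["name"] for h in ordered]


# Hypothetical jobs in a chart.
jobs = [
    {"name": "seed-data", "weight": "5"},
    {"name": "db-create", "weight": "-5"},
    {"name": "db-migrate"},  # no annotation, so weight 0
]
order = hook_execution_order(jobs)
```

Negative weights are allowed, which makes it easy to slot a new job ahead of existing ones without renumbering them.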
70 |
71 | ---
72 |
73 | ## 5) Dynamic Resource Generation
74 | **Question:**
75 | You want to create several similar resources from a list in `values.yaml`. How do you do that with Helm?
76 |
77 |
78 | Hints / Key Points
79 |
80 | - Use `{{- range .Values.myItems }}` in the template.
81 | - Each item in the list gets its own resource.
82 | - `_helpers.tpl` can keep repeated logic DRY.
83 |
84 |
85 | ---
86 |
87 | ## 6) Handling API Deprecations
88 | **Question:**
89 | A new Kubernetes version might deprecate older APIs. How do you update your Helm charts to handle that?
90 |
91 |
92 | Hints / Key Points
93 |
94 | - Replace old references (e.g., `extensions/v1beta1`) with `apps/v1`.
95 | - Tools like **pluto** can scan for deprecated usage.
96 | - Test in a lower environment or staging cluster first.
97 |
98 |
99 | ---
100 |
101 | ## 7) TPL Helper File
102 | **Question:**
103 | What is the `tpl` function in Helm, and why might you use it?
104 |
105 |
106 | Hints / Key Points
107 |
108 | - `tpl` evaluates a string as a Helm template at render time.
109 | - Good for user-provided or nested templates in `values.yaml`.
110 | - Keep advanced logic or partials in `_helpers.tpl`.
111 |
112 |
113 | ---
114 |
115 | ## 8) (Optional) Scenario: Deprecated APIs Causing ArgoCD Sync Failures w/ Helm
116 | **Question:**
117 | After upgrading the cluster, your Helm chart can’t sync in ArgoCD because it references deprecated APIs.
118 |
119 | - How do you find which APIs are outdated?
120 | - How do you fix them in your chart?
121 |
122 |
123 | Hints / Key Points
124 |
125 | - Look at the chart’s templates for older API versions (e.g., `extensions/v1beta1`).
126 | - Update them to the newer equivalents (e.g., `apps/v1`).
127 | - **Helm 3.13+** offers a **server-side dry run** via:
128 | ```bash
129 | helm upgrade --dry-run=server ...
130 | ```
131 | This checks the manifests against the actual cluster APIs, catching potential deprecations or validation issues before you apply them.
132 | - You can also use tools like **Pluto** to scan for deprecated or removed APIs.
133 | - Test in a non-production cluster to confirm everything works with the new APIs.
134 |
135 |
--------------------------------------------------------------------------------
/questions/04_docker.md:
--------------------------------------------------------------------------------
1 | # 04 – Docker
2 |
3 | Discusses container basics, Dockerfiles, and best practices.
4 |
5 | ## Table of Contents
6 | 1. Docker vs Container
7 | 2. Dockerfile: ENTRYPOINT vs CMD
8 | 3. Image Size Optimization
9 | 4. Build-time vs Runtime Secrets
10 | 5. Reducing Docker Build Time
11 | 6. Docker-in-Docker or Alternatives
12 | 7. Other Container Build Tools
13 | 8. **Scenario: Kaniko Image Build Failing Without Docker Daemon**
14 | 9. (Optional) RBAC for Kaniko? (See `05_kubernetes.md` for RBAC if needed)
15 |
16 | ---
17 |
18 | ## 1) Docker vs Container
19 | **Question:**
20 | What’s the difference between Docker as a tool and the concept of a container?
21 |
22 |
23 | Hints / Key Points
24 |
25 | - **Docker** is a platform for building/managing containers.
26 | - A **container** is an isolated environment bundling the app and dependencies.
27 | - Docker is popular, but other tools and runtimes exist (Podman, containerd).
28 |
29 |
30 | ---
31 |
32 | ## 2) Dockerfile: ENTRYPOINT vs CMD
33 | **Question:**
34 | A coworker is confused about `ENTRYPOINT` and `CMD` in a Dockerfile. How do you explain the difference?
35 |
36 |
37 | Hints / Key Points
38 |
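39 | - **ENTRYPOINT**: The executable the container runs by default; in exec form, arguments passed to `docker run` are appended to it, and it is only replaced with `--entrypoint`.
40 | - **CMD**: Default arguments for `ENTRYPOINT` (or the default command if no `ENTRYPOINT` is set); easily overridden at runtime.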
41 | - Typically, you set `ENTRYPOINT` to the main process and use `CMD` for optional flags.
42 |
43 |
44 | ---
45 |
46 | ## 3) Image Size Optimization
47 | **Question (Scenario):**
48 | You have a huge Docker image based on a full JDK. You want to reduce its size. How do you do it, and why does size matter?
49 |
50 |
51 | Hints / Key Points
52 |
53 | - Use **multi-stage builds** or switch to a smaller base (JRE or Alpine).
54 | - Clean up leftover artifacts (logs, caches).
55 | - Smaller images pull faster, use less storage, and expose a smaller attack surface.
56 |
57 |
58 | ---
59 |
60 | ## 4) Build-time vs Runtime Secrets
61 | **Question:**
62 | You need to use some private tokens or credentials during the build. How do you add them without exposing them in the final image?
63 |
64 |
65 | Hints / Key Points
66 |
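67 | - Use BuildKit secret mounts (`RUN --mount=type=secret`) for build-time secrets; avoid `ARG` or `ENV`, whose values can be recovered from the image history or metadata.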
68 | - Inject secrets at runtime via environment variables or secret managers.
69 | - Don’t store secrets in plain text in the Dockerfile or version control.
70 |
71 |
72 | ---
73 |
74 | ## 5) Reducing Docker Build Time
75 | **Question (Scenario):**
76 | Your Docker builds take too long, especially for .NET or Java projects. How can you make them faster?
77 |
78 |
79 | Hints / Key Points
80 |
81 | - Reorder Dockerfile steps to install dependencies first, so you can cache them.
82 | - Use multi-stage builds to keep final images small.
83 | - Possibly store or share a build cache between CI runs.
84 |
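Why step order matters for caching can be shown with a simplified model: once one step's inputs change, that step and every later step run again. The sketch below is an illustration under that assumption, not Docker's cache logic, and the step labels are hypothetical.

```python
from typing import List


def steps_to_rebuild(steps: List[str], changed: List[str]) -> List[str]:
    """Return the steps that re-run: the first changed step and all after it."""
    for index, step in enumerate(steps):
        if step in changed:
            return steps[index:]
    return []  # nothing changed, every layer comes from cache


# Dependencies are copied and resolved before the source code.
deps_first = ["copy build file", "resolve dependencies", "copy source", "compile"]
# Everything is copied in one step at the top.
source_first = ["copy build file and source", "resolve dependencies", "compile"]

# A source-only change:
print(steps_to_rebuild(deps_first, ["copy source"]))
print(steps_to_rebuild(source_first, ["copy build file and source"]))
```

With the dependency steps first, a source-only change skips dependency resolution; with a single copy at the top, every step runs again.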
85 |
86 | ---
87 |
88 | ## 6) Docker-in-Docker or Alternatives
89 | **Question:**
90 | Sometimes we build Docker images inside a container during CI. Why do we do that, and what are some alternatives?
91 |
92 |
93 | Hints / Key Points
94 |
95 | - **Docker-in-Docker**: runs a Docker daemon inside a container, which usually requires privileged mode and weakens isolation.
96 | - Alternatives: **Kaniko**, **Buildah**, **Podman** to build without a Docker daemon.
97 | - Daemonless builders reduce the need for privileged mode in CI.
98 |
99 |
100 | ---
101 |
102 | ## 7) Other Container Build Tools
103 | **Question:**
104 | If Docker wasn’t an option, what else could you use to build and run containers?
105 |
106 |
107 | Hints / Key Points
108 |
109 | - **Podman**, **Buildah**, **containerd**.
110 | - For Java, **Jib** can build containers without a Docker daemon.
111 | - Some environments run containerd directly; rkt is discontinued and rarely seen nowadays.
112 |
113 |
114 | ---
115 |
116 | ## 8) Scenario: Kaniko Image Build Failing Without Docker Daemon
117 | **Question:**
118 | You are using **Kaniko** to build and push Docker images in a Kubernetes Pod, but the build fails due to the absence of a Docker daemon.
119 |
120 | - What steps would you take to troubleshoot Kaniko’s failure?
121 | - How can you configure it to build images without a Docker daemon?
122 |
123 |
124 | Hints / Key Points
125 |
126 | - Verify the **Kaniko** Pod has access to your Dockerfile, context, and registry credentials.
127 | - Kaniko doesn’t need a Docker daemon; pass `--context` and `--destination` parameters correctly.
128 | - Ensure you have the right RBAC permissions if it needs to create or manage certain resources in K8s.
129 |
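A small helper can make the required executor arguments explicit and fail early when one is missing. This is a sketch: `--dockerfile`, `--context`, and `--destination` are Kaniko executor flags, while the function name, paths, and registry URL are hypothetical.

```python
from typing import List


def kaniko_args(context: str, destination: str, dockerfile: str = "Dockerfile") -> List[str]:
    """Build the argument list for the Kaniko executor container."""
    if not context:
        raise ValueError("context is required (for example a dir:// path)")
    if not destination:
        raise ValueError("destination is required (registry/repository:tag)")
    return [
        f"--dockerfile={dockerfile}",
        f"--context={context}",
        f"--destination={destination}",
    ]


# Hypothetical values for a build running inside a Pod.
args = kaniko_args(
    context="dir:///workspace",
    destination="registry.example.com/team/app:1.0.0",
)
print(args)
```

The resulting list would go in the `args` field of the Kaniko container spec. Registry credentials still have to be mounted separately.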
130 |
--------------------------------------------------------------------------------
/questions/01_ci.md:
--------------------------------------------------------------------------------
1 | # 01 – Continuous Integration (CI)
2 |
3 | Focuses on building code, versioning, branching strategies, and ensuring every commit is validated properly.
4 |
5 | ## Table of Contents
6 | 1. Clean Build Environments
7 | 2. Git & Branch Policies
8 | 3. Testing from Dev/Personal Branches (Version Bump Avoidance)
9 | 4. Code Validation in CI
10 | 5. Git Flow vs OneFlow
11 | 6. Unit / Integration / E2E Tests
12 | 7. Scenario: Slow Build Times
13 | 8. **Scenario: Double-Build Problem in Docker CI Pipeline**
14 | 9. **Scenario: Docker Build Time Optimization in CI/CD**
15 |
16 | ---
17 |
18 | ## 1) Clean Build Environments
19 | **Question (Scenario):**
20 | Your builds sometimes fail because leftover files from previous runs cause unexpected results. How would you make sure each build runs in a clean, consistent environment?
21 |
22 |
23 | Hints / Key Points
24 |
25 | - **Containerized** or **ephemeral** build agents ensure no leftover files or dependencies.
26 | - Reproducible environment: each run starts fresh.
27 | - Avoid “works on my machine” issues by isolating dependencies.
28 |
29 |
30 | ---
31 |
32 | ## 2) Git & Branch Policies
33 | **Question:**
34 | When setting up a new Git repository for a service, how do you make sure collaboration and merging are done in a controlled, high-quality way?
35 |
36 |
37 | Hints / Key Points
38 |
39 | - Require **pull requests** to merge into `main`.
40 | - Use **branch protections**: code reviews, mandatory checks, or tests.
41 | - Adopt a branching strategy (Git Flow, OneFlow, or trunk-based).
42 |
43 |
44 | ---
45 |
46 | ## 3) Testing from Dev/Personal Branches (Version Bump Avoidance)
47 | **Question:**
48 | You want developers to build and test code on a personal branch without bumping the official app version. How do you do that?
49 |
50 |
51 | Hints / Key Points
52 |
53 | - Only bump the version if building from `main` (after merging).
54 | - Use **ephemeral tags** (like `dev-`) for personal branches.
55 | - This prevents cluttering your production semver with test builds.
56 |
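One possible tagging rule is sketched below. The `dev-<branch>-<sha>` scheme, the function name, and the sample values are assumptions for illustration; only the release version is used on `main`.

```python
import re


def image_tag(branch: str, release_version: str, short_sha: str) -> str:
    """Return the release version on main, otherwise an ephemeral dev tag."""
    if branch == "main":
        return release_version
    # Keep only characters that are valid in an image tag.
    safe_branch = re.sub(r"[^A-Za-z0-9_.-]+", "-", branch).strip("-.")
    # Image tags are limited to 128 characters.
    return f"dev-{safe_branch}-{short_sha}"[:128]


print(image_tag("main", "1.4.0", "a1b2c3d"))
print(image_tag("feature/login-form", "1.4.0", "a1b2c3d"))
```

Because personal branches never produce a semver tag, the release history stays free of test builds.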
57 |
58 | ---
59 |
60 | ## 4) Code Validation in CI
61 | **Question:**
62 | What kind of checks do you usually include in a CI pipeline to keep code quality high?
63 |
64 |
65 | Hints / Key Points
66 |
67 | - **Linting** and **static analysis** for code style and potential security flaws.
68 | - **Unit tests** to verify logic.
69 | - Optional checks: integration tests, code coverage, or style checks.
70 |
71 |
72 | ---
73 |
74 | ## 5) Git Flow vs OneFlow
75 | **Question:**
76 | Explain the main differences between Git Flow and OneFlow, and give an example of when you might pick one over the other.
77 |
78 |
79 | Hints / Key Points
80 |
81 | - **Git Flow**:
82 |   - Has a `develop` branch and separate `release/` branches.
83 |   - Good for scheduled releases or big features that need isolation.
84 | - **OneFlow**:
85 |   - Fewer branches; merges features straight into `main`.
86 |   - Easier for continuous delivery or smaller teams who release often.
87 |
88 |
89 | ---
90 |
91 | ## 6) Unit / Integration / E2E Tests
92 | **Question (Scenario):**
93 | A new developer asks how different types of tests fit into the CI pipeline. Can you explain the roles of unit, integration, and end-to-end tests?
94 |
95 |
96 | Hints / Key Points
97 |
98 | - **Unit tests**: Check individual pieces of code in isolation.
99 | - **Integration tests**: Validate how services or components interact (e.g., API to DB).
100 | - **E2E tests**: Full user flow from start to finish, mirroring real production usage.
101 |
102 |
103 | ---
104 |
105 | ## 7) Scenario: Slow Build Times
106 | **Question (Scenario):**
107 | The CI builds are getting slower and slower, which annoys developers. What could you do to speed things up?
108 |
109 |
110 | Hints / Key Points
111 |
112 | - **Caching** dependencies so you don’t rebuild or re-download everything on each run.
113 | - Splitting large monolithic builds into smaller jobs or microservices.
114 | - Using multi-stage Docker builds or ephemeral agents to reduce overhead.
115 |
116 |
117 | ---
118 |
119 | ## 8) Scenario: Double-Build Problem in Docker CI Pipeline
120 | **Question:**
121 | You notice your CI pipeline builds a Docker image twice—once for security scanning (e.g., Trivy) and again for deployment. This doubles build time and resource usage.
122 |
123 | **How would you troubleshoot and resolve this?**
124 |
125 |
126 | Hints / Key Points
127 |
128 | - Identify where each build step is triggered; check pipeline definitions or separate workflows.
129 | - Consolidate scanning and deployment into **one** pipeline stage, or reuse the built image artifact.
130 | - Push a single built image to a temporary registry, run scans on that image, then deploy if it’s clean.
131 |
132 |
133 | ---
134 |
135 | ## 9) Scenario: Docker Build Time Optimization in CI/CD
136 | **Question (Scenario):**
137 | Your Docker builds in CI/CD are taking longer than expected, especially for a .NET or Java app with large dependencies.
138 |
139 | **What can you do to improve the build time?**
140 |
141 |
142 | Hints / Key Points
143 |
144 | - Optimize your **Dockerfile** structure: place static dependency installation steps at the top for caching.
145 | - Use **multi-stage builds** to keep final images small.
146 | - Possibly store a “build cache” or layer cache between runs so you don’t rebuild everything from scratch each time.
147 |
148 |
--------------------------------------------------------------------------------
/questions/05_kubernetes.md:
--------------------------------------------------------------------------------
1 | # 05 – Kubernetes
2 |
3 | Core K8s topics: pods, services, ingress, CRDs, and debugging.
4 |
5 | ## Table of Contents
6 | 1. Deploying a New Application
7 | 2. Services vs Ingress
8 | 3. CRDs & Operators
9 | 4. Logs & Crash Troubleshooting
10 | 5. Resource in “Terminating” State
11 | 6. Debugging Inside a Container
12 | 7. Editing a Resource Live
13 | 8. Service-to-Service Communication
14 | 9. Sidecar/Init Containers
15 | 10. Bonus: Resource Name Limits
16 | 11. **Scenario: RBAC Permissions for Kaniko Builds**
17 | 12. **Scenario: Kubernetes Pod Logs Lost After Crash**
18 | 13. **Scenario: Resource Exhaustion (OOMKills) in Pods**
19 |
20 | ---
21 |
22 | ## 1) Deploying a New Application
23 | **Question:**
24 | If you have a new microservice to run in Kubernetes, what resources do you usually set up?
25 |
26 |
27 | Hints / Key Points
28 |
29 | - Usually a **Deployment** (or StatefulSet if stateful) and a **Service**.
30 | - Possibly an Ingress or LoadBalancer if external access is required.
31 | - Might use Helm for templating.
32 |
33 |
34 | ---
35 |
36 | ## 2) Services vs Ingress
37 | **Question:**
38 | What does a Service do in Kubernetes, and how is it different from an Ingress?
39 |
40 |
41 | Hints / Key Points
42 |
43 | - **Service**: Exposes pods at a stable address; common types are ClusterIP, NodePort, and LoadBalancer.
44 | - **Ingress**: Defines routing rules for HTTP/HTTPS traffic to one or more Services.
45 |
46 |
47 | ---
48 |
49 | ## 3) CRDs & Operators
50 | **Question:**
51 | How do CRDs (Custom Resource Definitions) and Operators help you extend Kubernetes beyond its default features?
52 |
53 |
54 | Hints / Key Points
55 |
56 | - A **CRD** adds a new type of object (like “MyDatabase”) to the cluster.
57 | - An **Operator** watches the custom resources defined by these CRDs and automates tasks (install, upgrade, manage).
58 | - Good for complex/stateful apps so K8s can handle them more natively.
59 |
60 |
61 | ---
62 |
63 | ## 4) Logs & Crash Troubleshooting
64 | **Question (Scenario):**
65 | Your app keeps crashing after a few minutes in Kubernetes. How would you check what’s going on?
66 |
67 |
68 | Hints / Key Points
69 |
70 | - Inspect logs from the pod/container.
71 | - Check events or error messages for the pod.
72 | - See if it’s an OOM kill, code exception, or config problem.
73 |
74 |
75 | ---
76 |
77 | ## 5) Resource in “Terminating” State
78 | **Question:**
79 | Sometimes a pod or other resource is stuck “Terminating” for a long time. Why could that happen, and what might you do?
80 |
81 |
82 | Hints / Key Points
83 |
84 | - **Finalizers** might be blocking deletion.
85 | - The app might not handle termination signals well, so it never exits.
86 | - You can remove the finalizer or do a force delete if absolutely needed.
87 |
88 |
89 | ---
90 |
91 | ## 6) Debugging Inside a Container
92 | **Question:**
93 | You need to run commands inside a container for debugging. How do you do that in a Kubernetes environment?
94 |
95 |
96 | Hints / Key Points
97 |
98 | - Typically use `kubectl exec` to run commands inside the container.
99 | - If the Pod has multiple containers, specify which one with `-c`.
100 | - Make sure you have the right RBAC privileges.
101 | - Use `kubectl debug` to create a temporary debugging container in the same Pod and use that to debug the target container.
102 |
103 |
104 | ---
105 |
106 | ## 7) Editing a Resource Live
107 | **Question (Scenario):**
108 | You spot a small config mistake in a live resource. How would you fix it right away in the cluster? What risks might that cause?
109 |
110 |
111 | Hints / Key Points
112 |
113 | - You can **edit** the resource in place with `kubectl edit`, but that can cause drift from Git or Helm config.
114 | - If you’re using GitOps, the next sync might overwrite your manual fix.
115 | - Best practice: fix it in your config repo or chart too.
116 |
117 |
118 | ---
119 |
120 | ## 8) Service-to-Service Communication
121 | **Question:**
122 | How do different services within the same cluster talk to each other?
123 |
124 |
125 | Hints / Key Points
126 |
127 | - **Cluster DNS**: `<service-name>.<namespace>.svc.cluster.local`.
128 | - A Service provides a stable endpoint, even if pod IPs change.
129 |
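In-cluster Service names are built from the Service name, its namespace, the literal `svc`, and the cluster domain. The helper below is a minimal sketch: the function name and sample values are hypothetical, and `cluster.local` is only the common default domain.

```python
def service_fqdn(service: str, namespace: str = "default", cluster_domain: str = "cluster.local") -> str:
    """Return the fully qualified in-cluster DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


# Hypothetical Service "orders" in namespace "shop".
print(service_fqdn("orders", "shop"))
```

Pods in the same namespace can usually use the short name (`orders`), while cross-namespace calls need at least `orders.shop`.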
130 |
131 | ---
132 |
133 | ## 9) Sidecar/Init Containers
134 | **Question:**
135 | What are sidecar containers and init containers, and why might you use them?
136 |
137 |
138 | Hints / Key Points
139 |
140 | - **Init** containers run first to do setup tasks (migrations, config).
141 | - **Sidecar** containers run alongside the main app for logging, proxying, etc.
142 | - Helps separate concerns in a single pod.
143 |
144 |
145 | ---
146 |
147 | ## 10) Bonus: Resource Name Limits
148 | **Question:**
149 | Is there a name length limit or other format rule for K8s resources?
150 |
151 |
152 | Hints / Key Points
153 |
154 | - Most names must be valid **DNS subdomain** names (lowercase alphanumerics, dashes, and dots; up to 253 chars).
155 | - Some resource types, such as Services and Namespaces, require a stricter **DNS label** (up to 63 chars, no dots).
156 |
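The DNS label rule can be checked before a manifest is ever applied. This is a sketch of the label check only (lowercase alphanumerics and dashes, starting and ending with an alphanumeric, at most 63 characters); the function name is hypothetical.

```python
import re

# Lowercase alphanumerics or '-', with an alphanumeric at both ends.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_dns_label(name: str) -> bool:
    """Return True if name is a valid DNS label of at most 63 characters."""
    return len(name) <= 63 and _DNS_LABEL.fullmatch(name) is not None


for candidate in ["my-service", "My-Service", "-leading-dash", "a" * 64]:
    print(candidate[:20], is_dns_label(candidate))
```

Running a check like this in CI catches generated names (for example, release name plus chart name) that exceed the limit.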
157 |
158 | ---
159 |
160 | ## 11) Scenario: RBAC Permissions for Kaniko Builds
161 | **Question:**
162 | You’re running an Azure DevOps agent in Kubernetes, which uses Kaniko to build and push Docker images. It’s failing because it can’t create needed resources.
163 |
164 | - How would you troubleshoot the missing permissions?
165 | - How can RBAC be configured to give Kaniko the required access?
166 |
167 |
168 | Hints / Key Points
169 |
170 | - Check the Pod logs for permission errors (e.g., “forbidden”).
171 | - Assign a **ServiceAccount** with an appropriate Role/RoleBinding that allows creating ConfigMaps, Pods, etc.
172 | - Verify that the agent is using this ServiceAccount when building.
173 |
174 |
175 | ---
176 |
177 | ## 12) Scenario: Kubernetes Pod Logs Lost After Crash
178 | **Question:**
179 | A Kubernetes Pod crashes unexpectedly, and its logs are lost because the container restarts too quickly.
180 |
181 | - How would you recover logs from a previously crashed container?
182 | - How can you ensure logs are always accessible?
183 |
184 |
185 | Hints / Key Points
186 |
187 | - Use `kubectl logs -p` (`--previous`) to get logs from the **previous** container instance, if still available.
188 | - Centralize logs in an external system like ELK or Loki, shipped by a collector such as Fluentd.
189 | - Ensure your app flushes logs frequently so they aren’t lost on crash.
190 | - Attach a log volume to the Pod to persist logs and access them later.
191 |
192 |
193 | ---
194 |
195 | ## 13) Scenario: Resource Exhaustion (OOMKills) in Pods
196 | **Question:**
197 | A Kubernetes Pod crashes intermittently and is marked as OOMKilled.
198 |
199 | - How would you identify the cause of the memory spikes?
200 | - How do you stop the Pod from running out of memory in the future?
201 |
202 |
203 | Hints / Key Points
204 |
205 | - Check resource usage with `kubectl top` or a monitoring tool.
206 | - Increase the memory limit if the app truly needs more, or find memory leaks.
207 | - Monitor usage over time, and consider the Vertical Pod Autoscaler (VPA) if appropriate.
208 |
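Comparing observed usage against the configured limit is easier once both quantities are converted to bytes. The sketch below handles only the common memory suffixes; the function names and sample values are hypothetical.

```python
# Binary and decimal suffixes used in Kubernetes memory quantities.
_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '1G' to bytes."""
    # Check two-letter suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _UNITS[suffix])
    return int(quantity)  # plain number of bytes


def headroom(usage: str, limit: str) -> float:
    """Return the unused fraction of the memory limit."""
    return 1 - parse_memory(usage) / parse_memory(limit)


# Hypothetical reading: 480Mi used against a 512Mi limit.
print(parse_memory("512Mi"))
print(headroom("480Mi", "512Mi"))
```

A pod that regularly sits at only a few percent of headroom is a likely candidate for the next OOMKill.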
209 |
210 | ---
211 |
212 | ## 14) Scenario: Node Debugging
213 | **Question:**
214 | How do you debug issues on a Kubernetes node, such as accessing the node's file system or checking running services?
215 |
216 |
217 | Hints / Key Points
218 |
219 | - Use `kubectl debug node/<node-name>` to create a temporary debugging pod on the node.
220 | - Access the node's file system (mounted at `/host` inside the debugging pod) and inspect logs or configuration files.
221 | - Check running services and their statuses.
222 | - Use tools like `top`, `ps`, and `netstat` to monitor resource usage and network connections.
223 | - Read pod/container logs directly on the node, either from a mounted log volume or from the standard log path (e.g., `/var/log/pods`).
224 |
225 |
--------------------------------------------------------------------------------