├── .gitignore ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── home-assignments └── assignment_1.md └── questions ├── 01_ci.md ├── 02_cd.md ├── 03_gitops.md ├── 04_docker.md ├── 05_kubernetes.md ├── 06_helm.md ├── 07_secrets.md ├── 08_security.md ├── 09_shift_left.md └── 10_general.md /.gitignore: -------------------------------------------------------------------------------- 1 | # ---- Basic Ignored Files ---- 2 | .DS_Store 3 | *.log 4 | *.tmp 5 | 6 | # ---- Node / Java / etc. (Add or remove as needed) ---- 7 | node_modules/ 8 | target/ 9 | out/ 10 | 11 | # ---- Python ---- 12 | __pycache__/ 13 | *.pyc 14 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing 2 | 3 | Thank you for your interest in improving **DevOps Interview Questions**! 4 | 5 | ## Contribution Steps 6 | 7 | 1. **Fork this repository** on GitHub. 8 | 2. **Create a new branch** in your fork, e.g. `feature/add-docker-questions`. 9 | 3. Add or modify files: 10 | - To add new questions, pick the most relevant folder (e.g., `03_docker/questions.md`) or create a new folder if needed. 11 | - Follow the question format shown below. 12 | 4. **Commit** and **push** your changes to your branch. 13 | 5. Open a **Pull Request** (PR) against the `main` branch of this repo. Include a short description of your changes. 14 | 15 | ## Question Format 16 | 17 | Here’s a suggested format for new questions in `questions.md`: 18 | 19 | ```md 20 | ## [Short Title or ID] 21 | **Question:** 22 | (Describe the question, scenario, or challenge) 23 | 24 | **Answer Hints / Key Points:** 25 | - (List possible solution approaches or facts) 26 | 27 | **Why This Matters (Optional):** 28 | - (Explain the real-world relevance) 29 | 30 | **Style Guidelines:** 31 | - (Keep questions concise and clear.) 32 | - (Avoid duplicates. 
If a question overlaps with an existing one, consider merging them or adding details.) 33 | - (Use neutral or professional language.) 34 | - (For home assignments, place them in home-assignments/ with a clear name (e.g. assignment_2.md).) 35 | 36 | **License** 37 | By contributing, you agree your contributions are licensed under the project’s MIT License. 38 | 39 | Thank you for helping us build a great DevOps interview resource! -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Moran Weissman 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DevOps Interview Questions 2 | 3 | A comprehensive collection of DevOps-related interview questions and scenario-based exercises. 4 | This repository is designed to help both **interviewers** and **candidates** prepare for a wide variety of DevOps topics. 5 | 6 | --- 7 | 8 | ## Table of Contents 9 | 10 | 1. [Continuous Integration (CI)](./questions/01_ci.md) 11 | 2. [Continuous Deployment (CD)](./questions/02_cd.md) 12 | 3. [GitOps & ArgoCD](./questions/03_gitops.md) 13 | 4. [Docker](./questions/04_docker.md) 14 | 5. [Kubernetes](./questions/05_kubernetes.md) 15 | 6. [Helm](./questions/06_helm.md) 16 | 7. [Secrets](./questions/07_secrets.md) 17 | 8. [Security](./questions/08_security.md) 18 | 9. [Shift Left](./questions/09_shift_left.md) 19 | 10. [General / Architecture](./questions/10_general.md) 20 | 11. [Home Assignments](./home-assignments/assignment_1.md) 21 | 22 | --- 23 | 24 | ## How to Use 25 | 26 | - **Interviewers**: 27 | - Pick relevant topics (e.g., Docker, Kubernetes) and select scenario-based questions to evaluate practical experience. 28 | - Use the [home assignments](./home-assignments/assignment_1.md) to assess coding and configuration skills under real-world conditions. 29 | 30 | - **Candidates**: 31 | - Start by reviewing questions in your chosen topic (e.g., `01_ci.md`) and try answering on your own. 32 | - Expand your knowledge by reading the hidden hints in the `
` blocks. 33 | - Check scenario-based questions for real-world problem-solving practice. 34 | 35 | --- 36 | 37 | ## Contributing 38 | 39 | Contributions are always welcome! 40 | - To add or edit questions, please see our [CONTRIBUTING.md](./CONTRIBUTING.md) for the recommended structure and style guidelines. 41 | - You can also submit additional scenario-based exercises or home assignments that fit the existing format. 42 | 43 | --- 44 | 45 | ## License 46 | 47 | This project is licensed under the [MIT License](./LICENSE). 48 | Feel free to use it as a reference or foundation for your own DevOps interview prep or training material. 49 | 50 | --- 51 | 52 | Enjoy and good luck with your interviews! 53 | -------------------------------------------------------------------------------- /home-assignments/assignment_1.md: -------------------------------------------------------------------------------- 1 | # Home Assignment 1 2 | 3 | **Objective**: Demonstrate end-to-end DevOps skills, from setting up a local environment to configuring CI/CD, containers, and Kubernetes. 4 | 5 | --- 6 | 7 | ## Topics / Instructions 8 | 9 | ### Prerequisites 10 | - Use a web-server application where the config file path is passed via an environment variable. 11 | - Avoid Terraform for infra creation; everything can be local or in-kind/minikube. 12 | 13 | --- 14 | 15 | ### 1. CI 16 | 17 | - **Branch policy**: Explain how you’d set up and enforce a branch policy. 18 | - **Merge to main**: Why require pull requests? Outline best practices (e.g., code reviews, checks). 19 | - **DRY Pipelines**: Use parameterization or templates to avoid repeating code. 20 | - **Versioning**: 21 | - How do you manage application versions after merges? 22 | - How do you let developers work on branches without bumping the version prematurely? 23 | - **Dockerfile**: 24 | - Optimize the Dockerfile; possibly integrate CI steps (tests, lint) within it. 
25 | - On PRs, run at least one test (lint/code coverage/unit tests) inside the Docker build. 26 | 27 | --- 28 | 29 | ### 2. CD 30 | 31 | - **ArgoCD App-of-Apps**: 32 | - Use something like kind or minikube. No need for Terraform. 33 | - Deploy and manage your Helm chart via an ArgoCD Application. 34 | - The ArgoCD Application is itself managed by a root “app-of-apps.” 35 | - The application should track the latest commit and be managed by the root app. 36 | - Explain the folder structure and how it all ties together. 37 | 38 | --- 39 | 40 | ### 3. Helm / Kubernetes 41 | 42 | - **Handling API Deprecations**: Provide examples of how you’d manage deprecations when upgrading K8s. 43 | - **Create a Job**: In your Helm chart, include a “hello world” job that must run first during ArgoCD deployment. 44 | - **Secure Environment Variables**: Ensure certain env vars do not appear in the raw Helm chart for security reasons. 45 | - **Config in Git**: The web server’s configuration is managed from Git/Helm, not stored in the image. 46 | 47 | --- 48 | 49 | ## Deliverables 50 | 51 | 1. A small Git repository showing: 52 | - The sample application code or Dockerfile 53 | - A CI pipeline definition (GitHub Actions, Azure Pipelines, or your choice) 54 | - A Helm chart for deployment 55 | - ArgoCD manifests if applicable 56 | 57 | 2. **README** in your repo explaining: 58 | - How to run or test it locally 59 | - How to access the deployed app in kind/minikube 60 | 61 | 3. Explanations of your design decisions in short markdown notes: 62 | - How you tackled versioning 63 | - Why you used certain Docker/Helm features 64 | - How the ArgoCD “app-of-apps” setup is structured 65 | 66 | --- 67 | 68 | ## Tips 69 | 70 | - Demonstrate best practices (small Docker images, secrets management, no sensitive info in code). 71 | - Keep it simple. This is a local assignment, so ephemeral clusters or local containers are fine. 72 | - Focus on clarity, automation, and an end-to-end approach. 
73 | 74 | Good luck! -------------------------------------------------------------------------------- /questions/01_ci.md: -------------------------------------------------------------------------------- 1 | # 01 – Continuous Integration (CI) 2 | 3 | Focuses on building code, versioning, branching strategies, and ensuring every commit is validated properly. 4 | 5 | ## Table of Contents 6 | 1. Clean Build Environments 7 | 2. Git & Branch Policies 8 | 3. Testing from Dev/Personal Branches (Version Bump Avoidance) 9 | 4. Code Validation in CI 10 | 5. Git Flow vs OneFlow 11 | 6. Unit / Integration / E2E Tests 12 | 7. Scenario: Slow Build Times 13 | 8. **Scenario: Double-Build Problem in Docker CI Pipeline** 14 | 9. **Scenario: Docker Build Time Optimization in CI/CD** 15 | 16 | --- 17 | 18 | ## 1) Clean Build Environments 19 | **Question (Scenario):** 20 | Your builds sometimes fail because leftover files from previous runs cause unexpected results. How would you make sure each build runs in a clean, consistent environment? 21 | 22 |
23 | Hints / Key Points 24 | 25 | - **Containerized** or **ephemeral** build agents ensure no leftover files or dependencies. 26 | - Reproducible environment: each run starts fresh. 27 | - Avoid “works on my machine” issues by isolating dependencies. 28 |
29 | 30 | --- 31 | 32 | ## 2) Git & Branch Policies 33 | **Question:** 34 | When setting up a new Git repository for a service, how do you make sure collaboration and merging are done in a controlled, high-quality way? 35 | 36 |
37 | Hints / Key Points 38 | 39 | - Require **pull requests** to merge into `main`. 40 | - Use **branch protections**: code reviews, mandatory checks, or tests. 41 | - Adopt a branching strategy (Git Flow, OneFlow, or trunk-based). 42 |
43 | 44 | --- 45 | 46 | ## 3) Testing from Dev/Personal Branches (Version Bump Avoidance) 47 | **Question:** 48 | You want developers to build and test code on a personal branch without bumping the official app version. How do you do that? 49 | 50 |
51 | Hints / Key Points 52 | 53 | - Only bump the version if building from `main` (after merging). 54 | - Use **ephemeral tags** (like `dev-`) for personal branches. 55 | - This prevents cluttering your production semver with test builds. 56 |
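The tagging rule in the hints above can be sketched as a small function. This is a minimal illustration, not a prescribed implementation: the function name `build_tag` and the layout after `dev-` (sanitized branch name plus short commit SHA) are assumptions made for the example.

```python
import re


def build_tag(branch: str, release_version: str, short_sha: str) -> str:
    """Return the tag a CI build should publish under.

    Builds from main use the release version; every other branch gets an
    ephemeral dev- tag, so the release version is never consumed by a
    test build.
    """
    if branch == "main":
        return release_version
    # Keep only lowercase letters and digits; collapse everything else to "-".
    safe_branch = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    return f"dev-{safe_branch}-{short_sha}"


print(build_tag("main", "1.4.0", "a1b2c3d"))               # 1.4.0
print(build_tag("feature/Add_Login", "1.4.0", "a1b2c3d"))  # dev-feature-add-login-a1b2c3d
```

Keeping the decision in one place means the pipeline only bumps and publishes a release version from the `main` path.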
57 | 58 | --- 59 | 60 | ## 4) Code Validation in CI 61 | **Question:** 62 | What kind of checks do you usually include in a CI pipeline to keep code quality high? 63 | 64 |
65 | Hints / Key Points 66 | 67 | - **Linting** and **static analysis** for code style and potential security flaws. 68 | - **Unit tests** to verify logic. 69 | - Optional checks: integration tests, code coverage, or style checks. 70 |
71 | 72 | --- 73 | 74 | ## 5) Git Flow vs OneFlow 75 | **Question:** 76 | Explain the main differences between Git Flow and OneFlow, and give an example of when you might pick one over the other. 77 | 78 |
79 | Hints / Key Points 80 | 81 | - **Git Flow**: 82 | - Has a `develop` branch and separate `release/` branches. 83 | - Good for scheduled releases or big features that need isolation. 84 | - **OneFlow**: 85 | - Fewer branches; merges features straight into `main`. 86 | - Easier for continuous delivery or smaller teams who release often. 87 |
88 | 89 | --- 90 | 91 | ## 6) Unit / Integration / E2E Tests 92 | **Question (Scenario):** 93 | A new developer asks how different types of tests fit into the CI pipeline. Can you explain the roles of unit, integration, and end-to-end tests? 94 | 95 |
96 | Hints / Key Points 97 | 98 | - **Unit tests**: Check individual pieces of code in isolation. 99 | - **Integration tests**: Validate how services or components interact (e.g., API to DB). 100 | - **E2E tests**: Full user flow from start to finish, mirroring real production usage. 101 |
102 | 103 | --- 104 | 105 | ## 7) Scenario: Slow Build Times 106 | **Question (Scenario):** 107 | The CI builds are getting slower and slower, which annoys developers. What could you do to speed things up? 108 | 109 |
110 | Hints / Key Points 111 | 112 | - **Caching** dependencies so you don’t rebuild or re-download everything on each run. 113 | - Splitting large monolithic builds into smaller jobs or microservices. 114 | - Using multi-stage Docker builds or ephemeral agents to reduce overhead. 115 |
116 | 117 | --- 118 | 119 | ## 8) Scenario: Double-Build Problem in Docker CI Pipeline 120 | **Question:** 121 | You notice your CI pipeline builds a Docker image twice—once for security scanning (e.g., Trivy) and again for deployment. This doubles build time and resource usage. 122 | 123 | **How would you troubleshoot and resolve this?** 124 | 125 |
126 | Hints / Key Points 127 | 128 | - Identify where each build step is triggered; check pipeline definitions or separate workflows. 129 | - Consolidate scanning and deployment into **one** pipeline stage, or reuse the built image artifact. 130 | - Push a single built image to a temporary registry, run scans on that image, then deploy if it’s clean. 131 |
132 | 133 | --- 134 | 135 | ## 9) Scenario: Docker Build Time Optimization in CI/CD 136 | **Question (Scenario):** 137 | Your Docker builds in CI/CD are taking longer than expected, especially for a .NET or Java app with large dependencies. 138 | 139 | **What can you do to improve the build time?** 140 | 141 |
142 | Hints / Key Points 143 | 144 | - Optimize your **Dockerfile** structure: put rarely changing steps, such as dependency installation, before the steps that copy frequently changing source code, so their layers stay cached. 145 | - Use **multi-stage builds** to keep final images small. 146 | - Store a build cache or layer cache between CI runs so you don’t rebuild everything from scratch each time. 147 |
148 | -------------------------------------------------------------------------------- /questions/02_cd.md: -------------------------------------------------------------------------------- 1 | # 02 – Continuous Deployment (CD) 2 | 3 | Automating deployments, promotions across environments, and using different strategies like rolling or canary. 4 | 5 | ## Table of Contents 6 | 1. Promotion Across Environments 7 | 2. K8s Deployment Strategies (Rolling, Blue/Green, Canary) 8 | 3. Examples of Blue/Green & Canary 9 | 4. Bonus: Canary/Blue-Green in Kubernetes 10 | 5. Argo Rollouts Explanation 11 | 6. **Scenario: Slow Canary Deployments with Argo Rollouts** 12 | 13 | --- 14 | 15 | ## 1) Promotion Across Environments 16 | **Question:** 17 | You have `dev`, `staging`, and `prod` environments. How would you automate moving a service from dev to staging, then on to prod, with proper checks along the way? 18 | 19 |
20 | Hints / Key Points 21 | 22 | - **Multi-stage pipeline** with approvals or gating (manual or automated). 23 | - Same config, but different overrides for each environment (Helm or plain YAML). 24 | - Possibly a final manual step for production if your org requires it. 25 |
26 | 27 | --- 28 | 29 | ## 2) K8s Deployment Strategies (Rolling, Blue/Green, Canary) 30 | **Question:** 31 | You want minimal downtime and easy rollback. Compare rolling updates, blue/green deployments, and canary releases. 32 | 33 |
34 | Hints / Key Points 35 | 36 | - **Rolling**: Replaces pods incrementally (paced by `maxSurge`/`maxUnavailable`); simplest to set up, but a bad version can serve some traffic mid-roll before you notice and roll back. 37 | - **Blue/Green**: Runs two parallel environments; rollback is easy because you flip traffic back to the old one. 38 | - **Canary**: Shifts traffic gradually, letting you observe the new version in real time with only a fraction of traffic. 39 |
40 | 41 | --- 42 | 43 | ## 3) Examples of Blue/Green & Canary 44 | **Question (Scenario):** 45 | Your new microservice is critical, and you’re debating between blue/green and canary. Give a quick example of each strategy to show how you’d roll out a new version. 46 | 47 |
48 | Hints / Key Points 49 | 50 | - **Blue/Green**: 51 | - Deploy new version in parallel (green). 52 | - Test it, switch traffic once stable. 53 | - Rollback by switching to old (blue) if needed. 54 | 55 | - **Canary**: 56 | - Send small % of traffic to new version. 57 | - Watch metrics, gradually increase if stable. 58 | - Roll back if problems arise. 59 |
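The canary steps above (send a small share of traffic, watch metrics, increase or roll back) can be sketched as a simple loop. This is a toy model under stated assumptions: `run_canary`, the weight list, and the error-rate threshold are hypothetical names and values for illustration, and the code does not use any Argo Rollouts or service mesh API.

```python
from typing import Callable, List, Tuple


def run_canary(
    weights: List[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> Tuple[str, int]:
    """Step through traffic weights and stop at the first unhealthy one.

    error_rate_at(weight) stands in for a metrics query made while the new
    version receives `weight` percent of traffic.
    """
    reached = 0
    for weight in weights:
        if error_rate_at(weight) > max_error_rate:
            # Unhealthy: report the weight where the rollout was aborted.
            return ("rolled_back", weight)
        reached = weight
    return ("promoted", reached)


healthy = run_canary([10, 50, 100], lambda w: 0.0)
failing = run_canary([10, 50, 100], lambda w: 0.05 if w >= 50 else 0.0)
print(healthy)  # ('promoted', 100)
print(failing)  # ('rolled_back', 50)
```

A blue/green rollout is the degenerate case of the same loop with a single step to 100 percent after testing the green environment.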
60 | 61 | --- 62 | 63 | ## 4) Bonus: Canary/Blue-Green in Kubernetes 64 | **Question:** 65 | Can you do canary or blue-green deployments directly in Kubernetes without extra tools? 66 | 67 |
68 | Hints / Key Points 69 | 70 | - Yes, but you have to handle traffic splits, labels, or separate services yourself. 71 | - Tools like **Argo Rollouts** or a service mesh (Istio) make it easier. 72 |
73 | 74 | --- 75 | 76 | ## 5) Argo Rollouts Explanation 77 | **Question:** 78 | What is Argo Rollouts, and how does it help with advanced deployment strategies? 79 | 80 |
81 | Hints / Key Points 82 | 83 | - A **Kubernetes controller** with its own CRDs; its `Rollout` resource is used in place of a standard Deployment. 84 | - Supports canary and blue-green strategies, with progressive rollouts gated by health checks. 85 | - Integrates with ingress controllers or service meshes for traffic shaping. 86 |
87 | 88 | --- 89 | 90 | ## 6) Scenario: Slow Canary Deployments with Argo Rollouts 91 | **Question:** 92 | You set up Argo Rollouts for canary deployments, but traffic shifting is slower than expected, delaying the rollout. 93 | 94 | - What steps do you take to debug this behavior? 95 | - How can you ensure canary traffic shifts happen faster? 96 | 97 |
98 | Hints / Key Points 99 | 100 | - Check the **Rollout object** for correct strategy, weights, and health checks. 101 | - Make sure the ingress or service mesh is applying traffic splits properly. 102 | - Validate that health checks or success criteria aren’t too strict, causing slow or paused progress. 103 | - Monitor logs or metrics to see if pods are failing readiness or not meeting thresholds. 104 |
105 | -------------------------------------------------------------------------------- /questions/03_gitops.md: -------------------------------------------------------------------------------- 1 | # 03 – GitOps & ArgoCD 2 | 3 | Focuses on GitOps concepts, ArgoCD usage, the app-of-apps pattern, and ApplicationSets. 4 | 5 | ## Table of Contents 6 | 1. What is GitOps? 7 | 2. ArgoCD as a GitOps Tool 8 | 3. App-of-Apps Pattern 9 | 4. Adding a New App Under a Root “Umbrella” 10 | 5. ApplicationSet (Uses & Generators) 11 | 6. **Scenario: Deprecated APIs Causing ArgoCD Sync Failures** 12 | 7. **Scenario: Namespace Conflicts in ArgoCD Applications** 13 | 14 | --- 15 | 16 | ## 1) What is GitOps? 17 | **Question:** 18 | Your manager says, “We should do GitOps!” How do you explain the core idea of GitOps? 19 | 20 |
21 | Hints / Key Points 22 | 23 | - Infrastructure and app config in Git as the single source of truth. 24 | - A tool (ArgoCD) automatically updates the cluster to match Git. 25 | - All changes go through PRs, so everything is tracked and auditable. 26 |
27 | 28 | --- 29 | 30 | ## 2) ArgoCD as a GitOps Tool 31 | **Question:** 32 | Why is ArgoCD considered a GitOps solution and not just another deployment tool? 33 | 34 |
35 | Hints / Key Points 36 | 37 | - Watches a Git repo for changes, automatically syncing them into the cluster. 38 | - Declarative approach: no manual clicks or imperative commands. 39 | - Integrates with Helm, Kustomize, or plain YAML. 40 |
41 | 42 | --- 43 | 44 | ## 3) App-of-Apps Pattern 45 | **Question (Scenario):** 46 | You have a bunch of microservices, each with its own repo or chart. You want a single place to manage them. How would you do this in ArgoCD? 47 | 48 |
49 | Hints / Key Points 50 | 51 | - Root “umbrella” application referencing multiple child apps. 52 | - Each child is a separate Helm chart or folder. 53 | - App-of-apps keeps everything organized under one main config. 54 |
55 | 56 | --- 57 | 58 | ## 4) Adding a New App Under a Root “Umbrella” 59 | **Question:** 60 | If you already have a root “app-of-apps” that manages several services, how do you add a brand-new microservice? 61 | 62 |
63 | Hints / Key Points 64 | 65 | - Create a new ArgoCD “child” Application in the same or separate repo. 66 | - Reference it in the root app’s YAML so ArgoCD picks it up. 67 |
68 | 69 | --- 70 | 71 | ## 5) ApplicationSet (Uses & Generators) 72 | **Question:** 73 | What is an ApplicationSet in ArgoCD, and when might you use it? Name at least one generator type. 74 | 75 |
76 | Hints / Key Points 77 | 78 | - **ApplicationSet** is a CRD that can create multiple ArgoCD Applications automatically. 79 | - Used for deploying the same app across multiple clusters or generating apps for each folder/branch. 80 | - Generators: List, Git directory, Cluster, or SCM provider. 81 |
82 | 83 | --- 84 | 85 | ## 6) Scenario: Deprecated APIs Causing ArgoCD Sync Failures 86 | **Question:** 87 | After upgrading Kubernetes, an ArgoCD application fails to sync due to deprecated APIs in its Helm chart (e.g., `extensions/v1beta1` → `apps/v1`). 88 | 89 | - How do you identify which APIs are outdated? 90 | - How do you fix them? 91 | 92 |
93 | Hints / Key Points 94 | 95 | - Check chart templates for old API references. 96 | - Replace them with newer versions (`apps/v1`). 97 | - Tools like **Pluto** can detect deprecated APIs; render the chart with `helm template` to get manifests you can scan. 98 | - Keep charts updated and test in a staging cluster before upgrading production. 99 |
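To make the idea of scanning rendered manifests concrete, here is a minimal sketch that looks for a few deprecated `apiVersion`/`kind` pairs in multi-document YAML text. It is not how Pluto works internally; the `DEPRECATED` table is a small, incomplete illustration, `find_deprecated` is a hypothetical helper, and the regex parsing assumes plain unquoted top-level `apiVersion:` and `kind:` fields.

```python
import re
from typing import Dict, List, Tuple

# Illustrative and incomplete; dedicated tools ship a full, versioned list.
DEPRECATED: Dict[Tuple[str, str], str] = {
    ("extensions/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta2", "Deployment"): "apps/v1",
    ("extensions/v1beta1", "Ingress"): "networking.k8s.io/v1",
}


def find_deprecated(rendered: str) -> List[Tuple[str, str, str]]:
    """Return (kind, old_api, suggested_api) for each deprecated document."""
    findings = []
    # Split the rendered output into individual YAML documents.
    for doc in re.split(r"^---\s*$", rendered, flags=re.MULTILINE):
        api = re.search(r"^apiVersion:\s*(\S+)", doc, flags=re.MULTILINE)
        kind = re.search(r"^kind:\s*(\S+)", doc, flags=re.MULTILINE)
        if api and kind:
            key = (api.group(1), kind.group(1))
            if key in DEPRECATED:
                findings.append((key[1], key[0], DEPRECATED[key]))
    return findings


MANIFEST = """\
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: web
---
apiVersion: v1
kind: Service
metadata:
  name: web
"""

print(find_deprecated(MANIFEST))
```

In practice the input would be the output of `helm template` for the chart, checked in CI before the cluster upgrade.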
100 | 101 | --- 102 | 103 | ## 7) Scenario: Namespace Conflicts in ArgoCD Applications 104 | **Question:** 105 | Two ArgoCD applications deploy into the same namespace, causing resource conflicts (e.g., overlapping ConfigMaps). One of them fails to sync. 106 | 107 | - How would you troubleshoot and resolve it? 108 | - How do you prevent it in the future? 109 | 110 |
111 | Hints / Key Points 112 | 113 | - Identify which resources clash by checking logs or ArgoCD sync errors. 114 | - Move each app into its own namespace for isolation, or rename the conflicting resources. 115 | - Have a clear naming/namespace strategy so multiple apps don’t step on each other. 116 |
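One way to spot the clash before syncing is to compare the resource identities each application would create. The sketch below assumes you already have each app's resources as `(kind, namespace, name)` tuples (for example, extracted from rendered manifests); `conflicting_resources` is a hypothetical helper, not an ArgoCD feature.

```python
from typing import Iterable, Set, Tuple

ResourceKey = Tuple[str, str, str]  # (kind, namespace, name)


def conflicting_resources(
    app_a: Iterable[ResourceKey], app_b: Iterable[ResourceKey]
) -> Set[ResourceKey]:
    """Return resources that both applications try to own."""
    return set(app_a) & set(app_b)


billing = [
    ("ConfigMap", "shared", "app-config"),
    ("Deployment", "shared", "billing"),
]
orders = [
    ("ConfigMap", "shared", "app-config"),
    ("Deployment", "shared", "orders"),
]

print(conflicting_resources(billing, orders))
# {('ConfigMap', 'shared', 'app-config')}
```

Renaming the ConfigMap per app, or giving each app its own namespace, makes the intersection empty.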
117 | -------------------------------------------------------------------------------- /questions/04_docker.md: -------------------------------------------------------------------------------- 1 | # 04 – Docker 2 | 3 | Discusses container basics, Dockerfiles, and best practices. 4 | 5 | ## Table of Contents 6 | 1. Docker vs Container 7 | 2. Dockerfile: ENTRYPOINT vs CMD 8 | 3. Image Size Optimization 9 | 4. Build-time vs Runtime Secrets 10 | 5. Reducing Docker Build Time 11 | 6. Docker-in-Docker or Alternatives 12 | 7. Other Container Build Tools 13 | 8. **Scenario: Kaniko Image Build Failing Without Docker Daemon** 14 | 9. (Optional) RBAC for Kaniko? (See `05_kubernetes.md` for RBAC if needed) 15 | 16 | --- 17 | 18 | ## 1) Docker vs Container 19 | **Question:** 20 | What’s the difference between Docker as a tool and the concept of a container? 21 | 22 |
23 | Hints / Key Points 24 | 25 | - **Docker** is a platform for building/managing containers. 26 | - A **container** is an isolated environment bundling the app and dependencies. 27 | - Docker is popular, but other tools and runtimes exist (Podman, containerd). 28 |
29 | 30 | --- 31 | 32 | ## 2) Dockerfile: ENTRYPOINT vs CMD 33 | **Question:** 34 | A coworker is confused about `ENTRYPOINT` and `CMD` in a Dockerfile. How do you explain the difference? 35 | 36 |
37 | Hints / Key Points 38 | 39 | - **ENTRYPOINT**: The main command the container will always run. 40 | - **CMD**: Default arguments that can be overridden at runtime. 41 | - Typically, you set `ENTRYPOINT` to the main process and use `CMD` for optional flags. 42 |
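The interaction between the two instructions can be modeled in a few lines. This sketch covers only the exec (JSON array) form and ignores the `--entrypoint` override; `effective_command` is a hypothetical helper for illustration, not part of any Docker API.

```python
from typing import List, Optional


def effective_command(
    entrypoint: List[str],
    cmd: List[str],
    run_args: Optional[List[str]] = None,
) -> List[str]:
    """Model the process a container starts with exec-form ENTRYPOINT and CMD."""
    # Arguments given after the image name in `docker run` replace CMD entirely;
    # ENTRYPOINT stays in place.
    args = run_args if run_args else cmd
    return entrypoint + args


entrypoint = ["nginx"]
cmd = ["-g", "daemon off;"]

print(effective_command(entrypoint, cmd))          # ['nginx', '-g', 'daemon off;']
print(effective_command(entrypoint, cmd, ["-t"]))  # ['nginx', '-t']
```

This is why `ENTRYPOINT` suits the main process while `CMD` holds defaults a user may want to swap out.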
43 | 44 | --- 45 | 46 | ## 3) Image Size Optimization 47 | **Question (Scenario):** 48 | You have a huge Docker image based on a full JDK. You want to reduce its size. How do you do it, and why does size matter? 49 | 50 |
51 | Hints / Key Points 52 | 53 | - Use **multi-stage builds** or switch to a smaller base (JRE or Alpine). 54 | - Clean up leftover artifacts (logs, caches). 55 | - Smaller images pull faster, use less storage, and expose a smaller attack surface. 56 |
57 | 58 | --- 59 | 60 | ## 4) Build-time vs Runtime Secrets 61 | **Question:** 62 | You need to use some private tokens or credentials during the build. How do you add them without exposing them in the final image? 63 | 64 |
65 | Hints / Key Points 66 | 67 | - Use BuildKit secret mounts (`RUN --mount=type=secret`) for build-time secrets; avoid **ARG** for secrets, because build arguments can be recovered from the image history. 68 | - Inject secrets at runtime via environment variables or secret managers. 69 | - Don’t store secrets in plain text in the Dockerfile or version control. 70 |
71 | 72 | --- 73 | 74 | ## 5) Reducing Docker Build Time 75 | **Question (Scenario):** 76 | Your Docker builds take too long, especially for .NET or Java projects. How can you make them faster? 77 | 78 |
79 | Hints / Key Points 80 | 81 | - Reorder Dockerfile steps to install dependencies first, so you can cache them. 82 | - Use multi-stage builds to keep final images small. 83 | - Possibly store or share a build cache between CI runs. 84 |
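Why step order matters can be shown with a simplified model of the layer cache: once one step's cache is invalidated, every later step rebuilds as well. The model below is a rough sketch, not Docker's actual cache algorithm; `layers_rebuilt` is a hypothetical helper, and the Maven commands are only example instructions for a Java project.

```python
from typing import List, Set, Tuple

Step = Tuple[str, Set[str]]  # (instruction, files the instruction copies in)


def layers_rebuilt(steps: List[Step], changed: Set[str]) -> List[str]:
    """Return the instructions that must re-run for a set of changed inputs."""
    rebuilt = []
    invalidated = False
    for instruction, inputs in steps:
        # A step rebuilds if its own inputs changed or any earlier step rebuilt.
        if invalidated or inputs & changed:
            invalidated = True
            rebuilt.append(instruction)
    return rebuilt


source_first: List[Step] = [
    ("COPY . .", {"pom.xml", "src"}),
    ("RUN mvn dependency:go-offline", set()),
    ("RUN mvn package", set()),
]
deps_first: List[Step] = [
    ("COPY pom.xml .", {"pom.xml"}),
    ("RUN mvn dependency:go-offline", set()),
    ("COPY src ./src", {"src"}),
    ("RUN mvn package", set()),
]

print(layers_rebuilt(source_first, {"src"}))  # all three steps re-run
print(layers_rebuilt(deps_first, {"src"}))    # dependency download stays cached
```

With dependencies installed first, a source-only change skips the expensive download step.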
85 | 86 | --- 87 | 88 | ## 6) Docker-in-Docker or Alternatives 89 | **Question:** 90 | Sometimes we build Docker images inside a container during CI. Why do we do that, and what are some alternatives? 91 | 92 |
93 | Hints / Key Points 94 | 95 | - **Docker-in-Docker**: runs a Docker daemon inside the CI container so the job can build images; it typically needs privileged mode, which makes it less secure. 96 | - Alternatives: **Kaniko**, **Buildah**, **Podman** to build without a Docker daemon. 97 | - These alternatives reduce the need for privileged mode in CI. 98 |
99 | 100 | --- 101 | 102 | ## 7) Other Container Build Tools 103 | **Question:** 104 | If Docker wasn’t an option, what else could you use to build and run containers? 105 | 106 |
107 | Hints / Key Points 108 | 109 | - **Podman**, **Buildah**, **containerd**. 110 | - For Java, **Jib** can build container images without a Docker daemon. 111 | - Some environments use containerd directly; rkt is no longer maintained. 112 |
113 | 114 | --- 115 | 116 | ## 8) Scenario: Kaniko Image Build Failing Without Docker Daemon 117 | **Question:** 118 | You are using **Kaniko** to build and push Docker images in a Kubernetes Pod, but the build fails due to the absence of a Docker daemon. 119 | 120 | - What steps would you take to troubleshoot Kaniko’s failure? 121 | - How can you configure it to build images without a Docker daemon? 122 | 123 |
124 | Hints / Key Points 125 | 126 | - Verify the **Kaniko** Pod has access to your Dockerfile, context, and registry credentials. 127 | - Kaniko doesn’t need a Docker daemon; pass `--context` and `--destination` parameters correctly. 128 | - Ensure you have the right RBAC permissions if it needs to create or manage certain resources in K8s. 129 |
130 | -------------------------------------------------------------------------------- /questions/05_kubernetes.md: -------------------------------------------------------------------------------- 1 | # 05 – Kubernetes 2 | 3 | Core K8s topics: pods, services, ingress, CRDs, and debugging. 4 | 5 | ## Table of Contents 6 | 1. Deploying a New Application 7 | 2. Services vs Ingress 8 | 3. CRDs & Operators 9 | 4. Logs & Crash Troubleshooting 10 | 5. Resource in “Terminating” State 11 | 6. Debugging Inside a Container 12 | 7. Editing a Resource Live 13 | 8. Service-to-Service Communication 14 | 9. Sidecar/Init Containers 15 | 10. Bonus: Resource Name Limits 16 | 11. **Scenario: RBAC Permissions for Kaniko Builds** 17 | 12. **Scenario: Kubernetes Pod Logs Lost After Crash** 18 | 13. **Scenario: Resource Exhaustion (OOMKills) in Pods** 19 | 20 | --- 21 | 22 | ## 1) Deploying a New Application 23 | **Question:** 24 | If you have a new microservice to run in Kubernetes, what resources do you usually set up? 25 | 26 |
27 | Hints / Key Points 28 | 29 | - Usually a **Deployment** (or StatefulSet if stateful) and a **Service**. 30 | - Possibly an Ingress or LoadBalancer if external access is required. 31 | - Might use Helm for templating. 32 |
33 | 34 | --- 35 | 36 | ## 2) Services vs Ingress 37 | **Question:** 38 | What does a Service do in Kubernetes, and how is it different from an Ingress? 39 | 40 |
41 | Hints / Key Points 42 | 43 | - **Service**: Exposes pods at a stable address, can be ClusterIP, NodePort, or LoadBalancer. 44 | - **Ingress**: Defines routing rules for HTTP/HTTPS traffic to one or more Services. 45 |
46 | 47 | --- 48 | 49 | ## 3) CRDs & Operators 50 | **Question:** 51 | How do CRDs (Custom Resource Definitions) and Operators help you extend Kubernetes beyond its default features? 52 | 53 |
54 | Hints / Key Points 55 | 56 | - A **CRD** adds a new type of object (like “MyDatabase”) to the cluster. 57 | - An **Operator** watches these CRDs and automates tasks (install, upgrade, manage). 58 | - Good for complex/stateful apps so K8s can handle them more natively. 59 |
60 | 61 | --- 62 | 63 | ## 4) Logs & Crash Troubleshooting 64 | **Question (Scenario):** 65 | Your app keeps crashing after a few minutes in Kubernetes. How would you check what’s going on? 66 | 67 |
68 | Hints / Key Points 69 | 70 | - Inspect logs from the pod/container. 71 | - Check events or error messages for the pod. 72 | - See if it’s an OOM kill, code exception, or config problem. 73 |
74 | 75 | --- 76 | 77 | ## 5) Resource in “Terminating” State 78 | **Question:** 79 | Sometimes a pod or other resource is stuck “Terminating” for a long time. Why could that happen, and what might you do? 80 | 81 |
82 | Hints / Key Points 83 | 84 | - **Finalizers** might be blocking deletion. 85 | - The app might not handle termination signals well, so it never exits. 86 | - You can remove the finalizer or do a force delete if absolutely needed. 87 |
88 | 89 | --- 90 | 91 | ## 6) Debugging Inside a Container 92 | **Question:** 93 | You need to run commands inside a container for debugging. How do you do that in a Kubernetes environment? 94 | 95 |
96 | Hints / Key Points 97 | 98 | - Typically use a CLI to exec into the container. 99 | - If multiple containers, specify which container. 100 | - Make sure you have the right RBAC privileges. 101 | - Use `kubectl debug` to create a temporary debugging container in the same Pod and use that to debug the target container. 102 |
103 | 104 | --- 105 | 106 | ## 7) Editing a Resource Live 107 | **Question (Scenario):** 108 | You spot a small config mistake in a live resource. How would you fix it right away in the cluster? What risks might that cause? 109 | 110 |
111 | Hints / Key Points 112 | 113 | - You can **edit** the resource in place with the CLI, but that can cause drift from Git or Helm config. 114 | - If you’re using GitOps, the next sync might overwrite your manual fix. 115 | - Best practice: fix it in your config repo or chart too. 116 |
117 | 118 | --- 119 | 120 | ## 8) Service-to-Service Communication 121 | **Question:** 122 | How do different services within the same cluster talk to each other? 123 | 124 |
125 | Hints / Key Points 126 | 127 | - **Cluster DNS**: `<service-name>.<namespace>.svc.cluster.local`. 128 | - A Service provides a stable endpoint, even if pod IPs change. 129 |
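A Service's cluster DNS name combines the Service name and its namespace with the `svc.cluster.local` suffix. The helper below is a small sketch of that naming rule; `service_fqdn` is a hypothetical function, and `cluster.local` is the common default cluster domain, which a cluster may override.

```python
def service_fqdn(
    service: str,
    namespace: str = "default",
    cluster_domain: str = "cluster.local",
) -> str:
    """Build the in-cluster DNS name other pods use to reach a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


print(service_fqdn("orders", "shop"))  # orders.shop.svc.cluster.local
print(service_fqdn("redis"))           # redis.default.svc.cluster.local
```

Pods in the same namespace can usually use just the short Service name, since the namespace suffix is part of their DNS search path.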
130 | 131 | --- 132 | 133 | ## 9) Sidecar/Init Containers 134 | **Question:** 135 | What are sidecar containers and init containers, and why might you use them? 136 | 137 |
138 | Hints / Key Points 139 | 140 | - **Init** containers run first to do setup tasks (migrations, config). 141 | - **Sidecar** containers run alongside the main app for logging, proxying, etc. 142 | - Helps separate concerns in a single pod. 143 |
144 | 145 | --- 146 | 147 | ## 10) Bonus: Resource Name Limits 148 | **Question:** 149 | Is there a name length limit or other format rule for K8s resources? 150 | 151 |
152 | Hints / Key Points 153 | 154 | - Usually follows **DNS label** rules (lowercase, up to 63 chars, alphanumeric + dashes). 155 | - Some resource types might vary slightly, but typically the same constraints apply. 156 |
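A minimal sketch of the DNS label rule, assuming the RFC 1123 label form (lowercase alphanumerics and dashes, starting and ending with an alphanumeric, at most 63 characters). The helper name `is_dns_label` is made up for this example, and resource types that follow other rules (such as DNS subdomain names) are not covered.

```python
import re

DNS_LABEL_MAX_LENGTH = 63
# Lowercase alphanumerics or '-', must start and end with an alphanumeric.
_DNS_LABEL_RE = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_dns_label(name: str) -> bool:
    """Return True if `name` satisfies the DNS label rule sketched above."""
    if len(name) > DNS_LABEL_MAX_LENGTH:
        return False
    return _DNS_LABEL_RE.fullmatch(name) is not None


for candidate in ["my-app-1", "My-App", "-app", "a" * 64]:
    print(candidate[:20], is_dns_label(candidate))
```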
157 | 158 | --- 159 | 160 | ## 11) Scenario: RBAC Permissions for Kaniko Builds 161 | **Question:** 162 | You’re running an Azure DevOps agent in Kubernetes, which uses Kaniko to build and push Docker images. It’s failing because it can’t create needed resources. 163 | 164 | - How would you troubleshoot the missing permissions? 165 | - How can RBAC be configured to give Kaniko the required access? 166 | 167 |
168 | Hints / Key Points 169 | 170 | - Check the Pod logs for permission errors (e.g., “forbidden”). 171 | - Assign a **ServiceAccount** with an appropriate Role/RoleBinding that allows creating ConfigMaps, Pods, etc. 172 | - Verify that the agent is using this ServiceAccount when building. 173 |
174 | 175 | --- 176 | 177 | ## 12) Scenario: Kubernetes Pod Logs Lost After Crash 178 | **Question:** 179 | A Kubernetes Pod crashes unexpectedly, and its logs are lost because the container restarts too quickly. 180 | 181 | - How would you recover logs from a previously crashed container? 182 | - How can you ensure logs are always accessible? 183 | 184 |
185 | Hints / Key Points 186 | 187 | - Use the CLI to get logs from the **previous** container instance (`-p` option), if still available. 188 | - Centralize logs in an external system such as ELK or Loki, using a collector such as Fluentd. 189 | - Ensure your app flushes logs frequently so they aren’t lost on crash. 190 | - Attach a log volume to the Pod to persist logs and access them later. 191 |
192 | 193 | --- 194 | 195 | ## 13) Scenario: Resource Exhaustion (OOMKills) in Pods 196 | **Question:** 197 | A Kubernetes Pod crashes intermittently and is marked as OOMKilled. 198 | 199 | - How would you identify the cause of the memory spikes? 200 | - How do you stop the Pod from running out of memory in the future? 201 | 202 |
203 | Hints / Key Points 204 | 205 | - Check resource usage with `kubectl top` or a monitoring tool. 206 | - Increase the memory limit if the app truly needs more, or find memory leaks. 207 | - Monitor usage over time, maybe use VPA (Vertical Pod Autoscaler) if appropriate. 208 |
209 | 210 | --- 211 | 212 | ## 14) Scenario: Node Debugging 213 | **Question:** 214 | How do you debug issues on a Kubernetes node, such as accessing the node's file system or checking running services? 215 | 216 |
217 | Hints / Key Points 218 | 219 | - Use `kubectl debug` to create a temporary debugging pod on the node. 220 | - Access the node's file system and inspect logs or configuration files. 221 | - Check running services and their statuses. 222 | - Use tools like `top`, `ps`, and `netstat` to monitor resource usage and network connections. 223 | - Read pod/container logs directly from a volume mount or from the predefined log path for pods/containers. 224 |
225 | -------------------------------------------------------------------------------- /questions/06_helm.md: -------------------------------------------------------------------------------- 1 | # 06 – Helm 2 | 3 | Helm chart usage, templating, hooks, etc. 4 | 5 | ## Table of Contents 6 | 1. Chart Resource Requests/Limits 7 | 2. Requests vs Limits Differences 8 | 3. Sequence of Operations (Hooks) 9 | 4. Multiple Jobs Order 10 | 5. Dynamic Resource Generation 11 | 6. Handling API Deprecations 12 | 7. TPL Helper File 13 | 8. (Optional) Scenario: Deprecated APIs Causing ArgoCD Sync Failures w/ Helm 14 | 15 | --- 16 | 17 | ## 1) Chart Resource Requests/Limits 18 | **Question:** 19 | Why do we set resource requests and limits in a Helm chart, and how do we manage them across environments? 20 | 21 |
22 | Hints / Key Points 23 | 24 | - Ensures pods have enough CPU/memory, prevents resource hogging. 25 | - Helm `values.yaml` can differ for dev vs prod. 26 | - Good for cost control and stability. 27 |
28 | 29 | --- 30 | 31 | ## 2) Requests vs Limits Differences 32 | **Question:** 33 | What’s the difference between resource requests and limits in Kubernetes, and how does Helm simplify handling them? 34 | 35 |
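36 | Hints / Key Points 37 | 38 | - **Requests**: the amount the scheduler reserves for the container; effectively the minimum guaranteed resources. 39 | - **Limits**: the maximum allowed; exceeding the CPU limit causes throttling, exceeding the memory limit causes an OOMKill. 40 | - Helm: store these in `values.yaml` for easy environment overrides. 41 |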
42 | 43 | --- 44 | 45 | ## 3) Sequence of Operations (Hooks) 46 | **Question:** 47 | You want a job to run before your main app starts. How do you do that in Helm? 48 | 49 |
50 | Hints / Key Points 51 | 52 | - Use **Helm hooks** on that job: `pre-install` runs it before the chart's other resources are created (`post-install` would run it afterwards). 53 | - The pre-install job runs first; if it succeeds, Helm proceeds to install the rest. 54 | - Weights can fine-tune the order of multiple hooks. 55 |
56 | 57 | --- 58 | 59 | ## 4) Multiple Jobs Order 60 | **Question:** 61 | If you have multiple jobs that need to run in a certain sequence, how can Helm handle that? 62 | 63 |
64 | Hints / Key Points 65 | 66 | - Hooks with **weights** (lower weight runs first). 67 | - Or a single job that does tasks in order. 68 | - Sometimes separate subcharts if they’re truly independent. 69 |
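The weight-based ordering can be sketched as a simple sort. This is a simplified model, not Helm's implementation: it orders hooks by weight and then by name, ignoring the resource-kind tiebreak that Helm also applies. The `Hook` type and the job names are hypothetical; weights are kept as strings because Helm annotation values are strings, and a missing weight is treated as `"0"`.

```python
from typing import List, NamedTuple


class Hook(NamedTuple):
    """Hypothetical stand-in for a hook Job and its `helm.sh/hook-weight` value."""
    name: str
    weight: str = "0"  # annotation values are strings; default weight is 0


def execution_order(hooks: List[Hook]) -> List[str]:
    """Order hooks by ascending numeric weight, then by name (simplified)."""
    ordered = sorted(hooks, key=lambda h: (int(h.weight), h.name))
    return [h.name for h in ordered]


jobs = [Hook("seed-data", "5"), Hook("migrate-db", "-5"), Hook("warm-cache")]
print(execution_order(jobs))
```

Negative weights are allowed, which is a convenient way to push one job ahead of everything that uses the default.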
70 | 71 | --- 72 | 73 | ## 5) Dynamic Resource Generation 74 | **Question:** 75 | You want to create several similar resources from a list in `values.yaml`. How do you do that with Helm? 76 | 77 |
78 | Hints / Key Points 79 | 80 | - Use `{{- range .Values.myItems }}` in the template. 81 | - Each item in the list gets its own resource. 82 | - `_helpers.tpl` can keep repeated logic DRY. 83 |
84 | 85 | --- 86 | 87 | ## 6) Handling API Deprecations 88 | **Question:** 89 | A new Kubernetes version might deprecate older APIs. How do you update your Helm charts to handle that? 90 | 91 |
92 | Hints / Key Points 93 | 94 | - Replace old references (e.g., `extensions/v1beta1`) with `apps/v1`. 95 | - Tools like **pluto** can scan for deprecated usage. 96 | - Test in a lower environment or staging cluster first. 97 |
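To make the idea of scanning for outdated APIs concrete, here is a toy checker over rendered manifests (for example, the output of `helm template`). It is only a sketch: the `REPLACEMENTS` table is a deliberately tiny, hand-written assumption, the function name `find_outdated` is made up, and the parsing is plain regex matching that ignores quoted values and nested fields. A dedicated scanner such as Pluto is the realistic choice.

```python
import re
from typing import List, Tuple

# Hand-written, incomplete mapping: (old apiVersion, kind) -> replacement.
REPLACEMENTS = {
    ("extensions/v1beta1", "Deployment"): "apps/v1",
    ("extensions/v1beta1", "DaemonSet"): "apps/v1",
    ("extensions/v1beta1", "Ingress"): "networking.k8s.io/v1",
}


def find_outdated(manifests: str) -> List[Tuple[str, str, str]]:
    """Return (kind, old apiVersion, suggested apiVersion) for each match."""
    findings = []
    # Split the multi-document YAML stream on '---' separator lines.
    for doc in re.split(r"^---\s*$", manifests, flags=re.MULTILINE):
        api = re.search(r"^apiVersion:\s*(\S+)", doc, flags=re.MULTILINE)
        kind = re.search(r"^kind:\s*(\S+)", doc, flags=re.MULTILINE)
        if api and kind:
            key = (api.group(1), kind.group(1))
            if key in REPLACEMENTS:
                findings.append((key[1], key[0], REPLACEMENTS[key]))
    return findings
```

The mapping is keyed on kind as well as API version because the replacement group differs per resource type.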
98 | 99 | --- 100 | 101 | ## 7) TPL Helper File 102 | **Question:** 103 | What is the `tpl` function in Helm, and why might you use it? 104 | 105 |
106 | Hints / Key Points 107 | 108 | - `tpl` parses a string as a Helm template at runtime. 109 | - Good for user-provided or nested templates in `values.yaml`. 110 | - Keep advanced logic or partials in `_helpers.tpl`. 111 |
112 | 113 | --- 114 | 115 | ## 8) (Optional) Scenario: Deprecated APIs Causing ArgoCD Sync Failures w/ Helm 116 | **Question:** 117 | After upgrading the cluster, your Helm chart can’t sync in ArgoCD because it references deprecated APIs. 118 | 119 | - How do you find which APIs are outdated? 120 | - How do you fix them in your chart? 121 | 122 |
123 | Hints / Key Points 124 | 125 | - Look at the chart’s templates for older API versions (e.g., `extensions/v1beta1`). 126 | - Update them to the newer equivalents (e.g., `apps/v1`). 127 | - **Helm 3.12+** offers a **server-side dry run** via: 128 | ```bash 129 | helm upgrade --dry-run=server ... 130 | ``` 131 | This checks the manifests against the actual cluster APIs, catching potential deprecations or validation issues before you apply them. 132 | - You can also use tools like **Pluto** to scan for deprecated or removed APIs. 133 | - Test in a non-production cluster to confirm everything works with the new APIs. 134 |
135 | -------------------------------------------------------------------------------- /questions/07_secrets.md: -------------------------------------------------------------------------------- 1 | # 07 – Secrets Management 2 | 3 | Deals with handling sensitive data like passwords, tokens, and keys in DevOps. 4 | 5 | ## Table of Contents 6 | 1. What Are Secrets & Why 7 | 2. Storing and Managing Secrets 8 | 9 | --- 10 | 11 | ## 1) What Are Secrets & Why 12 | **Question:** 13 | Organizations always talk about storing things like passwords or tokens as “secrets.” What’s the big deal? 14 | 15 |
16 | Hints / Key Points 17 | 18 | - It keeps sensitive data out of plain text in code or config. 19 | - Minimizes risk if repos or logs get exposed. 20 | - In Kubernetes, a Secret is only base64-encoded, which is an encoding rather than encryption, so a real secrets manager provides better security. 21 |
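A short sketch of why base64 offers no protection on its own: anyone who can read the encoded value can reverse it without a key. The helper names and the sample password are hypothetical.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way Secret data is stored: plain base64."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding; note that no key or password is required."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = encode_secret_value("s3cr3t-password")  # hypothetical value
print(encoded)
print(decode_secret_value(encoded))  # the original text comes straight back
```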
22 | 23 | --- 24 | 25 | ## 2) Storing and Managing Secrets 26 | **Question (Scenario):** 27 | Your team has many credentials for different microservices. How can you safely store and use them without embedding them in code? 28 | 29 |
30 | Hints / Key Points 31 | 32 | - Use a **secrets manager** (like HashiCorp Vault, AWS Secrets Manager, Azure Key Vault). 33 | - Give each service controlled access. 34 | - Automate secret rotation and auditing, especially if you handle sensitive data. 35 |
36 | -------------------------------------------------------------------------------- /questions/08_security.md: -------------------------------------------------------------------------------- 1 | # 08 – Security 2 | 3 | Focuses on adding security checks to CI/CD, avoiding hardcoded secrets, and building secure images. 4 | 5 | ## Table of Contents 6 | 1. Pipeline Security 7 | 2. Avoiding Hardcoded Secrets 8 | 3. Secure Docker Builds 9 | 10 | --- 11 | 12 | ## 1) Pipeline Security 13 | **Question (Scenario):** 14 | You want to catch vulnerabilities early. How can you add security steps to your CI/CD pipeline? 15 | 16 |
17 | Hints / Key Points 18 | 19 | - **Static code analysis** (SAST) to look for known flaws. 20 | - **Image scanning** for Docker containers. 21 | - **SCA scanning**: analyzes open-source and third-party components in the application for vulnerabilities and licensing issues. 22 | - Dependency checks to flag libraries with known CVEs. 23 |
24 | 25 | --- 26 | 27 | ## 2) Avoiding Hardcoded Secrets 28 | **Question:** 29 | You found actual passwords in your pipeline scripts. How do you clean that up and prevent it from happening again? 30 | 31 |
32 | Hints / Key Points 33 | 34 | - Store secrets in a secure variable store or a secrets manager. 35 | - Don’t commit them to Git. 36 | - Use environment variables or injected secrets at runtime. 37 |
38 | 39 | --- 40 | 41 | ## 3) Secure Docker Builds 42 | **Question:** 43 | Your production app runs in Docker containers. What steps can you take to make sure those containers are secure? 44 | 45 |
46 | Hints / Key Points 47 | 48 | - Use **minimal base images**, patch them regularly. 49 | - Don’t run as root if you can avoid it. 50 | - Scan images for vulnerabilities before deploying. 51 | - Sign images for verification (e.g., with Cosign or Notary). 52 |
53 | -------------------------------------------------------------------------------- /questions/09_shift_left.md: -------------------------------------------------------------------------------- 1 | # 09 – Shift Left 2 | 3 | Idea of doing testing and security earlier in the development process. 4 | 5 | ## Table of Contents 6 | 1. Shift Left Concept 7 | 2. Why Shift Left 8 | 3. CI/CD Tools for Shift Left 9 | 4. Empowering Developers 10 | 11 | --- 12 | 13 | ## 1) Shift Left Concept 14 | **Question:** 15 | What does “Shift Left” mean in software development, and why do we hear about it a lot now? 16 | 17 |
18 | Hints / Key Points 19 | 20 | - Do QA/testing/security as early as possible, not at the end. 21 | - Helps catch problems sooner, which is cheaper to fix. 22 | - Encourages developers to take ownership of quality from the start. 23 |
24 | 25 | --- 26 | 27 | ## 2) Why Shift Left 28 | **Question (Scenario):** 29 | Your testers always find big problems right before release. How could a Shift Left approach help? 30 | 31 |
32 | Hints / Key Points 33 | 34 | - If devs run tests and checks on each commit, issues are spotted earlier. 35 | - Fewer surprises at the end. 36 | - Faster feedback loops and more stable releases. 37 |
38 | 39 | --- 40 | 41 | ## 3) CI/CD Tools for Shift Left 42 | **Question:** 43 | How do tools like Jenkins, GitHub Actions, or Azure Pipelines help a Shift Left approach? 44 | 45 |
46 | Hints / Key Points 47 | 48 | - They let you run automated tests and scans on every push or pull request. 49 | - If something fails, devs see it right away. 50 | - Can even spin up test environments automatically for quick integration checks. 51 |
52 | 53 | --- 54 | 55 | ## 4) Empowering Developers 56 | **Question:** 57 | Some companies give developers full power to define infrastructure (e.g., Helm charts). How does this fit into Shift Left? 58 | 59 |
60 | Hints / Key Points 61 | 62 | - Developers can fix both code and deployment configs early on. 63 | - No waiting for ops teams to fix environment issues. 64 | - Still need guardrails, but it speeds up delivery and fosters more responsibility. 65 |
66 | -------------------------------------------------------------------------------- /questions/10_general.md: -------------------------------------------------------------------------------- 1 | # 10 – General / Architecture 2 | 3 | Covers broader DevOps or architecture topics. 4 | 5 | ## Table of Contents 6 | 1. Stateless vs Stateful 7 | 2. DevOps Culture 8 | 3. Scenario: Handling a Critical Outage 9 | 4. Additional General Questions 10 | 11 | --- 12 | 13 | ## 1) Stateless vs Stateful 14 | **Question:** 15 | What’s the main difference between stateless and stateful apps, and how does this affect scaling? 16 | 17 |
18 | Hints / Key Points 19 | 20 | - **Stateless**: Doesn’t keep data in memory across requests; easier to scale horizontally. 21 | - **Stateful**: Maintains sessions or data that might need external storage. Harder to scale. 22 | - Many modern microservices aim to be stateless for simplicity. 23 |
24 | 25 | --- 26 | 27 | ## 2) DevOps Culture 28 | **Question:** 29 | How would you describe “DevOps culture,” and what does it change compared to older dev-and-ops silos? 30 | 31 |
32 | Hints / Key Points 33 | 34 | - Focus on **collaboration** and **automation**. 35 | - Shared responsibility for stability, performance, and delivery. 36 | - Closer feedback loops, continuous integration, continuous delivery. 37 |
38 | 39 | --- 40 | 41 | ## 3) Scenario: Handling a Critical Outage 42 | **Question (Scenario):** 43 | Production goes down during peak hours. Describe how you’d handle the incident, from detecting the problem to resolving it. 44 | 45 |
46 | Hints / Key Points 47 | 48 | - **Quick triage**: check logs, monitoring, recent changes, and alerts. 49 | - Possibly roll back the last deployment if that caused it. 50 | - Communicate clearly with stakeholders. 51 | - After fixing, do a post-mortem to learn from it. 52 |
53 | 54 | --- 55 | 56 | ## 4) Additional General Questions 57 | - *(Placeholder for any other broad DevOps or architecture topics.)* 58 | --------------------------------------------------------------------------------