├── assets
│   └── img
│       ├── repo-banner.png
│       ├── deploy-scenarios
│       │   ├── image-pull-diff.png
│       │   ├── image-pull-event.png
│       │   ├── image-pull-err-timeline.png
│       │   ├── deploy-event-explanation.png
│       │   ├── deploy-with-configmap-example.png
│       │   ├── clean-timeline-with-deploy-event.png
│       │   ├── config-error-failed-deploy-event.png
│       │   └── config-error-failed-deploy-on-the-timeline.png
│       ├── failure-scenarios
│       │   ├── oomkilled-event.png
│       │   ├── oomkilled-timeline.png
│       │   ├── application-issue-logs.png
│       │   ├── application-issue-event.png
│       │   └── application-issue-availability-issue-detected.png
│       └── failed-scenarios
│           ├── failed-scheduling-event.png
│           └── failed-scheduling-timeline.png
├── deploys-scenarios
│   ├── failed-deploy-creation-config-error
│   │   ├── healthy-deploy.yaml
│   │   ├── createcontainerconfigerror.yaml
│   │   └── README.md
│   ├── failed-deploy-image-pull-backoff
│   │   ├── imagepullbackoff.yaml
│   │   ├── nginx-image-healthy.yaml
│   │   └── README.md
│   └── a-simple-deploy-with-a-configmap-change
│       ├── step1.yaml
│       ├── step2.yaml
│       └── README.md
├── failure-scenarios
│   ├── failed-to-schedule-pods
│   │   ├── healthy-deploy.yaml
│   │   ├── failed-scheduling.yaml
│   │   └── README.md
│   ├── OOMKilled
│   │   ├── oom.yaml
│   │   └── README.md
│   └── application-error-with-exception
│       ├── simple-application.yaml
│       ├── application-error.yaml
│       └── README.md
├── README.md
└── training-session
    ├── run-all.sh
    └── README.md
/assets/img/repo-banner.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/repo-banner.png
--------------------------------------------------------------------------------
/assets/img/deploy-scenarios/image-pull-diff.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/image-pull-diff.png
--------------------------------------------------------------------------------
/assets/img/deploy-scenarios/image-pull-event.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/image-pull-event.png
--------------------------------------------------------------------------------
/assets/img/failure-scenarios/oomkilled-event.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/failure-scenarios/oomkilled-event.png
--------------------------------------------------------------------------------
/assets/img/failure-scenarios/oomkilled-timeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/failure-scenarios/oomkilled-timeline.png
--------------------------------------------------------------------------------
/assets/img/deploy-scenarios/image-pull-err-timeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/image-pull-err-timeline.png
--------------------------------------------------------------------------------
/assets/img/failed-scenarios/failed-scheduling-event.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/failed-scenarios/failed-scheduling-event.png
--------------------------------------------------------------------------------
/assets/img/failure-scenarios/application-issue-logs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/failure-scenarios/application-issue-logs.png
--------------------------------------------------------------------------------
/assets/img/deploy-scenarios/deploy-event-explanation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/deploy-event-explanation.png
--------------------------------------------------------------------------------
/assets/img/failure-scenarios/application-issue-event.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/failure-scenarios/application-issue-event.png
--------------------------------------------------------------------------------
/assets/img/failed-scenarios/failed-scheduling-timeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/failed-scenarios/failed-scheduling-timeline.png
--------------------------------------------------------------------------------
/assets/img/deploy-scenarios/deploy-with-configmap-example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/deploy-with-configmap-example.png
--------------------------------------------------------------------------------
/assets/img/deploy-scenarios/clean-timeline-with-deploy-event.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/clean-timeline-with-deploy-event.png
--------------------------------------------------------------------------------
/assets/img/deploy-scenarios/config-error-failed-deploy-event.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/config-error-failed-deploy-event.png
--------------------------------------------------------------------------------
/assets/img/deploy-scenarios/config-error-failed-deploy-on-the-timeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/config-error-failed-deploy-on-the-timeline.png
--------------------------------------------------------------------------------
/assets/img/failure-scenarios/application-issue-availability-issue-detected.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/failure-scenarios/application-issue-availability-issue-detected.png
--------------------------------------------------------------------------------
/deploys-scenarios/failed-deploy-creation-config-error/healthy-deploy.yaml:
--------------------------------------------------------------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: komodor-create-container-config-error
  labels:
    app: komodor-create-error
spec:
  replicas: 1
  selector:
    matchLabels:
      app: komodor-create-error
  template:
    metadata:
      labels:
        app: komodor-create-error
    spec:
      containers:
        - name: crash-demo
          image: nginx:1.21.6
--------------------------------------------------------------------------------
/deploys-scenarios/failed-deploy-image-pull-backoff/imagepullbackoff.yaml:
--------------------------------------------------------------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: komodor-imagepull-backoff
  labels:
    app: komodor-imagepull-backoff
spec:
  replicas: 1
  selector:
    matchLabels:
      app: komodor-imagepull-backoff
  template:
    metadata:
      labels:
        app: komodor-imagepull-backoff
    spec:
      containers:
        - name: imagepull-demo
          # intentionally broken tag - nginx:1.221.0 does not exist
          image: nginx:1.221.0
--------------------------------------------------------------------------------
/deploys-scenarios/failed-deploy-image-pull-backoff/nginx-image-healthy.yaml:
--------------------------------------------------------------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: komodor-imagepull-backoff
  labels:
    app: komodor-imagepull-backoff
spec:
  replicas: 1
  selector:
    matchLabels:
      app: komodor-imagepull-backoff
  template:
    metadata:
      labels:
        app: komodor-imagepull-backoff
    spec:
      containers:
        - name: imagepull-demo
          image: nginx:1.21.0
--------------------------------------------------------------------------------
/failure-scenarios/failed-to-schedule-pods/healthy-deploy.yaml:
--------------------------------------------------------------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: komodor-failed-scheduling
  labels:
    app: komodor-failed-scheduling
spec:
  replicas: 1
  selector:
    matchLabels:
      app: komodor-failed-scheduling
  template:
    metadata:
      labels:
        app: komodor-failed-scheduling
    spec:
      containers:
        - name: nginx
          image: nginx:1.23.2
          env:
            - name: BITNAMI_DEBUG
              value: "false"
            - name: NGINX_HTTP_PORT_NUMBER
              value: "8080"
--------------------------------------------------------------------------------
/failure-scenarios/failed-to-schedule-pods/failed-scheduling.yaml:
--------------------------------------------------------------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: komodor-failed-scheduling
  labels:
    app: komodor-failed-scheduling
spec:
  replicas: 1
  selector:
    matchLabels:
      app: komodor-failed-scheduling
  template:
    metadata:
      labels:
        app: komodor-failed-scheduling
    spec:
      containers:
        - name: nginx
          image: nginx:1.23.2
          env:
            - name: BITNAMI_DEBUG
              value: "false"
            - name: NGINX_HTTP_PORT_NUMBER
              value: "8080"
          # intentionally unsatisfiable - no node has 500Gi of memory
          resources:
            limits:
              memory: 500Gi
            requests:
              memory: 500Gi
--------------------------------------------------------------------------------
/failure-scenarios/OOMKilled/oom.yaml:
--------------------------------------------------------------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: komodor-oomkilled
  labels:
    app: komodor-oomkilled
spec:
  replicas: 1
  selector:
    matchLabels:
      app: komodor-oomkilled
  template:
    metadata:
      labels:
        app: komodor-oomkilled
    spec:
      containers:
        - name: komodor-oomkilled
          image: polinux/stress
          command: ["/bin/sh", "-c"]
          # allocates ~60MB while the container is limited to 40Mi, forcing an OOMKill
          args: ["echo 'Going to allocate 60MB of memory!' ; echo 'Going to allocate 60MB of memory!' ; echo 'Going to allocate 60MB of memory!' ; stress --vm 2 --vm-bytes 30M --vm-hang 120 --backoff 10000000 --verbose"]
          resources:
            requests:
              memory: "40Mi"
            limits:
              memory: "40Mi"
--------------------------------------------------------------------------------
/deploys-scenarios/failed-deploy-creation-config-error/createcontainerconfigerror.yaml:
--------------------------------------------------------------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: komodor-create-container-config-error
  labels:
    app: komodor-create-error
spec:
  replicas: 1
  selector:
    matchLabels:
      app: komodor-create-error
  template:
    metadata:
      labels:
        app: komodor-create-error
    spec:
      containers:
        - name: crash-demo
          image: nginx:1.21.6
          env:
            - name: SECRET_TOKEN
              valueFrom:
                configMapKeyRef:
                  name: api-access-token
                  key: SECRET_TOKEN
            - name: API_ENDPOINT
              valueFrom:
                configMapKeyRef:
                  name: api-access-token
                  key: API_ENDPOINT
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-access-token
# intentionally missing the API_ENDPOINT key referenced by the deployment
data:
  SECRET_TOKEN: dmFsdWUtMg0KDQo=
--------------------------------------------------------------------------------
/deploys-scenarios/a-simple-deploy-with-a-configmap-change/step1.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: ConfigMap
metadata:
  name: komodor-features-configuration
data:
  reportToDataLake: 'true'
  debug: 'false'
  useAPIaccelerator: 'true'
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    app: komodor-configmap-deploy
  labels:
    app: komodor-configmap-deploy
  name: komodor-configmap-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: komodor-configmap-deploy
  template:
    metadata:
      labels:
        app: komodor-configmap-deploy
    spec:
      containers:
        - env:
          image: nginx:1.23.2
          name: nginx
          volumeMounts:
            - name: komodor-features-configuration
              mountPath: /usr/share/app/config
              subPath: komodor-features-configuration
      volumes:
        - name: komodor-features-configuration
          configMap:
            name: komodor-features-configuration
--------------------------------------------------------------------------------
/deploys-scenarios/a-simple-deploy-with-a-configmap-change/step2.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: ConfigMap
metadata:
  name: komodor-features-configuration
data:
  reportToDataLake: 'true'
  debug: 'true'
  useAPIaccelerator: 'true'
  sensitivity: '5'
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    app: komodor-configmap-deploy
  labels:
    app: komodor-configmap-deploy
  name: komodor-configmap-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: komodor-configmap-deploy
  template:
    metadata:
      labels:
        app: komodor-configmap-deploy
    spec:
      containers:
        - env:
            - name: HW_ACCELERATION_ENABLED
              value: "True"
          image: nginx:1.23.2
          name: nginx
          volumeMounts:
            - name: komodor-features-configuration
              mountPath: /usr/share/app/config
              subPath: komodor-features-configuration
      volumes:
        - name: komodor-features-configuration
          configMap:
            name: komodor-features-configuration
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Komodor Failure Scenarios
4 |
5 | This is the go-to repository to run scenarios on your clusters. These use cases enable you simulate day-to-day real life experiences. Running these scenarios will demonstrate Komodor's ability to identify and remediate these scenarios.
6 |
7 |
8 | ## How to Use?
9 | Pick a scenario from the list below, apply it to your cluster. Open the [Komodor user interface](https://app.komodor.com/services) and navigate to the appropriate [Service](https://app.komodor.com/services) or ConfigMap.
10 |
11 |
12 | ## Before Starting
13 |
14 | Make sure you have the [Komodor agent running](https://docs.komodor.com/Learn/Install-Komodor-Agent.html) and configured on your clusters.
15 | [Configure monitors](https://app.komodor.com/main/monitors) to generate alerts in each failure.
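
To verify the agent is up before you begin, you can check its pods (a minimal sketch; the `komodor` namespace is an assumption based on a default installation, so adjust it if you installed the agent elsewhere):

```bash
# All Komodor agent pods should be Running.
# NOTE: the namespace is an assumption; change it to wherever you installed the agent.
kubectl get pods -n komodor
```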


## Scenarios

### Deploy Scenarios
- [Image Pull Backoff](./deploys-scenarios/failed-deploy-image-pull-backoff)
- [Create Container Config Error](./deploys-scenarios/failed-deploy-creation-config-error)
- [Deploy with a ConfigMap change](./deploys-scenarios/a-simple-deploy-with-a-configmap-change/)


### Failure Scenarios
- [Out of Memory](./failure-scenarios/OOMKilled)
- [Application Issue](./failure-scenarios/application-error-with-exception)
- [Failed Scheduling](./failure-scenarios/failed-to-schedule-pods)
--------------------------------------------------------------------------------
/failure-scenarios/OOMKilled/README.md:
--------------------------------------------------------------------------------
# Scenario: Application Failed Because of OOMKilled

## Why Is It Important?
OOMKilled is an error that may be difficult to discover. It is transient and requires some expertise to find. It can have a massive impact on the service itself and on other services in the same cluster (the "noisy neighbor" effect).

## Real-Life Example
The application keeps getting "killed" because the container exceeds its memory limit. The reason for the failure is not clear: was the error caused by an application issue or an infrastructure issue?

## How Komodor Helps?
Komodor detects the OOMKilled failure and immediately shows the reason, even after it's gone from the cluster. Komodor also correlates the OOMKilled with infrastructure failures and lets you know whether it's an application issue or an infrastructure issue.

Komodor shows the failed events on the timeline:


Komodor shows the failure reason explicitly, with all the relevant information you need to fix the issue as quickly as you can, without reading endless manuals.



## How To Run?
1. Apply [oom.yaml](oom.yaml)
``` bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/OOMKilled/oom.yaml
```

It takes ~2 minutes for the OOM to start.
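
While you wait, you can watch the kill happen from the CLI (a sketch that uses the `app` label from oom.yaml):

``` bash
# Watch the pod restart as the stress container exceeds its 40Mi limit
kubectl get pods -l app=komodor-oomkilled --watch

# The container's last state should show Reason: OOMKilled
kubectl describe pods -l app=komodor-oomkilled
```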

2. [Go to the relevant service in Komodor](https://app.komodor.com/services?textFilter=komodor-oomkilled) and click on the deploy event created.
--------------------------------------------------------------------------------
/deploys-scenarios/a-simple-deploy-with-a-configmap-change/README.md:
--------------------------------------------------------------------------------
# Scenario: Correlate Deploy Events with ConfigMap Changes

## Why Is It Important?
Many services use a ConfigMap to separate the code from the running configuration. When a value in the ConfigMap changes, it is very hard to correlate that change with a deployment change.

## Real-Life Example
A user pushes a change to a configuration value, and when someone else (like yourself) comes to troubleshoot, the information about the configuration change is invisible.

## How Komodor Helps?
Komodor correlates changes across the system with a deploy event in a service. You can quickly identify all changes related to a specific deploy just by clicking on the deploy event.

Komodor shows the deploy events on the timeline:


For each deploy event you have the full information about the deploy:



## How To Run?
1. Apply step1.yaml
``` bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/a-simple-deploy-with-a-configmap-change/step1.yaml
```
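
Optionally, before applying the second step, you can preview exactly what will change with `kubectl diff` (a sketch; note that `kubectl diff` exits non-zero when differences are found):

``` bash
# Show the server-side diff between what is running (step 1) and what
# step2.yaml would apply - the ConfigMap data change and the new env var
# should both show up.
kubectl diff -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/a-simple-deploy-with-a-configmap-change/step2.yaml
```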
2. Apply step2.yaml
``` bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/a-simple-deploy-with-a-configmap-change/step2.yaml
```
3. [Go to the relevant service in Komodor](https://app.komodor.com/services?textFilter=komodor-configmap-deploy) and click on the deploy event created.
--------------------------------------------------------------------------------
/deploys-scenarios/failed-deploy-image-pull-backoff/README.md:
--------------------------------------------------------------------------------
# Scenario: Troubleshoot ImagePullBackOff

## Why Is It Important?
From time to time, an image pull error can occur and prevent the pods from starting and running the application. Usually it happens when the repository is not accessible or doesn't contain the image.

## Real-Life Example
This problem can happen when someone changes the credentials to the repository and makes its pull secret invalid, or when someone changes the image name/tag of a 3rd-party tool to one that doesn't exist.


## How Komodor Helps?
Komodor shows the failure reason, the explanation for the error, and what changed in the latest deploy. The user who gets the error understands much faster why the service failed and what to change in order to fix it.

Komodor shows the failed deploy events on the timeline:


For each deploy event you have the full information about the deploy, with the errors that caused it to fail:


You can click on the diff to see the configuration changes made during this deploy:



## How To Run?
1. Apply a healthy deployment:
``` bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-image-pull-backoff/nginx-image-healthy.yaml
```
2. Apply the same deployment with a wrong image tag:
``` bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-image-pull-backoff/imagepullbackoff.yaml
```
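
From the CLI, the broken tag shows up almost immediately (a sketch using the `app` label from the manifests):

``` bash
# The new pod should report ErrImagePull / ImagePullBackOff,
# since the nginx:1.221.0 tag does not exist.
kubectl get pods -l app=komodor-imagepull-backoff
kubectl describe pods -l app=komodor-imagepull-backoff
```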
3. [Go to the relevant service in Komodor](https://app.komodor.com/services?textFilter=komodor-imagepull-backoff) and click on the deploy event created.
--------------------------------------------------------------------------------
/deploys-scenarios/failed-deploy-creation-config-error/README.md:
--------------------------------------------------------------------------------
# Scenario: Investigate a Failed Deploy Caused by a Bad Reference to a ConfigMap

## Why Is It Important?
Many services use ConfigMaps to separate the code from the running configuration. When a deployment references a ConfigMap (or a key in it) that doesn't exist, the bad reference causes the creation of the new pods to fail.

## Real-Life Example
A user wants to add a new configuration to the deployment, so they create a new ConfigMap and change the deployment configuration to take a value from it. But the user references a ConfigMap key that is not in the cluster, perhaps due to a small typo. All of this causes the pods to fail to be created.

## How Komodor Helps?
Komodor detects the failed deploy, correlates it with the applied changes to the deployment configuration and the ConfigMap, and shows exactly why the deploy failed with a clear explanation.

Komodor shows the deploy events on the timeline:


For each deploy event you have the full information about the deploy and why it failed:



## How To Run?
1. Apply a deployment with a healthy status:
``` bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-creation-config-error/healthy-deploy.yaml
```
2. Apply the deployment with a bad reference to a ConfigMap key:
``` bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-creation-config-error/createcontainerconfigerror.yaml
```
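
You can confirm the error from the CLI as well (a sketch; the ConfigMap in createcontainerconfigerror.yaml is missing the `API_ENDPOINT` key that the deployment references):

``` bash
# The pod should be stuck in CreateContainerConfigError;
# its events explain that the key is missing from the ConfigMap.
kubectl get pods -l app=komodor-create-error
kubectl describe pods -l app=komodor-create-error
```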
3. [Go to the relevant service in Komodor](https://app.komodor.com/services?textFilter=komodor-create-container-config-error) and click on the deploy event created.
--------------------------------------------------------------------------------
/failure-scenarios/failed-to-schedule-pods/README.md:
--------------------------------------------------------------------------------
# Scenario: My Application's Pods Failed to Schedule

## Why Is It Important?
Pod and node constraints can make the scheduler's job challenging. When there is no node for a pod to run on, the pod fails with a **FailedScheduling** event, which causes a negative impact, especially during scaling and the rollout of a new version.

## Real-Life Example
1. During a scale-up, many pods spawn to support the load; however, if there are no available nodes in the cluster, the users will not be served.
2. An application pod requests a large amount of memory which is unavailable, causing the rollout to fail.

## How Komodor Helps?
Komodor detects any time a pod **fails to schedule** and creates an event with a clear explanation of **why it failed to schedule**.

Komodor shows the failed deploy events on the timeline:


For each deploy event you have the full information about the deploy:


Note: this pod requests 500Gi of memory, so please make sure your autoscaler can't provision nodes of that size.

## How To Run?
1. Apply a healthy deployment:
``` bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/failed-to-schedule-pods/healthy-deploy.yaml
```


2. Apply [failed-scheduling.yaml](failed-scheduling.yaml)
``` bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/failed-to-schedule-pods/failed-scheduling.yaml
```

It takes at least 10 minutes for Kubernetes to mark this deploy as failed.
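
While you wait, the pending pod and the scheduler's explanation are visible from the CLI (a sketch using the `app` label from the manifests):

``` bash
# The new pod stays Pending; the FailedScheduling event explains
# that no node can satisfy the 500Gi memory request.
kubectl get pods -l app=komodor-failed-scheduling
kubectl get events --field-selector reason=FailedScheduling
```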

3. [Go to the relevant service in Komodor](https://app.komodor.com/services?textFilter=komodor-failed-scheduling) and click on the deploy event created.
--------------------------------------------------------------------------------
/training-session/run-all.sh:
--------------------------------------------------------------------------------
#!/bin/bash
# Note: bash is required because the script uses [[ ]] conditionals.

echo ""
echo "Please insert your name to create a new namespace: (lower case, no spaces allowed)"
read NS_NAME

kubectl create ns "$NS_NAME"

echo ""
echo "Deploy Services : (n/N/y/Y)"
read ANSWER
if [[ $ANSWER == "Y" ]] || [[ $ANSWER == "y" ]]
then
    # Apply the healthy baselines first, so each scenario starts from a working deploy
    kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-image-pull-backoff/nginx-image-healthy.yaml -n "$NS_NAME"
    kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-creation-config-error/healthy-deploy.yaml -n "$NS_NAME"
    kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/application-error-with-exception/simple-application.yaml -n "$NS_NAME"
    kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/OOMKilled/oom.yaml -n "$NS_NAME"
    sleep 5
    # Then apply the broken versions to trigger the failure scenarios
    kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-image-pull-backoff/imagepullbackoff.yaml -n "$NS_NAME"
    sleep 5
    kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-creation-config-error/createcontainerconfigerror.yaml -n "$NS_NAME"
    sleep 5
    kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/application-error-with-exception/application-error.yaml -n "$NS_NAME"
else
    echo "Skipping..."
fi

echo ""
echo " WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! "
echo ""
echo " Do you want to delete your environment? (n/N/y/Y)"
echo ""
read ANSWER
if [[ $ANSWER == "Y" ]] || [[ $ANSWER == "y" ]]
then
    kubectl delete ns "$NS_NAME"
else
    echo "Skipping..."
fi

echo "END - Thank you"
--------------------------------------------------------------------------------
/failure-scenarios/application-error-with-exception/simple-application.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: ConfigMap
metadata:
  name: komodor-python-script
data:
  python-script: |-
    import time
    import os
    import sys
    import logging
    logging.basicConfig(stream=sys.stdout, level=logging.INFO)

    exit_code = int(os.getenv('EXIT_CODE')) if os.getenv('EXIT_CODE') else 0

    def start_service():
        initialize_connections()

    def initialize_connections():
        fetch_configuration()

    def fetch_configuration():
        create_connection()

    def create_connection():
        conn_auth()

    def conn_auth():
        if exit_code == 0:
            logging.info("connection established")
        else:
            raise Exception("Can't perform the requested task - authentication error")

    time.sleep(10)
    start_service()
    while True:
        logging.info("service loop")
        time.sleep(10)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    app: komo-application-error
  labels:
    app: komo-application-error
  name: komo-application-error
spec:
  replicas: 1
  selector:
    matchLabels:
      app: komo-application-error
  template:
    metadata:
      labels:
        app: komo-application-error
    spec:
      containers:
        - env:
            - name: LOG_LEVEL
              value: "INFO"
            - name: EXIT_CODE
              value: "0"
          image: python:3.11-alpine
          name: python
          command: ["python"]
          args: ["/usr/share/app/code.py"]
          volumeMounts:
            - name: komodor-python-script
              mountPath: /usr/share/app/code.py
              subPath: python-script
      volumes:
        - name: komodor-python-script
          configMap:
            name: komodor-python-script
--------------------------------------------------------------------------------
/failure-scenarios/application-error-with-exception/application-error.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: ConfigMap
metadata:
  name: komodor-python-script
data:
  python-script: |-
    import time
    import os
    import sys
    import logging
    logging.basicConfig(stream=sys.stdout, level=logging.INFO)

    exit_code = int(os.getenv('EXIT_CODE')) if os.getenv('EXIT_CODE') else 0

    def start_service():
        initialize_connections()

    def initialize_connections():
        fetch_configuration()

    def fetch_configuration():
        create_connection()

    def create_connection():
        conn_auth()

    def conn_auth():
        if exit_code == 0:
            logging.info("connection established")
        else:
            logging.info("a problem detected, exit code env var is " + str(exit_code))
            raise Exception("Can't perform the requested task - authentication error")

    time.sleep(10)
    start_service()
    while True:
        logging.info("service loop")
        time.sleep(10)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    app: komo-application-error
  labels:
    app: komo-application-error
  name: komo-application-error
spec:
  replicas: 1
  selector:
    matchLabels:
      app: komo-application-error
  template:
    metadata:
      labels:
        app: komo-application-error
    spec:
      containers:
        - env:
            - name: LOG_LEVEL
              value: "INFO"
            # EXIT_CODE=1 makes the script raise an exception, crashing the pod
            - name: EXIT_CODE
              value: "1"
          image: python:3.11-alpine
          name: python
          command: ["python"]
          args: ["/usr/share/app/code.py"]
          volumeMounts:
            - name: komodor-python-script
              mountPath: /usr/share/app/code.py
              subPath: python-script
      volumes:
        - name: komodor-python-script
          configMap:
            name: komodor-python-script
--------------------------------------------------------------------------------
/failure-scenarios/application-error-with-exception/README.md:
--------------------------------------------------------------------------------
# Scenario: Application Failure

## Why Is It Important?
Every service is prone to application failures. Some of these failures are minor, while others can cause the pods to crash and the application to become completely unavailable.

## Real-Life Example
A developer pushed new code to the service with a bug, and since then the pods of the service have been continuously crashing.

## How Komodor Helps?
Komodor automatically detects that there is an issue and immediately runs a playbook to investigate the issue's root cause.
Instead of leaving you with a generic message like `CrashLoopBackOff`, Komodor shows you all checks and findings, identifies the root cause (an application issue), and correlates it with the container logs, metrics, and recent changes, so you can solve the problem as quickly as possible.


Komodor shows the availability issue & the failed deploy events on the timeline:


In the availability issue you have the full information about the issue, like time, reason, explanation, relevant information, and logs. It's easy to identify that this issue is caused by an application problem:


You can also view logs to debug the application:



## How To Run?
1. Apply a healthy deployment:
``` bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/application-error-with-exception/simple-application.yaml
```

2. Apply the same deployment with an application issue:
``` bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/application-error-with-exception/application-error.yaml
```
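
You can see the exception from the CLI too (a sketch; the pod crashes because application-error.yaml sets the `EXIT_CODE` env var to `"1"`):

``` bash
# The pod raises an authentication exception ~10 seconds after start,
# then enters CrashLoopBackOff.
kubectl get pods -l app=komo-application-error
kubectl logs -l app=komo-application-error --tail=20
```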
3. [Go to the relevant service in Komodor](https://app.komodor.com/services?textFilter=komo-application-error) and click on the availability issue created.
--------------------------------------------------------------------------------
/training-session/README.md:
--------------------------------------------------------------------------------
### Namespace Creation

1. Create your namespace:

```bash
kubectl create ns [user]
```




### First Deploy + Service Operations

1. Deploy a new deployment

```bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-image-pull-backoff/nginx-image-healthy.yaml -n [user]
```

2. Review the service created in Komodor

3. Delete one of the pods and check the new pod that just spawned

4. Scale the replicas to 2 using the dedicated button at the top of the screen

5. Change the image to version 1.20.0 using the "Edit YAML" button and check the deploy that just started
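
If you prefer the CLI to the UI for this step, `kubectl set image` makes the equivalent change (a sketch; the deployment and container names are taken from nginx-image-healthy.yaml):

```bash
# Roll the deployment to nginx:1.20.0, which triggers a new deploy event
kubectl set image deployment/komodor-imagepull-backoff imagepull-demo=nginx:1.20.0 -n [user]
```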




### Failed Deploy - Image Errors

1. Change the image of the deployment using this configuration

```bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-image-pull-backoff/imagepullbackoff.yaml -n [user]
```

2. Check the failed deploy and revert it using the button in the suggested actions section
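
The CLI equivalent of that revert is a rollout undo (a sketch, offered only as an alternative to the Komodor button):

```bash
# Roll the deployment back to its previous (healthy) revision
kubectl rollout undo deployment/komodor-imagepull-backoff -n [user]
```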




### Application View

Application views allow you to scope Komodor to your own application and get insights.

1. [Create a new application view](https://app.komodor.com/app-view/new) and select a dynamic scope for your own namespace.


### Failed Deploy - Configuration Errors

1. Deploy a new deployment

```bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-creation-config-error/healthy-deploy.yaml -n [user]
```

2. After the deploy is completed, apply the next deploy, which is going to fail

```bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-creation-config-error/createcontainerconfigerror.yaml -n [user]
```

3. After you understand the reason for the failure, edit the YAML and fix the issue




### Troubleshooting - CrashLoop

1. Deploy a new deployment

```bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/application-error-with-exception/simple-application.yaml -n [user]
```

2. Wait for it to be healthy, then apply a new version of this deployment

```bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/application-error-with-exception/application-error.yaml -n [user]
```

3. Fix the issue by reverting to the previous version or editing the YAML.
Advanced users: you can also edit the code that resides in the ConfigMap and force a rollout with the new version.
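
One CLI way to fix it without a full revert (a sketch; the crash is driven entirely by the `EXIT_CODE` env var in application-error.yaml):

```bash
# Setting EXIT_CODE back to 0 stops the exception and rolls the deployment
kubectl set env deployment/komo-application-error EXIT_CODE=0 -n [user]
```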




### Troubleshooting - OOMKilled

1. Apply a new deployment

```bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/OOMKilled/oom.yaml -n [user]
```

2. Review the issues created and the suggested actions.

3. Based on the suggested actions, change the memory limits to 75Mi.
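
To make that change from the CLI instead of the Komodor UI, `kubectl set resources` works (a sketch; the request is raised together with the limit to keep them equal, as in oom.yaml):

```bash
# Raise the container's memory request and limit to 75Mi
kubectl set resources deployment/komodor-oomkilled --requests=memory=75Mi --limits=memory=75Mi -n [user]
```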




### Troubleshooting - Probes

1. Deploy a new deployment

```bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/add-ready-live-example/failure-scenarios/ready-live-failure/healthy-app.yaml -n [user]
```

2. Check the best practices section in the info tab. Which probes are missing?

3. Let's configure them using this yaml - apply it

```bash
kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/add-ready-live-example/failure-scenarios/ready-live-failure/fail-both.yaml -n [user]
```

4. Let's review the diff between the deploys on the deploy event

5. Check why the deploy failed - because of a liveness probe failure (you can find it in the events of the pod)
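
You can surface those probe failures from the CLI as well (a sketch; `Unhealthy` is the event reason Kubernetes records when liveness/readiness probes fail):

```bash
# List the probe-failure events in your namespace
kubectl get events -n [user] --field-selector reason=Unhealthy
```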

6. Change the configuration of the probes using the "edit YAML" button

## Delete Your Namespace

```bash
kubectl delete ns [user]
```


## Run All
```bash
bash <(curl -s https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/training-session/run-all.sh)
```
--------------------------------------------------------------------------------