├── assets
│   └── img
│       ├── repo-banner.png
│       ├── deploy-scenarios
│       │   ├── image-pull-diff.png
│       │   ├── image-pull-event.png
│       │   ├── image-pull-err-timeline.png
│       │   ├── deploy-event-explanation.png
│       │   ├── deploy-with-configmap-example.png
│       │   ├── clean-timeline-with-deploy-event.png
│       │   ├── config-error-failed-deploy-event.png
│       │   └── config-error-failed-deploy-on-the-timeline.png
│       ├── failure-scenarios
│       │   ├── oomkilled-event.png
│       │   ├── oomkilled-timeline.png
│       │   ├── application-issue-logs.png
│       │   ├── application-issue-event.png
│       │   └── application-issue-availability-issue-detected.png
│       └── failed-scenarios
│           ├── failed-scheduling-event.png
│           └── failed-scheduling-timeline.png
├── deploys-scenarios
│   ├── failed-deploy-creation-config-error
│   │   ├── healthy-deploy.yaml
│   │   ├── createcontainerconfigerror.yaml
│   │   └── README.md
│   ├── failed-deploy-image-pull-backoff
│   │   ├── imagepullbackoff.yaml
│   │   ├── nginx-image-healthy.yaml
│   │   └── README.md
│   └── a-simple-deploy-with-a-configmap-change
│       ├── step1.yaml
│       ├── step2.yaml
│       └── README.md
├── failure-scenarios
│   ├── failed-to-schedule-pods
│   │   ├── healthy-deploy.yaml
│   │   ├── failed-scheduling.yaml
│   │   └── README.md
│   ├── OOMKilled
│   │   ├── oom.yaml
│   │   └── README.md
│   └── application-error-with-exception
│       ├── simple-application.yaml
│       ├── application-error.yaml
│       └── README.md
├── README.md
└── training-session
    ├── run-all.sh
    └── README.md

/assets/img/repo-banner.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/repo-banner.png
--------------------------------------------------------------------------------
/assets/img/deploy-scenarios/image-pull-diff.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/image-pull-diff.png
--------------------------------------------------------------------------------
/assets/img/deploy-scenarios/image-pull-event.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/image-pull-event.png -------------------------------------------------------------------------------- /assets/img/failure-scenarios/oomkilled-event.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/failure-scenarios/oomkilled-event.png -------------------------------------------------------------------------------- /assets/img/failure-scenarios/oomkilled-timeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/failure-scenarios/oomkilled-timeline.png -------------------------------------------------------------------------------- /assets/img/deploy-scenarios/image-pull-err-timeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/image-pull-err-timeline.png -------------------------------------------------------------------------------- /assets/img/failed-scenarios/failed-scheduling-event.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/failed-scenarios/failed-scheduling-event.png -------------------------------------------------------------------------------- /assets/img/failure-scenarios/application-issue-logs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/failure-scenarios/application-issue-logs.png 
-------------------------------------------------------------------------------- /assets/img/deploy-scenarios/deploy-event-explanation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/deploy-event-explanation.png -------------------------------------------------------------------------------- /assets/img/failure-scenarios/application-issue-event.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/failure-scenarios/application-issue-event.png -------------------------------------------------------------------------------- /assets/img/failed-scenarios/failed-scheduling-timeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/failed-scenarios/failed-scheduling-timeline.png -------------------------------------------------------------------------------- /assets/img/deploy-scenarios/deploy-with-configmap-example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/deploy-with-configmap-example.png -------------------------------------------------------------------------------- /assets/img/deploy-scenarios/clean-timeline-with-deploy-event.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/clean-timeline-with-deploy-event.png -------------------------------------------------------------------------------- /assets/img/deploy-scenarios/config-error-failed-deploy-event.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/config-error-failed-deploy-event.png -------------------------------------------------------------------------------- /assets/img/deploy-scenarios/config-error-failed-deploy-on-the-timeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/deploy-scenarios/config-error-failed-deploy-on-the-timeline.png -------------------------------------------------------------------------------- /assets/img/failure-scenarios/application-issue-availability-issue-detected.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/komodorio/komodor-tutorials/HEAD/assets/img/failure-scenarios/application-issue-availability-issue-detected.png -------------------------------------------------------------------------------- /deploys-scenarios/failed-deploy-creation-config-error/healthy-deploy.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: komodor-create-container-config-error 5 | labels: 6 | app: komodor-create-error 7 | spec: 8 | replicas: 1 9 | selector: 10 | matchLabels: 11 | app: komodor-create-error 12 | template: 13 | metadata: 14 | labels: 15 | app: komodor-create-error 16 | spec: 17 | containers: 18 | - name: crash-demo 19 | image: nginx:1.21.6 20 | -------------------------------------------------------------------------------- /deploys-scenarios/failed-deploy-image-pull-backoff/imagepullbackoff.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: komodor-imagepull-backoff 5 | labels: 6 | app: komodor-imagepull-backoff 7 | spec: 8 | replicas: 1 9 | selector: 10 | matchLabels: 11 | app: 
komodor-imagepull-backoff 12 | template: 13 | metadata: 14 | labels: 15 | app: komodor-imagepull-backoff 16 | spec: 17 | containers: 18 | - name: imagepull-demo 19 | image: nginx:1.221.0 20 | -------------------------------------------------------------------------------- /deploys-scenarios/failed-deploy-image-pull-backoff/nginx-image-healthy.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: komodor-imagepull-backoff 5 | labels: 6 | app: komodor-imagepull-backoff 7 | spec: 8 | replicas: 1 9 | selector: 10 | matchLabels: 11 | app: komodor-imagepull-backoff 12 | template: 13 | metadata: 14 | labels: 15 | app: komodor-imagepull-backoff 16 | spec: 17 | containers: 18 | - name: imagepull-demo 19 | image: nginx:1.21.0 20 | -------------------------------------------------------------------------------- /failure-scenarios/failed-to-schedule-pods/healthy-deploy.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: komodor-failed-scheduling 5 | labels: 6 | app: komodor-failed-scheduling 7 | spec: 8 | replicas: 1 9 | selector: 10 | matchLabels: 11 | app: komodor-failed-scheduling 12 | template: 13 | metadata: 14 | labels: 15 | app: komodor-failed-scheduling 16 | spec: 17 | containers: 18 | - name: nginx 19 | image: nginx:1.23.2 20 | env: 21 | - name: BITNAMI_DEBUG 22 | value: "false" 23 | - name: NGINX_HTTP_PORT_NUMBER 24 | value: "8080" 25 | -------------------------------------------------------------------------------- /failure-scenarios/failed-to-schedule-pods/failed-scheduling.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: komodor-failed-scheduling 5 | labels: 6 | app: komodor-failed-scheduling 7 | spec: 8 | replicas: 1 9 | selector: 10 | matchLabels: 11 | 
app: komodor-failed-scheduling 12 | template: 13 | metadata: 14 | labels: 15 | app: komodor-failed-scheduling 16 | spec: 17 | containers: 18 | - name: nginx 19 | image: nginx:1.23.2 20 | env: 21 | - name: BITNAMI_DEBUG 22 | value: "false" 23 | - name: NGINX_HTTP_PORT_NUMBER 24 | value: "8080" 25 | resources: 26 | limits: 27 | memory: 500Gi 28 | requests: 29 | memory: 500Gi -------------------------------------------------------------------------------- /failure-scenarios/OOMKilled/oom.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: komodor-oomkilled 5 | labels: 6 | app: komodor-oomkilled 7 | spec: 8 | replicas: 1 9 | selector: 10 | matchLabels: 11 | app: komodor-oomkilled 12 | template: 13 | metadata: 14 | labels: 15 | app: komodor-oomkilled 16 | spec: 17 | containers: 18 | - name: komodor-oomkilled 19 | image: polinux/stress 20 | command: ["/bin/sh", "-c"] 21 | args: ["echo 'Going to allocate 60MB of memory!' ; echo 'Going to allocate 60MB of memory!' ; echo 'Going to allocate 60MB of memory!' 
; stress --vm 2 --vm-bytes 30M --vm-hang 120 --backoff 10000000 --verbose"] 22 | resources: 23 | requests: 24 | memory: "40Mi" 25 | limits: 26 | memory: "40Mi" 27 | -------------------------------------------------------------------------------- /deploys-scenarios/failed-deploy-creation-config-error/createcontainerconfigerror.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: komodor-create-container-config-error 5 | labels: 6 | app: komodor-create-error 7 | spec: 8 | replicas: 1 9 | selector: 10 | matchLabels: 11 | app: komodor-create-error 12 | template: 13 | metadata: 14 | labels: 15 | app: komodor-create-error 16 | spec: 17 | containers: 18 | - name: crash-demo 19 | image: nginx:1.21.6 20 | env: 21 | - name: SECRET_TOKEN 22 | valueFrom: 23 | configMapKeyRef: 24 | name: api-access-token 25 | key: SECRET_TOKEN 26 | - name: API_ENDPOINT 27 | valueFrom: 28 | configMapKeyRef: 29 | name: api-access-token 30 | key: API_ENDPOINT 31 | --- 32 | apiVersion: v1 33 | kind: ConfigMap 34 | metadata: 35 | name: api-access-token 36 | data: 37 | SECRET_TOKEN: dmFsdWUtMg0KDQo= 38 | -------------------------------------------------------------------------------- /deploys-scenarios/a-simple-deploy-with-a-configmap-change/step1.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: komodor-features-configuration 5 | data: 6 | reportToDataLake: 'true' 7 | debug: 'false' 8 | useAPIaccelerator: 'true' 9 | --- 10 | apiVersion: apps/v1 11 | kind: Deployment 12 | metadata: 13 | annotations: 14 | app: komodor-configmap-deploy 15 | labels: 16 | app: komodor-configmap-deploy 17 | name: komodor-configmap-deploy 18 | spec: 19 | replicas: 1 20 | selector: 21 | matchLabels: 22 | app: komodor-configmap-deploy 23 | template: 24 | metadata: 25 | labels: 26 | app: komodor-configmap-deploy 27 | spec: 28 
| containers:
29 |       - env:
30 |         image: nginx:1.23.2
31 |         name: nginx
32 |         volumeMounts:
33 |         - name: komodor-features-configuration
34 |           mountPath: /usr/share/app/config
35 |           subPath: komodor-features-configuration
36 |       volumes:
37 |       - name: komodor-features-configuration
38 |         configMap:
39 |           name: komodor-features-configuration
40 | 
--------------------------------------------------------------------------------
/deploys-scenarios/a-simple-deploy-with-a-configmap-change/step2.yaml:
--------------------------------------------------------------------------------
 1 | apiVersion: v1
 2 | kind: ConfigMap
 3 | metadata:
 4 |   name: komodor-features-configuration
 5 | data:
 6 |   reportToDataLake: 'true'
 7 |   debug: 'true'
 8 |   useAPIaccelerator: 'true'
 9 |   sensitivity: '5'
10 | ---
11 | apiVersion: apps/v1
12 | kind: Deployment
13 | metadata:
14 |   annotations:
15 |     app: komodor-configmap-deploy
16 |   labels:
17 |     app: komodor-configmap-deploy
18 |   name: komodor-configmap-deploy
19 | spec:
20 |   replicas: 1
21 |   selector:
22 |     matchLabels:
23 |       app: komodor-configmap-deploy
24 |   template:
25 |     metadata:
26 |       labels:
27 |         app: komodor-configmap-deploy
28 |     spec:
29 |       containers:
30 |       - env:
31 |         - name: HW_ACCELERATION_ENABLED
32 |           value: "True"
33 |         image: nginx:1.23.2
34 |         name: nginx
35 |         volumeMounts:
36 |         - name: komodor-features-configuration
37 |           mountPath: /usr/share/app/config
38 |           subPath: komodor-features-configuration
39 |       volumes:
40 |       - name: komodor-features-configuration
41 |         configMap:
42 |           name: komodor-features-configuration
43 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ![banner](./assets/img/repo-banner.png)
2 | 
3 | # Komodor Failure Scenarios
4 | 
5 | This is the go-to repository to run scenarios on your clusters. These use cases enable you to simulate day-to-day, real-life experiences.
Running these scenarios demonstrates Komodor's ability to identify and remediate them.
 6 | 
 7 | 
 8 | ## How to Use?
 9 | Pick a scenario from the list below and apply it to your cluster. Open the [Komodor user interface](https://app.komodor.com/services) and navigate to the appropriate [Service](https://app.komodor.com/services) or ConfigMap.
10 | 
11 | 
12 | ## Before Starting
13 | 
14 | Make sure you have the [Komodor agent running](https://docs.komodor.com/Learn/Install-Komodor-Agent.html) and configured on your clusters.
15 | [Configure monitors](https://app.komodor.com/main/monitors) to generate alerts for each failure.
16 | 
17 | 
18 | ## Scenarios
19 | 
20 | ### Deploy Scenarios
21 | - [Image Pull Backoff](./deploys-scenarios/failed-deploy-image-pull-backoff)
22 | - [Create Container Config Error](./deploys-scenarios/failed-deploy-creation-config-error)
23 | - [Deploy with a ConfigMap change](./deploys-scenarios/a-simple-deploy-with-a-configmap-change/)
24 | 
25 | 
26 | ### Failure Scenarios
27 | - [Out of Memory](./failure-scenarios/OOMKilled)
28 | - [Application Issue](./failure-scenarios/application-error-with-exception)
29 | - [Failed Scheduling](./failure-scenarios/failed-to-schedule-pods)
30 | 
--------------------------------------------------------------------------------
/failure-scenarios/OOMKilled/README.md:
--------------------------------------------------------------------------------
1 | # Scenario: Application Failed Because of OOMKilled
2 | 
3 | ## Why Is It Important?
4 | OOMKilled is an error that may be difficult to discover. It is transient and requires some expertise to find, and it can have a massive impact on the service itself and on other services in the same cluster (the noisy-neighbor effect).
5 | 
6 | ## Real-Life Example
7 | The application keeps getting killed because the container exceeds its memory limit. The reason for the failure is not clear: was the error caused by an application issue or an infrastructure issue?
8 | 
9 | ## How Komodor Helps?
10 | Komodor detects the OOMKilled failure and immediately shows the reason, even after it's gone from the cluster. Komodor also correlates the OOMKilled event with infrastructure failures and lets you know whether it's an application issue or an infrastructure issue.
11 | 
12 | Komodor shows the failed events on the timeline:
13 | ![banner](../../assets/img/failure-scenarios/oomkilled-timeline.png)
14 | 
15 | Komodor shows the failure reason explicitly, with all the relevant information you need to fix the issue as quickly as possible, without reading endless manuals.
16 | ![banner](../../assets/img/failure-scenarios/oomkilled-event.png)
17 | 
18 | 
19 | ## How To Run?
20 | 1. Apply [oom.yaml](oom.yaml)
21 | ``` bash
22 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/OOMKilled/oom.yaml
23 | ```
24 | 
25 | It takes ~2 minutes for the OOM to start.
26 | 
27 | 2. [Go to the relevant service in Komodor](https://app.komodor.com/services?textFilter=komodor-oomkilled) and click on the deploy event created.
28 | 
--------------------------------------------------------------------------------
/deploys-scenarios/a-simple-deploy-with-a-configmap-change/README.md:
--------------------------------------------------------------------------------
 1 | # Scenario: Correlate Deploy Event with ConfigMap Changes
 2 | 
 3 | ## Why Is It Important?
 4 | Many services use a ConfigMap to separate the code from the running configuration. When a value changes in the ConfigMap, it is very hard to correlate it with the deployment change.
 5 | 
 6 | ## Real-Life Example
 7 | A user pushes a change to a configuration value. When someone else, like yourself, comes to troubleshoot, the configuration change is invisible.
 8 | 
 9 | ## How Komodor Helps?
10 | Komodor correlates changes across the system to a deploy event in a service.
You can quickly identify all changes related to a specific deploy simply by clicking on the deploy event.
11 | 
12 | Komodor shows the deploy events on the timeline:
13 | ![banner](../../assets/img/deploy-scenarios/clean-timeline-with-deploy-event.png)
14 | 
15 | For each deploy event you have the full information about the deploy:
16 | ![banner](../../assets/img/deploy-scenarios/deploy-event-explanation.png)
17 | 
18 | 
19 | ## How To Run?
20 | 1. Apply step1.yaml
21 | ``` bash
22 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/a-simple-deploy-with-a-configmap-change/step1.yaml
23 | ```
24 | 2. Apply step2.yaml
25 | ``` bash
26 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/a-simple-deploy-with-a-configmap-change/step2.yaml
27 | ```
28 | 3. [Go to the relevant service in Komodor](https://app.komodor.com/services?textFilter=komodor-configmap-deploy) and click on the deploy event created.
29 | 
--------------------------------------------------------------------------------
/deploys-scenarios/failed-deploy-image-pull-backoff/README.md:
--------------------------------------------------------------------------------
 1 | # Scenario: Troubleshoot ImagePullBackOff
 2 | 
 3 | ## Why Is It Important?
 4 | From time to time, an image pull error can occur and prevent the pods from starting and running the application. Usually it happens when the repository is not accessible or doesn't contain the image.
 5 | 
 6 | ## Real-Life Example
 7 | This problem can happen when someone changes the credentials to the repository, making its pull secret invalid, or when someone changes the image name/tag of a third-party tool.
 8 | 
 9 | 
10 | ## How Komodor Helps?
11 | Komodor shows the failure reason, the explanation for the error, and what changed in the latest deploy. The user who gets the error understands much faster why the service failed and what to fix.
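If you also want to see the raw failure from the cluster side, a minimal `kubectl` sketch (the label selector is taken from this scenario's manifests; output depends on your cluster state):

``` bash
# Pods stuck pulling show ErrImagePull / ImagePullBackOff in the STATUS column
kubectl get pods -l app=komodor-imagepull-backoff

# The pod events contain the exact registry error (e.g. tag not found)
kubectl describe pods -l app=komodor-imagepull-backoff
```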
12 | 
13 | Komodor shows the failed deploy events on the timeline:
14 | ![banner](../../assets/img/deploy-scenarios/image-pull-err-timeline.png)
15 | 
16 | For each deploy event you have the full information about the deploy, with the errors that caused it to fail:
17 | ![banner](../../assets/img/deploy-scenarios/image-pull-event.png)
18 | 
19 | You can click on the diff to see the configuration changes made during this deploy:
20 | ![banner](../../assets/img/deploy-scenarios/image-pull-diff.png)
21 | 
22 | 
23 | ## How To Run?
24 | 1. Apply a healthy deployment:
25 | ``` bash
26 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-image-pull-backoff/nginx-image-healthy.yaml
27 | ```
28 | 1. Apply the same deployment with a wrong image tag:
29 | ``` bash
30 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-image-pull-backoff/imagepullbackoff.yaml
31 | ```
32 | 1. [Go to the relevant service in Komodor](https://app.komodor.com/services?textFilter=komodor-imagepull-backoff) and click on the deploy event created.
33 | 
--------------------------------------------------------------------------------
/deploys-scenarios/failed-deploy-creation-config-error/README.md:
--------------------------------------------------------------------------------
1 | # Scenario: Investigate a Failed Deploy Caused by a Bad Reference to a ConfigMap
2 | 
3 | ## Why Is It Important?
4 | Many services use ConfigMaps to separate the code from the running configuration. When a deployment references a non-existent ConfigMap, the new pods fail to be created.
5 | 
6 | ## Real-Life Example
7 | A user wants to add a new configuration to the deployment, so they create a new ConfigMap and change the deployment configuration to read values from it.
But the user references a ConfigMap that is not in the cluster, perhaps because of a small typo, and the pods fail to be created.
 8 | 
 9 | ## How Komodor Helps?
10 | Komodor detects the failed deploy, correlates it with the applied changes to the deployment configuration and the ConfigMap, and shows exactly why the deploy failed with a clear explanation.
11 | 
12 | Komodor shows the deploy events on the timeline:
13 | ![banner](../../assets/img/deploy-scenarios/config-error-failed-deploy-on-the-timeline.png)
14 | 
15 | For each deploy event you have the full information about the deploy and why it failed:
16 | ![banner](../../assets/img/deploy-scenarios/config-error-failed-deploy-event.png)
17 | 
18 | 
19 | ## How To Run?
20 | 1. Apply a deployment with a healthy status:
21 | ``` bash
22 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-creation-config-error/healthy-deploy.yaml
23 | ```
24 | 2. Apply the deployment with a bad reference to a ConfigMap key:
25 | ``` bash
26 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-creation-config-error/createcontainerconfigerror.yaml
27 | ```
28 | 3. [Go to the relevant service in Komodor](https://app.komodor.com/services?textFilter=komodor-create-container-config-error) and click on the deploy event created.
29 | 
--------------------------------------------------------------------------------
/failure-scenarios/failed-to-schedule-pods/README.md:
--------------------------------------------------------------------------------
1 | # Scenario: My Application's Pods Failed to Schedule
2 | 
3 | ## Why Is It Important?
4 | Pod and node constraints can make scheduling challenging. When there is no node for a pod to run on, the pod fails with a **FailedScheduling** event, which causes a negative impact, especially during scaling and the rollout of a new version.
5 | 
6 | ## Real-Life Example
7 | 1. 
During scale-up, many pods spawn to support the load; however, if there are no available nodes in the cluster, the users will not be served.
 8 | 2. An application pod requests a large amount of memory, which is unavailable, causing the rollout to fail.
 9 | 
10 | ## How Komodor Helps?
11 | Komodor detects anytime a pod **fails to schedule** and creates an event with a clear explanation of **why it failed to schedule**.
12 | 
13 | Komodor shows the failed deploy events on the timeline:
14 | ![banner](../../assets/img/failed-scenarios/failed-scheduling-timeline.png)
15 | 
16 | For each deploy event you have the full information about the deploy:
17 | ![banner](../../assets/img/failed-scenarios/failed-scheduling-event.png)
18 | 
19 | Note: this pod requests 500Gi of memory; please make sure your autoscaler cannot provision nodes of this size.
20 | 
21 | ## How To Run?
22 | 1. Apply a healthy deployment:
23 | ``` bash
24 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/failed-to-schedule-pods/healthy-deploy.yaml
25 | ```
26 | 
27 | 
28 | 2. Apply [failed-scheduling.yaml](failed-scheduling.yaml)
29 | ``` bash
30 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/failed-to-schedule-pods/failed-scheduling.yaml
31 | ```
32 | 
33 | It takes at least 10 minutes for Kubernetes to mark this deploy as failed.
34 | 
35 | 3. [Go to the relevant service in Komodor](https://app.komodor.com/services?textFilter=komodor-failed-scheduling) and click on the deploy event created.
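To see the same signal from the cluster side, a minimal `kubectl` sketch (the label selector is taken from this scenario's manifests; output depends on your cluster state):

``` bash
# The replacement pod stays Pending because no node has 500Gi of memory
kubectl get pods -l app=komodor-failed-scheduling

# FailedScheduling events spell out the unsatisfied constraint
kubectl get events --field-selector reason=FailedScheduling
```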
36 | 
37 | 
--------------------------------------------------------------------------------
/training-session/run-all.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | 
 3 | echo ""
 4 | echo "Please insert your name to create a new namespace: (lower case, no spaces allowed)"
 5 | read NS_NAME
 6 | 
 7 | kubectl create ns "$NS_NAME"
 8 | 
 9 | echo ""
10 | echo "Deploy services? (n/N/y/Y)"
11 | read ANSWER
12 | if [[ $ANSWER == "Y" ]] || [[ $ANSWER == "y" ]]
13 | then
14 |   kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-image-pull-backoff/nginx-image-healthy.yaml -n "$NS_NAME"
15 |   kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-creation-config-error/healthy-deploy.yaml -n "$NS_NAME"
16 |   kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/application-error-with-exception/simple-application.yaml -n "$NS_NAME"
17 |   kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/OOMKilled/oom.yaml -n "$NS_NAME"
18 |   sleep 5
19 |   kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-image-pull-backoff/imagepullbackoff.yaml -n "$NS_NAME"
20 |   sleep 5
21 |   kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-creation-config-error/createcontainerconfigerror.yaml -n "$NS_NAME"
22 |   sleep 5
23 |   kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/application-error-with-exception/application-error.yaml -n "$NS_NAME"
24 | else
25 |   echo "Skipping..."
26 | fi
27 | 
28 | echo ""
29 | echo " WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! "
30 | echo ""
31 | echo " Do you want to delete your environment? 
(n/N/y/Y)" 32 | echo "" 33 | read ANSWER 34 | if [[ $ANSWER == "Y" ]] || [[ $ANSWER == "y" ]] 35 | then 36 | kubectl delete ns $NS_NAME 37 | else 38 | echo "Skipping..." 39 | fi 40 | 41 | echo "END - Thank you" -------------------------------------------------------------------------------- /failure-scenarios/application-error-with-exception/simple-application.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: komodor-python-script 5 | data: 6 | python-script: |- 7 | import time 8 | import os 9 | import sys 10 | import logging 11 | logging.basicConfig(stream=sys.stdout, level=logging.INFO) 12 | 13 | exit_code = int(os.getenv('EXIT_CODE')) if os.getenv('EXIT_CODE') else 0 14 | 15 | def start_service(): 16 | initialize_connections() 17 | 18 | def initialize_connections(): 19 | fetch_configuration() 20 | 21 | def fetch_configuration(): 22 | create_connection() 23 | 24 | def create_connection(): 25 | conn_auth() 26 | 27 | def conn_auth(): 28 | if exit_code == 0: 29 | logging.info("connection established") 30 | else: 31 | raise Exception("Can't perform the requested task - authentication error") 32 | 33 | time.sleep(10) 34 | start_service() 35 | while True: 36 | logging.info("service loop") 37 | time.sleep(10) 38 | --- 39 | apiVersion: apps/v1 40 | kind: Deployment 41 | metadata: 42 | annotations: 43 | app: komo-application-error 44 | labels: 45 | app: komo-application-error 46 | name: komo-application-error 47 | spec: 48 | replicas: 1 49 | selector: 50 | matchLabels: 51 | app: komo-application-error 52 | template: 53 | metadata: 54 | labels: 55 | app: komo-application-error 56 | spec: 57 | containers: 58 | - env: 59 | - name: LOG_LEVEL 60 | value: "INFO" 61 | - name: EXIT_CODE 62 | value: "0" 63 | image: python:3.11-alpine 64 | name: python 65 | command: ["python"] 66 | args: ["/usr/share/app/code.py"] 67 | volumeMounts: 68 | - name: komodor-python-script 69 | mountPath: 
/usr/share/app/code.py 70 | subPath: python-script 71 | volumes: 72 | - name: komodor-python-script 73 | configMap: 74 | name: komodor-python-script 75 | -------------------------------------------------------------------------------- /failure-scenarios/application-error-with-exception/application-error.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: komodor-python-script 5 | data: 6 | python-script: |- 7 | import time 8 | import os 9 | import sys 10 | import logging 11 | logging.basicConfig(stream=sys.stdout, level=logging.INFO) 12 | 13 | exit_code = int(os.getenv('EXIT_CODE')) if os.getenv('EXIT_CODE') else 0 14 | 15 | def start_service(): 16 | initialize_connections() 17 | 18 | def initialize_connections(): 19 | fetch_configuration() 20 | 21 | def fetch_configuration(): 22 | create_connection() 23 | 24 | def create_connection(): 25 | conn_auth() 26 | 27 | def conn_auth(): 28 | if exit_code == 0: 29 | logging.info("connection established") 30 | else: 31 | logging.info("a problem detected, exit code env var is " + str(exit_code)) 32 | raise Exception("Can't perform the requested task - authentication error") 33 | 34 | time.sleep(10) 35 | start_service() 36 | while True: 37 | logging.info("service loop") 38 | time.sleep(10) 39 | --- 40 | apiVersion: apps/v1 41 | kind: Deployment 42 | metadata: 43 | annotations: 44 | app: komo-application-error 45 | labels: 46 | app: komo-application-error 47 | name: komo-application-error 48 | spec: 49 | replicas: 1 50 | selector: 51 | matchLabels: 52 | app: komo-application-error 53 | template: 54 | metadata: 55 | labels: 56 | app: komo-application-error 57 | spec: 58 | containers: 59 | - env: 60 | - name: LOG_LEVEL 61 | value: "INFO" 62 | - name: EXIT_CODE 63 | value: "1" 64 | image: python:3.11-alpine 65 | name: python 66 | command: ["python"] 67 | args: ["/usr/share/app/code.py"] 68 | volumeMounts: 69 | - name: 
komodor-python-script 70 | mountPath: /usr/share/app/code.py 71 | subPath: python-script 72 | volumes: 73 | - name: komodor-python-script 74 | configMap: 75 | name: komodor-python-script -------------------------------------------------------------------------------- /failure-scenarios/application-error-with-exception/README.md: -------------------------------------------------------------------------------- 1 | # Scenario: Application Failure 2 | 3 | ## Why Is It Important? 4 | Every service is prone to application failures. Some of these failures are minor, while others can cause the pods to crash and the application to become completely unavailable. 5 | 6 | ## Real-Life Example 7 | A developer pushed new code containing a bug to the service, and since then the service's pods have been continuously crashing. 8 | 9 | ## How Does Komodor Help? 10 | Komodor automatically detects that there is an issue and immediately runs a playbook to investigate its root cause. 11 | Instead of surfacing only an unindicative message like `CrashLoopBackOff`, Komodor shows the user all checks and findings that point to the root cause - an application issue - correlated with 12 | the container logs, metrics, and recent changes, so the problem can be solved as quickly as possible. 13 | 14 | 15 | Komodor shows the availability issue & the failed deploy events on the timeline: 16 | ![banner](../../assets/img/failure-scenarios/application-issue-availability-issue-detected.png) 17 | 18 | In the availability issue, you have the full information about the issue, such as the time, reason, explanation, related information, and logs. It's easy to identify that this issue is caused by an application problem: 19 | ![banner](../../assets/img/failure-scenarios/application-issue-event.png) 20 | 21 | You can also view logs to debug the application: 22 | ![banner](../../assets/img/failure-scenarios/application-issue-logs.png) 23 | 24 | 25 | ## How To Run? 26 |
Apply a healthy deployment: 27 | ``` bash 28 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/application-error-with-exception/simple-application.yaml 29 | ``` 30 | 31 | 1. Apply the same deployment with an application issue: 32 | ``` bash 33 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/application-error-with-exception/application-error.yaml 34 | ``` 35 | 1. [Go to the relevant service in Komodor](https://app.komodor.com/services?textFilter=komo-application-error) and click on the availability issue that was created. 36 | -------------------------------------------------------------------------------- /training-session/README.md: -------------------------------------------------------------------------------- 1 | ### Namespace Creation 2 | 3 | 1. Create your namespace: 4 | 5 | ```bash 6 | kubectl create ns [user] 7 | ``` 8 |
10 |
11 | 12 | ### First Deploy + Service Operations 13 | 14 | 1. Deploy a new deployment 15 | 16 | ```bash 17 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-image-pull-backoff/nginx-image-healthy.yaml -n [user] 18 | ``` 19 | 20 | 2. Review the service created in Komodor 21 | 22 | 3. Delete one of the pods and check the new pod that just spawned 23 | 24 | 4. Scale the replicas to 2 using the dedicated button at the top of the screen 25 | 26 | 5. Change the image to version 1.20.0 using the “edit yaml” button and check the deploy that just started 27 | 28 |
29 |
30 | 31 | ### Failed Deploy - Image Errors 32 | 33 | 1. Change the image of the deployment using this configuration 34 | 35 | ```bash 36 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-image-pull-backoff/imagepullbackoff.yaml -n [user] 37 | ``` 38 | 39 | 2. Check the failed deploy and revert it using the button in the suggested actions section 40 | 41 |
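A failed pull like the one above leaves the pods in `ImagePullBackOff`: the kubelet keeps retrying the pull, roughly doubling the delay between attempts up to a cap. A minimal sketch of that backoff curve (the 10-second base and 5-minute cap mirror common kubelet defaults, but treat the exact values as an assumption, not something read from any cluster):

```python
# Rough model of the image-pull backoff behind ImagePullBackOff:
# the retry delay doubles after every failed pull, up to a cap.
# The base and cap here are assumed defaults.
BASE_DELAY_S = 10
MAX_DELAY_S = 300  # 5 minutes

def pull_backoff_delay(failed_attempts: int) -> int:
    """Seconds to wait before the next pull after N failed attempts."""
    if failed_attempts <= 0:
        return 0
    return min(BASE_DELAY_S * 2 ** (failed_attempts - 1), MAX_DELAY_S)

if __name__ == "__main__":
    # delays grow 10, 20, 40, ... and then flatten at the cap
    print([pull_backoff_delay(n) for n in range(1, 8)])
```

This is why a broken image reference shows up as increasingly spaced pull attempts in the pod events rather than a single hard failure.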
42 |
43 | 44 | ### Application View 45 | 46 | Application views allow you to scope Komodor to your own application and get insights. 47 | 48 | 1. [Create a new application view](https://app.komodor.com/app-view/new) and select a dynamic scope for your own namespace. 49 | 50 | 51 | ### Failed Deploy - Configuration Errors 52 | 53 | 1. Deploy a new deployment 54 | 55 | ```bash 56 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-creation-config-error/healthy-deploy.yaml -n [user] 57 | ``` 58 | 59 | 2. After the deploy is completed, apply the next deploy - which is going to fail 60 | 61 | ```bash 62 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/deploys-scenarios/failed-deploy-creation-config-error/createcontainerconfigerror.yaml -n [user] 63 | ``` 64 | 65 | 3. Once you understand the reason for the failure, edit the YAML and fix the issue 66 | 67 |
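A deploy that fails with `CreateContainerConfigError` typically means the container references configuration that does not exist. A hypothetical pre-deploy sanity check for one common cause - an env var pointing at a missing ConfigMap key. The dict shapes loosely mirror the Kubernetes `env[].valueFrom.configMapKeyRef` structure, and the names `app-config` and `log-level` are made up for illustration:

```python
# Hypothetical check for the failure mode behind CreateContainerConfigError:
# an env var referencing a ConfigMap key that does not exist.
def missing_configmap_refs(env_vars, configmaps):
    """Return (configmap, key) pairs referenced by env_vars but absent."""
    missing = []
    for var in env_vars:
        ref = var.get("valueFrom", {}).get("configMapKeyRef")
        if not ref:
            continue  # plain `value:` env vars cannot cause this error
        data = configmaps.get(ref["name"])
        if data is None or ref["key"] not in data:
            missing.append((ref["name"], ref["key"]))
    return missing

env = [{"name": "LOG_LEVEL",
        "valueFrom": {"configMapKeyRef": {"name": "app-config",
                                          "key": "log-level"}}}]
# a typo in the ConfigMap data is enough to fail container creation
print(missing_configmap_refs(env, {"app-config": {"log_level": "INFO"}}))
```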
68 |
69 | 70 | ### Troubleshooting - CrashLoop 71 | 72 | 1. Deploy a new deployment 73 | 74 | ```bash 75 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/application-error-with-exception/simple-application.yaml -n [user] 76 | ``` 77 | 78 | 2. Wait for it to be healthy, then apply a new version of this deployment 79 | 80 | ```bash 81 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/application-error-with-exception/application-error.yaml -n [user] 82 | ``` 83 | 84 | 3. Fix the issue by reverting to the previous version or editing the YAML. 85 | Advanced users: you can also edit the code that resides in the ConfigMap and force a rollout of the new version. 86 | 87 |
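The crash loop in this scenario comes from the Python script shipped in the scenario's ConfigMap: when `EXIT_CODE` is non-zero, the startup call chain raises, the container exits, and Kubernetes restarts it until the pod reaches `CrashLoopBackOff`. A condensed, standalone version of that script's failure path:

```python
# Condensed version of the ConfigMap script from this scenario.
# With EXIT_CODE != 0 the startup call chain raises and the container
# exits non-zero, which is what drives the pod into CrashLoopBackOff.
import os

def conn_auth(exit_code):
    if exit_code == 0:
        return "connection established"
    raise Exception("Can't perform the requested task - authentication error")

def start_service(exit_code):
    # stands in for the script's start_service -> initialize_connections
    # -> fetch_configuration -> create_connection -> conn_auth chain
    return conn_auth(exit_code)

exit_code = int(os.getenv("EXIT_CODE", "0"))
try:
    print(start_service(exit_code))       # healthy deploy: EXIT_CODE=0
except Exception as err:
    print("container would crash:", err)  # broken deploy: EXIT_CODE=1
```

The only difference between `simple-application.yaml` and `application-error.yaml` is the `EXIT_CODE` env var flipping from `"0"` to `"1"`, which is exactly the kind of change Komodor correlates with the availability issue.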
88 |
89 | 90 | ### Troubleshooting - OOMKilled 91 | 92 | 1. Apply a new deployment 93 | 94 | ```bash 95 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/failure-scenarios/OOMKilled/oom.yaml -n [user] 96 | ``` 97 | 98 | 2. Review the issues created and the suggested actions. 99 | 100 | 3. Based on the suggested actions, change the memory limits to 75Mi. 101 | 102 |
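The suggested fix here is raising the container's memory limit to 75Mi. Kubernetes expresses such limits as quantity strings; a small sketch for comparing usage against a limit (it handles only the binary Ki/Mi/Gi suffixes used in these tutorials, not the full Kubernetes quantity syntax):

```python
# Helper for reasoning about a limit like 75Mi: convert binary-suffix
# quantities (Ki/Mi/Gi only - a subset of the Kubernetes quantity
# syntax) into bytes and compare usage against the limit.
_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}

def quantity_to_bytes(quantity: str) -> int:
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # plain integer means bytes

def would_oom(usage_bytes: int, limit: str) -> bool:
    """True when usage exceeds the limit, i.e. the kernel OOM-kills."""
    return usage_bytes > quantity_to_bytes(limit)

print(would_oom(80 * 1024 ** 2, "75Mi"))  # 80Mi of usage vs a 75Mi limit
```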
103 |
104 | 105 | ### Troubleshooting - Probes 106 | 107 | 1. Deploy a new deployment 108 | 109 | ```bash 110 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/add-ready-live-example/failure-scenarios/ready-live-failure/healthy-app.yaml -n [user] 111 | ``` 112 | 113 | 2. Check the best practices section in the info tab. Which probes are missing? 114 | 115 | 3. Configure the probes by applying this YAML 116 | 117 | ```bash 118 | kubectl apply -f https://raw.githubusercontent.com/komodorio/komodor-tutorials/add-ready-live-example/failure-scenarios/ready-live-failure/fail-both.yaml -n [user] 119 | ``` 120 | 121 | 4. Review the diff of the deploy on the deploy event 122 | 123 | 5. Check why the deploy failed - because of a liveness probe failure (you can find it in the events of the pod) 124 | 125 | 6. Change the configuration of the probes using the "edit YAML" button 126 | 127 | ## Delete Your Namespace 128 | 129 | ```bash 130 | kubectl delete ns [user] 131 | ``` 132 | 133 | 134 | ## Run all 135 | ```bash 136 | bash <(curl -s https://raw.githubusercontent.com/komodorio/komodor-tutorials/master/training-session/run-all.sh) 137 | ``` 138 | --------------------------------------------------------------------------------