├── .gitignore
├── .markdownlintrc
├── .travis.yml
├── README.md
└── site
    ├── Dockerfile
    ├── Makefile
    ├── favicon.png
    ├── index.html
    ├── start.sh
    ├── template.html
    └── update.py


/.gitignore:
--------------------------------------------------------------------------------
.*
!.markdownlintrc
!.travis.yml

--------------------------------------------------------------------------------
/.markdownlintrc:
--------------------------------------------------------------------------------
{
    "comment": "https://github.com/DavidAnson/markdownlint",

    "default": true,
    "fenced-code-language": false,
    "heading-increment": false,
    "no-trailing-punctuation": false,
    "line-length": false,
    "MD025": false,
    "MD037": false,
    "MD022": false,
    "MD046": false,
    "MD047": false,
    "MD007": { "indent": 4 }
}

--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
language: generic

install:
  - npm install -g markdownlint-cli

script:
  - markdownlint README.md

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Kubernetes Failure Stories

The source code repository for https://k8s.af moved to https://codeberg.org/hjacobs/kubernetes-failure-stories

--------------------------------------------------------------------------------
/site/Dockerfile:
--------------------------------------------------------------------------------
FROM python:3.8-slim

RUN pip install requests markdown

COPY update.py /
COPY template.html /

WORKDIR /workdir

ENTRYPOINT ["python", "/update.py"]

--------------------------------------------------------------------------------
/site/Makefile:
--------------------------------------------------------------------------------
build:
	docker build -t kubernetes-failure-stories .

--------------------------------------------------------------------------------
/site/favicon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hjacobs/kubernetes-failure-stories/3b0774268a60855861058fe1a6122b9b34efb6a1/site/favicon.png

--------------------------------------------------------------------------------
/site/index.html:
--------------------------------------------------------------------------------
Kubernetes Failure Stories

Kubernetes Failure Stories

A compiled list of links to public failure stories related to Kubernetes.
Most recent publications on top.

Why

Kubernetes is a fairly complex system with many moving parts.
Its ecosystem is constantly evolving and adding even more layers (service mesh, ...) to the mix.
Considering this environment, we don't hear enough real-world horror stories to learn from each other!
This compilation of failure stories should make it easier for people dealing with Kubernetes operations (SRE, Ops, platform/infrastructure teams) to learn from others and reduce the unknown unknowns of running Kubernetes in production.
For more information, see the blog post.

Contributing

Please help the community and share a link to your failure story by opening a Pull Request!
Failure stories can be anything like blog posts, conference/meetup talks, incident postmortems, tweetstorms, ...

I would also be glad to hear about your failure stories on Twitter: my handle is @try_except_


--------------------------------------------------------------------------------
/site/start.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# Copy the favicon next to the generated index.html.
cp favicon.png /var/www/k8s.af/workdir/

IMAGE=kubernetes-failure-stories
docker build -t "$IMAGE" .
# Run detached as the current user so files written to the mounted workdir keep that user's ownership.
docker run -d --name kubernetes-failure-stories -u "$(id -u)" -v /var/www/k8s.af/workdir:/workdir --restart=always "$IMAGE"

--------------------------------------------------------------------------------
/site/template.html:
--------------------------------------------------------------------------------
Kubernetes Failure Stories

{{content}}

--------------------------------------------------------------------------------
/site/update.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3

import markdown
import requests
import time

from pathlib import Path

# Markdown source that is rendered into index.html.
README_URL = "https://codeberg.org/hjacobs/kubernetes-failure-stories/raw/branch/master/README.md"

template = (Path(__file__).parent / "template.html").read_text()

while True:
    try:
        response = requests.get(README_URL, timeout=5)
        response.raise_for_status()
    except requests.RequestException as e:
        # Keep the previous index.html and try again on the next cycle.
        print(f"Failed to fetch README: {e}")
    else:
        # Convert the Markdown to HTML and insert it into the template.
        html = markdown.markdown(response.text)
        out = template.replace("{{content}}", html)
        # Written relative to the working directory (/workdir in the container).
        Path("index.html").write_text(out)

    # Refresh every five minutes.
    time.sleep(300)

--------------------------------------------------------------------------------