Table of Contents
=================

* [Important](#important)
  * [My experience and tips](#my-experience-and-tips)
  * [Certificate](#certificate)
  * [What can we do](#what-can-we-do)
  * [Tips](#tips)
* [Cheatsheet](#cheatsheet)
  * [tmux](#tmux)
  * [various tools](#various-tools)
  * [yaml](#yaml)
  * [k8s](#k8s)
* [Docs to study](#docs-to-study)
  * [Must study](#must-study)
  * [Recommended](#recommended)
  * [Docs that may be out-of-scope for CKAD](#docs-that-may-be-out-of-scope-for-ckad)
* [Exercises](#exercises)

# Important

## My experience and tips
Here is a video I created with my experience plus tips for better exam preparation: https://youtu.be/qA4dNATs5nE.

## Certificate
- Don't forget to read the exam resources on the official CKAD page: https://www.cncf.io/certification/ckad/.
- The Candidate must contact the Exam Proctoring Partner's support team within 15 minutes of the scheduled start time of their reservation to report an issue. Otherwise, they will be marked as a "No Show" for the exam.
- The Candidate is not allowed to read the questions out loud, even to themselves, during the exam.
- The Candidate is permitted to drink clear liquids from a label-free clear bottle or a clear glass.
- The Candidate is not allowed to wear any electronic device in their ears, on their face, or on their body.
- During the exam, the Candidate may only run one application: the Chrome/Chromium browser in which the exam is shown.
- The Candidate is not allowed to write or enter input on anything (whether paper, electronic device, etc.) outside of the exam console screen.

## What can we do
- Root privileges can be obtained by running `sudo -i`.
- Rebooting your server is permitted at any time.
- `Ctrl+C` and `Ctrl+V` are not supported in the exam terminal.
  - For Linux: select text to copy and use the `middle button` to paste (or press both the `left` and `right` buttons simultaneously if you have no middle button).
  - For Mac: `⌘+C` to copy and `⌘+V` to paste.
  - For Windows: `Ctrl+Insert` to copy and `Shift+Insert` to paste.
- Use `Ctrl+Alt+W` or `Esc+Backspace` instead of `Ctrl+W` to delete words.
- Issues with wrapped text within the terminal pane may be resolved by temporarily resizing your browser window.
- You can confirm the time remaining with the proctor directly.

## Tips
- Get used to the browser terminal: https://www.katacoda.com/courses/kubernetes/launch-single-node-cluster
- Skip hard questions and come back to them later (flag them in the provided notepad).
- Pay attention to the point value of each question. Skip difficult, low-value questions!
- Use the documentation: https://kubernetes.io/docs
- Don't be afraid of using `sudo -i` for root operations to save time.
- After SSHing to a given node (the ssh command will be given), don't forget to `exit` to return to the main machine.
- **WARN**: if an exercise says that you need to find the broken `service` and fix only the affected k8s object, describe the `service`, look at its selector, and find the pod matching that selector: that pod is the failing object, even though other Kubernetes objects may look suspicious. See the sketch right below.
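
A minimal sketch of that selector-based debugging flow (`my-service` and the `app=web` label are hypothetical names):

```bash
# Describe the service and note its Selector line
kubectl describe service my-service | grep -i selector

# List the pods matched by that selector (e.g. app=web)
kubectl get pods -l app=web

# Inspect the matching pod for the actual failure
kubectl describe pod <pod-name>
```
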
- **WARN**: if a livenessProbe is requested through something like "Implement a liveness probe which checks the container to be reachable on port 80", it should be implemented as a `tcpSocket` probe and not an `httpGet` one.

# Cheatsheet

## tmux

```markdown
# Tmux: https://www.linode.com/docs/networking/ssh/persistent-terminal-sessions-with-tmux/
- Prefix: ctrl+b
- Optional: use Space as prefix with: `set -g prefix 'C-Space'`
- New window: Prefix + c
- Split horizontally: Prefix + "
- Split vertically: Prefix + %

- List sessions: tmux ls
- Attach session: tmux attach -t 0
- Destroy all sessions: tmux kill-server

## Inside tmux
- set mouse # for copy/paste with `Prefix + [` and `Prefix + ]`
- set synchronize-panes
```

## various tools
```bash
# grep
grep -Hnri # print file name, print line number, recursive, ignore case

# vim
set expandtab tabstop=2 shiftwidth=2 softtabstop=2 incsearch
set list # to check if you have a mix of <tabs> and <spaces>
retab    # to fix it in case you have a mix of <tabs> and <spaces>

# ubuntu
apt update
apt install net-tools dnsutils procps

# wget
wget -qO-     # quiet, output to - (stdout)
wget --spider # just check the page is there

# cron syntax: https://en.wikipedia.org/wiki/Cron
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
# │ │ │ │ ┌───────────── day of the week (0 - 6) (Sunday to Saturday;
# │ │ │ │ │              7 is also Sunday on some systems)
# │ │ │ │ │
# │ │ │ │ │
# * * * * * command to execute

# Note: A question mark (?) in the schedule has the same meaning as an asterisk (*)

# Examples: https://crontab.guru/
# */1 * * * * => every minute
# 1 * * * *   => at minute 1 (18:01, 19:01, ...)
```

## yaml
```yaml
# https://stackoverflow.com/questions/3790454/how-do-i-break-a-string-over-multiple-lines

# Usually you want the lines folded into one (no preserved linebreaks); use >:
key: >
  Your long
  string here.

# If you want the linebreaks to be preserved as \n in the string (for instance, embedded markdown with paragraphs), use |:
key: |
  ### Heading

  * Bullet
  * Points

# If you need to split lines in the middle of words or literally type linebreaks as \n, use double quotes instead:
key: "Antidisestab\
  lishmentarianism.\n\nGet on it."

# HINT: Use >- or |- instead if you don't want a linebreak appended at the end.
```

## k8s
```bash
alias k=kubectl
alias kn="kubectl config set-context --current --namespace"
source <(kubectl completion bash)
complete -F __start_kubectl k

# quickly visit the documentation of a k8s resource (really recommended you master it)
kubectl explain [k8s_resource_path] # example: kubectl explain pod.spec.volumes

# useful to sort by timestamp when watching events
kubectl get events --sort-by .lastTimestamp # add -w to watch

# check if the metrics-server is installed
kubectl get apiservices | grep metrics

# run including a command!
kubectl run busybox --image=busybox --restart=Never --dry-run -o yaml -- /bin/sh -c 'echo $(date); sleep 3600'

# test connectivity with a timeout (5 seconds)
kubectl run curl --image=radial/busyboxplus -it --rm --restart=Never -- curl -m 5 my-service:8080 # curl
kubectl run wget --image=busybox -it --rm --restart=Never -- wget --timeout 5 -O- my-service:8080 # wget
kubectl run wget --image=busybox -it --rm --restart=Never -- nc -w 5 -zv my-service 8080 # netcat TCP
kubectl run wget --image=busybox -it --rm --restart=Never -- nc -w 5 -zuv my-service 8181 # netcat UDP

# Flush DNS (in case you are seeing issues after pointing a k8s service to a different location)
kubectl get pod -n kube-system | grep dns
kubectl delete pod -n kube-system <dns-pod-name>

kubectl get pod mypod -o yaml --export > mypod.yaml # export the spec without status (WARN: does not include the namespace!)
kubectl api-resources
kubectl api-versions # https://akomljen.com/kubernetes-api-resources-which-group-and-version-to-use/
kubectl autoscale deployment <deployment> --min=2 --max=10
kubectl annotate pods <pod> icon-url=http://goo.gl/XXBTWq
kubectl logs my-pod --previous # dump pod logs (stdout) for a previous instantiation of a container
kubectl delete pods,services -l name=myLabel # delete pods and services with label name=myLabel
kubectl -n my-ns delete pod,svc --all # delete all pods and services in namespace my-ns
kubectl attach my-pod -i # attach to a running container
kubectl port-forward my-pod 5000:6000 # listen on port 5000 on the local machine and forward to port 6000 on my-pod
kubectl top pod POD_NAME --containers # show metrics for a given pod and its containers
kubectl logs --previous # also works with deploy/job

kubectl set image deployment/<deployment> <container>=<new-image> --record # rolling update & rollbacks
kubectl rollout history deployment/<deployment> [--revision=<n>]
kubectl rollout undo deployment/<deployment> [--to-revision=<n>]

kubectl describe networkpolicies <name> # very useful to see the ingress/egress rules
```

# Docs to study

Most of the links were taken from:
- https://github.com/dgkanatsios/CKAD-exercises
- https://github.com/twajr/ckad-prep-notes#tasks-from-kubernetes-doc

## Must study

The following links are the ones you really need to learn.

https://kubernetes.io/docs/reference/kubectl/cheatsheet/
- A must-read; come back here many times until you grasp every tip.

https://kubernetes.io/docs/concepts/services-networking/network-policies/
- Network policies do not conflict; they are `additive`. If any policy or policies select a pod, the pod is restricted to what is allowed by the union of those policies' ingress/egress rules. Thus, the order of evaluation does not affect the policy result.
- `podSelector`: each NetworkPolicy includes a podSelector which selects the grouping of pods to which the policy applies. The example policy selects pods with the label "role=db". An empty podSelector selects all pods in the namespace.
- `policyTypes`: each NetworkPolicy includes a policyTypes list which may include `Ingress`, `Egress`, or both. The policyTypes field indicates whether the given policy applies to ingress traffic to the selected pods, egress traffic from the selected pods, or both. If no policyTypes are specified on a NetworkPolicy, then by default Ingress is always set, and Egress is set if the NetworkPolicy has any egress rules.
- When combining `namespaceSelector` and `podSelector`, be careful to use the correct YAML syntax:

```yaml
# Contains a single from element allowing connections from Pods with the label role=client IN namespaces with the label user=alice.
ingress:
- from:
  - namespaceSelector:
      matchLabels:
        user: alice
    podSelector:
      matchLabels:
        role: client
---
# Contains two elements in the from array, and allows connections from Pods in the local Namespace with the label role=client OR from any Pod in any namespace with the label user=alice.
ingress:
- from:
  - namespaceSelector:
      matchLabels:
        user: alice
  - podSelector:
      matchLabels:
        role: client
```
- Examples:

```yaml
# Default deny all ingress traffic (in a namespace)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress

---

# Default allow all ingress traffic (in a namespace)
# This works even if policies are added that cause some pods to be treated as "isolated"
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-ingress
spec:
  podSelector: {}
  ingress:
  - {}
  policyTypes:
  - Ingress

# The same patterns apply to egress traffic (or both!)
---

# Bonus: how to keep DNS resolution working
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
  - Egress
  egress:
  - to:
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP # TCP should be added as well: https://www.infoblox.com/dns-security-resource-center/dns-security-faq/is-dns-tcp-or-udp-port-53/
      port: 53
```

https://kubernetes.io/docs/concepts/services-networking/connect-applications-service/

https://kubernetes.io/docs/concepts/services-networking/service/
- Port definitions in `Pods` have names, and you can reference these names in the `targetPort` attribute of a Service. You can then change the port numbers that Pods expose in the next version of your backend software without breaking clients.
- If `kube-proxy` is running in `iptables mode` and the first Pod that's selected does not respond, the connection fails. This is different from `userspace mode`: in that scenario, kube-proxy would detect that the connection to the first Pod had failed and would automatically retry with a different backend Pod.
- You can use Pod `readiness probes` to verify that backend Pods are working OK, so that kube-proxy in iptables mode only sees backends that test as healthy. This way you avoid having traffic sent via kube-proxy to a Pod that's known to have failed.
- If you want to make sure that connections from a particular client are passed to the same Pod each time, you can select session affinity based on the client's IP address by setting `service.spec.sessionAffinity` to `ClientIP` (the default is None). You can also set the maximum session sticky time with `service.spec.sessionAffinityConfig.clientIP.timeoutSeconds` (the default value is 10800, which works out to 3 hours).
- For some Services, you need to expose more than one port. Kubernetes lets you configure multiple port definitions on a Service object. **When using multiple ports for a Service, you must give all of your ports names so that they are unambiguous**, as in the sketch below.
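
A minimal sketch of a multi-port Service with named ports (the names and port numbers are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: myapp
  ports:
  - name: http # names are mandatory as soon as a Service has more than one port
    protocol: TCP
    port: 80
    targetPort: 9376
  - name: https
    protocol: TCP
    port: 443
    targetPort: 9377
```
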
- You can specify your own cluster IP address as part of a Service creation request by setting the `.spec.clusterIP` field. This is useful, for example, if you already have an existing DNS entry that you wish to reuse, or legacy systems that are configured for a specific IP address and are difficult to re-configure.
- When you have a Pod that needs to access a Service, and you are using the `environment variable` method to publish the port and cluster IP to the client Pods, you must create the Service before the client Pods come into existence.
- If the Service's environment variables are not desired (because of possible clashes with variables your programs expect, too many variables to process, only using DNS, etc.), you can disable this mode by setting `pod.spec.enableServiceLinks` to false.
- A cluster-aware DNS server, such as `CoreDNS`, watches the Kubernetes API for new Services and creates a set of DNS records for each one. If DNS has been enabled throughout your cluster, then all Pods should automatically be able to resolve Services by their DNS name.
- Kubernetes also supports `DNS SRV` (Service) records for `named ports`. If the `my-service.my-ns` Service has a port named `http` with the protocol set to `TCP`, you can do a `DNS SRV` query for `_http._tcp.my-service.my-ns` to discover the port number for http, as well as the IP address.
- Sometimes you don't need load-balancing and a single Service IP. In this case, you can create what are termed `headless services` by specifying `.spec.clusterIP: None`.
- For `headless services` that define selectors, the endpoints controller creates `Endpoints` records in the API and modifies the DNS configuration to return records (addresses) that point directly to the Pods backing the Service.
- For `headless services` that do not define selectors, the endpoints controller does not create `Endpoints` records. However, the DNS system looks for and configures either: CNAME records for `ExternalName services`, OR `A records` for any `Endpoints` that share a name with the Service.
- `ExternalName`: maps the Service to the contents of the `externalName` field (e.g. foo.bar.example.com) by returning a `CNAME record` with its value. **No proxying of any kind is set up** (see the sketch below).
- In the Service spec, `externalIPs` can be specified along with any of the service types; the Service will then be accessible to clients on `externalIP:port`.
- Services without selectors are useful to point to external places or to a service in a different namespace/cluster. Remember that the corresponding Endpoints object is not created automatically (see the example below).
- The set of pods that a `service` targets is defined with a `labelSelector`, and **only equality-based** requirement selectors are supported (no set-based support).
- You can run multiple `nginx pods` on the `same node` all using the `same containerPort`, and access them `from any other pod or node` in your cluster using their IPs. Like Docker, ports can still be published to the host node's interfaces, but the need for this is radically diminished because of the networking model.
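
A minimal `ExternalName` sketch (the external DNS name is illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-external-service
spec:
  type: ExternalName
  externalName: foo.bar.example.com # clients resolving this Service receive a CNAME to this name
```
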

```yaml
# A Service without selectors, paired with a manually created Endpoints object
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9376
---
apiVersion: v1
kind: Endpoints
metadata:
  name: my-service # must have the same name as the Service
subsets:
- addresses:
  - ip: 192.0.2.42
  ports:
  - port: 9376
```

https://kubernetes.io/docs/concepts/storage/volumes/
- A `volume` has an explicit `lifetime`: the same as the Pod that encloses it. Consequently, a volume outlives any containers that run within the Pod, and data is preserved across container restarts. Of course, when a Pod ceases to exist, the volume ceases to exist, too.
- Volumes cannot mount onto other volumes or have hard links to other volumes. Each container in the Pod must independently specify where to mount each volume.
- `configMap`
  - A container using a ConfigMap as a `subPath` volume mount will not receive ConfigMap updates.
- `secret`
  - A container using a Secret as a `subPath` volume mount will not receive Secret updates.
  - Backed by `tmpfs` (a RAM-backed filesystem), so secrets are never written to non-volatile storage.
- `emptyDir`
  - By default, emptyDir volumes are stored on whatever medium is backing the node; that might be disk, SSD, or network storage, depending on your environment. However, you can set the `emptyDir.medium` field to `"Memory"` to tell Kubernetes to mount a `tmpfs` (RAM-backed filesystem) for you instead. While tmpfs is very fast, be aware that, unlike disks, it is cleared on node reboot, and any files you write count against your container's memory limit.
- `gitRepo (deprecated)`
  - As an alternative, mount an `emptyDir` into an `initContainer` that clones the git repo, then mount the emptyDir into the Pod's container.
- `hostPath`
  - Mounts a file or directory from the host node's filesystem into your Pod.
  - Uses: running a container that needs access to Docker internals (use a hostPath of `/var/lib/docker`); allowing a Pod to specify whether a given hostPath should exist prior to the Pod running, whether it should be created, and as what it should exist.
  - `hostPath.type`: DirectoryOrCreate (created with 0755, same ownership as the kubelet), Directory, FileOrCreate, File, Socket, CharDevice, BlockDevice (the default is empty, so no check is performed).
  - **WARN**: the files or directories created on the underlying hosts are only writable by `root`. You either need to run your process as root in a privileged container or modify the file permissions on the host to be able to write to the volume.
- `nfs`
  - An nfs volume allows an existing NFS (Network File System) share to be mounted into your Pod. Unlike emptyDir, which is erased when a Pod is removed, the contents of an nfs volume are preserved and the volume is merely unmounted. This means that an NFS volume can be pre-populated with data, and that data can be handed off between Pods. NFS can be mounted by multiple writers simultaneously.
- `persistentVolumeClaim`
  - A persistentVolumeClaim volume is used to mount a PersistentVolume into a Pod. PersistentVolumes are a way for users to "claim" durable storage (such as a GCE PersistentDisk or an iSCSI volume) without knowing the details of the particular cloud environment. A minimal sketch follows this item.
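
A minimal sketch of a Pod mounting an existing PersistentVolumeClaim (all names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: myclaim # an existing PVC in the same namespace as the Pod
```
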
- `local`
  - A local volume represents a mounted local storage device such as a disk, partition, or directory.
  - Local volumes can only be used as a statically created PersistentVolume. Dynamic provisioning is not supported yet.
  - PersistentVolume `nodeAffinity` is required when using local volumes. It enables the Kubernetes scheduler to correctly schedule Pods using local volumes to the correct node.
  - They are subject to the availability of the underlying node and are not suitable for all applications. If a node becomes unhealthy, then the local volume also becomes inaccessible, and a Pod using it will not be able to run. Applications using local volumes must be able to tolerate this reduced availability.
- `projected`
  - Maps several existing volume sources into the same directory.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
spec:
  containers:
  - name: container-test
    image: busybox
    volumeMounts:
    - name: all-in-one
      mountPath: "/projected-volume"
      readOnly: true
  volumes:
  - name: all-in-one
    projected:
      sources:
      - secret:
          name: mysecret
          items: # optional
          - key: username
            path: my-group/my-username
      - secret:
          name: mysecret2
          items: # optional
          - key: password
            path: my-group/my-password
```

- Use `volumeMounts.subPath` to share one volume for multiple uses in a single pod. It specifies a subpath inside the referenced volume instead of its root.
- You can use `subPath` with expanded environment variables:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
  - name: container1
    env:
    - name: POD_NAME
      valueFrom:
        fieldRef: # Cool trick to reference data from this k8s object
          apiVersion: v1
          fieldPath: metadata.name
    image: busybox
    command: [ "sh", "-c", "while [ true ]; do echo 'Hello'; sleep 10; done | tee -a /logs/hello.txt" ]
    volumeMounts:
    - name: workdir1
      mountPath: /logs
      subPathExpr: $(POD_NAME) # /logs in the container maps to /tmp/pods/pod1/ on the host
  restartPolicy: Never
  volumes:
  - name: workdir1
    hostPath:
      path: /tmp/pods
      type: DirectoryOrCreate
```

https://kubernetes.io/docs/concepts/storage/persistent-volumes/
- Once bound, `PersistentVolumeClaim binds are exclusive`, regardless of how they were bound. A `PVC` to `PV` binding is a `one-to-one mapping`, using a `ClaimRef`, which is a bi-directional binding between the PersistentVolume and the PersistentVolumeClaim.
- Claims will remain unbound indefinitely if a matching volume does not exist. Claims will be bound as matching volumes become available. For example, a cluster provisioned with many 50Gi PVs would not match a PVC requesting 100Gi. The PVC can be bound when a 100Gi PV is added to the cluster.
- If a user deletes a PVC in active use by a Pod, the PVC is not removed immediately. PVC removal is postponed until the PVC is no longer actively used by any Pods. Also, if an admin deletes a PV that is bound to a PVC, the PV is not removed immediately. PV removal is postponed until the PV is no longer bound to a PVC.
- When a user is done with their volume, they can delete the PVC object from the API, which allows reclamation of the resource.
- The `reclaim policy` for a `PersistentVolume` tells the cluster what to do with the volume after it has been released of its claim. Currently, volumes can be:
  - `Retain`: allows manual reclamation of the resource. You need to delete the PV, clean up the data, and create a new PV (or delete the associated storage asset, like an AWS EBS volume).
  - `Delete`: removes both the PV from Kubernetes and the associated storage asset in the external infrastructure, like an AWS EBS volume.
  - `Recycle`: deprecated, use dynamic provisioning instead. (Basic scrub: `rm -rf /thevolume/*`.) Only `NFS` and `HostPath` support recycling.
- A `HostPath` PV is for single-node testing only; local storage is not supported in any way and WILL NOT WORK in a multi-node cluster.
- Currently, `storage size` is the only resource that can be set or requested. Future attributes may include IOPS, throughput, etc.
- A volume with `volumeMode: Filesystem` is mounted into Pods as a directory. If the volume is backed by a block device and the device is empty, Kubernetes creates a filesystem on the device before mounting it for the first time. You can set `volumeMode: Block` to use a volume as a raw block device. Such a volume is presented to a Pod as a block device, without any filesystem on it. This mode is useful to give a Pod the fastest possible way to access a volume, without any filesystem layer between the Pod and the volume.
- `Access modes`: ReadWriteOnce (RWO), ReadOnlyMany (ROX), ReadWriteMany (RWX).
  - A volume can only be mounted using one access mode at a time, even if it supports many. For example, a GCEPersistentDisk can be mounted as ReadWriteOnce by a single node or ReadOnlyMany by many nodes, but not at the same time.
- A PV can have a `storageClassName` attribute set to the name of a StorageClass. A PV of a particular class can only be bound to PVCs requesting that class. A PV with no storageClassName has no class and can only be bound to PVCs that request no particular class.
- A PV can specify `node affinity` to define constraints that limit which nodes the volume can be accessed from. Pods that use such a PV will only be scheduled to nodes that are selected by the node affinity.
- Phases of a volume: Available, Bound, Released, Failed.
- A PVC can have a `selector` to filter the set of volumes. You can use `matchLabels` as well as `matchExpressions`.
- A PVC without the `storageClassName` attribute defaults to the value configured in the admission plugin, or to "" if there is none.
- Claims must exist in the same namespace as the Pod using the PVC.
- Since PV binds are exclusive, and since `PVCs are namespaced` objects, mounting claims with "Many" modes (`ROX`, `RWX`) is only possible within one namespace.
- Volume snapshot/restore and cloning are available for some CSI volume plugins.
- Writing portable configuration (a PV sketch follows this list):
  - Do not include PersistentVolume objects in the config, since the user instantiating the config may not have permission to create PersistentVolumes.
  - In your tooling, watch for PVCs that are not getting bound after some time and surface this to the user, as this may indicate that the cluster has no dynamic storage support (in which case the user should create a matching PV) or that the cluster has no storage system (in which case the user cannot deploy config requiring PVCs).
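
A minimal PV sketch tying several of the fields above together (all values are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
spec:
  capacity:
    storage: 5Gi # storage size is currently the only settable resource
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain # Retain | Delete | Recycle (deprecated)
  storageClassName: slow # only PVCs requesting this class can bind to this PV
  hostPath: # remember: single-node testing only!
    path: /tmp/data
```
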
- When a Pod consumes a PV that has a `pv.beta.kubernetes.io/gid: "1234"` annotation, the annotated GID is applied to all containers in the Pod in the same way that GIDs specified in the Pod's security context are. Every GID, whether it originates from a PV annotation or the Pod's specification, is applied to the first process run in each container.
- `Dynamic volume provisioning`: https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/
  - The implementation of dynamic volume provisioning is based on the API object StorageClass from the API group storage.k8s.io. A cluster administrator can define as many StorageClass objects as needed, each specifying a volume plugin (aka provisioner) that provisions a volume and the set of parameters to pass to that provisioner when provisioning. A cluster administrator can define and expose multiple flavors of storage (from the same or different storage systems) within a cluster, each with a custom set of parameters.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/gce-pd # a Google Cloud storage device
parameters:
  type: pd-standard
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim1
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: slow # matches the name of the StorageClass above
  resources:
    requests:
      storage: 30Gi
```

https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
- To delete a Job while keeping its pods, use `kubectl delete jobs/old --cascade=false`.
- A Job is only appropriate for pods with a `RestartPolicy` equal to `OnFailure` or `Never`. Note: if RestartPolicy is not set, the default value is Always.
- An entire Pod can also fail for a number of reasons, such as when the pod is kicked off the node (node is upgraded, rebooted, deleted, etc.), or if a container of the Pod fails and `.spec.template.spec.restartPolicy = "Never"`. When a Pod fails, the Job controller starts a new Pod. This means that your application needs to handle the case where it is restarted in a new pod. In particular, it needs to handle temporary files, locks, incomplete output, and the like caused by previous runs.
- Note that even if you specify `.spec.parallelism = 1` and `.spec.completions = 1` and `.spec.template.spec.restartPolicy = "Never"`, the same program may sometimes be started twice.
- If you specify both `.spec.parallelism` and `.spec.completions` greater than 1, then there may be multiple pods running at once. Therefore, your pods must also be tolerant of concurrency.
- There are situations where you want to fail a Job after some number of retries due to a logical error in configuration, etc. To do so, set `.spec.backoffLimit` to specify the number of retries before considering the Job as failed (`any running Pods will then be terminated`). The back-off limit is set by default to 6. Failed Pods associated with the Job are recreated by the Job controller with an exponential back-off delay (10s, 20s, 40s, ...) capped at six minutes.
- If your Job has `restartPolicy = "OnFailure"`, keep in mind that the container running the Job will be terminated once the Job backoff limit has been reached. This can make debugging the Job's executable more difficult. We suggest setting `restartPolicy = "Never"` when debugging a Job (see the sketch below).
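
A minimal Job sketch showing the fields discussed above (the image and command follow the upstream docs example):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  completions: 1 # default
  parallelism: 1 # default
  backoffLimit: 4 # fail the Job after 4 retries (the default is 6)
  template:
    spec:
      restartPolicy: Never # easier to debug than OnFailure
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```
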
- Another way to terminate a Job is by setting the `.spec.activeDeadlineSeconds` field to a number of seconds.
- Keep in mind that `restartPolicy` applies to the Pod, not to the Job itself: there is no automatic Job restart once the Job status is `type: Failed`. That is, the Job termination mechanisms activated by `.spec.activeDeadlineSeconds` and `.spec.backoffLimit` result in a permanent Job failure that requires manual intervention to resolve.
- The Job object can be used to support reliable parallel execution of Pods. The Job object is not designed to support closely-communicating parallel processes, as commonly found in scientific computing. It does support parallel processing of a set of independent but related work items: emails to be sent, frames to be rendered, files to be transcoded, ranges of keys in a NoSQL database to scan, and so on.
- Job parallel patterns:
  - `Job Template Expansion`: use `completions: 1` and `parallelism: 1` (the default values, so they can be omitted).
    - Simply have some placeholders in your `Job`, replace them (e.g. with sed), and then create the result in k8s. This is one `Job object for each work item`.
  - `Queue with Pod Per Work Item`: use `completions: <#work-items>` and `parallelism: any (usually >= 2)`.
    - Each Pod is passed an item read from a `message queue`, so your app may not need modifications!
    - If the number of completions is set to less than the number of items in the queue, then not all items will be processed.
    - If the number of completions is set to more than the number of items in the queue, then the Job will not appear to be completed, even though all items in the queue have been processed. It will start additional pods which will block, waiting for a message.
  - `Queue with Variable Pod Count`: use `completions: 1` (the default) and `parallelism: any (usually >= 2)`.
    - The app needs to be modified (e.g. to use a redis client to read messages from the queue).
    - Each pod works on several items from the `message queue` and then exits when there are no more items. Since the workers themselves detect when the work queue is empty, and the Job controller does not know about the work queue, it relies on the workers to signal when they are done working. The workers signal that the queue is empty by exiting with success. So, as soon as any worker exits with success, the controller knows the work is done and the Pods will exit soon. That's why we set `completions: 1`. The Job controller waits for the other pods to complete, too.
  - `Single Job with Static Work Assignment`: use `completions: <#work-items>` and `parallelism: any (usually >= 2)`.
- If you have a continuous stream of background processing work to run, consider running your background workers with a `ReplicaSet` instead, together with a background processing library such as https://github.com/resque/resque.

https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/

https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/

- A `CronJob` creates `Jobs` on a `time-based schedule` in cron format (https://en.wikipedia.org/wiki/Cron); see the sketch below.
- Cron jobs have limitations and idiosyncrasies. For example, in certain circumstances a single cron job can create `multiple jobs`. Therefore, jobs should be idempotent.
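
A minimal CronJob sketch (this mirrors the upstream docs example; on clusters older than v1.21 the apiVersion is `batch/v1beta1`):

```yaml
apiVersion: batch/v1 # batch/v1beta1 on older clusters
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *" # every minute
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: hello
            image: busybox
            args: ["/bin/sh", "-c", "date; echo Hello from the Kubernetes cluster"]
```
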
- The `CronJob` is only responsible for creating `Jobs` that match its schedule; the `Job`, in turn, is responsible for the management of the Pods it represents.
- The `startingDeadlineSeconds` field is optional. It stands for the deadline, in seconds, for starting the job if it misses its `scheduled time` for any reason. After the deadline, the cron job does not start the job. Jobs that do not meet their deadline in this way count as failed jobs. If this field is not specified, the jobs have no deadline.
- The `CronJob controller` counts how many `missed schedules` happen for a cron job. If there are more than `100 missed schedules`, the cron job is no longer scheduled. When `startingDeadlineSeconds` is not set, the CronJob controller counts missed schedules from `status.lastScheduleTime` until now.
  - For example, suppose a cron job is supposed to run every minute, the `status.lastScheduleTime` of the cronjob is `5:00am`, but now it's `7:00am`. That means `120` schedules were missed, so the cron job is no longer scheduled.
  - If the `startingDeadlineSeconds` field is set (not null), the CronJob controller instead counts how many missed jobs occurred within that window.
  - For example, if it is set to 200, the controller counts how many missed schedules occurred in the last 200 seconds. In that case, if there were more than 100 missed schedules in the last 200 seconds, the cron job is no longer scheduled.
- Deleting the `cronjob` removes all the `jobs` and `pods` it created and stops it from creating additional jobs.
- The `concurrencyPolicy` field is optional. It specifies how to treat concurrent executions of a job created by this cron job:
  - `Allow` (default)
  - `Forbid`: if it is time for a new job run and the previous job run hasn't finished yet, the cron job skips the new job run.
  - `Replace`: if it is time for a new job run and the previous job run hasn't finished yet, the cron job replaces the currently running job run with a new one.
- The `suspend` field is optional. If it is set to `true`, all subsequent executions are suspended. This setting does not apply to already-started executions. Defaults to `false`.
- The `successfulJobsHistoryLimit` (default 3) and `failedJobsHistoryLimit` (default 1) fields are optional. Setting a limit to 0 keeps nothing.

https://kubernetes.io/docs/concepts/workloads/pods/pod-overview/
- Each Pod is assigned a `unique IP address`. Every `container` in a Pod shares the `network namespace`, including the `IP address` and `network ports`. Containers inside a Pod can communicate with one another using localhost. When containers in a Pod communicate with entities outside the Pod, they must coordinate how they use the shared network resources (such as ports).
- What's the use of giving a port a name in a Pod definition?
  - Each named port in a pod must have a unique name, and it can be referred to by Services (see the sketch below).
- `pod.spec.containers.ports`: the list of ports to expose from the container. Exposing a port here gives the system additional information about the network connections a container uses, but it is `primarily informational`. Not specifying a port here `DOES NOT prevent that port from being exposed`. Any port which is listening on the default "0.0.0.0" address inside a container will be accessible from the network.
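
A minimal sketch of a named `containerPort` referenced from a Service's `targetPort` (all names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - name: http-web # named port, must be unique within the pod
      containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: http-web # refers to the named containerPort, not the number
```
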

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
- Pod phases:
  - `Pending`: includes time before being scheduled as well as time spent downloading images over the network.
  - `Running`: the Pod is bound to a node, and the containers have been created.
  - `Succeeded`: all containers have terminated in success and will not be restarted.
  - `Failed`: all containers have terminated, and at least one of them in failure.
  - `Unknown`: the state of the Pod could not be obtained, typically due to an error communicating with the host of the Pod.
- Pod conditions: the type field is a string with the following possible values:
  - `PodScheduled`: the Pod has been scheduled to a node;
  - `Ready`: the Pod is able to serve requests and should be added to the load balancing pools of all matching Services;
  - `Initialized`: all init containers have started successfully;
  - `ContainersReady`: all containers in the Pod are ready.
- A `PodSpec` has a `restartPolicy` field with possible values `Always`, `OnFailure`, and `Never` (the default is Always). `restartPolicy` applies to all containers in the Pod and only refers to restarts of the containers by the kubelet on the same node. Exited containers that are restarted by the kubelet are restarted with an `exponential back-off` delay (10s, 20s, 40s, ...) capped at five minutes, which is reset after ten minutes of successful execution.
- A Pod, once bound to a node, will never be rebound to another node.
- `Pod readiness gate`: additional conditions to be evaluated for Pod readiness. If Kubernetes cannot find such a condition in `status.conditions`, the status of the condition defaults to `False`:
```yaml
spec:
  readinessGates:
  - conditionType: "www.example.com/feature-1"
status:
  conditions:
  - type: "www.example.com/feature-1" # an extra PodCondition
    status: "False"
    lastProbeTime: null
    lastTransitionTime: 2018-01-01T00:00:00Z
```

https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/
- The `command` and `arguments` that you define in the configuration file override the default command and arguments provided by the container image. If you define args but do not define a command, the default command is used with your new arguments. If you supply a command but no args, only the supplied command is used.
- Environment variables need to be in parentheses, `$(VAR)`, to be expanded in the `command` or `args` fields.

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
- `Probe with command`
- `Probe with HTTP`: any code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.
- `Probe with TCP`: the kubelet will attempt to open a `socket` to your container on the specified port. If it can establish a connection, the container is considered healthy.
- `Startup probe`: the kubelet uses `startup probes` to know when a container application has started. If such a probe is configured, it disables `liveness` and `readiness` checks until it succeeds, making sure those probes don't interfere with the application startup. This can be used to adopt liveness checks on slow-starting containers, avoiding them getting killed by the kubelet before they are up and running. Set up a `startup probe` with the same command, HTTP, or TCP check, with a `failureThreshold * periodSeconds` long enough to cover the worst-case startup time, as in the sketch below.
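
A sketch of that pattern, assuming an app that may need up to 5 minutes to start (the path and port name are illustrative; this is a fragment of a container spec):

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: liveness-port # a named containerPort
  failureThreshold: 1
  periodSeconds: 10
startupProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30 # 30 * 10s = up to 5 minutes for the app to start
  periodSeconds: 10
```
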
- You can use a named `containerPort` in the `port` field of HTTP or TCP checks.
- For a `TCP probe`, the kubelet makes the probe connection at the node, not in the pod, which means that you cannot use a service name in the `host` parameter since the kubelet is unable to resolve it.
- Probe fields: `initialDelaySeconds`, `periodSeconds`, `timeoutSeconds` (after which the probe times out), `successThreshold`, `failureThreshold`.
- Additional fields for HTTP probes: `host` (use 127.0.0.1 if the container listens there and the Pod has `hostNetwork: true`), `scheme` (HTTP/HTTPS, no certificate check), `path`, `httpHeaders`, `port`.

https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/

https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/
- Fractional values are allowed. A container that requests 0.5 CPU is guaranteed half as much CPU as a container that requests 1 CPU. You can use the suffix m to mean milli: 100m CPU, 100 milliCPU, and 0.1 CPU are all the same. **Precision finer than 1m is not allowed**.
- If you do not specify a CPU limit for a container, then either the container has no upper bound on the CPU resources it can use, or, if it is running in a `namespace` that has a default CPU limit, the container is automatically assigned that default limit. Cluster administrators can use a `LimitRange` to specify a default value for the CPU limit.

https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
- The memory resource is measured in bytes. You can express memory as a plain integer or a fixed-point integer with one of these suffixes: E, P, T, G, M, K, Ei, Pi, Ti, Gi, Mi, Ki.
- If you do not specify a memory limit for a container, the container has no upper bound on the amount of memory it uses. It could use all of the memory available on the node where it is running, which in turn could invoke the `OOM Killer`; further, in case of an OOM (out-of-memory) kill, a container with no resource limits has a greater chance of being killed. Alternatively, if the container is running in a namespace that has a default memory limit, it is automatically assigned that default limit. Cluster administrators can use a `LimitRange` to specify a default value for the memory limit.
- Besides describing or checking the status of a Pod, you can also run `kubectl describe nodes` and search for `OOMKilling`.

- **WARN**: take into account that `Ki != K`, see: https://en.wikipedia.org/wiki/Kibibyte
  - The name comes from the contraction of `kilo binary byte`.
  - 1 kibibyte (KiB) = 1024 B = 2^10 bytes.
  - 1 kilobyte (KB) = 1000 B = 10^3 bytes.
  - 1024^2 B = 1 MiB ... 1024^3 B = 1 GiB ...
  - 1000^2 B = 1 MB ... 1000^3 B = 1 GB ...
  - If you run `kubectl run bash --image=bash --restart=Never --requests=memory=1000` and then `kubectl describe pod bash | grep memory`, it prints `memory: 1k` (that is, 1 KB and **not** 1 KiB).


https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/
- When you create a ConfigMap based on a file, the key in the `data` section defaults to the filename, and the value defaults to the file content (see the sketch below).
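
A short sketch of that behaviour (the file name is illustrative):

```bash
# game.properties becomes the key; its contents become the value
kubectl create configmap game-config --from-file=game.properties

# You can also override the key name explicitly
kubectl create configmap game-config --from-file=special-key=game.properties
```
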
- The total delay from the moment the ConfigMap is updated to the moment new keys are projected to the pod can be as long as the kubelet sync period (1 minute by default) plus the TTL of the ConfigMap cache (1 minute by default) in the kubelet. `You can trigger an immediate refresh by updating one of the pod's annotations`.
- If you use `envFrom` to define environment variables from ConfigMaps, keys that are considered invalid will be skipped. The pod will be allowed to start, but the invalid names will be recorded in the event log (`InvalidVariableNames`).
- Use the option `--from-env-file` to create a ConfigMap from an env-file:

```yaml
# env-file.properties
enemies=aliens
lives=3
allowed="true"

# Create the configmap and then check it
# kubectl create configmap config-env-file --from-env-file=env-file.properties
# kubectl get configmap config-env-file -o yaml
...
data:
  allowed: '"true"' # quotation marks are preserved
  enemies: aliens
  lives: "3"
...
```

- How to use ConfigMaps in pods:

```yaml
containers:
- name: test-container
  image: busybox
  command: [ "/bin/sh", "-c", "env" ]
  # command: [ "/bin/sh", "-c", "echo $(SPECIAL_LEVEL_KEY)" ] # You can use ConfigMap-defined env vars in the command section
  env:
  - name: SPECIAL_LEVEL_KEY # Define the environment variable
    valueFrom:
      configMapKeyRef:
        name: special-config # The ConfigMap containing the value you want to assign to SPECIAL_LEVEL_KEY
        key: special.how # Specify the key associated with the value
---
envFrom: # Configure all key-value pairs in a ConfigMap as container environment variables
- configMapRef:
    name: special-config
---
volumeMounts:
- name: config-volume
  mountPath: /etc/config # WARN: if there are files in this directory, they will be deleted.
  # Keys will be mounted as /etc/config/<key-1> ... /etc/config/<key-n>
volumes:
- name: config-volume
  configMap:
    name: special-config # Provide the name of the ConfigMap containing the files you want
---
volumeMounts:
- name: config-volume
  mountPath: /etc/config
  # MY_KEY_1 will be mounted as /etc/config/key_1
volumes:
- name: config-volume
  configMap:
    name: special-config
    items: # optional
    - key: MY_KEY_1
      path: key_1
```

https://kubernetes.io/docs/concepts/configuration/secret/
```yaml
containers:
- name: test
  image: busybox
  volumeMounts:
  - name: secret-volume
    readOnly: true # useful attribute
    mountPath: "/etc/secret-volume"
    # username will be mounted as /etc/secret-volume/my-group/my-username
volumes:
- name: secret-volume
  secret:
    secretName: mysecret
    items: # optional
    - key: username
      path: my-group/my-username
      mode: 0777 # 511 in decimal in case you use JSON
---

# env.valueFrom.secretKeyRef && envFrom.secretRef can be used with secrets as well
```
- If `.spec.volumes[].secret.items` is used, only the keys specified in items are projected. To consume all keys from the secret, all of them must be listed. All listed keys must exist in the corresponding secret; otherwise, the volume is not created. A sketch of creating such a secret follows.
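
A short sketch of creating the secret consumed above (the values are illustrative):

```bash
# From literals; kubectl base64-encodes the values for you
kubectl create secret generic mysecret --from-literal=username=my-user --from-literal=password='S3cr3t!'

# Or encode a value by hand for use in a YAML manifest (-w 0 disables line wrapping on Linux)
echo -n 'my-user' | base64 -w 0
```
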
- Inside the container that mounts a secret volume, the secret keys appear as files and the secret values are `base64-decoded`.
- Linux users should use the base64 command as `base64 -w 0` (to disable line wrapping).
- Secret resources reside in a namespace. `Secrets can only be referenced by Pods in that same namespace`.
- When a secret currently consumed in a volume is updated, projected keys are eventually updated as well. The kubelet checks whether the mounted secret is fresh on every periodic sync. The type of the cache is configurable using the `ConfigMapAndSecretChangeDetectionStrategy` field in the `KubeletConfiguration` struct.

https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/

- When you create a pod, if you do not specify a service account, it is `automatically` assigned the `default` service account in the same namespace.
- `automountServiceAccountToken: false` can be set both in the `ServiceAccount` and in the `Pod.spec` (the latter takes precedence).
- How to use a `docker-registry` secret type?
  - `kubectl create secret docker-registry <secret_name> --docker-server=<server> --docker-username=<user> --docker-password=<password>`
  - Then attach it with `imagePullSecrets` in a `Pod.spec`, or better yet in a `ServiceAccount` so that it is included by default in your pods.
- `Service Account Token Volume Projection` example:
```yaml
kind: Pod
spec:
  containers:
  - image: nginx
    name: nginx
    volumeMounts:
    - mountPath: /var/run/secrets/tokens
      name: vault-token
  serviceAccountName: my-sa
  volumes:
  - name: vault-token
    projected:
      sources:
      - serviceAccountToken:
          path: vault-token
          expirationSeconds: 7200
          audience: vault
```

https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
- The best way to learn about the fields is to run `kubectl explain pod.spec.securityContext` and `kubectl explain pod.spec.containers.securityContext`.
- Remember that `pod.spec.containers.securityContext` has precedence over `pod.spec.securityContext` (see the sketch after this list).
- Important fields in `pod.spec.securityContext`:
  - `fsGroup`: a special supplemental group that applies to all containers in a pod. Some `volumes` allow the kubelet to change the ownership of the volume to be owned by the pod: 1) the owning GID will be the fsGroup; 2) the setgid bit is set (new files created in the volume will be owned by the fsGroup); 3) the permission bits are OR'd with rw-rw----. If unset, the kubelet will not modify the ownership and permissions of any volume.
  - `runAsGroup`: the GID to run the entrypoint of the container process. Uses the runtime default if unset, usually root (0).
  - `runAsUser`: the UID to run the entrypoint of the container process. Defaults to the user specified in image metadata if unspecified.
  - `runAsNonRoot`: indicates that the container must run as a non-root user. If `true`, the kubelet will validate the image at runtime to ensure that it does not run as UID 0 (root), and will fail to start the container if it does.
  - `seLinuxOptions`: the SELinux context to be applied to all containers.
  - `supplementalGroups`: a list of groups applied to the first process run in each container, in addition to the container's primary GID.
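
A minimal sketch combining pod-level and container-level settings (the IDs are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext: # pod level: applies to all containers
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: sec-ctx-demo
    image: busybox
    command: ["sh", "-c", "sleep 1h"]
    securityContext: # container level: takes precedence over the pod level
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
```
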
- Important fields in `pod.spec.containers.securityContext`:
  - `allowPrivilegeEscalation`: controls whether a process can gain more privileges than its parent process. It is always `true` when the container 1) is run as privileged, OR 2) has `CAP_SYS_ADMIN`.
  - `capabilities`: the capabilities to add/drop when running containers, e.g. `capabilities.add: ["NET_ADMIN", "SYS_TIME"]`.
  - `privileged`: run the container in privileged mode. Processes in privileged containers are essentially equivalent to root on the host. Defaults to false.
  - `readOnlyRootFilesystem`: whether this container has a read-only root filesystem. Default is false.
  - `runAsGroup`, `runAsNonRoot`, `runAsUser`, `seLinuxOptions`.

https://kubernetes.io/docs/tasks/configure-pod-container/share-process-namespace/
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  shareProcessNamespace: true # this is the required field
  # Processes/filesystems will be visible to the other containers in the pod, including /proc and /proc/$pid/root.
  # Watch out for passwords that were passed as args or env vars: these are protected only by regular Unix permissions.
  containers:
  - name: nginx
    image: nginx
  - name: shell
    image: busybox
    securityContext:
      capabilities:
        add:
        - SYS_PTRACE # needed to send signals like `kill -HUP 1` to the pod sandbox (`/pause` in this case)
    stdin: true
    tty: true
---

# The container processes no longer have PID 1, therefore some images will refuse to start without PID 1 (eg: systemd).
#
# kubectl exec nginx -c shell -it -- ps
#
# PID   USER     TIME  COMMAND
#   1   root     0:00  /pause
#  14   root     0:00  sh
#  36   root     0:00  nginx: master process nginx -g daemon off;
#  42   101      0:00  nginx: worker process
#  57   root     0:00  ps
```

https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-initialization/
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-demo
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: workdir
      mountPath: /usr/share/nginx/html
  initContainers: # these containers run to completion during pod initialization, before the others start
  - name: install
    image: busybox
    command: [wget, "-O", "/work-dir/index.html", "http://kubernetes.io"]
    volumeMounts:
    - name: workdir
      mountPath: "/work-dir"
  dnsPolicy: Default
  volumes:
  - name: workdir
    emptyDir: {}
```

https://kubernetes.io/docs/concepts/workloads/controllers/deployment
- To update a deployment: `kubectl set image deployment/nginx-deployment nginx=nginx:1.16.1 --record`, or just edit it: `kubectl edit deployment <name>`.
- A Deployment ensures that only a certain number of Pods are down while they are being updated. By default, it ensures that at least 75% of the desired number of Pods are up (25% `deploy.spec.strategy.rollingUpdate.maxUnavailable`).
- A Deployment also ensures that only a certain number of Pods are created above the desired number of Pods. By default, it ensures that at most 125% of the desired number of Pods are up (25% `deploy.spec.strategy.rollingUpdate.maxSurge`). Both knobs are sketched below.
- `Multiple updates in-flight (rollover)`: suppose you create a Deployment to create 5 replicas of `nginx:1.14.2`, but then update the Deployment to create 5 replicas of `nginx:1.16.1` when only 3 replicas of `nginx:1.14.2` have been created. In that case, the Deployment immediately starts killing the 3 `nginx:1.14.2` Pods it had created and starts creating `nginx:1.16.1` Pods. **It does not wait for the 5 replicas of `nginx:1.14.2` to be created before changing course.**
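
A sketch of where those two knobs live in the Deployment spec (the values shown are the defaults; this is a fragment):

```yaml
spec:
  replicas: 4
  strategy:
    type: RollingUpdate # the default; the alternative is Recreate
    rollingUpdate:
      maxUnavailable: 25% # at least 75% of the desired pods stay up
      maxSurge: 25% # at most 125% of the desired pods exist during the update
```
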
- A `Deployment revision` is created when a Deployment's rollout is triggered. This means that a new revision is created if and only if the Deployment's Pod template `.spec.template` is changed, for example if you update the labels or container images of the template. Other updates, such as scaling the Deployment, do not create a Deployment revision, so that you can facilitate simultaneous manual or auto-scaling. This means that when you roll back to an earlier revision, only the Deployment's Pod template part is rolled back.
- Rolling back a deployment: inspect with `kubectl rollout history deployment.v1.apps/nginx-deployment [--revision=<n>]` and roll back with `kubectl rollout undo deployment.v1.apps/nginx-deployment [--to-revision=<n>]`.
- Scaling a deployment: `kubectl scale deployment.v1.apps/nginx-deployment --replicas=10`
- Scaling with HPA (Horizontal Pod Autoscaling): `kubectl autoscale deployment <deployment> --min=10 --max=15 --cpu-percent=80`
- You can pause a Deployment before triggering one or more updates and then resume it. This allows you to apply multiple fixes in between pausing and resuming without triggering unnecessary rollouts.
  - `kubectl rollout pause|resume deployment <deployment>`
- You can set the `.spec.revisionHistoryLimit` field in a Deployment to specify how many old ReplicaSets for this Deployment you want to retain. The rest will be garbage-collected in the background. By default, it is 10.
- If you want to roll out releases to a subset of users or servers using the Deployment, you can create multiple Deployments, one for each release, following the `canary pattern`.
- Rolling update strategy: `.spec.strategy.type` can be `RollingUpdate` (default) or `Recreate`.
- Only `.spec.template.spec.restartPolicy = Always` is allowed, which is the default if not specified.
- `.spec.progressDeadlineSeconds` is the number of seconds the Deployment controller waits before indicating (in the Deployment status) that the Deployment progress has stalled. In the future, once automatic rollback is implemented, the Deployment controller will roll back a Deployment as soon as it observes such a condition.

https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/
- Directly accessing the REST API:
  - Use `kubectl proxy --port=8080`, then do `curl http://localhost:8080/api`.
  - This is recommended over doing it manually because the proxy verifies the identity of the apiserver using its self-signed cert, so no MITM attack is possible. It authenticates to the `apiserver`, and in the future it may do intelligent client-side load-balancing and failover.
- Accessing the API from a Pod:
  - `kubernetes.default.svc` resolves to a `Service IP` which in turn is routed to an apiserver.
  - Run `kubectl proxy` in a sidecar container in the pod, or as a background process within the container.
  - Or use a k8s client library (e.g. the Go client); it will handle locating and authenticating to the apiserver.
  - Either way uses the credentials of the pod under `/var/run/secrets/kubernetes.io/serviceaccount/*`, as in the sketch below.
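
A minimal sketch of calling the API directly from inside a pod with those mounted credentials:

```bash
# Run from inside a pod: authenticate to the apiserver with the mounted service account token
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
     -H "Authorization: Bearer ${TOKEN}" \
     https://kubernetes.default.svc/api
```
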
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application-introspection/
- When you bind a pod to a `hostPort`, there are a limited number of places the pod can be scheduled. In most cases, hostPort is unnecessary; try using a `Service` object to expose your pod instead. If you do require hostPort, then you can only schedule as many pods as there are nodes in your cluster.
- To list all events you can use `kubectl get events`, but remember that events are `namespaced`. This means that if you're interested in events for some namespaced object (e.g. what happened with Pods in namespace `my-namespace`), you need to explicitly pass that namespace to the command. To see events from all namespaces, use the `--all-namespaces` argument.
- Remember to inspect `nodes`, as they may become `NotReady`; also notice that their pods stop running (they are evicted after five minutes of `NotReady` status).

https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/
- **Highly recommended to read everything from this link.**
- Run `kubectl apply --validate -f mypod.yaml`. It will error out if, for instance, you misspelled `command` as `commnd`.
- If you are missing `endpoints` for your `service`, try listing pods using the labels that the Service uses:
```yaml
spec:
  selector:
    name: nginx
    type: frontend
---
# kubectl get pods --selector=name=nginx,type=frontend
```
- If the list of pods matches expectations but your endpoints are still empty, it's possible that you don't have the right ports exposed. Verify that the Pod's `containerPort` matches up with the Service's `targetPort`.

https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/
- Same tips as for debugging Pods.

https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/
- **Highly recommended to read everything from this link.**
- Key points (most of them have to be run from a temp pod like a `busybox`; see the walkthrough after this list):
  1. Test the connection directly against a Pod IP.
  1. Check that the service resolves by DNS: `nslookup <service-name>`, or `nslookup <service-name>.<namespace>` if you are in a different namespace than the service, or with the FQDN `nslookup <service-name>.<namespace>.svc.cluster.local` (WARN: `cluster.local` could be different in your cluster).
  1. If an FQDN lookup works but a relative one does not, check `/etc/resolv.conf`: it should contain a `nameserver` line (e.g. `nameserver 10.0.0.10`; this is the cluster's DNS Service IP, find it with `kubectl get svc -A | grep kube-dns`), a `search` line (e.g. `search default.svc.cluster.local svc.cluster.local cluster.local`) and `options ndots:5`.
  1. You can test from a Node as well with `nslookup <service-name> <dns-service-ip>`. Also, you can `curl <service-ip>`.
  1. `nslookup kubernetes.default` (the k8s master service) should always work from within a Pod; if it doesn't, something may be wrong with `kube-proxy`.
  1. If DNS works, next check whether you can connect using the `Service IP`.
  1. Watch out for the Service definition:
     1. If you meant to use a numeric port, is it a number (9376) or a string "9376" (named port)?
     1. Is the port's protocol correct for your Pods?
  1. Use the Service's selectors to find the associated pods, and check that the corresponding `endpoints` are created with the same Pod IPs.
  1. If you get here, your Service is running, has Endpoints, and your Pods are actually serving. At this point, the whole `Service proxy mechanism is suspect`:
     1. On your nodes: `ps auxw | grep kube-proxy`, and check the logs in `/var/log/kube-proxy.log` or with `journalctl`.
     1. Ensure `conntrack` is installed.
     1. If `kube-proxy` is in `iptables` mode, check the rules with `iptables-save | grep hostnames`: for each port of each Service, there should be 1 rule in `KUBE-SERVICES`.
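
A rough walkthrough of the steps above from a throwaway pod (`hostnames`, the ports, and the IPs are placeholders for your own Service):

```bash
# Throwaway pod for in-cluster debugging (deleted on exit):
kubectl run tmp --rm -it --image=busybox -- sh

# Inside the pod -- step through DNS first:
nslookup hostnames                                # relative lookup
nslookup hostnames.default.svc.cluster.local      # FQDN lookup
cat /etc/resolv.conf                              # nameserver / search / ndots

# Then try the Service IP, then a Pod IP directly (get them with
# `kubectl get svc,endpoints hostnames` from your main terminal):
wget -qO- http://<service-ip>:80
wget -qO- http://<pod-ip>:9376
```
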
## Recommended

The following links are useful to get familiar with Kubernetes. At the very least, read them once.

https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/
- If the pod does not have a `ServiceAccount` set, it is set to `default`.
- A `volumeSource` is added to each container of the pod, mounted at `/var/run/secrets/kubernetes.io/serviceaccount`.

https://kubernetes.io/docs/concepts/cluster-administration/logging/
- Using a node-level logging agent is the most common and encouraged approach for a Kubernetes cluster, because it creates only one agent per node and doesn't require any changes to the applications running on the node. However, node-level logging only works for the applications' standard output and standard error.
- By having your `sidecar containers` stream to their own stdout and stderr streams, you can take advantage of the kubelet and the logging agent that already run on each node. The sidecar containers read logs from a file, a socket, or journald, and each prints them to its own stdout or stderr stream (see the sketch below).
- Note: running a full logging agent in a sidecar container can lead to significant resource consumption. Moreover, you won't be able to access those logs using the `kubectl logs` command, because they are not controlled by the kubelet.
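
A minimal sketch of the "sidecar streaming to its own stdout" pattern described above (names and the log path are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - name: app # the main container only writes to a file
    image: busybox
    args: [/bin/sh, -c, 'i=0; while true; do echo "$i $(date)" >> /var/log/app.log; i=$((i+1)); sleep 1; done']
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: log-streamer # the sidecar tails the file to its own stdout, so the kubelet can collect it
    image: busybox
    args: [/bin/sh, -c, 'tail -n+1 -f /var/log/app.log']
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  volumes:
  - name: varlog
    emptyDir: {}
```

Read the streamed logs with `kubectl logs counter -c log-streamer`.
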
https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/

https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
- Labels can be attached to objects at creation time or later on, and can be modified at any time.
- The name segment is required and must be 63 characters or less, beginning and ending with an alphanumeric character (`[a-z0-9A-Z]`), with dashes (`-`), underscores (`_`), dots (`.`), and alphanumerics between. The prefix is optional. If specified, the prefix must be a DNS subdomain: a series of DNS labels separated by dots (`.`), not longer than 253 characters in total, followed by a slash (`/`).
- The `kubernetes.io/` and `k8s.io/` prefixes are reserved for Kubernetes core components.
```yaml
selector:
  matchLabels: # equality-based selectors
    component: redis
  matchExpressions: # set-based selectors
  - {key: tier, operator: In, values: [cache]}
  - {key: environment, operator: NotIn, values: [dev]}
```

https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/
- `kubectl describe pod <name> | grep -i qos`
- Types (see the sketch below):
  - `Guaranteed`: every container in the Pod has `memory_request == memory_limit && cpu_request == cpu_limit`.
  - `Burstable`: at least one container in the Pod has a memory or CPU request, without meeting the Guaranteed criteria.
  - `BestEffort`: no container in the Pod has any memory or CPU limits or requests.
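
A minimal sketch of a `Guaranteed`-class Pod (name and values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests: # requests == limits for every container => Guaranteed
        memory: "200Mi"
        cpu: "700m"
      limits:
        memory: "200Mi"
        cpu: "700m"
# Drop requests/limits entirely => BestEffort; set only some of them => Burstable.
```
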
https://kubernetes.io/blog/2015/06/the-distributed-system-toolkit-patterns/
- `Sidecar`: extend and enhance the main container. Example: a Pod with a web-server container and a sync container that syncs files from a git repository; the file system is shared between the two containers.
- `Ambassador`: proxy local connections to the world. Example: a Pod with your app container and a Redis proxy container responsible for splitting reads/writes to the appropriate servers. The main container reaches the Redis proxy on localhost, since they are in the same Pod.
- `Adapter`: standardize and normalize output. Example: Pods have their app container plus an adapter container that unifies the logs to be consumed by a centralized monitoring system.

https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/
- Best to check the help: `kubectl port-forward --help`

https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/
- Namespaces are a way to divide cluster resources between multiple users (via resource quota).
- It is not necessary to use multiple namespaces just to separate slightly different resources, such as different versions of the same software: use labels to distinguish them.
- When you create a Service, a corresponding DNS entry is created, of the form `<service-name>.<namespace>.svc.cluster.local`

https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects/

https://kubernetes.io/docs/tasks/administer-cluster/declare-network-policy/

https://kubernetes.io/docs/tasks/configure-pod-container/configure-persistent-volume-storage/

https://kubernetes.io/docs/tasks/configure-pod-container/configure-volume-storage/

https://kubernetes.io/docs/tasks/configure-pod-container/configure-projected-volume-storage/

https://kubernetes.io/docs/tasks/access-application-cluster/communicate-containers-same-pod-shared-volume/

https://kubernetes.io/docs/tutorials/kubernetes-basics/expose/expose-intro/

https://kubernetes.io/docs/tasks/debug-application-cluster/local-debugging/

https://kubernetes.io/docs/tasks/debug-application-cluster/get-shell-running-container/

https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/

https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/

https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/

# Docs that may be out-of-scope for CKAD

The following links are most likely not going to be tested in the CKAD exam. Nevertheless, it doesn't hurt to know a little bit about them.

https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
- Like a `Deployment`, a `StatefulSet` manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a `sticky identity` for each of its Pods.
- A StatefulSet represents a set of pods with consistent identities, defined as: `Network` (a single stable DNS name and hostname) and `Storage` (as many `VolumeClaims` as requested). The StatefulSet guarantees that a given network identity will always map to the same storage identity.
- Use cases / limitations:
  - The storage for a given Pod must either be provisioned by a `PersistentVolume` provisioner based on the requested storage class, or pre-provisioned by an admin.
  - Deleting and/or scaling a StatefulSet down will not delete the volumes associated with the StatefulSet.
  - StatefulSets currently require a `headless service` to be responsible for the network identity of the Pods.
  - To achieve an `ordered and graceful` termination of the pods in the StatefulSet, scale the StatefulSet down to 0 prior to deletion.
- Naming example for a StatefulSet named `web` with a headless service named `nginx` (see the sketch below):
  - Domain: nginx.default.svc.cluster.local
  - Pod DNS: web-{0..N-1}.nginx.default.svc.cluster.local
  - Pod hostname: web-{0..N-1}
- Deployment / scaling:
  - For a StatefulSet with N replicas, Pods are created sequentially, in order from {0..N-1}.
  - When Pods are being deleted, they are terminated in reverse order, from {N-1..0}.
  - Before a scaling operation is applied to a Pod, all of its predecessors must be Running and Ready. Scaling down proceeds in the same order as Pod termination (from the largest ordinal to the smallest).
  - Before a Pod is terminated, all of its successors must be completely shut down.
- `spec.podManagementPolicy` can be `OrderedReady` (default) or `Parallel` (does not wait for Pods to become Running and Ready).
- `spec.updateStrategy.type` can be `OnDelete` or `RollingUpdate` (default).
- If you update the Pod template to a configuration that never becomes Running and Ready (for example, due to a bad binary or an application-level configuration error), the StatefulSet will stop the rollout and wait. After reverting the template, you must also delete any Pods that the StatefulSet had already attempted to run with the bad configuration; only then will the StatefulSet begin to recreate Pods using the reverted template.
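
A minimal sketch matching the `web`/`nginx` naming example above (storage size and mount path are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  clusterIP: None # headless service, required for the Pods' network identity
  selector:
    app: nginx
  ports:
  - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: nginx # must reference the headless service
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates: # one PVC per Pod, kept across scale-down/delete
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```
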
https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
- A `DaemonSet` ensures that all (or some) `Nodes` run a copy of a `Pod`. As nodes are added to the cluster, Pods are added to them; as nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet cleans up the Pods it created. Useful for running storage, log and monitoring daemons (see the sketch below).
- A Pod template in a DaemonSet must have `restartPolicy: Always` (the default).
- If you specify a `.spec.template.spec.nodeSelector`, the DaemonSet controller creates Pods on nodes matching that node selector.
- If you specify a `.spec.template.spec.affinity`, the DaemonSet controller creates Pods on nodes matching that node affinity.
- If you specify neither, the DaemonSet controller creates Pods on all nodes.
- Communicating with daemon Pods:
  - Push: Pods in the DaemonSet are configured to send updates to another service, such as a stats database.
  - NodeIP and known port: Pods in the DaemonSet can use a `hostPort`, so that the pods are reachable via the node IPs.
  - DNS: create a `headless service` with the same pod selector, and then discover the DaemonSet's pods using the `endpoints` resource.
  - Service: create a service with the same pod selector, and use the service to reach a daemon on a random node (there is no way to reach a specific node this way).
- Update strategies: `OnDelete` (new DaemonSet pods are only created when you manually delete old DaemonSet pods) and `RollingUpdate` (default).
- If node labels are changed, the DaemonSet promptly adds Pods to newly matching nodes and deletes Pods from newly non-matching nodes.
- It is possible to create Pods by writing a file to a certain directory watched by the kubelet. These are called `static pods`. Unlike DaemonSet Pods, static Pods cannot be managed with kubectl or other Kubernetes API clients. Static Pods do not depend on the apiserver, making them useful in cluster `bootstrapping` cases.
- Use a `Deployment` for stateless services, like frontends, where scaling the number of replicas up and down and rolling out updates are more important than controlling exactly which host the Pod runs on. Use a `DaemonSet` when it is important that a copy of a Pod always runs on all or certain hosts, and when it needs to start before other Pods.
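
A minimal sketch of a node-level log tailer DaemonSet (name, image and paths are illustrative; the control-plane taint key varies by cluster version):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-log-tailer # illustrative
spec:
  selector:
    matchLabels:
      name: node-log-tailer
  updateStrategy:
    type: RollingUpdate # default; the alternative is OnDelete
  template:
    metadata:
      labels:
        name: node-log-tailer
    spec:
      tolerations: # also run on tainted control-plane nodes
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: tailer
        image: busybox
        args: [/bin/sh, -c, 'tail -F /var/log/syslog']
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath: # one Pod per node reads that node's own logs
          path: /var/log
```
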
https://kubernetes.io/docs/concepts/services-networking/ingress/
- Ingress exposes HTTP and HTTPS routes from outside the cluster to `services` within the cluster. Traffic routing is controlled by rules defined on the Ingress resource. `Internet => Ingress => Services`
- An Ingress may be configured to give Services externally-reachable URLs, load balance traffic, terminate SSL/TLS, and offer name-based virtual hosting.
- **An Ingress does not expose arbitrary ports or protocols.** Exposing services other than HTTP and HTTPS to the internet typically uses a service of type `Service.Type=NodePort` or `Service.Type=LoadBalancer`.
- You must have an `ingress controller` (like ingress-nginx) to satisfy an Ingress. Only creating an Ingress resource has no effect.
- You can secure an `Ingress` by specifying a `Secret` that contains a TLS private key and certificate. Currently the Ingress only supports a single TLS port, `443`, and assumes TLS termination. If the TLS configuration section in an Ingress specifies different hosts, they are multiplexed on the same port according to the hostname specified through the SNI TLS extension (provided the Ingress controller supports SNI).
- Types of Ingress:

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: test-ingress
spec:
  # Single Service Ingress (default backend with no rules)
  backend:
    serviceName: testsvc
    servicePort: 80
---
# Simple fanout (routes traffic from a single IP address to more than one Service)
# foo.bar.com -> 178.91.123.132 -> /foo service1:4200
#                                  /bar service2:8080
rules:
- host: foo.bar.com
  http:
    paths:
    - path: /foo
      backend:
        serviceName: service1
        servicePort: 4200
    - path: /bar
      backend:
        serviceName: service2
        servicePort: 8080
---
# Name-based virtual hosting (routes traffic to multiple host names at the same IP address)
# foo.bar.com --|                |-> foo.bar.com service1:80
#               | 178.91.123.132 |
# bar.foo.com --|                |-> bar.foo.com service2:80
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - backend:
          serviceName: service1
          servicePort: 80
  - host: bar.foo.com
    http:
      paths:
      - backend:
          serviceName: service2
          servicePort: 80
```

https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
- Pods can have priority. Priority indicates the importance of a Pod relative to other Pods. If a Pod cannot be scheduled, the scheduler tries to preempt (evict) lower-priority Pods to make scheduling of the pending Pod possible.
```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000 # 32-bit integer; the higher the number, the higher the priority
globalDefault: false # default value. If true, it is used as the default for Pods (only one PriorityClass may set this to true)
preemptionPolicy: PreemptLowerPriority # default value. If 'Never', the pod is scheduled ahead of other lower-priority pods, but cannot preempt other pods; subject to scheduler back-off.
description: "This priority class should be used for XYZ service pods only."
---
# Usage
kind: Pod
spec:
  priorityClassName: high-priority
  containers:
  - name: nginx
    image: nginx
```
- A Node is considered for preemption only when the answer to this question is yes: "If all the Pods with lower priority than the pending Pod are removed from the Node, can the pending Pod be scheduled on the Node?"
- Note that `Pod P` is not necessarily scheduled to the `nominated Node`. After the victim Pods are preempted, they get their `graceful termination period`. If another node becomes available while the scheduler is waiting for the victims to terminate, the scheduler will use that other node to schedule `Pod P`. As a result, `nominatedNodeName` (in the Pod status) and `nodeName` (in the Pod spec) are not always the same; see the sketch below. Also, if the scheduler preempts Pods on `Node N` but then a Pod with higher priority than `Pod P` arrives, the scheduler may give `Node N` to the new higher-priority Pod. In that case, the scheduler clears the `nominatedNodeName` of `Pod P`, making `Pod P` eligible to preempt Pods on another Node.
- When there are multiple nodes available for preemption, the scheduler tries to choose the node with the set of Pods with the lowest priority. However, if those Pods have a `PodDisruptionBudget` that would be violated by preempting them, the scheduler may choose another node with higher-priority Pods (`PodDisruptionBudget` is supported, **but not guaranteed**).
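
A quick way to observe the `nominatedNodeName` vs `nodeName` distinction described above (`mypod` is a placeholder):

```bash
# The node the scheduler "reserved" for a preemptor pod while victims terminate:
kubectl get pod mypod -o jsonpath='{.status.nominatedNodeName}'
# The node the pod actually ended up running on:
kubectl get pod mypod -o jsonpath='{.spec.nodeName}'
```
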
https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
- Kubernetes offers features to help run `highly available` applications even in the face of frequent `voluntary` disruptions. We call this set of features `Disruption Budgets`.
- Not all voluntary disruptions are constrained by Pod Disruption Budgets. For example, deleting deployments or pods bypasses `Pod Disruption Budgets`.
- An application owner can create a `PodDisruptionBudget` object (PDB) for each application. A PDB limits the number of pods of a replicated application that can be down simultaneously due to voluntary disruptions. For example, a quorum-based application would like to ensure that the number of running replicas is never brought below the number needed for a quorum. A web front end might want to ensure that the number of replicas serving load never falls below a certain percentage of the total.
- Cluster managers and hosting providers should use tools that respect Pod Disruption Budgets by calling the Eviction API instead of directly deleting pods or deployments. An example is `kubectl drain`.
```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  minAvailable: 2
  selector: # similar to the selector used in a Deployment
    matchLabels:
      app: zookeeper
```

https://kubernetes.io/docs/concepts/policy/resource-quotas/
- A `ResourceQuota` provides constraints that limit aggregate resource consumption per namespace. It can limit the quantity of objects that can be created in a namespace by type, as well as the total amount of compute resources that may be consumed by resources in that namespace.
- `Resource quotas` divide up aggregate cluster resources, but they create no restrictions around nodes: pods from several namespaces may run on the same node.
- ResourceQuotas are independent of the cluster capacity: they are expressed in absolute units. So if you add nodes to your cluster, this does not automatically give each namespace the ability to consume more resources.
- A pod is in a `terminal state` if its `.status.phase` is `Failed` or `Succeeded`.
- Pods can be created at a specific priority. You can control a pod's consumption of system resources based on the pod's priority, by using the `scopeSelector` field in the quota spec.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-demo
spec:
  # With this quota in place, every new Container must declare a memory request and a CPU request.
  hard:
    # compute quota
    requests.cpu: "1" # across all pods in a non-terminal state, the sum of CPU requests cannot exceed this value
    requests.memory: 1Gi # across all pods in a non-terminal state, the sum of memory requests cannot exceed this value
    # storage quota
    requests.storage: 500Gi # across all persistent volume claims, the sum of storage requests cannot exceed this value
    gold.storageclass.storage.k8s.io/requests.storage: 500Gi # same, but only for PVCs of the `gold` storage class
    # object count quota
    configmaps: 50 # the total number of ConfigMaps that can exist in the namespace
    pods: 500
    services: 100
```

https://kubernetes.io/docs/concepts/configuration/assign-pod-node/

- Pod `.spec.nodeName` is the simplest form of node selection constraint, but **due to its limitations it is typically not used**. If it is provided, it takes precedence over the other methods of node selection.
- `Node affinity` is conceptually similar to `nodeSelector`: it allows you to constrain which nodes your pod is eligible to be scheduled on, based on labels on the node:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector: # simplest form of node selection constraint
    disktype: ssd # the pod can only be scheduled on nodes having this label
  affinity:
    # Node affinity
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution: # required is a "hard" requirement
        nodeSelectorTerms: # multiple nodeSelectorTerms are ORed (one must be satisfied)
        - matchExpressions: # multiple matchExpressions are ANDed (all must be satisfied)
          - key: kubernetes.io/e2e-az-name
            operator: In # valid operators: In, NotIn, Exists, DoesNotExist, Gt, Lt. Use NotIn and DoesNotExist for node anti-affinity
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution: # preferred is a "soft" requirement
      - weight: 1 # valid values [1-100]; nodes with a higher computed weight are preferred
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
```

- `Inter-pod affinity` and `anti-affinity` allow you to constrain which `nodes` your `pod` is eligible to be scheduled on based on labels on pods already running on the node, rather than based on labels on nodes.
- `Rules` are of the form "this pod should (or, in the case of anti-affinity, should not) run in an `X` if that `X` is already running one or more pods that meet rule `Y`".
  - `Y` is expressed as a LabelSelector with an optional associated list of `namespaces`.
  - `X` is a topology domain like `node`, `rack`, or `cloud provider region`. You express it using a `topologyKey`, which is the key of the node label the system uses to denote such a topology domain. Examples: `kubernetes.io/hostname`, `topology.kubernetes.io/zone`, `topology.kubernetes.io/region`, `node.kubernetes.io/instance-type`, `kubernetes.io/os`, `kubernetes.io/arch`.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  selector:
    matchLabels:
      app: web-store
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAntiAffinity:
          # do not co-locate web-store pods on the same node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In # valid operators: In, NotIn, Exists, DoesNotExist
                values:
                - web-store
            topologyKey: "kubernetes.io/hostname"
        podAffinity:
          # co-locate web-store pods with cache pods
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - cache
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx:1.16-alpine

# Together with an equivalent cache Deployment, this results in a scheduling like:
# node1: web1 && cache1
# node2: web2 && cache2
# node3: web3 && cache3
```

https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
- `Node affinity` is a property of pods that `attracts` them to a set of nodes; `taints` are the opposite: they allow a node to `repel` a set of pods.
- `Taints` are applied to a `node`; they mark the node so that it does not accept any pods that do not tolerate the taints. `Tolerations` are applied to `pods`, and allow (but do not require) the pods to schedule onto nodes with matching taints.

```bash
# Add a taint
kubectl taint nodes node1 key=value:NoSchedule
# Remove a taint
kubectl taint nodes node1 key:NoSchedule-
```

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "example-key"
    operator: "Exists"
    effect: "NoSchedule"
    # value: only necessary when using the operator "Equal"
---
# Tolerates everything (matches all keys, values and effects)
tolerations:
- operator: "Exists"
---
# Matches all effects with key `mykey`
tolerations:
- key: "mykey"
  operator: "Exists"
```

- You can put multiple taints on the same node and multiple tolerations on the same pod. Kubernetes processes multiple taints and tolerations like a `filter`: start with all of a node's taints, then ignore the ones for which the pod has a matching toleration; the remaining `un-ignored` taints have the indicated effects on the pod. In particular:
  - if there is at least `one un-ignored` taint with effect `NoSchedule`, Kubernetes `will not schedule` the pod onto that node.
  - if there is `no un-ignored` taint with effect `NoSchedule` but there is at least `one un-ignored` taint with effect `PreferNoSchedule`, Kubernetes will `try not to schedule` the pod onto the node.
  - if there is at least `one un-ignored` taint with effect `NoExecute`, the pod `will be evicted` from the node (if it is already running there) and `will not be scheduled` onto the node (if it is not yet running there).
- Normally, if a taint with effect `NoExecute` is added to a node, any pods that do not tolerate the taint are evicted immediately, and pods that do tolerate the taint `are never evicted`. However, a toleration with `NoExecute` effect can specify an optional `tolerationSeconds` field that dictates how long the pod stays bound to the node after the taint is added (see the sketch below).
- Kubernetes `automatically` adds `NoExecute` tolerations for `node.kubernetes.io/not-ready` with `tolerationSeconds=300` and `node.kubernetes.io/unreachable` with `tolerationSeconds=300` when the Node transitions to these states.
- `DaemonSet` pods are created with these tolerations as well, plus other tolerations like `node.kubernetes.io/memory-pressure`, `node.kubernetes.io/disk-pressure` and `node.kubernetes.io/network-unavailable` (host network only), **without `tolerationSeconds`, so they are never evicted**.
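
A minimal `tolerationSeconds` sketch, in the same fragment style as above (key and value are illustrative):

```yaml
tolerations:
- key: "example-key"
  operator: "Equal"
  value: "example-value"
  effect: "NoExecute"
  tolerationSeconds: 3600 # stay bound for one hour after the taint appears, then get evicted
```
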
https://kubernetes.io/docs/reference/access-authn-authz/rbac/
- **Highly recommended to read/watch the following link: https://www.cncf.io/blog/2018/08/01/demystifying-rbac-in-kubernetes/**
- K8s provides no API objects for users (like the ones we have for Pods or Deployments).
- User management must be configured by the cluster administrator:
  - Certificate-based auth, token-based auth, basic auth, OAuth2.
- K8s is configured with a `CA`, so every cert signed with this CA will be accepted by the k8s API. You can use `OpenSSL` or `CloudFlare's PKI toolkit` to create these certs.
  - `/etc/kubernetes/pki/[ca.crt,ca.key]`
  - Important fields in the SSL cert: `Common Name (CN)`, which k8s uses as the `user`, and `Organization (O)`, which k8s uses as the `group`.
- RBAC in Kubernetes:
  - Three groups: Subjects, API Resources, Operations (verbs).
  - Roles: `Operations => API Resources`.
  - RoleBindings: `Role => Subjects`.
  - ClusterRoles && ClusterRoleBindings are very similar, but without namespaces (they apply to the whole cluster).
  - K8s comes with predefined ClusterRoleBindings; you can use them (e.g. add the group in the Organization field of your certificate).
- Take advantage of `kubectl create [clusterrole|clusterrolebinding|role|rolebinding] -h`.
- Take advantage of the `kubectl auth can-i` command to test a role; see the next example:
```yaml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  namespace: myns
  name: admin
rules:
- apiGroups: [""] # "" for resources from the core API group; can be "*"
  resources: ["pods"] # can be "*"
  verbs: ["get", "list"] # can be "*"
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  namespace: myns
  name: test
subjects:
- kind: Group # can be "User" or "ServiceAccount"
  name: dev@user.com
  apiGroup: rbac.authorization.k8s.io
roleRef: # only one role per binding!
  kind: Role
  name: admin
  apiGroup: rbac.authorization.k8s.io
---
# Test it with:
# kubectl auth can-i get pods --as dev@user.com
# kubectl auth can-i "*" secret --as dev@user.com
```

# Exercises

Highly recommended to do all the exercises from the following links:

- https://github.com/dgkanatsios/CKAD-exercises
- https://codeburst.io/kubernetes-ckad-weekly-challenges-overview-and-tips-7282b36a2681
- https://codeburst.io/the-ckad-browser-terminal-10fab2e8122e