├── .gitignore ├── LICENSE ├── README.md ├── TWITCH.md ├── code └── tf-cluster-asg │ ├── README.md │ ├── main.tf │ ├── provider.tf │ ├── user_data.sh │ ├── variable.tf │ └── versions.tf ├── img └── video.png └── objectives ├── objective1.md ├── objective2.md ├── objective3.md ├── objective4.md └── objective5.md /.gitignore: -------------------------------------------------------------------------------- 1 | # Local Terraform plugins 2 | **/.terraform/plugins/** 3 | 4 | # Terraform secrets files 5 | #*.tfvars 6 | #*.tfvars.* 7 | 8 | # Terraform state files 9 | *.tfstate 10 | *.tfstate.* 11 | 12 | # Crash log files 13 | crash.log -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Wahl Network 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![Version 1.19](https://img.shields.io/badge/version-1.19-blue) 2 | 3 | # Certified Kubernetes Administrator (CKA) Exam Study Guide 4 | 5 | 👋🙂 Welcome! 6 | 7 | This repository contains a study guide created in preparation for passing the Certified Kubernetes Administrator (CKA) exam. All of the content found here was livestreamed on [Twitch](https://www.twitch.tv/wahlnetwork) in collaboration with viewers. 8 | 9 | Each [Exam Objective](#exam-objectives) is broken down into helpful links, commands, videos, scripts, code samples, and more so that you can refer back to this guide during your studies. Everything here is open source and made by a community of inclusive and friendly folks. If you found this project helpful, why not give us a 🌟star🌟 to help increase awareness! 
10 | 11 | - [Certified Kubernetes Administrator (CKA) Exam Study Guide](#certified-kubernetes-administrator-cka-exam-study-guide) 12 | - [Project Overview](#project-overview) 13 | - [Exam Objectives](#exam-objectives) 14 | - [Resources](#resources) 15 | - [📝 Official References](#-official-references) 16 | - [🎓 Online Training](#-online-training) 17 | - [🛠 Tools](#-tools) 18 | - [Managed Kubernetes Clusters](#managed-kubernetes-clusters) 19 | - [🤗 Community](#-community) 20 | - [The Fine Print](#the-fine-print) 21 | - [Disclaimer](#disclaimer) 22 | - [Contributing](#contributing) 23 | - [Code of Conduct](#code-of-conduct) 24 | - [License](#license) 25 | 26 | ## Project Overview 27 | 28 | Key things to know: 29 | 30 | - Task tracking is contained on [this Trello board](https://bit.ly/2SzlFRr). 31 | - The `main` branch contains all of the finished work. 32 | - The `draft` branch contains work-in-progress that needs to be polished, verified, and formatted. 33 | 34 | Additionally, you can watch this brief introduction video below: 35 | 36 | [![Announcement Video](img/video.png)](https://youtu.be/dkYCw88mWow) 37 | 38 | ## Exam Objectives 39 | 40 | The CNCF curriculum is posted [here](https://github.com/cncf/curriculum). The percentage after each objective is the relative score weight on the exam. 41 | 42 | - [Objective 1: Cluster Architecture, Installation & Configuration](objectives/objective1.md) ✔ 43 | - [Objective 2: Workloads & Scheduling](objectives/objective2.md) ✔ 44 | - [Objective 3: Services & Networking](objectives/objective3.md) ✔ 45 | - [Objective 4: Storage](objectives/objective4.md) ✔ 46 | - [Objective 5: Troubleshooting](objectives/objective5.md) ✔ 47 | 48 | ## Resources 49 | 50 | Fantastic resources from around the world, sorted alphabetically. 
51 | 52 | ### 📝 Official References 53 | 54 | - [Certified Kubernetes Administrator (CKA) Exam 1.19 Curriculum](https://github.com/cncf/curriculum/blob/master/CKA_Curriculum_v1.19.pdf) 55 | - [Certified Kubernetes Administrator Exam Registration](https://training.linuxfoundation.org/certification/certified-kubernetes-administrator-cka/) 56 | - [Enable kubectl autocompletion](https://kubernetes.io/docs/tasks/tools/install-kubectl/#enable-kubectl-autocompletion) 57 | - [kubectl Cheat Sheet](https://kubernetes.io/docs/reference/kubectl/cheatsheet/) 58 | - [kubectl Reference Docs](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands) 59 | - [Linux Foundation's Important Instructions: CKA and CKAD](https://docs.linuxfoundation.org/tc-docs/certification/tips-cka-and-ckad) 60 | 61 | ### 🎓 Online Training 62 | 63 | - [A Cloud Guru's Cloud Native Certified Kubernetes Administrator (CKA) Course](https://acloud.guru/learn/7f5137aa-2d26-4b19-8d8c-025b22667e76) 64 | - [Katacoda - Learn Kubernetes using Interactive Browser-Based Scenarios](https://www.katacoda.com/courses/kubernetes) 65 | - [Pluralsight CKA Learning Path](https://app.pluralsight.com/paths/certificate/certified-kubernetes-administrator) by author [Anthony Nocentino](https://app.pluralsight.com/profile/author/anthony-nocentino) 66 | 67 | ### 🛠 Tools 68 | 69 | - [Kubectl-fzf Autocomplete](https://github.com/bonnefoa/kubectl-fzf) 70 | - [Power tools for kubectl](https://github.com/ahmetb/kubectx) 71 | 72 | ## Managed Kubernetes Clusters 73 | 74 | - Google Kubernetes Engine (GKE) 75 | - [Creating a GKE Zonal Cluster](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-zonal-cluster) 76 | - [Generating a kubeconfig entry](https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl#generate_kubeconfig_entry) 77 | 78 | > Read [Configure Access to Multiple Clusters](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/) to switch between different clusters while studying. 
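When studying with more than one cluster (for example, a kubeadm lab plus a managed GKE or EKS cluster), switching between them comes down to kubectl contexts. A minimal sketch, assuming the clusters are already present in your kubeconfig (the context names shown are placeholders):

```bash
# List all contexts known to the active kubeconfig
kubectl config get-contexts

# Show the currently selected context
kubectl config current-context

# Switch the active context to another cluster (context name is an example)
kubectl config use-context my-gke-cluster

# Run a single command against another context without switching
kubectl get nodes --context my-kubeadm-lab
```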
79 | 80 | ### 🤗 Community 81 | 82 | - [Best Practices for CKA Exam](https://medium.com/@emreodabas_20110/best-practices-for-cka-exam-9c1e51ea9b29) 83 | - [CKA-Study Guide](https://github.com/David-VTUK/CKA-StudyGuide) by [David-VTUK](https://github.com/David-VTUK) 84 | - [How I passed the CKA (Certified Kubernetes Administrator) Exam](https://medium.com/platformer-blog/how-i-passed-the-cka-certified-kubernetes-administrator-exam-8943aa24d71d) 85 | - [How to pass the New CKA exam released at September 2020](https://medium.com/@krishna.sharma1408/how-to-pass-the-new-cka-exam-released-at-september-2020-e0e014d67f78) 86 | - [Interesting Kubernetes application demos](https://www.virtuallyghetto.com/2020/06/interesting-kubernetes-application-demos.html) 87 | - [Kubernetes the Hard Way - Kelsey Hightower](https://github.com/kelseyhightower/kubernetes-the-hard-way) 88 | - [Kubernetes tools and resources from learnk8s](https://learnk8s.io/kubernetes-resources) 89 | - [Practice Enough With These 150 Questions for the CKAD Exam](https://medium.com/bb-tutorials-and-thoughts/practice-enough-with-these-questions-for-the-ckad-exam-2f42d1228552) 90 | - [Stack Overflow - Questions tagged kubernetes](https://stackoverflow.com/questions/tagged/kubernetes) 91 | - [Walidshaari's Kubernetes-Certified-Administrator Repo](https://github.com/walidshaari/Kubernetes-Certified-Administrator) 92 | 93 | ## The Fine Print 94 | 95 | ### Disclaimer 96 | 97 | Absolutely nothing in this organization is officially supported and should be used at your own risk. 98 | 99 | ### Contributing 100 | 101 | Contributions via GitHub pull requests are gladly accepted from their original author. Along with any pull requests, please state that the contribution is your original work and that you license the work to the project under the project's open source license. Whether or not you state this explicitly, by submitting any copyrighted material via pull request, email, or other means you agree to license the material under the project's open source license and warrant that you have the legal authority to do so. 102 | 103 | ### Code of Conduct 104 | 105 | All contributors are expected to abide by the [Code of Conduct](https://github.com/WahlNetwork/welcome/blob/master/COC.md). 106 | 107 | ### License 108 | 109 | Every repository in this organization has a license so that you can freely consume, distribute, and modify the content for non-commercial purposes. By default, the [MIT License](https://opensource.org/licenses/MIT) is used. 110 | -------------------------------------------------------------------------------- /TWITCH.md: -------------------------------------------------------------------------------- 1 | # Become a Twitch Contributor 2 | 3 | Watching one of my Twitch live streams and want to earn a `contributor` badge for this repository? Make a contribution to this file! 4 | 5 | 1. Fork the repository to your user account. 6 | 2. Edit this file and add your username to the list below. 7 | 3. Commit the change to your forked copy. 8 | 4. Submit a pull request with the changes. 9 | 5. Once reviewed and merged, you will be listed as a `contributor`! 
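If you prefer the command line to the GitHub web editor, the same flow looks roughly like this (the fork URL and branch name below are placeholders):

```bash
# Clone your fork and create a branch for the change
git clone https://github.com/YOUR-USERNAME/certified-kubernetes-administrator-cka-exam.git
cd certified-kubernetes-administrator-cka-exam
git checkout -b add-twitch-contributor

# Add yourself to the list in TWITCH.md, then commit and push
git add TWITCH.md
git commit -m "Add myself to the Twitch contributor list"
git push origin add-twitch-contributor
# Finally, open a pull request from the pushed branch on GitHub
```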
10 | 11 | ## Snazzy Folks 12 | 13 | - [Chris Wahl (Example)](https://github.com/chriswahl) -------------------------------------------------------------------------------- /code/tf-cluster-asg/README.md: -------------------------------------------------------------------------------- 1 | # Terraform Plan to Create Auto Scaling Group 2 | 3 | This plan will create the following resources: 4 | 5 | - Launch Template with EC2 instances prepared to install Kubernetes with `kubeadm` 6 | - Auto Scaling group to deploy as many instances as your heart desires 7 | 8 | ## Instructions 9 | 10 | - Edit `variable.tf` (or supply a `terraform.tfvars` file) with your environment's information. 11 | - Optionally, edit `user_data.sh` if you wish to alter the startup script. 12 | - Run `terraform init` and `terraform validate` to ensure the code is loaded properly. 13 | - Run `terraform plan` to see the results of a plan against your environment. 14 | - When satisfied, run `terraform apply` to apply the plan and construct the Launch Template and Auto Scaling group. 15 | - If more or fewer nodes are needed: 16 | - Edit the `node-count` variable value (or set it in `terraform.tfvars`) to the desired amount. 17 | - Re-run `terraform apply` and the Auto Scaling group will create/destroy nodes to reach the new value. 18 | - When done, use `terraform destroy` to remove all resources and stop incurring charges. 19 | -------------------------------------------------------------------------------- /code/tf-cluster-asg/main.tf: -------------------------------------------------------------------------------- 1 | # Provides the security group id value 2 | data "aws_security_group" "sg" { 3 | tags = { 4 | Name = var.security-group-name 5 | } 6 | } 7 | 8 | # Provides the subnet id value 9 | data "aws_subnet" "subnet" { 10 | tags = { 11 | Name = var.subnet-name 12 | } 13 | } 14 | 15 | # Provides an AWS Launch Template for constructing EC2 instances 16 | resource "aws_launch_template" "cka-node" { 17 | name = var.instance-name 18 | image_id = "ami-07a29e5e945228fa1" 19 | instance_type = var.instance-type 20 | key_name = var.keypair-name 21 | vpc_security_group_ids = [data.aws_security_group.sg.id] 22 | block_device_mappings { 23 | device_name = "/dev/sda1" 24 | ebs { 25 | volume_size = 8 26 | encrypted = "true" 27 | } 28 | } 29 | tags = { 30 | environment = var.tag-environment 31 | source = "Terraform" 32 | } 33 | tag_specifications { 34 | resource_type = "instance" 35 | tags = { 36 | Name = var.instance-name 37 | environment = var.tag-environment 38 | source = "Terraform" 39 | } 40 | } 41 | tag_specifications { 42 | resource_type = "volume" 43 | tags = { 44 | Name = var.instance-name 45 | environment = var.tag-environment 46 | source = "Terraform" 47 | } 48 | } 49 | user_data = filebase64("user_data.sh") 50 | } 51 | 52 | # Provides an Auto Scaling group using instances described in the Launch Template 53 | resource "aws_autoscaling_group" "cka-cluster" { 54 | desired_capacity = var.node-count 55 | max_size = var.node-count 56 | min_size = var.node-count 57 | name = var.asg-name 58 | vpc_zone_identifier = [data.aws_subnet.subnet.id] 59 | launch_template { 60 | id = aws_launch_template.cka-node.id 61 | version = "$Latest" 62 | } 63 | } 64 | -------------------------------------------------------------------------------- /code/tf-cluster-asg/provider.tf: -------------------------------------------------------------------------------- 1 | terraform { 2 | required_providers { 3 | aws = { 4 | source = "hashicorp/aws" 5 | version = "~>3.3.0" 6 | } 7 | } 8 | } 9 | 10 | provider "aws" { 11 |
region = "us-west-2" 12 | } 13 | -------------------------------------------------------------------------------- /code/tf-cluster-asg/user_data.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Disable Swap 4 | sudo swapoff -a 5 | 6 | # Bridge Network 7 | sudo modprobe br_netfilter 8 | sudo cat <<'EOF' | sudo tee /etc/sysctl.d/k8s.conf 9 | net.bridge.bridge-nf-call-ip6tables = 1 10 | net.bridge.bridge-nf-call-iptables = 1 11 | EOF 12 | sudo sysctl --system 13 | 14 | # Install Docker 15 | sudo curl -fsSL https://get.docker.com -o /home/ubuntu/get-docker.sh 16 | sudo sh /home/ubuntu/get-docker.sh 17 | 18 | # Install Kube tools 19 | sudo apt-get update && sudo apt-get install -y apt-transport-https curl 20 | curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add - 21 | cat <<'EOF' | sudo tee /etc/apt/sources.list.d/kubernetes.list 22 | deb https://apt.kubernetes.io/ kubernetes-xenial main 23 | EOF 24 | sudo apt-get update 25 | sudo apt-get install -y kubelet kubeadm kubectl 26 | sudo apt-mark hold kubelet kubeadm kubectl 27 | 28 | # Setup aliases 29 | sudo printf "alias k=kubectl\ncomplete -F __start_kubectl k" > ~/.bash_aliases -------------------------------------------------------------------------------- /code/tf-cluster-asg/variable.tf: -------------------------------------------------------------------------------- 1 | variable "node-count" { 2 | default = 1 3 | description = "The quantity of EC2 instances to launch in the Auto Scaling group" 4 | type = number 5 | } 6 | 7 | variable "instance-name" { 8 | description = "The name of the EC2 instance" 9 | type = string 10 | } 11 | 12 | variable "asg-name" { 13 | description = "The name of the Auto Scaling group" 14 | type = string 15 | } 16 | 17 | variable "keypair-name" { 18 | description = "The name of the EC2 key pair" 19 | type = string 20 | } 21 | 22 | variable "tag-environment" { 23 | description = "Assigns and AWS environment tag to resources" 24 | type = string 25 | } 26 | 27 | variable "security-group-name" { 28 | description = "The name of the VPC security group" 29 | type = string 30 | } 31 | 32 | variable "subnet-name" { 33 | description = "The name of the VCP subnet" 34 | type = string 35 | } 36 | 37 | variable "instance-type" { 38 | description = "The type of EC2 instance to deploy" 39 | type = string 40 | } 41 | -------------------------------------------------------------------------------- /code/tf-cluster-asg/versions.tf: -------------------------------------------------------------------------------- 1 | terraform { 2 | required_version = ">= 0.13" 3 | } 4 | -------------------------------------------------------------------------------- /img/video.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WahlNetwork/certified-kubernetes-administrator-cka-exam/09aea886364f16b1c91a8a2088ca408acce3fc2d/img/video.png -------------------------------------------------------------------------------- /objectives/objective1.md: -------------------------------------------------------------------------------- 1 | # Objective 1: Cluster Architecture, Installation & Configuration 2 | 3 | - [Objective 1: Cluster Architecture, Installation & Configuration](#objective-1-cluster-architecture-installation--configuration) 4 | - [1.1 Manage Role Based Access Control (RBAC)](#11-manage-role-based-access-control-rbac) 5 | - [Lab Environment](#lab-environment) 6 | - [Lab Practice](#lab-practice) 
7 | - [1.2 Use Kubeadm to Install a Basic Cluster](#12-use-kubeadm-to-install-a-basic-cluster) 8 | - [Kubeadm Tasks for All Nodes](#kubeadm-tasks-for-all-nodes) 9 | - [Kubeadm Tasks for Single Control Node](#kubeadm-tasks-for-single-control-node) 10 | - [Kubeadm Tasks for Worker Node(s)](#kubeadm-tasks-for-worker-nodes) 11 | - [Kubeadm Troubleshooting](#kubeadm-troubleshooting) 12 | - [Kubeadm Optional Tasks](#kubeadm-optional-tasks) 13 | - [1.3 Manage A Highly-Available Kubernetes Cluster](#13-manage-a-highly-available-kubernetes-cluster) 14 | - [HA Deployment Types](#ha-deployment-types) 15 | - [Upgrading from Single Control-Plane to High Availability](#upgrading-from-single-control-plane-to-high-availability) 16 | - [1.4 Provision Underlying Infrastructure to Deploy a Kubernetes Cluster](#14-provision-underlying-infrastructure-to-deploy-a-kubernetes-cluster) 17 | - [1.5 Perform a Version Upgrade on a Kubernetes Cluster using Kubeadm](#15-perform-a-version-upgrade-on-a-kubernetes-cluster-using-kubeadm) 18 | - [First Control Plane Node](#first-control-plane-node) 19 | - [Additional Control Plane Nodes](#additional-control-plane-nodes) 20 | - [Upgrade Control Plane Node Kubectl And Kubelet Tools](#upgrade-control-plane-node-kubectl-and-kubelet-tools) 21 | - [Upgrade Worker Nodes](#upgrade-worker-nodes) 22 | - [1.6 Implement Etcd Backup And Restore](#16-implement-etcd-backup-and-restore) 23 | - [Snapshot The Keyspace](#snapshot-the-keyspace) 24 | - [Restore From Snapshot](#restore-from-snapshot) 25 | 26 | ## 1.1 Manage Role Based Access Control (RBAC) 27 | 28 | Documentation and Resources: 29 | 30 | - [Kubectl Cheat Sheet](https://kubernetes.io/docs/reference/kubectl/cheatsheet/) 31 | - [Using RBAC Authorization](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) 32 | - [A Practical Approach to Understanding Kubernetes Authorization](https://thenewstack.io/a-practical-approach-to-understanding-kubernetes-authorization/) 33 | 34 | RBAC is handled by roles (permissions) and bindings (assignment of permissions to subjects): 35 | 36 | | Object | Description | 37 | | -------------------- | -------------------------------------------------------------------------------------------- | 38 | | `Role` | Permissions within a particular namespace | 39 | | `ClusterRole` | Permissions to non-namespaced resources; can be used to grant the same permissions as a Role | 40 | | `RoleBinding` | Grants the permissions defined in a role to a user or set of users | 41 | | `ClusterRoleBinding` | Grant permissions across a whole cluster | 42 | 43 | ### Lab Environment 44 | 45 | If desired, use a managed Kubernetes cluster, such as Amazon EKS, to immediately begin working with RBAC. The command `aws --region REGION eks update-kubeconfig --name CLUSTERNAME` will generate a .kube configuration file on your workstation to permit kubectl commands. 46 | 47 | ### Lab Practice 48 | 49 | Create the `wahlnetwork1` namespace. 50 | 51 | `kubectl create namespace wahlnetwork1` 52 | 53 | --- 54 | 55 | Create a deployment in the `wahlnetwork1` namespace using the image of your choice: 56 | 57 | 1. `kubectl create deployment hello-node --image=k8s.gcr.io/echoserver:1.4 -n wahlnetwork1` 58 | 1. `kubectl create deployment busybox --image=busybox -n wahlnetwork1 -- sleep 2000` 59 | 60 | You can view the yaml file by adding `--dry-run=client -o yaml` to the end of either deployment. 
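For example, appending those flags to the first deployment command prints a manifest similar to the one shown below without creating anything on the cluster:

```bash
kubectl create deployment hello-node --image=k8s.gcr.io/echoserver:1.4 -n wahlnetwork1 --dry-run=client -o yaml
```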
61 | 62 | ```yaml 63 | apiVersion: apps/v1 64 | kind: Deployment 65 | metadata: 66 | creationTimestamp: null 67 | labels: 68 | app: hello-node 69 | name: hello-node 70 | namespace: wahlnetwork1 71 | spec: 72 | replicas: 1 73 | selector: 74 | matchLabels: 75 | app: hello-node 76 | strategy: {} 77 | template: 78 | metadata: 79 | creationTimestamp: null 80 | labels: 81 | app: hello-node 82 | spec: 83 | containers: 84 | - image: k8s.gcr.io/echoserver:1.4 85 | name: echoserver 86 | resources: {} 87 | ``` 88 | 89 | --- 90 | 91 | Create the `pod-reader` role in the `wahlnetwork1` namespace. 92 | 93 | `kubectl create role pod-reader --verb=get --verb=list --verb=watch --resource=pods -n wahlnetwork1` 94 | 95 | > Alternatively, use `kubectl create role pod-reader --verb=get --verb=list --verb=watch --resource=pods -n wahlnetwork1 --dry-run=client -o yaml` to output a proper yaml configuration. 96 | 97 | ```yaml 98 | apiVersion: rbac.authorization.k8s.io/v1 99 | kind: Role 100 | metadata: 101 | creationTimestamp: null 102 | name: pod-reader 103 | namespace: wahlnetwork1 104 | rules: 105 | - apiGroups: 106 | - "" 107 | resources: 108 | - pods 109 | verbs: 110 | - get 111 | - list 112 | - watch 113 | ``` 114 | 115 | --- 116 | 117 | Create the `read-pods` rolebinding between the role named `pod-reader` and the user `spongebob` in the `wahlnetwork1` namespace. 118 | 119 | `kubectl create rolebinding --role=pod-reader --user=spongebob read-pods -n wahlnetwork1` 120 | 121 | > Alternatively, use `kubectl create rolebinding --role=pod-reader --user=spongebob read-pods -n wahlnetwork1 --dry-run=client -o yaml` to output a proper yaml configuration. 122 | 123 | ```yaml 124 | apiVersion: rbac.authorization.k8s.io/v1 125 | kind: RoleBinding 126 | metadata: 127 | creationTimestamp: null 128 | name: read-pods 129 | roleRef: 130 | apiGroup: rbac.authorization.k8s.io 131 | kind: Role 132 | name: pod-reader 133 | subjects: 134 | - apiGroup: rbac.authorization.k8s.io 135 | kind: User 136 | name: spongebob 137 | ``` 138 | 139 | --- 140 | 141 | Create the `cluster-secrets-reader` clusterrole. 142 | 143 | `kubectl create clusterrole cluster-secrets-reader --verb=get --verb=list --verb=watch --resource=secrets` 144 | 145 | > Alternatively, use `kubectl create clusterrole cluster-secrets-reader --verb=get --verb=list --verb=watch --resource=secrets --dry-run=client -o yaml` to output a proper yaml configuration. 146 | 147 | ```yaml 148 | apiVersion: rbac.authorization.k8s.io/v1 149 | kind: ClusterRole 150 | metadata: 151 | creationTimestamp: null 152 | name: cluster-secrets-reader 153 | rules: 154 | - apiGroups: 155 | - "" 156 | resources: 157 | - secrets 158 | verbs: 159 | - get 160 | - list 161 | - watch 162 | ``` 163 | 164 | --- 165 | 166 | Create the `cluster-read-secrets` clusterrolebinding between the clusterrole named `cluster-secrets-reader` and the user `gizmo`. 167 | 168 | `kubectl create clusterrolebinding --clusterrole=cluster-secrets-reader --user=gizmo cluster-read-secrets` 169 | 170 | > Alternatively, use `kubectl create clusterrolebinding --clusterrole=cluster-secrets-reader --user=gizmo cluster-read-secrets --dry-run=client -o yaml` to output a proper yaml configuration. 
171 | 172 | ```yaml 173 | apiVersion: rbac.authorization.k8s.io/v1 174 | kind: ClusterRoleBinding 175 | metadata: 176 | creationTimestamp: null 177 | name: cluster-read-secrets 178 | roleRef: 179 | apiGroup: rbac.authorization.k8s.io 180 | kind: ClusterRole 181 | name: cluster-secrets-reader 182 | subjects: 183 | - apiGroup: rbac.authorization.k8s.io 184 | kind: User 185 | name: gizmo 186 | ``` 187 | 188 | Test to see if this works by running the `auth` command. 189 | 190 | `kubectl auth can-i get secrets --as=gizmo` 191 | 192 | Attempt to get secrets as the `gizmo` user. 193 | 194 | `kubectl get secrets --as=gizmo` 195 | 196 | ```bash 197 | NAME TYPE DATA AGE 198 | default-token-lz87v kubernetes.io/service-account-token 3 7d1h 199 | ``` 200 | 201 | ## 1.2 Use Kubeadm to Install a Basic Cluster 202 | 203 | Official documentation: [Creating a cluster with kubeadm](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/) 204 | 205 | > Terraform code is available [here](../code/tf-cluster-asg/) to create the resources necessary to experiment with `kubeadm` 206 | 207 | ### Kubeadm Tasks for All Nodes 208 | 209 | - Create Amazon EC2 Instances 210 | - Create an AWS Launch Template using an Ubuntu 18.04 LTS image (or newer) of size `t3a.small` (2 CPU, 2 GiB Memory). 211 | - Disable the [swap](https://askubuntu.com/questions/214805/how-do-i-disable-swap) file. 212 | - Note: This can be validated by using the console command `free` when SSH'd to the instance. The swap space total should be 0. 213 | - Consume this template as part of an Auto Scaling Group of 1 or more instances. This makes deployment of new instances and removal of old instances trivial. 214 | - [Configure iptables](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#letting-iptables-see-bridged-traffic) 215 | - This allows iptables to see bridged traffic. 216 | - [Install the Docker container runtime](https://kubernetes.io/docs/setup/production-environment/container-runtimes/#docker) 217 | - The [docker-install](https://github.com/docker/docker-install) script is handy for this. 218 | - [Install kubeadm, kubelet, and kubectl](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#installing-kubeadm-kubelet-and-kubectl) 219 | 220 | Alternatively, use a `user-data` bash script attached to the Launch Template: 221 | 222 | ```bash 223 | #!/bin/bash 224 | 225 | # Disable Swap 226 | sudo swapoff -a 227 | 228 | # Bridge Network 229 | sudo modprobe br_netfilter 230 | sudo cat <<'EOF' | sudo tee /etc/sysctl.d/k8s.conf 231 | net.bridge.bridge-nf-call-ip6tables = 1 232 | net.bridge.bridge-nf-call-iptables = 1 233 | EOF 234 | sudo sysctl --system 235 | 236 | # Install Docker 237 | sudo curl -fsSL https://get.docker.com -o /home/ubuntu/get-docker.sh 238 | sudo sh /home/ubuntu/get-docker.sh 239 | 240 | # Install Kube tools 241 | sudo apt-get update && sudo apt-get install -y apt-transport-https curl 242 | curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add - 243 | cat <<'EOF' | sudo tee /etc/apt/sources.list.d/kubernetes.list 244 | deb https://apt.kubernetes.io/ kubernetes-xenial main 245 | EOF 246 | sudo apt-get update 247 | sudo apt-get install -y kubelet kubeadm kubectl 248 | sudo apt-mark hold kubelet kubeadm kubectl 249 | ``` 250 | 251 | Optionally, add `sudo kubeadm config images pull` to the end of the script to pre-pull images required for setting up a Kubernetes cluster. 
252 | 253 | ```bash 254 | $ sudo kubeadm config images pull 255 | 256 | [config/images] Pulled k8s.gcr.io/kube-apiserver:v1.19.2 257 | [config/images] Pulled k8s.gcr.io/kube-controller-manager:v1.19.2 258 | [config/images] Pulled k8s.gcr.io/kube-scheduler:v1.19.2 259 | [config/images] Pulled k8s.gcr.io/kube-proxy:v1.19.2 260 | [config/images] Pulled k8s.gcr.io/pause:3.2 261 | [config/images] Pulled k8s.gcr.io/etcd:3.4.13-0 262 | [config/images] Pulled k8s.gcr.io/coredns:1.7.0 263 | ``` 264 | 265 | ### Kubeadm Tasks for Single Control Node 266 | 267 | - Initialize the cluster 268 | - Choose your Container Network Interface (CNI) plugin. This guide uses [Calico's CNI](https://docs.projectcalico.org/about/about-calico). 269 | - Run `sudo kubeadm init --pod-network-cidr=192.168.0.0/16` to initialize the cluster and provide a pod network aligned to [Calico's default configuration](https://docs.projectcalico.org/getting-started/kubernetes/quickstart#create-a-single-host-kubernetes-cluster). 270 | - Write down the `kubeadm join` output to [join worker nodes](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#join-nodes) later in this guide. 271 | - Example `kubeadm join 10.0.0.100:6443 --token 12345678901234567890 --discovery-token-ca-cert-hash sha256:123456789012345678901234567890123456789012345678901234567890` 272 | - [Install Calico](https://docs.projectcalico.org/getting-started/kubernetes/quickstart) 273 | - [Configure local kubectl access](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#optional-controlling-your-cluster-from-machines-other-than-the-control-plane-node) 274 | - This step simply copies the `admin.conf` file into a location accessible for a regular user. 275 | 276 | Alternatively, use the [Flannel CNI](https://coreos.com/flannel/docs/latest/kubernetes.html). 277 | 278 | - Run `sudo kubeadm init --pod-network-cidr=10.244.0.0/16` to initialize the cluster and provide a pod network aligned to [Flannel's default configuration](https://github.com/coreos/flannel/blob/master/Documentation/kubernetes.md). 279 | - Note: The [`kube-flannel.yml`](https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml) file is hosted in the same location. 280 | 281 | ### Kubeadm Tasks for Worker Node(s) 282 | 283 | - [Join the cluster](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#join-nodes) 284 | - Note: You can view the cluster config with `kubectl config view`. This includes the cluster server address (e.g. `server: https://10.0.0.100:6443`) 285 | 286 | ### Kubeadm Troubleshooting 287 | 288 | - If using `kubeadm init` without a pod network CIDR the CoreDNS pods will remain [stuck in pending state](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/#coredns-or-kube-dns-is-stuck-in-the-pending-state) 289 | - Broke cluster and want to start over? Use `kubeadm reset` and `rm -rf .kube` in the user home directory to remove the old config and avoid [TLS certificate errors](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/#tls-certificate-errors) 290 | - If seeing `error: error loading config file "/etc/kubernetes/admin.conf": open /etc/kubernetes/admin.conf: permission denied` it likely means the `KUBECONFIG` variable is set to that path, try `unset KUBECONFIG` to use the `$HOME/.kube/config` file. 
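The "Configure local kubectl access" step and the `KUBECONFIG` troubleshooting note above both boil down to copying `admin.conf` into the regular user's home directory. The standard commands from the kubeadm documentation, run on the control plane node after `kubeadm init`:

```bash
# Copy the admin kubeconfig to the regular user's home directory
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```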
291 | 292 | ### Kubeadm Optional Tasks 293 | 294 | - [Install kubectl client locally on Windows](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl-on-windows) for those using this OS. 295 | - Single node cluster? [Taint the control node](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) to accept pods without dedicated worker nodes. 296 | - Deploy the "hello-node" app from the [minikube tutorial](https://kubernetes.io/docs/tutorials/hello-minikube/) to test basic functionality. 297 | 298 | ## 1.3 Manage A Highly-Available Kubernetes Cluster 299 | 300 | [High Availability Production Environment](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/) 301 | 302 | Kubernetes Components for HA: 303 | 304 | - Load Balancer / VIP 305 | - DNS records 306 | - etcd Endpoint 307 | - Certificates 308 | - Any HA specific queries / configuration / settings 309 | 310 | ### HA Deployment Types 311 | 312 | - With stacked control plane nodes. This approach requires less infrastructure. The etcd members and control plane nodes are co-located. 313 | - With an external etcd cluster. This approach requires more infrastructure. The control plane nodes and etcd members are separated. ([source](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/)) 314 | 315 | ### Upgrading from Single Control-Plane to High Availability 316 | 317 | If you have plans to upgrade this single control-plane kubeadm cluster to high availability you should specify the --control-plane-endpoint to set the shared endpoint for all control-plane nodes. Such an endpoint can be either a DNS name or an IP address of a load-balancer. ([source](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#initializing-your-control-plane-node)) 318 | 319 | ## 1.4 Provision Underlying Infrastructure to Deploy a Kubernetes Cluster 320 | 321 | See Objective [1.2 Use Kubeadm to Install a Basic Cluster](#12-use-kubeadm-to-install-a-basic-cluster). 322 | 323 | > Note: Make sure that swap is disabled on all nodes. 324 | 325 | ## 1.5 Perform a Version Upgrade on a Kubernetes Cluster using Kubeadm 326 | 327 | - [Upgrading kubeadm clusters](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/) 328 | - [Safely Drain a Node while Respecting the PodDisruptionBudget](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/) 329 | - [Cluster Management: Maintenance on a Node](https://kubernetes.io/docs/tasks/administer-cluster/cluster-management/#maintenance-on-a-node) 330 | 331 | > Note: All containers are restarted after upgrade, because the container spec hash value is changed. Upgrades are constrained from one minor version to the next minor version. 332 | 333 | ### First Control Plane Node 334 | 335 | Update the kubeadm tool and verify the new version 336 | 337 | > Note: The `--allow-change-held-packages` flag is used because kubeadm updates should be held to prevent automated updates. 338 | 339 | ```bash 340 | apt-get update && \ 341 | apt-get install -y --allow-change-held-packages kubeadm=1.19.x-00 342 | kubeadm version 343 | ``` 344 | 345 | --- 346 | 347 | [Drain](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#drain) the node to mark as unschedulable 348 | 349 | `kubectl drain $NODENAME --ignore-daemonsets` 350 | 351 |
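A quick way to confirm the drain took effect is to list the nodes; the drained node should report `Ready,SchedulingDisabled` in the STATUS column:

```bash
kubectl get nodes
```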
Drain Diagram (from the Kubernetes documentation): 352 | 353 | ![drain](https://kubernetes.io/images/docs/kubectl_drain.svg) 354 | 355 |
356 | 357 | --- 358 | 359 | Perform an upgrade plan to validate that your cluster can be upgraded 360 | 361 | > Note: This also fetches the versions you can upgrade to and shows a table with the component config version states. 362 | 363 | `sudo kubeadm upgrade plan` 364 | 365 | --- 366 | 367 | Upgrade the cluster 368 | 369 | `sudo kubeadm upgrade apply v1.19.x` 370 | 371 | --- 372 | 373 | [Uncordon](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#uncordon) the node to mark as schedulable 374 | 375 | `kubectl uncordon $NODENAME` 376 | 377 | ### Additional Control Plane Nodes 378 | 379 | Repeat the first control plane node steps while replacing the "upgrade the cluster" step using the command below: 380 | 381 | `sudo kubeadm upgrade node` 382 | 383 | ### Upgrade Control Plane Node Kubectl And Kubelet Tools 384 | 385 | Upgrade the kubelet and kubectl on all control plane nodes 386 | 387 | ```bash 388 | apt-get update && \ 389 | apt-get install -y --allow-change-held-packages kubelet=1.19.x-00 kubectl=1.19.x-00 390 | ``` 391 | 392 | --- 393 | 394 | Restart the kubelet 395 | 396 | ```bash 397 | sudo systemctl daemon-reload 398 | sudo systemctl restart kubelet 399 | ``` 400 | 401 | ### Upgrade Worker Nodes 402 | 403 | Upgrade kubeadm 404 | 405 | ```bash 406 | apt-get update && \ 407 | apt-get install -y --allow-change-held-packages kubeadm=1.19.x-00 408 | ``` 409 | 410 | --- 411 | 412 | Drain the node 413 | 414 | `kubectl drain $NODENAME --ignore-daemonsets` 415 | 416 | --- 417 | 418 | Upgrade the kubelet configuration 419 | 420 | `sudo kubeadm upgrade node` 421 | 422 | --- 423 | 424 | Upgrade kubelet and kubectl 425 | 426 | ```bash 427 | apt-get update && \ 428 | apt-get install -y --allow-change-held-packages kubelet=1.19.x-00 kubectl=1.19.x-00 429 | 430 | sudo systemctl daemon-reload 431 | sudo systemctl restart kubelet 432 | ``` 433 | 434 | --- 435 | 436 | Uncordon the node 437 | 438 | `kubectl uncordon $NODENAME` 439 | 440 | ## 1.6 Implement Etcd Backup And Restore 441 | 442 | - [Operating etcd clusters for Kubernetes: Backing up an etcd cluster](https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#backing-up-an-etcd-cluster) 443 | - [Etcd Documentation: Disaster Recovery](https://etcd.io/docs/v3.4.0/op-guide/recovery/) 444 | - [Kubernetes Tips: Backup and Restore Etcd](https://medium.com/better-programming/kubernetes-tips-backup-and-restore-etcd-97fe12e56c57) 445 | 446 | ### Snapshot The Keyspace 447 | 448 | Use `etcdctl snapshot save`. 449 | 450 | Snapshot the keyspace served by \$ENDPOINT to the file snapshot.db: 451 | 452 | `ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshot.db` 453 | 454 | ### Restore From Snapshot 455 | 456 | Use `etcdctl snapshot restore`. 457 | 458 | > Note: Restoring overwrites some snapshot metadata (specifically, the member ID and cluster ID); the member loses its former identity. 459 | > 460 | > Note: Snapshot integrity is verified when restoring from a snapshot using an integrity hash created by `etcdctl snapshot save`, but not when restoring from a file copy. 
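On a kubeadm-built cluster, etcd typically listens with TLS enabled, so `etcdctl` needs certificate flags when talking to the live endpoint (the restore itself operates on the snapshot file). A minimal sketch, assuming the default kubeadm certificate paths under `/etc/kubernetes/pki/etcd`, a local endpoint, and a fresh data directory for the restore:

```bash
# Snapshot a TLS-secured etcd member using the kubeadm default certificate paths
ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  snapshot save snapshot.db

# Restore into a new data directory, then point the etcd static pod manifest
# (/etc/kubernetes/manifests/etcd.yaml) at the new --data-dir path
ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --data-dir /var/lib/etcd-from-backup
```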
461 | 462 | Create new etcd data directories (m1.etcd, m2.etcd, m3.etcd) for a three member cluster: 463 | 464 | ```bash 465 | $ ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \ 466 | --name m1 \ 467 | --initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \ 468 | --initial-cluster-token etcd-cluster-1 \ 469 | --initial-advertise-peer-urls http://host1:2380 470 | $ ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \ 471 | --name m2 \ 472 | --initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \ 473 | --initial-cluster-token etcd-cluster-1 \ 474 | --initial-advertise-peer-urls http://host2:2380 475 | $ ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \ 476 | --name m3 \ 477 | --initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \ 478 | --initial-cluster-token etcd-cluster-1 \ 479 | --initial-advertise-peer-urls http://host3:2380 480 | ``` 481 | -------------------------------------------------------------------------------- /objectives/objective2.md: -------------------------------------------------------------------------------- 1 | # Objective 2: Workloads & Scheduling 2 | 3 | - [Objective 2: Workloads & Scheduling](#objective-2-workloads--scheduling) 4 | - [2.1 Understand Deployments And How To Perform Rolling Update And Rollbacks](#21-understand-deployments-and-how-to-perform-rolling-update-and-rollbacks) 5 | - [Create Deployment](#create-deployment) 6 | - [Perform Rolling Update](#perform-rolling-update) 7 | - [Perform Rollbacks](#perform-rollbacks) 8 | - [2.2 Use Configmaps And Secrets To Configure Applications](#22-use-configmaps-and-secrets-to-configure-applications) 9 | - [Configmaps](#configmaps) 10 | - [Secrets](#secrets) 11 | - [Other Concepts](#other-concepts) 12 | - [2.3 Know How To Scale Applications](#23-know-how-to-scale-applications) 13 | - [2.4 Understand The Primitives Used To Create Robust, Self-Healing, Application Deployments](#24-understand-the-primitives-used-to-create-robust-self-healing-application-deployments) 14 | - [2.5 Understand How Resource Limits Can Affect Pod Scheduling](#25-understand-how-resource-limits-can-affect-pod-scheduling) 15 | - [2.6 Awareness Of Manifest Management And Common Templating Tools](#26-awareness-of-manifest-management-and-common-templating-tools) 16 | 17 | ## 2.1 Understand Deployments And How To Perform Rolling Update And Rollbacks 18 | 19 | [Official Documentation](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#use-case) 20 | 21 | Deployments are used to manage Pods and ReplicaSets in a declarative manner. 22 | 23 | ### Create Deployment 24 | 25 | Using the [nginx](https://hub.docker.com/_/nginx) image on Docker Hub, we can use a Deployment to push any number of replicas of that image to the cluster. 26 | 27 | Create the `nginx` deployment in the `wahlnetwork1` namespace. 28 | 29 | `kubectl create deployment nginx --image=nginx --replicas=3 -n wahlnetwork1` 30 | 31 | > Alternatively, use `kubectl create deployment nginx --image=nginx --replicas=3 -n wahlnetwork1 --dry-run=client -o yaml` to output a proper yaml configuration. 
32 | 33 | ```yaml 34 | apiVersion: apps/v1 35 | kind: Deployment 36 | metadata: 37 | creationTimestamp: null 38 | labels: 39 | app: nginx 40 | name: nginx 41 | namespace: wahlnetwork1 42 | spec: 43 | replicas: 3 44 | selector: 45 | matchLabels: 46 | app: nginx 47 | strategy: {} 48 | template: 49 | metadata: 50 | creationTimestamp: null 51 | labels: 52 | app: nginx 53 | spec: 54 | containers: 55 | - image: nginx 56 | name: nginx 57 | resources: {} 58 | ``` 59 | 60 | ### Perform Rolling Update 61 | 62 | [Official Documentation](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment) 63 | 64 | Used to make changes to the pod's template and roll them out to the cluster. Triggered when data within `.spec.template` is changed. 65 | 66 | Update the `nginx` deployment in the `wahlnetwork1` namespace to use version `1.16.1` 67 | 68 | `kubectl set image deployment/nginx nginx=nginx:1.16.1 -n wahlnetwork1 --record` 69 | 70 | Track the rollout status. 71 | 72 | `kubectl rollout status deployment.v1.apps/nginx -n wahlnetwork1` 73 | 74 | ```bash 75 | Waiting for deployment "nginx" rollout to finish: 1 out of 2 new replicas have been updated... 76 | Waiting for deployment "nginx" rollout to finish: 1 out of 2 new replicas have been updated... 77 | Waiting for deployment "nginx" rollout to finish: 1 out of 2 new replicas have been updated... 78 | Waiting for deployment "nginx" rollout to finish: 1 old replicas are pending termination... 79 | Waiting for deployment "nginx" rollout to finish: 1 old replicas are pending termination... 80 | deployment "nginx" successfully rolled out 81 | ``` 82 | 83 | ### Perform Rollbacks 84 | 85 | [Official Documentation](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-a-deployment) 86 | 87 | Rollbacks offer a method for reverting the changes to a pod's `.spec.template` data to a previous version. By default, executing the `rollout undo` command will revert to the previous version. The desired version can also be declared. 88 | 89 | Review the version history for the `nginx` deployment in the `wahlnetwork1` namespace. In this scenario, other revisions 1-4 have been made to simulate a deployment lifecycle. The 4th revision specifies a fake image version of `1.222222222222` to force a rolling update failure. 90 | 91 | `kubectl rollout history deployment.v1.apps/nginx -n wahlnetwork1` 92 | 93 | ```bash 94 | deployment.apps/nginx 95 | REVISION CHANGE-CAUSE 96 | 1 97 | 2 kubectl.exe set image deployment/nginx nginx=nginx:1.16.1 --record=true --namespace=wahlnetwork1 98 | 3 kubectl.exe set image deployment/nginx nginx=nginx:1.14.1 --record=true --namespace=wahlnetwork1 99 | 4 kubectl.exe set image deployment/nginx nginx=nginx:1.222222222222 --record=true --namespace=wahlnetwork1 100 | ``` 101 | 102 | Revert to the previous version of the `nginx` deployment to use image version `1.14.1`. This forces revision 3 to become revision 5. Note that revision 3 no longer exists. 
103 | 104 | `kubectl rollout undo deployment.v1.apps/nginx -n wahlnetwork1` 105 | 106 | ```bash 107 | deployment.apps/nginx rolled back 108 | 109 | ~ kubectl rollout history deployment.v1.apps/nginx -n wahlnetwork1 110 | 111 | deployment.apps/nginx 112 | REVISION CHANGE-CAUSE 113 | 1 114 | 2 kubectl.exe set image deployment/nginx nginx=nginx:1.16.1 --record=true --namespace=wahlnetwork1 115 | 4 kubectl.exe set image deployment/nginx nginx=nginx:1.222222222222 --record=true --namespace=wahlnetwork1 116 | 5 kubectl.exe set image deployment/nginx nginx=nginx:1.14.1 --record=true --namespace=wahlnetwork1 117 | ``` 118 | 119 | Revert to revision 2 of the `nginx` deployment, which becomes revision 6 (the next available revision number). Note that revision 2 no longer exists. 120 | 121 | `kubectl rollout undo deployment.v1.apps/nginx -n wahlnetwork1 --to-revision=2` 122 | 123 | ```bash 124 | ~ kubectl rollout history deployment.v1.apps/nginx -n wahlnetwork1 125 | 126 | deployment.apps/nginx 127 | REVISION CHANGE-CAUSE 128 | 1 129 | 4 kubectl.exe set image deployment/nginx nginx=nginx:1.222222222222 --record=true --namespace=wahlnetwork1 130 | 5 kubectl.exe set image deployment/nginx nginx=nginx:1.14.1 --record=true --namespace=wahlnetwork1 131 | 6 kubectl.exe set image deployment/nginx nginx=nginx:1.16.1 --record=true --namespace=wahlnetwork1 132 | ``` 133 | 134 | ## 2.2 Use Configmaps And Secrets To Configure Applications 135 | 136 | ### Configmaps 137 | 138 | API object used to store non-confidential data in key-value pairs 139 | 140 | - [Official Documentation](https://kubernetes.io/docs/concepts/configuration/configmap/) 141 | [Configure a Pod to Use a ConfigMap](https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/) 142 | 143 | Create a configmap named `game-config` using a directory. 144 | 145 | `kubectl create configmap game-config --from-file=/code/configmap/` 146 | 147 | ```bash 148 | ~ k describe configmap game-config 149 | 150 | Name: game-config 151 | Namespace: default 152 | Labels: 153 | Annotations: 154 | 155 | Data 156 | ==== 157 | game.properties: 158 | ---- 159 | enemies=aliens 160 | lives=3 161 | enemies.cheat=true 162 | enemies.cheat.level=noGoodRotten 163 | secret.code.passphrase=UUDDLRLRBABAS 164 | secret.code.allowed=true 165 | secret.code.lives=30 166 | 167 | ui.properties: 168 | ---- 169 | color.good=purple 170 | color.bad=yellow 171 | allow.textmode=true 172 | how.nice.to.look=fairlyNice 173 | 174 | Events: 175 | ``` 176 | 177 | Create a configmap named `game-config` using a file. 178 | 179 | `kubectl create configmap game-config-2 --from-file=/code/configmap/game.properties` 180 | 181 | Create a configmap named `game-config` using an env-file. 182 | 183 | `kubectl create configmap game-config-env-file --from-env-file=/code/configmap/game-env-file.properties` 184 | 185 | Create a configmap named `special-config` using a literal key/value pair. 186 | 187 | `kubectl create configmap special-config --from-literal=special.how=very` 188 | 189 | Edit a configmap named `game-config`. 190 | 191 | `kubectl edit configmap game-config` 192 | 193 | Get a configmap named `game-config` and output the response into yaml. 194 | 195 | `kubectl get configmaps game-config -o yaml` 196 | 197 | Use a configmap with a pod by declaring a value for `.spec.containers.env.name.valueFrom.configMapKeyRef`. 
198 | 199 | ```yaml 200 | apiVersion: v1 201 | kind: Pod 202 | metadata: 203 | name: dapi-test-pod 204 | spec: 205 | containers: 206 | - name: test-container 207 | image: k8s.gcr.io/busybox 208 | command: ["/bin/sh", "-c", "env"] 209 | env: 210 | # Define the environment variable 211 | - name: SPECIAL_LEVEL_KEY 212 | valueFrom: 213 | configMapKeyRef: 214 | # The ConfigMap containing the value you want to assign to SPECIAL_LEVEL_KEY 215 | name: special-config 216 | # Specify the key associated with the value 217 | key: special.how 218 | restartPolicy: Never 219 | ``` 220 | 221 | Investigate the configmap value `very` from the key `SPECIAL_LEVEL_KEY` by reviewing the logs for the pod or by connecting to the pod directly. 222 | 223 | `kubectl exec -n wahlnetwork1 --stdin nginx-6889dfccd5-msmn8 --tty -- /bin/bash` 224 | 225 | ```bash 226 | ~ kubectl logs dapi-test-pod 227 | 228 | KUBERNETES_SERVICE_PORT=443 229 | KUBERNETES_PORT=tcp://10.96.0.1:443 230 | HOSTNAME=dapi-test-pod 231 | SHLVL=1 232 | HOME=/root 233 | KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1 234 | PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 235 | KUBERNETES_PORT_443_TCP_PORT=443 236 | KUBERNETES_PORT_443_TCP_PROTO=tcp 237 | SPECIAL_LEVEL_KEY=very 238 | KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443 239 | KUBERNETES_SERVICE_PORT_HTTPS=443 240 | PWD=/ 241 | KUBERNETES_SERVICE_HOST=10.96.0.1 242 | ``` 243 | 244 | ### Secrets 245 | 246 | - [Managing Secret using kubectl](https://kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-kubectl/) 247 | - [Using Secrets](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets) 248 | 249 | Create a secret named `db-user-pass` using files. 250 | 251 | ```bash 252 | kubectl create secret generic db-user-pass ` 253 | --from-file=./username.txt ` 254 | --from-file=./password.txt 255 | ``` 256 | 257 | The key name can be modified by inserting a key name into the file path. For example, setting the key names to `funusername` and `funpassword` can be done as shown below: 258 | 259 | ```bash 260 | kubectl create secret generic fundb-user-pass ` 261 | --from-file=funusername=./username.txt ` 262 | --from-file=funpassword=./password.txt 263 | ``` 264 | 265 | Check to make sure the key names matches the defined names. 266 | 267 | `kubectl describe secret fundb-user-pass` 268 | 269 | ```bash 270 | Name: fundb-user-pass 271 | Namespace: default 272 | Labels: 273 | Annotations: 274 | 275 | Type: Opaque 276 | 277 | Data 278 | ==== 279 | funpassword: 14 bytes 280 | funusername: 7 bytes 281 | ``` 282 | 283 | Get secret values from `db-user-pass`. 284 | 285 | `kubectl get secret db-user-pass -o jsonpath='{.data}'` 286 | 287 | Edit secret values using the `edit` command. 288 | 289 | `kubectl edit secrets db-user-pass` 290 | 291 | ```yaml 292 | apiVersion: v1 293 | data: 294 | password.txt: PASSWORD 295 | username.txt: USERNAME 296 | kind: Secret 297 | metadata: 298 | creationTimestamp: "2020-10-13T22:48:27Z" 299 | name: db-user-pass 300 | namespace: default 301 | resourceVersion: "1022459" 302 | selfLink: /api/v1/namespaces/default/secrets/db-user-pass 303 | uid: 6bb24810-dd33-4b92-9a37-424f3c7553b6 304 | type: Opaque 305 | ``` 306 | 307 | Use a secret with a pod by declaring a value for `.spec.containers.env.name.valueFrom.secretKeyRef`. 
308 | 309 | ```yaml 310 | apiVersion: v1 311 | kind: Pod 312 | metadata: 313 | name: secret-env-pod 314 | spec: 315 | containers: 316 | - name: mycontainer 317 | image: redis 318 | env: 319 | - name: SECRET_USERNAME 320 | valueFrom: 321 | secretKeyRef: 322 | name: mysecret 323 | key: username 324 | - name: SECRET_PASSWORD 325 | valueFrom: 326 | secretKeyRef: 327 | name: mysecret 328 | key: password 329 | restartPolicy: Never 330 | ``` 331 | 332 | ### Other Concepts 333 | 334 | - [Using imagePullSecrets](https://kubernetes.io/docs/concepts/configuration/secret/#using-imagepullsecrets) 335 | 336 | ## 2.3 Know How To Scale Applications 337 | 338 | Scaling is accomplished by changing the number of replicas in a Deployment. 339 | 340 | - [Running Multiple Instances of Your App](https://kubernetes.io/docs/tutorials/kubernetes-basics/scale/scale-intro/) 341 | 342 | Scale a deployment named `nginx` from 3 to 4 replicas. 343 | 344 | `kubectl scale deployments/nginx --replicas=4` 345 | 346 | ## 2.4 Understand The Primitives Used To Create Robust, Self-Healing, Application Deployments 347 | 348 | - Don't use naked Pods (that is, Pods not bound to a ReplicaSet or Deployment) if you can avoid it. Naked Pods will not be rescheduled in the event of a node failure. ([source](https://kubernetes.io/docs/concepts/configuration/overview/#naked-pods-vs-replicasets-deployments-and-jobs)) 349 | - A Deployment, which both creates a ReplicaSet to ensure that the desired number of Pods is always available, and specifies a strategy to replace Pods (such as RollingUpdate), is almost always preferable to creating Pods directly, except for some explicit `restartPolicy: Never` scenarios. A Job may also be appropriate. ([source](https://kubernetes.io/docs/concepts/configuration/overview/#naked-pods-vs-replicasets-deployments-and-jobs)) 350 | - Define and use labels that identify semantic attributes of your application or Deployment, such as `{ app: myapp, tier: frontend, phase: test, deployment: v3 }`. ([source](https://kubernetes.io/docs/concepts/configuration/overview/#using-labels)) 351 | 352 | ## 2.5 Understand How Resource Limits Can Affect Pod Scheduling 353 | 354 | Resource requests and limits are the mechanism for controlling the amount of resources a container needs and is allowed to consume. This commonly translates into CPU and memory values. 355 | 356 | - Limits set an upper boundary on the amount of resources a container is allowed to consume from the host. 357 | - Requests set the amount of resources reserved for a container; the scheduler only places a pod on a node with enough unreserved capacity to satisfy the pod's total requests. 358 | - If a limit is set without a request, the request value is set to equal the limit value. 359 | - [Managing Resources for Containers](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) 360 | - [Resource Quotas](https://kubernetes.io/docs/concepts/policy/resource-quotas/) 361 | 362 | Here is an example of a pod configured with resource requests and limits.
363 | 364 | ```yaml 365 | apiVersion: v1 366 | kind: Pod 367 | metadata: 368 | name: frontend 369 | spec: 370 | containers: 371 | - name: app 372 | image: images.my-company.example/app:v4 373 | resources: 374 | requests: 375 | memory: "64Mi" 376 | cpu: "250m" 377 | limits: 378 | memory: "128Mi" 379 | cpu: "500m" 380 | - name: log-aggregator 381 | image: images.my-company.example/log-aggregator:v6 382 | resources: 383 | requests: 384 | memory: "64Mi" 385 | cpu: "250m" 386 | limits: 387 | memory: "128Mi" 388 | cpu: "500m" 389 | ``` 390 | 391 | ## 2.6 Awareness Of Manifest Management And Common Templating Tools 392 | 393 | - [Templating YAML in Kubernetes with real code](https://learnk8s.io/templating-yaml-with-code) 394 | - [yq](https://github.com/kislyuk/yq): Command-line YAML/XML processor 395 | - [kustomize](https://github.com/kubernetes-sigs/kustomize): lets you customize raw, template-free YAML files for multiple purposes, leaving the original YAML untouched and usable as is. 396 | - [Helm](https://github.com/helm/helm): A tool for managing Charts. Charts are packages of pre-configured Kubernetes resources. 397 | -------------------------------------------------------------------------------- /objectives/objective3.md: -------------------------------------------------------------------------------- 1 | # Objective 3: Services & Networking 2 | 3 | - [Objective 3: Services & Networking](#objective-3-services--networking) 4 | - [3.1 Understand Host Networking Configuration On The Cluster Nodes](#31-understand-host-networking-configuration-on-the-cluster-nodes) 5 | - [3.2 Understand Connectivity Between Pods](#32-understand-connectivity-between-pods) 6 | - [3.3 Understand ClusterIP, NodePort, LoadBalancer Service Types And Endpoints](#33-understand-clusterip-nodeport-loadbalancer-service-types-and-endpoints) 7 | - [ClusterIP](#clusterip) 8 | - [NodePort](#nodeport) 9 | - [LoadBalancer](#loadbalancer) 10 | - [ExternalIP](#externalip) 11 | - [ExternalName](#externalname) 12 | - [Networking Cleanup for Objective 3.3](#networking-cleanup-for-objective-33) 13 | - [3.4 Know How To Use Ingress Controllers And Ingress Resources](#34-know-how-to-use-ingress-controllers-and-ingress-resources) 14 | - [3.5 Know How To Configure And Use CoreDNS](#35-know-how-to-configure-and-use-coredns) 15 | - [3.6 Choose An Appropriate Container Network Interface Plugin](#36-choose-an-appropriate-container-network-interface-plugin) 16 | 17 | > Note: If you need access to the pod network while working through the networking examples, use the [Get a Shell to a Running Container](https://kubernetes.io/docs/tasks/debug-application-cluster/get-shell-running-container/) guide to deploy a shell container. I often like to have a tab open to the shell container to run arbitrary network commands without the need to `exec` in and out of it repeatedly. 
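As an alternative to exec-ing into an existing container, a throwaway interactive pod also works; a minimal sketch (the pod name and image are arbitrary choices):

```bash
# Start an interactive pod on the pod network; --rm deletes it on exit
kubectl run tmp-shell --rm -it --image=busybox -- sh

# From inside the pod, run arbitrary network commands, for example:
# wget -qO- http://<service-cluster-ip>:<port>
```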
18 | 19 | ## 3.1 Understand Host Networking Configuration On The Cluster Nodes 20 | 21 | - Design 22 | 23 | - All nodes can talk 24 | - All pods can talk (without NAT) 25 | - Every pod gets a unique IP address 26 | 27 | - Network Types 28 | 29 | - Pod Network 30 | - Node Network 31 | - Services Network 32 | - Rewrites egress traffic destined to a service network endpoint with a pod network IP address 33 | 34 | - Proxy Modes 35 | - IPTables Mode 36 | - The standard mode 37 | - `kube-proxy` watches the Kubernetes control plane for the addition and removal of Service and Endpoint objects 38 | - For each Service, it installs iptables rules, which capture traffic to the Service's clusterIP and port, and redirect that traffic to one of the Service's backend sets. 39 | - For each Endpoint object, it installs iptables rules which select a backend Pod. 40 | - [Official Documentation](https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-iptables) 41 | - [Kubernetes Networking Demystified: A Brief Guide](https://www.stackrox.com/post/2020/01/kubernetes-networking-demystified/) 42 | - IPVS Mode 43 | - Since 1.11 44 | - Linux IP Virtual Server (IPVS) 45 | - L4 load balancer 46 | 47 | ## 3.2 Understand Connectivity Between Pods 48 | 49 | [Official Documentation](https://kubernetes.io/docs/concepts/cluster-administration/networking/) 50 | 51 | Read [The Kubernetes network model](https://kubernetes.io/docs/concepts/cluster-administration/networking/#the-kubernetes-network-model): 52 | 53 | - Every pod gets its own address 54 | - Fundamental requirements on any networking implementation 55 | - Pods on a node can communicate with all pods on all nodes without NAT 56 | - Agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node 57 | - Pods in the host network of a node can communicate with all pods on all nodes without NAT 58 | - Kubernetes IP addresses exist at the Pod scope 59 | - Containers within a pod can communicate with one another over `localhost` 60 | - "IP-per-pod" model 61 | 62 | ## 3.3 Understand ClusterIP, NodePort, LoadBalancer Service Types And Endpoints 63 | 64 | Services are all about abstracting away the details of which pods are running behind a particular network endpoint. For many applications, work must be processed by some other service. Using a service allows the application to "toss over" the work to Kubernetes, which then uses a selector to determine which pods are healthy and available to receive the work. The service abstracts numerous replica pods that are available to do work. 65 | 66 | - [Official Documentation](https://kubernetes.io/docs/concepts/services-networking/service/) 67 | - [Katakoda Networking Introduction](https://www.katacoda.com/courses/kubernetes/networking-introduction) 68 | 69 | > Note: This section was completed using a GKE cluster and may differ from what your cluster looks like. 70 | 71 | ### ClusterIP 72 | 73 | - Exposes the Service on a cluster-internal IP. 74 | - Choosing this value makes the Service only reachable from within the cluster. 75 | - This is the default ServiceType. 76 | - [Using Source IP](https://kubernetes.io/docs/tutorials/services/source-ip/) 77 | - [Kubectl Expose Command Reference](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#expose) 78 | 79 | The imperative option is to create a deployment and then expose the deployment. 
In this example, the deployment is exposed using a ClusterIP service that accepts traffic on port 80 and translates it to the pod using port 8080. 80 | 81 | `kubectl create deployment funkyapp1 --image=k8s.gcr.io/echoserver:1.4` 82 | 83 | `kubectl expose deployment funkyapp1 --name=funkyip --port=80 --target-port=8080 --type=ClusterIP` 84 | 85 | > Note: The `--type=ClusterIP` parameter is optional when deploying a `ClusterIP` service since this is the default type. 86 | 87 | ```yaml 88 | apiVersion: v1 89 | kind: Service 90 | metadata: 91 | creationTimestamp: null 92 | labels: 93 | app: funkyapp1 #Selector 94 | name: funkyip 95 | spec: 96 | ports: 97 | - port: 80 98 | protocol: TCP 99 | targetPort: 8080 100 | selector: 101 | app: funkyapp1 102 | type: ClusterIP #Note this! 103 | ``` 104 | 105 | Using `kubectl describe svc funkyip` shows more details: 106 | 107 | ```bash 108 | Name: funkyip 109 | Namespace: default 110 | Labels: app=funkyapp1 111 | Annotations: cloud.google.com/neg: {"ingress":true} 112 | Selector: app=funkyapp1 113 | Type: ClusterIP 114 | IP: 10.108.3.156 115 | Port: 80/TCP 116 | TargetPort: 8080/TCP 117 | Endpoints: 10.104.2.7:8080 118 | Session Affinity: None 119 | Events: 120 | ``` 121 | 122 | --- 123 | 124 | Check to make sure the `funkyip` service exists. This also shows the assigned service (cluster IP) address. 125 | 126 | `kubectl get svc funkyip` 127 | 128 | ```bash 129 | NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE 130 | funkyip ClusterIP 10.108.3.156 80/TCP 21m 131 | ``` 132 | 133 | --- 134 | 135 | From there, you can see the endpoint created to match any pod discovered using the `app: funkyapp1` label. 136 | 137 | `kubectl get endpoints funkyip` 138 | 139 | ```bash 140 | NAME ENDPOINTS AGE 141 | funkyip 10.104.2.7:8080 21m 142 | ``` 143 | 144 | --- 145 | 146 | The endpoint matches the IP address of the matching pod. 147 | 148 | `kubectl get pods -o wide` 149 | 150 | ```bash 151 | NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES 152 | funkyapp1-7b478ccf9b-2vlc2 1/1 Running 0 21m 10.104.2.7 gke-my-first-cluster-1-default-pool-504c1e77-zg6v 153 | shell-demo 1/1 Running 0 3m12s 10.128.0.14 gke-my-first-cluster-1-default-pool-504c1e77-m9lk 154 | ``` 155 | 156 | --- 157 | 158 | The `.spec.ports.port` value defines the port used to access the service. The `.spec.ports.targetPort` value defines the port used to access the container's application. 159 | 160 | `User -> Port -> Kubernetes Service -> Target Port -> Application` 161 | 162 | This can be tested using `curl`: 163 | 164 | ```bash 165 | export CLUSTER_IP=$(kubectl get services/funkyip -o go-template='{{(index .spec.clusterIP)}}') 166 | echo CLUSTER_IP=$CLUSTER_IP 167 | ``` 168 | 169 | From there, use `curl $CLUSTER_IP:80` to hit the service `port`, which redirects to the `targetPort` of 8080. 170 | 171 | `curl 10.108.3.156:80` 172 | 173 | ```bash 174 | CLIENT VALUES: 175 | client_address=10.128.0.14 176 | command=GET 177 | real path=/ 178 | query=nil 179 | request_version=1.1 180 | request_uri=http://10.108.3.156:8080/ 181 | 182 | SERVER VALUES: 183 | server_version=nginx: 1.10.0 - lua: 10001 184 | 185 | HEADERS RECEIVED: 186 | accept=*/* 187 | host=10.108.3.156 188 | user-agent=curl/7.64.0 189 | BODY: 190 | -no body in request-root 191 | ``` 192 | 193 | ### NodePort 194 | 195 | - Exposes the Service on each Node's IP at a static port (the NodePort). 
196 | - [Official Documentation](https://kubernetes.io/docs/concepts/services-networking/service/#nodeport) 197 | 198 | `kubectl expose deployment funkyapp1 --name=funkynode --port=80 --target-port=8080 --type=NodePort` 199 | 200 | ```yaml 201 | apiVersion: v1 202 | kind: Service 203 | metadata: 204 | creationTimestamp: null 205 | labels: 206 | app: funkyapp1 #Selector 207 | name: funkynode 208 | spec: 209 | ports: 210 | - port: 80 211 | protocol: TCP 212 | targetPort: 8080 213 | selector: 214 | app: funkyapp1 215 | type: NodePort #Note this! 216 | ``` 217 | 218 | --- 219 | 220 | This service is available on each node at a specific port. 221 | 222 | `kubectl describe svc funkynode` 223 | 224 | ```bash 225 | Name: funkynode 226 | Namespace: default 227 | Labels: app=funkyapp1 228 | Annotations: cloud.google.com/neg: {"ingress":true} 229 | Selector: app=funkyapp1 230 | Type: NodePort 231 | IP: 10.108.5.37 232 | Port: 80/TCP 233 | TargetPort: 8080/TCP 234 | NodePort: 30182/TCP 235 | Endpoints: 10.104.2.7:8080 236 | Session Affinity: None 237 | External Traffic Policy: Cluster 238 | Events: 239 | ``` 240 | 241 | --- 242 | 243 | By using the node IP address with the `nodePort` value, we can see the desired payload. Make sure to scale the deployment so that each node is running one replica of the pod. For a cluster with 2 worker nodes, this can be done with `kubectl scale deploy funkyapp1 --replicas=3`. 244 | 245 | From there, it is possible to `curl` directly to a node IP address using the `nodePort` when using the shell pod demo. If working from outside the pod network, use the service IP address. 246 | 247 | `curl 10.128.0.14:30182` 248 | 249 | ```bash 250 | CLIENT VALUES: 251 | client_address=10.128.0.14 252 | command=GET 253 | real path=/ 254 | query=nil 255 | request_version=1.1 256 | request_uri=http://10.128.0.14:8080/ 257 | 258 | SERVER VALUES: 259 | server_version=nginx: 1.10.0 - lua: 10001 260 | 261 | HEADERS RECEIVED: 262 | accept=*/* 263 | host=10.128.0.14:30182 264 | user-agent=curl/7.64.0 265 | BODY: 266 | -no body in request-root 267 | ``` 268 | 269 | ### LoadBalancer 270 | 271 | - Exposes the Service externally using a cloud provider's load balancer. 272 | - NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created. 273 | - [Source IP for Services with Type LoadBalancer](https://kubernetes.io/docs/tutorials/services/source-ip/#source-ip-for-services-with-type-loadbalancer) 274 | 275 | `kubectl expose deployment funkyapp1 --name=funkylb --port=80 --target-port=8080 --type=LoadBalancer` 276 | 277 | ```yaml 278 | apiVersion: v1 279 | kind: Service 280 | metadata: 281 | creationTimestamp: null 282 | labels: 283 | app: funkyapp1 284 | name: funkylb 285 | spec: 286 | ports: 287 | - port: 80 288 | protocol: TCP 289 | targetPort: 8080 290 | selector: 291 | app: funkyapp1 292 | type: LoadBalancer #Note this! 293 | ``` 294 | 295 | --- 296 | 297 | Get information on the `funkylb` service to determine the External IP address. 298 | 299 | `kubectl get svc funkylb` 300 | 301 | ```bash 302 | NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE 303 | funkylb LoadBalancer 10.108.11.148 35.232.149.96 80:31679/TCP 64s 304 | ``` 305 | 306 | It is then possible to retrieve the payload using the External IP address and port value from anywhere on the Internet; no need to use the pod shell demo! 
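If the `EXTERNAL-IP` column shows `<pending>`, the cloud provider has not finished provisioning the load balancer yet. A simple sketch is to watch the service until the address appears, then run the `curl` test shown next from anywhere outside the cluster:

```bash
# Re-list the service until the cloud load balancer address is assigned (Ctrl+C to stop watching).
kubectl get svc funkylb --watch
```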
307 | 308 | `curl 35.232.149.96:80` 309 | 310 | ```bash 311 | CLIENT VALUES: 312 | client_address=10.104.2.1 313 | command=GET 314 | real path=/ 315 | query=nil 316 | request_version=1.1 317 | request_uri=http://35.232.149.96:8080/ 318 | 319 | SERVER VALUES: 320 | server_version=nginx: 1.10.0 - lua: 10001 321 | 322 | HEADERS RECEIVED: 323 | accept=*/* 324 | host=35.232.149.96 325 | user-agent=curl/7.55.1 326 | BODY: 327 | -no body in request- 328 | ``` 329 | 330 | ### ExternalIP 331 | 332 | [Official Documentation](https://kubernetes.io/docs/concepts/services-networking/service/#external-ips) 333 | 334 | - Exposes a Kubernetes service on an external IP address. 335 | - Kubernetes has no control over this external IP address. 336 | 337 | Here is an example spec: 338 | 339 | ```yaml 340 | apiVersion: v1 341 | kind: Service 342 | metadata: 343 | name: my-service 344 | spec: 345 | selector: 346 | app: MyApp 347 | ports: 348 | - name: http 349 | protocol: TCP 350 | port: 80 351 | targetPort: 9376 352 | externalIPs: 353 | - 80.11.12.10 #Take note! 354 | ``` 355 | 356 | ### ExternalName 357 | 358 | - Maps the Service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. 359 | - No proxy of any kind is set up. 360 | 361 | ### Networking Cleanup for Objective 3.3 362 | 363 | Run these commands to cleanup the resources, if desired. 364 | 365 | ```bash 366 | kubectl delete svc funkyip 367 | kubectl delete svc funkynode 368 | kubectl delete svc funkylb 369 | kubectl delete deploy funkyapp1 370 | ``` 371 | 372 | ## 3.4 Know How To Use Ingress Controllers And Ingress Resources 373 | 374 | Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. 375 | 376 | - Traffic routing is controlled by rules defined on the **Ingress resource**. 377 | - An **Ingress controller** is responsible for fulfilling the Ingress, usually with a load balancer, though it may also configure your edge router or additional frontends to help handle the traffic. 378 | - For example, the [NGINX Ingress Controller for Kubernetes](https://www.nginx.com/products/nginx/kubernetes-ingress-controller) 379 | - The name of an Ingress object must be a valid DNS subdomain name. 
380 | - [Ingress Documentation](https://kubernetes.io/docs/concepts/services-networking/ingress/) 381 | - A list of [Ingress Controllers](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/) 382 | - [Katacoda - Create Ingress Routing](https://www.katacoda.com/courses/kubernetes/create-kubernetes-ingress-routes) lab 383 | - [Katacoda - Nginx on Kubernetes](https://www.katacoda.com/javajon/courses/kubernetes-applications/nginx) lab 384 | 385 | Example of an ingress resource: 386 | 387 | ```yaml 388 | apiVersion: networking.k8s.io/v1 389 | kind: Ingress 390 | metadata: 391 | name: minimal-ingress 392 | annotations: 393 | nginx.ingress.kubernetes.io/rewrite-target: / 394 | spec: 395 | rules: 396 | - http: 397 | paths: 398 | - path: /testpath 399 | pathType: Prefix 400 | backend: 401 | service: 402 | name: test 403 | port: 404 | number: 80 405 | ``` 406 | 407 | Information on some of the objects within this resource: 408 | 409 | - [Ingress Rules](https://kubernetes.io/docs/concepts/services-networking/ingress/#ingress-rules) 410 | - [Path Types](https://kubernetes.io/docs/concepts/services-networking/ingress/#path-types) 411 | 412 | And, in the case of Nginx, [a custom resource definition (CRD) is often used](https://octopus.com/blog/nginx-ingress-crds) to extend the usefulness of an ingress. An example is shown below: 413 | 414 | ```yaml 415 | apiVersion: k8s.nginx.org/v1 416 | kind: VirtualServer 417 | metadata: 418 | name: cafe 419 | spec: 420 | host: cafe.example.com 421 | tls: 422 | secret: cafe-secret 423 | upstreams: 424 | - name: tea 425 | service: tea-svc 426 | port: 80 427 | - name: coffee 428 | service: coffee-svc 429 | port: 80 430 | routes: 431 | - path: /tea 432 | action: 433 | pass: tea 434 | - path: /coffee 435 | action: 436 | pass: coffee 437 | ``` 438 | 439 | ## 3.5 Know How To Configure And Use CoreDNS 440 | 441 | CoreDNS is a general-purpose authoritative DNS server that can serve as cluster DNS. 442 | 443 | - A bit of history: 444 | - As of Kubernetes v1.12, CoreDNS is the recommended DNS Server, replacing `kube-dns`. 445 | - In Kubernetes version 1.13 and later the CoreDNS feature gate is removed and CoreDNS is used by default. 446 | - In Kubernetes 1.18, `kube-dns` usage with kubeadm has been deprecated and will be removed in a future version. 447 | - [Using CoreDNS for Service Discovery](https://kubernetes.io/docs/tasks/administer-cluster/coredns/) 448 | - [Customizing DNS Service](https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/) 449 | 450 | CoreDNS is installed with the following default [Corefile](https://coredns.io/2017/07/23/corefile-explained/) configuration: 451 | 452 | ```yaml 453 | apiVersion: v1 454 | kind: ConfigMap 455 | metadata: 456 | name: coredns 457 | namespace: kube-system 458 | data: 459 | Corefile: | 460 | .:53 { 461 | errors 462 | health { 463 | lameduck 5s 464 | } 465 | ready 466 | kubernetes cluster.local in-addr.arpa ip6.arpa { 467 | pods insecure 468 | fallthrough in-addr.arpa ip6.arpa 469 | ttl 30 470 | } 471 | prometheus :9153 472 | forward . /etc/resolv.conf 473 | cache 30 474 | loop 475 | reload 476 | loadbalance 477 | } 478 | ``` 479 | 480 | If you need to customize CoreDNS behavior, you create and apply your own ConfigMap to override settings in the Corefile. The [Configuring DNS Servers for Kubernetes Clusters](https://docs.cloud.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengconfiguringdnsserver.htm) document describes this in detail. 
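Before overriding anything, it can help to look at the Corefile the cluster is actually using. A small sketch, assuming a cluster that runs CoreDNS (for example one built with kubeadm; the GKE cluster used below still ships `kube-dns` instead):

```bash
# Print the live Corefile from the default CoreDNS ConfigMap.
kubectl get configmap coredns --namespace=kube-system -o yaml
```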
481 | 482 | --- 483 | 484 | Review your configmaps for the `kube-system` namespace to determine if there is a `coredns-custom` configmap. 485 | 486 | `kubectl get configmaps --namespace=kube-system` 487 | 488 | ```bash 489 | NAME DATA AGE 490 | cluster-kubestore 0 23h 491 | clustermetrics 0 23h 492 | extension-apiserver-authentication 6 24h 493 | gke-common-webhook-lock 0 23h 494 | ingress-gce-lock 0 23h 495 | ingress-uid 2 23h 496 | kube-dns 0 23h 497 | kube-dns-autoscaler 1 23h 498 | metrics-server-config 1 23h 499 | ``` 500 | 501 | --- 502 | 503 | Create a file named `coredns.yml` containing a configmap with the desired DNS entries in the `data` field such as the example below: 504 | 505 | ```yaml 506 | apiVersion: v1 507 | kind: ConfigMap 508 | metadata: 509 | name: coredns-custom 510 | namespace: kube-system 511 | data: 512 | example.server: 513 | | # All custom server files must have a “.server” file extension. 514 | # Change example.com to the domain you wish to forward. 515 | example.com { 516 | # Change 1.1.1.1 to your customer DNS resolver. 517 | forward . 1.1.1.1 518 | } 519 | ``` 520 | 521 | --- 522 | 523 | Apply the configmap. 524 | 525 | `kubectl apply -f coredns.yml` 526 | 527 | --- 528 | 529 | Validate the existence of the `coredns-custom` configmap. 530 | 531 | `kubectl get configmaps --namespace=kube-system` 532 | 533 | ```bash 534 | NAME DATA AGE 535 | cluster-kubestore 0 24h 536 | clustermetrics 0 24h 537 | coredns-custom 1 6s 538 | extension-apiserver-authentication 6 24h 539 | gke-common-webhook-lock 0 24h 540 | ingress-gce-lock 0 24h 541 | ingress-uid 2 24h 542 | kube-dns 0 24h 543 | kube-dns-autoscaler 1 24h 544 | metrics-server-config 1 24h 545 | ``` 546 | 547 | --- 548 | 549 | Get the configmap and output the value in yaml format. 550 | 551 | `kubectl get configmaps --namespace=kube-system coredns-custom -o yaml` 552 | 553 | ```yaml 554 | apiVersion: v1 555 | data: 556 | example.server: | 557 | # Change example.com to the domain you wish to forward. 558 | example.com { 559 | # Change 1.1.1.1 to your customer DNS resolver. 560 | forward . 1.1.1.1 561 | } 562 | kind: ConfigMap 563 | metadata: 564 | annotations: 565 | kubectl.kubernetes.io/last-applied-configuration: | 566 | {"apiVersion":"v1","data":{"example.server":"# Change example.com to the domain you wish to forward.\nexample.com {\n # Change 1.1.1.1 to your customer DNS resolver.\n forward . 1.1.1.1\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"coredns-custom","namespace":"kube-system"}} 567 | creationTimestamp: "2020-10-27T19:49:24Z" 568 | managedFields: 569 | - apiVersion: v1 570 | fieldsType: FieldsV1 571 | fieldsV1: 572 | f:data: 573 | .: {} 574 | f:example.server: {} 575 | f:metadata: 576 | f:annotations: 577 | .: {} 578 | f:kubectl.kubernetes.io/last-applied-configuration: {} 579 | manager: kubectl-client-side-apply 580 | operation: Update 581 | time: "2020-10-27T19:49:24Z" 582 | name: coredns-custom 583 | namespace: kube-system 584 | resourceVersion: "519480" 585 | selfLink: /api/v1/namespaces/kube-system/configmaps/coredns-custom 586 | uid: 8d3250a5-cbb4-4f01-aae3-4e83bd158ebe 587 | ``` 588 | 589 | ## 3.6 Choose An Appropriate Container Network Interface Plugin 590 | 591 | Generally, it seems that Flannel is good for starting out in a very simplified environment, while Calico (and others) extend upon the basic functionality to meet design-specific requirements. 
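A quick, hedged way to check which CNI plugin an existing cluster is running before comparing alternatives (the pod name patterns and the node path vary by distribution and managed offering):

```bash
# Look for a CNI-related DaemonSet or pod in kube-system.
kubectl get pods --namespace=kube-system -o wide | grep -Ei 'calico|flannel|weave|cilium|canal'

# On a node, the installed CNI configuration usually lives here.
ls /etc/cni/net.d/
```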
592 | 593 | - [Network Plugins](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/) 594 | - [Choosing a CNI Network Provider for Kubernetes](https://chrislovecnm.com/kubernetes/cni/choosing-a-cni-provider/) 595 | - [Comparing Kubernetes CNI Providers: Flannel, Calico, Canal, and Weave](https://rancher.com/blog/2019/2019-03-21-comparing-kubernetes-cni-providers-flannel-calico-canal-and-weave/) 596 | 597 | Common decision points include: 598 | 599 | - Network Model: Layer 2, Layer 3, VXLAN, etc. 600 | - Routing: Routing and route distribution for pod traffic between nodes 601 | - Network Policy: Essentially the firewall between network / pod segments 602 | - IP Address Management (IPAM) 603 | - Datastore: 604 | - `etcd` - for direct connection to an etcd cluster 605 | - Kubernetes - for connection to a Kubernetes API server 606 | -------------------------------------------------------------------------------- /objectives/objective4.md: -------------------------------------------------------------------------------- 1 | # Objective 4: Storage 2 | 3 | - [Objective 4: Storage](#objective-4-storage) 4 | - [4.1 Understand Storage Classes, Persistent Volumes](#41-understand-storage-classes-persistent-volumes) 5 | - [Storage Classes](#storage-classes) 6 | - [Persistent Volumes](#persistent-volumes) 7 | - [4.2 Understand Volume Mode, Access Modes And Reclaim Policies For Volumes](#42-understand-volume-mode-access-modes-and-reclaim-policies-for-volumes) 8 | - [Volume Mode](#volume-mode) 9 | - [Access Modes](#access-modes) 10 | - [Reclaim Policies](#reclaim-policies) 11 | - [4.3 Understand Persistent Volume Claims Primitive](#43-understand-persistent-volume-claims-primitive) 12 | - [4.4 Know How To Configure Applications With Persistent Storage](#44-know-how-to-configure-applications-with-persistent-storage) 13 | 14 | ## 4.1 Understand Storage Classes, Persistent Volumes 15 | 16 | - [Storage Classes](https://kubernetes.io/docs/concepts/storage/storage-classes/) 17 | - [Persistent Volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) 18 | 19 | ### Storage Classes 20 | 21 | - [Reclaim Policy](https://kubernetes.io/docs/concepts/storage/storage-classes/#reclaim-policy): PersistentVolumes that are dynamically created by a StorageClass will have the reclaim policy specified in the reclaimPolicy field of the class 22 | - Delete: When PersistentVolumeClaim is deleted, also deletes PersistentVolume and underlying storage object 23 | - Retain: When PersistentVolumeClaim is deleted, PersistentVolume remains and volume is "released" 24 | - [Volume Binding Mode](https://kubernetes.io/docs/concepts/storage/storage-classes/#volume-binding-mode): 25 | - `Immediate`: By default, the `Immediate` mode indicates that volume binding and dynamic provisioning occurs once the PersistentVolumeClaim is created 26 | - `WaitForFirstConsumer`: Delay the binding and provisioning of a PersistentVolume until a Pod using the PersistentVolumeClaim is created 27 | - Supported by `AWSElasticBlockStore`, `GCEPersistentDisk`, and `AzureDisk` 28 | - [Allow Volume Expansion](https://kubernetes.io/docs/concepts/storage/storage-classes/#allow-volume-expansion): Allow volumes to be expanded 29 | - Note: It is not possible to reduce the size of a PersistentVolume 30 | - Default Storage Class: A default storage class is used when a PersistentVolumeClaim does not specify the storage class 31 | - Can be handy when a single default services all pod volumes 32 | - 
[Provisioner](https://kubernetes.io/docs/concepts/storage/storage-classes/#provisioner) 33 | - Determines the volume plugin to use for provisioning PVs. 34 | - Example: `gke-pd`, `azure-disk` 35 | 36 | --- 37 | 38 | View all storage classes 39 | 40 | `kubectl get storageclass` or `kubectl get sc` 41 | 42 | ```bash 43 | NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE 44 | standard (default) kubernetes.io/gce-pd Delete Immediate true 25h 45 | ``` 46 | 47 | --- 48 | 49 | View the storage class in yaml format 50 | 51 | `kubectl get sc standard -o yaml` 52 | 53 | ```yaml 54 | allowVolumeExpansion: true 55 | apiVersion: storage.k8s.io/v1 56 | kind: StorageClass 57 | metadata: 58 | name: standard 59 | parameters: 60 | type: pd-standard 61 | provisioner: kubernetes.io/gce-pd 62 | reclaimPolicy: Delete 63 | volumeBindingMode: Immediate 64 | ``` 65 | 66 | --- 67 | 68 | Make a custom storage class using the yaml configuration below and save it as `speedyssdclass.yaml` 69 | 70 | ```yaml 71 | allowVolumeExpansion: true 72 | apiVersion: storage.k8s.io/v1 73 | kind: StorageClass 74 | metadata: 75 | name: speedyssdclass 76 | parameters: 77 | type: pd-ssd # Note: This will use SSD backed disks 78 | fstype: ext4 79 | replication-type: none 80 | provisioner: kubernetes.io/gce-pd 81 | reclaimPolicy: Delete 82 | volumeBindingMode: Immediate 83 | ``` 84 | 85 | --- 86 | 87 | Apply the storage class configuration to the cluster 88 | 89 | `kubectl apply -f speedyssdclass.yaml` 90 | 91 | ```bash 92 | storageclass.storage.k8s.io/speedyssdclass created 93 | ``` 94 | 95 | --- 96 | 97 | Get the storage classes 98 | 99 | `kubectl get sc` 100 | 101 | ```bash 102 | NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE 103 | speedyssdclass kubernetes.io/gce-pd Retain WaitForFirstConsumer true 5m19s 104 | standard (default) kubernetes.io/gce-pd Delete Immediate true 8d 105 | ``` 106 | 107 | ### Persistent Volumes 108 | 109 | View a persistent volume in yaml format 110 | 111 | `kubectl get pv pvc-d2f6e37e-277f-4b7b-8725-542609f1dea4 -o yaml` 112 | 113 | ```yaml 114 | apiVersion: v1 115 | kind: PersistentVolume 116 | metadata: 117 | name: pvc-d2f6e37e-277f-4b7b-8725-542609f1dea4 118 | spec: 119 | accessModes: 120 | - ReadWriteOnce 121 | capacity: 122 | storage: 1Gi 123 | persistentVolumeReclaimPolicy: Delete 124 | storageClassName: standard 125 | volumeMode: Filesystem 126 | ``` 127 | 128 | --- 129 | 130 | Create a new disk named `pv100` in Google Cloud to be used as a persistent volume 131 | 132 | > Note: Use the zone of your GKE cluster 133 | 134 | `gcloud compute disks create pv100 --size 10GiB --zone=us-central1-c` 135 | 136 | --- 137 | 138 | Make a custom persistent volume using the yaml configuration below and save it as `pv100.yaml` 139 | 140 | ```yaml 141 | apiVersion: v1 142 | kind: PersistentVolume 143 | metadata: 144 | name: pv100 145 | spec: 146 | accessModes: 147 | - ReadWriteOnce 148 | capacity: 149 | storage: 1Gi 150 | persistentVolumeReclaimPolicy: Delete 151 | storageClassName: standard 152 | volumeMode: Filesystem 153 | gcePersistentDisk: # This section is required since we are not using a Storage Class 154 | fsType: ext4 155 | pdName: pv100 156 | ``` 157 | 158 | --- 159 | 160 | Apply the persistent volume to the cluster 161 | 162 | `kubectl apply -f pv100.yaml` 163 | 164 | ```bash 165 | persistentvolume/pv100 created 166 | ``` 167 | 168 | --- 169 | 170 | Get the persistent volume and notice that it has a status of `Available` since there is no 
`PersistentVolumeClaim` to bind against 171 | 172 | `kubectl get pv pv100` 173 | 174 | ```bash 175 | NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE 176 | pv100 1Gi RWO Delete Available standard 2m51s 177 | ``` 178 | 179 | ## 4.2 Understand Volume Mode, Access Modes And Reclaim Policies For Volumes 180 | 181 | - [Volume Mode](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#volume-mode) 182 | - [Access Modes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes) 183 | - [Reclaim Policy](https://kubernetes.io/docs/concepts/storage/storage-classes/#reclaim-policy) 184 | 185 | ### Volume Mode 186 | 187 | - Filesystem: Kubernetes formats the volume and presents it to a specified mount point. 188 | - If the volume is backed by a block device and the device is empty, Kuberneretes creates a filesystem on the device before mounting it for the first time. 189 | - Block: Kubernetes exposes a raw block device to the container. 190 | - Improved time to usage and perhaps performance. 191 | - The container must know what to do with the device; there is no filesystem. 192 | - Defined in `.spec.volumeMode` for a `PersistentVolumeClaim`. 193 | 194 | --- 195 | 196 | View the volume mode for persistent volume claims using the `-o wide` to see the `VOLUMEMODE` column 197 | 198 | `kubectl get pvc -o wide` 199 | 200 | ```bash 201 | NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE VOLUMEMODE 202 | www-web-0 Bound pvc-f3e92637-7e0d-46a3-ad87-ef1275bb5a72 1Gi RWO standard 19m Filesystem 203 | www-web-1 Bound pvc-d2f6e37e-277f-4b7b-8725-542609f1dea4 1Gi RWO standard 19m Filesystem 204 | ``` 205 | 206 | ### Access Modes 207 | 208 | - ReadWriteOnce (RWO): can be mounted as read-write by a single node 209 | - ReadOnlyMany (ROX): can be mounted as read-only by many nodes 210 | - ReadWriteMany (RWX): can be mounted as read-write by many nodes 211 | - Defined in `.spec.accessModes` for a `PersistentVolumeClaim` and `PersistentVolume` 212 | 213 | View the access mode for persistent volume claims 214 | 215 | `kubectl get pvc` 216 | 217 | ```bash 218 | NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE 219 | www-web-0 Bound pvc-f3e92637-7e0d-46a3-ad87-ef1275bb5a72 1Gi RWO standard 28m 220 | www-web-1 Bound pvc-d2f6e37e-277f-4b7b-8725-542609f1dea4 1Gi RWO standard 27m 221 | ``` 222 | 223 | ### Reclaim Policies 224 | 225 | - [Reclaim Policy](https://kubernetes.io/docs/concepts/storage/storage-classes/#reclaim-policy): PersistentVolumes that are dynamically created by a StorageClass will have the reclaim policy specified in the reclaimPolicy field of the class 226 | - Delete: When PersistentVolumeClaim is deleted, also deletes PersistentVolume and underlying storage object 227 | - Retain: When PersistentVolumeClaim is deleted, PersistentVolume remains and volume is "released" 228 | - [Change the Reclaim Policy of a PersistentVolume](https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy/) 229 | - Defined in `.spec.persistentVolumeReclaimPolicy` for `PersistentVolume`. 
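The linked task changes the policy on a live volume with `kubectl patch`; a short sketch (substitute a real PersistentVolume name from `kubectl get pv` for the placeholder):

```bash
# Switch a dynamically provisioned volume from Delete to Retain so the data survives PVC deletion.
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
```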
230 | 231 | --- 232 | 233 | View the reclaim policy set on persistent volumes 234 | 235 | `kubectl get pv` 236 | 237 | ```bash 238 | NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE 239 | pvc-d2f6e37e-277f-4b7b-8725-542609f1dea4 1Gi RWO Delete Bound default/www-web-1 standard 45m 240 | pvc-f3e92637-7e0d-46a3-ad87-ef1275bb5a72 1Gi RWO Delete Bound default/www-web-0 standard 45m 241 | ``` 242 | 243 | ## 4.3 Understand Persistent Volume Claims Primitive 244 | 245 | Make a custom persistent volume claim using the yaml configuration below and save it as `pvc01.yaml` 246 | 247 | ```yaml 248 | apiVersion: v1 249 | kind: PersistentVolumeClaim 250 | metadata: 251 | name: pvc01 252 | spec: 253 | storageClassName: standard 254 | accessModes: 255 | - ReadWriteOnce 256 | resources: 257 | requests: 258 | storage: 3Gi 259 | ``` 260 | 261 | --- 262 | 263 | Apply the persistent volume claim 264 | 265 | `kubectl apply -f pvc01.yaml` 266 | 267 | ```bash 268 | persistentvolumeclaim/pvc01 created 269 | ``` 270 | 271 | --- 272 | 273 | Get the persistent volume claim 274 | 275 | `kubectl get pvc pvc01` 276 | 277 | ```bash 278 | NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE 279 | pvc01 Bound pvc-9f2e7c5d-b64c-467e-bba6-86ccb333d981 3Gi RWO standard 5m19s 280 | ``` 281 | 282 | ## 4.4 Know How To Configure Applications With Persistent Storage 283 | 284 | - [Configure a Pod to Use a PersistentVolume for Storage](https://kubernetes.io/docs/tasks/configure-pod-container/configure-persistent-volume-storage/) 285 | 286 | --- 287 | 288 | Create a new yaml file using the configuration below and save it as `pv-pod.yaml` 289 | 290 | > Note: Make sure to create `pvc01` in [this earlier step](#43-understand-persistent-volume-claims-primitive) 291 | 292 | ```yaml 293 | apiVersion: v1 294 | kind: Pod 295 | metadata: 296 | name: pv-pod 297 | spec: 298 | volumes: 299 | - name: pv-pod-storage # The name of the volume, used by .spec.containers.volumeMounts.name 300 | persistentVolumeClaim: 301 | claimName: pvc01 # This pvc was created in an earlier step 302 | containers: 303 | - name: pv-pod-container 304 | image: nginx 305 | ports: 306 | - containerPort: 80 307 | name: "http-server" 308 | volumeMounts: 309 | - mountPath: "/usr/share/nginx/html" 310 | name: pv-pod-storage # This refers back to .spec.volumes.name 311 | ``` 312 | 313 | --- 314 | 315 | Apply the pod 316 | 317 | `kubectl apply -f pv-pod.yaml` 318 | 319 | ```bash 320 | pod/pv-pod created 321 | ``` 322 | 323 | --- 324 | 325 | Watch the pod provisioning process 326 | 327 | `kubectl get pod -w pv-pod` 328 | 329 | ```bash 330 | NAME READY STATUS RESTARTS AGE 331 | pv-pod 1/1 Running 0 30s 332 | ``` 333 | 334 | --- 335 | 336 | View the binding on `pvc01` 337 | 338 | `kubectl describe pvc pvc01` 339 | 340 | ```bash 341 | Name: pvc01 342 | Namespace: default 343 | StorageClass: standard 344 | Status: Bound 345 | Volume: pvc-9f2e7c5d-b64c-467e-bba6-86ccb333d981 346 | Labels: 347 | Annotations: pv.kubernetes.io/bind-completed: yes 348 | pv.kubernetes.io/bound-by-controller: yes 349 | volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/gce-pd 350 | Finalizers: [kubernetes.io/pvc-protection] 351 | Capacity: 3Gi 352 | Access Modes: RWO 353 | VolumeMode: Filesystem 354 | Mounted By: pv-pod # Here it is! 
355 | Events: 356 | Type Reason Age From Message 357 | ---- ------ ---- ---- ------- 358 | Normal ProvisioningSucceeded 36m persistentvolume-controller Successfully provisioned volume pvc-9f2e7c5d-b64c-467e-bba6-86ccb333d981 using kubernetes.io/gce-pd 359 | ``` 360 | -------------------------------------------------------------------------------- /objectives/objective5.md: -------------------------------------------------------------------------------- 1 | # Objective 5: Troubleshooting 2 | 3 | - [Troubleshooting Kubernetes deployments](https://learnk8s.io/troubleshooting-deployments) 4 | 5 | - [Objective 5: Troubleshooting](#objective-5-troubleshooting) 6 | - [5.1 Evaluate Cluster And Node Logging](#51-evaluate-cluster-and-node-logging) 7 | - [Cluster Logging](#cluster-logging) 8 | - [Node Logging](#node-logging) 9 | - [5.2 Understand How To Monitor Applications](#52-understand-how-to-monitor-applications) 10 | - [5.3 Manage Container Stdout & Stderr Logs](#53-manage-container-stdout--stderr-logs) 11 | - [5.4 Troubleshoot Application Failure](#54-troubleshoot-application-failure) 12 | - [5.5 Troubleshoot Cluster Component Failure](#55-troubleshoot-cluster-component-failure) 13 | - [5.6 Troubleshoot Networking](#56-troubleshoot-networking) 14 | 15 | ## 5.1 Evaluate Cluster And Node Logging 16 | 17 | ### Cluster Logging 18 | 19 | Having a separate storage location for cluster component logging, such as nodes, pods, and applications. 20 | 21 | - [Cluster-level logging architectures](https://kubernetes.io/docs/concepts/cluster-administration/logging/#cluster-level-logging-architectures) 22 | - [Kubernetes Logging Best Practices](https://platform9.com/blog/kubernetes-logging-best-practices/) 23 | 24 | Commonly deployed in one of three ways: 25 | 26 | 1. [Logging agent on each node](https://kubernetes.io/docs/concepts/cluster-administration/logging/#using-a-node-logging-agent) that sends log data to a backend storage repository 27 | 1. These agents can be deployed using a DaemonSet replica to ensure nodes have the agent running 28 | 2. Note: This approach only works for applications' standard output (_stdout_) and standard error (_stderr_) 29 | 2. [Logging agent as a sidecar](https://kubernetes.io/docs/concepts/cluster-administration/logging/#using-a-sidecar-container-with-the-logging-agent) to specific deployments that sends log data to a backend storage repository 30 | 1. Note: Writing logs to a file and then streaming them to stdout can double disk usage 31 | 3. [Configure the containerized application](https://kubernetes.io/docs/concepts/cluster-administration/logging/#exposing-logs-directly-from-the-application) to send log data to a backend storage repository 32 | 33 | ### Node Logging 34 | 35 | Having a log file on the node that is populated with standard output (_stdout_) and standard error (_stderr_) log entries from containers running on the node. 
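A hedged sketch of where to look on a node (paths assume a typical Linux node with a systemd-managed kubelet, reached over SSH):

```bash
# Per-container stdout/stderr log files captured on the node.
ls /var/log/containers/
ls /var/log/pods/

# Kubelet logs on systemd-based nodes.
journalctl -u kubelet --no-pager | tail -n 50
```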
36 | 37 | - [Logging at the node level](https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level) 38 | 39 | ## 5.2 Understand How To Monitor Applications 40 | 41 | - [Using kubectl describe pod to fetch details about pods](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application-introspection/#using-kubectl-describe-pod-to-fetch-details-about-pods) 42 | - [Interacting with running Pods](https://kubernetes.io/docs/reference/kubectl/cheatsheet/#interacting-with-running-pods) 43 | 44 | ## 5.3 Manage Container Stdout & Stderr Logs 45 | 46 | - [Kubectl Commands - Logs](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#logs) 47 | - [How to find—and use—your GKE logs with Cloud Logging](https://cloud.google.com/blog/products/management-tools/finding-your-gke-logs) 48 | - [Enable Log Rotation in Kubernetes Cluster](https://vividcode.io/enable-log-rotation-in-kubernetes-cluster/) 49 | 50 | `kubectl logs [-f] [-p] (POD | TYPE/NAME) [-c CONTAINER]` 51 | 52 | - `-f` will follow the logs 53 | - `-p` will pull up the previous instance of the container 54 | - `-c` will select a specific container for pods that have more than one 55 | 56 | ## 5.4 Troubleshoot Application Failure 57 | 58 | - [Troubleshoot Applications](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/) 59 | - Status: Pending 60 | - The Pod has been accepted by the Kubernetes cluster, but one or more of the containers has not been set up and made ready to run. 61 | - If no resources are available on the cluster, the Cluster Autoscaler will increase the node count (if enabled) 62 | - Once the new nodes are available, pods in Pending status will be scheduled 63 | - Status: Waiting 64 | - A container in the Waiting state is still running the operations it requires in order to complete its startup 65 | 66 | --- 67 | 68 | Describe the pod to get details on the configuration, containers, events, conditions, volumes, and so on; a full `kubectl describe pod counter` example follows the checklist below. 69 | 70 | - Is the status equal to `Running`? 71 | - Are there enough resources to schedule the pod? 72 | - Are there enough `hostPorts` remaining to schedule the pod?
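As a quick first pass before reading the full `describe` output, the event stream can be filtered to a single pod; a hedged sketch using the `counter` pod from the example below:

```bash
# Show only events for the pod in question, oldest first.
kubectl get events --field-selector involvedObject.name=counter --sort-by=.metadata.creationTimestamp
```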
73 | 74 | `kubectl describe pod counter` 75 | 76 | ```yaml 77 | Name: counter 78 | Namespace: default 79 | Priority: 0 80 | Node: gke-my-first-cluster-1-default-pool-504c1e77-xcvj/10.128.0.15 81 | Start Time: Tue, 10 Nov 2020 16:33:10 -0600 82 | Labels: 83 | Annotations: 84 | Status: Running 85 | IP: 10.104.1.7 86 | IPs: 87 | IP: 10.104.1.7 88 | Containers: 89 | count: 90 | Container ID: docker://430313804a529153c1dc5badd1394164906a7dead8708a4b850a0466997e1c34 91 | Image: busybox 92 | Image ID: docker-pullable://busybox@sha256:a9286defaba7b3a519d585ba0e37d0b2cbee74ebfe590960b0b1d6a5e97d1e1d 93 | Port: 94 | Host Port: 95 | Args: 96 | /bin/sh 97 | -c 98 | i=0; while true; do 99 | echo "$i: $(date)" >> /var/log/1.log; 100 | echo "$(date) INFO $i" >> /var/log/2.log; 101 | i=$((i+1)); 102 | sleep 1; 103 | done 104 | 105 | State: Running 106 | Started: Tue, 10 Nov 2020 16:33:12 -0600 107 | Ready: True 108 | Restart Count: 0 109 | Environment: 110 | Mounts: 111 | /var/log from varlog (rw) 112 | /var/run/secrets/kubernetes.io/serviceaccount from default-token-2qnnp (ro) 113 | count-log-1: 114 | Container ID: docker://d5e95aa4aec3a55435d610298f94e7b8b2cfdf2fb88968f00ca4719a567a6e37 115 | Image: busybox 116 | Image ID: docker-pullable://busybox@sha256:a9286defaba7b3a519d585ba0e37d0b2cbee74ebfe590960b0b1d6a5e97d1e1d 117 | Port: 118 | Host Port: 119 | Args: 120 | /bin/sh 121 | -c 122 | tail -n+1 -f /var/log/1.log 123 | State: Running 124 | Started: Tue, 10 Nov 2020 16:33:13 -0600 125 | Ready: True 126 | Restart Count: 0 127 | Environment: 128 | Mounts: 129 | /var/log from varlog (rw) 130 | /var/run/secrets/kubernetes.io/serviceaccount from default-token-2qnnp (ro) 131 | count-log-2: 132 | Container ID: docker://eaa9983cbd55288a139b63c30cfe3811031dedfae0842b9233ac48db65387d4d 133 | Image: busybox 134 | Image ID: docker-pullable://busybox@sha256:a9286defaba7b3a519d585ba0e37d0b2cbee74ebfe590960b0b1d6a5e97d1e1d 135 | Port: 136 | Host Port: 137 | Args: 138 | /bin/sh 139 | -c 140 | tail -n+1 -f /var/log/2.log 141 | State: Running 142 | Started: Tue, 10 Nov 2020 16:33:13 -0600 143 | Ready: True 144 | Restart Count: 0 145 | Environment: 146 | Mounts: 147 | /var/log from varlog (rw) 148 | /var/run/secrets/kubernetes.io/serviceaccount from default-token-2qnnp (ro) 149 | Conditions: 150 | Type Status 151 | Initialized True 152 | Ready True 153 | ContainersReady True 154 | PodScheduled True 155 | Volumes: 156 | varlog: 157 | Type: EmptyDir (a temporary directory that shares a pod's lifetime) 158 | Medium: 159 | SizeLimit: 160 | default-token-2qnnp: 161 | Type: Secret (a volume populated by a Secret) 162 | SecretName: default-token-2qnnp 163 | Optional: false 164 | QoS Class: BestEffort 165 | Node-Selectors: 166 | Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s 167 | node.kubernetes.io/unreachable:NoExecute op=Exists for 300s 168 | Events: 169 | Type Reason Age From Message 170 | ---- ------ ---- ---- ------- 171 | Normal Scheduled 30m default-scheduler Successfully assigned default/counter to gke-my-first-cluster-1-default-pool-504c1e77-xcvj 172 | Normal Pulling 30m kubelet, gke-my-first-cluster-1-default-pool-504c1e77-xcvj Pulling image "busybox" 173 | Normal Pulled 30m kubelet, gke-my-first-cluster-1-default-pool-504c1e77-xcvj Successfully pulled image "busybox" 174 | Normal Created 30m kubelet, gke-my-first-cluster-1-default-pool-504c1e77-xcvj Created container count 175 | Normal Started 30m kubelet, gke-my-first-cluster-1-default-pool-504c1e77-xcvj Started container count 176 | Normal 
Pulling 30m kubelet, gke-my-first-cluster-1-default-pool-504c1e77-xcvj Pulling image "busybox" 177 | Normal Created 30m kubelet, gke-my-first-cluster-1-default-pool-504c1e77-xcvj Created container count-log-1 178 | Normal Pulled 30m kubelet, gke-my-first-cluster-1-default-pool-504c1e77-xcvj Successfully pulled image "busybox" 179 | Normal Started 30m kubelet, gke-my-first-cluster-1-default-pool-504c1e77-xcvj Started container count-log-1 180 | Normal Pulling 30m kubelet, gke-my-first-cluster-1-default-pool-504c1e77-xcvj Pulling image "busybox" 181 | Normal Pulled 30m kubelet, gke-my-first-cluster-1-default-pool-504c1e77-xcvj Successfully pulled image "busybox" 182 | Normal Created 30m kubelet, gke-my-first-cluster-1-default-pool-504c1e77-xcvj Created container count-log-2 183 | Normal Started 30m kubelet, gke-my-first-cluster-1-default-pool-504c1e77-xcvj Started container count-log-2 184 | ``` 185 | 186 | --- 187 | 188 | Validate the commands being presented to the pod to ensure nothing was configured incorrectly. 189 | 190 | `kubectl apply --validate -f mypod.yaml` 191 | 192 | ## 5.5 Troubleshoot Cluster Component Failure 193 | 194 | - [Troubleshoot Clusters](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster/) 195 | - [A general overview of cluster failure modes](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster/#a-general-overview-of-cluster-failure-modes) 196 | - [Control Plane Components](https://kubernetes.io/docs/concepts/overview/components/#control-plane-components) 197 | - [Node Components](https://kubernetes.io/docs/concepts/overview/components/#node-components) 198 | 199 | --- 200 | 201 | Components to investigate: 202 | 203 | - Control Plane Components 204 | - `kube-apiserver` 205 | - `etcd` 206 | - `kube-scheduler` 207 | - `kube-controller-manager` 208 | - `cloud-controller-manager` 209 | - Node Components 210 | - `kubelet` 211 | - `kube-proxy` 212 | - Container runtime (e.g. 
Docker) 213 | 214 | --- 215 | 216 | View the components with: 217 | 218 | `kubectl get all -n kube-system` 219 | 220 | ```bash 221 | NAME READY STATUS RESTARTS AGE 222 | pod/konnectivity-agent-56nck 1/1 Running 0 15d 223 | pod/konnectivity-agent-gmklx 1/1 Running 0 15d 224 | pod/konnectivity-agent-wg92c 1/1 Running 0 15d 225 | pod/kube-dns-576766df6b-cz4ln 3/3 Running 0 15d 226 | pod/kube-dns-576766df6b-rcsk7 3/3 Running 0 15d 227 | pod/kube-dns-autoscaler-7f89fb6b79-pq66d 1/1 Running 0 15d 228 | pod/kube-proxy-gke-my-first-cluster-1-default-pool-504c1e77-m9lk 1/1 Running 0 15d 229 | pod/kube-proxy-gke-my-first-cluster-1-default-pool-504c1e77-xcvj 1/1 Running 0 15d 230 | pod/kube-proxy-gke-my-first-cluster-1-default-pool-504c1e77-zg6v 1/1 Running 0 15d 231 | pod/l7-default-backend-7fd66b8b88-ng57f 1/1 Running 0 15d 232 | pod/metrics-server-v0.3.6-7c5cb99b6f-2d8bx 2/2 Running 0 15d 233 | 234 | NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE 235 | service/default-http-backend NodePort 10.108.1.184 80:32084/TCP 15d 236 | service/kube-dns ClusterIP 10.108.0.10 53/UDP,53/TCP 15d 237 | service/metrics-server ClusterIP 10.108.1.154 443/TCP 15d 238 | 239 | NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE 240 | daemonset.apps/konnectivity-agent 3 3 3 3 3 15d 241 | daemonset.apps/kube-proxy 0 0 0 0 0 kubernetes.io/os=linux,node.kubernetes.io/kube-proxy-ds-ready=true 15d 242 | daemonset.apps/metadata-proxy-v0.1 0 0 0 0 0 cloud.google.com/metadata-proxy-ready=true,kubernetes.io/os=linux 15d 243 | daemonset.apps/nvidia-gpu-device-plugin 0 0 0 0 0 15d 244 | 245 | NAME READY UP-TO-DATE AVAILABLE AGE 246 | deployment.apps/kube-dns 2/2 2 2 15d 247 | deployment.apps/kube-dns-autoscaler 1/1 1 1 15d 248 | deployment.apps/l7-default-backend 1/1 1 1 15d 249 | deployment.apps/metrics-server-v0.3.6 1/1 1 1 15d 250 | 251 | NAME DESIRED CURRENT READY AGE 252 | replicaset.apps/kube-dns-576766df6b 2 2 2 15d 253 | replicaset.apps/kube-dns-autoscaler-7f89fb6b79 1 1 1 15d 254 | replicaset.apps/l7-default-backend-7fd66b8b88 1 1 1 15d 255 | replicaset.apps/metrics-server-v0.3.6-7c5cb99b6f 1 1 1 15d 256 | replicaset.apps/metrics-server-v0.3.6-7ff8cdbc49 0 0 0 15d 257 | ``` 258 | 259 | --- 260 | 261 | Retrieve detailed information about the cluster 262 | 263 | `kubectl cluster-info` or `kubectl cluster-info dump` 264 | 265 | --- 266 | 267 | Retrieve a list of known API resources to aid with describing or troubleshooting 268 | 269 | `kubectl api-resources` 270 | 271 | ```bash 272 | NAME SHORTNAMES APIGROUP NAMESPACED KIND 273 | bindings true Binding 274 | componentstatuses cs false ComponentStatus 275 | configmaps cm true ConfigMap 276 | endpoints ep true Endpoints 277 | events ev true Event 278 | limitranges limits true LimitRange 279 | namespaces ns false Namespace 280 | nodes no false Node 281 | persistentvolumeclaims pvc true PersistentVolumeClaim 282 | persistentvolumes pv false PersistentVolume 283 | 284 | 285 | ``` 286 | 287 | --- 288 | 289 | [Check the logs](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster/#looking-at-logs) in `/var/log` on the master and worker nodes: 290 | 291 | - Master 292 | - `/var/log/kube-apiserver.log` - API Server, responsible for serving the API 293 | - `/var/log/kube-scheduler.log` - Scheduler, responsible for making scheduling decisions 294 | - `/var/log/kube-controller-manager.log` - Controller that manages replication controllers 295 | - Worker Nodes 296 | - `/var/log/kubelet.log` - Kubelet, responsible for running containers on the node 297 | - 
`/var/log/kube-proxy.log` - Kube Proxy, responsible for service load balancing 298 | 299 | ## 5.6 Troubleshoot Networking 300 | 301 | - [Flannel Troubleshooting](https://github.com/coreos/flannel/blob/master/Documentation/troubleshooting.md#kubernetes-specific) 302 | - The flannel kube subnet manager relies on the fact that each node already has a podCIDR defined. 303 | - [Calico Troubleshooting](https://docs.projectcalico.org/maintenance/troubleshoot/) 304 | - [Containers do not have network connectivity](https://docs.projectcalico.org/maintenance/troubleshoot/troubleshooting#containers-do-not-have-network-connectivity) 305 | --------------------------------------------------------------------------------