├── LICENSE ├── README.md ├── appendix-a.md ├── chapter01.md ├── chapter02.md ├── chapter03.md ├── chapter04.md ├── chapter05.md ├── chapter06.md ├── chapter07.md ├── chapter08.md ├── chapter09.md ├── chapter10.md ├── chapter11.md ├── chapter12.md ├── chapter13.md ├── chapter14.md ├── images ├── Kubernetes-BareMetal-Cluster-setup.dia ├── Kubernetes-BareMetal-Cluster-setup.dia~ ├── Kubernetes-BareMetal-Cluster-setup.png ├── libvirt-new-virtual-network-1.png ├── libvirt-new-virtual-network-2.png ├── libvirt-new-virtual-network-3.png ├── libvirt-new-virtual-network-4.png ├── libvirt-new-virtual-network-5.png ├── libvirt-new-virtual-network-6.png ├── libvirt-new-vm-01.png ├── libvirt-new-vm-02.png ├── libvirt-new-vm-03.png ├── libvirt-new-vm-04.png ├── libvirt-new-vm-05.png ├── libvirt-new-vm-06.png ├── libvirt-new-vm-07.png ├── libvirt-new-vm-08.png └── libvirt-new-vm-09.png └── outline.md /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2017 Muhammad Kamran Azeem 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy 4 | of this software and associated documentation files (the "Software"), to deal 5 | in the Software without restriction, including without limitation the rights 6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 7 | copies of the Software, and to permit persons to whom the Software is 8 | furnished to do so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all 11 | copies or substantial portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 19 | SOFTWARE. 20 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | --- 2 | maintainer: KamranAzeem 3 | --- 4 | 5 | # A sys admin's guide to setting up Kubernetes on Bare-Metal including High Availability. 6 | By: **Muhammad Kamran Azeem (Praqma)** 7 | 8 | * Outline: [outline.md](outline.md) 9 | 10 | -------------------------------------------------------------------------------- /appendix-a.md: -------------------------------------------------------------------------------- 1 | # Appendix A - DNS 2 | 3 | I used KVM/Libvirt to create my VMs for this lab. Libvirt uses DNSMASQ, which uses /etc/hosts for DNS records, and forwards them to upstream DNS server if the host/ IP address is not found in /etc/hosts on the virtualization server. It is important that the hostnames are resolved to correct IP addresses. I noticed that even though I had the correct IP address / hosts mapping in /etc/hosts in my physical server (KVM host),the names of the hosts were not resolving correctly from the VMs. 
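A quick way to check what the VMs will be told — without logging into any VM — is to query the libvirt dnsmasq instance directly from the physical host. The following assumes the virtual network's gateway/DNS address is 10.240.0.1, as it is in this lab:

```
# Query the dnsmasq instance serving the virtual network directly:
dig @10.240.0.1 worker1.example.com +short

# This should print the address from /etc/hosts (10.240.0.31 here). Any other
# answer means dnsmasq is serving stale data or forwarding the query upstream.
```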
4 | 5 | First, here is the `/etc/hosts` file from my physical server: 6 | 7 | ``` 8 | [root@kworkhorse ~]# cat /etc/hosts 9 | 127.0.0.1 localhost.localdomain localhost 10 | 10.240.0.11 etcd1.example.com etcd1 11 | 10.240.0.12 etcd2.example.com etcd2 12 | 10.240.0.21 controller1.example.com controller1 13 | 10.240.0.22 controller2.example.com controller2 14 | 10.240.0.31 worker1.example.com worker1 15 | 10.240.0.32 worker2.example.com worker2 16 | ``` 17 | 18 | 19 | When I tried to resolve the names from a VM, it did not work: 20 | 21 | ``` 22 | [root@worker1 ~]# dig worker1.example.com 23 | 24 | ;; QUESTION SECTION: 25 | ;worker1.example.com. IN A 26 | 27 | ;; ANSWER SECTION: 28 | worker1.example.com. 0 IN A 52.59.239.224 29 | 30 | ;; Query time: 0 msec 31 | ;; SERVER: 10.240.0.1#53(10.240.0.1) 32 | ;; WHEN: Wed Sep 14 11:40:14 CEST 2016 33 | ;; MSG SIZE rcvd: 64 34 | 35 | [root@worker1 ~]# 36 | ``` 37 | 38 | 39 | This means I should restart the dnsmasq service on the physical server: 40 | 41 | ``` 42 | [root@kworkhorse ~]# service dnsmasq stop 43 | Redirecting to /bin/systemctl stop dnsmasq.service 44 | [root@kworkhorse ~]# 45 | ``` 46 | 47 | Then I start it again: 48 | ``` 49 | [root@kworkhorse ~]# service dnsmasq start 50 | Redirecting to /bin/systemctl start dnsmasq.service 51 | [root@kworkhorse ~]# 52 | ``` 53 | 54 | But it failed to start: 55 | ``` 56 | [root@kworkhorse ~]# service dnsmasq status 57 | Redirecting to /bin/systemctl status dnsmasq.service 58 | ● dnsmasq.service - DNS caching server. 59 | Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; disabled; vendor preset: disabled) 60 | Active: failed (Result: exit-code) since Wed 2016-09-14 11:43:12 CEST; 5s ago 61 | Process: 10029 ExecStart=/usr/sbin/dnsmasq -k (code=exited, status=2) 62 | Main PID: 10029 (code=exited, status=2) 63 | 64 | Sep 14 11:43:12 kworkhorse systemd[1]: Started DNS caching server.. 65 | Sep 14 11:43:12 kworkhorse systemd[1]: Starting DNS caching server.... 66 | Sep 14 11:43:12 kworkhorse dnsmasq[10029]: dnsmasq: failed to create listening socket for port 53: Address already in use 67 | Sep 14 11:43:12 kworkhorse dnsmasq[10029]: failed to create listening socket for port 53: Address already in use 68 | Sep 14 11:43:12 kworkhorse dnsmasq[10029]: FAILED to start up 69 | Sep 14 11:43:12 kworkhorse systemd[1]: dnsmasq.service: Main process exited, code=exited, status=2/INVALIDARGUMENT 70 | Sep 14 11:43:12 kworkhorse systemd[1]: dnsmasq.service: Unit entered failed state. 71 | Sep 14 11:43:12 kworkhorse systemd[1]: dnsmasq.service: Failed with result 'exit-code'. 72 | [root@kworkhorse ~]# 73 | ``` 74 | 75 | This is because all the DNSMASQ processes did not exit. 
76 | ``` 77 | [root@kworkhorse ~]# netstat -ntlp 78 | Active Internet connections (only servers) 79 | Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name 80 | tcp 0 0 127.0.0.1:43873 0.0.0.0:* LISTEN 3573/chrome 81 | tcp 0 0 127.0.0.1:56133 0.0.0.0:* LISTEN 24333/GoogleTalkPlu 82 | tcp 0 0 127.0.0.1:5900 0.0.0.0:* LISTEN 8379/qemu-system-x8 83 | tcp 0 0 127.0.0.1:5901 0.0.0.0:* LISTEN 9990/qemu-system-x8 84 | tcp 0 0 127.0.0.1:5902 0.0.0.0:* LISTEN 11664/qemu-system-x 85 | tcp 0 0 127.0.0.1:5903 0.0.0.0:* LISTEN 13021/qemu-system-x 86 | tcp 0 0 127.0.0.1:5904 0.0.0.0:* LISTEN 14446/qemu-system-x 87 | tcp 0 0 127.0.0.1:5905 0.0.0.0:* LISTEN 15613/qemu-system-x 88 | tcp 0 0 127.0.0.1:5939 0.0.0.0:* LISTEN 1265/teamviewerd 89 | tcp 0 0 127.0.0.1:60117 0.0.0.0:* LISTEN 24333/GoogleTalkPlu 90 | tcp 0 0 10.240.0.1:53 0.0.0.0:* LISTEN 6410/dnsmasq 91 | tcp 0 0 172.16.0.1:53 0.0.0.0:* LISTEN 1543/dnsmasq 92 | tcp 0 0 192.168.124.1:53 0.0.0.0:* LISTEN 1442/dnsmasq 93 | tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1240/sshd 94 | tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 2479/cupsd 95 | tcp6 0 0 :::22 :::* LISTEN 1240/sshd 96 | tcp6 0 0 ::1:631 :::* LISTEN 2479/cupsd 97 | [root@kworkhorse ~]# 98 | ``` 99 | 100 | 101 | So I just `killall` all the dnsmasq processes on the physical server, and started the service again. Which resulted in correct name resolution on the nodes: 102 | 103 | ``` 104 | [root@kworkhorse ~]# killall dnsmasq 105 | [root@kworkhorse ~]# 106 | ``` 107 | 108 | 109 | ``` 110 | [root@kworkhorse ~]# service dnsmasq start 111 | Redirecting to /bin/systemctl start dnsmasq.service 112 | ``` 113 | 114 | ``` 115 | [root@kworkhorse ~]# service dnsmasq status 116 | Redirecting to /bin/systemctl status dnsmasq.service 117 | ● dnsmasq.service - DNS caching server. 118 | Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; disabled; vendor preset: disabled) 119 | Active: active (running) since Wed 2016-09-14 11:43:50 CEST; 2s ago 120 | Main PID: 10765 (dnsmasq) 121 | Memory: 600.0K 122 | CPU: 3ms 123 | CGroup: /system.slice/dnsmasq.service 124 | └─10765 /usr/sbin/dnsmasq -k 125 | 126 | Sep 14 11:43:50 kworkhorse systemd[1]: Started DNS caching server.. 127 | Sep 14 11:43:50 kworkhorse systemd[1]: Starting DNS caching server.... 128 | Sep 14 11:43:50 kworkhorse dnsmasq[10765]: started, version 2.76 cachesize 150 129 | Sep 14 11:43:50 kworkhorse dnsmasq[10765]: compile time options: IPv6 GNU-getopt DBus no-i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrac... inotify 130 | Sep 14 11:43:50 kworkhorse dnsmasq[10765]: reading /etc/resolv.conf 131 | Sep 14 11:43:50 kworkhorse dnsmasq[10765]: using nameserver 192.168.100.1#53 132 | Sep 14 11:43:50 kworkhorse dnsmasq[10765]: using nameserver fe80::1%wlp2s0#53 133 | Sep 14 11:43:50 kworkhorse dnsmasq[10765]: read /etc/hosts - 10 addresses 134 | Hint: Some lines were ellipsized, use -l to show in full. 135 | [root@kworkhorse ~]# 136 | ``` 137 | 138 | 139 | Correct name resolution from the VM: 140 | ``` 141 | [root@worker1 ~]# dig worker1.example.com 142 | 143 | ;; QUESTION SECTION: 144 | ;worker1.example.com. IN A 145 | 146 | ;; ANSWER SECTION: 147 | worker1.example.com. 
0 IN A 10.240.0.31 148 | 149 | ;; Query time: 3 msec 150 | ;; SERVER: 10.240.0.1#53(10.240.0.1) 151 | ;; WHEN: Wed Sep 14 11:56:12 CEST 2016 152 | ;; MSG SIZE rcvd: 64 153 | 154 | [root@worker1 ~]# 155 | ``` 156 | 157 | -------------------------------------------------------------------------------- /chapter01.md: -------------------------------------------------------------------------------- 1 | # Chapter 1: Kubernetes introduction 2 | 3 | This chapter is all about what Kubernetes is. 4 | 5 | We would assume that you already know about what containers are, and what Docker is, and now you want to take the next logical step by moving to Kubernetes. We will give you a refresher anyway! 6 | 7 | 8 | ## The golden era of virtual machines: 9 | Before containers came along, we all used virtual machines to setup different (guest) operating systems running on top of main operating system of our computer. XEN, KVM, VMWare, VirtualBox, Microsoft Virtual PC and later Hyper-V are few popular virtualization solutions. Virtual machines allowed us to slice the resources of our computer, mainly RAM and CPU - and hard drive, and give it to different virtual machines connected through a virtual network on that physical machine. This was so because most of the time people were using barely 5-10% of the resources of a given physical machine. Virtual machines were cool, and saved a lot of money for a lot of people! All of a sudden all that wasted hardware resources on each physical server were being utilized by multiple virtual machines. There is something called a vitualization ration, which is ratio of physical server and the number of VMs running on it. With virtualization people achieved 1:20 and more! 10 | 11 | The downside of virtualization was (still is) that each VM would need a full OS installation. Each VM running on one physical machine just shares hardware resources and nothing else. So a full OS was the only way to go. Which may be an advantage too, as it was possible to have Windows and Linux VMs on the same physical hardware. Though the full OS is what makes VM very bulky in software sense. So if one had to run only a small web server say Apache, or Nginx, a full (albeit minimal) installation of Linux OS is required. 12 | 13 | In terms of automation, VMs pose a challenge of requiring full fledge provisioning software, and configuration managers which are required to automate the installation of multipe VMs on one or more physical machines. 14 | 15 | So the itch that "why do I need a full blown OS just to run Apache?" , soon resulted in discovery of a solution to this (itch). Linux Kernel 2.6.24 introduced cgroups (Control Groups) , and LXC - Linux Containers, was born, first released in August 2008. 16 | 17 | LXC provides operating system-level virtualization through a virtual environment that has its own process and network space, instead of creating a full-fledged virtual machine. LXC relies on the Linux kernel cgroups functionality. It also relies on other kinds of namespace isolation functionality - such as network namespaces, which were developed and integrated into the mainline Linux kernel. 18 | 19 | With LXC, it was suddenly possible to run just the piece of software you wanted to run in an isolated (sort of change-rooted) environment, by just having enough supporting libarires in the container, and sharing the Linux kernel from the host OS running on the physical machine. So, to run Apache, one does not need to have a full blown linux OS anymore! 
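Just to make that concrete, this is roughly what working with plain LXC looks like. It is only a sketch — the container name, distribution and release below are arbitrary examples:

```
# Create a container named "web" from the generic "download" template:
lxc-create -n web -t download -- --dist ubuntu --release xenial --arch amd64

# Start it in the background, then get a shell inside it:
lxc-start -n web -d
lxc-attach -n web

# List containers along with their state and IP addresses:
lxc-ls --fancy
```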
20 | 21 | In 2013, Docker seazed the opportunity by creating a very usable and user-friendly container (image) format around LXC and provided necessary tooling to create, and manage containers on Linux OS. This brought Docker to limelight, and actually made Docker so popular that many people mistakenly think that Docker is the software (or company) which brough them containers. This is of-course not true. Docker just made it super easy to use containers. 22 | 23 | With that, all of a sudden the golden era of VMs came to an end, and the golden era of containers started. 24 | 25 | # Docker Containers 26 | 27 | Best explained from Docker's own web pages: 28 | 29 | >Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, >system libraries – anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is >running in. 30 | 31 | Docker uses the resource isolation features of the Linux kernel such as cgroups and kernel namespaces, and a union-capable file system such as aufs and others to allow independent "containers" to run within a single Linux instance, avoiding the overhead of starting and maintaining virtual machines. This helps Docker to provide an additional layer of abstraction and automation of operating-system-level virtualization on Linux. 32 | 33 | So running Apache web server on some (physical or virtual) Linux computer would be as simple as: 34 | 35 | ``` 36 | [kamran@kworkhorse kamran]$ docker run -p 80:80 -d httpd 37 | Unable to find image 'httpd:latest' locally 38 | latest: Pulling from library/httpd 39 | 43c265008fae: Pull complete 40 | 2421250c862c: Pull complete 41 | f267bf8fc4ac: Pull complete 42 | 48efff98b4ba: Pull complete 43 | acb686eb7ab7: Pull complete 44 | Digest: sha256:9b29c9ba465af556997e6924f18efc1bbe7af0dc1b3f11884010592e700ddb09 45 | Status: Downloaded newer image for httpd:latest 46 | 14954a5c880c343d57e00cf270ca3f3212ef0e3b23635c1a5a40fe279548671f 47 | [kamran@kworkhorse kamran]$ 48 | ``` 49 | 50 | The -p 80:80 binds the container's port 80 to the host's port 80. To make sure that it is actually running on port 80 on the host. We verify: 51 | 52 | ``` 53 | [kamran@kworkhorse kamran]$ docker ps 54 | CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 55 | 14954a5c880c httpd "httpd-foreground" 5 minutes ago Up 5 minutes 0.0.0.0:80->80/tcp nauseous_nobel 56 | [kamran@kworkhorse kamran]$ 57 | ``` 58 | 59 | Accessing this web server from any other machine (e.g. localhost/itself)is now possible using normal methods, such as curl: 60 | 61 | ``` 62 | [kamran@kworkhorse kamran]$ curl localhost 63 |
<html><body><h1>It works!</h1></body></html>
64 | [kamran@kworkhorse kamran]$ 65 | ``` 66 | 67 | 68 | To show you that how tiny this Apache (httpd) image is, here is something to feast your eyes: 69 | ``` 70 | [kamran@kworkhorse kamran]$ docker images | grep httpd 71 | httpd latest 9a0bc463edaa 3 days ago 193.3 MB 72 | [kamran@kworkhorse kamran]$ 73 | ``` 74 | Just 193 MB! Sure it is much more than the Apache/HTTPD package itslef, but it is tiny compared to the full OS installation on any Linux distribtion and installing Apache on top of that. (Yes, we know Alpine is tiny OS itslef - so excluding Alpine!) 75 | 76 | 77 | Ok so now you have an apache service running as a container! Great! Now what? Scale up to 50 apache instances? Well, can't really do that with VMs, we know that already! Scaling up with docker - well not straight forward exactly. What if the container dies for some reason? Any possibility to have it restarted? Again, with plain docker, that is not possible? 78 | 79 | So, do we need to create our own tooling to solve such problems, No! - because we have - Kubernetes! 80 | 81 | # Kubernetes - The container orchestrator: 82 | 83 | Kubernetes (Greek for "helmsman" or "pilot") - often referred simply as *k8s* - was founded and announced by Google in 2014. It is an open source container cluster manager, originally designed by Google, and is heavily based on Google's own *Borg* system. It is a platform for automating deployment, scaling, and operations of containers across a cluster of nodes. By design it is a collection of loosely coupled components, which help it to be flexible, scaleable and very manageable. This architecture helps it to meet a wide variety of workloads. Kubernetes uses Docker as it's container runtime engine, though it is possible to use other container runtime engines as well, such as *Rocket*. 84 | 85 | Kubernetes uses Pod as its fundamental unit for scheduling instead of Container. Whereas a pod can be composed of one or more containers guaranteed to be colocated on a single host. Instead of containers having an IP address - as in Docker, it is the pod which is assigned a (unique) IP address by the container runtime engine, and that IP is shared with all the containers inside that particular pod. Two same containers (exposing the same port number) can never be part of the same pod though. 86 | 87 | To elaborate it, we can take an example of a PHP based web-application, which uses a MySQL database backend. The web application is accessible on port 80 and the database is available on port 3306. When these services are containerized, they can be broken down in two containers - a web/PHP container and a db container. In plain docker setup, even involving docker-compose, these two would have been two containers running separately, each having a different IP address from the IP range belonging to docker0 bridge interface on the docker host. However in Kubernetes, these two could have been in the same pod, and they would actually share an IP address assigned to the pod by the same docker0 bridge! This helps to create applications in a tighter way. Continueing the same PHP/MySQL example, now we do not need to *expose* the mysql port at pod level. The web service can easily access the mysql application using **localhost** . So the only exposed port from this example pod would be port 80 used by the front end web application. 
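To see this shared network namespace in action, you could open a shell in the web container and talk to MySQL over the loopback interface. The pod and container names used below (`webapp`, `web`) are hypothetical — substitute whatever your pod is actually called:

```
# Open a shell in the "web" container of the (hypothetical) pod "webapp":
kubectl exec -it webapp -c web -- /bin/sh

# From inside that shell, the db container's MySQL is reachable on loopback --
# no exposed pod port and no service lookup needed:
mysql -h 127.0.0.1 -P 3306 -u root -p
```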
88 | 89 | It is arguably correct that combining the web application and a mysql database in a single pod is not a good idea and defeats the purpose of the availability of a service when one component goes down. That is why it is good idea to use separate pod for the web app and separate pod for db. This way we can independently scale up and down the web app, without worrying about scaling issues - such as - with mysql db. 90 | 91 | Perhaps an example of multiple containers in a single pod can be a web service and a cache service such as memcache, etc. Since it is best to have a cache as close to the service as possible. Also, normally cache is stateless, (as is web service (normally)), this pod (containing these two containers), can be scaled up and down independently. 92 | 93 | So if you are asking your self, "Why use Kubernetes instead of plain Docker or Docker-Compose?", the answer is simple. Kubernetes has all the tooling available to deal with containers, clustering and high availability, which a simple docker or docker-compose setup does not have. It is explained next. 94 | 95 | One of the main design ideas / principles of Kubernetes is the concept of "declared state". When you want to run containers on Kubernetes cluster, you describe what you want to run. You declare that to Kubernetes. Kubernetes takes that as declared state for that container and ensures that the running state matches the declared state. i.e. Kubernetes runs the container as declared in the declared state. If, for example, you kill the pod yourself, or somehow the pod gets killed accidentally (for whatever reason), Kubernetes sees this as a mismatch between the declared state and the running state. It sees that you have declared a web server (nginx or apache) to run as a pod, and now there are no pods running matching this description. So Kubernetes scheduler kicks in and tries to start it again all by itself - automatically. If there are multiple worker nodes available, the scheduler tries to restart the pod on another node! This is a *huge* advantage over any plain docker or docker-compose setup one may have! 96 | 97 | Referring to the example of the PHP based web application and it's backend database, had it being run on a plain docker, (or docker-compose), failure of a service container would not trigger an automatic restart. However in Kubernetes, if the web service pod dies, Kubernetes starts it again, and if the db pod dies Kubernetes starts it again, automatically! 98 | 99 | Also, if you want to scale the web application to 50 instances , all you need to do is to declare this to Kubernetes. Kubernetes scheduler will spawn 50 instances of this web service pod. Though you might want to add a load balancer in front of these 50 instances! 100 | 101 | Availability and scaling are just two of the many exciting features of Kubernetes. There are many more interesting features in Kubernetes, which makes the job of managing your applications super easy. e.g. The possibility of having separate namespaces to have isolation of pods, and access controls, labels, selectors, etc, are some other helpful features. 102 | 103 | 104 | ## Pods, Deployments, Labels, Selectors and Services 105 | 106 | ### Pods: 107 | So from above, we already learned about the pod. i.e. It is the fundamental unit of scheduling in Kubernetes; each pod gets a unique IP address - from the pool of IPs belonging to *Pod Network*; each pod can have one or more containers in it. 
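To make the pod concept a little more concrete, here is what a minimal pod definition could look like — a sketch only, with example names and the stock nginx image. As the next section explains, you normally let a Deployment create and manage pods for you instead of writing one by hand, but the structure is worth seeing once:

```
cat > nginx-pod.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
EOF
```

Once such a pod is running, `kubectl get pods -o wide` shows which worker node it landed on, and `kubectl describe pod nginx` shows the IP address it was given from the pod network.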
108 | 109 | ### Deployments (formerly Replication Controllers): 110 | An interesting thing is that you cannot directly create a pod in kubernetes! For example, if you want to run nginx web service, you don't create a pod for nginx. Instead, you have to create a *Deployment* (formerly known as *Replication Controller* or *RC*). This deployment will have the nginx pod inside it. Normally when you ask Kubernetes to run a container image without any specific definition/configuration file, then Kubernetes creates a deployment for it, makes the pod part of this deployment and runs the pod - all automatically. The Deployment object can have labels, which you can use in selectors while creating or exposing services, or for other general deployment management tasks. (todo: More on this later.) The deploymnet object is sometimes called *deployment controller* just as it's predecessor was called *replication controller*. 111 | 112 | ### Labels and Selectors 113 | Kubernetes enables users and internal components to attach key-value pairs called "labels" to any API object in the system, e.g. pods, deployments, nodes, etc. Correspondingly, "label selectors" are queries against labels that resolve to matching objects. Labels and selectors are the primary grouping mechanism in Kubernetes, and are used to determine which components or objects certain operation will be applied to. The name of the key and the value it contains depend solely on the operator. There are no fixed labels and no reserved words. 114 | 115 | For example, if the Pods of an application have labels for "tier" (front-end, back-end, etc.) and "release_track" (canary, production, etc.), then an operation on all of the back-end production nodes could use a selector such as the following: 116 | 117 | ``` 118 | . . . tier=back-end AND release_track=production 119 | ``` 120 | 121 | ### Services 122 | 123 | After a deployment is created it is not accissble directly for general use. One way is to query the deployment to check for the IPs of it's backend pods, and then access those IPs to utilize the services offered by those pods. This may work for a deploymnet containing one pod, but for a deployment having 10 or 50 pods behind it, accessing the individual IPs is not smart. Also depending on your cluster/network design, it may not be possible to access the pod IP directly in some cases. Also the pods are getting deleted and recreatd all the time, and their IP addresses keep changing. You need a persistent way of accessing these pods (or deployments to be precise). Enter *Services*. 124 | 125 | When you want to make a deployment available over a persistent IP, you expose the deployment as a *Service*. Exposing a deployment actually creates the Service object, assigns it a cluster IP and a DNS name for Kubernetes internal usage. So now, the web application can access the backend db using a proper DNS name such as "db.default.cluster.local". This way, even if the service's cluster IP is changed because of service deletion and re-creation, the web application will still be able to access it using this DNS name. It may look like that the emphasis is on DNS here, but in reality, the DNS points to the cluster IP, and without the ability to create a Service in the first place, there would be no cluster IP for this deployment! 126 | 127 | So service is the way to make the deployments available for actual use. The service can be exposed in different ways though, or there are few different types of service. 
The service can be exposed as **ClusterIP** (which is the default), as **NodePort**, or as **LoadBalancer**. The default type ClusterIP obtains an IP address from the pool of the IPs from Services IP range, also known as Cluster Network. Interestingly the Cluster Network is not really a network, rather special type of labels, which is used by kube-proxy software running on the worker nodes to create complex IPTables rules on worker nodes. When a pod tries to access a service using it's cluster IP, the IPtables rules ensure that the trafic from the client pod can reach the service pod. The Cluster IPs are not accessible outside worker nodes. 128 | 129 | 130 | 131 | ## Accessing services: Cluster IPs, Load Balancers, NodePorts: 132 | 133 | We already covered Cluster IPs above. Now comes the question that how a service can be accessed from outside the worker nodes, or outside the Kubernetes cluster? e.g. The web application in the example above (in the Serices section) can access the backend DB thanks to the cluster IP. But how do we access the frontend web application? Well, to do that we have two more types of Services, which are NodePort and LoadBalancer. 134 | 135 | ### Node Port: 136 | When you expose a deployment as a service and declare it of the type *NodePort*, then Kubernetes exposes the port of the backend pods of the deployment on the worker node, and replicates this on all worker nodes. Now, if you want to access this front end application, you access it using the IP address of any of the worker nodes. The process of exposing the port is not so straight forward internally! For example you want to expose your deployment running 10 nginx pods as a NodePort type of service. Kubernetes will assign it a higher number port (e.g. 33056) , bind that on the worker node, and map that high number port to the port 80 of the deploymnet (or to the ports of the pods inside of the deployment). When you want to access this web aplication, you access it in the form of `:33056` . This is a bit awkward, but this is the way with NodePort! 137 | 138 | ### Load Balancer: 139 | At the moment it is only possible to expose a deployment as a LoadBalancer type of service, IF, you are running your cluster on GCE (Google compute Engine), or AWS. The LoadBalancer type requires that you can map some public IP to the IPs of the backend pods of a deployment object. This is not dependent on ports though. This is the ideal way of accessing the service on your Kubernetes cluster, but so far this is the biggest problem if you are not on GCE or AWS. The good news is that Praqma has created a LoadBalancer just for you! (todo: More on this later!) 140 | 141 | 142 | 143 | 144 | 145 | ## Kubernetes infrastructure components: 146 | 147 | Now we come to some infrastructure level components of Kubernetes. Kubernetes uses **Worker Nodes** to run the pods. To manage worker nodes, there are **Controller Nodes**, and all the cluster state (meta data, etc), is maintained in another set of nodes running **etcd**. Though **Load Balancer** is a crucial component of the entire equation, it is technically not a formal part/component of Kubernetes cluster or kubernetes services. 148 | 149 | These components are shown in the diagram in the beginning of this document. 150 | 151 | Each type of node has specific role and thus have special services running on it. Very briefly, these are: 152 | 153 | * Etcd node: etcd 154 | * Controller node: kube-apiserver, kube-scheduler, kube-controller-manager. 
(+ kubectl to manage these services) 155 | * Worker node: kubelet, kube-proxy, docker 156 | * Load Balancer: iptables, haproxy, nginx, etc. (it depends!) 157 | 158 | 159 | -------------------------------------------------------------------------------- /chapter02.md: -------------------------------------------------------------------------------- 1 | # Chapter 2: Infrastructure design and provisioning 2 | 3 | * Write about what type of hardware is needed. If not physical hardware, then what size of VMs are needed. etc. 4 | * Discuss what type of network technologies are we going to use. Such as flannel or CIDR, etc. 5 | * This will be a relatively short chapter. 6 | 7 | So, from chapter 1, we have built our knowledge-base about what Kubernetes is, what are pods, etc. Naturally now the question arises, how can we have our own Kubernetes cluster? How to design it? How to set it up? In this chapter we have answers to these questions and more! 8 | 9 | This chapter will discuss how we design our infrastructure. We know that the book is about Kubernetes on Bare Metal hardware. But what if one does not have the necessary amount of hardware? The most natural option which comes to mind is to use virtual machines! Of-course! And cloud providers come to mind also. However each cloud provider has their own way of networking .i.e. their own way of how machines talk to each other and with the rest of the world. So we decided to use simple VMs running on a simple (but very powerful) hypervisor - Libvirt/KVM. You are welcome to use VirtualBox or VMware, or HyperV too. These VMs will be as good as bare-metal. There are no fancy networks, or firewalls, or permission groups or other access controls. It is just a hypervisor running on our very own work computer, with a single virtual network, and that is about it. This makes is very easy to learn and experiment with Kubernetes. It is still possible to run this setup on AWS or GCE or any other cloud provider of your choice though. 10 | 11 | ## How many VMs and what size? 12 | 13 | As we know from [chapter 1](chapter01.md), Kubernetes cluster has three main components. These three components are: 14 | 15 | * Etcd 16 | * Control plane (comprising of: API Server, Scheduler, Controller Manager) 17 | * Worker(s) 18 | 19 | It is technically possible to have all these components on a single machine - and brave souls have done that; a minimum of three nodes are recommended for smooth functionality - and clarity. So in a simple three node Kubernetes cluster, you will have an etcd node holding all the cluster meta data, a controller node acting as a control plane talking to the etcd node, and one worker node, which runs the pods. What if the worker node dies? Well, that very reason we need to have at least two worker nodes in a Kubernetes cluster, so if one node dies, Kubernetes scheduler can re-schedule the pods to the surviving node. That is actually the main point behind the Kubernetes project. Otherwise plain Docker setup can pretty much do the same thing! For experimentation and learning, a simple three node Kubernetes cluster is good enough. 20 | 21 | ## What about reaching the services on the pods from outside the cluster? 22 | To make sure that the incoming traffic can reach the pods, we need an additional component called a load balancer (more about this later - we promise!), or just use NodePort (more about this soon!). Just like when you buy a camera, an important component of the camera is missing, which is called a **tripod** :). 
You need to additionally buy this missing component when you buy a camera. Similarly you need to setup/add a load balancer when you setup a Kubernetes cluster, because - as you guessed it, it is a component which is missing from the Kubernetes software (at the moment). 23 | 24 | 25 | ## Do we need high availability? 26 | Absolutely! - if you want your cluster for more than just learning! 27 | 28 | So, even if you use a dedicated node for each component, there would still be no high availability against failure of "non-worker" components of the cluster, such as etcd node and controller node. So even our pods may survive a failing node, the cluster itself cannot survive if the controller node fails or if the etcd node fails. So we need to have high availability for these components too. 29 | 30 | To have high availability for non-worker components, we need to use some sort of (external) clustering solution. What we do is that we setup three nodes for etcd, which can form it's own individual/independent cluster. Then we choose to use two nodes for controller instead of just one and use LVS/IPVS to make it a small cluster too. This makes is quite resilient to failures! We have a load balancer, but that will become a single point of failure in our cluster, if we do not protect it against hardware failures. So we need to setup at least two nodes for load balancer, using LVS/IPVS HA technologies. We now have the number of required nodes which looks like the following: 31 | 32 | 33 | * 3 x Etcd nodes - Each with: 0.5 GB RAM, 4 GB disk 34 | * 2 x Control plane nodes - Each with: 0.5 GB RAM, 4 GB disk 35 | * 2 x Worker nodes - Each with: 1.5 GB RAM, 20 GB disk 36 | * 2 x Load Balancer nodes - Each with: 0.5 GB RAM, 4 GB disk 37 | 38 | That makes a total of nine (9) nodes. 39 | 40 | 41 | Though not abolsulutely necessary, we also need a shared storage solution to demonstrate the concept of network mounted volumes (todo - ????) . We *can* use the Load balancer nodes to setup NFS and make it Highly available. 42 | 43 | **Note about NodePort:** 44 | In Kubernetes, `NodePort` is way to expose a service to the outside world - when you specify it in the `kubectl expose --type=NodePort` (todo - confirm / verify). What this does is that it allows you to expose a service on the worker node, just like you can do the port mapping on a simple docker host. This makes the service available through all the worker nodes, because node port works it's magic by setting up certain iptables rules on all worker nodes. So no matter which port you land on (from outside the cluster), the service responds to you. However, to do this, you need to have the external DNS pointing to the worker nodes. e.g. www.example.com can have two A type addresses in DNS pointing to the IP addresses of two worker nodes. In case a node fails, DNS will do it's own DNS round robin. This may work, but DNS round robin does not work well with sticky sessions. Also, managing services through ports is kind of difficult to manage / keep track of. Plus it is totally uncool! (todo: editor can remove this if unhappy :( ). 45 | 46 | 47 | ## The choice of HA technology for etcd, controllers and load balancers: 48 | 49 | We selected three nodes for the etcd service. First, because it stores the meta data about our cluster, which is most important. Secondly etcd nodes can form their own individual/independent cluster. To satisfy the needs of a quorum, we decided to use three nodes for etcd cluster. 
This means failure of one node does not affect the etcd operation. (todo: rewrite:) In order for etcd cluster to be healthy, it needs to have quorum. i.e. alive nodes being more than half of the size of the total number of nodes of etcd cluter. So having a total of two nodes is not helpful. When one fails, the etcd custer will instantly become unhealthy. This is what we do not want to happen in a production cluster. 50 | 51 | We know that all Kubernetes components watch the API server, running on controller node. So making sure that control plane remains available at all times. Unfortunately, controller nodes are not cluster aware. So we need to provide some sort of HA for them. This high availability can be provided in two ways. 52 | 53 | * By setting up a proxy/load balancer (using HAProxy on our load balancer). This would mean that the load balancer has to be the first component to come alive when the cluster boots up. So cluster has dependency on the load balancer nodes. This is easier to setup but has dependency problem. 54 | * By setting up IPVS/LVS (using heartbeat, etc) on the controller nodes themselves, and only have a floating IP/VIP on the two nodes. This way we do not have to wait for the load balancer to come up before the rest of the cluster. We then use this *Controller VIP* in the worker nodes to point to controller node. This involves installing and configuring some HA software on controller nodes, but does not have dependency problems. 55 | 56 | We believe the second method holds more value, so we go for it. Although just for completeness sake, we will also show a method to do it through HAProxy. 57 | 58 | (todo: Load balancers use: (using IPVS/LVS + Heartbeat)) 59 | (todo: In case we use IPVS, would we need additional STONITH device to prevent split brain syndrome?) 60 | 61 | 62 | So now we know all about HA! Lets talk about Kubernetes networking as well! 63 | 64 | 65 | ## Kubernetes Networking: 66 | A Kubernetes cluster uses three different types of networks. 67 | 68 | todo: Needs review 69 | 70 | 71 | There are three main networks / IP ranges involved in a Kubernetes cluster setup. They are: 72 | * Infrastructure network 73 | * Pod Network (Overlay or CNI) (We are using CNI network in this document.) 74 | * Service Network (aka. Cluster Network) 75 | 76 | Out of above three, the first two (Infrastructure and Pod networks) are straight forward. The *Infrastructure network* is where the actual physical (or virtual) machines are connected. The *Pod network* can be either an overlay network or a CNI based network. Normally overlay networks are software defined networks (SDN). Flannel is an example of overlay network. Overlay network has this big network spread over several cluster nodes, and each node will have a subnet of that overlay network configured in it. Normally only worker nodes have overlay network configured, and to get this done, a special service such as flanneld (in case it is flannel you are using) runs on the nodes. Then you configure docker to use this network to create containers on. Normally overlay network is only configured on worker nodes, and is only accessible within the worker nodes. If you want to access pod IPs (belonging to this overlay network) from other cluster nodes, you will need to run the overlay network service such as flannel on that node. Only then the pods become accessible from that node. 77 | 78 | In CNI networking, there is again a big (sort-of) overlay network. 
Each worker node obtains a subnet from the big (sort of overlay) network and configures a bridge (cbr0) using that subnet. Each worker node creates pods/containers in it's own subnet. These subnets *do not* talk to each other. To make them talk to each other we add routes of each subnet on the router (working as default gateway) connected to our cluster network. Now each request from any node in the cluster (or even from outside), can consult the routing table and reach the required pods. So there is no need of running an additional service, and the benfit is that the pod IPs (no matter which worker node the pods are on), are accessible from any node within the cluster. This is the network used and shown in the diagram at the beginning of this document. 79 | 80 | 81 | The third (Service network), is not actually a network. (It is, and it is not - at the same time; it is special / complicated). It is used when you decide to expose a kubernetes deployment as a service. That is the only time when this comes into play. When you expose a deployment as a service, Kubernetes assigns it a Cluster IP from the *service* IP range. This IP is then used by kube-proxy running on each worker node to write special IPTables rules so these services are accessible from the various pods running anywhere on the worker nodes. The Service IP or a Cluster IP is actually just an abstraction, and represents the pods at it's backend; which (the pods) actually belong to a *Deployment* (formerly known as *Replication Controller*). 82 | 83 | ## How does the networking look like in Kubernetes? 84 | 85 | So when you setup a Kubernetes cluster, and start up your first pod, the pod is deployed on a worker node. The pod gets an IP from the pod network, which can be a type of overlay network, or it can be a type of CNI based network. Either way, when the pod gets an IP from the pod network, the worker nodes are the only *nodes* which can access that pod. Of-course if you run more pods, they (the pods) can definitely access each other directly. To access these nodes from outside the cluster, you still need a bit of work. You know that every pod is actually part of a *Deployment* object (formerly *RC* or *Replication Controller*). So you expose this deployment as a service, which automatically assigns it a *Cluster IP*. This Cluster IP is still not accessible from outside the cluster. This new *service* can now be referenced/accessed by the pods using a special name formatted as a DNS name. While exposing the deployment as a service, you also assign it an *External IP*, which normally belongs to the infrastructure network. Although this external IP still does no good directly. It is barely used as a label. If you try to access this external IP from anywhere in the cluster, or from anywhere on the infrastructure, it will not be accessible. 86 | 87 | 88 | * Infrastructure Network: The network your physical (or virtual) machines are connected to. Normally your production network, or a part of it. 89 | * Service Network: The (completely) virtual (rather fictional) network, which is used to assign IP addresses to Kubernetes Services, which you will be creating. (A Service is a frontend to a RC or a Deployment). It must be noted that IP from this network are **never** assigned to any of the interfaces of any of the nodes/VMs, etc. These (Service IPs) are used behind the scenes by kube-proxy to create (weird) iptables rules on the worker nodes. 90 | * Pod Network: This is the network, which is used by the pods. 
However it is not a simple network either, depending on what kubernetes network solution you are employing. If you are using flannel, then this would be a large software defined overlay network, and each worker node will get a subnet of this network and configured for it's docker0 interface (in very simple words, there is a little more to it). If you are employing CIDR network, using CNI, then it would be a large network called **cluster-cidr** , with small subnets corresponding to your worker nodes. The routing table of the router handling your part of infrastructure network will need to be updated with routes to these small subnets. This proved to be a challenge on AWS VPC router, but this is piece of cake on a simple/generic router in your network. I will be doing it on my work computer, and setting up routes on Linux is a very simple task. 91 | 92 | Kelsey used the following three networks in his guide, and we intend to use the same ones, so people are not confused in different IP schemes when they are following this book and at the same time checking his guide. Below are the three networks , which we will use in this book. 93 | 94 | * Infrastructure network: 10.240.0.0/24 95 | * Service Network: 10.32.0.0/24 96 | * Pod Network (Cluster CIDR): 10.200.0.0/16 97 | 98 | 99 | # Infrastructure layout: 100 | 101 | Building upon the information we have gathered so far, especially about the Kubernetes networking, we have designed our cluster to look like this: 102 | 103 | ![images/Kubernetes-BareMetal-Cluster-setup.png](images/Kubernetes-BareMetal-Cluster-setup.png) 104 | 105 | 106 | # Other software components of the cluster: 107 | 108 | ## DNS: 109 | It is understood that all nodes in this cluster will have some hostname assigned to them. It is important to have consistent hostnames, and if there is a DNS server in your infrastructure, then it is also important what are the reverse lookup names of these nodes. This information is critical at the time when you will generate SSL certificates. 110 | 111 | The dns domainname we will use in this setup is `example.com` . Each node will have a hostname in the form of `*hostname*.example.com` . If you do not have a DNS setup yet, it is good time to set it up now. As we mentioned this earlier, we are using Libvirt/KVM to provide VMs for our example setup. It would be interesting for you to know that libvirt has build in DNS service called `dnsmasq`. Setting up dnsmasq service is very simple. The dnsmasq service uses the `/etc/hosts` file for name resolution. So we will populate our `/etc/hosts` file with the necessary hostnames and corresponding IP addresses and restart the dnsmasq service. That way, any nodes (including our local work computer), which use dnsmasq, will resolve the example.com related hostnames correctly. 112 | 113 | ## Operating System: 114 | We are using Fedora 24 64 bit server edition - on all nodes (Download from [here](https://getfedora.org/en/server/download/) ). You can use a Linux distribution of your choice. 115 | For poeple wanting to use Fedora Atomic, we would like to issue a warning. Fedora Atomic, (great distribtion), is a collection of binaries (etcd, kubernetes) bundled together (in a read only filesystem), and individual packages *cannot* be updated. There is no yum/dnf, etc. 
In this book we are using Kubernetes 1.3 (the latest version), which is still not part of Fedora Atomic 24, so we decided to use Fedora 24 server edition (in minimal configuration), and added the packages we need directly from their official websites. 116 | 117 | ## Supporting software needed for this setup: 118 | * Kubernetes - 1.3.0 or later (Download latest from Kubernetes website) 119 | * etcd - 2.2.5 or later (The one that comes with Fedora is good enough) 120 | * Docker - 1.11.2 or later (Download latest from Docker website) 121 | * CNI networking [https://github.com/containernetworking/cni](https://github.com/containernetworking/cni) 122 | * Linux IPVS, heartbeat / pacemaker (todo) 123 | 124 | 125 | ## Expectations 126 | 127 | With the infrastructure choices made above, we have hope to have the following working on our Kubernetes cluster. 128 | 129 | * 3 x etcd nodes (in H/A configuration) 130 | * 2 x Kubernetes controller nodes (in H/A configuration) 131 | * 2 x Kubernetes worker nodes 132 | * SSL based communication between all Kubernetes components 133 | * Internal Cluster DNS (SkyDNS) - as cluster addon 134 | * Default Service accounts and Secrets 135 | * Load Balancer (in H/A configuration) 136 | 137 | # Summary: 138 | In this chapter, we designed our infrastructure. Hurray! 139 | -------------------------------------------------------------------------------- /chapter03.md: -------------------------------------------------------------------------------- 1 | # Infrastructure Setup 2 | 3 | * Here we provision our machines, and also setup networking. 4 | 5 | Note that I am doing this provisioning on my work computer, which is Fedora 23 64 bit, and I will use the built in Libvirt/KVM for virtualization. You can use any other virtualization software, or use real hardware! 6 | 7 | First, setting up the new infrastructure network in KVM. 8 | 9 | If you use libvirt, then you probably know that libvirt sets up a virtual network `192.168.124.0/24` and sets it up as default. We wanted to be Kelsey's guide as possible (todo: editor, should we mention that?), so my infrastructure network is going to be `10.240.0.0/24` . I will just create a new virtual network (10.240.0.0/24) on my work computer. 10 | 11 | ## Setup new virtual network in KVM: 12 | 13 | Start Virtual Machine Manager and go to "Edit"->"Connection Details"->"Virtual Networks" . Then follow the steps shown below to create a new virtual network and name it **Kubernetes**. Note that this is a NAT network, connected to any/all physical devices on my computer. So whether I am connected to wired network, or wireless, it will work. 14 | 15 | ![images/libvirt-new-virtual-network-1.png](images/libvirt-new-virtual-network-1.png) 16 | ![images/libvirt-new-virtual-network-2.png](images/libvirt-new-virtual-network-2.png) 17 | ![images/libvirt-new-virtual-network-3.png](images/libvirt-new-virtual-network-3.png) 18 | ![images/libvirt-new-virtual-network-4.png](images/libvirt-new-virtual-network-4.png) 19 | ![images/libvirt-new-virtual-network-5.png](images/libvirt-new-virtual-network-5.png) 20 | ![images/libvirt-new-virtual-network-6.png](images/libvirt-new-virtual-network-6.png) 21 | 22 | The wizard will create an internal DNS setup (automatically) for example.com . 23 | 24 | Now, we have the network out of the way, lets decide upon the size of these virtual machines, and what IPs will be assigned to them. Then, at the time of VM creation, we will attach them (VMs) to this new virtual network. 
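As an aside: if you prefer the command line over the virt-manager wizard, the same NAT network can be defined with `virsh`. Treat the XML below as a sketch — the bridge name is an assumption, and no DHCP range is defined because this guide assigns static IPs to all VMs:

```
cat > kubernetes-net.xml <<EOF
<network>
  <name>Kubernetes</name>
  <forward mode='nat'/>
  <bridge name='virbr1' stp='on' delay='0'/>
  <domain name='example.com'/>
  <ip address='10.240.0.1' netmask='255.255.255.0'/>
</network>
EOF

virsh net-define kubernetes-net.xml
virsh net-start Kubernetes
virsh net-autostart Kubernetes
```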
25 | 26 | 27 | ## IP addresses and VM provisioning: 28 | 29 | Here IP addresses of VMs we are about to create: 30 | 31 | * etcd1 10.240.0.11/24 32 | * etcd2 10.240.0.12/24 33 | * etcd3 10.240.0.13/24 34 | * controller1 10.240.0.21/24 35 | * controller2 10.240.0.22/24 36 | * worker1 10.240.0.31/24 37 | * worker2 10.240.0.32/24 38 | * lb1 10.240.0.41/24 39 | * lb2 10.240.0.42/24 40 | 41 | **Notes:** 42 | * There will be (additional) floating IP/VIP for controllers, which will be: `10.240.0.20` 43 | * There will be (additional) floating IP/VIP for load balancers, which will be: `10.240.0.40` 44 | * If you decide to use HAProxy to provide HA for controller nodes, then you can use the the load balancer's VIP (for port 6443), instead of having a dedicated (floating/V) IP for control plane. 45 | 46 | 47 | **More Notes:** 48 | * Kelsey's Kubernetes guide (the one this book uses as a reference), starts the node numbering from 0. We start them from 1 for ease of understanding. 49 | * The FQDN of each host is `*hostname*.example.com` 50 | * The nodes have only one user, **root** ; with a password: **redhat** . 51 | * I used libvirt's GUI interface (virt-manager) to create these VMs, but you can automate this by using CLI commands. 52 | 53 | 54 | ## Screenshots from actual installation 55 | 56 | ![images/libvirt-new-vm-01.png](images/libvirt-new-vm-01.png) 57 | ![images/libvirt-new-vm-02.png](images/libvirt-new-vm-02.png) 58 | ![images/libvirt-new-vm-03.png](images/libvirt-new-vm-03.png) 59 | ![images/libvirt-new-vm-04.png](images/libvirt-new-vm-04.png) 60 | ![images/libvirt-new-vm-05.png](images/libvirt-new-vm-05.png) 61 | ![images/libvirt-new-vm-06.png](images/libvirt-new-vm-06.png) 62 | ![images/libvirt-new-vm-07.png](images/libvirt-new-vm-07.png) 63 | ![images/libvirt-new-vm-08.png](images/libvirt-new-vm-08.png) 64 | ![images/libvirt-new-vm-09.png](images/libvirt-new-vm-09.png) 65 | 66 | **Notes:** 67 | * One of the installation screen shows OS as Fedora 22 (Step 5 of 5); but it is actually Fedora 24. Libvirt is not updated yet to recognize Fedora 24 ISO images. 68 | * The last screenshot is from the installation of second etcd node (etcd2). (todo: may be we can get a new screenshot?) 69 | 70 | 71 | ## Actual resource utilization from a running Kubernetes cluster: 72 | To give you an idea about how much RAM (and othere resources) are actually used by each type of node, we have provided some details from the nodes of a similar Kubernetes cluster. It should help you size your VMs accordingly. Though for production setups, you definitely want more resources for each component. 73 | 74 | 75 | ### etcd: 76 | Looks like etcd uses very little RAM! (about 88 MB!). I already gave this VM the minimum of 512 MB of RAM. 77 | 78 | ``` 79 | [root@etcd1 ~]# ps aux | grep etcd 80 | root 660 0.2 9.2 10569580 46508 ? 
Ssl Sep14 16:31 /usr/bin/etcd --name etcd1 --cert-file=/etc/etcd/kubernetes.pem --key-file=/etc/etcd/kubernetes-key.pem --peer-cert-file=/etc/etcd/kubernetes.pem --peer-key-file=/etc/etcd/kubernetes-key.pem --trusted-ca-file=/etc/etcd/ca.pem --peer-trusted-ca-file=/etc/etcd/ca.pem --initial-advertise-peer-urls https://10.240.0.11:2380 --listen-peer-urls https://10.240.0.11:2380 --listen-client-urls https://10.240.0.11:2379,http://127.0.0.1:2379 --advertise-client-urls https://10.240.0.11:2379 --initial-cluster-token etcd-cluster-0 --initial-cluster etcd1=https://10.240.0.11:2380,etcd2=https://10.240.0.12:2380 --initial-cluster-state new --data-dir=/var/lib/etcd 81 | [root@etcd1 ~]# 82 | 83 | 84 | [root@etcd1 ~]# free -m 85 | total used free shared buff/cache available 86 | Mem: 488 88 122 0 278 359 87 | Swap: 511 7 504 88 | [root@etcd1 ~]# 89 | ``` 90 | 91 | ### Controller (aka master): 92 | Looks like contoller nodes use only 167 MB RAM, and can run on 512 MB of RAM and will still function properly. Of-course the larger your cluster becomes and the more pods you start to create, this may quickly become insufficent. (todo: not tested though!) 93 | ``` 94 | [root@controller1 ~]# ps aux | grep kube 95 | root 8251 0.6 11.4 147236 116540 ? Ssl 09:12 0:42 /usr/bin/kube-apiserver --admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota --advertise-address=10.240.0.21 --allow-privileged=true --apiserver-count=2 --authorization-mode=ABAC --authorization-policy-file=/var/lib/kubernetes/authorization-policy.jsonl --bind-address=0.0.0.0 --enable-swagger-ui=true --etcd-cafile=/var/lib/kubernetes/ca.pem --insecure-bind-address=0.0.0.0 --kubelet-certificate-authority=/var/lib/kubernetes/ca.pem --etcd-servers=https://10.240.0.11:2379,https://10.240.0.12:2379 --service-account-key-file=/var/lib/kubernetes/kubernetes-key.pem --service-cluster-ip-range=10.32.0.0/24 --service-node-port-range=30000-32767 --tls-cert-file=/var/lib/kubernetes/kubernetes.pem --tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem --token-auth-file=/var/lib/kubernetes/token.csv --v=2 96 | 97 | root 8292 0.2 5.1 80756 51988 ? Ssl 09:12 0:15 /usr/bin/kube-controller-manager --allocate-node-cidrs=true --cluster-cidr=10.200.0.0/16 --cluster-name=kubernetes --leader-elect=true --master=http://10.240.0.21:8080 --root-ca-file=/var/lib/kubernetes/ca.pem --service-account-private-key-file=/var/lib/kubernetes/kubernetes-key.pem --service-cluster-ip-range=10.32.0.0/24 --v=2 98 | 99 | root 8321 0.0 2.9 46844 29844 ? Ssl 09:12 0:04 /usr/bin/kube-scheduler --leader-elect=true --master=http://10.240.0.21:8080 --v=2 100 | [root@controller1 ~]# 101 | 102 | 103 | [root@controller1 ~]# free -m 104 | total used free shared buff/cache available 105 | Mem: 992 167 99 0 726 644 106 | Swap: 511 0 511 107 | [root@controller1 ~]# 108 | 109 | ``` 110 | 111 | 112 | ### Worker: 113 | The worker nodes need the most amount of RAM, because these will run your containers. Even though it shows only 168 MB of utilization, this will quickly be used up as soon as there are few pods running on this cluster. Worker nodes will be the beefiest nodes of your cluster. 114 | 115 | ``` 116 | [root@worker1 ~]# ps aux | grep kube 117 | root 13743 0.0 1.8 43744 28200 ? Ssl Sep16 0:15 /kube-dns --domain=cluster.local --dns-port=10053 118 | root 13942 0.0 0.4 14124 7320 ? 
Ssl Sep16 0:07 /exechealthz -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null && nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null -port=8080 -quiet 119 | root 22925 0.0 0.0 117148 980 pts/0 S+ 11:10 0:00 grep --color=auto kube 120 | root 27240 0.5 4.0 401936 61372 ? Ssl 09:14 0:36 /usr/bin/kubelet --allow-privileged=true --api-servers=https://10.240.0.21:6443,https://10.240.0.22:6443 --cloud-provider= --cluster-dns=10.32.0.10 --cluster-domain=cluster.local --configure-cbr0=true --container-runtime=docker --docker=unix:///var/run/docker.sock --network-plugin=kubenet --kubeconfig=/var/lib/kubelet/kubeconfig --reconcile-cidr=true --serialize-image-pulls=false --tls-cert-file=/var/lib/kubernetes/kubernetes.pem --tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem --v=2 121 | root 27314 0.7 1.8 41536 28072 ? Ssl 09:14 0:50 /usr/bin/kube-proxy --master=https://10.240.0.21:6443 --kubeconfig=/var/lib/kubelet/kubeconfig --proxy-mode=iptables --v=2 122 | [root@worker1 ~]# 123 | 124 | 125 | [root@worker1 ~]# free -m 126 | total used free shared buff/cache available 127 | Mem: 1496 168 139 0 1188 1104 128 | Swap: 1023 0 1023 129 | [root@worker1 ~]# 130 | ``` 131 | 132 | ## Prepare the OS on each node: 133 | 134 | We need a way to access the cluster in a uniform fashion, so it is recommended to updated your `/etc/hosts` file on your work computer, as per the design of your cluster: 135 | 136 | ``` 137 | [kamran@kworkhorse ~]$ sudo vi /etc/hosts 138 | 127.0.0.1 localhost.localdomain localhost 139 | 10.240.0.11 etcd1.example.com etcd1 140 | 10.240.0.12 etcd2.example.com etcd2 141 | 10.240.0.13 etcd3.example.com etcd3 142 | 10.240.0.20 controller.example.com controller controller-vip 143 | 10.240.0.21 controller1.example.com controller1 144 | 10.240.0.22 controller2.example.com controller2 145 | 10.240.0.31 worker1.example.com worker1 146 | 10.240.0.32 worker2.example.com worker2 147 | 10.240.0.40 lb.example.com lb lb-vip 148 | 10.240.0.41 lb1.example.com lb1 149 | 10.240.0.42 lb2.example.com lb2 150 | ``` 151 | 152 | We will copy this file to all nodes in just a moment. First, we create RSA-2 keypair for SSH connections and copy our key to all the nodes. This way, we can ssh into them without requiring a password. 153 | 154 | If you do not have a RSA keypair generated already you can do that by using the following command on your work computer: 155 | 156 | ``` 157 | ssh-keygen -t rsa 158 | ``` 159 | **Note:** It is recommended that you have a passphrase assigned to our key. The key is useless without a passphrase if stolen. So having a passphrase protected key is always a good idea. 160 | 161 | 162 | Assuming you already have a rsa keypair generated, the command to copy the public part of the keypair to the nodes will be: 163 | 164 | ``` 165 | ssh-copy-id root@ 166 | ``` 167 | 168 | Sample run: 169 | ``` 170 | [kamran@kworkhorse ~]$ ssh-copy-id root@etcd1 171 | The authenticity of host 'etcd1 (10.240.0.11)' can't be established. 172 | ECDSA key fingerprint is SHA256:FUMy5JNZnaLXhkW3Y0/WlXzQQrjU5IZ8LMOcgBTOiLU. 173 | ECDSA key fingerprint is MD5:5e:9b:2d:ae:8e:16:7a:ee:ca:de:de:da:9a:04:19:8b. 174 | Are you sure you want to continue connecting (yes/no)? 
yes
175 | /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
176 | /usr/bin/ssh-copy-id: INFO: 2 key(s) remain to be installed -- if you are prompted now it is to install the new keys
177 | root@etcd1's password: 
178 | 
179 | Number of key(s) added: 2
180 | 
181 | Now try logging into the machine, with: "ssh 'root@etcd1'"
182 | and check to make sure that only the key(s) you wanted were added.
183 | 
184 | [kamran@kworkhorse ~]$
185 | ```
186 | **Note:** You cannot run a loop for copying the keys, because each node asks you to confirm its host key fingerprint, and also asks for the root password on the first connection. So this is a manual step!
187 | 
188 | 
189 | After setting up your keys in all the nodes, you should be able to execute commands on the nodes using ssh:
190 | 
191 | ```
192 | [kamran@kworkhorse ~]$ ssh root@etcd1 uptime
193 | 13:16:27 up 1:29, 1 user, load average: 0.08, 0.03, 0.04
194 | [kamran@kworkhorse ~]$
195 | ```
196 | 
197 | 
198 | Now, copy the `/etc/hosts` file to all nodes:
199 | 
200 | ```
201 | for node in etcd{1,2,3} controller{1,2} worker{1,2} lb{1,2} ; do scp /etc/hosts root@${node}:/etc/hosts ; done
202 | ```
203 | 
204 | 
205 | After all VMs are created, we update the OS on them using `yum -y update`, disable the firewalld service, disable SELinux in the `/etc/selinux/config` file, and reboot all nodes for these changes to take effect.
206 | 
207 | 
208 | 
209 | Disable the firewall on all nodes:
210 | 
211 | Note: For some strange reason, disabling the `firewalld` service did not work. I had to actually remove the `firewalld` package from all of the nodes.
212 | ```
213 | for node in etcd{1,2,3} controller{1,2} worker{1,2} lb{1,2} ; do ssh root@${node} "yum -y remove firewalld" ; done
214 | ```
215 | 
216 | 
217 | Disable SELinux on all nodes:
218 | 
219 | ```
220 | for node in etcd{1,2,3} controller{1,2} worker{1,2} lb{1,2} ; do ssh root@${node} "echo 'SELINUX=disabled' > /etc/selinux/config" ; done
221 | ```
222 | **Note:** Setting the `/etc/selinux/config` file to only contain a single line saying `SELINUX=disabled` is enough to disable SELinux at the next system boot.
223 | 
224 | OS update on all nodes, and reboot:
225 | ```
226 | for node in etcd{1,2,3} controller{1,2} worker{1,2} lb{1,2} ; do ssh root@${node} "yum -y update && reboot" ; done
227 | ```
228 | 
229 | After all nodes are rebooted, verify that SELinux is disabled:
230 | 
231 | ```
232 | for i in etcd{1,2,3} controller{1,2} worker{1,2} lb{1,2} ; do ssh root@${i} "hostname; getenforce" ; done
233 | ```
234 | 
235 | Expected output from the above command:
236 | ```
237 | [kamran@kworkhorse ~]$ for i in etcd{1,2,3} controller{1,2} worker{1,2} lb{1,2} ; do ssh root@${i} "hostname; getenforce"; done
238 | etcd1.example.com
239 | Disabled
240 | etcd2.example.com
241 | Disabled
242 | etcd3.example.com
243 | Disabled
244 | controller1.example.com
245 | Disabled
246 | controller2.example.com
247 | Disabled
248 | worker1.example.com
249 | Disabled
250 | worker2.example.com
251 | Disabled
252 | lb1.example.com
253 | Disabled
254 | lb2.example.com
255 | Disabled
256 | [kamran@kworkhorse ~]$
257 | ```
258 | 
259 | 
260 | # Conclusion:
261 | In this chapter, we provisioned a fresh network in libvirt, and also provisioned our VMs. In the next chapter we are going to create the SSL certificates which will be used by various components of the cluster.
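
Before moving on to the certificates, it can save time to run one combined sanity check across all nodes. This is only a convenience sketch that reuses the node list and the passwordless SSH access set up above; it is not a required step:

```
# Sketch: confirm hostname, SELinux state, and absence of the firewalld package on every node.
for node in etcd{1,2,3} controller{1,2} worker{1,2} lb{1,2} ; do
  echo "=== ${node} ==="
  ssh root@${node} "hostname ; getenforce ; rpm -q firewalld || echo 'firewalld package not present (good)'"
done
```

Every node should print its FQDN, `Disabled`, and a message confirming that the firewalld package is gone.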
262 | 
263 | 
--------------------------------------------------------------------------------
/chapter04.md:
--------------------------------------------------------------------------------
1 | # Chapter 04: SSL Certificates
2 | 
3 | We have our nodes provisioned, updated, and SSH keys synced; everything is in place. Now the next logical step is to configure Kubernetes software components on the nodes. However, we need SSL certificates first. So in this chapter, we generate SSL certificates.
4 | 
5 | # Configure / setup TLS certificates for the cluster:
6 | 
7 | Reference: [https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/02-certificate-authority.md](https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/02-certificate-authority.md)
8 | 
9 | 
10 | Before we start configuring various services on the nodes, we need to create the SSL/TLS certificates, which will be used by the Kubernetes components. Here I will set up a single certificate, but in production you are advised to create individual certificates for each component/service. We need to secure the following Kubernetes components:
11 | 
12 | * etcd
13 | * Kubernetes API Server
14 | * Kubernetes Kubelet
15 | 
16 | 
17 | We will use CFSSL to create these certificates.
18 | 
19 | Linux:
20 | ```
21 | wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
22 | chmod +x cfssl_linux-amd64
23 | sudo mv cfssl_linux-amd64 /usr/local/bin/cfssl
24 | 
25 | wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
26 | chmod +x cfssljson_linux-amd64
27 | sudo mv cfssljson_linux-amd64 /usr/local/bin/cfssljson
28 | ```
29 | 
30 | ## Create a Certificate Authority
31 | 
32 | ### Create CA CSR config file:
33 | 
34 | ```
35 | echo '{
36 |   "signing": {
37 |     "default": {
38 |       "expiry": "8760h"
39 |     },
40 |     "profiles": {
41 |       "kubernetes": {
42 |         "usages": ["signing", "key encipherment", "server auth", "client auth"],
43 |         "expiry": "8760h"
44 |       }
45 |     }
46 |   }
47 | }' > ca-config.json
48 | ```
49 | 
50 | ### Generate CA certificate and CA private key:
51 | 
52 | First, create a CSR (Certificate Signing Request) for CA:
53 | 
54 | ```
55 | echo '{
56 |   "CN": "Kubernetes",
57 |   "key": {
58 |     "algo": "rsa",
59 |     "size": 2048
60 |   },
61 |   "names": [
62 |     {
63 |       "C": "NO",
64 |       "L": "Oslo",
65 |       "O": "Kubernetes",
66 |       "OU": "CA",
67 |       "ST": "Oslo"
68 |     }
69 |   ]
70 | }' > ca-csr.json
71 | ```
72 | 
73 | 
74 | Now, generate the CA certificate and its private key:
75 | 
76 | ```
77 | cfssl gencert -initca ca-csr.json | cfssljson -bare ca
78 | ```
79 | 
80 | ```
81 | [kamran@kworkhorse certs-baremetal]$ cfssl gencert -initca ca-csr.json | cfssljson -bare ca
82 | 2016/09/08 11:32:54 [INFO] generating a new CA key and certificate from CSR
83 | 2016/09/08 11:32:54 [INFO] generate received request
84 | 2016/09/08 11:32:54 [INFO] received CSR
85 | 2016/09/08 11:32:54 [INFO] generating key: rsa-2048
86 | 2016/09/08 11:32:54 [INFO] encoded CSR
87 | 2016/09/08 11:32:54 [INFO] signed certificate with serial number 161389974620705926236327234344288710670396137404
88 | [kamran@kworkhorse certs-baremetal]$
89 | ```
90 | 
91 | This should give you the following files:
92 | 
93 | ```
94 | ca.pem
95 | ca-key.pem
96 | ca.csr
97 | ```
98 | 
99 | In the list of generated files above, **ca.pem** is your CA certificate, **ca-key.pem** is the CA certificate's private key, and **ca.csr** is the certificate signing request for this certificate.
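
If you ever wonder whether `ca-key.pem` really belongs to `ca.pem` (for example after copying files around), comparing the public key moduli is a quick, optional check:

```
# Both commands must print the same hash if ca-key.pem is the key behind ca.pem.
openssl x509 -in ca.pem -noout -modulus | openssl md5
openssl rsa -in ca-key.pem -noout -modulus | openssl md5
```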
100 | 101 | 102 | You can verify that you have a certificate, by using the command below: 103 | 104 | ``` 105 | openssl x509 -in ca.pem -text -noout 106 | ``` 107 | 108 | It should give you the output similar to what is shown below: 109 | 110 | ``` 111 | [kamran@kworkhorse certs-baremetal]$ openssl x509 -in ca.pem -text -noout 112 | Certificate: 113 | Data: 114 | Version: 3 (0x2) 115 | Serial Number: 116 | 1c:44:fa:0c:9d:6f:5b:66:03:cc:ac:f7:fe:b0:be:65:ab:73:9f:bc 117 | Signature Algorithm: sha256WithRSAEncryption 118 | Issuer: C=NO, ST=Oslo, L=Oslo, O=Kubernetes, OU=CA, CN=Kubernetes 119 | Validity 120 | Not Before: Sep 8 09:28:00 2016 GMT 121 | Not After : Sep 7 09:28:00 2021 GMT 122 | Subject: C=NO, ST=Oslo, L=Oslo, O=Kubernetes, OU=CA, CN=Kubernetes 123 | Subject Public Key Info: 124 | Public Key Algorithm: rsaEncryption 125 | Public-Key: (2048 bit) 126 | Modulus: 127 | 00:c4:60:18:aa:dd:71:98:00:79:63:ee:31:82:11: 128 | db:26:fb:f1:74:47:7b:85:f4:b0:cf:b2:d7:ce:59: 129 | 26:b6:f0:01:ea:4a:b1:a0:53:ae:45:51:1c:2a:98: 130 | 55:00:a5:1c:07:6b:96:f9:26:84:6e:0e:23:20:07: 131 | 85:6a:3c:a7:9c:be:f1:b6:95:d9:6a:68:be:70:7d: 132 | 6b:31:c6:78:80:78:27:ed:77:f2:ef:71:3b:6b:2d: 133 | 66:5f:ce:71:46:16:0f:b9:e7:55:a6:e3:03:75:c4: 134 | 17:59:7d:61:b1:84:19:06:8d:90:0d:d9:cb:ee:72: 135 | cd:a2:7f:4e:ed:37:53:fc:cc:e4:12:b8:49:ad:bf: 136 | f2:0f:79:60:ea:08:9b:ed:9c:65:f8:9b:8a:81:b5: 137 | cc:1e:24:bd:9c:a9:fe:68:fa:49:73:cf:b4:aa:69: 138 | 1c:b1:e3:6b:a5:67:89:15:e8:e1:69:af:f9:b4:4b: 139 | c1:b8:33:fe:82:54:a7:fd:24:3b:18:3d:91:98:7a: 140 | e5:40:0d:1a:d2:4e:1c:38:12:c4:b9:8a:7e:54:8e: 141 | fe:b2:93:01:be:99:aa:18:5c:50:24:68:03:87:ec: 142 | 58:35:08:94:5b:b4:00:db:58:0d:e9:0f:5e:80:66: 143 | c7:8b:24:bd:4b:6d:31:9c:6f:b3:a2:0c:20:bb:3b: 144 | da:b1 145 | Exponent: 65537 (0x10001) 146 | X509v3 extensions: 147 | X509v3 Key Usage: critical 148 | Certificate Sign, CRL Sign 149 | X509v3 Basic Constraints: critical 150 | CA:TRUE, pathlen:2 151 | X509v3 Subject Key Identifier: 152 | 9F:0F:21:A2:F0:F1:FF:C9:19:BE:5F:4C:30:73:FD:9C:A6:C1:A0:3C 153 | X509v3 Authority Key Identifier: 154 | keyid:9F:0F:21:A2:F0:F1:FF:C9:19:BE:5F:4C:30:73:FD:9C:A6:C1:A0:3C 155 | 156 | Signature Algorithm: sha256WithRSAEncryption 157 | 0b:e0:60:9d:5c:3e:95:50:aa:6d:56:2b:83:90:83:fe:81:34: 158 | f2:64:e1:2d:56:13:9a:ec:13:cb:d0:fc:2f:82:3e:24:86:25: 159 | 73:5a:79:d3:07:76:4e:0b:2e:7c:56:7e:82:e1:6e:8f:89:94: 160 | 61:5d:20:76:31:4c:a6:f0:ad:bc:73:49:d9:81:9c:1f:6f:ad: 161 | ea:fd:8c:4a:c5:9c:f9:77:0a:76:c3:b7:b4:b7:dc:d4:4d:3c: 162 | 5a:47:d6:d7:fa:07:30:34:3b:f4:4c:59:1f:4e:15:e8:11:b6: 163 | b6:83:61:28:a9:86:70:f9:72:cd:91:2d:c3:d6:87:37:83:04: 164 | 74:e2:ff:67:3d:ef:bf:3b:67:88:a9:64:2b:41:72:d5:34:e5: 165 | 93:52:2e:4a:d5:6b:8d:8c:b3:66:fa:32:18:e0:5f:9e:f1:68: 166 | dc:51:81:52:dc:bc:8f:01:b5:22:92:d5:5e:1c:1c:f0:a3:ab: 167 | a8:c5:9d:84:60:80:e4:82:52:09:1a:1c:8d:1b:af:f9:a5:66: 168 | 06:9a:fe:f4:b1:5f:6e:51:de:49:1f:07:eb:05:3f:f1:39:cc: 169 | 29:aa:67:b0:e6:4a:6a:dd:14:6f:41:8d:67:f7:4b:55:99:49: 170 | 3c:4f:56:5e:a5:dd:6c:7b:2c:23:32:ee:a1:d2:0a:d4:dd:b7: 171 | 28:86:b4:42 172 | [kamran@kworkhorse certs-baremetal]$ 173 | ``` 174 | 175 | ## Generate the single Kubernetes TLS certificate: 176 | **Reminder:** We will generate a TLS certificate that will be valid for all Kubernetes components. This is being done for ease of use. In production you should strongly consider generating individual TLS certificates for each component. 
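
If you later decide to follow that advice, a per-component CSR looks just like the combined one shown further below, only with its own CN and a narrower host list. The `etcd-csr.json` example here is purely illustrative and is not used anywhere else in this guide:

```
# Illustrative only: a dedicated CSR for the etcd members, signed by the same CA.
echo '{
  "CN": "etcd.example.com",
  "hosts": [
    "etcd1", "etcd2",
    "etcd1.example.com", "etcd2.example.com",
    "10.240.0.11", "10.240.0.12", "127.0.0.1"
  ],
  "key": { "algo": "rsa", "size": 2048 },
  "names": [
    { "C": "NO", "L": "Oslo", "O": "Kubernetes", "OU": "Cluster", "ST": "Oslo" }
  ]
}' > etcd-csr.json

# Sign it with the same CA and the same "kubernetes" profile defined in ca-config.json:
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes etcd-csr.json | cfssljson -bare etcd
```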
177 | 
178 | 
179 | 
180 | We should also set up an environment variable named `KUBERNETES_PUBLIC_IP_ADDRESS` with the value `10.240.0.20`. This will be handy in the next step.
181 | 
182 | The above explanation is not entirely accurate, so let me elaborate.
183 | 
184 | 
185 | We need to set `KUBERNETES_PUBLIC_IP_ADDRESS` to the IP through which we are going to access Kubernetes *from the internet*. The CSR file below already has the VIP of the controller nodes listed. If you are setting up this bare-metal cluster for your organization, there might be a public IP which you use to access the corporate services running in your DMZ. Maybe you can use one of those public IPs, or maybe you can acquire a new public IP and assign it to the public-facing interface of your edge router. That public IP is what the `KUBERNETES_PUBLIC_IP_ADDRESS` variable is meant to hold.
186 | 
187 | 
188 | 
189 | ```
190 | export KUBERNETES_PUBLIC_IP_ADDRESS='10.240.0.20'
191 | ```
192 | 
193 | ### Create Kubernetes certificate CSR config file:
194 | 
195 | Be careful in creating this file. Make sure you use all the possible hostnames of the nodes you are generating this certificate for. This includes their FQDNs. When you set up node names like "nodename.example.com", you need to include them in the CSR config file below. Also add a few extra entries for worker nodes, as you might want to increase the number of worker nodes later in this setup. So even though I have only two worker nodes right now, I have added two extra in the certificate below, worker 3 and 4. The hostnames controller.example.com and kubernetes.example.com are supposed to point to the VIP (10.240.0.20) of the controller nodes. All of these have to go into the infrastructure DNS.
196 | 
197 | **Note:** Kelsey's guide set "CN" to be "kubernetes", whereas I set it to "*.example.com" . See: [https://cabforum.org/information-for-site-owners-and-administrators/](https://cabforum.org/information-for-site-owners-and-administrators/)
198 | 
199 | ```
200 | cat > kubernetes-csr.json <<EOF
201 | {
202 |   "CN": "*.example.com",
203 |   "hosts": [
204 |     "etcd1",
205 |     "etcd2",
206 |     "etcd1.example.com",
207 |     "etcd2.example.com",
208 |     "controller1",
209 |     "controller2",
210 |     "controller1.example.com",
211 |     "controller2.example.com",
212 |     "worker1",
213 |     "worker2",
214 |     "worker3",
215 |     "worker4",
216 |     "worker1.example.com",
217 |     "worker2.example.com",
218 |     "worker3.example.com",
219 |     "worker4.example.com",
220 |     "controller.example.com",
221 |     "kubernetes.example.com",
222 |     "localhost",
223 |     "10.32.0.1",
224 |     "10.240.0.11",
225 |     "10.240.0.12",
226 |     "10.240.0.21",
227 |     "10.240.0.22",
228 |     "10.240.0.31",
229 |     "10.240.0.32",
230 |     "10.240.0.33",
231 |     "10.240.0.34",
232 |     "${KUBERNETES_PUBLIC_IP_ADDRESS}",
233 |     "127.0.0.1"
234 |   ],
235 |   "key": {
236 |     "algo": "rsa",
237 |     "size": 2048
238 |   },
239 |   "names": [
240 |     {
241 |       "C": "NO",
242 |       "L": "Oslo",
243 |       "O": "Kubernetes",
244 |       "OU": "Cluster",
245 |       "ST": "Oslo"
246 |     }
247 |   ]
248 | }
249 | EOF
250 | 
278 | [kamran@kworkhorse certs-baremetal]$ cfssl gencert \
279 | > -ca=ca.pem \
280 | > -ca-key=ca-key.pem \
281 | > -config=ca-config.json \
282 | > -profile=kubernetes \
283 | > kubernetes-csr.json | cfssljson -bare kubernetes
284 | 2016/09/08 14:04:04 [INFO] generate received request
285 | 2016/09/08 14:04:04 [INFO] received CSR
286 | 2016/09/08 14:04:04 [INFO] generating key: rsa-2048
287 | 2016/09/08 14:04:04 [INFO] encoded CSR
288 | 2016/09/08 14:04:04 [INFO] signed certificate with serial number 448428141554905058774798041748928773753703785287
289 | 2016/09/08 14:04:04 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for
290 | websites. For more information see the Baseline Requirements for the Issuance and Management
291 | of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org);
292 | specifically, section 10.2.3 ("Information Requirements").
293 | [kamran@kworkhorse certs-baremetal]$ 294 | ``` 295 | 296 | After you execute the above code, you get the following additional files: 297 | 298 | ``` 299 | kubernetes-csr.json 300 | kubernetes-key.pem 301 | kubernetes.pem 302 | ``` 303 | 304 | Verify the contents of the generated certificate: 305 | 306 | ``` 307 | openssl x509 -in kubernetes.pem -text -noout 308 | ``` 309 | 310 | 311 | ``` 312 | [kamran@kworkhorse certs-baremetal]$ openssl x509 -in kubernetes.pem -text -noout 313 | Certificate: 314 | Data: 315 | Version: 3 (0x2) 316 | Serial Number: 317 | 72:f8:47:b0:9c:ff:4e:f1:4e:3a:0d:5c:e9:f9:77:e9:7d:85:fd:ae 318 | Signature Algorithm: sha256WithRSAEncryption 319 | Issuer: C=NO, ST=Oslo, L=Oslo, O=Kubernetes, OU=CA, CN=Kubernetes 320 | Validity 321 | Not Before: Sep 9 08:26:00 2016 GMT 322 | Not After : Sep 9 08:26:00 2017 GMT 323 | Subject: C=NO, ST=Oslo, L=Oslo, O=Kubernetes, OU=Cluster, CN=*.example.com 324 | Subject Public Key Info: 325 | Public Key Algorithm: rsaEncryption 326 | Public-Key: (2048 bit) 327 | Modulus: 328 | 00:e8:c4:01:e6:06:79:6b:b1:00:ec:7a:d4:c9:86: 329 | 77:f7:b2:e5:c6:e5:c8:6a:65:a1:89:d6:f6:66:09: 330 | 26:c3:9d:bd:39:2d:ee:eb:a8:88:d7:d9:85:3e:bf: 331 | 82:e0:34:83:68:70:33:6a:61:ae:c9:93:69:75:06: 332 | 57:da:a8:47:39:89:e1:a7:e8:72:27:89:46:6d:df: 333 | fe:ed:75:99:f5:74:f0:28:22:05:f5:ac:83:af:2e: 334 | e9:e0:79:0d:9b:a6:7e:71:78:90:b2:a0:14:54:92: 335 | 66:c1:16:e9:a2:9a:a8:4d:fb:ba:c3:22:d8:e1:f3: 336 | d5:38:97:08:2b:d5:ec:1f:ba:01:9f:02:e5:7e:c9: 337 | a2:a8:2d:b3:ba:33:ba:f0:61:da:ff:1a:e8:1f:61: 338 | f9:1b:42:eb:f8:be:52:bf:5e:56:7d:7e:85:f7:8b: 339 | 01:2f:e5:c9:56:53:af:b4:87:e8:44:e2:8f:09:bf: 340 | 6e:85:42:4d:cb:7a:f9:f4:03:85:3f:af:b7:2e:d5: 341 | 58:c0:1c:62:2b:fc:b8:b7:b7:b9:d3:d3:6f:82:19: 342 | 89:dc:df:d9:f3:43:13:e5:e0:04:f4:8d:ce:b0:98: 343 | 88:81:b5:96:bb:a2:cf:90:86:f4:16:6a:34:3d:c6: 344 | f7:a1:e1:2c:d4:3f:c0:b5:32:70:c1:77:2e:17:20: 345 | 7e:7b 346 | Exponent: 65537 (0x10001) 347 | X509v3 extensions: 348 | X509v3 Key Usage: critical 349 | Digital Signature, Key Encipherment 350 | X509v3 Extended Key Usage: 351 | TLS Web Server Authentication, TLS Web Client Authentication 352 | X509v3 Basic Constraints: critical 353 | CA:FALSE 354 | X509v3 Subject Key Identifier: 355 | A4:9B:A2:1A:F4:AF:71:A6:2F:C7:8B:BE:83:7B:A0:DB:D3:70:91:12 356 | X509v3 Authority Key Identifier: 357 | keyid:9F:0F:21:A2:F0:F1:FF:C9:19:BE:5F:4C:30:73:FD:9C:A6:C1:A0:3C 358 | 359 | X509v3 Subject Alternative Name: 360 | DNS:etcd1, DNS:etcd2, DNS:etcd1.example.com, DNS:etcd2.example.com, DNS:controller1, DNS:controller2, DNS:controller1.example.com, DNS:controller2.example.com, DNS:worker1, DNS:worker2, DNS:worker3, DNS:worker4, DNS:worker1.example.com, DNS:worker2.example.com, DNS:worker3.example.com, DNS:worker4.example.com, DNS:controller.example.com, DNS:kubernetes.example.com, DNS:localhost, IP Address:10.32.0.1, IP Address:10.240.0.11, IP Address:10.240.0.12, IP Address:10.240.0.21, IP Address:10.240.0.22, IP Address:10.240.0.31, IP Address:10.240.0.32, IP Address:10.240.0.33, IP Address:10.240.0.34, IP Address:10.240.0.20, IP Address:127.0.0.1 361 | Signature Algorithm: sha256WithRSAEncryption 362 | 5f:5f:cd:b0:0f:f6:7e:9d:6d:8b:ba:38:09:18:66:24:8b:4b: 363 | 5b:71:0a:a2:b4:36:79:ae:99:5a:9b:38:07:89:05:90:53:ee: 364 | 8c:e5:52:c9:ef:8e:1a:97:62:e7:a7:c5:70:06:6f:39:30:ba: 365 | 32:dd:9f:72:c7:d3:09:82:4a:b6:2c:80:35:ec:e2:8f:97:dd: 366 | e6:34:e9:27:e6:e0:2a:9d:d9:42:94:a5:45:fe:d0:b2:30:88: 367 | 1f:b1:5e:1c:91:a2:53:f8:6b:ad:2e:ae:b3:8a:4b:fe:aa:97: 
368 | 7d:65:2a:39:02:f8:a0:28:e8:d2:d0:bf:fb:1b:4f:57:9c:3f: 369 | bf:78:07:0b:c9:67:12:48:63:a2:f0:59:ff:8b:a2:10:26:d3: 370 | 3a:0b:c3:73:85:2e:ee:14:ea:2f:1e:30:fb:78:b6:79:c9:6c: 371 | 76:f1:fe:02:26:13:69:7c:27:74:31:21:c6:43:b5:b3:17:94: 372 | ed:ab:b2:05:fe:07:90:8d:6f:38:67:dc:34:6a:2d:5b:1e:f1: 373 | 2b:b4:17:88:d6:9d:b3:0a:86:d4:0a:ad:c2:a3:bf:19:8c:99: 374 | 74:73:be:b0:65:da:b9:cf:78:e6:14:64:ce:04:0e:48:8d:c9: 375 | 16:c0:c7:8f:9e:9f:66:85:e6:c8:13:2e:73:20:22:35:db:ef: 376 | 0b:cf:b6:03 377 | [kamran@kworkhorse certs-baremetal]$ 378 | ``` 379 | 380 | ## Copy the certificates to the nodes: 381 | 382 | ``` 383 | [kamran@kworkhorse certs-baremetal]$ for node in etcd{1,2,3} controller{1,2} worker{1,2} lb{1,2}; do scp ca.pem kubernetes-key.pem kubernetes.pem root@${node}:/root/ ; done 384 | ca.pem 100% 1350 1.3KB/s 00:00 385 | kubernetes-key.pem 100% 1679 1.6KB/s 00:00 386 | kubernetes.pem 100% 1927 1.9KB/s 00:00 387 | ca.pem 100% 1350 1.3KB/s 00:00 388 | kubernetes-key.pem 100% 1679 1.6KB/s 00:00 389 | kubernetes.pem 100% 1927 1.9KB/s 00:00 390 | ca.pem 100% 1350 1.3KB/s 00:00 391 | kubernetes-key.pem 100% 1679 1.6KB/s 00:00 392 | kubernetes.pem 100% 1927 1.9KB/s 00:00 393 | ca.pem 100% 1350 1.3KB/s 00:00 394 | kubernetes-key.pem 100% 1679 1.6KB/s 00:00 395 | kubernetes.pem 100% 1927 1.9KB/s 00:00 396 | ca.pem 100% 1350 1.3KB/s 00:00 397 | kubernetes-key.pem 100% 1679 1.6KB/s 00:00 398 | kubernetes.pem 100% 1927 1.9KB/s 00:00 399 | ca.pem 100% 1350 1.3KB/s 00:00 400 | kubernetes-key.pem 100% 1679 1.6KB/s 00:00 401 | kubernetes.pem 100% 1927 1.9KB/s 00:00 402 | [kamran@kworkhorse certs-baremetal]$ 403 | ``` 404 | 405 | -------------------------------------------------------------------------------- /chapter05.md: -------------------------------------------------------------------------------- 1 | # Chapter 5: Setup a etcd cluster 2 | 3 | In this chapter, we are setting up a three node etcd cluster. Kubernetes needs a data store and etcd works very well for that purpose. 4 | 5 | # Configure etcd nodes: 6 | 7 | The reason of having dedicated etcd nodes, as explained by Kelsey: 8 | 9 | All Kubernetes components are stateless which greatly simplifies managing a Kubernetes cluster. All state is stored in etcd, which is a database and must be treated special. etcd is being run on a dedicated set of machines for the following reasons: 10 | 11 | * The etcd lifecycle is not tied to Kubernetes. We should be able to upgrade etcd independently of Kubernetes. 12 | * Scaling out etcd is different than scaling out the Kubernetes Control Plane. 13 | * Prevent other applications from taking up resources (CPU, Memory, I/O) required by etcd. 14 | 15 | First, move the certificates in place. 16 | 17 | ``` 18 | [root@etcd1 ~]# sudo mkdir -p /etc/etcd/ 19 | [root@etcd1 ~]# ls /etc/etcd/ 20 | [root@etcd1 ~]# sudo mv ca.pem kubernetes-key.pem kubernetes.pem /etc/etcd/ 21 | ``` 22 | 23 | 24 | 25 | Then, install necessary software on etcd nodes. Remember that the etcd version which comes with Fedora 24 is 2.2, whereas the latest version of etcd available on it's github page is 3.0.7 . So we download and install that one. 
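
Once the installation steps below are done, it is worth confirming that the newly installed binaries are the ones actually found first in the PATH, so the older Fedora-packaged etcd does not sneak in. This is just an optional quick check:

```
# Both should report version 3.0.7 after the installation steps below.
etcd --version
etcdctl --version
which etcd etcdctl
```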
26 | 27 | Do the following steps on both nodes: 28 | ``` 29 | curl -L https://github.com/coreos/etcd/releases/download/v3.0.7/etcd-v3.0.7-linux-amd64.tar.gz -o etcd-v3.0.7-linux-amd64.tar.gz 30 | tar xzvf etcd-v3.0.7-linux-amd64.tar.gz 31 | sudo cp etcd-v3.0.7-linux-amd64/etcd* /usr/bin/ 32 | sudo mkdir -p /var/lib/etcd 33 | ``` 34 | 35 | Create the etcd systemd unit file: 36 | 37 | ``` 38 | cat > etcd.service <<"EOF" 39 | [Unit] 40 | Description=etcd 41 | Documentation=https://github.com/coreos 42 | 43 | [Service] 44 | ExecStart=/usr/bin/etcd --name ETCD_NAME \ 45 | --cert-file=/etc/etcd/kubernetes.pem \ 46 | --key-file=/etc/etcd/kubernetes-key.pem \ 47 | --peer-cert-file=/etc/etcd/kubernetes.pem \ 48 | --peer-key-file=/etc/etcd/kubernetes-key.pem \ 49 | --trusted-ca-file=/etc/etcd/ca.pem \ 50 | --peer-trusted-ca-file=/etc/etcd/ca.pem \ 51 | --initial-advertise-peer-urls https://INTERNAL_IP:2380 \ 52 | --listen-peer-urls https://INTERNAL_IP:2380 \ 53 | --listen-client-urls https://INTERNAL_IP:2379,http://127.0.0.1:2379 \ 54 | --advertise-client-urls https://INTERNAL_IP:2379 \ 55 | --initial-cluster-token etcd-cluster-0 \ 56 | --initial-cluster etcd1=https://10.240.0.11:2380,etcd2=https://10.240.0.12:2380 \ 57 | --initial-cluster-state new \ 58 | --data-dir=/var/lib/etcd 59 | Restart=on-failure 60 | RestartSec=5 61 | 62 | [Install] 63 | WantedBy=multi-user.target 64 | EOF 65 | ``` 66 | 67 | 68 | 69 | **Note:** Make sure to change the IP below to the one belonging to the etcd node you are configuring. 70 | ``` 71 | export INTERNAL_IP='10.240.0.11' 72 | export ETCD_NAME=$(hostname -s) 73 | sed -i s/INTERNAL_IP/$INTERNAL_IP/g etcd.service 74 | sed -i s/ETCD_NAME/$ETCD_NAME/g etcd.service 75 | sudo mv etcd.service /etc/systemd/system/ 76 | ``` 77 | 78 | Start etcd: 79 | ``` 80 | sudo systemctl daemon-reload 81 | sudo systemctl enable etcd 82 | sudo systemctl start etcd 83 | ``` 84 | 85 | 86 | 87 | ## Verify that etcd is running: 88 | ``` 89 | [root@etcd1 ~]# sudo systemctl status etcd --no-pager 90 | ● etcd.service - etcd 91 | Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled) 92 | Active: active (running) since Fri 2016-09-09 11:12:05 CEST; 29s ago 93 | Docs: https://github.com/coreos 94 | Main PID: 1563 (etcd) 95 | Tasks: 6 (limit: 512) 96 | CGroup: /system.slice/etcd.service 97 | └─1563 /usr/bin/etcd --name etcd1 --cert-file=/etc/etcd/kubernetes.pem --key-file=/etc/etcd/kubernetes-key.pem --peer-cert-file=/e... 
98 | 99 | Sep 09 11:12:32 etcd1.example.com etcd[1563]: ffed16798470cab5 [logterm: 1, index: 2] sent vote request to 3a57933972cb5131 at term 20 100 | Sep 09 11:12:33 etcd1.example.com etcd[1563]: ffed16798470cab5 is starting a new election at term 20 101 | Sep 09 11:12:33 etcd1.example.com etcd[1563]: ffed16798470cab5 became candidate at term 21 102 | Sep 09 11:12:33 etcd1.example.com etcd[1563]: ffed16798470cab5 received vote from ffed16798470cab5 at term 21 103 | Sep 09 11:12:33 etcd1.example.com etcd[1563]: ffed16798470cab5 [logterm: 1, index: 2] sent vote request to 3a57933972cb5131 at term 21 104 | Sep 09 11:12:34 etcd1.example.com etcd[1563]: publish error: etcdserver: request timed out 105 | Sep 09 11:12:35 etcd1.example.com etcd[1563]: ffed16798470cab5 is starting a new election at term 21 106 | Sep 09 11:12:35 etcd1.example.com etcd[1563]: ffed16798470cab5 became candidate at term 22 107 | Sep 09 11:12:35 etcd1.example.com etcd[1563]: ffed16798470cab5 received vote from ffed16798470cab5 at term 22 108 | Sep 09 11:12:35 etcd1.example.com etcd[1563]: ffed16798470cab5 [logterm: 1, index: 2] sent vote request to 3a57933972cb5131 at term 22 109 | [root@etcd1 ~]# 110 | 111 | 112 | [root@etcd1 ~]# netstat -ntlp 113 | Active Internet connections (only servers) 114 | Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name 115 | tcp 0 0 10.240.0.11:2379 0.0.0.0:* LISTEN 1563/etcd 116 | tcp 0 0 127.0.0.1:2379 0.0.0.0:* LISTEN 1563/etcd 117 | tcp 0 0 10.240.0.11:2380 0.0.0.0:* LISTEN 1563/etcd 118 | tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 591/sshd 119 | tcp6 0 0 :::9090 :::* LISTEN 1/systemd 120 | tcp6 0 0 :::22 :::* LISTEN 591/sshd 121 | [root@etcd1 ~]# 122 | 123 | [root@etcd1 ~]# etcdctl --ca-file=/etc/etcd/ca.pem cluster-health 124 | cluster may be unhealthy: failed to list members 125 | Error: client: etcd cluster is unavailable or misconfigured 126 | error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout 127 | error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused 128 | 129 | [root@etcd1 ~]# 130 | ``` 131 | 132 | **Note:** When there is only one node, the etcd cluster will show up as unavailable or misconfigured. 133 | 134 | 135 | ## Verify: 136 | 137 | After executing all the steps on etcd2 too, I have the following status of services on etcd2: 138 | ``` 139 | [root@etcd2 ~]# systemctl status etcd 140 | ● etcd.service - etcd 141 | Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled) 142 | Active: active (running) since Fri 2016-09-09 11:26:15 CEST; 5s ago 143 | Docs: https://github.com/coreos 144 | Main PID: 2210 (etcd) 145 | Tasks: 7 (limit: 512) 146 | CGroup: /system.slice/etcd.service 147 | └─2210 /usr/bin/etcd --name etcd2 --cert-file=/etc/etcd/kubernetes.pem --key-file=/etc/etcd/kubernetes-key.pem --peer-cert-file=/etc/ 148 | 149 | Sep 09 11:26:16 etcd2.example.com etcd[2210]: 3a57933972cb5131 [logterm: 1, index: 2, vote: 0] voted for ffed16798470cab5 [logterm: 1, index: 2] 150 | Sep 09 11:26:16 etcd2.example.com etcd[2210]: raft.node: 3a57933972cb5131 elected leader ffed16798470cab5 at term 587 151 | Sep 09 11:26:16 etcd2.example.com etcd[2210]: published {Name:etcd2 ClientURLs:[https://10.240.0.12:2379]} to cluster cdeaba18114f0e16 152 | Sep 09 11:26:16 etcd2.example.com etcd[2210]: ready to serve client requests 153 | Sep 09 11:26:16 etcd2.example.com etcd[2210]: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged! 
154 | Sep 09 11:26:16 etcd2.example.com etcd[2210]: forgot to set Type=notify in systemd service file? 155 | Sep 09 11:26:16 etcd2.example.com etcd[2210]: ready to serve client requests 156 | Sep 09 11:26:16 etcd2.example.com etcd[2210]: serving client requests on 10.240.0.12:2379 157 | Sep 09 11:26:16 etcd2.example.com etcd[2210]: set the initial cluster version to 3.0 158 | Sep 09 11:26:16 etcd2.example.com etcd[2210]: enabled capabilities for version 3.0 159 | lines 1-19/19 (END) 160 | ``` 161 | 162 | ``` 163 | [root@etcd2 ~]# netstat -antlp 164 | Active Internet connections (servers and established) 165 | Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name 166 | tcp 0 0 10.240.0.12:2379 0.0.0.0:* LISTEN 2210/etcd 167 | tcp 0 0 127.0.0.1:2379 0.0.0.0:* LISTEN 2210/etcd 168 | tcp 0 0 10.240.0.12:2380 0.0.0.0:* LISTEN 2210/etcd 169 | tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 592/sshd 170 | tcp 0 0 127.0.0.1:40780 127.0.0.1:2379 ESTABLISHED 2210/etcd 171 | tcp 0 0 10.240.0.12:2379 10.240.0.12:35998 ESTABLISHED 2210/etcd 172 | tcp 0 0 127.0.0.1:2379 127.0.0.1:40780 ESTABLISHED 2210/etcd 173 | tcp 0 0 10.240.0.12:34986 10.240.0.11:2380 ESTABLISHED 2210/etcd 174 | tcp 0 0 10.240.0.12:35998 10.240.0.12:2379 ESTABLISHED 2210/etcd 175 | tcp 0 0 10.240.0.12:2379 10.240.0.12:36002 ESTABLISHED 2210/etcd 176 | tcp 0 0 127.0.0.1:40784 127.0.0.1:2379 ESTABLISHED 2210/etcd 177 | tcp 0 0 10.240.0.12:2379 10.240.0.12:35996 ESTABLISHED 2210/etcd 178 | tcp 0 0 10.240.0.12:2379 10.240.0.12:35994 ESTABLISHED 2210/etcd 179 | tcp 0 0 10.240.0.12:36002 10.240.0.12:2379 ESTABLISHED 2210/etcd 180 | tcp 0 0 127.0.0.1:2379 127.0.0.1:40788 ESTABLISHED 2210/etcd 181 | tcp 0 0 10.240.0.12:36004 10.240.0.12:2379 ESTABLISHED 2210/etcd 182 | tcp 0 0 10.240.0.12:35994 10.240.0.12:2379 ESTABLISHED 2210/etcd 183 | tcp 0 0 127.0.0.1:2379 127.0.0.1:40782 ESTABLISHED 2210/etcd 184 | tcp 0 0 10.240.0.12:2380 10.240.0.11:37048 ESTABLISHED 2210/etcd 185 | tcp 0 0 10.240.0.12:2380 10.240.0.11:37050 ESTABLISHED 2210/etcd 186 | tcp 0 0 10.240.0.12:2380 10.240.0.11:37046 ESTABLISHED 2210/etcd 187 | tcp 0 0 127.0.0.1:40782 127.0.0.1:2379 ESTABLISHED 2210/etcd 188 | tcp 0 0 10.240.0.12:35996 10.240.0.12:2379 ESTABLISHED 2210/etcd 189 | tcp 0 0 10.240.0.12:2380 10.240.0.11:37076 ESTABLISHED 2210/etcd 190 | tcp 0 0 127.0.0.1:40786 127.0.0.1:2379 ESTABLISHED 2210/etcd 191 | tcp 0 0 127.0.0.1:2379 127.0.0.1:40790 ESTABLISHED 2210/etcd 192 | tcp 0 0 10.240.0.12:34988 10.240.0.11:2380 ESTABLISHED 2210/etcd 193 | tcp 0 0 10.240.0.12:2379 10.240.0.12:36000 ESTABLISHED 2210/etcd 194 | tcp 0 0 127.0.0.1:40788 127.0.0.1:2379 ESTABLISHED 2210/etcd 195 | tcp 0 0 127.0.0.1:2379 127.0.0.1:40784 ESTABLISHED 2210/etcd 196 | tcp 0 0 10.240.0.12:22 10.240.0.1:51040 ESTABLISHED 1796/sshd: root [pr 197 | tcp 0 0 10.240.0.12:35014 10.240.0.11:2380 ESTABLISHED 2210/etcd 198 | tcp 0 0 127.0.0.1:2379 127.0.0.1:40786 ESTABLISHED 2210/etcd 199 | tcp 0 0 10.240.0.12:36000 10.240.0.12:2379 ESTABLISHED 2210/etcd 200 | tcp 0 0 127.0.0.1:40790 127.0.0.1:2379 ESTABLISHED 2210/etcd 201 | tcp 0 0 10.240.0.12:2379 10.240.0.12:36004 ESTABLISHED 2210/etcd 202 | tcp6 0 0 :::9090 :::* LISTEN 1/systemd 203 | tcp6 0 0 :::22 :::* LISTEN 592/sshd 204 | [root@etcd2 ~]# 205 | ``` 206 | 207 | 208 | ``` 209 | [root@etcd2 ~]# etcdctl --ca-file=/etc/etcd/ca.pem cluster-health 210 | member 3a57933972cb5131 is healthy: got healthy result from https://10.240.0.12:2379 211 | member ffed16798470cab5 is healthy: got healthy result from https://10.240.0.11:2379 212 | cluster 
is healthy 213 | [root@etcd2 ~]# 214 | ``` 215 | 216 | ``` 217 | [root@etcd1 ~]# etcdctl --ca-file=/etc/etcd/ca.pem cluster-health 218 | member 3a57933972cb5131 is healthy: got healthy result from https://10.240.0.12:2379 219 | member ffed16798470cab5 is healthy: got healthy result from https://10.240.0.11:2379 220 | cluster is healthy 221 | [root@etcd1 ~]# 222 | ``` 223 | 224 | 225 | **Note:** (to do) I noticed that when one etcd node (out of total two) was switched off, the worker nodes started having problem: 226 | 227 | ``` 228 | Sep 19 11:21:58 worker1.example.com kubelet[27240]: E0919 11:21:58.974948 27240 kubelet.go:2913] Error updating node status, will retry: client: etcd cluster is unavailable or misconfigured 229 | ``` 230 | 231 | 232 | -------------------------------------------------------------------------------- /chapter06.md: -------------------------------------------------------------------------------- 1 | # Kubernetes Control nodes 2 | 3 | In this chapter we setup Kubernetes master / controller nodes. We setup HA for these nodes in the next chapter. 4 | 5 | ## The Kubernetes components that make up the control plane include the following components: 6 | 7 | * Kubernetes API Server 8 | * Kubernetes Scheduler 9 | * Kubernetes Controller Manager 10 | 11 | ## Each component is being run on the same machines for the following reasons: 12 | 13 | * The Scheduler and Controller Manager are tightly coupled with the API Server 14 | * Only one Scheduler and Controller Manager can be active at a given time, but it's ok to run multiple at the same time. Each component will elect a leader via the API Server. 15 | * Running multiple copies of each component is required for H/A 16 | * Running each component next to the API Server eases configuration. 17 | 18 | 19 | Setup TLS certificates in each controller node: 20 | 21 | ``` 22 | sudo mkdir -p /var/lib/kubernetes 23 | 24 | sudo mv ca.pem kubernetes-key.pem kubernetes.pem /var/lib/kubernetes/ 25 | ``` 26 | 27 | Download and install the Kubernetes controller binaries: 28 | 29 | ``` 30 | wget https://storage.googleapis.com/kubernetes-release/release/v1.3.6/bin/linux/amd64/kube-apiserver 31 | wget https://storage.googleapis.com/kubernetes-release/release/v1.3.6/bin/linux/amd64/kube-controller-manager 32 | wget https://storage.googleapis.com/kubernetes-release/release/v1.3.6/bin/linux/amd64/kube-scheduler 33 | wget https://storage.googleapis.com/kubernetes-release/release/v1.3.6/bin/linux/amd64/kubectl 34 | ``` 35 | 36 | ``` 37 | chmod +x kube-apiserver kube-controller-manager kube-scheduler kubectl 38 | 39 | sudo mv kube-apiserver kube-controller-manager kube-scheduler kubectl /usr/bin/ 40 | ``` 41 | 42 | 43 | ## Kubernetes API Server 44 | ### Setup Authentication and Authorization 45 | #### Authentication 46 | 47 | Token based authentication will be used to limit access to the Kubernetes API. The authentication token is used by the following components: 48 | 49 | * kubelet (client) 50 | * kubectl (client) 51 | * Kubernetes API Server (server) 52 | 53 | The other components, mainly the scheduler and controller manager, access the Kubernetes API server locally over the insecure API port which does not require authentication. The insecure port is only enabled for local access. 54 | 55 | Download the example token file: 56 | ``` 57 | wget https://raw.githubusercontent.com/kelseyhightower/kubernetes-the-hard-way/master/token.csv 58 | ``` 59 | 60 | Review the example token file and replace the default token. 
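
The downloaded file uses `chAng3m3` as the token everywhere, as the listing below shows. One possible way to swap in a random value (a sketch, not part of the original steps; any method of generating a secret string will do):

```
# Generate a 32-character random token and substitute it for the default value.
NEW_TOKEN=$(openssl rand -hex 16)
sed -i "s/chAng3m3/${NEW_TOKEN}/g" token.csv
cat token.csv
```

Remember that whatever value you put here must also match the `token:` field in the kubeconfig file you create on the worker nodes later in this guide.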
61 | ``` 62 | [root@controller1 ~]# cat token.csv 63 | chAng3m3,admin,admin 64 | chAng3m3,scheduler,scheduler 65 | chAng3m3,kubelet,kubelet 66 | [root@controller1 ~]# 67 | ``` 68 | 69 | Move the token file into the Kubernetes configuration directory so it can be read by the Kubernetes API server. 70 | ``` 71 | sudo mv token.csv /var/lib/kubernetes/ 72 | ``` 73 | 74 | 75 | #### Authorization 76 | 77 | Attribute-Based Access Control (ABAC) will be used to authorize access to the Kubernetes API. In this lab ABAC will be setup using the Kubernetes policy file backend as documented in the Kubernetes authorization guide. 78 | 79 | Download the example authorization policy file: 80 | 81 | ``` 82 | wget https://raw.githubusercontent.com/kelseyhightower/kubernetes-the-hard-way/master/authorization-policy.jsonl 83 | ``` 84 | 85 | Review the example authorization policy file. No changes are required. 86 | ``` 87 | [root@controller1 ~]# cat authorization-policy.jsonl 88 | {"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user":"*", "nonResourcePath": "*", "readonly": true}} 89 | {"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user":"admin", "namespace": "*", "resource": "*", "apiGroup": "*"}} 90 | {"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user":"scheduler", "namespace": "*", "resource": "*", "apiGroup": "*"}} 91 | {"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user":"kubelet", "namespace": "*", "resource": "*", "apiGroup": "*"}} 92 | {"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"group":"system:serviceaccounts", "namespace": "*", "resource": "*", "apiGroup": "*", "nonResourcePath": "*"}} 93 | [root@controller1 ~]# 94 | ``` 95 | 96 | Move the authorization policy file into the Kubernetes configuration directory so it can be read by the Kubernetes API server. 97 | ``` 98 | sudo mv authorization-policy.jsonl /var/lib/kubernetes/ 99 | ``` 100 | 101 | ## Create the systemd unit file 102 | 103 | We need the IP address of each controller node, when we create the systemd file. We will setup a variable INTERNAL_IP with the IP address of each VM. 
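
You can simply type the address on each controller as shown below, or derive it from the node itself. The one-liner here is only a convenience and assumes the VM has a single primary IPv4 address, which is the case for these lab VMs:

```
# Sketch: derive INTERNAL_IP from the node instead of typing it; verify the echo output.
export INTERNAL_IP=$(hostname -I | awk '{print $1}')
echo ${INTERNAL_IP}
```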
104 | 105 | ``` 106 | [root@controller1 ~]# export INTERNAL_IP='10.240.0.21' 107 | ``` 108 | 109 | ``` 110 | [root@controller2 ~]# export INTERNAL_IP='10.240.0.22' 111 | ``` 112 | 113 | 114 | Create the systemd unit file: 115 | ``` 116 | cat > kube-apiserver.service <<"EOF" 117 | [Unit] 118 | Description=Kubernetes API Server 119 | Documentation=https://github.com/GoogleCloudPlatform/kubernetes 120 | 121 | [Service] 122 | ExecStart=/usr/bin/kube-apiserver \ 123 | --admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota \ 124 | --advertise-address=INTERNAL_IP \ 125 | --allow-privileged=true \ 126 | --apiserver-count=3 \ 127 | --authorization-mode=ABAC \ 128 | --authorization-policy-file=/var/lib/kubernetes/authorization-policy.jsonl \ 129 | --bind-address=0.0.0.0 \ 130 | --enable-swagger-ui=true \ 131 | --etcd-cafile=/var/lib/kubernetes/ca.pem \ 132 | --insecure-bind-address=0.0.0.0 \ 133 | --kubelet-certificate-authority=/var/lib/kubernetes/ca.pem \ 134 | --etcd-servers=https://10.240.0.11:2379,https://10.240.0.12:2379 \ 135 | --service-account-key-file=/var/lib/kubernetes/kubernetes-key.pem \ 136 | --service-cluster-ip-range=10.32.0.0/24 \ 137 | --service-node-port-range=30000-32767 \ 138 | --tls-cert-file=/var/lib/kubernetes/kubernetes.pem \ 139 | --tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem \ 140 | --token-auth-file=/var/lib/kubernetes/token.csv \ 141 | --v=2 142 | Restart=on-failure 143 | RestartSec=5 144 | 145 | [Install] 146 | WantedBy=multi-user.target 147 | EOF 148 | ``` 149 | 150 | ``` 151 | sed -i s/INTERNAL_IP/$INTERNAL_IP/g kube-apiserver.service 152 | sudo mv kube-apiserver.service /etc/systemd/system/ 153 | ``` 154 | 155 | ``` 156 | sudo systemctl daemon-reload 157 | sudo systemctl enable kube-apiserver 158 | sudo systemctl start kube-apiserver 159 | 160 | sudo systemctl status kube-apiserver --no-pager 161 | ``` 162 | 163 | Verify that kube-api server is listening on both controller nodes: 164 | 165 | ``` 166 | [root@controller1 ~]# sudo systemctl status kube-apiserver --no-pager 167 | ● kube-apiserver.service - Kubernetes API Server 168 | Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled) 169 | Active: active (running) since Tue 2016-09-13 11:08:12 CEST; 17s ago 170 | Docs: https://github.com/GoogleCloudPlatform/kubernetes 171 | Main PID: 1464 (kube-apiserver) 172 | Tasks: 6 (limit: 512) 173 | CGroup: /system.slice/kube-apiserver.service 174 | └─1464 /usr/bin/kube-apiserver --admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota... 
175 | 176 | Sep 13 11:08:13 controller1.example.com kube-apiserver[1464]: W0913 11:08:13.299066 1464 controller.go:307] Resetting endpoints for ...ion:"" 177 | Sep 13 11:08:13 controller1.example.com kube-apiserver[1464]: [restful] 2016/09/13 11:08:13 log.go:30: [restful/swagger] listing is ava...erapi/ 178 | Sep 13 11:08:13 controller1.example.com kube-apiserver[1464]: [restful] 2016/09/13 11:08:13 log.go:30: [restful/swagger] https://10.240...er-ui/ 179 | Sep 13 11:08:13 controller1.example.com kube-apiserver[1464]: I0913 11:08:13.439571 1464 genericapiserver.go:690] Serving securely o...0:6443 180 | Sep 13 11:08:13 controller1.example.com kube-apiserver[1464]: I0913 11:08:13.439745 1464 genericapiserver.go:734] Serving insecurely...0:8080 181 | Sep 13 11:08:13 controller1.example.com kube-apiserver[1464]: I0913 11:08:13.940647 1464 handlers.go:165] GET /api/v1/serviceaccount...56140] 182 | Sep 13 11:08:13 controller1.example.com kube-apiserver[1464]: I0913 11:08:13.944980 1464 handlers.go:165] GET /api/v1/secrets?fieldS...56136] 183 | Sep 13 11:08:13 controller1.example.com kube-apiserver[1464]: I0913 11:08:13.947133 1464 handlers.go:165] GET /api/v1/resourcequotas...56138] 184 | Sep 13 11:08:13 controller1.example.com kube-apiserver[1464]: I0913 11:08:13.950795 1464 handlers.go:165] GET /api/v1/namespaces?res...56142] 185 | Sep 13 11:08:13 controller1.example.com kube-apiserver[1464]: I0913 11:08:13.966576 1464 handlers.go:165] GET /api/v1/limitranges?re...56142] 186 | Hint: Some lines were ellipsized, use -l to show in full. 187 | [root@controller1 ~]# 188 | 189 | [root@controller2 ~]# sudo systemctl status kube-apiserver --no-pager 190 | ● kube-apiserver.service - Kubernetes API Server 191 | Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled) 192 | Active: active (running) since Tue 2016-09-13 11:08:16 CEST; 1min 16s ago 193 | Docs: https://github.com/GoogleCloudPlatform/kubernetes 194 | Main PID: 1488 (kube-apiserver) 195 | Tasks: 5 (limit: 512) 196 | CGroup: /system.slice/kube-apiserver.service 197 | └─1488 /usr/bin/kube-apiserver --admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota... 
198 | 
199 | Sep 13 11:08:17 controller2.example.com kube-apiserver[1488]: W0913 11:08:17.165892 1488 controller.go:342] Resetting endpoints for ...ion:""
200 | Sep 13 11:08:17 controller2.example.com kube-apiserver[1488]: [restful] 2016/09/13 11:08:17 log.go:30: [restful/swagger] listing is ava...erapi/
201 | Sep 13 11:08:17 controller2.example.com kube-apiserver[1488]: [restful] 2016/09/13 11:08:17 log.go:30: [restful/swagger] https://10.240...er-ui/
202 | Sep 13 11:08:17 controller2.example.com kube-apiserver[1488]: I0913 11:08:17.244260 1488 genericapiserver.go:690] Serving securely o...0:6443
203 | Sep 13 11:08:17 controller2.example.com kube-apiserver[1488]: I0913 11:08:17.244275 1488 genericapiserver.go:734] Serving insecurely...0:8080
204 | Sep 13 11:08:17 controller2.example.com kube-apiserver[1488]: I0913 11:08:17.757433 1488 handlers.go:165] GET /api/v1/resourcequotas...35132]
205 | Sep 13 11:08:17 controller2.example.com kube-apiserver[1488]: I0913 11:08:17.759790 1488 handlers.go:165] GET /api/v1/secrets?fieldS...35126]
206 | Sep 13 11:08:17 controller2.example.com kube-apiserver[1488]: I0913 11:08:17.761101 1488 handlers.go:165] GET /api/v1/serviceaccount...35128]
207 | Sep 13 11:08:17 controller2.example.com kube-apiserver[1488]: I0913 11:08:17.763786 1488 handlers.go:165] GET /api/v1/limitranges?re...35130]
208 | Sep 13 11:08:17 controller2.example.com kube-apiserver[1488]: I0913 11:08:17.768911 1488 handlers.go:165] GET /api/v1/namespaces?res...35124]
209 | Hint: Some lines were ellipsized, use -l to show in full.
210 | [root@controller2 ~]#
211 | ```
212 | 
213 | ## Kubernetes Controller Manager
214 | 
215 | ```
216 | cat > kube-controller-manager.service <<"EOF"
217 | [Unit]
218 | Description=Kubernetes Controller Manager
219 | Documentation=https://github.com/GoogleCloudPlatform/kubernetes
220 | 
221 | [Service]
222 | ExecStart=/usr/bin/kube-controller-manager \
223 |   --allocate-node-cidrs=true \
224 |   --cluster-cidr=10.200.0.0/16 \
225 |   --cluster-name=kubernetes \
226 |   --leader-elect=true \
227 |   --master=http://INTERNAL_IP:8080 \
228 |   --root-ca-file=/var/lib/kubernetes/ca.pem \
229 |   --service-account-private-key-file=/var/lib/kubernetes/kubernetes-key.pem \
230 |   --service-cluster-ip-range=10.32.0.0/24 \
231 |   --v=2
232 | Restart=on-failure
233 | RestartSec=5
234 | 
235 | [Install]
236 | WantedBy=multi-user.target
237 | EOF
238 | ```
239 | 
240 | ```
241 | sed -i s/INTERNAL_IP/$INTERNAL_IP/g kube-controller-manager.service
242 | sudo mv kube-controller-manager.service /etc/systemd/system/
243 | ```
244 | 
245 | ```
246 | sudo systemctl daemon-reload
247 | sudo systemctl enable kube-controller-manager
248 | sudo systemctl start kube-controller-manager
249 | 
250 | sudo systemctl status kube-controller-manager --no-pager
251 | ```
252 | 
253 | Verify that kube-controller-manager is running on both nodes:
254 | 
255 | ```
256 | [root@controller1 ~]# sudo systemctl status kube-controller-manager --no-pager
257 | ● kube-controller-manager.service - Kubernetes Controller Manager
258 | Loaded: loaded (/etc/systemd/system/kube-controller-manager.service; enabled; vendor preset: disabled)
259 | Active: active (running) since Tue 2016-09-13 11:12:13 CEST; 13s ago
260 | Docs: https://github.com/GoogleCloudPlatform/kubernetes
261 | Main PID: 1531 (kube-controller)
262 | Tasks: 5 (limit: 512)
263 | CGroup: /system.slice/kube-controller-manager.service
264 | └─1531 /usr/bin/kube-controller-manager --allocate-node-cidrs=true --cluster-cidr=10.200.0.0/16 --cluster-name=kubernetes --leader... 
265 | 266 | Sep 13 11:12:23 controller1.example.com kube-controller-manager[1531]: I0913 11:12:23.485918 1531 pet_set.go:144] Starting petset controller 267 | Sep 13 11:12:23 controller1.example.com kube-controller-manager[1531]: I0913 11:12:23.561887 1531 plugins.go:340] Loaded volume plugi...-ebs" 268 | Sep 13 11:12:23 controller1.example.com kube-controller-manager[1531]: I0913 11:12:23.562103 1531 plugins.go:340] Loaded volume plugi...e-pd" 269 | Sep 13 11:12:23 controller1.example.com kube-controller-manager[1531]: I0913 11:12:23.562227 1531 plugins.go:340] Loaded volume plugi...nder" 270 | Sep 13 11:12:23 controller1.example.com kube-controller-manager[1531]: I0913 11:12:23.570878 1531 attach_detach_controller.go:191] St...oller 271 | Sep 13 11:12:23 controller1.example.com kube-controller-manager[1531]: E0913 11:12:23.583095 1531 util.go:45] Metric for serviceaccou...tered 272 | Sep 13 11:12:23 controller1.example.com kube-controller-manager[1531]: W0913 11:12:23.595468 1531 request.go:347] Field selector: v1 ...ctly. 273 | Sep 13 11:12:23 controller1.example.com kube-controller-manager[1531]: I0913 11:12:23.619022 1531 endpoints_controller.go:322] Waitin...netes 274 | Sep 13 11:12:23 controller1.example.com kube-controller-manager[1531]: W0913 11:12:23.649898 1531 request.go:347] Field selector: v1 ...ctly. 275 | Sep 13 11:12:23 controller1.example.com kube-controller-manager[1531]: I0913 11:12:23.737340 1531 endpoints_controller.go:322] Waitin...netes 276 | Hint: Some lines were ellipsized, use -l to show in full. 277 | [root@controller1 ~]# 278 | 279 | 280 | [root@controller2 ~]# sudo systemctl enable kube-controller-manager 281 | Created symlink from /etc/systemd/system/multi-user.target.wants/kube-controller-manager.service to /etc/systemd/system/kube-controller-manager.service. 282 | 283 | [root@controller2 ~]# sudo systemctl start kube-controller-manager 284 | [root@controller2 ~]# sudo systemctl status kube-controller-manager --no-pager 285 | ● kube-controller-manager.service - Kubernetes Controller Manager 286 | Loaded: loaded (/etc/systemd/system/kube-controller-manager.service; enabled; vendor preset: disabled) 287 | Active: active (running) since Tue 2016-09-13 11:12:18 CEST; 11s ago 288 | Docs: https://github.com/GoogleCloudPlatform/kubernetes 289 | Main PID: 1553 (kube-controller) 290 | Tasks: 4 (limit: 512) 291 | CGroup: /system.slice/kube-controller-manager.service 292 | └─1553 /usr/bin/kube-controller-manager --allocate-node-cidrs=true --cluster-cidr=10.200.0.0/16 --cluster-name=kubernetes --leader... 293 | 294 | Sep 13 11:12:18 controller2.example.com systemd[1]: Started Kubernetes Controller Manager. 295 | Sep 13 11:12:18 controller2.example.com kube-controller-manager[1553]: I0913 11:12:18.246979 1553 leaderelection.go:296] lock is held...pired 296 | Sep 13 11:12:21 controller2.example.com kube-controller-manager[1553]: I0913 11:12:21.701152 1553 leaderelection.go:296] lock is held...pired 297 | Sep 13 11:12:25 controller2.example.com kube-controller-manager[1553]: I0913 11:12:25.960509 1553 leaderelection.go:296] lock is held...pired 298 | Sep 13 11:12:29 controller2.example.com kube-controller-manager[1553]: I0913 11:12:29.558337 1553 leaderelection.go:296] lock is held...pired 299 | Hint: Some lines were ellipsized, use -l to show in full. 
300 | [root@controller2 ~]# 301 | ``` 302 | 303 | ## Kubernetes Scheduler 304 | 305 | ``` 306 | cat > kube-scheduler.service <<"EOF" 307 | [Unit] 308 | Description=Kubernetes Scheduler 309 | Documentation=https://github.com/GoogleCloudPlatform/kubernetes 310 | 311 | [Service] 312 | ExecStart=/usr/bin/kube-scheduler \ 313 | --leader-elect=true \ 314 | --master=http://INTERNAL_IP:8080 \ 315 | --v=2 316 | Restart=on-failure 317 | RestartSec=5 318 | 319 | [Install] 320 | WantedBy=multi-user.target 321 | EOF 322 | ``` 323 | 324 | ``` 325 | sed -i s/INTERNAL_IP/$INTERNAL_IP/g kube-scheduler.service 326 | sudo mv kube-scheduler.service /etc/systemd/system/ 327 | ``` 328 | 329 | ``` 330 | sudo systemctl daemon-reload 331 | sudo systemctl enable kube-scheduler 332 | sudo systemctl start kube-scheduler 333 | 334 | sudo systemctl status kube-scheduler --no-pager 335 | ``` 336 | 337 | Verify that kube-scheduler is running on both nodes: 338 | 339 | ``` 340 | [root@controller1 ~]# sudo systemctl status kube-scheduler --no-pager 341 | ● kube-scheduler.service - Kubernetes Scheduler 342 | Loaded: loaded (/etc/systemd/system/kube-scheduler.service; enabled; vendor preset: disabled) 343 | Active: active (running) since Tue 2016-09-13 11:16:19 CEST; 1s ago 344 | Docs: https://github.com/GoogleCloudPlatform/kubernetes 345 | Main PID: 1591 (kube-scheduler) 346 | Tasks: 4 (limit: 512) 347 | CGroup: /system.slice/kube-scheduler.service 348 | └─1591 /usr/bin/kube-scheduler --leader-elect=true --master=http://10.240.0.21:8080 --v=2 349 | 350 | Sep 13 11:16:19 controller1.example.com systemd[1]: Started Kubernetes Scheduler. 351 | Sep 13 11:16:19 controller1.example.com kube-scheduler[1591]: I0913 11:16:19.701363 1591 factory.go:255] Creating scheduler from alg...vider' 352 | Sep 13 11:16:19 controller1.example.com kube-scheduler[1591]: I0913 11:16:19.701740 1591 factory.go:301] creating scheduler with fit predi... 353 | Sep 13 11:16:19 controller1.example.com kube-scheduler[1591]: E0913 11:16:19.743682 1591 event.go:257] Could not construct reference to: '... 354 | Sep 13 11:16:19 controller1.example.com kube-scheduler[1591]: I0913 11:16:19.744595 1591 leaderelection.go:215] sucessfully acquired...eduler 355 | Hint: Some lines were ellipsized, use -l to show in full. 356 | [root@controller1 ~]# 357 | 358 | 359 | 360 | [root@controller2 ~]# sudo systemctl status kube-scheduler --no-pager 361 | ● kube-scheduler.service - Kubernetes Scheduler 362 | Loaded: loaded (/etc/systemd/system/kube-scheduler.service; enabled; vendor preset: disabled) 363 | Active: active (running) since Tue 2016-09-13 11:16:24 CEST; 1s ago 364 | Docs: https://github.com/GoogleCloudPlatform/kubernetes 365 | Main PID: 1613 (kube-scheduler) 366 | Tasks: 4 (limit: 512) 367 | CGroup: /system.slice/kube-scheduler.service 368 | └─1613 /usr/bin/kube-scheduler --leader-elect=true --master=http://10.240.0.22:8080 --v=2 369 | 370 | Sep 13 11:16:24 controller2.example.com systemd[1]: Started Kubernetes Scheduler. 371 | Sep 13 11:16:25 controller2.example.com kube-scheduler[1613]: I0913 11:16:25.111478 1613 factory.go:255] Creating scheduler from alg...vider' 372 | Sep 13 11:16:25 controller2.example.com kube-scheduler[1613]: I0913 11:16:25.112652 1613 factory.go:301] creating scheduler with fit predi... 373 | Sep 13 11:16:25 controller2.example.com kube-scheduler[1613]: I0913 11:16:25.163057 1613 leaderelection.go:296] lock is held by cont...xpired 374 | Hint: Some lines were ellipsized, use -l to show in full. 
375 | [root@controller2 ~]# 376 | ``` 377 | 378 | 379 | Verify using `kubectl get componentstatuses`: 380 | ``` 381 | [root@controller1 ~]# kubectl get componentstatuses 382 | NAME STATUS MESSAGE ERROR 383 | scheduler Healthy ok 384 | controller-manager Healthy ok 385 | etcd-1 Healthy {"health": "true"} 386 | etcd-0 Healthy {"health": "true"} 387 | [root@controller1 ~]# 388 | 389 | 390 | [root@controller2 ~]# kubectl get componentstatuses 391 | NAME STATUS MESSAGE ERROR 392 | controller-manager Healthy ok 393 | scheduler Healthy ok 394 | etcd-0 Healthy {"health": "true"} 395 | etcd-1 Healthy {"health": "true"} 396 | [root@controller2 ~]# 397 | ``` 398 | 399 | 400 | 401 | # Setup Kubernetes frontend load balancer 402 | 403 | This is not critical, and can be done later. 404 | 405 | (TODO)(To do) 406 | 407 | -------------------------------------------------------------------------------- /chapter07.md: -------------------------------------------------------------------------------- 1 | # Chapter 7: HA for Kuberntes Control plane using Pacemaker 2 | 3 | Setup all the good stuff (IPVS/LVS/Pacemaker, etc), to provide HA for Kubernetes contol nodes. 4 | 5 | 6 | -------------------------------------------------------------------------------- /chapter08.md: -------------------------------------------------------------------------------- 1 | # Chapter 8: Setup Kubernetes Worker Nodes 2 | 3 | 4 | * Ease of deployment and configuration 5 | * Avoid mixing arbitrary workloads with critical cluster components. We are building machine with just enough resources so we don't have to worry about wasting resources. 6 | 7 | ## Provision the Kubernetes Worker Nodes 8 | 9 | Run the following commands on all worker nodes. 10 | 11 | Move the TLS certificates in place 12 | ``` 13 | sudo mkdir -p /var/lib/kubernetes 14 | sudo mv ca.pem kubernetes-key.pem kubernetes.pem /var/lib/kubernetes/ 15 | ``` 16 | 17 | ## Install Docker 18 | 19 | Kubernetes should be compatible with the Docker 1.9.x - 1.11.x: 20 | 21 | ``` 22 | wget https://get.docker.com/builds/Linux/x86_64/docker-1.11.2.tgz 23 | 24 | tar -xf docker-1.11.2.tgz 25 | 26 | sudo cp docker/docker* /usr/bin/ 27 | ``` 28 | 29 | Create the Docker systemd unit file: 30 | ``` 31 | sudo sh -c 'echo "[Unit] 32 | Description=Docker Application Container Engine 33 | Documentation=http://docs.docker.io 34 | 35 | [Service] 36 | ExecStart=/usr/bin/docker daemon \ 37 | --iptables=false \ 38 | --ip-masq=false \ 39 | --host=unix:///var/run/docker.sock \ 40 | --log-level=error \ 41 | --storage-driver=overlay 42 | Restart=on-failure 43 | RestartSec=5 44 | 45 | [Install] 46 | WantedBy=multi-user.target" > /etc/systemd/system/docker.service' 47 | ``` 48 | 49 | ``` 50 | sudo systemctl daemon-reload 51 | sudo systemctl enable docker 52 | sudo systemctl start docker 53 | 54 | sudo docker version 55 | ``` 56 | 57 | 58 | ``` 59 | [root@worker1 ~]# sudo docker version 60 | Client: 61 | Version: 1.11.2 62 | API version: 1.23 63 | Go version: go1.5.4 64 | Git commit: b9f10c9 65 | Built: Wed Jun 1 21:20:08 2016 66 | OS/Arch: linux/amd64 67 | 68 | Server: 69 | Version: 1.11.2 70 | API version: 1.23 71 | Go version: go1.5.4 72 | Git commit: b9f10c9 73 | Built: Wed Jun 1 21:20:08 2016 74 | OS/Arch: linux/amd64 75 | [root@worker1 ~]# 76 | ``` 77 | 78 | ## Setup kubelet on worker nodes: 79 | 80 | The Kubernetes kubelet no longer relies on docker networking for pods! The Kubelet can now use CNI - the Container Network Interface to manage machine level networking requirements. 
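
As a side note for later: with the kubenet plugin and the `--configure-cbr0=true` kubelet flag used further down in this chapter, each worker ends up with a local bridge (normally named cbr0) carved out of the 10.200.0.0/16 cluster CIDR. Once the kubelet is up, you can inspect it with plain iproute2 commands; this is only an inspection aid, nothing needs to be configured here:

```
# Sketch (run on a worker after the kubelet is up and has been assigned a pod CIDR):
ip addr show cbr0
ip route | grep 10.200
```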
81 | 82 | Download and install CNI plugins 83 | 84 | ``` 85 | sudo mkdir -p /opt/cni 86 | 87 | wget https://storage.googleapis.com/kubernetes-release/network-plugins/cni-c864f0e1ea73719b8f4582402b0847064f9883b0.tar.gz 88 | 89 | sudo tar -xvf cni-c864f0e1ea73719b8f4582402b0847064f9883b0.tar.gz -C /opt/cni 90 | ``` 91 | **Note:** Kelsey's guide does not mention this, but the kubernetes binaries look for plugin binaries in /opt/plugin-name/bin/, and then in other paths if nothing is found over there. 92 | 93 | 94 | Download and install the Kubernetes worker binaries: 95 | ``` 96 | wget https://storage.googleapis.com/kubernetes-release/release/v1.3.6/bin/linux/amd64/kubectl 97 | wget https://storage.googleapis.com/kubernetes-release/release/v1.3.6/bin/linux/amd64/kube-proxy 98 | wget https://storage.googleapis.com/kubernetes-release/release/v1.3.6/bin/linux/amd64/kubelet 99 | ``` 100 | 101 | ``` 102 | chmod +x kubectl kube-proxy kubelet 103 | 104 | sudo mv kubectl kube-proxy kubelet /usr/bin/ 105 | 106 | sudo mkdir -p /var/lib/kubelet/ 107 | ``` 108 | 109 | 110 | Create kubeconfig file: 111 | ``` 112 | sudo sh -c 'echo "apiVersion: v1 113 | kind: Config 114 | clusters: 115 | - cluster: 116 | certificate-authority: /var/lib/kubernetes/ca.pem 117 | server: https://10.240.0.21:6443 118 | name: kubernetes 119 | contexts: 120 | - context: 121 | cluster: kubernetes 122 | user: kubelet 123 | name: kubelet 124 | current-context: kubelet 125 | users: 126 | - name: kubelet 127 | user: 128 | token: chAng3m3" > /var/lib/kubelet/kubeconfig' 129 | ``` 130 | **Note:** Notice that `server` is specified as 10.240.0.21, which is the IP address of the first controller. We can use the virtual IP of the controllers (which is 10.240.0.20) , but we have not actually configured a load balancer with this IP address yet. So we are just using the IP address of one of the controller nodes. Remember, Kelsey's guide uses the IP address of 10.240.0.20 , but that is the IP address of controller0 in his guide, not the VIP of controller nodes. 131 | 132 | 133 | Create the kubelet systemd unit file: 134 | ``` 135 | sudo sh -c 'echo "[Unit] 136 | Description=Kubernetes Kubelet 137 | Documentation=https://github.com/GoogleCloudPlatform/kubernetes 138 | After=docker.service 139 | Requires=docker.service 140 | 141 | [Service] 142 | ExecStart=/usr/bin/kubelet \ 143 | --allow-privileged=true \ 144 | --api-servers=https://10.240.0.21:6443,https://10.240.0.22:6443 \ 145 | --cloud-provider= \ 146 | --cluster-dns=10.32.0.10 \ 147 | --cluster-domain=cluster.local \ 148 | --configure-cbr0=true \ 149 | --container-runtime=docker \ 150 | --docker=unix:///var/run/docker.sock \ 151 | --network-plugin=kubenet \ 152 | --kubeconfig=/var/lib/kubelet/kubeconfig \ 153 | --reconcile-cidr=true \ 154 | --serialize-image-pulls=false \ 155 | --tls-cert-file=/var/lib/kubernetes/kubernetes.pem \ 156 | --tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem \ 157 | --v=2 158 | 159 | Restart=on-failure 160 | RestartSec=5 161 | 162 | [Install] 163 | WantedBy=multi-user.target" > /etc/systemd/system/kubelet.service' 164 | ``` 165 | **Note:** Notice `--configure-cbr0=true` , this enables the container bridge, which from the pool 10.200.0.0/16, and can be any of 10.200.x.0/24 network. Also notice that this service requires the docker service to be up before it starts. 
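
Because of that dependency on Docker, it does not hurt to confirm Docker is actually healthy before starting the kubelet. A quick optional check:

```
# The kubelet unit above declares Requires=docker.service, so Docker must be up first.
systemctl is-active docker
sudo docker info | grep -i 'storage driver'
```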
166 | 167 | 168 | Start the kubelet service and check that it is running: 169 | ``` 170 | sudo systemctl daemon-reload 171 | sudo systemctl enable kubelet 172 | sudo systemctl start kubelet 173 | 174 | sudo systemctl status kubelet --no-pager 175 | ``` 176 | 177 | 178 | ``` 179 | [root@worker1 ~]# sudo systemctl status kubelet --no-pager 180 | ● kubelet.service - Kubernetes Kubelet 181 | Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled) 182 | Active: active (running) since Wed 2016-09-14 11:38:03 CEST; 1s ago 183 | Docs: https://github.com/GoogleCloudPlatform/kubernetes 184 | Main PID: 1954 (kubelet) 185 | Tasks: 11 (limit: 512) 186 | CGroup: /system.slice/kubelet.service 187 | ├─1954 /usr/bin/kubelet --allow-privileged=true --api-servers=https://10.240.0.21:6443,https://10.240.0.22:6443 --cloud-provider= ... 188 | └─2002 journalctl -k -f 189 | 190 | Sep 14 11:38:04 worker1.example.com kubelet[1954]: I0914 11:38:04.018143 1954 kubelet.go:1197] Attempting to register node worker1....ple.com 191 | Sep 14 11:38:04 worker1.example.com kubelet[1954]: I0914 11:38:04.019360 1954 kubelet.go:1200] Unable to register worker1.example.c...refused 192 | Sep 14 11:38:04 worker1.example.com kubelet[1954]: I0914 11:38:04.220851 1954 kubelet.go:2924] Recording NodeHasSufficientDisk even...ple.com 193 | Sep 14 11:38:04 worker1.example.com kubelet[1954]: I0914 11:38:04.221101 1954 kubelet.go:2924] Recording NodeHasSufficientMemory ev...ple.com 194 | Sep 14 11:38:04 worker1.example.com kubelet[1954]: I0914 11:38:04.221281 1954 kubelet.go:1197] Attempting to register node worker1....ple.com 195 | Sep 14 11:38:04 worker1.example.com kubelet[1954]: I0914 11:38:04.222266 1954 kubelet.go:1200] Unable to register worker1.example.c...refused 196 | Sep 14 11:38:04 worker1.example.com kubelet[1954]: I0914 11:38:04.623784 1954 kubelet.go:2924] Recording NodeHasSufficientDisk even...ple.com 197 | Sep 14 11:38:04 worker1.example.com kubelet[1954]: I0914 11:38:04.624059 1954 kubelet.go:2924] Recording NodeHasSufficientMemory ev...ple.com 198 | Sep 14 11:38:04 worker1.example.com kubelet[1954]: I0914 11:38:04.624328 1954 kubelet.go:1197] Attempting to register node worker1....ple.com 199 | Sep 14 11:38:04 worker1.example.com kubelet[1954]: I0914 11:38:04.625329 1954 kubelet.go:1200] Unable to register worker1.example.c...refused 200 | Hint: Some lines were ellipsized, use -l to show in full. 201 | [root@worker1 ~]# 202 | 203 | 204 | [root@worker2 ~]# sudo systemctl status kubelet --no-pager 205 | ● kubelet.service - Kubernetes Kubelet 206 | Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled) 207 | Active: active (running) since Wed 2016-09-14 11:38:08 CEST; 920ms ago 208 | Docs: https://github.com/GoogleCloudPlatform/kubernetes 209 | Main PID: 1999 (kubelet) 210 | Tasks: 10 (limit: 512) 211 | CGroup: /system.slice/kubelet.service 212 | ├─1999 /usr/bin/kubelet --allow-privileged=true --api-servers=https://10.240.0.21:6443,https://10.240.0.22:6443 --cloud-provider= ... 
213 | └─2029 journalctl -k -f 214 | 215 | Sep 14 11:38:09 worker2.example.com kubelet[1999]: I0914 11:38:09.218332 1999 manager.go:281] Starting recovery of all containers 216 | Sep 14 11:38:09 worker2.example.com kubelet[1999]: I0914 11:38:09.254250 1999 manager.go:286] Recovery completed 217 | Sep 14 11:38:09 worker2.example.com kubelet[1999]: I0914 11:38:09.359776 1999 kubelet.go:2924] Recording NodeHasSufficientDisk even...ple.com 218 | Sep 14 11:38:09 worker2.example.com kubelet[1999]: I0914 11:38:09.359800 1999 kubelet.go:2924] Recording NodeHasSufficientMemory ev...ple.com 219 | Sep 14 11:38:09 worker2.example.com kubelet[1999]: I0914 11:38:09.360031 1999 kubelet.go:1197] Attempting to register node worker2....ple.com 220 | Sep 14 11:38:09 worker2.example.com kubelet[1999]: I0914 11:38:09.363621 1999 kubelet.go:1200] Unable to register worker2.example.c...refused 221 | Sep 14 11:38:09 worker2.example.com kubelet[1999]: I0914 11:38:09.565044 1999 kubelet.go:2924] Recording NodeHasSufficientDisk even...ple.com 222 | Sep 14 11:38:09 worker2.example.com kubelet[1999]: I0914 11:38:09.566188 1999 kubelet.go:2924] Recording NodeHasSufficientMemory ev...ple.com 223 | Sep 14 11:38:09 worker2.example.com kubelet[1999]: I0914 11:38:09.566323 1999 kubelet.go:1197] Attempting to register node worker2....ple.com 224 | Sep 14 11:38:09 worker2.example.com kubelet[1999]: I0914 11:38:09.568444 1999 kubelet.go:1200] Unable to register worker2.example.c...refused 225 | Hint: Some lines were ellipsized, use -l to show in full. 226 | [root@worker2 ~]# 227 | ``` 228 | 229 | ## kube-proxy 230 | Kube-proxy sets up IPTables rules on the nodes so containers can find services. 231 | 232 | Create systemd unit file for kube-proxy: 233 | ``` 234 | sudo sh -c 'echo "[Unit] 235 | Description=Kubernetes Kube Proxy 236 | Documentation=https://github.com/GoogleCloudPlatform/kubernetes 237 | 238 | [Service] 239 | ExecStart=/usr/bin/kube-proxy \ 240 | --master=https://10.240.0.21:6443 \ 241 | --kubeconfig=/var/lib/kubelet/kubeconfig \ 242 | --proxy-mode=iptables \ 243 | --v=2 244 | 245 | Restart=on-failure 246 | RestartSec=5 247 | 248 | [Install] 249 | WantedBy=multi-user.target" > /etc/systemd/system/kube-proxy.service' 250 | ``` 251 | **Note:** We have used the IP address of the first controller in the systemd file above. Later, we can change it to use the VIP of the controller nodes. 252 | 253 | 254 | 255 | 256 | 257 | ``` 258 | sudo systemctl daemon-reload 259 | sudo systemctl enable kube-proxy 260 | sudo systemctl start kube-proxy 261 | 262 | sudo systemctl status kube-proxy --no-pager 263 | ``` 264 | 265 | 266 | ``` 267 | [root@worker1 ~]# sudo systemctl status kube-proxy --no-pager 268 | ● kube-proxy.service - Kubernetes Kube Proxy 269 | Loaded: loaded (/etc/systemd/system/kube-proxy.service; enabled; vendor preset: disabled) 270 | Active: active (running) since Wed 2016-09-14 12:02:35 CEST; 635ms ago 271 | Docs: https://github.com/GoogleCloudPlatform/kubernetes 272 | Main PID: 2373 (kube-proxy) 273 | Tasks: 4 (limit: 512) 274 | CGroup: /system.slice/kube-proxy.service 275 | └─2373 /usr/bin/kube-proxy --master=https://10.240.0.21:6443 --kubeconfig=/var/lib/kubelet/kubeconfig --proxy-mode=iptables --v=2 276 | 277 | Sep 14 12:02:35 worker1.example.com kube-proxy[2373]: I0914 12:02:35.508769 2373 server.go:202] Using iptables Proxier. 
278 | Sep 14 12:02:35 worker1.example.com kube-proxy[2373]: W0914 12:02:35.509552 2373 server.go:416] Failed to retrieve node info: Get ht...efused 279 | Sep 14 12:02:35 worker1.example.com kube-proxy[2373]: W0914 12:02:35.509608 2373 proxier.go:227] invalid nodeIP, initialize kube-pro...nodeIP 280 | Sep 14 12:02:35 worker1.example.com kube-proxy[2373]: I0914 12:02:35.509618 2373 server.go:214] Tearing down userspace rules. 281 | Sep 14 12:02:35 worker1.example.com kube-proxy[2373]: I0914 12:02:35.521907 2373 conntrack.go:40] Setting nf_conntrack_max to 32768 282 | Sep 14 12:02:35 worker1.example.com kube-proxy[2373]: I0914 12:02:35.522205 2373 conntrack.go:57] Setting conntrack hashsize to 8192 283 | Sep 14 12:02:35 worker1.example.com kube-proxy[2373]: I0914 12:02:35.522521 2373 conntrack.go:62] Setting nf_conntrack_tcp_timeout_e... 86400 284 | Sep 14 12:02:35 worker1.example.com kube-proxy[2373]: E0914 12:02:35.523511 2373 event.go:207] Unable to write event: 'Post https://...eping) 285 | Sep 14 12:02:35 worker1.example.com kube-proxy[2373]: E0914 12:02:35.523709 2373 reflector.go:205] pkg/proxy/config/api.go:33: Faile...efused 286 | Sep 14 12:02:35 worker1.example.com kube-proxy[2373]: E0914 12:02:35.523947 2373 reflector.go:205] pkg/proxy/config/api.go:30: Faile...efused 287 | Hint: Some lines were ellipsized, use -l to show in full. 288 | [root@worker1 ~]# 289 | 290 | 291 | [root@worker2 ~]# sudo systemctl status kube-proxy --no-pager 292 | ● kube-proxy.service - Kubernetes Kube Proxy 293 | Loaded: loaded (/etc/systemd/system/kube-proxy.service; enabled; vendor preset: disabled) 294 | Active: active (running) since Wed 2016-09-14 12:02:46 CEST; 1s ago 295 | Docs: https://github.com/GoogleCloudPlatform/kubernetes 296 | Main PID: 2385 (kube-proxy) 297 | Tasks: 4 (limit: 512) 298 | CGroup: /system.slice/kube-proxy.service 299 | └─2385 /usr/bin/kube-proxy --master=https://10.240.0.21:6443 --kubeconfig=/var/lib/kubelet/kubeconfig --proxy-mode=iptables --v=2 300 | 301 | Sep 14 12:02:46 worker2.example.com kube-proxy[2385]: W0914 12:02:46.660676 2385 proxier.go:227] invalid nodeIP, initialize kube-pro...nodeIP 302 | Sep 14 12:02:46 worker2.example.com kube-proxy[2385]: I0914 12:02:46.660690 2385 server.go:214] Tearing down userspace rules. 303 | Sep 14 12:02:46 worker2.example.com kube-proxy[2385]: I0914 12:02:46.670904 2385 conntrack.go:40] Setting nf_conntrack_max to 32768 304 | Sep 14 12:02:46 worker2.example.com kube-proxy[2385]: I0914 12:02:46.671630 2385 conntrack.go:57] Setting conntrack hashsize to 8192 305 | Sep 14 12:02:46 worker2.example.com kube-proxy[2385]: I0914 12:02:46.671687 2385 conntrack.go:62] Setting nf_conntrack_tcp_timeout_e... 86400 306 | Sep 14 12:02:46 worker2.example.com kube-proxy[2385]: E0914 12:02:46.673067 2385 event.go:207] Unable to write event: 'Post https://...eping) 307 | Sep 14 12:02:46 worker2.example.com kube-proxy[2385]: E0914 12:02:46.673266 2385 reflector.go:205] pkg/proxy/config/api.go:33: Faile...efused 308 | Sep 14 12:02:46 worker2.example.com kube-proxy[2385]: E0914 12:02:46.673514 2385 reflector.go:205] pkg/proxy/config/api.go:30: Faile...efused 309 | Sep 14 12:02:47 worker2.example.com kube-proxy[2385]: E0914 12:02:47.674206 2385 reflector.go:205] pkg/proxy/config/api.go:33: Faile...efused 310 | Sep 14 12:02:47 worker2.example.com kube-proxy[2385]: E0914 12:02:47.674254 2385 reflector.go:205] pkg/proxy/config/api.go:30: Faile...efused 311 | Hint: Some lines were ellipsized, use -l to show in full. 
312 | [root@worker2 ~]# 313 | ``` 314 | 315 | 316 | At this point, you should be able to see the nodes as **Ready**. 317 | 318 | ``` 319 | [root@controller1 ~]# kubectl get componentstatuses 320 | NAME STATUS MESSAGE ERROR 321 | scheduler Healthy ok 322 | controller-manager Healthy ok 323 | etcd-0 Healthy {"health": "true"} 324 | etcd-1 Healthy {"health": "true"} 325 | 326 | 327 | [root@controller1 ~]# kubectl get nodes 328 | NAME STATUS AGE 329 | worker1.example.com Ready 47s 330 | worker2.example.com Ready 41s 331 | [root@controller1 ~]# 332 | ``` 333 | 334 | **Note:** Sometimes the nodes do not show up as Ready in the output of `kubectl get nodes` command. It is ok to reboot the worker nodes. 335 | 336 | 337 | **Note:** Worker node configuration is complete at this point. 338 | 339 | ------ 340 | ## Some notes on CIDR/CNI IP address showing/not-showing on the worker nodes: 341 | 342 | (to do) Add a step to make sure that the worker nodes have got the CIDR IP address. Right now, in my setup, I do not see CIDR addresses assigned to my worker nodes, even though they show up as Ready . 343 | 344 | 345 | ( **UPDATE:** I recently found out that the CIDR network assigned to each worker node shows up in the output of `kubectl describe node ` command. This is very handy! ) 346 | 347 | ``` 348 | [root@worker1 ~]# ifconfig 349 | docker0: flags=4099 mtu 1500 350 | inet 172.17.0.1 netmask 255.255.0.0 broadcast 0.0.0.0 351 | ether 02:42:4a:68:e4:2f txqueuelen 0 (Ethernet) 352 | RX packets 0 bytes 0 (0.0 B) 353 | RX errors 0 dropped 0 overruns 0 frame 0 354 | TX packets 0 bytes 0 (0.0 B) 355 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 356 | 357 | ens3: flags=4163 mtu 1500 358 | inet 10.240.0.31 netmask 255.255.255.0 broadcast 10.240.0.255 359 | inet6 fe80::5054:ff:fe03:a650 prefixlen 64 scopeid 0x20 360 | ether 52:54:00:03:a6:50 txqueuelen 1000 (Ethernet) 361 | RX packets 2028 bytes 649017 (633.8 KiB) 362 | RX errors 0 dropped 6 overruns 0 frame 0 363 | TX packets 1689 bytes 262384 (256.2 KiB) 364 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 365 | 366 | lo: flags=73 mtu 65536 367 | inet 127.0.0.1 netmask 255.0.0.0 368 | inet6 ::1 prefixlen 128 scopeid 0x10 369 | loop txqueuelen 1 (Local Loopback) 370 | RX packets 20 bytes 1592 (1.5 KiB) 371 | RX errors 0 dropped 0 overruns 0 frame 0 372 | TX packets 20 bytes 1592 (1.5 KiB) 373 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 374 | 375 | [root@worker1 ~]# 376 | ``` 377 | **Note:** Where is the IP address from CIDR? 378 | 379 | Here is a hint of the underlying problem (which, actually is not a problem): 380 | ``` 381 | [root@worker1 ~]# systemctl status kubelet -l 382 | ● kubelet.service - Kubernetes Kubelet 383 | Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled) 384 | Active: active (running) since Wed 2016-09-14 13:16:13 CEST; 9min ago 385 | Docs: https://github.com/GoogleCloudPlatform/kubernetes 386 | Main PID: 4744 (kubelet) 387 | Tasks: 11 (limit: 512) 388 | CGroup: /system.slice/kubelet.service 389 | ├─4744 /usr/bin/kubelet --allow-privileged=true --api-servers=https://10.240.0.21:6443,https://10.240.0.22:6443 --cloud-provider= --c 390 | └─4781 journalctl -k -f 391 | 392 | 4744 kubelet.go:2510] skipping pod synchronization - [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR] 393 | 4744 kubelet.go:2510] skipping pod synchronization - [Kubenet does not have netConfig. 
This is most likely due to lack of PodCIDR] 394 | 4744 kubelet.go:2510] skipping pod synchronization - [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR] 395 | 4744 kubelet.go:2924] Recording NodeReady event message for node worker1.example.com 396 | ``` 397 | 398 | On the controller I see that the kube-controller-manager has some details: 399 | 400 | ``` 401 | [root@controller2 ~]# systemctl status kube-controller-manager.service -l 402 | ● kube-controller-manager.service - Kubernetes Controller Manager 403 | Loaded: loaded (/etc/systemd/system/kube-controller-manager.service; enabled; vendor preset: disabled) 404 | Active: active (running) since Wed 2016-09-14 13:07:10 CEST; 25min ago 405 | Docs: https://github.com/GoogleCloudPlatform/kubernetes 406 | Main PID: 550 (kube-controller) 407 | Tasks: 5 (limit: 512) 408 | CGroup: /system.slice/kube-controller-manager.service 409 | └─550 /usr/bin/kube-controller-manager --allocate-node-cidrs=true --cluster-cidr=10.200.0.0/16 --cluster-name=kubernetes --leader-ele 410 | 411 | . . . 412 | 13:10:52.772513 550 nodecontroller.go:534] NodeController is entering network segmentation mode. 413 | 13:10:52.772630 550 event.go:216] Event(api.ObjectReference{Kind:"Node", Namespace:"", Name:"worker2.example.com", UID:"worker2.example. 414 | 13:10:57.775051 550 nodecontroller.go:534] NodeController is entering network segmentation mode. 415 | 13:11:02.777334 550 nodecontroller.go:534] NodeController is entering network segmentation mode. 416 | 13:11:07.781592 550 nodecontroller.go:534] NodeController is entering network segmentation mode. 417 | 13:11:12.784489 550 nodecontroller.go:534] NodeController is entering network segmentation mode. 418 | 13:11:17.787018 550 nodecontroller.go:539] NodeController exited network segmentation mode. 419 | 13:17:36.729147 550 request.go:347] Field selector: v1 - serviceaccounts - metadata.name - default: need to check if this is versioned correctly. 420 | 13:25:32.730591 550 request.go:347] Field selector: v1 - serviceaccounts - metadata.name - default: need to check if this is versioned correctly. 421 | ``` 422 | 423 | 424 | Also look at this issue: [https://github.com/kelseyhightower/kubernetes-the-hard-way/issues/58](https://github.com/kelseyhightower/kubernetes-the-hard-way/issues/58) 425 | 426 | 427 | **Note:** The above issue explains/shows that the cbr0 network only gets created on pods when firt pod is created and is placed on the worker node. This also means that we cannot update routing table on our router until we know which network exists on which node?! 428 | 429 | 430 | This means, at this point, we shall create a test pod, and see if worker node gets a cbr0 IP address . We will also use this information at a later step, when we add routes to the routing table on our router. 431 | 432 | ------ 433 | 434 | 435 | # Create a test pod: 436 | 437 | Login to controller1 and run a test pod. Many people like to run nginx, which runs the ngin webserver, but does not have any tools for network troubleshooting. There is a centos based multitool I created, which runs apache and has many network troubleshooting tools built into it. It is available at dockerhub kamranazeem/centos-multitool . 
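If you want to poke around the image before handing it to Kubernetes, you can pull and run it directly with Docker on one of the worker nodes (an optional detour; skip straight to the deployment below if you like):

```
# Optional: try the multitool image with plain Docker first, then exit the shell.
sudo docker pull kamranazeem/centos-multitool
sudo docker run --rm -it kamranazeem/centos-multitool bash
```

Back on controller1, the deployment itself is created as shown next.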
438 | 439 | ``` 440 | [root@controller1 ~]# kubectl run centos-multitool --image=kamranazeem/centos-multitool 441 | deployment "centos-multitool" created 442 | [root@controller1 ~]# 443 | 444 | 445 | 446 | [root@controller1 ~]# kubectl get pods -o wide 447 | NAME READY STATUS RESTARTS AGE IP NODE 448 | centos-multitool-3822887632-6qbrh 1/1 Running 0 6m 10.200.1.2 worker2.example.com 449 | [root@controller1 ~]# 450 | ``` 451 | 452 | Check if the node got a cbr0 IP belonging to 10.200.x.0/24 , which in-turn will be a subnet of 10.200.0.0/16 . 453 | 454 | ``` 455 | [root@worker2 ~]# ifconfig 456 | cbr0: flags=4419 mtu 1500 457 | inet 10.200.1.1 netmask 255.255.255.0 broadcast 0.0.0.0 458 | inet6 fe80::2c94:fff:fe9d:9cf6 prefixlen 64 scopeid 0x20 459 | ether 16:89:74:67:7b:33 txqueuelen 1000 (Ethernet) 460 | RX packets 8 bytes 536 (536.0 B) 461 | RX errors 0 dropped 0 overruns 0 frame 0 462 | TX packets 10 bytes 732 (732.0 B) 463 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 464 | 465 | docker0: flags=4099 mtu 1500 466 | inet 172.17.0.1 netmask 255.255.0.0 broadcast 0.0.0.0 467 | ether 02:42:bb:60:8d:d0 txqueuelen 0 (Ethernet) 468 | RX packets 0 bytes 0 (0.0 B) 469 | RX errors 0 dropped 0 overruns 0 frame 0 470 | TX packets 0 bytes 0 (0.0 B) 471 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 472 | 473 | ens3: flags=4163 mtu 1500 474 | inet 10.240.0.32 netmask 255.255.255.0 broadcast 10.240.0.255 475 | inet6 fe80::5054:ff:fe4c:f48a prefixlen 64 scopeid 0x20 476 | ether 52:54:00:4c:f4:8a txqueuelen 1000 (Ethernet) 477 | RX packets 44371 bytes 132559205 (126.4 MiB) 478 | RX errors 0 dropped 6 overruns 0 frame 0 479 | TX packets 37129 bytes 3515567 (3.3 MiB) 480 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 481 | 482 | lo: flags=73 mtu 65536 483 | inet 127.0.0.1 netmask 255.0.0.0 484 | inet6 ::1 prefixlen 128 scopeid 0x10 485 | loop txqueuelen 1 (Local Loopback) 486 | RX packets 20 bytes 1592 (1.5 KiB) 487 | RX errors 0 dropped 0 overruns 0 frame 0 488 | TX packets 20 bytes 1592 (1.5 KiB) 489 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 490 | 491 | veth5a59821e: flags=4163 mtu 1500 492 | inet6 fe80::1489:74ff:fe67:7b33 prefixlen 64 scopeid 0x20 493 | ether 16:89:74:67:7b:33 txqueuelen 0 (Ethernet) 494 | RX packets 8 bytes 648 (648.0 B) 495 | RX errors 0 dropped 0 overruns 0 frame 0 496 | TX packets 17 bytes 1290 (1.2 KiB) 497 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 498 | 499 | [root@worker2 ~]# 500 | ``` 501 | 502 | Good! 503 | 504 | Lets increase the number of replicas of this pod to two, which is the same as number of worker nodes. This will hopefully distribute the pods evenly on all workers. 
505 | 506 | ``` 507 | [root@controller1 ~]# kubectl scale deployment centos-multitool --replicas=2 508 | deployment "centos-multitool" scaled 509 | [root@controller1 ~]# 510 | ``` 511 | 512 | Check the pods and the nodes they are put on: 513 | 514 | ``` 515 | [root@controller1 ~]# kubectl get pods -o wide 516 | NAME READY STATUS RESTARTS AGE IP NODE 517 | centos-multitool-3822887632-6qbrh 1/1 Running 0 16m 10.200.1.2 worker2.example.com 518 | centos-multitool-3822887632-jeyhb 1/1 Running 0 9m 10.200.0.2 worker1.example.com 519 | [root@controller1 ~]# 520 | ``` 521 | 522 | Check the cbr0 interface on worker1 too: 523 | ``` 524 | [root@worker1 ~]# ifconfig 525 | cbr0: flags=4419 mtu 1500 526 | inet 10.200.0.1 netmask 255.255.255.0 broadcast 0.0.0.0 527 | inet6 fe80::6cb1:ddff:fe78:4d2f prefixlen 64 scopeid 0x20 528 | ether 0a:79:9f:11:20:22 txqueuelen 1000 (Ethernet) 529 | RX packets 8 bytes 536 (536.0 B) 530 | RX errors 0 dropped 0 overruns 0 frame 0 531 | TX packets 10 bytes 732 (732.0 B) 532 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 533 | 534 | docker0: flags=4099 mtu 1500 535 | inet 172.17.0.1 netmask 255.255.0.0 broadcast 0.0.0.0 536 | ether 02:42:fc:7a:23:24 txqueuelen 0 (Ethernet) 537 | RX packets 0 bytes 0 (0.0 B) 538 | RX errors 0 dropped 0 overruns 0 frame 0 539 | TX packets 0 bytes 0 (0.0 B) 540 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 541 | 542 | ens3: flags=4163 mtu 1500 543 | inet 10.240.0.31 netmask 255.255.255.0 broadcast 10.240.0.255 544 | inet6 fe80::5054:ff:fe03:a650 prefixlen 64 scopeid 0x20 545 | ether 52:54:00:03:a6:50 txqueuelen 1000 (Ethernet) 546 | RX packets 32880 bytes 114219841 (108.9 MiB) 547 | RX errors 0 dropped 5 overruns 0 frame 0 548 | TX packets 28126 bytes 2708515 (2.5 MiB) 549 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 550 | 551 | lo: flags=73 mtu 65536 552 | inet 127.0.0.1 netmask 255.0.0.0 553 | inet6 ::1 prefixlen 128 scopeid 0x10 554 | loop txqueuelen 1 (Local Loopback) 555 | RX packets 18 bytes 1492 (1.4 KiB) 556 | RX errors 0 dropped 0 overruns 0 frame 0 557 | TX packets 18 bytes 1492 (1.4 KiB) 558 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 559 | 560 | veth06329870: flags=4163 mtu 1500 561 | inet6 fe80::879:9fff:fe11:2022 prefixlen 64 scopeid 0x20 562 | ether 0a:79:9f:11:20:22 txqueuelen 0 (Ethernet) 563 | RX packets 8 bytes 648 (648.0 B) 564 | RX errors 0 dropped 0 overruns 0 frame 0 565 | TX packets 17 bytes 1290 (1.2 KiB) 566 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 567 | 568 | [root@worker1 ~]# 569 | ``` 570 | 571 | Good! 
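As an optional cross-check (assuming the iproute2 tools are present on the workers), the kernel routing table on each worker also reveals which 10.200.x.0/24 slice its cbr0 bridge owns:

```
# Run on each worker node; the route attached to cbr0 shows the node's pod subnet.
ip -4 route show dev cbr0
# Expect something along the lines of:
#   10.200.1.0/24  proto kernel  scope link  src 10.200.1.1
```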
Lets find the IP addresses of the pods: 572 | 573 | ``` 574 | [root@controller1 ~]# kubectl exec centos-multitool-3822887632-6qbrh ifconfig 575 | eth0: flags=4163 mtu 1500 576 | inet 10.200.1.2 netmask 255.255.255.0 broadcast 0.0.0.0 577 | inet6 fe80::acbc:28ff:feae:3397 prefixlen 64 scopeid 0x20 578 | ether 0a:58:0a:c8:01:02 txqueuelen 0 (Ethernet) 579 | RX packets 17 bytes 1290 (1.2 KiB) 580 | RX errors 0 dropped 0 overruns 0 frame 0 581 | TX packets 8 bytes 648 (648.0 B) 582 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 583 | 584 | lo: flags=73 mtu 65536 585 | inet 127.0.0.1 netmask 255.0.0.0 586 | inet6 ::1 prefixlen 128 scopeid 0x10 587 | loop txqueuelen 1 (Local Loopback) 588 | RX packets 0 bytes 0 (0.0 B) 589 | RX errors 0 dropped 0 overruns 0 frame 0 590 | TX packets 0 bytes 0 (0.0 B) 591 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 592 | 593 | [root@controller1 ~]# 594 | ``` 595 | 596 | ``` 597 | [root@controller1 ~]# kubectl exec centos-multitool-3822887632-jeyhb ifconfig 598 | eth0: flags=4163 mtu 1500 599 | inet 10.200.0.2 netmask 255.255.255.0 broadcast 0.0.0.0 600 | inet6 fe80::442d:6eff:fe18:f7e0 prefixlen 64 scopeid 0x20 601 | ether 0a:58:0a:c8:00:02 txqueuelen 0 (Ethernet) 602 | RX packets 17 bytes 1290 (1.2 KiB) 603 | RX errors 0 dropped 0 overruns 0 frame 0 604 | TX packets 8 bytes 648 (648.0 B) 605 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 606 | 607 | lo: flags=73 mtu 65536 608 | inet 127.0.0.1 netmask 255.0.0.0 609 | inet6 ::1 prefixlen 128 scopeid 0x10 610 | loop txqueuelen 1 (Local Loopback) 611 | RX packets 0 bytes 0 (0.0 B) 612 | RX errors 0 dropped 0 overruns 0 frame 0 613 | TX packets 0 bytes 0 (0.0 B) 614 | TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 615 | 616 | [root@controller1 ~]# 617 | ``` 618 | 619 | 620 | At this point, the pods will not be able to ping the pods on other nodes. It is because the routing is not setup on the router this cluster is connected to. 621 | 622 | ``` 623 | [root@controller1 ~]# kubectl exec centos-multitool-3822887632-6qbrh -it -- bash 624 | [root@centos-multitool-3822887632-6qbrh /]# 625 | 626 | [root@centos-multitool-3822887632-6qbrh /]# ping -c 1 10.200.1.1 627 | PING 10.200.1.1 (10.200.1.1) 56(84) bytes of data. 628 | 64 bytes from 10.200.1.1: icmp_seq=1 ttl=64 time=0.062 ms 629 | 630 | --- 10.200.1.1 ping statistics --- 631 | 1 packets transmitted, 1 received, 0% packet loss, time 0ms 632 | rtt min/avg/max/mdev = 0.062/0.062/0.062/0.000 ms 633 | 634 | 635 | [root@centos-multitool-3822887632-6qbrh /]# ping -c 1 10.200.0.1 636 | PING 10.200.0.1 (10.200.0.1) 56(84) bytes of data. 637 | ^C 638 | --- 10.200.0.1 ping statistics --- 639 | 1 packets transmitted, 0 received, 100% packet loss, time 0ms 640 | 641 | 642 | [root@centos-multitool-3822887632-6qbrh /]# ping -c 1 10.200.0.2 643 | PING 10.200.0.2 (10.200.0.2) 56(84) bytes of data. 644 | ^C 645 | --- 10.200.0.2 ping statistics --- 646 | 1 packets transmitted, 0 received, 100% packet loss, time 0ms 647 | 648 | [root@centos-multitool-3822887632-6qbrh /]# 649 | ``` 650 | 651 | We will setup routing in the coming steps. 652 | 653 | 654 | 655 | # ------ 656 | 657 | # Managing the Container Network Routes 658 | 659 | Now that each worker node is online we need to add routes to make sure that Pods running on different machines can talk to each other. In this lab we are not going to provision any overlay networks and instead rely on Layer 3 networking. That means we need to add routes to our router. 
In GCP and AWS each network has a router that can be configured. Ours is a bare metal installation, which means we have to add routes to our local router. Since my setup is a VM based setup on KVM/Libvirt, the router in question here is actually my local work computer. 660 | 661 | So, we know from experience above (during worker node setup), that the cbr0 on a worker node does not get an IP address until first pod is scheduled on it. This means we are not sure which node will get which network segment (10.200.x.0/24) from the main CIDR network (10.200.0.0/16) . That means, either we do it manually, or we can create a script which does this investigation for us; and (ideally) updates router accordingly. 662 | 663 | Basically this information is available from the output of `kubectl describe node ` command. (If it was not, I would have to ssh into each worker node and try to see what IP address is assigned to cbr0 interface!) . 664 | 665 | ``` 666 | [root@controller1 ~]# kubectl get nodes 667 | NAME STATUS AGE 668 | worker1.example.com Ready 23h 669 | worker2.example.com Ready 23h 670 | ``` 671 | 672 | ``` 673 | [root@controller1 ~]# kubectl describe node worker1 674 | Name: worker1.example.com 675 | Labels: beta.kubernetes.io/arch=amd64 676 | beta.kubernetes.io/os=linux 677 | kubernetes.io/hostname=worker1.example.com 678 | Taints: 679 | CreationTimestamp: Wed, 14 Sep 2016 13:10:44 +0200 680 | Phase: 681 | Conditions: 682 | Type Status LastHeartbeatTime LastTransitionTime Reason Message 683 | ---- ------ ----------------- ------------------ ------ ------- 684 | OutOfDisk False Thu, 15 Sep 2016 12:55:18 +0200 Thu, 15 Sep 2016 08:53:55 +0200 KubeletHasSufficientDisk kubelet has sufficient disk space available 685 | MemoryPressure False Thu, 15 Sep 2016 12:55:18 +0200 Wed, 14 Sep 2016 13:10:43 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available 686 | Ready True Thu, 15 Sep 2016 12:55:18 +0200 Thu, 15 Sep 2016 08:59:17 +0200 KubeletReady kubelet is posting ready status 687 | Addresses: 10.240.0.31,10.240.0.31 688 | Capacity: 689 | alpha.kubernetes.io/nvidia-gpu: 0 690 | cpu: 1 691 | memory: 1532864Ki 692 | pods: 110 693 | Allocatable: 694 | alpha.kubernetes.io/nvidia-gpu: 0 695 | cpu: 1 696 | memory: 1532864Ki 697 | pods: 110 698 | System Info: 699 | Machine ID: 87ac0ddf52aa40dcb138117283c65a10 700 | System UUID: 0947489A-D2E7-416F-AA1A-517900E2DCB5 701 | Boot ID: dbc1ab43-183d-475a-886c-d445fa7b41b4 702 | Kernel Version: 4.6.7-300.fc24.x86_64 703 | OS Image: Fedora 24 (Twenty Four) 704 | Operating System: linux 705 | Architecture: amd64 706 | Container Runtime Version: docker://1.11.2 707 | Kubelet Version: v1.3.6 708 | Kube-Proxy Version: v1.3.6 709 | PodCIDR: 10.200.0.0/24 710 | ExternalID: worker1.example.com 711 | Non-terminated Pods: (1 in total) 712 | Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits 713 | --------- ---- ------------ ---------- --------------- ------------- 714 | default centos-multitool-3822887632-jeyhb 0 (0%) 0 (0%) 0 (0%) 0 (0%) 715 | Allocated resources: 716 | (Total limits may be over 100 percent, i.e., overcommitted. More info: http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md) 717 | CPU Requests CPU Limits Memory Requests Memory Limits 718 | ------------ ---------- --------------- ------------- 719 | 0 (0%) 0 (0%) 0 (0%) 0 (0%) 720 | No events. 
721 | [root@controller1 ~]# 722 | ``` 723 | 724 | Extract the PodCIDR information from the output above: 725 | ``` 726 | [root@controller1 ~]# kubectl describe node worker1 | grep PodCIDR 727 | PodCIDR: 10.200.0.0/24 728 | [root@controller1 ~]# 729 | ``` 730 | 731 | 732 | We also know this: 733 | ``` 734 | [root@controller1 ~]# kubectl get nodes -o name 735 | node/worker1.example.com 736 | node/worker2.example.com 737 | [root@controller1 ~]# 738 | ``` 739 | 740 | 741 | ``` 742 | [root@controller1 ~]# kubectl get nodes -o name | sed 's/^.*\///' 743 | worker1.example.com 744 | worker2.example.com 745 | [root@controller1 ~]# 746 | ``` 747 | 748 | Ok, so we know what to do! 749 | 750 | ``` 751 | [root@controller1 ~]# NODE_LIST=$(kubectl get nodes -o name | sed 's/^.*\///') 752 | 753 | [root@controller1 ~]# for node in $NODE_LIST; do echo ${node}; kubectl describe node ${node} | grep PodCIDR; echo "------------------"; done 754 | 755 | [root@controller1 ~]# kubectl describe node worker1.example.com | grep PodCIDR| tr -d '[[:space:]]' | cut -d ':' -f2 756 | 10.200.0.0/24 757 | [root@controller1 ~]# 758 | ``` 759 | 760 | We also need the network address of the worker node: 761 | ``` 762 | [root@controller1 ~]# kubectl describe node worker1.example.com | grep Addresses| tr -d '[[:space:]]' | cut -d ':' -f 2 | cut -d ',' -f 1 763 | 10.240.0.31 764 | [root@controller1 ~]# 765 | ``` 766 | 767 | 768 | 769 | ``` 770 | [root@controller1 ~]# for node in $NODE_LIST; do echo ${node}; echo -n "Network: " ; kubectl describe node ${node} | grep PodCIDR| tr -d '[[:space:]]' | cut -d ':' -f2; echo -n "Reachable through: "; kubectl describe node ${node} | grep Addresses| tr -d '[[:space:]]' | cut -d ':' -f 2 | cut -d ',' -f 1; echo "--------------------------------"; done 771 | worker1.example.com 772 | Network: 10.200.0.0/24 773 | Reachable through: 10.240.0.31 774 | -------------------------------- 775 | worker2.example.com 776 | Network: 10.200.1.0/24 777 | Reachable through: 10.240.0.32 778 | -------------------------------- 779 | [root@controller1 ~]# 780 | ``` 781 | 782 | 783 | We can use this information to add routes to our network router, which is my work computer in our case. 784 | 785 | ``` 786 | [root@kworkhorse ~]# route add -net 10.200.0.0 netmask 255.255.255.0 gw 10.240.0.31 787 | [root@kworkhorse ~]# route add -net 10.200.1.0 netmask 255.255.255.0 gw 10.240.0.32 788 | ``` 789 | 790 | ( I will automate this by making a script out of the above manual steps). 791 | 792 | 793 | Here is how my routing table looks like on my work computer: 794 | ``` 795 | [root@kworkhorse ~]# route -n 796 | Kernel IP routing table 797 | Destination Gateway Genmask Flags Metric Ref Use Iface 798 | 0.0.0.0 192.168.100.1 0.0.0.0 UG 600 0 0 wlp2s0 799 | 10.200.0.0 10.240.0.31 255.255.255.0 UG 0 0 0 virbr2 800 | 10.200.1.0 10.240.0.32 255.255.255.0 UG 0 0 0 virbr2 801 | 10.240.0.0 0.0.0.0 255.255.255.0 U 0 0 0 virbr2 802 | 172.16.0.0 0.0.0.0 255.255.0.0 U 0 0 0 virbr3 803 | 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0 804 | 172.18.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-8b79f8723f87 805 | 192.168.100.0 0.0.0.0 255.255.255.0 U 600 0 0 wlp2s0 806 | 192.168.124.0 0.0.0.0 255.255.255.0 U 0 0 0 virbr0 807 | [root@kworkhorse ~]# 808 | ``` 809 | 810 | ## Moment of truth! 
811 | Now one pod should be able to ping the other pod running on the other worker node: 812 | ``` 813 | [root@controller1 ~]# kubectl exec centos-multitool-3822887632-6qbrh -it -- bash 814 | 815 | [root@centos-multitool-3822887632-6qbrh /]# ping -c 1 10.200.1.1 816 | PING 10.200.1.1 (10.200.1.1) 56(84) bytes of data. 817 | 64 bytes from 10.200.1.1: icmp_seq=1 ttl=64 time=0.268 ms 818 | 819 | --- 10.200.1.1 ping statistics --- 820 | 1 packets transmitted, 1 received, 0% packet loss, time 0ms 821 | rtt min/avg/max/mdev = 0.268/0.268/0.268/0.000 ms 822 | 823 | 824 | [root@centos-multitool-3822887632-6qbrh /]# ping -c 1 10.200.0.1 825 | PING 10.200.0.1 (10.200.0.1) 56(84) bytes of data. 826 | 64 bytes from 10.200.0.1: icmp_seq=1 ttl=62 time=4.57 ms 827 | 828 | --- 10.200.0.1 ping statistics --- 829 | 1 packets transmitted, 1 received, 0% packet loss, time 0ms 830 | rtt min/avg/max/mdev = 4.570/4.570/4.570/0.000 ms 831 | 832 | 833 | [root@centos-multitool-3822887632-6qbrh /]# ping -c 1 10.200.0.2 834 | PING 10.200.0.2 (10.200.0.2) 56(84) bytes of data. 835 | 64 bytes from 10.200.0.2: icmp_seq=1 ttl=61 time=0.586 ms 836 | 837 | --- 10.200.0.2 ping statistics --- 838 | 1 packets transmitted, 1 received, 0% packet loss, time 0ms 839 | rtt min/avg/max/mdev = 0.586/0.586/0.586/0.000 ms 840 | [root@centos-multitool-3822887632-6qbrh /]# 841 | ``` 842 | 843 | Great! It works! 844 | Configuring the Kubernetes Client - Remote Access 845 | 846 | This is step 6 in Kelseys guide. 847 | 848 | This step is not entirely necessary, as we can just login directly on one of the controller nodes, and can still manage the cluster. 849 | 850 | This is a (To do) 851 | 852 | ## Download and Install kubectl on your local work computer 853 | Linux 854 | ``` 855 | wget https://storage.googleapis.com/kubernetes-release/release/v1.3.6/bin/linux/amd64/kubectl 856 | chmod +x kubectl 857 | sudo mv kubectl /usr/local/bin 858 | ``` 859 | 860 | -------------------------------------------------------------------------------- /chapter09.md: -------------------------------------------------------------------------------- 1 | # Chapter 9: Verify cluster status 2 | 3 | In this chapter we verify that various components of the cluster are setup correctly, and are working as expected. 4 | 5 | * short chapter. 6 | * Just verification of components. 7 | * What to expect in logs, etc. 8 | 9 | 10 | Run pods, deployments, etc. 11 | 12 | Tip: Verify pod accessibility using nodeport , because we have still not setup load balancer yet. 13 | 14 | 15 | 16 | -------------------------------------------------------------------------------- /chapter10.md: -------------------------------------------------------------------------------- 1 | # Chapter 10: Working with Kubernetes 2 | * Long chapter. 3 | * Setting up a work computer to use kubectl and talk to kubernetes master. 4 | * Creating a simple nginx RC/Deployment 5 | * Scaling a Deployment 6 | * Accessing a pod from within pod network, using pod IPs 7 | * Creating a service using cluster IP and accessing it from within pod network 8 | * Creating a service using external IP and accessing it from outside the cluster network and also outside of kubernetes cluster. 9 | * NFS mounts 10 | * Simple PHP/MySQL database example 11 | 12 | 13 | In this chapter, we do all sorts of Kubernetes deployments, scaling, upgrades, canary patterns, etc. 
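As a quick preview of the items above, a ClusterIP service in front of a small nginx deployment would look roughly like this (a hedged sketch; the names and port are illustrative, and the cluster IP assigned on your cluster will differ):

```
# Create a deployment and put a (default, ClusterIP type) service in front of it.
kubectl run nginx --image=nginx --port=80 --replicas=2
kubectl expose deployment nginx --port=80

# The service receives a virtual IP from the 10.32.0.0/24 service range.
kubectl get svc nginx

# Cluster IPs are only reachable from inside the cluster, so test from a pod
# (substitute a real pod name from 'kubectl get pods' and the IP shown above).
kubectl exec some-pod-name -- curl -s http://CLUSTER-IP-OF-NGINX/
```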
14 | -------------------------------------------------------------------------------- /chapter11.md: -------------------------------------------------------------------------------- 1 | # Chapter 11: Setup Load Balancer nodes - with Praqma Load Balancer 2 | 3 | * Setup load balancer nodes 4 | * Setup HA for the nodes. 5 | * Deploy praqma load balancer and show how it works , etc. 6 | 7 | 8 | This chapter deploys the praqma load balancer on the load balancer nodes. First we do it with NodePort, and then we do it with our load balancer. 9 | 10 | How kubctl will access the cluster from outside, will be covered in next chapter. 11 | 12 | 13 | # Smoke test - with NodePort 14 | 15 | Kelsey likes to do this smoke test at this point, using the **NodePort** method. We can do this now, but what we are really interested in, is to be able to access the services using IP addresses and not using fancy ports. 16 | 17 | First, we do it the node port way. 18 | 19 | To begin with, we needs some pods running a web server. We already have two pods running centos-multitool which also contains (and runs) apache web server. 20 | 21 | ``` 22 | [root@controller1 ~]# kubectl get pods -o wide 23 | NAME READY STATUS RESTARTS AGE IP NODE 24 | centos-multitool-3822887632-6qbrh 1/1 Running 1 1d 10.200.1.4 worker2.example.com 25 | centos-multitool-3822887632-jeyhb 1/1 Running 0 1d 10.200.0.2 worker1.example.com 26 | [root@controller1 ~]# 27 | ``` 28 | The deployment behind these pods is centos-multitool. 29 | 30 | ``` 31 | [root@controller1 ~]# kubectl get deployments -o wide 32 | NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE 33 | centos-multitool 2 2 2 2 1d 34 | [root@controller1 ~]# 35 | ``` 36 | 37 | 38 | If you don't have it running, or if you would like to run something else, such as a simple nginx web server, you can do that. Lets follow Kelsey's example: 39 | 40 | 41 | ``` 42 | [root@controller1 ~]# kubectl run nginx --image=nginx --port=80 --replicas=3 43 | deployment "nginx" created 44 | [root@controller1 ~]# 45 | ``` 46 | 47 | ``` 48 | [root@controller1 ~]# kubectl get pods -o wide 49 | NAME READY STATUS RESTARTS AGE IP NODE 50 | centos-multitool-3822887632-6qbrh 1/1 Running 1 1d 10.200.1.4 worker2.example.com 51 | centos-multitool-3822887632-jeyhb 1/1 Running 0 1d 10.200.0.2 worker1.example.com 52 | nginx-2032906785-a6pt5 1/1 Running 0 2m 10.200.0.4 worker1.example.com 53 | nginx-2032906785-foq6g 1/1 Running 0 2m 10.200.1.6 worker2.example.com 54 | nginx-2032906785-zbbkv 1/1 Running 0 2m 10.200.1.7 worker2.example.com 55 | [root@controller1 ~]# 56 | ``` 57 | 58 | 59 | Lets create a service out of this deployment: 60 | 61 | ``` 62 | [root@controller1 ~]# kubectl expose deployment nginx --type NodePort 63 | service "nginx" exposed 64 | [root@controller1 ~]# 65 | ``` 66 | **Note:** At this point `--type=LoadBalancer` will not work because we did not configure a cloud provider when bootstrapping this cluster. 67 | 68 | 69 | Extract the NodePort setup for this nginx service: 70 | ``` 71 | [root@controller1 ~]# NODE_PORT=$(kubectl get svc nginx --output=jsonpath='{range .spec.ports[0]}{.nodePort}') 72 | 73 | [root@controller1 ~]# echo $NODE_PORT 74 | 32133 75 | [root@controller1 ~]# 76 | ``` 77 | 78 | Lets try accessing this service using the port we have. We can use the IP address of any of the worker nodes to access this service using the NODE_PORT . 79 | 80 | ``` 81 | [root@controller1 ~]# curl http://10.240.0.31:32133 82 | 83 | 84 | 85 | Welcome to nginx! 86 | 93 | 94 | 95 |

<h1>Welcome to nginx!</h1> 96 | <p>If you see this page, the nginx web server is successfully installed and 97 | working. Further configuration is required.</p> 98 | 99 | <p>For online documentation and support please refer to 100 | <a href="http://nginx.org/">nginx.org</a>.<br/> 101 | Commercial support is available at 102 | <a href="http://nginx.com/">nginx.com</a>.</p> 103 | 104 | <p><em>Thank you for using nginx.</em></p>

105 | 106 | 107 | [root@controller1 ~]# 108 | ``` 109 | 110 | From the other worker node, this time using the worker node's DNS name: 111 | 112 | ``` 113 | [root@controller1 ~]# curl http://worker2.example.com:32133 114 | 115 | 116 | 117 | Welcome to nginx! 118 | 125 | 126 | 127 |

<h1>Welcome to nginx!</h1> 128 | <p>If you see this page, the nginx web server is successfully installed and 129 | working. Further configuration is required.</p> 130 | 131 | <p>For online documentation and support please refer to 132 | <a href="http://nginx.org/">nginx.org</a>.<br/> 133 | Commercial support is available at 134 | <a href="http://nginx.com/">nginx.com</a>.</p> 135 | 136 | <p><em>Thank you for using nginx.</em></p>

137 | 138 | 139 | [root@controller1 ~]# 140 | ``` 141 | 142 | So the node port method works! 143 | 144 | Thanks to correct routing setup, I can also access the nginx web server directly by using the IP address of the pod directly from my controller node: 145 | 146 | ``` 147 | [root@controller1 ~]# curl 10.200.0.4 148 | 149 | 150 | 151 | Welcome to nginx! 152 | 159 | 160 | 161 |

<h1>Welcome to nginx!</h1> 162 | <p>If you see this page, the nginx web server is successfully installed and 163 | working. Further configuration is required.</p> 164 | 165 | <p>For online documentation and support please refer to 166 | <a href="http://nginx.org/">nginx.org</a>.<br/> 167 | Commercial support is available at 168 | <a href="http://nginx.com/">nginx.com</a>.</p> 169 | 170 | <p><em>Thank you for using nginx.</em></p>

171 | 172 | 173 | [root@controller1 ~]# 174 | ``` 175 | ---------- 176 | 177 | # Smoke test - Praqma Load Balancer: 178 | 179 | (aka. The real deal!) 180 | 181 | First we need to have the haproxy package installed on this VM. Also make sure that iptables service is disabled and SELINUX is also disabled. You need to install nmap as well; it is used by the script. 182 | 183 | ``` 184 | [root@lb ~]# yum -y install haproxy git nmap 185 | ``` 186 | 187 | If there are some pods already running in the cluster, then it is a good time to ping them to make sure that the load balancer is able to reach the pods. We started some pods in the above section, so we should be able to ping them. First we obtain endpoints of the nginx service from the controller node. 188 | 189 | ``` 190 | [root@controller1 ~]#kubectl get endpoints nginx 191 | NAME ENDPOINTS AGE 192 | nginx 10.200.0.4:80,10.200.0.5:80,10.200.0.6:80 + 7 more... 2d 193 | [root@controller1 ~]# 194 | ``` 195 | 196 | You can also use curl to get a list of IPs in json form and then filter it: 197 | 198 | ``` 199 | [root@controller1 ~]# curl -s http://localhost:8080/api/v1/namespaces/default/endpoints/nginx | grep "ip" 200 | "ip": "10.200.0.4", 201 | "ip": "10.200.0.5", 202 | "ip": "10.200.0.6", 203 | "ip": "10.200.0.7", 204 | "ip": "10.200.0.8", 205 | "ip": "10.200.1.10", 206 | "ip": "10.200.1.6", 207 | "ip": "10.200.1.7", 208 | "ip": "10.200.1.8", 209 | "ip": "10.200.1.9", 210 | [root@controller1 ~]# 211 | ``` 212 | 213 | 214 | 215 | **Note:** jq can be used to parse json output! (More on this later. to do) 216 | ``` 217 | $ sudo yum -y install jq 218 | ``` 219 | 220 | 221 | We will use two ips from two different networks and see if we can ping them from our load balancer. If we are successful, it means our routing is setup correctly. 222 | 223 | ``` 224 | [root@lb ~]# ping -c 1 10.200.0.4 225 | PING 10.200.0.4 (10.200.0.4) 56(84) bytes of data. 226 | 64 bytes from 10.200.0.4: icmp_seq=1 ttl=63 time=0.960 ms 227 | 228 | --- 10.200.0.4 ping statistics --- 229 | 1 packets transmitted, 1 received, 0% packet loss, time 0ms 230 | rtt min/avg/max/mdev = 0.960/0.960/0.960/0.000 ms 231 | 232 | 233 | [root@lb ~]# ping -c 1 10.200.1.6 234 | PING 10.200.1.6 (10.200.1.6) 56(84) bytes of data. 235 | 64 bytes from 10.200.1.6: icmp_seq=1 ttl=63 time=1.46 ms 236 | 237 | --- 10.200.1.6 ping statistics --- 238 | 1 packets transmitted, 1 received, 0% packet loss, time 0ms 239 | rtt min/avg/max/mdev = 1.463/1.463/1.463/0.000 ms 240 | [root@lb ~]# 241 | ``` 242 | 243 | Clearly, we are able to ping pods from our load balancer. Good! 244 | 245 | Create a combined certificate and then move certificates to /var/lib/kubernetes/. 246 | ``` 247 | mkdir /var/lib/kubernetes/ 248 | cat /root/kubernetes.pem /root/kubernetes-key.pem > /root/kubernetes-combined.pem 249 | mv /root/*.pem /var/lib/kubernetes/ 250 | ``` 251 | 252 | Next, we need the load balancer script and config files. You can clone the entire LearnKubernetes repository somewhere on the load balancer's file system. (You need to have git on load balancer machine!) 253 | 254 | ``` 255 | [root@lb ~]# git clone https://github.com/Praqma/LearnKubernetes.git 256 | [root@lb ~]# cd LearnKubernetes/kamran/LoadBalancer-Files/ 257 | [root@lb LoadBalancer-Files]# 258 | ``` 259 | 260 | 261 | Next, we need to copy the loadbalancer.conf to /opt/ . 
262 | 263 | ``` 264 | [root@lb LoadBalancer-Files]# cp loadbalancer.conf /opt/ 265 | ``` 266 | 267 | And copy loadbalancer.sh to /usr/local/bin/ : 268 | ``` 269 | [root@lb LoadBalancer-Files]# cp loadbalancer.sh.cidr /usr/local/bin/loadbalancer.sh 270 | ``` 271 | 272 | 273 | Now, edit the loadbalancer.conf file and adjust it as following: 274 | ``` 275 | [root@lb LoadBalancer-Files]# vi /opt/loadbalancer.conf 276 | # This file contains the necessary information for loadbalancer script to work properly. 277 | # This IP / interface will never be shutdown. 278 | LB_PRIMARY_IP=10.240.0.200 279 | # LB_DATABASE=/opt/LoadBalancer.sqlite.db 280 | LB_LOG_FILE=/var/log/loadbalancer.log 281 | # IP Address of the Kubernetes master node. 282 | MASTER_IP=10.240.0.21 283 | # The user on master node, which is allowed to run the kubectl commands. This user needs to have the public RSA key from the root 284 | # user at load balancer in it's authorized keys file. 285 | MASTER_USER=root 286 | PRODUCTION_HAPROXY_CONFIG=/etc/haproxy/haproxy.cfg 287 | ``` 288 | 289 | 290 | Time to generate RSA key pair for user root at loadbalancer VM. Then, we will copy the public key of the RSA keypair to the authorized_keys file of root user on the controller nodes. 291 | 292 | ``` 293 | [root@lb LoadBalancer-Files]# ssh-keygen -t rsa -N '' 294 | ``` 295 | 296 | ``` 297 | [root@lb LoadBalancer-Files]# ssh-copy-id root@controller1 298 | The authenticity of host 'controller1 (10.240.0.21)' can't be established. 299 | ECDSA key fingerprint is 84:5d:ae:17:17:07:06:46:b6:7d:69:2f:32:25:50:d0. 300 | Are you sure you want to continue connecting (yes/no)? yes 301 | /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed 302 | /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys 303 | root@controller1's password: 304 | 305 | Number of key(s) added: 1 306 | 307 | Now try logging into the machine, with: "ssh 'root@controller1'" 308 | and check to make sure that only the key(s) you wanted were added. 309 | 310 | [root@lb LoadBalancer-Files]# 311 | ``` 312 | 313 | Verify that the passwordless login works from load balancer to the controller: 314 | 315 | ``` 316 | [root@lb LoadBalancer-Files]# ssh root@controller1 hostname 317 | controller1.example.com 318 | [root@lb LoadBalancer-Files]# 319 | ``` 320 | 321 | 322 | Now, make sure that the service you are interested in has an external IP specified in Kubernetes. If it does not exist, delete the service and recreate it. 323 | ``` 324 | [root@controller1 ~]# kubectl delete service nginx 325 | service "nginx" deleted 326 | 327 | [root@controller1 ~]# kubectl expose deployment nginx --external-ip=10.240.0.2 328 | service "nginx" exposed 329 | [root@controller1 ~]# 330 | ``` 331 | 332 | Then, run the loadbalancer.sh program. First, in show mode and then in create mode. 333 | 334 | **show mode:** 335 | ``` 336 | [root@lb LoadBalancer-Files]# loadbalancer.sh show 337 | 338 | Beginning execution of main program - in show mode... 339 | 340 | Showing status of service: haproxy 341 | ---------------------------------- 342 | ● haproxy.service - HAProxy Load Balancer 343 | Loaded: loaded (/usr/lib/systemd/system/haproxy.service; disabled; vendor preset: disabled) 344 | Active: inactive (dead) 345 | 346 | 347 | Starting Sanity checks ... 348 | 349 | Checking if kubernetes master 10.240.0.21 is reachable over SSH ...Yes! 
350 | Success connecting to Kubernetes master 10.240.0.21 on port 22. 351 | 352 | Running command 'uptime' as user root on Kubernetes Master 10.240.0.21. 353 | 354 | 13:58:44 up 2:42, 1 user, load average: 0.00, 0.00, 0.00 355 | 356 | Running command 'kubectl get cs' as user root on Kubernetes Master 10.240.0.21. 357 | 358 | NAME STATUS MESSAGE ERROR 359 | scheduler Healthy ok 360 | controller-manager Healthy ok 361 | etcd-1 Healthy {"health": "true"} 362 | etcd-0 Healthy {"health": "true"} 363 | 364 | Sanity checks completed successfully! 365 | 366 | Following services were found with external IPs - on Kubernetes master ... 367 | ==================================================================================================== 368 | default nginx 10.32.0.230 80/TCP 3d 369 | 370 | Here are Top 10 IPs from the available pool: 371 | -------------------------------------------- 372 | 10.240.0.2 373 | 10.240.0.3 374 | 10.240.0.4 375 | 10.240.0.5 376 | 10.240.0.6 377 | 10.240.0.7 378 | 10.240.0.8 379 | 10.240.0.9 380 | 10.240.0.10 381 | 10.240.0.13 382 | 383 | 384 | oooooooooooooooooooo Show load balancer configuration and status. - Operation completed. oooooooooooooooooooo 385 | Logs are in: /var/log/loadbalancer.log 386 | 387 | TODO: 388 | ----- 389 | * - Use [root@loadbalancer ~]# curl -k -s -u vagrant:vagrant https://10.245.1.2/api/v1/namespaces/default/endpoints/apache | grep ip 390 | The above is better to use instead of getting endpoints from kubectl, because kubectl only shows 2-3 endpoints and says +XX more... 391 | * - Create multiple listen sections depending on the ports of a service. such as 80, 443 for web servers. This may be tricky. Or there can be two bind commands in one listen directive/section. 392 | * - Use local kubectl instead of SSHing into Master 393 | 394 | [root@lb LoadBalancer-Files]# 395 | ``` 396 | 397 | 398 | 399 | **create mode:** 400 | ``` 401 | [root@lb LoadBalancer-Files]# loadbalancer.sh create 402 | 403 | Beginning execution of main program - in create mode... 404 | 405 | Acquiring program lock with PID: 27196 , in lock file: /var/lock/loadbalancer 406 | 407 | Starting Sanity checks ... 408 | 409 | Checking if kubernetes master 10.240.0.21 is reachable over SSH ...Yes! 410 | Success connecting to Kubernetes master 10.240.0.21 on port 22. 411 | 412 | Running command 'uptime' as user root on Kubernetes Master 10.240.0.21. 413 | 414 | 14:04:56 up 2:48, 1 user, load average: 0.00, 0.01, 0.00 415 | 416 | Running command 'kubectl get cs' as user root on Kubernetes Master 10.240.0.21. 417 | 418 | NAME STATUS MESSAGE ERROR 419 | scheduler Healthy ok 420 | controller-manager Healthy ok 421 | etcd-1 Healthy {"health": "true"} 422 | etcd-0 Healthy {"health": "true"} 423 | 424 | Sanity checks completed successfully! 425 | 426 | Following services were found with external IPs - on Kubernetes master ... 
427 | ==================================================================================================== 428 | default nginx 10.32.0.237 10.240.0.2 80/TCP 42s 429 | -----> Creating HA proxy section: default-nginx-80 430 | listen default-nginx-80 431 | bind 10.240.0.2:80 432 | server pod-1 10.200.0.4:80 check 433 | server pod-2 10.200.0.5:80 check 434 | server pod-3 10.200.0.6:80 check 435 | server pod-4 10.200.0.7:80 check 436 | server pod-5 10.200.0.8:80 check 437 | server pod-6 10.200.1.10:80 check 438 | server pod-7 10.200.1.6:80 check 439 | server pod-8 10.200.1.7:80 check 440 | server pod-9 10.200.1.8:80 check 441 | server pod-10 10.200.1.9:80 check 442 | 443 | Comparing generated (haproxy) config with running config ... 444 | 445 | 20c20 446 | < bind 10.240.0.2:80 447 | --- 448 | > bind :80 449 | 450 | The generated and running (haproxy) config files differ. Replacing the running haproxy file with the newly generated one, and reloading haproxy service ... 451 | 452 | Checking/managing HA Proxy service ... 453 | HA Proxy process was not running on this system. Starting the service ... Successful. 454 | 455 | Aligning IP addresses on eth0... 456 | Adding IP address 10.240.0.2 to the interface eth0. 457 | 458 | Here is the final status of the network interface eth0 : 459 | --------------------------------------------------------------------------------------- 460 | 2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000 461 | link/ether 52:54:00:36:27:7d brd ff:ff:ff:ff:ff:ff 462 | inet 10.240.0.200/24 brd 10.240.0.255 scope global eth0 463 | valid_lft forever preferred_lft forever 464 | inet 10.240.0.2/24 scope global secondary eth0 465 | valid_lft forever preferred_lft forever 466 | inet6 fe80::5054:ff:fe36:277d/64 scope link 467 | valid_lft forever preferred_lft forever 468 | --------------------------------------------------------------------------------------- 469 | 470 | Releasing progarm lock: /var/lock/loadbalancer 471 | 472 | oooooooooooooooooooo Create haproxy configuration. - Operation completed. oooooooooooooooooooo 473 | Logs are in: /var/log/loadbalancer.log 474 | 475 | TODO: 476 | ----- 477 | * - Use [root@loadbalancer ~]# curl -k -s -u vagrant:vagrant https://10.245.1.2/api/v1/namespaces/default/endpoints/apache | grep ip 478 | The above is better to use instead of getting endpoints from kubectl, because kubectl only shows 2-3 endpoints and says +XX more... 479 | * - Create multiple listen sections depending on the ports of a service. such as 80, 443 for web servers. This may be tricky. Or there can be two bind commands in one listen directive/section. 
480 | * - Use local kubectl instead of SSHing into Master 481 | 482 | [root@lb LoadBalancer-Files]# 483 | ``` 484 | 485 | After running the loadbalancer in the create mode, the resultant `/etc/haproxy/haproxy.conf` file looks like this: 486 | 487 | ``` 488 | [root@lb LoadBalancer-Files]# cat /etc/haproxy/haproxy.cfg 489 | global 490 | log 127.0.0.1 local2 491 | chroot /var/lib/haproxy 492 | pidfile /var/run/haproxy.pid 493 | maxconn 4000 494 | user haproxy 495 | group haproxy 496 | daemon 497 | 498 | # turn on stats unix socket 499 | stats socket /var/lib/haproxy/stats 500 | 501 | defaults 502 | mode http 503 | timeout connect 5000ms 504 | timeout client 50000ms 505 | timeout server 50000ms 506 | 507 | listen default-nginx-80 508 | bind 10.240.0.2:80 509 | server pod-1 10.200.0.4:80 check 510 | server pod-2 10.200.0.5:80 check 511 | server pod-3 10.200.0.6:80 check 512 | server pod-4 10.200.0.7:80 check 513 | server pod-5 10.200.0.8:80 check 514 | server pod-6 10.200.1.10:80 check 515 | server pod-7 10.200.1.6:80 check 516 | server pod-8 10.200.1.7:80 check 517 | server pod-9 10.200.1.8:80 check 518 | server pod-10 10.200.1.9:80 check 519 | [root@lb LoadBalancer-Files]# 520 | ``` 521 | 522 | 523 | You can also re-run the loadbalancer in the **show** mode, just to be sure: 524 | 525 | ``` 526 | [root@lb LoadBalancer-Files]# loadbalancer.sh show 527 | 528 | Beginning execution of main program - in show mode... 529 | 530 | Showing status of service: haproxy 531 | ---------------------------------- 532 | ● haproxy.service - HAProxy Load Balancer 533 | Loaded: loaded (/usr/lib/systemd/system/haproxy.service; disabled; vendor preset: disabled) 534 | Active: inactive (dead) 535 | 536 | Sep 19 14:00:13 lb.example.com haproxy-systemd-wrapper[27151]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds 537 | Sep 19 14:00:14 lb.example.com haproxy-systemd-wrapper[27151]: [ALERT] 262/140013 (27158) : parsing [/etc/haproxy/haproxy.cfg:20] : 'bind' : invalid address: '' in ':80' 538 | Sep 19 14:00:14 lb.example.com haproxy-systemd-wrapper[27151]: [ALERT] 262/140013 (27158) : Error(s) found in configuration file : /etc/haproxy/haproxy.cfg 539 | Sep 19 14:00:14 lb.example.com haproxy-systemd-wrapper[27151]: [ALERT] 262/140014 (27158) : Fatal errors found in configuration. 540 | Sep 19 14:00:14 lb.example.com haproxy-systemd-wrapper[27151]: haproxy-systemd-wrapper: exit, haproxy RC=256 541 | Sep 19 14:04:57 lb.example.com systemd[1]: Started HAProxy Load Balancer. 542 | Sep 19 14:04:57 lb.example.com systemd[1]: Starting HAProxy Load Balancer... 543 | Sep 19 14:04:57 lb.example.com haproxy-systemd-wrapper[27252]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds 544 | Sep 19 14:04:57 lb.example.com haproxy-systemd-wrapper[27252]: [ALERT] 262/140457 (27254) : Starting proxy default-nginx-80: cannot bind socket [10.240.0.2:80] 545 | Sep 19 14:04:57 lb.example.com haproxy-systemd-wrapper[27252]: haproxy-systemd-wrapper: exit, haproxy RC=256 546 | 547 | 548 | Starting Sanity checks ... 549 | 550 | Checking if kubernetes master 10.240.0.21 is reachable over SSH ...Yes! 551 | Success connecting to Kubernetes master 10.240.0.21 on port 22. 552 | 553 | Running command 'uptime' as user root on Kubernetes Master 10.240.0.21. 554 | 555 | 14:10:54 up 2:54, 1 user, load average: 0.02, 0.03, 0.00 556 | 557 | Running command 'kubectl get cs' as user root on Kubernetes Master 10.240.0.21. 
NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-1               Healthy   {"health": "true"}
etcd-0               Healthy   {"health": "true"}

Sanity checks completed successfully!

Following services were found with external IPs - on Kubernetes master ...
====================================================================================================
default       nginx        10.32.0.237   10.240.0.2   80/TCP    6m

Here are Top 10 IPs from the available pool:
--------------------------------------------
10.240.0.3
10.240.0.4
10.240.0.5
10.240.0.6
10.240.0.7
10.240.0.8
10.240.0.9
10.240.0.10
10.240.0.13
10.240.0.14


oooooooooooooooooooo Show load balancer configuration and status. - Operation completed. oooooooooooooooooooo
Logs are in: /var/log/loadbalancer.log

TODO:
-----
* - Use [root@loadbalancer ~]# curl -k -s -u vagrant:vagrant https://10.245.1.2/api/v1/namespaces/default/endpoints/apache | grep ip
    The above is better than getting the endpoints from kubectl, because kubectl only shows 2-3 endpoints and then says "+XX more...".
* - Create multiple listen sections depending on the ports of a service, such as 80 and 443 for web servers. This may be tricky. Or, there can be two bind commands in one listen directive/section.
* - Use local kubectl instead of SSHing into Master

[root@lb LoadBalancer-Files]#
```


## Moment of truth:
Access the service from some other machine, such as my work computer:

```
[kamran@kworkhorse ~]$ curl http://10.240.0.2
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
[kamran@kworkhorse ~]$
```

**It works!** Hurray!




--------------------------------------------------------------------------------
/chapter12.md:
--------------------------------------------------------------------------------
# Chapter 12: Kubectl from outside the network
* Show how kubectl will access the controller nodes.

This has something to do with the HA we set up at the controller nodes, and the HA we set up at the proxy level.
Also specify how kubectl will access the controller nodes from outside the cluster. This is especially important, because we need to decide on the edge router whether the incoming traffic for port 6443 will go to the load balancer VIP or to the VIP of the controller nodes.


--------------------------------------------------------------------------------
/chapter13.md:
--------------------------------------------------------------------------------
# Chapter 13: Deploying the cluster add-on: DNS (skydns)

The DNS add-on is required for every Kubernetes cluster. (I wonder why it is not part of core Kubernetes!) Without the DNS add-on, the following things will not work:

* DNS based service discovery
* DNS lookups from containers running in pods

## Create kubedns service

```
kubectl create -f https://raw.githubusercontent.com/kelseyhightower/kubernetes-the-hard-way/master/services/kubedns.yaml
```

Verification:
```
[root@controller1 ~]# kubectl get svc --namespace=kube-system
NAME       CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
kube-dns   10.32.0.10   <none>        53/UDP,53/TCP   2s
[root@controller1 ~]#
```

## Create the kubedns deployment:
```
kubectl create -f https://raw.githubusercontent.com/kelseyhightower/kubernetes-the-hard-way/master/deployments/kubedns.yaml
```


## Verification & Validation

### Verification
```
kubectl --namespace=kube-system get pods
```

```
[root@controller1 ~]# kubectl --namespace=kube-system get pods -o wide
NAME                           READY     STATUS    RESTARTS   AGE       IP           NODE
kube-dns-v19-965658604-1jq36   3/3       Running   2          33m       10.200.1.5   worker2.example.com
kube-dns-v19-965658604-oyws2   3/3       Running   0          33m       10.200.0.3   worker1.example.com
[root@controller1 ~]#
```

(todo) I wonder why one pod had two restarts!
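
If you want to chase down where those restarts came from, the pod's event history and the logs of the previously terminated container are the first places to look. A minimal sketch, reusing the pod name from my output above - adjust the pod and container names to match yours (the kube-dns-v19 pod normally contains the containers kubedns, dnsmasq and healthz):

```
# List the containers in the pod and show its recent events; check the
# "Last State" / "Reason" of each container and the Events section at the end:
kubectl --namespace=kube-system describe pod kube-dns-v19-965658604-1jq36

# Fetch the logs of the previously terminated instance of a container,
# e.g. the kubedns container - this usually shows why it was restarted:
kubectl --namespace=kube-system logs kube-dns-v19-965658604-1jq36 --previous -c kubedns
```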

## Validation that DNS is actually working - using a pod:

```
[root@controller1 ~]# kubectl exec centos-multitool-3822887632-6qbrh -it -- bash
```

First, confirm that the pod has the correct DNS setup in its `/etc/resolv.conf` file:

```
[root@centos-multitool-3822887632-6qbrh /]# cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local example.com
nameserver 10.32.0.10
options ndots:5
[root@centos-multitool-3822887632-6qbrh /]#
```

Now check if it resolves the service name registered with the Kubernetes internal DNS:
```
[root@centos-multitool-3822887632-6qbrh /]# dig kubernetes.default.svc.cluster.local

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> kubernetes.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1090
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;kubernetes.default.svc.cluster.local. IN A

;; ANSWER SECTION:
kubernetes.default.svc.cluster.local. 22 IN A 10.32.0.1

;; Query time: 9 msec
;; SERVER: 10.32.0.10#53(10.32.0.10)
;; WHEN: Fri Sep 16 10:42:07 UTC 2016
;; MSG SIZE  rcvd: 81

[root@centos-multitool-3822887632-6qbrh /]#
```
Great! So it is able to resolve the internal service names. Let's see if it can also resolve hostnames outside this cluster.

```
[root@centos-multitool-3822887632-6qbrh /]# dig yahoo.com

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> yahoo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55948
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 6, ADDITIONAL: 11

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;yahoo.com. IN A

;; ANSWER SECTION:
yahoo.com. 23 IN A 206.190.36.45
yahoo.com. 23 IN A 98.139.183.24
yahoo.com. 23 IN A 98.138.253.109

;; AUTHORITY SECTION:
yahoo.com. 10714 IN NS ns4.yahoo.com.
yahoo.com. 10714 IN NS ns3.yahoo.com.
yahoo.com. 10714 IN NS ns5.yahoo.com.
yahoo.com. 10714 IN NS ns2.yahoo.com.
yahoo.com. 10714 IN NS ns6.yahoo.com.
yahoo.com. 10714 IN NS ns1.yahoo.com.

;; ADDITIONAL SECTION:
ns5.yahoo.com. 258970 IN A 119.160.247.124
ns4.yahoo.com. 327698 IN A 98.138.11.157
ns2.yahoo.com. 333473 IN A 68.142.255.16
ns1.yahoo.com. 290475 IN A 68.180.131.16
ns6.yahoo.com. 10714 IN A 121.101.144.139
ns3.yahoo.com. 325559 IN A 203.84.221.53
ns2.yahoo.com. 32923 IN AAAA 2001:4998:140::1002
ns1.yahoo.com. 8839 IN AAAA 2001:4998:130::1001
ns6.yahoo.com. 158315 IN AAAA 2406:2000:108:4::1006
ns3.yahoo.com. 6892 IN AAAA 2406:8600:b8:fe03::1003

;; Query time: 23 msec
;; SERVER: 10.32.0.10#53(10.32.0.10)
;; WHEN: Fri Sep 16 11:12:12 UTC 2016
;; MSG SIZE  rcvd: 402

[root@centos-multitool-3822887632-6qbrh /]#
```
It clearly does so!
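
As a final check of DNS based service discovery, it is worth resolving a service you created yourself, and not just the built-in `kubernetes` service. A minimal sketch, run from inside the same multitool pod, assuming the `nginx` service from the earlier chapters still exists in the `default` namespace; the answer should be the service's cluster IP (10.32.0.237 in my case, as seen in the loadbalancer output earlier):

```
# Fully qualified service name; +short prints only the A record:
dig +short nginx.default.svc.cluster.local

# The shorter forms work too, thanks to the search domains and the
# "ndots:5" option in the pod's /etc/resolv.conf shown above:
dig +short nginx.default
dig +short nginx
```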
136 | 137 | 138 | ------- 139 | -------------------------------------------------------------------------------- /chapter14.md: -------------------------------------------------------------------------------- 1 | # Chapter 14: Monitoring and Alerting 2 | * Some Visualizers (CAdvisor, fedora CockPit, kubernetes visualizer, etc) 3 | * Alerting? 4 | -------------------------------------------------------------------------------- /images/Kubernetes-BareMetal-Cluster-setup.dia: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/Kubernetes-BareMetal-Cluster-setup.dia -------------------------------------------------------------------------------- /images/Kubernetes-BareMetal-Cluster-setup.dia~: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/Kubernetes-BareMetal-Cluster-setup.dia~ -------------------------------------------------------------------------------- /images/Kubernetes-BareMetal-Cluster-setup.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/Kubernetes-BareMetal-Cluster-setup.png -------------------------------------------------------------------------------- /images/libvirt-new-virtual-network-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/libvirt-new-virtual-network-1.png -------------------------------------------------------------------------------- /images/libvirt-new-virtual-network-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/libvirt-new-virtual-network-2.png -------------------------------------------------------------------------------- /images/libvirt-new-virtual-network-3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/libvirt-new-virtual-network-3.png -------------------------------------------------------------------------------- /images/libvirt-new-virtual-network-4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/libvirt-new-virtual-network-4.png -------------------------------------------------------------------------------- /images/libvirt-new-virtual-network-5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/libvirt-new-virtual-network-5.png -------------------------------------------------------------------------------- /images/libvirt-new-virtual-network-6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/libvirt-new-virtual-network-6.png 
-------------------------------------------------------------------------------- /images/libvirt-new-vm-01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/libvirt-new-vm-01.png -------------------------------------------------------------------------------- /images/libvirt-new-vm-02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/libvirt-new-vm-02.png -------------------------------------------------------------------------------- /images/libvirt-new-vm-03.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/libvirt-new-vm-03.png -------------------------------------------------------------------------------- /images/libvirt-new-vm-04.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/libvirt-new-vm-04.png -------------------------------------------------------------------------------- /images/libvirt-new-vm-05.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/libvirt-new-vm-05.png -------------------------------------------------------------------------------- /images/libvirt-new-vm-06.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/libvirt-new-vm-06.png -------------------------------------------------------------------------------- /images/libvirt-new-vm-07.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/libvirt-new-vm-07.png -------------------------------------------------------------------------------- /images/libvirt-new-vm-08.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/libvirt-new-vm-08.png -------------------------------------------------------------------------------- /images/libvirt-new-vm-09.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Praqma/kubernetes-ebook/1b9b78d8bb476396723fcede0af6e91a179f5cb0/images/libvirt-new-vm-09.png -------------------------------------------------------------------------------- /outline.md: -------------------------------------------------------------------------------- 1 | # Preface: 2 | This book is inspired by Kelsey Hightower's work on Kubernetes. It tries to address areas which Kelsey's guide ("Kubernetes the Hard Way") does not cover, or are not explained well - or so I understood. 3 | 4 | 5 | This ebook will show you how to setup a Kubernetes cluster on bare-metal. The setup may as well consist of VMs instead of real bare-metal machines. 
You can also implement the same concepts on the AWS and GCE clouds, or any other cloud for that matter, such as Digital Ocean, Zetta, etc. - except for the HA bits.


**Note:** The order of the chapters can change. This outline will change heavily in the coming days / weeks.

# [Chapter 1: What is Kubernetes?](chapter01.md)
* (and why not plain Docker, or Docker Swarm, etc.?)
* Concepts and terminology of Kubernetes.

# [Chapter 2: Selection of infrastructure and network technologies](chapter02.md)
* Write about what type of hardware is needed. If not physical hardware, then what size of VMs is needed, etc.
* Discuss what type of network technologies we are going to use, such as flannel or CIDR, etc.
* This will be a relatively short chapter.

# [Chapter 3: Provisioning of machines, Network setup](chapter03.md)
* Here we provision our machines, and also set up networking.
* This will have a couple of diagrams.

# [Chapter 4: SSL certificates](chapter04.md)

# [Chapter 5: etcd nodes](chapter05.md)
* Talk about what etcd is and how to set it up, including its installation.
* Also show how to set up etcd in HA mode.

# [Chapter 6: Kubernetes Master/Controller nodes](chapter06.md)
* Talk about how the Kubernetes master node is set up.
* Also talk about HA for controller nodes.
* Include access control and Authentication/Authorization, etc.


# [Chapter 7: HA for Kubernetes Control Plane](chapter07.md)
* Here we set up Corosync/Pacemaker to provide HA to Kubernetes.

# [Chapter 8: Kubernetes Worker nodes](chapter08.md)
* Set up Kubernetes worker nodes.
* Including Docker.
* Set up networking (CNI/CIDR).
* Set up remote access with kubectl.


# [Chapter 9: Verify various cluster components](chapter09.md)
* A short chapter.
* Just verification of components.
* What to expect in logs, etc.

# [Chapter 10: Working with Kubernetes](chapter10.md)
* Setting up a work computer to use kubectl and talk to the Kubernetes master.
* Creating a simple nginx RC/Deployment.
* Scaling a Deployment.
* Accessing a pod from within the pod network, using pod IPs.
* Creating a service using a cluster IP and accessing it from within the pod network.
* Creating a service using an external IP and accessing it from outside the cluster network, and also from outside the Kubernetes cluster.

# [Chapter 11: Praqma Load Balancer/Traefik - with HA](chapter11.md)

# [Chapter 12: Accessing HA Kubernetes service from outside the network](chapter12.md)

# [Chapter 13: Cluster add-ons](chapter13.md)

# [Chapter 14: Monitoring and Alerting](chapter14.md)
* Some Visualizers (CAdvisor, Fedora Cockpit, Kubernetes visualizer, etc.)
* Alerting?

[Appendix A: DNS (dnsmasq)](appendix-a.md)

--------------------------------------------------------------------------------