# Kubernetes Documentation

## Disclaimer before we go deeper into this rabbit hole

The first time I started looking into virtualization, Docker, Kubernetes and so on, it was for work... about a year ago.

I happened to like it, to the point where I have now built a nice Kubernetes cluster at home, which can always be improved (and will be improved).

I say all that because I want one thing to be very clear: the main goal of this document is to help you build your homelab, and nothing more. I won't go into the details of how to set up a full-blown, production-ready cluster... That said, we won't be very far from that, and it will be more than enough for all your homelab needs.

Second disclaimer: English isn't my main language, so if there is any mistake let me know.

Third disclaimer: this tutorial is made for Linux and Mac based environments; on Windows some commands might differ.

Last disclaimer: I assume you already have basic knowledge of IT, virtualization and containers, so I take some shortcuts here and there.

## So where do we start? - VMs setup!

(Disclaimer: I could use Ansible to automate this whole section, but I'll keep it manual for now; more on that below.)

First things first, we need machines, hardware, something that we can use, possibly destroy, mess with, etc., you get the idea.
For this purpose I want to use virtual machines, as it is easy to recover from an issue and avoid mistakes while we are learning.
The advantage here is that once we know this setup works on a set of virtual machines, we can replicate it on real machines and it should be fine.

There is a TON of ways to set up virtual machines; to name just a few, VirtualBox, VMware, Proxmox and Terraform are all names you should be familiar with.
In 2020 one would set everything up with Terraform and Ansible, as that tech is made exactly for what we are about to do; however, in order not to make things more complicated for beginners, I won't use it here for now.
Feel free to set up your environment this way if you know how to do it; for everyone else, let's continue.

So for this demo I will use VMware and set everything up manually. Feel free to use VirtualBox, as it is completely free and does the exact same job; I just use VMware because I have a licence for it.

The first step is to create three machines, completely identical (you can decide if you want more RAM or CPU than what I show here, it's up to you).
Those three machines will run our cluster.

Each machine I created is set up like this:
- OS: Ubuntu Server, available at https://ubuntu.com/download/server (choose the manual install option and download the ISO, then use it to create your VM)
- 2 CPU cores
- 4GB of RAM
- 20GB of disk

It is clearly overkill for what we are doing; I have the hardware to handle it so there is no issue here, but your mileage may vary. Feel free to adjust each machine to your own computer's hardware.
Also remember we will be running the three machines at the same time, so make sure your computer can handle everything before starting.

Now you can install Ubuntu Server on each machine. It will ask you a bunch of questions about the machines; just make sure to install OpenSSH during the installation process, so we can remote into the machines.
This step is very important and necessary, so pay attention when the installer asks you! AND SAY YES!

You can name each machine as you wish, just remember that one machine will be the master and the other two will be the slaves.

I named my machines like this:
- KubernetesMaster
- KubernetesSlave1
- KubernetesSlave2

Each machine will be given an IP by your virtualization platform of choice; just make sure to take note of these IPs as they will be necessary in a minute.

At this point each machine should be up and running and you should see a nice login screen (by nice I mean literally just the word "login", but that's to be expected).

Now what we want to do is add a user that will be responsible for the cluster setup. For demonstration purposes we will create a user called k3 (spoiler alert: if you are familiar with kube you know where this is going ;) ).

In order to do so we must log into each machine and create the user. So that we don't have to handle each machine through the virtualization interface, I recommend you get a terminal (command line on Windows), which will make our life much easier. On Mac I use iTerm2, but feel free to use whatever you want.

Now we have to remote into each machine from our terminal (here is where SSH and the IPs come into play):
```shell script
ssh <your-user>@<machine-ip>
```
(Of course, replace the values marked with angle brackets with your own values.)

Side note: to terminate an SSH connection you must type 'exit' into the terminal.

In order to create the user k3 we must run the following command (on each machine):
```shell script
sudo adduser k3
```
This command will ask for your password and then for the k3 user's password. I've set mine to 'test' but you can choose whatever you want.

Now we must allow this user to have admin rights when necessary without having to type the password all the time (we won't keep this for very long, just for the initial setup, as it saves us some time).
In order to do so you must run the following:
```shell script
sudo visudo
```

You are now taken to a very important file (do not erase anything here); this file will allow k3 to run sudo commands without a password.
What you want to do now is go to the very end and add the following line (again, on all machines):
```shell script
k3 ALL=(ALL) NOPASSWD:ALL
```

One last thing we must do is make it possible to SSH into our machines without having to type k3's password every time we connect.
In order to do so, you must exit the SSH connection by typing exit in the terminal; this kills the connection to that machine and brings you back to your own computer.
From your computer you can now run (again, three times) the following command:
```shell script
ssh-copy-id k3@<machine-ip>
```
This will ask you some questions and a password (k3's password). Once everything goes well you can SSH into any of the three machines (as k3) without having to type k3's password.

OK! So now we are at the point where things are ready! We have three machines, all set up with a k3 account that can do pretty much everything on them (I know, security-wise it's not the best, but those permissions will be revoked after the setup).
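If you want to double-check that part before moving on, here is a quick sanity check you can run from your own computer (a sketch; the IP is a placeholder for each of your machines). If it comes back without any password prompt, both the SSH key and the NOPASSWD rule are in place:
```shell script
# should log in with the copied key, run sudo without prompting, and print "root"
ssh k3@<machine-ip> 'sudo whoami'
```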
## Installing kubectl

Kubectl is the tool used to access and run commands on a Kubernetes cluster. It is essential that you have it, as it is THE tool for doing pretty much everything on your cluster.
You can follow this link in order to install the tool on your machine: https://kubernetes.io/docs/tasks/tools/install-kubectl/

## Installing Kubernetes on our machines

The Kubernetes flavour we will set up today is K3s. There are tons of other versions out there, but I found after extensive research that K3s has many advantages: first of all it is very easy to install, it has a very small footprint, and it is backed by Rancher, which is a big company in the Kubernetes space. All that aside, we are also lucky because someone already made a pretty cool tool for us to use in order to install Kubernetes with a very simple command.

The tool we will use for the job is k3sup (https://github.com/alexellis/k3sup). I invite you to check its README, but if you don't have time I will put the important parts here.

The first step is the installation of the k3sup tool on our computer (not on the VMs). To do so you need the following commands:
```shell script
curl -sLS https://get.k3sup.dev | sh
sudo install k3sup /usr/local/bin/
```

Once this is done we can proceed and install K3s on one of our virtual machines. This machine will become the master, so choose carefully.
In order to install K3s you must run the following command:
```shell script
k3sup install --ip <master-ip> --user k3 --k3s-extra-args '--no-deploy traefik'
```
Side note: as you can see, I pass --k3s-extra-args '--no-deploy traefik'. This is because I do not want to install the Traefik that comes with K3s, as it is version 1.7 and we are now at 2.3.X, which brings a bunch of amazing features that 1.7 lacks.

You should get a message letting you know that everything is set up correctly, along with some commands written out for you to test it.
Basically it tells you to export the kubeconfig file to a variable and then run
```shell script
kubectl get nodes
```
Here you should see the node you just installed.

Now we must add the two other machines to our cluster so that we have a nice multi-node cluster running our things.
K3sup comes in handy once again, as there is a simple command to do just that:
```shell script
k3sup join --ip <new-machine-ip> --server-ip <master-ip> --user k3
```
Run this command for both new machines that you want to add to your cluster.

Feel free to add as many machines as necessary. I only did three in this example, but technically you can go as high as your hardware allows.
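Before moving on, it's worth checking that all three machines actually joined. A minimal check, assuming k3sup dropped the kubeconfig file in the directory you ran it from (it prints the exact path and export command when it finishes):
```shell script
# point kubectl at the kubeconfig k3sup generated (adjust the path if yours differs)
export KUBECONFIG=$(pwd)/kubeconfig
# all three machines should show up here, the master with a control-plane/master role
kubectl get nodes -o wide
```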
## Now onto installing stuff to make everything work

(Side note: I won't use Helm in this tutorial, as I want to be able to understand and control everything that is happening. This approach helped me understand Kubernetes instead of relying on the magic Helm brings. I believe this approach is better while learning, and once you fully understand what is happening, feel free to use Helm.)

We are almost at the end of this small doc (I imagine, because I still haven't written everything; if it drags on, please don't blame me)... We now have a running cluster, fully ready to take on whatever you want to put on it.

But we just need to add a couple of things for everything to be 100% ready.

(Be ready: from here on I will start talking in Kubernetes terms. If you don't understand those terms you will have to look them up in the official Kubernetes documentation: https://kubernetes.io/)

The first thing we need is called MetalLB. This tool allows you to create LoadBalancer services on your cluster, which will be relevant later when we tackle other software.

In order to install MetalLB there are a couple of commands we must run (versions may vary depending on when you are reading this; feel free to change them in the links below if a newer version is available):
```shell script
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.4/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.4/manifests/metallb.yaml
# On first install only
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
```
Those files will create everything needed to use MetalLB in your cluster, EXCEPT one thing: the configuration, which you must write yourself.
In order to make this configuration you must create a YAML file with your text editor of choice; let's call it 'ips-configmap.yml'.
Here is my configuration. Just remember to add your own IP range to the file and change the name of the address pool (those IPs will be handed out by MetalLB to your services, so you should use IPs you know you can reach; in the case of your home network, if your home is 192.168.1.X you can for example give a range like 192.168.1.200-192.168.1.250; if you stay within this tutorial, look at the IPs your VMs have: if your VMs have IPs like 123.123.123.12 you can use a range like 123.123.123.XXX-123.123.123.YYY):
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: <address-pool-name>
      protocol: layer2
      addresses:
      - <first-ip>-<last-ip>
```

Save the configuration file and apply it to the cluster (to run the following command you must be in the same folder as your yml file, otherwise you must give the path to it, like /path/to/config/ips-configmap.yml):
```shell script
kubectl apply -f ips-configmap.yml
```
If it says that the configmap was created you are golden ;)
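To make sure MetalLB actually came up before we start relying on it, a quick check (pod names will differ on your cluster):
```shell script
# the controller and one speaker per node should be Running
kubectl get pods -n metallb-system
# the configmap we just applied should be listed here
kubectl get configmap -n metallb-system
```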
## On to reverse proxy and SSL certs

The solution we will use for this is called Traefik. It is pretty much bulletproof, used in the industry by tons of companies, and free, which in my honest opinion makes it the best tool for the job. Once you understand the tool and once it is set up you can forget about it, it just works.

So, on to Traefik. To set it up we will need to add some configurations to our cluster. Those are objects that Traefik understands and needs in order to function properly.
Here is my file, you can copy-paste it; let's name this file 'ingressRouteDefinition.yml':
```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressroutes.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRoute
    plural: ingressroutes
    singular: ingressroute
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: middlewares.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: Middleware
    plural: middlewares
    singular: middleware
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressroutetcps.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRouteTCP
    plural: ingressroutetcps
    singular: ingressroutetcp
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressrouteudps.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRouteUDP
    plural: ingressrouteudps
    singular: ingressrouteudp
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tlsoptions.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TLSOption
    plural: tlsoptions
    singular: tlsoption
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tlsstores.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TLSStore
    plural: tlsstores
    singular: tlsstore
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: traefikservices.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TraefikService
    plural: traefikservices
    singular: traefikservice
  scope: Namespaced

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller

rules:
  - apiGroups:
      - ""
    resources:
      - services
      - endpoints
      - secrets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
      - networking.k8s.io
    resources:
      - ingresses
      - ingressclasses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - traefik.containo.us
    resources:
      - middlewares
      - ingressroutes
      - traefikservices
      - ingressroutetcps
      - ingressrouteudps
      - tlsoptions
      - tlsstores
    verbs:
      - get
      - list
      - watch

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller

roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-ingress-controller
subjects:
  - kind: ServiceAccount
    name: traefik-ingress-controller
    namespace: default
```
As usual we must now apply this to the cluster, same as before:
```shell script
kubectl apply -f ingressRouteDefinition.yml
```

Here you will see that a bunch of stuff has been created; that's normal and to be expected.
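If you want to confirm that the custom resource definitions really landed, a quick look:
```shell script
# should list ingressroutes, middlewares, tlsoptions, etc. in the traefik.containo.us group
kubectl get crd | grep traefik.containo.us
```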
Now we must create a Deployment and a ServiceAccount for our Traefik for it to work. You can do so by applying yet another YAML file (everything is YAML in kube).
Here is the file; just make sure to check the comments and change it according to your needs. Also note that I am installing Traefik in the default namespace; you can change that if you want, just make sure to create the namespace beforehand.
Let's call this one traefik.yml:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: default
  name: traefik-ingress-controller

---
kind: Deployment
apiVersion: apps/v1
metadata:
  namespace: default
  name: traefik
  labels:
    app: traefik

spec:
  replicas: 1
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      serviceAccountName: traefik-ingress-controller
      containers:
        - name: traefik
          image: traefik:v2.3 # you might have a higher version by the time you are reading this
          args:
            # This should not be set when you are in production as it exposes a dashboard that can be accessed by anyone
            # but for our test needs it is great, just remember to remove it
            - --api.insecure=true
            - --accesslog

            # Here we define our entry points, we have two of them: one at 80 (we call it web) and one at 443 (we call it web-secure)
            - --entrypoints.web.Address=:80
            # Traefik handles automatic redirections from http to https and it's done like so
            # feel free to comment these out the first time if you want to test your http endpoint
            - --entrypoints.web.http.redirections.entryPoint.to=web-secure
            - --entrypoints.web.http.redirections.entryPoint.scheme=https
            - --entrypoints.web.http.redirections.entrypoint.permanent=true
            - --entrypoints.web-secure.Address=:443

            # I still need to read more about providers but basically we need this
            - --providers.kubernetescrd

            # This is the part that will generate our SSL certificates
            # I invite you to read a bit more about this, you will need your own domain name in order to use it
            # Traefik has nice documentation on the different options but in the meantime here is what I used
            - --certificatesresolvers.certresolver.acme.tlschallenge # many challenges exist, see what you prefer/need
            - --certificatesresolvers.certresolver.acme.email=your@email.com # replace this with your mail

            # This file will store our certificates. I do not use a volume to store it, so every time traefik reboots it will be destroyed.
            # This is fine for our dev purposes but you will want a volume when you go live with your cluster.
            # It is important for you to know that we will use letsencrypt, and it has restrictions on the amount of certificates we can ask for:
            # if you ask for more than X certificates for test.domain.com you will be throttled and will have to wait for your certificate.
            # Here we are going to use the staging server so we can make sure the certificate validation works, and then we will remove the staging and go for real certs.
            - --certificatesresolvers.certresolver.acme.storage=acme.json
            # here we set up who will give us certificates; we chose the staging server as explained above, as we want to make sure it works first
            # to get real certificates, remove "staging" from the link below
            - --certificatesresolvers.certresolver.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory
          # here we are opening ports in order to access traefik later
          # we have the usual 80 and 443 but also 8080; this last one is where the traefik dashboard is served
          # for testing it is ok, but remember we set this as insecure, so in the future we must protect it or simply not make it accessible
          ports:
            - name: http
              containerPort: 80
            - name: https
              containerPort: 443
            - name: admin
              containerPort: 8080
```
And once again we must apply that:
```shell script
kubectl apply -f traefik.yml
```

Now Traefik should be running (validate it with kubectl get deployment to make sure), but it is not yet accessible to you, it's still trapped inside the cluster.
In order to access it we must have a Service (of type LoadBalancer; here is where MetalLB comes in handy).
Here is the service file, let's call it service.yml:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: traefik
  annotations:
    metallb.universe.tf/address-pool: <address-pool-name> # the pool name you defined in the MetalLB configmap
spec:
  ports:
    - port: 80
      targetPort: 80
      name: http
    - port: 443
      targetPort: 443
      name: https
    - port: 8080
      targetPort: 8080
      name: admin
  selector:
    app: traefik
  type: LoadBalancer
```

You know the drill, we must apply the little fellow:
```shell script
kubectl apply -f service.yml
```

Now the magic is almost about to happen!

## Let's put it to the test

OK, now we have a setup where we can in fact create certificates for our services and so on. In order to keep it simple, however, we will use an image that is already available to us.
We will now set up nginx (you don't really need to know what it is) on our custom domain and validate the certificate.

There are a bunch of things we must do first though.
Let's say we want to expose test.domain.com: we must tell our computer to redirect test.domain.com to our cluster's entry point, which in our case is the service we just created for Traefik!
By running this command:
```shell script
kubectl get service
```
You should see a line like this:
```shell script
traefik   LoadBalancer   10.43.79.253   <external-ip>   80:31465/TCP,443:32256/TCP   21h
```
Notice the second IP (the external one): this is the IP of the service that redirects to Traefik, aka what we want.

So now you must edit your hosts file so that test.domain.com redirects to this IP.
You can check how to do it here: https://www.howtogeek.com/howto/27350/beginner-geek-how-to-edit-your-hosts-file/

Once this is done, if you try to access it... you won't see anything, that's expected :D
But you should be able to access <external-ip>:8080 and see Traefik alive. If this is ok you are golden!
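A quick way to check the same thing from the command line (a sketch; `<external-ip>` is whatever EXTERNAL-IP MetalLB handed to the traefik service):
```shell script
# the traefik pod should be Running
kubectl get pods -l app=traefik
# the insecure dashboard we enabled should answer on the admin port
curl -I http://<external-ip>:8080/dashboard/
```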
Now what we must do is add something that answers when we query test.domain.com, and not just a blank 404 page.

Here we go, installing nginx by doing the following.

First we must create a deployment, let's call it ndeploy.yml:
```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  namespace: default
  name: nginx
  labels:
    app: nginx

spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - name: http
              containerPort: 80
```

Apply it with:
```shell script
kubectl apply -f ndeploy.yml
```

Now we need a service that can take us to that deployment, let's call it nservice.yml:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service

spec:
  ports:
    - name: http
      port: 80
  selector:
    app: nginx
```

Finally we must tell Traefik that we want to reach this service when we hit test.domain.com. This is done via a route; there is much more information on the Traefik website, so you can also check it out there if this is complicated for you.
Let's call this one nroute.yml:
```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: nginx-route
  namespace: default
spec:
  entryPoints:
    - web # must match the entry point name defined in the traefik deployment args
  routes:
    - match: Host(`test.domain.com`)
      kind: Rule
      services:
        - name: nginx-service
          port: 80

---
# Here we are defining two routes, one in http and another one in https
# If you have kept the global http to https redirection you don't need the http route
# as all traffic will be redirected automatically
# I just left it here for you to see the difference between the two

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: nginx-route-secure
  namespace: default
spec:
  entryPoints:
    - web-secure # again, must match the name given in the traefik deployment args
  routes:
    - match: Host(`test.domain.com`)
      kind: Rule
      services:
        - name: nginx-service
          port: 80
  tls:
    certResolver: certresolver
```

What we are saying here is simple: we tell Traefik that we want to reach the nginx-service when we arrive with test.domain.com, in http or https.

Apply it with:
```shell script
kubectl apply -f nroute.yml
```

Now, if everything has been done correctly, you can open your browser and go to test.domain.com

You will see a warning; that's because the certificate will most likely not work, as you probably don't own test.domain.com, and that is normal, but you can bypass this warning and reach your site.

Now you just have to confirm that everything is working, then remove the staging server from Traefik and replace it with the production server so you can start getting your own real certificates.
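If you would rather check the certificate from the command line than click through the browser warning, something like this works (a sketch; --resolve forces name resolution, so you don't even need the hosts-file entry for this test):
```shell script
# -k accepts the untrusted staging certificate; look at the "issuer" line in the TLS handshake output
curl -vk --resolve test.domain.com:443:<external-ip> https://test.domain.com
```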
If you don't have a domain name, you can also decide not to set up the certificate resolver by removing its arguments from the Traefik deployment. Doing so means no certificates will be resolved and you will get warnings when accessing your sites, but as soon as your hosts are configured to reach the Traefik service correctly within your local domain, you can have your own URLs for your own apps and everything will run just fine.

Just a tip: don't bother with https if you don't have to have certificates, or if your cluster is never exposed to the internet. You can run everything on http and still have your own URLs; just remember there won't be any encryption between you and your app if that's what you decide to do.

## Let's add NFS storage to our pods!

OK, so in this update we will add volumes to our pods so that we can keep data on disk when we restart our pods.
If you followed along with this tutorial, you now have a Traefik instance that automatically gets you a certificate once a new route is created.
This is super convenient, but once you remove the "staging" Let's Encrypt server you will see that you might reach a certificate limit, aka you asked too many times for the same certificate.

That happens because every time you stop your Traefik instance (for whatever reason) and run it again, it will ask again and again for the certificates for the various routes you created, reaching your limit very quickly.

In order to avoid that we will create three things (and again we will do so without any automation, so we really understand what is happening):

- A volume
- A volume claim
- A Traefik deployment with updated parameters

OK so let's go do that! BUT before that: I am not entering NFS territory here, I assume you already have an NFS server with users that have the right to use a specific volume on it.
Consider the following for the rest of this tutorial:

- You have an NFS server with a folder on it that you have access to
- You have a user ID and GID at hand and this user has permissions to use the NFS share

In my case I created a volume on my NAS and gave a user "kube" the right to access this folder. This kube user has an ID of 1040 and a GID of 200, but of course yours may vary, and those values are random ones invented for this tutorial.
I also made my NFS share accept direct connections from my Kubernetes IPs and local machine IPs, so I can access the NFS share directly without being bothered by access credentials.
This varies per NAS distribution so I can't really help you set it up, but a quick search for "NFS on my XXX nas" should help you.

OK, so now we can tackle the kube side of things.
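Before writing any Kubernetes objects, it's worth confirming that the nodes can actually see the share. A minimal check from one of the VMs, assuming the nfs-common package (or your distro's equivalent) is installed, and using placeholder paths and IPs:
```shell script
# list the exports the NFS server offers to this machine
showmount -e <nfs-server-ip>
# optional: mount it by hand once to confirm read/write access, then clean up
sudo mount -t nfs <nfs-server-ip>:/your/path/to/volume /mnt
sudo touch /mnt/test-write && sudo rm /mnt/test-write
sudo umount /mnt
```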
First things first, we will create a volume. This object is the one mapping your NFS share and making it accessible to Kubernetes.
In my example below I created a volume called traefik-data-pv and gave it a capacity of 5 gigs. I also gave it the "ReadWriteOnce" access mode, which means the volume can only be mounted read-write by a single node (and, since a static volume like this one binds to a single claim, only one claim will end up using it; more on that later). I also fly over storage classes here and just use the default; you may want to read more on that, but let's just say that I don't have different kinds of storage. Usually you would use storage classes to create, for instance, a class based on SSD and one based on HDD, so that some volumes are automatically stored on SSD, that kind of thing.
Imagine that as premium storage, basic storage, medium storage and so on.
Not so relevant for me though.
The interesting part is the "nfs" section, where you specify a directory on your NFS volume and the IP of your NFS server.
```yaml
## PV traefik data

apiVersion: v1
kind: PersistentVolume
metadata:
  name: traefik-data-pv
spec:
  capacity:
    storage: 5Gi
  storageClassName: "local-path"
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  mountOptions:
    - hard
  nfs:
    path: "/vl3/Kubernetes/traefik" # insert here your NFS path where traefik will save data
    server: "XX.XX.XX.XX" # insert here your NFS server IP
```

Now that this is done we can create a "claim", an object that will actually use the volume. This claim makes a kind of bridge between the volume and a Kubernetes pod.
Think of it as plugging a USB drive into the pod itself, in some way. Note that the claim must match the volume created: you can't have a claim that asks for more storage than the volume provides.
You can have a claim that asks for less storage, but remember a volume can only support one claim, so you might as well use all the storage.

Here is my claim (also note that it references the previous volume):

```yaml
## PVC traefik data

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: traefik-data-pvc
spec:
  volumeName: traefik-data-pv
  resources:
    requests:
      storage: 5Gi
  storageClassName: "local-path"
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
```

Now as usual we can save those definitions in a file like "volume.yml" and apply that with a good old:
```
kubectl apply -f volume.yml
```

Now the important thing: if you now run
```
kubectl get pvc
```
you should see your claim with a status of "Bound". This means the claim found the volume and the volume is OK!
AKA you are heading in the right direction.
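If the claim sits in "Pending" instead, the usual suspects are a size or storageClassName mismatch between the PV and the PVC; describing the claim normally tells you which one it is:
```shell script
# the Events section at the bottom explains why the claim cannot bind
kubectl describe pvc traefik-data-pvc
```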
Now for the final step, we want to make sure our Traefik is saving the certificates on that volume. Here is an updated version of the Traefik deployment file that does just that
(I removed the explanation comments written previously so we can focus on the volume part of things; feel free to scroll up for more info on the other fields):
```yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  namespace: default
  name: traefik
  labels:
    app: traefik

spec:
  replicas: 1
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:

      # now this security context will vary according to how you have set things up on your nfs share.
      # if you created a share and said "this IP is allowed to do everything" then you don't need the security context
      # if on the other hand you said "no, only this user is allowed to do something" then you must tell your pod to act as this user
      # otherwise you will run into permission issues such as "permission denied" when the pod tries to save data on the NFS share
      # in my case I have a double safety: only this IP and this account can use my share, so I must specify here the account "kube" I created earlier on my nas
      securityContext:
        runAsUser: 1040 # my kube user ID
        runAsGroup: 200 # my kube group ID

      serviceAccountName: traefik-ingress-controller
      containers:
        - name: traefik
          image: traefik:v2.3
          args:
            - --accesslog
            - --entrypoints.web.Address=:80
            - --entrypoints.web.http.redirections.entryPoint.to=web-secure
            - --entrypoints.web.http.redirections.entryPoint.scheme=https
            - --entrypoints.web.http.redirections.entrypoint.permanent=true
            - --entrypoints.web-secure.Address=:443
            - --providers.kubernetescrd
            - --certificatesresolvers.certresolver.acme.tlschallenge
            - --certificatesresolvers.certresolver.acme.email=you@email.com
            # Here, if I had kept plain acme.json, Kubernetes would have seen it as a folder even with a sub path on my volume mount, so I decided to
            # put the acme.json file in a folder called certs at the root of this container and it works just fine
            - --certificatesresolvers.certresolver.acme.storage=/certs/acme.json

            # Careful here: I removed the staging server and put the real one because I already know this is working for me
            # make sure you use staging first so you can test things
            - --certificatesresolvers.certresolver.acme.caserver=https://acme-v02.api.letsencrypt.org/directory
          ports:
            - name: http
              containerPort: 80
            - name: https
              containerPort: 443
          # Here is the interesting part: I am defining a mount point so that inside my container the path /certs is actually mounted on the nfs-data volume
          # which I add to the deployment just below
          volumeMounts:
            - name: nfs-data
              mountPath: /certs
      # Here is the volume nfs-data, and as you can see I reference the volume claim that I created earlier.
      # Nothing really fancy; once you understand it, it is quite straightforward
      volumes:
        - name: nfs-data
          persistentVolumeClaim:
            claimName: traefik-data-pvc # the name of the claim you created earlier
```

OK, so at this point, once you "kubectl apply -f" this new Traefik deployment, you should see it starting up, and if your permissions and account are set up correctly you will see your acme.json pop up in your NFS folder.

If this is not the case, check the logs of Traefik as they are rather explicit:
```shell script
kubectl logs -f <traefik-pod-name>
```

OK, so now not only do you have a multi-node Kubernetes cluster, but also a reverse proxy with SSL certificates AND NFS storage!
You are pretty much rock solid to start whatever apps you might want!
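One more way to confirm the mount from the cluster side rather than from the NAS (a quick sketch using kubectl exec against the deployment):
```shell script
# /certs inside the container should be the NFS share, and acme.json should appear there once a certificate is issued
kubectl exec deployment/traefik -- ls -l /certs
```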
## The last piece of the puzzle: an SSO

In this section the goal is to set up an SSO (single sign-on) with two-factor authentication for every service that you want to protect.
This has some advantages: for instance, if you want to deploy an app like Homer (a dashboard to present different links) that doesn't come with a built-in authentication mechanism, you don't have to worry about it, because something else is handling the authentication for you.

This way you have one piece of software running that lets the other services know whether you are authenticated, and you can happily disable the login screens of your other services if you want, because they are not required anymore.

The flow that we want to put in place is the following:
- You want to access home.domain.com (let's say a Homer instance)
- You are first redirected to https via what we put in place earlier
- Once in https you are then redirected to an authentication server
- This server checks whether you are logged in; if you are, you don't even see anything and you reach your initial address directly
- If you are not, however, you are asked to log in
- After a successful login you are redirected to the initial address

This however has an important prerequisite:
- You don't necessarily need a domain, but if you don't have one you need to change your hosts file so that example.domain.com and whatever.domain.com redirect to different apps. In this tutorial we will set up an authentication server on auth.domain.com and a test app on home.domain.com, so make sure you have that ready (there is a hosts-file example a bit further below)!

What tools will we use for the job... There are many tools able to do this! I went through a lot of them and figured that the easiest for us in a homelab environment is called Authelia.

Authelia is exactly what we need for the job: it's fast and secure, and even though it is not as huge as a Keycloak server for instance, that also means it's way faster to run and easier to understand, which for a homelab is perfect ;)

So as usual, for every app that we want to run with kube we have to add a couple of things: services, deployments, etc. You know the drill, you just have to kubectl apply the different files, so here we go.
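Speaking of the prerequisite above, here is what it can look like in practice on the machine you browse from (an illustration; the IP is the EXTERNAL-IP MetalLB gave to the traefik service, and on Windows you would edit the hosts file manually instead):
```shell script
# append both hostnames, pointing at the traefik service, to /etc/hosts
echo "<traefik-external-ip>  auth.domain.com home.domain.com" | sudo tee -a /etc/hosts
```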
We start first with a namespace; we will use it to keep everything related to authentication inside its own space:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: authentication
```

Then we need a service so we can reach our application:
```yaml
apiVersion: v1
kind: Service
metadata:
  namespace: authentication
  name: authelia-service # give it the name that you want
spec:
  ports:
    - name: http
      port: 9091 # our service exposes port 9091 (it's redundant to specify both, but for the sake of understanding I spell it all out)
      protocol: TCP
      targetPort: 9091 # our service targets port 9091 on the container running in the back
  selector:
    app: authelia # we want to reach authelia
```

Once we have that, we need volumes and the associated volume claims in order to keep data on the NFS server we set up in the previous steps of this tutorial. We have two volumes and two volume claims; in this example I will save the config and the secrets in two different places, but you can choose otherwise if you want to:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  namespace: authentication
  name: authelia-config-pv # you can specify another name if you want
spec:
  capacity:
    storage: 10Gi # 10 gigs may be too much, you might decide to give it less
  storageClassName: "local-path"
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce # specify here that only one claim will be able to use this volume
  mountOptions:
    - hard
  nfs:
    path: "your/path/to/volume/authelia/" # replace here with the destination folder where authelia will store data (inside your NFS server)
    server: "XX.XX.XX.XX" # replace here with your server IP

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: authentication
  name: authelia-config-pvc
spec:
  volumeName: authelia-config-pv # here as usual we reference the volume we created previously
  resources:
    requests:
      storage: 10Gi # and we make sure that they both have the same size
  storageClassName: "local-path"
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce

---

# the volume and claim below are used to store the "secret" files in order to set up authelia without revealing secret information

apiVersion: v1
kind: PersistentVolume
metadata:
  namespace: authentication
  name: authelia-secret-pv # you can always change those names
spec:
  capacity:
    storage: 10Gi
  storageClassName: "local-path"
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  mountOptions:
    - hard
  nfs:
    path: "/path/to/your/secret/location"
    server: "XX.XX.XX.XX" # your NFS server IP

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: authentication
  name: authelia-secret-pvc
spec:
  volumeName: authelia-secret-pv
  resources:
    requests:
      storage: 10Gi
  storageClassName: "local-path"
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
```
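Nothing stops you from applying these right away and checking that the claims bind before worrying about the deployment (a sketch; the file names are simply whatever you saved the YAML above as):
```shell script
kubectl apply -f authelia-namespace.yml -f authelia-service.yml -f authelia-volumes.yml
# both claims should report a Bound status
kubectl get pvc -n authentication
```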
Now that this is done we just need a couple more things, like the deployment:
```yaml
kind: Deployment
apiVersion: apps/v1

metadata:
  namespace: authentication
  name: authelia
  labels:
    app: authelia

spec:
  replicas: 1
  selector:
    matchLabels:
      app: authelia
  template:
    metadata:
      labels:
        app: authelia
    spec:
      securityContext:
        runAsUser: 1040 # as seen previously, this is the NFS kube user id from the earlier sections (yours will vary of course)
        runAsGroup: 200 # this is the kube group id, also seen previously
      containers:
        - name: authelia
          image: authelia/authelia # here we just take the latest version
          env:
            - name: AUTHELIA_JWT_SECRET_FILE
              value: /app/secrets/NFS-JWT-FILE # here we specify where the jwt secret required by authelia lives; the part after /app/secrets sits on your NFS server, meaning you must have a file under the secret location defined above in the volume, named NFS-JWT-FILE (of course you can change the name); this file must only contain the following: jwt_secret=your-super-secret-key-that-you-must-define-by-yourself

            - name: AUTHELIA_SESSION_SECRET_FILE
              value: /app/secrets/SESSION-FILE # same as above but for the session; this file must contain: secret=your-super-secret-blablabla-you-got-it

          ports:
            - containerPort: 9091 # we open port 9091 here
              name: http
          volumeMounts:
            - name: nfs-data
              mountPath: /config # we map the volume required for the config file
            - name: nfs-secret
              mountPath: /app/secrets # we map here the volume required for the secrets

      volumes:
        - name: nfs-data
          persistentVolumeClaim:
            claimName: authelia-config-pvc # claims defined previously
        - name: nfs-secret
          persistentVolumeClaim:
            claimName: authelia-secret-pvc
```

Now don't apply it just yet.

We first have to create a configuration file for Authelia, store it at the root of our config volume, and name it configuration.yml.

There is an example of a configuration file here: https://github.com/authelia/authelia/blob/master/compose/local/authelia/configuration.yml
I also invite you to read this: https://www.authelia.com/docs/configuration/ - it describes every option available to configure Authelia according to your needs.

As for myself, for instance, I decided to use a Postgres database to store data and a Redis database to store sessions; as everything can be different for each person, I will let you decide what is best for you and your cluster.

Once you have this figured out, you can apply the deployment and see if it boots up correctly (again via kubectl get deployment -A).

The next step is to create a Traefik route so we can access our authentication server via something like auth.domain.com.
Here is how you do it:
```yaml
kind: IngressRoute
apiVersion: traefik.containo.us/v1alpha1
metadata:
  name: authelia-route-secure
  namespace: authentication
spec:
  entryPoints:
    - web-secure # the https entry point defined in the traefik deployment
  routes:
    - match: Host(`auth.domain.com`) # of course change it according to your domain
      kind: Rule
      services:
        - name: authelia-service # we want to reach the authelia service
          port: 9091 # on port 9091
  tls:
    certResolver: certresolver # we want automatic SSL as described in past sections
```
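At this point you should already be able to reach the Authelia portal itself. A quick check before wiring up the middleware (a sketch; same --resolve trick as before if you haven't touched your hosts file yet):
```shell script
# an HTTP 200 and some HTML mentioning Authelia means the route, service and pod are all wired up
curl -vk --resolve auth.domain.com:443:<traefik-external-ip> https://auth.domain.com
```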
Ok, so by now we have almost every piece of the puzzle; we just need to let Traefik know what to do. This is called forward auth.

Forward auth works in a very simple way. In this example, as we want to secure specific services, we will first create a middleware and apply it to individual Traefik routes; alternatively you can decide to apply it to every route at once by attaching this middleware to the https entry point. You decide. Let me show you how to secure one specific endpoint.

First, as we said, we need the middleware:
```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: authelia # here I am leaving it in the default namespace but you can change it if you want to
spec:
  forwardAuth:
    # the address is a bit tricky: we want our forward auth to point to the service we created earlier, which was named authelia-service,
    # but this service is in another namespace
    # in kubernetes, to reference a service in another namespace you must use an internal DNS name, which is written like so:
    # service-name.namespace.svc.cluster.local
    # this is why you see this complicated url under address
    # we also target port 9091 and the path required by authelia: /api/verify?rd=TheURLWhereAutheliaIsRunning
    # it took me a long time to understand this, but yes, you need all of that
    address: http://authelia-service.authentication.svc.cluster.local:9091/api/verify?rd=https://auth.domain.com/
    trustForwardHeader: true
    # authelia requires these options to work properly; basically you pass information about your users back to your apps when they log in
    # you can add more stuff if you like, check the authelia documentation for more info
    authResponseHeaders:
      - Remote-User
      - Remote-Groups
      - Remote-Name
      - Remote-Email
```

OK, so by now we have all but one piece of the puzzle!

We just need to secure a Traefik route so that it passes through this middleware. Here I am taking a random route that redirects to a Homer dashboard; of course you must have the dashboard actually running and reachable for this to work.

Here is an example of the new route:
```yaml
kind: IngressRoute
apiVersion: traefik.containo.us/v1alpha1
metadata:
  name: homer-route-secure
  namespace: dashboard
spec:
  entryPoints:
    - web-secure # the https entry point defined in the traefik deployment
  routes:
    - match: Host(`home.domain.com`)
      kind: Rule
      services:
        - name: homer-service # we redirect to a hypothetical homer service
          port: 8080 # on port 8080
      middlewares:
        # here is the important part: we add to this route the middleware we just created above; the name is
        # namespace-nameOfTheMiddleware@kubernetescrd; the kubernetescrd suffix is there because we set up traefik
        # with the argument --providers.kubernetescrd
        - name: default-authelia@kubernetescrd # the middleware authelia in the default namespace
  tls:
    certResolver: certresolver
```

Finally, if you've done everything correctly, you can now access Homer and see the Authelia login screen! You can log in and then be redirected to your Homer dashboard!

## Conclusions

This documentation is a draft as of today. I didn't put in any pictures yet, didn't correct the language, etc. This is a work in progress and maybe it will be improved further eventually, maybe, hopefully.........
However, I had a goal. Not so long ago I had no idea about containers, Docker and Kubernetes, and I spent months and months (thanks COVID for the free time :/) figuring everything out by myself... My goal is to help you not struggle like I did. I hope this helps you, and if you have any problems you can reach me here via an issue or on my reddit https://www.reddit.com/user/SirSirae - feel free to message me.

Summing it up, we now have a multi-node kube cluster, a reverse proxy with automatic SSL and https redirection, and an SSO to secure every service we want to serve on our cluster! Now you can pretty much do whatever you want on your cluster and host whatever apps you want, they will all be secured!

Achievement unlocked: understand Kubernetes before the end of 2020!!!!!