14 |
15 | [← back to the Labs Overview](../README.md)
16 |
--------------------------------------------------------------------------------
/labs/70_upgrade.md:
--------------------------------------------------------------------------------
1 | # Lab 7: Upgrade OpenShift from 3.11.88 to 3.11.104
2 |
3 | In this chapter, we will do a minor upgrade from OpenShift 3.11.88 to 3.11.104.
4 |
5 |
6 | ## Chapter Content
7 |
8 | * [7.1: Upgrade OpenShift 3.11.88 to 3.11.104](71_upgrade_openshift3.11.104.md)
9 | * [7.2: Verify the Upgrade](72_upgrade_verification.md)
10 |
11 | ---
12 |
13 |
14 | [← back to the Labs Overview](../README.md)
15 |
--------------------------------------------------------------------------------
/labs/60_monitoring_troubleshooting.md:
--------------------------------------------------------------------------------
1 | # Lab 6: Monitoring and Troubleshooting
2 |
3 | In this chapter, we are going to look at how to monitor and troubleshoot OpenShift. Monitoring tells us whether our platform is operational and whether it will remain so; troubleshooting helps us when those health checks fail but do not provide sufficient detail as to why.
4 |
5 |
6 | ## Chapter Content
7 |
8 | * [6.1: Monitoring](61_monitoring.md)
9 | * [6.2: Logs](62_logs.md)
10 |
11 | ---
12 |
13 |
17 |
18 | [← back to the Labs Overview](../README.md)
19 |
--------------------------------------------------------------------------------
/appendices/02_internet_resources.md:
--------------------------------------------------------------------------------
1 | # Appendix 2: Useful Internet Resources
2 |
3 | This appendix is a small collection of useful online resources containing scripts and documentation as well as Ansible roles, playbooks and more.
4 |
5 | - Additional Ansible roles and playbooks provided by the community: https://github.com/openshift/openshift-ansible-contrib
6 | - Scripts used by OpenShift Online Operations: https://github.com/openshift/openshift-tools
7 | - Red Hat Communities of Practice: https://github.com/redhat-cop
8 | - Red Hat Consulting DevOps and OpenShift Playbooks: http://v1.uncontained.io/
9 | - APPUiO OpenShift resources: https://github.com/appuio/
10 | - Knowledge Base: https://kb.novaordis.com/index.php/OpenShift
11 |
12 | ---
13 |
14 | [← back to the labs overview](../README.md)
15 |
16 |
17 |
--------------------------------------------------------------------------------
/labs/21_ansible_inventory.md:
--------------------------------------------------------------------------------
1 | ## Lab 2.1: Create the Ansible Inventory
2 |
3 | In this lab, we will verify that the Ansible inventory file fits our lab cluster. The inventory file describes how the cluster will be built.
4 |
5 | Take a look at the prepared inventory file:
6 | ```
7 | [ec2-user@master0 ~]$ less /etc/ansible/hosts
8 | ```
9 |
10 | Download the default example hosts file from the OpenShift GitHub repository and compare it to the prepared inventory for the lab.
11 | ```
12 | [ec2-user@master0 ~]$ wget https://raw.githubusercontent.com/openshift/openshift-ansible/release-3.11/inventory/hosts.example
13 | [ec2-user@master0 ~]$ vimdiff hosts.example /etc/ansible/hosts
14 | ```
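
For orientation, the inventory is an INI file consisting of host groups and `openshift_*` variables. A heavily abridged sketch with purely illustrative values (not our lab's actual settings) looks roughly like this:
```
[OSEv3:children]
masters
etcd
nodes

[OSEv3:vars]
ansible_user=ec2-user
openshift_deployment_type=openshift-enterprise
openshift_master_default_subdomain=app[X].lab.openshift.ch

[masters]
master0.user[X].lab.openshift.ch

[etcd]
master0.user[X].lab.openshift.ch

[nodes]
master0.user[X].lab.openshift.ch openshift_node_group_name='node-config-master'
app-node0.user[X].lab.openshift.ch openshift_node_group_name='node-config-compute'
```
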
15 | ---
16 |
17 | **End of Lab 2.1**
18 |
19 |
39 |
40 | [← back to the Chapter Overview](20_installation.md)
41 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # How to contribute to the APPUiO OpenShift OPStechlab
2 |
3 | :+1::tada: First off, thanks for taking the time to contribute! :tada::+1:
4 |
5 | ## Did you find a bug?
6 |
7 | * **Ensure the bug was not already reported** by searching on GitHub under [Issues](https://github.com/appuio/ops-techlab/issues).
8 |
9 | * If you're unable to find an open issue addressing the problem, [open a new one](https://github.com/appuio/ops-techlab/issues/new). Be sure to include a **title and clear description**, as much relevant information as possible, and a **code sample** or an **executable test case** demonstrating the expected behavior that is not occurring.
10 |
11 | ## Did you write a patch that fixes a bug?
12 |
13 | * Open a new GitHub pull request with the patch.
14 |
15 | * Ensure the PR description clearly describes the problem and solution. Include the relevant issue number if applicable.
16 |
17 | ## Do you intend to add a new feature or change an existing one?
18 |
19 | * **Feature Request**: open an issue on GitHub and describe your feature.
20 |
21 | * **New Feature**: Implement your feature on a fork and create a pull request. The core team will gladly review it and merge your pull request if appropriate.
22 |
23 | ## Do you have questions about the techlab?
24 |
25 | * Ask your question as an issue on GitHub
26 |
27 | Thanks!
28 |
--------------------------------------------------------------------------------
/labs/11_overview.md:
--------------------------------------------------------------------------------
1 | ## Lab 1.1: Architectural Overview
2 |
3 | This is the environment we will build and work on. It is deployed on Amazon AWS.
4 |
5 | 
6 |
7 | Our lab installation consists of the following components:
8 | 1. Three Load Balancers
9 | 1. Application Load Balancer app[X]: Used for load balancing requests to the routers (*.app[X].lab.openshift.ch)
10 | 1. Application Load Balancer console[X]: Used for load balancing requests to the master APIs (console.user[X].lab.openshift.ch)
11 | 1. Classic Load Balancer console[X]-internal: Used for internal load balancing requests to the master APIs (internalconsole.user[X].lab.openshift.ch)
12 | 1. Two OpenShift masters (a third will be added later)
13 | 1. Two etcd members (a third will be added later)
14 | 1. Three infra nodes, where the following components are running:
15 | 1. Container Native Storage (Gluster)
16 | 1. Routers
17 | 1. Metrics
18 | 1. Logging
19 | 1. Monitoring (Prometheus)
20 | 1. One app node (a second will be added later)
21 | 1. We are going to use the jump host as a bastion host (jump.lab.openshift.ch)
22 |
23 |
24 | ---
25 |
26 | **End of Lab 1.1**
27 |
28 |
44 |
45 | [← back to the Chapter Overview](10_warmup.md)
46 |
--------------------------------------------------------------------------------
/labs/22_installation.md:
--------------------------------------------------------------------------------
1 | ## Lab 2.2: Install OpenShift
2 |
3 | In the previous lab we prepared the Ansible inventory to fit our test lab environment. Now we can prepare and run the installation.
4 |
5 | To make sure the playbook keeps running even if there are network issues or similar, we strongly encourage you to use e.g. `screen` or `tmux`.
6 |
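A minimal sketch of such a session with `tmux` (assuming it is installed on the master):
```
[ec2-user@master0 ~]$ tmux new -s install    # start a named session, then run the playbooks inside it
# detach with Ctrl-b d, reattach later with:
[ec2-user@master0 ~]$ tmux attach -t install
```
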
7 | Now we run the prepare_hosts_for_ose.yml playbook. This will do the following:
8 | - Install the prerequisite packages: wget git net-tools bind-utils iptables-services bridge-utils bash-completion kexec-tools sos psacct
9 | - Enable Ansible ssh pipelining (performance improvements for Ansible)
10 | - Set timezone
11 | - Ensure hostname is preserved in cloud-init
12 | - Set default passwords
13 | - Install oc clients for various platforms on all masters
14 |
15 | ```
16 | [ec2-user@master0 ~]$ ansible-playbook /home/ec2-user/resource/prepare_hosts_for_ose.yml
17 | ```
18 |
19 | Run the installation:
20 | 1. Install OpenShift. This takes a while; get a coffee.
21 | ```
22 | [ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml
23 | [ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
24 | ```
25 |
26 | 2. Add the cluster-admin role to the "sheriff" user.
27 | ```
28 | [ec2-user@master0 ~]$ oc adm policy --as system:admin add-cluster-role-to-user cluster-admin sheriff
29 | ```
30 |
31 | 3. Now open your browser and access the master API with the user "sheriff":
32 | ```
33 | https://console.user[X].lab.openshift.ch/console/
34 | ```
35 | Password is documented in the Ansible inventory:
36 | ```
37 | [ec2-user@master0 ~]$ grep keepass /etc/ansible/hosts
38 | ```
39 |
40 | 4. Deploy the APPUiO openshift-client-distributor. It serves the matching oc client from a pod so that it can be downloaded via the OpenShift web console. For this to work, the masters must have the package `atomic-openshift-clients-redistributable` installed. In addition, the variable `openshift_web_console_extension_script_urls` must be defined in the inventory.
41 | ```
42 | [ec2-user@master0 ~]$ grep openshift_web_console_extension_script_urls /etc/ansible/hosts
43 | openshift_web_console_extension_script_urls=["https://client.app1.lab.openshift.ch/cli-download-customization.js"]
44 | [ec2-user@master0 ~]$ ansible masters -m shell -a "rpm -qi atomic-openshift-clients-redistributable"
45 | ```
46 |
47 | Deploy the openshift-client-distributor.
48 | ```
49 | [ec2-user@master0 ~]$ sudo yum install python-openshift
50 | [ec2-user@master0 ~]$ git clone https://github.com/appuio/openshift-client-distributor
51 | [ec2-user@master0 ~]$ cd openshift-client-distributor
52 | [ec2-user@master0 ~]$ ansible-playbook playbook.yml -e 'openshift_client_distributor_hostname=client.app[X].lab.openshift.ch'
53 | ```
54 |
55 | 5. You can now download the client binary and use it from your local workstation. The binary is available for Linux, macOS and Windows. (optional)
56 | ```
57 | https://console.user[X].lab.openshift.ch/console/command-line
58 | ```
59 |
60 | ---
61 |
62 | **End of Lab 2.2**
63 |
64 |
65 |
66 | [← back to the Chapter Overview](20_installation.md)
67 |
--------------------------------------------------------------------------------
/labs/31_user_management.md:
--------------------------------------------------------------------------------
1 | ## Lab 3.1: Manage Users
2 |
3 | ### OpenShift Authorization
4 |
5 | Before you begin with this lab, make sure you roughly understand the authorization concept of OpenShift.
6 | [Authorization](https://docs.openshift.com/container-platform/3.6/architecture/additional_concepts/authorization.html)
7 |
8 | ### Add User to Project
9 |
10 | First we create a user and give him the admin role in the openshift-infra project.
11 | Log in to the master and create the local user with Ansible on all masters (replace ```[password]```):
12 | ```
13 | [ec2-user@master0 ~]$ ansible masters -a "htpasswd -b /etc/origin/master/htpasswd cowboy [password]"
14 | ```
15 |
16 | Add the admin role to the newly created user, but only for the project `openshift-infra`:
17 | ```
18 | [ec2-user@master0 ~]$ oc adm policy add-role-to-user admin cowboy -n openshift-infra
19 | ```
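
If you want to double-check the result, the new rolebinding can be inspected like this (optional):
```
[ec2-user@master0 ~]$ oc get rolebindings -n openshift-infra | grep admin
```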
20 |
21 | Now log in with the new user from your client and check whether you see the `openshift-infra` project:
22 | ```
23 | [localuser@localhost ~]$ oc login https://console.user[X].lab.openshift.ch
24 | Username: cowboy
25 | Password:
26 | Login successful.
27 |
28 | You have one project on this server: "openshift-infra"
29 |
30 | Using project "openshift-infra".
31 | ```
32 |
33 | ### Add Cluster Role to User
34 |
35 | In order to keep things clean, we delete the created rolebinding for the `openshift-infra` project again and give the user "cowboy" the global "cluster-admin" role.
36 |
37 | Log in as "sheriff":
38 | ```
39 | [ec2-user@master0 ~]$ oc login -u sheriff
40 | ```
41 |
42 | Add the cluster-admin role to the created user:
43 | ```
44 | [ec2-user@master0 ~]$ oc adm policy remove-role-from-user admin cowboy -n openshift-infra
45 | role "admin" removed: "cowboy"
46 | [ec2-user@master0 ~]$ oc adm policy add-cluster-role-to-user cluster-admin cowboy
47 | cluster role "cluster-admin" added: "cowboy"
48 | ```
49 |
50 | Now you can try to log in from your client with user "cowboy" and check whether you see all projects:
51 | ```
52 | [localuser@localhost ~]$ oc login https://console.user[X].lab.openshift.ch
53 | Authentication required for https://console.user[X].lab.openshift.ch (openshift)
54 | Username: cowboy
55 | Password:
56 | Login successful.
57 |
58 | You have access to the following projects and can switch between them with 'oc project <projectname>':
59 |
60 | appuio-infra
61 | default
62 | kube-public
63 | kube-system
64 | logging
65 | management-infra
66 | openshift
67 | * openshift-infra
68 |
69 | Using project "openshift-infra".
70 | ```
71 |
72 |
73 | ### Create Group and Add User
74 |
75 | Instead of giving privileges to single users, we can also create a group and assign a role to that group.
76 |
77 | Groups can be created manually or synchronized from an LDAP directory. So let's first create a local group manually and add the user "cowboy" to it:
78 | ```
79 | [ec2-user@master0 ~]$ oc login -u sheriff
80 |
81 | [ec2-user@master0 ~]$ oc adm groups new deputy-sheriffs cowboy
82 | group.user.openshift.io/deputy-sheriffs created
83 | [ec2-user@master0 ~]$ oc get groups
84 | NAME USERS
85 | deputy-sheriffs cowboy
86 | ```
87 |
88 | Add the cluster-role to the group "deputy-sheriffs":
89 | ```
90 | [ec2-user@master0 ~]$ oc adm policy add-cluster-role-to-group cluster-admin deputy-sheriffs
91 | cluster role "cluster-admin" added: "deputy-sheriffs"
92 | ```
93 |
94 | Verify that the group has been added to the cluster-admins:
95 | ```
96 | [ec2-user@master0 ~]$ oc get clusterrolebindings | grep cluster-admin
97 | cluster-admin /cluster-admin sheriff, cowboy system:masters, deputy-sheriffs
98 | ```
99 |
100 |
101 | ### Evaluate Authorizations
102 |
103 | It's possible to evaluate authorizations. This can be done with the following pattern:
104 | ```
105 | oc policy who-can VERB RESOURCE_NAME
106 | ```
107 |
108 | Examples:
109 | Who can delete the `openshift-infra` project:
110 | ```
111 | oc policy who-can delete project -n openshift-infra
112 | ```
113 |
114 | Who can create configmaps in the `default` project:
115 | ```
116 | oc policy who-can create configmaps -n default
117 | ```
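
Besides `who-can`, you can also check what the currently logged-in user may do with `oc auth can-i`, for example:
```
[ec2-user@master0 ~]$ oc auth can-i create configmaps -n default
[ec2-user@master0 ~]$ oc auth can-i delete projects -n openshift-infra
```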
118 |
119 | You can also get a description of all available clusterroles and clusterrolebindings with the following oc commands:
120 | ```
121 | [ec2-user@master0 ~]$ oc describe clusterrole.rbac
122 | ```
123 |
124 | ```
125 | [ec2-user@master0 ~]$ oc describe clusterrolebinding.rbac
126 | ```
127 | https://docs.openshift.com/container-platform/3.11/admin_guide/manage_rbac.html
128 |
129 | ### Cleanup
130 |
131 | Delete the group, identity and user:
132 | ```
133 | [ec2-user@master0 ~]$ oc get group
134 | [ec2-user@master0 ~]$ oc delete group deputy-sheriffs
135 |
136 | [ec2-user@master0 ~]$ oc get user
137 | [ec2-user@master0 ~]$ oc delete user cowboy
138 |
139 | [ec2-user@master0 ~]$ oc get identity
140 | [ec2-user@master0 ~]$ oc delete identity htpasswd_auth:cowboy
141 |
142 | [ec2-user@master0 ~]$ ansible masters -a "htpasswd -D /etc/origin/master/htpasswd cowboy"
143 | ```
144 |
145 | ---
146 |
147 | **End of Lab 3.1**
148 |
149 |
150 |
151 | [← back to the Chapter Overview](30_daily_business.md)
152 |
--------------------------------------------------------------------------------
/labs/32_update_hosts.md:
--------------------------------------------------------------------------------
1 | ## Lab 3.2: Update Hosts
2 |
3 | ### OpenShift Excluder
4 | In this lab we take a look at the OpenShift excluders, apply OS updates to all nodes, drain, reboot and schedule them again.
5 |
6 | The config playbook we use to install and configure OpenShift removes yum excludes for specific packages at its beginning. Likewise it adds them back at the end of the playbook run. This makes it possible to update OpenShift-specific packages during a playbook run but freeze these package versions during e.g. a `yum update`.
7 |
8 | First, let's check if the excludes have been set on all nodes. Connect to the first master and run:
9 | ```
10 | [ec2-user@master0 ~]$ ansible nodes -m shell -a "atomic-openshift-excluder status && atomic-openshift-docker-excluder status"
11 | ...
12 | app-node0.user[X].lab.openshift.ch | SUCCESS | rc=0 >>
13 | exclude -- All packages excluded
14 | exclude -- All packages excluded
15 | ...
16 | ```
17 |
18 | These excludes are set by the OpenShift Ansible playbooks or by the commands `atomic-openshift-excluder` and `atomic-openshift-docker-excluder`. For demonstration purposes, we will now remove and set these excludes again.
19 |
20 | ```
21 | [ec2-user@master0 ~]$ ansible nodes -m shell -a "atomic-openshift-excluder unexclude && atomic-openshift-docker-excluder unexclude"
22 | [ec2-user@master0 ~]$ ansible nodes -m shell -a "atomic-openshift-excluder status && atomic-openshift-docker-excluder status"
23 | [ec2-user@master0 ~]$ ansible nodes -m shell -a "atomic-openshift-excluder exclude && atomic-openshift-docker-excluder exclude"
24 | [ec2-user@master0 ~]$ ansible nodes -m shell -a "atomic-openshift-excluder status && atomic-openshift-docker-excluder status"
25 | ```
26 |
27 |
28 | ### Apply OS Patches to Masters and Nodes
29 |
30 | If you don't know whether you're a cluster-admin, query all users with a cluster-admin rolebinding:
32 | ```
33 | oc get clusterrolebinding -o json | jq '.items[] | select(.metadata.name | startswith("cluster-admin")) | .userNames'
34 | ```
35 |
36 | Hint: root on a master node is always system:admin (don't use it for Ansible tasks), but you can grant permissions to other users.
37 |
38 | First, log in as cluster-admin and drain the first app node (this deletes all pods so that the OpenShift scheduler creates them on other nodes, and also disables scheduling of new pods on the node).
39 | ```
40 | [ec2-user@master0 ~]$ oc get nodes
41 | [ec2-user@master0 ~]$ oc adm drain app-node0.user[X].lab.openshift.ch --ignore-daemonsets --delete-local-data
42 | ```
43 |
44 | After draining a node, only pods from DaemonSets should remain on the node:
45 | ```
46 | [ec2-user@master0 ~]$ oc adm manage-node app-node0.user[X].lab.openshift.ch --list-pods
47 |
48 | Listing matched pods on node: app-node0.user[X].lab.openshift.ch
49 |
50 | NAMESPACE NAME READY STATUS RESTARTS AGE
51 | openshift-logging logging-fluentd-lfjnc 1/1 Running 0 33m
52 | openshift-monitoring node-exporter-czhr2 2/2 Running 0 36m
53 | openshift-node sync-rhh8z 1/1 Running 0 46m
54 | openshift-sdn ovs-hz9wj 1/1 Running 0 46m
55 | openshift-sdn sdn-49tpr 1/1 Running 0 46m
56 | ```
57 |
58 | Scheduling should now be disabled for this node:
59 | ```
60 | [ec2-user@master0 ~]$ oc get nodes
61 | ...
62 | app-node0.user[X].lab.openshift.ch Ready,SchedulingDisabled compute 2d v1.11.0+d4cacc0
63 | ...
64 |
65 | ```
66 |
67 | If everything looks good, you can update the node and reboot it. The first command can take a while and doesn't output anything until it's done:
68 | ```
69 | [ec2-user@master0 ~]$ ansible app-node0.user[X].lab.openshift.ch -m yum -a "name='*' state=latest exclude='atomic-openshift-* openshift-* docker-*'"
70 | [ec2-user@master0 ~]$ ansible app-node0.user[X].lab.openshift.ch --poll=0 --background=1 -m shell -a 'sleep 2 && reboot'
71 | ```
72 |
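Instead of polling manually, one option is to wait for the node's SSH connection to come back with Ansible's `wait_for_connection` module; a sketch (assuming Ansible >= 2.3):
```
[ec2-user@master0 ~]$ ansible app-node0.user[X].lab.openshift.ch -m wait_for_connection -a "delay=30 timeout=600"
```
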
73 | After the node becomes ready again, make it schedulable again. Do not do this before the node has rebooted (it takes a while for the node's status to change to `NotReady`):
74 | ```
75 | [ec2-user@master0 ~]$ oc get nodes -w
76 | [ec2-user@master0 ~]$ oc adm manage-node app-node0.user[X].lab.openshift.ch --schedulable
77 | ```
78 |
79 | Check that pods are correctly starting:
80 | ```
81 | [ec2-user@master0 ~]$ oc adm manage-node app-node0.user[X].lab.openshift.ch --list-pods
82 |
83 | Listing matched pods on node: app-node0.user[X].lab.openshift.ch
84 |
85 | NAMESPACE NAME READY STATUS RESTARTS AGE
86 | dakota ruby-ex-1-6lc87 1/1 Running 0 12m
87 | openshift-logging logging-fluentd-lfjnc 1/1 Running 1 43m
88 | openshift-monitoring node-exporter-czhr2 2/2 Running 2 47m
89 | openshift-node sync-rhh8z 1/1 Running 1 56m
90 | openshift-sdn ovs-hz9wj 1/1 Running 1 56m
91 | openshift-sdn sdn-49tpr 1/1 Running 1 56m
92 | ```
93 |
94 | Since we want to update the whole cluster, **you will need to repeat these steps on all servers**. Masters do not need to be drained because they do not run any pods (unschedulable by default).
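
Purely as an illustrative sketch, the remaining nodes could be handled with the same sequence in a small loop (adjust the node list to your cluster; for nodes running GlusterFS, let the volumes heal before moving on):
```
# illustrative only: repeat the drain/patch/reboot cycle for the remaining nodes
for node in infra-node0.user[X].lab.openshift.ch infra-node1.user[X].lab.openshift.ch; do
  oc adm drain "$node" --ignore-daemonsets --delete-local-data
  ansible "$node" -m yum -a "name='*' state=latest exclude='atomic-openshift-* openshift-* docker-*'"
  ansible "$node" --poll=0 --background=1 -m shell -a 'sleep 2 && reboot'
  ansible "$node" -m wait_for_connection -a "delay=30 timeout=600"
  oc adm manage-node "$node" --schedulable
done
```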
95 |
96 | ---
97 |
98 | **End of Lab 3.2**
99 |
100 |
101 |
102 | [← back to the Chapter Overview](30_daily_business.md)
103 |
--------------------------------------------------------------------------------
/labs/51_backup.md:
--------------------------------------------------------------------------------
1 | ## Lab 5.1: Backup
2 |
3 | In this techlab you will learn how to create a new backup and which files are important. The following items should be backed up:
4 |
5 | - Cluster data files
6 | - etcd data on each master
7 | - API objects (stored in etcd, but it's a good idea to regularly export all objects)
8 | - Docker registry storage
9 | - PV storage
10 | - Certificates
11 | - Ansible hosts file
12 |
13 |
14 | ### Lab 5.1.1: Master Backup Files
15 |
16 | The following files should be backed up on all masters:
17 |
18 | - Ansible inventory file (contains information about the cluster): `/etc/ansible/hosts`
19 | - Configuration files (for the master), certificates and htpasswd: `/etc/origin/master/`
20 | - Docker configurations: `/etc/sysconfig/docker` `/etc/sysconfig/docker-network` `/etc/sysconfig/docker-storage`
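
A minimal sketch of how these files could be collected into a tarball on every master (adjust the target path to wherever your backups belong):
```
[ec2-user@master0 ~]$ ansible masters -m shell -a "tar czf /tmp/master-backup.tgz /etc/ansible/hosts /etc/origin/master /etc/sysconfig/docker /etc/sysconfig/docker-network /etc/sysconfig/docker-storage"
```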
21 |
22 | ### Lab 5.1.2: Node Backup Files
23 |
24 | Backup the following folders on all nodes:
25 |
26 | - Node Configuration files: `/etc/origin/node/`
27 | - Certificates for the docker-registry: `/etc/docker/certs.d/`
28 | - Docker configurations: `/etc/sysconfig/docker` `/etc/sysconfig/docker-network` `/etc/sysconfig/docker-storage`
29 |
30 | ### Lab 5.1.3: Application Backup
31 |
32 | To back up the data in persistent volumes, you should mount them somewhere. If you mount a GlusterFS volume, it is guaranteed to be consistent. The bricks directly on the GlusterFS servers can contain small inconsistencies that GlusterFS hasn't synced to the other instances yet.
33 |
34 |
35 | ### Lab 5.1.4: Project Backup
36 |
37 | It is advisable to regularly back up all project data.
38 | We will set up a cronjob in a project called "project-backup" which regularly exports all OpenShift resources to a PV.
39 | Let's fetch the backup script:
40 | ```
41 | [ec2-user@master0 ~]$ sudo yum install git python-openshift -y
42 | [ec2-user@master0 ~]$ git clone https://github.com/mabegglen/openshift-project-backup
43 | ```
44 | Now we create the cronjob on the first master:
45 | ```
46 | [ec2-user@master0 ~]$ cd openshift-project-backup
47 | [ec2-user@master0 ~]$ ansible-playbook playbook.yml \
48 | -e openshift_project_backup_job_name="cronjob-project-backup" \
49 | -e "openshift_project_backup_schedule=\"0 6,18 * * *\"" \
50 | -e openshift_project_backup_job_service_account="project-backup" \
51 | -e openshift_project_backup_namespace="project-backup" \
52 | -e openshift_project_backup_image="registry.access.redhat.com/openshift3/jenkins-slave-base-rhel7" \
53 | -e openshift_project_backup_image_tag="v3.11" \
54 | -e openshift_project_backup_storage_size="1G" \
55 | -e openshift_project_backup_deadline="3600" \
56 | -e openshift_project_backup_cronjob_api="batch/v1beta1"
57 | ```
58 | Details https://github.com/mabegglen/openshift-project-backup
59 |
60 | If you want to reschedule the backup job to run every minute in order to verify that it works, change the value of `schedule:` to `"*/1 * * * *"`:
61 |
63 | ```
64 | [ec2-user@master0 ~]$ oc project project-backup
65 | [ec2-user@master0 ~]$ oc get cronjob
66 | [ec2-user@master0 ~]$ oc edit cronjob cronjob-project-backup
67 | ```
68 |
69 | Check whether the cronjob is active:
70 | ```
71 | [ec2-user@master0 openshift-project-backup]$ oc get cronjob
72 | NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
73 | cronjob-project-backup */1 * * * * False 1 1m 48m
74 | ```
75 |
76 | Check whether the backup pod was launched:
77 | ```
78 | [ec2-user@master0 openshift-project-backup]$ oc get pods
79 | NAME READY STATUS RESTARTS AGE
80 | cronjob-project-backup-1561384620-kjm6v 1/1 Running 0 47s
81 |
82 | ```
83 |
84 | Check the logs while the backup job is running:
85 | ```
86 | [ec2-user@master0 openshift-project-backup]$ oc logs -f cronjob-project-backup-1561384620-kjm6v
87 | ```
88 | Once the backup job runs as expected, don't forget to set the schedule back to e.g. "0 22 * * *".
89 | ```
90 | [ec2-user@master0 ~]$ oc edit cronjob cronjob-project-backup
91 | ```
92 | If you want to restore a project, proceed to [Lab 5.2.1](52_restore.md#5.2.1).
93 |
94 |
95 | ### Lab 5.1.5: Create etcd Backup
96 | We are going to create a backup of our etcd. Once the backup is created, we will restore it on master1/master2 and scale etcd out from 1 to 3 members.
97 |
98 | First we create a snapshot of our etcd cluster:
99 | ```
100 | [root@master0 ~]# export ETCD_POD_MANIFEST="/etc/origin/node/pods/etcd.yaml"
101 | [root@master0 ~]# export ETCD_EP=$(grep https ${ETCD_POD_MANIFEST} | cut -d '/' -f3)
102 | [root@master0 ~]# export ETCD_POD=$(oc get pods -n kube-system | grep -o -m 1 '\S*etcd\S*')
103 | [root@master0 ~]# oc project kube-system
104 | Now using project "kube-system" on server "https://internalconsole.user[x].lab.openshift.ch:443".
105 | [root@master0 ~]# oc exec ${ETCD_POD} -c etcd -- /bin/bash -c "ETCDCTL_API=3 etcdctl \
106 | --cert /etc/etcd/peer.crt \
107 | --key /etc/etcd/peer.key \
108 | --cacert /etc/etcd/ca.crt \
109 | --endpoints $ETCD_EP \
110 | snapshot save /var/lib/etcd/snapshot.db"
111 |
112 | Snapshot saved at /var/lib/etcd/snapshot.db
113 | ```
114 | Check the file size of the created snapshot:
115 | ```
116 | [root@master0 ~]# ls -hl /var/lib/etcd/snapshot.db
117 | -rw-r--r--. 1 root root 21M Jun 24 16:44 /var/lib/etcd/snapshot.db
118 | ```
119 |
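Optionally, the integrity of the snapshot can be verified from within the etcd pod, using the same variables as above:
```
[root@master0 ~]# oc exec ${ETCD_POD} -c etcd -- /bin/bash -c "ETCDCTL_API=3 etcdctl snapshot status /var/lib/etcd/snapshot.db"
```
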
120 | Copy them to the /tmp directory for further use:
121 | ```
122 | [root@master0 ~]# cp /var/lib/etcd/snapshot.db /tmp/snapshot.db
123 | [root@master0 ~]# cp /var/lib/etcd/member/snap/db /tmp/db
124 | ```
125 | If you want to restore etcd, proceed to [Lab 5.2.2](52_restore.md#5.2.2).
126 |
127 | ---
128 |
129 | **End of Lab 5.1**
130 |
131 |
132 |
133 | [← back to the Chapter Overview](50_backup_restore.md)
134 |
--------------------------------------------------------------------------------
/labs/62_logs.md:
--------------------------------------------------------------------------------
1 | ## Lab 6.2: Troubleshooting Using Logs
2 |
3 | As soon as basic functionality of OpenShift itself is reduced or not working at all, we have to have a closer look at the underlying components' log messages. We find these logs either in the journal on the different servers or in Elasticsearch.
4 |
5 | **Note:** If the logging component is not part of the installation, Elasticsearch is not available and therefore the only log location is the journal. Also be aware that Fluentd is responsible for aggregating log messages, but it is possible that Fluentd was not deployed on all OpenShift nodes even though it is a DaemonSet. Check Fluentd's node selector and the node's labels to make sure all logs are aggregated as expected.
6 |
7 | **Note:** While it is convenient to use the EFK stack to analyze log messages in a central place, be aware that depending on the problem, relevant log messages might not be received by Elasticsearch (e.g. SDN problems).
8 |
9 |
10 | ### OpenShift Components Overview
11 |
12 | The master usually houses three master-specific containers:
13 | * `master-api` in OpenShift project `kube-system`
14 | * `master-controllers` in OpenShift project `kube-system`
15 | * `master-etcd` in OpenShift project `kube-system` (usually installed on all masters, also possible externally)
16 |
17 | The node-specific containers can also be found on a master:
18 | * `sync` in OpenShift project `openshift-node`
19 | * `sdn` and `ovs` in OpenShift project `openshift-sdn`
20 |
21 | The node-specific services can also be found on a master:
22 | * `atomic-openshift-node` (in order for the master to be part of the SDN)
23 | * `docker`
24 |
25 | General services include the following:
26 | * `dnsmasq`
27 | * `NetworkManager`
28 | * `firewalld`
29 |
30 |
31 | ### Service States
32 |
33 | Check etcd and master states from the first master using ansible. Check the OpenShift master container first:
34 | ```
35 | [ec2-user@master0 ~]$ oc get pods -n kube-system -o wide
36 | NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
37 | master-api-master0.user7.lab.openshift.ch 1/1 Running 9 1d 172.31.44.160 master0.user7.lab.openshift.ch
38 | master-api-master1.user7.lab.openshift.ch 1/1 Running 7 1d 172.31.45.211 master1.user7.lab.openshift.ch
39 | master-api-master2.user7.lab.openshift.ch 1/1 Running 0 4m 172.31.35.148 master2.user7.lab.openshift.ch
40 | master-controllers-master0.user7.lab.openshift.ch 1/1 Running 7 1d 172.31.44.160 master0.user7.lab.openshift.ch
41 | master-controllers-master1.user7.lab.openshift.ch 1/1 Running 6 1d 172.31.45.211 master1.user7.lab.openshift.ch
42 | master-controllers-master2.user7.lab.openshift.ch 1/1 Running 0 4m 172.31.35.148 master2.user7.lab.openshift.ch
43 | master-etcd-master0.user7.lab.openshift.ch 1/1 Running 6 1d 172.31.44.160 master0.user7.lab.openshift.ch
44 | master-etcd-master1.user7.lab.openshift.ch 1/1 Running 4 1d 172.31.45.211 master1.user7.lab.openshift.ch
45 | ```
46 |
47 | Depending on the outcome of the above commands, we have to take a closer look at a specific container. This can either be done the conventional way, e.g. by showing the 30 most recent messages for etcd on the first master:
48 |
49 | ```
50 | [ec2-user@master0 ~]$ oc logs master-etcd-master0.user7.lab.openshift.ch -n kube-system --tail=30
51 | ```
52 |
53 | There is also the possibility of checking etcd's health using `etcdctl`:
54 | ```
55 | [root@master0 ~]# etcdctl2 --cert-file=/etc/etcd/peer.crt \
56 | --key-file=/etc/etcd/peer.key \
57 | --ca-file=/etc/etcd/ca.crt \
58 | --peers="https://master0.user[X].lab.openshift.ch:2379,https://master1.user[X].lab.openshift.ch:2379" \
59 | cluster-health
60 | ```
61 |
62 | As an etcd cluster needs a quorum to update its state, `etcdctl` will output that the cluster is healthy even if not every member is.
63 |
64 | Back to checking services with systemd: Master-specific services only need to be executed on master hosts, so note the change of the host group in the following command.
65 |
66 | atomic-openshift-node:
67 | ```
68 | [ec2-user@master0 ~]$ ansible nodes -a "systemctl is-active atomic-openshift-node"
69 | ```
70 |
71 | Above command applies to all the other node services (`docker`, `dnsmasq` and `NetworkManager`) with which we get an overall overview of OpenShift-specific service states.
72 |
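A compact way to query all of these services at once is to pass several units to `systemctl is-active` (one state is printed per unit; note that Ansible reports a host as failed if any unit is not active):
```
[ec2-user@master0 ~]$ ansible nodes -a "systemctl is-active atomic-openshift-node docker dnsmasq NetworkManager"
```
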
73 | Depending on the outcome of the above commands, we have to take a closer look at specific services. This can either be done the conventional way, e.g. by showing the 30 most recent messages for atomic-openshift-node on the first master:
74 |
75 | ```
76 | [ec2-user@master0 ~]$ ansible masters[0] -a "journalctl -u atomic-openshift-node -n 30"
77 | ```
78 |
79 | Or by searching Elasticsearch: After logging in to https://logging.app[X].lab.openshift.ch, make sure you're on Kibana's "Discover" tab. Then choose the `.operations.*` index by clicking on the arrow in the dark-grey box on the left to get a list of all available indices. You can then create search queries such as `systemd.t.SYSTEMD_UNIT:atomic-openshift-node.service` in order to filter for all messages from every running OpenShift node service.
80 |
81 | Or if we wanted to filter for error messages we could simply use "error" in the search bar and then by looking at the available fields (in the menu on the left) limit the search results further.
82 |
83 | ---
84 |
85 | **End of Lab 6.2**
86 |
87 |
88 |
89 | [← back to the Chapter Overview](60_monitoring_troubleshooting.md)
90 |
--------------------------------------------------------------------------------
/labs/42_outgoing_http_proxies.md:
--------------------------------------------------------------------------------
1 | ## Lab 4.2: Outgoing HTTP Proxies
2 |
3 | Large corporations often allow internet access only via outgoing HTTP proxies for security reasons.
4 | To use OpenShift Container Platform in such an environment the various OpenShift components and
5 | the containers that run on the platform need to be configured to use an HTTP proxy. In addition
6 | internal resources must be excluded from access via proxy as outgoing proxies usually only allow
7 | access to external resources. This lab shows how to configure OpenShift Container Platform for outgoing
8 | HTTP proxies using the included Ansible playbooks.
9 | We haven't yet added an outgoing HTTP proxy to our lab environment. Therefore this lab currently doesn't
10 | contain hands-on exercises.
11 |
12 |
13 | ### Configure the Ansible Inventory
14 |
15 | The OpenShift Ansible playbooks support three groups of variables for outgoing HTTP proxy configuration.
16 |
17 | Configuration for OpenShift components, container runtime, e.g. Docker, and containers running on the platform:
18 | ```
19 | openshift_http_proxy=<proxy_url>
20 | openshift_https_proxy=<proxy_url>
21 | openshift_no_proxy='<list_of_hosts>'
22 | ```
23 |
24 | Where `<proxy_url>` can take one of the following forms:
25 | ```
26 | http://proxy.example.org:3128
27 | http://192.0.2.42:3128
28 | http://proxyuser:proxypass@proxy.example.org:3128
29 | http://proxyuser:proxypass@192.0.2.42:3128
30 | ```
31 |
32 | In all cases https can be used instead of http, provided this is supported by the proxy.
33 | `<list_of_hosts>` consists of a comma-separated list of:
34 | * hostnames, e.g. `my.example.org`
35 | * domains, e.g. `.example.org`
36 | * IP addresses, e.g. `192.0.2.42`
37 |
38 | Additionally OpenShift implements support for IP subnets, e.g. `192.0.2.0/24`, in `no_proxy`. However other software, including Docker, does not support such entries and ignores them.
39 |
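Putting the pieces together, a sketch of how these lines might look in an inventory (illustrative values only):
```
openshift_http_proxy=http://proxy.example.org:3128
openshift_https_proxy=http://proxy.example.org:3128
openshift_no_proxy='.lab.openshift.ch,.cluster.local,.svc,172.30.0.0/16'
```
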
40 | Docker build containers are created directly by Docker with a clean environment, i.e. without the required proxy environment variables.
41 | The following variables tell OpenShift to add `ENV` instructions with the outgoing HTTP proxy configuration to all Docker builds.
42 | This is needed to allow builds to download dependencies from external sources:
43 | ```
44 | openshift_builddefaults_http_proxy=<proxy_url>
45 | openshift_builddefaults_https_proxy=<proxy_url>
46 | openshift_builddefaults_no_proxy='<list_of_hosts>'
47 | ```
48 |
49 | Finally an outgoing HTTP proxy can be configured to allow OpenShift builds to check out sources from external Git repositories:
50 | ```
51 | openshift_builddefaults_git_http_proxy=<proxy_url>
52 | openshift_builddefaults_git_https_proxy=<proxy_url>
53 | openshift_builddefaults_git_no_proxy='<list_of_hosts>'
54 | ```
55 |
56 |
57 | ### Internal Docker Registry
58 |
59 | It's recommended to add the IP address of the internal registry to the `no_proxy`
60 | list. The IP address of the internal registry can be looked up after cluster installation with:
61 | ```
62 | [ec2-user@master0 ~]$ oc get svc docker-registry -n default -o jsonpath='{.spec.clusterIP}'
63 | ```
64 |
65 | For OpenShift Container Platform 3.5 and earlier this is required as the registry is always
66 | accessed via IP address and Docker doesn't support IP subnets in its `no_proxy` list!
67 |
68 |
69 | ### Build Tools
70 |
71 | Some build tools use a different mechanism and need additional configuration for accessing outgoing HTTP proxies.
72 |
73 |
74 | #### Maven
75 |
76 | The Java build tool Maven reads the [proxy configuration from its settings.xml](https://maven.apache.org/guides/mini/guide-proxies.html).
77 | Java base images by Red Hat contain [support for configuring Maven's outgoing proxy through environment variables](https://access.redhat.com/documentation/en-us/red_hat_jboss_enterprise_application_platform/7.0/html-single/red_hat_jboss_enterprise_application_platform_for_openshift/#eap_s2i_process).
78 | These environment variables are used by all Red Hat Java base images, not just JBoss ones. They must be added to the BuildConfigs of Maven builds.
79 | To add them to all BuildConfigs on the platform you can use the Ansible inventory variable `openshift_builddefaults_json`,
80 | which must then contain the whole build proxy configuration, i.e. the other `openshift_builddefaults_` variables mentioned earlier are ignored. E.g.:
81 | ```
82 | openshift_builddefaults_json='
83 | {"BuildDefaults":{"configuration":{"apiVersion":"v1","env":[
84 | {"name":"HTTP_PROXY","value":""},
85 | {"name":"HTTPS_PROXY","value":""},
86 | {"name":"NO_PROXY","value":""},
87 | {"name":"http_proxy","value":""},
88 | {"name":"https_proxy","value":""},
89 | {"name":"no_proxy","value":""},
90 | {"name":"HTTP_PROXY_HOST","value":""},
91 | {"name":"HTTP_PROXY_PORT","value":""},
92 | {"name":"HTTP_PROXY_USERNAME","value":""},
93 | {"name":"HTTP_PROXY_PASSWORD","value":""},
94 | {"name":"HTTP_PROXY_NONPROXYHOSTS","value":""}],
95 | "gitHTTPProxy":"",
96 | "gitHTTPSProxy":"",
97 | "gitNoProxy":"",
98 | "kind":"BuildDefaultsConfig"}}}'
99 | ```
100 |
101 | Note that the value has to be valid JSON.
102 | Also Ansible inventories in INI format do not support line folding, so this has to be a single line.
103 |
104 | If you use Java base images other than the ones provided by Red Hat you have to implement your own solution to configure an outgoing HTTP proxy for Maven.
105 |
106 |
107 | ### Apply Outgoing HTTP Proxy Configuration to Cluster
108 |
109 | To apply the outgoing HTTP proxy configuration to the cluster you have to run the master and node config playbooks:
110 | ```
111 | [ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-node/bootstrap.yml
112 | [ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-master/config.yml
113 | [ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-master/additional_config.yml
114 | ```
115 |
116 |
117 | ---
118 |
119 | **End of Lab 4.2**
120 |
121 |
122 |
123 | [← back to the Chapter Overview](40_configuration_best_practices.md)
124 |
--------------------------------------------------------------------------------
/labs/71_upgrade_openshift3.11.104.md:
--------------------------------------------------------------------------------
1 | ## Lab 7.1: Upgrade OpenShift 3.11.88 to 3.11.104
2 |
3 | ### Upgrade Preparation
4 |
5 | We first need to make sure our lab environment fulfills the requirements mentioned in the official documentation. We are going to do an "[Automated In-place Cluster Upgrade](https://docs.openshift.com/container-platform/3.11/upgrading/automated_upgrades.html)" which lists part of these requirements and explains how to verify the current installation. Also check the [Prerequisites](https://docs.openshift.com/container-platform/3.11/install/prerequisites.html#install-config-install-prerequisites) of the new release.
6 |
7 | Conveniently, our lab environment already fulfills all the requirements, so we can move on to the next step.
8 |
9 | #### 1. Ensure `openshift_deployment_type=openshift-enterprise` is set ####
10 | ```
11 | [ec2-user@master0 ~]$ grep -i openshift_deployment_type /etc/ansible/hosts
12 | ```
13 |
14 | #### 2. Disable rolling, full system restarts of the hosts ####
15 | ```
16 | [ec2-user@master0 ~]$ ansible masters -m shell -a "grep -i openshift_rolling_restart_mode /etc/ansible/hosts"
17 | ```
18 | In our lab environment this parameter isn't set, so let's set it on all master nodes:
19 | ```
20 | [ec2-user@master0 ~]$ ansible masters -m lineinfile -a 'path="/etc/ansible/hosts" regexp="^openshift_rolling_restart_mode" line="openshift_rolling_restart_mode=services" state="present"'
21 | ```
22 | #### 3. Change openshift_pkg_version to -3.11.104 in /etc/ansible/hosts ####
23 | ```
24 | [ec2-user@master0 ~]$ ansible masters -m lineinfile -a 'path="/etc/ansible/hosts" regexp="^openshift_pkg_version" line="openshift_pkg_version=-3.11.104" state="present"'
25 | ```
26 | #### 4. Upgrade the nodes ####
27 |
28 | ##### 4.1 Prepare the nodes for the upgrade #####
29 | ```
30 | [ec2-user@master0 ~]$ ansible all -a 'subscription-manager refresh'
31 | [ec2-user@master0 ~]$ ansible all -a 'subscription-manager repos --enable="rhel-7-server-ose-3.11-rpms" --enable="rhel-7-server-rpms" --enable="rhel-7-server-extras-rpms" --enable="rhel-7-server-ansible-2.6-rpms" --enable="rhel-7-fast-datapath-rpms" --disable="rhel-7-server-ose-3.10-rpms" --disable="rhel-7-server-ansible-2.4-rpms"'
32 | [ec2-user@master0 ~]$ ansible all -a 'yum clean all'
33 | [ec2-user@master0 ~]$ ansible masters -m lineinfile -a 'path="/etc/ansible/hosts" regexp="^openshift_certificate_expiry_fail_on_warn" line="openshift_certificate_expiry_fail_on_warn=False" state="present"'
34 | ```
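
To verify that only the intended repositories remain enabled after this step, you can list them:
```
[ec2-user@master0 ~]$ ansible all -a 'subscription-manager repos --list-enabled'
```
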
35 | ##### 4.2 Prepare your upgrade host #####
36 | ```
37 | [ec2-user@master0 ~]$ sudo -i
38 | [root@master0 ~]# yum update -y openshift-ansible
39 | ```
40 |
41 | ##### 4.3 Upgrade the control plane #####
42 |
43 | Upgrade the so-called control plane, consisting of:
44 |
45 | - etcd
46 | - master components
47 | - node services running on masters
48 | - Docker running on masters
49 | - Docker running on any stand-alone etcd hosts
50 |
51 | ```
52 | [ec2-user@master0 ~]$ cd /usr/share/ansible/openshift-ansible
53 | [ec2-user@master0 ~]$ ansible-playbook playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_control_plane.yml
54 | ```
55 |
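Once the control plane playbook has finished, a quick sanity check could look like this; the server part of `oc version` should now report 3.11.104 and all master static pods should be running:
```
[ec2-user@master0 ~]$ oc version
[ec2-user@master0 ~]$ oc get pods -n kube-system -o wide
```
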
56 | ##### 4.4 Upgrade the nodes manually (one by one) #####
57 |
58 | Upgrade node by node manually because we need to make sure that the nodes running GlusterFS in containers have enough time to replicate to the other nodes.
59 |
60 | Upgrade `infra-node0.user[X].lab.openshift.ch`:
61 | ```
62 | [ec2-user@master0 ~]$ ansible-playbook playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_nodes.yml \
63 | --extra-vars openshift_upgrade_nodes_label="kubernetes.io/hostname=infra-node0.user[X].lab.openshift.ch"
64 | ```
65 | Wait until all GlusterFS Pods are ready again and check if GlusterFS volumes have heal entries.
66 | ```
67 | [ec2-user@master0 ~]$ oc project glusterfs
68 | [ec2-user@master0 ~]$ oc get pods -o wide | grep glusterfs
69 | [ec2-user@master0 ~]$ oc rsh [glusterfs-pod]
70 | sh-4.2# for vol in `gluster volume list`; do gluster volume heal $vol info; done | grep -i "number of entries"
71 | Number of entries: 0
72 | ```
73 | If all volumes have `Number of entries: 0`, we can proceed with the next node and repeat the check of GlusterFS.
74 |
75 | Upgrade `infra-node1` and `infra-node2` the same way as you did the first one:
76 | ```
77 | [ec2-user@master0 ~]$ ansible-playbook playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_nodes.yml \
78 | --extra-vars openshift_upgrade_nodes_label="kubernetes.io/hostname=infra-node1.user[X].lab.openshift.ch"
79 | ```
80 |
81 | After upgrading the infra nodes, you need to upgrade the compute nodes:
82 | ```
83 | [ec2-user@master0 ~]$ ansible-playbook playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade_nodes.yml \
84 | --extra-vars openshift_upgrade_nodes_label="node-role.kubernetes.io/compute=true" \
85 | --extra-vars openshift_upgrade_nodes_serial="1"
86 | ```
87 |
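To confirm that every node now runs the 3.11.104 packages, a quick check could be:
```
[ec2-user@master0 ~]$ ansible nodes -a "rpm -q atomic-openshift-node"
[ec2-user@master0 ~]$ oc get nodes
```
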
88 | #### 5. Upgrading the EFK Logging Stack ####
89 |
90 | **Note:** Setting openshift_logging_install_logging=true enables you to upgrade the logging stack.
91 |
92 | ```
93 | [ec2-user@master0 ~]$ grep openshift_logging_install_logging /etc/ansible/hosts
94 | [ec2-user@master0 ~]$ cd /usr/share/ansible/openshift-ansible/playbooks
95 | [ec2-user@master0 ~]$ ansible-playbook openshift-logging/config.yml
96 | [ec2-user@master0 ~]$ oc delete pod --selector="component=fluentd" -n logging
97 | ```
98 |
99 | #### 6. Upgrading Cluster Metrics ####
100 | ```
101 | [ec2-user@master0 ~]$ cd /usr/share/ansible/openshift-ansible/playbooks
102 | [ec2-user@master0 ~]$ ansible-playbook openshift-metrics/config.yml
103 | ```
104 |
105 | #### 7. Update the oc binary ####
106 | The `atomic-openshift-clients-redistributable` package which provides the `oc` binary for different operating systems needs to be updated separately:
107 | ```
108 | [ec2-user@master0 ~]$ ansible masters -a "yum install --assumeyes --disableexcludes=all atomic-openshift-clients-redistributable"
109 | ```
110 |
111 | #### 8. Update oc binary on client ####
112 | Update the `oc` binary on your own client. As before, you can get it from:
113 | ```
114 | https://client.app[X].lab.openshift.ch
115 | ```
116 |
117 | **Note:** You should tell all users of your platform to update their client. Client and server version differences can lead to compatibility issues.
118 |
119 | ---
120 |
121 | **End of Lab 7.1**
122 |
123 |
153 |
154 | [← back to the Chapter Overview](70_upgrade.md)
155 |
--------------------------------------------------------------------------------
/appendices/03_aws_storage.md:
--------------------------------------------------------------------------------
1 | # Appendix 3: Using AWS EBS and EFS Storage
2 | This appendix is going to show you how to use AWS EBS and EFS Storage on OpenShift 3.11.
3 |
4 | ## Installation
5 | :information_source: To access the EFS storage on AWS, you will need an fsid. Please ask your instructor for one.
6 |
7 | Uncomment the following part in your Ansible inventory and set the fsid:
8 | ```
9 | [ec2-user@master0 ~]$ sudo vi /etc/ansible/hosts
10 | ```
11 |
12 | ```
13 | # EFS Configuration
14 | openshift_provisioners_install_provisioners=True
15 | openshift_provisioners_efs=True
16 | openshift_provisioners_efs_fsid="[provided by instructor]"
17 | openshift_provisioners_efs_region="eu-central-1"
18 | openshift_provisioners_efs_nodeselector={"beta.kubernetes.io/os": "linux"}
19 | openshift_provisioners_efs_aws_access_key_id="[provided by instructor]"
20 | openshift_provisioners_efs_aws_secret_access_key="[provided by instructor]"
21 | openshift_provisioners_efs_supplementalgroup=65534
22 | openshift_provisioners_efs_path=/persistentvolumes
23 | ```
24 |
25 | For detailed information about provisioners take a look at https://docs.openshift.com/container-platform/3.11/install_config/provisioners.html#provisioners-efs-ansible-variables
26 |
27 | Execute the playbook to install the provisioner:
28 | ```
29 | [ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-provisioners/config.yml
30 | ```
31 |
32 | Check if the pv was created:
33 | ```
34 | [ec2-user@master0 ~]$ oc get pv
35 |
36 | NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
37 | provisioners-efs 1Mi RWX Retain Bound openshift-infra/provisioners-efs 22h
38 | ```
39 |
40 |
41 | :warning: The external provisioner for AWS EFS on OpenShift Container Platform 3.11 is still a Technology Preview feature.
42 | https://docs.openshift.com/container-platform/3.11/install_config/provisioners.html#overview
43 |
44 | #### Create StorageClass
45 |
46 | To enable dynamic provisioning, you need to create a StorageClass:
47 | ```
48 | [ec2-user@master0 ~]$ cat << EOF > aws-efs-storageclass.yaml
49 | kind: StorageClass
50 | apiVersion: storage.k8s.io/v1beta1
51 | metadata:
52 | name: nfs
53 | provisioner: openshift.org/aws-efs
54 | EOF
55 | [ec2-user@master0 ~]$ oc create -f aws-efs-storageclass.yaml
56 | ```
57 |
58 | Check if the storage class has been created:
59 | ```
60 | [ec2-user@master0 ~]$ oc get sc
61 |
62 | NAME PROVISIONER AGE
63 | glusterfs-storage kubernetes.io/glusterfs 23h
64 | nfs openshift.org/aws-efs 23h
65 | ```
66 |
67 | #### Create PVC
68 |
69 | Now we create a little project and claim a volume from EFS.
70 |
71 | ```
72 | [ec2-user@master0 ~]$ oc new-project quotatest
73 | [ec2-user@master0 ~]$ oc new-app centos/ruby-25-centos7~https://github.com/sclorg/ruby-ex.git
74 | [ec2-user@master0 ~]$ cat << EOF > test-pvc.yaml
75 | apiVersion: v1
76 | kind: PersistentVolumeClaim
77 | metadata:
78 | name: quotatest
79 | spec:
80 | accessModes:
81 | - ReadWriteOnce
82 | volumeMode: Filesystem
83 | resources:
84 | requests:
85 | storage: 10Mi
86 | storageClassName: nfs
87 | EOF
88 | [ec2-user@master0 ~]$ oc create -f test-pvc.yaml
89 | [ec2-user@master0 ~]$ oc set volume dc/ruby-ex --add --overwrite --name=v1 --type=persistentVolumeClaim --claim-name=quotatest --mount-path=/quotatest
90 | ```
91 |
92 | Check if we can see our pvc:
93 | ```
94 | [ec2-user@master0 ~]$ oc get pvc
95 |
96 | NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
97 | quotatest Bound pvc-2fa78a43-98ee-11e9-94ce-064eab17d15e 10Mi RWX nfs 17m
98 | ```
99 |
100 | We will now try to write 40Mi into the 10Mi claim to demonstrate that PVs do not enforce quotas:
101 | ```
102 | [ec2-user@master0 ~]$ oc get pods
103 | NAME READY STATUS RESTARTS AGE
104 | ruby-ex-2-zwnws 1/1 Running 0 1h
105 | [ec2-user@master0 ~]$ oc rsh ruby-ex-2-zwnws
106 | $ df -h /quotatest
107 | Filesystem Size Used Avail Use% Mounted on
108 | fs-4f7f2916.efs.eu-central-1.amazonaws.com:/persistentvolumes/provisioners-efs-pvc-2fa78a43-98ee-11e9-94ce-064eab17d15e 8.0E 0 8.0E 0% /quotatest
109 | $ dd if=/dev/urandom of=/quotatest/quota bs=4096 count=10000
110 | $ du -hs /quotatest/
111 | 40M /quotatest/
112 | ```
113 |
114 | #### Delete EFS Volumes
115 | When you delete the PVC, the PV and the corresponding data get deleted.
116 | The default RECLAIM POLICY is set to 'Delete':
117 | ```
118 | [ec2-user@master0 ~]$ oc get pv
119 | NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
120 | provisioners-efs 1Mi RWX Retain Bound openshift-infra/provisioners-efs 23m
121 | pvc-2fa78a43-98ee-11e9-94ce-064eab17d15e 10Mi RWX Delete Bound test/provisioners-efs nfs 17m
122 | registry-volume 5Gi RWX Retain Bound default/registry-claim 13m
123 | ```
124 |
125 | Scale the application down and delete the PVC:
126 | ```
127 | [ec2-user@master0 ~]$ oc scale dc/ruby-ex --replicas=0
128 | [ec2-user@master0 ~]$ oc delete pvc quotatest
129 | ```
130 |
131 | Check if the pv was deleted:
132 | ```
133 | [ec2-user@master0 ~]$ oc get pv
134 | NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
135 | provisioners-efs 1Mi RWX Retain Bound openshift-infra/provisioners-efs 23m
136 | registry-volume 5Gi RWX Retain Bound default/registry-claim 13m
137 | ```
138 |
139 | Check if the efs-provisioner cleans up the NFS Volume:
140 | ```
141 | [ec2-user@master0 ~]$ oc project openshift-infra
142 | [ec2-user@master0 ~]$ oc get pods
143 | NAME READY STATUS RESTARTS AGE
144 | provisioners-efs-1-l75qr 1/1 Running 0 1h
145 | [ec2-user@master0 ~]$ oc rsh provisioners-efs-1-l75qr
146 | sh-4.2# df /persistentvolumes
147 | Filesystem 1K-blocks Used Available Use% Mounted on
148 | fs-4f7f2916.efs.eu-central-1.amazonaws.com:/persistentvolumes 9007199254739968 0 9007199254739968 0% /persistentvolumes
149 | sh-4.2# ls /persistentvolumes
150 | sh-4.2#
151 | ```
152 |
153 | ---
154 |
155 | [← back to the labs overview](../README.md)
156 |
157 |
--------------------------------------------------------------------------------
/labs/34_renew_certificates.md:
--------------------------------------------------------------------------------
1 | ## Lab 3.4: Renew Certificates
2 |
3 | In this lab we take a look at the OpenShift certificates and how to renew them.
4 |
5 | These are the certificates that need to be maintained. For each component there is a playbook provided by Red Hat that will redeploy the certificates:
6 | - masters (API server and controllers)
7 | - etcd
8 | - nodes
9 | - registry
10 | - router
11 |
12 |
13 | ### Check the Expiration of the Certificates
14 |
15 | To check all your certificates, run the playbook `certificate_expiry/easy-mode.yaml`:
16 | ```
17 | [ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-checks/certificate_expiry/easy-mode.yaml
18 | ```
19 | The playbook will generate the following reports with the information of each certificate in JSON and HTML format:
20 | ```
21 | grep -A2 summary $HOME/cert-expiry-report*.json
22 | $HOME/cert-expiry-report*.html
23 | ```
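
To quickly check a single certificate without running the playbook, `openssl` can be used directly. A small sketch (the path is just an example; any certificate under `/etc/origin/` can be checked the same way):
```
[ec2-user@master0 ~]$ sudo openssl x509 -in /etc/origin/master/master.server.crt -noout -enddate
```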
24 |
25 |
26 | ### Redeploy etcd Certificates
27 |
28 | To get a feeling for the process of redeploying certificates, we will redeploy the etcd certificates.
29 |
30 | **Warning:** This will lead to a restart of etcd and master services and consequently cause an outage for a few seconds of the OpenShift API.
31 |
32 | First, we check the current etcd certificates creation time:
33 | ```
34 | [ec2-user@master0 ~]$ sudo openssl x509 -in /etc/origin/master/master.etcd-ca.crt -text -noout | grep -i validity -A 2
35 | Validity
36 | Not Before: Jun 4 15:45:00 2019 GMT
37 | Not After : Jun 2 15:45:00 2024 GMT
38 |
39 | [ec2-user@master0 ~]$ sudo openssl x509 -in /etc/origin/master/master.etcd-client.crt -text -noout | grep -i validity -A 2
40 | Validity
41 | Not Before: Jun 4 15:45:00 2019 GMT
42 | Not After : Jun 2 15:45:00 2024 GMT
43 |
44 | ```
45 | Note the value of "Not Before:" under "Validity". We will later compare these timestamps with those of the freshly redeployed certificates.
46 |
47 | Redeploy the CA certificate of the etcd servers:
48 | ```
49 | [ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-etcd/redeploy-ca.yml
50 | ```
51 |
52 | Check the current etcd CA certificate creation time:
53 | ```
54 | [ec2-user@master0 ~]$ sudo openssl x509 -in /etc/origin/master/master.etcd-ca.crt -text -noout | grep -i validity -A 2
55 | Validity
56 | Not Before: Jun 6 12:58:04 2019 GMT
57 | Not After : Jun 4 12:58:04 2024 GMT
58 |
59 | [ec2-user@master0 ~]$ sudo openssl x509 -in /etc/origin/master/master.etcd-client.crt -text -noout | grep -i validity -A 2
60 | Validity
61 | Not Before: Jun 4 15:45:00 2019 GMT
62 | Not After : Jun 2 15:45:00 2024 GMT
63 | ```
64 | The etcd CA certificate has been regenerated, but etcd is still using the old server certificates. We will replace them with the `playbooks/openshift-etcd/redeploy-certificates.yml` playbook.
65 |
66 | **Warning:** This will again lead to a restart of etcd and master services and consequently cause an outage for a few seconds of the OpenShift API.
67 | ```
68 | [ec2-user@master0 ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-etcd/redeploy-certificates.yml
69 | ```
70 |
71 | Check if the server certificate has been replaced:
72 | ```
73 | [ec2-user@master0 ~]$ sudo openssl x509 -in /etc/origin/master/master.etcd-ca.crt -text -noout | grep -i validity -A 2
74 | Validity
75 | Not Before: Jun 6 12:58:04 2019 GMT
76 | Not After : Jun 4 12:58:04 2024 GMT
77 |
78 | [ec2-user@master0 ~]$ sudo openssl x509 -in /etc/origin/master/master.etcd-client.crt -text -noout | grep -i validity -A 2
79 | Validity
80 | Not Before: Jun 6 13:28:36 2019 GMT
81 | Not After : Jun 4 13:28:36 2024 GMT
82 | ```
83 | ### Redeploy Node Certificates
84 |
85 | 1. Create a new bootstrap.kubeconfig for the nodes (master nodes will just copy admin.kubeconfig):
86 | ```
87 | [ec2-user@master0 ~]$ sudo oc serviceaccounts create-kubeconfig node-bootstrapper -n openshift-infra --config /etc/origin/master/admin.kubeconfig > /tmp/bootstrap.kubeconfig
88 | ```
89 |
90 | 2. Distribute /tmp/bootstrap.kubeconfig from step 1 to the infra and compute nodes, replacing /etc/origin/node/bootstrap.kubeconfig:
91 | ```
92 | [ec2-user@master0 ~]$ ansible nodes -m copy -a 'src=/tmp/bootstrap.kubeconfig dest=/etc/origin/node/bootstrap.kubeconfig'
93 | ```
94 |
95 | 3. Move node.kubeconfig and client-ca.crt. These will get recreated when the node service is restarted:
96 | ```
97 | [ec2-user@master0 ~]$ ansible nodes -m shell -a 'mv /etc/origin/node/client-ca.crt{,.old}'
98 | [ec2-user@master0 ~]$ ansible nodes -m shell -a 'mv /etc/origin/node/node.kubeconfig{,.old}'
99 | ```
100 | 4. Remove contents of /etc/origin/node/certificates/ on app-/infra-nodes:
101 | ```
102 | [ec2-user@master0 ~]$ ansible nodes -m shell -a 'rm -rf /etc/origin/node/certificates' --limit 'nodes:!master*'
103 | ```
104 | 5. Restart node service on app-/infra-nodes:
105 | :warning: Restarting atomic-openshift-node will fail until the CSRs are approved! Approve the CSRs (step 6) and restart the service again.
106 | ```
107 | [ec2-user@master0 ~]$ ansible nodes -m service -a "name=atomic-openshift-node state=restarted" --limit 'nodes:!master*'
108 | ```
109 | 6. Approve the CSRs; 2 should be approved for each node (a sketch for approving only pending CSRs follows after this list):
110 | ```
111 | [ec2-user@master0 ~]$ oc get csr -o name | xargs oc adm certificate approve
112 | ```
113 | 7. Check if the app-/infra-nodes are READY:
114 | ```
115 | [ec2-user@master0 ~]$ oc get node
116 | [ec2-user@master0 ~]$ for i in `oc get nodes -o jsonpath=$'{range .items[*]}{.metadata.name}\n{end}'`; do oc get --raw /api/v1/nodes/$i/proxy/healthz; echo -e "\t$i"; done
117 | ```
118 | 8. Remove contents of /etc/origin/node/certificates/ on master-nodes:
119 | ```
120 | [ec2-user@master0 ~]$ ansible masters -m shell -a 'rm -rf /etc/origin/node/certificates'
121 | ```
122 | 9. Restart node service on master-nodes:
123 | ```
124 | [ec2-user@master0 ~]$ ansible masters -m service -a "name=atomic-openshift-node state=restarted"
125 | ```
126 | 10. Approve CSRs, 2 should be approved for each node:
127 | ```
128 | [ec2-user@master0 ~]$ oc get csr -o name | xargs oc adm certificate approve
129 | ```
130 | 11. Check if the master-nodes are READY:
131 | ```
132 | [ec2-user@master0 ~]$ oc get node
133 | [ec2-user@master0 ~]$ for i in `oc get nodes -o jsonpath=$'{range .items[*]}{.metadata.name}\n{end}' | grep master`; do oc get --raw /api/v1/nodes/$i/proxy/healthz; echo -e "\t$i"; done
134 | ```
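
For steps 6 and 10, if you prefer to approve only pending CSRs instead of piping every CSR into the approval, a go-template filter like the following could be used (a sketch; it assumes no unrelated CSRs are pending on the cluster):
```
[ec2-user@master0 ~]$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
```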
135 |
136 |
137 | ### Replace the Other Main Certificates
138 |
139 | Use the following playbooks to replace the certificates of the other main components of OpenShift:
140 |
141 | **Warning:** Do not yet replace the router certificates with the corresponding playbook as it will break your routers running on OpenShift 3.6. If you want to, replace the router certificates after upgrading to OpenShift 3.7. (Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1490186)
142 |
143 | - masters (API server and controllers)
144 | - /usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-certificates.yml
145 |
146 | - etcd
147 | - /usr/share/ansible/openshift-ansible/playbooks/openshift-etcd/redeploy-ca.yml
148 | - /usr/share/ansible/openshift-ansible/playbooks/openshift-etcd/redeploy-certificates.yml
149 |
150 | - registry
151 | - /usr/share/ansible/openshift-ansible/playbooks/openshift-hosted/redeploy-registry-certificates.yml
152 |
153 | - router
154 | - /usr/share/ansible/openshift-ansible/playbooks/openshift-hosted/redeploy-router-certificates.yml
155 |
156 | **Warning:** The documented redeploy-certificates.yml playbook for nodes doesn't exist anymore (since 3.10)!
157 | This is already reported: Red Hat Bugzilla – Bug 1635251.
158 | Red Hat provided this KCS: https://access.redhat.com/solutions/3782361
159 |
160 | - nodes (manual steps needed!)
161 | ---
162 |
163 | **End of Lab 3.4**
164 |
165 |
166 |
167 | [← back to the Chapter Overview](30_daily_business.md)
168 |
--------------------------------------------------------------------------------
/labs/33_persistent_storage.md:
--------------------------------------------------------------------------------
1 | ## Lab 3.3: Persistent Storage
2 |
3 | In this lab we take a look at the OpenShift implementation of Container Native Storage using the `heketi-cli` to resize a volume.
4 |
5 |
6 | ### heketi-cli
7 |
8 | The package `heketi-client` has been pre-installed for you on the bastion host. The package includes the `heketi-cli` command.
9 | In order to use `heketi-cli`, we need the server's URL and admin key:
10 | ```
11 | [ec2-user@master0 ~]$ oc describe pod -n glusterfs | grep HEKETI_ADMIN_KEY
12 | HEKETI_ADMIN_KEY: [HEKETI_ADMIN_KEY]
13 | ```
14 |
15 | We can then set variables with this information:
16 | ```
17 | [ec2-user@master0 ~]$ export HEKETI_CLI_USER=admin
18 | [ec2-user@master0 ~]$ export HEKETI_CLI_KEY="[HEKETI_ADMIN_KEY]"
19 | [ec2-user@master0 ~]$ export HEKETI_CLI_SERVER=$(oc get svc/heketi-storage -n glusterfs --template "http://{{.spec.clusterIP}}:{{(index .spec.ports 0).port}}")
20 | ```
21 |
22 | Verify that everything is set as it should:
23 | ```
24 | [ec2-user@master0 ~]$ env | grep -i heketi
25 | HEKETI_CLI_KEY=[PASSWORD]
26 | HEKETI_CLI_SERVER=http://172.30.250.14:8080
27 | HEKETI_CLI_USER=admin
28 | ```
29 |
30 | Now we can run some useful commands for troubleshooting.
31 |
32 | Get all volumes and then show details of a specific volume using its id:
33 | ```
34 | [ec2-user@master0 ~]$ heketi-cli volume list
35 | Id:255b9535ee460dfa696a7616b57a7035 Cluster:bc64bf1b4a4e7cc0702d28c7c02674cf Name:glusterfs-registry-volume
36 | Id:e5baabb2bca5ba5cdd749d48d47c4e89 Cluster:bc64bf1b4a4e7cc0702d28c7c02674cf Name:heketidbstorage
37 |
38 | [ec2-user@master0 ~]$ heketi-cli volume info 255b9535ee460dfa696a7616b57a7035
39 | ...
40 | ```
41 |
42 | Get the cluster id and details of the cluster:
43 | ```
44 | [ec2-user@master0 ~]$ heketi-cli cluster list
45 | Clusters:
46 | Id:bc64bf1b4a4e7cc0702d28c7c02674cf [file][block]
47 | [ec2-user@master0 ~]$ heketi-cli cluster info bc64bf1b4a4e7cc0702d28c7c02674cf
48 | ...
49 | ```
50 |
51 | Get nodes and details of a specific node using its id:
52 | ```
53 | [ec2-user@master0 ~]$ heketi-cli node list
54 | Id:3efc4d8267eb3b65c2d3ed9848aa4328 Cluster:bc64bf1b4a4e7cc0702d28c7c02674cf
55 | Id:c0de1021e7577c26721b22003c14427c Cluster:bc64bf1b4a4e7cc0702d28c7c02674cf
56 | Id:c9612d0eee19146642f51dc2f3d484e5 Cluster:bc64bf1b4a4e7cc0702d28c7c02674cf
57 | [ec2-user@master0 ~]$ heketi-cli node info c9612d0eee19146642f51dc2f3d484e5
58 | ...
59 | ```
60 |
61 | Show the whole topology:
62 | ```
63 | [ec2-user@master0 ~]$ heketi-cli topology info
64 | ...
65 | ```
66 |
67 |
68 | ### Set Default Storage Class
69 |
70 | A StorageClass provides a way to describe a certain type of storage. Different classes might map to different storage types (e.g. nfs, gluster, ...), quality-of-service levels, backup policies or arbitrary policies determined by the cluster administrators. In our case we only have one storage class, which is `glusterfs-storage`:
71 | ```
72 | [ec2-user@master0 ~]$ oc get storageclass
73 | ```
74 |
75 | By setting the annotation `storageclass.kubernetes.io/is-default-class` on a StorageClass, we make it the default storage class on an OpenShift cluster:
76 | ```
77 | [ec2-user@master0 ~]$ oc patch storageclass glusterfs-storage -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
78 | ```
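
To double-check that the annotation is in place, it can be queried directly. A small sketch (note that the dots in the annotation key have to be escaped in the jsonpath expression):
```
[ec2-user@master0 ~]$ oc get storageclass glusterfs-storage -o jsonpath='{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}'
```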
79 |
80 | If someone then creates a PVC without specifying a StorageClass, the [DefaultStorageClass admission controller](https://kubernetes.io/docs/admin/admission-controllers/#defaultstorageclass) automatically sets the StorageClass to the default one.
81 |
82 | **Note:** We could have set the Ansible inventory variable `openshift_storage_glusterfs_storageclass_default` to `true` during installation in order to let the playbooks automatically do what we just did by hand. For demonstration purposes however we set it to `false`.
83 |
84 |
85 | ### Create and Delete a Persistent Volume Claim
86 |
87 | If you create a PersistentVolumeClaim, Heketi will automatically create a PersistentVolume and bind it to your claim. Likewise if you delete a claim, Heketi will delete the PersistentVolume.
88 |
89 | Create a new project and create a pvc:
90 | ```
91 | [ec2-user@master0 ~]$ oc new-project labelle
92 | Now using project "labelle" on server "https://console.user[X].lab.openshift.ch:8443".
93 | ...
94 | [ec2-user@master0 ~]$ cat << EOF > pvc.yaml
95 | apiVersion: "v1"
96 | kind: "PersistentVolumeClaim"
97 | metadata:
98 | name: "testclaim"
99 | spec:
100 | accessModes:
101 | - "ReadWriteOnce"
102 | resources:
103 | requests:
104 | storage: "1Gi"
105 | EOF
106 |
107 | [ec2-user@master0 ~]$ oc create -f pvc.yaml
108 | persistentvolumeclaim "testclaim" created
109 | ```
110 |
111 | Check if the pvc could be bound to a new volume:
112 | ```
113 | [ec2-user@master0 ~]$ oc get pvc
114 | NAME STATUS VOLUME CAPACITY ACCESSMODES STORAGECLASS AGE
115 | testclaim Bound pvc-839223fd-30d4-11e8-89f3-067e4f48dfe4 1Gi RWO glusterfs-storage 38s
116 |
117 | [ec2-user@master0 ~]$ oc get pv
118 | NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM STORAGECLASS REASON AGE
119 | pvc-839223fd-30d4-11e8-89f3-067e4f48dfe4 1Gi RWO Delete Bound labelle/testclaim glusterfs-storage 41s
120 | ...
121 | ```
122 |
123 | Delete the claim and check if the volume gets deleted:
124 | ```
125 | [ec2-user@master0 ~]$ oc delete pvc testclaim
126 | persistentvolumeclaim "testclaim" deleted
127 | [ec2-user@master0 ~]$ oc get pv
128 |
129 | [ec2-user@master0 ~]$ oc delete project labelle
130 | ```
131 |
132 |
133 | ### Resize Existing Volume
134 |
135 | We will resize the registry volume with heketi-cli.
136 |
137 | First we need to know which volume is in use for the registry:
138 | ```
139 | [ec2-user@master0 ~]$ oc get pvc registry-claim -n default
140 | NAME STATUS VOLUME CAPACITY ACCESSMODES STORAGECLASS AGE
141 | registry-claim Bound registry-volume 5Gi RWX 2d
142 |
143 | [ec2-user@master0 ~]$ oc describe pv registry-volume | grep Path
144 | Path: glusterfs-registry-volume
145 |
146 | [ec2-user@master0 ~]$ heketi-cli volume list | grep glusterfs-registry-volume
147 | Id:255b9535ee460dfa696a7616b57a7035 Cluster:bc64bf1b4a4e7cc0702d28c7c02674cf Name:glusterfs-registry-volume
148 | ```
149 |
150 | Now we can extend the volume from 5Gi to 6Gi:
151 | ```
152 | [ec2-user@master0 ~]$ heketi-cli volume expand --volume=255b9535ee460dfa696a7616b57a7035 --expand-size=1
153 | Name: glusterfs-registry-volume
154 | Size: 6
155 | ...
156 | ```
157 |
158 | Check if the gluster volume has the new size:
159 | ```
160 | [ec2-user@master0 ~]$ ansible infra_nodes -m shell -a "df -ah" | grep glusterfs-registry-volume
161 | 172.31.40.96:glusterfs-registry-volume 6.0G 317M 5.7G 3% /var/lib/origin/openshift.local.volumes/pods/d8dc2712-3bcf-11e8-90a6-066961eacc9a/volumes/kubernetes.io~glusterfs/registry-volume
162 | ```
163 |
164 | In order for the persistent volume's information and the actually available space to be consistent, we're going to edit the pv's specification:
165 | ```
166 | [ec2-user@master0 ~]$ oc get pv
167 | NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM STORAGECLASS REASON AGE
168 | registry-volume 5Gi RWX Retain Bound default/registry-claim 1d
169 | [ec2-user@master0 ~]$ oc patch pv registry-volume -p '{"spec":{"capacity":{"storage":"6Gi"}}}'
170 | [ec2-user@master0 ~]$ oc get pv
171 | NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM STORAGECLASS REASON AGE
172 | registry-volume 6Gi RWX Retain Bound default/registry-claim 1d
173 | ```
174 |
175 | ---
176 |
177 | **End of Lab 3.3**
178 |
179 |
180 |
181 | [← back to the Chapter Overview](30_daily_business.md)
182 |
--------------------------------------------------------------------------------
/labs/61_monitoring.md:
--------------------------------------------------------------------------------
1 | ## Lab 6.1: Monitoring
2 |
3 | OpenShift monitoring can be categorized into three different categories which each try to answer their own question:
4 | 1. Is our cluster in an operational state right now?
5 | 2. Will our cluster remain in an operational state in the near future?
6 | 3. Does our cluster have enough capacity to run all pods?
7 |
8 |
9 | ### Is Our Cluster in an Operational State at the Moment?
10 |
11 | In order to answer this first question, we check the state of different vital components:
12 | * Masters
13 | * etcd
14 | * Routers
15 | * Apps
16 |
17 | **Masters** expose health information on an HTTP endpoint at https://`openshift_master_cluster_public_hostname`:`openshift_master_api_port`/healthz that can be checked for a 200 status code. On the one hand, this endpoint can be used as a health indicator in a load balancer configuration; on the other hand, we can use it ourselves for monitoring or troubleshooting purposes.
18 |
19 | Check the masters' health state with an HTTP request:
20 | ```
21 | [ec2-user@master0 ~]$ curl -v https://console.user[X].lab.openshift.ch/healthz
22 | ```
23 |
24 | As long as the response is a 200 status code, at least one of the masters is still working and the API is accessible via the load balancer (if there is one).
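
For scripting or integration into an existing monitoring system, it may be handier to only look at the status code. A minimal sketch using standard curl options:
```
[ec2-user@master0 ~]$ curl -s -o /dev/null -w '%{http_code}\n' https://console.user[X].lab.openshift.ch/healthz
200
```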
25 |
26 | **etcd** also exposes a similar health endpoint at https://`openshift_master_cluster_public_hostname`:2379/health, though it is only accessible using the client certificate and corresponding key stored on the masters at `/etc/origin/master/master.etcd-client.crt` and `/etc/origin/master/master.etcd-client.key`.
27 | ```
28 | [ec2-user@master0 ~]$ sudo curl --cacert /etc/origin/master/master.etcd-ca.crt --cert /etc/origin/master/master.etcd-client.crt --key /etc/origin/master/master.etcd-client.key https://master0.user[X].lab.openshift.ch:2379/health
29 | ```
30 |
31 | The **HAProxy router pods** are responsible for getting application traffic into OpenShift. Similar to the masters, HAProxy also exposes a /healthz endpoint on port 1936 which can be checked with e.g.:
32 | ```
33 | [ec2-user@master0 ~]$ curl -v http://router.app[X].lab.openshift.ch:1936/healthz
34 | ```
35 |
36 | Using the wildcard domain to access a router's health page results in a positive answer as long as at least one router is up and running, which is all we want to know right now.
37 |
38 | **Note:** Port 1936 is not open by default, so it has to be opened at least on those nodes running the router pods. This can be achieved e.g. by setting the Ansible variable `openshift_node_open_ports` (available since OpenShift 3.7).
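
A sketch of how this could look in the Ansible inventory (the exact format is best verified against the `hosts.example` of your openshift-ansible version):
```
[OSEv3:vars]
openshift_node_open_ports=[{"service":"router stats port","port":"1936/tcp"}]
```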
39 |
40 | **Apps** running on OpenShift should of course be (end-to-end) monitored as well. However, we are not interested in a single application per se; we want to know whether all applications of a group of monitored applications fail to respond. The more applications that do not respond, the more probable it is that a platform-wide problem is the cause.
41 |
42 |
43 | ### Will our Cluster Remain in an Operational State in the Near Future?
44 |
45 | The second category is based on a wider array of checks. It includes checks that take a more "classic" approach such as storage monitoring, but also includes above checks to find out if single cluster members are not healthy.
46 |
47 | First, let's look at how to use above checks to answer this second question.
48 |
49 | The health endpoint exposed by the **masters** was accessed via load balancer in the first category in order to find out if the API is generally available. This time, however, we want to find out whether any single master API is unavailable, even if others are still accessible. So we check every single master endpoint directly instead of via the load balancer:
50 | ```
51 | [ec2-user@master0 ~]$ for i in {0..2}; do curl -v https://master${i}.user[X].lab.openshift.ch/healthz; done
52 | ```
53 |
54 | The **etcd** check above is already run against single members of the cluster and can therefore be applied here in the exact same form. The only difference is that we want to make sure every single member is running, not just the number needed to have quorum.
55 |
56 | The approach used for the masters also applies to the **HAProxy routers**. A router pod effectively listens on the interface of the node it is running on. So instead of connecting via the load balancer, we use the IP addresses of the nodes the router pods are running on. In our case, these are the infra nodes:
57 | ```
58 | [ec2-user@master0 ~]$ for i in {0..2}; do curl -v http://infra-node${i}.user[X].lab.openshift.ch:1936/healthz; done
59 | ```
60 |
61 | As already mentioned, finding out if our cluster will remain in an operational state in the near future also includes some better known checks we could call a more conventional **components monitoring**.
62 |
63 | Next to the usual monitoring of storage per partition/logical volume, there's one logical volume on each node of special interest to us: the **Docker storage**. The Docker storage contains images and container filesystems of running containers. Monitoring the available space of this logical volume is important in order to tune garbage collection. Garbage collection is done by the **kubelets** running on each node. The available garbage collection kubelet arguments can be found in the [official documentation](https://docs.openshift.com/container-platform/3.11/admin_guide/garbage_collection.html).
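
As an illustration, the image garbage collection thresholds could be tuned via `kubeletArguments` in the node configuration. A sketch with example values only, not a recommendation:
```
kubeletArguments:
  image-gc-high-threshold:
  - "80"                # start image garbage collection at 80% disk usage
  image-gc-low-threshold:
  - "60"                # garbage collect down to 60% disk usage
```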
64 |
65 | Speaking of garbage collection, there's another component that needs frequent garbage collection: the registry. Contrary to the Docker storage on each node, OpenShift only provides a command to prune the registry but does not offer a means to execute it on a regular basis. Until it does, set up the [appuio-pruner](https://github.com/appuio/appuio-pruner) as described in its README.
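
Manual pruning is done with `oc adm prune images`; this is the kind of command such a pruner runs on a schedule. A sketch (the retention values are examples only, and the command needs a user with pruning privileges and access to the registry route):
```
[ec2-user@master0 ~]$ oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m --confirm
```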
66 |
67 |
68 | ### Does our Cluster Have Enough Capacity to Run All Pods?
69 |
70 | Besides the obvious components that need monitoring like CPU, memory and storage, this third question is tightly coupled with requests and limits we looked at in [chapter 4](41_out_of_resource_handling.md).
71 |
72 | But let's first get an overview of available resources using tools you might not have heard about before. One such tool is [Cockpit](http://cockpit-project.org/). Cockpit aims to ease administration tasks of Linux servers by making some basic tasks available via web interface. It is installed by default on every master by the OpenShift Ansible playbooks and listens on port 9090. We don't want to expose the web interface to the internet though, so we are going to use SSH port forwarding to access it:
73 | ```
74 | [ec2-user@master0 ~]$ ssh ec2-user@jump.lab.openshift.ch -L 9090:master0.user[X].lab.openshift.ch:9090
75 | ```
76 |
77 | After the SSH tunnel has been established, open http://localhost:9090 in your browser and log in using user `ec2-user` and the password provided by the instructor. Explore the different tabs and sections of the web interface.
78 |
79 | Another possibility to get a quick overview of used and available resources is the [kube-ops-view](https://github.com/hjacobs/kube-ops-view) project. Install it on your OpenShift cluster:
80 | ```
81 | oc new-project ocp-ops-view
82 | oc create sa kube-ops-view
83 | oc adm policy add-scc-to-user anyuid -z kube-ops-view
84 | oc adm policy add-cluster-role-to-user cluster-admin system:serviceaccount:ocp-ops-view:kube-ops-view
85 | oc apply -f https://raw.githubusercontent.com/raffaelespazzoli/kube-ops-view/ocp/deploy-openshift/kube-ops-view.yaml
86 | oc create route edge --service kube-ops-view
87 | oc get route | grep kube-ops-view | awk '{print $2}'
88 | ```
89 |
90 | The design takes some getting used to, but at least the browser zoom can help with the small size.
91 |
92 | The information about kube-ops-view as well as its installation instructions are actually from a [blog post series](https://blog.openshift.com/full-cluster-capacity-management-monitoring-openshift/) from Red Hat that does a very good job at explaining the different relations and possibilities to finding an answer to our question about capacity.
93 |
94 | These two tools provide a quick look at resource availability. Implementing mature, enterprise-grade monitoring of OpenShift resources depends on what tools are already available in an IT environment and would go beyond the scope and length of this techlab, but the referenced blog post series certainly is a good start.
95 |
96 |
97 | ---
98 |
99 | **End of Lab 6.1**
100 |
101 |
102 |
103 | [← back to the Chapter Overview](60_monitoring_troubleshooting.md)
104 |
--------------------------------------------------------------------------------
/labs/41_out_of_resource_handling.md:
--------------------------------------------------------------------------------
1 | ## Lab 4.1: Out of Resource Handling
2 |
3 | This lab deals with out of resource handling on OpenShift platforms, most importantly the handling of out-of-memory conditions. Out of resource conditions can occur either on the container level because of resource limits or on the node level because a node runs out of memory as a result of overcommitting.
4 | They are either handled by OpenShift or directly by the kernel.
5 |
6 |
7 | ### Introduction
8 |
9 | The following terms and behaviours are crucial to understanding this lab.
10 |
11 | Killing a pod and killing a container are fundamentally different things:
12 | * A pod and its containers live on the same node for the duration of their lifetime.
13 | * A pod's restart policy determines whether its containers are restarted after being killed.
14 | * Killed containers always restart on the same node.
15 | * If a pod is killed, the configuration of its controller, e.g. ReplicationController, ReplicaSet, Job, ..., determines whether a replacement pod is created.
16 | * Pods without controllers are never replaced after being killed.
17 |
18 | An OpenShift node recovers from out of memory conditions by killing containers or pods:
19 | * **Out of Memory (OOM) Killer**: Linux kernel mechanism which kills processes to recover from out of memory conditions.
20 | * **Pod Eviction**: An OpenShift mechanism which kills pods to recover from out of memory conditions.
21 |
22 | The order in which containers and pods are killed is determined by their Quality of Service (QoS) class.
23 | The QoS class in turn is defined by resource requests and limits developers configure on their containers.
24 | For more information see [Quality of Service Tiers](https://docs.openshift.com/container-platform/3.11/dev_guide/compute_resources.html#quality-of-service-tiers).
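
As an illustration of the QoS classes (a sketch, not part of the lab resources): a pod whose containers have memory and CPU requests equal to their limits is classified as `Guaranteed` and is the last to be OOM killed or evicted, while a pod without any requests or limits ends up in the `BestEffort` class and is the first to go.
```
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo                  # hypothetical example, not used in this lab
spec:
  containers:
  - name: app
    image: registry.access.redhat.com/rhel7
    resources:
      requests:
        memory: 256Mi
        cpu: 100m
      limits:
        memory: 256Mi             # requests == limits for all resources -> QoS class "Guaranteed"
        cpu: 100m
```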
25 |
26 |
27 | ### Out of Memory Killer in Action
28 |
29 | To observe the OOM killer in action, create a container which allocates all the memory available on the node it runs on:
30 |
31 | ```
32 | [ec2-user@master0 ~]$ oc new-project out-of-memory
33 | [ec2-user@master0 ~]$ oc create -f https://raw.githubusercontent.com/appuio/ops-techlab/release-3.11/resources/membomb/pod_oom.yaml
34 | ```
35 |
36 | Wait and watch till the container is up and being killed. `oc get pods -o wide -w` will then show:
37 | ```
38 | NAME READY STATUS RESTARTS AGE IP NODE
39 | membomb-1-z6md2 0/1 OOMKilled 0 7s 10.131.2.24 app-node0.user8.lab.openshift.ch
40 | ```
41 |
42 | Run `oc describe pod -l app=membomb` to get more information about the container state which should look like this:
43 | ```
44 | State: Terminated
45 | Reason: OOMKilled
46 | Exit Code: 137
47 | Started: Thu, 17 May 2018 10:51:02 +0200
48 | Finished: Thu, 17 May 2018 10:51:04 +0200
49 | ```
50 |
51 | Exit code 137 [indicates](http://tldp.org/LDP/abs/html/exitcodes.html) that the container's main process was killed by the `SIGKILL` signal.
52 | With the default `restartPolicy` of `Always` the container would now restart on the same node. For this lab the `restartPolicy`
53 | has been set to `Never` to prevent endless out-of-memory conditions and restarts.
54 |
55 | Now log into the OpenShift node the pod ran on and study what the OOM event looks like in the kernel logs.
56 | You can see on which node the pod ran in the output of either the `oc get` or `oc describe` command you just ran.
57 | In this example this would look like:
58 |
59 | ```
60 | ssh app-node0.user[X].lab.openshift.ch
61 | journalctl -ke
62 | ```
63 |
64 | The following lines should be highlighted:
65 |
66 | ```
67 | May 17 10:51:04 app-node0.user8.lab.openshift.ch kernel: Memory cgroup out of memory: Kill process 5806 (python) score 1990 or sacrifice child
68 | May 17 10:51:04 app-node0.user8.lab.openshift.ch kernel: Killed process 5806 (python) total-vm:7336912kB, anon-rss:5987524kB, file-rss:0kB, shmem-rss:0kB
69 | ```
70 |
71 | These log messages indicate that the OOM killer has been invoked because a cgroup memory limit has been exceeded
72 | and that it killed a Python process which consumed 5987524kB of memory. Cgroups are a kernel mechanism that limits
73 | the resource usage of processes.
74 | Further up in the log you should see a line like the following, followed by usage and limits of the corresponding cgroup hierarchy:
75 |
76 | ```
77 | May 17 10:51:04 app-node0.user8.lab.openshift.ch kernel: Task in /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod6ba0af16_59af_11e8_9a62_0672f11196a0.slice/docker-648ff0b111978161b0ac94fb72a4656ee3f98b8e73f7eb63c5910f5cf8cd9c53.scope killed as a result of limit of /kubepods.slice
78 | ```
79 |
80 | This message tells you that a limit of the cgroup `kubepods.slice` has been exceeded. That's the cgroup
81 | limiting the resource usage of all container processes on a node, preventing them from using resources
82 | reserved for the kernel and system daemons.
83 | Note that a container can also be killed by the OOM killer because it reached its own memory limit. In that
84 | case a different cgroup will be listed in the `killed as a result of limit of` line. Everything
85 | else will however look the same.
86 |
87 | There are some drawbacks to containers being killed by the out of memory killer:
88 | * Containers are always restarted on the same node, possibly repeating the same out of memory condition over and over again.
89 | * There is no grace period; container processes are immediately killed with SIGKILL.
90 |
91 | Because of this OpenShift provides the "Pod Eviction" mechanism to kill and reschedule pods before they trigger
92 | an out of resource condition.
93 |
94 |
95 | ### Pod Eviction
96 |
97 | OpenShift offers hard and soft evictions. Hard evictions act immediately when the configured threshold is reached.
98 | Soft evictions allow the threshold to be exceeded for a configurable grace period before taking action.
99 |
100 | To observe a pod eviction create a container which allocates memory till it is being evicted:
101 |
102 | ```
103 | [ec2-user@master0 ~]$ oc create -f https://raw.githubusercontent.com/appuio/ops-techlab/release-3.11/resources/membomb/pod_eviction.yaml
104 | ```
105 |
106 | Wait till the container gets evicted. Run `oc describe pod -l app=membomb` to see the reason for the eviction:
107 | ```
108 | Status: Failed
109 | Reason: Evicted
110 | Message: The node was low on resource: memory.
111 | ```
112 |
113 | After a pod eviction a node is flagged as being under memory pressure for a short time, by default 5 minutes.
114 | Nodes under memory pressure are not considered for scheduling new pods.
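
Whether a node is currently flagged can be seen in its conditions, for example (a quick sketch; replace the node name accordingly):
```
[ec2-user@master0 ~]$ oc describe node app-node0.user[X].lab.openshift.ch | grep MemoryPressure
```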
115 |
116 | ### Recommendations
117 |
118 | Beginning with OCP 3.6 the memory available for pods on a node is determined by the following formula:
119 | ```
120 | [Allocatable] = [Node Capacity] - [kube-reserved] - [system-reserved] - [Hard-Eviction-Thresholds]
121 | ```
122 | Where
123 | * `[Node Capacity]` is the memory (RAM) of a node.
124 | * `[kube-reserved]` is an option of the OpenShift node service (kubelet), specifying how much memory to reserve for OpenShift node components.
125 | * `[system-reserved]` is an option of the OpenShift node service (kubelet), specifying how much memory to reserve for the kernel and system daemons.
126 | * `[Hard-Eviction-Thresholds]` is the sum of the configured hard eviction thresholds (see the recommendations below).
127 |
128 | Also beginning with OCP 3.6 the OOM killer is now triggered when the total memory consumed by all pods on a node exceeds the
129 | allocatable memory, even when there's still memory available on the node. You can view the amount of allocatable memory on all
130 | nodes by running `oc describe nodes`.
131 |
132 | For stable operations we recommend reserving about **10%** of a node's memory for the kernel, system daemons and node components
133 | with the `kube-reserved` and `system-reserved` parameters. More memory may need to be reserved if you run additional system
134 | daemons for monitoring, backup, etc. on the nodes.
135 | OCP 3.6 comes with a hard memory eviction threshold of 100 MiB preconfigured. No other eviction thresholds are enabled by default.
136 | This is usually too low to trigger pod eviction before the OOM killer hits. We recommend starting with a hard memory eviction
137 | threshold of **500Mi**. If you keep seeing lots of OOM-killed containers, consider increasing the hard eviction threshold or
138 | adding a soft eviction threshold. But remember that hard eviction thresholds are subtracted from the node's allocatable resources.
139 |
140 | You can configure reserves and eviction thresholds in the node configuration, e.g.:
141 |
142 | ```
143 | kubeletArguments:
144 | kube-reserved:
145 | - "cpu=200m,memory=512Mi"
146 | system-reserved:
147 | - "cpu=200m,memory=512Mi"
148 | ```
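
A hard memory eviction threshold could be configured alongside the reserves, e.g. (a sketch using the 500Mi starting point recommended above):
```
kubeletArguments:
  eviction-hard:
  - "memory.available<500Mi"     # evict pods once less than 500Mi of memory is available
```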
149 |
150 | See [Allocating Node Resources](https://docs.openshift.com/container-platform/3.11/admin_guide/allocating_node_resources.html)
151 | and [Out of Resource Handling](https://docs.openshift.com/container-platform/3.11/admin_guide/out_of_resource_handling.html) for more information.
152 |
153 | ---
154 |
155 | **End of Lab 4.1**
156 |
157 |
254 |
255 | [← back to the Chapter Overview](30_daily_business.md)
256 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Attribution-ShareAlike 4.0 International
2 |
3 | =======================================================================
4 |
5 | Creative Commons Corporation ("Creative Commons") is not a law firm and
6 | does not provide legal services or legal advice. Distribution of
7 | Creative Commons public licenses does not create a lawyer-client or
8 | other relationship. Creative Commons makes its licenses and related
9 | information available on an "as-is" basis. Creative Commons gives no
10 | warranties regarding its licenses, any material licensed under their
11 | terms and conditions, or any related information. Creative Commons
12 | disclaims all liability for damages resulting from their use to the
13 | fullest extent possible.
14 |
15 | Using Creative Commons Public Licenses
16 |
17 | Creative Commons public licenses provide a standard set of terms and
18 | conditions that creators and other rights holders may use to share
19 | original works of authorship and other material subject to copyright
20 | and certain other rights specified in the public license below. The
21 | following considerations are for informational purposes only, are not
22 | exhaustive, and do not form part of our licenses.
23 |
24 | Considerations for licensors: Our public licenses are
25 | intended for use by those authorized to give the public
26 | permission to use material in ways otherwise restricted by
27 | copyright and certain other rights. Our licenses are
28 | irrevocable. Licensors should read and understand the terms
29 | and conditions of the license they choose before applying it.
30 | Licensors should also secure all rights necessary before
31 | applying our licenses so that the public can reuse the
32 | material as expected. Licensors should clearly mark any
33 | material not subject to the license. This includes other CC-
34 | licensed material, or material used under an exception or
35 | limitation to copyright. More considerations for licensors:
36 | wiki.creativecommons.org/Considerations_for_licensors
37 |
38 | Considerations for the public: By using one of our public
39 | licenses, a licensor grants the public permission to use the
40 | licensed material under specified terms and conditions. If
41 | the licensor's permission is not necessary for any reason--for
42 | example, because of any applicable exception or limitation to
43 | copyright--then that use is not regulated by the license. Our
44 | licenses grant only permissions under copyright and certain
45 | other rights that a licensor has authority to grant. Use of
46 | the licensed material may still be restricted for other
47 | reasons, including because others have copyright or other
48 | rights in the material. A licensor may make special requests,
49 | such as asking that all changes be marked or described.
50 | Although not required by our licenses, you are encouraged to
51 | respect those requests where reasonable. More_considerations
52 | for the public:
53 | wiki.creativecommons.org/Considerations_for_licensees
54 |
55 | =======================================================================
56 |
57 | Creative Commons Attribution-ShareAlike 4.0 International Public
58 | License
59 |
60 | By exercising the Licensed Rights (defined below), You accept and agree
61 | to be bound by the terms and conditions of this Creative Commons
62 | Attribution-ShareAlike 4.0 International Public License ("Public
63 | License"). To the extent this Public License may be interpreted as a
64 | contract, You are granted the Licensed Rights in consideration of Your
65 | acceptance of these terms and conditions, and the Licensor grants You
66 | such rights in consideration of benefits the Licensor receives from
67 | making the Licensed Material available under these terms and
68 | conditions.
69 |
70 |
71 | Section 1 -- Definitions.
72 |
73 | a. Adapted Material means material subject to Copyright and Similar
74 | Rights that is derived from or based upon the Licensed Material
75 | and in which the Licensed Material is translated, altered,
76 | arranged, transformed, or otherwise modified in a manner requiring
77 | permission under the Copyright and Similar Rights held by the
78 | Licensor. For purposes of this Public License, where the Licensed
79 | Material is a musical work, performance, or sound recording,
80 | Adapted Material is always produced where the Licensed Material is
81 | synched in timed relation with a moving image.
82 |
83 | b. Adapter's License means the license You apply to Your Copyright
84 | and Similar Rights in Your contributions to Adapted Material in
85 | accordance with the terms and conditions of this Public License.
86 |
87 | c. BY-SA Compatible License means a license listed at
88 | creativecommons.org/compatiblelicenses, approved by Creative
89 | Commons as essentially the equivalent of this Public License.
90 |
91 | d. Copyright and Similar Rights means copyright and/or similar rights
92 | closely related to copyright including, without limitation,
93 | performance, broadcast, sound recording, and Sui Generis Database
94 | Rights, without regard to how the rights are labeled or
95 | categorized. For purposes of this Public License, the rights
96 | specified in Section 2(b)(1)-(2) are not Copyright and Similar
97 | Rights.
98 |
99 | e. Effective Technological Measures means those measures that, in the
100 | absence of proper authority, may not be circumvented under laws
101 | fulfilling obligations under Article 11 of the WIPO Copyright
102 | Treaty adopted on December 20, 1996, and/or similar international
103 | agreements.
104 |
105 | f. Exceptions and Limitations means fair use, fair dealing, and/or
106 | any other exception or limitation to Copyright and Similar Rights
107 | that applies to Your use of the Licensed Material.
108 |
109 | g. License Elements means the license attributes listed in the name
110 | of a Creative Commons Public License. The License Elements of this
111 | Public License are Attribution and ShareAlike.
112 |
113 | h. Licensed Material means the artistic or literary work, database,
114 | or other material to which the Licensor applied this Public
115 | License.
116 |
117 | i. Licensed Rights means the rights granted to You subject to the
118 | terms and conditions of this Public License, which are limited to
119 | all Copyright and Similar Rights that apply to Your use of the
120 | Licensed Material and that the Licensor has authority to license.
121 |
122 | j. Licensor means the individual(s) or entity(ies) granting rights
123 | under this Public License.
124 |
125 | k. Share means to provide material to the public by any means or
126 | process that requires permission under the Licensed Rights, such
127 | as reproduction, public display, public performance, distribution,
128 | dissemination, communication, or importation, and to make material
129 | available to the public including in ways that members of the
130 | public may access the material from a place and at a time
131 | individually chosen by them.
132 |
133 | l. Sui Generis Database Rights means rights other than copyright
134 | resulting from Directive 96/9/EC of the European Parliament and of
135 | the Council of 11 March 1996 on the legal protection of databases,
136 | as amended and/or succeeded, as well as other essentially
137 | equivalent rights anywhere in the world.
138 |
139 | m. You means the individual or entity exercising the Licensed Rights
140 | under this Public License. Your has a corresponding meaning.
141 |
142 |
143 | Section 2 -- Scope.
144 |
145 | a. License grant.
146 |
147 | 1. Subject to the terms and conditions of this Public License,
148 | the Licensor hereby grants You a worldwide, royalty-free,
149 | non-sublicensable, non-exclusive, irrevocable license to
150 | exercise the Licensed Rights in the Licensed Material to:
151 |
152 | a. reproduce and Share the Licensed Material, in whole or
153 | in part; and
154 |
155 | b. produce, reproduce, and Share Adapted Material.
156 |
157 | 2. Exceptions and Limitations. For the avoidance of doubt, where
158 | Exceptions and Limitations apply to Your use, this Public
159 | License does not apply, and You do not need to comply with
160 | its terms and conditions.
161 |
162 | 3. Term. The term of this Public License is specified in Section
163 | 6(a).
164 |
165 | 4. Media and formats; technical modifications allowed. The
166 | Licensor authorizes You to exercise the Licensed Rights in
167 | all media and formats whether now known or hereafter created,
168 | and to make technical modifications necessary to do so. The
169 | Licensor waives and/or agrees not to assert any right or
170 | authority to forbid You from making technical modifications
171 | necessary to exercise the Licensed Rights, including
172 | technical modifications necessary to circumvent Effective
173 | Technological Measures. For purposes of this Public License,
174 | simply making modifications authorized by this Section 2(a)
175 | (4) never produces Adapted Material.
176 |
177 | 5. Downstream recipients.
178 |
179 | a. Offer from the Licensor -- Licensed Material. Every
180 | recipient of the Licensed Material automatically
181 | receives an offer from the Licensor to exercise the
182 | Licensed Rights under the terms and conditions of this
183 | Public License.
184 |
185 | b. Additional offer from the Licensor -- Adapted Material.
186 | Every recipient of Adapted Material from You
187 | automatically receives an offer from the Licensor to
188 | exercise the Licensed Rights in the Adapted Material
189 | under the conditions of the Adapter's License You apply.
190 |
191 | c. No downstream restrictions. You may not offer or impose
192 | any additional or different terms or conditions on, or
193 | apply any Effective Technological Measures to, the
194 | Licensed Material if doing so restricts exercise of the
195 | Licensed Rights by any recipient of the Licensed
196 | Material.
197 |
198 | 6. No endorsement. Nothing in this Public License constitutes or
199 | may be construed as permission to assert or imply that You
200 | are, or that Your use of the Licensed Material is, connected
201 | with, or sponsored, endorsed, or granted official status by,
202 | the Licensor or others designated to receive attribution as
203 | provided in Section 3(a)(1)(A)(i).
204 |
205 | b. Other rights.
206 |
207 | 1. Moral rights, such as the right of integrity, are not
208 | licensed under this Public License, nor are publicity,
209 | privacy, and/or other similar personality rights; however, to
210 | the extent possible, the Licensor waives and/or agrees not to
211 | assert any such rights held by the Licensor to the limited
212 | extent necessary to allow You to exercise the Licensed
213 | Rights, but not otherwise.
214 |
215 | 2. Patent and trademark rights are not licensed under this
216 | Public License.
217 |
218 | 3. To the extent possible, the Licensor waives any right to
219 | collect royalties from You for the exercise of the Licensed
220 | Rights, whether directly or through a collecting society
221 | under any voluntary or waivable statutory or compulsory
222 | licensing scheme. In all other cases the Licensor expressly
223 | reserves any right to collect such royalties.
224 |
225 |
226 | Section 3 -- License Conditions.
227 |
228 | Your exercise of the Licensed Rights is expressly made subject to the
229 | following conditions.
230 |
231 | a. Attribution.
232 |
233 | 1. If You Share the Licensed Material (including in modified
234 | form), You must:
235 |
236 | a. retain the following if it is supplied by the Licensor
237 | with the Licensed Material:
238 |
239 | i. identification of the creator(s) of the Licensed
240 | Material and any others designated to receive
241 | attribution, in any reasonable manner requested by
242 | the Licensor (including by pseudonym if
243 | designated);
244 |
245 | ii. a copyright notice;
246 |
247 | iii. a notice that refers to this Public License;
248 |
249 | iv. a notice that refers to the disclaimer of
250 | warranties;
251 |
252 | v. a URI or hyperlink to the Licensed Material to the
253 | extent reasonably practicable;
254 |
255 | b. indicate if You modified the Licensed Material and
256 | retain an indication of any previous modifications; and
257 |
258 | c. indicate the Licensed Material is licensed under this
259 | Public License, and include the text of, or the URI or
260 | hyperlink to, this Public License.
261 |
262 | 2. You may satisfy the conditions in Section 3(a)(1) in any
263 | reasonable manner based on the medium, means, and context in
264 | which You Share the Licensed Material. For example, it may be
265 | reasonable to satisfy the conditions by providing a URI or
266 | hyperlink to a resource that includes the required
267 | information.
268 |
269 | 3. If requested by the Licensor, You must remove any of the
270 | information required by Section 3(a)(1)(A) to the extent
271 | reasonably practicable.
272 |
273 | b. ShareAlike.
274 |
275 | In addition to the conditions in Section 3(a), if You Share
276 | Adapted Material You produce, the following conditions also apply.
277 |
278 | 1. The Adapter's License You apply must be a Creative Commons
279 | license with the same License Elements, this version or
280 | later, or a BY-SA Compatible License.
281 |
282 | 2. You must include the text of, or the URI or hyperlink to, the
283 | Adapter's License You apply. You may satisfy this condition
284 | in any reasonable manner based on the medium, means, and
285 | context in which You Share Adapted Material.
286 |
287 | 3. You may not offer or impose any additional or different terms
288 | or conditions on, or apply any Effective Technological
289 | Measures to, Adapted Material that restrict exercise of the
290 | rights granted under the Adapter's License You apply.
291 |
292 |
293 | Section 4 -- Sui Generis Database Rights.
294 |
295 | Where the Licensed Rights include Sui Generis Database Rights that
296 | apply to Your use of the Licensed Material:
297 |
298 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right
299 | to extract, reuse, reproduce, and Share all or a substantial
300 | portion of the contents of the database;
301 |
302 | b. if You include all or a substantial portion of the database
303 | contents in a database in which You have Sui Generis Database
304 | Rights, then the database in which You have Sui Generis Database
305 | Rights (but not its individual contents) is Adapted Material,
306 |
307 | including for purposes of Section 3(b); and
308 | c. You must comply with the conditions in Section 3(a) if You Share
309 | all or a substantial portion of the contents of the database.
310 |
311 | For the avoidance of doubt, this Section 4 supplements and does not
312 | replace Your obligations under this Public License where the Licensed
313 | Rights include other Copyright and Similar Rights.
314 |
315 |
316 | Section 5 -- Disclaimer of Warranties and Limitation of Liability.
317 |
318 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
319 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
320 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
321 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
322 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
323 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
324 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
325 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
326 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
327 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
328 |
329 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
330 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
331 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
332 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
333 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
334 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
335 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
336 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
337 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
338 |
339 | c. The disclaimer of warranties and limitation of liability provided
340 | above shall be interpreted in a manner that, to the extent
341 | possible, most closely approximates an absolute disclaimer and
342 | waiver of all liability.
343 |
344 |
345 | Section 6 -- Term and Termination.
346 |
347 | a. This Public License applies for the term of the Copyright and
348 | Similar Rights licensed here. However, if You fail to comply with
349 | this Public License, then Your rights under this Public License
350 | terminate automatically.
351 |
352 | b. Where Your right to use the Licensed Material has terminated under
353 | Section 6(a), it reinstates:
354 |
355 | 1. automatically as of the date the violation is cured, provided
356 | it is cured within 30 days of Your discovery of the
357 | violation; or
358 |
359 | 2. upon express reinstatement by the Licensor.
360 |
361 | For the avoidance of doubt, this Section 6(b) does not affect any
362 | right the Licensor may have to seek remedies for Your violations
363 | of this Public License.
364 |
365 | c. For the avoidance of doubt, the Licensor may also offer the
366 | Licensed Material under separate terms or conditions or stop
367 | distributing the Licensed Material at any time; however, doing so
368 | will not terminate this Public License.
369 |
370 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
371 | License.
372 |
373 |
374 | Section 7 -- Other Terms and Conditions.
375 |
376 | a. The Licensor shall not be bound by any additional or different
377 | terms or conditions communicated by You unless expressly agreed.
378 |
379 | b. Any arrangements, understandings, or agreements regarding the
380 | Licensed Material not stated herein are separate from and
381 | independent of the terms and conditions of this Public License.
382 |
383 |
384 | Section 8 -- Interpretation.
385 |
386 | a. For the avoidance of doubt, this Public License does not, and
387 | shall not be interpreted to, reduce, limit, restrict, or impose
388 | conditions on any use of the Licensed Material that could lawfully
389 | be made without permission under this Public License.
390 |
391 | b. To the extent possible, if any provision of this Public License is
392 | deemed unenforceable, it shall be automatically reformed to the
393 | minimum extent necessary to make it enforceable. If the provision
394 | cannot be reformed, it shall be severed from this Public License
395 | without affecting the enforceability of the remaining terms and
396 | conditions.
397 |
398 | c. No term or condition of this Public License will be waived and no
399 | failure to comply consented to unless expressly agreed to by the
400 | Licensor.
401 |
402 | d. Nothing in this Public License constitutes or may be interpreted
403 | as a limitation upon, or waiver of, any privileges and immunities
404 | that apply to the Licensor or You, including from the legal
405 | processes of any jurisdiction or authority.
406 |
407 |
408 | =======================================================================
409 |
410 | Creative Commons is not a party to its public
411 | licenses. Notwithstanding, Creative Commons may elect to apply one of
412 | its public licenses to material it publishes and in those instances
413 | will be considered the “Licensor.” The text of the Creative Commons
414 | public licenses is dedicated to the public domain under the CC0 Public
415 | Domain Dedication. Except for the limited purpose of indicating that
416 | material is shared under a Creative Commons public license or as
417 | otherwise permitted by the Creative Commons policies published at
418 | creativecommons.org/policies, Creative Commons does not authorize the
419 | use of the trademark "Creative Commons" or any other trademark or logo
420 | of Creative Commons without its prior written consent including,
421 | without limitation, in connection with any unauthorized modifications
422 | to any of its public licenses or any other arrangements,
423 | understandings, or agreements concerning use of licensed material. For
424 | the avoidance of doubt, this paragraph does not form part of the
425 | public licenses.
426 |
427 | Creative Commons may be contacted at creativecommons.org.
428 |
429 |
--------------------------------------------------------------------------------
/theme/puzzle_tagline_bg_rgb.svg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------