├── disable-smt
│   ├── cos
│   │   ├── enable_smt_cos.sh.md5
│   │   ├── disable_smt_cos.sh.md5
│   │   ├── enable_smt_cos.sh
│   │   └── disable_smt_cos.sh
│   └── gke
│       ├── enable-smt.yaml
│       ├── disable-smt.yaml
│       └── READ.md
├── containerd
│   ├── containerd-http-proxy
│   │   ├── sample_configmap.yaml
│   │   ├── README.md
│   │   └── configure_http_proxy.yaml
│   ├── containerd-nofile-infinity
│   │   ├── containerd-nofile-infinity-allowlist.yaml
│   │   ├── containerd-nofile-infinity.yaml
│   │   └── README.md
│   ├── socket-tracer
│   │   ├── containerd-socket-tracer-allowlist.yaml
│   │   ├── cri-v1alpha2-api-deprecation-reporter-allowlist.yaml
│   │   ├── cri-v1alpha2-api-deprecation-reporter.yaml
│   │   ├── containerd-socket-tracer.yaml
│   │   └── README.md
│   ├── migrating-to-containerd
│   │   ├── README.md
│   │   └── find-nodepools-to-migrate.sh
│   ├── debug-logging
│   │   ├── README.md
│   │   └── containerd-debug-logging-daemonset.yaml
│   └── container-insecure-registry
│       ├── README.md
│       └── insecure-registry-config.yaml
├── troubleshooting
│   ├── os-audit
│   │   ├── README.md
│   │   └── cos-auditd-logging.yaml
│   ├── ssh-server-config
│   │   ├── README.md
│   │   ├── set-login-grace-time.yaml
│   │   └── set-login-grace-time-gdcso-vmware.yaml
│   ├── enable-kdump
│   │   ├── disable-hung-task-panic-sysctl.yaml
│   │   ├── ubuntu-kdump.md
│   │   ├── ubuntu-enable-kdump.yaml
│   │   └── cos-enable-kdump.yaml
│   └── perf
│       ├── perf-record.yaml
│       ├── README.md
│       └── perf-trace.yaml
├── .vscode
│   └── launch.json
├── manual-node-upgrade
│   ├── README.md
│   └── manual_node_upgrade.sh
├── CONTRIBUTING.md
├── drop-small-mss
│   ├── drop-small-mss.yaml
│   └── READ.md
├── disable-mglru
│   └── disable-mglru.yaml
├── kubelet
│   └── kubelet-log-config
│       ├── READ.md
│       └── kubelet-log-config.yaml
├── gvisor
│   ├── enable-gvisor-flags.yaml
│   └── README.md
├── LICENSE
└── README.md
/disable-smt/cos/enable_smt_cos.sh.md5:
--------------------------------------------------------------------------------
1 | 78e4c15395235663789022f4cf3e0b60
--------------------------------------------------------------------------------
/disable-smt/cos/disable_smt_cos.sh.md5:
--------------------------------------------------------------------------------
1 | f11fa6d1a69c3008a5eb1e037aef3cf3
2 |
--------------------------------------------------------------------------------
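The `.md5` files above store only the bare hash of the matching script. A minimal sketch of verifying a downloaded script against such a hash; the file and its contents here are stand-ins for illustration, not the repo's actual script:

```shell
# The repo's .md5 files hold the bare hash, so compare it against the first
# field of md5sum's output rather than using `md5sum -c`.
tmpdir="$(mktemp -d)"
printf '#!/bin/bash\necho hello\n' > "${tmpdir}/disable_smt_cos.sh"

# Stand-in for the published hash: here it is computed from the file itself.
md5sum "${tmpdir}/disable_smt_cos.sh" | awk '{print $1}' > "${tmpdir}/disable_smt_cos.sh.md5"

expected="$(cat "${tmpdir}/disable_smt_cos.sh.md5")"
actual="$(md5sum "${tmpdir}/disable_smt_cos.sh" | awk '{print $1}')"
if [ "${expected}" = "${actual}" ]; then
  echo "checksum OK"
else
  echo "checksum MISMATCH" >&2
  exit 1
fi
rm -rf "${tmpdir}"
```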
/containerd/containerd-http-proxy/sample_configmap.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: v1
2 | kind: ConfigMap
3 | metadata:
4 | name: containerd-proxy-configmap
5 | data:
6 | HTTP_PROXY: http://proxy.example.com:80
7 | HTTPS_PROXY: https://proxy.example.com:443
8 | NO_PROXY: localhost,metadata.google.internal
9 |
--------------------------------------------------------------------------------
/containerd/containerd-nofile-infinity/containerd-nofile-infinity-allowlist.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: auto.gke.io/v1
2 | kind: AllowlistSynchronizer
3 | metadata:
4 | name: gke-org-nofile-infinity-synchronizer
5 | spec:
6 | allowlistPaths:
7 | - "Gke-Org/nofile-infinity/gke-org-nofile-infinity-allowlist.yaml"
8 |
--------------------------------------------------------------------------------
/containerd/socket-tracer/containerd-socket-tracer-allowlist.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: auto.gke.io/v1
2 | kind: AllowlistSynchronizer
3 | metadata:
4 | name: gke-org-containerd-socket-tracer-allowlist
5 | spec:
6 | allowlistPaths:
7 | - "Gke-Org/containerd-socket-tracer/gke-org-containerd-socket-tracer-allowlist.yaml"
8 |
--------------------------------------------------------------------------------
/troubleshooting/os-audit/README.md:
--------------------------------------------------------------------------------
1 | The os-audit tool provides the example code for
2 | [enabling Linux auditd logs on GKE nodes](https://cloud.google.com/kubernetes-engine/docs/how-to/linux-auditd-logging),
3 | which documents how to enable verbose operating system audit logs on Google
4 | Kubernetes Engine nodes running Container-Optimized OS.
5 |
--------------------------------------------------------------------------------
/containerd/socket-tracer/cri-v1alpha2-api-deprecation-reporter-allowlist.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: auto.gke.io/v1
2 | kind: AllowlistSynchronizer
3 | metadata:
4 | name: gke-org-cri-v1alpha2-api-deprecation-reporter-allowlist
5 | spec:
6 | allowlistPaths:
7 | - "Gke-Org/containerd-socket-tracer/gke-org-cri-v1alpha2-api-deprecation-reporter-allowlist.yaml"
8 |
--------------------------------------------------------------------------------
/.vscode/launch.json:
--------------------------------------------------------------------------------
1 | {
2 | // Use IntelliSense to learn about possible attributes.
3 | // Hover to view descriptions of existing attributes.
4 | // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
5 | "version": "0.2.0",
6 | "configurations": [
7 | {
8 | "type": "bashdb",
9 | "request": "launch",
10 | "name": "Bash-Debug (simplest configuration)",
11 | "program": "${file}"
12 | }
13 | ]
14 | }
--------------------------------------------------------------------------------
/manual-node-upgrade/README.md:
--------------------------------------------------------------------------------
1 | The sample script `manual_node_upgrade.sh` finds all node pools whose Kubernetes version does not match the control plane's version for a specified cluster. For each node pool the filter identifies, the script submits an upgrade request via the `gcloud` command. The script is idempotent: it can be re-run as many times as needed until all node pools are upgraded, without side effects.
2 |
3 | For more information about GKE manual upgrades, consult the online documentation: https://cloud.google.com/kubernetes-engine/docs/how-to/upgrading-a-cluster
--------------------------------------------------------------------------------
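The version filter the script applies can be reproduced offline. A minimal sketch, substituting a hypothetical node-pool listing for real `gcloud` output:

```shell
# Hypothetical "name version" listing standing in for:
#   gcloud container node-pools list --format="value(name,version)"
CLUSTER_VERSION="1.29.4-gke.100"
nodepools='pool-a 1.29.4-gke.100
pool-b 1.28.9-gke.200
pool-c 1.27.11-gke.300'

# Mirror of the script's --filter="version!=$CLUSTER_VERSION": print only
# node pools whose version differs from the control plane version.
echo "${nodepools}" | awk -v v="${CLUSTER_VERSION}" '$2 != v {print $1}'
```

With the values above this prints `pool-b` and `pool-c`, the pools the script would go on to upgrade.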
/containerd/migrating-to-containerd/README.md:
--------------------------------------------------------------------------------
1 | # Migrating to Containerd
2 |
3 | Find information about running Containerd nodes on GKE [here](https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd).
4 |
5 | The sample script `find-nodepools-to-migrate.sh` iterates over all node pools across available projects and, for each node pool, outputs a suggestion on whether it should be migrated to Containerd. The script also outputs the node pool version and the suggested migration command as listed in the [updating your node images](https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd#updating-image-type) document. Make sure that you review the [known issues](https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd#known_issues) for a node pool version before migration.
6 |
--------------------------------------------------------------------------------
/containerd/debug-logging/README.md:
--------------------------------------------------------------------------------
1 | # Containerd Debug Logging
2 |
3 | The `containerd-debug-logging-daemonset.yaml` manifest is a daemonset that enables
4 | containerd debug logs. These logs may be useful for troubleshooting. Note that
5 | debug logs are quite verbose and as such increase log volume.
6 |
7 | The daemonset includes a nodeSelector targeting
8 | `containerd-debug-logging=true`. To run the daemonset on selected nodes for
9 | debugging, label the nodes with the corresponding label (`kubectl label node
10 | ${NODE_NAME} containerd-debug-logging=true`).
11 |
12 | Otherwise, modify the daemonset's existing
13 | [nodeSelector](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector)
14 | or add a [node
15 | affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity)
16 | to target the desired nodes or node pools.
17 |
--------------------------------------------------------------------------------
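The daemonset's core action is writing a systemd drop-in that re-launches containerd with `--log-level debug`. A minimal sketch of that step, writing into a temporary directory instead of `/etc/systemd/system` so it runs unprivileged:

```shell
# Write the drop-in into a temp dir; the daemonset performs the same steps
# against the host and then runs `systemctl daemon-reload` and restarts containerd.
confdir="$(mktemp -d)/containerd.service.d"
mkdir -p "${confdir}"
printf '[Service]\nExecStart=\nExecStart=/usr/bin/containerd --log-level debug\n' \
  > "${confdir}/10-level_debug.conf"

# The empty ExecStart= clears the unit's original command before overriding it.
cat "${confdir}/10-level_debug.conf"
```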
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # How to Contribute
2 |
3 | We'd love to accept your patches and contributions to this project. There are
4 | just a few small guidelines you need to follow.
5 |
6 | ## Contributor License Agreement
7 |
8 | Contributions to this project must be accompanied by a Contributor License
9 | Agreement. You (or your employer) retain the copyright to your contribution;
10 | this simply gives us permission to use and redistribute your contributions as
11 | part of the project. Head over to <https://cla.developers.google.com/> to see
12 | your current agreements on file or to sign a new one.
13 |
14 | You generally only need to submit a CLA once, so if you've already submitted one
15 | (even if it was for a different project), you probably don't need to do it
16 | again.
17 |
18 | ## Code reviews
19 |
20 | All submissions, including submissions by project members, require review. We
21 | use GitHub pull requests for this purpose. Consult
22 | [GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
23 | information on using pull requests.
24 |
25 | ## Community Guidelines
26 |
27 | This project follows [Google's Open Source Community
28 | Guidelines](https://opensource.google.com/conduct/).
29 |
--------------------------------------------------------------------------------
/containerd/debug-logging/containerd-debug-logging-daemonset.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: apps/v1
2 | kind: DaemonSet
3 | metadata:
4 | name: containerd-debug-logging
5 | namespace: default
6 | labels:
7 | k8s-app: containerd-debug-logging
8 | spec:
9 | selector:
10 | matchLabels:
11 | name: containerd-debug-logging
12 | template:
13 | metadata:
14 | labels:
15 | name: containerd-debug-logging
16 | spec:
17 | nodeSelector:
18 | containerd-debug-logging: "true"
19 | hostPID: true
20 | containers:
21 | - name: startup-script
22 | image: gke.gcr.io/startup-script:v2
23 | imagePullPolicy: Always
24 | securityContext:
25 | privileged: true
26 | env:
27 | - name: STARTUP_SCRIPT
28 | value: |
29 | set -o errexit
30 | set -o pipefail
31 | set -o nounset
32 |
33 | echo "creating containerd.service.d directory"
34 | mkdir -p /etc/systemd/system/containerd.service.d
35 | echo "creating 10-level_debug.conf file"
36 | echo -e "[Service]\nExecStart=\nExecStart=/usr/bin/containerd --log-level debug" > /etc/systemd/system/containerd.service.d/10-level_debug.conf
37 | echo "Reloading systemd management configuration"
38 | systemctl daemon-reload
39 | echo "Restarting containerd..."
40 | systemctl restart containerd
41 |
--------------------------------------------------------------------------------
/manual-node-upgrade/manual_node_upgrade.sh:
--------------------------------------------------------------------------------
1 | #!/bin/sh
2 |
3 | # Copyright 2023 Google LLC
4 |
5 | # Licensed under the Apache License, Version 2.0 (the "License");
6 | # you may not use this file except in compliance with the License.
7 | # You may obtain a copy of the License at
8 |
9 | # https://www.apache.org/licenses/LICENSE-2.0
10 |
11 | # Unless required by applicable law or agreed to in writing, software
12 | # distributed under the License is distributed on an "AS IS" BASIS,
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 | # See the License for the specific language governing permissions and
15 | # limitations under the License.
16 |
17 | # Example: `./manual_node_upgrade.sh CLUSTER REGION`
18 | #
19 |
20 | CLUSTER_NAME=$1
21 | REGION=$2
22 |
23 | [ -z "$CLUSTER_NAME" ] || [ -z "$REGION" ] \
24 | && echo "Usage: ./manual_node_upgrade.sh CLUSTER_NAME REGION" \
25 | && exit 1;
26 |
27 | # fetch current control plane version
28 | CLUSTER_VERSION=$(gcloud container clusters describe \
29 | "$CLUSTER_NAME" --format="value(currentMasterVersion)" \
30 | --region="$REGION")
31 |
32 | # list node pools with version not matching control plane
33 | # and then issue an upgrade command for each identified node pool
34 | for np in $(gcloud container node-pools list \
35 | --format="value(name)" --filter="version!=$CLUSTER_VERSION" \
36 | --cluster "$CLUSTER_NAME" --region="$REGION"); do
37 | gcloud container clusters upgrade "$CLUSTER_NAME" --node-pool "$np" \
38 | --region="$REGION" --quiet;
39 | done
--------------------------------------------------------------------------------
/troubleshooting/ssh-server-config/README.md:
--------------------------------------------------------------------------------
1 | The ssh-server-config tool is a Kubernetes DaemonSet that sets [LoginGraceTime](https://man.openbsd.org/sshd#g) to 0.
2 | This tool is important because it prevents the SSH server from disconnecting you during lengthy or complex debugging sessions on a cluster node.
3 |
4 | ## :warning: This configuration may increase the risk of denial of service attacks and may cause issues with legitimate SSH access.
5 |
6 | ## How to use it?
7 | Apply it to all nodes in your cluster by running the
8 | following command. Run the command once per cluster per
9 | Google Cloud Platform project.
10 |
11 | ### GKE Clusters
12 |
13 | ```
14 | kubectl apply -f \
15 | https://raw.githubusercontent.com/GoogleCloudPlatform\
16 | /k8s-node-tools/master/ssh-server-config/set-login-grace-time.yaml
17 | ```
18 |
19 | ### GDC software-only for VMware Clusters
20 |
21 | ```
22 | kubectl apply -f \
23 | https://raw.githubusercontent.com/GoogleCloudPlatform\
24 | /k8s-node-tools/master/troubleshooting/ssh-server-config/set-login-grace-time-gdcso-vmware.yaml
25 | ```
26 |
27 | ## How to get the result?
28 | Run the command below to view the related logs.
29 | ```
30 | kubectl -n kube-system logs -l app=ssh-server-config -c ssh-server-config
31 | ```
32 |
33 | ## How to remove it?
34 | The command below removes the DaemonSet from a GKE cluster; swap the URL for the GDC software-only for VMware variant.
35 | ```
36 | kubectl delete -f \
37 | https://raw.githubusercontent.com/GoogleCloudPlatform\
38 | /k8s-node-tools/master/troubleshooting/ssh-server-config/set-login-grace-time.yaml
39 | ```
40 |
41 |
42 |
--------------------------------------------------------------------------------
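The daemonset's edit to `sshd_config` boils down to replacing or appending a `LoginGraceTime` directive. A minimal sketch of that logic on a stand-in config file:

```shell
# Stand-in sshd_config; the daemonset performs the same edit on the host's
# /etc/ssh/sshd_config (via a copy) and then reloads sshd.
cfg="$(mktemp)"
printf 'Port 22\nLoginGraceTime 120\n' > "${cfg}"

if grep -q "^LoginGraceTime" "${cfg}"; then
  # Update the existing directive in place.
  sed -i "s/^LoginGraceTime.*/LoginGraceTime 0/" "${cfg}"
else
  # Append the directive if it is absent.
  echo "LoginGraceTime 0" >> "${cfg}"
fi

grep "^LoginGraceTime" "${cfg}"
rm -f "${cfg}"
```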
/containerd/containerd-http-proxy/README.md:
--------------------------------------------------------------------------------
1 | # Configure Containerd HTTP/S proxy
2 |
3 | This guide outlines the steps to configure an HTTP/S proxy for Containerd on GKE nodes, including Autopilot mode clusters. Typical use cases include accessing external image repositories for container image pulls.
4 |
5 | ## Instructions
6 |
7 | 1. Create a ConfigMap named `containerd-proxy-configmap` that includes the values for `HTTP_PROXY`, `HTTPS_PROXY`, and `NO_PROXY` (optional). These values are used as environment variables to configure the proxy settings for the Containerd service. A sample ConfigMap configuration is provided in `sample_configmap.yaml`. Please modify this sample with your proxy settings before applying it to your cluster.
8 |
9 | ```
10 | kubectl apply -f sample_configmap.yaml
11 | ```
12 |
13 | 2. Deploy the daemonset in `configure_http_proxy.yaml`. Because it has been specifically allowlisted for GKE Autopilot, the **manifest in this repo must not be modified if you are deploying to GKE Autopilot mode clusters**.
14 |
15 | ```
16 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/containerd/containerd-http-proxy/configure_http_proxy.yaml
17 | ```
18 |
19 | ## Note
20 | **Any update to `configure_http_proxy.yaml` will break the allowlist for GKE Autopilot**. If you need to make a change, please ask your Google Cloud sales representative to reach out to the GKE Autopilot team.
21 |
22 | How to remove it?
23 | ```bash
24 | kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/containerd/containerd-http-proxy/configure_http_proxy.yaml
25 | ```
26 |
--------------------------------------------------------------------------------
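One plausible shape for the drop-in that such a daemonset generates from the ConfigMap values is a set of systemd `Environment=` lines. The exact format is an assumption here, so treat this as a sketch (written to a temp dir, not the host):

```shell
# Assumed drop-in shape (a sketch, not the daemonset's verbatim output):
# systemd Environment= lines built from the ConfigMap's proxy values.
HTTP_PROXY="http://proxy.example.com:80"
HTTPS_PROXY="https://proxy.example.com:443"
NO_PROXY="localhost,metadata.google.internal"

confdir="$(mktemp -d)/containerd.service.d"
mkdir -p "${confdir}"
cat > "${confdir}/http-proxy.conf" <<EOF
[Service]
Environment="HTTP_PROXY=${HTTP_PROXY}"
Environment="HTTPS_PROXY=${HTTPS_PROXY}"
Environment="NO_PROXY=${NO_PROXY}"
EOF

cat "${confdir}/http-proxy.conf"
```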
/drop-small-mss/drop-small-mss.yaml:
--------------------------------------------------------------------------------
1 | # Copyright 2019 Google LLC
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # https://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 | kind: DaemonSet
16 | apiVersion: apps/v1
17 | metadata:
18 | name: drop-small-mss
19 | namespace: kube-system
20 | labels:
21 | app: drop-small-mss
22 | spec:
23 | selector:
24 | matchLabels:
25 | app: drop-small-mss
26 | template:
27 | metadata:
28 | labels:
29 | app: drop-small-mss
30 | annotations:
31 | scheduler.alpha.kubernetes.io/critical-pod: ""
32 | spec:
33 | hostPID: true
34 | containers:
35 | - name: drop-small-mss
36 | image: k8s.gcr.io/startup-script:v2
37 | imagePullPolicy: Always
38 | securityContext:
39 | privileged: true
40 | env:
41 | - name: STARTUP_SCRIPT
42 | value: |
43 | #! /bin/bash
44 |
45 | set -o errexit
46 | set -o pipefail
47 | set -o nounset
48 |
49 | iptables -w -t mangle -I PREROUTING -m comment --comment "drop-small-mss" -p tcp -m tcpmss --mss 1:500 -j DROP
50 | priorityClassName: system-node-critical
51 | tolerations:
52 | - operator: Exists
53 |
--------------------------------------------------------------------------------
/disable-mglru/disable-mglru.yaml:
--------------------------------------------------------------------------------
1 | # Copyright 2024 Google LLC
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # https://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 | # This DaemonSet disables the kernel option Multi-Gen LRU.
16 | #
17 | # - First, label your GKE nodes or node pools with `disable-mglru: true`
18 | #   (or your preferred label; if you change the label, also update
19 | #   the nodeSelector section below).
20 | # - Next, deploy the DaemonSet to your cluster.
21 | apiVersion: apps/v1
22 | kind: DaemonSet
23 | metadata:
24 | name: disable-mglru
25 | namespace: default
26 | labels:
27 | k8s-app: disable-mglru
28 | spec:
29 | selector:
30 | matchLabels:
31 | name: disable-mglru
32 | template:
33 | metadata:
34 | labels:
35 | name: disable-mglru
36 | spec:
37 | nodeSelector:
38 | disable-mglru: "true"
39 | hostPID: true
40 | containers:
41 | - name: startup-script
42 | image: gke.gcr.io/startup-script:v2
43 | imagePullPolicy: Always
44 | securityContext:
45 | privileged: true
46 | env:
47 | - name: STARTUP_SCRIPT
48 | value: |
49 | set -o errexit
50 | set -o pipefail
51 | set -o nounset
52 | echo n > /sys/kernel/mm/lru_gen/enabled
53 |
--------------------------------------------------------------------------------
/containerd/containerd-nofile-infinity/containerd-nofile-infinity.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: apps/v1
2 | kind: DaemonSet
3 | metadata:
4 | name: nofile-infinity
5 | namespace: default
6 | labels:
7 | k8s-app: nofile-infinity
8 | spec:
9 | selector:
10 | matchLabels:
11 | name: nofile-infinity
12 | updateStrategy:
13 | type: RollingUpdate
14 | template:
15 | metadata:
16 | labels:
17 | name: nofile-infinity
18 | spec:
19 | nodeSelector:
20 | cloud.google.com/gke-container-runtime: "containerd"
21 | kubernetes.io/os: "linux"
22 | hostPID: true
23 | tolerations:
24 | - operator: "Exists"
25 | effect: "NoExecute"
26 | - operator: "Exists"
27 | effect: "NoSchedule"
28 | volumes:
29 | - name: host
30 | hostPath:
31 | path: /
32 | type: DirectoryOrCreate
33 | initContainers:
34 | - name: startup-script
35 | image: gke.gcr.io/debian-base:bookworm-v1.0.0-gke.1
36 | imagePullPolicy: Always
37 | securityContext:
38 | privileged: true
39 | volumeMounts:
40 | - name: host
41 | mountPath: /host
42 | command:
43 | - /bin/sh
44 | - -c
45 | - |
46 | set -e
47 | set -u
48 | echo "Generating containerd system drop in config for nofile limit"
49 | nofile_limit_path="/host/etc/systemd/system/containerd.service.d/40-LimitNOFILE-infinity.conf"
50 | cat >> "${nofile_limit_path}" <<EOF
--------------------------------------------------------------------------------
/disable-smt/cos/disable_smt_cos.sh:
--------------------------------------------------------------------------------
44 | if grep "nosmt" /proc/cmdline > /dev/null; then
45 | echo "'nosmt' already present on the kernel command line. Nothing to do."
46 | return
47 | fi
48 | echo "Attempting to set 'nosmt' on the kernel command line."
49 | if [[ "${EUID}" -ne 0 ]]; then
50 | echo "This script must be run as root."
51 | return 1
52 | fi
53 | check_not_secure_boot
54 |
55 | dir="$(mktemp -d)"
56 | mount /dev/sda12 "${dir}"
57 | sed -i -e "s|cros_efi|cros_efi nosmt|g" "${dir}/efi/boot/grub.cfg"
58 | umount "${dir}"
59 | rmdir "${dir}"
60 | echo "Rebooting."
61 | reboot
62 | }
63 |
64 | disable_smt
65 |
--------------------------------------------------------------------------------
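The script's key step is the `sed` substitution that appends `nosmt` after `cros_efi` on the kernel command line. A minimal sketch against a stand-in `grub.cfg`:

```shell
# Stand-in grub.cfg; the real script mounts the EFI partition and edits
# ${dir}/efi/boot/grub.cfg in place before rebooting.
grub="$(mktemp)"
printf 'linux /syslinux/vmlinuz.A init=/usr/lib/systemd/systemd cros_efi\n' > "${grub}"

# Same substitution the script performs: append nosmt after cros_efi.
sed -i -e "s|cros_efi|cros_efi nosmt|g" "${grub}"

grep -q "cros_efi nosmt" "${grub}" && echo "nosmt added to kernel command line"
rm -f "${grub}"
```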
/drop-small-mss/READ.md:
--------------------------------------------------------------------------------
1 | # drop-small-mss.yaml (Network Security Hardening)
2 |
3 | The `drop-small-mss` tool is a Kubernetes DaemonSet that adds a firewall rule to every cluster node to drop incoming TCP packets with an unusually small Maximum Segment Size (MSS). This tool is important for hardening network security and protecting nodes from certain types of denial-of-service (DoS) attacks that exploit small packet sizes to exhaust server resources. 🛡️
4 |
5 | ---
6 | ### ⚠️ Warning
7 | This tool runs as a **privileged** and **system-node-critical** pod to modify the host's core networking rules. While this rule protects against a known attack vector, it could potentially interfere with legitimate but non-standard network clients or tunnels that require a small MSS to function correctly.
8 |
9 | ---
10 | ### **Prerequisites**
11 |
12 | There are no special prerequisites. This DaemonSet is designed to run on **all nodes** in your cluster, including control-plane nodes.
13 |
14 | ---
15 | ### **How to use it?**
16 |
17 | Apply the DaemonSet to your cluster:
19 | ```bash
20 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/drop-small-mss/drop-small-mss.yaml
21 | ```
22 |
23 | ---
24 | ### **How to get the result?**
25 |
26 | The result of this tool is a new `iptables` rule on every node. You can verify that the rule was successfully applied:
27 |
28 | 1. **SSH into any node** in your cluster.
29 | 2. **List the firewall rules** in the `mangle` table. You should see the new rule with the "drop-small-mss" comment at the top of the `PREROUTING` chain.
30 | ```bash
31 | iptables -t mangle -L PREROUTING -v
32 | ```
33 |
34 | ---
35 | ### **How to remove it?**
36 |
37 | You can delete the DaemonSet to prevent the rule from being applied to any new nodes that join the cluster.
38 |
39 | ```bash
40 | kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/drop-small-mss/drop-small-mss.yaml
41 | ```
--------------------------------------------------------------------------------
/containerd/containerd-nofile-infinity/README.md:
--------------------------------------------------------------------------------
1 | # Configure Containerd NOFILE Limit
2 |
3 | This guide outlines the steps to configure the `LimitNOFILE` setting for the containerd service on GKE nodes. This is typically used to increase the maximum number of open file descriptors allowed for containerd, which can be beneficial for high-concurrency workloads or specific applications that require a large number of open files. Since containerd 2.0, the `LimitNOFILE` setting has been removed - see containerd/containerd#8924 for more details.
4 |
5 | ## Prerequisite for GKE Autopilot Clusters
6 | Deploy the `AllowlistSynchronizer` resource in `containerd-nofile-infinity-allowlist.yaml`. This resource updates [Autopilot's security policies](https://cloud.google.com/kubernetes-engine/docs/how-to/run-autopilot-partner-workloads#about-allowlistsynchronizer) to allow running the privileged daemonset.
7 |
8 | ```bash
9 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/containerd/containerd-nofile-infinity/containerd-nofile-infinity-allowlist.yaml
10 | ```
11 |
12 | ## Instructions
13 |
14 | 1. Deploy the daemonset in `containerd-nofile-infinity.yaml`. This DaemonSet runs a privileged container that modifies the `containerd.service` systemd unit on each node to set `LimitNOFILE=infinity` and then restarts the Containerd service.
15 |
16 | ```bash
17 | kubectl apply -f containerd-nofile-infinity.yaml
18 | ```
19 |
20 | ## Note
21 | **This DaemonSet is specifically allowlisted for GKE Autopilot clusters.** Attempting to deploy other privileged DaemonSets that modify the underlying host OS on Autopilot may lead to unexpected behavior or stability issues, or may be blocked by Autopilot's security policies. If you need to make a change, please ask your Google Cloud sales representative to reach out to the GKE Autopilot team.
22 |
23 | How to remove it?
24 | ```bash
25 | kubectl delete -f containerd-nofile-infinity.yaml
26 | ```
--------------------------------------------------------------------------------
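Based on the README's description, the daemonset writes a systemd drop-in along these lines; the exact contents are an assumption (sketched into a temp dir here), so consult `containerd-nofile-infinity.yaml` for the authoritative version:

```shell
# Sketch of the drop-in; written to a temp dir instead of
# /etc/systemd/system/containerd.service.d on the host.
dropin_dir="$(mktemp -d)/containerd.service.d"
mkdir -p "${dropin_dir}"
cat > "${dropin_dir}/40-LimitNOFILE-infinity.conf" <<EOF
[Service]
LimitNOFILE=infinity
EOF

# After writing the real file, the daemonset would reload systemd and
# restart containerd for the new limit to take effect.
cat "${dropin_dir}/40-LimitNOFILE-infinity.conf"
```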
/troubleshooting/enable-kdump/ubuntu-kdump.md:
--------------------------------------------------------------------------------
1 | # kdump for Ubuntu
2 |
3 | ## Obtaining a kdump
4 |
5 | Use the `ubuntu-enable-kdump.yaml` DaemonSet to install and set up kdump on a set
6 | of nodes. The DaemonSet uses the `enable-kdump=true` node selector, so nodes
7 | must be labeled:
8 |
9 | ```
10 | kubectl label nodes ${NODE_NAME} enable-kdump=true
11 | ```
12 |
13 | ## Triggering a test kdump
14 |
15 | SSH into a node and trigger a system crash with sysrq
16 |
17 | ```
18 | sudo -i
19 | sysctl -w kernel.sysrq=1
20 | echo c > /proc/sysrq-trigger
21 | ```
22 |
23 | A dump will be written to `/var/crash`
24 |
25 | ## Analyzing a kdump
26 |
27 | Create a VM for the analysis
28 |
29 | ```
30 | gcloud beta compute instances create dump-test-vm \
31 | --machine-type=e2-standard-4 \
32 | --image-family=ubuntu-1804-lts \
33 | --image-project=ubuntu-os-cloud \
34 | --boot-disk-size=100GB \
35 | --boot-disk-type=pd-ssd \
36 | --zone=us-central1-c
37 | ```
38 |
39 | SCP the contents of `/var/crash` to dump-test-vm
40 |
41 | Find the right deb for the correct kernel version, see
42 | [here](https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+packages?field.name_filter=linux-gke&field.status_filter=published).
43 | Obtain the url for the deb for the linux image with debug symbols, e.g. for
44 | `linux-gke-5.0 - 5.0.0-1046.47` the deb containing the vmlinux can be obtained
45 | [here](https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/18789932/+files/linux-image-unsigned-5.0.0-1032-gke-dbgsym_5.0.0-1032.33_amd64.ddeb).
46 |
47 | ```
48 | gcloud compute ssh dump-test-vm
49 | sudo apt-get update && sudo apt-get install -y linux-crashdump
50 |
51 | cd ${HOME}
52 | # Location of deb for vmlinux
53 | LINUX_DEB_IMAGE_URL="https://launchpad.net/..."
54 | wget "${LINUX_DEB_IMAGE_URL}"
55 | ar -x linux-image-unsigned-5.0.0-1032-gke-dbgsym_5.0.0-1032.33_amd64.ddeb
56 | mkdir debug_image
57 | tar -xf data.tar.xz -C debug_image/
58 |
59 | # Contents of /var/crash from the crash dump.
60 | CRASH_DUMP="var/crash/SOME_TIMESTAMP/dump.SOME_TIMESTAMP"
61 |
62 | # Start debugging!
63 | crash debug_image/usr/lib/debug/boot/vmlinux-5.0.0-1032-gke ${CRASH_DUMP}
64 | ```
65 |
--------------------------------------------------------------------------------
/troubleshooting/ssh-server-config/set-login-grace-time.yaml:
--------------------------------------------------------------------------------
1 | kind: DaemonSet
2 | apiVersion: apps/v1
3 | metadata:
4 | name: ssh-server-config
5 | namespace: kube-system
6 | labels:
7 | app: ssh-server-config
8 | spec:
9 | selector:
10 | matchLabels:
11 | app: ssh-server-config
12 | template:
13 | metadata:
14 | labels:
15 | app: ssh-server-config
16 | spec:
17 | hostPID: true
18 | initContainers:
19 | - name: ssh-server-config
20 | image: gke.gcr.io/debian-base:bookworm-v1.0.3-gke.0@sha256:91b29592ee0b782c0ab777bfcabd14a0ae83d8e8eb90d3f0eb500acafae3f4e5
21 | securityContext:
22 | privileged: true
23 | command:
24 | - /bin/sh
25 | - -c
26 | - |
27 | set -e
28 | set -u
29 | if [ ! -e "/etc/ssh/sshd_config" ] ; then
30 | echo "/etc/ssh/sshd_config not found"
31 | exit 1
32 | fi
33 |
34 | cp /etc/ssh/sshd_config /etc/ssh/sshd_config.cp
35 | if grep -q "^LoginGraceTime" "/etc/ssh/sshd_config.cp"; then
36 | # Update existing LoginGraceTime
37 | sed -i "s/^LoginGraceTime.*/LoginGraceTime 0/" "/etc/ssh/sshd_config.cp"
38 | else
39 | # Add new LoginGraceTime
40 | echo "LoginGraceTime 0" >> "/etc/ssh/sshd_config.cp"
41 | fi
42 |
43 | cp /etc/ssh/sshd_config.cp /etc/ssh/sshd_config
44 | rm /etc/ssh/sshd_config.cp
45 |
46 |
47 | EXEC="nsenter -t 1 -m -p --"
48 | $EXEC systemctl reload sshd
49 | echo "sshd logingracetime after restart:"
50 | $EXEC sshd -T | grep logingracetime
51 | volumeMounts:
52 | - name: sshd-config
53 | mountPath: /etc/ssh/sshd_config
54 | containers:
55 | - name: pause-container
56 | image: gke.gcr.io/pause:3.7@sha256:5b658f3c4f034a9619ad7e6d1ee49ee532a1e0a598dc68b06d17b6036116b924
57 | volumes:
58 | - name: sshd-config
59 | hostPath:
60 | path: /etc/ssh/sshd_config
61 | type: File
62 |
63 |
64 |
65 |
--------------------------------------------------------------------------------
/kubelet/kubelet-log-config/READ.md:
--------------------------------------------------------------------------------
1 | # kubelet-log-config.yaml (Container Log Rotation)
2 |
3 | The `kubelet-log-config` tool is a Kubernetes DaemonSet that configures container log rotation settings on a node by modifying the kubelet's configuration file. This tool is important for managing disk space on your nodes by controlling the size and number of log files kept for each container. 🪵
4 |
5 | ---
6 | ### ⚠️ Warning
7 | This tool runs as **privileged** and **restarts the kubelet service** on the node. A kubelet restart can disrupt running pods. It is highly recommended to apply this configuration to a new, empty node pool and then safely migrate your workloads to the newly configured nodes.
8 |
9 | ---
10 | ### **Prerequisites**
11 |
12 | This tool only runs on nodes with a specific label. You must label the node(s) you want to configure before applying the DaemonSet.
13 |
14 | 1. **Label the node:**
15 | ```bash
16 | kubectl label node ${NODE_NAME} kubelet-log-config=true
17 | ```
18 |
19 | ---
20 | ### **How to use it?**
21 |
22 | 1. Save the script to a file named `kubelet-log-config.yaml`.
23 | 2. Before applying, you can edit the file to change the values for `CONTAINER_LOG_MAX_SIZE` (e.g., "20Mi") and `CONTAINER_LOG_MAX_FILES` (e.g., "10").
24 | 3. Apply the DaemonSet to your cluster:
25 | ```bash
26 | kubectl apply -f kubelet-log-config.yaml
27 | ```
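After the kubelet restarts, the node's kubelet config should contain the corresponding settings. For example, assuming the values suggested in step 2 (field names follow the upstream `KubeletConfiguration` API; exact formatting on the node may differ):

```yaml
# Excerpt from /home/kubernetes/kubelet-config.yaml (illustrative values)
containerLogMaxSize: 20Mi
containerLogMaxFiles: 10
```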
28 |
29 | ---
30 | ### **How to get the result?**
31 |
32 | You can verify that the script ran successfully in two ways:
33 |
34 | 1. **Check the pod logs:** The pod runs in the `kube-system` namespace and will output "Success!" upon completion.
35 | ```bash
36 | kubectl logs -n kube-system -l name=kubelet-log-config
37 | ```
38 | 2. **Inspect the node's configuration:** SSH into the labeled node and view the kubelet config file to see the new values.
39 | ```bash
40 | cat /home/kubernetes/kubelet-config.yaml
41 | ```
42 |
43 | ---
44 | ### **How to remove it?**
45 |
46 | You can remove the DaemonSet to prevent it from configuring any new nodes.
47 |
48 | ```bash
49 | kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/kubelet/kubelet-log-config/kubelet-log-config.yaml
50 | ```
--------------------------------------------------------------------------------
/containerd/socket-tracer/cri-v1alpha2-api-deprecation-reporter.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: apps/v1
2 | kind: DaemonSet
3 | metadata:
4 | name: cri-v1alpha2-api-deprecation-reporter
5 | namespace: default
6 | labels:
7 | k8s-app: cri-v1alpha2-api-deprecation-reporter
8 | spec:
9 | selector:
10 | matchLabels:
11 | name: cri-v1alpha2-api-deprecation-reporter
12 | template:
13 | metadata:
14 | labels:
15 | name: cri-v1alpha2-api-deprecation-reporter
16 | annotations:
17 | autopilot.gke.io/no-connect: "true"
18 | spec:
19 | hostPID: true
20 | containers:
21 | - name: reporter
22 | image: mirror.gcr.io/ubuntu:24.04
23 | command: ["/bin/sh", "-c"]
24 | args:
25 | - |
26 | apt-get update && apt-get install -y jq
27 |
28 | echo "time=\"$(date -u +'%Y-%m-%dT%H:%M:%SZ')\" msg=\"checking for CRI v1alpha2 API deprecation warnings\" node=\"$NODE_NAME\""
29 |
30 | while true; do
31 | DEPRECATION_DATA=$(nsenter -at 1 -- /usr/bin/ctr deprecations list --format json)
32 | V1ALPHA2_WARNING=$(echo "$DEPRECATION_DATA" | jq '.[] | select(.id == "io.containerd.deprecation/cri-api-v1alpha2")')
33 | if [ -n "$V1ALPHA2_WARNING" ]; then
34 |               LAST_OCCURRENCE=$(echo "$V1ALPHA2_WARNING" | jq -r .lastOccurrence)
35 | echo "time=\"$(date -u +'%Y-%m-%dT%H:%M:%SZ')\" msg=\"found CRI v1alpha2 API deprecation warning\" node=\"$NODE_NAME\" lastOccurrence=\"$LAST_OCCURRENCE\""
36 | else
37 | echo "time=\"$(date -u +'%Y-%m-%dT%H:%M:%SZ')\" msg=\"CRI v1alpha2 API deprecation warning not found on this node\" node=\"$NODE_NAME\""
38 | fi
39 |
40 | # NOTE: You can update this interval as needed.
41 | sleep $INTERVAL
42 | done
43 | env:
44 | - name: NODE_NAME
45 | valueFrom:
46 | fieldRef:
47 | fieldPath: spec.nodeName
48 | - name: INTERVAL
49 | value: "60"
50 | securityContext:
51 | # Privileged is required to use 'nsenter' to enter the host's PID
52 | # namespace to run 'ctr' on the node.
53 | privileged: true
54 | tolerations:
55 | - key: "node.kubernetes.io/not-ready"
56 | operator: "Exists"
57 | effect: "NoExecute"
58 |
--------------------------------------------------------------------------------
/troubleshooting/ssh-server-config/set-login-grace-time-gdcso-vmware.yaml:
--------------------------------------------------------------------------------
1 | kind: DaemonSet
2 | apiVersion: apps/v1
3 | metadata:
4 | name: ssh-server-config
5 | namespace: kube-system
6 | labels:
7 | app: ssh-server-config
8 | spec:
9 | selector:
10 | matchLabels:
11 | app: ssh-server-config
12 | template:
13 | metadata:
14 | labels:
15 | app: ssh-server-config
16 | spec:
17 | hostPID: true
18 | tolerations:
19 | - operator: Exists
20 | initContainers:
21 | - name: ssh-server-config
22 | image: gke.gcr.io/debian-base:bookworm-v1.0.3-gke.0@sha256:91b29592ee0b782c0ab777bfcabd14a0ae83d8e8eb90d3f0eb500acafae3f4e5
23 | securityContext:
24 | privileged: true
25 | command:
26 | - /bin/sh
27 | - -c
28 | - |
29 | set -e
30 | set -u
31 | if [ ! -e "/etc/ssh/sshd_config" ] ; then
32 | echo "/etc/ssh/sshd_config not found"
33 | exit 1
34 | fi
35 |
36 | cp /etc/ssh/sshd_config /etc/ssh/sshd_config.cp
37 | if grep -q "^LoginGraceTime" "/etc/ssh/sshd_config.cp"; then
38 | # Update existing LoginGraceTime
39 | sed -i "s/^LoginGraceTime.*/LoginGraceTime 0/" "/etc/ssh/sshd_config.cp"
40 | else
41 | # Add new LoginGraceTime
42 | echo "LoginGraceTime 0" >> "/etc/ssh/sshd_config.cp"
43 | fi
44 |
45 | cp /etc/ssh/sshd_config.cp /etc/ssh/sshd_config
46 | rm /etc/ssh/sshd_config.cp
47 |
48 |
49 | EXEC="nsenter -t 1 -m -p --"
50 | $EXEC systemctl reload sshd
51 |           echo "sshd logingracetime after reload:"
52 | $EXEC sshd -T | grep logingracetime
53 | resources:
54 | requests:
55 | memory: 5Mi
56 | cpu: 5m
57 | volumeMounts:
58 | - name: sshd-config
59 | mountPath: /etc/ssh/sshd_config
60 | containers:
61 | - name: pause-container
62 | image: gke.gcr.io/pause:3.7@sha256:5b658f3c4f034a9619ad7e6d1ee49ee532a1e0a598dc68b06d17b6036116b924
63 | volumes:
64 | - name: sshd-config
65 | hostPath:
66 | path: /etc/ssh/sshd_config
67 | type: File
68 |
69 |
70 |
--------------------------------------------------------------------------------
/gvisor/enable-gvisor-flags.yaml:
--------------------------------------------------------------------------------
1 | # Copyright 2020 Google LLC
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # https://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 | # Deploy this DaemonSet to enable flag override to gVisor pods. To set flags for
16 | # a given pod, add pod annotations with the following format:
17 | # dev.gvisor.flag.<flag-name>: <value>
18 | #
19 | # Here is an example that enables "debug-log", "debug", and "strace" flags:
20 | # metadata:
21 | # annotations:
22 | # dev.gvisor.flag.debug-log: "/tmp/sandbox-%ID/"
23 | # dev.gvisor.flag.debug: "true"
24 | # dev.gvisor.flag.strace: "true"
25 | #
26 | # Note: this is supported starting from 1.18.6-gke.3504.
27 |
28 | apiVersion: apps/v1
29 | kind: DaemonSet
30 | metadata:
31 | name: enable-gvisor-flags
32 | namespace: kube-system
33 | spec:
34 | selector:
35 | matchLabels:
36 | name: enable-gvisor-flags
37 | updateStrategy:
38 | type: RollingUpdate
39 | template:
40 | metadata:
41 | labels:
42 | name: enable-gvisor-flags
43 | spec:
44 | tolerations:
45 | - operator: Exists
46 | volumes:
47 | - name: host
48 | hostPath:
49 | path: /
50 | initContainers:
51 | - name: enable-gvisor-flags
52 | image: ubuntu
53 | command:
54 | - /bin/bash
55 | - -c
56 | - echo -e ' allow-flag-override = "true"' >> "/host/run/containerd/runsc/config.toml"
57 | volumeMounts:
58 | - name: host
59 | mountPath: /host
60 | resources:
61 | requests:
62 | memory: 5Mi
63 | cpu: 5m
64 | securityContext:
65 | privileged: true
66 | containers:
67 | - image: gke.gcr.io/pause:3.8@sha256:880e63f94b145e46f1b1082bb71b85e21f16b99b180b9996407d61240ceb9830
68 | name: pause
69 | nodeSelector:
70 | "sandbox.gke.io/runtime": "gvisor"
71 |
--------------------------------------------------------------------------------
/troubleshooting/perf/perf-record.yaml:
--------------------------------------------------------------------------------
1 | # Copyright 2019 Google LLC
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # https://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | #
15 | #
16 | #
17 | # Change TARGET_PGREP below to names of processes to trace.
18 | # Also change KERNEL_VERSION below to version being tested.
19 |
20 | apiVersion: apps/v1
21 | kind: DaemonSet
22 | metadata:
23 | name: enable-perf-record
24 | labels:
25 | app: enable-perf-record
26 | spec:
27 | selector:
28 | matchLabels:
29 | name: enable-perf-record
30 | template:
31 | metadata:
32 | labels:
33 | name: enable-perf-record
34 | spec:
35 | nodeSelector:
36 | "enable-perf": "true"
37 | hostPID: true
38 | volumes:
39 | - name: host
40 | hostPath:
41 | path: /
42 | containers:
43 | - name: enable-perf-record
44 | image: gke.gcr.io/debian-base
45 | imagePullPolicy: Always
46 | volumeMounts:
47 | - name: host
48 | mountPath: /host
49 | securityContext:
50 | privileged: true
51 | command:
52 | - /bin/bash
53 | - -c
54 | - |
55 |
56 | set -o errexit
57 | set -o pipefail
58 | set -o nounset
59 |
60 | KERNEL_VERSION="5.0.0"
61 |
62 | apt-get update && apt-get install -y curl procps daemontools build-essential bison flex libelf-dev binutils-dev
63 | curl -O https://mirrors.edge.kernel.org/pub/linux/kernel/tools/perf/v"${KERNEL_VERSION}"/perf-"${KERNEL_VERSION}".tar.gz
64 | tar xzf perf-"${KERNEL_VERSION}".tar.gz
65 | make -C perf-"${KERNEL_VERSION}"/tools/perf install
66 | PERF="/root/bin/perf"
67 |
68 | d=$(date '+%Y-%m-%dT%H:%M:%SZ')
69 | out_dir="/host/var/log/perf_record/${d}"
70 | mkdir -p "${out_dir}"
71 | cd "${out_dir}"
72 | echo "starting perf recording! will dump to ${out_dir}"
73 |
74 | "${PERF}" record -F 999 -g --timestamp-filename --switch-output="10s" > /dev/null
75 |
--------------------------------------------------------------------------------
/containerd/container-insecure-registry/README.md:
--------------------------------------------------------------------------------
1 | # Configure Insecure Registries for Containerd on GKE
2 |
3 | This guide outlines how to configure Google Kubernetes Engine (GKE) nodes to pull container images from an insecure (HTTP) registry. This is achieved by deploying a `DaemonSet` that modifies the `containerd` configuration on each node to trust the specified registry. This is typically used for accessing local or private container registries that are not configured with TLS.
4 |
5 | ---
6 |
7 | ## Instructions
8 |
9 | 1. **Modify the DaemonSet YAML file.** Before applying the manifest, you must edit it to specify the address of your insecure registry. Locate the `env` section for the `startup-script` container and change the `value` for the `ADDRESS` variable from `REGISTRY_ADDRESS` to your registry's actual address (e.g., `my-registry.local:5000`).
10 |
11 | **Original:**
12 | ```yaml
13 | env:
14 | - name: ADDRESS
15 | value: "REGISTRY_ADDRESS"
16 | ```
17 | **Example after editing:**
18 | ```yaml
19 | env:
20 | - name: ADDRESS
21 | value: "my-registry.local:5000"
22 | ```
23 | 2. **Deploy the DaemonSet.** Apply your modified YAML file to your cluster. The `DaemonSet` will create a pod on each GKE node that uses the `containerd` runtime.
24 |
25 | ```bash
26 | kubectl apply -f your-daemonset-filename.yaml
27 | ```
28 | The script within the pod will then automatically update the `containerd` configuration and restart the service to apply the changes.
29 |
30 | ---
31 |
32 | ## How It Works
33 |
34 | The `DaemonSet` runs a privileged pod on each node selected by the `nodeSelector` (`cloud.google.com/gke-container-runtime: "containerd"`). This pod executes a startup script that:
35 | 1. Identifies the `containerd` configuration file (`/etc/containerd/config.toml`).
36 | 2. Detects which configuration model the node is using (legacy V1 or modern V2).
37 | 3. Injects the necessary configuration to define your registry as an insecure endpoint (`http://...`).
38 | 4. Reloads the `systemd` daemon and restarts the `containerd` service to make the changes take effect.
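
For a V2 node, the resulting `hosts.toml` under the configured `config_path` would look like this (using the example address `my-registry.local:5000`), matching what the startup script writes:

```toml
server = "https://my-registry.local:5000"

[host."http://my-registry.local:5000"]
  capabilities = ["pull", "resolve"]
```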
39 |
40 | ---
41 |
42 | ## Important Considerations
43 |
44 | * **Security Risk:** Configuring an insecure registry means that image pulls are unencrypted (HTTP) and unauthenticated, which can be a security risk. This should only be used in trusted, isolated network environments.
45 | * **GKE Autopilot:** This method uses a privileged `DaemonSet` that modifies the host node's file system. This is **not compatible with GKE Autopilot clusters**, which restrict this level of host access. This solution is intended for GKE Standard clusters only.
--------------------------------------------------------------------------------
/gvisor/README.md:
--------------------------------------------------------------------------------
1 | # enable-gvisor-flags.yaml (gVisor Pod Flag Overrides)
2 |
3 | The `enable-gvisor-flags` tool is a Kubernetes DaemonSet that modifies the gVisor (`runsc`) configuration on GKE Sandbox nodes to allow runtime flags to be set on a per-pod basis. This tool is important for advanced debugging and customization of sandboxed pods by enabling specific gVisor features like `strace` or debug logging. 🔬
4 |
5 | ---
6 | ### ⚠️ Warning
7 | This tool runs as **privileged** to modify a system configuration file. The flags you enable via pod annotations can alter the security, stability, and performance characteristics of your sandboxed workloads. Use these flags with a clear understanding of their function.
8 |
9 | ---
10 | ### **Prerequisites**
11 |
12 | This DaemonSet is designed to run automatically on nodes that are part of a **GKE Sandbox node pool**, which uses the gVisor runtime. It will not run on standard node pools. This feature is supported on GKE versions `1.18.6-gke.3504` and later.
13 |
14 | ---
15 | ### **How to use it?**
16 |
17 | 1. Save the script to a file named `enable-gvisor-flags.yaml`.
18 | 2. Apply the DaemonSet to your cluster. It will automatically find and configure all gVisor-enabled nodes.
19 | ```bash
20 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/gvisor/enable-gvisor-flags.yaml
21 | ```
22 |
23 | ---
24 | ### **How to get the result?**
25 |
26 | The result of this tool is the **ability to set gVisor flags using pod annotations**. After the DaemonSet has run, you can enable flags on any pod scheduled to a GKE Sandbox node.
27 |
28 | 1. **Add annotations to your pod's metadata.** Here is an example that enables debug logging and `strace` for a specific pod:
29 | ```yaml
30 | apiVersion: v1
31 | kind: Pod
32 | metadata:
33 | name: my-sandboxed-pod
34 | annotations:
35 | dev.gvisor.flag.debug-log: "/tmp/sandbox-%ID/"
36 | dev.gvisor.flag.debug: "true"
37 | dev.gvisor.flag.strace: "true"
38 | spec:
39 | runtimeClassName: gvisor
40 | containers:
41 | - name: my-container
42 | image: nginx
43 | ```
44 |
45 | 2. **To verify the configuration was applied to the node,** you can SSH into a gVisor node and check the contents of the `runsc` config file. It should contain the new line.
46 | ```bash
47 | cat /run/containerd/runsc/config.toml
48 | ```
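
Per the DaemonSet's command, the file should end with the appended line:

```toml
 allow-flag-override = "true"
```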
49 |
50 | ---
51 | ### **How to remove it?**
52 |
53 | You can remove the DaemonSet to prevent it from configuring any new gVisor nodes.
54 |
55 | ```bash
56 | kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/gvisor/enable-gvisor-flags.yaml
57 | ```
--------------------------------------------------------------------------------
/containerd/container-insecure-registry/insecure-registry-config.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: apps/v1
2 | kind: DaemonSet
3 | metadata:
4 | name: insecure-registries
5 | namespace: default
6 | labels:
7 | k8s-app: insecure-registries
8 | spec:
9 | selector:
10 | matchLabels:
11 | name: insecure-registries
12 | updateStrategy:
13 | type: RollingUpdate
14 | template:
15 | metadata:
16 | labels:
17 | name: insecure-registries
18 | spec:
19 | nodeSelector:
20 | cloud.google.com/gke-container-runtime: "containerd"
21 | hostPID: true
22 | containers:
23 | - name: startup-script
24 | image: gke.gcr.io/startup-script:v2
25 | imagePullPolicy: Always
26 | securityContext:
27 | privileged: true
28 | env:
29 | - name: ADDRESS
30 | value: "REGISTRY_ADDRESS"
31 | - name: STARTUP_SCRIPT
32 | value: |
33 | set -o errexit
34 | set -o pipefail
35 | set -o nounset
36 |
37 | if [[ -z "$ADDRESS" || "$ADDRESS" == "REGISTRY_ADDRESS" ]]; then
38 | echo "Error: Environment variable ADDRESS is not set in containers.spec.env"
39 | exit 1
40 | fi
41 |
42 | echo "Allowlisting insecure registries..."
43 | containerd_config="/etc/containerd/config.toml"
44 | hostpath=$(sed -nr 's; config_path = "([-/a-z0-9_.]+)";\1;p' "$containerd_config")
45 | if [[ -z "$hostpath" ]]; then
46 | echo "Node uses CRI config model V1 (deprecated), adding mirror under $containerd_config..."
47 | grep -qxF '[plugins."io.containerd.grpc.v1.cri".registry.mirrors."'$ADDRESS'"]' "$containerd_config" || \
48 | echo -e '[plugins."io.containerd.grpc.v1.cri".registry.mirrors."'$ADDRESS'"]\n endpoint = ["http://'$ADDRESS'"]' >> "$containerd_config"
49 | else
50 | host_config_dir="$hostpath/$ADDRESS"
51 | host_config_file="$host_config_dir/hosts.toml"
52 | echo "Node uses CRI config model V2, adding mirror under $host_config_file..."
53 | if [[ ! -e "$host_config_file" ]]; then
54 | mkdir -p "$host_config_dir"
55 | echo -e "server = \"https://$ADDRESS\"\n" > "$host_config_file"
56 | fi
57 | echo -e "[host.\"http://$ADDRESS\"]\n capabilities = [\"pull\", \"resolve\"]\n" >> "$host_config_file"
58 | fi
59 | echo "Reloading systemd management configuration"
60 | systemctl daemon-reload
61 | echo "Restarting containerd..."
62 | systemctl restart containerd
63 |
--------------------------------------------------------------------------------
/troubleshooting/perf/README.md:
--------------------------------------------------------------------------------
1 | # Kubernetes Node Profiling Tools
2 |
3 | These tools deploy the Linux `perf` utility to profile and trace performance on specific Kubernetes nodes.
4 |
5 | ---
6 | ## **Prerequisites for Both Tools**
7 |
8 | These tools only run on nodes with a specific label. You must label the node you want to profile before using them.
9 |
10 | 1. **Label the node:**
11 | ```bash
12 |    kubectl label node <node-name> enable-perf=true
13 | ```
14 | 2. **Check Kernel Version:** You must edit the downloaded YAML file to ensure the `KERNEL_VERSION` variable matches your node's kernel (`uname -r`).
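
A quick way to derive a matching `KERNEL_VERSION` is to strip the distro suffix from the node's kernel release string (the release value below is a hypothetical example; on the node it comes from `uname -r`):

```shell
#!/bin/sh
# Strip the distro suffix from a kernel release string to get the upstream
# version used in the perf source tarball URL.
KERNEL_RELEASE="5.15.0-1045-gke"   # hypothetical example; use: uname -r
KERNEL_VERSION="${KERNEL_RELEASE%%-*}"
echo "KERNEL_VERSION=${KERNEL_VERSION}"
echo "https://mirrors.edge.kernel.org/pub/linux/kernel/tools/perf/v${KERNEL_VERSION}/perf-${KERNEL_VERSION}.tar.gz"
```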
15 |
16 | ---
17 | ### **`perf-record.yaml` (CPU Profiling)**
18 |
19 | The `perf-record` tool is a Kubernetes DaemonSet that continuously profiles CPU usage on a node. This tool is important for discovering which programs and functions are consuming the most CPU time. 🏎️
20 |
21 | #### ⚠️ Warning
22 | This tool runs as **privileged** and adds performance overhead. Use it only for debugging and remove it when finished.
23 |
24 | #### How to use it?
25 | Apply it to a labeled node by running the following command.
26 | ```
27 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/troubleshooting/perf/perf-record.yaml
28 | ```
29 |
30 | #### How to get the result?
31 | The profiling data is not in the pod logs. It is saved directly on the node's filesystem. SSH into your node and find the `perf.data` files in a new timestamped directory inside:
32 | `/var/log/perf_record/`
33 |
34 | ---
35 | ### **`perf-trace.yaml` (System Call Tracing)**
36 |
37 | The `perf-trace` tool is a Kubernetes DaemonSet that traces system calls made by programs on a node. This tool is important for debugging low-level application errors related to file access, permissions, or networking. 🐞
38 |
39 | #### ⚠️ Warning
40 | This tool runs as **privileged** and adds performance overhead. Use it only for debugging and remove it when finished.
41 |
42 | #### How to use it?
43 | Apply it to a labeled node by running the following command. You can edit the YAML to change the `TARGET_PGREP` environment variable to trace a specific process (e.g., "containerd") or leave it empty to trace the whole system.
44 | ```
45 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/troubleshooting/perf/perf-trace.yaml
46 | ```
47 |
48 | #### How to get the result?
49 | The trace data is not in the pod logs. It is saved directly on the node's filesystem. SSH into your node and find the log files in a new timestamped directory inside:
50 | ```
51 | /var/log/perf_trace/
52 | ```
53 |
54 | #### How to remove it?
55 | ```
56 | kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/troubleshooting/perf/perf-trace.yaml
57 | ```
--------------------------------------------------------------------------------
/disable-smt/gke/enable-smt.yaml:
--------------------------------------------------------------------------------
1 | # Copyright 2019 Google LLC
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # https://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 |
16 | # Deploy this DaemonSet to enable hyper-threading on the nodes with the
17 | # "cloud.google.com/gke-smt-disabled=false" label.
18 | #
19 | # WARNING: Enabling hyper-threading may make the node vulnerable to
20 | # Microarchitectural Data Sampling (MDS). Please ensure that this is acceptable
21 | # before deploying this to your production clusters.
22 | #
23 | # WARNING: Enabling hyper-threading requires node reboot. Therefore, in order
24 | # to avoid disrupting your workloads, it is recommended to create a new node
25 | # pool with the "cloud.google.com/gke-smt-disabled=false" label in your cluster,
26 | # deploy the DaemonSet to enable hyper-threading in that node pool, and then
27 | # migrate your workloads to the new node pool.
28 |
29 | #
30 | # NOTE:
31 | # It's recommended to use the --threads-per-core flag on the node-pool to
32 | # configure SMT setting on nodes.
33 | # https://cloud.google.com/kubernetes-engine/docs/how-to/configure-smt
34 | #
35 |
36 | apiVersion: apps/v1
37 | kind: DaemonSet
38 | metadata:
39 | name: enable-smt
40 | namespace: kube-system
41 | spec:
42 | selector:
43 | matchLabels:
44 | name: enable-smt
45 | updateStrategy:
46 | type: RollingUpdate
47 | template:
48 | metadata:
49 | labels:
50 | name: enable-smt
51 | spec:
52 | tolerations:
53 | - operator: Exists
54 | volumes:
55 | - name: host
56 | hostPath:
57 | path: /
58 | hostPID: true
59 | initContainers:
60 | - name: smt
61 | image: bash
62 | command:
63 | - /usr/local/bin/bash
64 | - -c
65 | - |
66 | set -euo pipefail
67 | echo "SMT is set to $(cat /host/sys/devices/system/cpu/smt/control)"
68 | echo "Setting SMT to on";
69 | echo -n "on" > /host/sys/devices/system/cpu/smt/control
70 | echo "Restarting Kubelet..."
71 | chroot /host nsenter --target=1 --all -- systemctl restart kubelet.service
72 | volumeMounts:
73 | - name: host
74 | mountPath: /host
75 | resources:
76 | requests:
77 | memory: 5Mi
78 | cpu: 5m
79 | securityContext:
80 | privileged: true
81 | containers:
82 | - image: gcr.io/google-containers/pause:3.2
83 | name: pause
84 | # Ensures that the pods will only run on the nodes having the certain
85 | # label.
86 | nodeSelector:
87 | "cloud.google.com/gke-smt-disabled": "false"
88 |
--------------------------------------------------------------------------------
/disable-smt/gke/disable-smt.yaml:
--------------------------------------------------------------------------------
1 | # Copyright 2019 Google LLC
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # https://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 |
16 | # Deploy this DaemonSet to disable hyper-threading on the nodes with the
17 | # "cloud.google.com/gke-smt-disabled=true" label.
18 | #
19 | # WARNING: Disabling hyper-threading might have severe performance impact on
20 | # your clusters and application. Please ensure that this is acceptable before
21 | # deploying this to your production clusters.
22 | #
23 | # WARNING: Disabling hyper-threading requires node reboot. Therefore, in order
24 | # to avoid disrupting your workloads, it is recommended to create a new node
25 | # pool with the "cloud.google.com/gke-smt-disabled=true" label in your cluster,
26 | # deploy the DaemonSet to disable hyper-threading in that node pool, and then
27 | # migrate your workloads to the new node pool.
28 |
29 | #
30 | # NOTE:
31 | # It's recommended to use the --threads-per-core flag on the node-pool to
32 | # configure SMT setting on nodes.
33 | # https://cloud.google.com/kubernetes-engine/docs/how-to/configure-smt
34 | #
35 |
36 | apiVersion: apps/v1
37 | kind: DaemonSet
38 | metadata:
39 | name: disable-smt
40 | namespace: kube-system
41 | spec:
42 | selector:
43 | matchLabels:
44 | name: disable-smt
45 | updateStrategy:
46 | type: RollingUpdate
47 | template:
48 | metadata:
49 | labels:
50 | name: disable-smt
51 | spec:
52 | tolerations:
53 | - operator: Exists
54 | volumes:
55 | - name: host
56 | hostPath:
57 | path: /
58 | hostPID: true
59 | initContainers:
60 | - name: smt
61 | image: bash
62 | command:
63 | - /usr/local/bin/bash
64 | - -c
65 | - |
66 | set -euo pipefail
67 | echo "SMT is set to $(cat /host/sys/devices/system/cpu/smt/control)"
68 | echo "Setting SMT to off"
69 | echo -n "off" > /host/sys/devices/system/cpu/smt/control
70 | echo "Restarting Kubelet..."
71 | chroot /host nsenter --target=1 --all -- systemctl restart kubelet.service
72 | volumeMounts:
73 | - name: host
74 | mountPath: /host
75 | resources:
76 | requests:
77 | memory: 5Mi
78 | cpu: 5m
79 | securityContext:
80 | privileged: true
81 | containers:
82 | - image: gcr.io/google-containers/pause:3.2
83 | name: pause
84 | # Ensures that the pods will only run on the nodes having the certain
85 | # label.
86 | nodeSelector:
87 | "cloud.google.com/gke-smt-disabled": "true"
88 |
--------------------------------------------------------------------------------
/containerd/containerd-http-proxy/configure_http_proxy.yaml:
--------------------------------------------------------------------------------
1 | kind: DaemonSet
2 | apiVersion: apps/v1
3 | metadata:
4 | name: containerd-http-proxy
5 | spec:
6 | selector:
7 | matchLabels:
8 | name: containerd-http-proxy
9 | template:
10 | metadata:
11 | labels:
12 | name: containerd-http-proxy
13 | spec:
14 | hostPID: true
15 | volumes:
16 | - name: systemd-containerd-service
17 | hostPath:
18 | path: /etc/systemd/system/containerd.service.d
19 | type: DirectoryOrCreate
20 | initContainers:
21 | - name: startup-script
22 | image: gke.gcr.io/debian-base:bookworm-v1.0.0-gke.1
23 | imagePullPolicy: IfNotPresent
24 | securityContext:
25 | privileged: true
26 | volumeMounts:
27 | - name: systemd-containerd-service
28 | mountPath: /etc/systemd/system/containerd.service.d
29 | command:
30 | - /bin/sh
31 | - -c
32 | - |
33 | set -e
34 | set -u
35 |
36 | validate_proxy() {
37 | input_string=$1
38 |
39 | if echo "$input_string" | grep -q ' '; then
40 | echo "Error: Input cannot contain spaces. Input: '$input_string'"
41 | exit 1
42 | fi
43 |
44 | if echo "$input_string" | sed 1d | grep -q .; then
45 | echo "Error: Input cannot contain newline. Input: '$input_string'"
46 | exit 1
47 | fi
48 |
49 | if echo "$input_string" | grep -q -e '"' -e "'"; then
50 | echo "Error: Input cannot contain quotes. Input: '$input_string'"
51 | exit 1
52 | fi
53 | }
54 |
55 | validate_proxy "${HTTP_PROXY:-}"
56 | validate_proxy "${HTTPS_PROXY:-}"
57 | validate_proxy "${NO_PROXY:-localhost}"
58 |
59 |           cat > /etc/systemd/system/containerd.service.d/http-proxy.conf <<EOF
60 |           [Service]
61 |           Environment="HTTP_PROXY=${HTTP_PROXY:-}"
62 |           Environment="HTTPS_PROXY=${HTTPS_PROXY:-}"
63 |           Environment="NO_PROXY=${NO_PROXY:-localhost}"
64 |           EOF
65 | 
66 |           echo "Reloading systemd manager configuration..."
67 |           nsenter -t 1 -m -- systemctl daemon-reload
68 |           echo "Restarting containerd..."
69 |           nsenter -t 1 -m -- systemctl restart containerd
70 | 
71 |           echo "containerd HTTP proxy configuration applied" >&2
72 | 
73 | env:
74 | - name: HTTP_PROXY
75 | valueFrom:
76 | configMapKeyRef:
77 | name: containerd-proxy-configmap
78 | key: HTTP_PROXY
79 | - name: HTTPS_PROXY
80 | valueFrom:
81 | configMapKeyRef:
82 | name: containerd-proxy-configmap
83 | key: HTTPS_PROXY
84 | - name: NO_PROXY
85 | valueFrom:
86 | configMapKeyRef:
87 | name: containerd-proxy-configmap
88 | key: NO_PROXY
89 | optional: true
90 | containers:
91 | - name: pause-container
92 | image: gke.gcr.io/pause:3.7
93 | imagePullPolicy: IfNotPresent
94 |
95 |
--------------------------------------------------------------------------------
/troubleshooting/perf/perf-trace.yaml:
--------------------------------------------------------------------------------
1 | # Copyright 2019 Google LLC
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # https://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | #
15 | #
16 | #
17 | # Change TARGET_PGREP below to names of processes to trace.
18 | # Also change KERNEL_VERSION below to version being tested.
19 |
20 | apiVersion: apps/v1
21 | kind: DaemonSet
22 | metadata:
23 | name: enable-perf-trace
24 | labels:
25 | app: enable-perf-trace
26 | spec:
27 | selector:
28 | matchLabels:
29 | name: enable-perf-trace
30 | template:
31 | metadata:
32 | labels:
33 | name: enable-perf-trace
34 | spec:
35 | nodeSelector:
36 | "enable-perf": "true"
37 | hostPID: true
38 | volumes:
39 | - name: host
40 | hostPath:
41 | path: /
42 | containers:
43 | - name: enable-perf-trace
44 | image: gke.gcr.io/debian-base
45 | imagePullPolicy: Always
46 | volumeMounts:
47 | - name: host
48 | mountPath: /host
49 | securityContext:
50 | privileged: true
51 | env:
52 | # TARGET_PGREP options
53 | # Values:
54 | # empty or unset to do full trace
55 | # pgrep query to filter
56 | - name: TARGET_PGREP
57 | value: "kubelet"
58 | command:
59 | - /bin/bash
60 | - -c
61 | - |
62 |
63 | set -o errexit
64 | set -o pipefail
65 | set -o nounset
66 |
67 | MAX_LOG_SIZE="16777215" # 16 MB
68 | MAX_LOGS="2000"
69 | KERNEL_VERSION="5.0.0"
70 |
71 | apt-get update && apt-get install -y curl procps daemontools build-essential bison flex libelf-dev binutils-dev
72 | curl -O https://mirrors.edge.kernel.org/pub/linux/kernel/tools/perf/v"${KERNEL_VERSION}"/perf-"${KERNEL_VERSION}".tar.gz
73 | tar xzf perf-"${KERNEL_VERSION}".tar.gz
74 | make -C perf-"${KERNEL_VERSION}"/tools/perf install
75 | PERF="/root/bin/perf"
76 |
77 | d=$(date '+%Y-%m-%dT%H:%M:%SZ')
78 |
79 | out_dir="/host/var/log/perf_trace/${d}"
80 | mkdir -p "${out_dir}"
81 |
82 | echo "starting perf! will dump to ${out_dir}"
83 |
84 |           if [[ -z "${TARGET_PGREP:-}" ]]; then
85 | echo "full system perf trace"
86 | "${PERF}" trace |& multilog t s${MAX_LOG_SIZE} n${MAX_LOGS} "${out_dir}"
87 | else
88 | echo "PID perf trace"
89 | PIDS=$(pgrep "${TARGET_PGREP}" -d ",")
90 |             echo "starting perf on pids == ${PIDS}"
91 | "${PERF}" trace --pid="${PIDS}" |& multilog t s${MAX_LOG_SIZE} n${MAX_LOGS} "${out_dir}"
92 | fi
93 |
--------------------------------------------------------------------------------
/troubleshooting/enable-kdump/ubuntu-enable-kdump.yaml:
--------------------------------------------------------------------------------
1 | # Copyright 2019 Google LLC
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # https://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 | apiVersion: apps/v1
16 | kind: DaemonSet
17 | metadata:
18 | name: enable-kdump
19 | labels:
20 | app: enable-kdump
21 | spec:
22 | selector:
23 | matchLabels:
24 | name: enable-kdump
25 | template:
26 | metadata:
27 | labels:
28 | name: enable-kdump
29 | spec:
30 | nodeSelector:
31 | "cloud.google.com/gke-os-distribution": "ubuntu"
32 | "enable-kdump": "true"
33 | hostPID: true
34 | containers:
35 | - name: enable-kdump
36 | image: debian
37 | imagePullPolicy: Always
38 | securityContext:
39 | privileged: true
40 | command:
41 | - /usr/bin/nsenter
42 | - -t 1
43 | - -m
44 | - -u
45 | - -i
46 | - -n
47 | - -p
48 | - --
49 | - /bin/bash
50 | - -c
51 | - |
52 |
53 | set -o errexit
54 | set -o pipefail
55 | set -o nounset
56 |
57 | function check_kdump() {
58 | local kdump_show
59 | kdump_show=$(kdump-config show)
60 | if echo "${kdump_show}" | grep -q "ready to kdump"; then
61 | echo "ready to kdump!"
62 |
63 | echo "setting sysctls"
64 | sysctl -w kernel.hung_task_panic=1
65 | sysctl -w kernel.hung_task_timeout_secs=20
66 | echo "sysctls are set"
67 | else
68 |             echo "kdump is not set up and not ready"
69 | fi
70 | echo "kdump-config show ==> ${kdump_show}"
71 | echo "/proc/cmdline ==> $(cat /proc/cmdline)"
72 | }
73 |
74 | function install() {
75 | echo "installing kdump"
76 | apt-get update
77 | DEBIAN_FRONTEND=noninteractive apt-get install -y linux-crashdump
78 | sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT.*/GRUB_CMDLINE_LINUX_DEFAULT="\$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M"/g' /etc/default/grub.d/kdump-tools.cfg
79 | update-grub
80 | echo "kdump enabled; waiting for reboot in 10 secs..."
81 | ( sleep 10 && reboot ) &
82 |
83 | while true; do
84 | echo "$(date '+%Y-%m-%dT%H:%M:%SZ') waiting for reboot..."
85 | sleep 1
86 | done
87 | }
88 |
89 | if command -v "kdump-config" &> /dev/null; then
90 | check_kdump
91 | sleep 10
92 | else
93 | install
94 | fi
95 |
--------------------------------------------------------------------------------
/troubleshooting/enable-kdump/cos-enable-kdump.yaml:
--------------------------------------------------------------------------------
1 | # Copyright 2019 Google LLC
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # https://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 |
16 | # Deploy this DaemonSet to enable kdump on the COS nodes with the
17 | # "cloud.google.com/gke-kdump-enabled=true" label.
18 | #
19 | # WARNING: Enabling kdump requires node reboot. Therefore, in order to avoid
20 | # disrupting your workloads, it is recommended to create a new node pool with
21 | # the "cloud.google.com/gke-kdump-enabled=true" label in your cluster,
22 | # deploy the DaemonSet to enable kdump in that node pool, and then migrate
23 | # your workloads to the new node pool.
24 |
25 | apiVersion: apps/v1
26 | kind: DaemonSet
27 | metadata:
28 | name: enable-kdump
29 | namespace: kube-system
30 | spec:
31 | selector:
32 | matchLabels:
33 | name: enable-kdump
34 | updateStrategy:
35 | type: RollingUpdate
36 | template:
37 | metadata:
38 | labels:
39 | name: enable-kdump
40 | spec:
41 | volumes:
42 | - name: host
43 | hostPath:
44 | path: /
45 | initContainers:
46 | - name: enable-kdump
47 | image: ubuntu
48 | command:
49 | - /bin/bash
50 | - -c
51 | - |
52 | function verify_base_image {
53 | local id="$(grep "^ID=" /host/etc/os-release)"
54 | if [[ "${id#*=}" != "cos" ]]; then
55 | echo "This kdump feature switch is designed to run on Container-Optimized OS only"
56 | exit 0
57 | fi
58 | }
59 | function check_kdump_feature {
60 | chroot /host /usr/sbin/kdump_helper show
61 | }
62 | function enable_kdump_feature_and_reboot_if_needed {
63 | chroot /host /usr/sbin/kdump_helper enable
64 | local -r is_enabled=$(chroot /host /usr/sbin/kdump_helper show | grep "kdump enabled" | sed -rn "s/kdump enabled: (.*)/\1/p")
65 | local -r is_ready=$(chroot /host /usr/sbin/kdump_helper show | grep "kdump ready" | sed -rn "s/kdump ready: (.*)/\1/p")
66 | if [[ "${is_enabled}" == "true" && "${is_ready}" == "false" ]]; then
67 | echo "kdump is enabled. Rebooting for it to take effect."
68 | chroot /host systemctl reboot
69 | fi
70 | }
71 | verify_base_image
72 | check_kdump_feature
73 | enable_kdump_feature_and_reboot_if_needed
74 | resources:
75 | requests:
76 | memory: 5Mi
77 | cpu: 5m
78 | securityContext:
79 | privileged: true
80 | volumeMounts:
81 | - name: host
82 | mountPath: /host
83 | containers:
84 | - image: gke.gcr.io/pause:3.8@sha256:880e63f94b145e46f1b1082bb71b85e21f16b99b180b9996407d61240ceb9830
85 | name: pause
86 | nodeSelector:
87 | "cloud.google.com/gke-kdump-enabled": "true"
88 | "cloud.google.com/gke-os-distribution": "cos"
89 |
--------------------------------------------------------------------------------
/kubelet/kubelet-log-config/kubelet-log-config.yaml:
--------------------------------------------------------------------------------
1 | # Copyright 2024 Google LLC
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # https://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 | # Deploy this DaemonSet to configure kubelet log rotation on nodes with the
16 | # "kubelet-log-config=true" label.
17 | #
18 | # Change the values of CONTAINER_LOG_MAX_SIZE and CONTAINER_LOG_MAX_FILES to
19 | # suit your needs.
20 | #
21 | # WARNING: Changing the kubelet log rotation configuration requires a kubelet
22 | # restart. Therefore, in order to avoid disrupting your workloads, it is
23 | # recommended to create a new node pool with the "kubelet-log-config=true" label
24 | # in your cluster, deploy the DaemonSet to configure kubelet log rotation in
25 | # that node pool, and then migrate your workloads to the new node pool.
26 |
27 | apiVersion: apps/v1
28 | kind: DaemonSet
29 | metadata:
30 | name: kubelet-log-config
31 | namespace: kube-system
32 | spec:
33 | selector:
34 | matchLabels:
35 | name: kubelet-log-config
36 | updateStrategy:
37 | type: RollingUpdate
38 | template:
39 | metadata:
40 | labels:
41 | name: kubelet-log-config
42 | spec:
43 | tolerations:
44 | - operator: Exists
45 | volumes:
46 | - name: host
47 | hostPath:
48 | path: /
49 | hostPID: true
50 | initContainers:
51 | - name: kubelet-log-config
52 | image: gke.gcr.io/debian-base
53 | env:
54 | # The maximum size of the container log file before it is rotated.
55 | # Update the value as desired.
56 | - name: CONTAINER_LOG_MAX_SIZE
57 | value: "10Mi"
58 |           # The maximum number of log files retained for a container.
59 | # Update the value as desired.
60 | - name: CONTAINER_LOG_MAX_FILES
61 | value: "5"
62 | command:
63 | - /bin/bash
64 | - -c
65 | - |
66 | set -xeuo pipefail
67 |
68 | # Configure the kubelet log rotation behavior.
69 | # $1: Field name in kubelet configuration.
70 | # $2: Value for the kubelet config field.
71 | function set-kubelet-log-config() {
72 | [[ "$#" -eq 2 ]] || return
73 | local field; field="$1"; shift
74 | local value; value="$1"; shift
75 |
76 | local config; config="/host/home/kubernetes/kubelet-config.yaml"
77 |
78 | echo "Remove existing configuration for ${field} if there is any."
79 | sed -i "/${field}/d" "${config}"
80 |
81 | echo "Set ${field} to ${value}."
82 | echo "${field}: ${value}" >> "${config}"
83 | }
84 |
85 | set-kubelet-log-config containerLogMaxSize "${CONTAINER_LOG_MAX_SIZE}"
86 | set-kubelet-log-config containerLogMaxFiles "${CONTAINER_LOG_MAX_FILES}"
87 |
88 | echo "Restarting kubelet..."
89 | chroot /host nsenter -a -t1 -- systemctl restart kubelet.service
90 |
91 | echo "Success!"
92 | volumeMounts:
93 | - name: host
94 | mountPath: /host
95 | resources:
96 | requests:
97 | memory: 5Mi
98 | cpu: 5m
99 | securityContext:
100 | privileged: true
101 | containers:
102 | - image: gcr.io/google-containers/pause:3.2
103 | name: pause
104 | # Ensures that the pods will only run on the nodes having the correct
105 | # label.
106 | nodeSelector:
107 | "kubelet-log-config": "true"
108 |
--------------------------------------------------------------------------------
/disable-smt/gke/READ.md:
--------------------------------------------------------------------------------
1 | # SMT (Hyper-Threading) Configuration
2 |
3 | These tools modify the Simultaneous Multithreading (SMT), also known as hyper-threading, setting on GKE nodes.
4 |
5 | ### **Important Note**
6 | The officially recommended method for configuring SMT on GKE is to use the `--threads-per-core` flag when creating or updating a node pool with the `gcloud` command-line tool. These DaemonSets should be considered an alternative method. You can find more information in the [official GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/configure-smt).
7 |
8 | ---
9 | ## **`enable-smt.yaml` (Enable Hyper-Threading)**
10 |
11 | The `enable-smt` tool is a Kubernetes DaemonSet that enables Simultaneous Multithreading (SMT) on targeted GKE nodes. This is important for increasing the potential CPU throughput for highly parallel workloads by allowing each physical CPU core to execute multiple threads concurrently. 🧠
12 |
13 | #### ⚠️ Warning
14 | * This tool runs as **privileged** and **restarts the kubelet**, which can disrupt workloads on the node.
15 | * Enabling SMT may expose the node to security vulnerabilities like **Microarchitectural Data Sampling (MDS)**.
16 | * It is highly recommended to apply this configuration to a new, empty node pool and then safely migrate your workloads to the newly configured nodes.
17 |
18 | #### **Prerequisites**
19 | This DaemonSet only runs on nodes with a specific label. You must label the node(s) you want to configure:
20 | ```bash
21 | kubectl label node NODE_NAME cloud.google.com/gke-smt-disabled=false
22 | ```
23 | #### **How to use it**
24 | Save the manifest to a file named `enable-smt.yaml`.
25 |
26 | Apply the DaemonSet to your cluster:
27 |
28 | ```bash
29 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/disable-smt/gke/enable-smt.yaml
30 | ```
31 | #### **How to check the result**
32 | SSH into the labeled node and check the SMT control file. The output should be `on`.
33 |
34 | ```bash
35 | cat /sys/devices/system/cpu/smt/control
36 | ```
37 | You can also check the pod's logs in the `kube-system` namespace to see the script's output.
38 |
39 | #### **How to remove it**
40 | ```bash
41 | kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/disable-smt/gke/enable-smt.yaml
42 | ```
43 | Note: Removing the DaemonSet does not disable SMT on already-configured nodes.
44 | ## **`disable-smt.yaml` (Disable Hyper-Threading)**
45 | The `disable-smt` tool is a Kubernetes DaemonSet that disables Simultaneous Multithreading (SMT) on targeted GKE nodes. This is important for mitigating certain CPU security vulnerabilities (like MDS) or for ensuring predictable performance for workloads that do not benefit from hyper-threading. 🔒
46 |
47 | #### ⚠️ Warning
48 | * This tool runs as **privileged** and **restarts the kubelet**, which can disrupt workloads on the node.
49 |
50 | * Disabling SMT may have a severe performance impact on applications that rely on high thread counts.
51 |
52 | * It is highly recommended to apply this configuration to a new, empty node pool and then safely migrate your workloads.
53 |
54 | #### **Prerequisites**
55 | This DaemonSet only runs on nodes with a specific label. You must label the node(s) you want to configure:
56 |
57 | ```bash
58 | kubectl label node NODE_NAME cloud.google.com/gke-smt-disabled=true
59 | ```
60 | #### **How to use it**
61 | Save the manifest to a file named `disable-smt.yaml`.
62 |
63 | Apply the DaemonSet to your cluster:
64 |
65 | ```bash
66 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/disable-smt/gke/disable-smt.yaml
67 | ```
68 | #### **How to check the result**
69 | SSH into the labeled node and check the SMT control file. The output should be `off`.
70 |
71 | ```bash
72 | cat /sys/devices/system/cpu/smt/control
73 | ```
74 | You can also check the pod's logs in the `kube-system` namespace to see the script's output.
75 |
76 | #### **How to remove it**
77 |
78 | ```bash
79 | kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/disable-smt/gke/disable-smt.yaml
80 | ```
81 | Note: Removing the DaemonSet does not re-enable SMT on already-configured nodes.
--------------------------------------------------------------------------------
/containerd/socket-tracer/containerd-socket-tracer.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: apps/v1
2 | kind: DaemonSet
3 | metadata:
4 | name: containerd-socket-tracer
5 | namespace: default
6 | labels:
7 | k8s-app: containerd-socket-tracer
8 | spec:
9 | selector:
10 | matchLabels:
11 | name: containerd-socket-tracer
12 | template:
13 | metadata:
14 | labels:
15 | name: containerd-socket-tracer
16 | annotations:
17 | autopilot.gke.io/no-connect: "true"
18 | spec:
19 | hostPID: true
20 | containers:
21 | - name: tracer
22 | image: mirror.gcr.io/ubuntu:24.04
23 | command: ["/bin/sh", "-c"]
24 | args:
25 | - |
26 | apt-get update && apt-get install -y bpftrace && \
27 |
28 | start_time=$(date -u +'%Y-%m-%dT%H:%M:%SZ') && \
29 | echo "time=\"$start_time\" msg=\"eBPF tracepoint for containerd socket connections started\" node=\"$NODE_NAME\"" && \
30 |
31 | bpftrace -e '
32 |               #include <linux/un.h>
33 |
34 | // Log all PIDs and commands connecting to the containerd socket.
35 | tracepoint:syscalls:sys_enter_connect
36 | // Skip commands we know are not using the deprecated API.
37 | // Skip "crictl" (used below) to prevent creating a loop.
38 | // NOTE: You can update this filter with more commands as needed.
39 | /comm != "kubelet" && comm != "containerd" && comm != "ctr" && comm != "crictl"/ {
40 | $sa = (struct sockaddr_un *)args->uservaddr;
41 | if ($sa->sun_family == AF_UNIX &&
42 | strcontains($sa->sun_path, "containerd.sock") &&
43 | !strcontains($sa->sun_path, "containerd.sock.ttrpc")) {
44 | printf("%d %s\n", pid, comm);
45 | }
46 | }' | {
47 | # Skip parsing bpftrace header text.
48 | read -r _
49 |
50 | # Query CRI for the container with that PID.
51 | while read -r pid comm; do
52 | current_pid="$pid"
53 | while true; do
54 | output=$(nsenter -at 1 /home/kubernetes/bin/crictl inspect --output go-template --template '
55 | {{- range . -}}
56 | {{- if eq .info.pid "'"$current_pid"'" -}}
57 | {{- $time := "'"$(date -u +'%Y-%m-%dT%H:%M:%SZ')"'" -}}
58 | {{- $node := "'"$NODE_NAME"'" -}}
59 | {{- $namespace := index .info.runtimeSpec.annotations "io.kubernetes.cri.sandbox-namespace" -}}
60 | {{- $name := index .info.runtimeSpec.annotations "io.kubernetes.cri.sandbox-name" -}}
61 | {{- $container := index .info.runtimeSpec.annotations "io.kubernetes.cri.container-name" -}}
62 | {{- printf "time=\"%s\" msg=\"containerd socket connection opened\" node=\"%s\" pod=\"%s/%s\" container=\"%s\" comm=\"%s\"" $time $node $namespace $name $container "'"$comm"'" -}}
63 | {{- end -}}
64 | {{- end -}}'
65 | )
66 |
67 | if [ -n "$output" ]; then
68 | echo "$output"
69 | break
70 | fi
71 |
72 | # If it cannot be found, then walk up the ancestor tree to find the main container process.
73 | if ! ppid=$(ps -o ppid= -p "$current_pid" 2>/dev/null | tr -d ' '); then
74 | break
75 | fi
76 | if [ -z "$ppid" ] || [ "$ppid" -eq 1 ]; then
77 | break
78 | fi
79 | current_pid="$ppid"
80 | done
81 | done
82 | }
83 | resources:
84 | requests:
85 | ephemeral-storage: "500Mi"
86 | limits:
87 | ephemeral-storage: "500Mi"
88 | env:
89 | - name: NODE_NAME
90 | valueFrom:
91 | fieldRef:
92 | fieldPath: spec.nodeName
93 | securityContext:
94 | # Privileged is required for the bpftrace tool to use eBPF syscalls.
95 | privileged: true
96 | volumeMounts:
97 | - name: debugfs
98 | mountPath: /sys/kernel/debug
99 | readOnly: true
100 | volumes:
101 | # debugfs is required by bpftrace to access kernel tracepoints.
102 | - name: debugfs
103 | hostPath:
104 | path: /sys/kernel/debug
105 | tolerations:
106 | - key: "node.kubernetes.io/not-ready"
107 | operator: "Exists"
108 | effect: "NoExecute"
109 |
--------------------------------------------------------------------------------
/troubleshooting/os-audit/cos-auditd-logging.yaml:
--------------------------------------------------------------------------------
1 | # Copyright 2019 Google LLC
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # https://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 | apiVersion: v1
16 | kind: Namespace
17 | metadata:
18 | name: cos-auditd
19 | ---
20 | apiVersion: apps/v1
21 | kind: DaemonSet
22 | metadata:
23 | name: cos-auditd-logging
24 | namespace: cos-auditd
25 | annotations:
26 | kubernetes.io/description: 'DaemonSet that enables Linux auditd logging on non-Autopilot COS nodes.'
27 | spec:
28 | selector:
29 | matchLabels:
30 | name: cos-auditd-logging
31 | template:
32 | metadata:
33 | labels:
34 | name: cos-auditd-logging
35 | spec:
36 | # Necessary for ensuring access to Google Cloud credentials from the node's metadata server.
37 | hostNetwork: true
38 | hostPID: true
39 | dnsPolicy: Default
40 | initContainers:
41 | - name: cos-auditd-setup
42 | image: ubuntu
43 | command: ["chroot", "/host", "systemctl", "start", "cloud-audit-setup"]
44 | securityContext:
45 | privileged: true
46 | volumeMounts:
47 | - name: host
48 | mountPath: /host
49 | resources:
50 | requests:
51 | memory: "10Mi"
52 | cpu: "10m"
53 | containers:
54 | - name: cos-auditd-fluent-bit
55 | securityContext:
56 | allowPrivilegeEscalation: false
57 | capabilities:
58 | drop:
59 | - all
60 | add:
61 | - DAC_OVERRIDE
62 | env:
63 | - name: NODE_NAME
64 | valueFrom:
65 | fieldRef:
66 | apiVersion: v1
67 | fieldPath: spec.nodeName
68 | # Substitute these (manually or via envsubst). For example, run
69 | # `CLUSTER_NAME=example-cluster CLUSTER_LOCATION=us-central1-a envsubst '$CLUSTER_NAME,$CLUSTER_LOCATION' < ${THIS_FILE:?} | kubectl apply -f -`
70 | - name: CLUSTER_NAME
71 | value: "$CLUSTER_NAME"
72 | - name: CLUSTER_LOCATION
73 | value: "$CLUSTER_LOCATION"
74 | # This image is used for demo purposes. The best practice is to use the image from controlled registry and reference it by SHA.
75 | image: fluent/fluent-bit:latest
76 | imagePullPolicy: IfNotPresent
77 | livenessProbe:
78 | httpGet:
79 | path: /
80 | port: 2024
81 | initialDelaySeconds: 120
82 | periodSeconds: 60
83 | timeoutSeconds: 5
84 | ports:
85 | - name: metrics
86 | containerPort: 2024
87 | resources:
88 | limits:
89 | cpu: "1"
90 | memory: 500Mi
91 | requests:
92 | cpu: 100m
93 | memory: 200Mi
94 | terminationMessagePath: /dev/termination-log
95 | terminationMessagePolicy: File
96 | volumeMounts:
97 | - mountPath: /var/log
98 | name: varlog
99 | - mountPath: /var/lib/cos-auditd-fluent-bit/pos-files
100 | name: varlib-cos-auditd-fluent-bit-pos-files
101 | - mountPath: /fluent-bit/etc
102 | name: config-volume
103 | nodeSelector:
104 | cloud.google.com/gke-os-distribution: cos
105 | restartPolicy: Always
106 | terminationGracePeriodSeconds: 120
107 | tolerations:
108 | - operator: "Exists"
109 | effect: "NoExecute"
110 | - operator: "Exists"
111 | effect: "NoSchedule"
112 | volumes:
113 | - name: host
114 | hostPath:
115 | path: /
116 | - name: varlog
117 | hostPath:
118 | path: /var/log
119 | - name: varlibcos-auditd-fluent-bit
120 | hostPath:
121 | path: /var/lib/cos-auditd-fluent-bit
122 | type: DirectoryOrCreate
123 | - name: varlib-cos-auditd-fluent-bit-pos-files
124 | hostPath:
125 | path: /var/lib/cos-auditd-fluent-bit/pos-files
126 | type: DirectoryOrCreate
127 | - name: config-volume
128 | configMap:
129 | name: cos-auditd-fluent-bit-config
130 | updateStrategy:
131 | type: RollingUpdate
132 | ---
133 | kind: ConfigMap
134 | apiVersion: v1
135 | metadata:
136 | name: cos-auditd-fluent-bit-config
137 | namespace: cos-auditd
138 | annotations:
139 | kubernetes.io/description: 'ConfigMap for Linux auditd logging daemonset on COS nodes.'
140 | data:
141 | fluent-bit.conf: |-
142 | [SERVICE]
143 | Flush 5
144 | Grace 120
145 | Log_Level info
146 | Daemon off
147 | HTTP_Server On
148 | HTTP_Listen 0.0.0.0
149 | HTTP_PORT 2024
150 |
151 | [INPUT]
152 | # https://docs.fluentbit.io/manual/input/systemd
153 | Name systemd
154 | Alias audit
155 | Tag audit
156 | Systemd_Filter SYSLOG_IDENTIFIER=audit
157 | Path /var/log/journal
158 | DB /var/lib/cos-auditd-fluent-bit/pos-files/audit.db
159 |
160 | [FILTER]
161 | # https://docs.fluentbit.io/manual/pipeline/filters/modify
162 | Name modify
163 | Match audit
164 | Add logging.googleapis.com/local_resource_id k8s_node.${NODE_NAME}
165 |
166 | [FILTER]
167 | Name modify
168 | Match audit
169 | Add logging.googleapis.com/logName linux-auditd
170 |
171 | [OUTPUT]
172 | # https://docs.fluentbit.io/manual/pipeline/outputs/stackdriver
173 | Name stackdriver
174 | Match audit
175 | Severity_key severity
176 | log_name_key logging.googleapis.com/logName
177 | Resource k8s_node
178 |         # The plugin will read the project ID from the metadata server, but not the cluster name and location, so they must be injected.
179 | k8s_cluster_name ${CLUSTER_NAME}
180 | k8s_cluster_location ${CLUSTER_LOCATION}
181 | net.connect_timeout 60
182 | Retry_Limit 14
183 | Workers 1
184 |
--------------------------------------------------------------------------------
/containerd/migrating-to-containerd/find-nodepools-to-migrate.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # [START gke_node_find_non_containerd_nodepools]
4 | for project in $(gcloud projects list --format="value(projectId)")
5 | do
6 | echo "ProjectId: $project"
7 | for clusters in $( \
8 | gcloud container clusters list \
9 | --project $project \
10 | --format="csv[no-heading](name,location,autopilot.enabled,currentMasterVersion,autoscaling.enableNodeAutoprovisioning,autoscaling.autoprovisioningNodePoolDefaults.imageType)")
11 | do
12 | IFS=',' read -r -a clustersArray <<< "$clusters"
13 | cluster_name="${clustersArray[0]}"
14 | cluster_zone="${clustersArray[1]}"
15 | cluster_isAutopilot="${clustersArray[2]}"
16 | cluster_version="${clustersArray[3]}"
17 | cluster_minorVersion=${cluster_version:0:4}
18 | cluster_autoprovisioning="${clustersArray[4]}"
19 | cluster_autoprovisioningImageType="${clustersArray[5]}"
20 |
21 | if [ "$cluster_isAutopilot" = "True" ]; then
22 | echo " Cluster: $cluster_name (autopilot) (zone: $cluster_zone)"
23 | echo " Autopilot clusters are running Containerd."
24 | else
25 | echo " Cluster: $cluster_name (zone: $cluster_zone)"
26 |
27 | if [ "$cluster_autoprovisioning" = "True" ]; then
28 | if [ "$cluster_minorVersion" \< "1.20" ]; then
29 | echo " Node autoprovisioning is enabled, and new node pools will have image type 'COS'."
30 |       echo "    This setting is not configurable in the current cluster version."
31 |       echo "    Please upgrade your cluster and configure the default node autoprovisioning image type."
32 | echo " "
33 | else
34 | if [ "$cluster_autoprovisioningImageType" = "COS" ]; then
35 | echo " Node autoprovisioning is configured to create new node pools of type 'COS'."
36 | echo " Run the following command to update:"
37 | echo " gcloud container clusters update '$cluster_name' --project '$project' --zone '$cluster_zone' --enable-autoprovisioning --autoprovisioning-image-type='COS_CONTAINERD'"
38 | echo " "
39 | fi
40 |
41 | if [ "$cluster_autoprovisioningImageType" = "UBUNTU" ]; then
42 | echo " Node autoprovisioning is configured to create new node pools of type 'UBUNTU'."
43 | echo " Run the following command to update:"
44 | echo " gcloud container clusters update '$cluster_name' --project '$project' --zone '$cluster_zone' --enable-autoprovisioning --autoprovisioning-image-type='UBUNTU_CONTAINERD'"
45 | echo " "
46 | fi
47 | fi
48 | fi
49 |
50 | for nodepools in $( \
51 | gcloud container node-pools list \
52 | --project $project \
53 | --cluster $cluster_name \
54 | --zone $cluster_zone \
55 | --format="csv[no-heading](name,version,config.imageType)")
56 | do
57 | IFS=',' read -r -a nodepoolsArray <<< "$nodepools"
58 | nodepool_name="${nodepoolsArray[0]}"
59 | nodepool_version="${nodepoolsArray[1]}"
60 | nodepool_imageType="${nodepoolsArray[2]}"
61 |
62 | nodepool_minorVersion=${nodepool_version:0:4}
63 |
64 | echo " Nodepool: $nodepool_name, version: $nodepool_version ($nodepool_minorVersion), image: $nodepool_imageType"
65 |
66 | minorVersionWithRev="${nodepool_version/-gke./.}"
67 | linuxGkeMinVersion="1.14"
68 | windowsGkeMinVersion="1.21.1.2200"
69 |
70 | suggestedImageType="COS_CONTAINERD"
71 |
72 | if [ "$nodepool_imageType" = "UBUNTU" ]; then
73 | suggestedImageType="UBUNTU_CONTAINERD"
74 | elif [ "$nodepool_imageType" = "WINDOWS_LTSC" ]; then
75 | suggestedImageType="WINDOWS_LTSC_CONTAINERD"
76 | elif [ "$nodepool_imageType" = "WINDOWS_SAC" ]; then
77 | suggestedImageType="WINDOWS_SAC_CONTAINERD"
78 | fi
79 |
80 | tab=$'\n ';
81 | nodepool_message="$tab Please update the nodepool to use Containerd."
82 |           nodepool_message+="$tab Make sure to consult the list of known issues https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd#known_issues."
83 | nodepool_message+="$tab Run the following command to upgrade:"
84 | nodepool_message+="$tab "
85 | nodepool_message+="$tab gcloud container clusters upgrade '$cluster_name' --project '$project' --zone '$cluster_zone' --image-type '$suggestedImageType' --node-pool '$nodepool_name'"
86 | nodepool_message+="$tab "
87 |
88 | # see https://cloud.google.com/kubernetes-engine/docs/concepts/node-images
89 | if [ "$nodepool_imageType" = "COS_CONTAINERD" ] || [ "$nodepool_imageType" = "UBUNTU_CONTAINERD" ] ||
90 | [ "$nodepool_imageType" = "WINDOWS_LTSC_CONTAINERD" ] || [ "$nodepool_imageType" = "WINDOWS_SAC_CONTAINERD" ]; then
91 | nodepool_message="$tab Nodepool is using Containerd already"
92 | elif ( [ "$nodepool_imageType" = "WINDOWS_LTSC" ] || [ "$nodepool_imageType" = "WINDOWS_SAC" ] ) &&
93 | [ "$(printf '%s\n' "$windowsGkeMinVersion" "$minorVersionWithRev" | sort -V | head -n1)" != "$windowsGkeMinVersion" ]; then
94 | nodepool_message="$tab Upgrade nodepool to the version that supports Containerd for Windows"
95 | elif [ "$(printf '%s\n' "$linuxGkeMinVersion" "$minorVersionWithRev" | sort -V | head -n1)" != "$linuxGkeMinVersion" ]; then
96 | nodepool_message="$tab Upgrade nodepool to the version that supports Containerd"
97 | fi
98 | echo "$nodepool_message"
99 | done
100 | fi # not autopilot
101 | done
102 | done
103 |
104 | # Sample output:
105 | #
106 | # ProjectId: my-project-id
107 | # Cluster: autopilot-cluster-1 (autopilot) (zone: us-central1)
108 | # Autopilot clusters are running Containerd.
109 | # Cluster: cluster-1 (zone: us-central1-c)
110 | # Nodepool: default-pool, version: 1.18.12-gke.1210 (1.18), image: COS
111 | #
112 | # Please update the nodepool to use Containerd.
113 | #        Make sure to consult the list of known issues https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd#known_issues.
114 | # Run the following command to upgrade:
115 | #
116 | # gcloud container clusters upgrade 'cluster-1' --project 'my-project-id' --zone 'us-central1-c' --image-type 'COS_CONTAINERD' --node-pool 'default-pool'
117 | #
118 | # Nodepool: pool-1, version: 1.18.12-gke.1210 (1.18), image: COS
119 | #
120 | # Please update the nodepool to use Containerd.
121 | #        Make sure to consult the list of known issues https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd#known_issues.
122 | # Run the following command to upgrade:
123 | #
124 | # gcloud container clusters upgrade 'cluster-1' --project 'my-project-id' --zone 'us-central1-c' --image-type 'COS_CONTAINERD' --node-pool 'pool-1'
125 | #
126 | # Nodepool: winpool, version: 1.18.12-gke.1210 (1.18), image: WINDOWS_SAC
127 | #
128 | # Upgrade nodepool to the version that supports Containerd for Windows
129 | #
130 | # Cluster: another-test-cluster (zone: us-central1-c)
131 | # Nodepool: default-pool, version: 1.20.4-gke.400 (1.20), image: COS_CONTAINERD
132 | #
133 | # Nodepool is using Containerd already
134 | #
135 | # [END gke_node_find_non_containerd_nodepools]
136 | #
137 |
--------------------------------------------------------------------------------
/containerd/socket-tracer/README.md:
--------------------------------------------------------------------------------
1 | # Tracing containerd Socket Connections
2 |
3 | The `containerd-socket-tracer.yaml` is a DaemonSet that monitors and logs
4 | connections made to the `containerd` socket. This is particularly useful for
5 | identifying which containers are interacting with the container runtime
6 | directly.
7 |
8 | Before deploying, please review the
9 | [important considerations](#important-considerations)
10 | regarding potential performance impact and system conflicts.
11 |
12 | ### How It Works
13 |
14 | This tool leverages [eBPF](https://ebpf.io/) to trace system calls. It watches
15 | for any process that attempts to open a connection to the `containerd` socket.
16 | Once a connection is detected, the tracer identifies the PID and queries the
17 | node's [CRI](https://kubernetes.io/docs/concepts/architecture/cri/) to resolve
18 | the corresponding Pod and container details.
19 |
20 | ## Example Use Case: Identifying Deprecated API Clients
21 |
22 | A practical application for this tracer is to identify applications that are
23 | still using the
24 | [deprecated CRI v1alpha2 API](https://github.com/containerd/containerd/blob/v2.1.2/RELEASES.md?plain=1#L167).
25 | As `containerd` moves towards newer API versions, it's crucial to find and update
26 | any clients using outdated versions.
27 |
28 | The following sections describe two methods for finding these clients: a manual
29 | approach using SSH and an automated approach using a companion workload.
30 |
31 | ### Method 1: Manual Correlation via SSH
32 |
33 | While the tracer is running, it will generate log entries for every container
34 | that establishes a socket connection.
35 |
36 | **Example Log Output:**
37 |
38 | ```
39 | time="2025-06-16T18:19:10Z" msg="eBPF tracepoint for containerd socket connections started" node="gke-cluster-default-pool-1e092676-x8m9"
40 | time="2025-06-16T18:19:30Z" msg="containerd socket connection opened" node="gke-cluster-default-pool-1e092676-x8m9" pod="default/cri-v1alpha2-api-client-dmghs" container="deployment-container-1" comm="v1alpha2_client"
41 | ```
42 |
43 | If your version of `containerd` includes the
44 | [deprecation service](https://samuel.karp.dev/blog/2024/01/deprecation-warnings-in-containerd-getting-ready-for-2.0/),
45 | you can correlate the timestamp from the tracer's log with the `lastOccurrence`
46 | of the deprecated API call. This allows you to pinpoint the exact application
47 | making the call.
48 |
49 | **Correlating with `containerd` Deprecation Warnings:**
50 |
51 | ```sh
52 | $ ctr deprecations list --format json | jq '.[] | select(.id == "io.containerd.deprecation/cri-api-v1alpha2")'
53 | {
54 | "id": "io.containerd.deprecation/cri-api-v1alpha2",
55 | "message": "CRI API v1alpha2 is deprecated since containerd v1.7 and removed in containerd v2.0. Use CRI API v1 instead.",
56 | "lastOccurrence": "2025-06-16T18:21:30.959558222Z"
57 | }
58 | ```
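
To make the comparison concrete, the gap between the two timestamps can be
computed with standard shell tools. A minimal sketch using GNU `date` (the
timestamp values are placeholders, not a matched pair of events):

```sh
#!/usr/bin/env bash
# Compute the absolute gap, in seconds, between a tracer log timestamp
# and containerd's lastOccurrence value.
tracer_ts="2025-06-16T18:19:30Z"        # from the tracer log
last_occurrence="2025-06-16T18:21:30Z"  # from `ctr deprecations list`

t1=$(date -u -d "$tracer_ts" +%s)
t2=$(date -u -d "$last_occurrence" +%s)
delta=$(( t2 - t1 ))
delta=${delta#-}  # absolute value

echo "delta: ${delta}s"
```

A gap of a few seconds or less, on the same node, is a strong signal that the
traced connection corresponds to the deprecated API call.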
59 |
60 | ### Method 2: Automated Correlation with a Reporter Workload
61 |
62 | To simplify this process, a second workload,
63 | `cri-v1alpha2-api-deprecation-reporter.yaml`, is provided. This DaemonSet
64 | periodically logs the occurrence of the last CRI v1alpha2 API call.
65 |
66 | **Example Log Output:**
67 |
68 | ```
69 | time="2025-06-16T18:22:19Z" msg="checking for CRI v1alpha2 API deprecation warnings" node="gke-cluster-default-pool-1e092676-x8m9"
70 | time="2025-06-16T18:22:19Z" msg="found CRI v1alpha2 API deprecation warning" node="gke-cluster-default-pool-1e092676-x8m9" lastOccurrence="2025-06-16T18:21:30.959558222Z"
71 | ```
72 |
73 | ### Putting It All Together: Finding the Client
74 |
75 | With both DaemonSets deployed and running, you can find the deprecated API client by correlating the logs from both tools.
76 |
77 | In one terminal, stream the logs from the reporter to watch for deprecation events:
78 |
79 | ```sh
80 | $ kubectl logs -f -l name=cri-v1alpha2-api-deprecation-reporter
81 | ```
82 |
83 | In a second terminal, stream the logs from the tracer to watch for new connections:
84 |
85 | ```sh
86 | $ kubectl logs -f -l name=containerd-socket-tracer
87 | ```
88 |
89 | **Wait and Compare:** When a `lastOccurrence` timestamp appears in the reporter's log, look for a "containerd socket connection opened" event in the tracer's log that occurred on the **same node** at nearly the **exact same time**.
90 |
91 | Alternatively, if your cluster is configured to send logs to a centralized platform, you can run a single query to see the aggregated logs from all nodes at once. This is the recommended approach for analyzing historical data and correlating events across your entire cluster.
92 |
93 | For example, users on Google Kubernetes Engine (GKE) can use the following query in Cloud Logging to view the output from both workloads:
94 |
95 | ```
96 | resource.type="k8s_container"
97 | (
98 | labels."k8s-pod/name"="containerd-socket-tracer"
99 | OR
100 | labels."k8s-pod/name"="cri-v1alpha2-api-deprecation-reporter"
101 | )
102 | ```
103 |
104 | This correlation between the two log events pinpoints the exact pod and container that is responsible for the deprecated API call.
105 |
106 | ### Filtering Commands (`comm`)
107 |
108 | To reduce noise, the underlying `bpftrace` script is configured to ignore a
109 | default set of common, node-level commands (like `kubelet` and `containerd`
110 | itself). The primary focus is on identifying connections from other,
111 | containerized applications.
112 |
113 | You can customize this behavior by modifying the `bpftrace`
114 | [filter](https://github.com/bpftrace/bpftrace/blob/v0.23.5/man/adoc/bpftrace.adoc#filterspredicates)
115 | to include or exclude other commands as needed.
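
As an illustration only (the actual probe in `containerd-socket-tracer.yaml`
also checks the socket path and differs in detail), a `bpftrace` predicate that
skips well-known node agents looks roughly like this:

```
tracepoint:syscalls:sys_enter_connect
/comm != "kubelet" && comm != "containerd"/
{
  printf("connect() attempted by %s (pid %d)\n", comm, pid);
}
```

Adding or removing `comm != "..."` clauses in the predicate widens or narrows
what gets reported.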
116 |
117 | ## Important Considerations
118 |
119 | Please review the following disclaimers before deploying this workload.
120 |
121 | ### Production Use
122 |
123 | Always test this tool in a dedicated test environment before deploying to
124 | production. For production rollouts, use exposure control by deploying to a
125 | small subset of nodes first to monitor for any adverse effects.
126 |
127 | This tool is intended for temporary, targeted debugging, not for prolonged
128 | execution. Its main purpose is to help identify workloads violating containerd
129 | deprecations when other detection methods have been unsuccessful.
130 |
131 | ### CPU Overhead
132 |
133 | Enabling this tracer may increase the CPU load on your nodes. This is especially
134 | true if the `comm` filter is not restrictive enough and there is a high volume
135 | of socket connections.
136 |
137 | ### Potential eBPF Conflicts
138 |
139 | Running this tracer on nodes where other eBPF-based tools are already active may
140 | lead to unexpected behavior or conflicts.
141 |
142 | ## Installation
143 |
144 | To deploy the tracer to your cluster, apply the `containerd-socket-tracer.yaml`
145 | manifest using `kubectl`.
146 |
147 | ```sh
148 | $ kubectl apply -f containerd-socket-tracer.yaml
149 | ```
150 |
151 | And to deploy the optional API reporter:
152 |
153 | ```sh
154 | $ kubectl apply -f cri-v1alpha2-api-deprecation-reporter.yaml
155 | ```
156 |
157 | ### GKE Autopilot Users
158 |
159 | GKE Autopilot clusters enforce security policies that prevent workloads
160 | requiring privileged access from running by default. To deploy the
161 | `containerd-socket-tracer` and its companion
162 | `cri-v1alpha2-api-deprecation-reporter`, you must first install the
163 | corresponding `AllowlistSynchronizer` resources in your cluster.
164 |
165 | These synchronizers enable the workloads to run on Autopilot nodes by matching
166 | them with a `WorkloadAllowlist`.
167 |
168 | To install the allowlists, apply the following manifests:
169 |
170 | ```sh
171 | $ kubectl apply -f containerd-socket-tracer-allowlist.yaml
172 | $ kubectl apply -f cri-v1alpha2-api-deprecation-reporter-allowlist.yaml
173 | ```
174 |
175 | After applying these manifests and allowing a few moments for the allowlists to
176 | synchronize, you can deploy the `containerd-socket-tracer` and
177 | `cri-v1alpha2-api-deprecation-reporter` DaemonSets as described above.
178 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 |
2 | Apache License
3 | Version 2.0, January 2004
4 | http://www.apache.org/licenses/
5 |
6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7 |
8 | 1. Definitions.
9 |
10 | "License" shall mean the terms and conditions for use, reproduction,
11 | and distribution as defined by Sections 1 through 9 of this document.
12 |
13 | "Licensor" shall mean the copyright owner or entity authorized by
14 | the copyright owner that is granting the License.
15 |
16 | "Legal Entity" shall mean the union of the acting entity and all
17 | other entities that control, are controlled by, or are under common
18 | control with that entity. For the purposes of this definition,
19 | "control" means (i) the power, direct or indirect, to cause the
20 | direction or management of such entity, whether by contract or
21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
22 | outstanding shares, or (iii) beneficial ownership of such entity.
23 |
24 | "You" (or "Your") shall mean an individual or Legal Entity
25 | exercising permissions granted by this License.
26 |
27 | "Source" form shall mean the preferred form for making modifications,
28 | including but not limited to software source code, documentation
29 | source, and configuration files.
30 |
31 | "Object" form shall mean any form resulting from mechanical
32 | transformation or translation of a Source form, including but
33 | not limited to compiled object code, generated documentation,
34 | and conversions to other media types.
35 |
36 | "Work" shall mean the work of authorship, whether in Source or
37 | Object form, made available under the License, as indicated by a
38 | copyright notice that is included in or attached to the work
39 | (an example is provided in the Appendix below).
40 |
41 | "Derivative Works" shall mean any work, whether in Source or Object
42 | form, that is based on (or derived from) the Work and for which the
43 | editorial revisions, annotations, elaborations, or other modifications
44 | represent, as a whole, an original work of authorship. For the purposes
45 | of this License, Derivative Works shall not include works that remain
46 | separable from, or merely link (or bind by name) to the interfaces of,
47 | the Work and Derivative Works thereof.
48 |
49 | "Contribution" shall mean any work of authorship, including
50 | the original version of the Work and any modifications or additions
51 | to that Work or Derivative Works thereof, that is intentionally
52 | submitted to Licensor for inclusion in the Work by the copyright owner
53 | or by an individual or Legal Entity authorized to submit on behalf of
54 | the copyright owner. For the purposes of this definition, "submitted"
55 | means any form of electronic, verbal, or written communication sent
56 | to the Licensor or its representatives, including but not limited to
57 | communication on electronic mailing lists, source code control systems,
58 | and issue tracking systems that are managed by, or on behalf of, the
59 | Licensor for the purpose of discussing and improving the Work, but
60 | excluding communication that is conspicuously marked or otherwise
61 | designated in writing by the copyright owner as "Not a Contribution."
62 |
63 | "Contributor" shall mean Licensor and any individual or Legal Entity
64 | on behalf of whom a Contribution has been received by Licensor and
65 | subsequently incorporated within the Work.
66 |
67 | 2. Grant of Copyright License. Subject to the terms and conditions of
68 | this License, each Contributor hereby grants to You a perpetual,
69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70 | copyright license to reproduce, prepare Derivative Works of,
71 | publicly display, publicly perform, sublicense, and distribute the
72 | Work and such Derivative Works in Source or Object form.
73 |
74 | 3. Grant of Patent License. Subject to the terms and conditions of
75 | this License, each Contributor hereby grants to You a perpetual,
76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77 | (except as stated in this section) patent license to make, have made,
78 | use, offer to sell, sell, import, and otherwise transfer the Work,
79 | where such license applies only to those patent claims licensable
80 | by such Contributor that are necessarily infringed by their
81 | Contribution(s) alone or by combination of their Contribution(s)
82 | with the Work to which such Contribution(s) was submitted. If You
83 | institute patent litigation against any entity (including a
84 | cross-claim or counterclaim in a lawsuit) alleging that the Work
85 | or a Contribution incorporated within the Work constitutes direct
86 | or contributory patent infringement, then any patent licenses
87 | granted to You under this License for that Work shall terminate
88 | as of the date such litigation is filed.
89 |
90 | 4. Redistribution. You may reproduce and distribute copies of the
91 | Work or Derivative Works thereof in any medium, with or without
92 | modifications, and in Source or Object form, provided that You
93 | meet the following conditions:
94 |
95 | (a) You must give any other recipients of the Work or
96 | Derivative Works a copy of this License; and
97 |
98 | (b) You must cause any modified files to carry prominent notices
99 | stating that You changed the files; and
100 |
101 | (c) You must retain, in the Source form of any Derivative Works
102 | that You distribute, all copyright, patent, trademark, and
103 | attribution notices from the Source form of the Work,
104 | excluding those notices that do not pertain to any part of
105 | the Derivative Works; and
106 |
107 | (d) If the Work includes a "NOTICE" text file as part of its
108 | distribution, then any Derivative Works that You distribute must
109 | include a readable copy of the attribution notices contained
110 | within such NOTICE file, excluding those notices that do not
111 | pertain to any part of the Derivative Works, in at least one
112 | of the following places: within a NOTICE text file distributed
113 | as part of the Derivative Works; within the Source form or
114 | documentation, if provided along with the Derivative Works; or,
115 | within a display generated by the Derivative Works, if and
116 | wherever such third-party notices normally appear. The contents
117 | of the NOTICE file are for informational purposes only and
118 | do not modify the License. You may add Your own attribution
119 | notices within Derivative Works that You distribute, alongside
120 | or as an addendum to the NOTICE text from the Work, provided
121 | that such additional attribution notices cannot be construed
122 | as modifying the License.
123 |
124 | You may add Your own copyright statement to Your modifications and
125 | may provide additional or different license terms and conditions
126 | for use, reproduction, or distribution of Your modifications, or
127 | for any such Derivative Works as a whole, provided Your use,
128 | reproduction, and distribution of the Work otherwise complies with
129 | the conditions stated in this License.
130 |
131 | 5. Submission of Contributions. Unless You explicitly state otherwise,
132 | any Contribution intentionally submitted for inclusion in the Work
133 | by You to the Licensor shall be under the terms and conditions of
134 | this License, without any additional terms or conditions.
135 | Notwithstanding the above, nothing herein shall supersede or modify
136 | the terms of any separate license agreement you may have executed
137 | with Licensor regarding such Contributions.
138 |
139 | 6. Trademarks. This License does not grant permission to use the trade
140 | names, trademarks, service marks, or product names of the Licensor,
141 | except as required for reasonable and customary use in describing the
142 | origin of the Work and reproducing the content of the NOTICE file.
143 |
144 | 7. Disclaimer of Warranty. Unless required by applicable law or
145 | agreed to in writing, Licensor provides the Work (and each
146 | Contributor provides its Contributions) on an "AS IS" BASIS,
147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148 | implied, including, without limitation, any warranties or conditions
149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150 | PARTICULAR PURPOSE. You are solely responsible for determining the
151 | appropriateness of using or redistributing the Work and assume any
152 | risks associated with Your exercise of permissions under this License.
153 |
154 | 8. Limitation of Liability. In no event and under no legal theory,
155 | whether in tort (including negligence), contract, or otherwise,
156 | unless required by applicable law (such as deliberate and grossly
157 | negligent acts) or agreed to in writing, shall any Contributor be
158 | liable to You for damages, including any direct, indirect, special,
159 | incidental, or consequential damages of any character arising as a
160 | result of this License or out of the use or inability to use the
161 | Work (including but not limited to damages for loss of goodwill,
162 | work stoppage, computer failure or malfunction, or any and all
163 | other commercial damages or losses), even if such Contributor
164 | has been advised of the possibility of such damages.
165 |
166 | 9. Accepting Warranty or Additional Liability. While redistributing
167 | the Work or Derivative Works thereof, You may choose to offer,
168 | and charge a fee for, acceptance of support, warranty, indemnity,
169 | or other liability obligations and/or rights consistent with this
170 | License. However, in accepting such obligations, You may act only
171 | on Your own behalf and on Your sole responsibility, not on behalf
172 | of any other Contributor, and only if You agree to indemnify,
173 | defend, and hold each Contributor harmless for any liability
174 | incurred by, or claims asserted against, such Contributor by reason
175 | of your accepting any such warranty or additional liability.
176 |
177 | END OF TERMS AND CONDITIONS
178 |
179 | APPENDIX: How to apply the Apache License to your work.
180 |
181 | To apply the Apache License to your work, attach the following
182 | boilerplate notice, with the fields enclosed by brackets "[]"
183 | replaced with your own identifying information. (Don't include
184 | the brackets!) The text should be enclosed in the appropriate
185 | comment syntax for the file format. We also recommend that a
186 | file or class name and description of purpose be included on the
187 | same "printed page" as the copyright notice for easier
188 | identification within third-party archives.
189 |
190 | Copyright [yyyy] [name of copyright owner]
191 |
192 | Licensed under the Apache License, Version 2.0 (the "License");
193 | you may not use this file except in compliance with the License.
194 | You may obtain a copy of the License at
195 |
196 | http://www.apache.org/licenses/LICENSE-2.0
197 |
198 | Unless required by applicable law or agreed to in writing, software
199 | distributed under the License is distributed on an "AS IS" BASIS,
200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201 | See the License for the specific language governing permissions and
202 | limitations under the License.
203 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
4 | # Node Configuration in GKE
5 |
6 | This document outlines a strategy for performing custom node configuration in Google Kubernetes Engine (GKE) using init containers. The goal is to apply node-level settings (like sysctl adjustments, software installations, or kernel parameter checks) *before* regular application workloads are scheduled onto those nodes. To isolate the nodes temporarily, we'll mark them as unschedulable with a taint until configuration is complete.
7 |
8 | ## Supporting Tools for Node Configuration
9 |
10 | A collection of tools, potentially packaged within a container image used by an init container, can help manage and configure various aspects of a Kubernetes node:
11 |
12 | * **Kubelet:** These tools can assist in managing and interacting with the Kubelet, the primary node agent in Kubernetes. This could include tools for:
13 |     * **Configuration management:** Scripts or utilities to configure Kubelet settings.
14 |     * **Status monitoring:** Tools to check the health and status of the Kubelet service.
15 |     * **Log analysis:** Scripts to collect and analyze Kubelet logs for debugging.
16 | * **Containerd:** Tools related to containerd, the container runtime used by Kubelet to manage containers. This might include:
17 |     * **Container image management:** Tools for pulling, inspecting, or managing container images on nodes.
18 |     * **Container lifecycle management:** Utilities to interact with containerd for container creation, deletion, and status checks.
19 |     * **Snapshotter management:** Tools to manage containerd snapshotters for efficient storage.
20 | * **Sysctl:** Tools for managing and configuring sysctl parameters on Kubernetes nodes. Sysctl allows you to modify kernel parameters at runtime. These tools could help with:
21 |     * **Parameter modification:** Tools to apply specific sysctl settings to nodes, potentially for performance tuning or node-specific hardening required by your organization.
22 |     * **Configuration persistence:** Mechanisms to ensure sysctl settings are applied consistently across node reboots.
23 | * **Kernel:** Tools for interacting with or gathering information about the node's kernel. This might include:
24 |     * **Kernel module management:** Tools for loading, unloading, or checking the status of kernel modules.
25 |     * **Kernel parameter inspection:** Utilities to examine kernel parameters beyond sysctl.
26 | * **Software Installations:** Tools to automate or simplify software installations on Kubernetes nodes. This could involve:
27 |     * **Package management:** Scripts to install packages using package managers like `apt` or `yum`.
28 |     * **Binary deployments:** Tools to deploy pre-compiled binaries to nodes.
29 |     * **Configuration management:** Scripts to configure installed software.
30 | * **Troubleshooting Tooling:** A suite of tools to aid in diagnosing and resolving issues on Kubernetes nodes. This is likely to encompass:
31 |     * **Log collection and analysis:** Tools to gather logs from various node components and facilitate analysis.
33 |
34 | ### Part 1: Scheduling an Init Container for Node Modification
35 |
36 | This example demonstrates how to use a DaemonSet with an init container to apply a `sysctl` configuration change to Kubernetes nodes upon pod startup. The DaemonSet will attempt to run on all available nodes.
37 |
38 | #### Concept:
39 |
40 | We deploy a DaemonSet ensuring one pod replica runs on each eligible node. This pod has an *init container* that runs first. The init container executes a command to modify a kernel parameter (`sysctl`) on the host node. Because modifying host kernel parameters requires high privileges, the init container runs in **`privileged`** mode and mounts the host's root filesystem. After the init container completes successfully, a minimal main container (`pause`) starts just to keep the pod running.
41 |
42 | #### Example `sysctl-init-daemonset.yaml`:
43 |
44 | ```yaml
45 | apiVersion: apps/v1
46 | kind: DaemonSet
47 | metadata:
48 | name: sysctl-init-daemonset
49 | labels:
50 | app: sysctl-modifier
51 | spec:
52 | selector:
53 | matchLabels:
54 | app: sysctl-modifier
55 | template:
56 | metadata:
57 | labels:
58 | app: sysctl-modifier
59 | spec:
60 | # Allow access to host PID namespace if needed for specific tools
61 | hostPID: true
62 | volumes:
63 | - name: host-root-fs
64 | hostPath:
65 | path: /
66 | type: Directory # Mount the node's root filesystem
67 | initContainers:
68 | - name: apply-sysctl-value
69 | image: gcr.io/gke-release/debian-base # Small image with shell tools
70 | # *** Requires privilege to modify host kernel settings ***
71 | securityContext:
72 | privileged: true
73 | volumeMounts:
74 | - name: host-root-fs
75 | mountPath: /host # Access the host filesystem at /host
76 | readOnly: false # Needs write access to modify sysctl typically
77 | command: ["/bin/sh", "-c"]
78 | args:
79 | - |
80 | echo "Attempting to set net.ipv4.ip_forward=1 on the host..."
81 | # Use chroot to execute the command in the host's root filesystem context
82 | if chroot /host sysctl -w net.ipv4.ip_forward=1; then
83 | echo "Successfully set net.ipv4.ip_forward."
84 | else
85 | echo "Failed to set net.ipv4.ip_forward." >&2
86 | exit 1 # Fail the init container if command fails
87 | fi
88 | # Add other setup commands here if needed
89 | containers:
90 | - name: pause-container
91 | # Minimal container just to keep the Pod running after init succeeds
92 | image: gcr.io/gke-release/pause:latest
93 | updateStrategy:
94 | type: RollingUpdate
95 | ```
96 |
97 | #### Walkthrough:
98 |
99 | 1. **Save the YAML:** Save the content above as `sysctl-init-daemonset.yaml`.
100 | 2. **Apply the DaemonSet:**
101 | ```bash
102 | kubectl apply -f sysctl-init-daemonset.yaml
103 | ```
104 | 3. **Verify Pod Creation:** Check that the DaemonSet pods are being created on your nodes:
105 | ```bash
106 | kubectl get pods -l app=sysctl-modifier -o wide
107 | ```
108 |     *(Wait for the pods to reach the `Running` state.)*
109 | 4. **Check Init Container Logs:** View the logs of the init container on one of the pods to see its output:
110 | ```bash
111 | # Get a pod name first from the command above
112 | POD_NAME=$(kubectl get pods -l app=sysctl-modifier -o jsonpath='{.items[0].metadata.name}')
113 | kubectl logs $POD_NAME -c apply-sysctl-value
114 | ```
115 | You should see the "Attempting..." and "Successfully set..." messages.
116 | 5. **Verify on Node (Optional):** You can confirm the change by SSHing into a node where the pod ran and executing `sysctl net.ipv4.ip_forward`, or by running a privileged debug pod on that node.
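
The `sysctl` interface is a thin wrapper over `/proc/sys`: each dotted
parameter name maps to a file path, which is handy when verifying values from a
debug pod. A small illustrative sketch of the mapping (runnable anywhere, no
cluster needed):

```bash
#!/usr/bin/env bash
# `sysctl -w net.ipv4.ip_forward=1` is equivalent to writing "1" to the
# matching file under /proc/sys; dots in the key become slashes.
key="net.ipv4.ip_forward"
path="/proc/sys/$(echo "$key" | tr '.' '/')"
echo "$key lives at $path"
```

Reading that file (e.g. `cat /proc/sys/net/ipv4/ip_forward`) from a privileged
debug pod is equivalent to running `sysctl net.ipv4.ip_forward` on the node.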
117 |
118 | ---
119 |
120 | ### Part 2: Taint -> Configure with Init Container -> Untaint Workflow
121 |
122 | This section outlines a more advanced and automated method for node configuration. The workflow uses a single, intelligent DaemonSet that not only applies the configuration to tainted nodes but also automatically removes the taint from the node once its job is complete. This approach is ideal for streamlining configuration changes across a node pool without manual intervention to make the nodes schedulable again.
123 |
124 | The process leverages the kubectl binary available on GKE nodes and requires RBAC permissions for the DaemonSet's ServiceAccount to modify its own node object.
125 | #### Concept:
126 | 1. **Taint Node Pool:** A taint is applied to a node pool to prevent regular workloads from being scheduled, effectively reserving it for configuration.
127 | 2. **Deploy Smart DaemonSet:** A DaemonSet is deployed with three key characteristics:
128 |     * **Toleration:** It has a `toleration` to allow it to run on the tainted nodes.
129 |     * **Configuration Logic:** An init container runs privileged commands to configure the node (e.g., setting `sysctl` values).
130 |     * **Untainting Logic:** After applying the configuration, the same container uses the node's `kubectl` tool to remove the taint from itself, making the node available for general use.
131 | 3. **Verification:** Once the DaemonSet pod completes its init container, the node is both configured and fully schedulable.
132 |
133 | ---
134 | #### Walkthrough:
135 |
136 | 1. **Taint the Node Pool:**
137 | * Identify your target GKE node pool, cluster, and location (zone/region).
138 | * Apply a specific taint using `gcloud`. Let's use `node.config.status/stage=configuring:NoSchedule`.
139 |
140 | ```bash
141 | # Replace placeholders with your actual values
142 | GKE_CLUSTER="your-cluster-name"
143 | NODE_POOL="your-node-pool-name"
144 | GKE_ZONE="your-zone" # Or GKE_REGION="your-region"
145 |
146 | gcloud container node-pools update $NODE_POOL \
147 | --cluster=$GKE_CLUSTER \
148 | --node-taints=node.config.status/stage=configuring:NoSchedule \
149 | --zone=$GKE_ZONE # Or --region=$GKE_REGION
150 | ```
151 | * Verify the taint is applied to nodes in the pool:
152 |
153 | ```bash
154 | kubectl describe node | grep Taints
155 | ```
156 |    You should see `node.config.status/stage=configuring:NoSchedule`.
157 |
158 | 2. **Create and Deploy the Self-Untainting DaemonSet:**
159 | * The following YAML creates all the necessary components: a **ServiceAccount** for permissions, a **ClusterRole** granting node-patching rights, a **ClusterRoleBinding** to link them, and the **DaemonSet** itself.
160 | Save the following content as `auto-untaint-daemonset.yaml`.
161 |
162 | ```yaml
163 | # WARNING: This DaemonSet runs as privileged, which makes your nodes
164 | # less secure and has significant security implications. Only use this
165 | # on clusters where you have strict controls over what is deployed.
167 | ---
168 | apiVersion: v1
169 | kind: ServiceAccount
170 | metadata:
171 | name: node-config-sa
172 | namespace: default
173 | ---
174 | apiVersion: rbac.authorization.k8s.io/v1
175 | kind: ClusterRole
176 | metadata:
177 | name: node-patcher-role
178 | rules:
179 | - apiGroups: [""]
180 | resources: ["nodes"]
181 | # Permissions needed to read and remove a taint from the node.
182 | verbs: ["get", "patch", "update"]
183 | ---
184 | apiVersion: rbac.authorization.k8s.io/v1
185 | kind: ClusterRoleBinding
186 | metadata:
187 | name: node-config-binding
188 | subjects:
189 | - kind: ServiceAccount
190 | name: node-config-sa
191 | namespace: default
192 | roleRef:
193 | kind: ClusterRole
194 | name: node-patcher-role
195 | apiGroup: rbac.authorization.k8s.io
196 | ---
197 | apiVersion: apps/v1
198 | kind: DaemonSet
199 | metadata:
200 | name: auto-untaint-daemonset
201 | labels:
202 | app: auto-untaint-configurator
203 | spec:
204 | selector:
205 | matchLabels:
206 | app: auto-untaint-configurator
207 | updateStrategy:
208 | type: RollingUpdate
209 | template:
210 | metadata:
211 | labels:
212 | app: auto-untaint-configurator
213 | spec:
214 | serviceAccountName: node-config-sa
215 | hostPID: true
216 | # Toleration now matches the taint on your node.
217 | tolerations:
218 | - key: "node.config.status/stage"
219 | operator: "Equal"
220 | value: "configuring"
221 | effect: "NoSchedule"
222 | volumes:
223 | - name: host-root-fs
224 | hostPath:
225 | path: /
226 | initContainers:
227 | - name: configure-and-untaint
228 | image: ubuntu:22.04 # Using a standard container image.
229 | securityContext:
230 | privileged: true # Required for chroot and sysctl.
231 | env:
232 | - name: NODE_NAME
233 | valueFrom:
234 | fieldRef:
235 | fieldPath: spec.nodeName
236 | volumeMounts:
237 | - name: host-root-fs
238 | mountPath: /host
239 | command: ["/bin/bash", "-c"]
240 | args:
241 | - |
242 | # Using explicit error checking for each critical command.
243 |
244 | # Define the configuration and taint details.
245 | SYSCTL_PARAM="vm.max_map_count"
246 | SYSCTL_VALUE="262144"
247 | TAINT_KEY="node.config.status/stage"
248 |
249 | echo "Running configuration on node: ${NODE_NAME}"
250 |
251 | # 1. APPLY CONFIGURATION
252 | echo "--> Applying ${SYSCTL_PARAM}=${SYSCTL_VALUE}..."
253 | if ! chroot /host sysctl -w "${SYSCTL_PARAM}=${SYSCTL_VALUE}"; then
254 | echo "ERROR: Failed to apply sysctl parameter." >&2
255 | exit 1
256 | fi
257 | echo "--> Configuration applied successfully."
258 |
259 | # 2. UNTAINT THE NODE
260 | # This command removes the taint from the node this pod is running on.
261 | echo "--> Untainting node ${NODE_NAME} by removing taint ${TAINT_KEY}..."
262 | if ! /host/home/kubernetes/bin/kubectl taint node "${NODE_NAME}" "${TAINT_KEY}:NoSchedule-"; then
263 | echo "ERROR: Failed to untaint the node." >&2
264 | exit 1
265 | fi
266 | echo "--> Node has been untainted and is now schedulable."
267 | # The main container is minimal; it just keeps the pod running.
268 | containers:
269 | - name: pause-container
270 | image: registry.k8s.io/pause:3.9
271 | ```
272 |
273 |
274 | * **Apply the `DaemonSet` manifest.**
275 | ```bash
276 | kubectl apply -f auto-untaint-daemonset.yaml
277 | ```
278 | 3. **Validate the DaemonSet:**
279 |
280 | * **Verify the pods are running on the tainted nodes.**
281 | You should see the pod in a `Running` state after the init container completes.
282 | ```bash
283 | kubectl get pods -l app=auto-untaint-configurator -o wide
284 | ```
285 |
286 | * **Check the logs to confirm execution.**
287 | View the `initContainer` logs to ensure the script ran and untainted the node successfully.
288 | ```bash
289 | # Get a pod name from the command above
290 | POD_NAME=$(kubectl get pods -l app=auto-untaint-configurator -o jsonpath='{.items[0].metadata.name}')
291 |
292 | # Check the logs for that pod's init container
293 | kubectl logs "$POD_NAME" -c configure-and-untaint
294 | ```
295 | The output will confirm that the `sysctl` command ran and the node was untainted.
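
* **Optionally, confirm the taint was removed from the node object.**
The node's `Taints` field should no longer list the taint key used above (`node.config.status/stage` in this example; adjust the check if your key differs).
```bash
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
```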
296 |
297 | ---
298 | ### Part 3: Privileged DaemonSet Tradeoffs and Security Restrictions
299 |
300 | Setting `privileged: true` in a container's `securityContext`, whether in a DaemonSet or any other pod, is powerful but carries significant security implications: it disables most of the isolation boundaries between that container and the host.
301 |
302 | #### The Tradeoff:
303 |
304 | * **Benefit:** Grants the container capabilities necessary for deep host system interactions, such as:
305 | * Modifying kernel parameters (`sysctl`).
306 | * Loading/unloading kernel modules (`modprobe`).
307 | * Accessing host devices (`/dev/*`).
308 | * Modifying protected host filesystems.
309 | * Full network stack manipulation (beyond standard Kubernetes networking).
310 | * Running tools that require raw socket access or specific hardware interactions.
311 | * **Cost:** Massively increased security risk and potential for node/cluster instability.
312 |
313 | #### Security Restrictions and Risks:
314 |
315 | * **Container Escape/Host Compromise:** A vulnerability within the privileged container's application or image can directly lead to root access on the host node. The attacker bypasses standard container defenses.
316 | * **Violation of Least Privilege:** Privileged mode grants *all* capabilities, likely far more than needed for a specific task. This broad access increases the potential damage if the container is compromised.
317 | * **Node Destabilization:** Accidental or malicious commands run within the privileged container (e.g., incorrect `sysctl` values, `rm -rf /host/boot`) can crash or corrupt the host node operating system.
318 | * **Lateral Movement:** Compromising one node via a privileged DaemonSet gives an attacker a strong foothold to attack other nodes, the Kubernetes control plane, or connected systems.
319 | * **Data Exposure:** Unrestricted access to the host filesystem (`/`) can expose sensitive data stored on the node, including credentials, keys, or data belonging to other pods (if accessible via host paths).
320 | * **Increased Attack Surface:** Exposes more of the host kernel's system calls and features to potential exploits from within the container.
321 |
322 | #### Best Practices / Mitigations:
323 |
324 | * **Avoid If Possible:** The most secure approach is to avoid **`privileged: true`** entirely.
325 | * **Use Linux Capabilities:** If elevated rights are needed, grant *specific* Linux capabilities (e.g., `NET_ADMIN`, `SYS_MODULE`) in the `securityContext.capabilities.add` field instead of full privilege. This follows the principle of least privilege; be aware, though, that `SYS_ADMIN` is itself so broad that granting it recovers much of what `privileged: true` allows.
326 | * **Limit Scope:** Run privileged DaemonSets only on dedicated, possibly tainted, node pools to contain the potential blast radius.
327 | * **Policy Enforcement:** Use GKE Policy Controller (or OPA Gatekeeper) to create policies that restrict, audit, or require justification for deploying privileged containers.
328 | * **Image Scanning & Trust:** Use GKE Binary Authorization and rigorous image scanning to ensure only vetted, trusted container images are run with privilege.
329 | * **Minimize Host Mounts:** Only mount the specific host paths needed, and use `readOnly: true` whenever possible. Avoid mounting the entire root filesystem (`/`) unless absolutely necessary.
330 | * **Regular Audits:** Periodically review all workloads running with **`privileged: true`**.
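
As an illustrative (not drop-in) sketch of the capabilities and host-mount mitigations above, the configurator's `securityContext` could be tightened along these lines. Note that this particular workflow (`chroot /host` plus `sysctl -w`) still requires broad rights such as `SYS_ADMIN` and `SYS_CHROOT`, so the practical gain here is dropping everything else:

```yaml
# Hypothetical hardened fragment -- specific capabilities instead of privileged: true.
securityContext:
  privileged: false
  capabilities:
    drop: ["ALL"]                      # start from zero...
    add: ["SYS_ADMIN", "SYS_CHROOT"]   # ...then add only what the script actually uses
volumeMounts:
  - name: host-root-fs
    mountPath: /host
    readOnly: true    # prefer read-only; keep the mount writable only if required
```

Whether `readOnly: true` works for a given script depends on what it writes under `/host`; treat this as a starting point to test against your workload, not a guaranteed equivalent of the privileged version.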
331 |
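For the policy-enforcement point, a minimal audit-mode constraint might look like the following. This assumes the `K8sPSPPrivilegedContainer` constraint template from the open-source Gatekeeper policy library is already installed in the cluster:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata:
  name: audit-privileged-containers
spec:
  enforcementAction: dryrun        # audit only; change to "deny" to block admission
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: ["kube-system"]   # system components often need privilege
```

Starting in `dryrun` surfaces existing violations (such as the DaemonSet in this guide) in audit results without breaking them, so you can build an exception list before switching to `deny`.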
--------------------------------------------------------------------------------