├── disable-smt ├── cos │ ├── enable_smt_cos.sh.md5 │ ├── disable_smt_cos.sh.md5 │ ├── enable_smt_cos.sh │ └── disable_smt_cos.sh └── gke │ ├── enable-smt.yaml │ ├── disable-smt.yaml │ └── READ.md ├── containerd ├── containerd-http-proxy │ ├── sample_configmap.yaml │ ├── README.md │ └── configure_http_proxy.yaml ├── containerd-nofile-infinity │ ├── containerd-nofile-infinity-allowlist.yaml │ ├── containerd-nofile-infinity.yaml │ └── README.md ├── socket-tracer │ ├── containerd-socket-tracer-allowlist.yaml │ ├── cri-v1alpha2-api-deprecation-reporter-allowlist.yaml │ ├── cri-v1alpha2-api-deprecation-reporter.yaml │ ├── containerd-socket-tracer.yaml │ └── README.md ├── migrating-to-containerd │ ├── README.md │ └── find-nodepools-to-migrate.sh ├── debug-logging │ ├── README.md │ └── containerd-debug-logging-daemonset.yaml └── container-insecure-registry │ ├── README.md │ └── insecure-registry-config.yaml ├── troubleshooting ├── os-audit │ ├── README.md │ └── cos-auditd-logging.yaml ├── ssh-server-config │ ├── README.md │ ├── set-login-grace-time.yaml │ └── set-login-grace-time-gdcso-vmware.yaml ├── enable-kdump │ ├── disable-hung-task-panic-sysctl.yaml │ ├── ubuntu-kdump.md │ ├── ubuntu-enable-kdump.yaml │ └── cos-enable-kdump.yaml └── perf │ ├── perf-record.yaml │ ├── README.md │ └── perf-trace.yaml ├── .vscode └── launch.json ├── manual-node-upgrade ├── README.md └── manual_node_upgrade.sh ├── CONTRIBUTING.md ├── drop-small-mss ├── drop-small-mss.yaml └── READ.md ├── disable-mglru └── disable-mglru.yaml ├── kubelet └── kubelet-log-config │ ├── READ.md │ └── kubelet-log-config.yaml ├── gvisor ├── enable-gvisor-flags.yaml └── README.md ├── LICENSE └── README.md /disable-smt/cos/enable_smt_cos.sh.md5: -------------------------------------------------------------------------------- 1 | 78e4c15395235663789022f4cf3e0b60 -------------------------------------------------------------------------------- /disable-smt/cos/disable_smt_cos.sh.md5: 
-------------------------------------------------------------------------------- 1 | f11fa6d1a69c3008a5eb1e037aef3cf3 2 | -------------------------------------------------------------------------------- /containerd/containerd-http-proxy/sample_configmap.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ConfigMap 3 | metadata: 4 | name: containerd-proxy-configmap 5 | data: 6 | HTTP_PROXY: http://proxy.example.com:80 7 | HTTPS_PROXY: https://proxy.example.com:443 8 | NO_PROXY: localhost,metadata.google.internal 9 | -------------------------------------------------------------------------------- /containerd/containerd-nofile-infinity/containerd-nofile-infinity-allowlist.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: auto.gke.io/v1 2 | kind: AllowlistSynchronizer 3 | metadata: 4 | name: gke-org-nofile-infinity-synchronizer 5 | spec: 6 | allowlistPaths: 7 | - "Gke-Org/nofile-infinity/gke-org-nofile-infinity-allowlist.yaml" 8 | -------------------------------------------------------------------------------- /containerd/socket-tracer/containerd-socket-tracer-allowlist.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: auto.gke.io/v1 2 | kind: AllowlistSynchronizer 3 | metadata: 4 | name: gke-org-containerd-socket-tracer-allowlist 5 | spec: 6 | allowlistPaths: 7 | - "Gke-Org/containerd-socket-tracer/gke-org-containerd-socket-tracer-allowlist.yaml" 8 | -------------------------------------------------------------------------------- /troubleshooting/os-audit/README.md: -------------------------------------------------------------------------------- 1 | The os-audit tool is the example code for 2 | [enabling Linux auditd logs on GKE nodes](https://cloud.google.com/kubernetes-engine/docs/how-to/linux-auditd-logging), 3 | which documents how to enable verbose operating system audit logs on Google 4 | 
Kubernetes Engine nodes running Container-Optimized OS. 5 | -------------------------------------------------------------------------------- /containerd/socket-tracer/cri-v1alpha2-api-deprecation-reporter-allowlist.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: auto.gke.io/v1 2 | kind: AllowlistSynchronizer 3 | metadata: 4 | name: gke-org-cri-v1alpha2-api-deprecation-reporter-allowlist 5 | spec: 6 | allowlistPaths: 7 | - "Gke-Org/containerd-socket-tracer/gke-org-cri-v1alpha2-api-deprecation-reporter-allowlist.yaml" 8 | -------------------------------------------------------------------------------- /.vscode/launch.json: -------------------------------------------------------------------------------- 1 | { 2 | // Use IntelliSense to learn about possible attributes. 3 | // Hover to view descriptions of existing attributes. 4 | // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387 5 | "version": "0.2.0", 6 | "configurations": [ 7 | { 8 | "type": "bashdb", 9 | "request": "launch", 10 | "name": "Bash-Debug (simplest configuration)", 11 | "program": "${file}" 12 | } 13 | ] 14 | } -------------------------------------------------------------------------------- /manual-node-upgrade/README.md: -------------------------------------------------------------------------------- 1 | The sample script `manual_node_upgrade.sh` finds all node pools in a specified cluster whose version does not match the control plane's Kubernetes version. For each node pool identified by the filter, the script submits an upgrade request via the `gcloud` command. The script is idempotent and can be run as many times as necessary to ensure all node pools are upgraded, without side effects.
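The selection step described above can be sketched in plain POSIX shell. This is an illustration only: the `gcloud` calls are replaced by a stub, and the pool names and version strings are made up.

```shell
#!/bin/sh
# Sketch of manual_node_upgrade.sh's core logic: act on every node
# pool whose version differs from the control plane version.
# The gcloud lookups are stubbed out with static data for illustration.
CLUSTER_VERSION="1.29.1-gke.100"   # stand-in for currentMasterVersion

# Stand-in for: gcloud container node-pools list --format="value(name,version)"
list_node_pools() {
  printf '%s\n' \
    "pool-a 1.28.7-gke.200" \
    "pool-b 1.29.1-gke.100" \
    "pool-c 1.27.3-gke.50"
}

# Print the pools that would receive an upgrade request.
list_node_pools | while read -r name version; do
  if [ "$version" != "$CLUSTER_VERSION" ]; then
    echo "would upgrade: $name"
  fi
done
```

Because each pass only touches pools whose version still differs from the control plane's, rerunning after all pools match issues no further upgrade requests, which is what makes the real script idempotent.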
2 | 3 | For more information about GKE manual upgrades, consult the online documentation: https://cloud.google.com/kubernetes-engine/docs/how-to/upgrading-a-cluster -------------------------------------------------------------------------------- /containerd/migrating-to-containerd/README.md: -------------------------------------------------------------------------------- 1 | # Migrating to Containerd 2 | 3 | Find information about running Containerd nodes on GKE [here](https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd). 4 | 5 | The sample script `find-nodepools-to-migrate.sh` iterates over all node pools across available projects, and for each node pool outputs a suggestion on whether the node pool should be migrated to Containerd. This script also outputs the node pool version and the suggested migration command as listed in the [updating your node images](https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd#updating-image-type) document. Make sure that you review the [known issues](https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd#known_issues) for a node pool version before migration. 6 | -------------------------------------------------------------------------------- /containerd/debug-logging/README.md: -------------------------------------------------------------------------------- 1 | # Containerd Debug Logging 2 | 3 | The `containerd-debug-logging-daemonset.yaml` is a daemonset that enables 4 | containerd debug logs. These logs may be useful for troubleshooting. Note that 5 | debug logs are quite verbose and as such increase log volume. 6 | 7 | The daemonset includes a nodeSelector targeting 8 | `containerd-debug-logging=true`.
To run the daemonset on selected nodes for 9 | debugging, label the nodes with the corresponding label (`kubectl label node 10 | ${NODE_NAME} containerd-debug-logging=true`). 11 | 12 | Alternatively, modify the daemonset's existing 13 | [nodeSelector](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector) 14 | or add a [node 15 | affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity) 16 | to target the desired nodes or node pools. 17 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # How to Contribute 2 | 3 | We'd love to accept your patches and contributions to this project. There are 4 | just a few small guidelines you need to follow. 5 | 6 | ## Contributor License Agreement 7 | 8 | Contributions to this project must be accompanied by a Contributor License 9 | Agreement. You (or your employer) retain the copyright to your contribution; 10 | this simply gives us permission to use and redistribute your contributions as 11 | part of the project. Head over to https://cla.developers.google.com/ to see 12 | your current agreements on file or to sign a new one. 13 | 14 | You generally only need to submit a CLA once, so if you've already submitted one 15 | (even if it was for a different project), you probably don't need to do it 16 | again. 17 | 18 | ## Code reviews 19 | 20 | All submissions, including submissions by project members, require review. We 21 | use GitHub pull requests for this purpose. Consult 22 | [GitHub Help](https://help.github.com/articles/about-pull-requests/) for more 23 | information on using pull requests. 24 | 25 | ## Community Guidelines 26 | 27 | This project follows [Google's Open Source Community 28 | Guidelines](https://opensource.google.com/conduct/).
29 | -------------------------------------------------------------------------------- /containerd/debug-logging/containerd-debug-logging-daemonset.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: DaemonSet 3 | metadata: 4 | name: containerd-debug-logging 5 | namespace: default 6 | labels: 7 | k8s-app: containerd-debug-logging 8 | spec: 9 | selector: 10 | matchLabels: 11 | name: containerd-debug-logging 12 | template: 13 | metadata: 14 | labels: 15 | name: containerd-debug-logging 16 | spec: 17 | nodeSelector: 18 | containerd-debug-logging: "true" 19 | hostPID: true 20 | containers: 21 | - name: startup-script 22 | image: gke.gcr.io/startup-script:v2 23 | imagePullPolicy: Always 24 | securityContext: 25 | privileged: true 26 | env: 27 | - name: STARTUP_SCRIPT 28 | value: | 29 | set -o errexit 30 | set -o pipefail 31 | set -o nounset 32 | 33 | echo "creating containerd.service.d directory" 34 | mkdir -p /etc/systemd/system/containerd.service.d 35 | echo "creating 10-level_debug.conf file" 36 | echo -e "[Service]\nExecStart=\nExecStart=/usr/bin/containerd --log-level debug" > /etc/systemd/system/containerd.service.d/10-level_debug.conf 37 | echo "Reloading systemd management configuration" 38 | systemctl daemon-reload 39 | echo "Restarting containerd..." 40 | systemctl restart containerd 41 | -------------------------------------------------------------------------------- /manual-node-upgrade/manual_node_upgrade.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | # Copyright 2023 Google LLC 4 | 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 
7 | # You may obtain a copy of the License at 8 | 9 | # https://www.apache.org/licenses/LICENSE-2.0 10 | 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | 17 | # Example: `./manual_node_upgrade.sh CLUSTER REGION` 18 | # 19 | 20 | CLUSTER_NAME=$1 21 | REGION=$2 22 | 23 | [ -z "$CLUSTER_NAME" ] || [ -z "$REGION" ] \ 24 | && echo "Usage: ./manual_node_upgrade.sh CLUSTER_NAME REGION" \ 25 | && exit 1; 26 | 27 | # fetch current control plane version 28 | CLUSTER_VERSION=$(gcloud container clusters describe \ 29 | "$CLUSTER_NAME" --format="value(currentMasterVersion)" \ 30 | --region="$REGION") 31 | 32 | # list node pools with version not matching control plane 33 | # and then issue an upgrade command for each identified node pool 34 | for np in $(gcloud container node-pools list \ 35 | --format="value(name)" --filter="version!=$CLUSTER_VERSION" \ 36 | --cluster "$CLUSTER_NAME" --region="$REGION"); do 37 | gcloud container clusters upgrade "$CLUSTER_NAME" --node-pool "$np" \ 38 | --region="$REGION" --quiet; 39 | done -------------------------------------------------------------------------------- /troubleshooting/ssh-server-config/README.md: -------------------------------------------------------------------------------- 1 | The ssh-server-config tool is a Kubernetes DaemonSet that sets [LoginGraceTime](https://man.openbsd.org/sshd#g) to 0. 2 | This tool is important because it prevents the SSH server from disconnecting you during lengthy or complex debugging sessions on a cluster node. 3 | 4 | ## :warning: This configuration may increase the risk of denial of service attacks and may cause issues with legitimate SSH access. 5 | 6 | ## How to use it?
7 | Apply it to all nodes in your cluster by running the 8 | following command. Run the command once per cluster per 9 | Google Cloud Platform project. 10 | 11 | ### GKE Clusters 12 | 13 | ``` 14 | kubectl apply -f \ 15 | https://raw.githubusercontent.com/GoogleCloudPlatform\ 16 | /k8s-node-tools/master/troubleshooting/ssh-server-config/set-login-grace-time.yaml 17 | ``` 18 | 19 | ### GDC software-only for VMware Clusters 20 | 21 | ``` 22 | kubectl apply -f \ 23 | https://raw.githubusercontent.com/GoogleCloudPlatform\ 24 | /k8s-node-tools/master/troubleshooting/ssh-server-config/set-login-grace-time-gdcso-vmware.yaml 25 | ``` 26 | 27 | ## How to get the result? 28 | Run the command below to get the related logs. 29 | ``` 30 | kubectl -n kube-system logs -l app=ssh-server-config -c ssh-server-config 31 | ``` 32 | 33 | ## How to remove it? 34 | Delete the DaemonSet with the command below. The GKE cluster URL is shown as an example; swap in the GDC URL as needed. 35 | ``` 36 | kubectl delete -f \ 37 | https://raw.githubusercontent.com/GoogleCloudPlatform\ 38 | /k8s-node-tools/master/troubleshooting/ssh-server-config/set-login-grace-time.yaml 39 | ``` 40 | 41 | 42 | -------------------------------------------------------------------------------- /containerd/containerd-http-proxy/README.md: -------------------------------------------------------------------------------- 1 | # Configure Containerd HTTP/S proxy 2 | 3 | This guide outlines the steps to configure an HTTP/S proxy for Containerd on GKE nodes, including Autopilot mode clusters. Typical use cases include accessing external image repositories for container pulls. 4 | 5 | ## Instructions 6 | 7 | 1. Create a ConfigMap named `containerd-proxy-configmap` that includes the values for `HTTP_PROXY`, `HTTPS_PROXY`, and `NO_PROXY` (optional). These values are used as environment variables to configure the proxy settings for the Containerd service. A sample ConfigMap configuration is provided in `sample_configmap.yaml`.
Please modify this sample with your proxy settings before applying it to your cluster. 8 | 9 | ``` 10 | kubectl apply -f sample_configmap.yaml 11 | ``` 12 | 13 | 2. Deploy the daemonset in `configure_http_proxy.yaml`. As it has been specifically allowlisted for GKE Autopilot, **this manifest cannot be changed if you are deploying to GKE Autopilot mode clusters**. 14 | 15 | ``` 16 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/containerd/containerd-http-proxy/configure_http_proxy.yaml 17 | ``` 18 | 19 | ## Note 20 | **Any update to `configure_http_proxy.yaml` will break the allowlist for GKE Autopilot**. If you need to make a change, please ask your Google Cloud sales representative to reach out to the GKE Autopilot team. 21 | 22 | How to remove it? 23 | ```bash 24 | kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/containerd/containerd-http-proxy/configure_http_proxy.yaml 25 | ``` 26 | -------------------------------------------------------------------------------- /drop-small-mss/drop-small-mss.yaml: -------------------------------------------------------------------------------- 1 | # Copyright 2019 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License.
14 | 15 | kind: DaemonSet 16 | apiVersion: apps/v1 17 | metadata: 18 | name: drop-small-mss 19 | namespace: kube-system 20 | labels: 21 | app: drop-small-mss 22 | spec: 23 | selector: 24 | matchLabels: 25 | app: drop-small-mss 26 | template: 27 | metadata: 28 | labels: 29 | app: drop-small-mss 30 | annotations: 31 | scheduler.alpha.kubernetes.io/critical-pod: "" 32 | spec: 33 | hostPID: true 34 | containers: 35 | - name: drop-small-mss 36 | image: k8s.gcr.io/startup-script:v2 37 | imagePullPolicy: Always 38 | securityContext: 39 | privileged: true 40 | env: 41 | - name: STARTUP_SCRIPT 42 | value: | 43 | #! /bin/bash 44 | 45 | set -o errexit 46 | set -o pipefail 47 | set -o nounset 48 | 49 | iptables -w -t mangle -I PREROUTING -m comment --comment "drop-small-mss" -p tcp -m tcpmss --mss 1:500 -j DROP 50 | priorityClassName: system-node-critical 51 | tolerations: 52 | - operator: Exists 53 | -------------------------------------------------------------------------------- /disable-mglru/disable-mglru.yaml: -------------------------------------------------------------------------------- 1 | # Copyright 2024 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | 15 | # This DaemonSet disables the kernel option Multi-Gen LRU. 
16 | # 17 | # - First, label your GKE nodes with `disable-mglru=true` (or your preferred 18 | # label; if you change the label, also update the nodeSelector section 19 | # below). 20 | # - Next, deploy the DaemonSet to your cluster. 21 | apiVersion: apps/v1 22 | kind: DaemonSet 23 | metadata: 24 | name: disable-mglru 25 | namespace: default 26 | labels: 27 | k8s-app: disable-mglru 28 | spec: 29 | selector: 30 | matchLabels: 31 | name: disable-mglru 32 | template: 33 | metadata: 34 | labels: 35 | name: disable-mglru 36 | spec: 37 | nodeSelector: 38 | disable-mglru: "true" 39 | hostPID: true 40 | containers: 41 | - name: startup-script 42 | image: gke.gcr.io/startup-script:v2 43 | imagePullPolicy: Always 44 | securityContext: 45 | privileged: true 46 | env: 47 | - name: STARTUP_SCRIPT 48 | value: | 49 | set -o errexit 50 | set -o pipefail 51 | set -o nounset 52 | echo n > /sys/kernel/mm/lru_gen/enabled 53 | -------------------------------------------------------------------------------- /containerd/containerd-nofile-infinity/containerd-nofile-infinity.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: DaemonSet 3 | metadata: 4 | name: nofile-infinity 5 | namespace: default 6 | labels: 7 | k8s-app: nofile-infinity 8 | spec: 9 | selector: 10 | matchLabels: 11 | name: nofile-infinity 12 | updateStrategy: 13 | type: RollingUpdate 14 | template: 15 | metadata: 16 | labels: 17 | name: nofile-infinity 18 | spec: 19 | nodeSelector: 20 | cloud.google.com/gke-container-runtime: "containerd" 21 | kubernetes.io/os: "linux" 22 | hostPID: true 23 | tolerations: 24 | - operator: "Exists" 25 | effect: "NoExecute" 26 | - operator: "Exists" 27 | effect: "NoSchedule" 28 | volumes: 29 | - name: host 30 | hostPath: 31 | path: / 32 | type: DirectoryOrCreate 33 | initContainers: 34 | - name: startup-script 35 | image: gke.gcr.io/debian-base:bookworm-v1.0.0-gke.1 36 | imagePullPolicy: Always
37 | securityContext: 38 | privileged: true 39 | volumeMounts: 40 | - name: host 41 | mountPath: /host 42 | command: 43 | - /bin/sh 44 | - -c 45 | - | 46 | set -e 47 | set -u 48 | echo "Generating containerd system drop in config for nofile limit" 49 | nofile_limit_path="/host/etc/systemd/system/containerd.service.d/40-LimitNOFILE-infinity.conf" 50 | cat >> "${nofile_limit_path}" < /dev/null; then 45 | echo "'nosmt' already present on the kernel command line. Nothing to do." 46 | return 47 | fi 48 | echo "Attempting to set 'nosmt' on the kernel command line." 49 | if [[ "${EUID}" -ne 0 ]]; then 50 | echo "This script must be run as root." 51 | return 1 52 | fi 53 | check_not_secure_boot 54 | 55 | dir="$(mktemp -d)" 56 | mount /dev/sda12 "${dir}" 57 | sed -i -e "s|cros_efi|cros_efi nosmt|g" "${dir}/efi/boot/grub.cfg" 58 | umount "${dir}" 59 | rmdir "${dir}" 60 | echo "Rebooting." 61 | reboot 62 | } 63 | 64 | disable_smt 65 | -------------------------------------------------------------------------------- /drop-small-mss/READ.md: -------------------------------------------------------------------------------- 1 | # drop-small-mss.yaml (Network Security Hardening) 2 | 3 | The `drop-small-mss` tool is a Kubernetes DaemonSet that adds a firewall rule to every cluster node to drop incoming TCP packets with an unusually small Maximum Segment Size (MSS). This tool is important for hardening network security and protecting nodes from certain types of denial-of-service (DoS) attacks that exploit small packet sizes to exhaust server resources. 🛡️ 4 | 5 | --- 6 | ### ⚠️ Warning 7 | This tool runs as a **privileged** and **system-node-critical** pod to modify the host's core networking rules. While this rule protects against a known attack vector, it could potentially interfere with legitimate but non-standard network clients or tunnels that require a small MSS to function correctly. 8 | 9 | --- 10 | ### **Prerequisites** 11 | 12 | There are no special prerequisites. 
This DaemonSet is designed to run on **all nodes** in your cluster, including control-plane nodes. 13 | 14 | --- 15 | ### **How to use it?** 16 | 17 | 1. Save the script to a file named `drop-small-mss.yaml`. 18 | 2. Apply the DaemonSet to your cluster. 19 | ```bash 20 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/drop-small-mss/drop-small-mss.yaml 21 | ``` 22 | 23 | --- 24 | ### **How to get the result?** 25 | 26 | The result of this tool is a new `iptables` rule on every node. You can verify that the rule was successfully applied: 27 | 28 | 1. **SSH into any node** in your cluster. 29 | 2. **List the firewall rules** in the `mangle` table. You should see the new rule with the "drop-small-mss" comment at the top of the `PREROUTING` chain. 30 | ```bash 31 | iptables -t mangle -L PREROUTING -v 32 | ``` 33 | 34 | --- 35 | ### **How to remove it?** 36 | 37 | You can delete the DaemonSet to prevent the rule from being applied to any new nodes that join the cluster. 38 | 39 | ```bash 40 | kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/drop-small-mss/drop-small-mss.yaml -------------------------------------------------------------------------------- /containerd/containerd-nofile-infinity/README.md: -------------------------------------------------------------------------------- 1 | # Configure Containerd NOFILE Limit 2 | 3 | This guide outlines the steps to configure the `LimitNOFILE` setting for the containerd service on GKE nodes. This is typically used to increase the maximum number of open file descriptors allowed for containerd, which can be beneficial for high-concurrency workloads or specific applications that require a large number of open files. Since containerd 2.0 the `LimitNOFILE` has been removed - see containerd/containerd#8924 for more details. 
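Concretely, the change amounts to a systemd drop-in for the containerd unit. The manifest writes it to `/etc/systemd/system/containerd.service.d/40-LimitNOFILE-infinity.conf` on the host; a minimal sketch of such a drop-in:

```ini
# 40-LimitNOFILE-infinity.conf -- drop-in for containerd.service
# Raises the service's open-file-descriptor limit.
[Service]
LimitNOFILE=infinity
```

A drop-in only takes effect after `systemctl daemon-reload` and a restart of the service, which is why the DaemonSet restarts containerd after writing the file.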
4 | 5 | ## Prerequisite for GKE Autopilot Clusters 6 | Deploy the `AllowlistSynchronizer` resource in `containerd-nofile-infinity-allowlist.yaml`. This resource updates [Autopilot's security policies](https://cloud.google.com/kubernetes-engine/docs/how-to/run-autopilot-partner-workloads#about-allowlistsynchronizer) to allow running the privileged daemonset. 7 | 8 | ```bash 9 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/containerd/containerd-nofile-infinity/containerd-nofile-infinity-allowlist.yaml 10 | ``` 11 | 12 | ## Instructions 13 | 14 | 1. Deploy the daemonset in `containerd-nofile-infinity.yaml`. This DaemonSet runs a privileged container that modifies the `containerd.service` systemd unit on each node to set `LimitNOFILE=infinity` and then restarts the Containerd service. 15 | 16 | ```bash 17 | kubectl apply -f containerd-nofile-infinity.yaml 18 | ``` 19 | 20 | ## Note 21 | **This DaemonSet is specifically allowlisted for GKE Autopilot clusters.** Attempting to deploy privileged DaemonSets that modify the underlying host OS on Autopilot may lead to unexpected behavior or stability issues, or may be prevented by Autopilot's security policies. If you need to make a change, please ask your Google Cloud sales representative to reach out to the GKE Autopilot team. 22 | 23 | How to remove it? 24 | ```bash 25 | kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/containerd/containerd-nofile-infinity/containerd-nofile-infinity-allowlist.yaml 26 | ``` -------------------------------------------------------------------------------- /troubleshooting/enable-kdump/ubuntu-kdump.md: -------------------------------------------------------------------------------- 1 | # kdump for Ubuntu 2 | 3 | ## Obtaining a kdump 4 | 5 | Use the `ubuntu-enable-kdump.yaml` DaemonSet to install and set up kdump on a set 6 | of nodes.
The DaemonSet uses the `enable-kdump=true` node selector, so nodes 7 | must be labeled: 8 | 9 | ``` 10 | kubectl label nodes ${NODE_NAME} enable-kdump=true 11 | ``` 12 | 13 | ## Triggering a test kdump 14 | 15 | SSH into a node and trigger a system crash with sysrq: 16 | 17 | ``` 18 | sudo -i 19 | sysctl -w kernel.sysrq=1 20 | echo c > /proc/sysrq-trigger 21 | ``` 22 | 23 | A dump will be written to `/var/crash`. 24 | 25 | ## Analyzing a kdump 26 | 27 | Create a VM for the analysis: 28 | 29 | ``` 30 | gcloud beta compute instances create dump-test-vm \ 31 | --machine-type=e2-standard-4 \ 32 | --image-family=ubuntu-1804-lts \ 33 | --image-project=ubuntu-os-cloud \ 34 | --boot-disk-size=100GB \ 35 | --boot-disk-type=pd-ssd \ 36 | --zone=us-central1-c 37 | ``` 38 | 39 | SCP the contents of `/var/crash` to dump-test-vm. 40 | 41 | Find the right deb for the correct kernel version, see 42 | [here](https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+packages?field.name_filter=linux-gke&field.status_filter=published). 43 | Obtain the URL of the deb for the Linux image with debug symbols, e.g. for 44 | `linux-gke-5.0 - 5.0.0-1046.47` the deb containing the vmlinux can be obtained 45 | [here](https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/18789932/+files/linux-image-unsigned-5.0.0-1032-gke-dbgsym_5.0.0-1032.33_amd64.ddeb). 46 | 47 | ``` 48 | gcloud compute ssh dump-test-vm 49 | sudo apt-get update && sudo apt-get install -y linux-crashdump 50 | 51 | cd ${HOME} 52 | # Location of deb for vmlinux 53 | LINUX_DEB_IMAGE_URL="https://launchpad.net/..." 54 | wget "${LINUX_DEB_IMAGE_URL}" 55 | ar -x linux-image-unsigned-5.0.0-1032-gke-dbgsym_5.0.0-1032.33_amd64.ddeb 56 | mkdir debug_image 57 | tar -xf data.tar.xz -C debug_image/ 58 | 59 | # Contents of /var/crash from the crash dump. 60 | CRASH_DUMP="var/crash/SOME_TIMESTAMP/dump.SOME_TIMESTAMP" 61 | 62 | # Start debugging!
63 | crash debug_image/usr/lib/debug/boot/vmlinux-5.0.0-1032-gke ${CRASH_DUMP} 64 | ``` 65 | -------------------------------------------------------------------------------- /troubleshooting/ssh-server-config/set-login-grace-time.yaml: -------------------------------------------------------------------------------- 1 | kind: DaemonSet 2 | apiVersion: apps/v1 3 | metadata: 4 | name: ssh-server-config 5 | namespace: kube-system 6 | labels: 7 | app: ssh-server-config 8 | spec: 9 | selector: 10 | matchLabels: 11 | app: ssh-server-config 12 | template: 13 | metadata: 14 | labels: 15 | app: ssh-server-config 16 | spec: 17 | hostPID: true 18 | initContainers: 19 | - name: ssh-server-config 20 | image: gke.gcr.io/debian-base:bookworm-v1.0.3-gke.0@sha256:91b29592ee0b782c0ab777bfcabd14a0ae83d8e8eb90d3f0eb500acafae3f4e5 21 | securityContext: 22 | privileged: true 23 | command: 24 | - /bin/sh 25 | - -c 26 | - | 27 | set -e 28 | set -u 29 | if [ ! -e "/etc/ssh/sshd_config" ] ; then 30 | echo "/etc/ssh/sshd_config not found" 31 | exit 1 32 | fi 33 | 34 | cp /etc/ssh/sshd_config /etc/ssh/sshd_config.cp 35 | if grep -q "^LoginGraceTime" "/etc/ssh/sshd_config.cp"; then 36 | # Update existing LoginGraceTime 37 | sed -i "s/^LoginGraceTime.*/LoginGraceTime 0/" "/etc/ssh/sshd_config.cp" 38 | else 39 | # Add new LoginGraceTime 40 | echo "LoginGraceTime 0" >> "/etc/ssh/sshd_config.cp" 41 | fi 42 | 43 | cp /etc/ssh/sshd_config.cp /etc/ssh/sshd_config 44 | rm /etc/ssh/sshd_config.cp 45 | 46 | 47 | EXEC="nsenter -t 1 -m -p --" 48 | $EXEC systemctl reload sshd 49 | echo "sshd logingracetime after restart:" 50 | $EXEC sshd -T | grep logingracetime 51 | volumeMounts: 52 | - name: sshd-config 53 | mountPath: /etc/ssh/sshd_config 54 | containers: 55 | - name: pause-container 56 | image: gke.gcr.io/pause:3.7@sha256:5b658f3c4f034a9619ad7e6d1ee49ee532a1e0a598dc68b06d17b6036116b924 57 | volumes: 58 | - name: sshd-config 59 | hostPath: 60 | path: /etc/ssh/sshd_config 61 | type: File 62 | 63 | 64 | 65 
| -------------------------------------------------------------------------------- /kubelet/kubelet-log-config/READ.md: -------------------------------------------------------------------------------- 1 | # kubelet-log-config.yaml (Container Log Rotation) 2 | 3 | The `kubelet-log-config` tool is a Kubernetes DaemonSet that configures container log rotation settings on a node by modifying the kubelet's configuration file. This tool is important for managing disk space on your nodes by controlling the size and number of log files kept for each container. 🪵 4 | 5 | --- 6 | ### ⚠️ Warning 7 | This tool runs as **privileged** and **restarts the kubelet service** on the node. A kubelet restart can disrupt running pods. It is highly recommended to apply this configuration to a new, empty node pool and then safely migrate your workloads to the newly configured nodes. 8 | 9 | --- 10 | ### **Prerequisites** 11 | 12 | This tool only runs on nodes with a specific label. You must label the node(s) you want to configure before applying the DaemonSet. 13 | 14 | 1. **Label the node** (replace `${NODE_NAME}` with the node's name): 15 | ```bash 16 | kubectl label node ${NODE_NAME} kubelet-log-config=true 17 | ``` 18 | 19 | --- 20 | ### **How to use it?** 21 | 22 | 1. Save the script to a file named `kubelet-log-config.yaml`. 23 | 2. Before applying, you can edit the file to change the values for `CONTAINER_LOG_MAX_SIZE` (e.g., "20Mi") and `CONTAINER_LOG_MAX_FILES` (e.g., "10"). 24 | 3. Apply the DaemonSet to your cluster: 25 | ```bash 26 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/kubelet/kubelet-log-config/kubelet-log-config.yaml 27 | ``` 28 | 29 | --- 30 | ### **How to get the result?** 31 | 32 | You can verify that the script ran successfully in two ways: 33 | 34 | 1. **Check the pod logs:** The pod runs in the `kube-system` namespace and will output "Success!" upon completion. 35 | ```bash 36 | kubectl logs -n kube-system -l name=kubelet-log-config 37 | ``` 38 | 2.
**Inspect the node's configuration:** SSH into the labeled node and view the kubelet config file to see the new values. 39 | ```bash 40 | cat /home/kubernetes/kubelet-config.yaml 41 | ``` 42 | 43 | --- 44 | ### **How to remove it?** 45 | 46 | You can remove the DaemonSet to prevent it from configuring any new nodes. 47 | 48 | ```bash 49 | kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/kubelet/kubelet-log-config/kubelet-log-config.yaml -------------------------------------------------------------------------------- /containerd/socket-tracer/cri-v1alpha2-api-deprecation-reporter.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: DaemonSet 3 | metadata: 4 | name: cri-v1alpha2-api-deprecation-reporter 5 | namespace: default 6 | labels: 7 | k8s-app: cri-v1alpha2-api-deprecation-reporter 8 | spec: 9 | selector: 10 | matchLabels: 11 | name: cri-v1alpha2-api-deprecation-reporter 12 | template: 13 | metadata: 14 | labels: 15 | name: cri-v1alpha2-api-deprecation-reporter 16 | annotations: 17 | autopilot.gke.io/no-connect: "true" 18 | spec: 19 | hostPID: true 20 | containers: 21 | - name: reporter 22 | image: mirror.gcr.io/ubuntu:24.04 23 | command: ["/bin/sh", "-c"] 24 | args: 25 | - | 26 | apt-get update && apt-get install -y jq 27 | 28 | echo "time=\"$(date -u +'%Y-%m-%dT%H:%M:%SZ')\" msg=\"checking for CRI v1alpha2 API deprecation warnings\" node=\"$NODE_NAME\"" 29 | 30 | while true; do 31 | DEPRECATION_DATA=$(nsenter -at 1 -- /usr/bin/ctr deprecations list --format json) 32 | V1ALPHA2_WARNING=$(echo "$DEPRECATION_DATA" | jq '.[] | select(.id == "io.containerd.deprecation/cri-api-v1alpha2")') 33 | if [ -n "$V1ALPHA2_WARNING" ]; then 34 | LAST_OCCURRENCE=$(echo $V1ALPHA2_WARNING | jq -r .lastOccurrence) 35 | echo "time=\"$(date -u +'%Y-%m-%dT%H:%M:%SZ')\" msg=\"found CRI v1alpha2 API deprecation warning\" node=\"$NODE_NAME\" 
lastOccurrence=\"$LAST_OCCURRENCE\"" 36 | else 37 | echo "time=\"$(date -u +'%Y-%m-%dT%H:%M:%SZ')\" msg=\"CRI v1alpha2 API deprecation warning not found on this node\" node=\"$NODE_NAME\"" 38 | fi 39 | 40 | # NOTE: You can update this interval as needed. 41 | sleep $INTERVAL 42 | done 43 | env: 44 | - name: NODE_NAME 45 | valueFrom: 46 | fieldRef: 47 | fieldPath: spec.nodeName 48 | - name: INTERVAL 49 | value: "60" 50 | securityContext: 51 | # Privileged is required to use 'nsenter' to enter the host's PID 52 | # namespace to run 'ctr' on the node. 53 | privileged: true 54 | tolerations: 55 | - key: "node.kubernetes.io/not-ready" 56 | operator: "Exists" 57 | effect: "NoExecute" 58 | -------------------------------------------------------------------------------- /troubleshooting/ssh-server-config/set-login-grace-time-gdcso-vmware.yaml: -------------------------------------------------------------------------------- 1 | kind: DaemonSet 2 | apiVersion: apps/v1 3 | metadata: 4 | name: ssh-server-config 5 | namespace: kube-system 6 | labels: 7 | app: ssh-server-config 8 | spec: 9 | selector: 10 | matchLabels: 11 | app: ssh-server-config 12 | template: 13 | metadata: 14 | labels: 15 | app: ssh-server-config 16 | spec: 17 | hostPID: true 18 | tolerations: 19 | - operator: Exists 20 | initContainers: 21 | - name: ssh-server-config 22 | image: gke.gcr.io/debian-base:bookworm-v1.0.3-gke.0@sha256:91b29592ee0b782c0ab777bfcabd14a0ae83d8e8eb90d3f0eb500acafae3f4e5 23 | securityContext: 24 | privileged: true 25 | command: 26 | - /bin/sh 27 | - -c 28 | - | 29 | set -e 30 | set -u 31 | if [ ! 
-e "/etc/ssh/sshd_config" ] ; then 32 | echo "/etc/ssh/sshd_config not found" 33 | exit 1 34 | fi 35 | 36 | cp /etc/ssh/sshd_config /etc/ssh/sshd_config.cp 37 | if grep -q "^LoginGraceTime" "/etc/ssh/sshd_config.cp"; then 38 | # Update existing LoginGraceTime 39 | sed -i "s/^LoginGraceTime.*/LoginGraceTime 0/" "/etc/ssh/sshd_config.cp" 40 | else 41 | # Add new LoginGraceTime 42 | echo "LoginGraceTime 0" >> "/etc/ssh/sshd_config.cp" 43 | fi 44 | 45 | cp /etc/ssh/sshd_config.cp /etc/ssh/sshd_config 46 | rm /etc/ssh/sshd_config.cp 47 | 48 | 49 | EXEC="nsenter -t 1 -m -p --" 50 | $EXEC systemctl reload sshd 51 | echo "sshd logingracetime after restart:" 52 | $EXEC sshd -T | grep logingracetime 53 | resources: 54 | requests: 55 | memory: 5Mi 56 | cpu: 5m 57 | volumeMounts: 58 | - name: sshd-config 59 | mountPath: /etc/ssh/sshd_config 60 | containers: 61 | - name: pause-container 62 | image: gke.gcr.io/pause:3.7@sha256:5b658f3c4f034a9619ad7e6d1ee49ee532a1e0a598dc68b06d17b6036116b924 63 | volumes: 64 | - name: sshd-config 65 | hostPath: 66 | path: /etc/ssh/sshd_config 67 | type: File 68 | 69 | 70 | -------------------------------------------------------------------------------- /gvisor/enable-gvisor-flags.yaml: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 
14 | 15 | # Deploy this DaemonSet to enable flag override to gVisor pods. To set flags for 16 | # a given pod, add pod annotations with the following format: 17 | # dev.gvisor.flag.: 18 | # 19 | # Here is an example that enables "debug-log", "debug", and "strace" flags: 20 | # metadata: 21 | # annotations: 22 | # dev.gvisor.flag.debug-log: "/tmp/sandbox-%ID/" 23 | # dev.gvisor.flag.debug: "true" 24 | # dev.gvisor.flag.strace: "true" 25 | # 26 | # Note: this is supported starting from 1.18.6-gke.3504. 27 | 28 | apiVersion: apps/v1 29 | kind: DaemonSet 30 | metadata: 31 | name: enable-gvisor-flags 32 | namespace: kube-system 33 | spec: 34 | selector: 35 | matchLabels: 36 | name: enable-gvisor-flags 37 | updateStrategy: 38 | type: RollingUpdate 39 | template: 40 | metadata: 41 | labels: 42 | name: enable-gvisor-flags 43 | spec: 44 | tolerations: 45 | - operator: Exists 46 | volumes: 47 | - name: host 48 | hostPath: 49 | path: / 50 | initContainers: 51 | - name: enable-gvisor-flags 52 | image: ubuntu 53 | command: 54 | - /bin/bash 55 | - -c 56 | - echo -e ' allow-flag-override = "true"' >> "/host/run/containerd/runsc/config.toml" 57 | volumeMounts: 58 | - name: host 59 | mountPath: /host 60 | resources: 61 | requests: 62 | memory: 5Mi 63 | cpu: 5m 64 | securityContext: 65 | privileged: true 66 | containers: 67 | - image: gke.gcr.io/pause:3.8@sha256:880e63f94b145e46f1b1082bb71b85e21f16b99b180b9996407d61240ceb9830 68 | name: pause 69 | nodeSelector: 70 | "sandbox.gke.io/runtime": "gvisor" 71 | -------------------------------------------------------------------------------- /troubleshooting/perf/perf-record.yaml: -------------------------------------------------------------------------------- 1 | # Copyright 2019 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 
5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # 15 | # 16 | # 17 | # Change TARGET_PGREP below to names of processes to trace. 18 | # Also change KERNEL_VERSION below to version being tested. 19 | 20 | apiVersion: apps/v1 21 | kind: DaemonSet 22 | metadata: 23 | name: enable-perf-record 24 | labels: 25 | app: enable-perf-record 26 | spec: 27 | selector: 28 | matchLabels: 29 | name: enable-perf-record 30 | template: 31 | metadata: 32 | labels: 33 | name: enable-perf-record 34 | spec: 35 | nodeSelector: 36 | "enable-perf": "true" 37 | hostPID: true 38 | volumes: 39 | - name: host 40 | hostPath: 41 | path: / 42 | containers: 43 | - name: enable-perf-record 44 | image: gke.gcr.io/debian-base 45 | imagePullPolicy: Always 46 | volumeMounts: 47 | - name: host 48 | mountPath: /host 49 | securityContext: 50 | privileged: true 51 | command: 52 | - /bin/bash 53 | - -c 54 | - | 55 | 56 | set -o errexit 57 | set -o pipefail 58 | set -o nounset 59 | 60 | KERNEL_VERSION="5.0.0" 61 | 62 | apt-get update && apt-get install -y curl procps daemontools build-essential bison flex libelf-dev binutils-dev 63 | curl -O https://mirrors.edge.kernel.org/pub/linux/kernel/tools/perf/v"${KERNEL_VERSION}"/perf-"${KERNEL_VERSION}".tar.gz 64 | tar xzf perf-"${KERNEL_VERSION}".tar.gz 65 | make -C perf-"${KERNEL_VERSION}"/tools/perf install 66 | PERF="/root/bin/perf" 67 | 68 | d=$(date '+%Y-%m-%dT%H:%M:%SZ') 69 | out_dir="/host/var/log/perf_record/${d}" 70 | mkdir -p "${out_dir}" 71 | cd "${out_dir}" 72 | echo "starting perf recording! 
will dump to ${out_dir}" 73 | 74 | "${PERF}" record -F 999 -g --timestamp-filename --switch-output="10s" > /dev/null 75 | -------------------------------------------------------------------------------- /containerd/container-insecure-registry/README.md: -------------------------------------------------------------------------------- 1 | # Configure Insecure Registries for Containerd on GKE 2 | 3 | This guide outlines how to configure Google Kubernetes Engine (GKE) nodes to pull container images from an insecure (HTTP) registry. This is achieved by deploying a `DaemonSet` that modifies the `containerd` configuration on each node to trust the specified registry. This is typically used for accessing local or private container registries that are not configured with TLS. 4 | 5 | --- 6 | 7 | ## Instructions 8 | 9 | 1. **Modify the DaemonSet YAML file.** Before applying the manifest, you must edit it to specify the address of your insecure registry. Locate the `env` section for the `startup-script` container and change the `value` for the `ADDRESS` variable from `REGISTRY_ADDRESS` to your registry's actual address (e.g., `my-registry.local:5000`). 10 | 11 | **Original:** 12 | ```yaml 13 | env: 14 | - name: ADDRESS 15 | value: "REGISTRY_ADDRESS" 16 | ``` 17 | **Example after editing:** 18 | ```yaml 19 | env: 20 | - name: ADDRESS 21 | value: "my-registry.local:5000" 22 | ``` 23 | 2. **Deploy the DaemonSet.** Apply your modified YAML file to your cluster. The `DaemonSet` will create a pod on each GKE node that uses the `containerd` runtime. 24 | 25 | ```bash 26 | kubectl apply -f your-daemonset-filename.yaml 27 | ``` 28 | The script within the pod will then automatically update the `containerd` configuration and restart the service to apply the changes. 29 | 30 | --- 31 | 32 | ## How It Works 33 | 34 | The `DaemonSet` runs a privileged pod on each node selected by the `nodeSelector` (`cloud.google.com/gke-container-runtime: "containerd"`). 
This pod executes a startup script that: 35 | 1. Identifies the `containerd` configuration file (`/etc/containerd/config.toml`). 36 | 2. Detects which configuration model the node is using (legacy V1 or modern V2). 37 | 3. Injects the necessary configuration to define your registry as an insecure endpoint (`http://...`). 38 | 4. Reloads the `systemd` daemon and restarts the `containerd` service to make the changes take effect. 39 | 40 | --- 41 | 42 | ## Important Considerations 43 | 44 | * **Security Risk:** Configuring an insecure registry means that image pulls are unencrypted (HTTP) and unauthenticated, which can be a security risk. This should only be used in trusted, isolated network environments. 45 | * **GKE Autopilot:** This method uses a privileged `DaemonSet` that modifies the host node's file system. This is **not compatible with GKE Autopilot clusters**, which restrict this level of host access. This solution is intended for GKE Standard clusters only. -------------------------------------------------------------------------------- /gvisor/README.md: -------------------------------------------------------------------------------- 1 | # enable-gvisor-flags.yaml (gVisor Pod Flag Overrides) 2 | 3 | The `enable-gvisor-flags` tool is a Kubernetes DaemonSet that modifies the gVisor (`runsc`) configuration on GKE Sandbox nodes to allow runtime flags to be set on a per-pod basis. This tool is important for advanced debugging and customization of sandboxed pods by enabling specific gVisor features like `strace` or debug logging. 🔬 4 | 5 | --- 6 | ### ⚠️ Warning 7 | This tool runs as **privileged** to modify a system configuration file. The flags you enable via pod annotations can alter the security, stability, and performance characteristics of your sandboxed workloads. Use these flags with a clear understanding of their function. 
8 | 9 | --- 10 | ### **Prerequisites** 11 | 12 | This DaemonSet is designed to run automatically on nodes that are part of a **GKE Sandbox node pool**, which uses the gVisor runtime. It will not run on standard node pools. This feature is supported on GKE versions `1.18.6-gke.3504` and later. 13 | 14 | --- 15 | ### **How to use it?** 16 | 17 | 1. Save the script to a file named `enable-gvisor-flags.yaml`. 18 | 2. Apply the DaemonSet to your cluster. It will automatically find and configure all gVisor-enabled nodes. 19 | ```bash 20 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/gvisor/enable-gvisor-flags.yaml 21 | ``` 22 | 23 | --- 24 | ### **How to get the result?** 25 | 26 | The result of this tool is the **ability to set gVisor flags using pod annotations**. After the DaemonSet has run, you can enable flags on any pod scheduled to a GKE Sandbox node. 27 | 28 | 1. **Add annotations to your pod's metadata.** Here is an example that enables debug logging and `strace` for a specific pod: 29 | ```yaml 30 | apiVersion: v1 31 | kind: Pod 32 | metadata: 33 | name: my-sandboxed-pod 34 | annotations: 35 | dev.gvisor.flag.debug-log: "/tmp/sandbox-%ID/" 36 | dev.gvisor.flag.debug: "true" 37 | dev.gvisor.flag.strace: "true" 38 | spec: 39 | runtimeClassName: gvisor 40 | containers: 41 | - name: my-container 42 | image: nginx 43 | ``` 44 | 45 | 2. **To verify the configuration was applied to the node,** you can SSH into a gVisor node and check the contents of the `runsc` config file. It should contain the new line. 46 | ```bash 47 | cat /run/containerd/runsc/config.toml 48 | ``` 49 | 50 | --- 51 | ### **How to remove it?** 52 | 53 | You can remove the DaemonSet to prevent it from configuring any new gVisor nodes. 
54 | 55 | ```bash 56 | kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/gvisor/enable-gvisor-flags.yaml -------------------------------------------------------------------------------- /containerd/container-insecure-registry/insecure-registry-config.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: DaemonSet 3 | metadata: 4 | name: insecure-registries 5 | namespace: default 6 | labels: 7 | k8s-app: insecure-registries 8 | spec: 9 | selector: 10 | matchLabels: 11 | name: insecure-registries 12 | updateStrategy: 13 | type: RollingUpdate 14 | template: 15 | metadata: 16 | labels: 17 | name: insecure-registries 18 | spec: 19 | nodeSelector: 20 | cloud.google.com/gke-container-runtime: "containerd" 21 | hostPID: true 22 | containers: 23 | - name: startup-script 24 | image: gke.gcr.io/startup-script:v2 25 | imagePullPolicy: Always 26 | securityContext: 27 | privileged: true 28 | env: 29 | - name: ADDRESS 30 | value: "REGISTRY_ADDRESS" 31 | - name: STARTUP_SCRIPT 32 | value: | 33 | set -o errexit 34 | set -o pipefail 35 | set -o nounset 36 | 37 | if [[ -z "$ADDRESS" || "$ADDRESS" == "REGISTRY_ADDRESS" ]]; then 38 | echo "Error: Environment variable ADDRESS is not set in containers.spec.env" 39 | exit 1 40 | fi 41 | 42 | echo "Allowlisting insecure registries..." 43 | containerd_config="/etc/containerd/config.toml" 44 | hostpath=$(sed -nr 's; config_path = "([-/a-z0-9_.]+)";\1;p' "$containerd_config") 45 | if [[ -z "$hostpath" ]]; then 46 | echo "Node uses CRI config model V1 (deprecated), adding mirror under $containerd_config..." 
47 | grep -qxF '[plugins."io.containerd.grpc.v1.cri".registry.mirrors."'$ADDRESS'"]' "$containerd_config" || \ 48 | echo -e '[plugins."io.containerd.grpc.v1.cri".registry.mirrors."'$ADDRESS'"]\n endpoint = ["http://'$ADDRESS'"]' >> "$containerd_config" 49 | else 50 | host_config_dir="$hostpath/$ADDRESS" 51 | host_config_file="$host_config_dir/hosts.toml" 52 | echo "Node uses CRI config model V2, adding mirror under $host_config_file..." 53 | if [[ ! -e "$host_config_file" ]]; then 54 | mkdir -p "$host_config_dir" 55 | echo -e "server = \"https://$ADDRESS\"\n" > "$host_config_file" 56 | fi 57 | echo -e "[host.\"http://$ADDRESS\"]\n capabilities = [\"pull\", \"resolve\"]\n" >> "$host_config_file" 58 | fi 59 | echo "Reloading systemd management configuration" 60 | systemctl daemon-reload 61 | echo "Restarting containerd..." 62 | systemctl restart containerd 63 | -------------------------------------------------------------------------------- /troubleshooting/perf/README.md: -------------------------------------------------------------------------------- 1 | # Kubernetes Node Profiling Tools 2 | 3 | These tools deploy the Linux `perf` utility to profile and trace performance on specific Kubernetes nodes. 4 | 5 | --- 6 | ## **Prerequisites for Both Tools** 7 | 8 | These tools only run on nodes with a specific label. You must label the node you want to profile before using them. 9 | 10 | 1. **Label the node:** 11 | ```bash 12 | kubectl label node <node-name> enable-perf=true 13 | ``` 14 | 2. **Check Kernel Version:** You must edit the downloaded YAML file to ensure the `KERNEL_VERSION` variable matches your node's kernel (`uname -r`). 15 | 16 | --- 17 | ### **`perf-record.yaml` (CPU Profiling)** 18 | 19 | The `perf-record` tool is a Kubernetes DaemonSet that continuously profiles CPU usage on a node. This tool is important for discovering which programs and functions are consuming the most CPU time.
🏎️ 20 | 21 | #### ⚠️ Warning 22 | This tool runs as **privileged** and adds performance overhead. Use it only for debugging and remove it when finished. 23 | 24 | #### How to use it? 25 | Apply it to a labeled node by running the following command. 26 | ``` 27 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/troubleshooting/perf/perf-record.yaml 28 | ``` 29 | 30 | #### How to get the result? 31 | The profiling data is not in the pod logs. It is saved directly on the node's filesystem. SSH into your node and find the `perf.data` files in a new timestamped directory inside: 32 | /var/log/perf_record/ 33 | 34 | --- 35 | ### **`perf-trace.yaml` (System Call Tracing)** 36 | 37 | The `perf-trace` tool is a Kubernetes DaemonSet that traces system calls made by programs on a node. This tool is important for debugging low-level application errors related to file access, permissions, or networking. 🐞 38 | 39 | #### ⚠️ Warning 40 | This tool runs as **privileged** and adds performance overhead. Use it only for debugging and remove it when finished. 41 | 42 | #### How to use it? 43 | Apply it to a labeled node by running the following command. You can edit the YAML to change the `TARGET_PGREP` environment variable to trace a specific process (e.g., "containerd") or leave it empty to trace the whole system. 44 | ``` 45 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/troubleshooting/perf/perf-trace.yaml 46 | ``` 47 | 48 | #### How to get the result? 49 | The trace data is not in the pod logs. It is saved directly on the node's filesystem. SSH into your node and find the log files in a new timestamped directory inside: 50 | ``` 51 | /var/log/perf_trace/ 52 | ``` 53 | 54 | #### How to remove it? 
55 | ``` 56 | kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/troubleshooting/perf/perf-trace.yaml 57 | ``` -------------------------------------------------------------------------------- /disable-smt/gke/enable-smt.yaml: -------------------------------------------------------------------------------- 1 | # Copyright 2019 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | 15 | 16 | # Deploy this DaemonSet to enable hyper-threading on the nodes with the 17 | # "cloud.google.com/gke-smt-disabled=false" label. 18 | # 19 | # WARNING: Enabling hyper-threading may make the node vulnerable to 20 | # Microarchitectural Data Sampling (MDS). Please ensure that this is acceptable 21 | # before deploying this to your production clusters. 22 | # 23 | # WARNING: Enabling hyper-threading requires node reboot. Therefore, in order 24 | # to avoid disrupting your workloads, it is recommended to create a new node 25 | # pool with the "cloud.google.com/gke-smt-disabled=false" label in your cluster, 26 | # deploy the DaemonSet to enable hyper-threading in that node pool, and then 27 | # migrate your workloads to the new node pool. 28 | 29 | # 30 | # NOTE: 31 | # It's recommended to use the --threads-per-core flag on the node-pool to 32 | # configure SMT setting on nodes. 
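# For example, SMT can be enabled when creating the node pool itself (a
# sketch; "my-cluster" and "smt-on-pool" are hypothetical names):
#
#   gcloud container node-pools create smt-on-pool \
#     --cluster=my-cluster \
#     --threads-per-core=2
#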
33 | # https://cloud.google.com/kubernetes-engine/docs/how-to/configure-smt 34 | # 35 | 36 | apiVersion: apps/v1 37 | kind: DaemonSet 38 | metadata: 39 | name: enable-smt 40 | namespace: kube-system 41 | spec: 42 | selector: 43 | matchLabels: 44 | name: enable-smt 45 | updateStrategy: 46 | type: RollingUpdate 47 | template: 48 | metadata: 49 | labels: 50 | name: enable-smt 51 | spec: 52 | tolerations: 53 | - operator: Exists 54 | volumes: 55 | - name: host 56 | hostPath: 57 | path: / 58 | hostPID: true 59 | initContainers: 60 | - name: smt 61 | image: bash 62 | command: 63 | - /usr/local/bin/bash 64 | - -c 65 | - | 66 | set -euo pipefail 67 | echo "SMT is set to $(cat /host/sys/devices/system/cpu/smt/control)" 68 | echo "Setting SMT to on"; 69 | echo -n "on" > /host/sys/devices/system/cpu/smt/control 70 | echo "Restarting Kubelet..." 71 | chroot /host nsenter --target=1 --all -- systemctl restart kubelet.service 72 | volumeMounts: 73 | - name: host 74 | mountPath: /host 75 | resources: 76 | requests: 77 | memory: 5Mi 78 | cpu: 5m 79 | securityContext: 80 | privileged: true 81 | containers: 82 | - image: gcr.io/google-containers/pause:3.2 83 | name: pause 84 | # Ensures that the pods will only run on the nodes having the certain 85 | # label. 86 | nodeSelector: 87 | "cloud.google.com/gke-smt-disabled": "false" 88 | -------------------------------------------------------------------------------- /disable-smt/gke/disable-smt.yaml: -------------------------------------------------------------------------------- 1 | # Copyright 2019 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 
5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | 15 | 16 | # Deploy this DaemonSet to disable hyper-threading on the nodes with the 17 | # "cloud.google.com/gke-smt-disabled=true" label. 18 | # 19 | # WARNING: Disabling hyper-threading might have severe performance impact on 20 | # your clusters and application. Please ensure that this is acceptable before 21 | # deploying this to your production clusters. 22 | # 23 | # WARNING: Disabling hyper-threading requires node reboot. Therefore, in order 24 | # to avoid disrupting your workloads, it is recommended to create a new node 25 | # pool with the "cloud.google.com/gke-smt-disabled=true" label in your cluster, 26 | # deploy the DaemonSet to disable hyper-threading in that node pool, and then 27 | # migrate your workloads to the new node pool. 28 | 29 | # 30 | # NOTE: 31 | # It's recommended to use the --threads-per-core flag on the node-pool to 32 | # configure SMT setting on nodes. 
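# For example, SMT can be disabled when creating the node pool itself (a
# sketch; "my-cluster" and "smt-off-pool" are hypothetical names):
#
#   gcloud container node-pools create smt-off-pool \
#     --cluster=my-cluster \
#     --threads-per-core=1
#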
33 | # https://cloud.google.com/kubernetes-engine/docs/how-to/configure-smt 34 | # 35 | 36 | apiVersion: apps/v1 37 | kind: DaemonSet 38 | metadata: 39 | name: disable-smt 40 | namespace: kube-system 41 | spec: 42 | selector: 43 | matchLabels: 44 | name: disable-smt 45 | updateStrategy: 46 | type: RollingUpdate 47 | template: 48 | metadata: 49 | labels: 50 | name: disable-smt 51 | spec: 52 | tolerations: 53 | - operator: Exists 54 | volumes: 55 | - name: host 56 | hostPath: 57 | path: / 58 | hostPID: true 59 | initContainers: 60 | - name: smt 61 | image: bash 62 | command: 63 | - /usr/local/bin/bash 64 | - -c 65 | - | 66 | set -euo pipefail 67 | echo "SMT is set to $(cat /host/sys/devices/system/cpu/smt/control)" 68 | echo "Setting SMT to off" 69 | echo -n "off" > /host/sys/devices/system/cpu/smt/control 70 | echo "Restarting Kubelet..." 71 | chroot /host nsenter --target=1 --all -- systemctl restart kubelet.service 72 | volumeMounts: 73 | - name: host 74 | mountPath: /host 75 | resources: 76 | requests: 77 | memory: 5Mi 78 | cpu: 5m 79 | securityContext: 80 | privileged: true 81 | containers: 82 | - image: gcr.io/google-containers/pause:3.2 83 | name: pause 84 | # Ensures that the pods will only run on the nodes having the certain 85 | # label. 
86 | nodeSelector: 87 | "cloud.google.com/gke-smt-disabled": "true" 88 | -------------------------------------------------------------------------------- /containerd/containerd-http-proxy/configure_http_proxy.yaml: -------------------------------------------------------------------------------- 1 | kind: DaemonSet 2 | apiVersion: apps/v1 3 | metadata: 4 | name: containerd-http-proxy 5 | spec: 6 | selector: 7 | matchLabels: 8 | name: containerd-http-proxy 9 | template: 10 | metadata: 11 | labels: 12 | name: containerd-http-proxy 13 | spec: 14 | hostPID: true 15 | volumes: 16 | - name: systemd-containerd-service 17 | hostPath: 18 | path: /etc/systemd/system/containerd.service.d 19 | type: DirectoryOrCreate 20 | initContainers: 21 | - name: startup-script 22 | image: gke.gcr.io/debian-base:bookworm-v1.0.0-gke.1 23 | imagePullPolicy: IfNotPresent 24 | securityContext: 25 | privileged: true 26 | volumeMounts: 27 | - name: systemd-containerd-service 28 | mountPath: /etc/systemd/system/containerd.service.d 29 | command: 30 | - /bin/sh 31 | - -c 32 | - | 33 | set -e 34 | set -u 35 | 36 | validate_proxy() { 37 | input_string=$1 38 | 39 | if echo "$input_string" | grep -q ' '; then 40 | echo "Error: Input cannot contain spaces. Input: '$input_string'" 41 | exit 1 42 | fi 43 | 44 | if echo "$input_string" | sed 1d | grep -q .; then 45 | echo "Error: Input cannot contain newline. Input: '$input_string'" 46 | exit 1 47 | fi 48 | 49 | if echo "$input_string" | grep -q -e '"' -e "'"; then 50 | echo "Error: Input cannot contain quotes. 
Input: '$input_string'" 51 | exit 1 52 | fi 53 | } 54 | 55 | validate_proxy "${HTTP_PROXY:-}" 56 | validate_proxy "${HTTPS_PROXY:-}" 57 | validate_proxy "${NO_PROXY:-localhost}" 58 | 59 | cat > /etc/systemd/system/containerd.service.d/http-proxy.conf << EOF 60 | [Service] 61 | Environment="HTTP_PROXY=${HTTP_PROXY:-}" 62 | Environment="HTTPS_PROXY=${HTTPS_PROXY:-}" 63 | Environment="NO_PROXY=${NO_PROXY:-localhost}" 64 | EOF 65 | 66 | echo "Wrote the following containerd proxy drop-in:" 67 | cat /etc/systemd/system/containerd.service.d/http-proxy.conf 68 | 69 | nsenter -t 1 -m -- systemctl daemon-reload 70 | nsenter -t 1 -m -- systemctl restart containerd 71 | 72 | echo "containerd restarted with proxy configuration" >&2 73 | env: 74 | - name: HTTP_PROXY 75 | valueFrom: 76 | configMapKeyRef: 77 | name: containerd-proxy-configmap 78 | key: HTTP_PROXY 79 | - name: HTTPS_PROXY 80 | valueFrom: 81 | configMapKeyRef: 82 | name: containerd-proxy-configmap 83 | key: HTTPS_PROXY 84 | - name: NO_PROXY 85 | valueFrom: 86 | configMapKeyRef: 87 | name: containerd-proxy-configmap 88 | key: NO_PROXY 89 | optional: true 90 | containers: 91 | - name: pause-container 92 | image: gke.gcr.io/pause:3.7 93 | imagePullPolicy: IfNotPresent 94 | 95 | -------------------------------------------------------------------------------- /troubleshooting/perf/perf-trace.yaml: -------------------------------------------------------------------------------- 1 | # Copyright 2019 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # 15 | # 16 | # 17 | # Change TARGET_PGREP below to names of processes to trace. 18 | # Also change KERNEL_VERSION below to version being tested.
19 | 20 | apiVersion: apps/v1 21 | kind: DaemonSet 22 | metadata: 23 | name: enable-perf-trace 24 | labels: 25 | app: enable-perf-trace 26 | spec: 27 | selector: 28 | matchLabels: 29 | name: enable-perf-trace 30 | template: 31 | metadata: 32 | labels: 33 | name: enable-perf-trace 34 | spec: 35 | nodeSelector: 36 | "enable-perf": "true" 37 | hostPID: true 38 | volumes: 39 | - name: host 40 | hostPath: 41 | path: / 42 | containers: 43 | - name: enable-perf-trace 44 | image: gke.gcr.io/debian-base 45 | imagePullPolicy: Always 46 | volumeMounts: 47 | - name: host 48 | mountPath: /host 49 | securityContext: 50 | privileged: true 51 | env: 52 | # TARGET_PGREP options 53 | # Values: 54 | # empty or unset to do full trace 55 | # pgrep query to filter 56 | - name: TARGET_PGREP 57 | value: "kubelet" 58 | command: 59 | - /bin/bash 60 | - -c 61 | - | 62 | 63 | set -o errexit 64 | set -o pipefail 65 | set -o nounset 66 | 67 | MAX_LOG_SIZE="16777215" # 16 MB 68 | MAX_LOGS="2000" 69 | KERNEL_VERSION="5.0.0" 70 | 71 | apt-get update && apt-get install -y curl procps daemontools build-essential bison flex libelf-dev binutils-dev 72 | curl -O https://mirrors.edge.kernel.org/pub/linux/kernel/tools/perf/v"${KERNEL_VERSION}"/perf-"${KERNEL_VERSION}".tar.gz 73 | tar xzf perf-"${KERNEL_VERSION}".tar.gz 74 | make -C perf-"${KERNEL_VERSION}"/tools/perf install 75 | PERF="/root/bin/perf" 76 | 77 | d=$(date '+%Y-%m-%dT%H:%M:%SZ') 78 | 79 | out_dir="/host/var/log/perf_trace/${d}" 80 | mkdir -p "${out_dir}" 81 | 82 | echo "starting perf! 
| will dump to ${out_dir}" 83 | 84 | if [[ -z "${TARGET_PGREP:-}" ]]; then 85 | echo "full system perf trace" 86 | "${PERF}" trace |& multilog t s${MAX_LOG_SIZE} n${MAX_LOGS} "${out_dir}" 87 | else 88 | echo "PID perf trace" 89 | PIDS=$(pgrep "${TARGET_PGREP}" -d ",") 90 | echo "starting perf on pids == ${PIDS}" 91 | "${PERF}" trace --pid="${PIDS}" |& multilog t s${MAX_LOG_SIZE} n${MAX_LOGS} "${out_dir}" 92 | fi 93 | -------------------------------------------------------------------------------- /troubleshooting/enable-kdump/ubuntu-enable-kdump.yaml: -------------------------------------------------------------------------------- 1 | # Copyright 2019 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License.
14 | 15 | apiVersion: apps/v1 16 | kind: DaemonSet 17 | metadata: 18 | name: enable-kdump 19 | labels: 20 | app: enable-kdump 21 | spec: 22 | selector: 23 | matchLabels: 24 | name: enable-kdump 25 | template: 26 | metadata: 27 | labels: 28 | name: enable-kdump 29 | spec: 30 | nodeSelector: 31 | "cloud.google.com/gke-os-distribution": "ubuntu" 32 | "enable-kdump": "true" 33 | hostPID: true 34 | containers: 35 | - name: enable-kdump 36 | image: debian 37 | imagePullPolicy: Always 38 | securityContext: 39 | privileged: true 40 | command: 41 | - /usr/bin/nsenter 42 | - -t1 43 | - -m 44 | - -u 45 | - -i 46 | - -n 47 | - -p 48 | - -- 49 | - /bin/bash 50 | - -c 51 | - | 52 | 53 | set -o errexit 54 | set -o pipefail 55 | set -o nounset 56 | 57 | function check_kdump() { 58 | local kdump_show 59 | kdump_show=$(kdump-config show) 60 | if echo "${kdump_show}" | grep -q "ready to kdump"; then 61 | echo "ready to kdump!" 62 | 63 | echo "setting sysctls" 64 | sysctl -w kernel.hung_task_panic=1 65 | sysctl -w kernel.hung_task_timeout_secs=20 66 | echo "sysctls are set" 67 | else 68 | echo "kdump is not set up; not ready" 69 | fi 70 | echo "kdump-config show ==> ${kdump_show}" 71 | echo "/proc/cmdline ==> $(cat /proc/cmdline)" 72 | } 73 | 74 | function install() { 75 | echo "installing kdump" 76 | apt-get update 77 | DEBIAN_FRONTEND=noninteractive apt-get install -y linux-crashdump 78 | sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT.*/GRUB_CMDLINE_LINUX_DEFAULT="\$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M"/g' /etc/default/grub.d/kdump-tools.cfg 79 | update-grub 80 | echo "kdump enabled; waiting for reboot in 10 secs..." 81 | ( sleep 10 && reboot ) & 82 | 83 | while true; do 84 | echo "$(date '+%Y-%m-%dT%H:%M:%SZ') waiting for reboot..."
85 | sleep 1 86 | done 87 | } 88 | 89 | if command -v "kdump-config" &> /dev/null; then 90 | check_kdump 91 | sleep 10 92 | else 93 | install 94 | fi 95 | -------------------------------------------------------------------------------- /troubleshooting/enable-kdump/cos-enable-kdump.yaml: -------------------------------------------------------------------------------- 1 | # Copyright 2019 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | 15 | 16 | # Deploy this DaemonSet to enable kdump on the COS nodes with the 17 | # "cloud.google.com/gke-kdump-enabled=true" label. 18 | # 19 | # WARNING: Enabling kdump requires node reboot. Therefore, in order to avoid 20 | # disrupting your workloads, it is recommended to create a new node pool with 21 | # the "cloud.google.com/gke-kdump-enabled=true" label in your cluster, 22 | # deploy the DaemonSet to enable kdump in that node pool, and then migrate 23 | # your workloads to the new node pool. 
24 | 25 | apiVersion: apps/v1 26 | kind: DaemonSet 27 | metadata: 28 | name: enable-kdump 29 | namespace: kube-system 30 | spec: 31 | selector: 32 | matchLabels: 33 | name: enable-kdump 34 | updateStrategy: 35 | type: RollingUpdate 36 | template: 37 | metadata: 38 | labels: 39 | name: enable-kdump 40 | spec: 41 | volumes: 42 | - name: host 43 | hostPath: 44 | path: / 45 | initContainers: 46 | - name: enable-kdump 47 | image: ubuntu 48 | command: 49 | - /bin/bash 50 | - -c 51 | - | 52 | function verify_base_image { 53 | local id="$(grep "^ID=" /host/etc/os-release)" 54 | if [[ "${id#*=}" != "cos" ]]; then 55 | echo "This kdump feature switch is designed to run on Container-Optimized OS only" 56 | exit 0 57 | fi 58 | } 59 | function check_kdump_feature { 60 | chroot /host /usr/sbin/kdump_helper show 61 | } 62 | function enable_kdump_feature_and_reboot_if_needed { 63 | chroot /host /usr/sbin/kdump_helper enable 64 | local -r is_enabled=$(chroot /host /usr/sbin/kdump_helper show | grep "kdump enabled" | sed -rn "s/kdump enabled: (.*)/\1/p") 65 | local -r is_ready=$(chroot /host /usr/sbin/kdump_helper show | grep "kdump ready" | sed -rn "s/kdump ready: (.*)/\1/p") 66 | if [[ "${is_enabled}" == "true" && "${is_ready}" == "false" ]]; then 67 | echo "kdump is enabled. Rebooting for it to take effect." 
68 | chroot /host systemctl reboot 69 | fi 70 | } 71 | verify_base_image 72 | check_kdump_feature 73 | enable_kdump_feature_and_reboot_if_needed 74 | resources: 75 | requests: 76 | memory: 5Mi 77 | cpu: 5m 78 | securityContext: 79 | privileged: true 80 | volumeMounts: 81 | - name: host 82 | mountPath: /host 83 | containers: 84 | - image: gke.gcr.io/pause:3.8@sha256:880e63f94b145e46f1b1082bb71b85e21f16b99b180b9996407d61240ceb9830 85 | name: pause 86 | nodeSelector: 87 | "cloud.google.com/gke-kdump-enabled": "true" 88 | "cloud.google.com/gke-os-distribution": "cos" 89 | -------------------------------------------------------------------------------- /kubelet/kubelet-log-config/kubelet-log-config.yaml: -------------------------------------------------------------------------------- 1 | # Copyright 2024 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | 15 | # Deploy this DaemonSet to configure kubelet log rotation on nodes with the 16 | # "kubelet-log-config=true" label. 17 | # 18 | # Change the values of CONTAINER_LOG_MAX_SIZE and CONTAINER_LOG_MAX_FILES to 19 | # suit your needs. 20 | # 21 | # WARNING: Changing the kubelet log rotation configuration requires a kubelet 22 | # restart. 
Therefore, in order to avoid disrupting your workloads, it is 23 | # recommended to create a new node pool with the "kubelet-log-config=true" label 24 | # in your cluster, deploy the DaemonSet to configure kubelet log rotation in 25 | # that node pool, and then migrate your workloads to the new node pool. 26 | 27 | apiVersion: apps/v1 28 | kind: DaemonSet 29 | metadata: 30 | name: kubelet-log-config 31 | namespace: kube-system 32 | spec: 33 | selector: 34 | matchLabels: 35 | name: kubelet-log-config 36 | updateStrategy: 37 | type: RollingUpdate 38 | template: 39 | metadata: 40 | labels: 41 | name: kubelet-log-config 42 | spec: 43 | tolerations: 44 | - operator: Exists 45 | volumes: 46 | - name: host 47 | hostPath: 48 | path: / 49 | hostPID: true 50 | initContainers: 51 | - name: kubelet-log-config 52 | image: gke.gcr.io/debian-base 53 | env: 54 | # The maximum size of the container log file before it is rotated. 55 | # Update the value as desired. 56 | - name: CONTAINER_LOG_MAX_SIZE 57 | value: "10Mi" 58 | # The maximum number of log files that can be present for a container. 59 | # Update the value as desired. 60 | - name: CONTAINER_LOG_MAX_FILES 61 | value: "5" 62 | command: 63 | - /bin/bash 64 | - -c 65 | - | 66 | set -xeuo pipefail 67 | 68 | # Configure the kubelet log rotation behavior. 69 | # $1: Field name in kubelet configuration. 70 | # $2: Value for the kubelet config field. 71 | function set-kubelet-log-config() { 72 | [[ "$#" -eq 2 ]] || return 73 | local field; field="$1"; shift 74 | local value; value="$1"; shift 75 | 76 | local config; config="/host/home/kubernetes/kubelet-config.yaml" 77 | 78 | echo "Remove existing configuration for ${field} if there is any." 79 | sed -i "/${field}/d" "${config}" 80 | 81 | echo "Set ${field} to ${value}."
82 | echo "${field}: ${value}" >> "${config}" 83 | } 84 | 85 | set-kubelet-log-config containerLogMaxSize "${CONTAINER_LOG_MAX_SIZE}" 86 | set-kubelet-log-config containerLogMaxFiles "${CONTAINER_LOG_MAX_FILES}" 87 | 88 | echo "Restarting kubelet..." 89 | chroot /host nsenter -a -t1 -- systemctl restart kubelet.service 90 | 91 | echo "Success!" 92 | volumeMounts: 93 | - name: host 94 | mountPath: /host 95 | resources: 96 | requests: 97 | memory: 5Mi 98 | cpu: 5m 99 | securityContext: 100 | privileged: true 101 | containers: 102 | - image: gcr.io/google-containers/pause:3.2 103 | name: pause 104 | # Ensures that the pods will only run on the nodes having the correct 105 | # label. 106 | nodeSelector: 107 | "kubelet-log-config": "true" 108 | -------------------------------------------------------------------------------- /disable-smt/gke/READ.md: -------------------------------------------------------------------------------- 1 | # SMT (Hyper-Threading) Configuration 2 | 3 | These tools modify the Simultaneous Multithreading (SMT), also known as hyper-threading, setting on GKE nodes. 4 | 5 | ### **Important Note** 6 | The officially recommended method for configuring SMT on GKE is to use the `--threads-per-core` flag when creating or updating a node pool with the `gcloud` command-line tool. These DaemonSets should be considered an alternative method. You can find more information in the [official GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/configure-smt). 7 | 8 | --- 9 | ## **`enable-smt.yaml` (Enable Hyper-Threading)** 10 | 11 | The `enable-smt` tool is a Kubernetes DaemonSet that enables Simultaneous Multithreading (SMT) on targeted GKE nodes. This is important for increasing the potential CPU throughput for highly parallel workloads by allowing each physical CPU core to execute multiple threads concurrently. 
🧠 12 | 13 | #### ⚠️ Warning 14 | * This tool runs as **privileged** and **restarts the kubelet**, which can disrupt workloads on the node. 15 | * Enabling SMT may expose the node to security vulnerabilities like **Microarchitectural Data Sampling (MDS)**. 16 | * It is highly recommended to apply this configuration to a new, empty node pool and then safely migrate your workloads to the newly configured nodes. 17 | 18 | #### **Prerequisites** 19 | This DaemonSet only runs on nodes with a specific label. You must label the node(s) you want to configure: 20 | ```bash 21 | kubectl label node NODE_NAME cloud.google.com/gke-smt-disabled=false 22 | ``` 23 | #### **How to use it?** 24 | Save the manifest to a file named enable-smt.yaml. 25 | 26 | Apply the DaemonSet to your cluster. 27 | 28 | ```bash 29 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/disable-smt/gke/enable-smt.yaml 30 | ``` 31 | #### **How to get the result?** 32 | SSH into the labeled node and check the SMT control file. The output should be `on`. 33 | 34 | ```bash 35 | cat /sys/devices/system/cpu/smt/control 36 | ``` 37 | You can also check the pod's logs in the kube-system namespace to see the script's output. 38 | 39 | #### **How to remove it?** 40 | ```bash 41 | kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/disable-smt/gke/enable-smt.yaml 42 | ``` 43 | Note: Removing the DaemonSet does not disable SMT on already-configured nodes. 44 | ## **`disable-smt.yaml` (Disable Hyper-Threading)** 45 | The `disable-smt` tool is a Kubernetes DaemonSet that disables Simultaneous Multithreading (SMT) on targeted GKE nodes. This is important for mitigating certain CPU security vulnerabilities (like MDS) or for ensuring predictable performance for workloads that do not benefit from hyper-threading. 🔒 46 | 47 | #### ⚠️ Warning 48 | This tool runs as **privileged** and **restarts the kubelet**, which can disrupt workloads on the node.
49 | 50 | Disabling SMT may have a severe performance impact on applications that rely on high thread counts. 51 | 52 | It is highly recommended to apply this configuration to a new, empty node pool and then safely migrate your workloads. 53 | 54 | #### **Prerequisites** 55 | This DaemonSet only runs on nodes with a specific label. You must label the node(s) you want to configure: 56 | 57 | ```bash 58 | kubectl label node NODE_NAME cloud.google.com/gke-smt-disabled=true 59 | ``` 60 | #### **How to use it?** 61 | Save the manifest to a file named disable-smt.yaml. 62 | 63 | Apply the DaemonSet to your cluster. 64 | 65 | ```bash 66 | kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/disable-smt/gke/disable-smt.yaml 67 | ``` 68 | #### **How to get the result?** 69 | SSH into the labeled node and check the SMT control file. The output should be `off`. 70 | 71 | ```bash 72 | cat /sys/devices/system/cpu/smt/control 73 | ``` 74 | You can also check the pod's logs in the kube-system namespace to see the script's output. 75 | 76 | #### **How to remove it?** 77 | 78 | ```bash 79 | kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/disable-smt/gke/disable-smt.yaml 80 | # Note: Removing the DaemonSet does not re-enable SMT on already-configured nodes.
81 | ``` -------------------------------------------------------------------------------- /containerd/socket-tracer/containerd-socket-tracer.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: DaemonSet 3 | metadata: 4 | name: containerd-socket-tracer 5 | namespace: default 6 | labels: 7 | k8s-app: containerd-socket-tracer 8 | spec: 9 | selector: 10 | matchLabels: 11 | name: containerd-socket-tracer 12 | template: 13 | metadata: 14 | labels: 15 | name: containerd-socket-tracer 16 | annotations: 17 | autopilot.gke.io/no-connect: "true" 18 | spec: 19 | hostPID: true 20 | containers: 21 | - name: tracer 22 | image: mirror.gcr.io/ubuntu:24.04 23 | command: ["/bin/sh", "-c"] 24 | args: 25 | - | 26 | apt-get update && apt-get install -y bpftrace && \ 27 | 28 | start_time=$(date -u +'%Y-%m-%dT%H:%M:%SZ') && \ 29 | echo "time=\"$start_time\" msg=\"eBPF tracepoint for containerd socket connections started\" node=\"$NODE_NAME\"" && \ 30 | 31 | bpftrace -e ' 32 | #include <linux/un.h> 33 | 34 | // Log all PIDs and commands connecting to the containerd socket. 35 | tracepoint:syscalls:sys_enter_connect 36 | // Skip commands we know are not using the deprecated API. 37 | // Skip "crictl" (used below) to prevent creating a loop. 38 | // NOTE: You can update this filter with more commands as needed. 39 | /comm != "kubelet" && comm != "containerd" && comm != "ctr" && comm != "crictl"/ { 40 | $sa = (struct sockaddr_un *)args->uservaddr; 41 | if ($sa->sun_family == AF_UNIX && 42 | strcontains($sa->sun_path, "containerd.sock") && 43 | !strcontains($sa->sun_path, "containerd.sock.ttrpc")) { 44 | printf("%d %s\n", pid, comm); 45 | } 46 | }' | { 47 | # Skip parsing bpftrace header text. 48 | read -r _ 49 | 50 | # Query CRI for the container with that PID.
51 | while read -r pid comm; do 52 | current_pid="$pid" 53 | while true; do 54 | output=$(nsenter -at 1 /home/kubernetes/bin/crictl inspect --output go-template --template ' 55 | {{- range . -}} 56 | {{- if eq .info.pid "'"$current_pid"'" -}} 57 | {{- $time := "'"$(date -u +'%Y-%m-%dT%H:%M:%SZ')"'" -}} 58 | {{- $node := "'"$NODE_NAME"'" -}} 59 | {{- $namespace := index .info.runtimeSpec.annotations "io.kubernetes.cri.sandbox-namespace" -}} 60 | {{- $name := index .info.runtimeSpec.annotations "io.kubernetes.cri.sandbox-name" -}} 61 | {{- $container := index .info.runtimeSpec.annotations "io.kubernetes.cri.container-name" -}} 62 | {{- printf "time=\"%s\" msg=\"containerd socket connection opened\" node=\"%s\" pod=\"%s/%s\" container=\"%s\" comm=\"%s\"" $time $node $namespace $name $container "'"$comm"'" -}} 63 | {{- end -}} 64 | {{- end -}}' 65 | ) 66 | 67 | if [ -n "$output" ]; then 68 | echo "$output" 69 | break 70 | fi 71 | 72 | # If it cannot be found, then walk up the ancestor tree to find the main container process. 73 | if ! ppid=$(ps -o ppid= -p "$current_pid" 2>/dev/null | tr -d ' '); then 74 | break 75 | fi 76 | if [ -z "$ppid" ] || [ "$ppid" -eq 1 ]; then 77 | break 78 | fi 79 | current_pid="$ppid" 80 | done 81 | done 82 | } 83 | resources: 84 | requests: 85 | ephemeral-storage: "500Mi" 86 | limits: 87 | ephemeral-storage: "500Mi" 88 | env: 89 | - name: NODE_NAME 90 | valueFrom: 91 | fieldRef: 92 | fieldPath: spec.nodeName 93 | securityContext: 94 | # Privileged is required for the bpftrace tool to use eBPF syscalls. 95 | privileged: true 96 | volumeMounts: 97 | - name: debugfs 98 | mountPath: /sys/kernel/debug 99 | readOnly: true 100 | volumes: 101 | # debugfs is required by bpftrace to access kernel tracepoints. 
102 | - name: debugfs 103 | hostPath: 104 | path: /sys/kernel/debug 105 | tolerations: 106 | - key: "node.kubernetes.io/not-ready" 107 | operator: "Exists" 108 | effect: "NoExecute" 109 | -------------------------------------------------------------------------------- /troubleshooting/os-audit/cos-auditd-logging.yaml: -------------------------------------------------------------------------------- 1 | # Copyright 2019 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | 15 | apiVersion: v1 16 | kind: Namespace 17 | metadata: 18 | name: cos-auditd 19 | --- 20 | apiVersion: apps/v1 21 | kind: DaemonSet 22 | metadata: 23 | name: cos-auditd-logging 24 | namespace: cos-auditd 25 | annotations: 26 | kubernetes.io/description: 'DaemonSet that enables Linux auditd logging on non-Autopilot COS nodes.' 27 | spec: 28 | selector: 29 | matchLabels: 30 | name: cos-auditd-logging 31 | template: 32 | metadata: 33 | labels: 34 | name: cos-auditd-logging 35 | spec: 36 | # Necessary for ensuring access to Google Cloud credentials from the node's metadata server. 
37 | hostNetwork: true 38 | hostPID: true 39 | dnsPolicy: Default 40 | initContainers: 41 | - name: cos-auditd-setup 42 | image: ubuntu 43 | command: ["chroot", "/host", "systemctl", "start", "cloud-audit-setup"] 44 | securityContext: 45 | privileged: true 46 | volumeMounts: 47 | - name: host 48 | mountPath: /host 49 | resources: 50 | requests: 51 | memory: "10Mi" 52 | cpu: "10m" 53 | containers: 54 | - name: cos-auditd-fluent-bit 55 | securityContext: 56 | allowPrivilegeEscalation: false 57 | capabilities: 58 | drop: 59 | - all 60 | add: 61 | - DAC_OVERRIDE 62 | env: 63 | - name: NODE_NAME 64 | valueFrom: 65 | fieldRef: 66 | apiVersion: v1 67 | fieldPath: spec.nodeName 68 | # Substitute these (manually or via envsubst). For example, run 69 | # `CLUSTER_NAME=example-cluster CLUSTER_LOCATION=us-central1-a envsubst '$CLUSTER_NAME,$CLUSTER_LOCATION' < ${THIS_FILE:?} | kubectl apply -f -` 70 | - name: CLUSTER_NAME 71 | value: "$CLUSTER_NAME" 72 | - name: CLUSTER_LOCATION 73 | value: "$CLUSTER_LOCATION" 74 | # This image is used for demo purposes. The best practice is to use the image from controlled registry and reference it by SHA. 
75 | image: fluent/fluent-bit:latest 76 | imagePullPolicy: IfNotPresent 77 | livenessProbe: 78 | httpGet: 79 | path: / 80 | port: 2024 81 | initialDelaySeconds: 120 82 | periodSeconds: 60 83 | timeoutSeconds: 5 84 | ports: 85 | - name: metrics 86 | containerPort: 2024 87 | resources: 88 | limits: 89 | cpu: "1" 90 | memory: 500Mi 91 | requests: 92 | cpu: 100m 93 | memory: 200Mi 94 | terminationMessagePath: /dev/termination-log 95 | terminationMessagePolicy: File 96 | volumeMounts: 97 | - mountPath: /var/log 98 | name: varlog 99 | - mountPath: /var/lib/cos-auditd-fluent-bit/pos-files 100 | name: varlib-cos-auditd-fluent-bit-pos-files 101 | - mountPath: /fluent-bit/etc 102 | name: config-volume 103 | nodeSelector: 104 | cloud.google.com/gke-os-distribution: cos 105 | restartPolicy: Always 106 | terminationGracePeriodSeconds: 120 107 | tolerations: 108 | - operator: "Exists" 109 | effect: "NoExecute" 110 | - operator: "Exists" 111 | effect: "NoSchedule" 112 | volumes: 113 | - name: host 114 | hostPath: 115 | path: / 116 | - name: varlog 117 | hostPath: 118 | path: /var/log 119 | - name: varlibcos-auditd-fluent-bit 120 | hostPath: 121 | path: /var/lib/cos-auditd-fluent-bit 122 | type: DirectoryOrCreate 123 | - name: varlib-cos-auditd-fluent-bit-pos-files 124 | hostPath: 125 | path: /var/lib/cos-auditd-fluent-bit/pos-files 126 | type: DirectoryOrCreate 127 | - name: config-volume 128 | configMap: 129 | name: cos-auditd-fluent-bit-config 130 | updateStrategy: 131 | type: RollingUpdate 132 | --- 133 | kind: ConfigMap 134 | apiVersion: v1 135 | metadata: 136 | name: cos-auditd-fluent-bit-config 137 | namespace: cos-auditd 138 | annotations: 139 | kubernetes.io/description: 'ConfigMap for Linux auditd logging daemonset on COS nodes.' 
140 | data: 141 | fluent-bit.conf: |- 142 | [SERVICE] 143 | Flush 5 144 | Grace 120 145 | Log_Level info 146 | Daemon off 147 | HTTP_Server On 148 | HTTP_Listen 0.0.0.0 149 | HTTP_PORT 2024 150 | 151 | [INPUT] 152 | # https://docs.fluentbit.io/manual/input/systemd 153 | Name systemd 154 | Alias audit 155 | Tag audit 156 | Systemd_Filter SYSLOG_IDENTIFIER=audit 157 | Path /var/log/journal 158 | DB /var/lib/cos-auditd-fluent-bit/pos-files/audit.db 159 | 160 | [FILTER] 161 | # https://docs.fluentbit.io/manual/pipeline/filters/modify 162 | Name modify 163 | Match audit 164 | Add logging.googleapis.com/local_resource_id k8s_node.${NODE_NAME} 165 | 166 | [FILTER] 167 | Name modify 168 | Match audit 169 | Add logging.googleapis.com/logName linux-auditd 170 | 171 | [OUTPUT] 172 | # https://docs.fluentbit.io/manual/pipeline/outputs/stackdriver 173 | Name stackdriver 174 | Match audit 175 | Severity_key severity 176 | log_name_key logging.googleapis.com/logName 177 | Resource k8s_node 178 | # The plugin will read the project ID from the metadata server, but not the cluster name and location for some reason, so they have to be injected. 
179 | k8s_cluster_name ${CLUSTER_NAME} 180 | k8s_cluster_location ${CLUSTER_LOCATION} 181 | net.connect_timeout 60 182 | Retry_Limit 14 183 | Workers 1 184 | -------------------------------------------------------------------------------- /containerd/migrating-to-containerd/find-nodepools-to-migrate.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # [START gke_node_find_non_containerd_nodepools] 4 | for project in $(gcloud projects list --format="value(projectId)") 5 | do 6 | echo "ProjectId: $project" 7 | for clusters in $( \ 8 | gcloud container clusters list \ 9 | --project $project \ 10 | --format="csv[no-heading](name,location,autopilot.enabled,currentMasterVersion,autoscaling.enableNodeAutoprovisioning,autoscaling.autoprovisioningNodePoolDefaults.imageType)") 11 | do 12 | IFS=',' read -r -a clustersArray <<< "$clusters" 13 | cluster_name="${clustersArray[0]}" 14 | cluster_zone="${clustersArray[1]}" 15 | cluster_isAutopilot="${clustersArray[2]}" 16 | cluster_version="${clustersArray[3]}" 17 | cluster_minorVersion=${cluster_version:0:4} 18 | cluster_autoprovisioning="${clustersArray[4]}" 19 | cluster_autoprovisioningImageType="${clustersArray[5]}" 20 | 21 | if [ "$cluster_isAutopilot" = "True" ]; then 22 | echo " Cluster: $cluster_name (autopilot) (zone: $cluster_zone)" 23 | echo " Autopilot clusters are running Containerd." 24 | else 25 | echo " Cluster: $cluster_name (zone: $cluster_zone)" 26 | 27 | if [ "$cluster_autoprovisioning" = "True" ]; then 28 | if [ "$cluster_minorVersion" \< "1.20" ]; then 29 | echo " Node autoprovisioning is enabled, and new node pools will have image type 'COS'." 30 | echo " This setting is not configurable on the current version of the cluster." 31 | echo " Please upgrade your cluster and configure the default node autoprovisioning image type."
32 | echo " " 33 | else 34 | if [ "$cluster_autoprovisioningImageType" = "COS" ]; then 35 | echo " Node autoprovisioning is configured to create new node pools of type 'COS'." 36 | echo " Run the following command to update:" 37 | echo " gcloud container clusters update '$cluster_name' --project '$project' --zone '$cluster_zone' --enable-autoprovisioning --autoprovisioning-image-type='COS_CONTAINERD'" 38 | echo " " 39 | fi 40 | 41 | if [ "$cluster_autoprovisioningImageType" = "UBUNTU" ]; then 42 | echo " Node autoprovisioning is configured to create new node pools of type 'UBUNTU'." 43 | echo " Run the following command to update:" 44 | echo " gcloud container clusters update '$cluster_name' --project '$project' --zone '$cluster_zone' --enable-autoprovisioning --autoprovisioning-image-type='UBUNTU_CONTAINERD'" 45 | echo " " 46 | fi 47 | fi 48 | fi 49 | 50 | for nodepools in $( \ 51 | gcloud container node-pools list \ 52 | --project $project \ 53 | --cluster $cluster_name \ 54 | --zone $cluster_zone \ 55 | --format="csv[no-heading](name,version,config.imageType)") 56 | do 57 | IFS=',' read -r -a nodepoolsArray <<< "$nodepools" 58 | nodepool_name="${nodepoolsArray[0]}" 59 | nodepool_version="${nodepoolsArray[1]}" 60 | nodepool_imageType="${nodepoolsArray[2]}" 61 | 62 | nodepool_minorVersion=${nodepool_version:0:4} 63 | 64 | echo " Nodepool: $nodepool_name, version: $nodepool_version ($nodepool_minorVersion), image: $nodepool_imageType" 65 | 66 | minorVersionWithRev="${nodepool_version/-gke./.}" 67 | linuxGkeMinVersion="1.14" 68 | windowsGkeMinVersion="1.21.1.2200" 69 | 70 | suggestedImageType="COS_CONTAINERD" 71 | 72 | if [ "$nodepool_imageType" = "UBUNTU" ]; then 73 | suggestedImageType="UBUNTU_CONTAINERD" 74 | elif [ "$nodepool_imageType" = "WINDOWS_LTSC" ]; then 75 | suggestedImageType="WINDOWS_LTSC_CONTAINERD" 76 | elif [ "$nodepool_imageType" = "WINDOWS_SAC" ]; then 77 | suggestedImageType="WINDOWS_SAC_CONTAINERD" 78 | fi 79 | 80 | tab=$'\n '; 81 | 
nodepool_message="$tab Please update the nodepool to use Containerd." 82 | nodepool_message+="$tab Make sure to consult the list of known issues https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd#known_issues." 83 | nodepool_message+="$tab Run the following command to upgrade:" 84 | nodepool_message+="$tab " 85 | nodepool_message+="$tab gcloud container clusters upgrade '$cluster_name' --project '$project' --zone '$cluster_zone' --image-type '$suggestedImageType' --node-pool '$nodepool_name'" 86 | nodepool_message+="$tab " 87 | 88 | # see https://cloud.google.com/kubernetes-engine/docs/concepts/node-images 89 | if [ "$nodepool_imageType" = "COS_CONTAINERD" ] || [ "$nodepool_imageType" = "UBUNTU_CONTAINERD" ] || 90 | [ "$nodepool_imageType" = "WINDOWS_LTSC_CONTAINERD" ] || [ "$nodepool_imageType" = "WINDOWS_SAC_CONTAINERD" ]; then 91 | nodepool_message="$tab Nodepool is using Containerd already" 92 | elif ( [ "$nodepool_imageType" = "WINDOWS_LTSC" ] || [ "$nodepool_imageType" = "WINDOWS_SAC" ] ) && 93 | [ "$(printf '%s\n' "$windowsGkeMinVersion" "$minorVersionWithRev" | sort -V | head -n1)" != "$windowsGkeMinVersion" ]; then 94 | nodepool_message="$tab Upgrade nodepool to the version that supports Containerd for Windows" 95 | elif [ "$(printf '%s\n' "$linuxGkeMinVersion" "$minorVersionWithRev" | sort -V | head -n1)" != "$linuxGkeMinVersion" ]; then 96 | nodepool_message="$tab Upgrade nodepool to the version that supports Containerd" 97 | fi 98 | echo "$nodepool_message" 99 | done 100 | fi # not autopilot 101 | done 102 | done 103 | 104 | # Sample output: 105 | # 106 | # ProjectId: my-project-id 107 | # Cluster: autopilot-cluster-1 (autopilot) (zone: us-central1) 108 | # Autopilot clusters are running Containerd. 109 | # Cluster: cluster-1 (zone: us-central1-c) 110 | # Nodepool: default-pool, version: 1.18.12-gke.1210 (1.18), image: COS 111 | # 112 | # Please update the nodepool to use Containerd.
113 | # Make sure to consult the list of known issues https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd#known_issues. 114 | # Run the following command to upgrade: 115 | # 116 | # gcloud container clusters upgrade 'cluster-1' --project 'my-project-id' --zone 'us-central1-c' --image-type 'COS_CONTAINERD' --node-pool 'default-pool' 117 | # 118 | # Nodepool: pool-1, version: 1.18.12-gke.1210 (1.18), image: COS 119 | # 120 | # Please update the nodepool to use Containerd. 121 | # Make sure to consult the list of known issues https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd#known_issues. 122 | # Run the following command to upgrade: 123 | # 124 | # gcloud container clusters upgrade 'cluster-1' --project 'my-project-id' --zone 'us-central1-c' --image-type 'COS_CONTAINERD' --node-pool 'pool-1' 125 | # 126 | # Nodepool: winpool, version: 1.18.12-gke.1210 (1.18), image: WINDOWS_SAC 127 | # 128 | # Upgrade nodepool to the version that supports Containerd for Windows 129 | # 130 | # Cluster: another-test-cluster (zone: us-central1-c) 131 | # Nodepool: default-pool, version: 1.20.4-gke.400 (1.20), image: COS_CONTAINERD 132 | # 133 | # Nodepool is using Containerd already 134 | # 135 | # [END gke_node_find_non_containerd_nodepools] 136 | # 137 | -------------------------------------------------------------------------------- /containerd/socket-tracer/README.md: -------------------------------------------------------------------------------- 1 | # Tracing containerd Socket Connections 2 | 3 | `containerd-socket-tracer.yaml` is a DaemonSet manifest that monitors and logs 4 | connections made to the `containerd` socket. This is particularly useful for 5 | identifying which containers are interacting with the container runtime 6 | directly. 7 | 8 | Before deploying, please review the 9 | [important considerations](#important-considerations) 10 | regarding potential performance impact and system conflicts.
11 | 12 | ### How It Works 13 | 14 | This tool leverages [eBPF](https://ebpf.io/) to trace system calls. It watches 15 | for any process that attempts to open a connection to the `containerd` socket. 16 | Once a connection is detected, the tracer identifies the PID and queries the 17 | node's [CRI](https://kubernetes.io/docs/concepts/architecture/cri/) to resolve 18 | the corresponding Pod and container details. 19 | 20 | ## Example Use Case: Identifying Deprecated API Clients 21 | 22 | A practical application for this tracer is to identify applications that are 23 | still using the 24 | [deprecated CRI v1alpha2 API](https://github.com/containerd/containerd/blob/v2.1.2/RELEASES.md?plain=1#L167). 25 | As `containerd` moves towards newer API versions, it's crucial to find and update 26 | any clients using outdated versions. 27 | 28 | The following sections describe two methods for finding these clients: a manual 29 | approach using SSH and an automated approach using a companion workload. 30 | 31 | ### Method 1: Manual Correlation via SSH 32 | 33 | While the tracer is running, it will generate log entries for every container 34 | that establishes a socket connection. 35 | 36 | **Example Log Output:** 37 | 38 | ``` 39 | time="2025-06-16T18:19:10Z" msg="eBPF tracepoint for containerd socket connections started" node="gke-cluster-default-pool-1e092676-x8m9" 40 | time="2025-06-16T18:19:30Z" msg="containerd socket connection opened" node="gke-cluster-default-pool-1e092676-x8m9" pod="default/cri-v1alpha2-api-client-dmghs" container="deployment-container-1" comm="v1alpha2_client" 41 | ``` 42 | 43 | If your version of `containerd` includes the 44 | [deprecation service](https://samuel.karp.dev/blog/2024/01/deprecation-warnings-in-containerd-getting-ready-for-2.0/), 45 | you can correlate the timestamp from the tracer's log with the `lastOccurrence` 46 | of the deprecated API call. This allows you to pinpoint the exact application 47 | making the call. 
48 | 49 | **Correlating with `containerd` Deprecation Warnings:** 50 | 51 | ```sh 52 | $ ctr deprecations list --format json | jq '.[] | select(.id == "io.containerd.deprecation/cri-api-v1alpha2")' 53 | { 54 | "id": "io.containerd.deprecation/cri-api-v1alpha2", 55 | "message": "CRI API v1alpha2 is deprecated since containerd v1.7 and removed in containerd v2.0. Use CRI API v1 instead.", 56 | "lastOccurrence": "2025-06-16T18:21:30.959558222Z" 57 | } 58 | ``` 59 | 60 | ### Method 2: Automated Correlation with a Reporter Workload 61 | 62 | To simplify this process, a second workload, 63 | `cri-v1alpha2-api-deprecation-reporter.yaml`, is provided. This DaemonSet 64 | periodically logs the occurrence of the last CRI v1alpha2 API call. 65 | 66 | **Example Log Output:** 67 | 68 | ``` 69 | time="2025-06-16T18:22:19Z" msg="checking for CRI v1alpha2 API deprecation warnings" node="gke-cluster-default-pool-1e092676-x8m9" 70 | time="2025-06-16T18:22:19Z" msg="found CRI v1alpha2 API deprecation warning" node="gke-cluster-default-pool-1e092676-x8m9" lastOccurrence="2025-06-16T18:21:30.959558222Z" 71 | ``` 72 | 73 | ### Putting It All Together: Finding the Client 74 | 75 | With both DaemonSets deployed and running, you can find the deprecated API client by correlating the logs from both tools. 76 | 77 | In one terminal, stream the logs from the reporter to watch for deprecation events: 78 | 79 | ```sh 80 | $ kubectl logs -f -l name=cri-v1alpha2-api-deprecation-reporter 81 | ``` 82 | 83 | In a second terminal, stream the logs from the tracer to watch for new connections: 84 | 85 | ```sh 86 | $ kubectl logs -f -l name=containerd-socket-tracer 87 | ``` 88 | 89 | **Wait and Compare:** When a `lastOccurrence` timestamp appears in the reporter's log, look for a "containerd socket connection opened" event in the tracer's log that occurred on the **same node** at nearly the **exact same time**. 
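The same-node, same-time comparison can also be done non-interactively. Because every log line begins with an ISO-8601 `time="..."` field, a plain lexicographic sort interleaves the two streams chronologically. The snippet below is a sketch that demonstrates the idea using the two sample lines from the example outputs above; against a live cluster you would pipe the output of `kubectl logs -l name=...` from both DaemonSets into the same `sort` instead of the here-doc.

```shell
# Merge and sort the two example log lines. Because each line starts with
# an ISO-8601 time="..." field, a lexicographic sort is also a
# chronological sort, so same-instant events end up adjacent.
sort <<'EOF'
time="2025-06-16T18:22:19Z" msg="found CRI v1alpha2 API deprecation warning" node="gke-cluster-default-pool-1e092676-x8m9" lastOccurrence="2025-06-16T18:21:30.959558222Z"
time="2025-06-16T18:19:30Z" msg="containerd socket connection opened" node="gke-cluster-default-pool-1e092676-x8m9" pod="default/cri-v1alpha2-api-client-dmghs" container="deployment-container-1" comm="v1alpha2_client"
EOF
# The 18:19:30 "connection opened" line prints first, followed by the
# 18:22:19 reporter line.
```

Once sorted, a reporter `lastOccurrence` entry sitting directly after a "containerd socket connection opened" entry from the same node is a strong hint that the listed pod is the deprecated API client.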
90 | 91 | Alternatively, if your cluster is configured to send logs to a centralized platform, you can run a single query to see the aggregated logs from all nodes at once. This is the recommended approach for analyzing historical data and correlating events across your entire cluster. 92 | 93 | For example, users on Google Kubernetes Engine (GKE) can use the following query in Cloud Logging to view the output from both workloads: 94 | 95 | ``` 96 | resource.type="k8s_container" 97 | ( 98 | labels."k8s-pod/name"="containerd-socket-tracer" 99 | OR 100 | labels."k8s-pod/name"="cri-v1alpha2-api-deprecation-reporter" 101 | ) 102 | ``` 103 | 104 | This correlation between the two log events pinpoints the exact pod and container that is responsible for the deprecated API call. 105 | 106 | ### Filtering Commands (`comm`) 107 | 108 | To reduce noise, the underlying `bpftrace` script is configured to ignore a 109 | default set of common, node-level commands (like `kubelet` and `containerd` 110 | itself). The primary focus is on identifying connections from other, 111 | containerized applications. 112 | 113 | You can customize this behavior by modifying the `bpftrace` 114 | [filter](https://github.com/bpftrace/bpftrace/blob/v0.23.5/man/adoc/bpftrace.adoc#filterspredicates) 115 | to include or exclude other commands as needed. 116 | 117 | ## Important Considerations 118 | 119 | Please review the following disclaimers before deploying this workload. 120 | 121 | ### Production Use 122 | 123 | Always test this tool in a dedicated test environment before deploying to 124 | production. For production rollouts, use exposure control by deploying to a 125 | small subset of nodes first to monitor for any adverse effects. 126 | 127 | This tool is intended for temporary, targeted debugging, not for prolonged 128 | execution. Its main purpose is to help identify workloads violating containerd 129 | deprecations when other detection methods have been unsuccessful. 
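For reference, one way to implement that kind of limited rollout is a `nodeSelector` keyed on a label you apply to opt-in nodes. This is a hypothetical addition to the DaemonSet pod spec; the label key below is an example and is not defined by the shipped manifests.

```yaml
# Hypothetical pod-spec fragment for containerd-socket-tracer.yaml.
# Opt a node in first with:
#   kubectl label node <node-name> socket-tracing=enabled
spec:
  template:
    spec:
      nodeSelector:
        socket-tracing: "enabled"
```

With this in place, the DaemonSet schedules only onto labeled nodes, and removing the label (`kubectl label node <node-name> socket-tracing-`) takes a node back out of scope.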
130 | 131 | ### CPU Overhead 132 | 133 | Enabling this tracer may increase the CPU load on your nodes. This is especially 134 | true if the `comm` filter is not restrictive enough and there is a high volume 135 | of socket connections. 136 | 137 | ### Potential eBPF Conflicts 138 | 139 | Running this tracer on nodes where other eBPF-based tools are already active may 140 | lead to unexpected behavior or conflicts. 141 | 142 | ## Installation 143 | 144 | To deploy the tracer to your cluster, apply the `containerd-socket-tracer.yaml` 145 | manifest using `kubectl`. 146 | 147 | ```sh 148 | $ kubectl apply -f containerd-socket-tracer.yaml 149 | ``` 150 | 151 | And to deploy the optional API reporter: 152 | 153 | ```sh 154 | $ kubectl apply -f cri-v1alpha2-api-deprecation-reporter.yaml 155 | ``` 156 | 157 | ### GKE Autopilot Users 158 | 159 | GKE Autopilot clusters enforce security policies that prevent workloads 160 | requiring privileged access from running by default. To deploy the 161 | `containerd-socket-tracer` and its companion 162 | `cri-v1alpha2-api-deprecation-reporter`, you must first install the 163 | corresponding `AllowlistSynchronizer` resources in your cluster. 164 | 165 | These synchronizers enable the workloads to run on Autopilot nodes by matching 166 | them with a `WorkloadAllowlist`. 167 | 168 | To install the allowlists, apply the following manifests: 169 | 170 | ```sh 171 | $ kubectl apply -f containerd-socket-tracer-allowlist.yaml 172 | $ kubectl apply -f cri-v1alpha2-api-deprecation-reporter-allowlist.yaml 173 | ``` 174 | 175 | After applying these manifests and allowing a few moments for the allowlists to 176 | synchronize, you can deploy the `containerd-socket-tracer` and 177 | `cri-v1alpha2-api-deprecation-reporter` DaemonSets as described above. 
178 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 
40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. 
Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. 
You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 
123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. 
In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. (Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. 
We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright [yyyy] [name of copyright owner] 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 194 | You may obtain a copy of the License at 195 | 196 | http://www.apache.org/licenses/LICENSE-2.0 197 | 198 | Unless required by applicable law or agreed to in writing, software 199 | distributed under the License is distributed on an "AS IS" BASIS, 200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 201 | See the License for the specific language governing permissions and 202 | limitations under the License. 203 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | # Node Configuration in GKE 5 | 6 | The following outlines a strategy for performing custom node configuration in Google Kubernetes Engine (GKE) using init containers. The goal is to apply node-level settings (like sysctl adjustments, software installations, or kernel parameter checks) *before* regular application workloads are scheduled onto those nodes. To achieve this isolation temporarily, we'll mark nodes as unschedulable with a taint until configuration is complete. 7 | 8 | ## Supporting Tools for Node Configuration 9 | 10 | A collection of tools, potentially packaged within a container image used by an init container, can help manage and configure various aspects of a Kubernetes node: 11 | 12 | * **Kubelet:** These tools can assist in managing and interacting with the Kubelet, the primary node agent in Kubernetes. This could include tools for: 13 | * **Configuration management:** Scripts or utilities to configure Kubelet settings.
14 | * **Status monitoring:** Tools to check the health and status of the Kubelet service. 15 | * **Log analysis:** Scripts to collect and analyze Kubelet logs for debugging. 16 | * **Containerd:** Tools related to containerd, the container runtime used by Kubelet to manage containers. This might include: 17 | * **Container image management:** Tools for pulling, inspecting, or managing container images on nodes. 18 | * **Container lifecycle management:** Utilities to interact with containerd for container creation, deletion, and status checks. 19 | * **Snapshotter management:** Tools to manage the containerd snapshotter for efficient storage. 20 | * **Sysctl:** Tools for managing and configuring sysctl parameters on Kubernetes nodes. Sysctl allows you to modify kernel parameters at runtime. These tools could help with: 21 | * **Parameter modification:** Tools to apply specific sysctl settings to nodes, potentially for performance tuning or node-specific hardening required by your organization. 22 | * **Configuration persistence:** Mechanisms to ensure sysctl settings are applied consistently across node reboots. 23 | * **Kernel:** Tools for interacting with or gathering information about the node's kernel. This might include: 24 | * **Kernel module management:** Tools for loading, unloading, or checking the status of kernel modules. 25 | * **Kernel parameter inspection:** Utilities to examine kernel parameters beyond sysctl. 26 | * **Software Installations:** Tools to automate or simplify software installations on Kubernetes nodes. This could involve: 27 | * **Package management:** Scripts to install packages using package managers like `apt` or `yum`. 28 | * **Binary deployments:** Tools to deploy pre-compiled binaries to nodes. 29 | * **Configuration management:** Scripts to configure installed software. 30 | * **Troubleshooting Tooling:** A suite of tools to aid in diagnosing and resolving issues on Kubernetes nodes.
This is likely to encompass: 31 | * **Log collection and analysis:** Tools to gather logs from various node components and facilitate analysis. 33 | 34 | ### Part 1: Scheduling an Init Container for Node Modification 35 | 36 | This example demonstrates how to use a DaemonSet with an init container to apply a `sysctl` configuration change to Kubernetes nodes upon pod startup. The DaemonSet will attempt to run on all available nodes. 37 | 38 | #### Concept: 39 | 40 | We deploy a DaemonSet ensuring one pod replica runs on each eligible node. This pod has an *init container* that runs first. The init container executes a command to modify a kernel parameter (`sysctl`) on the host node. Because modifying host kernel parameters requires high privileges, the init container runs in **`privileged`** mode and mounts the host's root filesystem. After the init container completes successfully, a minimal main container (`pause`) starts just to keep the pod running.
41 | 42 | #### Example `sysctl-init-daemonset.yaml`: 43 | 44 | ```yaml 45 | apiVersion: apps/v1 46 | kind: DaemonSet 47 | metadata: 48 | name: sysctl-init-daemonset 49 | labels: 50 | app: sysctl-modifier 51 | spec: 52 | selector: 53 | matchLabels: 54 | app: sysctl-modifier 55 | template: 56 | metadata: 57 | labels: 58 | app: sysctl-modifier 59 | spec: 60 | # Allow access to host PID namespace if needed for specific tools 61 | hostPID: true 62 | volumes: 63 | - name: host-root-fs 64 | hostPath: 65 | path: / 66 | type: Directory # Mount the node's root filesystem 67 | initContainers: 68 | - name: apply-sysctl-value 69 | image: gcr.io/gke-release/debian-base # Small image with shell tools 70 | # *** Requires privilege to modify host kernel settings *** 71 | securityContext: 72 | privileged: true 73 | volumeMounts: 74 | - name: host-root-fs 75 | mountPath: /host # Access the host filesystem at /host 76 | readOnly: false # Needs write access to modify sysctl typically 77 | command: ["/bin/sh", "-c"] 78 | args: 79 | - | 80 | echo "Attempting to set net.ipv4.ip_forward=1 on the host..." 81 | # Use chroot to execute the command in the host's root filesystem context 82 | if chroot /host sysctl -w net.ipv4.ip_forward=1; then 83 | echo "Successfully set net.ipv4.ip_forward." 84 | else 85 | echo "Failed to set net.ipv4.ip_forward." >&2 86 | exit 1 # Fail the init container if command fails 87 | fi 88 | # Add other setup commands here if needed 89 | containers: 90 | - name: pause-container 91 | # Minimal container just to keep the Pod running after init succeeds 92 | image: gcr.io/gke-release/pause:latest 93 | updateStrategy: 94 | type: RollingUpdate 95 | ``` 96 | 97 | #### Walkthrough: 98 | 99 | 1. **Save the YAML:** Save the content above as `sysctl-init-daemonset.yaml`. 100 | 2. **Apply the DaemonSet:** 101 | ```bash 102 | kubectl apply -f sysctl-init-daemonset.yaml 103 | ``` 104 | 3. 
**Verify Pod Creation:** Check that the DaemonSet pods are being created on your nodes: 105 | ```bash 106 | kubectl get pods -l app=sysctl-modifier -o wide 107 | ``` 108 | *(Wait for pods to reach the `Running` state.)* 109 | 4. **Check Init Container Logs:** View the logs of the init container on one of the pods to see its output: 110 | ```bash 111 | # Get a pod name first from the command above 112 | POD_NAME=$(kubectl get pods -l app=sysctl-modifier -o jsonpath='{.items[0].metadata.name}') 113 | kubectl logs $POD_NAME -c apply-sysctl-value 114 | ``` 115 | You should see the "Attempting..." and "Successfully set..." messages. 116 | 5. **Verify on Node (Optional):** You can confirm the change by SSHing into a node where the pod ran and executing `sysctl net.ipv4.ip_forward`, or by running a privileged debug pod on that node. 117 | 118 | --- 119 | 120 | ### Part 2: Taint -> Configure with Init Container -> Untaint Workflow 121 | 122 | This section outlines a more advanced and automated method for node configuration. The workflow uses a single, intelligent DaemonSet that not only applies the configuration to tainted nodes but also automatically removes the taint from the node once its job is complete. This approach is ideal for streamlining configuration changes across a node pool without manual intervention to make the nodes schedulable again. 123 | 124 | The process leverages the kubectl binary available on GKE nodes and requires RBAC permissions for the DaemonSet's ServiceAccount to modify its own node object. 125 | #### Concept: 126 | 1. **Taint Node Pool:** A taint is applied to a node pool to prevent regular workloads from being scheduled, effectively reserving it for configuration. 127 | 2. **Deploy Smart DaemonSet:** A DaemonSet is deployed with three key characteristics: 128 | * **Toleration:** It has a `toleration` to allow it to run on the tainted nodes.
129 | * **Configuration Logic:** An init container runs privileged commands to configure the node (e.g., setting `sysctl` values). 130 | * **Untainting Logic:** After applying the configuration, the same container uses the node's `kubectl` tool to remove the taint from itself, making the node available for general use. 131 | 3. **Verification:** Once the DaemonSet pod completes its init container, the node is both configured and fully schedulable. 132 | 133 | --- 134 | #### Walkthrough: 135 | 136 | 1. **Taint the Node Pool:** 137 | * Identify your target GKE node pool, cluster, and location (zone/region). 138 | * Apply a specific taint using `gcloud`. Let's use `node.config.status/stage=configuring:NoSchedule`. 139 | 140 | ```bash 141 | # Replace placeholders with your actual values 142 | GKE_CLUSTER="your-cluster-name" 143 | NODE_POOL="your-node-pool-name" 144 | GKE_ZONE="your-zone" # Or GKE_REGION="your-region" 145 | 146 | gcloud container node-pools update $NODE_POOL \ 147 | --cluster=$GKE_CLUSTER \ 148 | --node-taints=node.config.status/stage=configuring:NoSchedule \ 149 | --zone=$GKE_ZONE # Or --region=$GKE_REGION 150 | ``` 151 | * Verify the taint is applied to nodes in the pool: 152 | 153 | ```bash 154 | kubectl describe node | grep Taints 155 | ``` 156 | You should see `node.config.status/stage=configuring:NoSchedule`. 157 | 158 | 2. **Create and Deploy the Self-Untainting DaemonSet:** 159 | * The following YAML creates all the necessary components: a **ServiceAccount** for permissions, a **ClusterRole** granting node-patching rights, a **ClusterRoleBinding** to link them, and the **DaemonSet** itself. 160 | Save the following content as `auto-untaint-daemonset.yaml`. 161 | 162 | ```yaml 163 | # WARNING: This DaemonSet runs as privileged, which makes your nodes 164 | # less secure. Only use it on clusters where you have 165 | # strict controls over what is deployed.
167 | --- 168 | apiVersion: v1 169 | kind: ServiceAccount 170 | metadata: 171 | name: node-config-sa 172 | namespace: default 173 | --- 174 | apiVersion: rbac.authorization.k8s.io/v1 175 | kind: ClusterRole 176 | metadata: 177 | name: node-patcher-role 178 | rules: 179 | - apiGroups: [""] 180 | resources: ["nodes"] 181 | # Permissions needed to read and remove a taint from the node. 182 | verbs: ["get", "patch", "update"] 183 | --- 184 | apiVersion: rbac.authorization.k8s.io/v1 185 | kind: ClusterRoleBinding 186 | metadata: 187 | name: node-config-binding 188 | subjects: 189 | - kind: ServiceAccount 190 | name: node-config-sa 191 | namespace: default 192 | roleRef: 193 | kind: ClusterRole 194 | name: node-patcher-role 195 | apiGroup: rbac.authorization.k8s.io 196 | --- 197 | apiVersion: apps/v1 198 | kind: DaemonSet 199 | metadata: 200 | name: auto-untaint-daemonset 201 | labels: 202 | app: auto-untaint-configurator 203 | spec: 204 | selector: 205 | matchLabels: 206 | app: auto-untaint-configurator 207 | updateStrategy: 208 | type: RollingUpdate 209 | template: 210 | metadata: 211 | labels: 212 | app: auto-untaint-configurator 213 | spec: 214 | serviceAccountName: node-config-sa 215 | hostPID: true 216 | # Toleration now matches the taint on your node. 217 | tolerations: 218 | - key: "node.config.status/stage" 219 | operator: "Equal" 220 | value: "configuring" 221 | effect: "NoSchedule" 222 | volumes: 223 | - name: host-root-fs 224 | hostPath: 225 | path: / 226 | initContainers: 227 | - name: configure-and-untaint 228 | image: ubuntu:22.04 # Using a standard container image. 229 | securityContext: 230 | privileged: true # Required for chroot and sysctl. 231 | env: 232 | - name: NODE_NAME 233 | valueFrom: 234 | fieldRef: 235 | fieldPath: spec.nodeName 236 | volumeMounts: 237 | - name: host-root-fs 238 | mountPath: /host 239 | command: ["/bin/bash", "-c"] 240 | args: 241 | - | 242 | # Using explicit error checking for each critical command. 
243 | 244 | # Define the configuration and taint details. 245 | SYSCTL_PARAM="vm.max_map_count" 246 | SYSCTL_VALUE="262144" 247 | TAINT_KEY="node.config.status/stage" 248 | 249 | echo "Running configuration on node: ${NODE_NAME}" 250 | 251 | # 1. APPLY CONFIGURATION 252 | echo "--> Applying ${SYSCTL_PARAM}=${SYSCTL_VALUE}..." 253 | if ! chroot /host sysctl -w "${SYSCTL_PARAM}=${SYSCTL_VALUE}"; then 254 | echo "ERROR: Failed to apply sysctl parameter." >&2 255 | exit 1 256 | fi 257 | echo "--> Configuration applied successfully." 258 | 259 | # 2. UNTAINT THE NODE 260 | # This command removes the taint from the node this pod is running on. 261 | echo "--> Untainting node ${NODE_NAME} by removing taint ${TAINT_KEY}..." 262 | if ! /host/home/kubernetes/bin/kubectl taint node "${NODE_NAME}" "${TAINT_KEY}:NoSchedule-"; then 263 | echo "ERROR: Failed to untaint the node." >&2 264 | exit 1 265 | fi 266 | echo "--> Node has been untainted and is now schedulable." 267 | # The main container is minimal; it just keeps the pod running. 268 | containers: 269 | - name: pause-container 270 | image: registry.k8s.io/pause:3.9 271 | ``` 272 | 273 | 274 | * **Apply the `DaemonSet` manifest.** 275 | ```bash 276 | kubectl apply -f auto-untaint-daemonset.yaml 277 | ``` 278 | 3. **Validate the DaemonSet:** 279 | 280 | * **Verify the pods are running on the tainted nodes.** 281 | You should see the pod in a `Running` state after the init container completes. 282 | ```bash 283 | kubectl get pods -l app=auto-untaint-configurator -o wide 284 | ``` 285 | 286 | * **Check the logs to confirm execution.** 287 | View the `initContainer` logs to ensure the script ran and untainted the node successfully. 
288 | ```bash 289 | # Get a pod name from the command above 290 | POD_NAME=$(kubectl get pods -l app=auto-untaint-configurator -o jsonpath='{.items[0].metadata.name}') 291 | 292 | # Check the logs for that pod's init container 293 | kubectl logs $POD_NAME -c configure-and-untaint 294 | ``` 295 | The output will confirm that the `sysctl` command ran and the node was untainted. 296 | 297 | --- 298 | ### Part 3: Privileged DaemonSet Tradeoffs and Security Restrictions 299 | 300 | Using `securityContext: privileged: true` in a DaemonSet (or any pod) is powerful but comes with significant security implications. It essentially disables most container isolation boundaries for that pod. 301 | 302 | #### The Tradeoff: 303 | 304 | * **Benefit:** Grants the container capabilities necessary for deep host system interactions, such as: 305 | * Modifying kernel parameters (`sysctl`). 306 | * Loading/unloading kernel modules (`modprobe`). 307 | * Accessing host devices (`/dev/*`). 308 | * Modifying protected host filesystems. 309 | * Full network stack manipulation (beyond standard Kubernetes networking). 310 | * Running tools that require raw socket access or specific hardware interactions. 311 | * **Cost:** Massively increased security risk and potential for node/cluster instability. 312 | 313 | #### Security Restrictions and Risks: 314 | 315 | * **Container Escape/Host Compromise:** A vulnerability within the privileged container's application or image can directly lead to root access on the host node. The attacker bypasses standard container defenses. 316 | * **Violation of Least Privilege:** Privileged mode grants *all* capabilities, likely far more than needed for a specific task. This broad access increases the potential damage if the container is compromised. 317 | * **Node Destabilization:** Accidental or malicious commands run within the privileged container (e.g., incorrect `sysctl` values, `rm -rf /host/boot`) can crash or corrupt the host node operating system. 
318 | * **Lateral Movement:** Compromising one node via a privileged DaemonSet gives an attacker a strong foothold to attack other nodes, the Kubernetes control plane, or connected systems. 319 | * **Data Exposure:** Unrestricted access to the host filesystem (`/`) can expose sensitive data stored on the node, including credentials, keys, or data belonging to other pods (if accessible via host paths). 320 | * **Increased Attack Surface:** Exposes more of the host kernel's system calls and features to potential exploits from within the container. 321 | 322 | #### Best Practices / Mitigations: 323 | 324 | * **Avoid If Possible:** The most secure approach is to avoid **`privileged: true`** entirely. 325 | * **Use Linux Capabilities:** If elevated rights are needed, grant *specific* Linux capabilities (e.g., `NET_ADMIN`, `SYS_ADMIN`, `SYS_MODULE`) in the `securityContext.capabilities.add` field instead of full privilege. This follows the principle of least privilege. 326 | * **Limit Scope:** Run privileged DaemonSets only on dedicated, possibly tainted, node pools to contain the potential blast radius. 327 | * **Policy Enforcement:** Use GKE Policy Controller (or OPA Gatekeeper) to create policies that restrict, audit, or require justification for deploying privileged containers. 328 | * **Image Scanning & Trust:** Use GKE Binary Authorization and rigorous image scanning to ensure only vetted, trusted container images are run with privilege. 329 | * **Minimize Host Mounts:** Only mount the specific host paths needed, and use `readOnly: true` whenever possible. Avoid mounting the entire root filesystem (`/`) unless absolutely necessary. 330 | * **Regular Audits:** Periodically review all workloads running with **`privileged: true`**. 331 | --------------------------------------------------------------------------------
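To make the capabilities advice concrete, below is a minimal sketch of a `securityContext` that drops full privilege in favor of named capabilities. Which capability a given task needs varies (for example, `NET_ADMIN` for network stack changes, `SYS_MODULE` for module loading), and some host paths such as `/proc/sys` may still be mounted read-only for non-privileged containers, so treat this as a starting point to validate on your target node image rather than a drop-in replacement for the manifests above.

```yaml
# Illustrative pattern only: request specific capabilities instead of
# privileged: true, and drop everything else.
securityContext:
  privileged: false
  capabilities:
    drop: ["ALL"]
    add: ["NET_ADMIN"]  # narrower than full privilege; pick per task
```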