├── .github
    └── workflows
    │   └── release.yml
├── .gitignore
├── .goreleaser.yml
├── .krew.yaml
├── LICENSE
├── README.md
├── demo
    ├── installation.gif
    └── usage.gif
├── node-restart.sh
├── node-restart.yaml
└── v1.0.7.zip


/.github/workflows/release.yml:
--------------------------------------------------------------------------------
name: release
on:
  push:
    tags:
      - "v*.*.*"
jobs:
  goreleaser:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@master
      - name: Setup Go
        uses: actions/setup-go@v1
        with:
          go-version: 1.16
      - name: GoReleaser
        uses: goreleaser/goreleaser-action@v1
        with:
          version: latest
          args: release --rm-dist
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Update new version in krew-index
        uses: rajatjindal/krew-release-bot@v0.0.40


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MnrGreg/kubectl-node-restart/35588b4a5b742f14bf3a954fb15c061216af54c0/.gitignore


--------------------------------------------------------------------------------
/.goreleaser.yml:
--------------------------------------------------------------------------------
builds:
  - id: kubectl-node-restart
    skip: true
archives:
  - id: kubectl-node-restart
    name_template: "{{ .TagName }}.zip"
    wrap_in_directory: false
    format: zip
    files:
      - LICENSE
      - node-restart.sh
changelog:
  sort: asc
  filters:
    exclude:
      - '^docs:'
      - '^test:'
release:
  ids:
    - kubectl-node-restart
  name_template: "{{ .Tag }}"
  extra_files:
    - glob: ./*.zip


--------------------------------------------------------------------------------
/.krew.yaml:
--------------------------------------------------------------------------------
apiVersion: krew.googlecontainertools.github.com/v1alpha2
kind: Plugin
metadata:
  name: node-restart
spec:
  version: "{{ .TagName }}"
  platforms:
  - selector:
      matchExpressions:
      - {key: os, operator: In, values: [darwin, linux]}
    {{addURIAndSha "https://github.com/MnrGreg/kubectl-node-restart/releases/download/{{ .TagName }}/{{ .TagName }}.zip" .TagName }}
    files:
    - from: "*.sh"
      to: "."
    - from: "LICENSE"
      to: "."
    bin: "node-restart.sh"
  shortDescription: >-
    Restart cluster nodes sequentially and gracefully
  homepage: https://github.com/mnrgreg/kubectl-node-restart
  caveats: |
    Execution of this plugin requires Kubernetes cluster-admin Rolebindings
    and the ability to schedule Privileged Pods.
  description: |
    This plugin performs a sequential, rolling restart of selected nodes by first
    draining each node, then running a Kubernetes Job to reboot each node, and
    finally uncordoning each node when Ready.


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2020 Greg May

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

This repository contains the code which has the following license
notice:

Copyright 2016 The Kubernetes Authors All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# kubectl-node-restart

`kubectl-node-restart` is a [kubectl plugin](https://kubernetes.io/docs/tasks/extend-kubectl/kubectl-plugins/) that sequentially and gracefully performs a rolling restart of Nodes within a Kubernetes cluster.

![using kubectl-node-restart plugin](demo/usage.gif)

# Installing
- install `krew` using instructions [here](https://github.com/kubernetes-sigs/krew#installation)
- run `kubectl krew update`
- run `kubectl krew install node-restart`

![installing kubectl-node-restart plugin](demo/installation.gif)


# Usage

- perform rolling restart of all nodes in a cluster

```bash
    kubectl node-restart [--context cluster] all
```

- restart only specific nodes selected through labels

```bash
    kubectl node-restart --selector node-role.kubernetes.io/master
```

- execute a command prior to reboot

```bash
    kubectl node-restart all --command "echo 'hello world'"
```

- perform a dry-run

```bash
    kubectl node-restart all --dry-run
```

- restart node(s) without first draining

```bash
    kubectl node-restart all --force
```

- add a delay of 120 seconds between node restarts

```bash
    kubectl node-restart all --sleep 120
```

- pull the Alpine image from a private registry

```bash
    kubectl node-restart all --registry myregistry.local/library/alpine:3.9
```


--------------------------------------------------------------------------------
/demo/installation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MnrGreg/kubectl-node-restart/35588b4a5b742f14bf3a954fb15c061216af54c0/demo/installation.gif


--------------------------------------------------------------------------------
/demo/usage.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MnrGreg/kubectl-node-restart/35588b4a5b742f14bf3a954fb15c061216af54c0/demo/usage.gif


--------------------------------------------------------------------------------
/node-restart.sh:
--------------------------------------------------------------------------------
#!/usr/bin/env bash

image='alpine:3.15'
nodesleep=20 # Time delay between node restarts - give pods time to start up
restartdeadline=300
kubeletdeadline=300
uncordondelay=0
force=false
dryrun=false
blue='\033[0;34m'
nocolor='\033[0m'
rebootcommand='touch /node-restart-flag && reboot'

function print_usage() {
  echo "Usage: kubectl node-restart []"
  echo ""
  echo "  all                         Restarts all nodes within the cluster"
  echo ""
  echo "  --context context           Specify the context (or use kubectx)"
  echo "  -l|--selector key=value     Selector (label query) to target specific nodes"
  echo "  -f|--force                  Restart node(s) without first draining"
  echo "  -d|--dry-run                Just print what to do; don't actually do it"
  echo "  -s|--sleep                  Sleep delay between restarting Nodes (default 20s)"
  echo "  -r|--registry               Pull Alpine image from an alternate registry"
  echo "  -c|--command                Pre-restart command to be executed"
  echo "  -ud|--uncordon-delay        Sleep delay before uncordoning a node (default 0s)"
  echo "  -rd|--restart-deadline      Deadline for the restart job to complete (default 300s)"
  echo "  -kd|--kubelet-deadline      Deadline for kubelet to start up (default 300s)"
  echo "  -h|--help                   Print usage and exit"
}

# Parse command-line arguments
while [[ $# -gt 0 ]]; do
  key="$1"

  case $key in
    all)
      allnodes=true
      shift
      ;;
    --context)
      cluster="$2"
      echo -e "${blue}Targeting cluster $cluster${nocolor}"
      # $context is expanded unquoted below so that it splits into two arguments
      context="--context $cluster"
      shift
      shift
      ;;
    -l | --selector)
      selector="$2"
      shift
      shift
      ;;
    -f | --force)
      force=true
      shift
      ;;
    -d | --dry-run)
      dryrun=true
      shift
      ;;
    -s | --sleep)
      nodesleep="$2"
      shift
      shift
      ;;
    -r | --registry)
      image="$2"
      shift
      shift
      ;;
    -c | --command)
      rebootcommand="$2 && touch /node-restart-flag && reboot"
      shift
      shift
      ;;
    -ud | --uncordon-delay)
      uncordondelay="$2"
      shift
      shift
      ;;
    -rd | --restart-deadline)
      restartdeadline="$2"
      shift
      shift
      ;;
    -kd | --kubelet-deadline)
      kubeletdeadline="$2"
      shift
      shift
      ;;
    -h | --help)
      print_usage
      exit 0
      ;;
    *)
      print_usage
      exit 1
      ;;
  esac
done

# Poll the restart Job every 10 seconds until it reports a succeeded pod
function wait_for_job_completion() {
  pod=$1
  i=0
  while [[ $i -lt $restartdeadline ]]; do
    status=$(kubectl get job $pod -n kube-system -o "jsonpath={.status.succeeded}" $context 2> /dev/null)
    if [[ $status -gt 0 ]]; then
      echo "Restart complete after $i seconds"
      break
    else
      i=$(($i + 10))
      sleep 10
      echo "$node - $i seconds"
    fi
  done
  # -ge rather than ==, so deadlines that are not a multiple of 10 still time out
  if [[ $i -ge $restartdeadline ]]; then
    echo "Error: Restart job did not complete within $restartdeadline seconds"
    exit 1
  fi
}

# Poll the node every 10 seconds until the kubelet reports Ready
function wait_for_status() {
  node=$1
  i=0
  while [[ $i -lt $kubeletdeadline ]]; do
    status=$(kubectl get node $node -o "jsonpath={.status.conditions[?(.reason==\"KubeletReady\")].type}" $context 2> /dev/null)
    if [[ "$status" == "Ready" ]]; then
      echo "KubeletReady after $i seconds"
      break
    else
      i=$(($i + 10))
      sleep 10
      echo "$node NotReady - waited $i seconds"
    fi
  done
  # -ge rather than ==, so deadlines that are not a multiple of 10 still time out
  if [[ $i -ge $kubeletdeadline ]]; then
    echo "Error: Did not reach KubeletReady state within $kubeletdeadline seconds"
    exit 1
  fi
}

# Resolve the list of target nodes (jsonpath is quoted to avoid shell globbing)
if [ "$allnodes" == "true" ]; then
  nodes=$(kubectl get nodes -o 'jsonpath={.items[*].metadata.name}' $context)
  echo -e "${blue}Targeting nodes:${nocolor}"
  for node in $nodes; do
    echo " $node"
  done
elif [ ! -z "$selector" ]; then
  nodes=$(kubectl get nodes "--selector=$selector" -o 'jsonpath={.items[*].metadata.name}' $context)
  echo -e "${blue}Targeting selective nodes:${nocolor}"
  for node in $nodes; do
    echo " $node"
  done
else
  # Neither "all" nor a selector was given: report the usage error
  print_usage
  exit 1
fi

for node in $nodes; do
  if $force; then
    echo -e "\nWARNING: --force specified, restarting node $node without draining first"
    if $dryrun; then
      echo "kubectl cordon $node $context"
    else
      kubectl $context cordon "$node"
    fi
  else
    echo -e "\n${blue}Draining node $node...${nocolor}"
    if $dryrun; then
      echo "kubectl drain $node --ignore-daemonsets --delete-emptydir-data --force $context"
    else
      kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --force $context
    fi
  fi

  echo -e "${blue}Initiating node restart job on $node...${nocolor}"
  pod="node-restart-$(env LC_CTYPE=C LC_ALL=C tr -dc a-z0-9 < /dev/urandom | head -c 5)"
  if $dryrun; then
    echo "kubectl create job $pod $context"
  else
    # The Job runs twice: the first run sets the flag and reboots (exit 1),
    # the retry after the reboot finds the flag, removes it and succeeds.
    kubectl apply $context -f- << EOT
apiVersion: batch/v1
kind: Job
metadata:
  name: $pod
  namespace: kube-system
spec:
  backoffLimit: 3
  ttlSecondsAfterFinished: 30
  template:
    spec:
      nodeName: $node
      hostPID: true
      tolerations:
      - effect: NoSchedule
        operator: Exists
      containers:
      - name: $pod
        image: $image
        command: [ "nsenter", "--target", "1", "--mount", "--uts", "--ipc", "--pid", "--", "bash", "-c" ]
        args: [ "if [ -f /node-restart-flag ]; then rm /node-restart-flag && exit 0; else $rebootcommand && exit 1; fi" ]
        securityContext:
          privileged: true
      restartPolicy: Never
EOT
  fi

  if ! $dryrun; then
    echo -e "${blue}Waiting for restart job to complete on node $node...${nocolor}"
    wait_for_job_completion $pod
    wait_for_status $node KubeletReady
  else
    echo "Waiting $restartdeadline seconds for restart job completion."
    echo "Waiting $kubeletdeadline seconds for kubelet initialization."
  fi

  if [[ $uncordondelay -gt 0 ]]; then
    echo "Waiting $uncordondelay seconds before uncordoning."
  fi

  echo -e "${blue}Uncordoning node $node${nocolor}"

  if $dryrun; then
    echo "kubectl uncordon $node $context"
  else
    sleep $uncordondelay
    kubectl uncordon "$node" $context
    kubectl delete job $pod -n kube-system $context
    # Sleep between nodes, but not after the last one. An if-block is used so
    # the script does not end with a non-zero status on the final node.
    if [ "$node" != "${nodes##* }" ]; then
      sleep $nodesleep
    fi
  fi
done


--------------------------------------------------------------------------------
/node-restart.yaml:
--------------------------------------------------------------------------------
apiVersion: krew.googlecontainertools.github.com/v1alpha2
kind: Plugin
metadata:
  name: node-restart
spec:
  version: "v1.0.1"
  platforms:
  - selector:
      matchExpressions:
      - {key: os, operator: In, values: [darwin, linux]}
    uri: https://github.com/MnrGreg/kubectl-node-restart/releases/download/v1.0.1/v1.0.1.zip
    sha256: "3ab20f10179111f54410f4ac41ef15231aeba6e7bf6a24b7cf3a54d30a293ce4"
    files:
    - from: "*.sh"
      to: "."
    - from: "LICENSE"
      to: "."
    bin: "node-restart.sh"
  shortDescription: >-
    Restart cluster nodes sequentially and gracefully
  homepage: https://github.com/mnrgreg/kubectl-node-restart
  caveats: |
    Execution of this plugin requires Kubernetes cluster-admin Rolebindings
    and the ability to schedule Privileged Pods.
  description: |
    This plugin performs a sequential, rolling restart of selected nodes by first
    draining each node, then running a Kubernetes Job to reboot each node, and
    finally uncordoning each node when Ready.


--------------------------------------------------------------------------------
/v1.0.7.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MnrGreg/kubectl-node-restart/35588b4a5b742f14bf3a954fb15c061216af54c0/v1.0.7.zip


--------------------------------------------------------------------------------
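
The two wait loops in `node-restart.sh` (`wait_for_job_completion` and `wait_for_status`) follow the same pattern: run a check, sleep, and give up once a deadline has passed. The sketch below shows that pattern in isolation so it can be tried without a cluster. `poll_until` and its parameters are hypothetical names chosen for illustration and are not part of the plugin; the check command is passed in as arguments rather than being a `kubectl` call.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of node-restart.sh): run a command repeatedly
# until it succeeds or until at least <deadline> seconds have been waited.
#   $1    deadline in seconds
#   $2    polling interval in seconds (must be greater than 0)
#   $3..  command to run as the readiness check
poll_until() {
  local deadline=$1 interval=$2
  shift 2
  local waited=0
  while (( waited < deadline )); do
    if "$@"; then
      echo "ready after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  # Reached for any deadline value, including ones the interval does not divide
  echo "timed out after ${deadline}s" >&2
  return 1
}
```

A call such as `poll_until 300 10 some_check` would mirror the script's defaults, where `some_check` stands for any command that exits 0 once the node or job is in the desired state. Returning a status code instead of calling `exit` leaves the decision about aborting to the caller.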