├── .gitignore ├── README.md ├── Screenshots ├── 2021-10-28_134300.png ├── 2021-10-28_134337.png ├── 2021-10-28_134639.png ├── 2021-10-29_162836.png ├── 2021-10-29_163551.png ├── 2021-10-29_163637.png ├── 2021-10-29_163837.png ├── 2021-10-29_164027.png ├── image-20211012134939433.png ├── image-20211012140900479.png └── image-20211012140957412.png ├── centos ├── Vagrantfile ├── bootstrap.sh ├── bootstrap_kmaster.sh └── bootstrap_kworker.sh ├── kube-prometheus └── manifests │ ├── alertmanager-alertmanager.yaml │ ├── alertmanager-podDisruptionBudget.yaml │ ├── alertmanager-prometheusRule.yaml │ ├── alertmanager-secret.yaml │ ├── alertmanager-service.yaml │ ├── alertmanager-serviceAccount.yaml │ ├── alertmanager-serviceMonitor.yaml │ ├── blackbox-exporter-clusterRole.yaml │ ├── blackbox-exporter-clusterRoleBinding.yaml │ ├── blackbox-exporter-configuration.yaml │ ├── blackbox-exporter-deployment.yaml │ ├── blackbox-exporter-service.yaml │ ├── blackbox-exporter-serviceAccount.yaml │ ├── blackbox-exporter-serviceMonitor.yaml │ ├── grafana-config.yaml │ ├── grafana-dashboardDatasources.yaml │ ├── grafana-dashboardDefinitions.yaml │ ├── grafana-dashboardSources.yaml │ ├── grafana-deployment.yaml │ ├── grafana-service.yaml │ ├── grafana-serviceAccount.yaml │ ├── grafana-serviceMonitor.yaml │ ├── kube-prometheus-prometheusRule.yaml │ ├── kube-state-metrics-clusterRole.yaml │ ├── kube-state-metrics-clusterRoleBinding.yaml │ ├── kube-state-metrics-deployment.yaml │ ├── kube-state-metrics-prometheusRule.yaml │ ├── kube-state-metrics-service.yaml │ ├── kube-state-metrics-serviceAccount.yaml │ ├── kube-state-metrics-serviceMonitor.yaml │ ├── kubernetes-prometheusRule.yaml │ ├── kubernetes-serviceMonitorApiserver.yaml │ ├── kubernetes-serviceMonitorCoreDNS.yaml │ ├── kubernetes-serviceMonitorKubeControllerManager.yaml │ ├── kubernetes-serviceMonitorKubeScheduler.yaml │ ├── kubernetes-serviceMonitorKubelet.yaml │ ├── node-exporter-clusterRole.yaml │ ├── node-exporter-clusterRoleBinding.yaml │ ├── node-exporter-daemonset.yaml │ ├── node-exporter-prometheusRule.yaml │ ├── node-exporter-service.yaml │ ├── node-exporter-serviceAccount.yaml │ ├── node-exporter-serviceMonitor.yaml │ ├── prometheus-adapter-apiService.yaml │ ├── prometheus-adapter-clusterRole.yaml │ ├── prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml │ ├── prometheus-adapter-clusterRoleBinding.yaml │ ├── prometheus-adapter-clusterRoleBindingDelegator.yaml │ ├── prometheus-adapter-clusterRoleServerResources.yaml │ ├── prometheus-adapter-configMap.yaml │ ├── prometheus-adapter-deployment.yaml │ ├── prometheus-adapter-podDisruptionBudget.yaml │ ├── prometheus-adapter-roleBindingAuthReader.yaml │ ├── prometheus-adapter-service.yaml │ ├── prometheus-adapter-serviceAccount.yaml │ ├── prometheus-adapter-serviceMonitor.yaml │ ├── prometheus-clusterRole.yaml │ ├── prometheus-clusterRoleBinding.yaml │ ├── prometheus-operator-prometheusRule.yaml │ ├── prometheus-operator-serviceMonitor.yaml │ ├── prometheus-podDisruptionBudget.yaml │ ├── prometheus-prometheus.yaml │ ├── prometheus-prometheusRule.yaml │ ├── prometheus-roleBindingConfig.yaml │ ├── prometheus-roleBindingSpecificNamespaces.yaml │ ├── prometheus-roleConfig.yaml │ ├── prometheus-roleSpecificNamespaces.yaml │ ├── prometheus-service.yaml │ ├── prometheus-serviceAccount.yaml │ ├── prometheus-serviceMonitor.yaml │ └── setup │ ├── 0namespace-namespace.yaml │ ├── prometheus-operator-0alertmanagerConfigCustomResourceDefinition.yaml │ ├── 
prometheus-operator-0alertmanagerCustomResourceDefinition.yaml │ ├── prometheus-operator-0podmonitorCustomResourceDefinition.yaml │ ├── prometheus-operator-0probeCustomResourceDefinition.yaml │ ├── prometheus-operator-0prometheusCustomResourceDefinition.yaml │ ├── prometheus-operator-0prometheusruleCustomResourceDefinition.yaml │ ├── prometheus-operator-0servicemonitorCustomResourceDefinition.yaml │ ├── prometheus-operator-0thanosrulerCustomResourceDefinition.yaml │ ├── prometheus-operator-clusterRole.yaml │ ├── prometheus-operator-clusterRoleBinding.yaml │ ├── prometheus-operator-deployment.yaml │ ├── prometheus-operator-service.yaml │ └── prometheus-operator-serviceAccount.yaml ├── kubernetes-dashboard └── kubernetes-dashboard.yaml ├── kubesphere └── nfs.yaml ├── metrics └── metrics.yaml └── ubuntu ├── Vagrantfile ├── bootstrap.sh ├── bootstrap_kmaster.sh └── bootstrap_kworker.sh
/.gitignore:
--------------------------------------------------------------------------------
.vagrant
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# vagrant-kubernetes-cluster

[![Powered by DartNode](https://dartnode.com/branding/DN-Open-Source-sm.png)](https://dartnode.com "Powered by DartNode - Free VPS for Open Source")

**_One-click installation of a Kubernetes cluster with Vagrant. Also installs Metrics Server, Kuboard, Kubernetes Dashboard, KubePi, and cluster monitoring with prometheus-operator._**

**Installation environment**:

- Vagrant version: 2.2.18
- VirtualBox version: 6.1.26

The virtual machine network adapters are configured as shown below:

![image-20211012134939433](Screenshots/image-20211012134939433.png)

**Versions installed in the CentOS 7 environment**:

- CentOS version: centos7
- Containerd version: 1.4.11
- Kubernetes version: v1.22.2

**Versions installed in the Ubuntu environment**:

- Ubuntu version: 20.04.2 LTS
- Containerd version: 1.5.5
- Kubernetes version: v1.22.0

## One-click installation

```bash
vagrant up

Bringing machine 'kmaster' up with 'virtualbox' provider...
Bringing machine 'kworker1' up with 'virtualbox' provider...
Bringing machine 'kworker2' up with 'virtualbox' provider...
==> kmaster: Importing base box 'generic/ubuntu2004'...
==> kmaster: Matching MAC address for NAT networking...
==> kmaster: Setting the name of the VM: kmaster
==> kmaster: Clearing any previously set network interfaces...
==> kmaster: Preparing network interfaces based on configuration...
    kmaster: Adapter 1: nat
    kmaster: Adapter 2: hostonly
==> kmaster: Forwarding ports...
    kmaster: 22 (guest) => 2222 (host) (adapter 1)
==> kmaster: Running 'pre-boot' VM customizations...
==> kmaster: Booting VM...
==> kmaster: Waiting for machine to boot. This may take a few minutes...
    kmaster: SSH address: 127.0.0.1:2222
    kmaster: SSH username: vagrant
    kmaster: SSH auth method: private key
    kmaster:
    kmaster: Vagrant insecure key detected. Vagrant will automatically replace
    kmaster: this with a newly generated keypair for better security.
    kmaster:
    kmaster: Inserting generated public key within guest...
    kmaster: Removing insecure key from the guest if it's present...
    kmaster: Key inserted! Disconnecting and reconnecting using new SSH key...
==> kmaster: Machine booted and ready!
==> kmaster: Checking for guest additions in VM...
==> kmaster: Setting hostname...
==> kmaster: Configuring and enabling network interfaces...
==> kmaster: Mounting shared folders...
    kmaster: /vagrant => D:/Vagrant/kubernetes-cluster
==> kmaster: Running provisioner: shell...
    kmaster: Running: C:/Users/swfeng/AppData/Local/Temp/vagrant-shell20211012-49908-1qfj4jz.sh
    kmaster: [TASK 0] Setting TimeZone
    kmaster: [TASK 1] Setting DNS
    kmaster: [TASK 2] Setting Ubuntu System Mirrors
    kmaster: [TASK 3] Disable and turn off SWAP
    kmaster: [TASK 4] Stop and Disable firewall
    kmaster: [TASK 5] Enable and Load Kernel modules
    kmaster: [TASK 6] Add Kernel settings
    kmaster: [TASK 7] Install containerd runtime
    kmaster: [TASK 8] Add apt repo for kubernetes
    kmaster: Warning: apt-key output should not be parsed (stdout is not a terminal)
    kmaster: OK
    kmaster: [TASK 9] Install Kubernetes components (kubeadm, kubelet and kubectl)
    kmaster: [TASK 10] Enable ssh password authentication
    kmaster: [TASK 11] Set root password
    kmaster: [TASK 12] Update /etc/hosts file
==> kmaster: Running provisioner: shell...
    kmaster: Running: C:/Users/swfeng/AppData/Local/Temp/vagrant-shell20211012-49908-11nj6h4.sh
    kmaster: [TASK 1] Pull required containers
    kmaster: [TASK 2] Initialize Kubernetes Cluster
    kmaster: [TASK 3] Deploy Calico network
    kmaster: [TASK 4] Generate and save cluster join command to /joincluster.sh
==> kworker1: Importing base box 'generic/ubuntu2004'...
==> kworker1: Matching MAC address for NAT networking...
==> kworker1: Setting the name of the VM: kworker1
==> kworker1: Fixed port collision for 22 => 2222. Now on port 2200.
==> kworker1: Clearing any previously set network interfaces...
==> kworker1: Preparing network interfaces based on configuration...
    kworker1: Adapter 1: nat
    kworker1: Adapter 2: hostonly
==> kworker1: Forwarding ports...
    kworker1: 22 (guest) => 2200 (host) (adapter 1)
==> kworker1: Running 'pre-boot' VM customizations...
==> kworker1: Booting VM...
==> kworker1: Waiting for machine to boot. This may take a few minutes...
    kworker1: SSH address: 127.0.0.1:2200
    kworker1: SSH username: vagrant
    kworker1: SSH auth method: private key
    kworker1:
    kworker1: Vagrant insecure key detected. Vagrant will automatically replace
    kworker1: this with a newly generated keypair for better security.
    kworker1:
    kworker1: Inserting generated public key within guest...
    kworker1: Removing insecure key from the guest if it's present...
    kworker1: Key inserted! Disconnecting and reconnecting using new SSH key...
==> kworker1: Machine booted and ready!
==> kworker1: Checking for guest additions in VM...
==> kworker1: Setting hostname...
==> kworker1: Configuring and enabling network interfaces...
==> kworker1: Mounting shared folders...
    kworker1: /vagrant => D:/Vagrant/kubernetes-cluster
==> kworker1: Running provisioner: shell...
    kworker1: Running: C:/Users/swfeng/AppData/Local/Temp/vagrant-shell20211012-49908-6qmkd4.sh
    kworker1: [TASK 0] Setting TimeZone
    kworker1: [TASK 1] Setting DNS
    kworker1: [TASK 2] Setting Ubuntu System Mirrors
    kworker1: [TASK 3] Disable and turn off SWAP
    kworker1: [TASK 4] Stop and Disable firewall
    kworker1: [TASK 5] Enable and Load Kernel modules
    kworker1: [TASK 6] Add Kernel settings
    kworker1: [TASK 7] Install containerd runtime
    kworker1: [TASK 8] Add apt repo for kubernetes
    kworker1: Warning: apt-key output should not be parsed (stdout is not a terminal)
    kworker1: OK
    kworker1: [TASK 9] Install Kubernetes components (kubeadm, kubelet and kubectl)
    kworker1: [TASK 10] Enable ssh password authentication
    kworker1: [TASK 11] Set root password
    kworker1: [TASK 12] Update /etc/hosts file
==> kworker1: Running provisioner: shell...
    kworker1: Running: C:/Users/swfeng/AppData/Local/Temp/vagrant-shell20211012-49908-vmdbxa.sh
    kworker1: [TASK 1] Join node to Kubernetes Cluster
==> kworker2: Importing base box 'generic/ubuntu2004'...
==> kworker2: Matching MAC address for NAT networking...
==> kworker2: Setting the name of the VM: kworker2
==> kworker2: Fixed port collision for 22 => 2222. Now on port 2201.
==> kworker2: Clearing any previously set network interfaces...
==> kworker2: Preparing network interfaces based on configuration...
    kworker2: Adapter 1: nat
    kworker2: Adapter 2: hostonly
==> kworker2: Forwarding ports...
    kworker2: 22 (guest) => 2201 (host) (adapter 1)
==> kworker2: Running 'pre-boot' VM customizations...
==> kworker2: Booting VM...
==> kworker2: Waiting for machine to boot. This may take a few minutes...
    kworker2: SSH address: 127.0.0.1:2201
    kworker2: SSH username: vagrant
    kworker2: SSH auth method: private key
    kworker2:
    kworker2: Vagrant insecure key detected. Vagrant will automatically replace
    kworker2: this with a newly generated keypair for better security.
    kworker2:
    kworker2: Inserting generated public key within guest...
    kworker2: Removing insecure key from the guest if it's present...
    kworker2: Key inserted! Disconnecting and reconnecting using new SSH key...
==> kworker2: Machine booted and ready!
==> kworker2: Checking for guest additions in VM...
==> kworker2: Setting hostname...
==> kworker2: Configuring and enabling network interfaces...
==> kworker2: Mounting shared folders...
    kworker2: /vagrant => D:/Vagrant/kubernetes-cluster
==> kworker2: Running provisioner: shell...
    kworker2: Running: C:/Users/swfeng/AppData/Local/Temp/vagrant-shell20211012-49908-1s6ys4c.sh
    kworker2: [TASK 0] Setting TimeZone
    kworker2: [TASK 1] Setting DNS
    kworker2: [TASK 2] Setting Ubuntu System Mirrors
    kworker2: [TASK 3] Disable and turn off SWAP
    kworker2: [TASK 4] Stop and Disable firewall
    kworker2: [TASK 5] Enable and Load Kernel modules
    kworker2: [TASK 6] Add Kernel settings
    kworker2: [TASK 7] Install containerd runtime
    kworker2: [TASK 8] Add apt repo for kubernetes
    kworker2: Warning: apt-key output should not be parsed (stdout is not a terminal)
    kworker2: OK
    kworker2: [TASK 9] Install Kubernetes components (kubeadm, kubelet and kubectl)
    kworker2: [TASK 10] Enable ssh password authentication
    kworker2: [TASK 11] Set root password
    kworker2: [TASK 12] Update /etc/hosts file
==> kworker2: Running provisioner: shell...
    kworker2: Running: C:/Users/swfeng/AppData/Local/Temp/vagrant-shell20211012-49908-1qxwo1n.sh
    kworker2: [TASK 1] Join node to Kubernetes Cluster
```

> After installation the three machines have the following IPs:

| Hostname | IP |
| :------: | :------------: |
| kmaster | 192.168.56.100 |
| kworker1 | 192.168.56.101 |
| kworker2 | 192.168.56.102 |

> The `root` password is `kubeadmin`.

## Configure .kube/config

```bash
root@kmaster:~# mkdir -p $HOME/.kube
root@kmaster:~# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
root@kmaster:~# sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
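
If you want to run `kubectl` from the host machine instead of from inside the VM, one option is to copy the kubeconfig out of the master node. This is a minimal sketch, not part of the original walkthrough; it assumes `scp` and `kubectl` are available on the host, uses the `root` password noted above, and the target path is illustrative. Because the kubeconfig points at `kmaster.k8s.com`, the host may also need a hosts entry mapping that name to 192.168.56.100.

```bash
# Copy the admin kubeconfig from the master VM to the host (root password: kubeadmin)
scp root@192.168.56.100:/etc/kubernetes/admin.conf ~/.kube/vagrant-k8s.conf

# Point kubectl at the copied config
export KUBECONFIG=~/.kube/vagrant-k8s.conf
kubectl get nodes
```
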
Cluster status:

```bash
root@kmaster:~# kubectl cluster-info
Kubernetes control plane is running at https://kmaster.k8s.com:6443
CoreDNS is running at https://kmaster.k8s.com:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
```

```bash
root@kmaster:~# kubectl get node,po,svc -A -owide

Every 2.0s: kubectl get node,po,svc -A -owide                kmaster: Tue Oct 12 13:53:57 2021

NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node/kmaster Ready control-plane,master 20m v1.22.0 192.168.56.100 <none> Ubuntu 20.04.2 LTS 5.4.0-77-generic containerd://1.5.5
node/kworker1 Ready <none> 9m40s v1.22.0 192.168.56.101 <none> Ubuntu 20.04.2 LTS 5.4.0-77-generic containerd://1.5.5
node/kworker2 Ready <none> 7m35s v1.22.0 192.168.56.102 <none> Ubuntu 20.04.2 LTS 5.4.0-77-generic containerd://1.5.5

NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system pod/calico-kube-controllers-7659fb8886-dwvc4 1/1 Running 0 20m 192.168.189.2 kmaster <none> <none>
kube-system pod/calico-node-2w8x5 1/1 Running 0 20m 192.168.56.100 kmaster <none> <none>
kube-system pod/calico-node-vqjsc 1/1 Running 0 7m35s 192.168.56.102 kworker2 <none> <none>
kube-system pod/calico-node-zj98h 1/1 Running 0 9m40s 192.168.56.101 kworker1 <none> <none>
kube-system pod/coredns-7568f67dbd-4jssz 1/1 Running 0 20m 192.168.189.3 kmaster <none> <none>
kube-system pod/coredns-7568f67dbd-vn8ph 1/1 Running 0 20m 192.168.189.1 kmaster <none> <none>
kube-system pod/etcd-kmaster 1/1 Running 0 20m 192.168.56.100 kmaster <none> <none>
kube-system pod/kube-apiserver-kmaster 1/1 Running 0 20m 192.168.56.100 kmaster <none> <none>
kube-system pod/kube-controller-manager-kmaster 1/1 Running 0 20m 192.168.56.100 kmaster <none> <none>
kube-system pod/kube-proxy-2sqmm 1/1 Running 0 7m35s 192.168.56.102 kworker2 <none> <none>
kube-system pod/kube-proxy-8z758 1/1 Running 0 20m 192.168.56.100 kmaster <none> <none>
kube-system pod/kube-proxy-brgl8 1/1 Running 0 9m40s 192.168.56.101 kworker1 <none> <none>
kube-system pod/kube-scheduler-kmaster 1/1 Running 0 20m 192.168.56.100 kmaster <none> <none>

NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 20m <none>
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 20m k8s-app=kube-dns
```

## Install metrics-server

```bash
root@kmaster:/vagrant/metrics# kubectl apply -f metrics.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
```

## Install Kuboard

```bash
root@kmaster:~# kubectl apply -f https://addons.kuboard.cn/kuboard/kuboard-v3.yaml
namespace/kuboard created
configmap/kuboard-v3-config created
serviceaccount/kuboard-boostrap created
clusterrolebinding.rbac.authorization.k8s.io/kuboard-boostrap-crb created
daemonset.apps/kuboard-etcd created
deployment.apps/kuboard-v3 created
service/kuboard-v3 created

```

Visit Kuboard at http://192.168.56.100:30080

> Username: admin
> Password: Kuboard123

![image-20211012140900479](Screenshots/image-20211012140900479.png)

## Install kubernetes-dashboard

```bash

root@kmaster:/vagrant/kubernetes-dashboard# kubectl apply -f kubernetes-dashboard.yaml
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
Warning: spec.template.metadata.annotations[seccomp.security.alpha.kubernetes.io/pod]: deprecated since v1.19; use the "seccompProfile" field instead
deployment.apps/dashboard-metrics-scraper created
serviceaccount/admin-user created
clusterrolebinding.rbac.authorization.k8s.io/admin-user created

# After running the command below, manually change type: ClusterIP to type: NodePort
root@kmaster:~# kubectl edit svc kubernetes-dashboard -n kubernetes-dashboard

# Check the Services and note the exposed NodePort
root@kmaster:~# kubectl get svc -A |grep kubernetes-dashboard

kubernetes-dashboard dashboard-metrics-scraper ClusterIP 10.111.109.182 <none> 8000/TCP 2m53s
kubernetes-dashboard kubernetes-dashboard NodePort 10.97.250.165 <none> 443:31825/TCP 2m53s

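# Note (not part of the original walkthrough): instead of editing the Service
# interactively, the same change can be made non-interactively with a patch,
# mirroring the approach this guide uses later for the monitoring Services:
# kubectl -n kubernetes-dashboard patch svc kubernetes-dashboard -p '{"spec": {"type": "NodePort"}}'
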

# Get the access token
root@kmaster:~# kubectl -n kubernetes-dashboard get secret $(kubectl -n kubernetes-dashboard get sa/admin-user -o jsonpath="{.secrets[0].name}") -o go-template="{{.data.token | base64decode}}"

eyJhbGciOiJSUzI1NiIsImtpZCI6Ik9BODl1TGtTRjUzWUl4dnJKUHdpYnB1V0RIZGpxNkxoT2VMWEEzNW1yVk0ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJhZG1pbi11c2VyLXRva2VuLXdtN3hqIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImFkbWluLXVzZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiIzNzAzOGNhZC1jYjE2LTQ3ZjAtYTIxZS1hODNlNjhjYjA4ZGMiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZXJuZXRlcy1kYXNoYm9hcmQ6YWRtaW4tdXNlciJ9.iPxLZnueJz9y2ngFTtgEuZ36Ae0QLK2oFXEBXinYcsM5712_sw3iyYODB9Eyu9AzscMDin-jL4ssctl6dQt-3PD6vdrLjSWAlDNK_PXXYlnFCTehrcFjZNGWv3yM7e5dfUOqmrl0ROwYEKFtF93sQAYPtXHZUqDnQOQ15VE-NVd7RyCgHHNtCiV_UeDrRg7M0YBvPtL24w35MaaKyeLIs_YWZpNgjV3zNfdl86Lo3SEoU0_nVAqwZzBroUxrE6ekBDGisWvQ6NtrEZLRTgk2izPCUiT3XOj4bENwf3Ba1bCKGvIzmWx41KIVdNamN_c1YOiY1HL__1ryKwMad4JR-w
```

Visit kubernetes-dashboard at https://192.168.56.100:31825

![image-20211012140957412](Screenshots/image-20211012140957412.png)
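
The token lookup above relies on the ServiceAccount's auto-created token Secret, which is how things work on the Kubernetes 1.22 cluster built here. As a side note (an assumption about newer setups, not used in this walkthrough), on Kubernetes 1.24 and later that Secret is no longer created automatically, and a short-lived token can be requested instead:

```bash
# Only relevant on newer clusters (kubectl/Kubernetes 1.24+); not needed for this guide
kubectl -n kubernetes-dashboard create token admin-user
```
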
## Cluster overview

```bash
Every 2.0s: kubectl get node,po,svc -A -owide                kmaster: Tue Oct 12 14:08:09 2021

NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node/kmaster Ready control-plane,master 35m v1.22.0 192.168.56.100 <none> Ubuntu 20.04.2 LTS 5.4.0-77-generic containerd://1.5.5
node/kworker1 Ready <none> 23m v1.22.0 192.168.56.101 <none> Ubuntu 20.04.2 LTS 5.4.0-77-generic containerd://1.5.5
node/kworker2 Ready <none> 21m v1.22.0 192.168.56.102 <none> Ubuntu 20.04.2 LTS 5.4.0-77-generic containerd://1.5.5

NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system pod/calico-kube-controllers-7659fb8886-dwvc4 1/1 Running 0 34m 192.168.189.2 kmaster <none> <none>
kube-system pod/calico-node-2w8x5 1/1 Running 0 34m 192.168.56.100 kmaster <none> <none>
kube-system pod/calico-node-vqjsc 1/1 Running 0 21m 192.168.56.102 kworker2 <none> <none>
kube-system pod/calico-node-zj98h 1/1 Running 0 23m 192.168.56.101 kworker1 <none> <none>
kube-system pod/coredns-7568f67dbd-4jssz 1/1 Running 0 34m 192.168.189.3 kmaster <none> <none>
kube-system pod/coredns-7568f67dbd-vn8ph 1/1 Running 0 34m 192.168.189.1 kmaster <none> <none>
kube-system pod/etcd-kmaster 1/1 Running 0 34m 192.168.56.100 kmaster <none> <none>
kube-system pod/kube-apiserver-kmaster 1/1 Running 0 35m 192.168.56.100 kmaster <none> <none>
kube-system pod/kube-controller-manager-kmaster 1/1 Running 0 34m 192.168.56.100 kmaster <none> <none>
kube-system pod/kube-proxy-2sqmm 1/1 Running 0 21m 192.168.56.102 kworker2 <none> <none>
kube-system pod/kube-proxy-8z758 1/1 Running 0 34m 192.168.56.100 kmaster <none> <none>
kube-system pod/kube-proxy-brgl8 1/1 Running 0 23m 192.168.56.101 kworker1 <none> <none>
kube-system pod/kube-scheduler-kmaster 1/1 Running 0 35m 192.168.56.100 kmaster <none> <none>
kube-system pod/metrics-server-9577d976b-xzrgt 1/1 Running 0 9m27s 192.168.41.129 kworker1 <none> <none>
kubernetes-dashboard pod/dashboard-metrics-scraper-856586f554-kdgtw 1/1 Running 0 6m57s 192.168.41.130 kworker1 <none> <none>
kubernetes-dashboard pod/kubernetes-dashboard-67484c44f6-lbp5l 1/1 Running 0 6m57s 192.168.77.129 kworker2 <none> <none>
kuboard pod/kuboard-agent-2-767f88b647-pr7br 1/1 Running 1 (5m57s ago) 6m26s 192.168.189.5 kmaster <none> <none>
kuboard pod/kuboard-agent-656c95877f-g968n 1/1 Running 1 (5m37s ago) 6m26s 192.168.189.6 kmaster <none> <none>
kuboard pod/kuboard-etcd-th9nq 1/1 Running 0 8m39s 192.168.56.100 kmaster <none> <none>
kuboard pod/kuboard-questdb-68d5bfb5b-2tnwf 1/1 Running 0 6m26s 192.168.189.7 kmaster <none> <none>
kuboard pod/kuboard-v3-5fc46b5557-44hlj 1/1 Running 0 8m39s 192.168.189.4 kmaster <none> <none>
```

## Install KubePi

https://kubeoperator.io/docs/kubepi/install/

```bash
kubectl apply -f https://raw.githubusercontent.com/KubeOperator/KubePi/master/docs/deploy/kubectl/kubepi.yaml
```

Get the access address:

```bash
# Get the node IP
export NODE_IP=$(kubectl get nodes -o jsonpath="{.items[0].status.addresses[0].address}")

# Get the NodePort
export NODE_PORT=$(kubectl -n kube-system get services kubepi -o jsonpath="{.spec.ports[0].nodePort}")

# Print the address
echo http://$NODE_IP:$NODE_PORT
```

Log in:

```
Address: http://$NODE_IP:$NODE_PORT
Username: admin
Password: kubepi
```

Import the cluster and get a token:

```bash
kubectl -n kubernetes-dashboard get secret $(kubectl -n kubernetes-dashboard get sa/admin-user -o jsonpath="{.secrets[0].name}") -o go-template="{{.data.token | base64decode}}"
```

![2021-10-28_134300.png](Screenshots/2021-10-28_134300.png)

![2021-10-28_134337.png](Screenshots/2021-10-28_134337.png)

![2021-10-28_134639.png](Screenshots/2021-10-28_134639.png)

---

**The following sections require a larger VM configuration: at least 4 CPU cores and 8 GB of RAM.**

## Install KubeSphere

### Install the KubeSphere prerequisites

Install an NFS file system.

#### Install nfs-server

```bash
# Run on every machine
yum install -y nfs-utils

# Run the following on kmaster (192.168.56.100)
echo "/nfs/data/ *(insecure,rw,sync,no_root_squash)" > /etc/exports

# Create the shared directory and start the NFS service
mkdir -p /nfs/data

# Run on the master
systemctl enable rpcbind
systemctl enable nfs-server
systemctl start rpcbind
systemctl start nfs-server

# Apply the export configuration
exportfs -r

# Check that the configuration took effect
exportfs
```

#### Configure nfs-client

```bash
showmount -e 192.168.56.100
mkdir -p /nfs/data
mount -t nfs 192.168.56.100:/nfs/data /nfs/data
```
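
The `mount` command above only lasts until the next reboot. As an optional addition (not in the original guide), the client-side mount can be made persistent with an `/etc/fstab` entry, assuming the same server address and paths:

```bash
# Persist the NFS mount across reboots (run on the client nodes)
echo "192.168.56.100:/nfs/data /nfs/data nfs defaults 0 0" >> /etc/fstab
mount -a
```
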
#### Configure the default StorageClass

Configure a default StorageClass for dynamic provisioning:

```yaml
## Create a StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-storage
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
  archiveOnDelete: "true" ## whether to archive the PV contents when the PV is deleted

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: docker.io/v5cn/nfs-subdir-external-provisioner:v4.0.2
          # resources:
          #   limits:
          #     cpu: 10m
          #   requests:
          #     cpu: 10m
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: k8s-sigs.io/nfs-subdir-external-provisioner
            - name: NFS_SERVER
              value: 192.168.56.100 ## your NFS server address
            - name: NFS_PATH
              value: /nfs/data ## the directory exported by the NFS server
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.56.100
            path: /nfs/data
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io
```

#### Verify the configuration

```bash
kubectl get sc
```


### Install KubeSphere

*KubeSphere does not support Kubernetes 1.22 yet; this part will be added later...*



## Install Kubernetes cluster monitoring with prometheus-operator


### Check the cluster info

```bash
kubectl cluster-info
```

### Clone kube-prometheus

```bash
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
```

### Create the namespace, CustomResourceDefinitions & operator pod

> Many of the images referenced by the upstream manifests cannot be pulled here, so apply the modified manifests shipped in this repository's kube-prometheus directory instead.

```bash
kubectl apply -f manifests/setup
```

### Check the namespace

```bash
kubectl get ns monitoring
```

### Check the pods

```bash
kubectl get pods -n monitoring
```

### Apply the remaining manifests

```bash
kubectl apply -f manifests/
```

### Check pods and services

```bash
kubectl get pods,svc -n monitoring
```
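
The monitoring stack can take a few minutes to become ready. As an optional check (not part of the original steps), you can wait for every pod in the namespace to report Ready before moving on:

```bash
# Block until all pods in the monitoring namespace are Ready (or the timeout expires)
kubectl -n monitoring wait --for=condition=Ready pod --all --timeout=300s
```
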

### Change the Service access type

Prometheus:

```bash
kubectl --namespace monitoring patch svc prometheus-k8s -p '{"spec": {"type": "NodePort"}}'
```

Alertmanager:

```bash
kubectl --namespace monitoring patch svc alertmanager-main -p '{"spec": {"type": "NodePort"}}'
```

Grafana:

```bash
kubectl --namespace monitoring patch svc grafana -p '{"spec": {"type": "NodePort"}}'
```

### Check the NodePorts

```bash
$ kubectl -n monitoring get svc | grep NodePort
alertmanager-main NodePort 10.96.212.116 <none> 9093:30496/TCP,8080:30519/TCP 7m53s
grafana NodePort 10.96.216.187 <none> 3000:31045/TCP 7m50s
prometheus-k8s NodePort 10.96.180.95 <none> 9090:30253/TCP,8080:30023/TCP 7m44s
```

Visit the Grafana dashboard:

http://192.168.56.100:31045

```
Username: admin
Password: admin
```

![2021-10-29_162836.png](Screenshots/2021-10-29_162836.png)

![2021-10-29_163551.png](Screenshots/2021-10-29_163551.png)

![2021-10-29_163637.png](Screenshots/2021-10-29_163637.png)

![2021-10-29_163837.png](Screenshots/2021-10-29_163837.png)

![2021-10-29_164027.png](Screenshots/2021-10-29_164027.png)


Visit the Prometheus dashboard:

http://192.168.56.100:30253


Visit the Alertmanager dashboard:

http://192.168.56.100:30496


### Tear down the prometheus-operator monitoring stack

```bash
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup
```

https://computingforgeeks.com/setup-prometheus-and-grafana-on-kubernetes
--------------------------------------------------------------------------------
/Screenshots/2021-10-28_134300.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/v5tech/vagrant-kubernetes-cluster/c4ca8c9b21bb3e379ebfc0d66dd46a7cf8ce4799/Screenshots/2021-10-28_134300.png
--------------------------------------------------------------------------------
/Screenshots/2021-10-28_134337.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/v5tech/vagrant-kubernetes-cluster/c4ca8c9b21bb3e379ebfc0d66dd46a7cf8ce4799/Screenshots/2021-10-28_134337.png
--------------------------------------------------------------------------------
/Screenshots/2021-10-28_134639.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/v5tech/vagrant-kubernetes-cluster/c4ca8c9b21bb3e379ebfc0d66dd46a7cf8ce4799/Screenshots/2021-10-28_134639.png
--------------------------------------------------------------------------------
/Screenshots/2021-10-29_162836.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/v5tech/vagrant-kubernetes-cluster/c4ca8c9b21bb3e379ebfc0d66dd46a7cf8ce4799/Screenshots/2021-10-29_162836.png
--------------------------------------------------------------------------------
/Screenshots/2021-10-29_163551.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/v5tech/vagrant-kubernetes-cluster/c4ca8c9b21bb3e379ebfc0d66dd46a7cf8ce4799/Screenshots/2021-10-29_163551.png
--------------------------------------------------------------------------------
/Screenshots/2021-10-29_163637.png:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/v5tech/vagrant-kubernetes-cluster/c4ca8c9b21bb3e379ebfc0d66dd46a7cf8ce4799/Screenshots/2021-10-29_163637.png -------------------------------------------------------------------------------- /Screenshots/2021-10-29_163837.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/v5tech/vagrant-kubernetes-cluster/c4ca8c9b21bb3e379ebfc0d66dd46a7cf8ce4799/Screenshots/2021-10-29_163837.png -------------------------------------------------------------------------------- /Screenshots/2021-10-29_164027.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/v5tech/vagrant-kubernetes-cluster/c4ca8c9b21bb3e379ebfc0d66dd46a7cf8ce4799/Screenshots/2021-10-29_164027.png -------------------------------------------------------------------------------- /Screenshots/image-20211012134939433.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/v5tech/vagrant-kubernetes-cluster/c4ca8c9b21bb3e379ebfc0d66dd46a7cf8ce4799/Screenshots/image-20211012134939433.png -------------------------------------------------------------------------------- /Screenshots/image-20211012140900479.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/v5tech/vagrant-kubernetes-cluster/c4ca8c9b21bb3e379ebfc0d66dd46a7cf8ce4799/Screenshots/image-20211012140900479.png -------------------------------------------------------------------------------- /Screenshots/image-20211012140957412.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/v5tech/vagrant-kubernetes-cluster/c4ca8c9b21bb3e379ebfc0d66dd46a7cf8ce4799/Screenshots/image-20211012140957412.png -------------------------------------------------------------------------------- /centos/Vagrantfile: -------------------------------------------------------------------------------- 1 | # -*- mode: ruby -*- 2 | # vi: set ft=ruby : 3 | 4 | ENV['VAGRANT_NO_PARALLEL'] = 'yes' 5 | 6 | Vagrant.configure(2) do |config| 7 | 8 | config.vm.provision "shell", path: "bootstrap.sh" 9 | # config.vm.synced_folder ".", "/vagrant", type: "virtualbox" 10 | 11 | # Kubernetes Master Server 12 | config.vm.define "kmaster" do |node| 13 | 14 | node.vm.box = "generic/centos7" 15 | node.vm.box_check_update = false 16 | node.vm.box_version = "3.4.2" 17 | node.vm.hostname = "kmaster" 18 | 19 | node.vm.network "private_network", ip: "192.168.56.100" 20 | 21 | node.vm.provider :virtualbox do |v| 22 | v.name = "kmaster-centos7" 23 | v.memory = 2048 24 | v.cpus = 2 25 | end 26 | 27 | node.vm.provision "shell", path: "bootstrap_kmaster.sh" 28 | 29 | end 30 | 31 | 32 | # Kubernetes Worker Nodes 33 | NodeCount = 2 34 | 35 | (1..NodeCount).each do |i| 36 | 37 | config.vm.define "kworker#{i}" do |node| 38 | 39 | node.vm.box = "generic/centos7" 40 | node.vm.box_check_update = false 41 | node.vm.box_version = "3.4.2" 42 | node.vm.hostname = "kworker#{i}" 43 | 44 | node.vm.network "private_network", ip: "192.168.56.10#{i}" 45 | 46 | node.vm.provider :virtualbox do |v| 47 | v.name = "kworker#{i}-centos7" 48 | v.memory = 2048 49 | v.cpus = 2 50 | end 51 | 52 | node.vm.provision "shell", path: "bootstrap_kworker.sh" 53 | 54 | end 55 | 56 | end 57 | 58 | end 59 | 
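# Note (illustrative, not part of the original Vagrantfile): the README's KubeSphere and
# monitoring sections assume larger VMs (at least 4 vCPUs and 8 GB of RAM per machine).
# One way to get there is to bump the provider settings above before running `vagrant up`,
# for example:
#
#   node.vm.provider :virtualbox do |v|
#     v.name   = "kmaster-centos7"
#     v.memory = 8192
#     v.cpus   = 4
#   end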
-------------------------------------------------------------------------------- /centos/bootstrap.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | ## !IMPORTANT ## 4 | # 5 | ## This script is tested only in the generic/centos7 Vagrant box 6 | ## If you use a different version of CentOS or a different CentOS Vagrant box test this again 7 | # 8 | 9 | echo "[TASK 0] Setting TimeZone" 10 | timedatectl set-timezone Asia/Shanghai 11 | 12 | echo "[TASK 1] Setting DNS" 13 | 14 | echo "[TASK 2] Setting CentOS System Mirrors" 15 | mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup 16 | wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo >/dev/null 2>&1 17 | # yum clean all >/dev/null 2>&1 18 | # yum makecache fast >/dev/null 2>&1 19 | 20 | echo "[TASK 3] Disable and turn off SWAP" 21 | sed -ri 's/.*swap.*/#&/' /etc/fstab 22 | swapoff -a 23 | 24 | echo "[TASK 4] Disable SeLinux" 25 | setenforce 0 26 | sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config 27 | 28 | echo "[TASK 5] Stop and Disable firewall" 29 | systemctl stop firewalld 30 | systemctl disable firewalld >/dev/null 2>&1 31 | 32 | echo "[TASK 6] Enable and Load Kernel modules" 33 | cat >>/etc/modules-load.d/containerd.conf <>/etc/sysctl.d/kubernetes.conf </dev/null 2>&1 47 | 48 | echo "[TASK 8] Install containerd runtime" 49 | yum install -y yum-utils device-mapper-persistent-data lvm2 >/dev/null 2>&1 50 | yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo >/dev/null 2>&1 51 | sed -i 's+download.docker.com+mirrors.aliyun.com/docker-ce+' /etc/yum.repos.d/docker-ce.repo 52 | yum makecache fast >/dev/null 2>&1 53 | yum install -y containerd.io >/dev/null 2>&1 54 | containerd config default > /etc/containerd/config.toml 55 | # 配置containerd镜像源 56 | # 替换k8s.gcr.io为registry.aliyuncs.com/k8sxio 57 | # 替换https://registry-1.docker.io为https://registry.cn-hangzhou.aliyuncs.com 58 | # 设置k8s.gcr.io的镜像地址为https://registry.aliyuncs.com/k8sxio 59 | sed -i "s#k8s.gcr.io#registry.aliyuncs.com/k8sxio#g" /etc/containerd/config.toml 60 | sed -i '/containerd.runtimes.runc.options/a\ \ \ \ \ \ \ \ \ \ \ \ SystemdCgroup = true' /etc/containerd/config.toml 61 | sed -i "s#https://registry-1.docker.io#https://8bfcfsp1.mirror.aliyuncs.com#g" /etc/containerd/config.toml 62 | sed -i '/\[plugins\.\"io\.containerd\.grpc\.v1\.cri\"\.registry\.mirrors\]/ a\\ \ \ \ \ \ \ \ [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]\n\ \ \ \ \ \ \ \ \ \ endpoint = ["https://registry.aliyuncs.com/k8sxio"]' /etc/containerd/config.toml 63 | systemctl daemon-reload 64 | systemctl enable containerd --now >/dev/null 2>&1 65 | systemctl restart containerd 66 | 67 | echo "[TASK 9] Add apt repo for kubernetes" 68 | cat < /etc/yum.repos.d/kubernetes.repo 69 | [kubernetes] 70 | name=Kubernetes 71 | baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/ 72 | enabled=1 73 | gpgcheck=1 74 | repo_gpgcheck=1 75 | gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg 76 | EOF 77 | 78 | echo "[TASK 10] Install Kubernetes components (kubeadm, kubelet and kubectl)" 79 | yum install -y --disableexcludes=kubernetes kubeadm-1.22.2-0 kubelet-1.22.2-0 kubectl-1.22.2-0 >/dev/null 2>&1 80 | crictl config runtime-endpoint /run/containerd/containerd.sock 81 | crictl config image-endpoint /run/containerd/containerd.sock 
82 | systemctl daemon-reload 83 | systemctl enable --now kubelet >/dev/null 2>&1 84 | systemctl start kubelet >/dev/null 2>&1 85 | 86 | echo "[TASK 11] Enable ssh password authentication" 87 | sed -i 's/^PasswordAuthentication .*/PasswordAuthentication yes/' /etc/ssh/sshd_config 88 | echo 'PermitRootLogin yes' >> /etc/ssh/sshd_config 89 | systemctl reload sshd 90 | 91 | echo "[TASK 12] Set root password" 92 | echo -e "kubeadmin\nkubeadmin" | passwd root >/dev/null 2>&1 93 | echo "export TERM=xterm" >> /etc/bash.bashrc 94 | 95 | echo "[TASK 13] Update /etc/hosts file" 96 | echo "192.168.56.100 apiserver.endpoint" >> /etc/hosts 97 | cat >>/etc/hosts </dev/null 2>&1 8 | 9 | # ctr images pull registry.aliyuncs.com/k8sxio/kube-apiserver:v1.22.2 >/dev/null 2>&1 10 | # ctr images pull registry.aliyuncs.com/k8sxio/kube-controller-manager:v1.22.2 >/dev/null 2>&1 11 | # ctr images pull registry.aliyuncs.com/k8sxio/kube-scheduler:v1.22.2 >/dev/null 2>&1 12 | # ctr images pull registry.aliyuncs.com/k8sxio/kube-proxy:v1.22.2 >/dev/null 2>&1 13 | # ctr images pull registry.aliyuncs.com/k8sxio/pause:3.5 >/dev/null 2>&1 14 | # ctr images pull registry.aliyuncs.com/k8sxio/etcd:3.5.0-0 >/dev/null 2>&1 15 | # ctr -n k8s.io images pull docker.io/v5cn/coredns:v1.8.4 >/dev/null 2>&1 16 | # ctr -n k8s.io images tag docker.io/v5cn/coredns:v1.8.4 registry.aliyuncs.com/k8sxio/coredns:v1.8.4 >/dev/null 2>&1 17 | 18 | # 曲线救国,拉取kubernetes所需镜像 19 | kubeadm config images list | grep -v 'coredns' | sed 's#k8s.gcr.io#ctr images pull registry.aliyuncs.com\/k8sxio#g' > images.sh 20 | # registry.aliyuncs.com/k8sxio 仓库中没有coredns镜像,再次曲线救国拉取coredns镜像 21 | # containerd环境下镜像存在namespace隔离,kubernetes的镜像在k8s.io namespace下,因此需要指定namespace 22 | # 拉取到镜像后,将镜像标记为registry.aliyuncs.com/k8sxio/coredns:v1.8.4 后面的 kubeadm init 指定了image-repository为registry.aliyuncs.com/k8sxio 23 | cat >> images.sh </dev/null 2>&1 28 | 29 | echo "[TASK 2] Initialize Kubernetes Cluster" 30 | kubeadm init \ 31 | --apiserver-advertise-address=192.168.56.100 \ 32 | --control-plane-endpoint=apiserver.endpoint \ 33 | --kubernetes-version v1.22.2 \ 34 | --image-repository registry.aliyuncs.com/k8sxio \ 35 | --service-cidr=10.96.0.0/16 \ 36 | --pod-network-cidr=${POD_CIDR} > /root/kubeinit.log 2>/dev/null 37 | 38 | echo "[TASK 3] Deploy Calico network" 39 | curl -s https://docs.projectcalico.org/v3.18/manifests/calico.yaml > /root/calico.yaml 40 | sed -i 's@# - name: CALICO_IPV4POOL_CIDR@- name: CALICO_IPV4POOL_CIDR@g; s@# value: "192.168.0.0/16"@ value: '"${POD_CIDR}"'@g' /root/calico.yaml 41 | kubectl --kubeconfig=/etc/kubernetes/admin.conf apply -f /root/calico.yaml >/dev/null 2>&1 42 | 43 | echo "[TASK 4] Generate and save cluster join command to /joincluster.sh" 44 | kubeadm token create --print-join-command > /root/joincluster.sh 2>/dev/null 45 | -------------------------------------------------------------------------------- /centos/bootstrap_kworker.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | echo "[TASK 1] Join node to Kubernetes Cluster" 4 | yum install -y sshpass >/dev/null 2>&1 5 | sshpass -p "kubeadmin" scp -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no apiserver.endpoint:/root/joincluster.sh /root/joincluster.sh 2>/dev/null 6 | bash /root/joincluster.sh >/dev/null 2>&1 7 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/alertmanager-alertmanager.yaml: 
-------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: Alertmanager 3 | metadata: 4 | labels: 5 | alertmanager: main 6 | app.kubernetes.io/component: alert-router 7 | app.kubernetes.io/name: alertmanager 8 | app.kubernetes.io/part-of: kube-prometheus 9 | app.kubernetes.io/version: 0.23.0 10 | name: main 11 | namespace: monitoring 12 | spec: 13 | image: quay.io/prometheus/alertmanager:v0.23.0 14 | nodeSelector: 15 | kubernetes.io/os: linux 16 | podMetadata: 17 | labels: 18 | app.kubernetes.io/component: alert-router 19 | app.kubernetes.io/name: alertmanager 20 | app.kubernetes.io/part-of: kube-prometheus 21 | app.kubernetes.io/version: 0.23.0 22 | replicas: 3 23 | resources: 24 | limits: 25 | cpu: 100m 26 | memory: 100Mi 27 | requests: 28 | cpu: 4m 29 | memory: 100Mi 30 | securityContext: 31 | fsGroup: 2000 32 | runAsNonRoot: true 33 | runAsUser: 1000 34 | serviceAccountName: alertmanager-main 35 | version: 0.23.0 36 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/alertmanager-podDisruptionBudget.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: policy/v1 2 | kind: PodDisruptionBudget 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: alert-router 6 | app.kubernetes.io/name: alertmanager 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.23.0 9 | name: alertmanager-main 10 | namespace: monitoring 11 | spec: 12 | maxUnavailable: 1 13 | selector: 14 | matchLabels: 15 | alertmanager: main 16 | app.kubernetes.io/component: alert-router 17 | app.kubernetes.io/name: alertmanager 18 | app.kubernetes.io/part-of: kube-prometheus 19 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/alertmanager-prometheusRule.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: PrometheusRule 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: alert-router 6 | app.kubernetes.io/name: alertmanager 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.23.0 9 | prometheus: k8s 10 | role: alert-rules 11 | name: alertmanager-main-rules 12 | namespace: monitoring 13 | spec: 14 | groups: 15 | - name: alertmanager.rules 16 | rules: 17 | - alert: AlertmanagerFailedReload 18 | annotations: 19 | description: Configuration has failed to load for {{ $labels.namespace }}/{{ 20 | $labels.pod}}. 21 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerfailedreload 22 | summary: Reloading an Alertmanager configuration has failed. 23 | expr: | 24 | # Without max_over_time, failed scrapes could create false negatives, see 25 | # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details. 26 | max_over_time(alertmanager_config_last_reload_successful{job="alertmanager-main",namespace="monitoring"}[5m]) == 0 27 | for: 10m 28 | labels: 29 | severity: critical 30 | - alert: AlertmanagerMembersInconsistent 31 | annotations: 32 | description: Alertmanager {{ $labels.namespace }}/{{ $labels.pod}} has only 33 | found {{ $value }} members of the {{$labels.job}} cluster. 34 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagermembersinconsistent 35 | summary: A member of an Alertmanager cluster has not found all other cluster 36 | members. 
37 | expr: | 38 | # Without max_over_time, failed scrapes could create false negatives, see 39 | # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details. 40 | max_over_time(alertmanager_cluster_members{job="alertmanager-main",namespace="monitoring"}[5m]) 41 | < on (namespace,service) group_left 42 | count by (namespace,service) (max_over_time(alertmanager_cluster_members{job="alertmanager-main",namespace="monitoring"}[5m])) 43 | for: 15m 44 | labels: 45 | severity: critical 46 | - alert: AlertmanagerFailedToSendAlerts 47 | annotations: 48 | description: Alertmanager {{ $labels.namespace }}/{{ $labels.pod}} failed 49 | to send {{ $value | humanizePercentage }} of notifications to {{ $labels.integration 50 | }}. 51 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerfailedtosendalerts 52 | summary: An Alertmanager instance failed to send notifications. 53 | expr: | 54 | ( 55 | rate(alertmanager_notifications_failed_total{job="alertmanager-main",namespace="monitoring"}[5m]) 56 | / 57 | rate(alertmanager_notifications_total{job="alertmanager-main",namespace="monitoring"}[5m]) 58 | ) 59 | > 0.01 60 | for: 5m 61 | labels: 62 | severity: warning 63 | - alert: AlertmanagerClusterFailedToSendAlerts 64 | annotations: 65 | description: The minimum notification failure rate to {{ $labels.integration 66 | }} sent from any instance in the {{$labels.job}} cluster is {{ $value | 67 | humanizePercentage }}. 68 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerclusterfailedtosendalerts 69 | summary: All Alertmanager instances in a cluster failed to send notifications 70 | to a critical integration. 71 | expr: | 72 | min by (namespace,service, integration) ( 73 | rate(alertmanager_notifications_failed_total{job="alertmanager-main",namespace="monitoring", integration=~`.*`}[5m]) 74 | / 75 | rate(alertmanager_notifications_total{job="alertmanager-main",namespace="monitoring", integration=~`.*`}[5m]) 76 | ) 77 | > 0.01 78 | for: 5m 79 | labels: 80 | severity: critical 81 | - alert: AlertmanagerClusterFailedToSendAlerts 82 | annotations: 83 | description: The minimum notification failure rate to {{ $labels.integration 84 | }} sent from any instance in the {{$labels.job}} cluster is {{ $value | 85 | humanizePercentage }}. 86 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerclusterfailedtosendalerts 87 | summary: All Alertmanager instances in a cluster failed to send notifications 88 | to a non-critical integration. 89 | expr: | 90 | min by (namespace,service, integration) ( 91 | rate(alertmanager_notifications_failed_total{job="alertmanager-main",namespace="monitoring", integration!~`.*`}[5m]) 92 | / 93 | rate(alertmanager_notifications_total{job="alertmanager-main",namespace="monitoring", integration!~`.*`}[5m]) 94 | ) 95 | > 0.01 96 | for: 5m 97 | labels: 98 | severity: warning 99 | - alert: AlertmanagerConfigInconsistent 100 | annotations: 101 | description: Alertmanager instances within the {{$labels.job}} cluster have 102 | different configurations. 103 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerconfiginconsistent 104 | summary: Alertmanager instances within the same cluster have different configurations. 
105 | expr: | 106 | count by (namespace,service) ( 107 | count_values by (namespace,service) ("config_hash", alertmanager_config_hash{job="alertmanager-main",namespace="monitoring"}) 108 | ) 109 | != 1 110 | for: 20m 111 | labels: 112 | severity: critical 113 | - alert: AlertmanagerClusterDown 114 | annotations: 115 | description: '{{ $value | humanizePercentage }} of Alertmanager instances 116 | within the {{$labels.job}} cluster have been up for less than half of the 117 | last 5m.' 118 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerclusterdown 119 | summary: Half or more of the Alertmanager instances within the same cluster 120 | are down. 121 | expr: | 122 | ( 123 | count by (namespace,service) ( 124 | avg_over_time(up{job="alertmanager-main",namespace="monitoring"}[5m]) < 0.5 125 | ) 126 | / 127 | count by (namespace,service) ( 128 | up{job="alertmanager-main",namespace="monitoring"} 129 | ) 130 | ) 131 | >= 0.5 132 | for: 5m 133 | labels: 134 | severity: critical 135 | - alert: AlertmanagerClusterCrashlooping 136 | annotations: 137 | description: '{{ $value | humanizePercentage }} of Alertmanager instances 138 | within the {{$labels.job}} cluster have restarted at least 5 times in the 139 | last 10m.' 140 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerclustercrashlooping 141 | summary: Half or more of the Alertmanager instances within the same cluster 142 | are crashlooping. 143 | expr: | 144 | ( 145 | count by (namespace,service) ( 146 | changes(process_start_time_seconds{job="alertmanager-main",namespace="monitoring"}[10m]) > 4 147 | ) 148 | / 149 | count by (namespace,service) ( 150 | up{job="alertmanager-main",namespace="monitoring"} 151 | ) 152 | ) 153 | >= 0.5 154 | for: 5m 155 | labels: 156 | severity: critical 157 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/alertmanager-secret.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Secret 3 | metadata: 4 | labels: 5 | alertmanager: main 6 | app.kubernetes.io/component: alert-router 7 | app.kubernetes.io/name: alertmanager 8 | app.kubernetes.io/part-of: kube-prometheus 9 | app.kubernetes.io/version: 0.23.0 10 | name: alertmanager-main 11 | namespace: monitoring 12 | stringData: 13 | alertmanager.yaml: |- 14 | "global": 15 | "resolve_timeout": "5m" 16 | "inhibit_rules": 17 | - "equal": 18 | - "namespace" 19 | - "alertname" 20 | "source_match": 21 | "severity": "critical" 22 | "target_match_re": 23 | "severity": "warning|info" 24 | - "equal": 25 | - "namespace" 26 | - "alertname" 27 | "source_match": 28 | "severity": "warning" 29 | "target_match_re": 30 | "severity": "info" 31 | "receivers": 32 | - "name": "Default" 33 | - "name": "Watchdog" 34 | - "name": "Critical" 35 | "route": 36 | "group_by": 37 | - "namespace" 38 | "group_interval": "5m" 39 | "group_wait": "30s" 40 | "receiver": "Default" 41 | "repeat_interval": "12h" 42 | "routes": 43 | - "match": 44 | "alertname": "Watchdog" 45 | "receiver": "Watchdog" 46 | - "match": 47 | "severity": "critical" 48 | "receiver": "Critical" 49 | type: Opaque 50 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/alertmanager-service.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | labels: 5 | alertmanager: main 6 | 
app.kubernetes.io/component: alert-router 7 | app.kubernetes.io/name: alertmanager 8 | app.kubernetes.io/part-of: kube-prometheus 9 | app.kubernetes.io/version: 0.23.0 10 | name: alertmanager-main 11 | namespace: monitoring 12 | spec: 13 | ports: 14 | - name: web 15 | port: 9093 16 | targetPort: web 17 | - name: reloader-web 18 | port: 8080 19 | targetPort: reloader-web 20 | selector: 21 | alertmanager: main 22 | app.kubernetes.io/component: alert-router 23 | app.kubernetes.io/name: alertmanager 24 | app.kubernetes.io/part-of: kube-prometheus 25 | sessionAffinity: ClientIP 26 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/alertmanager-serviceAccount.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ServiceAccount 3 | metadata: 4 | labels: 5 | alertmanager: main 6 | app.kubernetes.io/component: alert-router 7 | app.kubernetes.io/name: alertmanager 8 | app.kubernetes.io/part-of: kube-prometheus 9 | app.kubernetes.io/version: 0.23.0 10 | name: alertmanager-main 11 | namespace: monitoring 12 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/alertmanager-serviceMonitor.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: alert-router 6 | app.kubernetes.io/name: alertmanager 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.23.0 9 | name: alertmanager 10 | namespace: monitoring 11 | spec: 12 | endpoints: 13 | - interval: 30s 14 | port: web 15 | - interval: 30s 16 | port: reloader-web 17 | selector: 18 | matchLabels: 19 | alertmanager: main 20 | app.kubernetes.io/component: alert-router 21 | app.kubernetes.io/name: alertmanager 22 | app.kubernetes.io/part-of: kube-prometheus 23 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/blackbox-exporter-clusterRole.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: ClusterRole 3 | metadata: 4 | name: blackbox-exporter 5 | rules: 6 | - apiGroups: 7 | - authentication.k8s.io 8 | resources: 9 | - tokenreviews 10 | verbs: 11 | - create 12 | - apiGroups: 13 | - authorization.k8s.io 14 | resources: 15 | - subjectaccessreviews 16 | verbs: 17 | - create 18 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/blackbox-exporter-clusterRoleBinding.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: ClusterRoleBinding 3 | metadata: 4 | name: blackbox-exporter 5 | roleRef: 6 | apiGroup: rbac.authorization.k8s.io 7 | kind: ClusterRole 8 | name: blackbox-exporter 9 | subjects: 10 | - kind: ServiceAccount 11 | name: blackbox-exporter 12 | namespace: monitoring 13 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/blackbox-exporter-configuration.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | data: 3 | config.yml: |- 4 | "modules": 5 | "http_2xx": 6 | "http": 7 | "preferred_ip_protocol": "ip4" 8 | "prober": "http" 9 | "http_post_2xx": 10 | "http": 11 | "method": "POST" 12 | 
"preferred_ip_protocol": "ip4" 13 | "prober": "http" 14 | "irc_banner": 15 | "prober": "tcp" 16 | "tcp": 17 | "preferred_ip_protocol": "ip4" 18 | "query_response": 19 | - "send": "NICK prober" 20 | - "send": "USER prober prober prober :prober" 21 | - "expect": "PING :([^ ]+)" 22 | "send": "PONG ${1}" 23 | - "expect": "^:[^ ]+ 001" 24 | "pop3s_banner": 25 | "prober": "tcp" 26 | "tcp": 27 | "preferred_ip_protocol": "ip4" 28 | "query_response": 29 | - "expect": "^+OK" 30 | "tls": true 31 | "tls_config": 32 | "insecure_skip_verify": false 33 | "ssh_banner": 34 | "prober": "tcp" 35 | "tcp": 36 | "preferred_ip_protocol": "ip4" 37 | "query_response": 38 | - "expect": "^SSH-2.0-" 39 | "tcp_connect": 40 | "prober": "tcp" 41 | "tcp": 42 | "preferred_ip_protocol": "ip4" 43 | kind: ConfigMap 44 | metadata: 45 | labels: 46 | app.kubernetes.io/component: exporter 47 | app.kubernetes.io/name: blackbox-exporter 48 | app.kubernetes.io/part-of: kube-prometheus 49 | app.kubernetes.io/version: 0.19.0 50 | name: blackbox-exporter-configuration 51 | namespace: monitoring 52 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/blackbox-exporter-deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: blackbox-exporter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.19.0 9 | name: blackbox-exporter 10 | namespace: monitoring 11 | spec: 12 | replicas: 1 13 | selector: 14 | matchLabels: 15 | app.kubernetes.io/component: exporter 16 | app.kubernetes.io/name: blackbox-exporter 17 | app.kubernetes.io/part-of: kube-prometheus 18 | template: 19 | metadata: 20 | annotations: 21 | kubectl.kubernetes.io/default-container: blackbox-exporter 22 | labels: 23 | app.kubernetes.io/component: exporter 24 | app.kubernetes.io/name: blackbox-exporter 25 | app.kubernetes.io/part-of: kube-prometheus 26 | app.kubernetes.io/version: 0.19.0 27 | spec: 28 | containers: 29 | - args: 30 | - --config.file=/etc/blackbox_exporter/config.yml 31 | - --web.listen-address=:19115 32 | image: quay.io/prometheus/blackbox-exporter:v0.19.0 33 | name: blackbox-exporter 34 | ports: 35 | - containerPort: 19115 36 | name: http 37 | resources: 38 | limits: 39 | cpu: 20m 40 | memory: 40Mi 41 | requests: 42 | cpu: 10m 43 | memory: 20Mi 44 | securityContext: 45 | runAsNonRoot: true 46 | runAsUser: 65534 47 | volumeMounts: 48 | - mountPath: /etc/blackbox_exporter/ 49 | name: config 50 | readOnly: true 51 | - args: 52 | - --webhook-url=http://localhost:19115/-/reload 53 | - --volume-dir=/etc/blackbox_exporter/ 54 | image: jimmidyson/configmap-reload:v0.5.0 55 | name: module-configmap-reloader 56 | resources: 57 | limits: 58 | cpu: 20m 59 | memory: 40Mi 60 | requests: 61 | cpu: 10m 62 | memory: 20Mi 63 | securityContext: 64 | runAsNonRoot: true 65 | runAsUser: 65534 66 | terminationMessagePath: /dev/termination-log 67 | terminationMessagePolicy: FallbackToLogsOnError 68 | volumeMounts: 69 | - mountPath: /etc/blackbox_exporter/ 70 | name: config 71 | readOnly: true 72 | - args: 73 | - --logtostderr 74 | - --secure-listen-address=:9115 75 | - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 76 | - 
--upstream=http://127.0.0.1:19115/ 77 | image: quay.io/brancz/kube-rbac-proxy:v0.11.0 78 | name: kube-rbac-proxy 79 | ports: 80 | - containerPort: 9115 81 | name: https 82 | resources: 83 | limits: 84 | cpu: 20m 85 | memory: 40Mi 86 | requests: 87 | cpu: 10m 88 | memory: 20Mi 89 | securityContext: 90 | runAsGroup: 65532 91 | runAsNonRoot: true 92 | runAsUser: 65532 93 | nodeSelector: 94 | kubernetes.io/os: linux 95 | serviceAccountName: blackbox-exporter 96 | volumes: 97 | - configMap: 98 | name: blackbox-exporter-configuration 99 | name: config 100 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/blackbox-exporter-service.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: blackbox-exporter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.19.0 9 | name: blackbox-exporter 10 | namespace: monitoring 11 | spec: 12 | ports: 13 | - name: https 14 | port: 9115 15 | targetPort: https 16 | - name: probe 17 | port: 19115 18 | targetPort: http 19 | selector: 20 | app.kubernetes.io/component: exporter 21 | app.kubernetes.io/name: blackbox-exporter 22 | app.kubernetes.io/part-of: kube-prometheus 23 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/blackbox-exporter-serviceAccount.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ServiceAccount 3 | metadata: 4 | name: blackbox-exporter 5 | namespace: monitoring 6 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/blackbox-exporter-serviceMonitor.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: blackbox-exporter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.19.0 9 | name: blackbox-exporter 10 | namespace: monitoring 11 | spec: 12 | endpoints: 13 | - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token 14 | interval: 30s 15 | path: /metrics 16 | port: https 17 | scheme: https 18 | tlsConfig: 19 | insecureSkipVerify: true 20 | selector: 21 | matchLabels: 22 | app.kubernetes.io/component: exporter 23 | app.kubernetes.io/name: blackbox-exporter 24 | app.kubernetes.io/part-of: kube-prometheus 25 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/grafana-config.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Secret 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: grafana 6 | app.kubernetes.io/name: grafana 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 8.2.2 9 | name: grafana-config 10 | namespace: monitoring 11 | stringData: 12 | grafana.ini: | 13 | [date_formats] 14 | default_timezone = UTC 15 | type: Opaque 16 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/grafana-dashboardDatasources.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Secret 3 | metadata: 4 | labels: 5 | 
app.kubernetes.io/component: grafana 6 | app.kubernetes.io/name: grafana 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 8.2.2 9 | name: grafana-datasources 10 | namespace: monitoring 11 | stringData: 12 | datasources.yaml: |- 13 | { 14 | "apiVersion": 1, 15 | "datasources": [ 16 | { 17 | "access": "proxy", 18 | "editable": false, 19 | "name": "prometheus", 20 | "orgId": 1, 21 | "type": "prometheus", 22 | "url": "http://prometheus-k8s.monitoring.svc:9090", 23 | "version": 1 24 | } 25 | ] 26 | } 27 | type: Opaque 28 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/grafana-dashboardSources.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | data: 3 | dashboards.yaml: |- 4 | { 5 | "apiVersion": 1, 6 | "providers": [ 7 | { 8 | "folder": "Default", 9 | "folderUid": "", 10 | "name": "0", 11 | "options": { 12 | "path": "/grafana-dashboard-definitions/0" 13 | }, 14 | "orgId": 1, 15 | "type": "file" 16 | } 17 | ] 18 | } 19 | kind: ConfigMap 20 | metadata: 21 | labels: 22 | app.kubernetes.io/component: grafana 23 | app.kubernetes.io/name: grafana 24 | app.kubernetes.io/part-of: kube-prometheus 25 | app.kubernetes.io/version: 8.2.2 26 | name: grafana-dashboards 27 | namespace: monitoring 28 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/grafana-deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: grafana 6 | app.kubernetes.io/name: grafana 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 8.2.2 9 | name: grafana 10 | namespace: monitoring 11 | spec: 12 | replicas: 1 13 | selector: 14 | matchLabels: 15 | app.kubernetes.io/component: grafana 16 | app.kubernetes.io/name: grafana 17 | app.kubernetes.io/part-of: kube-prometheus 18 | template: 19 | metadata: 20 | annotations: 21 | checksum/grafana-config: 46c0415b77b987f164f26db3534f3126 22 | checksum/grafana-dashboardproviders: c6b43488328ffef02efb6cf85119aee7 23 | checksum/grafana-datasources: fe3334ba9a65c5dc3f3546019f2122ee 24 | labels: 25 | app.kubernetes.io/component: grafana 26 | app.kubernetes.io/name: grafana 27 | app.kubernetes.io/part-of: kube-prometheus 28 | app.kubernetes.io/version: 8.2.2 29 | spec: 30 | containers: 31 | - env: [] 32 | image: grafana/grafana:8.2.2 33 | name: grafana 34 | ports: 35 | - containerPort: 3000 36 | name: http 37 | readinessProbe: 38 | httpGet: 39 | path: /api/health 40 | port: http 41 | resources: 42 | limits: 43 | cpu: 200m 44 | memory: 200Mi 45 | requests: 46 | cpu: 100m 47 | memory: 100Mi 48 | volumeMounts: 49 | - mountPath: /var/lib/grafana 50 | name: grafana-storage 51 | readOnly: false 52 | - mountPath: /etc/grafana/provisioning/datasources 53 | name: grafana-datasources 54 | readOnly: false 55 | - mountPath: /etc/grafana/provisioning/dashboards 56 | name: grafana-dashboards 57 | readOnly: false 58 | - mountPath: /grafana-dashboard-definitions/0/alertmanager-overview 59 | name: grafana-dashboard-alertmanager-overview 60 | readOnly: false 61 | - mountPath: /grafana-dashboard-definitions/0/apiserver 62 | name: grafana-dashboard-apiserver 63 | readOnly: false 64 | - mountPath: /grafana-dashboard-definitions/0/cluster-total 65 | name: grafana-dashboard-cluster-total 66 | readOnly: false 67 | - mountPath: 
/grafana-dashboard-definitions/0/controller-manager 68 | name: grafana-dashboard-controller-manager 69 | readOnly: false 70 | - mountPath: /grafana-dashboard-definitions/0/k8s-resources-cluster 71 | name: grafana-dashboard-k8s-resources-cluster 72 | readOnly: false 73 | - mountPath: /grafana-dashboard-definitions/0/k8s-resources-namespace 74 | name: grafana-dashboard-k8s-resources-namespace 75 | readOnly: false 76 | - mountPath: /grafana-dashboard-definitions/0/k8s-resources-node 77 | name: grafana-dashboard-k8s-resources-node 78 | readOnly: false 79 | - mountPath: /grafana-dashboard-definitions/0/k8s-resources-pod 80 | name: grafana-dashboard-k8s-resources-pod 81 | readOnly: false 82 | - mountPath: /grafana-dashboard-definitions/0/k8s-resources-workload 83 | name: grafana-dashboard-k8s-resources-workload 84 | readOnly: false 85 | - mountPath: /grafana-dashboard-definitions/0/k8s-resources-workloads-namespace 86 | name: grafana-dashboard-k8s-resources-workloads-namespace 87 | readOnly: false 88 | - mountPath: /grafana-dashboard-definitions/0/kubelet 89 | name: grafana-dashboard-kubelet 90 | readOnly: false 91 | - mountPath: /grafana-dashboard-definitions/0/namespace-by-pod 92 | name: grafana-dashboard-namespace-by-pod 93 | readOnly: false 94 | - mountPath: /grafana-dashboard-definitions/0/namespace-by-workload 95 | name: grafana-dashboard-namespace-by-workload 96 | readOnly: false 97 | - mountPath: /grafana-dashboard-definitions/0/node-cluster-rsrc-use 98 | name: grafana-dashboard-node-cluster-rsrc-use 99 | readOnly: false 100 | - mountPath: /grafana-dashboard-definitions/0/node-rsrc-use 101 | name: grafana-dashboard-node-rsrc-use 102 | readOnly: false 103 | - mountPath: /grafana-dashboard-definitions/0/nodes 104 | name: grafana-dashboard-nodes 105 | readOnly: false 106 | - mountPath: /grafana-dashboard-definitions/0/persistentvolumesusage 107 | name: grafana-dashboard-persistentvolumesusage 108 | readOnly: false 109 | - mountPath: /grafana-dashboard-definitions/0/pod-total 110 | name: grafana-dashboard-pod-total 111 | readOnly: false 112 | - mountPath: /grafana-dashboard-definitions/0/prometheus-remote-write 113 | name: grafana-dashboard-prometheus-remote-write 114 | readOnly: false 115 | - mountPath: /grafana-dashboard-definitions/0/prometheus 116 | name: grafana-dashboard-prometheus 117 | readOnly: false 118 | - mountPath: /grafana-dashboard-definitions/0/proxy 119 | name: grafana-dashboard-proxy 120 | readOnly: false 121 | - mountPath: /grafana-dashboard-definitions/0/scheduler 122 | name: grafana-dashboard-scheduler 123 | readOnly: false 124 | - mountPath: /grafana-dashboard-definitions/0/workload-total 125 | name: grafana-dashboard-workload-total 126 | readOnly: false 127 | - mountPath: /etc/grafana 128 | name: grafana-config 129 | readOnly: false 130 | nodeSelector: 131 | kubernetes.io/os: linux 132 | securityContext: 133 | fsGroup: 65534 134 | runAsNonRoot: true 135 | runAsUser: 65534 136 | serviceAccountName: grafana 137 | volumes: 138 | - emptyDir: {} 139 | name: grafana-storage 140 | - name: grafana-datasources 141 | secret: 142 | secretName: grafana-datasources 143 | - configMap: 144 | name: grafana-dashboards 145 | name: grafana-dashboards 146 | - configMap: 147 | name: grafana-dashboard-alertmanager-overview 148 | name: grafana-dashboard-alertmanager-overview 149 | - configMap: 150 | name: grafana-dashboard-apiserver 151 | name: grafana-dashboard-apiserver 152 | - configMap: 153 | name: grafana-dashboard-cluster-total 154 | name: grafana-dashboard-cluster-total 155 | - 
configMap: 156 | name: grafana-dashboard-controller-manager 157 | name: grafana-dashboard-controller-manager 158 | - configMap: 159 | name: grafana-dashboard-k8s-resources-cluster 160 | name: grafana-dashboard-k8s-resources-cluster 161 | - configMap: 162 | name: grafana-dashboard-k8s-resources-namespace 163 | name: grafana-dashboard-k8s-resources-namespace 164 | - configMap: 165 | name: grafana-dashboard-k8s-resources-node 166 | name: grafana-dashboard-k8s-resources-node 167 | - configMap: 168 | name: grafana-dashboard-k8s-resources-pod 169 | name: grafana-dashboard-k8s-resources-pod 170 | - configMap: 171 | name: grafana-dashboard-k8s-resources-workload 172 | name: grafana-dashboard-k8s-resources-workload 173 | - configMap: 174 | name: grafana-dashboard-k8s-resources-workloads-namespace 175 | name: grafana-dashboard-k8s-resources-workloads-namespace 176 | - configMap: 177 | name: grafana-dashboard-kubelet 178 | name: grafana-dashboard-kubelet 179 | - configMap: 180 | name: grafana-dashboard-namespace-by-pod 181 | name: grafana-dashboard-namespace-by-pod 182 | - configMap: 183 | name: grafana-dashboard-namespace-by-workload 184 | name: grafana-dashboard-namespace-by-workload 185 | - configMap: 186 | name: grafana-dashboard-node-cluster-rsrc-use 187 | name: grafana-dashboard-node-cluster-rsrc-use 188 | - configMap: 189 | name: grafana-dashboard-node-rsrc-use 190 | name: grafana-dashboard-node-rsrc-use 191 | - configMap: 192 | name: grafana-dashboard-nodes 193 | name: grafana-dashboard-nodes 194 | - configMap: 195 | name: grafana-dashboard-persistentvolumesusage 196 | name: grafana-dashboard-persistentvolumesusage 197 | - configMap: 198 | name: grafana-dashboard-pod-total 199 | name: grafana-dashboard-pod-total 200 | - configMap: 201 | name: grafana-dashboard-prometheus-remote-write 202 | name: grafana-dashboard-prometheus-remote-write 203 | - configMap: 204 | name: grafana-dashboard-prometheus 205 | name: grafana-dashboard-prometheus 206 | - configMap: 207 | name: grafana-dashboard-proxy 208 | name: grafana-dashboard-proxy 209 | - configMap: 210 | name: grafana-dashboard-scheduler 211 | name: grafana-dashboard-scheduler 212 | - configMap: 213 | name: grafana-dashboard-workload-total 214 | name: grafana-dashboard-workload-total 215 | - name: grafana-config 216 | secret: 217 | secretName: grafana-config 218 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/grafana-service.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: grafana 6 | app.kubernetes.io/name: grafana 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 8.2.2 9 | name: grafana 10 | namespace: monitoring 11 | spec: 12 | ports: 13 | - name: http 14 | port: 3000 15 | targetPort: http 16 | selector: 17 | app.kubernetes.io/component: grafana 18 | app.kubernetes.io/name: grafana 19 | app.kubernetes.io/part-of: kube-prometheus 20 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/grafana-serviceAccount.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ServiceAccount 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: grafana 6 | app.kubernetes.io/name: grafana 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 8.2.2 9 | name: grafana 10 | namespace: monitoring 11 | 
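The Grafana Service above is only reachable inside the cluster (port 3000 in the `monitoring` namespace). A minimal way to open the dashboards from the host is a `kubectl` port-forward — a sketch only, assuming `kubectl` is already pointed at this cluster; the namespace, service name and port are taken from the manifests above:

```bash
# Forward local port 3000 to the grafana Service in the monitoring namespace,
# then browse to http://localhost:3000 (Grafana's default login is admin/admin
# unless it has been changed).
kubectl --namespace monitoring port-forward svc/grafana 3000:3000
```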
-------------------------------------------------------------------------------- /kube-prometheus/manifests/grafana-serviceMonitor.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: grafana 6 | app.kubernetes.io/name: grafana 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 8.2.2 9 | name: grafana 10 | namespace: monitoring 11 | spec: 12 | endpoints: 13 | - interval: 15s 14 | port: http 15 | selector: 16 | matchLabels: 17 | app.kubernetes.io/name: grafana 18 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/kube-prometheus-prometheusRule.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: PrometheusRule 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: kube-prometheus 7 | app.kubernetes.io/part-of: kube-prometheus 8 | prometheus: k8s 9 | role: alert-rules 10 | name: kube-prometheus-rules 11 | namespace: monitoring 12 | spec: 13 | groups: 14 | - name: general.rules 15 | rules: 16 | - alert: TargetDown 17 | annotations: 18 | description: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.service 19 | }} targets in {{ $labels.namespace }} namespace are down.' 20 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/general/targetdown 21 | summary: One or more targets are unreachable. 22 | expr: 100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job, 23 | namespace, service)) > 10 24 | for: 10m 25 | labels: 26 | severity: warning 27 | - alert: Watchdog 28 | annotations: 29 | description: | 30 | This is an alert meant to ensure that the entire alerting pipeline is functional. 31 | This alert is always firing, therefore it should always be firing in Alertmanager 32 | and always fire against a receiver. There are integrations with various notification 33 | mechanisms that send a notification when this alert is not firing. For example the 34 | "DeadMansSnitch" integration in PagerDuty. 35 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/general/watchdog 36 | summary: An alert that should always be firing to certify that Alertmanager 37 | is working properly. 
38 | expr: vector(1) 39 | labels: 40 | severity: none 41 | - name: node-network 42 | rules: 43 | - alert: NodeNetworkInterfaceFlapping 44 | annotations: 45 | description: Network interface "{{ $labels.device }}" changing its up status 46 | often on node-exporter {{ $labels.namespace }}/{{ $labels.pod }} 47 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/general/nodenetworkinterfaceflapping 48 | summary: Network interface is often changing its status 49 | expr: | 50 | changes(node_network_up{job="node-exporter",device!~"veth.+"}[2m]) > 2 51 | for: 2m 52 | labels: 53 | severity: warning 54 | - name: kube-prometheus-node-recording.rules 55 | rules: 56 | - expr: sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[3m])) 57 | BY (instance) 58 | record: instance:node_cpu:rate:sum 59 | - expr: sum(rate(node_network_receive_bytes_total[3m])) BY (instance) 60 | record: instance:node_network_receive_bytes:rate:sum 61 | - expr: sum(rate(node_network_transmit_bytes_total[3m])) BY (instance) 62 | record: instance:node_network_transmit_bytes:rate:sum 63 | - expr: sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[5m])) 64 | WITHOUT (cpu, mode) / ON(instance) GROUP_LEFT() count(sum(node_cpu_seconds_total) 65 | BY (instance, cpu)) BY (instance) 66 | record: instance:node_cpu:ratio 67 | - expr: sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[5m])) 68 | record: cluster:node_cpu:sum_rate5m 69 | - expr: cluster:node_cpu_seconds_total:rate5m / count(sum(node_cpu_seconds_total) 70 | BY (instance, cpu)) 71 | record: cluster:node_cpu:ratio 72 | - name: kube-prometheus-general.rules 73 | rules: 74 | - expr: count without(instance, pod, node) (up == 1) 75 | record: count:up1 76 | - expr: count without(instance, pod, node) (up == 0) 77 | record: count:up0 78 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/kube-state-metrics-clusterRole.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: ClusterRole 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: kube-state-metrics 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.2.3 9 | name: kube-state-metrics 10 | rules: 11 | - apiGroups: 12 | - "" 13 | resources: 14 | - configmaps 15 | - secrets 16 | - nodes 17 | - pods 18 | - services 19 | - resourcequotas 20 | - replicationcontrollers 21 | - limitranges 22 | - persistentvolumeclaims 23 | - persistentvolumes 24 | - namespaces 25 | - endpoints 26 | verbs: 27 | - list 28 | - watch 29 | - apiGroups: 30 | - apps 31 | resources: 32 | - statefulsets 33 | - daemonsets 34 | - deployments 35 | - replicasets 36 | verbs: 37 | - list 38 | - watch 39 | - apiGroups: 40 | - batch 41 | resources: 42 | - cronjobs 43 | - jobs 44 | verbs: 45 | - list 46 | - watch 47 | - apiGroups: 48 | - autoscaling 49 | resources: 50 | - horizontalpodautoscalers 51 | verbs: 52 | - list 53 | - watch 54 | - apiGroups: 55 | - authentication.k8s.io 56 | resources: 57 | - tokenreviews 58 | verbs: 59 | - create 60 | - apiGroups: 61 | - authorization.k8s.io 62 | resources: 63 | - subjectaccessreviews 64 | verbs: 65 | - create 66 | - apiGroups: 67 | - policy 68 | resources: 69 | - poddisruptionbudgets 70 | verbs: 71 | - list 72 | - watch 73 | - apiGroups: 74 | - certificates.k8s.io 75 | resources: 76 | - certificatesigningrequests 77 | verbs: 78 | - list 
79 | - watch 80 | - apiGroups: 81 | - storage.k8s.io 82 | resources: 83 | - storageclasses 84 | - volumeattachments 85 | verbs: 86 | - list 87 | - watch 88 | - apiGroups: 89 | - admissionregistration.k8s.io 90 | resources: 91 | - mutatingwebhookconfigurations 92 | - validatingwebhookconfigurations 93 | verbs: 94 | - list 95 | - watch 96 | - apiGroups: 97 | - networking.k8s.io 98 | resources: 99 | - networkpolicies 100 | - ingresses 101 | verbs: 102 | - list 103 | - watch 104 | - apiGroups: 105 | - coordination.k8s.io 106 | resources: 107 | - leases 108 | verbs: 109 | - list 110 | - watch 111 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/kube-state-metrics-clusterRoleBinding.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: ClusterRoleBinding 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: kube-state-metrics 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.2.3 9 | name: kube-state-metrics 10 | roleRef: 11 | apiGroup: rbac.authorization.k8s.io 12 | kind: ClusterRole 13 | name: kube-state-metrics 14 | subjects: 15 | - kind: ServiceAccount 16 | name: kube-state-metrics 17 | namespace: monitoring 18 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/kube-state-metrics-deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: kube-state-metrics 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.2.3 9 | name: kube-state-metrics 10 | namespace: monitoring 11 | spec: 12 | replicas: 1 13 | selector: 14 | matchLabels: 15 | app.kubernetes.io/component: exporter 16 | app.kubernetes.io/name: kube-state-metrics 17 | app.kubernetes.io/part-of: kube-prometheus 18 | template: 19 | metadata: 20 | annotations: 21 | kubectl.kubernetes.io/default-container: kube-state-metrics 22 | labels: 23 | app.kubernetes.io/component: exporter 24 | app.kubernetes.io/name: kube-state-metrics 25 | app.kubernetes.io/part-of: kube-prometheus 26 | app.kubernetes.io/version: 2.2.3 27 | spec: 28 | containers: 29 | - args: 30 | - --host=127.0.0.1 31 | - --port=8081 32 | - --telemetry-host=127.0.0.1 33 | - --telemetry-port=8082 34 | image: docker.io/v5cn/kube-state-metrics:v2.2.3 35 | name: kube-state-metrics 36 | resources: 37 | limits: 38 | cpu: 100m 39 | memory: 250Mi 40 | requests: 41 | cpu: 10m 42 | memory: 190Mi 43 | securityContext: 44 | runAsUser: 65534 45 | - args: 46 | - --logtostderr 47 | - --secure-listen-address=:8443 48 | - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 49 | - --upstream=http://127.0.0.1:8081/ 50 | image: quay.io/brancz/kube-rbac-proxy:v0.11.0 51 | name: kube-rbac-proxy-main 52 | ports: 53 | - containerPort: 8443 54 | name: https-main 55 | resources: 56 | limits: 57 | cpu: 40m 58 | memory: 40Mi 59 | requests: 60 | cpu: 20m 61 | memory: 20Mi 62 | securityContext: 63 | runAsGroup: 65532 64 | runAsNonRoot: true 65 | runAsUser: 65532 66 | - args: 67 | - --logtostderr 68 | - --secure-listen-address=:9443 69 | - 
--tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 70 | - --upstream=http://127.0.0.1:8082/ 71 | image: quay.io/brancz/kube-rbac-proxy:v0.11.0 72 | name: kube-rbac-proxy-self 73 | ports: 74 | - containerPort: 9443 75 | name: https-self 76 | resources: 77 | limits: 78 | cpu: 20m 79 | memory: 40Mi 80 | requests: 81 | cpu: 10m 82 | memory: 20Mi 83 | securityContext: 84 | runAsGroup: 65532 85 | runAsNonRoot: true 86 | runAsUser: 65532 87 | nodeSelector: 88 | kubernetes.io/os: linux 89 | serviceAccountName: kube-state-metrics 90 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/kube-state-metrics-prometheusRule.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: PrometheusRule 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: kube-state-metrics 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.2.3 9 | prometheus: k8s 10 | role: alert-rules 11 | name: kube-state-metrics-rules 12 | namespace: monitoring 13 | spec: 14 | groups: 15 | - name: kube-state-metrics 16 | rules: 17 | - alert: KubeStateMetricsListErrors 18 | annotations: 19 | description: kube-state-metrics is experiencing errors at an elevated rate 20 | in list operations. This is likely causing it to not be able to expose metrics 21 | about Kubernetes objects correctly or at all. 22 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kube-state-metrics/kubestatemetricslisterrors 23 | summary: kube-state-metrics is experiencing errors in list operations. 24 | expr: | 25 | (sum(rate(kube_state_metrics_list_total{job="kube-state-metrics",result="error"}[5m])) 26 | / 27 | sum(rate(kube_state_metrics_list_total{job="kube-state-metrics"}[5m]))) 28 | > 0.01 29 | for: 15m 30 | labels: 31 | severity: critical 32 | - alert: KubeStateMetricsWatchErrors 33 | annotations: 34 | description: kube-state-metrics is experiencing errors at an elevated rate 35 | in watch operations. This is likely causing it to not be able to expose 36 | metrics about Kubernetes objects correctly or at all. 37 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kube-state-metrics/kubestatemetricswatcherrors 38 | summary: kube-state-metrics is experiencing errors in watch operations. 39 | expr: | 40 | (sum(rate(kube_state_metrics_watch_total{job="kube-state-metrics",result="error"}[5m])) 41 | / 42 | sum(rate(kube_state_metrics_watch_total{job="kube-state-metrics"}[5m]))) 43 | > 0.01 44 | for: 15m 45 | labels: 46 | severity: critical 47 | - alert: KubeStateMetricsShardingMismatch 48 | annotations: 49 | description: kube-state-metrics pods are running with different --total-shards 50 | configuration, some Kubernetes objects may be exposed multiple times or 51 | not exposed at all. 52 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kube-state-metrics/kubestatemetricsshardingmismatch 53 | summary: kube-state-metrics sharding is misconfigured. 
54 | expr: | 55 | stdvar (kube_state_metrics_total_shards{job="kube-state-metrics"}) != 0 56 | for: 15m 57 | labels: 58 | severity: critical 59 | - alert: KubeStateMetricsShardsMissing 60 | annotations: 61 | description: kube-state-metrics shards are missing, some Kubernetes objects 62 | are not being exposed. 63 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kube-state-metrics/kubestatemetricsshardsmissing 64 | summary: kube-state-metrics shards are missing. 65 | expr: | 66 | 2^max(kube_state_metrics_total_shards{job="kube-state-metrics"}) - 1 67 | - 68 | sum( 2 ^ max by (shard_ordinal) (kube_state_metrics_shard_ordinal{job="kube-state-metrics"}) ) 69 | != 0 70 | for: 15m 71 | labels: 72 | severity: critical 73 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/kube-state-metrics-service.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: kube-state-metrics 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.2.3 9 | name: kube-state-metrics 10 | namespace: monitoring 11 | spec: 12 | clusterIP: None 13 | ports: 14 | - name: https-main 15 | port: 8443 16 | targetPort: https-main 17 | - name: https-self 18 | port: 9443 19 | targetPort: https-self 20 | selector: 21 | app.kubernetes.io/component: exporter 22 | app.kubernetes.io/name: kube-state-metrics 23 | app.kubernetes.io/part-of: kube-prometheus 24 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/kube-state-metrics-serviceAccount.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ServiceAccount 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: kube-state-metrics 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.2.3 9 | name: kube-state-metrics 10 | namespace: monitoring 11 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/kube-state-metrics-serviceMonitor.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: kube-state-metrics 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.2.3 9 | name: kube-state-metrics 10 | namespace: monitoring 11 | spec: 12 | endpoints: 13 | - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token 14 | honorLabels: true 15 | interval: 30s 16 | port: https-main 17 | relabelings: 18 | - action: labeldrop 19 | regex: (pod|service|endpoint|namespace) 20 | scheme: https 21 | scrapeTimeout: 30s 22 | tlsConfig: 23 | insecureSkipVerify: true 24 | - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token 25 | interval: 30s 26 | port: https-self 27 | scheme: https 28 | tlsConfig: 29 | insecureSkipVerify: true 30 | jobLabel: app.kubernetes.io/name 31 | selector: 32 | matchLabels: 33 | app.kubernetes.io/component: exporter 34 | app.kubernetes.io/name: kube-state-metrics 35 | app.kubernetes.io/part-of: kube-prometheus 36 | -------------------------------------------------------------------------------- 
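Once the kube-state-metrics manifests above are applied, one quick sanity check is to ask Prometheus whether the target is actually being scraped. This is only a sketch that assumes the default kube-prometheus service names: `prometheus-k8s` on port 9090 matches the datasource URL configured for Grafana earlier, and the ServiceMonitor above sets `jobLabel: app.kubernetes.io/name`, so the scrape job appears as `kube-state-metrics`:

```bash
# In one terminal: expose the Prometheus UI/API locally.
kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090:9090

# In another terminal: query the `up` series for the kube-state-metrics job;
# a sample value of 1 means the target is up and being scraped.
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=up{job="kube-state-metrics"}'
```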
/kube-prometheus/manifests/kubernetes-serviceMonitorApiserver.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | app.kubernetes.io/name: apiserver 6 | name: kube-apiserver 7 | namespace: monitoring 8 | spec: 9 | endpoints: 10 | - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token 11 | interval: 30s 12 | metricRelabelings: 13 | - action: drop 14 | regex: kubelet_(pod_worker_latency_microseconds|pod_start_latency_microseconds|cgroup_manager_latency_microseconds|pod_worker_start_latency_microseconds|pleg_relist_latency_microseconds|pleg_relist_interval_microseconds|runtime_operations|runtime_operations_latency_microseconds|runtime_operations_errors|eviction_stats_age_microseconds|device_plugin_registration_count|device_plugin_alloc_latency_microseconds|network_plugin_operations_latency_microseconds) 15 | sourceLabels: 16 | - __name__ 17 | - action: drop 18 | regex: scheduler_(e2e_scheduling_latency_microseconds|scheduling_algorithm_predicate_evaluation|scheduling_algorithm_priority_evaluation|scheduling_algorithm_preemption_evaluation|scheduling_algorithm_latency_microseconds|binding_latency_microseconds|scheduling_latency_seconds) 19 | sourceLabels: 20 | - __name__ 21 | - action: drop 22 | regex: apiserver_(request_count|request_latencies|request_latencies_summary|dropped_requests|storage_data_key_generation_latencies_microseconds|storage_transformation_failures_total|storage_transformation_latencies_microseconds|proxy_tunnel_sync_latency_secs) 23 | sourceLabels: 24 | - __name__ 25 | - action: drop 26 | regex: kubelet_docker_(operations|operations_latency_microseconds|operations_errors|operations_timeout) 27 | sourceLabels: 28 | - __name__ 29 | - action: drop 30 | regex: reflector_(items_per_list|items_per_watch|list_duration_seconds|lists_total|short_watches_total|watch_duration_seconds|watches_total) 31 | sourceLabels: 32 | - __name__ 33 | - action: drop 34 | regex: etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|object_counts|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary) 35 | sourceLabels: 36 | - __name__ 37 | - action: drop 38 | regex: transformation_(transformation_latencies_microseconds|failures_total) 39 | sourceLabels: 40 | - __name__ 41 | - action: drop 42 | regex: 
(admission_quota_controller_adds|admission_quota_controller_depth|admission_quota_controller_longest_running_processor_microseconds|admission_quota_controller_queue_latency|admission_quota_controller_unfinished_work_seconds|admission_quota_controller_work_duration|APIServiceOpenAPIAggregationControllerQueue1_adds|APIServiceOpenAPIAggregationControllerQueue1_depth|APIServiceOpenAPIAggregationControllerQueue1_longest_running_processor_microseconds|APIServiceOpenAPIAggregationControllerQueue1_queue_latency|APIServiceOpenAPIAggregationControllerQueue1_retries|APIServiceOpenAPIAggregationControllerQueue1_unfinished_work_seconds|APIServiceOpenAPIAggregationControllerQueue1_work_duration|APIServiceRegistrationController_adds|APIServiceRegistrationController_depth|APIServiceRegistrationController_longest_running_processor_microseconds|APIServiceRegistrationController_queue_latency|APIServiceRegistrationController_retries|APIServiceRegistrationController_unfinished_work_seconds|APIServiceRegistrationController_work_duration|autoregister_adds|autoregister_depth|autoregister_longest_running_processor_microseconds|autoregister_queue_latency|autoregister_retries|autoregister_unfinished_work_seconds|autoregister_work_duration|AvailableConditionController_adds|AvailableConditionController_depth|AvailableConditionController_longest_running_processor_microseconds|AvailableConditionController_queue_latency|AvailableConditionController_retries|AvailableConditionController_unfinished_work_seconds|AvailableConditionController_work_duration|crd_autoregistration_controller_adds|crd_autoregistration_controller_depth|crd_autoregistration_controller_longest_running_processor_microseconds|crd_autoregistration_controller_queue_latency|crd_autoregistration_controller_retries|crd_autoregistration_controller_unfinished_work_seconds|crd_autoregistration_controller_work_duration|crdEstablishing_adds|crdEstablishing_depth|crdEstablishing_longest_running_processor_microseconds|crdEstablishing_queue_latency|crdEstablishing_retries|crdEstablishing_unfinished_work_seconds|crdEstablishing_work_duration|crd_finalizer_adds|crd_finalizer_depth|crd_finalizer_longest_running_processor_microseconds|crd_finalizer_queue_latency|crd_finalizer_retries|crd_finalizer_unfinished_work_seconds|crd_finalizer_work_duration|crd_naming_condition_controller_adds|crd_naming_condition_controller_depth|crd_naming_condition_controller_longest_running_processor_microseconds|crd_naming_condition_controller_queue_latency|crd_naming_condition_controller_retries|crd_naming_condition_controller_unfinished_work_seconds|crd_naming_condition_controller_work_duration|crd_openapi_controller_adds|crd_openapi_controller_depth|crd_openapi_controller_longest_running_processor_microseconds|crd_openapi_controller_queue_latency|crd_openapi_controller_retries|crd_openapi_controller_unfinished_work_seconds|crd_openapi_controller_work_duration|DiscoveryController_adds|DiscoveryController_depth|DiscoveryController_longest_running_processor_microseconds|DiscoveryController_queue_latency|DiscoveryController_retries|DiscoveryController_unfinished_work_seconds|DiscoveryController_work_duration|kubeproxy_sync_proxy_rules_latency_microseconds|non_structural_schema_condition_controller_adds|non_structural_schema_condition_controller_depth|non_structural_schema_condition_controller_longest_running_processor_microseconds|non_structural_schema_condition_controller_queue_latency|non_structural_schema_condition_controller_retries|non_structural_schema_condition_controller_unfinished_wo
rk_seconds|non_structural_schema_condition_controller_work_duration|rest_client_request_latency_seconds|storage_operation_errors_total|storage_operation_status_count) 43 | sourceLabels: 44 | - __name__ 45 | - action: drop 46 | regex: etcd_(debugging|disk|server).* 47 | sourceLabels: 48 | - __name__ 49 | - action: drop 50 | regex: apiserver_admission_controller_admission_latencies_seconds_.* 51 | sourceLabels: 52 | - __name__ 53 | - action: drop 54 | regex: apiserver_admission_step_admission_latencies_seconds_.* 55 | sourceLabels: 56 | - __name__ 57 | - action: drop 58 | regex: apiserver_request_duration_seconds_bucket;(0.15|0.25|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75|2.5|3|3.5|4.5|6|7|8|9|15|25|30|50) 59 | sourceLabels: 60 | - __name__ 61 | - le 62 | port: https 63 | scheme: https 64 | tlsConfig: 65 | caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt 66 | serverName: kubernetes 67 | jobLabel: component 68 | namespaceSelector: 69 | matchNames: 70 | - default 71 | selector: 72 | matchLabels: 73 | component: apiserver 74 | provider: kubernetes 75 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/kubernetes-serviceMonitorCoreDNS.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | app.kubernetes.io/name: coredns 6 | name: coredns 7 | namespace: monitoring 8 | spec: 9 | endpoints: 10 | - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token 11 | interval: 15s 12 | port: metrics 13 | jobLabel: app.kubernetes.io/name 14 | namespaceSelector: 15 | matchNames: 16 | - kube-system 17 | selector: 18 | matchLabels: 19 | k8s-app: kube-dns 20 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/kubernetes-serviceMonitorKubeControllerManager.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | app.kubernetes.io/name: kube-controller-manager 6 | name: kube-controller-manager 7 | namespace: monitoring 8 | spec: 9 | endpoints: 10 | - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token 11 | interval: 30s 12 | metricRelabelings: 13 | - action: drop 14 | regex: kubelet_(pod_worker_latency_microseconds|pod_start_latency_microseconds|cgroup_manager_latency_microseconds|pod_worker_start_latency_microseconds|pleg_relist_latency_microseconds|pleg_relist_interval_microseconds|runtime_operations|runtime_operations_latency_microseconds|runtime_operations_errors|eviction_stats_age_microseconds|device_plugin_registration_count|device_plugin_alloc_latency_microseconds|network_plugin_operations_latency_microseconds) 15 | sourceLabels: 16 | - __name__ 17 | - action: drop 18 | regex: scheduler_(e2e_scheduling_latency_microseconds|scheduling_algorithm_predicate_evaluation|scheduling_algorithm_priority_evaluation|scheduling_algorithm_preemption_evaluation|scheduling_algorithm_latency_microseconds|binding_latency_microseconds|scheduling_latency_seconds) 19 | sourceLabels: 20 | - __name__ 21 | - action: drop 22 | regex: apiserver_(request_count|request_latencies|request_latencies_summary|dropped_requests|storage_data_key_generation_latencies_microseconds|storage_transformation_failures_total|storage_transformation_latencies_microseconds|proxy_tunnel_sync_latency_secs) 23 | sourceLabels: 24 | - __name__ 25 | - 
action: drop 26 | regex: kubelet_docker_(operations|operations_latency_microseconds|operations_errors|operations_timeout) 27 | sourceLabels: 28 | - __name__ 29 | - action: drop 30 | regex: reflector_(items_per_list|items_per_watch|list_duration_seconds|lists_total|short_watches_total|watch_duration_seconds|watches_total) 31 | sourceLabels: 32 | - __name__ 33 | - action: drop 34 | regex: etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|object_counts|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary) 35 | sourceLabels: 36 | - __name__ 37 | - action: drop 38 | regex: transformation_(transformation_latencies_microseconds|failures_total) 39 | sourceLabels: 40 | - __name__ 41 | - action: drop 42 | regex: (admission_quota_controller_adds|admission_quota_controller_depth|admission_quota_controller_longest_running_processor_microseconds|admission_quota_controller_queue_latency|admission_quota_controller_unfinished_work_seconds|admission_quota_controller_work_duration|APIServiceOpenAPIAggregationControllerQueue1_adds|APIServiceOpenAPIAggregationControllerQueue1_depth|APIServiceOpenAPIAggregationControllerQueue1_longest_running_processor_microseconds|APIServiceOpenAPIAggregationControllerQueue1_queue_latency|APIServiceOpenAPIAggregationControllerQueue1_retries|APIServiceOpenAPIAggregationControllerQueue1_unfinished_work_seconds|APIServiceOpenAPIAggregationControllerQueue1_work_duration|APIServiceRegistrationController_adds|APIServiceRegistrationController_depth|APIServiceRegistrationController_longest_running_processor_microseconds|APIServiceRegistrationController_queue_latency|APIServiceRegistrationController_retries|APIServiceRegistrationController_unfinished_work_seconds|APIServiceRegistrationController_work_duration|autoregister_adds|autoregister_depth|autoregister_longest_running_processor_microseconds|autoregister_queue_latency|autoregister_retries|autoregister_unfinished_work_seconds|autoregister_work_duration|AvailableConditionController_adds|AvailableConditionController_depth|AvailableConditionController_longest_running_processor_microseconds|AvailableConditionController_queue_latency|AvailableConditionController_retries|AvailableConditionController_unfinished_work_seconds|AvailableConditionController_work_duration|crd_autoregistration_controller_adds|crd_autoregistration_controller_depth|crd_autoregistration_controller_longest_running_processor_microseconds|crd_autoregistration_controller_queue_latency|crd_autoregistration_controller_retries|crd_autoregistration_controller_unfinished_work_seconds|crd_autoregistration_controller_work_duration|crdEstablishing_adds|crdEstablishing_depth|crdEstablishing_longest_running_processor_microseconds|crdEstablishing_queue_latency|crdEstablishing_retries|crdEstablishing_unfinished_work_seconds|crdEstablishing_work_duration|crd_finalizer_adds|crd_finalizer_depth|crd_finalizer_longest_running_processor_microseconds|crd_finalizer_queue_latency|crd_finalizer_retries|crd_finalizer_unfinished_work_seconds|crd_finalizer_work_duration|crd_naming_condition_controller_adds|crd_naming_condition_controller_depth|crd_naming_condition_controller_longest_running_processor_microseconds|crd_naming_condition_controller_queue_latency|crd_naming_condition_controller_retries|crd_naming_condition_controller_unfinished_work_seconds|crd_naming_condition_controller_work_duration|crd_openapi_controller_adds|crd_openapi_controller_depth|crd_openapi_controller_longest_running_processor_microseconds|crd_open
api_controller_queue_latency|crd_openapi_controller_retries|crd_openapi_controller_unfinished_work_seconds|crd_openapi_controller_work_duration|DiscoveryController_adds|DiscoveryController_depth|DiscoveryController_longest_running_processor_microseconds|DiscoveryController_queue_latency|DiscoveryController_retries|DiscoveryController_unfinished_work_seconds|DiscoveryController_work_duration|kubeproxy_sync_proxy_rules_latency_microseconds|non_structural_schema_condition_controller_adds|non_structural_schema_condition_controller_depth|non_structural_schema_condition_controller_longest_running_processor_microseconds|non_structural_schema_condition_controller_queue_latency|non_structural_schema_condition_controller_retries|non_structural_schema_condition_controller_unfinished_work_seconds|non_structural_schema_condition_controller_work_duration|rest_client_request_latency_seconds|storage_operation_errors_total|storage_operation_status_count) 43 | sourceLabels: 44 | - __name__ 45 | - action: drop 46 | regex: etcd_(debugging|disk|request|server).* 47 | sourceLabels: 48 | - __name__ 49 | port: https-metrics 50 | scheme: https 51 | tlsConfig: 52 | insecureSkipVerify: true 53 | jobLabel: app.kubernetes.io/name 54 | namespaceSelector: 55 | matchNames: 56 | - kube-system 57 | selector: 58 | matchLabels: 59 | app.kubernetes.io/name: kube-controller-manager 60 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/kubernetes-serviceMonitorKubeScheduler.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | app.kubernetes.io/name: kube-scheduler 6 | name: kube-scheduler 7 | namespace: monitoring 8 | spec: 9 | endpoints: 10 | - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token 11 | interval: 30s 12 | port: https-metrics 13 | scheme: https 14 | tlsConfig: 15 | insecureSkipVerify: true 16 | jobLabel: app.kubernetes.io/name 17 | namespaceSelector: 18 | matchNames: 19 | - kube-system 20 | selector: 21 | matchLabels: 22 | app.kubernetes.io/name: kube-scheduler 23 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/kubernetes-serviceMonitorKubelet.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | app.kubernetes.io/name: kubelet 6 | name: kubelet 7 | namespace: monitoring 8 | spec: 9 | endpoints: 10 | - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token 11 | honorLabels: true 12 | interval: 30s 13 | metricRelabelings: 14 | - action: drop 15 | regex: kubelet_(pod_worker_latency_microseconds|pod_start_latency_microseconds|cgroup_manager_latency_microseconds|pod_worker_start_latency_microseconds|pleg_relist_latency_microseconds|pleg_relist_interval_microseconds|runtime_operations|runtime_operations_latency_microseconds|runtime_operations_errors|eviction_stats_age_microseconds|device_plugin_registration_count|device_plugin_alloc_latency_microseconds|network_plugin_operations_latency_microseconds) 16 | sourceLabels: 17 | - __name__ 18 | - action: drop 19 | regex: 
scheduler_(e2e_scheduling_latency_microseconds|scheduling_algorithm_predicate_evaluation|scheduling_algorithm_priority_evaluation|scheduling_algorithm_preemption_evaluation|scheduling_algorithm_latency_microseconds|binding_latency_microseconds|scheduling_latency_seconds) 20 | sourceLabels: 21 | - __name__ 22 | - action: drop 23 | regex: apiserver_(request_count|request_latencies|request_latencies_summary|dropped_requests|storage_data_key_generation_latencies_microseconds|storage_transformation_failures_total|storage_transformation_latencies_microseconds|proxy_tunnel_sync_latency_secs) 24 | sourceLabels: 25 | - __name__ 26 | - action: drop 27 | regex: kubelet_docker_(operations|operations_latency_microseconds|operations_errors|operations_timeout) 28 | sourceLabels: 29 | - __name__ 30 | - action: drop 31 | regex: reflector_(items_per_list|items_per_watch|list_duration_seconds|lists_total|short_watches_total|watch_duration_seconds|watches_total) 32 | sourceLabels: 33 | - __name__ 34 | - action: drop 35 | regex: etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|object_counts|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary) 36 | sourceLabels: 37 | - __name__ 38 | - action: drop 39 | regex: transformation_(transformation_latencies_microseconds|failures_total) 40 | sourceLabels: 41 | - __name__ 42 | - action: drop 43 | regex: (admission_quota_controller_adds|admission_quota_controller_depth|admission_quota_controller_longest_running_processor_microseconds|admission_quota_controller_queue_latency|admission_quota_controller_unfinished_work_seconds|admission_quota_controller_work_duration|APIServiceOpenAPIAggregationControllerQueue1_adds|APIServiceOpenAPIAggregationControllerQueue1_depth|APIServiceOpenAPIAggregationControllerQueue1_longest_running_processor_microseconds|APIServiceOpenAPIAggregationControllerQueue1_queue_latency|APIServiceOpenAPIAggregationControllerQueue1_retries|APIServiceOpenAPIAggregationControllerQueue1_unfinished_work_seconds|APIServiceOpenAPIAggregationControllerQueue1_work_duration|APIServiceRegistrationController_adds|APIServiceRegistrationController_depth|APIServiceRegistrationController_longest_running_processor_microseconds|APIServiceRegistrationController_queue_latency|APIServiceRegistrationController_retries|APIServiceRegistrationController_unfinished_work_seconds|APIServiceRegistrationController_work_duration|autoregister_adds|autoregister_depth|autoregister_longest_running_processor_microseconds|autoregister_queue_latency|autoregister_retries|autoregister_unfinished_work_seconds|autoregister_work_duration|AvailableConditionController_adds|AvailableConditionController_depth|AvailableConditionController_longest_running_processor_microseconds|AvailableConditionController_queue_latency|AvailableConditionController_retries|AvailableConditionController_unfinished_work_seconds|AvailableConditionController_work_duration|crd_autoregistration_controller_adds|crd_autoregistration_controller_depth|crd_autoregistration_controller_longest_running_processor_microseconds|crd_autoregistration_controller_queue_latency|crd_autoregistration_controller_retries|crd_autoregistration_controller_unfinished_work_seconds|crd_autoregistration_controller_work_duration|crdEstablishing_adds|crdEstablishing_depth|crdEstablishing_longest_running_processor_microseconds|crdEstablishing_queue_latency|crdEstablishing_retries|crdEstablishing_unfinished_work_seconds|crdEstablishing_work_duration|crd_finalizer_adds|crd_finalizer_de
pth|crd_finalizer_longest_running_processor_microseconds|crd_finalizer_queue_latency|crd_finalizer_retries|crd_finalizer_unfinished_work_seconds|crd_finalizer_work_duration|crd_naming_condition_controller_adds|crd_naming_condition_controller_depth|crd_naming_condition_controller_longest_running_processor_microseconds|crd_naming_condition_controller_queue_latency|crd_naming_condition_controller_retries|crd_naming_condition_controller_unfinished_work_seconds|crd_naming_condition_controller_work_duration|crd_openapi_controller_adds|crd_openapi_controller_depth|crd_openapi_controller_longest_running_processor_microseconds|crd_openapi_controller_queue_latency|crd_openapi_controller_retries|crd_openapi_controller_unfinished_work_seconds|crd_openapi_controller_work_duration|DiscoveryController_adds|DiscoveryController_depth|DiscoveryController_longest_running_processor_microseconds|DiscoveryController_queue_latency|DiscoveryController_retries|DiscoveryController_unfinished_work_seconds|DiscoveryController_work_duration|kubeproxy_sync_proxy_rules_latency_microseconds|non_structural_schema_condition_controller_adds|non_structural_schema_condition_controller_depth|non_structural_schema_condition_controller_longest_running_processor_microseconds|non_structural_schema_condition_controller_queue_latency|non_structural_schema_condition_controller_retries|non_structural_schema_condition_controller_unfinished_work_seconds|non_structural_schema_condition_controller_work_duration|rest_client_request_latency_seconds|storage_operation_errors_total|storage_operation_status_count) 44 | sourceLabels: 45 | - __name__ 46 | port: https-metrics 47 | relabelings: 48 | - sourceLabels: 49 | - __metrics_path__ 50 | targetLabel: metrics_path 51 | scheme: https 52 | tlsConfig: 53 | insecureSkipVerify: true 54 | - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token 55 | honorLabels: true 56 | honorTimestamps: false 57 | interval: 30s 58 | metricRelabelings: 59 | - action: drop 60 | regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s) 61 | sourceLabels: 62 | - __name__ 63 | - action: drop 64 | regex: (container_spec_.*|container_file_descriptors|container_sockets|container_threads_max|container_threads|container_start_time_seconds|container_last_seen);; 65 | sourceLabels: 66 | - __name__ 67 | - pod 68 | - namespace 69 | - action: drop 70 | regex: (container_blkio_device_usage_total);.+ 71 | sourceLabels: 72 | - __name__ 73 | - container 74 | path: /metrics/cadvisor 75 | port: https-metrics 76 | relabelings: 77 | - sourceLabels: 78 | - __metrics_path__ 79 | targetLabel: metrics_path 80 | scheme: https 81 | tlsConfig: 82 | insecureSkipVerify: true 83 | - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token 84 | honorLabels: true 85 | interval: 30s 86 | path: /metrics/probes 87 | port: https-metrics 88 | relabelings: 89 | - sourceLabels: 90 | - __metrics_path__ 91 | targetLabel: metrics_path 92 | scheme: https 93 | tlsConfig: 94 | insecureSkipVerify: true 95 | jobLabel: app.kubernetes.io/name 96 | namespaceSelector: 97 | matchNames: 98 | - kube-system 99 | selector: 100 | matchLabels: 101 | app.kubernetes.io/name: kubelet 102 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/node-exporter-clusterRole.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: ClusterRole 3 | metadata: 4 | labels: 5 | 
app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: node-exporter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 1.2.2 9 | name: node-exporter 10 | rules: 11 | - apiGroups: 12 | - authentication.k8s.io 13 | resources: 14 | - tokenreviews 15 | verbs: 16 | - create 17 | - apiGroups: 18 | - authorization.k8s.io 19 | resources: 20 | - subjectaccessreviews 21 | verbs: 22 | - create 23 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/node-exporter-clusterRoleBinding.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: ClusterRoleBinding 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: node-exporter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 1.2.2 9 | name: node-exporter 10 | roleRef: 11 | apiGroup: rbac.authorization.k8s.io 12 | kind: ClusterRole 13 | name: node-exporter 14 | subjects: 15 | - kind: ServiceAccount 16 | name: node-exporter 17 | namespace: monitoring 18 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/node-exporter-daemonset.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: DaemonSet 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: node-exporter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 1.2.2 9 | name: node-exporter 10 | namespace: monitoring 11 | spec: 12 | selector: 13 | matchLabels: 14 | app.kubernetes.io/component: exporter 15 | app.kubernetes.io/name: node-exporter 16 | app.kubernetes.io/part-of: kube-prometheus 17 | template: 18 | metadata: 19 | annotations: 20 | kubectl.kubernetes.io/default-container: node-exporter 21 | labels: 22 | app.kubernetes.io/component: exporter 23 | app.kubernetes.io/name: node-exporter 24 | app.kubernetes.io/part-of: kube-prometheus 25 | app.kubernetes.io/version: 1.2.2 26 | spec: 27 | containers: 28 | - args: 29 | - --web.listen-address=127.0.0.1:9100 30 | - --path.sysfs=/host/sys 31 | - --path.rootfs=/host/root 32 | - --no-collector.wifi 33 | - --no-collector.hwmon 34 | - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/) 35 | - --collector.netclass.ignored-devices=^(veth.*|[a-f0-9]{15})$ 36 | - --collector.netdev.device-exclude=^(veth.*|[a-f0-9]{15})$ 37 | image: quay.io/prometheus/node-exporter:v1.2.2 38 | name: node-exporter 39 | resources: 40 | limits: 41 | cpu: 250m 42 | memory: 180Mi 43 | requests: 44 | cpu: 102m 45 | memory: 180Mi 46 | volumeMounts: 47 | - mountPath: /host/sys 48 | mountPropagation: HostToContainer 49 | name: sys 50 | readOnly: true 51 | - mountPath: /host/root 52 | mountPropagation: HostToContainer 53 | name: root 54 | readOnly: true 55 | - args: 56 | - --logtostderr 57 | - --secure-listen-address=[$(IP)]:9100 58 | - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 59 | - --upstream=http://127.0.0.1:9100/ 60 | env: 61 | - name: IP 62 | valueFrom: 63 | fieldRef: 64 | fieldPath: status.podIP 65 | image: quay.io/brancz/kube-rbac-proxy:v0.11.0 66 | name: kube-rbac-proxy 67 | ports: 68 | 
- containerPort: 9100 69 | hostPort: 9100 70 | name: https 71 | resources: 72 | limits: 73 | cpu: 20m 74 | memory: 40Mi 75 | requests: 76 | cpu: 10m 77 | memory: 20Mi 78 | securityContext: 79 | runAsGroup: 65532 80 | runAsNonRoot: true 81 | runAsUser: 65532 82 | hostNetwork: true 83 | hostPID: true 84 | nodeSelector: 85 | kubernetes.io/os: linux 86 | securityContext: 87 | runAsNonRoot: true 88 | runAsUser: 65534 89 | serviceAccountName: node-exporter 90 | tolerations: 91 | - operator: Exists 92 | volumes: 93 | - hostPath: 94 | path: /sys 95 | name: sys 96 | - hostPath: 97 | path: / 98 | name: root 99 | updateStrategy: 100 | rollingUpdate: 101 | maxUnavailable: 10% 102 | type: RollingUpdate 103 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/node-exporter-prometheusRule.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: PrometheusRule 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: node-exporter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 1.2.2 9 | prometheus: k8s 10 | role: alert-rules 11 | name: node-exporter-rules 12 | namespace: monitoring 13 | spec: 14 | groups: 15 | - name: node-exporter 16 | rules: 17 | - alert: NodeFilesystemSpaceFillingUp 18 | annotations: 19 | description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} 20 | has only {{ printf "%.2f" $value }}% available space left and is filling 21 | up. 22 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemspacefillingup 23 | summary: Filesystem is predicted to run out of space within the next 24 hours. 24 | expr: | 25 | ( 26 | node_filesystem_avail_bytes{job="node-exporter",fstype!=""} / node_filesystem_size_bytes{job="node-exporter",fstype!=""} * 100 < 20 27 | and 28 | predict_linear(node_filesystem_avail_bytes{job="node-exporter",fstype!=""}[6h], 24*60*60) < 0 29 | and 30 | node_filesystem_readonly{job="node-exporter",fstype!=""} == 0 31 | ) 32 | for: 1h 33 | labels: 34 | severity: warning 35 | - alert: NodeFilesystemSpaceFillingUp 36 | annotations: 37 | description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} 38 | has only {{ printf "%.2f" $value }}% available space left and is filling 39 | up fast. 40 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemspacefillingup 41 | summary: Filesystem is predicted to run out of space within the next 4 hours. 42 | expr: | 43 | ( 44 | node_filesystem_avail_bytes{job="node-exporter",fstype!=""} / node_filesystem_size_bytes{job="node-exporter",fstype!=""} * 100 < 15 45 | and 46 | predict_linear(node_filesystem_avail_bytes{job="node-exporter",fstype!=""}[6h], 4*60*60) < 0 47 | and 48 | node_filesystem_readonly{job="node-exporter",fstype!=""} == 0 49 | ) 50 | for: 1h 51 | labels: 52 | severity: critical 53 | - alert: NodeFilesystemAlmostOutOfSpace 54 | annotations: 55 | description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} 56 | has only {{ printf "%.2f" $value }}% available space left. 57 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemalmostoutofspace 58 | summary: Filesystem has less than 5% space left. 
59 | expr: | 60 | ( 61 | node_filesystem_avail_bytes{job="node-exporter",fstype!=""} / node_filesystem_size_bytes{job="node-exporter",fstype!=""} * 100 < 5 62 | and 63 | node_filesystem_readonly{job="node-exporter",fstype!=""} == 0 64 | ) 65 | for: 30m 66 | labels: 67 | severity: warning 68 | - alert: NodeFilesystemAlmostOutOfSpace 69 | annotations: 70 | description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} 71 | has only {{ printf "%.2f" $value }}% available space left. 72 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemalmostoutofspace 73 | summary: Filesystem has less than 3% space left. 74 | expr: | 75 | ( 76 | node_filesystem_avail_bytes{job="node-exporter",fstype!=""} / node_filesystem_size_bytes{job="node-exporter",fstype!=""} * 100 < 3 77 | and 78 | node_filesystem_readonly{job="node-exporter",fstype!=""} == 0 79 | ) 80 | for: 30m 81 | labels: 82 | severity: critical 83 | - alert: NodeFilesystemFilesFillingUp 84 | annotations: 85 | description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} 86 | has only {{ printf "%.2f" $value }}% available inodes left and is filling 87 | up. 88 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemfilesfillingup 89 | summary: Filesystem is predicted to run out of inodes within the next 24 hours. 90 | expr: | 91 | ( 92 | node_filesystem_files_free{job="node-exporter",fstype!=""} / node_filesystem_files{job="node-exporter",fstype!=""} * 100 < 40 93 | and 94 | predict_linear(node_filesystem_files_free{job="node-exporter",fstype!=""}[6h], 24*60*60) < 0 95 | and 96 | node_filesystem_readonly{job="node-exporter",fstype!=""} == 0 97 | ) 98 | for: 1h 99 | labels: 100 | severity: warning 101 | - alert: NodeFilesystemFilesFillingUp 102 | annotations: 103 | description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} 104 | has only {{ printf "%.2f" $value }}% available inodes left and is filling 105 | up fast. 106 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemfilesfillingup 107 | summary: Filesystem is predicted to run out of inodes within the next 4 hours. 108 | expr: | 109 | ( 110 | node_filesystem_files_free{job="node-exporter",fstype!=""} / node_filesystem_files{job="node-exporter",fstype!=""} * 100 < 20 111 | and 112 | predict_linear(node_filesystem_files_free{job="node-exporter",fstype!=""}[6h], 4*60*60) < 0 113 | and 114 | node_filesystem_readonly{job="node-exporter",fstype!=""} == 0 115 | ) 116 | for: 1h 117 | labels: 118 | severity: critical 119 | - alert: NodeFilesystemAlmostOutOfFiles 120 | annotations: 121 | description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} 122 | has only {{ printf "%.2f" $value }}% available inodes left. 123 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemalmostoutoffiles 124 | summary: Filesystem has less than 5% inodes left. 125 | expr: | 126 | ( 127 | node_filesystem_files_free{job="node-exporter",fstype!=""} / node_filesystem_files{job="node-exporter",fstype!=""} * 100 < 5 128 | and 129 | node_filesystem_readonly{job="node-exporter",fstype!=""} == 0 130 | ) 131 | for: 1h 132 | labels: 133 | severity: warning 134 | - alert: NodeFilesystemAlmostOutOfFiles 135 | annotations: 136 | description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} 137 | has only {{ printf "%.2f" $value }}% available inodes left. 
138 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemalmostoutoffiles 139 | summary: Filesystem has less than 3% inodes left. 140 | expr: | 141 | ( 142 | node_filesystem_files_free{job="node-exporter",fstype!=""} / node_filesystem_files{job="node-exporter",fstype!=""} * 100 < 3 143 | and 144 | node_filesystem_readonly{job="node-exporter",fstype!=""} == 0 145 | ) 146 | for: 1h 147 | labels: 148 | severity: critical 149 | - alert: NodeNetworkReceiveErrs 150 | annotations: 151 | description: '{{ $labels.instance }} interface {{ $labels.device }} has encountered 152 | {{ printf "%.0f" $value }} receive errors in the last two minutes.' 153 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodenetworkreceiveerrs 154 | summary: Network interface is reporting many receive errors. 155 | expr: | 156 | rate(node_network_receive_errs_total[2m]) / rate(node_network_receive_packets_total[2m]) > 0.01 157 | for: 1h 158 | labels: 159 | severity: warning 160 | - alert: NodeNetworkTransmitErrs 161 | annotations: 162 | description: '{{ $labels.instance }} interface {{ $labels.device }} has encountered 163 | {{ printf "%.0f" $value }} transmit errors in the last two minutes.' 164 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodenetworktransmiterrs 165 | summary: Network interface is reporting many transmit errors. 166 | expr: | 167 | rate(node_network_transmit_errs_total[2m]) / rate(node_network_transmit_packets_total[2m]) > 0.01 168 | for: 1h 169 | labels: 170 | severity: warning 171 | - alert: NodeHighNumberConntrackEntriesUsed 172 | annotations: 173 | description: '{{ $value | humanizePercentage }} of conntrack entries are used.' 174 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodehighnumberconntrackentriesused 175 | summary: Number of conntrack are getting close to the limit. 176 | expr: | 177 | (node_nf_conntrack_entries / node_nf_conntrack_entries_limit) > 0.75 178 | labels: 179 | severity: warning 180 | - alert: NodeTextFileCollectorScrapeError 181 | annotations: 182 | description: Node Exporter text file collector failed to scrape. 183 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodetextfilecollectorscrapeerror 184 | summary: Node Exporter text file collector failed to scrape. 185 | expr: | 186 | node_textfile_scrape_error{job="node-exporter"} == 1 187 | labels: 188 | severity: warning 189 | - alert: NodeClockSkewDetected 190 | annotations: 191 | description: Clock on {{ $labels.instance }} is out of sync by more than 300s. 192 | Ensure NTP is configured correctly on this host. 193 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodeclockskewdetected 194 | summary: Clock skew detected. 195 | expr: | 196 | ( 197 | node_timex_offset_seconds > 0.05 198 | and 199 | deriv(node_timex_offset_seconds[5m]) >= 0 200 | ) 201 | or 202 | ( 203 | node_timex_offset_seconds < -0.05 204 | and 205 | deriv(node_timex_offset_seconds[5m]) <= 0 206 | ) 207 | for: 10m 208 | labels: 209 | severity: warning 210 | - alert: NodeClockNotSynchronising 211 | annotations: 212 | description: Clock on {{ $labels.instance }} is not synchronising. Ensure 213 | NTP is configured on this host. 214 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodeclocknotsynchronising 215 | summary: Clock not synchronising. 
216 | expr: | 217 | min_over_time(node_timex_sync_status[5m]) == 0 218 | and 219 | node_timex_maxerror_seconds >= 16 220 | for: 10m 221 | labels: 222 | severity: warning 223 | - alert: NodeRAIDDegraded 224 | annotations: 225 | description: RAID array '{{ $labels.device }}' on {{ $labels.instance }} is 226 | in degraded state due to one or more disks failures. Number of spare drives 227 | is insufficient to fix issue automatically. 228 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/noderaiddegraded 229 | summary: RAID Array is degraded 230 | expr: | 231 | node_md_disks_required - ignoring (state) (node_md_disks{state="active"}) > 0 232 | for: 15m 233 | labels: 234 | severity: critical 235 | - alert: NodeRAIDDiskFailure 236 | annotations: 237 | description: At least one device in RAID array on {{ $labels.instance }} failed. 238 | Array '{{ $labels.device }}' needs attention and possibly a disk swap. 239 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/noderaiddiskfailure 240 | summary: Failed device in RAID array 241 | expr: | 242 | node_md_disks{state="failed"} > 0 243 | labels: 244 | severity: warning 245 | - alert: NodeFileDescriptorLimit 246 | annotations: 247 | description: File descriptors limit at {{ $labels.instance }} is currently 248 | at {{ printf "%.2f" $value }}%. 249 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefiledescriptorlimit 250 | summary: Kernel is predicted to exhaust file descriptors limit soon. 251 | expr: | 252 | ( 253 | node_filefd_allocated{job="node-exporter"} * 100 / node_filefd_maximum{job="node-exporter"} > 70 254 | ) 255 | for: 15m 256 | labels: 257 | severity: warning 258 | - alert: NodeFileDescriptorLimit 259 | annotations: 260 | description: File descriptors limit at {{ $labels.instance }} is currently 261 | at {{ printf "%.2f" $value }}%. 262 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefiledescriptorlimit 263 | summary: Kernel is predicted to exhaust file descriptors limit soon. 
264 | expr: | 265 | ( 266 | node_filefd_allocated{job="node-exporter"} * 100 / node_filefd_maximum{job="node-exporter"} > 90 267 | ) 268 | for: 15m 269 | labels: 270 | severity: critical 271 | - name: node-exporter.rules 272 | rules: 273 | - expr: | 274 | count without (cpu, mode) ( 275 | node_cpu_seconds_total{job="node-exporter",mode="idle"} 276 | ) 277 | record: instance:node_num_cpu:sum 278 | - expr: | 279 | 1 - avg without (cpu, mode) ( 280 | rate(node_cpu_seconds_total{job="node-exporter", mode="idle"}[5m]) 281 | ) 282 | record: instance:node_cpu_utilisation:rate5m 283 | - expr: | 284 | ( 285 | node_load1{job="node-exporter"} 286 | / 287 | instance:node_num_cpu:sum{job="node-exporter"} 288 | ) 289 | record: instance:node_load1_per_cpu:ratio 290 | - expr: | 291 | 1 - ( 292 | ( 293 | node_memory_MemAvailable_bytes{job="node-exporter"} 294 | or 295 | ( 296 | node_memory_Buffers_bytes{job="node-exporter"} 297 | + 298 | node_memory_Cached_bytes{job="node-exporter"} 299 | + 300 | node_memory_MemFree_bytes{job="node-exporter"} 301 | + 302 | node_memory_Slab_bytes{job="node-exporter"} 303 | ) 304 | ) 305 | / 306 | node_memory_MemTotal_bytes{job="node-exporter"} 307 | ) 308 | record: instance:node_memory_utilisation:ratio 309 | - expr: | 310 | rate(node_vmstat_pgmajfault{job="node-exporter"}[5m]) 311 | record: instance:node_vmstat_pgmajfault:rate5m 312 | - expr: | 313 | rate(node_disk_io_time_seconds_total{job="node-exporter", device=~"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+"}[5m]) 314 | record: instance_device:node_disk_io_time_seconds:rate5m 315 | - expr: | 316 | rate(node_disk_io_time_weighted_seconds_total{job="node-exporter", device=~"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+"}[5m]) 317 | record: instance_device:node_disk_io_time_weighted_seconds:rate5m 318 | - expr: | 319 | sum without (device) ( 320 | rate(node_network_receive_bytes_total{job="node-exporter", device!="lo"}[5m]) 321 | ) 322 | record: instance:node_network_receive_bytes_excluding_lo:rate5m 323 | - expr: | 324 | sum without (device) ( 325 | rate(node_network_transmit_bytes_total{job="node-exporter", device!="lo"}[5m]) 326 | ) 327 | record: instance:node_network_transmit_bytes_excluding_lo:rate5m 328 | - expr: | 329 | sum without (device) ( 330 | rate(node_network_receive_drop_total{job="node-exporter", device!="lo"}[5m]) 331 | ) 332 | record: instance:node_network_receive_drop_excluding_lo:rate5m 333 | - expr: | 334 | sum without (device) ( 335 | rate(node_network_transmit_drop_total{job="node-exporter", device!="lo"}[5m]) 336 | ) 337 | record: instance:node_network_transmit_drop_excluding_lo:rate5m 338 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/node-exporter-service.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: node-exporter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 1.2.2 9 | name: node-exporter 10 | namespace: monitoring 11 | spec: 12 | clusterIP: None 13 | ports: 14 | - name: https 15 | port: 9100 16 | targetPort: https 17 | selector: 18 | app.kubernetes.io/component: exporter 19 | app.kubernetes.io/name: node-exporter 20 | app.kubernetes.io/part-of: kube-prometheus 21 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/node-exporter-serviceAccount.yaml: 
-------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ServiceAccount 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: node-exporter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 1.2.2 9 | name: node-exporter 10 | namespace: monitoring 11 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/node-exporter-serviceMonitor.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: exporter 6 | app.kubernetes.io/name: node-exporter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 1.2.2 9 | name: node-exporter 10 | namespace: monitoring 11 | spec: 12 | endpoints: 13 | - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token 14 | interval: 15s 15 | port: https 16 | relabelings: 17 | - action: replace 18 | regex: (.*) 19 | replacement: $1 20 | sourceLabels: 21 | - __meta_kubernetes_pod_node_name 22 | targetLabel: instance 23 | scheme: https 24 | tlsConfig: 25 | insecureSkipVerify: true 26 | jobLabel: app.kubernetes.io/name 27 | selector: 28 | matchLabels: 29 | app.kubernetes.io/component: exporter 30 | app.kubernetes.io/name: node-exporter 31 | app.kubernetes.io/part-of: kube-prometheus 32 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-adapter-apiService.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apiregistration.k8s.io/v1 2 | kind: APIService 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: metrics-adapter 6 | app.kubernetes.io/name: prometheus-adapter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.9.1 9 | name: v1beta1.metrics.k8s.io 10 | spec: 11 | group: metrics.k8s.io 12 | groupPriorityMinimum: 100 13 | insecureSkipTLSVerify: true 14 | service: 15 | name: prometheus-adapter 16 | namespace: monitoring 17 | version: v1beta1 18 | versionPriority: 100 19 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-adapter-clusterRole.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: ClusterRole 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: metrics-adapter 6 | app.kubernetes.io/name: prometheus-adapter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.9.1 9 | name: prometheus-adapter 10 | rules: 11 | - apiGroups: 12 | - "" 13 | resources: 14 | - nodes 15 | - namespaces 16 | - pods 17 | - services 18 | verbs: 19 | - get 20 | - list 21 | - watch 22 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: ClusterRole 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: metrics-adapter 6 | app.kubernetes.io/name: prometheus-adapter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.9.1 9 | rbac.authorization.k8s.io/aggregate-to-admin: "true" 10 | 
rbac.authorization.k8s.io/aggregate-to-edit: "true" 11 | rbac.authorization.k8s.io/aggregate-to-view: "true" 12 | name: system:aggregated-metrics-reader 13 | rules: 14 | - apiGroups: 15 | - metrics.k8s.io 16 | resources: 17 | - pods 18 | - nodes 19 | verbs: 20 | - get 21 | - list 22 | - watch 23 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-adapter-clusterRoleBinding.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: ClusterRoleBinding 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: metrics-adapter 6 | app.kubernetes.io/name: prometheus-adapter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.9.1 9 | name: prometheus-adapter 10 | roleRef: 11 | apiGroup: rbac.authorization.k8s.io 12 | kind: ClusterRole 13 | name: prometheus-adapter 14 | subjects: 15 | - kind: ServiceAccount 16 | name: prometheus-adapter 17 | namespace: monitoring 18 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-adapter-clusterRoleBindingDelegator.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: ClusterRoleBinding 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: metrics-adapter 6 | app.kubernetes.io/name: prometheus-adapter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.9.1 9 | name: resource-metrics:system:auth-delegator 10 | roleRef: 11 | apiGroup: rbac.authorization.k8s.io 12 | kind: ClusterRole 13 | name: system:auth-delegator 14 | subjects: 15 | - kind: ServiceAccount 16 | name: prometheus-adapter 17 | namespace: monitoring 18 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-adapter-clusterRoleServerResources.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: ClusterRole 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: metrics-adapter 6 | app.kubernetes.io/name: prometheus-adapter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.9.1 9 | name: resource-metrics-server-resources 10 | rules: 11 | - apiGroups: 12 | - metrics.k8s.io 13 | resources: 14 | - '*' 15 | verbs: 16 | - '*' 17 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-adapter-configMap.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | data: 3 | config.yaml: |- 4 | "resourceRules": 5 | "cpu": 6 | "containerLabel": "container" 7 | "containerQuery": | 8 | sum by (<<.GroupBy>>) ( 9 | irate ( 10 | container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!="",pod!=""}[120s] 11 | ) 12 | ) 13 | "nodeQuery": | 14 | sum by (<<.GroupBy>>) ( 15 | 1 - irate( 16 | node_cpu_seconds_total{mode="idle"}[60s] 17 | ) 18 | * on(namespace, pod) group_left(node) ( 19 | node_namespace_pod:kube_pod_info:{<<.LabelMatchers>>} 20 | ) 21 | ) 22 | or sum by (<<.GroupBy>>) ( 23 | 1 - irate( 24 | windows_cpu_time_total{mode="idle", job="windows-exporter",<<.LabelMatchers>>}[4m] 25 | ) 26 | ) 27 | "resources": 28 | "overrides": 29 | "namespace": 30 | "resource": "namespace" 31 | "node": 32 | "resource": "node" 33 | "pod": 34 | "resource": "pod" 35 
| "memory": 36 | "containerLabel": "container" 37 | "containerQuery": | 38 | sum by (<<.GroupBy>>) ( 39 | container_memory_working_set_bytes{<<.LabelMatchers>>,container!="",pod!=""} 40 | ) 41 | "nodeQuery": | 42 | sum by (<<.GroupBy>>) ( 43 | node_memory_MemTotal_bytes{job="node-exporter",<<.LabelMatchers>>} 44 | - 45 | node_memory_MemAvailable_bytes{job="node-exporter",<<.LabelMatchers>>} 46 | ) 47 | or sum by (<<.GroupBy>>) ( 48 | windows_cs_physical_memory_bytes{job="windows-exporter",<<.LabelMatchers>>} 49 | - 50 | windows_memory_available_bytes{job="windows-exporter",<<.LabelMatchers>>} 51 | ) 52 | "resources": 53 | "overrides": 54 | "instance": 55 | "resource": "node" 56 | "namespace": 57 | "resource": "namespace" 58 | "pod": 59 | "resource": "pod" 60 | "window": "5m" 61 | kind: ConfigMap 62 | metadata: 63 | labels: 64 | app.kubernetes.io/component: metrics-adapter 65 | app.kubernetes.io/name: prometheus-adapter 66 | app.kubernetes.io/part-of: kube-prometheus 67 | app.kubernetes.io/version: 0.9.1 68 | name: adapter-config 69 | namespace: monitoring 70 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-adapter-deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: metrics-adapter 6 | app.kubernetes.io/name: prometheus-adapter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.9.1 9 | name: prometheus-adapter 10 | namespace: monitoring 11 | spec: 12 | replicas: 2 13 | selector: 14 | matchLabels: 15 | app.kubernetes.io/component: metrics-adapter 16 | app.kubernetes.io/name: prometheus-adapter 17 | app.kubernetes.io/part-of: kube-prometheus 18 | strategy: 19 | rollingUpdate: 20 | maxSurge: 1 21 | maxUnavailable: 1 22 | template: 23 | metadata: 24 | labels: 25 | app.kubernetes.io/component: metrics-adapter 26 | app.kubernetes.io/name: prometheus-adapter 27 | app.kubernetes.io/part-of: kube-prometheus 28 | app.kubernetes.io/version: 0.9.1 29 | spec: 30 | containers: 31 | - args: 32 | - --cert-dir=/var/run/serving-cert 33 | - --config=/etc/adapter/config.yaml 34 | - --logtostderr=true 35 | - --metrics-relist-interval=1m 36 | - --prometheus-url=http://prometheus-k8s.monitoring.svc:9090/ 37 | - --secure-port=6443 38 | - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA 39 | image: docker.io/v5cn/prometheus-adapter:v0.9.1 40 | name: prometheus-adapter 41 | ports: 42 | - containerPort: 6443 43 | resources: 44 | limits: 45 | cpu: 250m 46 | memory: 180Mi 47 | requests: 48 | cpu: 102m 49 | memory: 180Mi 50 | volumeMounts: 51 | - mountPath: /tmp 52 | name: tmpfs 53 | readOnly: false 54 | - mountPath: /var/run/serving-cert 55 | name: volume-serving-cert 56 | readOnly: false 57 | - mountPath: /etc/adapter 58 | name: config 59 | readOnly: false 60 | nodeSelector: 61 | kubernetes.io/os: linux 62 | serviceAccountName: prometheus-adapter 63 | volumes: 64 | - 
emptyDir: {} 65 | name: tmpfs 66 | - emptyDir: {} 67 | name: volume-serving-cert 68 | - configMap: 69 | name: adapter-config 70 | name: config 71 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-adapter-podDisruptionBudget.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: policy/v1 2 | kind: PodDisruptionBudget 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: metrics-adapter 6 | app.kubernetes.io/name: prometheus-adapter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.9.1 9 | name: prometheus-adapter 10 | namespace: monitoring 11 | spec: 12 | minAvailable: 1 13 | selector: 14 | matchLabels: 15 | app.kubernetes.io/component: metrics-adapter 16 | app.kubernetes.io/name: prometheus-adapter 17 | app.kubernetes.io/part-of: kube-prometheus 18 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-adapter-roleBindingAuthReader.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: RoleBinding 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: metrics-adapter 6 | app.kubernetes.io/name: prometheus-adapter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.9.1 9 | name: resource-metrics-auth-reader 10 | namespace: kube-system 11 | roleRef: 12 | apiGroup: rbac.authorization.k8s.io 13 | kind: Role 14 | name: extension-apiserver-authentication-reader 15 | subjects: 16 | - kind: ServiceAccount 17 | name: prometheus-adapter 18 | namespace: monitoring 19 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-adapter-service.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: metrics-adapter 6 | app.kubernetes.io/name: prometheus-adapter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.9.1 9 | name: prometheus-adapter 10 | namespace: monitoring 11 | spec: 12 | ports: 13 | - name: https 14 | port: 443 15 | targetPort: 6443 16 | selector: 17 | app.kubernetes.io/component: metrics-adapter 18 | app.kubernetes.io/name: prometheus-adapter 19 | app.kubernetes.io/part-of: kube-prometheus 20 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-adapter-serviceAccount.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ServiceAccount 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: metrics-adapter 6 | app.kubernetes.io/name: prometheus-adapter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.9.1 9 | name: prometheus-adapter 10 | namespace: monitoring 11 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-adapter-serviceMonitor.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: metrics-adapter 6 | app.kubernetes.io/name: prometheus-adapter 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.9.1 9 | name: 
prometheus-adapter 10 | namespace: monitoring 11 | spec: 12 | endpoints: 13 | - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token 14 | interval: 30s 15 | metricRelabelings: 16 | - action: drop 17 | regex: (apiserver_client_certificate_.*|apiserver_envelope_.*|apiserver_flowcontrol_.*|apiserver_storage_.*|apiserver_webhooks_.*|workqueue_.*) 18 | sourceLabels: 19 | - __name__ 20 | port: https 21 | scheme: https 22 | tlsConfig: 23 | insecureSkipVerify: true 24 | selector: 25 | matchLabels: 26 | app.kubernetes.io/component: metrics-adapter 27 | app.kubernetes.io/name: prometheus-adapter 28 | app.kubernetes.io/part-of: kube-prometheus 29 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-clusterRole.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: ClusterRole 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: prometheus 6 | app.kubernetes.io/name: prometheus 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.30.3 9 | name: prometheus-k8s 10 | rules: 11 | - apiGroups: 12 | - "" 13 | resources: 14 | - nodes/metrics 15 | verbs: 16 | - get 17 | - nonResourceURLs: 18 | - /metrics 19 | verbs: 20 | - get 21 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-clusterRoleBinding.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: ClusterRoleBinding 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: prometheus 6 | app.kubernetes.io/name: prometheus 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.30.3 9 | name: prometheus-k8s 10 | roleRef: 11 | apiGroup: rbac.authorization.k8s.io 12 | kind: ClusterRole 13 | name: prometheus-k8s 14 | subjects: 15 | - kind: ServiceAccount 16 | name: prometheus-k8s 17 | namespace: monitoring 18 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-operator-prometheusRule.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: PrometheusRule 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: controller 6 | app.kubernetes.io/name: prometheus-operator 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.51.2 9 | prometheus: k8s 10 | role: alert-rules 11 | name: prometheus-operator-rules 12 | namespace: monitoring 13 | spec: 14 | groups: 15 | - name: prometheus-operator 16 | rules: 17 | - alert: PrometheusOperatorListErrors 18 | annotations: 19 | description: Errors while performing List operations in controller {{$labels.controller}} 20 | in {{$labels.namespace}} namespace. 21 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus-operator/prometheusoperatorlisterrors 22 | summary: Errors while performing list operations in controller. 
23 | expr: | 24 | (sum by (controller,namespace) (rate(prometheus_operator_list_operations_failed_total{job="prometheus-operator",namespace="monitoring"}[10m])) / sum by (controller,namespace) (rate(prometheus_operator_list_operations_total{job="prometheus-operator",namespace="monitoring"}[10m]))) > 0.4 25 | for: 15m 26 | labels: 27 | severity: warning 28 | - alert: PrometheusOperatorWatchErrors 29 | annotations: 30 | description: Errors while performing watch operations in controller {{$labels.controller}} 31 | in {{$labels.namespace}} namespace. 32 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus-operator/prometheusoperatorwatcherrors 33 | summary: Errors while performing watch operations in controller. 34 | expr: | 35 | (sum by (controller,namespace) (rate(prometheus_operator_watch_operations_failed_total{job="prometheus-operator",namespace="monitoring"}[10m])) / sum by (controller,namespace) (rate(prometheus_operator_watch_operations_total{job="prometheus-operator",namespace="monitoring"}[10m]))) > 0.4 36 | for: 15m 37 | labels: 38 | severity: warning 39 | - alert: PrometheusOperatorSyncFailed 40 | annotations: 41 | description: Controller {{ $labels.controller }} in {{ $labels.namespace }} 42 | namespace fails to reconcile {{ $value }} objects. 43 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus-operator/prometheusoperatorsyncfailed 44 | summary: Last controller reconciliation failed 45 | expr: | 46 | min_over_time(prometheus_operator_syncs{status="failed",job="prometheus-operator",namespace="monitoring"}[5m]) > 0 47 | for: 10m 48 | labels: 49 | severity: warning 50 | - alert: PrometheusOperatorReconcileErrors 51 | annotations: 52 | description: '{{ $value | humanizePercentage }} of reconciling operations 53 | failed for {{ $labels.controller }} controller in {{ $labels.namespace }} 54 | namespace.' 55 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus-operator/prometheusoperatorreconcileerrors 56 | summary: Errors while reconciling controller. 57 | expr: | 58 | (sum by (controller,namespace) (rate(prometheus_operator_reconcile_errors_total{job="prometheus-operator",namespace="monitoring"}[5m]))) / (sum by (controller,namespace) (rate(prometheus_operator_reconcile_operations_total{job="prometheus-operator",namespace="monitoring"}[5m]))) > 0.1 59 | for: 10m 60 | labels: 61 | severity: warning 62 | - alert: PrometheusOperatorNodeLookupErrors 63 | annotations: 64 | description: Errors while reconciling Prometheus in {{ $labels.namespace }} 65 | Namespace. 66 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus-operator/prometheusoperatornodelookuperrors 67 | summary: Errors while reconciling Prometheus. 68 | expr: | 69 | rate(prometheus_operator_node_address_lookup_errors_total{job="prometheus-operator",namespace="monitoring"}[5m]) > 0.1 70 | for: 10m 71 | labels: 72 | severity: warning 73 | - alert: PrometheusOperatorNotReady 74 | annotations: 75 | description: Prometheus operator in {{ $labels.namespace }} namespace isn't 76 | ready to reconcile {{ $labels.controller }} resources. 
77 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus-operator/prometheusoperatornotready 78 | summary: Prometheus operator not ready 79 | expr: | 80 | min by(namespace, controller) (max_over_time(prometheus_operator_ready{job="prometheus-operator",namespace="monitoring"}[5m]) == 0) 81 | for: 5m 82 | labels: 83 | severity: warning 84 | - alert: PrometheusOperatorRejectedResources 85 | annotations: 86 | description: Prometheus operator in {{ $labels.namespace }} namespace rejected 87 | {{ printf "%0.0f" $value }} {{ $labels.controller }}/{{ $labels.resource 88 | }} resources. 89 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus-operator/prometheusoperatorrejectedresources 90 | summary: Resources rejected by Prometheus operator 91 | expr: | 92 | min_over_time(prometheus_operator_managed_resources{state="rejected",job="prometheus-operator",namespace="monitoring"}[5m]) > 0 93 | for: 5m 94 | labels: 95 | severity: warning 96 | - name: config-reloaders 97 | rules: 98 | - alert: ConfigReloaderSidecarErrors 99 | annotations: 100 | description: |- 101 | Errors encountered while the {{$labels.pod}} config-reloader sidecar attempts to sync config in {{$labels.namespace}} namespace. 102 | As a result, configuration for service running in {{$labels.pod}} may be stale and cannot be updated anymore. 103 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus-operator/configreloadersidecarerrors 104 | summary: config-reloader sidecar has not had a successful reload for 10m 105 | expr: | 106 | max_over_time(reloader_last_reload_successful{namespace=~".+"}[5m]) == 0 107 | for: 10m 108 | labels: 109 | severity: warning 110 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-operator-serviceMonitor.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: controller 6 | app.kubernetes.io/name: prometheus-operator 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.51.2 9 | name: prometheus-operator 10 | namespace: monitoring 11 | spec: 12 | endpoints: 13 | - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token 14 | honorLabels: true 15 | port: https 16 | scheme: https 17 | tlsConfig: 18 | insecureSkipVerify: true 19 | selector: 20 | matchLabels: 21 | app.kubernetes.io/component: controller 22 | app.kubernetes.io/name: prometheus-operator 23 | app.kubernetes.io/part-of: kube-prometheus 24 | app.kubernetes.io/version: 0.51.2 25 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-podDisruptionBudget.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: policy/v1 2 | kind: PodDisruptionBudget 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: prometheus 6 | app.kubernetes.io/name: prometheus 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.30.3 9 | name: prometheus-k8s 10 | namespace: monitoring 11 | spec: 12 | minAvailable: 1 13 | selector: 14 | matchLabels: 15 | app.kubernetes.io/component: prometheus 16 | app.kubernetes.io/name: prometheus 17 | app.kubernetes.io/part-of: kube-prometheus 18 | prometheus: k8s 19 | -------------------------------------------------------------------------------- 
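> Note: the manifests in this directory are meant to be applied as a set. Below is a minimal apply-and-verify sketch, assuming the standard kube-prometheus layout in which the CRDs, the `monitoring` namespace and the operator live under `manifests/setup`, and a kubectl context that already points at the cluster; adjust paths to your checkout as needed.

```bash
# Install the CRDs, the monitoring namespace and the prometheus-operator first,
# then the remaining manifests (node-exporter, prometheus-adapter, Prometheus, rules, ...).
kubectl apply --server-side -f kube-prometheus/manifests/setup
kubectl wait --for=condition=Established --all CustomResourceDefinition --namespace=monitoring
kubectl apply -f kube-prometheus/manifests/

# Verify the stack: every pod in the monitoring namespace should become Ready.
kubectl get pods -n monitoring

# Once the prometheus-adapter APIService (v1beta1.metrics.k8s.io) is available,
# resource metrics are served through the Kubernetes metrics API.
kubectl top nodes

# The Prometheus UI is reachable through the prometheus-k8s service on port 9090.
kubectl -n monitoring port-forward svc/prometheus-k8s 9090
```

The intermediate `kubectl wait` step only avoids a race in which the custom resources under `manifests/` are applied before their CRDs have been registered by the operator setup.
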
/kube-prometheus/manifests/prometheus-prometheus.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: Prometheus 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: prometheus 6 | app.kubernetes.io/name: prometheus 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.30.3 9 | prometheus: k8s 10 | name: k8s 11 | namespace: monitoring 12 | spec: 13 | alerting: 14 | alertmanagers: 15 | - apiVersion: v2 16 | name: alertmanager-main 17 | namespace: monitoring 18 | port: web 19 | enableFeatures: [] 20 | externalLabels: {} 21 | image: quay.io/prometheus/prometheus:v2.30.3 22 | nodeSelector: 23 | kubernetes.io/os: linux 24 | podMetadata: 25 | labels: 26 | app.kubernetes.io/component: prometheus 27 | app.kubernetes.io/name: prometheus 28 | app.kubernetes.io/part-of: kube-prometheus 29 | app.kubernetes.io/version: 2.30.3 30 | podMonitorNamespaceSelector: {} 31 | podMonitorSelector: {} 32 | probeNamespaceSelector: {} 33 | probeSelector: {} 34 | replicas: 2 35 | resources: 36 | requests: 37 | memory: 400Mi 38 | ruleNamespaceSelector: {} 39 | ruleSelector: {} 40 | securityContext: 41 | fsGroup: 2000 42 | runAsNonRoot: true 43 | runAsUser: 1000 44 | serviceAccountName: prometheus-k8s 45 | serviceMonitorNamespaceSelector: {} 46 | serviceMonitorSelector: {} 47 | version: 2.30.3 48 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-prometheusRule.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: PrometheusRule 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: prometheus 6 | app.kubernetes.io/name: prometheus 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.30.3 9 | prometheus: k8s 10 | role: alert-rules 11 | name: prometheus-k8s-prometheus-rules 12 | namespace: monitoring 13 | spec: 14 | groups: 15 | - name: prometheus 16 | rules: 17 | - alert: PrometheusBadConfig 18 | annotations: 19 | description: Prometheus {{$labels.namespace}}/{{$labels.pod}} has failed to 20 | reload its configuration. 21 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheusbadconfig 22 | summary: Failed Prometheus configuration reload. 23 | expr: | 24 | # Without max_over_time, failed scrapes could create false negatives, see 25 | # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details. 26 | max_over_time(prometheus_config_last_reload_successful{job="prometheus-k8s",namespace="monitoring"}[5m]) == 0 27 | for: 10m 28 | labels: 29 | severity: critical 30 | - alert: PrometheusNotificationQueueRunningFull 31 | annotations: 32 | description: Alert notification queue of Prometheus {{$labels.namespace}}/{{$labels.pod}} 33 | is running full. 34 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheusnotificationqueuerunningfull 35 | summary: Prometheus alert notification queue predicted to run full in less 36 | than 30m. 37 | expr: | 38 | # Without min_over_time, failed scrapes could create false negatives, see 39 | # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details. 
40 | ( 41 | predict_linear(prometheus_notifications_queue_length{job="prometheus-k8s",namespace="monitoring"}[5m], 60 * 30) 42 | > 43 | min_over_time(prometheus_notifications_queue_capacity{job="prometheus-k8s",namespace="monitoring"}[5m]) 44 | ) 45 | for: 15m 46 | labels: 47 | severity: warning 48 | - alert: PrometheusErrorSendingAlertsToSomeAlertmanagers 49 | annotations: 50 | description: '{{ printf "%.1f" $value }}% errors while sending alerts from 51 | Prometheus {{$labels.namespace}}/{{$labels.pod}} to Alertmanager {{$labels.alertmanager}}.' 52 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheuserrorsendingalertstosomealertmanagers 53 | summary: Prometheus has encountered more than 1% errors sending alerts to 54 | a specific Alertmanager. 55 | expr: | 56 | ( 57 | rate(prometheus_notifications_errors_total{job="prometheus-k8s",namespace="monitoring"}[5m]) 58 | / 59 | rate(prometheus_notifications_sent_total{job="prometheus-k8s",namespace="monitoring"}[5m]) 60 | ) 61 | * 100 62 | > 1 63 | for: 15m 64 | labels: 65 | severity: warning 66 | - alert: PrometheusNotConnectedToAlertmanagers 67 | annotations: 68 | description: Prometheus {{$labels.namespace}}/{{$labels.pod}} is not connected 69 | to any Alertmanagers. 70 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheusnotconnectedtoalertmanagers 71 | summary: Prometheus is not connected to any Alertmanagers. 72 | expr: | 73 | # Without max_over_time, failed scrapes could create false negatives, see 74 | # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details. 75 | max_over_time(prometheus_notifications_alertmanagers_discovered{job="prometheus-k8s",namespace="monitoring"}[5m]) < 1 76 | for: 10m 77 | labels: 78 | severity: warning 79 | - alert: PrometheusTSDBReloadsFailing 80 | annotations: 81 | description: Prometheus {{$labels.namespace}}/{{$labels.pod}} has detected 82 | {{$value | humanize}} reload failures over the last 3h. 83 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheustsdbreloadsfailing 84 | summary: Prometheus has issues reloading blocks from disk. 85 | expr: | 86 | increase(prometheus_tsdb_reloads_failures_total{job="prometheus-k8s",namespace="monitoring"}[3h]) > 0 87 | for: 4h 88 | labels: 89 | severity: warning 90 | - alert: PrometheusTSDBCompactionsFailing 91 | annotations: 92 | description: Prometheus {{$labels.namespace}}/{{$labels.pod}} has detected 93 | {{$value | humanize}} compaction failures over the last 3h. 94 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheustsdbcompactionsfailing 95 | summary: Prometheus has issues compacting blocks. 96 | expr: | 97 | increase(prometheus_tsdb_compactions_failed_total{job="prometheus-k8s",namespace="monitoring"}[3h]) > 0 98 | for: 4h 99 | labels: 100 | severity: warning 101 | - alert: PrometheusNotIngestingSamples 102 | annotations: 103 | description: Prometheus {{$labels.namespace}}/{{$labels.pod}} is not ingesting 104 | samples. 105 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheusnotingestingsamples 106 | summary: Prometheus is not ingesting samples. 
107 | expr: | 108 | ( 109 | rate(prometheus_tsdb_head_samples_appended_total{job="prometheus-k8s",namespace="monitoring"}[5m]) <= 0 110 | and 111 | ( 112 | sum without(scrape_job) (prometheus_target_metadata_cache_entries{job="prometheus-k8s",namespace="monitoring"}) > 0 113 | or 114 | sum without(rule_group) (prometheus_rule_group_rules{job="prometheus-k8s",namespace="monitoring"}) > 0 115 | ) 116 | ) 117 | for: 10m 118 | labels: 119 | severity: warning 120 | - alert: PrometheusDuplicateTimestamps 121 | annotations: 122 | description: Prometheus {{$labels.namespace}}/{{$labels.pod}} is dropping 123 | {{ printf "%.4g" $value }} samples/s with different values but duplicated 124 | timestamp. 125 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheusduplicatetimestamps 126 | summary: Prometheus is dropping samples with duplicate timestamps. 127 | expr: | 128 | rate(prometheus_target_scrapes_sample_duplicate_timestamp_total{job="prometheus-k8s",namespace="monitoring"}[5m]) > 0 129 | for: 10m 130 | labels: 131 | severity: warning 132 | - alert: PrometheusOutOfOrderTimestamps 133 | annotations: 134 | description: Prometheus {{$labels.namespace}}/{{$labels.pod}} is dropping 135 | {{ printf "%.4g" $value }} samples/s with timestamps arriving out of order. 136 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheusoutofordertimestamps 137 | summary: Prometheus drops samples with out-of-order timestamps. 138 | expr: | 139 | rate(prometheus_target_scrapes_sample_out_of_order_total{job="prometheus-k8s",namespace="monitoring"}[5m]) > 0 140 | for: 10m 141 | labels: 142 | severity: warning 143 | - alert: PrometheusRemoteStorageFailures 144 | annotations: 145 | description: Prometheus {{$labels.namespace}}/{{$labels.pod}} failed to send 146 | {{ printf "%.1f" $value }}% of the samples to {{ $labels.remote_name}}:{{ 147 | $labels.url }} 148 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheusremotestoragefailures 149 | summary: Prometheus fails to send samples to remote storage. 150 | expr: | 151 | ( 152 | (rate(prometheus_remote_storage_failed_samples_total{job="prometheus-k8s",namespace="monitoring"}[5m]) or rate(prometheus_remote_storage_samples_failed_total{job="prometheus-k8s",namespace="monitoring"}[5m])) 153 | / 154 | ( 155 | (rate(prometheus_remote_storage_failed_samples_total{job="prometheus-k8s",namespace="monitoring"}[5m]) or rate(prometheus_remote_storage_samples_failed_total{job="prometheus-k8s",namespace="monitoring"}[5m])) 156 | + 157 | (rate(prometheus_remote_storage_succeeded_samples_total{job="prometheus-k8s",namespace="monitoring"}[5m]) or rate(prometheus_remote_storage_samples_total{job="prometheus-k8s",namespace="monitoring"}[5m])) 158 | ) 159 | ) 160 | * 100 161 | > 1 162 | for: 15m 163 | labels: 164 | severity: critical 165 | - alert: PrometheusRemoteWriteBehind 166 | annotations: 167 | description: Prometheus {{$labels.namespace}}/{{$labels.pod}} remote write 168 | is {{ printf "%.1f" $value }}s behind for {{ $labels.remote_name}}:{{ $labels.url 169 | }}. 170 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheusremotewritebehind 171 | summary: Prometheus remote write is behind. 172 | expr: | 173 | # Without max_over_time, failed scrapes could create false negatives, see 174 | # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details. 
175 | ( 176 | max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds{job="prometheus-k8s",namespace="monitoring"}[5m]) 177 | - ignoring(remote_name, url) group_right 178 | max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{job="prometheus-k8s",namespace="monitoring"}[5m]) 179 | ) 180 | > 120 181 | for: 15m 182 | labels: 183 | severity: critical 184 | - alert: PrometheusRemoteWriteDesiredShards 185 | annotations: 186 | description: Prometheus {{$labels.namespace}}/{{$labels.pod}} remote write 187 | desired shards calculation wants to run {{ $value }} shards for queue {{ 188 | $labels.remote_name}}:{{ $labels.url }}, which is more than the max of {{ 189 | printf `prometheus_remote_storage_shards_max{instance="%s",job="prometheus-k8s",namespace="monitoring"}` 190 | $labels.instance | query | first | value }}. 191 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheusremotewritedesiredshards 192 | summary: Prometheus remote write desired shards calculation wants to run more 193 | than configured max shards. 194 | expr: | 195 | # Without max_over_time, failed scrapes could create false negatives, see 196 | # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details. 197 | ( 198 | max_over_time(prometheus_remote_storage_shards_desired{job="prometheus-k8s",namespace="monitoring"}[5m]) 199 | > 200 | max_over_time(prometheus_remote_storage_shards_max{job="prometheus-k8s",namespace="monitoring"}[5m]) 201 | ) 202 | for: 15m 203 | labels: 204 | severity: warning 205 | - alert: PrometheusRuleFailures 206 | annotations: 207 | description: Prometheus {{$labels.namespace}}/{{$labels.pod}} has failed to 208 | evaluate {{ printf "%.0f" $value }} rules in the last 5m. 209 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheusrulefailures 210 | summary: Prometheus is failing rule evaluations. 211 | expr: | 212 | increase(prometheus_rule_evaluation_failures_total{job="prometheus-k8s",namespace="monitoring"}[5m]) > 0 213 | for: 15m 214 | labels: 215 | severity: critical 216 | - alert: PrometheusMissingRuleEvaluations 217 | annotations: 218 | description: Prometheus {{$labels.namespace}}/{{$labels.pod}} has missed {{ 219 | printf "%.0f" $value }} rule group evaluations in the last 5m. 220 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheusmissingruleevaluations 221 | summary: Prometheus is missing rule evaluations due to slow rule group evaluation. 222 | expr: | 223 | increase(prometheus_rule_group_iterations_missed_total{job="prometheus-k8s",namespace="monitoring"}[5m]) > 0 224 | for: 15m 225 | labels: 226 | severity: warning 227 | - alert: PrometheusTargetLimitHit 228 | annotations: 229 | description: Prometheus {{$labels.namespace}}/{{$labels.pod}} has dropped 230 | {{ printf "%.0f" $value }} targets because the number of targets exceeded 231 | the configured target_limit. 232 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheustargetlimithit 233 | summary: Prometheus has dropped targets because some scrape configs have exceeded 234 | the targets limit. 
235 | expr: | 236 | increase(prometheus_target_scrape_pool_exceeded_target_limit_total{job="prometheus-k8s",namespace="monitoring"}[5m]) > 0 237 | for: 15m 238 | labels: 239 | severity: warning 240 | - alert: PrometheusLabelLimitHit 241 | annotations: 242 | description: Prometheus {{$labels.namespace}}/{{$labels.pod}} has dropped 243 | {{ printf "%.0f" $value }} targets because some samples exceeded the configured 244 | label_limit, label_name_length_limit or label_value_length_limit. 245 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheuslabellimithit 246 | summary: Prometheus has dropped targets because some scrape configs have exceeded 247 | the labels limit. 248 | expr: | 249 | increase(prometheus_target_scrape_pool_exceeded_label_limits_total{job="prometheus-k8s",namespace="monitoring"}[5m]) > 0 250 | for: 15m 251 | labels: 252 | severity: warning 253 | - alert: PrometheusTargetSyncFailure 254 | annotations: 255 | description: '{{ printf "%.0f" $value }} targets in Prometheus {{$labels.namespace}}/{{$labels.pod}} 256 | have failed to sync because invalid configuration was supplied.' 257 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheustargetsyncfailure 258 | summary: Prometheus has failed to sync targets. 259 | expr: | 260 | increase(prometheus_target_sync_failed_total{job="prometheus-k8s",namespace="monitoring"}[30m]) > 0 261 | for: 5m 262 | labels: 263 | severity: critical 264 | - alert: PrometheusErrorSendingAlertsToAnyAlertmanager 265 | annotations: 266 | description: '{{ printf "%.1f" $value }}% minimum errors while sending alerts 267 | from Prometheus {{$labels.namespace}}/{{$labels.pod}} to any Alertmanager.' 268 | runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheuserrorsendingalertstoanyalertmanager 269 | summary: Prometheus encounters more than 3% errors sending alerts to any Alertmanager. 
270 | expr: | 271 | min without (alertmanager) ( 272 | rate(prometheus_notifications_errors_total{job="prometheus-k8s",namespace="monitoring",alertmanager!~``}[5m]) 273 | / 274 | rate(prometheus_notifications_sent_total{job="prometheus-k8s",namespace="monitoring",alertmanager!~``}[5m]) 275 | ) 276 | * 100 277 | > 3 278 | for: 15m 279 | labels: 280 | severity: critical 281 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-roleBindingConfig.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: RoleBinding 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: prometheus 6 | app.kubernetes.io/name: prometheus 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.30.3 9 | name: prometheus-k8s-config 10 | namespace: monitoring 11 | roleRef: 12 | apiGroup: rbac.authorization.k8s.io 13 | kind: Role 14 | name: prometheus-k8s-config 15 | subjects: 16 | - kind: ServiceAccount 17 | name: prometheus-k8s 18 | namespace: monitoring 19 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-roleBindingSpecificNamespaces.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | items: 3 | - apiVersion: rbac.authorization.k8s.io/v1 4 | kind: RoleBinding 5 | metadata: 6 | labels: 7 | app.kubernetes.io/component: prometheus 8 | app.kubernetes.io/name: prometheus 9 | app.kubernetes.io/part-of: kube-prometheus 10 | app.kubernetes.io/version: 2.30.3 11 | name: prometheus-k8s 12 | namespace: default 13 | roleRef: 14 | apiGroup: rbac.authorization.k8s.io 15 | kind: Role 16 | name: prometheus-k8s 17 | subjects: 18 | - kind: ServiceAccount 19 | name: prometheus-k8s 20 | namespace: monitoring 21 | - apiVersion: rbac.authorization.k8s.io/v1 22 | kind: RoleBinding 23 | metadata: 24 | labels: 25 | app.kubernetes.io/component: prometheus 26 | app.kubernetes.io/name: prometheus 27 | app.kubernetes.io/part-of: kube-prometheus 28 | app.kubernetes.io/version: 2.30.3 29 | name: prometheus-k8s 30 | namespace: kube-system 31 | roleRef: 32 | apiGroup: rbac.authorization.k8s.io 33 | kind: Role 34 | name: prometheus-k8s 35 | subjects: 36 | - kind: ServiceAccount 37 | name: prometheus-k8s 38 | namespace: monitoring 39 | - apiVersion: rbac.authorization.k8s.io/v1 40 | kind: RoleBinding 41 | metadata: 42 | labels: 43 | app.kubernetes.io/component: prometheus 44 | app.kubernetes.io/name: prometheus 45 | app.kubernetes.io/part-of: kube-prometheus 46 | app.kubernetes.io/version: 2.30.3 47 | name: prometheus-k8s 48 | namespace: monitoring 49 | roleRef: 50 | apiGroup: rbac.authorization.k8s.io 51 | kind: Role 52 | name: prometheus-k8s 53 | subjects: 54 | - kind: ServiceAccount 55 | name: prometheus-k8s 56 | namespace: monitoring 57 | kind: RoleBindingList 58 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-roleConfig.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: Role 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: prometheus 6 | app.kubernetes.io/name: prometheus 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.30.3 9 | name: prometheus-k8s-config 10 | namespace: monitoring 11 | rules: 12 | - apiGroups: 13 
| - "" 14 | resources: 15 | - configmaps 16 | verbs: 17 | - get 18 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-roleSpecificNamespaces.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | items: 3 | - apiVersion: rbac.authorization.k8s.io/v1 4 | kind: Role 5 | metadata: 6 | labels: 7 | app.kubernetes.io/component: prometheus 8 | app.kubernetes.io/name: prometheus 9 | app.kubernetes.io/part-of: kube-prometheus 10 | app.kubernetes.io/version: 2.30.3 11 | name: prometheus-k8s 12 | namespace: default 13 | rules: 14 | - apiGroups: 15 | - "" 16 | resources: 17 | - services 18 | - endpoints 19 | - pods 20 | verbs: 21 | - get 22 | - list 23 | - watch 24 | - apiGroups: 25 | - extensions 26 | resources: 27 | - ingresses 28 | verbs: 29 | - get 30 | - list 31 | - watch 32 | - apiGroups: 33 | - networking.k8s.io 34 | resources: 35 | - ingresses 36 | verbs: 37 | - get 38 | - list 39 | - watch 40 | - apiVersion: rbac.authorization.k8s.io/v1 41 | kind: Role 42 | metadata: 43 | labels: 44 | app.kubernetes.io/component: prometheus 45 | app.kubernetes.io/name: prometheus 46 | app.kubernetes.io/part-of: kube-prometheus 47 | app.kubernetes.io/version: 2.30.3 48 | name: prometheus-k8s 49 | namespace: kube-system 50 | rules: 51 | - apiGroups: 52 | - "" 53 | resources: 54 | - services 55 | - endpoints 56 | - pods 57 | verbs: 58 | - get 59 | - list 60 | - watch 61 | - apiGroups: 62 | - extensions 63 | resources: 64 | - ingresses 65 | verbs: 66 | - get 67 | - list 68 | - watch 69 | - apiGroups: 70 | - networking.k8s.io 71 | resources: 72 | - ingresses 73 | verbs: 74 | - get 75 | - list 76 | - watch 77 | - apiVersion: rbac.authorization.k8s.io/v1 78 | kind: Role 79 | metadata: 80 | labels: 81 | app.kubernetes.io/component: prometheus 82 | app.kubernetes.io/name: prometheus 83 | app.kubernetes.io/part-of: kube-prometheus 84 | app.kubernetes.io/version: 2.30.3 85 | name: prometheus-k8s 86 | namespace: monitoring 87 | rules: 88 | - apiGroups: 89 | - "" 90 | resources: 91 | - services 92 | - endpoints 93 | - pods 94 | verbs: 95 | - get 96 | - list 97 | - watch 98 | - apiGroups: 99 | - extensions 100 | resources: 101 | - ingresses 102 | verbs: 103 | - get 104 | - list 105 | - watch 106 | - apiGroups: 107 | - networking.k8s.io 108 | resources: 109 | - ingresses 110 | verbs: 111 | - get 112 | - list 113 | - watch 114 | kind: RoleList 115 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-service.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: prometheus 6 | app.kubernetes.io/name: prometheus 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.30.3 9 | prometheus: k8s 10 | name: prometheus-k8s 11 | namespace: monitoring 12 | spec: 13 | ports: 14 | - name: web 15 | port: 9090 16 | targetPort: web 17 | - name: reloader-web 18 | port: 8080 19 | targetPort: reloader-web 20 | selector: 21 | app.kubernetes.io/component: prometheus 22 | app.kubernetes.io/name: prometheus 23 | app.kubernetes.io/part-of: kube-prometheus 24 | prometheus: k8s 25 | sessionAffinity: ClientIP 26 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-serviceAccount.yaml: 
-------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ServiceAccount 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: prometheus 6 | app.kubernetes.io/name: prometheus 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.30.3 9 | name: prometheus-k8s 10 | namespace: monitoring 11 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/prometheus-serviceMonitor.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: monitoring.coreos.com/v1 2 | kind: ServiceMonitor 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: prometheus 6 | app.kubernetes.io/name: prometheus 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 2.30.3 9 | name: prometheus-k8s 10 | namespace: monitoring 11 | spec: 12 | endpoints: 13 | - interval: 30s 14 | port: web 15 | - interval: 30s 16 | port: reloader-web 17 | selector: 18 | matchLabels: 19 | app.kubernetes.io/component: prometheus 20 | app.kubernetes.io/name: prometheus 21 | app.kubernetes.io/part-of: kube-prometheus 22 | prometheus: k8s 23 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/setup/0namespace-namespace.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: monitoring 5 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/setup/prometheus-operator-0prometheusruleCustomResourceDefinition.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apiextensions.k8s.io/v1 2 | kind: CustomResourceDefinition 3 | metadata: 4 | annotations: 5 | controller-gen.kubebuilder.io/version: v0.4.1 6 | creationTimestamp: null 7 | name: prometheusrules.monitoring.coreos.com 8 | spec: 9 | group: monitoring.coreos.com 10 | names: 11 | categories: 12 | - prometheus-operator 13 | kind: PrometheusRule 14 | listKind: PrometheusRuleList 15 | plural: prometheusrules 16 | singular: prometheusrule 17 | scope: Namespaced 18 | versions: 19 | - name: v1 20 | schema: 21 | openAPIV3Schema: 22 | description: PrometheusRule defines recording and alerting rules for a Prometheus 23 | instance 24 | properties: 25 | apiVersion: 26 | description: 'APIVersion defines the versioned schema of this representation 27 | of an object. Servers should convert recognized schemas to the latest 28 | internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources' 29 | type: string 30 | kind: 31 | description: 'Kind is a string value representing the REST resource this 32 | object represents. Servers may infer this from the endpoint the client 33 | submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' 34 | type: string 35 | metadata: 36 | type: object 37 | spec: 38 | description: Specification of desired alerting rule definitions for Prometheus. 39 | properties: 40 | groups: 41 | description: Content of Prometheus rule file 42 | items: 43 | description: 'RuleGroup is a list of sequentially evaluated recording 44 | and alerting rules. 
Note: PartialResponseStrategy is only used 45 | by ThanosRuler and will be ignored by Prometheus instances. Valid 46 | values for this field are ''warn'' or ''abort''. More info: https://github.com/thanos-io/thanos/blob/master/docs/components/rule.md#partial-response' 47 | properties: 48 | interval: 49 | type: string 50 | name: 51 | type: string 52 | partial_response_strategy: 53 | type: string 54 | rules: 55 | items: 56 | description: 'Rule describes an alerting or recording rule 57 | See Prometheus documentation: [alerting](https://www.prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) 58 | or [recording](https://www.prometheus.io/docs/prometheus/latest/configuration/recording_rules/#recording-rules) 59 | rule' 60 | properties: 61 | alert: 62 | type: string 63 | annotations: 64 | additionalProperties: 65 | type: string 66 | type: object 67 | expr: 68 | anyOf: 69 | - type: integer 70 | - type: string 71 | x-kubernetes-int-or-string: true 72 | for: 73 | type: string 74 | labels: 75 | additionalProperties: 76 | type: string 77 | type: object 78 | record: 79 | type: string 80 | required: 81 | - expr 82 | type: object 83 | type: array 84 | required: 85 | - name 86 | - rules 87 | type: object 88 | type: array 89 | type: object 90 | required: 91 | - spec 92 | type: object 93 | served: true 94 | storage: true 95 | status: 96 | acceptedNames: 97 | kind: "" 98 | plural: "" 99 | conditions: [] 100 | storedVersions: [] 101 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/setup/prometheus-operator-clusterRole.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: ClusterRole 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: controller 6 | app.kubernetes.io/name: prometheus-operator 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.51.2 9 | name: prometheus-operator 10 | rules: 11 | - apiGroups: 12 | - monitoring.coreos.com 13 | resources: 14 | - alertmanagers 15 | - alertmanagers/finalizers 16 | - alertmanagerconfigs 17 | - prometheuses 18 | - prometheuses/finalizers 19 | - thanosrulers 20 | - thanosrulers/finalizers 21 | - servicemonitors 22 | - podmonitors 23 | - probes 24 | - prometheusrules 25 | verbs: 26 | - '*' 27 | - apiGroups: 28 | - apps 29 | resources: 30 | - statefulsets 31 | verbs: 32 | - '*' 33 | - apiGroups: 34 | - "" 35 | resources: 36 | - configmaps 37 | - secrets 38 | verbs: 39 | - '*' 40 | - apiGroups: 41 | - "" 42 | resources: 43 | - pods 44 | verbs: 45 | - list 46 | - delete 47 | - apiGroups: 48 | - "" 49 | resources: 50 | - services 51 | - services/finalizers 52 | - endpoints 53 | verbs: 54 | - get 55 | - create 56 | - update 57 | - delete 58 | - apiGroups: 59 | - "" 60 | resources: 61 | - nodes 62 | verbs: 63 | - list 64 | - watch 65 | - apiGroups: 66 | - "" 67 | resources: 68 | - namespaces 69 | verbs: 70 | - get 71 | - list 72 | - watch 73 | - apiGroups: 74 | - networking.k8s.io 75 | resources: 76 | - ingresses 77 | verbs: 78 | - get 79 | - list 80 | - watch 81 | - apiGroups: 82 | - authentication.k8s.io 83 | resources: 84 | - tokenreviews 85 | verbs: 86 | - create 87 | - apiGroups: 88 | - authorization.k8s.io 89 | resources: 90 | - subjectaccessreviews 91 | verbs: 92 | - create 93 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/setup/prometheus-operator-clusterRoleBinding.yaml: 
-------------------------------------------------------------------------------- 1 | apiVersion: rbac.authorization.k8s.io/v1 2 | kind: ClusterRoleBinding 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: controller 6 | app.kubernetes.io/name: prometheus-operator 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.51.2 9 | name: prometheus-operator 10 | roleRef: 11 | apiGroup: rbac.authorization.k8s.io 12 | kind: ClusterRole 13 | name: prometheus-operator 14 | subjects: 15 | - kind: ServiceAccount 16 | name: prometheus-operator 17 | namespace: monitoring 18 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/setup/prometheus-operator-deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: controller 6 | app.kubernetes.io/name: prometheus-operator 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.51.2 9 | name: prometheus-operator 10 | namespace: monitoring 11 | spec: 12 | replicas: 1 13 | selector: 14 | matchLabels: 15 | app.kubernetes.io/component: controller 16 | app.kubernetes.io/name: prometheus-operator 17 | app.kubernetes.io/part-of: kube-prometheus 18 | template: 19 | metadata: 20 | annotations: 21 | kubectl.kubernetes.io/default-container: prometheus-operator 22 | labels: 23 | app.kubernetes.io/component: controller 24 | app.kubernetes.io/name: prometheus-operator 25 | app.kubernetes.io/part-of: kube-prometheus 26 | app.kubernetes.io/version: 0.51.2 27 | spec: 28 | containers: 29 | - args: 30 | - --kubelet-service=kube-system/kubelet 31 | - --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.51.2 32 | image: quay.io/prometheus-operator/prometheus-operator:v0.51.2 33 | name: prometheus-operator 34 | ports: 35 | - containerPort: 8080 36 | name: http 37 | resources: 38 | limits: 39 | cpu: 200m 40 | memory: 200Mi 41 | requests: 42 | cpu: 100m 43 | memory: 100Mi 44 | securityContext: 45 | allowPrivilegeEscalation: false 46 | - args: 47 | - --logtostderr 48 | - --secure-listen-address=:8443 49 | - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 50 | - --upstream=http://127.0.0.1:8080/ 51 | image: quay.io/brancz/kube-rbac-proxy:v0.11.0 52 | name: kube-rbac-proxy 53 | ports: 54 | - containerPort: 8443 55 | name: https 56 | resources: 57 | limits: 58 | cpu: 20m 59 | memory: 40Mi 60 | requests: 61 | cpu: 10m 62 | memory: 20Mi 63 | securityContext: 64 | runAsGroup: 65532 65 | runAsNonRoot: true 66 | runAsUser: 65532 67 | nodeSelector: 68 | kubernetes.io/os: linux 69 | securityContext: 70 | runAsNonRoot: true 71 | runAsUser: 65534 72 | serviceAccountName: prometheus-operator 73 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/setup/prometheus-operator-service.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: controller 6 | app.kubernetes.io/name: prometheus-operator 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.51.2 9 | name: prometheus-operator 10 | namespace: 
monitoring 11 | spec: 12 | clusterIP: None 13 | ports: 14 | - name: https 15 | port: 8443 16 | targetPort: https 17 | selector: 18 | app.kubernetes.io/component: controller 19 | app.kubernetes.io/name: prometheus-operator 20 | app.kubernetes.io/part-of: kube-prometheus 21 | -------------------------------------------------------------------------------- /kube-prometheus/manifests/setup/prometheus-operator-serviceAccount.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ServiceAccount 3 | metadata: 4 | labels: 5 | app.kubernetes.io/component: controller 6 | app.kubernetes.io/name: prometheus-operator 7 | app.kubernetes.io/part-of: kube-prometheus 8 | app.kubernetes.io/version: 0.51.2 9 | name: prometheus-operator 10 | namespace: monitoring 11 | -------------------------------------------------------------------------------- /kubernetes-dashboard/kubernetes-dashboard.yaml: -------------------------------------------------------------------------------- 1 | # Copyright 2017 The Kubernetes Authors. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | 15 | apiVersion: v1 16 | kind: Namespace 17 | metadata: 18 | name: kubernetes-dashboard 19 | 20 | --- 21 | 22 | apiVersion: v1 23 | kind: ServiceAccount 24 | metadata: 25 | labels: 26 | k8s-app: kubernetes-dashboard 27 | name: kubernetes-dashboard 28 | namespace: kubernetes-dashboard 29 | 30 | --- 31 | 32 | kind: Service 33 | apiVersion: v1 34 | metadata: 35 | labels: 36 | k8s-app: kubernetes-dashboard 37 | name: kubernetes-dashboard 38 | namespace: kubernetes-dashboard 39 | spec: 40 | ports: 41 | - port: 443 42 | targetPort: 8443 43 | selector: 44 | k8s-app: kubernetes-dashboard 45 | 46 | --- 47 | 48 | apiVersion: v1 49 | kind: Secret 50 | metadata: 51 | labels: 52 | k8s-app: kubernetes-dashboard 53 | name: kubernetes-dashboard-certs 54 | namespace: kubernetes-dashboard 55 | type: Opaque 56 | 57 | --- 58 | 59 | apiVersion: v1 60 | kind: Secret 61 | metadata: 62 | labels: 63 | k8s-app: kubernetes-dashboard 64 | name: kubernetes-dashboard-csrf 65 | namespace: kubernetes-dashboard 66 | type: Opaque 67 | data: 68 | csrf: "" 69 | 70 | --- 71 | 72 | apiVersion: v1 73 | kind: Secret 74 | metadata: 75 | labels: 76 | k8s-app: kubernetes-dashboard 77 | name: kubernetes-dashboard-key-holder 78 | namespace: kubernetes-dashboard 79 | type: Opaque 80 | 81 | --- 82 | 83 | kind: ConfigMap 84 | apiVersion: v1 85 | metadata: 86 | labels: 87 | k8s-app: kubernetes-dashboard 88 | name: kubernetes-dashboard-settings 89 | namespace: kubernetes-dashboard 90 | 91 | --- 92 | 93 | kind: Role 94 | apiVersion: rbac.authorization.k8s.io/v1 95 | metadata: 96 | labels: 97 | k8s-app: kubernetes-dashboard 98 | name: kubernetes-dashboard 99 | namespace: kubernetes-dashboard 100 | rules: 101 | # Allow Dashboard to get, update and delete Dashboard exclusive secrets. 
102 | - apiGroups: [""] 103 | resources: ["secrets"] 104 | resourceNames: ["kubernetes-dashboard-key-holder", "kubernetes-dashboard-certs", "kubernetes-dashboard-csrf"] 105 | verbs: ["get", "update", "delete"] 106 | # Allow Dashboard to get and update 'kubernetes-dashboard-settings' config map. 107 | - apiGroups: [""] 108 | resources: ["configmaps"] 109 | resourceNames: ["kubernetes-dashboard-settings"] 110 | verbs: ["get", "update"] 111 | # Allow Dashboard to get metrics. 112 | - apiGroups: [""] 113 | resources: ["services"] 114 | resourceNames: ["heapster", "dashboard-metrics-scraper"] 115 | verbs: ["proxy"] 116 | - apiGroups: [""] 117 | resources: ["services/proxy"] 118 | resourceNames: ["heapster", "http:heapster:", "https:heapster:", "dashboard-metrics-scraper", "http:dashboard-metrics-scraper"] 119 | verbs: ["get"] 120 | 121 | --- 122 | 123 | kind: ClusterRole 124 | apiVersion: rbac.authorization.k8s.io/v1 125 | metadata: 126 | labels: 127 | k8s-app: kubernetes-dashboard 128 | name: kubernetes-dashboard 129 | rules: 130 | # Allow Metrics Scraper to get metrics from the Metrics server 131 | - apiGroups: ["metrics.k8s.io"] 132 | resources: ["pods", "nodes"] 133 | verbs: ["get", "list", "watch"] 134 | 135 | --- 136 | 137 | apiVersion: rbac.authorization.k8s.io/v1 138 | kind: RoleBinding 139 | metadata: 140 | labels: 141 | k8s-app: kubernetes-dashboard 142 | name: kubernetes-dashboard 143 | namespace: kubernetes-dashboard 144 | roleRef: 145 | apiGroup: rbac.authorization.k8s.io 146 | kind: Role 147 | name: kubernetes-dashboard 148 | subjects: 149 | - kind: ServiceAccount 150 | name: kubernetes-dashboard 151 | namespace: kubernetes-dashboard 152 | 153 | --- 154 | 155 | apiVersion: rbac.authorization.k8s.io/v1 156 | kind: ClusterRoleBinding 157 | metadata: 158 | name: kubernetes-dashboard 159 | roleRef: 160 | apiGroup: rbac.authorization.k8s.io 161 | kind: ClusterRole 162 | name: kubernetes-dashboard 163 | subjects: 164 | - kind: ServiceAccount 165 | name: kubernetes-dashboard 166 | namespace: kubernetes-dashboard 167 | 168 | --- 169 | 170 | kind: Deployment 171 | apiVersion: apps/v1 172 | metadata: 173 | labels: 174 | k8s-app: kubernetes-dashboard 175 | name: kubernetes-dashboard 176 | namespace: kubernetes-dashboard 177 | spec: 178 | replicas: 1 179 | revisionHistoryLimit: 10 180 | selector: 181 | matchLabels: 182 | k8s-app: kubernetes-dashboard 183 | template: 184 | metadata: 185 | labels: 186 | k8s-app: kubernetes-dashboard 187 | spec: 188 | containers: 189 | - name: kubernetes-dashboard 190 | image: kubernetesui/dashboard:v2.3.1 191 | imagePullPolicy: Always 192 | ports: 193 | - containerPort: 8443 194 | protocol: TCP 195 | args: 196 | - --auto-generate-certificates 197 | - --namespace=kubernetes-dashboard 198 | # Uncomment the following line to manually specify Kubernetes API server Host 199 | # If not specified, Dashboard will attempt to auto discover the API server and connect 200 | # to it. Uncomment only if the default does not work. 
201 | # - --apiserver-host=http://my-address:port 202 | volumeMounts: 203 | - name: kubernetes-dashboard-certs 204 | mountPath: /certs 205 | # Create on-disk volume to store exec logs 206 | - mountPath: /tmp 207 | name: tmp-volume 208 | livenessProbe: 209 | httpGet: 210 | scheme: HTTPS 211 | path: / 212 | port: 8443 213 | initialDelaySeconds: 30 214 | timeoutSeconds: 30 215 | securityContext: 216 | allowPrivilegeEscalation: false 217 | readOnlyRootFilesystem: true 218 | runAsUser: 1001 219 | runAsGroup: 2001 220 | volumes: 221 | - name: kubernetes-dashboard-certs 222 | secret: 223 | secretName: kubernetes-dashboard-certs 224 | - name: tmp-volume 225 | emptyDir: {} 226 | serviceAccountName: kubernetes-dashboard 227 | nodeSelector: 228 | "kubernetes.io/os": linux 229 | # Comment the following tolerations if Dashboard must not be deployed on master 230 | tolerations: 231 | - key: node-role.kubernetes.io/master 232 | effect: NoSchedule 233 | 234 | --- 235 | 236 | kind: Service 237 | apiVersion: v1 238 | metadata: 239 | labels: 240 | k8s-app: dashboard-metrics-scraper 241 | name: dashboard-metrics-scraper 242 | namespace: kubernetes-dashboard 243 | spec: 244 | ports: 245 | - port: 8000 246 | targetPort: 8000 247 | selector: 248 | k8s-app: dashboard-metrics-scraper 249 | 250 | --- 251 | 252 | kind: Deployment 253 | apiVersion: apps/v1 254 | metadata: 255 | labels: 256 | k8s-app: dashboard-metrics-scraper 257 | name: dashboard-metrics-scraper 258 | namespace: kubernetes-dashboard 259 | spec: 260 | replicas: 1 261 | revisionHistoryLimit: 10 262 | selector: 263 | matchLabels: 264 | k8s-app: dashboard-metrics-scraper 265 | template: 266 | metadata: 267 | labels: 268 | k8s-app: dashboard-metrics-scraper 269 | annotations: 270 | seccomp.security.alpha.kubernetes.io/pod: 'runtime/default' 271 | spec: 272 | containers: 273 | - name: dashboard-metrics-scraper 274 | image: kubernetesui/metrics-scraper:v1.0.6 275 | ports: 276 | - containerPort: 8000 277 | protocol: TCP 278 | livenessProbe: 279 | httpGet: 280 | scheme: HTTP 281 | path: / 282 | port: 8000 283 | initialDelaySeconds: 30 284 | timeoutSeconds: 30 285 | volumeMounts: 286 | - mountPath: /tmp 287 | name: tmp-volume 288 | securityContext: 289 | allowPrivilegeEscalation: false 290 | readOnlyRootFilesystem: true 291 | runAsUser: 1001 292 | runAsGroup: 2001 293 | serviceAccountName: kubernetes-dashboard 294 | nodeSelector: 295 | "kubernetes.io/os": linux 296 | # Comment the following tolerations if Dashboard must not be deployed on master 297 | tolerations: 298 | - key: node-role.kubernetes.io/master 299 | effect: NoSchedule 300 | volumes: 301 | - name: tmp-volume 302 | emptyDir: {} 303 | 304 | --- 305 | 306 | apiVersion: v1 307 | kind: ServiceAccount 308 | metadata: 309 | name: admin-user 310 | namespace: kubernetes-dashboard 311 | 312 | --- 313 | 314 | apiVersion: rbac.authorization.k8s.io/v1 315 | kind: ClusterRoleBinding 316 | metadata: 317 | name: admin-user 318 | roleRef: 319 | apiGroup: rbac.authorization.k8s.io 320 | kind: ClusterRole 321 | name: cluster-admin 322 | subjects: 323 | - kind: ServiceAccount 324 | name: admin-user 325 | namespace: kubernetes-dashboard -------------------------------------------------------------------------------- /kubesphere/nfs.yaml: -------------------------------------------------------------------------------- 1 | ## 创建了一个存储类 2 | apiVersion: storage.k8s.io/v1 3 | kind: StorageClass 4 | metadata: 5 | name: nfs-storage 6 | annotations: 7 | storageclass.kubernetes.io/is-default-class: "true" 8 | provisioner: 
k8s-sigs.io/nfs-subdir-external-provisioner 9 | parameters: 10 | archiveOnDelete: "true" ## whether to archive the PV's contents when the PV is deleted 11 | 12 | --- 13 | apiVersion: apps/v1 14 | kind: Deployment 15 | metadata: 16 | name: nfs-client-provisioner 17 | labels: 18 | app: nfs-client-provisioner 19 | # replace with namespace where provisioner is deployed 20 | namespace: default 21 | spec: 22 | replicas: 1 23 | strategy: 24 | type: Recreate 25 | selector: 26 | matchLabels: 27 | app: nfs-client-provisioner 28 | template: 29 | metadata: 30 | labels: 31 | app: nfs-client-provisioner 32 | spec: 33 | serviceAccountName: nfs-client-provisioner 34 | containers: 35 | - name: nfs-client-provisioner 36 | image: docker.io/v5cn/nfs-subdir-external-provisioner:v4.0.2 37 | # resources: 38 | # limits: 39 | # cpu: 10m 40 | # requests: 41 | # cpu: 10m 42 | volumeMounts: 43 | - name: nfs-client-root 44 | mountPath: /persistentvolumes 45 | env: 46 | - name: PROVISIONER_NAME 47 | value: k8s-sigs.io/nfs-subdir-external-provisioner 48 | - name: NFS_SERVER 49 | value: 192.168.56.100 ## set to your own NFS server address 50 | - name: NFS_PATH 51 | value: /nfs/data ## directory shared by the NFS server 52 | volumes: 53 | - name: nfs-client-root 54 | nfs: 55 | server: 192.168.56.100 56 | path: /nfs/data 57 | --- 58 | apiVersion: v1 59 | kind: ServiceAccount 60 | metadata: 61 | name: nfs-client-provisioner 62 | # replace with namespace where provisioner is deployed 63 | namespace: default 64 | --- 65 | kind: ClusterRole 66 | apiVersion: rbac.authorization.k8s.io/v1 67 | metadata: 68 | name: nfs-client-provisioner-runner 69 | rules: 70 | - apiGroups: [""] 71 | resources: ["nodes"] 72 | verbs: ["get", "list", "watch"] 73 | - apiGroups: [""] 74 | resources: ["persistentvolumes"] 75 | verbs: ["get", "list", "watch", "create", "delete"] 76 | - apiGroups: [""] 77 | resources: ["persistentvolumeclaims"] 78 | verbs: ["get", "list", "watch", "update"] 79 | - apiGroups: ["storage.k8s.io"] 80 | resources: ["storageclasses"] 81 | verbs: ["get", "list", "watch"] 82 | - apiGroups: [""] 83 | resources: ["events"] 84 | verbs: ["create", "update", "patch"] 85 | --- 86 | kind: ClusterRoleBinding 87 | apiVersion: rbac.authorization.k8s.io/v1 88 | metadata: 89 | name: run-nfs-client-provisioner 90 | subjects: 91 | - kind: ServiceAccount 92 | name: nfs-client-provisioner 93 | # replace with namespace where provisioner is deployed 94 | namespace: default 95 | roleRef: 96 | kind: ClusterRole 97 | name: nfs-client-provisioner-runner 98 | apiGroup: rbac.authorization.k8s.io 99 | --- 100 | kind: Role 101 | apiVersion: rbac.authorization.k8s.io/v1 102 | metadata: 103 | name: leader-locking-nfs-client-provisioner 104 | # replace with namespace where provisioner is deployed 105 | namespace: default 106 | rules: 107 | - apiGroups: [""] 108 | resources: ["endpoints"] 109 | verbs: ["get", "list", "watch", "create", "update", "patch"] 110 | --- 111 | kind: RoleBinding 112 | apiVersion: rbac.authorization.k8s.io/v1 113 | metadata: 114 | name: leader-locking-nfs-client-provisioner 115 | # replace with namespace where provisioner is deployed 116 | namespace: default 117 | subjects: 118 | - kind: ServiceAccount 119 | name: nfs-client-provisioner 120 | # replace with namespace where provisioner is deployed 121 | namespace: default 122 | roleRef: 123 | kind: Role 124 | name: leader-locking-nfs-client-provisioner 125 | apiGroup: rbac.authorization.k8s.io 126 | -------------------------------------------------------------------------------- /metrics/metrics.yaml:
-------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: ServiceAccount 3 | metadata: 4 | labels: 5 | k8s-app: metrics-server 6 | name: metrics-server 7 | namespace: kube-system 8 | --- 9 | apiVersion: rbac.authorization.k8s.io/v1 10 | kind: ClusterRole 11 | metadata: 12 | labels: 13 | k8s-app: metrics-server 14 | rbac.authorization.k8s.io/aggregate-to-admin: "true" 15 | rbac.authorization.k8s.io/aggregate-to-edit: "true" 16 | rbac.authorization.k8s.io/aggregate-to-view: "true" 17 | name: system:aggregated-metrics-reader 18 | rules: 19 | - apiGroups: 20 | - metrics.k8s.io 21 | resources: 22 | - pods 23 | - nodes 24 | verbs: 25 | - get 26 | - list 27 | - watch 28 | --- 29 | apiVersion: rbac.authorization.k8s.io/v1 30 | kind: ClusterRole 31 | metadata: 32 | labels: 33 | k8s-app: metrics-server 34 | name: system:metrics-server 35 | rules: 36 | - apiGroups: 37 | - "" 38 | resources: 39 | - pods 40 | - nodes 41 | - nodes/stats 42 | - namespaces 43 | - configmaps 44 | verbs: 45 | - get 46 | - list 47 | - watch 48 | --- 49 | apiVersion: rbac.authorization.k8s.io/v1 50 | kind: RoleBinding 51 | metadata: 52 | labels: 53 | k8s-app: metrics-server 54 | name: metrics-server-auth-reader 55 | namespace: kube-system 56 | roleRef: 57 | apiGroup: rbac.authorization.k8s.io 58 | kind: Role 59 | name: extension-apiserver-authentication-reader 60 | subjects: 61 | - kind: ServiceAccount 62 | name: metrics-server 63 | namespace: kube-system 64 | --- 65 | apiVersion: rbac.authorization.k8s.io/v1 66 | kind: ClusterRoleBinding 67 | metadata: 68 | labels: 69 | k8s-app: metrics-server 70 | name: metrics-server:system:auth-delegator 71 | roleRef: 72 | apiGroup: rbac.authorization.k8s.io 73 | kind: ClusterRole 74 | name: system:auth-delegator 75 | subjects: 76 | - kind: ServiceAccount 77 | name: metrics-server 78 | namespace: kube-system 79 | --- 80 | apiVersion: rbac.authorization.k8s.io/v1 81 | kind: ClusterRoleBinding 82 | metadata: 83 | labels: 84 | k8s-app: metrics-server 85 | name: system:metrics-server 86 | roleRef: 87 | apiGroup: rbac.authorization.k8s.io 88 | kind: ClusterRole 89 | name: system:metrics-server 90 | subjects: 91 | - kind: ServiceAccount 92 | name: metrics-server 93 | namespace: kube-system 94 | --- 95 | apiVersion: v1 96 | kind: Service 97 | metadata: 98 | labels: 99 | k8s-app: metrics-server 100 | name: metrics-server 101 | namespace: kube-system 102 | spec: 103 | ports: 104 | - name: https 105 | port: 443 106 | protocol: TCP 107 | targetPort: https 108 | selector: 109 | k8s-app: metrics-server 110 | --- 111 | apiVersion: apps/v1 112 | kind: Deployment 113 | metadata: 114 | labels: 115 | k8s-app: metrics-server 116 | name: metrics-server 117 | namespace: kube-system 118 | spec: 119 | selector: 120 | matchLabels: 121 | k8s-app: metrics-server 122 | strategy: 123 | rollingUpdate: 124 | maxUnavailable: 0 125 | template: 126 | metadata: 127 | labels: 128 | k8s-app: metrics-server 129 | spec: 130 | containers: 131 | - args: 132 | - --cert-dir=/tmp 133 | - --kubelet-insecure-tls 134 | - --secure-port=4443 135 | - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname 136 | - --kubelet-use-node-status-port 137 | image: registry.aliyuncs.com/k8sxio/metrics-server:v0.4.3 138 | imagePullPolicy: IfNotPresent 139 | livenessProbe: 140 | failureThreshold: 3 141 | httpGet: 142 | path: /livez 143 | port: https 144 | scheme: HTTPS 145 | periodSeconds: 10 146 | name: metrics-server 147 | ports: 148 | - containerPort: 4443 149 | name: https 150 | 
protocol: TCP 151 | readinessProbe: 152 | failureThreshold: 3 153 | httpGet: 154 | path: /readyz 155 | port: https 156 | scheme: HTTPS 157 | periodSeconds: 10 158 | securityContext: 159 | readOnlyRootFilesystem: true 160 | runAsNonRoot: true 161 | runAsUser: 1000 162 | volumeMounts: 163 | - mountPath: /tmp 164 | name: tmp-dir 165 | nodeSelector: 166 | kubernetes.io/os: linux 167 | priorityClassName: system-cluster-critical 168 | serviceAccountName: metrics-server 169 | volumes: 170 | - emptyDir: {} 171 | name: tmp-dir 172 | --- 173 | apiVersion: apiregistration.k8s.io/v1 174 | kind: APIService 175 | metadata: 176 | labels: 177 | k8s-app: metrics-server 178 | name: v1beta1.metrics.k8s.io 179 | spec: 180 | group: metrics.k8s.io 181 | groupPriorityMinimum: 100 182 | insecureSkipTLSVerify: true 183 | service: 184 | name: metrics-server 185 | namespace: kube-system 186 | version: v1beta1 187 | versionPriority: 100 188 | 189 | -------------------------------------------------------------------------------- /ubuntu/Vagrantfile: -------------------------------------------------------------------------------- 1 | # -*- mode: ruby -*- 2 | # vi: set ft=ruby : 3 | 4 | ENV['VAGRANT_NO_PARALLEL'] = 'yes' 5 | 6 | Vagrant.configure(2) do |config| 7 | 8 | config.vm.provision "shell", path: "bootstrap.sh" 9 | config.vm.synced_folder ".", "/vagrant", type: "virtualbox" 10 | 11 | # Kubernetes Master Server 12 | config.vm.define "kmaster" do |node| 13 | 14 | node.vm.box = "generic/ubuntu2004" 15 | node.vm.box_check_update = false 16 | node.vm.box_version = "3.3.0" 17 | node.vm.hostname = "kmaster.k8s.com" 18 | 19 | node.vm.network "private_network", ip: "192.168.56.100" 20 | 21 | node.vm.provider :virtualbox do |v| 22 | v.name = "kmaster" 23 | v.memory = 2048 24 | v.cpus = 2 25 | end 26 | 27 | node.vm.provision "shell", path: "bootstrap_kmaster.sh" 28 | 29 | end 30 | 31 | 32 | # Kubernetes Worker Nodes 33 | NodeCount = 2 34 | 35 | (1..NodeCount).each do |i| 36 | 37 | config.vm.define "kworker#{i}" do |node| 38 | 39 | node.vm.box = "generic/ubuntu2004" 40 | node.vm.box_check_update = false 41 | node.vm.box_version = "3.3.0" 42 | node.vm.hostname = "kworker#{i}.k8s.com" 43 | 44 | node.vm.network "private_network", ip: "192.168.56.10#{i}" 45 | 46 | node.vm.provider :virtualbox do |v| 47 | v.name = "kworker#{i}" 48 | v.memory = 2048 49 | v.cpus = 2 50 | end 51 | 52 | node.vm.provision "shell", path: "bootstrap_kworker.sh" 53 | 54 | end 55 | 56 | end 57 | 58 | end 59 | -------------------------------------------------------------------------------- /ubuntu/bootstrap.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | ## !IMPORTANT ## 4 | # 5 | ## This script is tested only in the generic/ubuntu2004 Vagrant box 6 | ## If you use a different version of Ubuntu or a different Ubuntu Vagrant box test this again 7 | # 8 | 9 | echo "[TASK 0] Setting TimeZone" 10 | timedatectl set-timezone Asia/Shanghai 11 | 12 | echo "[TASK 1] Setting DNS" 13 | cat >/etc/systemd/resolved.conf </etc/apt/sources.list</dev/null 2>&1 41 | 42 | echo "[TASK 3] Disable and turn off SWAP" 43 | sed -i '/swap/d' /etc/fstab 44 | swapoff -a 45 | 46 | echo "[TASK 4] Stop and Disable firewall" 47 | systemctl disable --now ufw >/dev/null 2>&1 48 | 49 | echo "[TASK 5] Enable and Load Kernel modules" 50 | cat >>/etc/modules-load.d/containerd.conf<>/etc/sysctl.d/kubernetes.conf</dev/null 2>&1 64 | 65 | echo "[TASK 7] Install containerd runtime" 66 | apt install -qq -y containerd apt-transport-https 
>/dev/null 2>&1 67 | mkdir /etc/containerd 68 | containerd config default > /etc/containerd/config.toml 69 | # Configure containerd registry mirrors 70 | # Replace k8s.gcr.io with registry.aliyuncs.com/k8sxio 71 | # Replace https://registry-1.docker.io with https://registry.cn-hangzhou.aliyuncs.com 72 | # Set the mirror endpoint for docker.io to https://bqr1dr1n.mirror.aliyuncs.com 73 | # Set the mirror endpoint for k8s.gcr.io to https://registry.aliyuncs.com/k8sxio 74 | sed -i "s#k8s.gcr.io#registry.aliyuncs.com/k8sxio#g" /etc/containerd/config.toml 75 | sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml 76 | sed -i "s#https://registry-1.docker.io#https://registry.cn-hangzhou.aliyuncs.com#g" /etc/containerd/config.toml 77 | sed -i '/\[plugins\.\"io\.containerd\.grpc\.v1\.cri\"\.registry\.mirrors\]/ a\\ \ \ \ \ \ \ \ [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]\n\ \ \ \ \ \ \ \ \ \ endpoint = ["https://bqr1dr1n.mirror.aliyuncs.com"]' /etc/containerd/config.toml 78 | sed -i '/\[plugins\.\"io\.containerd\.grpc\.v1\.cri\"\.registry\.mirrors\]/ a\\ \ \ \ \ \ \ \ [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]\n\ \ \ \ \ \ \ \ \ \ endpoint = ["https://registry.aliyuncs.com/k8sxio"]' /etc/containerd/config.toml 79 | systemctl daemon-reload 80 | systemctl enable containerd --now >/dev/null 2>&1 81 | systemctl restart containerd 82 | 83 | echo "[TASK 8] Add apt repo for kubernetes" 84 | curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add - 85 | cat </etc/apt/sources.list.d/kubernetes.list 86 | deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main 87 | EOF 88 | apt update -qq >/dev/null 2>&1 89 | 90 | echo "[TASK 9] Install Kubernetes components (kubeadm, kubelet and kubectl)" 91 | apt install -qq -y kubeadm=1.22.0-00 kubelet=1.22.0-00 kubectl=1.22.0-00 >/dev/null 2>&1 92 | crictl config runtime-endpoint /run/containerd/containerd.sock 93 | crictl config image-endpoint /run/containerd/containerd.sock 94 | 95 | echo "[TASK 10] Enable ssh password authentication" 96 | sed -i 's/^PasswordAuthentication .*/PasswordAuthentication yes/' /etc/ssh/sshd_config 97 | echo 'PermitRootLogin yes' >> /etc/ssh/sshd_config 98 | systemctl reload sshd 99 | 100 | echo "[TASK 11] Set root password" 101 | echo -e "kubeadmin\nkubeadmin" | passwd root >/dev/null 2>&1 102 | echo "export TERM=xterm" >> /etc/bash.bashrc 103 | 104 | echo "[TASK 12] Update /etc/hosts file" 105 | cat >>/etc/hosts</dev/null 2>&1 6 | 7 | # ctr images pull registry.aliyuncs.com/k8sxio/kube-apiserver:v1.22.2 >/dev/null 2>&1 8 | # ctr images pull registry.aliyuncs.com/k8sxio/kube-controller-manager:v1.22.2 >/dev/null 2>&1 9 | # ctr images pull registry.aliyuncs.com/k8sxio/kube-scheduler:v1.22.2 >/dev/null 2>&1 10 | # ctr images pull registry.aliyuncs.com/k8sxio/kube-proxy:v1.22.2 >/dev/null 2>&1 11 | # ctr images pull registry.aliyuncs.com/k8sxio/pause:3.5 >/dev/null 2>&1 12 | # ctr images pull registry.aliyuncs.com/k8sxio/etcd:3.5.0-0 >/dev/null 2>&1 13 | # ctr -n k8s.io images pull docker.io/v5cn/coredns:v1.8.4 >/dev/null 2>&1 14 | # ctr -n k8s.io images tag docker.io/v5cn/coredns:v1.8.4 registry.aliyuncs.com/k8sxio/coredns:v1.8.4 >/dev/null 2>&1 15 | 16 | # Workaround: pull the images Kubernetes needs via the Aliyun mirror 17 | kubeadm config images list | grep -v 'coredns' | sed 's#k8s.gcr.io#ctr images pull registry.aliyuncs.com\/k8sxio#g' > images.sh 18 | # The registry.aliyuncs.com/k8sxio repository has no coredns image, so pull coredns through another workaround 19 | # Under containerd, images are namespaced; Kubernetes images live in the k8s.io namespace, so the namespace must be specified 20 | # 
After the image is pulled, tag it as registry.aliyuncs.com/k8sxio/coredns:v1.8.4; the kubeadm init below specifies image-repository as registry.aliyuncs.com/k8sxio 21 | cat >> images.sh</dev/null 2>&1 26 | 27 | echo "[TASK 2] Initialize Kubernetes Cluster" 28 | kubeadm init \ 29 | --apiserver-advertise-address=192.168.56.100 \ 30 | --control-plane-endpoint=kmaster.k8s.com \ 31 | --kubernetes-version v1.22.0 \ 32 | --image-repository registry.aliyuncs.com/k8sxio \ 33 | --pod-network-cidr=192.168.0.0/16 > /root/kubeinit.log 2>/dev/null 34 | 35 | echo "[TASK 3] Deploy Calico network" 36 | kubectl --kubeconfig=/etc/kubernetes/admin.conf create -f https://docs.projectcalico.org/v3.18/manifests/calico.yaml >/dev/null 2>&1 37 | 38 | echo "[TASK 4] Generate and save cluster join command to /joincluster.sh" 39 | kubeadm token create --print-join-command > /root/joincluster.sh 2>/dev/null 40 | -------------------------------------------------------------------------------- /ubuntu/bootstrap_kworker.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | echo "[TASK 1] Join node to Kubernetes Cluster" 4 | apt install -qq -y sshpass >/dev/null 2>&1 5 | sshpass -p "kubeadmin" scp -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no kmaster.k8s.com:/root/joincluster.sh /root/joincluster.sh 2>/dev/null 6 | bash /root/joincluster.sh >/dev/null 2>&1 7 | --------------------------------------------------------------------------------
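
A minimal usage sketch for the Dashboard and Metrics Server manifests dumped above; it is not part of the repository's scripts and assumes kubectl is already pointed at the cluster (for example via /etc/kubernetes/admin.conf on kmaster) and that this Kubernetes release still auto-creates a token Secret for the admin-user ServiceAccount defined in kubernetes-dashboard.yaml:

```bash
# Print the bearer token for logging in to the Kubernetes Dashboard as admin-user.
kubectl -n kubernetes-dashboard get secret \
  "$(kubectl -n kubernetes-dashboard get serviceaccount admin-user -o jsonpath='{.secrets[0].name}')" \
  -o jsonpath='{.data.token}' | base64 --decode; echo

# Forward the dashboard Service to the local machine, then open https://localhost:8443/ and paste the token.
kubectl -n kubernetes-dashboard port-forward service/kubernetes-dashboard 8443:443

# Sanity-check the Metrics Server deployed from metrics/metrics.yaml.
kubectl top nodes
```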