├── .gitignore ├── 00-create-eks-cluster ├── 01-AmazonEKSAdminPolicy.json ├── 02-demo-cluster.yaml └── README.md ├── 01-deploy-sample-application ├── 01-deployment.yaml ├── 02-service.yaml ├── 03-update-deployment.yaml └── README.md ├── 02-deploy-prometheus ├── README.md └── values.yaml ├── 03-deploy-grafana ├── README.md └── values.yaml ├── 04-deploy-argocd ├── README.md └── applications │ ├── grafana │ └── grafana.yaml │ └── prometheus │ └── prometheus.yaml ├── 05-deploy-EFK ├── README.md ├── elasticsearch.yaml ├── filebeat.yaml └── kibana.yaml ├── 06-deploy-keda └── README.md └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | kubeconfig_* 2 | .idea 3 | .DS_Store -------------------------------------------------------------------------------- /00-create-eks-cluster/01-AmazonEKSAdminPolicy.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": "2012-10-17", 3 | "Statement": [ 4 | { 5 | "Effect": "Allow", 6 | "Action": [ 7 | "eks:*" 8 | ], 9 | "Resource": "*" 10 | }, 11 | { 12 | "Effect": "Allow", 13 | "Action": "iam:PassRole", 14 | "Resource": "*", 15 | "Condition": { 16 | "StringEquals": { 17 | "iam:PassedToService": "eks.amazonaws.com" 18 | } 19 | } 20 | } 21 | ] 22 | } 23 | -------------------------------------------------------------------------------- /00-create-eks-cluster/02-demo-cluster.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | apiVersion: eksctl.io/v1alpha5 3 | kind: ClusterConfig 4 | metadata: 5 | name: my-demo-cluster 6 | region: us-west-2 7 | addons: 8 | - name: vpc-cni 9 | version: latest 10 | resolveConflicts: overwrite 11 | - name: coredns 12 | version: latest 13 | configurationValues: "{\"replicaCount\":3}" 14 | resolveConflicts: overwrite 15 | - name: aws-ebs-csi-driver 16 | version: latest 17 | resolveConflicts: overwrite 18 | - name: kube-proxy 19 | version: latest 20 | resolveConflicts: overwrite 21 | managedNodeGroups: 22 | - name: my-demo-workers 23 | labels: { role: workers } 24 | instanceType: t3.large 25 | volumeSize: 100 26 | privateNetworking: true 27 | desiredCapacity: 2 28 | minSize: 1 29 | maxSize: 4 -------------------------------------------------------------------------------- /00-create-eks-cluster/README.md: -------------------------------------------------------------------------------- 1 | # deploy-eks-cluster 2 | 3 | This example is based on eksctl which is a simple CLI tool for creating and managing clusters on EKS.EKS Clusters can be deployed and managed with a number of solutions including Terraform, Cloudformation,AWS Console and AWS CLI. 4 | 5 | ## Prerequisites 6 | 7 | - An active AWS account 8 | - VPC - eksctl creates a new vpc named eksctl-my-demo-cluster-cluster/VPC in the target region (if you need to use custom vpc configuration then refer to [link](https://eksctl.io/usage/creating-and-managing-clusters/#:~:text=If%20you%20needed%20to%20use%20an%20existing%20VPC%2C%20you%20can%20use%20a%20config%20file%20like%20this%3A)) 9 | - IAM permissions – The IAM security principal that you're using must have permissions to work with Amazon EKS IAM roles and service-linked roles, AWS CloudFormation, and a VPC and related resources. 
10 | - Install [kubectl](https://kubernetes.io/docs/tasks/tools/),[eksctl](https://eksctl.io/introduction/?h=install#installation) and [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) in your local machine or in the CICD setup 11 | 12 | ### IAM Setup 13 | 14 | - For this setup, create an IAM policy name AmazonEKSAdminPolicy with the policy details in `01-AmazonEKSAdminPolicy.json` and attach the policy to the principal creating the cluster. 15 | 16 | ### Creating an EKS cluster 17 | 18 | The eksctl tool uses CloudFormation under the hood, creating one stack for the EKS master control plane and another stack for the worker nodes. Refer to [eksctl](https://eksctl.io/introduction/) for all avaiable configuration options. 19 | 20 | Run the command below to create a new cluster in the `us-west-2` region; expect this to take around 20 minutes and refer to eksctl documentation for all available options to customize the cluster configurations. 21 | 22 | eksctl create cluster \ 23 | --version 1.23 \ 24 | --region us-west-2 \ 25 | --node-type t3.medium \ 26 | --nodes 3 \ 27 | --nodes-min 1 \ 28 | --nodes-max 4 \ 29 | --name my-demo-cluster 30 | 31 | As an alternative, you can also use YAML, as sort of DSL (domain specific language) script for creating Kubernetes clusters with EKS. 32 | 33 | Run the below command to create the cluster (expect this to take around 20 minutes): 34 | 35 | eksctl create cluster -f 02-demo-cluster.yaml 36 | 37 | Sample log: 38 | 39 | ``` 40 | ❯❯ eksctl create cluster -f 02-demo-cluster.yaml 41 | 2023-01-15 10:52:38 [ℹ] eksctl version 0.124.0-dev+ac917eb50.2022-12-23T08:05:44Z 42 | 2023-01-15 10:52:38 [ℹ] using region us-west-2 43 | 2023-01-15 10:52:39 [ℹ] setting availability zones to [us-west-2c us-west-2a us-west-2b] 44 | 2023-01-15 10:52:39 [ℹ] subnets for us-west-2c - public:192.168.0.0/19 private:192.168.96.0/19 45 | 2023-01-15 10:52:39 [ℹ] subnets for us-west-2a - public:192.168.32.0/19 private:192.168.128.0/19 46 | 2023-01-15 10:52:39 [ℹ] subnets for us-west-2b - public:192.168.64.0/19 private:192.168.160.0/19 47 | 2023-01-15 10:52:40 [ℹ] nodegroup "my-demo-workers" will use "ami-0d453cab46e7202b2" [AmazonLinux2/1.23] 48 | 2023-01-15 10:52:40 [ℹ] using Kubernetes version 1.23 49 | 2023-01-15 10:52:40 [ℹ] creating EKS cluster "my-demo-cluster" in "us-west-2" region with un-managed nodes 50 | 2023-01-15 10:52:40 [ℹ] 1 nodegroup (my-demo-workers) was included (based on the include/exclude rules) 51 | 2023-01-15 10:52:40 [ℹ] will create a CloudFormation stack for cluster itself and 1 nodegroup stack(s) 52 | 2023-01-15 10:52:40 [ℹ] will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s) 53 | 2023-01-15 10:52:40 [ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-west-2 --cluster=my-demo-cluster' 54 | 2023-01-15 10:52:40 [ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "my-demo-cluster" in "us-west-2" 55 | 2023-01-15 10:52:40 [ℹ] CloudWatch logging will not be enabled for cluster "my-demo-cluster" in "us-west-2" 56 | 2023-01-15 10:52:40 [ℹ] you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. 
all)} --region=us-west-2 --cluster=my-demo-cluster' 57 | 2023-01-15 10:52:40 [ℹ] 58 | 2 sequential tasks: { create cluster control plane "my-demo-cluster", 59 | 2 sequential sub-tasks: { 60 | wait for control plane to become ready, 61 | create nodegroup "my-demo-workers", 62 | } 63 | } 64 | 2023-01-15 10:52:40 [ℹ] building cluster stack "eksctl-my-demo-cluster-cluster" 65 | 2023-01-15 10:52:42 [ℹ] deploying stack "eksctl-my-demo-cluster-cluster" 66 | 2023-01-15 10:53:12 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-cluster" 67 | 2023-01-15 10:53:44 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-cluster" 68 | 2023-01-15 10:54:45 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-cluster" 69 | 2023-01-15 10:55:46 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-cluster" 70 | 2023-01-15 10:56:47 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-cluster" 71 | 2023-01-15 10:57:48 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-cluster" 72 | 2023-01-15 10:58:49 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-cluster" 73 | 2023-01-15 10:59:50 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-cluster" 74 | 2023-01-15 11:00:51 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-cluster" 75 | 2023-01-15 11:01:53 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-cluster" 76 | 2023-01-15 11:02:54 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-cluster" 77 | 2023-01-15 11:03:55 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-cluster" 78 | 2023-01-15 11:06:02 [ℹ] building nodegroup stack "eksctl-my-demo-cluster-nodegroup-my-demo-workers" 79 | 2023-01-15 11:06:04 [ℹ] deploying stack "eksctl-my-demo-cluster-nodegroup-my-demo-workers" 80 | 2023-01-15 11:06:04 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-nodegroup-my-demo-workers" 81 | 2023-01-15 11:06:35 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-nodegroup-my-demo-workers" 82 | 2023-01-15 11:07:11 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-nodegroup-my-demo-workers" 83 | 2023-01-15 11:07:50 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-nodegroup-my-demo-workers" 84 | 2023-01-15 11:08:36 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-nodegroup-my-demo-workers" 85 | 2023-01-15 11:09:15 [ℹ] waiting for CloudFormation stack "eksctl-my-demo-cluster-nodegroup-my-demo-workers" 86 | 2023-01-15 11:09:15 [ℹ] waiting for the control plane to become ready 87 | 2023-01-15 11:09:16 [✔] saved kubeconfig as "/Users/chimbu/.kube/config" 88 | 2023-01-15 11:09:16 [ℹ] no tasks 89 | 2023-01-15 11:09:16 [✔] all EKS cluster resources for "my-demo-cluster" have been created 90 | 2023-01-15 11:09:17 [ℹ] adding identity "arn:aws:iam::317630533282:role/eksctl-my-demo-cluster-nodegroup-NodeInstanceRole-14J48FWWCMCO7" to auth ConfigMap 91 | 2023-01-15 11:09:17 [ℹ] nodegroup "my-demo-workers" has 0 node(s) 92 | 2023-01-15 11:09:17 [ℹ] waiting for at least 1 node(s) to become ready in "my-demo-workers" 93 | 2023-01-15 11:10:05 [ℹ] nodegroup "my-demo-workers" has 4 node(s) 94 | 2023-01-15 11:10:05 [ℹ] node "ip-192-168-51-22.us-west-2.compute.internal" is ready 95 | 2023-01-15 11:10:05 [ℹ] node "ip-192-168-62-41.us-west-2.compute.internal" is not ready 96 | 2023-01-15 11:10:05 [ℹ] node "ip-192-168-8-29.us-west-2.compute.internal" is not ready 97 | 2023-01-15 11:10:05 [ℹ] node "ip-192-168-84-235.us-west-2.compute.internal" is not ready 98 | 2023-01-15 11:10:07 [ℹ] 
kubectl command should work with "/Users/chimbu/.kube/config", try 'kubectl get nodes' 99 | 2023-01-15 11:10:07 [✔] EKS cluster "my-demo-cluster" in "us-west-2" region is ready 100 | 101 | ``` 102 | 103 | Screenshot 2023-01-15 at 11 12 17 104 | 105 | Run the below command to destroy the cluster (expect this to take around 20 minutes): 106 | 107 | eksctl delete cluster -f 02-demo-cluster.yaml 108 | 109 | eksctl automatically updates the kubeconfig with the cluster configurations. Run the below command to verify the cluster connecivity 110 | 111 | kubectl get pods --all-namespaces 112 | 113 | Sample output: 114 | 115 | ``` 116 | ❯❯ kubectl get pods --all-namespaces 117 | NAMESPACE NAME READY STATUS RESTARTS AGE 118 | default nginx 1/1 Running 0 36h 119 | elastic-system elastic-operator-0 1/1 Running 0 36h 120 | kube-system aws-node-26f7k 1/1 Running 0 37h 121 | kube-system aws-node-8x2fh 1/1 Running 0 37h 122 | kube-system aws-node-nsqjc 1/1 Running 0 37h 123 | kube-system coredns-57ff979f67-m2hlh 1/1 Running 0 37h 124 | kube-system coredns-57ff979f67-qvxqx 1/1 Running 0 37h 125 | kube-system ebs-csi-controller-6d4b84cd85-kfjz4 6/6 Running 0 36h 126 | kube-system ebs-csi-controller-6d4b84cd85-vvjlf 6/6 Running 0 36h 127 | kube-system ebs-csi-node-d9hkt 3/3 Running 0 36h 128 | kube-system ebs-csi-node-pv688 3/3 Running 0 36h 129 | kube-system ebs-csi-node-wmmq4 3/3 Running 0 36h 130 | kube-system kube-proxy-9hxsh 1/1 Running 0 37h 131 | kube-system kube-proxy-9jlqz 1/1 Running 0 37h 132 | kube-system kube-proxy-dgtgv 1/1 Running 0 37h 133 | ``` 134 | -------------------------------------------------------------------------------- /01-deploy-sample-application/01-deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: eks-sample-deployment 5 | namespace: eks-sample-app 6 | labels: 7 | app: eks-sample-app 8 | spec: 9 | replicas: 3 10 | selector: 11 | matchLabels: 12 | app: eks-sample-app 13 | template: 14 | metadata: 15 | labels: 16 | app: eks-sample-app 17 | spec: 18 | containers: 19 | - name: eks-sample-app 20 | image: nginx:1.22.1 21 | ports: 22 | - name: http 23 | containerPort: 80 -------------------------------------------------------------------------------- /01-deploy-sample-application/02-service.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | apiVersion: v1 3 | kind: Service 4 | metadata: 5 | name: eks-sample-service 6 | namespace: eks-sample-app 7 | labels: 8 | app: eks-sample-app 9 | spec: 10 | type: LoadBalancer 11 | selector: 12 | app: eks-sample-app 13 | ports: 14 | - protocol: TCP 15 | port: 80 16 | targetPort: 80 17 | -------------------------------------------------------------------------------- /01-deploy-sample-application/03-update-deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: eks-sample-deployment 5 | namespace: eks-sample-app 6 | labels: 7 | app: eks-sample-app 8 | spec: 9 | replicas: 3 10 | selector: 11 | matchLabels: 12 | app: eks-sample-app 13 | template: 14 | metadata: 15 | labels: 16 | app: eks-sample-app 17 | spec: 18 | containers: 19 | - name: eks-sample-app 20 | image: nginx:1.23.3 21 | ports: 22 | - name: http 23 | containerPort: 80 -------------------------------------------------------------------------------- /01-deploy-sample-application/README.md: 
-------------------------------------------------------------------------------- 1 | # deploy-sample-application 2 | 3 | Follow these instructions after the EKS cluster configuration is complete. We will deploy a few core Kubernetes objects and provision an external load balancer to access the application from outside the EKS cluster. 4 | 5 | ## Prerequisites 6 | 7 | - An active EKS cluster, with kubectl configured to point at it 8 | - Clone this repo to your local machine and set the current working directory to the cloned repo 9 | 10 | ## Create a Namespace 11 | 12 | In Kubernetes, namespaces provide a mechanism for isolating groups of resources within a single cluster. Names of resources need to be unique within a namespace, but not across namespaces. Run the command below to create the namespace: 13 | 14 | kubectl create namespace eks-sample-app 15 | 16 | ## Create a Kubernetes deployment 17 | 18 | A Kubernetes Deployment tells Kubernetes how to create or modify instances of the pods that hold a containerized application. Deployments can help to efficiently scale the number of replica pods, enable the rollout of updated code in a controlled manner, or roll back to an earlier deployment version if necessary. To learn more, see Deployments in the Kubernetes documentation. 19 | 20 | Apply the deployment manifest to your cluster. 21 | 22 | kubectl apply -f 01-deployment.yaml 23 | 24 | Review the deployment configuration. 25 | 26 | kubectl describe deployments.apps --namespace eks-sample-app eks-sample-deployment 27 | 28 | 29 | ## Create a service 30 | 31 | A service allows you to access all replicas through a single IP address or name. For more information, see Service in the Kubernetes documentation. 32 | 33 | There are different types of Service objects, and the one we want to use for testing is called LoadBalancer, which means an external load balancer. Amazon EKS supports the LoadBalancer type using the Classic Elastic Load Balancer (ELB). EKS automatically provisions and de-provisions an ELB when we create and destroy Service objects. 34 | 35 | Apply the service manifest to your cluster. 36 | 37 | kubectl apply -f 02-service.yaml 38 | 39 | View all resources that exist in the eks-sample-app namespace. 40 | 41 | kubectl get all --namespace eks-sample-app 42 | 43 | Screenshot 2023-01-15 at 11 07 31 44 | 45 | You can see that AWS automatically provisioned an external load balancer for the LoadBalancer service type, and you can access the application from outside the cluster using the DNS name shown under the EXTERNAL-IP field. 46 | 47 | Screenshot 2023-01-15 at 11 07 37 48 | 49 | 50 | ## Deploy a new application version 51 | 52 | In Kubernetes, you can easily deploy a new version of an existing deployment by updating the image details. 53 | 54 | Apply the `03-update-deployment.yaml` deployment manifest to your cluster. 55 | 56 | kubectl apply -f 03-update-deployment.yaml 57 | 58 | By default, Kubernetes performs a rolling update, creating a new ReplicaSet and pods to minimize downtime during the upgrade. 59 | Review the deployment configuration and verify the image details. 60 | 61 | kubectl describe deployments.apps --namespace eks-sample-app eks-sample-deployment 62 | 63 | Once you're finished with the sample application, you can remove the sample namespace, service, and deployment with the following command.
64 | 65 | kubectl delete namespace eks-sample-app 66 | -------------------------------------------------------------------------------- /02-deploy-prometheus/README.md: -------------------------------------------------------------------------------- 1 | # Deploy prometheus 2 | 3 | [Prometheus](https://prometheus.io/), a [Cloud Native Computing Foundation](https://cncf.io/) project, is a popular open-source monitoring and alerting solution optimized for container environments. 4 | 5 | It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true. 6 | 7 | Follow the instructions in this document to deploy a self-managed Prometheus in an EKS cluster. The instructions are based on the [Prometheus Community Kubernetes Helm Charts](https://github.com/prometheus-community/helm-charts). 8 | 9 | If you are looking for a fully managed Prometheus offering, refer to [Amazon Managed Service for Prometheus](https://aws.amazon.com/prometheus/). 10 | 11 | ## Prerequisites 12 | 13 | - Kubernetes 1.22+ 14 | - Helm 3.9+ 15 | 16 | ## Get Repository Info 17 | 18 | ```console 19 | helm repo add prometheus-community https://prometheus-community.github.io/helm-charts 20 | helm repo update 21 | ``` 22 | 23 | ## Install/Upgrade prometheus with default values 24 | 25 | ```console 26 | helm upgrade --install [RELEASE_NAME] prometheus-community/prometheus --namespace [K8S_NAMESPACE] --create-namespace --wait --debug 27 | ``` 28 | 29 | By default, this chart installs additional, dependent charts: 30 | 31 | - [alertmanager](https://github.com/prometheus-community/helm-charts/tree/main/charts/alertmanager) 32 | - [kube-state-metrics](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-state-metrics) 33 | - [prometheus-node-exporter](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-node-exporter) 34 | - [prometheus-pushgateway](https://github.com/walker-tom/helm-charts/tree/main/charts/prometheus-pushgateway) 35 | 36 | Run the following command to install Prometheus without any additional add-ons: 37 | 38 | ```console 39 | helm upgrade --install [RELEASE_NAME] prometheus-community/prometheus --set alertmanager.enabled=false --set kube-state-metrics.enabled=false --set prometheus-node-exporter.enabled=false --set prometheus-pushgateway.enabled=false --namespace [K8S_NAMESPACE] --create-namespace --wait --debug 40 | ``` 41 | 42 | The above commands install the latest chart version. Use the `--version` argument to install a specific version of the prometheus chart: 43 | 44 | ```console 45 | helm upgrade --install [RELEASE_NAME] prometheus-community/prometheus --namespace [K8S_NAMESPACE] --version 18.0.0 --create-namespace --wait --debug 46 | ``` 47 | 48 | ## Install/Upgrade prometheus with custom values 49 | 50 | - Create a `values.yaml` file with custom Helm chart inputs. Refer to the `values.yaml` file in this repo for sample configurations. 51 | 52 | - Refer to the [official prometheus chart](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus) for the most recent configuration options. A minimal example of custom overrides is shown below.
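As an illustration, a minimal set of custom overrides might look like the sketch below. The keys used here (`server.retention`, `server.resources`, and `alertmanager.persistence.size`) all appear in this repo's sample `values.yaml`; the specific values are placeholders, so adjust them to your environment.

```yaml
# Illustrative custom overrides for the prometheus chart (placeholder values).
server:
  # Keep metrics for 30 days instead of the chart default of 15d.
  retention: "30d"
  # Resource sizing is environment-specific; these numbers are only examples.
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 500m
      memory: 512Mi

alertmanager:
  persistence:
    # A larger persistent volume than the 2Gi used in this repo's values.yaml.
    size: 5Gi
```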
53 | 54 | Run the following command to install Prometheus with custom configurations: 55 | 56 | ```console 57 | helm upgrade --install [RELEASE_NAME] prometheus-community/prometheus --namespace [K8S_NAMESPACE] -f values.yaml --create-namespace --wait --debug 58 | ``` 59 | 60 | ## Scraping Pod Metrics 61 | 62 | This chart uses a default configuration that causes Prometheus to scrape a variety of [kubernetes resource types](https://github.com/prometheus-community/helm-charts/blob/main/charts/prometheus/values.yaml#L614), provided they have the correct annotations. 63 | 64 | To get Prometheus to scrape pods, you must add annotations to the required pods as shown below: 65 | 66 | ```yaml 67 | metadata: 68 | annotations: 69 | prometheus.io/scrape: "true" 70 | prometheus.io/path: /metrics 71 | prometheus.io/port: "8080" 72 | ``` 73 | 74 | You should adjust `prometheus.io/path` based on the URL that your pod serves metrics from. `prometheus.io/port` should be set to the port that your pod serves metrics from. Note that the values for `prometheus.io/scrape` and `prometheus.io/port` must be enclosed in double quotes. 75 | 76 | ## View/Query Pod Metrics 77 | 78 | This chart creates a `prometheus-server` service of type `ClusterIP`, which is accessible only inside the cluster. Change the [service type](https://github.com/prometheus-community/helm-charts/blob/main/charts/prometheus/values.yaml#L562) to `LoadBalancer` if you want to access Prometheus from outside the cluster. 79 | 80 | Implement [basic-auth](https://prometheus.io/docs/guides/basic-auth/) and IP restrictions if you are exposing Prometheus outside the cluster. 81 | 82 | Run the following `kubectl port-forward` command to connect to prometheus-server and go to `localhost:8080` in the browser. 83 | 84 | ```console 85 | kubectl port-forward --namespace [K8S_NAMESPACE] svc/prometheus-server 8080:80 86 | ``` 87 | 88 | Query the required metrics in the Prometheus UI. 89 | 90 | Screenshot 2023-01-26 at 14 53 24 91 | 92 | Screenshot 2023-01-26 at 14 55 32 93 | -------------------------------------------------------------------------------- /02-deploy-prometheus/values.yaml: -------------------------------------------------------------------------------- 1 | rbac: 2 | create: true 3 | 4 | podSecurityPolicy: 5 | enabled: false 6 | 7 | imagePullSecrets: 8 | # - name: "image-pull-secret" 9 | 10 | ## Define serviceAccount names for components. Defaults to component's fully qualified name. 11 | ## 12 | serviceAccounts: 13 | server: 14 | create: true 15 | name: 16 | annotations: {} 17 | 18 | ## Monitors ConfigMap changes and POSTs to a URL 19 | ## Ref: https://github.com/jimmidyson/configmap-reload 20 | ## 21 | configmapReload: 22 | prometheus: 23 | ## If false, the configmap-reload container will not be deployed 24 | ## 25 | enabled: true 26 | 27 | ## configmap-reload container name 28 | ## 29 | name: configmap-reload 30 | 31 | ## configmap-reload container image 32 | ## 33 | image: 34 | repository: jimmidyson/configmap-reload 35 | tag: v0.8.0 36 | # When digest is set to a non-empty value, images will be pulled by digest (regardless of tag value).
37 | digest: "" 38 | pullPolicy: IfNotPresent 39 | 40 | # containerPort: 9533 41 | 42 | ## Additional configmap-reload container arguments 43 | ## 44 | extraArgs: {} 45 | ## Additional configmap-reload volume directories 46 | ## 47 | extraVolumeDirs: [] 48 | 49 | 50 | ## Additional configmap-reload mounts 51 | ## 52 | extraConfigmapMounts: [] 53 | # - name: prometheus-alerts 54 | # mountPath: /etc/alerts.d 55 | # subPath: "" 56 | # configMap: prometheus-alerts 57 | # readOnly: true 58 | 59 | ## Security context to be added to configmap-reload container 60 | containerSecurityContext: {} 61 | 62 | ## configmap-reload resource requests and limits 63 | ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/ 64 | ## 65 | resources: {} 66 | 67 | server: 68 | ## Prometheus server container name 69 | ## 70 | name: server 71 | 72 | ## Use a ClusterRole (and ClusterRoleBinding) 73 | ## - If set to false - we define a RoleBinding in the defined namespaces ONLY 74 | ## 75 | ## NB: because we need a Role with nonResourceURL's ("/metrics") - you must get someone with Cluster-admin privileges to define this role for you, before running with this setting enabled. 76 | ## This makes prometheus work - for users who do not have ClusterAdmin privs, but wants prometheus to operate on their own namespaces, instead of clusterwide. 77 | ## 78 | ## You MUST also set namespaces to the ones you have access to and want monitored by Prometheus. 79 | ## 80 | # useExistingClusterRoleName: nameofclusterrole 81 | 82 | ## namespaces to monitor (instead of monitoring all - clusterwide). Needed if you want to run without Cluster-admin privileges. 83 | # namespaces: 84 | # - yournamespace 85 | 86 | # sidecarContainers - add more containers to prometheus server 87 | # Key/Value where Key is the sidecar `- name: ` 88 | # Example: 89 | # sidecarContainers: 90 | # webserver: 91 | # image: nginx 92 | sidecarContainers: {} 93 | 94 | # sidecarTemplateValues - context to be used in template for sidecarContainers 95 | # Example: 96 | # sidecarTemplateValues: *your-custom-globals 97 | # sidecarContainers: 98 | # webserver: |- 99 | # {{ include "webserver-container-template" . }} 100 | # Template for `webserver-container-template` might looks like this: 101 | # image: "{{ .Values.server.sidecarTemplateValues.repository }}:{{ .Values.server.sidecarTemplateValues.tag }}" 102 | # ... 103 | # 104 | sidecarTemplateValues: {} 105 | 106 | ## Prometheus server container image 107 | ## 108 | image: 109 | repository: quay.io/prometheus/prometheus 110 | # if not set appVersion field from Chart.yaml is used 111 | tag: "" 112 | # When digest is set to a non-empty value, images will be pulled by digest (regardless of tag value). 113 | digest: "" 114 | pullPolicy: IfNotPresent 115 | 116 | ## prometheus server priorityClassName 117 | ## 118 | priorityClassName: "" 119 | 120 | ## EnableServiceLinks indicates whether information about services should be injected 121 | ## into pod's environment variables, matching the syntax of Docker links. 122 | ## WARNING: the field is unsupported and will be skipped in K8s prior to v1.13.0. 123 | ## 124 | enableServiceLinks: true 125 | 126 | ## The URL prefix at which the container can be accessed. Useful in the case the '-web.external-url' includes a slug 127 | ## so that the various internal URLs are still able to access as they are in the default case. 
128 | ## (Optional) 129 | prefixURL: "" 130 | 131 | ## External URL which can access prometheus 132 | ## Maybe same with Ingress host name 133 | baseURL: "" 134 | 135 | ## Additional server container environment variables 136 | ## 137 | ## You specify this manually like you would a raw deployment manifest. 138 | ## This means you can bind in environment variables from secrets. 139 | ## 140 | ## e.g. static environment variable: 141 | ## - name: DEMO_GREETING 142 | ## value: "Hello from the environment" 143 | ## 144 | ## e.g. secret environment variable: 145 | ## - name: USERNAME 146 | ## valueFrom: 147 | ## secretKeyRef: 148 | ## name: mysecret 149 | ## key: username 150 | env: [] 151 | 152 | # List of flags to override default parameters, e.g: 153 | # - --enable-feature=agent 154 | # - --storage.agent.retention.max-time=30m 155 | defaultFlagsOverride: [] 156 | 157 | extraFlags: 158 | - web.enable-lifecycle 159 | ## web.enable-admin-api flag controls access to the administrative HTTP API which includes functionality such as 160 | ## deleting time series. This is disabled by default. 161 | # - web.enable-admin-api 162 | ## 163 | ## storage.tsdb.no-lockfile flag controls BD locking 164 | # - storage.tsdb.no-lockfile 165 | ## 166 | ## storage.tsdb.wal-compression flag enables compression of the write-ahead log (WAL) 167 | # - storage.tsdb.wal-compression 168 | 169 | ## Path to a configuration file on prometheus server container FS 170 | configPath: /etc/config/prometheus.yml 171 | 172 | ### The data directory used by prometheus to set --storage.tsdb.path 173 | ### When empty server.persistentVolume.mountPath is used instead 174 | storagePath: "" 175 | 176 | global: 177 | ## How frequently to scrape targets by default 178 | ## 179 | scrape_interval: 1m 180 | ## How long until a scrape request times out 181 | ## 182 | scrape_timeout: 10s 183 | ## How frequently to evaluate rules 184 | ## 185 | evaluation_interval: 1m 186 | ## https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write 187 | ## 188 | remoteWrite: [] 189 | ## https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_read 190 | ## 191 | remoteRead: [] 192 | 193 | ## Custom HTTP headers for Liveness/Readiness/Startup Probe 194 | ## 195 | ## Useful for providing HTTP Basic Auth to healthchecks 196 | probeHeaders: [] 197 | 198 | ## Additional Prometheus server container arguments 199 | ## 200 | extraArgs: {} 201 | 202 | ## Additional InitContainers to initialize the pod 203 | ## 204 | extraInitContainers: [] 205 | 206 | ## Additional Prometheus server Volume mounts 207 | ## 208 | extraVolumeMounts: [] 209 | 210 | ## Additional Prometheus server Volumes 211 | ## 212 | extraVolumes: [] 213 | 214 | ## Additional Prometheus server hostPath mounts 215 | ## 216 | extraHostPathMounts: [] 217 | # - name: certs-dir 218 | # mountPath: /etc/kubernetes/certs 219 | # subPath: "" 220 | # hostPath: /etc/kubernetes/certs 221 | # readOnly: true 222 | 223 | extraConfigmapMounts: [] 224 | # - name: certs-configmap 225 | # mountPath: /prometheus 226 | # subPath: "" 227 | # configMap: certs-configmap 228 | # readOnly: true 229 | 230 | ## Additional Prometheus server Secret mounts 231 | # Defines additional mounts with secrets. Secrets must be manually created in the namespace. 
232 | extraSecretMounts: [] 233 | # - name: secret-files 234 | # mountPath: /etc/secrets 235 | # subPath: "" 236 | # secretName: prom-secret-files 237 | # readOnly: true 238 | 239 | ## ConfigMap override where fullname is {{.Release.Name}}-{{.Values.server.configMapOverrideName}} 240 | ## Defining configMapOverrideName will cause templates/server-configmap.yaml 241 | ## to NOT generate a ConfigMap resource 242 | ## 243 | configMapOverrideName: "" 244 | 245 | ## Extra labels for Prometheus server ConfigMap (ConfigMap that holds serverFiles) 246 | extraConfigmapLabels: {} 247 | 248 | ingress: 249 | ## If true, Prometheus server Ingress will be created 250 | ## 251 | enabled: false 252 | 253 | # For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName 254 | # See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress 255 | # ingressClassName: nginx 256 | 257 | ## Prometheus server Ingress annotations 258 | ## 259 | annotations: {} 260 | # kubernetes.io/ingress.class: nginx 261 | # kubernetes.io/tls-acme: 'true' 262 | 263 | ## Prometheus server Ingress additional labels 264 | ## 265 | extraLabels: {} 266 | 267 | ## Prometheus server Ingress hostnames with optional path 268 | ## Must be provided if Ingress is enabled 269 | ## 270 | hosts: [] 271 | # - prometheus.domain.com 272 | # - domain.com/prometheus 273 | 274 | path: / 275 | 276 | # pathType is only for k8s >= 1.18 277 | pathType: Prefix 278 | 279 | ## Extra paths to prepend to every host configuration. This is useful when working with annotation based services. 280 | extraPaths: [] 281 | # - path: /* 282 | # backend: 283 | # serviceName: ssl-redirect 284 | # servicePort: use-annotation 285 | 286 | ## Prometheus server Ingress TLS configuration 287 | ## Secrets must be manually created in the namespace 288 | ## 289 | tls: [] 290 | # - secretName: prometheus-server-tls 291 | # hosts: 292 | # - prometheus.domain.com 293 | 294 | ## Server Deployment Strategy type 295 | strategy: 296 | type: Recreate 297 | 298 | ## hostAliases allows adding entries to /etc/hosts inside the containers 299 | hostAliases: [] 300 | # - ip: "127.0.0.1" 301 | # hostnames: 302 | # - "example.com" 303 | 304 | ## Node tolerations for server scheduling to nodes with taints 305 | ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/ 306 | ## 307 | tolerations: [] 308 | # - key: "key" 309 | # operator: "Equal|Exists" 310 | # value: "value" 311 | # effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)" 312 | 313 | ## Node labels for Prometheus server pod assignment 314 | ## Ref: https://kubernetes.io/docs/user-guide/node-selection/ 315 | ## 316 | nodeSelector: {} 317 | 318 | ## Pod affinity 319 | ## 320 | affinity: {} 321 | 322 | ## PodDisruptionBudget settings 323 | ## ref: https://kubernetes.io/docs/concepts/workloads/pods/disruptions/ 324 | ## 325 | podDisruptionBudget: 326 | enabled: false 327 | maxUnavailable: 1 328 | 329 | ## Use an alternate scheduler, e.g. "stork". 
330 | ## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/ 331 | ## 332 | # schedulerName: 333 | 334 | persistentVolume: 335 | ## If true, Prometheus server will create/use a Persistent Volume Claim 336 | ## If false, use emptyDir 337 | ## 338 | enabled: true 339 | 340 | ## Prometheus server data Persistent Volume access modes 341 | ## Must match those of existing PV or dynamic provisioner 342 | ## Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/ 343 | ## 344 | accessModes: 345 | - ReadWriteOnce 346 | 347 | ## Prometheus server data Persistent Volume labels 348 | ## 349 | labels: {} 350 | 351 | ## Prometheus server data Persistent Volume annotations 352 | ## 353 | annotations: {} 354 | 355 | ## Prometheus server data Persistent Volume existing claim name 356 | ## Requires server.persistentVolume.enabled: true 357 | ## If defined, PVC must be created manually before volume will be bound 358 | existingClaim: "" 359 | 360 | ## Prometheus server data Persistent Volume mount root path 361 | ## 362 | mountPath: /data 363 | 364 | ## Prometheus server data Persistent Volume size 365 | ## 366 | size: 8Gi 367 | 368 | ## Prometheus server data Persistent Volume Storage Class 369 | ## If defined, storageClassName: 370 | ## If set to "-", storageClassName: "", which disables dynamic provisioning 371 | ## If undefined (the default) or set to null, no storageClassName spec is 372 | ## set, choosing the default provisioner. (gp2 on AWS, standard on 373 | ## GKE, AWS & OpenStack) 374 | ## 375 | # storageClass: "-" 376 | 377 | ## Prometheus server data Persistent Volume Binding Mode 378 | ## If defined, volumeBindingMode: 379 | ## If undefined (the default) or set to null, no volumeBindingMode spec is 380 | ## set, choosing the default mode. 
381 | ## 382 | # volumeBindingMode: "" 383 | 384 | ## Subdirectory of Prometheus server data Persistent Volume to mount 385 | ## Useful if the volume's root directory is not empty 386 | ## 387 | subPath: "" 388 | 389 | ## Persistent Volume Claim Selector 390 | ## Useful if Persistent Volumes have been provisioned in advance 391 | ## Ref: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#selector 392 | ## 393 | # selector: 394 | # matchLabels: 395 | # release: "stable" 396 | # matchExpressions: 397 | # - { key: environment, operator: In, values: [ dev ] } 398 | 399 | ## Persistent Volume Name 400 | ## Useful if Persistent Volumes have been provisioned in advance and you want to use a specific one 401 | ## 402 | # volumeName: "" 403 | 404 | emptyDir: 405 | ## Prometheus server emptyDir volume size limit 406 | ## 407 | sizeLimit: "" 408 | 409 | ## Annotations to be added to Prometheus server pods 410 | ## 411 | podAnnotations: {} 412 | # iam.amazonaws.com/role: prometheus 413 | 414 | ## Labels to be added to Prometheus server pods 415 | ## 416 | podLabels: {} 417 | 418 | ## Prometheus AlertManager configuration 419 | ## 420 | alertmanagers: [] 421 | 422 | ## Specify if a Pod Security Policy for node-exporter must be created 423 | ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/ 424 | ## 425 | podSecurityPolicy: 426 | annotations: {} 427 | ## Specify pod annotations 428 | ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#apparmor 429 | ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#seccomp 430 | ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#sysctl 431 | ## 432 | # seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*' 433 | # seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default' 434 | # apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default' 435 | 436 | ## Use a StatefulSet if replicaCount needs to be greater than 1 (see below) 437 | ## 438 | replicaCount: 1 439 | 440 | ## Annotations to be added to deployment 441 | ## 442 | deploymentAnnotations: {} 443 | 444 | statefulSet: 445 | ## If true, use a statefulset instead of a deployment for pod management. 
446 | ## This allows to scale replicas to more than 1 pod 447 | ## 448 | enabled: false 449 | 450 | annotations: {} 451 | labels: {} 452 | podManagementPolicy: OrderedReady 453 | 454 | ## Alertmanager headless service to use for the statefulset 455 | ## 456 | headless: 457 | annotations: {} 458 | labels: {} 459 | servicePort: 80 460 | ## Enable gRPC port on service to allow auto discovery with thanos-querier 461 | gRPC: 462 | enabled: false 463 | servicePort: 10901 464 | # nodePort: 10901 465 | 466 | ## Prometheus server readiness and liveness probe initial delay and timeout 467 | ## Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/ 468 | ## 469 | tcpSocketProbeEnabled: false 470 | probeScheme: HTTP 471 | readinessProbeInitialDelay: 30 472 | readinessProbePeriodSeconds: 5 473 | readinessProbeTimeout: 4 474 | readinessProbeFailureThreshold: 3 475 | readinessProbeSuccessThreshold: 1 476 | livenessProbeInitialDelay: 30 477 | livenessProbePeriodSeconds: 15 478 | livenessProbeTimeout: 10 479 | livenessProbeFailureThreshold: 3 480 | livenessProbeSuccessThreshold: 1 481 | startupProbe: 482 | enabled: false 483 | periodSeconds: 5 484 | failureThreshold: 30 485 | timeoutSeconds: 10 486 | 487 | ## Prometheus server resource requests and limits 488 | ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/ 489 | ## 490 | resources: {} 491 | # limits: 492 | # cpu: 500m 493 | # memory: 512Mi 494 | # requests: 495 | # cpu: 500m 496 | # memory: 512Mi 497 | 498 | # Required for use in managed kubernetes clusters (such as AWS EKS) with custom CNI (such as calico), 499 | # because control-plane managed by AWS cannot communicate with pods' IP CIDR and admission webhooks are not working 500 | ## 501 | hostNetwork: false 502 | 503 | # When hostNetwork is enabled, this will set to ClusterFirstWithHostNet automatically 504 | dnsPolicy: ClusterFirst 505 | 506 | # Use hostPort 507 | # hostPort: 9090 508 | 509 | ## Vertical Pod Autoscaler config 510 | ## Ref: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler 511 | verticalAutoscaler: 512 | ## If true a VPA object will be created for the controller (either StatefulSet or Deployemnt, based on above configs) 513 | enabled: false 514 | # updateMode: "Auto" 515 | # containerPolicies: 516 | # - containerName: 'prometheus-server' 517 | 518 | # Custom DNS configuration to be added to prometheus server pods 519 | dnsConfig: {} 520 | # nameservers: 521 | # - 1.2.3.4 522 | # searches: 523 | # - ns1.svc.cluster-domain.example 524 | # - my.dns.search.suffix 525 | # options: 526 | # - name: ndots 527 | # value: "2" 528 | # - name: edns0 529 | 530 | ## Security context to be added to server pods 531 | ## 532 | securityContext: 533 | runAsUser: 65534 534 | runAsNonRoot: true 535 | runAsGroup: 65534 536 | fsGroup: 65534 537 | 538 | ## Security context to be added to server container 539 | ## 540 | containerSecurityContext: {} 541 | 542 | service: 543 | ## If false, no Service will be created for the Prometheus server 544 | ## 545 | enabled: true 546 | 547 | annotations: {} 548 | labels: {} 549 | clusterIP: "" 550 | 551 | ## List of IP addresses at which the Prometheus server service is available 552 | ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips 553 | ## 554 | externalIPs: [] 555 | 556 | loadBalancerIP: "" 557 | loadBalancerSourceRanges: [] 558 | servicePort: 80 559 | sessionAffinity: None 560 | type: ClusterIP 561 | 562 | ## Enable gRPC port on service to allow auto 
discovery with thanos-querier 563 | gRPC: 564 | enabled: false 565 | servicePort: 10901 566 | # nodePort: 10901 567 | 568 | ## If using a statefulSet (statefulSet.enabled=true), configure the 569 | ## service to connect to a specific replica to have a consistent view 570 | ## of the data. 571 | statefulsetReplica: 572 | enabled: false 573 | replica: 0 574 | 575 | ## Prometheus server pod termination grace period 576 | ## 577 | terminationGracePeriodSeconds: 300 578 | 579 | ## Prometheus data retention period (default if not specified is 15 days) 580 | ## 581 | retention: "15d" 582 | 583 | ## Prometheus server ConfigMap entries for rule files (allow prometheus labels interpolation) 584 | ruleFiles: {} 585 | 586 | ## Prometheus server ConfigMap entries 587 | ## 588 | serverFiles: 589 | ## Alerts configuration 590 | ## Ref: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/ 591 | alerting_rules.yml: {} 592 | # groups: 593 | # - name: Instances 594 | # rules: 595 | # - alert: InstanceDown 596 | # expr: up == 0 597 | # for: 5m 598 | # labels: 599 | # severity: page 600 | # annotations: 601 | # description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.' 602 | # summary: 'Instance {{ $labels.instance }} down' 603 | ## DEPRECATED DEFAULT VALUE, unless explicitly naming your files, please use alerting_rules.yml 604 | alerts: {} 605 | 606 | ## Records configuration 607 | ## Ref: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/ 608 | recording_rules.yml: {} 609 | ## DEPRECATED DEFAULT VALUE, unless explicitly naming your files, please use recording_rules.yml 610 | rules: {} 611 | 612 | prometheus.yml: 613 | rule_files: 614 | - /etc/config/recording_rules.yml 615 | - /etc/config/alerting_rules.yml 616 | ## Below two files are DEPRECATED will be removed from this default values file 617 | - /etc/config/rules 618 | - /etc/config/alerts 619 | 620 | scrape_configs: 621 | - job_name: prometheus 622 | static_configs: 623 | - targets: 624 | - localhost:9090 625 | 626 | # A scrape configuration for running Prometheus on a Kubernetes cluster. 627 | # This uses separate scrape configs for cluster components (i.e. API server, node) 628 | # and services to allow each to use different authentication configs. 629 | # 630 | # Kubernetes labels will be added as Prometheus labels on metrics via the 631 | # `labelmap` relabeling action. 632 | 633 | # Scrape config for API servers. 634 | # 635 | # Kubernetes exposes API servers as endpoints to the default/kubernetes 636 | # service so this uses `endpoints` role and uses relabelling to only keep 637 | # the endpoints associated with the default/kubernetes service using the 638 | # default named port `https`. This works for single API server deployments as 639 | # well as HA API server deployments. 640 | - job_name: 'kubernetes-apiservers' 641 | 642 | kubernetes_sd_configs: 643 | - role: endpoints 644 | 645 | # Default to scraping over https. If required, just disable this or change to 646 | # `http`. 647 | scheme: https 648 | 649 | # This TLS & bearer token file config is used to connect to the actual scrape 650 | # endpoints for cluster components. This is separate to discovery auth 651 | # configuration because discovery & scraping are two separate concerns in 652 | # Prometheus. The discovery auth config is automatic if Prometheus runs inside 653 | # the cluster. Otherwise, more config options have to be provided within the 654 | # . 
655 | tls_config: 656 | ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt 657 | # If your node certificates are self-signed or use a different CA to the 658 | # master CA, then disable certificate verification below. Note that 659 | # certificate verification is an integral part of a secure infrastructure 660 | # so this should only be disabled in a controlled environment. You can 661 | # disable certificate verification by uncommenting the line below. 662 | # 663 | insecure_skip_verify: true 664 | bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token 665 | 666 | # Keep only the default/kubernetes service endpoints for the https port. This 667 | # will add targets for each API server which Kubernetes adds an endpoint to 668 | # the default/kubernetes service. 669 | relabel_configs: 670 | - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name] 671 | action: keep 672 | regex: default;kubernetes;https 673 | 674 | - job_name: 'kubernetes-nodes' 675 | 676 | # Default to scraping over https. If required, just disable this or change to 677 | # `http`. 678 | scheme: https 679 | 680 | # This TLS & bearer token file config is used to connect to the actual scrape 681 | # endpoints for cluster components. This is separate to discovery auth 682 | # configuration because discovery & scraping are two separate concerns in 683 | # Prometheus. The discovery auth config is automatic if Prometheus runs inside 684 | # the cluster. Otherwise, more config options have to be provided within the 685 | # . 686 | tls_config: 687 | ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt 688 | # If your node certificates are self-signed or use a different CA to the 689 | # master CA, then disable certificate verification below. Note that 690 | # certificate verification is an integral part of a secure infrastructure 691 | # so this should only be disabled in a controlled environment. You can 692 | # disable certificate verification by uncommenting the line below. 693 | # 694 | insecure_skip_verify: true 695 | bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token 696 | 697 | kubernetes_sd_configs: 698 | - role: node 699 | 700 | relabel_configs: 701 | - action: labelmap 702 | regex: __meta_kubernetes_node_label_(.+) 703 | - target_label: __address__ 704 | replacement: kubernetes.default.svc:443 705 | - source_labels: [__meta_kubernetes_node_name] 706 | regex: (.+) 707 | target_label: __metrics_path__ 708 | replacement: /api/v1/nodes/$1/proxy/metrics 709 | 710 | 711 | - job_name: 'kubernetes-nodes-cadvisor' 712 | 713 | # Default to scraping over https. If required, just disable this or change to 714 | # `http`. 715 | scheme: https 716 | 717 | # This TLS & bearer token file config is used to connect to the actual scrape 718 | # endpoints for cluster components. This is separate to discovery auth 719 | # configuration because discovery & scraping are two separate concerns in 720 | # Prometheus. The discovery auth config is automatic if Prometheus runs inside 721 | # the cluster. Otherwise, more config options have to be provided within the 722 | # . 723 | tls_config: 724 | ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt 725 | # If your node certificates are self-signed or use a different CA to the 726 | # master CA, then disable certificate verification below. 
Note that 727 | # certificate verification is an integral part of a secure infrastructure 728 | # so this should only be disabled in a controlled environment. You can 729 | # disable certificate verification by uncommenting the line below. 730 | # 731 | insecure_skip_verify: true 732 | bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token 733 | 734 | kubernetes_sd_configs: 735 | - role: node 736 | 737 | # This configuration will work only on kubelet 1.7.3+ 738 | # As the scrape endpoints for cAdvisor have changed 739 | # if you are using older version you need to change the replacement to 740 | # replacement: /api/v1/nodes/$1:4194/proxy/metrics 741 | # more info here https://github.com/coreos/prometheus-operator/issues/633 742 | relabel_configs: 743 | - action: labelmap 744 | regex: __meta_kubernetes_node_label_(.+) 745 | - target_label: __address__ 746 | replacement: kubernetes.default.svc:443 747 | - source_labels: [__meta_kubernetes_node_name] 748 | regex: (.+) 749 | target_label: __metrics_path__ 750 | replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor 751 | 752 | # Metric relabel configs to apply to samples before ingestion. 753 | # [Metric Relabeling](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#metric_relabel_configs) 754 | # metric_relabel_configs: 755 | # - action: labeldrop 756 | # regex: (kubernetes_io_hostname|failure_domain_beta_kubernetes_io_region|beta_kubernetes_io_os|beta_kubernetes_io_arch|beta_kubernetes_io_instance_type|failure_domain_beta_kubernetes_io_zone) 757 | 758 | # Scrape config for service endpoints. 759 | # 760 | # The relabeling allows the actual service scrape endpoint to be configured 761 | # via the following annotations: 762 | # 763 | # * `prometheus.io/scrape`: Only scrape services that have a value of 764 | # `true`, except if `prometheus.io/scrape-slow` is set to `true` as well. 765 | # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need 766 | # to set this to `https` & most likely set the `tls_config` of the scrape config. 767 | # * `prometheus.io/path`: If the metrics path is not `/metrics` override this. 768 | # * `prometheus.io/port`: If the metrics are exposed on a different port to the 769 | # service then set this appropriately. 770 | # * `prometheus.io/param_`: If the metrics endpoint uses parameters 771 | # then you can set any parameter 772 | - job_name: 'kubernetes-service-endpoints' 773 | honor_labels: true 774 | 775 | kubernetes_sd_configs: 776 | - role: endpoints 777 | 778 | relabel_configs: 779 | - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape] 780 | action: keep 781 | regex: true 782 | - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape_slow] 783 | action: drop 784 | regex: true 785 | - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme] 786 | action: replace 787 | target_label: __scheme__ 788 | regex: (https?) 
789 | - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path] 790 | action: replace 791 | target_label: __metrics_path__ 792 | regex: (.+) 793 | - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port] 794 | action: replace 795 | target_label: __address__ 796 | regex: (.+?)(?::\d+)?;(\d+) 797 | replacement: $1:$2 798 | - action: labelmap 799 | regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+) 800 | replacement: __param_$1 801 | - action: labelmap 802 | regex: __meta_kubernetes_service_label_(.+) 803 | - source_labels: [__meta_kubernetes_namespace] 804 | action: replace 805 | target_label: namespace 806 | - source_labels: [__meta_kubernetes_service_name] 807 | action: replace 808 | target_label: service 809 | - source_labels: [__meta_kubernetes_pod_node_name] 810 | action: replace 811 | target_label: node 812 | 813 | # Scrape config for slow service endpoints; same as above, but with a larger 814 | # timeout and a larger interval 815 | # 816 | # The relabeling allows the actual service scrape endpoint to be configured 817 | # via the following annotations: 818 | # 819 | # * `prometheus.io/scrape-slow`: Only scrape services that have a value of `true` 820 | # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need 821 | # to set this to `https` & most likely set the `tls_config` of the scrape config. 822 | # * `prometheus.io/path`: If the metrics path is not `/metrics` override this. 823 | # * `prometheus.io/port`: If the metrics are exposed on a different port to the 824 | # service then set this appropriately. 825 | # * `prometheus.io/param_`: If the metrics endpoint uses parameters 826 | # then you can set any parameter 827 | - job_name: 'kubernetes-service-endpoints-slow' 828 | honor_labels: true 829 | 830 | scrape_interval: 5m 831 | scrape_timeout: 30s 832 | 833 | kubernetes_sd_configs: 834 | - role: endpoints 835 | 836 | relabel_configs: 837 | - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape_slow] 838 | action: keep 839 | regex: true 840 | - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme] 841 | action: replace 842 | target_label: __scheme__ 843 | regex: (https?) 844 | - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path] 845 | action: replace 846 | target_label: __metrics_path__ 847 | regex: (.+) 848 | - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port] 849 | action: replace 850 | target_label: __address__ 851 | regex: (.+?)(?::\d+)?;(\d+) 852 | replacement: $1:$2 853 | - action: labelmap 854 | regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+) 855 | replacement: __param_$1 856 | - action: labelmap 857 | regex: __meta_kubernetes_service_label_(.+) 858 | - source_labels: [__meta_kubernetes_namespace] 859 | action: replace 860 | target_label: namespace 861 | - source_labels: [__meta_kubernetes_service_name] 862 | action: replace 863 | target_label: service 864 | - source_labels: [__meta_kubernetes_pod_node_name] 865 | action: replace 866 | target_label: node 867 | 868 | - job_name: 'prometheus-pushgateway' 869 | honor_labels: true 870 | 871 | kubernetes_sd_configs: 872 | - role: service 873 | 874 | relabel_configs: 875 | - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe] 876 | action: keep 877 | regex: pushgateway 878 | 879 | # Example scrape config for probing services via the Blackbox Exporter. 
880 | # 881 | # The relabeling allows the actual service scrape endpoint to be configured 882 | # via the following annotations: 883 | # 884 | # * `prometheus.io/probe`: Only probe services that have a value of `true` 885 | - job_name: 'kubernetes-services' 886 | honor_labels: true 887 | 888 | metrics_path: /probe 889 | params: 890 | module: [http_2xx] 891 | 892 | kubernetes_sd_configs: 893 | - role: service 894 | 895 | relabel_configs: 896 | - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe] 897 | action: keep 898 | regex: true 899 | - source_labels: [__address__] 900 | target_label: __param_target 901 | - target_label: __address__ 902 | replacement: blackbox 903 | - source_labels: [__param_target] 904 | target_label: instance 905 | - action: labelmap 906 | regex: __meta_kubernetes_service_label_(.+) 907 | - source_labels: [__meta_kubernetes_namespace] 908 | target_label: namespace 909 | - source_labels: [__meta_kubernetes_service_name] 910 | target_label: service 911 | 912 | # Example scrape config for pods 913 | # 914 | # The relabeling allows the actual pod scrape endpoint to be configured via the 915 | # following annotations: 916 | # 917 | # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`, 918 | # except if `prometheus.io/scrape-slow` is set to `true` as well. 919 | # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need 920 | # to set this to `https` & most likely set the `tls_config` of the scrape config. 921 | # * `prometheus.io/path`: If the metrics path is not `/metrics` override this. 922 | # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the default of `9102`. 923 | - job_name: 'kubernetes-pods' 924 | honor_labels: true 925 | 926 | kubernetes_sd_configs: 927 | - role: pod 928 | 929 | relabel_configs: 930 | - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape] 931 | action: keep 932 | regex: true 933 | - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape_slow] 934 | action: drop 935 | regex: true 936 | - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme] 937 | action: replace 938 | regex: (https?) 939 | target_label: __scheme__ 940 | - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path] 941 | action: replace 942 | target_label: __metrics_path__ 943 | regex: (.+) 944 | - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port] 945 | action: replace 946 | regex: (.+?)(?::\d+)?;(\d+) 947 | replacement: $1:$2 948 | target_label: __address__ 949 | - action: labelmap 950 | regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+) 951 | replacement: __param_$1 952 | - action: labelmap 953 | regex: __meta_kubernetes_pod_label_(.+) 954 | - source_labels: [__meta_kubernetes_namespace] 955 | action: replace 956 | target_label: namespace 957 | - source_labels: [__meta_kubernetes_pod_name] 958 | action: replace 959 | target_label: pod 960 | - source_labels: [__meta_kubernetes_pod_phase] 961 | regex: Pending|Succeeded|Failed|Completed 962 | action: drop 963 | 964 | # Example Scrape config for pods which should be scraped slower. 
An useful example 965 | # would be stackriver-exporter which queries an API on every scrape of the pod 966 | # 967 | # The relabeling allows the actual pod scrape endpoint to be configured via the 968 | # following annotations: 969 | # 970 | # * `prometheus.io/scrape-slow`: Only scrape pods that have a value of `true` 971 | # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need 972 | # to set this to `https` & most likely set the `tls_config` of the scrape config. 973 | # * `prometheus.io/path`: If the metrics path is not `/metrics` override this. 974 | # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the default of `9102`. 975 | - job_name: 'kubernetes-pods-slow' 976 | honor_labels: true 977 | 978 | scrape_interval: 5m 979 | scrape_timeout: 30s 980 | 981 | kubernetes_sd_configs: 982 | - role: pod 983 | 984 | relabel_configs: 985 | - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape_slow] 986 | action: keep 987 | regex: true 988 | - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme] 989 | action: replace 990 | regex: (https?) 991 | target_label: __scheme__ 992 | - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path] 993 | action: replace 994 | target_label: __metrics_path__ 995 | regex: (.+) 996 | - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port] 997 | action: replace 998 | regex: (.+?)(?::\d+)?;(\d+) 999 | replacement: $1:$2 1000 | target_label: __address__ 1001 | - action: labelmap 1002 | regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+) 1003 | replacement: __param_$1 1004 | - action: labelmap 1005 | regex: __meta_kubernetes_pod_label_(.+) 1006 | - source_labels: [__meta_kubernetes_namespace] 1007 | action: replace 1008 | target_label: namespace 1009 | - source_labels: [__meta_kubernetes_pod_name] 1010 | action: replace 1011 | target_label: pod 1012 | - source_labels: [__meta_kubernetes_pod_phase] 1013 | regex: Pending|Succeeded|Failed|Completed 1014 | action: drop 1015 | 1016 | # adds additional scrape configs to prometheus.yml 1017 | # must be a string so you have to add a | after extraScrapeConfigs: 1018 | # example adds prometheus-blackbox-exporter scrape config 1019 | extraScrapeConfigs: 1020 | # - job_name: 'prometheus-blackbox-exporter' 1021 | # metrics_path: /probe 1022 | # params: 1023 | # module: [http_2xx] 1024 | # static_configs: 1025 | # - targets: 1026 | # - https://example.com 1027 | # relabel_configs: 1028 | # - source_labels: [__address__] 1029 | # target_label: __param_target 1030 | # - source_labels: [__param_target] 1031 | # target_label: instance 1032 | # - target_label: __address__ 1033 | # replacement: prometheus-blackbox-exporter:9115 1034 | 1035 | # Adds option to add alert_relabel_configs to avoid duplicate alerts in alertmanager 1036 | # useful in H/A prometheus with different external labels but the same alerts 1037 | alertRelabelConfigs: 1038 | # alert_relabel_configs: 1039 | # - source_labels: [dc] 1040 | # regex: (.+)\d+ 1041 | # target_label: dc 1042 | 1043 | networkPolicy: 1044 | ## Enable creation of NetworkPolicy resources. 
1045 |   ##
1046 |   enabled: false
1047 | 
1048 | # Force namespace of namespaced resources
1049 | forceNamespace: null
1050 | 
1051 | # Extra manifests to deploy as an array
1052 | extraManifests: []
1053 | # - apiVersion: v1
1054 | #   kind: ConfigMap
1055 | #   metadata:
1056 | #     labels:
1057 | #       name: prometheus-extra
1058 | #   data:
1059 | #     extra-data: "value"
1060 | 
1061 | # Configuration of subcharts defined in Chart.yaml
1062 | 
1063 | ## alertmanager sub-chart configurable values
1064 | ## Please see https://github.com/prometheus-community/helm-charts/tree/main/charts/alertmanager
1065 | ##
1066 | alertmanager:
1067 |   ## If false, alertmanager will not be installed
1068 |   ##
1069 |   enabled: true
1070 | 
1071 |   persistence:
1072 |     size: 2Gi
1073 | 
1074 |   podSecurityContext:
1075 |     runAsUser: 65534
1076 |     runAsNonRoot: true
1077 |     runAsGroup: 65534
1078 |     fsGroup: 65534
1079 | 
1080 | ## kube-state-metrics sub-chart configurable values
1081 | ## Please see https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-state-metrics
1082 | ##
1083 | kube-state-metrics:
1084 |   ## If false, kube-state-metrics sub-chart will not be installed
1085 |   ##
1086 |   enabled: true
1087 | 
1088 | ## prometheus-node-exporter sub-chart configurable values
1089 | ## Please see https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-node-exporter
1090 | ##
1091 | prometheus-node-exporter:
1092 |   ## If false, node-exporter will not be installed
1093 |   ##
1094 |   enabled: true
1095 | 
1096 |   rbac:
1097 |     pspEnabled: false
1098 | 
1099 |   containerSecurityContext:
1100 |     allowPrivilegeEscalation: false
1101 | 
1102 | ## prometheus-pushgateway sub-chart configurable values
1103 | ## Please see https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-pushgateway
1104 | ##
1105 | prometheus-pushgateway:
1106 |   ## If false, pushgateway will not be installed
1107 |   ##
1108 |   enabled: true
1109 | 
1110 |   # Optional service annotations
1111 |   serviceAnnotations:
1112 |     prometheus.io/probe: pushgateway
1113 | 
--------------------------------------------------------------------------------
/03-deploy-grafana/README.md:
--------------------------------------------------------------------------------
1 | # Deploy Grafana
2 | 
3 | [Grafana](https://grafana.com/) is an open-source observability and data visualization platform that allows you to query, visualize, alert on, and understand your metrics no matter where they are stored.
4 | 
5 | Follow the instructions in this document to deploy a self-managed Grafana instance in an EKS cluster. The instructions are based on the [Grafana Community Kubernetes Helm Charts](https://github.com/grafana/helm-charts/tree/main/charts/grafana).
6 | 
7 | If you are looking for a fully managed Grafana solution, then please refer to [Amazon Managed Grafana](https://aws.amazon.com/grafana/) or [Grafana Cloud](https://grafana.com/products/cloud/).
8 | 
9 | ## Prerequisites
10 | 
11 | - Kubernetes 1.22+
12 | - Helm 3.9+
13 | 
14 | 
15 | ## Get Repository Info
16 | 
17 | ```console
18 | helm repo add grafana https://grafana.github.io/helm-charts
19 | helm repo update
20 | ```
21 | 
22 | ## Install/Upgrade Grafana with default values
23 | 
24 | ```console
25 | helm upgrade --install [RELEASE_NAME] grafana/grafana --namespace [K8S_NAMESPACE] --create-namespace --wait --debug
26 | ```
27 | 
28 | The command above installs the latest chart version; use the `--version` argument to install a specific version of the Grafana chart.
29 | 
30 | ```console
31 | helm upgrade --install [RELEASE_NAME] grafana/grafana --namespace [K8S_NAMESPACE] --version 6.50.0 --create-namespace --wait --debug
32 | ```
33 | 
34 | ## Install/Upgrade Grafana with custom values
35 | 
36 | - Create a `values.yaml` file with custom Helm chart inputs. Refer to the `values.yaml` file in this repo for sample configurations.
37 | 
38 | - Refer to the [official Grafana chart values](https://github.com/grafana/helm-charts/blob/main/charts/grafana/values.yaml) for the most recent configuration options.
39 | 
40 | Run the following command to install Grafana with custom configurations:
41 | 
42 | ```console
43 | helm upgrade --install [RELEASE_NAME] grafana/grafana --namespace [K8S_NAMESPACE] -f values.yaml --create-namespace --wait --debug
44 | ```
45 | 
46 | ## Access Grafana
47 | 
48 | This chart creates a `grafana` service of type `ClusterIP`, which is accessible only from inside the cluster. Change the [service type](https://github.com/grafana/helm-charts/blob/main/charts/grafana/values.yaml#L173) to `LoadBalancer` if you want to access Grafana from outside the cluster.
49 | 
50 | Run the following `kubectl port-forward` command to connect to Grafana, then go to `localhost:3000` in the browser (or use the load balancer DNS address):
51 | 
52 | ```console
53 | kubectl port-forward --namespace [K8S_NAMESPACE] svc/grafana 3000:80
54 | ```
55 | 
56 | You can get the default username and password from the Kubernetes secret:
57 | 
58 | ```console
59 | kubectl get secrets grafana --template='{{ range $key, $value := .data }}{{ printf "%s: %s\n" $key ($value | base64decode) }}{{ end }}'
60 | ```
61 | 
62 | Log in to Grafana with the default username and password.
63 | 
64 | Screenshot 2023-01-26 at 17 01 01
65 | 
66 | 
67 | ## Configure Prometheus Data Source
68 | 
69 | Follow the steps below to configure the Prometheus data source and access the data stored in Prometheus.
70 | 
71 | Go to Data sources -> Add data source -> Select Prometheus -> configure the data source with the Prometheus endpoint -> Click "Save & test"
72 | 
73 | Screenshot 2023-01-26 at 17 03 37
74 | 
75 | You should see a "Data source is working" success message if the Prometheus endpoint is configured as expected.
76 | 
77 | Screenshot 2023-01-26 at 17 11 29
78 | 
79 | You can now query the metrics in Grafana, create dashboards, and set up alerts.
80 | 
81 | Screenshot 2023-01-26 at 17 05 38
82 | 
--------------------------------------------------------------------------------
/03-deploy-grafana/values.yaml:
--------------------------------------------------------------------------------
1 | global:
2 |   # To help compatibility with other charts which use global.imagePullSecrets.
3 |   # Allow either an array of {name: pullSecret} maps (k8s-style), or an array of strings (more common helm-style).
4 |   # Can be templated.
5 |   # global:
6 |   #   imagePullSecrets:
7 |   #   - name: pullSecret1
8 |   #   - name: pullSecret2
9 |   # or
10 |   # global:
11 |   #   imagePullSecrets:
12 |   #   - pullSecret1
13 |   #   - pullSecret2
14 |   imagePullSecrets: []
15 | 
16 | rbac:
17 |   create: true
18 |   ## Use an existing ClusterRole/Role (depending on rbac.namespaced false/true)
19 |   # useExistingRole: name-of-some-(cluster)role
20 |   pspEnabled: true
21 |   pspUseAppArmor: true
22 |   namespaced: false
23 |   extraRoleRules: []
24 |   # - apiGroups: []
25 |   #   resources: []
26 |   #   verbs: []
27 |   extraClusterRoleRules: []
28 |   # - apiGroups: []
29 |   #   resources: []
30 |   #   verbs: []
31 | serviceAccount:
32 |   create: true
33 |   name:
34 |   nameTest:
35 |   ## ServiceAccount labels.
36 | labels: {} 37 | ## Service account annotations. Can be templated. 38 | # annotations: 39 | # eks.amazonaws.com/role-arn: arn:aws:iam::123456789000:role/iam-role-name-here 40 | autoMount: true 41 | 42 | replicas: 1 43 | 44 | ## Create a headless service for the deployment 45 | headlessService: false 46 | 47 | ## Create HorizontalPodAutoscaler object for deployment type 48 | # 49 | autoscaling: 50 | enabled: false 51 | minReplicas: 1 52 | maxReplicas: 5 53 | targetCPU: "60" 54 | targetMemory: "" 55 | behavior: {} 56 | 57 | ## See `kubectl explain poddisruptionbudget.spec` for more 58 | ## ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/ 59 | podDisruptionBudget: {} 60 | # minAvailable: 1 61 | # maxUnavailable: 1 62 | 63 | ## See `kubectl explain deployment.spec.strategy` for more 64 | ## ref: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy 65 | deploymentStrategy: 66 | type: RollingUpdate 67 | 68 | readinessProbe: 69 | httpGet: 70 | path: /api/health 71 | port: 3000 72 | 73 | livenessProbe: 74 | httpGet: 75 | path: /api/health 76 | port: 3000 77 | initialDelaySeconds: 60 78 | timeoutSeconds: 30 79 | failureThreshold: 10 80 | 81 | ## Use an alternate scheduler, e.g. "stork". 82 | ## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/ 83 | ## 84 | # schedulerName: "default-scheduler" 85 | 86 | image: 87 | repository: grafana/grafana 88 | # Overrides the Grafana image tag whose default is the chart appVersion 89 | tag: "" 90 | sha: "" 91 | pullPolicy: IfNotPresent 92 | 93 | ## Optionally specify an array of imagePullSecrets. 94 | ## Secrets must be manually created in the namespace. 95 | ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/ 96 | ## Can be templated. 97 | ## 98 | pullSecrets: [] 99 | # - myRegistrKeySecretName 100 | 101 | testFramework: 102 | enabled: true 103 | image: "bats/bats" 104 | tag: "v1.4.1" 105 | imagePullPolicy: IfNotPresent 106 | securityContext: {} 107 | 108 | securityContext: 109 | runAsUser: 472 110 | runAsGroup: 472 111 | fsGroup: 472 112 | 113 | containerSecurityContext: {} 114 | 115 | # Enable creating the grafana configmap 116 | createConfigmap: true 117 | 118 | # Extra configmaps to mount in grafana pods 119 | # Values are templated. 120 | extraConfigmapMounts: [] 121 | # - name: certs-configmap 122 | # mountPath: /etc/grafana/ssl/ 123 | # subPath: certificates.crt # (optional) 124 | # configMap: certs-configmap 125 | # readOnly: true 126 | 127 | 128 | extraEmptyDirMounts: [] 129 | # - name: provisioning-notifiers 130 | # mountPath: /etc/grafana/provisioning/notifiers 131 | 132 | 133 | # Apply extra labels to common labels. 134 | extraLabels: {} 135 | 136 | ## Assign a PriorityClassName to pods if set 137 | # priorityClassName: 138 | 139 | downloadDashboardsImage: 140 | repository: curlimages/curl 141 | tag: 7.85.0 142 | sha: "" 143 | pullPolicy: IfNotPresent 144 | 145 | downloadDashboards: 146 | env: {} 147 | envFromSecret: "" 148 | resources: {} 149 | securityContext: {} 150 | envValueFrom: {} 151 | # ENV_NAME: 152 | # configMapKeyRef: 153 | # name: configmap-name 154 | # key: value_key 155 | 156 | ## Pod Annotations 157 | # podAnnotations: {} 158 | 159 | ## Pod Labels 160 | # podLabels: {} 161 | 162 | podPortName: grafana 163 | gossipPortName: grafana-alert 164 | ## Deployment annotations 165 | # annotations: {} 166 | 167 | ## Expose the grafana service to be accessed from outside the cluster (LoadBalancer service). 
168 | ## or access it from within the cluster (ClusterIP service). Set the service type and the port to serve it. 169 | ## ref: http://kubernetes.io/docs/user-guide/services/ 170 | ## 171 | service: 172 | enabled: true 173 | type: ClusterIP 174 | port: 80 175 | targetPort: 3000 176 | # targetPort: 4181 To be used with a proxy extraContainer 177 | ## Service annotations. Can be templated. 178 | annotations: {} 179 | labels: {} 180 | portName: service 181 | # Adds the appProtocol field to the service. This allows to work with istio protocol selection. Ex: "http" or "tcp" 182 | appProtocol: "" 183 | 184 | serviceMonitor: 185 | ## If true, a ServiceMonitor CRD is created for a prometheus operator 186 | ## https://github.com/coreos/prometheus-operator 187 | ## 188 | enabled: false 189 | path: /metrics 190 | # namespace: monitoring (defaults to use the namespace this chart is deployed to) 191 | labels: {} 192 | interval: 1m 193 | scheme: http 194 | tlsConfig: {} 195 | scrapeTimeout: 30s 196 | relabelings: [] 197 | targetLabels: [] 198 | 199 | extraExposePorts: [] 200 | # - name: keycloak 201 | # port: 8080 202 | # targetPort: 8080 203 | # type: ClusterIP 204 | 205 | # overrides pod.spec.hostAliases in the grafana deployment's pods 206 | hostAliases: [] 207 | # - ip: "1.2.3.4" 208 | # hostnames: 209 | # - "my.host.com" 210 | 211 | ingress: 212 | enabled: false 213 | # For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName 214 | # See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress 215 | # ingressClassName: nginx 216 | # Values can be templated 217 | annotations: {} 218 | # kubernetes.io/ingress.class: nginx 219 | # kubernetes.io/tls-acme: "true" 220 | labels: {} 221 | path: / 222 | 223 | # pathType is only for k8s >= 1.1= 224 | pathType: Prefix 225 | 226 | hosts: 227 | - chart-example.local 228 | ## Extra paths to prepend to every host configuration. This is useful when working with annotation based services. 229 | extraPaths: [] 230 | # - path: /* 231 | # backend: 232 | # serviceName: ssl-redirect 233 | # servicePort: use-annotation 234 | ## Or for k8s > 1.19 235 | # - path: /* 236 | # pathType: Prefix 237 | # backend: 238 | # service: 239 | # name: ssl-redirect 240 | # port: 241 | # name: use-annotation 242 | 243 | 244 | tls: [] 245 | # - secretName: chart-example-tls 246 | # hosts: 247 | # - chart-example.local 248 | 249 | resources: {} 250 | # limits: 251 | # cpu: 100m 252 | # memory: 128Mi 253 | # requests: 254 | # cpu: 100m 255 | # memory: 128Mi 256 | 257 | ## Node labels for pod assignment 258 | ## ref: https://kubernetes.io/docs/user-guide/node-selection/ 259 | # 260 | nodeSelector: {} 261 | 262 | ## Tolerations for pod assignment 263 | ## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/ 264 | ## 265 | tolerations: [] 266 | 267 | ## Affinity for pod assignment (evaluated as template) 268 | ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity 269 | ## 270 | affinity: {} 271 | 272 | ## Topology Spread Constraints 273 | ## ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/ 274 | ## 275 | topologySpreadConstraints: [] 276 | 277 | ## Additional init containers (evaluated as template) 278 | ## ref: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/ 279 | ## 280 | extraInitContainers: [] 281 | 282 | ## Enable an Specify container in extraContainers. 
This is meant to allow adding an authentication proxy to a grafana pod 283 | extraContainers: "" 284 | # extraContainers: | 285 | # - name: proxy 286 | # image: quay.io/gambol99/keycloak-proxy:latest 287 | # args: 288 | # - -provider=github 289 | # - -client-id= 290 | # - -client-secret= 291 | # - -github-org= 292 | # - -email-domain=* 293 | # - -cookie-secret= 294 | # - -http-address=http://0.0.0.0:4181 295 | # - -upstream-url=http://127.0.0.1:3000 296 | # ports: 297 | # - name: proxy-web 298 | # containerPort: 4181 299 | 300 | ## Volumes that can be used in init containers that will not be mounted to deployment pods 301 | extraContainerVolumes: [] 302 | # - name: volume-from-secret 303 | # secret: 304 | # secretName: secret-to-mount 305 | # - name: empty-dir-volume 306 | # emptyDir: {} 307 | 308 | ## Enable persistence using Persistent Volume Claims 309 | ## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/ 310 | ## 311 | persistence: 312 | type: pvc 313 | enabled: false 314 | # storageClassName: default 315 | accessModes: 316 | - ReadWriteOnce 317 | size: 10Gi 318 | # annotations: {} 319 | finalizers: 320 | - kubernetes.io/pvc-protection 321 | # selectorLabels: {} 322 | ## Sub-directory of the PV to mount. Can be templated. 323 | # subPath: "" 324 | ## Name of an existing PVC. Can be templated. 325 | # existingClaim: 326 | ## Extra labels to apply to a PVC. 327 | extraPvcLabels: {} 328 | 329 | ## If persistence is not enabled, this allows to mount the 330 | ## local storage in-memory to improve performance 331 | ## 332 | inMemory: 333 | enabled: false 334 | ## The maximum usage on memory medium EmptyDir would be 335 | ## the minimum value between the SizeLimit specified 336 | ## here and the sum of memory limits of all containers in a pod 337 | ## 338 | # sizeLimit: 300Mi 339 | 340 | initChownData: 341 | ## If false, data ownership will not be reset at startup 342 | ## This allows the grafana-server to be run with an arbitrary user 343 | ## 344 | enabled: true 345 | 346 | ## initChownData container image 347 | ## 348 | image: 349 | repository: busybox 350 | tag: "1.31.1" 351 | sha: "" 352 | pullPolicy: IfNotPresent 353 | 354 | ## initChownData resource requests and limits 355 | ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/ 356 | ## 357 | resources: {} 358 | # limits: 359 | # cpu: 100m 360 | # memory: 128Mi 361 | # requests: 362 | # cpu: 100m 363 | # memory: 128Mi 364 | securityContext: 365 | runAsNonRoot: false 366 | runAsUser: 0 367 | 368 | 369 | # Administrator credentials when not using an existing secret (see below) 370 | adminUser: admin 371 | # adminPassword: strongpassword 372 | 373 | # Use an existing secret for the admin user. 374 | admin: 375 | ## Name of the secret. Can be templated. 376 | existingSecret: "" 377 | userKey: admin-user 378 | passwordKey: admin-password 379 | 380 | ## Define command to be executed at startup by grafana container 381 | ## Needed if using `vault-env` to manage secrets (ref: https://banzaicloud.com/blog/inject-secrets-into-pods-vault/) 382 | ## Default is "run.sh" as defined in grafana's Dockerfile 383 | # command: 384 | # - "sh" 385 | # - "/run.sh" 386 | 387 | ## Extra environment variables that will be pass onto deployment pods 388 | ## 389 | ## to provide grafana with access to CloudWatch on AWS EKS: 390 | ## 1. create an iam role of type "Web identity" with provider oidc.eks.* (note the provider for later) 391 | ## 2. 
edit the "Trust relationships" of the role, add a line inside the StringEquals clause using the 392 | ## same oidc eks provider as noted before (same as the existing line) 393 | ## also, replace NAMESPACE and prometheus-operator-grafana with the service account namespace and name 394 | ## 395 | ## "oidc.eks.us-east-1.amazonaws.com/id/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX:sub": "system:serviceaccount:NAMESPACE:prometheus-operator-grafana", 396 | ## 397 | ## 3. attach a policy to the role, you can use a built in policy called CloudWatchReadOnlyAccess 398 | ## 4. use the following env: (replace 123456789000 and iam-role-name-here with your aws account number and role name) 399 | ## 400 | ## env: 401 | ## AWS_ROLE_ARN: arn:aws:iam::123456789000:role/iam-role-name-here 402 | ## AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token 403 | ## AWS_REGION: us-east-1 404 | ## 405 | ## 5. uncomment the EKS section in extraSecretMounts: below 406 | ## 6. uncomment the annotation section in the serviceAccount: above 407 | ## make sure to replace arn:aws:iam::123456789000:role/iam-role-name-here with your role arn 408 | 409 | env: {} 410 | 411 | ## "valueFrom" environment variable references that will be added to deployment pods. Name is templated. 412 | ## ref: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#envvarsource-v1-core 413 | ## Renders in container spec as: 414 | ## env: 415 | ## ... 416 | ## - name: 417 | ## valueFrom: 418 | ## 419 | envValueFrom: {} 420 | # ENV_NAME: 421 | # configMapKeyRef: 422 | # name: configmap-name 423 | # key: value_key 424 | 425 | ## The name of a secret in the same kubernetes namespace which contain values to be added to the environment 426 | ## This can be useful for auth tokens, etc. Value is templated. 427 | envFromSecret: "" 428 | 429 | ## Sensible environment variables that will be rendered as new secret object 430 | ## This can be useful for auth tokens, etc 431 | envRenderSecret: {} 432 | 433 | ## The names of secrets in the same kubernetes namespace which contain values to be added to the environment 434 | ## Each entry should contain a name key, and can optionally specify whether the secret must be defined with an optional key. 435 | ## Name is templated. 436 | envFromSecrets: [] 437 | ## - name: secret-name 438 | ## optional: true 439 | 440 | ## The names of conifgmaps in the same kubernetes namespace which contain values to be added to the environment 441 | ## Each entry should contain a name key, and can optionally specify whether the configmap must be defined with an optional key. 442 | ## Name is templated. 443 | ## ref: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#configmapenvsource-v1-core 444 | envFromConfigMaps: [] 445 | ## - name: configmap-name 446 | ## optional: true 447 | 448 | # Inject Kubernetes services as environment variables. 449 | # See https://kubernetes.io/docs/concepts/services-networking/connect-applications-service/#environment-variables 450 | enableServiceLinks: true 451 | 452 | ## Additional grafana server secret mounts 453 | # Defines additional mounts with secrets. Secrets must be manually created in the namespace. 
454 | extraSecretMounts: [] 455 | # - name: secret-files 456 | # mountPath: /etc/secrets 457 | # secretName: grafana-secret-files 458 | # readOnly: true 459 | # subPath: "" 460 | # 461 | # for AWS EKS (cloudwatch) use the following (see also instruction in env: above) 462 | # - name: aws-iam-token 463 | # mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount 464 | # readOnly: true 465 | # projected: 466 | # defaultMode: 420 467 | # sources: 468 | # - serviceAccountToken: 469 | # audience: sts.amazonaws.com 470 | # expirationSeconds: 86400 471 | # path: token 472 | # 473 | # for CSI e.g. Azure Key Vault use the following 474 | # - name: secrets-store-inline 475 | # mountPath: /run/secrets 476 | # readOnly: true 477 | # csi: 478 | # driver: secrets-store.csi.k8s.io 479 | # readOnly: true 480 | # volumeAttributes: 481 | # secretProviderClass: "akv-grafana-spc" 482 | # nodePublishSecretRef: # Only required when using service principal mode 483 | # name: grafana-akv-creds # Only required when using service principal mode 484 | 485 | ## Additional grafana server volume mounts 486 | # Defines additional volume mounts. 487 | extraVolumeMounts: [] 488 | # - name: extra-volume-0 489 | # mountPath: /mnt/volume0 490 | # readOnly: true 491 | # existingClaim: volume-claim 492 | # - name: extra-volume-1 493 | # mountPath: /mnt/volume1 494 | # readOnly: true 495 | # hostPath: /usr/shared/ 496 | # - name: grafana-secrets 497 | # csi: true 498 | # data: 499 | # driver: secrets-store.csi.k8s.io 500 | # readOnly: true 501 | # volumeAttributes: 502 | # secretProviderClass: "grafana-env-spc" 503 | 504 | ## Container Lifecycle Hooks. Execute a specific bash command or make an HTTP request 505 | lifecycleHooks: {} 506 | # postStart: 507 | # exec: 508 | # command: [] 509 | 510 | ## Pass the plugins you want installed as a list. 
511 | ## 512 | plugins: [] 513 | # - digrich-bubblechart-panel 514 | # - grafana-clock-panel 515 | 516 | ## Configure grafana datasources 517 | ## ref: http://docs.grafana.org/administration/provisioning/#datasources 518 | ## 519 | datasources: {} 520 | # datasources.yaml: 521 | # apiVersion: 1 522 | # datasources: 523 | # - name: Prometheus 524 | # type: prometheus 525 | # url: http://prometheus-prometheus-server 526 | # access: proxy 527 | # isDefault: true 528 | # - name: CloudWatch 529 | # type: cloudwatch 530 | # access: proxy 531 | # uid: cloudwatch 532 | # editable: false 533 | # jsonData: 534 | # authType: default 535 | # defaultRegion: us-east-1 536 | 537 | ## Configure grafana alerting (can be templated) 538 | ## ref: http://docs.grafana.org/administration/provisioning/#alerting 539 | ## 540 | alerting: {} 541 | # rules.yaml: 542 | # apiVersion: 1 543 | # groups: 544 | # - orgId: 1 545 | # name: '{{ .Chart.Name }}_my_rule_group' 546 | # folder: my_first_folder 547 | # interval: 60s 548 | # rules: 549 | # - uid: my_id_1 550 | # title: my_first_rule 551 | # condition: A 552 | # data: 553 | # - refId: A 554 | # datasourceUid: '-100' 555 | # model: 556 | # conditions: 557 | # - evaluator: 558 | # params: 559 | # - 3 560 | # type: gt 561 | # operator: 562 | # type: and 563 | # query: 564 | # params: 565 | # - A 566 | # reducer: 567 | # type: last 568 | # type: query 569 | # datasource: 570 | # type: __expr__ 571 | # uid: '-100' 572 | # expression: 1==0 573 | # intervalMs: 1000 574 | # maxDataPoints: 43200 575 | # refId: A 576 | # type: math 577 | # dashboardUid: my_dashboard 578 | # panelId: 123 579 | # noDataState: Alerting 580 | # for: 60s 581 | # annotations: 582 | # some_key: some_value 583 | # labels: 584 | # team: sre_team_1 585 | # contactpoints.yaml: 586 | # apiVersion: 1 587 | # contactPoints: 588 | # - orgId: 1 589 | # name: cp_1 590 | # receivers: 591 | # - uid: first_uid 592 | # type: pagerduty 593 | # settings: 594 | # integrationKey: XXX 595 | # severity: critical 596 | # class: ping failure 597 | # component: Grafana 598 | # group: app-stack 599 | # summary: | 600 | # {{ `{{ include "default.message" . }}` }} 601 | 602 | ## Configure notifiers 603 | ## ref: http://docs.grafana.org/administration/provisioning/#alert-notification-channels 604 | ## 605 | notifiers: {} 606 | # notifiers.yaml: 607 | # notifiers: 608 | # - name: email-notifier 609 | # type: email 610 | # uid: email1 611 | # # either: 612 | # org_id: 1 613 | # # or 614 | # org_name: Main Org. 615 | # is_default: true 616 | # settings: 617 | # addresses: an_email_address@example.com 618 | # delete_notifiers: 619 | 620 | ## Configure grafana dashboard providers 621 | ## ref: http://docs.grafana.org/administration/provisioning/#dashboards 622 | ## 623 | ## `path` must be /var/lib/grafana/dashboards/ 624 | ## 625 | dashboardProviders: {} 626 | # dashboardproviders.yaml: 627 | # apiVersion: 1 628 | # providers: 629 | # - name: 'default' 630 | # orgId: 1 631 | # folder: '' 632 | # type: file 633 | # disableDeletion: false 634 | # editable: true 635 | # options: 636 | # path: /var/lib/grafana/dashboards/default 637 | 638 | ## Configure grafana dashboard to import 639 | ## NOTE: To use dashboards you must also enable/configure dashboardProviders 640 | ## ref: https://grafana.com/dashboards 641 | ## 642 | ## dashboards per provider, use provider name as key. 
643 | ## 644 | dashboards: {} 645 | # default: 646 | # some-dashboard: 647 | # json: | 648 | # $RAW_JSON 649 | # custom-dashboard: 650 | # file: dashboards/custom-dashboard.json 651 | # prometheus-stats: 652 | # gnetId: 2 653 | # revision: 2 654 | # datasource: Prometheus 655 | # local-dashboard: 656 | # url: https://example.com/repository/test.json 657 | # token: '' 658 | # local-dashboard-base64: 659 | # url: https://example.com/repository/test-b64.json 660 | # token: '' 661 | # b64content: true 662 | # local-dashboard-gitlab: 663 | # url: https://example.com/repository/test-gitlab.json 664 | # gitlabToken: '' 665 | # local-dashboard-bitbucket: 666 | # url: https://example.com/repository/test-bitbucket.json 667 | # bearerToken: '' 668 | 669 | ## Reference to external ConfigMap per provider. Use provider name as key and ConfigMap name as value. 670 | ## A provider dashboards must be defined either by external ConfigMaps or in values.yaml, not in both. 671 | ## ConfigMap data example: 672 | ## 673 | ## data: 674 | ## example-dashboard.json: | 675 | ## RAW_JSON 676 | ## 677 | dashboardsConfigMaps: {} 678 | # default: "" 679 | 680 | ## Grafana's primary configuration 681 | ## NOTE: values in map will be converted to ini format 682 | ## ref: http://docs.grafana.org/installation/configuration/ 683 | ## 684 | grafana.ini: 685 | paths: 686 | data: /var/lib/grafana/ 687 | logs: /var/log/grafana 688 | plugins: /var/lib/grafana/plugins 689 | provisioning: /etc/grafana/provisioning 690 | analytics: 691 | check_for_updates: true 692 | log: 693 | mode: console 694 | grafana_net: 695 | url: https://grafana.net 696 | server: 697 | domain: "{{ if (and .Values.ingress.enabled .Values.ingress.hosts) }}{{ .Values.ingress.hosts | first }}{{ else }}''{{ end }}" 698 | ## grafana Authentication can be enabled with the following values on grafana.ini 699 | # server: 700 | # The full public facing url you use in browser, used for redirects and emails 701 | # root_url: 702 | # https://grafana.com/docs/grafana/latest/auth/github/#enable-github-in-grafana 703 | # auth.github: 704 | # enabled: false 705 | # allow_sign_up: false 706 | # scopes: user:email,read:org 707 | # auth_url: https://github.com/login/oauth/authorize 708 | # token_url: https://github.com/login/oauth/access_token 709 | # api_url: https://api.github.com/user 710 | # team_ids: 711 | # allowed_organizations: 712 | # client_id: 713 | # client_secret: 714 | ## LDAP Authentication can be enabled with the following values on grafana.ini 715 | ## NOTE: Grafana will fail to start if the value for ldap.toml is invalid 716 | # auth.ldap: 717 | # enabled: true 718 | # allow_sign_up: true 719 | # config_file: /etc/grafana/ldap.toml 720 | 721 | ## Grafana's LDAP configuration 722 | ## Templated by the template in _helpers.tpl 723 | ## NOTE: To enable the grafana.ini must be configured with auth.ldap.enabled 724 | ## ref: http://docs.grafana.org/installation/configuration/#auth-ldap 725 | ## ref: http://docs.grafana.org/installation/ldap/#configuration 726 | ldap: 727 | enabled: false 728 | # `existingSecret` is a reference to an existing secret containing the ldap configuration 729 | # for Grafana in a key `ldap-toml`. 
730 | existingSecret: "" 731 | # `config` is the content of `ldap.toml` that will be stored in the created secret 732 | config: "" 733 | # config: |- 734 | # verbose_logging = true 735 | 736 | # [[servers]] 737 | # host = "my-ldap-server" 738 | # port = 636 739 | # use_ssl = true 740 | # start_tls = false 741 | # ssl_skip_verify = false 742 | # bind_dn = "uid=%s,ou=users,dc=myorg,dc=com" 743 | 744 | ## Grafana's SMTP configuration 745 | ## NOTE: To enable, grafana.ini must be configured with smtp.enabled 746 | ## ref: http://docs.grafana.org/installation/configuration/#smtp 747 | smtp: 748 | # `existingSecret` is a reference to an existing secret containing the smtp configuration 749 | # for Grafana. 750 | existingSecret: "" 751 | userKey: "user" 752 | passwordKey: "password" 753 | 754 | ## Sidecars that collect the configmaps with specified label and stores the included files them into the respective folders 755 | ## Requires at least Grafana 5 to work and can't be used together with parameters dashboardProviders, datasources and dashboards 756 | sidecar: 757 | image: 758 | repository: quay.io/kiwigrid/k8s-sidecar 759 | tag: 1.22.0 760 | sha: "" 761 | imagePullPolicy: IfNotPresent 762 | resources: {} 763 | # limits: 764 | # cpu: 100m 765 | # memory: 100Mi 766 | # requests: 767 | # cpu: 50m 768 | # memory: 50Mi 769 | securityContext: {} 770 | # skipTlsVerify Set to true to skip tls verification for kube api calls 771 | # skipTlsVerify: true 772 | enableUniqueFilenames: false 773 | readinessProbe: {} 774 | livenessProbe: {} 775 | # Log level default for all sidecars. Can be one of: DEBUG, INFO, WARN, ERROR, CRITICAL. Defaults to INFO 776 | # logLevel: INFO 777 | alerts: 778 | enabled: false 779 | # Additional environment variables for the alerts sidecar 780 | env: {} 781 | # Do not reprocess already processed unchanged resources on k8s API reconnect. 782 | # ignoreAlreadyProcessed: true 783 | # label that the configmaps with alert are marked with 784 | label: grafana_alert 785 | # value of label that the configmaps with alert are set to 786 | labelValue: "" 787 | # Log level. Can be one of: DEBUG, INFO, WARN, ERROR, CRITICAL. 788 | # logLevel: INFO 789 | # If specified, the sidecar will search for alert config-maps inside this namespace. 790 | # Otherwise the namespace in which the sidecar is running will be used. 791 | # It's also possible to specify ALL to search in all namespaces 792 | searchNamespace: null 793 | # Method to use to detect ConfigMap changes. With WATCH the sidecar will do a WATCH requests, with SLEEP it will list all ConfigMaps, then sleep for 60 seconds. 794 | watchMethod: WATCH 795 | # search in configmap, secret or both 796 | resource: both 797 | # watchServerTimeout: request to the server, asking it to cleanly close the connection after that. 798 | # defaults to 60sec; much higher values like 3600 seconds (1h) are feasible for non-Azure K8S 799 | # watchServerTimeout: 3600 800 | # 801 | # watchClientTimeout: is a client-side timeout, configuring your local socket. 802 | # If you have a network outage dropping all packets with no RST/FIN, 803 | # this is how long your client waits before realizing & dropping the connection. 804 | # defaults to 66sec (sic!) 
805 | # watchClientTimeout: 60 806 | # 807 | # Endpoint to send request to reload alerts 808 | reloadURL: "http://localhost:3000/api/admin/provisioning/alerting/reload" 809 | # Absolute path to shell script to execute after a alert got reloaded 810 | script: null 811 | skipReload: false 812 | # Deploy the alert sidecar as an initContainer in addition to a container. 813 | # Sets the size limit of the alert sidecar emptyDir volume 814 | sizeLimit: {} 815 | dashboards: 816 | enabled: false 817 | # Additional environment variables for the dashboards sidecar 818 | env: {} 819 | # Do not reprocess already processed unchanged resources on k8s API reconnect. 820 | # ignoreAlreadyProcessed: true 821 | SCProvider: true 822 | # label that the configmaps with dashboards are marked with 823 | label: grafana_dashboard 824 | # value of label that the configmaps with dashboards are set to 825 | labelValue: "" 826 | # Log level. Can be one of: DEBUG, INFO, WARN, ERROR, CRITICAL. 827 | # logLevel: INFO 828 | # folder in the pod that should hold the collected dashboards (unless `defaultFolderName` is set) 829 | folder: /tmp/dashboards 830 | # The default folder name, it will create a subfolder under the `folder` and put dashboards in there instead 831 | defaultFolderName: null 832 | # Namespaces list. If specified, the sidecar will search for config-maps/secrets inside these namespaces. 833 | # Otherwise the namespace in which the sidecar is running will be used. 834 | # It's also possible to specify ALL to search in all namespaces. 835 | searchNamespace: null 836 | # Method to use to detect ConfigMap changes. With WATCH the sidecar will do a WATCH requests, with SLEEP it will list all ConfigMaps, then sleep for 60 seconds. 837 | watchMethod: WATCH 838 | # search in configmap, secret or both 839 | resource: both 840 | # If specified, the sidecar will look for annotation with this name to create folder and put graph here. 841 | # You can use this parameter together with `provider.foldersFromFilesStructure`to annotate configmaps and create folder structure. 842 | folderAnnotation: null 843 | # Endpoint to send request to reload alerts 844 | reloadURL: "http://localhost:3000/api/admin/provisioning/dashboards/reload" 845 | # Absolute path to shell script to execute after a configmap got reloaded 846 | script: null 847 | skipReload: false 848 | # watchServerTimeout: request to the server, asking it to cleanly close the connection after that. 849 | # defaults to 60sec; much higher values like 3600 seconds (1h) are feasible for non-Azure K8S 850 | # watchServerTimeout: 3600 851 | # 852 | # watchClientTimeout: is a client-side timeout, configuring your local socket. 853 | # If you have a network outage dropping all packets with no RST/FIN, 854 | # this is how long your client waits before realizing & dropping the connection. 855 | # defaults to 66sec (sic!) 
856 | # watchClientTimeout: 60 857 | # 858 | # provider configuration that lets grafana manage the dashboards 859 | provider: 860 | # name of the provider, should be unique 861 | name: sidecarProvider 862 | # orgid as configured in grafana 863 | orgid: 1 864 | # folder in which the dashboards should be imported in grafana 865 | folder: '' 866 | # type of the provider 867 | type: file 868 | # disableDelete to activate a import-only behaviour 869 | disableDelete: false 870 | # allow updating provisioned dashboards from the UI 871 | allowUiUpdates: false 872 | # allow Grafana to replicate dashboard structure from filesystem 873 | foldersFromFilesStructure: false 874 | # Additional dashboard sidecar volume mounts 875 | extraMounts: [] 876 | # Sets the size limit of the dashboard sidecar emptyDir volume 877 | sizeLimit: {} 878 | datasources: 879 | enabled: false 880 | # Additional environment variables for the datasourcessidecar 881 | env: {} 882 | # Do not reprocess already processed unchanged resources on k8s API reconnect. 883 | # ignoreAlreadyProcessed: true 884 | # label that the configmaps with datasources are marked with 885 | label: grafana_datasource 886 | # value of label that the configmaps with datasources are set to 887 | labelValue: "" 888 | # Log level. Can be one of: DEBUG, INFO, WARN, ERROR, CRITICAL. 889 | # logLevel: INFO 890 | # If specified, the sidecar will search for datasource config-maps inside this namespace. 891 | # Otherwise the namespace in which the sidecar is running will be used. 892 | # It's also possible to specify ALL to search in all namespaces 893 | searchNamespace: null 894 | # Method to use to detect ConfigMap changes. With WATCH the sidecar will do a WATCH requests, with SLEEP it will list all ConfigMaps, then sleep for 60 seconds. 895 | watchMethod: WATCH 896 | # search in configmap, secret or both 897 | resource: both 898 | # watchServerTimeout: request to the server, asking it to cleanly close the connection after that. 899 | # defaults to 60sec; much higher values like 3600 seconds (1h) are feasible for non-Azure K8S 900 | # watchServerTimeout: 3600 901 | # 902 | # watchClientTimeout: is a client-side timeout, configuring your local socket. 903 | # If you have a network outage dropping all packets with no RST/FIN, 904 | # this is how long your client waits before realizing & dropping the connection. 905 | # defaults to 66sec (sic!) 906 | # watchClientTimeout: 60 907 | # 908 | # Endpoint to send request to reload datasources 909 | reloadURL: "http://localhost:3000/api/admin/provisioning/datasources/reload" 910 | # Absolute path to shell script to execute after a datasource got reloaded 911 | script: null 912 | skipReload: false 913 | # Deploy the datasource sidecar as an initContainer in addition to a container. 914 | # This is needed if skipReload is true, to load any datasources defined at startup time. 915 | initDatasources: false 916 | # Sets the size limit of the datasource sidecar emptyDir volume 917 | sizeLimit: {} 918 | plugins: 919 | enabled: false 920 | # Additional environment variables for the plugins sidecar 921 | env: {} 922 | # Do not reprocess already processed unchanged resources on k8s API reconnect. 923 | # ignoreAlreadyProcessed: true 924 | # label that the configmaps with plugins are marked with 925 | label: grafana_plugin 926 | # value of label that the configmaps with plugins are set to 927 | labelValue: "" 928 | # Log level. Can be one of: DEBUG, INFO, WARN, ERROR, CRITICAL. 
929 | # logLevel: INFO 930 | # If specified, the sidecar will search for plugin config-maps inside this namespace. 931 | # Otherwise the namespace in which the sidecar is running will be used. 932 | # It's also possible to specify ALL to search in all namespaces 933 | searchNamespace: null 934 | # Method to use to detect ConfigMap changes. With WATCH the sidecar will do a WATCH requests, with SLEEP it will list all ConfigMaps, then sleep for 60 seconds. 935 | watchMethod: WATCH 936 | # search in configmap, secret or both 937 | resource: both 938 | # watchServerTimeout: request to the server, asking it to cleanly close the connection after that. 939 | # defaults to 60sec; much higher values like 3600 seconds (1h) are feasible for non-Azure K8S 940 | # watchServerTimeout: 3600 941 | # 942 | # watchClientTimeout: is a client-side timeout, configuring your local socket. 943 | # If you have a network outage dropping all packets with no RST/FIN, 944 | # this is how long your client waits before realizing & dropping the connection. 945 | # defaults to 66sec (sic!) 946 | # watchClientTimeout: 60 947 | # 948 | # Endpoint to send request to reload plugins 949 | reloadURL: "http://localhost:3000/api/admin/provisioning/plugins/reload" 950 | # Absolute path to shell script to execute after a plugin got reloaded 951 | script: null 952 | skipReload: false 953 | # Deploy the datasource sidecar as an initContainer in addition to a container. 954 | # This is needed if skipReload is true, to load any plugins defined at startup time. 955 | initPlugins: false 956 | # Sets the size limit of the plugin sidecar emptyDir volume 957 | sizeLimit: {} 958 | notifiers: 959 | enabled: false 960 | # Additional environment variables for the notifierssidecar 961 | env: {} 962 | # Do not reprocess already processed unchanged resources on k8s API reconnect. 963 | # ignoreAlreadyProcessed: true 964 | # label that the configmaps with notifiers are marked with 965 | label: grafana_notifier 966 | # value of label that the configmaps with notifiers are set to 967 | labelValue: "" 968 | # Log level. Can be one of: DEBUG, INFO, WARN, ERROR, CRITICAL. 969 | # logLevel: INFO 970 | # If specified, the sidecar will search for notifier config-maps inside this namespace. 971 | # Otherwise the namespace in which the sidecar is running will be used. 972 | # It's also possible to specify ALL to search in all namespaces 973 | searchNamespace: null 974 | # Method to use to detect ConfigMap changes. With WATCH the sidecar will do a WATCH requests, with SLEEP it will list all ConfigMaps, then sleep for 60 seconds. 975 | watchMethod: WATCH 976 | # search in configmap, secret or both 977 | resource: both 978 | # watchServerTimeout: request to the server, asking it to cleanly close the connection after that. 979 | # defaults to 60sec; much higher values like 3600 seconds (1h) are feasible for non-Azure K8S 980 | # watchServerTimeout: 3600 981 | # 982 | # watchClientTimeout: is a client-side timeout, configuring your local socket. 983 | # If you have a network outage dropping all packets with no RST/FIN, 984 | # this is how long your client waits before realizing & dropping the connection. 985 | # defaults to 66sec (sic!) 
986 | # watchClientTimeout: 60 987 | # 988 | # Endpoint to send request to reload notifiers 989 | reloadURL: "http://localhost:3000/api/admin/provisioning/notifications/reload" 990 | # Absolute path to shell script to execute after a notifier got reloaded 991 | script: null 992 | skipReload: false 993 | # Deploy the notifier sidecar as an initContainer in addition to a container. 994 | # This is needed if skipReload is true, to load any notifiers defined at startup time. 995 | initNotifiers: false 996 | # Sets the size limit of the notifier sidecar emptyDir volume 997 | sizeLimit: {} 998 | 999 | ## Override the deployment namespace 1000 | ## 1001 | namespaceOverride: "" 1002 | 1003 | ## Number of old ReplicaSets to retain 1004 | ## 1005 | revisionHistoryLimit: 10 1006 | 1007 | ## Add a seperate remote image renderer deployment/service 1008 | imageRenderer: 1009 | deploymentStrategy: {} 1010 | # Enable the image-renderer deployment & service 1011 | enabled: false 1012 | replicas: 1 1013 | autoscaling: 1014 | enabled: false 1015 | minReplicas: 1 1016 | maxReplicas: 5 1017 | targetCPU: "60" 1018 | targetMemory: "" 1019 | behavior: {} 1020 | image: 1021 | # image-renderer Image repository 1022 | repository: grafana/grafana-image-renderer 1023 | # image-renderer Image tag 1024 | tag: latest 1025 | # image-renderer Image sha (optional) 1026 | sha: "" 1027 | # image-renderer ImagePullPolicy 1028 | pullPolicy: Always 1029 | # extra environment variables 1030 | env: 1031 | HTTP_HOST: "0.0.0.0" 1032 | # RENDERING_ARGS: --no-sandbox,--disable-gpu,--window-size=1280x758 1033 | # RENDERING_MODE: clustered 1034 | # IGNORE_HTTPS_ERRORS: true 1035 | # image-renderer deployment serviceAccount 1036 | serviceAccountName: "" 1037 | # image-renderer deployment securityContext 1038 | securityContext: {} 1039 | # image-renderer deployment container securityContext 1040 | containerSecurityContext: 1041 | capabilities: 1042 | drop: ['ALL'] 1043 | allowPrivilegeEscalation: false 1044 | readOnlyRootFilesystem: true 1045 | # image-renderer deployment Host Aliases 1046 | hostAliases: [] 1047 | # image-renderer deployment priority class 1048 | priorityClassName: '' 1049 | service: 1050 | # Enable the image-renderer service 1051 | enabled: true 1052 | # image-renderer service port name 1053 | portName: 'http' 1054 | # image-renderer service port used by both service and deployment 1055 | port: 8081 1056 | targetPort: 8081 1057 | # Adds the appProtocol field to the image-renderer service. This allows to work with istio protocol selection. 
Ex: "http" or "tcp" 1058 | appProtocol: "" 1059 | serviceMonitor: 1060 | ## If true, a ServiceMonitor CRD is created for a prometheus operator 1061 | ## https://github.com/coreos/prometheus-operator 1062 | ## 1063 | enabled: false 1064 | path: /metrics 1065 | # namespace: monitoring (defaults to use the namespace this chart is deployed to) 1066 | labels: {} 1067 | interval: 1m 1068 | scheme: http 1069 | tlsConfig: {} 1070 | scrapeTimeout: 30s 1071 | relabelings: [] 1072 | # See: https://doc.crds.dev/github.com/prometheus-operator/kube-prometheus/monitoring.coreos.com/ServiceMonitor/v1@v0.11.0#spec-targetLabels 1073 | targetLabels: [] 1074 | # - targetLabel1 1075 | # - targetLabel2 1076 | # If https is enabled in Grafana, this needs to be set as 'https' to correctly configure the callback used in Grafana 1077 | grafanaProtocol: http 1078 | # In case a sub_path is used this needs to be added to the image renderer callback 1079 | grafanaSubPath: "" 1080 | # name of the image-renderer port on the pod 1081 | podPortName: http 1082 | # number of image-renderer replica sets to keep 1083 | revisionHistoryLimit: 10 1084 | networkPolicy: 1085 | # Enable a NetworkPolicy to limit inbound traffic to only the created grafana pods 1086 | limitIngress: true 1087 | # Enable a NetworkPolicy to limit outbound traffic to only the created grafana pods 1088 | limitEgress: false 1089 | # Allow additional services to access image-renderer (eg. Prometheus operator when ServiceMonitor is enabled) 1090 | extraIngressSelectors: [] 1091 | resources: {} 1092 | # limits: 1093 | # cpu: 100m 1094 | # memory: 100Mi 1095 | # requests: 1096 | # cpu: 50m 1097 | # memory: 50Mi 1098 | ## Node labels for pod assignment 1099 | ## ref: https://kubernetes.io/docs/user-guide/node-selection/ 1100 | # 1101 | nodeSelector: {} 1102 | 1103 | ## Tolerations for pod assignment 1104 | ## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/ 1105 | ## 1106 | tolerations: [] 1107 | 1108 | ## Affinity for pod assignment (evaluated as template) 1109 | ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity 1110 | ## 1111 | affinity: {} 1112 | 1113 | ## Use an alternate scheduler, e.g. "stork". 1114 | ## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/ 1115 | ## 1116 | # schedulerName: "default-scheduler" 1117 | 1118 | networkPolicy: 1119 | ## @param networkPolicy.enabled Enable creation of NetworkPolicy resources. Only Ingress traffic is filtered for now. 1120 | ## 1121 | enabled: false 1122 | ## @param networkPolicy.allowExternal Don't require client label for connections 1123 | ## The Policy model to apply. When set to false, only pods with the correct 1124 | ## client label will have network access to grafana port defined. 1125 | ## When true, grafana will accept connections from any source 1126 | ## (with the correct destination port). 1127 | ## 1128 | ingress: true 1129 | ## @param networkPolicy.ingress When true enables the creation 1130 | ## an ingress network policy 1131 | ## 1132 | allowExternal: true 1133 | ## @param networkPolicy.explicitNamespacesSelector A Kubernetes LabelSelector to explicitly select namespaces from which traffic could be allowed 1134 | ## If explicitNamespacesSelector is missing or set to {}, only client Pods that are in the networkPolicy's namespace 1135 | ## and that match other criteria, the ones that have the good label, can reach the grafana. 
1136 | ## But sometimes, we want the grafana to be accessible to clients from other namespaces, in this case, we can use this 1137 | ## LabelSelector to select these namespaces, note that the networkPolicy's namespace should also be explicitly added. 1138 | ## 1139 | ## Example: 1140 | ## explicitNamespacesSelector: 1141 | ## matchLabels: 1142 | ## role: frontend 1143 | ## matchExpressions: 1144 | ## - {key: role, operator: In, values: [frontend]} 1145 | ## 1146 | explicitNamespacesSelector: {} 1147 | ## 1148 | ## 1149 | ## 1150 | ## 1151 | ## 1152 | ## 1153 | egress: 1154 | ## @param networkPolicy.egress.enabled When enabled, an egress network policy will be 1155 | ## created allowing grafana to connect to external data sources from kubernetes cluster. 1156 | enabled: false 1157 | ## 1158 | ## @param networkPolicy.egress.ports Add individual ports to be allowed by the egress 1159 | ports: [] 1160 | ## Add ports to the egress by specifying - port: 1161 | ## E.X. 1162 | ## ports: 1163 | ## - port: 80 1164 | ## - port: 443 1165 | ## 1166 | ## 1167 | ## 1168 | ## 1169 | ## 1170 | ## 1171 | 1172 | # Enable backward compatibility of kubernetes where version below 1.13 doesn't have the enableServiceLinks option 1173 | enableKubeBackwardCompatibility: false 1174 | useStatefulSet: false 1175 | # Create a dynamic manifests via values: 1176 | extraObjects: [] 1177 | # - apiVersion: "kubernetes-client.io/v1" 1178 | # kind: ExternalSecret 1179 | # metadata: 1180 | # name: grafana-secrets 1181 | # spec: 1182 | # backendType: gcpSecretsManager 1183 | # data: 1184 | # - key: grafana-admin-password 1185 | # name: adminPassword -------------------------------------------------------------------------------- /04-deploy-argocd/README.md: -------------------------------------------------------------------------------- 1 | # Argo CD 2 | 3 | [Argo CD](https://argo-cd.readthedocs.io/en/stable/) is a declarative, GitOps continuous delivery tool for Kubernetes. This repository contains the instructions to deploy argo cd in the EKS cluster and configurations to manage applications in Argo CD. 4 | 5 | ## Prerequisites 6 | 7 | - Kubernetes 1.22+ 8 | - kubectl 9 | 10 | 11 | ## Install Argo CD 12 | 13 | Follow the [official documentation](https://argo-cd.readthedocs.io/en/stable/getting_started/) to deploy Argo CD to your cluster. 14 | 15 | Use the [manifests](https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml) file to quickly install Argo CD or use the [helm chart](https://github.com/argoproj/argo-helm/tree/main/charts/argo-cd) to customize the configurations. 16 | 17 | Run the command `kubectl get pods -n argocd` to verify the installation and ensure all the pods are in `Running` state. 18 | 19 | ```console 20 | ❯❯ kubectl get pods -n argocd 21 | NAME READY STATUS RESTARTS AGE 22 | argocd-application-controller-0 1/1 Running 0 136m 23 | argocd-applicationset-controller-bdbc5976d-rsz4p 1/1 Running 0 136m 24 | argocd-dex-server-7c8974cfc9-zq894 1/1 Running 0 136m 25 | argocd-notifications-controller-56dbd4976-4kdjn 1/1 Running 0 136m 26 | argocd-redis-6bdcf5f74-wdx5v 1/1 Running 0 136m 27 | argocd-repo-server-5bcc9567f8-5rjfc 1/1 Running 0 136m 28 | argocd-server-5ccfbc6db6-dz8c5 1/1 Running 0 136m 29 | 30 | ``` 31 | 32 | Kubectl port-forwarding can be used to connect to the Argo CD API server without exposing the service. 
33 | 
34 | ```console
35 | kubectl port-forward svc/argocd-server -n argocd 8080:443
36 | ```
37 | 
38 | The API server can then be accessed at https://localhost:8080
39 | 
40 | You can use the load balancer address if the service is exposed outside the cluster.
41 | 
42 | Log in to the Argo CD dashboard. The default username is `admin`; run the command below to retrieve the default password.
43 | 
44 | ```console
45 | kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo
46 | ```
47 | 
48 | ## Deploy Applications
49 | 
50 | In Argo CD you can deploy applications using Helm charts or manifest files stored in a Git repository. For this example, we will deploy Prometheus and Grafana to the EKS cluster using their official Helm charts.
51 | 
52 | You can install Helm charts through the UI, or in the [declarative GitOps way](https://argo-cd.readthedocs.io/en/stable/operator-manual/declarative-setup/). We recommend the declarative GitOps way to deploy an Argo CD application.
53 | 
54 | Run the commands below to deploy the Grafana and Prometheus applications via Argo CD, then explore the deployed configurations in the Argo CD UI.
55 | 
56 | 
57 | ```console
58 | kubectl apply -f ./applications/grafana/grafana.yaml
59 | kubectl apply -f ./applications/prometheus/prometheus.yaml
60 | ```
61 | 
62 | Change the `targetRevision` in the `grafana.yaml` file to `6.50.5` and apply the changes. Argo CD will automatically detect the configuration change and roll out the new version.
63 | 
64 | ## Demo
65 | 
66 | https://user-images.githubusercontent.com/112865563/215147384-92f62a74-b411-42e9-859a-5896ad870707.mp4
67 | 
--------------------------------------------------------------------------------
/04-deploy-argocd/applications/grafana/grafana.yaml:
--------------------------------------------------------------------------------
1 | ---
2 | apiVersion: argoproj.io/v1alpha1
3 | kind: Application
4 | metadata:
5 |   name: grafana
6 |   namespace: argocd
7 | spec:
8 |   project: default
9 |   source:
10 |     chart: grafana
11 |     repoURL: https://grafana.github.io/helm-charts
12 |     targetRevision: 6.50.0
13 |     helm:
14 |       releaseName: grafana
15 |   destination:
16 |     server: "https://kubernetes.default.svc"
17 |     namespace: grafana
18 |   syncPolicy:
19 |     syncOptions:
20 |       - CreateNamespace=true
21 |     automated:
22 |       selfHeal: true
23 |       prune: true
24 | 
--------------------------------------------------------------------------------
/04-deploy-argocd/applications/prometheus/prometheus.yaml:
--------------------------------------------------------------------------------
1 | ---
2 | apiVersion: argoproj.io/v1alpha1
3 | kind: Application
4 | metadata:
5 |   name: prometheus
6 |   namespace: argocd
7 | spec:
8 |   project: default
9 |   source:
10 |     chart: prometheus
11 |     repoURL: https://prometheus-community.github.io/helm-charts
12 |     targetRevision: 19.3.3
13 |     helm:
14 |       releaseName: prometheus
15 |   destination:
16 |     server: "https://kubernetes.default.svc"
17 |     namespace: prometheus
18 |   syncPolicy:
19 |     syncOptions:
20 |       - CreateNamespace=true
21 |     automated:
22 |       selfHeal: true
23 |       prune: true
--------------------------------------------------------------------------------
/05-deploy-EFK/README.md:
--------------------------------------------------------------------------------
1 | # deploy-Elasticsearch-Filebeat-Kibana-stack
2 | 
3 | This repository contains sample code to deploy a self-managed Elasticsearch cluster in EKS with Kibana and Filebeat.
4 | 5 | The required resources are created and managed via [Elastic Cloud on Kubernetes (ECK)](https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-overview.html), a Kubernetes operator that orchestrates Elastic applications (Elasticsearch, Kibana, APM Server, Enterprise Search, Beats, Elastic Agent, and Elastic Maps Server) on Kubernetes. 6 | 7 | The ECK operator relies on a set of Custom Resource Definitions (CRDs) to declaratively define how each application is deployed. ECK simplifies deploying the whole Elastic stack on Kubernetes, giving us tools to automate and streamline critical operations. 8 | 9 | These include managing and monitoring multiple clusters, upgrading to new stack versions with ease, scaling cluster capacity up and down, changing cluster configuration, dynamically scaling local storage (including Elastic Local Volume, a local storage driver), and scheduling backups. 10 | 11 | The sample configurations are based on Elasticsearch version 8.6.1. Refer to the official documentation for version details. 12 | 13 | ## Prerequisites 14 | 15 | - [EKS cluster with Kubernetes 1.22+](https://github.com/doitintl/aws-eks-devops-best-practices/tree/main/00-create-eks-cluster) 16 | - [EKS cluster with Amazon EBS CSI Driver](https://docs.aws.amazon.com/eks/latest/userguide/managing-ebs-csi.html) (Note: the add-on is deployed as part of cluster creation if you have used [create-eks-cluster](https://github.com/doitintl/aws-eks-devops-best-practices/tree/main/00-create-eks-cluster)) 17 | - Attach `AmazonEBSCSIDriverPolicy` to the worker node role 18 | - [kubectl](https://kubernetes.io/docs/tasks/tools/) 19 | 20 | ## Install ECK Operator 21 | 22 | To deploy Elasticsearch on Kubernetes, we first need to install the ECK operator in the cluster. 23 | 24 | There are two main ways to install ECK in a Kubernetes cluster: 1) using the YAML manifests, or 2) using the Helm chart. This installation is based on the YAML manifests; a Helm-based alternative is sketched below.
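If you prefer the Helm chart instead, a minimal sketch looks like the following; it assumes the official Elastic Helm repository at https://helm.elastic.co and its `eck-operator` chart, so double-check the chart values against the operator version you want to run.

```console
# Add the Elastic Helm repository and install the ECK operator into the elastic-system namespace
helm repo add elastic https://helm.elastic.co
helm repo update
helm install elastic-operator elastic/eck-operator -n elastic-system --create-namespace
```

The rest of this guide follows the YAML manifest installation.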
25 | 26 | Run the following command to install the custom resource definitions for ECK operator version 2.6.1: 27 | 28 | kubectl create -f https://download.elastic.co/downloads/eck/2.6.1/crds.yaml 29 | 30 | Install the operator with its RBAC rules: 31 | 32 | kubectl apply -f https://download.elastic.co/downloads/eck/2.6.1/operator.yaml 33 | 34 | 35 | Verify the ECK operator installation and ensure the workload is running as expected: 36 | 37 | ``` 38 | ❯❯ kubectl get crd 39 | NAME CREATED AT 40 | agents.agent.k8s.elastic.co 2023-02-10T16:19:25Z 41 | apmservers.apm.k8s.elastic.co 2023-02-10T16:19:26Z 42 | beats.beat.k8s.elastic.co 2023-02-10T16:19:27Z 43 | elasticmapsservers.maps.k8s.elastic.co 2023-02-10T16:19:28Z 44 | elasticsearchautoscalers.autoscaling.k8s.elastic.co 2023-02-10T16:19:29Z 45 | elasticsearches.elasticsearch.k8s.elastic.co 2023-02-10T16:19:30Z 46 | enterprisesearches.enterprisesearch.k8s.elastic.co 2023-02-10T16:19:31Z 47 | kibanas.kibana.k8s.elastic.co 2023-02-10T16:19:32Z 48 | 49 | ❯❯ kubectl get all -n elastic-system 50 | NAME READY STATUS RESTARTS AGE 51 | pod/elastic-operator-0 1/1 Running 0 74s 52 | 53 | NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE 54 | service/elastic-webhook-server ClusterIP 10.100.206.147 443/TCP 76s 55 | 56 | NAME READY AGE 57 | statefulset.apps/elastic-operator 1/1 78s 58 | 59 | #Monitor the operator pod logs 60 | ❯❯ kubectl logs -f -n elastic-system pod/elastic-operator-0 61 | ``` 62 | 63 | ## Deploy Elasticsearch Cluster 64 | 65 | Now that ECK is running in the Kubernetes cluster, let's deploy an Elasticsearch cluster with one master node and two data nodes in the `default` namespace. 66 | 67 | Refer to `elasticsearch.yaml` for sample configurations and to the [ECK documentation](https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-orchestrating-elastic-stack-applications.html) for all the available configuration options. Customize the configurations based on your requirements.
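For example, if you need to reserve CPU and memory for the data nodes, a `nodeSet` can be extended with a pod template along these lines. This is an illustrative fragment only, not part of `elasticsearch.yaml`, and the resource values are assumptions you should adapt to your workload; recent Elasticsearch versions derive the JVM heap from the container memory limit, so setting `resources` is usually enough.

```yaml
  - name: data
    count: 2
    config:
      node.roles: ["data", "ingest"]
      node.store.allow_mmap: false
    podTemplate:
      spec:
        containers:
          - name: elasticsearch # ECK expects the container to be named elasticsearch
            resources:
              requests:
                memory: 4Gi
                cpu: 1
              limits:
                memory: 4Gi
```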
68 | 69 | 70 | kubectl apply -f elasticsearch.yaml 71 | 72 | 73 | Verify the installation 74 | 75 | ``` 76 | ❯❯ kubectl get statefulset,pods,sc,pv,pvc 77 | 78 | NAME READY AGE 79 | statefulset.apps/demo-es-data 2/2 94s 80 | statefulset.apps/demo-es-masters 1/1 94s 81 | 82 | NAME READY STATUS RESTARTS AGE 83 | pod/demo-es-data-0 1/1 Running 0 95s 84 | pod/demo-es-data-1 1/1 Running 0 95s 85 | pod/demo-es-masters-0 1/1 Running 0 95s 86 | 87 | NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE 88 | storageclass.storage.k8s.io/gp2 (default) kubernetes.io/aws-ebs Delete WaitForFirstConsumer false 2d16h 89 | 90 | NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE 91 | persistentvolume/pvc-7f3961d0-e227-4e63-a4bb-48178b33d37e 10Gi RWO Delete Bound default/elasticsearch-data-demo-es-data-1 gp2 93s 92 | persistentvolume/pvc-b553429d-fe34-4c7b-bcf0-eb8262144408 10Gi RWO Delete Bound default/elasticsearch-data-demo-es-data-0 gp2 93s 93 | persistentvolume/pvc-d37144db-fba3-48a8-a866-b882be011d3a 10Gi RWO Delete Bound default/elasticsearch-data-demo-es-masters-0 gp2 93s 94 | 95 | NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE 96 | persistentvolumeclaim/elasticsearch-data-demo-es-data-0 Bound pvc-b553429d-fe34-4c7b-bcf0-eb8262144408 10Gi RWO gp2 98s 97 | persistentvolumeclaim/elasticsearch-data-demo-es-data-1 Bound pvc-7f3961d0-e227-4e63-a4bb-48178b33d37e 10Gi RWO gp2 98s 98 | persistentvolumeclaim/elasticsearch-data-demo-es-masters-0 Bound pvc-d37144db-fba3-48a8-a866-b882be011d3a 10Gi RWO gp2 98s 99 | 100 | ``` 101 | 102 | The Elasticsearch service `demo-es-http` is created as a ClusterIP service and is accessible only within the cluster. Change the service type to [LoadBalancer](https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-services.html#k8s-allow-public-access) if required. 103 | 104 | A default user named `elastic` is automatically created, with its password stored in a Kubernetes secret: 105 | 106 | 107 | ELASTIC_USER_PASSWORD=$(kubectl get secret demo-es-elastic-user -o go-template='{{.data.elastic | base64decode}}') 108 | 109 | 110 | Test the Elasticsearch connection from your local workstation: 111 | 112 | kubectl port-forward service/demo-es-http 9200 113 | 114 | 115 | Disabling certificate verification with the `-k` flag is not recommended and should be used for testing purposes only. 116 | 117 | curl -u "elastic:$ELASTIC_USER_PASSWORD" -k "https://localhost:9200" 118 | 119 | Sample output: 120 | 121 | ``` 122 | { 123 | "name" : "demo-es-masters-0", 124 | "cluster_name" : "demo", 125 | "cluster_uuid" : "9js8hJuAQhmdXv7p1C-YJw", 126 | "version" : { 127 | "number" : "8.6.1", 128 | "build_flavor" : "default", 129 | "build_type" : "docker", 130 | "build_hash" : "180c9830da956993e59e2cd70eb32b5e383ea42c", 131 | "build_date" : "2023-01-24T21:35:11.506992272Z", 132 | "build_snapshot" : false, 133 | "lucene_version" : "9.4.2", 134 | "minimum_wire_compatibility_version" : "7.17.0", 135 | "minimum_index_compatibility_version" : "7.0.0" 136 | }, 137 | "tagline" : "You Know, for Search" 138 | } 139 | ``` 140 | 141 | ## Deploy Kibana Cluster 142 | 143 | The next step is to deploy Kibana, a free and open user interface that lets you visualize your Elasticsearch data. 144 | 145 | Refer to `kibana.yaml` for sample configurations and to the [ECK documentation](https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-kibana-es.html) for all available configuration options.
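The sample manifest exposes Kibana through an internet-facing Network Load Balancer. If you would rather keep Kibana private, one option (a sketch that assumes you keep the default `ClusterIP` service type instead of `LoadBalancer`) is to port-forward to the `demo-kb-http` service once the deployment below is running, browse to https://localhost:5601, and log in with the `elastic` user and the password retrieved earlier:

```console
kubectl port-forward service/demo-kb-http 5601
```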
146 | 147 | kubectl apply -f kibana.yaml 148 | 149 | Verify the installation 150 | 151 | ``` 152 | ❯❯ kubectl get all -l "common.k8s.elastic.co/type=kibana" 153 | 154 | NAME READY STATUS RESTARTS AGE 155 | pod/demo-kb-799d67ffff-7tftp 1/1 Running 0 2m49s 156 | 157 | NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE 158 | service/demo-kb-http LoadBalancer 10.100.235.170 a8cad8cba32374f788b1744cb79c46d5-2001568967.us-west-2.elb.amazonaws.com 5601:30645/TCP 2m54s 159 | 160 | NAME READY UP-TO-DATE AVAILABLE AGE 161 | deployment.apps/demo-kb 1/1 1 1 2m54s 162 | 163 | NAME DESIRED CURRENT READY AGE 164 | replicaset.apps/demo-kb-799d67ffff 1 1 1 2m53s 165 | ``` 166 | 167 | Access the Kibana application via the network load balancer and log in with the default Elasticsearch username and password (e.g. https://a8cad8cba32374f788b1744cb79c46d5-2001568967.us-west-2.elb.amazonaws.com:5601). 168 | 169 | ![Screenshot 2023-02-13 at 11 26 07](https://user-images.githubusercontent.com/112865563/218456889-44b9a760-cd5b-4a78-ab27-3fc91072eca2.jpeg) 170 | 171 | ![Screenshot 2023-02-13 at 11 27 29](https://user-images.githubusercontent.com/112865563/218456816-c55e8e56-d7c1-4579-87b5-e10ef22582e4.jpeg) 172 | 173 | ## Deploy Filebeat 174 | 175 | Filebeat is a lightweight shipper for forwarding and centralizing log data. It is installed as a DaemonSet in the Kubernetes cluster and collects the logs generated by the containers. The collected logs are shipped to Elasticsearch and indexed. You can then query the logs via Kibana or the Elasticsearch API. 176 | 177 | Refer to `filebeat.yaml` for sample configurations and to the [ECK documentation](https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-beat-configuration.html) for all available configuration options. 178 | 179 | kubectl apply -f filebeat.yaml 180 | 181 | Verify the installation 182 | 183 | ``` 184 | ❯❯ kubectl get pods -l "beat.k8s.elastic.co/name=demo" 185 | 186 | NAME READY STATUS RESTARTS AGE 187 | demo-beat-filebeat-4xvs2 1/1 Running 0 2m53s 188 | demo-beat-filebeat-7s8f5 1/1 Running 0 2m53s 189 | ❯❯ kubectl get all -l "beat.k8s.elastic.co/name=demo" 190 | NAME READY STATUS RESTARTS AGE 191 | pod/demo-beat-filebeat-4xvs2 1/1 Running 0 3m2s 192 | pod/demo-beat-filebeat-7s8f5 1/1 Running 0 3m2s 193 | 194 | NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE 195 | daemonset.apps/demo-beat-filebeat 2 2 2 2 2 3m5s 196 | 197 | ``` 198 | 199 | Log in to Kibana and create a data view to explore the collected data. You can also deploy a sample application and explore its log data.
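Before creating the data view, you can optionally confirm from the command line that Filebeat data is reaching Elasticsearch by reusing the port-forward and credentials from the Elasticsearch section and listing the indices. This is just a quick sanity check, and the exact Filebeat index or data stream names depend on your Filebeat version:

```console
curl -u "elastic:$ELASTIC_USER_PASSWORD" -k "https://localhost:9200/_cat/indices?v"
```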
200 | 201 | ![Screenshot 2023-02-13 at 11 27 52](https://user-images.githubusercontent.com/112865563/218457017-a8cf3742-9a61-4123-8cf3-973ba17b1a0c.jpeg) 202 | 203 | ![Screenshot 2023-02-13 at 11 42 33](https://user-images.githubusercontent.com/112865563/218457038-6970def8-a1ea-4b31-8aaf-53cc260fe0a8.jpeg) 204 | 205 | ![Screenshot 2023-02-13 at 11 25 22](https://user-images.githubusercontent.com/112865563/218457121-387c3594-e416-490a-8bf3-5ac9fe5be808.jpeg) 206 | 207 | ![Screenshot 2023-02-13 at 11 25 42](https://user-images.githubusercontent.com/112865563/218457163-e8eddb76-2e69-4d4c-926f-cfef60c05332.jpeg) 208 | -------------------------------------------------------------------------------- /05-deploy-EFK/elasticsearch.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: elasticsearch.k8s.elastic.co/v1 2 | kind: Elasticsearch 3 | metadata: 4 | name: demo 5 | spec: 6 | version: 8.6.1 7 | nodeSets: 8 | - name: masters 9 | count: 1 10 | config: 11 | node.roles: ["master"] 12 | node.store.allow_mmap: false 13 | volumeClaimTemplates: 14 | - metadata: 15 | name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path. 16 | spec: 17 | accessModes: 18 | - ReadWriteOnce 19 | resources: 20 | requests: 21 | storage: 10Gi 22 | storageClassName: gp2 #default storage class in eks 23 | - name: data 24 | count: 2 25 | config: 26 | node.roles: ["data", "ingest", "ml", "transform"] 27 | node.store.allow_mmap: false 28 | volumeClaimTemplates: 29 | - metadata: 30 | name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path. 31 | spec: 32 | accessModes: 33 | - ReadWriteOnce 34 | resources: 35 | requests: 36 | storage: 10Gi 37 | storageClassName: gp2 #default storage class in eks 38 | -------------------------------------------------------------------------------- /05-deploy-EFK/filebeat.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: beat.k8s.elastic.co/v1beta1 2 | kind: Beat 3 | metadata: 4 | name: demo 5 | spec: 6 | type: filebeat 7 | version: 8.6.1 8 | elasticsearchRef: 9 | name: demo 10 | config: 11 | filebeat.autodiscover.providers: 12 | - node: ${NODE_NAME} 13 | type: kubernetes 14 | hints: 15 | enabled: true 16 | default_config: 17 | type: container 18 | paths: 19 | - /var/log/containers/*${data.kubernetes.container.id}.log 20 | processors: 21 | - add_cloud_metadata: {} 22 | - add_host_metadata: {} 23 | daemonSet: 24 | podTemplate: 25 | spec: 26 | serviceAccountName: filebeat 27 | automountServiceAccountToken: true 28 | terminationGracePeriodSeconds: 30 29 | dnsPolicy: ClusterFirstWithHostNet 30 | hostNetwork: true # Allows to provide richer host metadata 31 | containers: 32 | - name: filebeat 33 | securityContext: 34 | runAsUser: 0 35 | # If using Red Hat OpenShift uncomment this: 36 | #privileged: true 37 | volumeMounts: 38 | - name: varlogcontainers 39 | mountPath: /var/log/containers 40 | - name: varlogpods 41 | mountPath: /var/log/pods 42 | - name: varlibdockercontainers 43 | mountPath: /var/lib/docker/containers 44 | env: 45 | - name: NODE_NAME 46 | valueFrom: 47 | fieldRef: 48 | fieldPath: spec.nodeName 49 | volumes: 50 | - name: varlogcontainers 51 | hostPath: 52 | path: /var/log/containers 53 | - name: varlogpods 54 | hostPath: 55 | path: /var/log/pods 56 | - name: varlibdockercontainers 57 | hostPath: 58 | path: /var/lib/docker/containers 59 | --- 60 | apiVersion: rbac.authorization.k8s.io/v1 61 | kind: 
ClusterRole 62 | metadata: 63 | name: filebeat 64 | rules: 65 | - apiGroups: [""] # "" indicates the core API group 66 | resources: 67 | - namespaces 68 | - pods 69 | - nodes 70 | verbs: 71 | - get 72 | - watch 73 | - list 74 | --- 75 | apiVersion: v1 76 | kind: ServiceAccount 77 | metadata: 78 | name: filebeat 79 | namespace: default 80 | --- 81 | apiVersion: rbac.authorization.k8s.io/v1 82 | kind: ClusterRoleBinding 83 | metadata: 84 | name: filebeat 85 | subjects: 86 | - kind: ServiceAccount 87 | name: filebeat 88 | namespace: default 89 | roleRef: 90 | kind: ClusterRole 91 | name: filebeat 92 | apiGroup: rbac.authorization.k8s.io -------------------------------------------------------------------------------- /05-deploy-EFK/kibana.yaml: -------------------------------------------------------------------------------- 1 | 2 | --- 3 | apiVersion: kibana.k8s.elastic.co/v1 4 | kind: Kibana 5 | metadata: 6 | name: demo 7 | spec: 8 | version: 8.6.1 9 | count: 1 10 | elasticsearchRef: 11 | name: demo #elasticsearch deployment name 12 | namespace: default 13 | http: 14 | service: 15 | metadata: 16 | annotations: 17 | service.beta.kubernetes.io/aws-load-balancer-type: "nlb" 18 | service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing" 19 | service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip" 20 | service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true" 21 | service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp 22 | spec: 23 | type: LoadBalancer # default is ClusterIP 24 | podTemplate: 25 | spec: 26 | containers: 27 | - name: kibana 28 | env: 29 | - name: NODE_OPTIONS 30 | value: "--max-old-space-size=2048" 31 | resources: 32 | requests: 33 | memory: 1Gi 34 | cpu: 0.5 35 | limits: 36 | memory: 2.5Gi 37 | cpu: 2 38 | -------------------------------------------------------------------------------- /06-deploy-keda/README.md: -------------------------------------------------------------------------------- 1 | Please refer to [keda-eks-event-driven-autoscaling-demo](https://github.com/ChimbuChinnadurai/keda-eks-event-driven-autoscaling-demo/tree/main) for an EKS KEDA example. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # aws-eks-sample-templates 2 | 3 | This repository contains sample templates to help you get started with AWS EKS quickly. 4 | 5 | Each directory contains the instructions to deploy the required resources in EKS. 6 | 7 | ## Create a new EKS cluster and deploy a sample application 8 | 9 | https://user-images.githubusercontent.com/112865563/213638891-8c4e03c0-4ef4-4e1e-a2fb-9e0679317f89.mp4 10 | 11 | ## Deploy Prometheus and Grafana 12 | 13 | https://user-images.githubusercontent.com/112865563/215078845-5fcabb5f-3bd8-4769-b735-b9c5ce808111.mp4 --------------------------------------------------------------------------------