├── README.md ├── autoscaling ├── fuel-core-cpu-hpa.yaml └── fuel-core-memory-hpa.yaml ├── grafana └── dashboards │ ├── beta-3-api-metrics.json │ └── fuel-core-test.json ├── ingress ├── monitoring-ingress-oauth.yaml ├── monitoring-ingress.yaml ├── oauth-ingress.yaml └── prod-issuer.yaml ├── logging ├── elasticsearch │ ├── kibana-ingress.yaml │ ├── logging-cluster.yaml │ └── logging-kibana.yaml ├── fluentbit │ └── fluentbit-configmap.yaml └── fluentd │ ├── fluentd-cm.yaml │ └── fluentd-ds.yaml ├── monitoring └── values.yaml ├── oauth └── oauth-proxy-deploy.yaml ├── postgres ├── .helmignore ├── Chart.yaml ├── README.md ├── templates │ ├── NOTES.txt │ ├── _helpers.tpl │ ├── deployment.yaml │ ├── pvc.yaml │ ├── secrets.yaml │ ├── service.yaml │ └── tests │ │ └── test-connection.yaml └── values.yaml ├── rbac └── dev-user-rbac.yaml ├── scripts ├── .env ├── create-ebs-snapshot.sh ├── create-k8s.sh ├── delete-k8s.sh ├── deploy-falco-monitoring.sh ├── deploy-jaeger-tracing.sh ├── deploy-k8s-autoscaling.sh ├── deploy-oauth.sh ├── setup-aws-velero-backup.sh └── upgrade-k8s.sh ├── security └── falco │ └── values.yaml ├── terraform ├── environments │ ├── aws-alerts │ │ ├── main.tf │ │ ├── state.tf │ │ └── versions.tf │ └── eks │ │ ├── main.tf │ │ ├── state.tf │ │ └── versions.tf └── modules │ ├── aws-alerts │ ├── aws.tf │ ├── es-alerts.tf │ ├── variables.tf │ └── versions.tf │ └── eks │ ├── aws.tf │ ├── eks.tf │ ├── variables.tf │ ├── versions.tf │ └── vpc.tf └── tracing ├── jaeger-tracing-ingress.yaml └── jaeger-tracing.yaml /README.md: -------------------------------------------------------------------------------- 1 | # Fuel Infrastructure 2 | 3 | ## Prerequisites 4 | 5 | Before proceeding make sure to have these software packages installed on your machine: 6 | 7 | 1) [Helm][helm]: Install latest version of Helm3 for your OS 8 | 9 | 2) [Terraform][terraform]: Install latest version of Terraform for your OS 10 | 11 | 3) [kubectl][kubectl-cli]: Install latest version of 
kubectl 12 | 13 | 4) [gettext][gettext-cli]: Install gettext for your OS 14 | 15 | 5) AWS (for EKS deployment only): 16 | - [aws cli v2][aws-cli]: Install latest version of aws cli v2 17 | 18 | - [aws-iam-authenticator][iam-auth]: Install to authenticate to the EKS cluster via AWS IAM 19 | 20 | - IAM user(s) with AWS access keys and the following IAM permissions: 21 | ```json 22 | { 23 | "Version": "2012-10-17", 24 | "Statement": [ 25 | { 26 | "Sid": "VisualEditor0", 27 | "Effect": "Allow", 28 | "Action": [ 29 | "iam:CreateInstanceProfile", 30 | "iam:GetPolicyVersion", 31 | "iam:PutRolePermissionsBoundary", 32 | "iam:DeletePolicy", 33 | "iam:CreateRole", 34 | "iam:AttachRolePolicy", 35 | "iam:PutRolePolicy", 36 | "iam:DeleteRolePermissionsBoundary", 37 | "iam:CreateLoginProfile", 38 | "iam:ListInstanceProfilesForRole", 39 | "iam:PassRole", 40 | "iam:DetachRolePolicy", 41 | "iam:DeleteRolePolicy", 42 | "iam:ListAttachedRolePolicies", 43 | "iam:ListRolePolicies", 44 | "iam:CreatePolicyVersion", 45 | "iam:DeleteInstanceProfile", 46 | "iam:GetRole", 47 | "iam:GetInstanceProfile", 48 | "iam:GetPolicy", 49 | "iam:ListRoles", 50 | "iam:DeleteRole", 51 | "iam:CreatePolicy", 52 | "iam:ListPolicyVersions", 53 | "iam:UpdateRole", 54 | "iam:DeleteServiceLinkedRole", 55 | "iam:GetRolePolicy", 56 | "iam:DeletePolicyVersion", 57 | "logs:*", 58 | "s3:*", 59 | "autoscaling:*", 60 | "cloudwatch:*", 61 | "elasticloadbalancing:*", 62 | "ec2:*", 63 | "eks:*" 64 | ], 65 | "Resource": "*" 66 | } 67 | ] 68 | } 69 | ``` 70 | 71 | Note: Creating a k8s cluster with Terraform is currently supported only on Linux and Unix operating systems. 72 | 73 | ## Deploying k8s Cluster 74 | 75 | Fuel Core currently supports Terraform-based k8s cluster environment deployments for: 76 | 77 | 1) AWS Elastic Kubernetes Service ([EKS][aws-eks]) 78 | 79 | ### k8s Cluster Configuration 80 | 81 | The current k8s cluster configuration is based on a single [env][env-file] file.
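Because every script reads its settings from this env file, a quick check that the variables you rely on are actually set can catch mistakes before Terraform starts. The helper below is a minimal, hypothetical sketch (it is not part of this repo); it only inspects shell variables and does not read the env file itself.

```bash
# Hypothetical pre-flight helper (not part of this repo): prints the name of
# every variable that is unset or empty and returns 1 if any are missing.
check_required_vars() {
  _crv_missing=0
  for _crv_name in "$@"; do
    # Look up the value of the variable whose *name* is stored in $_crv_name.
    # eval keeps this POSIX-compatible; only pass trusted variable names.
    eval "_crv_value=\${$_crv_name:-}"
    if [ -z "$_crv_value" ]; then
      echo "missing: $_crv_name"
      _crv_missing=1
    fi
  done
  return "$_crv_missing"
}
```

For example, assuming the env file uses plain `KEY=value` lines, you could run `set -a; . ./.env; set +a` from the scripts directory and then `check_required_vars k8s_provider TF_VAR_aws_region TF_VAR_eks_cluster_name || exit 1` before `./create-k8s.sh`.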
82 | 83 | You will need to customize the following environment variables as needed (keep the defaults for any variables you do not need): 84 | 85 | | ENV Variable | Script Usage | Description | 86 | |--------------------------------|---------------------------|---------------------------------------------------------------------------------------------------| 87 | | kibana_ingress_dns | deploy-k8s-logging | the custom DNS address for the Kibana ingress | 88 | | letsencrypt_email | create-k8s (all) | the email address for requesting & renewing your Let's Encrypt certificate | 89 | | grafana_ingress_dns | create-k8s (all) | the custom DNS address for the Grafana ingress | 90 | | k8s_provider | create-k8s (all) | your kubernetes provider name, possible options: eks | 91 | | TF_VAR_aws_environment | create-k8s (all) | environment name | 92 | | TF_VAR_aws_region | create-k8s (aws) | AWS region where you plan to deploy your EKS cluster (e.g. us-east-1) | 93 | | TF_VAR_aws_account_id | create-k8s (aws) | AWS account id | 94 | | TF_state_s3_bucket | create-k8s (aws) | the s3 bucket to store the deployed terraform state | 95 | | TF_state_s3_bucket_key | create-k8s (aws) | the s3 key to save the deployed terraform state.tf | 96 | | TF_VAR_aws_vpc_cidr_block | create-k8s (aws) | AWS VPC CIDR block | 97 | | TF_VAR_aws_azs | create-k8s (aws) | A list of regional availability zones for the AWS VPC subnets | 98 | | TF_VAR_aws_public_subnets | create-k8s (aws) | A list of CIDR blocks for AWS public subnets | 99 | | TF_VAR_aws_private_subnets | create-k8s (aws) | A list of CIDR blocks for AWS private subnets | 100 | | TF_VAR_eks_cluster_name | create-k8s (aws) | EKS cluster name | 101 | | TF_VAR_eks_cluster_version | create-k8s (aws) | EKS cluster version, possible options: 1.18.16, 1.19.8, 1.20.7, 1.21.2 | 102 | | TF_VAR_eks_node_groupname | create-k8s (aws) | EKS worker node group name | 103 | | TF_VAR_eks_node_ami_type | create-k8s (aws) | EKS worker node group AMI type, possible options: AL2_x86_64,
AL2_x86_64_GPU, AL2_ARM_64, CUSTOM | 104 | | TF_VAR_eks_node_disk_size | create-k8s (aws) | disk size (GiB) for EKS worker nodes | 105 | | TF_VAR_eks_node_instance_types | create-k8s (aws) | A list of instance types for the EKS worker nodes | 106 | | TF_VAR_eks_node_min_size | create-k8s (aws) | minimum number of EKS worker nodes | 107 | | TF_VAR_eks_node_desired_size | create-k8s (aws) | desired number of EKS worker nodes | 108 | | TF_VAR_eks_node_max_size | create-k8s (aws) | maximum number of EKS worker nodes | 109 | | TF_VAR_eks_capacity_type | create-k8s (aws) | type of capacity associated with the EKS node group, possible options: ON_DEMAND, SPOT | 110 | | TF_VAR_ec2_ssh_key | create-k8s (aws) | EC2 key pair name for SSH access (the key pair must already exist in your AWS account) | 111 | 112 | Notes: 113 | 114 | - create-k8s refers to the [create-k8s.sh][create-k8s-sh] script 115 | 116 | ### k8s Cluster Deployment 117 | 118 | Once your env file is updated with your parameters, run the [create-k8s.sh][create-k8s-sh] script to create, deploy, update, and/or set up the k8s cluster on your cloud provider: 119 | 120 | ```bash 121 | ./create-k8s.sh 122 | ``` 123 | The script reads "k8s_provider" from the env file, and Terraform then creates the k8s cluster automatically. 124 | 125 | Note: 126 | 127 | - Do not interrupt the create-k8s script while Terraform is deploying your infrastructure. 128 | 129 | If the script is stopped midway, Terraform may leave its state locked. 130 | 131 | - If you have deployed an AWS EKS cluster, make sure after it is created that the proper IAM users have access to it via the [aws-auth][add-users-aws-auth] configmap, so they can run the other deployment scripts.
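Granting access through the aws-auth configmap usually means adding one `mapUsers` entry per IAM user. A minimal sketch, assuming you map users (rather than roles) as described in the linked AWS guide; the helper name is hypothetical, and the account ID and user name in any call are placeholders you supply:

```bash
# Hypothetical helper: print one aws-auth mapUsers list item for an IAM user.
# Arguments: $1 = AWS account id, $2 = IAM user name, $3 = Kubernetes group.
# Output is indented to sit under the "mapUsers: |" key of the configmap.
aws_auth_user_entry() {
  printf '  - userarn: arn:aws:iam::%s:user/%s\n' "$1" "$2"
  printf '    username: %s\n' "$2"
  printf '    groups:\n'
  printf '      - %s\n' "$3"
}
```

You could paste the output under `mapUsers: |` after running `kubectl edit configmap aws-auth -n kube-system`. Note that the `system:masters` group grants full cluster-admin rights; a narrower group bound through RBAC is preferable for non-admin users.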
132 | 133 | ### k8s Cluster Delete 134 | 135 | If you need to tear down your entire k8s cluster, just run the [delete-k8s.sh][delete-k8s-sh] script: 136 | 137 | ```bash 138 | ./delete-k8s.sh 139 | ``` 140 | 141 | 142 | ## Deploying Prometheus-Grafana on k8s 143 | 144 | [Prometheus][prometheus] and [Grafana][grafana] are used for monitoring and visualization of the k8s cluster and fuel-core deployment(s) metrics. 145 | 146 | The prometheus-grafana stack is deployed to the monitoring namespace by the create-k8s script. 147 | 148 | To access the Grafana dashboard, run: 149 | 150 | ```bash 151 | kubectl port-forward svc/kube-prometheus-grafana 3001:80 -n monitoring 152 | ``` 153 | 154 | You can then access the Grafana dashboard at localhost:3001. 155 | 156 | For Grafana console access, the default username is 'admin' and the password is 'prom-operator'. 157 | 158 | To access the Grafana dashboard from a custom DNS address, set the 'grafana_ingress_dns' env variable to a DNS name in a domain you own. 159 | 160 | Check that the Grafana ingress is set up with: 161 | 162 | ```bash 163 | % kubectl get ingress -n monitoring 164 | NAME CLASS HOSTS ADDRESS PORTS AGE 165 | monitoring-ingress monitoring.example.com xxxxxx.elb.us-east-1.amazonaws.com 80, 443 19d 166 | 167 | ``` 168 | 169 | ## Setup Elasticsearch & FluentD Logging on k8s 170 | 171 | Once your k8s cluster is deployed, you can set up Elasticsearch and Fluentd logging on it. 172 | 173 | Make sure you have set up the [certificate manager][cert-manager] and [ingress controller][ingress-controller] before you set up logging on your k8s cluster. 174 | 175 | Then run the [deploy-k8s-logging][deploy-k8s-logging] script: 176 | 177 | ```bash 178 | ./deploy-k8s-logging.sh 179 | ``` 180 | 181 | This sets up Elasticsearch and Fluentd on your cluster.
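The ingress manifests in this repo contain `${...}` placeholders, for example `${kibana_ingress_dns}` in logging/elasticsearch/kibana-ingress.yaml and `${grafana_ingress_dns}` in the monitoring ingresses. A small sketch, not part of this repo, that lists the placeholders a manifest expects so you can confirm each one is set in your env file before deploying:

```bash
# Hypothetical helper: print each distinct ${var} placeholder name found in a
# manifest template, one per line, sorted.
list_placeholders() {
  # Bracket expressions keep $, { and } literal in both grep and sed.
  # grep -o (print only the match) is supported by GNU and BSD grep.
  grep -o '[$][{][A-Za-z_][A-Za-z0-9_]*[}]' "$1" \
    | sed 's/^[$][{]//; s/[}]$//' \
    | sort -u
}
```

For example, `list_placeholders logging/elasticsearch/kibana-ingress.yaml` should print `kibana_ingress_dns`. Since gettext is a prerequisite, rendering with `envsubst < logging/elasticsearch/kibana-ingress.yaml | kubectl apply -f -` is one plausible option (an assumption about how the scripts work); envsubst only substitutes variables that are exported into the environment.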
182 | 183 | To deploy your Kibana ingress, run the [deploy-k8s-kibana-ingress][deploy-k8s-kibana-ingress] script: 184 | 185 | ```bash 186 | ./deploy-k8s-kibana-ingress.sh 187 | ``` 188 | 189 | Then, to view the Kibana ingress: 190 | 191 | ```bash 192 | kubectl get ingress kibana-ingress -n logging 193 | ``` 194 | 195 | The default username for the Kibana dashboard UI is "elastic", and the password can be retrieved with: 196 | 197 | ```bash 198 | PASSWORD=$(kubectl get secret eck-es-elastic-user -n logging -o go-template='{{.data.elastic | base64decode}}') 199 | echo "$PASSWORD" 200 | ``` 201 | 202 | ## Deploying Jaeger on k8s 203 | 204 | [Jaeger][jaeger] is an open-source, end-to-end distributed tracing system with native support for OpenTelemetry. 205 | 206 | Before you deploy Jaeger, make sure to follow the section above to "Setup Elasticsearch & FluentD Logging on k8s". 207 | 208 | The Elasticsearch instance is required because Jaeger is deployed with an Elasticsearch storage backend. 209 | 210 | To deploy the Jaeger operator and instance, run the [deploy-jaeger-tracing][deploy-jaeger-tracing] script: 211 | 212 | ```bash 213 | ./deploy-jaeger-tracing.sh 214 | ``` 215 | 216 | Then, to view the Jaeger ingress: 217 | 218 | ```bash 219 | kubectl get ingress jaeger-tracing-ingress -n observability 220 | ``` 221 | 222 | Once Jaeger is set up, you can integrate Fuel services to send traces to Jaeger via the [OpenTelemetry SDK][opentelemetry-sdk]. 223 | 224 | [add-users-aws-auth]: https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html 225 | [aws-cli]: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html 226 | [aws-eks]: https://aws.amazon.com/eks/ 227 | [cert-manager]: https://cert-manager.io/docs/configuration/acme/ 228 | [create-k8s-sh]: https://github.com/FuelLabs/infrastructure/blob/master/scripts/create-k8s.sh 229 | [delete-k8s-sh]: https://github.com/FuelLabs/infrastructure/blob/master/scripts/delete-k8s.sh 230 | [deploy-k8s-kibana-ingress]:
https://github.com/FuelLabs/infrastructure/blob/master/scripts/deploy-k8s-kibana-ingress.sh 231 | [deploy-k8s-logging]: https://github.com/FuelLabs/infrastructure/blob/master/scripts/deploy-k8s-logging.sh 232 | [deploy-jaeger-tracing]: https://github.com/FuelLabs/infrastructure/blob/master/scripts/deploy-jaeger-tracing.sh 233 | [docker-desktop]: https://docs.docker.com/engine/install/ 234 | [env-file]: https://github.com/FuelLabs/infrastructure/blob/master/scripts/.env 235 | [gettext-cli]: https://www.gnu.org/software/gettext/ 236 | [grafana]: https://grafana.com/ 237 | [helm]: https://helm.sh/docs/intro/install/ 238 | [iam-auth]: https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html 239 | [ingress-controller]: https://github.com/kubernetes/ingress-nginx 240 | [ingress-def]: https://kubernetes.io/docs/concepts/services-networking/ingress/ 241 | [jaeger]: https://www.jaegertracing.io/ 242 | [jaeger-operator]: https://www.jaegertracing.io/docs/1.34/operator/ 243 | [k8s-terraform]: https://github.com/FuelLabs/infrastructure/tree/master/terraform 244 | [kubectl-cli]: https://kubernetes.io/docs/tasks/tools/ 245 | [prometheus]: https://prometheus.io/ 246 | [terraform]: https://learn.hashicorp.com/tutorials/terraform/install-cli 247 | [opentelemetry-sdk]: https://www.jaegertracing.io/_client_libs/client-libraries/ 248 | -------------------------------------------------------------------------------- /autoscaling/fuel-core-cpu-hpa.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: autoscaling/v1 2 | kind: HorizontalPodAutoscaler 3 | metadata: 4 | annotations: 5 | name: fuel-core-cpu-hpa 6 | namespace: fuel-core 7 | spec: 8 | maxReplicas: 5 9 | minReplicas: 2 10 | scaleTargetRef: 11 | apiVersion: apps/v1 12 | kind: Deployment 13 | name: fuel-core-k8s 14 | targetCPUUtilizationPercentage: 75 15 | -------------------------------------------------------------------------------- 
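The CPU autoscaler in autoscaling/fuel-core-cpu-hpa.yaml keeps fuel-core-k8s between 2 and 5 replicas and targets 75% average CPU utilization, measured against the pods' CPU requests. The sketch below applies the replica formula from the Kubernetes HPA documentation, desired = ceil(current * currentUtilization / targetUtilization), clamped to minReplicas and maxReplicas. It is a simplification under stated assumptions: the controller's default 10% tolerance and scale-down stabilization window are omitted, and the function name is hypothetical.

```bash
# Hypothetical helper sketching the HPA replica formula:
#   desired = ceil(current * currentUtil / targetUtil), clamped to [min, max]
# Tolerance and stabilization windows used by the real controller are omitted.
# Arguments: current replicas, current util %, target util %, min, max.
hpa_desired_replicas() {
  _hpa_current=$1 _hpa_util=$2 _hpa_target=$3 _hpa_min=$4 _hpa_max=$5
  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b for positive a, b.
  _hpa_desired=$(( (_hpa_current * _hpa_util + _hpa_target - 1) / _hpa_target ))
  if [ "$_hpa_desired" -lt "$_hpa_min" ]; then _hpa_desired=$_hpa_min; fi
  if [ "$_hpa_desired" -gt "$_hpa_max" ]; then _hpa_desired=$_hpa_max; fi
  echo "$_hpa_desired"
}
```

With the values in that manifest, 2 pods averaging 90% CPU give ceil(2 * 90 / 75) = ceil(2.4) = 3, so `hpa_desired_replicas 2 90 75 2 5` prints 3; a spike that would call for 6 replicas is capped at 5.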
/autoscaling/fuel-core-memory-hpa.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: autoscaling/v2beta2 2 | kind: HorizontalPodAutoscaler 3 | metadata: 4 | annotations: 5 | name: fuel-core-memory-hpa 6 | namespace: fuel-core 7 | spec: 8 | maxReplicas: 5 9 | minReplicas: 2 10 | scaleTargetRef: 11 | apiVersion: apps/v1 12 | kind: Deployment 13 | name: fuel-core-k8s 14 | metrics: 15 | - type: Resource 16 | resource: 17 | name: memory 18 | target: 19 | type: AverageValue 20 | averageValue: 2Gi 21 | -------------------------------------------------------------------------------- /grafana/dashboards/beta-3-api-metrics.json: -------------------------------------------------------------------------------- 1 | { 2 | "annotations": { 3 | "list": [ 4 | { 5 | "builtIn": 1, 6 | "datasource": { 7 | "type": "grafana", 8 | "uid": "-- Grafana --" 9 | }, 10 | "enable": true, 11 | "hide": true, 12 | "iconColor": "rgba(0, 211, 255, 1)", 13 | "name": "Annotations & Alerts", 14 | "target": { 15 | "limit": 100, 16 | "matchAny": false, 17 | "tags": [], 18 | "type": "dashboard" 19 | }, 20 | "type": "dashboard" 21 | } 22 | ] 23 | }, 24 | "editable": true, 25 | "fiscalYearStartMonth": 0, 26 | "graphTooltip": 0, 27 | "id": 133, 28 | "links": [], 29 | "liveNow": false, 30 | "panels": [ 31 | { 32 | "datasource": { 33 | "type": "prometheus", 34 | "uid": "prometheus" 35 | }, 36 | "description": "", 37 | "fieldConfig": { 38 | "defaults": { 39 | "custom": { 40 | "hideFrom": { 41 | "legend": false, 42 | "tooltip": false, 43 | "viz": false 44 | }, 45 | "scaleDistribution": { 46 | "type": "linear" 47 | } 48 | } 49 | }, 50 | "overrides": [] 51 | }, 52 | "gridPos": { 53 | "h": 7, 54 | "w": 24, 55 | "x": 0, 56 | "y": 0 57 | }, 58 | "id": 5, 59 | "options": { 60 | "calculate": false, 61 | "cellGap": 1, 62 | "color": { 63 | "exponent": 0.5, 64 | "fill": "dark-orange", 65 | "mode": "scheme", 66 | "reverse": false, 67 | "scale": "exponential", 68 |
"scheme": "Oranges", 69 | "steps": 64 70 | }, 71 | "exemplars": { 72 | "color": "rgba(255,0,255,0.7)" 73 | }, 74 | "filterValues": { 75 | "le": 1e-9 76 | }, 77 | "legend": { 78 | "show": true 79 | }, 80 | "rowsFrame": { 81 | "layout": "auto" 82 | }, 83 | "tooltip": { 84 | "show": true, 85 | "yHistogram": false 86 | }, 87 | "yAxis": { 88 | "axisPlacement": "left", 89 | "reverse": false, 90 | "unit": "s" 91 | } 92 | }, 93 | "pluginVersion": "9.5.1", 94 | "targets": [ 95 | { 96 | "datasource": { 97 | "type": "prometheus", 98 | "uid": "prometheus" 99 | }, 100 | "editorMode": "builder", 101 | "exemplar": false, 102 | "expr": "sum by(le) (increase(graphql_request_duration_seconds_bucket{path=\"request\", service=\"sentry-1-lb-service\"}[1m]))", 103 | "format": "heatmap", 104 | "instant": false, 105 | "interval": "", 106 | "legendFormat": "__auto", 107 | "range": true, 108 | "refId": "A" 109 | } 110 | ], 111 | "title": "Requests Response Time", 112 | "type": "heatmap" 113 | }, 114 | { 115 | "datasource": { 116 | "type": "prometheus", 117 | "uid": "prometheus" 118 | }, 119 | "description": "", 120 | "fieldConfig": { 121 | "defaults": { 122 | "custom": { 123 | "hideFrom": { 124 | "legend": false, 125 | "tooltip": false, 126 | "viz": false 127 | }, 128 | "scaleDistribution": { 129 | "type": "linear" 130 | } 131 | } 132 | }, 133 | "overrides": [] 134 | }, 135 | "gridPos": { 136 | "h": 7, 137 | "w": 24, 138 | "x": 0, 139 | "y": 7 140 | }, 141 | "id": 14, 142 | "options": { 143 | "calculate": false, 144 | "cellGap": 1, 145 | "color": { 146 | "exponent": 0.5, 147 | "fill": "dark-orange", 148 | "mode": "scheme", 149 | "reverse": false, 150 | "scale": "exponential", 151 | "scheme": "Oranges", 152 | "steps": 64 153 | }, 154 | "exemplars": { 155 | "color": "rgba(255,0,255,0.7)" 156 | }, 157 | "filterValues": { 158 | "le": 1e-9 159 | }, 160 | "legend": { 161 | "show": true 162 | }, 163 | "rowsFrame": { 164 | "layout": "auto" 165 | }, 166 | "tooltip": { 167 | "show": true, 168 | 
"yHistogram": false 169 | }, 170 | "yAxis": { 171 | "axisPlacement": "left", 172 | "reverse": false, 173 | "unit": "s" 174 | } 175 | }, 176 | "pluginVersion": "9.5.1", 177 | "targets": [ 178 | { 179 | "datasource": { 180 | "type": "prometheus", 181 | "uid": "prometheus" 182 | }, 183 | "editorMode": "builder", 184 | "exemplar": false, 185 | "expr": "sum by(le) (increase(graphql_request_duration_seconds_bucket{path=\"dryRun\", service=\"sentry-1-lb-service\"}[1m]))", 186 | "format": "heatmap", 187 | "instant": false, 188 | "interval": "", 189 | "legendFormat": "__auto", 190 | "range": true, 191 | "refId": "A" 192 | } 193 | ], 194 | "title": "Dry Run Response Time", 195 | "type": "heatmap" 196 | }, 197 | { 198 | "datasource": { 199 | "type": "prometheus", 200 | "uid": "prometheus" 201 | }, 202 | "fieldConfig": { 203 | "defaults": { 204 | "color": { 205 | "mode": "palette-classic" 206 | }, 207 | "custom": { 208 | "axisCenteredZero": false, 209 | "axisColorMode": "text", 210 | "axisLabel": "", 211 | "axisPlacement": "auto", 212 | "barAlignment": 0, 213 | "drawStyle": "line", 214 | "fillOpacity": 0, 215 | "gradientMode": "none", 216 | "hideFrom": { 217 | "legend": false, 218 | "tooltip": false, 219 | "viz": false 220 | }, 221 | "lineInterpolation": "linear", 222 | "lineWidth": 1, 223 | "pointSize": 5, 224 | "scaleDistribution": { 225 | "type": "linear" 226 | }, 227 | "showPoints": "auto", 228 | "spanNulls": false, 229 | "stacking": { 230 | "group": "A", 231 | "mode": "none" 232 | }, 233 | "thresholdsStyle": { 234 | "mode": "off" 235 | } 236 | }, 237 | "mappings": [], 238 | "thresholds": { 239 | "mode": "absolute", 240 | "steps": [ 241 | { 242 | "color": "green", 243 | "value": null 244 | }, 245 | { 246 | "color": "red", 247 | "value": 80 248 | } 249 | ] 250 | } 251 | }, 252 | "overrides": [] 253 | }, 254 | "gridPos": { 255 | "h": 8, 256 | "w": 24, 257 | "x": 0, 258 | "y": 14 259 | }, 260 | "id": 2, 261 | "options": { 262 | "legend": { 263 | "calcs": [], 264 | 
"displayMode": "list", 265 | "placement": "bottom", 266 | "showLegend": true 267 | }, 268 | "tooltip": { 269 | "mode": "single", 270 | "sort": "none" 271 | } 272 | }, 273 | "targets": [ 274 | { 275 | "datasource": { 276 | "type": "prometheus", 277 | "uid": "prometheus" 278 | }, 279 | "editorMode": "builder", 280 | "expr": "rate(graphql_request_duration_seconds_count{namespace=\"beta3\", service=\"sentry-1-lb-service\", path!=\"request\"}[1m])", 281 | "legendFormat": "{{container}}:{{path}}", 282 | "range": true, 283 | "refId": "A" 284 | } 285 | ], 286 | "title": "Endpoints call frequency per container", 287 | "type": "timeseries" 288 | }, 289 | { 290 | "datasource": { 291 | "type": "prometheus", 292 | "uid": "prometheus" 293 | }, 294 | "fieldConfig": { 295 | "defaults": { 296 | "color": { 297 | "mode": "palette-classic" 298 | }, 299 | "custom": { 300 | "axisCenteredZero": false, 301 | "axisColorMode": "text", 302 | "axisLabel": "", 303 | "axisPlacement": "auto", 304 | "barAlignment": 0, 305 | "drawStyle": "line", 306 | "fillOpacity": 0, 307 | "gradientMode": "none", 308 | "hideFrom": { 309 | "legend": false, 310 | "tooltip": false, 311 | "viz": false 312 | }, 313 | "lineInterpolation": "linear", 314 | "lineWidth": 1, 315 | "pointSize": 5, 316 | "scaleDistribution": { 317 | "type": "linear" 318 | }, 319 | "showPoints": "auto", 320 | "spanNulls": false, 321 | "stacking": { 322 | "group": "A", 323 | "mode": "none" 324 | }, 325 | "thresholdsStyle": { 326 | "mode": "off" 327 | } 328 | }, 329 | "mappings": [], 330 | "thresholds": { 331 | "mode": "absolute", 332 | "steps": [ 333 | { 334 | "color": "green" 335 | }, 336 | { 337 | "color": "red", 338 | "value": 80 339 | } 340 | ] 341 | } 342 | }, 343 | "overrides": [] 344 | }, 345 | "gridPos": { 346 | "h": 7, 347 | "w": 24, 348 | "x": 0, 349 | "y": 22 350 | }, 351 | "id": 3, 352 | "options": { 353 | "legend": { 354 | "calcs": [], 355 | "displayMode": "list", 356 | "placement": "bottom", 357 | "showLegend": true 358 | }, 359 
| "tooltip": { 360 | "mode": "single", 361 | "sort": "none" 362 | } 363 | }, 364 | "targets": [ 365 | { 366 | "datasource": { 367 | "type": "prometheus", 368 | "uid": "prometheus" 369 | }, 370 | "editorMode": "builder", 371 | "expr": "sum by(path) (rate(graphql_request_duration_seconds_count{namespace=\"beta3\", service=\"sentry-1-lb-service\", path!=\"request\"}[1m]))", 372 | "legendFormat": "__auto", 373 | "range": true, 374 | "refId": "A" 375 | } 376 | ], 377 | "title": "Total endpoints call frequency", 378 | "transformations": [], 379 | "type": "timeseries" 380 | }, 381 | { 382 | "datasource": { 383 | "type": "prometheus", 384 | "uid": "prometheus" 385 | }, 386 | "fieldConfig": { 387 | "defaults": { 388 | "color": { 389 | "mode": "palette-classic" 390 | }, 391 | "custom": { 392 | "axisCenteredZero": false, 393 | "axisColorMode": "text", 394 | "axisLabel": "", 395 | "axisPlacement": "auto", 396 | "barAlignment": 0, 397 | "drawStyle": "line", 398 | "fillOpacity": 0, 399 | "gradientMode": "none", 400 | "hideFrom": { 401 | "legend": false, 402 | "tooltip": false, 403 | "viz": false 404 | }, 405 | "lineInterpolation": "linear", 406 | "lineWidth": 1, 407 | "pointSize": 5, 408 | "scaleDistribution": { 409 | "type": "linear" 410 | }, 411 | "showPoints": "auto", 412 | "spanNulls": false, 413 | "stacking": { 414 | "group": "A", 415 | "mode": "none" 416 | }, 417 | "thresholdsStyle": { 418 | "mode": "off" 419 | } 420 | }, 421 | "mappings": [], 422 | "thresholds": { 423 | "mode": "absolute", 424 | "steps": [ 425 | { 426 | "color": "green" 427 | }, 428 | { 429 | "color": "red", 430 | "value": 80 431 | } 432 | ] 433 | } 434 | }, 435 | "overrides": [] 436 | }, 437 | "gridPos": { 438 | "h": 9, 439 | "w": 24, 440 | "x": 0, 441 | "y": 29 442 | }, 443 | "id": 13, 444 | "options": { 445 | "legend": { 446 | "calcs": [], 447 | "displayMode": "list", 448 | "placement": "bottom", 449 | "showLegend": true 450 | }, 451 | "tooltip": { 452 | "mode": "single", 453 | "sort": "none" 454 | } 
455 | }, 456 | "targets": [ 457 | { 458 | "datasource": { 459 | "type": "prometheus", 460 | "uid": "prometheus" 461 | }, 462 | "editorMode": "builder", 463 | "expr": "sum by(path) (rate(graphql_request_duration_seconds_count{namespace=\"beta3\", service=\"sentry-1-lb-service\", path=\"request\"}[1m]))", 464 | "legendFormat": "__auto", 465 | "range": true, 466 | "refId": "A" 467 | } 468 | ], 469 | "title": "Total endpoints requests", 470 | "transformations": [], 471 | "type": "timeseries" 472 | } 473 | ], 474 | "refresh": "5s", 475 | "revision": 1, 476 | "schemaVersion": 38, 477 | "style": "dark", 478 | "tags": [], 479 | "templating": { 480 | "list": [] 481 | }, 482 | "time": { 483 | "from": "now-6h", 484 | "to": "now" 485 | }, 486 | "timepicker": {}, 487 | "timezone": "", 488 | "title": "Beta 3 API metrics", 489 | "uid": "x8RcZ3YVz", 490 | "version": 35, 491 | "weekStart": "" 492 | } -------------------------------------------------------------------------------- /grafana/dashboards/fuel-core-test.json: -------------------------------------------------------------------------------- 1 | { 2 | "annotations": { 3 | "list": [ 4 | { 5 | "builtIn": 1, 6 | "datasource": { 7 | "type": "grafana", 8 | "uid": "-- Grafana --" 9 | }, 10 | "enable": true, 11 | "hide": true, 12 | "iconColor": "rgba(0, 211, 255, 1)", 13 | "name": "Annotations & Alerts", 14 | "target": { 15 | "limit": 100, 16 | "matchAny": false, 17 | "tags": [], 18 | "type": "dashboard" 19 | }, 20 | "type": "dashboard" 21 | } 22 | ] 23 | }, 24 | "editable": true, 25 | "fiscalYearStartMonth": 0, 26 | "graphTooltip": 0, 27 | "id": 80, 28 | "links": [], 29 | "liveNow": false, 30 | "panels": [ 31 | { 32 | "datasource": { 33 | "type": "prometheus", 34 | "uid": "prometheus" 35 | }, 36 | "fieldConfig": { 37 | "defaults": { 38 | "color": { 39 | "mode": "palette-classic" 40 | }, 41 | "custom": { 42 | "axisCenteredZero": false, 43 | "axisColorMode": "text", 44 | "axisLabel": "", 45 | "axisPlacement": "auto", 46 | 
"barAlignment": 0, 47 | "drawStyle": "line", 48 | "fillOpacity": 0, 49 | "gradientMode": "none", 50 | "hideFrom": { 51 | "legend": false, 52 | "tooltip": false, 53 | "viz": false 54 | }, 55 | "lineInterpolation": "linear", 56 | "lineWidth": 1, 57 | "pointSize": 5, 58 | "scaleDistribution": { 59 | "type": "linear" 60 | }, 61 | "showPoints": "auto", 62 | "spanNulls": false, 63 | "stacking": { 64 | "group": "A", 65 | "mode": "none" 66 | }, 67 | "thresholdsStyle": { 68 | "mode": "off" 69 | } 70 | }, 71 | "mappings": [], 72 | "thresholds": { 73 | "mode": "absolute", 74 | "steps": [ 75 | { 76 | "color": "green", 77 | "value": null 78 | }, 79 | { 80 | "color": "red", 81 | "value": 80 82 | } 83 | ] 84 | } 85 | }, 86 | "overrides": [] 87 | }, 88 | "gridPos": { 89 | "h": 14, 90 | "w": 23, 91 | "x": 0, 92 | "y": 0 93 | }, 94 | "id": 2, 95 | "options": { 96 | "legend": { 97 | "calcs": [], 98 | "displayMode": "list", 99 | "placement": "bottom", 100 | "showLegend": true 101 | }, 102 | "tooltip": { 103 | "mode": "single", 104 | "sort": "none" 105 | } 106 | }, 107 | "targets": [ 108 | { 109 | "datasource": { 110 | "type": "prometheus", 111 | "uid": "prometheus" 112 | }, 113 | "editorMode": "code", 114 | "expr": "node_cpu_seconds_total", 115 | "legendFormat": "__auto", 116 | "range": true, 117 | "refId": "A" 118 | } 119 | ], 120 | "title": "Test", 121 | "type": "timeseries" 122 | } 123 | ], 124 | "refresh": "", 125 | "revision": 1, 126 | "schemaVersion": 38, 127 | "style": "dark", 128 | "tags": [], 129 | "templating": { 130 | "list": [] 131 | }, 132 | "time": { 133 | "from": "now-6h", 134 | "to": "now" 135 | }, 136 | "timepicker": {}, 137 | "timezone": "", 138 | "title": "Test", 139 | "uid": "g-tRWqY4k", 140 | "version": 2, 141 | "weekStart": "" 142 | } -------------------------------------------------------------------------------- /ingress/monitoring-ingress-oauth.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: 
networking.k8s.io/v1 2 | kind: Ingress 3 | metadata: 4 | annotations: 5 | cert-manager.io/cluster-issuer: letsencrypt-prod 6 | kubernetes.io/ingress.class: nginx 7 | nginx.ingress.kubernetes.io/auth-signin: https://$host/oauth2/start?rd=$http_host$request_uri 8 | nginx.ingress.kubernetes.io/auth-url: https://$host/oauth2/auth 9 | nginx.ingress.kubernetes.io/force-ssl-redirect: "false" 10 | nginx.ingress.kubernetes.io/proxy-body-size: 500m 11 | nginx.ingress.kubernetes.io/rewrite-target: / 12 | nginx.ingress.kubernetes.io/ssl-redirect: "false" 13 | name: monitoring-oauth-ingress 14 | namespace: monitoring 15 | spec: 16 | rules: 17 | - host: ${grafana_ingress_dns} 18 | http: 19 | paths: 20 | - backend: 21 | service: 22 | name: kube-prometheus-grafana 23 | port: 24 | number: 80 25 | path: / 26 | pathType: Prefix 27 | tls: 28 | - hosts: 29 | - ${grafana_ingress_dns} 30 | secretName: grafana-letsencrypt-secret 31 | -------------------------------------------------------------------------------- /ingress/monitoring-ingress.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: networking.k8s.io/v1 2 | kind: Ingress 3 | metadata: 4 | name: monitoring-ingress 5 | namespace: monitoring 6 | annotations: 7 | cert-manager.io/cluster-issuer: "letsencrypt-prod" 8 | kubernetes.io/ingress.class: "nginx" 9 | spec: 10 | rules: 11 | - host: ${grafana_ingress_dns} 12 | http: 13 | paths: 14 | - path: / 15 | pathType: Prefix 16 | backend: 17 | service: 18 | name: kube-prometheus-grafana 19 | port: 20 | number: 80 21 | tls: 22 | - hosts: 23 | - ${grafana_ingress_dns} 24 | secretName: grafana-letsencrypt-secret 25 | -------------------------------------------------------------------------------- /ingress/oauth-ingress.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: networking.k8s.io/v1 2 | kind: Ingress 3 | metadata: 4 | annotations: 5 | cert-manager.io/cluster-issuer: letsencrypt-prod 6 
| kubernetes.io/ingress.class: nginx 7 | nginx.ingress.kubernetes.io/force-ssl-redirect: "false" 8 | nginx.ingress.kubernetes.io/ssl-redirect: "false" 9 | name: oauth2-proxy 10 | namespace: monitoring 11 | spec: 12 | rules: 13 | - host: ${grafana_ingress_dns} 14 | http: 15 | paths: 16 | - backend: 17 | service: 18 | name: oauth 19 | port: 20 | number: 4180 21 | path: /oauth2 22 | pathType: Prefix 23 | tls: 24 | - hosts: 25 | - ${grafana_ingress_dns} 26 | secretName: grafana-letsencrypt-secret 27 | -------------------------------------------------------------------------------- /ingress/prod-issuer.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: cert-manager.io/v1 2 | kind: ClusterIssuer 3 | metadata: 4 | name: letsencrypt-prod 5 | namespace: cert-manager 6 | spec: 7 | acme: 8 | server: https://acme-v02.api.letsencrypt.org/directory 9 | email: ${letsencrypt_email} 10 | privateKeySecretRef: 11 | name: letsencrypt-prod 12 | solvers: 13 | - http01: 14 | ingress: 15 | class: nginx 16 | -------------------------------------------------------------------------------- /logging/elasticsearch/kibana-ingress.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: networking.k8s.io/v1 2 | kind: Ingress 3 | metadata: 4 | name: kibana-ingress 5 | namespace: logging 6 | annotations: 7 | cert-manager.io/cluster-issuer: "letsencrypt-prod" 8 | kubernetes.io/ingress.class: "nginx" 9 | spec: 10 | rules: 11 | - host: ${kibana_ingress_dns} 12 | http: 13 | paths: 14 | - path: / 15 | pathType: Prefix 16 | backend: 17 | service: 18 | name: kibana-efk-kb-http 19 | port: 20 | number: 5601 21 | tls: 22 | - hosts: 23 | - ${kibana_ingress_dns} 24 | secretName: kibana-logging 25 | -------------------------------------------------------------------------------- /logging/elasticsearch/logging-cluster.yaml: -------------------------------------------------------------------------------- 1 | 
apiVersion: elasticsearch.k8s.elastic.co/v1 2 | kind: Elasticsearch 3 | metadata: 4 | name: eck 5 | namespace: logging 6 | spec: 7 | version: 7.14.0 8 | nodeSets: 9 | - name: logging 10 | count: 3 11 | config: 12 | node.store.allow_mmap: false 13 | node.roles: [ master, data, ingest ] 14 | volumeClaimTemplates: 15 | - metadata: 16 | name: elasticsearch-data 17 | spec: 18 | accessModes: 19 | - ReadWriteOnce 20 | resources: 21 | requests: 22 | storage: 200Gi 23 | storageClassName: gp2 24 | -------------------------------------------------------------------------------- /logging/elasticsearch/logging-kibana.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: kibana.k8s.elastic.co/v1beta1 2 | kind: Kibana 3 | metadata: 4 | name: kibana-efk 5 | namespace: logging 6 | spec: 7 | version: 7.14.0 8 | count: 1 9 | elasticsearchRef: 10 | name: eck 11 | http: 12 | tls: 13 | selfSignedCertificate: 14 | disabled: true 15 | -------------------------------------------------------------------------------- /logging/fluentbit/fluentbit-configmap.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | data: 3 | application-log.conf: "[INPUT]\n Name tail\n Tag application.*\n 4 | \ Exclude_Path /var/log/containers/cluster-autoscaler*, /var/log/containers/kube-prometheus*, 5 | /var/log/containers/snapshot-controller*, /var/log/containers/ebs*, /var/log/containers/prometheus*, 6 | /var/log/containers/cert*, /var/log/containers/kube*, /var/log/containers/cloudwatch-agent*, 7 | /var/log/containers/fluent-bit*, /var/log/containers/aws*, /var/log/containers/kube-proxy*, 8 | /var/log/containers/coredns*, /var/log/containers/dev*, /var/log/containers/metrics*, 9 | /var/log/containers/alert*, /var/log/containers/oauth* \n Path /var/log/containers/*.log\n 10 | \ Docker_Mode On\n Docker_Mode_Flush 5\n Docker_Mode_Parser 11 | \ container_firstline\n Parser docker\n DB
/var/fluent-bit/state/flb_container.db\n 12 | \ Mem_Buf_Limit 50MB\n Skip_Long_Lines On\n Refresh_Interval 13 | \ 10\n Rotate_Wait 30\n storage.type filesystem\n Read_from_Head 14 | \ ${READ_FROM_HEAD}\n\n[FILTER]\n Name kubernetes\n Match 15 | \ application.*\n Kube_URL https://kubernetes.default.svc:443\n 16 | \ Kube_Tag_Prefix application.var.log.containers.\n Merge_Log On\n 17 | \ Merge_Log_Key log_processed\n K8S-Logging.Parser On\n K8S-Logging.Exclude 18 | Off\n Labels Off\n Annotations Off\n Use_Kubelet 19 | \ On\n Kubelet_Port 10250\n Buffer_Size 0\n\n[OUTPUT]\n 20 | \ Name cloudwatch_logs\n Match application.*\n 21 | \ region ${AWS_REGION}\n log_group_name /aws/containerinsights/${CLUSTER_NAME}/application\n 22 | \ log_stream_prefix ${HOST_NAME}-\n auto_create_group true\n extra_user_agent 23 | \ container-insights\n" 24 | dataplane-log.conf: | 25 | [INPUT] 26 | Name systemd 27 | Tag dataplane.systemd.* 28 | Systemd_Filter _SYSTEMD_UNIT=docker.service 29 | Systemd_Filter _SYSTEMD_UNIT=kubelet.service 30 | DB /var/fluent-bit/state/systemd.db 31 | Path /var/log/journal 32 | Read_From_Tail ${READ_FROM_TAIL} 33 | 34 | [INPUT] 35 | Name tail 36 | Tag dataplane.tail.* 37 | Path /var/log/containers/aws-node*, /var/log/containers/kube-proxy* 38 | Docker_Mode On 39 | Docker_Mode_Flush 5 40 | Docker_Mode_Parser container_firstline 41 | Parser docker 42 | DB /var/fluent-bit/state/flb_dataplane_tail.db 43 | Mem_Buf_Limit 50MB 44 | Skip_Long_Lines On 45 | Refresh_Interval 10 46 | Rotate_Wait 30 47 | storage.type filesystem 48 | Read_from_Head ${READ_FROM_HEAD} 49 | 50 | [FILTER] 51 | Name modify 52 | Match dataplane.systemd.* 53 | Rename _HOSTNAME hostname 54 | Rename _SYSTEMD_UNIT systemd_unit 55 | Rename MESSAGE message 56 | Remove_regex ^((?!hostname|systemd_unit|message).)*$ 57 | 58 | [FILTER] 59 | Name aws 60 | Match dataplane.* 61 | imds_version v1 62 | 63 | [OUTPUT] 64 | Name cloudwatch_logs 65 | Match dataplane.* 66 | region ${AWS_REGION} 67 | log_group_name 
/aws/containerinsights/${CLUSTER_NAME}/dataplane 68 | log_stream_prefix ${HOST_NAME}- 69 | auto_create_group true 70 | extra_user_agent container-insights 71 | fluent-bit.conf: "[SERVICE]\n Flush 5\n Log_Level info\n 72 | \ Daemon off\n Parsers_File parsers.conf\n 73 | \ HTTP_Server ${HTTP_SERVER}\n HTTP_Listen 0.0.0.0\n 74 | \ HTTP_Port ${HTTP_PORT}\n storage.path /var/fluent-bit/state/flb-storage/\n 75 | \ storage.sync normal\n storage.checksum off\n storage.backlog.mem_limit 76 | 5M\n \n@INCLUDE application-log.conf\n@INCLUDE dataplane-log.conf\n@INCLUDE 77 | host-log.conf\n" 78 | host-log.conf: | 79 | [INPUT] 80 | Name tail 81 | Tag host.dmesg 82 | Path /var/log/dmesg 83 | Parser syslog 84 | DB /var/fluent-bit/state/flb_dmesg.db 85 | Mem_Buf_Limit 5MB 86 | Skip_Long_Lines On 87 | Refresh_Interval 10 88 | Read_from_Head ${READ_FROM_HEAD} 89 | 90 | [INPUT] 91 | Name tail 92 | Tag host.messages 93 | Path /var/log/messages 94 | Parser syslog 95 | DB /var/fluent-bit/state/flb_messages.db 96 | Mem_Buf_Limit 5MB 97 | Skip_Long_Lines On 98 | Refresh_Interval 10 99 | Read_from_Head ${READ_FROM_HEAD} 100 | 101 | [INPUT] 102 | Name tail 103 | Tag host.secure 104 | Path /var/log/secure 105 | Parser syslog 106 | DB /var/fluent-bit/state/flb_secure.db 107 | Mem_Buf_Limit 5MB 108 | Skip_Long_Lines On 109 | Refresh_Interval 10 110 | Read_from_Head ${READ_FROM_HEAD} 111 | 112 | [FILTER] 113 | Name aws 114 | Match host.* 115 | imds_version v1 116 | 117 | [OUTPUT] 118 | Name cloudwatch_logs 119 | Match host.* 120 | region ${AWS_REGION} 121 | log_group_name /aws/containerinsights/${CLUSTER_NAME}/host 122 | log_stream_prefix ${HOST_NAME}. 123 | auto_create_group true 124 | extra_user_agent container-insights 125 | parsers.conf: | 126 | [PARSER] 127 | Name docker 128 | Format json 129 | Time_Key time 130 | Time_Format %Y-%m-%dT%H:%M:%S.%LZ 131 | 132 | [PARSER] 133 | Name syslog 134 | Format regex 135 | Regex ^(?