├── Makefile
├── README.md
├── design.pdf
└── directory.md


/Makefile:
--------------------------------------------------------------------------------
.PHONY: prometheus_alerts.yaml prometheus_rules.yaml dashboards_out all

prometheus_alerts.yaml: mixin.libsonnet
	@mkdir -p out/
	mixtool generate alerts -a out/prometheus_alerts.yaml -y $<

prometheus_rules.yaml: mixin.libsonnet
	@mkdir -p out/
	mixtool generate rules -r out/prometheus_rules.yaml -y $<

dashboards_out: mixin.libsonnet
	@mkdir -p out/dashboards
	mixtool generate dashboards -d out/dashboards $<

all: mixin.libsonnet
	@mkdir -p out/
	mixtool generate all -d out/dashboards -r out/prometheus_rules.yaml -a out/prometheus_alerts.yaml -y $<

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Prometheus Monitoring Mixins

> NOTE: This project is in the *beta* stage.

A mixin is a set of Grafana dashboards and Prometheus rules and alerts, packaged together in a reusable and extensible bundle.
Mixins are written in [jsonnet](https://jsonnet.org/), and are typically installed and updated with [jsonnet-bundler](https://github.com/jsonnet-bundler/jsonnet-bundler).

For more information about mixins, see:
* [Prometheus Monitoring Mixins Design Doc](https://docs.google.com/document/d/1A9xvzwqnFVSOZ5fD3blKODXfsat5fg6ZhnKu9LK3lB4/view). A [cached pdf](design.pdf) is included in this repo.
* For more motivation, see the
"[The RED Method: How to instrument your services](https://kccncna17.sched.com/event/CU8K/the-red-method-how-to-instrument-your-services-b-tom-wilkie-kausal?iframe=no&w=100%&sidebar=yes&bg=no)" talk from CloudNativeCon Austin 2017. The KLUMPs system demoed there became the basis for the kubernetes-mixin.
* "[Prometheus Monitoring Mixins: Using Jsonnet to Package Together Dashboards, Alerts and Exporters](https://www.youtube.com/watch?v=b7-DtFfsL6E)" talk from CloudNativeCon Copenhagen 2018.
* "[Prometheus Monitoring Mixins: Using Jsonnet to Package Together Dashboards, Alerts and Exporters](https://promcon.io/2018-munich/talks/prometheus-monitoring-mixins/)" talk from PromCon 2018 (slightly updated).

## Mixin creation guidelines

Mixins follow a standard structure, which is enforced by the [mixtool CLI](https://github.com/monitoring-mixins/mixtool).

The schema defines 4 root jsonnet objects that hold the Grafana dashboards, the Prometheus alerts and rules, and the configuration parameters applied on the fly when building the mixin.

```
.
├── grafanaDashboards::
├── prometheusAlerts::
│   ├── groups:
├── prometheusRules::
│   ├── groups:
├── _config::
```

This is the expected structure of a mixin jsonnet object definition. The recommended file structure has a `mixin.libsonnet` file as its root, like the following.

```
.
├── mixin.libsonnet
├── dashboards
│   ├── dashboards.libsonnet
│   │   ├── grafanaDashboards::
├── alerts
│   ├── alerts.libsonnet
│   │   ├── prometheusAlerts::
├── rules
│   ├── rules.libsonnet
│   │   ├── prometheusRules::
```

Please note that all 3 jsonnet objects are hidden/private (indicated by the trailing `::`).
This is the standard expected by the mixtool CLI.
If you want to use pure jsonnet commands against the mixin, please make those objects public.

The `mixin.libsonnet` file imports all the files described in the previous section, packaging up the mixin.
It may also contain some changes to the original files, whenever needed.
These changes can be defined in `mixin.libsonnet` itself or by importing another jsonnet definition file that changes any of the 4 objects of the structure.

```jsonnet
(import 'dashboards/dashboards.libsonnet') +
(import 'alerts/alerts.libsonnet') +
(import 'rules/rules.libsonnet') +
(import 'config.libsonnet')
```

## How to use mixins

Mixins are designed to be vendored into the repo with your infrastructure config.
To do this, use [jsonnet-bundler](https://github.com/jsonnet-bundler/jsonnet-bundler).

You then have three options for deploying your dashboards:
1. Generate the config files and deploy them yourself.
2. Use kube-prometheus to deploy this mixin.
3. Use Grizzly to deploy this mixin.

## Generate config files

You can manually generate the alerts, dashboards and rules files, but first you
must install some tools. Make sure you're using Go 1.17 or higher, and run:

```
go install github.com/monitoring-mixins/mixtool/cmd/mixtool@master
```

Then, grab the mixin and its dependencies:

```
$ git clone https://github.com//
$ cd
$ jb install
```

Finally, build the mixin with the self-contained Makefile, which can be copied from the one in this repo:

```
$ make prometheus_alerts.yaml
$ make prometheus_rules.yaml
$ make dashboards_out
$ make all
```

All files are generated inside an `out` folder.
The `prometheus_alerts.yaml` and `prometheus_rules.yaml` files then need to be passed
to your Prometheus server, and the files in `out/dashboards` need to be imported
into your Grafana server. The exact details will depend on how you deploy your
monitoring stack to Kubernetes.

## Using kube-prometheus

See the kube-prometheus docs for [instructions on how to use mixins with kube-prometheus](https://github.com/coreos/kube-prometheus#kube-prometheus).

## Using Grizzly

See the Grizzly docs for [instructions on how to use mixins with Grizzly](https://grafana.github.io/grizzly/hidden-elements/).

Grizzly support for the monitoring-mixin standard is deprecated, but you can use [this adapter](https://github.com/grafana/jsonnet-libs/tree/master/grizzly) to make it work on newer versions of Grizzly.

## Customising the mixin

Mixins typically allow you to override the selectors used for various jobs,
to match those used in your Prometheus setup.

This example uses the [kubernetes-mixin](https://github.com/kubernetes-monitoring/kubernetes-mixin).
In a new directory, add a file `mixin.libsonnet`:

```
local kubernetes = import "kubernetes-mixin/mixin.libsonnet";

kubernetes {
  _config+:: {
    kubeStateMetricsSelector: 'job="kube-state-metrics"',
    cadvisorSelector: 'job="kubernetes-cadvisor"',
    nodeExporterSelector: 'job="kubernetes-node-exporter"',
    kubeletSelector: 'job="kubernetes-kubelet"',
  },
}
```

Then, install the kubernetes-mixin:

```
$ jb init
$ jb install github.com/kubernetes-monitoring/kubernetes-mixin
```

Generate the alerts, rules and dashboards:

```
$ mkdir -p files/dashboards
$ jsonnet -J vendor -S -e 'std.manifestYamlDoc((import "mixin.libsonnet").prometheusAlerts)' > alerts.yml
$ jsonnet -J vendor -S -e 'std.manifestYamlDoc((import "mixin.libsonnet").prometheusRules)' > files/rules.yml
$ jsonnet -J vendor -m files/dashboards -e '(import "mixin.libsonnet").grafanaDashboards'
```

## Guidelines for alert names, labels, and annotations

Prometheus alerts deliberately allow users to define
their own schema for
names, labels, and annotations. The following is a style guide recommended for
alerts in monitoring mixins. Following this guide helps to create useful
notification templates for all mixins and to customize mixin alerts in a unified
fashion.

The alert **name** is a terse description of the alerting condition, using
camel case, without whitespace, starting with a capital letter. The first
component of the name should be shared between all alerts of a mixin (or
between a group of related alerts within a larger mixin). Examples:
`NodeFilesystemAlmostOutOfFiles` (from the [node-exporter
mixin](https://github.com/prometheus/node_exporter/tree/master/docs/node-mixin)),
`PrometheusNotificationQueueRunningFull` (from the [Prometheus
mixin](https://github.com/prometheus/prometheus/blob/master/documentation/prometheus-mixin)).

To mark the severity of an alert, use a **label** called `severity` with one of
the following label values:
- `critical` for alerts that require immediate action. For a production system,
  those alerts will usually hit a pager.
- `warning` for alerts that require action eventually but not urgently enough
  to wake someone up or require them to immediately interrupt what they are
  working on. A typical routing target for those alerts is some kind of ticket
  queueing or bug tracking system.
- `info` for alerts that do not require any action by themselves but mark something
  as “out of the ordinary”. Those alerts aren't usually routed anywhere, but
  can be inspected during troubleshooting.

An alert can have the following **annotations**:
- `summary` (mandatory): Essentially a more comprehensive and readable version
  of the alert name. Use a human-readable sentence, starting with a capital
  letter and ending with a period.
  Use a static string or, if dynamic expansion
  is needed, aim for expanding into the same string for alerts that are
  typically grouped together into one notification. In that way, it can be used
  as a common “headline” for all alerts in the notification template. Examples:
  `Filesystem has less than 3% inodes left.` (for the
  `NodeFilesystemAlmostOutOfFiles` alert mentioned above), `Prometheus alert
  notification queue predicted to run full in less than 30m.` (for the
  `PrometheusNotificationQueueRunningFull` alert mentioned above).
- `description` (mandatory): A detailed description of a single alert, with
  most of the important information templated in. The description usually
  expands into a different string for every individual alert within a
  notification. A notification template can iterate through all the
  descriptions and format them into a list. Examples (again corresponding to
  the examples above): `Filesystem on {{ $labels.device }} at {{
  $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes
  left.`, `Alert notification queue of Prometheus %(prometheusName)s is running
  full.`.

Note that we plan to add recommended optional annotations for a runbook link
(presumably called `runbook_url`) and a dashboard link
(`dashboard_url`). However, we still need to work out how to configure patterns
for those URLs across mixins in a useful way.
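
As a minimal sketch of how the naming, label and annotation guidelines above fit together, here is a hypothetical `alerts/alerts.libsonnet`. The group name, the `for` duration, the selector value and the simplified expression are assumptions made for illustration; they are not copied from the node-exporter mixin. The alert name, `summary` and `description` reuse the examples quoted in this guide.

```jsonnet
// Hypothetical alerts/alerts.libsonnet, for illustration only.
{
  _config+:: {
    // Assumed selector; override it to match your own Prometheus jobs.
    nodeExporterSelector: 'job="node-exporter"',
  },

  prometheusAlerts+:: {
    groups+: [
      {
        // Assumed group name.
        name: 'node-exporter-example',
        rules: [
          {
            // Camel case, no whitespace, shared "Node" prefix.
            alert: 'NodeFilesystemAlmostOutOfFiles',
            // Simplified expression, not the one shipped upstream.
            expr: |||
              node_filesystem_files_free{%(nodeExporterSelector)s} / node_filesystem_files{%(nodeExporterSelector)s} * 100 < 3
            ||| % $._config,
            // 'for' is a jsonnet keyword, so the field name is quoted.
            'for': '1h',
            labels: {
              // One of: critical, warning, info.
              severity: 'critical',
            },
            annotations: {
              // Static headline shared by all alerts grouped in one notification.
              summary: 'Filesystem has less than 3% inodes left.',
              // Per-alert detail; expanded by Prometheus at notification time.
              description: 'Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes left.',
            },
          },
        ],
      },
    ],
  },
}
```

Only `expr` is passed through jsonnet's `%` formatting here, so the literal `%` characters in the annotations need no escaping. If you also template the annotations through `% $._config`, write each literal percent sign as `%%`.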

--------------------------------------------------------------------------------
/design.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monitoring-mixins/docs/5ef6256185cdc47db8139f576bf463a80d91780e/design.pdf

--------------------------------------------------------------------------------
/directory.md:
--------------------------------------------------------------------------------
# Monitoring Mixin Directory

***Please submit PRs for other mixins!***

| Target | Mixin |
|--------|-------|
| [Alertmanager](https://github.com/prometheus/alertmanager) | [prometheus/alertmanager/doc/alertmanager-mixin](https://github.com/prometheus/alertmanager/tree/master/doc/alertmanager-mixin) |
| [Etcd](https://coreos.com/etcd/) | [etcd-io/etcd/Documentation/etcd-mixin](https://github.com/etcd-io/etcd/tree/master/Documentation/etcd-mixin) |
| [Ceph](https://ceph.com/) | [ceph/ceph-mixins](https://github.com/ceph/ceph-mixins) |
| [Cert Manager](https://gitlab.com/uneeq-oss/cert-manager-mixin) | [cert-manager-mixin](https://gitlab.com/uneeq-oss/cert-manager-mixin) |
| [Gluster](https://redhatstorage.redhat.com/products/glusterfs/) | [gluster/gluster-mixins](https://github.com/gluster/gluster-mixins) |
| [HashiCorp Consul](https://www.consul.io/) | [grafana/jsonnet-libs/consul-mixin](https://github.com/grafana/jsonnet-libs/tree/master/consul-mixin) |
| [Jaeger](https://www.jaegertracing.io/) | [grafana/jsonnet-libs/jaeger-mixin](https://github.com/grafana/jsonnet-libs/tree/master/jaeger-mixin) |
| [Kubernetes](https://kubernetes.io/) | [kubernetes-monitoring/kubernetes-mixin](https://github.com/kubernetes-monitoring/kubernetes-mixin) |
| [Kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) | [kubernetes/kube-state-metrics/jsonnet/kube-state-metrics-mixin](https://github.com/kubernetes/kube-state-metrics/tree/master/jsonnet/kube-state-metrics-mixin) |
| [Memcached](https://memcached.org/) | [grafana/jsonnet-libs/memcached-mixin](https://github.com/grafana/jsonnet-libs/tree/master/memcached-mixin) |
| [Node Exporter](https://github.com/prometheus/node_exporter) | [prometheus/node_exporter/docs/node-mixin](https://github.com/prometheus/node_exporter/tree/master/docs/node-mixin) |
| [Prometheus](https://prometheus.io) | [prometheus/prometheus/documentation/prometheus-mixin](https://github.com/prometheus/prometheus/tree/master/documentation/prometheus-mixin) |
| [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets) | [contrib/prometheus-mixin](https://github.com/bitnami-labs/sealed-secrets/tree/master/contrib/prometheus-mixin) |
| [Spinnaker](https://gitlab.com/uneeq-oss/spinnaker-mixin) | [spinnaker-mixin](https://gitlab.com/uneeq-oss/spinnaker-mixin) |

--------------------------------------------------------------------------------