├── LICENSE
├── Makefile
├── README.md
├── collector
│   ├── corosync
│   │   ├── corosync.go
│   │   ├── corosync_test.go
│   │   ├── parser.go
│   │   └── parser_test.go
│   ├── default_collector.go
│   ├── default_collector_test.go
│   ├── drbd
│   │   ├── drbd.go
│   │   └── drbd_test.go
│   ├── instrumented_collector.go
│   ├── instrumented_collector_test.go
│   ├── pacemaker
│   │   ├── cib
│   │   │   ├── data.go
│   │   │   ├── parser.go
│   │   │   └── parser_test.go
│   │   ├── crmmon
│   │   │   ├── data.go
│   │   │   ├── parser.go
│   │   │   └── parser_test.go
│   │   ├── pacemaker.go
│   │   └── pacemaker_test.go
│   └── sbd
│       ├── sbd.go
│       └── sbd_test.go
├── dashboards
│   ├── README.md
│   ├── grafana-ha-cluster-details.json
│   ├── grafana-multi-cluster-overview.json
│   ├── provider-sleha.yaml
│   ├── screenshot-detail.png
│   └── screenshot-multi.png
├── doc
│   ├── design.md
│   ├── development.md
│   └── metrics.md
├── go.mod
├── go.sum
├── ha_cluster_exporter.service
├── ha_cluster_exporter.sysconfig
├── ha_cluster_exporter.yaml
├── internal
│   ├── assert
│   │   └── assertions.go
│   └── clock
│       ├── clock.go
│       ├── stop_clock.go
│       └── system_clock.go
├── main.go
├── main_test.go
├── packaging
│   └── obs
│       ├── grafana-ha-cluster-dashboards
│       │   ├── _service
│       │   ├── grafana-ha-cluster-dashboards.changes
│       │   └── grafana-ha-cluster-dashboards.spec
│       └── prometheus-ha_cluster_exporter
│           ├── _service
│           └── prometheus-ha_cluster_exporter.spec
├── supportconfig-ha_cluster_exporter
└── test
    ├── corosync.metrics
    ├── drbd-splitbrain
    │   ├── drbd-split-brain-detected-missingthingsWrongSkippedMetrics
    │   ├── drbd-split-brain-detected-resource01-vol01
    │   └── drbd-split-brain-detected-resource02-vol02
    ├── drbd.metrics
    ├── dummy
    ├── fake_cibadmin.sh
    ├── fake_corosync-cfgtool.sh
    ├── fake_corosync-quorumtool.sh
    ├── fake_crm_mon.sh
    ├── fake_drbdsetup.sh
    ├── fake_sbd.sh
    ├── fake_sbd_dump.sh
    ├── fake_sbdconfig
    ├── mock_collector
    │   └── instrumented_collector.go
    ├── pacemaker.metrics
    ├── sbd.metrics
    └── test_config.yaml
/LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | https://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types.
35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. 
You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | Copyright 2019-2020 SUSE LLC 180 | 181 | Licensed under the Apache License, Version 2.0 (the "License"); 182 | you may not use this file except in compliance with the License. 183 | You may obtain a copy of the License at 184 | 185 | https://www.apache.org/licenses/LICENSE-2.0 186 | 187 | Unless required by applicable law or agreed to in writing, software 188 | distributed under the License is distributed on an "AS IS" BASIS, 189 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 190 | See the License for the specific language governing permissions and 191 | limitations under the License. 
192 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | GO := GO111MODULE=on go 2 | FIRST_GOPATH := $(firstword $(subst :, ,$(shell $(GO) env GOPATH))) 3 | GOHOSTOS ?= $(shell $(GO) env GOHOSTOS) 4 | GOHOSTARCH ?= $(shell $(GO) env GOHOSTARCH) 5 | ifeq (arm, $(GOHOSTARCH)) 6 | GOHOSTARM ?= $(shell GOARM= $(GO) env GOARM) 7 | GO_BUILD_PLATFORM ?= $(GOHOSTOS)-$(GOHOSTARCH)v$(GOHOSTARM) 8 | else 9 | GO_BUILD_PLATFORM ?= $(GOHOSTOS)-$(GOHOSTARCH) 10 | endif 11 | PROMU := $(FIRST_GOPATH)/bin/promu 12 | PROMU_VERSION ?= 0.13.0 13 | PROMU_URL := https://github.com/prometheus/promu/releases/download/v$(PROMU_VERSION)/promu-$(PROMU_VERSION).$(GO_BUILD_PLATFORM).tar.gz 14 | 15 | # this is what ends up in the RPM "Version" field and embedded in the --version CLI flag 16 | VERSION ?= $(shell .ci/get_version_from_git.sh) 17 | 18 | # if you want to release to OBS, this must be a remotely available Git reference 19 | REVISION ?= $(shell git rev-parse --abbrev-ref HEAD) 20 | 21 | # we only use this to comply with RPM changelog conventions at SUSE 22 | AUTHOR ?= shap-staff@suse.de 23 | 24 | # you can customize any of the following to build forks 25 | OBS_PROJECT ?= devel:sap:monitoring:factory 26 | REPOSITORY ?= clusterlabs/ha_cluster_exporter 27 | 28 | # the Go archs we cross-compile to 29 | ARCHS ?= amd64 arm64 ppc64le s390x 30 | 31 | default: clean mod-tidy generate fmt vet-check test build 32 | 33 | promu-prepare: 34 | sed "s/{{.Version}}/$(VERSION)/" .promu.yml >.promu.release.yml 35 | mkdir -p build/bin 36 | 37 | # from https://github.com/prometheus/prometheus/blob/main/Makefile.common 38 | $(PROMU): 39 | $(eval PROMU_TMP := $(shell mktemp -d)) 40 | curl -s -L $(PROMU_URL) | tar -xvzf - -C $(PROMU_TMP) 41 | mkdir -p $(FIRST_GOPATH)/bin 42 | cp $(PROMU_TMP)/promu-$(PROMU_VERSION).$(GO_BUILD_PLATFORM)/promu $(FIRST_GOPATH)/bin/promu 43 | rm -r $(PROMU_TMP) 44 | 45 | build: 46 | $(MAKE) clean 47 | $(MAKE) promu-prepare $(PROMU) 48 | $(MAKE) amd64 49 | 50 | build-all: 51 | $(MAKE) clean 52 | $(MAKE) promu-prepare $(PROMU) 53 | $(MAKE) $(ARCHS) 54 | 55 | $(ARCHS): 56 | GOOS=linux GOARCH=$@ $(PROMU) build --config .promu.release.yml --prefix=build/bin ha_cluster_exporter-$@ 57 | 58 | install: 59 | $(GO) install 60 | 61 | static-checks: vet-check fmt-check 62 | 63 | vet-check: 64 | $(GO) vet ./... 65 | 66 | fmt: 67 | $(GO) fmt ./... 68 | 69 | mod-tidy: 70 | $(GO) mod tidy 71 | 72 | fmt-check: 73 | .ci/go_lint.sh 74 | 75 | generate: 76 | $(GO) generate ./... 77 | 78 | test: 79 | $(GO) test -v ./... 80 | 81 | checks: static-checks test 82 | 83 | coverage: 84 | @mkdir -p build 85 | $(GO) test -cover -coverprofile=build/coverage ./...
86 | $(GO) tool cover -html=build/coverage 87 | 88 | clean: 89 | $(GO) clean 90 | rm -rf build 91 | rm -f .promu.release.yml 92 | 93 | exporter-obs-workdir: build/obs/prometheus-ha_cluster_exporter 94 | build/obs/prometheus-ha_cluster_exporter: 95 | @mkdir -p $@ 96 | osc checkout $(OBS_PROJECT) prometheus-ha_cluster_exporter -o $@ 97 | rm -f $@/*.tar.gz 98 | cp -rv packaging/obs/prometheus-ha_cluster_exporter/* $@/ 99 | # we interpolate environment variables in OBS _service file so that we control what is downloaded by the tar_scm source service 100 | sed -i 's~%%VERSION%%~$(VERSION)~' $@/_service 101 | sed -i 's~%%REVISION%%~$(REVISION)~' $@/_service 102 | sed -i 's~%%REPOSITORY%%~$(REPOSITORY)~' $@/_service 103 | go mod vendor 104 | tar --sort=name --mtime='UTC 1970-01-01' -c vendor | gzip -n > $@/vendor.tar.gz 105 | cd $@; osc service manualrun 106 | 107 | exporter-obs-changelog: exporter-obs-workdir 108 | .ci/gh_release_to_obs_changeset.py $(REPOSITORY) -a $(AUTHOR) -t $(REVISION) -f build/obs/prometheus-ha_cluster_exporter/prometheus-ha_cluster_exporter.changes 109 | 110 | exporter-obs-commit: exporter-obs-workdir 111 | cd build/obs/prometheus-ha_cluster_exporter; osc addremove 112 | cd build/obs/prometheus-ha_cluster_exporter; osc commit -m "Update from git rev $(REVISION)" 113 | 114 | dashboards-obs-workdir: build/obs/grafana-ha-cluster-dashboards 115 | build/obs/grafana-ha-cluster-dashboards: 116 | @mkdir -p $@ 117 | osc checkout $(OBS_PROJECT) grafana-ha-cluster-dashboards -o $@ 118 | rm -f $@/*.tar.gz 119 | cp -rv packaging/obs/grafana-ha-cluster-dashboards/* $@/ 120 | # we interpolate environment variables in OBS _service file so that we control what is downloaded by the tar_scm source service 121 | sed -i 's~%%REVISION%%~$(REVISION)~' $@/_service 122 | sed -i 's~%%REPOSITORY%%~$(REPOSITORY)~' $@/_service 123 | cd $@; osc service manualrun 124 | 125 | dashboards-obs-commit: dashboards-obs-workdir 126 | cd build/obs/grafana-ha-cluster-dashboards; osc addremove 127 | cd build/obs/grafana-ha-cluster-dashboards; osc commit -m "Update from git rev $(REVISION)" 128 | 129 | .PHONY: $(ARCHS) build build-all checks clean coverage dashboards-obs-commit dashboards-obs-workdir default download \ 130 | exporter-obs-changelog exporter-obs-commit exporter-obs-workdir fmt fmt-check generate install mod-tidy \ 131 | static-checks test vet-check 132 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ha_cluster_exporter 2 | 3 | [![Exporter CI](https://github.com/ClusterLabs/ha_cluster_exporter/workflows/Exporter%20CI/badge.svg)](https://github.com/ClusterLabs/ha_cluster_exporter/actions?query=workflow%3A%22Exporter+CI%22) 4 | [![Dashboards CI](https://github.com/ClusterLabs/ha_cluster_exporter/workflows/Dashboards%20CI/badge.svg)](https://github.com/ClusterLabs/ha_cluster_exporter/actions?query=workflow%3A%22Dashboards+CI%22) 5 | 6 | This is a bespoke Prometheus exporter that enables the monitoring of Pacemaker-based HA clusters. 7 | 8 | ## Table of Contents 9 | 1. [Features](#features) 10 | 2. [Installation](#installation) 11 | 3. [Usage](#usage) 12 | 1. [Metrics](doc/metrics.md) 13 | 2. [Dashboards](dashboards/README.md) 14 | 4. [Development](#development) 15 | 1. [Design](doc/design.md) 16 | 2. [Development](doc/development.md) 17 | 5. [License](#license) 18 | 19 | ## Features 20 | 21 | The exporter is a stateless HTTP endpoint.
On each HTTP request, it locally inspects the cluster status by parsing pre-existing distributed data provided by the tools of the various cluster components. 22 | 23 | Exported data include: 24 | - Pacemaker cluster summary, nodes and resources stats 25 | - Corosync ring errors and quorum votes 26 | - SBD devices health status 27 | - DRBD resources and connections stats 28 | (note: only DRBD v9 is supported; for v8.4, please refer to the [Prometheus Node Exporter](https://github.com/prometheus/node_exporter) project) 29 | 30 | A comprehensive list of all the metrics can be found in the [metrics document](doc/metrics.md). 31 | 32 | ## Installation 33 | 34 | The project can be installed in many ways, including but not limited to: 35 | 36 | 1. [Manual clone & build](#manual-clone--build) 37 | 2. [Go](#go) 38 | 3. [RPM](#rpm) 39 | 40 | ### Manual clone & build 41 | 42 | ``` 43 | git clone https://github.com/ClusterLabs/ha_cluster_exporter 44 | cd ha_cluster_exporter 45 | make 46 | make install 47 | ``` 48 | 49 | ### Go 50 | 51 | ``` 52 | go get github.com/ClusterLabs/ha_cluster_exporter 53 | ``` 54 | 55 | ### RPM 56 | 57 | On openSUSE or SUSE Linux Enterprise you can just use the `zypper` system package manager: 58 | ```shell 59 | zypper install prometheus-ha_cluster_exporter 60 | ``` 61 | 62 | You can find the latest development repositories at [SUSE's Open Build Service](https://build.opensuse.org/package/show/network:ha-clustering:sap-deployments:devel/prometheus-ha_cluster_exporter). 63 | 64 | ## Usage 65 | 66 | You can run the exporter on any of the cluster nodes. 67 | 68 | ``` 69 | $ ./ha_cluster_exporter 70 | INFO[0000] Serving metrics on 0.0.0.0:9664 71 | ``` 72 | 73 | Though not strictly required, it is _strongly_ advised to run it on all the nodes. 74 | 75 | It will export the metrics under the `/metrics` path, on port `9664` by default. 76 | 77 | While the exporter can run outside an HA cluster node, it won't export any metric it can't collect; e.g. it won't export DRBD metrics if DRBD can't be inspected locally with `drbdsetup`. 78 | A warning message will inform the user of such cases. 79 | 80 | Please refer to [doc/metrics.md](doc/metrics.md) for extensive details about all the exported metrics. 81 | 82 | To see a practical example of how to consume the metrics, we also provide a couple of [Grafana dashboards](dashboards). 83 | 84 | **Hint:** 85 | You can deploy a full HA Cluster via Terraform with [SUSE/ha-sap-terraform-deployments](https://github.com/SUSE/ha-sap-terraform-deployments). 86 | 87 | ### Configuration 88 | 89 | All the runtime parameters can be configured either via CLI flags or via a configuration file, both of which are completely optional. 90 | 91 | For more details, refer to the help message via `ha_cluster_exporter --help`. 92 | 93 | **Note**: 94 | the built-in defaults are tailored for the latest version of SUSE Linux Enterprise and openSUSE. 95 | 96 | The program will scan, in order, the current working directory, `$HOME/.config`, `/etc` and `/usr/etc` for files named `ha_cluster_exporter.(yaml|json|toml)`. 97 | The first match has precedence, and the CLI flags have precedence over the config file. 98 | 99 | Please refer to the example [YAML configuration](ha_cluster_exporter.yaml) for more details. 100 | 101 | Additional CLI flags can also be passed via `/etc/sysconfig/prometheus-ha_cluster_exporter`.
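For illustration, this is one way the discovery order above could be implemented; a minimal, hypothetical sketch using [spf13/viper](https://github.com/spf13/viper), which supports the `yaml|json|toml` formats mentioned — whether `main.go` actually uses viper is not shown in this section, and `web.listen-address` is an assumed config key mirroring the CLI flag below:

```go
package main

import (
	"fmt"

	"github.com/spf13/viper"
)

func main() {
	v := viper.New()
	// matches ha_cluster_exporter.(yaml|json|toml)
	v.SetConfigName("ha_cluster_exporter")
	// search paths are scanned in order; the first match wins
	v.AddConfigPath(".")
	v.AddConfigPath("$HOME/.config")
	v.AddConfigPath("/etc")
	v.AddConfigPath("/usr/etc")
	if err := v.ReadInConfig(); err != nil {
		fmt.Println("no config file found; relying on built-in defaults and CLI flags")
	}
	// hypothetical key; CLI flags would still take precedence over this value
	fmt.Println("listen address:", v.GetString("web.listen-address"))
}
```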
102 | 103 | #### General Flags 104 | 105 | Name | Description 106 | ---- | ----------- 107 | web.listen-address | Address to listen on for web interface and telemetry (default `:9664`). 108 | web.telemetry-path | Path under which to expose metrics (default `/metrics`). 109 | web.config.file | Path to a [web configuration file](#tls-and-basic-authentication) (default `/etc/ha_cluster_exporter.web.yaml`). 110 | log.level | Logging verbosity (default `info`). 111 | version | Print the version information. 112 | 113 | ##### Deprecated Flags 114 | Name | Description 115 | ---- | ----------- 116 | address | deprecated: please use --web.listen-address or --web.config.file to use Prometheus Exporter Toolkit 117 | port | deprecated: please use --web.listen-address or --web.config.file to use Prometheus Exporter Toolkit 118 | log-level | deprecated: please use log.level 119 | enable-timestamps | deprecated: server-side metric timestamping is discouraged by Prometheus best-practices and should be avoided 120 | 121 | #### Collector Flags 122 | 123 | Name | Description 124 | ---- | ----------- 125 | crm-mon-path | Path to crm_mon executable (default `/usr/sbin/crm_mon`). 126 | cibadmin-path | Path to cibadmin executable (default `/usr/sbin/cibadmin`). 127 | corosync-cfgtoolpath-path | Path to corosync-cfgtool executable (default `/usr/sbin/corosync-cfgtool`). 128 | corosync-quorumtool-path | Path to corosync-quorumtool executable (default `/usr/sbin/corosync-quorumtool`). 129 | sbd-path | Path to sbd executable (default `/usr/sbin/sbd`). 130 | sbd-config-path | Path to sbd configuration (default `/etc/sysconfig/sbd`). 131 | drbdsetup-path | Path to drbdsetup executable (default `/sbin/drbdsetup`). 132 | drbdsplitbrain-path | Path to drbd splitbrain hooks temporary files (default `/var/run/drbd/splitbrain`). 133 | 134 | ### TLS and basic authentication 135 | 136 | The ha_cluster_exporter supports TLS and basic authentication. 137 | 138 | To use TLS and/or basic authentication, you need to pass a configuration file 139 | using the `--web.config.file` parameter. The format of the file is described 140 | [in the exporter-toolkit repository](https://github.com/prometheus/exporter-toolkit/blob/master/docs/web-configuration.md). 141 | 142 | ### systemd integration 143 | 144 | A [systemd unit file](ha_cluster_exporter.service) is provided with the RPM packages. You can enable and start it as usual: 145 | 146 | ``` 147 | systemctl --now enable prometheus-ha_cluster_exporter 148 | ``` 149 | 150 | ## Development 151 | 152 | Pull requests are more than welcome! 153 | 154 | We recommend having a look at the [design document](doc/design.md) and the [development notes](doc/development.md) before contributing. 155 | 156 | ## License 157 | 158 | Copyright 2019-2022 SUSE LLC 159 | 160 | Licensed under the Apache License, Version 2.0 (the "License"); 161 | you may not use this file except in compliance with the License. 162 | You may obtain a copy of the License at 163 | 164 | https://www.apache.org/licenses/LICENSE-2.0 165 | 166 | Unless required by applicable law or agreed to in writing, software 167 | distributed under the License is distributed on an "AS IS" BASIS, 168 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 169 | See the License for the specific language governing permissions and 170 | limitations under the License. 
171 | -------------------------------------------------------------------------------- /collector/corosync/corosync.go: -------------------------------------------------------------------------------- 1 | package corosync 2 | 3 | import ( 4 | "os/exec" 5 | 6 | "github.com/go-kit/log" 7 | "github.com/go-kit/log/level" 8 | "github.com/pkg/errors" 9 | "github.com/prometheus/client_golang/prometheus" 10 | 11 | "github.com/ClusterLabs/ha_cluster_exporter/collector" 12 | ) 13 | 14 | const subsystem = "corosync" 15 | 16 | func NewCollector(cfgToolPath string, quorumToolPath string, timestamps bool, logger log.Logger) (*corosyncCollector, error) { 17 | err := collector.CheckExecutables(cfgToolPath, quorumToolPath) 18 | if err != nil { 19 | return nil, errors.Wrapf(err, "could not initialize '%s' collector", subsystem) 20 | } 21 | 22 | c := &corosyncCollector{ 23 | collector.NewDefaultCollector(subsystem, timestamps, logger), 24 | cfgToolPath, 25 | quorumToolPath, 26 | NewParser(), 27 | } 28 | c.SetDescriptor("quorate", "Whether or not the cluster is quorate", nil) 29 | c.SetDescriptor("rings", "The status of each Corosync ring; 1 means healthy, 0 means faulty.", []string{"ring_id", "node_id", "number", "address"}) 30 | c.SetDescriptor("ring_errors", "The total number of faulty Corosync rings", nil) 31 | c.SetDescriptor("member_votes", "How many votes each member node has contributed to the current quorum", []string{"node_id", "node", "local"}) 32 | c.SetDescriptor("quorum_votes", "Cluster quorum votes; one line per type", []string{"type"}) 33 | 34 | return c, nil 35 | } 36 | 37 | type corosyncCollector struct { 38 | collector.DefaultCollector 39 | cfgToolPath string 40 | quorumToolPath string 41 | parser Parser 42 | } 43 | 44 | func (c *corosyncCollector) CollectWithError(ch chan<- prometheus.Metric) error { 45 | level.Debug(c.Logger).Log("msg", "Collecting corosync metrics...") 46 | 47 | // We suppress the exec errors because if any interface is faulty the tools will exit with code 1, but we still want to parse the output.
48 | cfgToolOutput, _ := exec.Command(c.cfgToolPath, "-s").Output() 49 | quorumToolOutput, _ := exec.Command(c.quorumToolPath, "-p").Output() 50 | 51 | status, err := c.parser.Parse(cfgToolOutput, quorumToolOutput) 52 | if err != nil { 53 | return errors.Wrap(err, "corosync parser error") 54 | } 55 | 56 | c.collectRings(status, ch) 57 | c.collectRingErrors(status, ch) 58 | c.collectQuorate(status, ch) 59 | c.collectQuorumVotes(status, ch) 60 | c.collectMemberVotes(status, ch) 61 | 62 | return nil 63 | } 64 | 65 | func (c *corosyncCollector) Collect(ch chan<- prometheus.Metric) { 66 | level.Debug(c.Logger).Log("msg", "Collecting corosync metrics...") 67 | 68 | err := c.CollectWithError(ch) 69 | if err != nil { 70 | level.Warn(c.Logger).Log("msg", c.GetSubsystem()+" collector scrape failed", "err", err) 71 | } 72 | } 73 | 74 | func (c *corosyncCollector) collectQuorumVotes(status *Status, ch chan<- prometheus.Metric) { 75 | ch <- c.MakeGaugeMetric("quorum_votes", float64(status.QuorumVotes.ExpectedVotes), "expected_votes") 76 | ch <- c.MakeGaugeMetric("quorum_votes", float64(status.QuorumVotes.HighestExpected), "highest_expected") 77 | ch <- c.MakeGaugeMetric("quorum_votes", float64(status.QuorumVotes.TotalVotes), "total_votes") 78 | ch <- c.MakeGaugeMetric("quorum_votes", float64(status.QuorumVotes.Quorum), "quorum") 79 | } 80 | 81 | func (c *corosyncCollector) collectQuorate(status *Status, ch chan<- prometheus.Metric) { 82 | var quorate float64 83 | if status.Quorate { 84 | quorate = 1 85 | } 86 | ch <- c.MakeGaugeMetric("quorate", quorate) 87 | } 88 | 89 | func (c *corosyncCollector) collectRingErrors(status *Status, ch chan<- prometheus.Metric) { 90 | var numErrors float64 91 | for _, ring := range status.Rings { 92 | if ring.Faulty { 93 | numErrors += 1 94 | } 95 | } 96 | ch <- c.MakeGaugeMetric("ring_errors", numErrors) 97 | } 98 | 99 | func (c *corosyncCollector) collectRings(status *Status, ch chan<- prometheus.Metric) { 100 | for _, ring := range status.Rings { 101 | var healthy float64 = 1 102 | if ring.Faulty { 103 | healthy = 0 104 | } 105 | ch <- c.MakeGaugeMetric("rings", healthy, status.RingId, status.NodeId, ring.Number, ring.Address) 106 | } 107 | } 108 | 109 | func (c *corosyncCollector) collectMemberVotes(status *Status, ch chan<- prometheus.Metric) { 110 | for _, member := range status.Members { 111 | local := "false" 112 | if member.Local { 113 | local = "true" 114 | } 115 | ch <- c.MakeGaugeMetric("member_votes", float64(member.Votes), member.Id, member.Name, local) 116 | } 117 | } 118 | -------------------------------------------------------------------------------- /collector/corosync/corosync_test.go: -------------------------------------------------------------------------------- 1 | package corosync 2 | 3 | import ( 4 | "testing" 5 | 6 | "github.com/go-kit/log" 7 | "github.com/stretchr/testify/assert" 8 | 9 | assertcustom "github.com/ClusterLabs/ha_cluster_exporter/internal/assert" 10 | ) 11 | 12 | func TestNewCorosyncCollector(t *testing.T) { 13 | _, err := NewCollector("../../test/fake_corosync-cfgtool.sh", "../../test/fake_corosync-quorumtool.sh", false, log.NewNopLogger()) 14 | assert.Nil(t, err) 15 | } 16 | 17 | func TestNewCorosyncCollectorChecksCfgtoolExistence(t *testing.T) { 18 | _, err := NewCollector("../../test/nonexistent", "../../test/fake_corosync-quorumtool.sh", false, log.NewNopLogger()) 19 | 20 | assert.Error(t, err) 21 | assert.Contains(t, err.Error(), "'../../test/nonexistent' does not exist") 22 | } 23 | 24 | func 
TestNewCorosyncCollectorChecksQuorumtoolExistence(t *testing.T) { 25 | _, err := NewCollector("../../test/fake_corosync-cfgtool.sh", "../../test/nonexistent", false, log.NewNopLogger()) 26 | 27 | assert.Error(t, err) 28 | assert.Contains(t, err.Error(), "'../../test/nonexistent' does not exist") 29 | } 30 | 31 | func TestNewCorosyncCollectorChecksCfgtoolExecutableBits(t *testing.T) { 32 | _, err := NewCollector("../../test/dummy", "../../test/fake_corosync-quorumtool.sh", false, log.NewNopLogger()) 33 | 34 | assert.Error(t, err) 35 | assert.Contains(t, err.Error(), "'../../test/dummy' is not executable") 36 | } 37 | 38 | func TestNewCorosyncCollectorChecksQuorumtoolExecutableBits(t *testing.T) { 39 | _, err := NewCollector("../../test/fake_corosync-cfgtool.sh", "../../test/dummy", false, log.NewNopLogger()) 40 | 41 | assert.Error(t, err) 42 | assert.Contains(t, err.Error(), "'../../test/dummy' is not executable") 43 | } 44 | 45 | func TestCorosyncCollector(t *testing.T) { 46 | collector, _ := NewCollector("../../test/fake_corosync-cfgtool.sh", "../../test/fake_corosync-quorumtool.sh", false, log.NewNopLogger()) 47 | assertcustom.Metrics(t, collector, "corosync.metrics") 48 | } 49 | -------------------------------------------------------------------------------- /collector/corosync/parser.go: -------------------------------------------------------------------------------- 1 | package corosync 2 | 3 | import ( 4 | "regexp" 5 | "strconv" 6 | "strings" 7 | 8 | "github.com/pkg/errors" 9 | ) 10 | 11 | type Parser interface { 12 | Parse(cfgToolOutput []byte, quorumToolOutput []byte) (*Status, error) 13 | } 14 | 15 | type Status struct { 16 | NodeId string 17 | RingId string 18 | Rings []Ring 19 | QuorumVotes QuorumVotes 20 | Quorate bool 21 | Members []Member 22 | } 23 | 24 | type QuorumVotes struct { 25 | ExpectedVotes uint64 26 | HighestExpected uint64 27 | TotalVotes uint64 28 | Quorum uint64 29 | } 30 | 31 | type Ring struct { 32 | Number string 33 | Address string 34 | Faulty bool 35 | } 36 | 37 | type Member struct { 38 | Id string 39 | Name string 40 | Qdevice string 41 | Votes uint64 42 | Local bool 43 | } 44 | 45 | func NewParser() Parser { 46 | return &defaultParser{} 47 | } 48 | 49 | type defaultParser struct{} 50 | 51 | func (p *defaultParser) Parse(cfgToolOutput []byte, quorumToolOutput []byte) (*Status, error) { 52 | status := &Status{} 53 | var err error 54 | 55 | status.NodeId, err = parseNodeId(quorumToolOutput) 56 | if err != nil { 57 | return nil, errors.Wrap(err, "could not parse node id in corosync-quorumtool output") 58 | } 59 | 60 | status.RingId, err = parseRingId(quorumToolOutput) 61 | if err != nil { 62 | return nil, errors.Wrap(err, "could not parse ring id and seq number in corosync-quorumtool output") 63 | } 64 | 65 | status.Quorate, err = parseQuorate(quorumToolOutput) 66 | if err != nil { 67 | return nil, errors.Wrap(err, "could not parse quorate in corosync-quorumtool output") 68 | } 69 | 70 | status.QuorumVotes, err = parseQuoromVotes(quorumToolOutput) 71 | if err != nil { 72 | return nil, errors.Wrap(err, "could not parse quorum votes in corosync-quorumtool output") 73 | } 74 | 75 | status.Members, err = parseMembers(quorumToolOutput) 76 | if err != nil { 77 | return nil, errors.Wrap(err, "could not parse members in corosync-quorumtool output") 78 | } 79 | 80 | status.Rings = parseRings(cfgToolOutput) 81 | 82 | return status, nil 83 | } 84 | 85 | func parseNodeId(quorumToolOutput []byte) (string, error) { 86 | nodeRe := regexp.MustCompile(`(?m)Node ID:\s+(\w+)`) 
87 | matches := nodeRe.FindSubmatch(quorumToolOutput) 88 | if matches == nil { 89 | return "", errors.New("could not find Node ID line") 90 | } 91 | 92 | return string(matches[1]), nil 93 | } 94 | 95 | func parseRingId(quorumToolOutput []byte) (string, error) { 96 | // the following regex matches and captures the ring id from this kind of output from corosync-quorumtool 97 | // the ring id is composed of the representative node id (not to be confused with the local node id) and the sequence number 98 | /* 99 | Quorum information 100 | ------------------ 101 | Date: Sun Sep 29 16:10:37 2019 102 | Quorum provider: corosync_votequorum 103 | Nodes: 2 104 | Node ID: 1084780051 105 | Ring ID: 1084780051.44 106 | Quorate: Yes 107 | */ 108 | // in corosync < v2.99 the line is slightly different: 109 | /* 110 | Ring ID: 1084780051/44 111 | */ 112 | // in corosync < v2.4 there is no representative node id: 113 | /* 114 | Ring ID: 1084780051 115 | */ 116 | 117 | // given the differences in format between corosync versions, we just parse it as a whole string 118 | re := regexp.MustCompile(`(?m)Ring ID:\s+\b(.+)\b`) 119 | matches := re.FindSubmatch(quorumToolOutput) 120 | if matches == nil { 121 | return "", errors.New("could not find Ring ID line") 122 | } 123 | 124 | return string(matches[1]), nil 125 | } 126 | 127 | func parseQuorate(quorumToolOutput []byte) (bool, error) { 128 | re := regexp.MustCompile(`(?m)Quorate:\s+(Yes|No)`) 129 | matches := re.FindSubmatch(quorumToolOutput) 130 | if matches == nil { 131 | return false, errors.New("could not find Quorate line") 132 | } 133 | 134 | if string(matches[1]) == "Yes" { 135 | return true, nil 136 | } 137 | 138 | return false, nil 139 | } 140 | 141 | func parseRings(cfgToolOutput []byte) []Ring { 142 | // the following regex matches and captures all the relevant elements of this kind of output from corosync-cfgtool 143 | /* 144 | RING ID 0 145 | id = 192.168.125.15 146 | status = ring 0 active with no faults 147 | */ 148 | // in corosync v2.99.0+ this has changed to 149 | /* 150 | Link ID 0 151 | addr = 192.168.125.15 152 | status = ring 0 active with no faults 153 | */ 154 | re := regexp.MustCompile(`(?m)(?P<type>RING|Link) ID (?P<number>\d+)\s+(?P<addr_type>id|addr)\s+= (?P<address>
.+)\s+status\s+= (?P<status>.+)`) 155 | matches := re.FindAllSubmatch(cfgToolOutput, -1) 156 | rings := make([]Ring, len(matches)) 157 | for i, match := range matches { 158 | namedMatches := extractRENamedCaptureGroups(re, match) 159 | 160 | rings[i] = Ring{ 161 | Number: namedMatches["number"], 162 | Address: namedMatches["address"], 163 | Faulty: strings.Contains(namedMatches["status"], "FAULTY"), 164 | } 165 | } 166 | return rings 167 | } 168 | 169 | func parseQuoromVotes(quorumToolOutput []byte) (quorumVotes QuorumVotes, err error) { 170 | // the following regex matches and captures all the relevant elements of this kind of output from corosync-quorumtool 171 | /* 172 | Votequorum information 173 | ---------------------- 174 | Expected votes: 2 175 | Highest expected: 2 176 | Total votes: 1 177 | Quorum: 1 178 | Flags: 2Node Quorate 179 | */ 180 | re := regexp.MustCompile(`(?m)Expected votes:\s+(\d+)\s+Highest expected:\s+(\d+)\s+Total votes:\s+(\d+)\s+Quorum:\s+(\d+)`) 181 | 182 | matches := re.FindSubmatch(quorumToolOutput) 183 | if matches == nil { 184 | return quorumVotes, errors.New("could not find quorum votes numbers") 185 | } 186 | 187 | quorumVotes.ExpectedVotes, err = strconv.ParseUint(string(matches[1]), 10, 64) 188 | if err != nil { 189 | return quorumVotes, errors.Wrap(err, "could not parse vote number to uint64") 190 | } 191 | 192 | quorumVotes.HighestExpected, err = strconv.ParseUint(string(matches[2]), 10, 64) 193 | if err != nil { 194 | return quorumVotes, errors.Wrap(err, "could not parse vote number to uint64") 195 | } 196 | 197 | quorumVotes.TotalVotes, err = strconv.ParseUint(string(matches[3]), 10, 64) 198 | if err != nil { 199 | return quorumVotes, errors.Wrap(err, "could not parse vote number to uint64") 200 | } 201 | 202 | quorumVotes.Quorum, err = strconv.ParseUint(string(matches[4]), 10, 64) 203 | if err != nil { 204 | return quorumVotes, errors.Wrap(err, "could not parse vote number to uint64") 205 | } 206 | 207 | return quorumVotes, nil 208 | } 209 | 210 | func parseMembers(quorumToolOutput []byte) (members []Member, err error) { 211 | // the following regex matches and captures all the relevant elements of this kind of output from corosync-quorumtool 212 | /* 213 | Membership information 214 | ---------------------- 215 | Nodeid Votes Qdevice Name 216 | 1 1 A,V,NMW nfs01 (local) 217 | 2 1 A,V,NMW nfs02 218 | 0 1 Qdevice 219 | */ 220 | sectionRE := regexp.MustCompile(`(?m)Membership information\n-+\s+Nodeid\s+Votes\s+Qdevice\s+Name\n+((?:.*\n?)+)`) 221 | sectionMatch := sectionRE.FindSubmatch(quorumToolOutput) 222 | if sectionMatch == nil { 223 | return nil, errors.New("could not find membership information") 224 | } 225 | 226 | // we also need a second regex to capture the single elements of each node line, e.g.: 227 | /* 228 | 1 1 A,V,NMW 192.168.125.24 (local) 229 | */ 230 | linesRE := regexp.MustCompile(`(?m)(?P<node_id>\w+)\s+(?P<votes>\d+)\s+(?P<qdevice>(\w,?)+)?\s+(?P<name>[^\s]+)(?:\s(?P<local>\(local\)))?\n?`) 231 | linesMatches := linesRE.FindAllSubmatch(sectionMatch[1], -1) 232 | for _, match := range linesMatches { 233 | matches := extractRENamedCaptureGroups(linesRE, match) 234 | 235 | votes, err := strconv.ParseUint(matches["votes"], 10, 64) 236 | if err != nil { 237 | return nil, errors.Wrap(err, "could not parse vote number to uint64") 238 | } 239 | 240 | var local bool 241 | if matches["local"] != "" { 242 | local = true 243 | } 244 | 245 | members = append(members, Member{ 246 | Id: matches["node_id"], 247 | Name: matches["name"], 248 | Votes: votes, 249 | Local: local, 250 | Qdevice:
matches["qdevice"], 251 | }) 252 | } 253 | 254 | return members, nil 255 | } 256 | 257 | // extracts (?P) RegEx capture groups from a match, to avoid numerical index lookups 258 | func extractRENamedCaptureGroups(ringsRe *regexp.Regexp, match [][]byte) map[string]string { 259 | namedMatches := make(map[string]string) 260 | for i, name := range ringsRe.SubexpNames() { 261 | if i != 0 && name != "" { 262 | namedMatches[name] = string(match[i]) 263 | } 264 | } 265 | return namedMatches 266 | } 267 | -------------------------------------------------------------------------------- /collector/corosync/parser_test.go: -------------------------------------------------------------------------------- 1 | package corosync 2 | 3 | import ( 4 | "strconv" 5 | "testing" 6 | 7 | "github.com/stretchr/testify/assert" 8 | ) 9 | 10 | func TestParse(t *testing.T) { 11 | p := NewParser() 12 | 13 | cfgToolOutput := []byte(`Printing link status. 14 | Local node ID 1084780051 15 | Link ID 0 16 | addr = 10.0.0.1 17 | status = OK 18 | Link ID 1 19 | addr = 172.16.0.1 20 | status = OK`) 21 | 22 | quoromToolOutput := []byte(`Quorum information 23 | ------------------ 24 | Date: Sun Sep 29 16:10:37 2019 25 | Quorum provider: corosync_votequorum 26 | Nodes: 2 27 | Node ID: 1084780051 28 | Ring ID: 1084780051.44 29 | Quorate: Yes 30 | 31 | Votequorum information 32 | ---------------------- 33 | Expected votes: 232 34 | Highest expected: 22 35 | Total votes: 21 36 | Quorum: 421 37 | Flags: 2Node Quorate WaitForAll 38 | 39 | Membership information 40 | ---------------------- 41 | Nodeid Votes Qdevice Name 42 | 1084780051 1 NR dma-dog-hana01 (local) 43 | 1084780052 1 A,V,NMW dma-dog-hana02`) 44 | 45 | status, err := p.Parse(cfgToolOutput, quoromToolOutput) 46 | assert.NoError(t, err) 47 | 48 | rings := status.Rings 49 | 50 | assert.Len(t, rings, 2) 51 | assert.Equal(t, "0", rings[0].Number) 52 | assert.Equal(t, "10.0.0.1", rings[0].Address) 53 | assert.False(t, rings[0].Faulty) 54 | assert.Equal(t, "1", rings[1].Number) 55 | assert.Equal(t, "172.16.0.1", rings[1].Address) 56 | assert.False(t, rings[1].Faulty) 57 | 58 | assert.True(t, status.Quorate) 59 | assert.Equal(t, "1084780051", status.NodeId) 60 | assert.Equal(t, "1084780051.44", status.RingId) 61 | assert.EqualValues(t, 232, status.QuorumVotes.ExpectedVotes) 62 | assert.EqualValues(t, 22, status.QuorumVotes.HighestExpected) 63 | assert.EqualValues(t, 21, status.QuorumVotes.TotalVotes) 64 | assert.EqualValues(t, 421, status.QuorumVotes.Quorum) 65 | 66 | members := status.Members 67 | assert.Len(t, members, 2) 68 | assert.Exactly(t, "1084780051", members[0].Id) 69 | assert.Exactly(t, "dma-dog-hana01", members[0].Name) 70 | assert.Exactly(t, "NR", members[0].Qdevice) 71 | assert.True(t, members[0].Local) 72 | assert.EqualValues(t, 1, members[0].Votes) 73 | assert.Exactly(t, "1084780052", members[1].Id) 74 | assert.Exactly(t, "dma-dog-hana02", members[1].Name) 75 | assert.Exactly(t, "A,V,NMW", members[1].Qdevice) 76 | assert.False(t, members[1].Local) 77 | assert.EqualValues(t, 1, members[1].Votes) 78 | } 79 | 80 | func TestParseRingIdInCorosyncV2_4(t *testing.T) { 81 | quoromToolOutput := []byte(`Quorum information 82 | ------------------ 83 | Date: Sun Sep 29 16:10:37 2019 84 | Quorum provider: corosync_votequorum 85 | Nodes: 2 86 | Node ID: 1084780051 87 | Ring ID: 1084780051/44 88 | Quorate: Yes 89 | 90 | Votequorum information 91 | ---------------------- 92 | Expected votes: 232 93 | Highest expected: 22 94 | Total votes: 21 95 | Quorum: 421 96 | Flags: 2Node 
Quorate WaitForAll 97 | 98 | Membership information 99 | ---------------------- 100 | Nodeid Votes Name 101 | 1084780051 1 dma-dog-hana01 (local) 102 | 1084780052 1 dma-dog-hana02`) 103 | 104 | ringId, err := parseRingId(quoromToolOutput) 105 | assert.NoError(t, err) 106 | 107 | assert.Equal(t, "1084780051/44", ringId) 108 | } 109 | 110 | func TestParseRingIdInCorosyncV2_3(t *testing.T) { 111 | quoromToolOutput := []byte(`Quorum information 112 | ------------------ 113 | Date: Wed May 27 14:16:10 2020 114 | Quorum provider: corosync_votequorum 115 | Nodes: 2 116 | Node ID: 1 117 | Ring ID: 100 118 | Quorate: Yes 119 | Votequorum information 120 | ---------------------- 121 | Expected votes: 2 122 | Highest expected: 2 123 | Total votes: 2 124 | Quorum: 1 125 | Flags: 2Node Quorate WaitForAll 126 | Membership information 127 | ---------------------- 128 | Nodeid Votes Name 129 | 1 1 10.1.2.4 (local) 130 | 2 1 10.1.2.5`) 131 | 132 | ringId, err := parseRingId(quoromToolOutput) 133 | assert.NoError(t, err) 134 | 135 | assert.Equal(t, "100", ringId) 136 | } 137 | 138 | func TestParseFaultyRings(t *testing.T) { 139 | cfgToolOutput := []byte(`Printing ring status. 140 | Local node ID 16777226 141 | Link ID 0 142 | addr = 10.0.0.1 143 | status = Marking ringid 0 interface 10.0.0.1 FAULTY 144 | Link ID 1 145 | addr = 172.16.0.1 146 | status = ring 1 active with no faults`) 147 | 148 | rings := parseRings(cfgToolOutput) 149 | 150 | assert.Len(t, rings, 2) 151 | assert.True(t, rings[0].Faulty) 152 | assert.False(t, rings[1].Faulty) 153 | } 154 | 155 | func TestParseFaultyRingsInCorosyncV2(t *testing.T) { 156 | cfgToolOutput := []byte(`Printing ring status. 157 | Local node ID 16777226 158 | RING ID 0 159 | id = 10.0.0.1 160 | status = Marking ringid 0 interface 10.0.0.1 FAULTY 161 | RING ID 1 162 | id = 172.16.0.1 163 | status = ring 1 active with no faults`) 164 | 165 | rings := parseRings(cfgToolOutput) 166 | 167 | assert.Len(t, rings, 2) 168 | assert.True(t, rings[0].Faulty) 169 | assert.False(t, rings[1].Faulty) 170 | } 171 | 172 | func TestParseNodeIdEmptyError(t *testing.T) { 173 | quoromToolOutput := []byte(``) 174 | 175 | _, err := parseNodeId(quoromToolOutput) 176 | assert.EqualError(t, err, "could not find Node ID line") 177 | } 178 | 179 | func TestParseNoQuorate(t *testing.T) { 180 | quoromToolOutput := []byte(`Quorate: No`) 181 | 182 | quorate, err := parseQuorate(quoromToolOutput) 183 | assert.NoError(t, err) 184 | assert.False(t, quorate) 185 | } 186 | 187 | func TestParseQuorateEmptyError(t *testing.T) { 188 | quoromToolOutput := []byte(``) 189 | 190 | _, err := parseQuorate(quoromToolOutput) 191 | assert.EqualError(t, err, "could not find Quorate line") 192 | } 193 | 194 | func TestParseQuorumVotesEmptyError(t *testing.T) { 195 | quoromToolOutput := []byte(``) 196 | 197 | _, err := parseQuoromVotes(quoromToolOutput) 198 | assert.EqualError(t, err, "could not find quorum votes numbers") 199 | } 200 | 201 | func TestParseRingIdEmptyError(t *testing.T) { 202 | quoromToolOutput := []byte(``) 203 | 204 | _, err := parseRingId(quoromToolOutput) 205 | assert.EqualError(t, err, "could not find Ring ID line") 206 | } 207 | 208 | func TestParseQuorumVotesUintErrors(t *testing.T) { 209 | quorumToolOutputs := [][]byte{ 210 | []byte(` 211 | Expected votes: 10000000000000000000000000000000000000000000000 212 | Highest expected: 1 213 | Total votes: 1 214 | Quorum: 1 215 | `), 216 | []byte(` 217 | Expected votes: 1 218 | Highest expected: 10000000000000000000000000000000000000000000000 219 | Total 
votes: 1 220 | Quorum: 1 221 | `), 222 | []byte(` 223 | Expected votes: 1 224 | Highest expected: 1 225 | Total votes: 10000000000000000000000000000000000000000000000 226 | Quorum: 1 227 | `), 228 | []byte(` 229 | Expected votes: 1 230 | Highest expected: 1 231 | Total votes: 1 232 | Quorum: 10000000000000000000000000000000000000000000000 233 | `), 234 | } 235 | for i, quorumToolOutput := range quorumToolOutputs { 236 | t.Run(strconv.Itoa(i), func(t *testing.T) { 237 | _, err := parseQuoromVotes(quorumToolOutput) 238 | assert.Error(t, err) 239 | assert.Contains(t, err.Error(), "could not parse vote number to uint64") 240 | assert.Contains(t, err.Error(), "value out of range") 241 | }) 242 | } 243 | } 244 | 245 | func TestParseMembersEmptyError(t *testing.T) { 246 | quoromToolOutput := []byte(``) 247 | 248 | _, err := parseMembers(quoromToolOutput) 249 | assert.EqualError(t, err, "could not find membership information") 250 | } 251 | 252 | func TestParseMembersUintError(t *testing.T) { 253 | quoromToolOutput := []byte(`Membership information 254 | ---------------------- 255 | Nodeid Votes Qdevice Name 256 | 1084780051 10000000000000000000000000000000000000000000000 NW dma-dog-hana01`) 257 | 258 | _, err := parseMembers(quoromToolOutput) 259 | 260 | assert.Error(t, err) 261 | assert.Contains(t, err.Error(), "could not parse vote number to uint64") 262 | assert.Contains(t, err.Error(), "value out of range") 263 | } 264 | 265 | func TestParseMembersWithAnotherExample(t *testing.T) { 266 | quorumToolOutput := []byte(`Quorum information 267 | ------------------ 268 | Date: Mon May 4 16:50:13 2020 269 | Quorum provider: corosync_votequorum 270 | Nodes: 2 271 | Node ID: 2 272 | Ring ID: 1/8 273 | Quorate: Yes 274 | 275 | Votequorum information 276 | ---------------------- 277 | Expected votes: 2 278 | Highest expected: 2 279 | Total votes: 2 280 | Quorum: 1 281 | Flags: 2Node Quorate WaitForAll 282 | 283 | Membership information 284 | ---------------------- 285 | Nodeid Votes Qdevice Name 286 | 1 1 NR 192.168.127.20 287 | 2 1 NR 192.168.127.21 (local)`) 288 | 289 | members, err := parseMembers(quorumToolOutput) 290 | 291 | assert.NoError(t, err) 292 | 293 | assert.Len(t, members, 2) 294 | assert.Exactly(t, "1", members[0].Id) 295 | assert.Exactly(t, "192.168.127.20", members[0].Name) 296 | assert.False(t, members[0].Local) 297 | assert.EqualValues(t, 1, members[0].Votes) 298 | assert.Exactly(t, "2", members[1].Id) 299 | assert.Exactly(t, "192.168.127.21", members[1].Name) 300 | assert.True(t, members[1].Local) 301 | assert.EqualValues(t, 1, members[1].Votes) 302 | } 303 | 304 | func TestParseMembersWithIpv6Hostnames(t *testing.T) { 305 | quorumToolOutput := []byte(`Quorum information 306 | Membership information 307 | ---------------------- 308 | Nodeid Votes Qdevice Name 309 | 1 1 NR fe80:00:000:0000:1234:5678:ABCD:EF 310 | 2 1 NR FE80:0:00:000:0000::1 (local)`) 311 | 312 | members, err := parseMembers(quorumToolOutput) 313 | 314 | assert.NoError(t, err) 315 | 316 | assert.Len(t, members, 2) 317 | assert.Exactly(t, "1", members[0].Id) 318 | assert.Exactly(t, "fe80:00:000:0000:1234:5678:ABCD:EF", members[0].Name) 319 | assert.False(t, members[0].Local) 320 | assert.EqualValues(t, 1, members[0].Votes) 321 | assert.Exactly(t, "2", members[1].Id) 322 | assert.Exactly(t, "FE80:0:00:000:0000::1", members[1].Name) 323 | assert.True(t, members[1].Local) 324 | assert.EqualValues(t, 1, members[1].Votes) 325 | } 326 | -------------------------------------------------------------------------------- 
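Aside: the corosync package above exposes a small public surface (`NewCollector`, `NewParser`, `Status`). The following is a minimal, self-contained sketch of driving the parser directly; the fixture text is modeled on the tests above and is not a file in this repository:

```go
package main

import (
	"fmt"

	"github.com/ClusterLabs/ha_cluster_exporter/collector/corosync"
)

func main() {
	// abridged corosync-cfgtool -s output (corosync v2.99+ "Link ID" format)
	cfgToolOutput := []byte(`Link ID 0
	addr	= 10.0.0.1
	status	= OK`)

	// abridged corosync-quorumtool -p output
	quorumToolOutput := []byte(`Node ID:	1084780051
Ring ID:	1084780051.44
Quorate:	Yes
Expected votes:	2
Highest expected:	2
Total votes:	2
Quorum:	1

Membership information
----------------------
	Nodeid	Votes	Qdevice	Name
	1084780051	1	NR	node01 (local)
	1084780052	1	NR	node02`)

	status, err := corosync.NewParser().Parse(cfgToolOutput, quorumToolOutput)
	if err != nil {
		panic(err)
	}
	fmt.Printf("quorate=%v rings=%d members=%d\n",
		status.Quorate, len(status.Rings), len(status.Members))
	// Output: quorate=true rings=1 members=2
}
```

In the exporter itself, `NewCollector` wires this parser behind the `prometheus.Collector` interface (via `DefaultCollector`, defined next), so the same data ends up as `ha_cluster_corosync_*` gauges.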
/collector/default_collector.go: -------------------------------------------------------------------------------- 1 | package collector 2 | 3 | import ( 4 | "github.com/ClusterLabs/ha_cluster_exporter/internal/clock" 5 | "github.com/go-kit/log" 6 | "github.com/pkg/errors" 7 | "github.com/prometheus/client_golang/prometheus" 8 | "os" 9 | ) 10 | 11 | const NAMESPACE = "ha_cluster" 12 | 13 | type SubsystemCollector interface { 14 | GetSubsystem() string 15 | } 16 | 17 | type DefaultCollector struct { 18 | subsystem string 19 | descriptors map[string]*prometheus.Desc 20 | Clock clock.Clock 21 | timestamps bool 22 | Logger log.Logger 23 | } 24 | 25 | func NewDefaultCollector(subsystem string, timestamps bool, logger log.Logger) DefaultCollector { 26 | return DefaultCollector{ 27 | subsystem, 28 | make(map[string]*prometheus.Desc), 29 | &clock.SystemClock{}, 30 | timestamps, 31 | logger, 32 | } 33 | } 34 | 35 | func (c *DefaultCollector) GetDescriptor(name string) *prometheus.Desc { 36 | desc, ok := c.descriptors[name] 37 | if !ok { 38 | // we hard panic on this because it's most certainly a coding error 39 | panic(errors.Errorf("undeclared metric '%s'", name)) 40 | } 41 | return desc 42 | } 43 | 44 | // Convenience wrapper around the prometheus.NewDesc constructor. 45 | // Stores a metric descriptor with a fully qualified name like `NAMESPACE_subsystem_name`. 46 | // `name` is the last and most relevant part of the metric's fully qualified name; 47 | // `help` is the message displayed in the HELP line 48 | // `variableLabels` is a list of labels to declare. Use `nil` to declare no labels. 49 | func (c *DefaultCollector) SetDescriptor(name, help string, variableLabels []string) { 50 | c.descriptors[name] = prometheus.NewDesc(prometheus.BuildFQName(NAMESPACE, c.subsystem, name), help, variableLabels, nil) 51 | } 52 | 53 | func (c *DefaultCollector) Describe(ch chan<- *prometheus.Desc) { 54 | for _, descriptor := range c.descriptors { 55 | ch <- descriptor 56 | } 57 | } 58 | 59 | func (c *DefaultCollector) GetSubsystem() string { 60 | return c.subsystem 61 | } 62 | 63 | func (c *DefaultCollector) MakeGaugeMetric(name string, value float64, labelValues ...string) prometheus.Metric { 64 | return c.makeMetric(name, value, prometheus.GaugeValue, labelValues...) 65 | } 66 | 67 | func (c *DefaultCollector) MakeCounterMetric(name string, value float64, labelValues ...string) prometheus.Metric { 68 | return c.makeMetric(name, value, prometheus.CounterValue, labelValues...) 69 | } 70 | 71 | func (c *DefaultCollector) makeMetric(name string, value float64, valueType prometheus.ValueType, labelValues ...string) prometheus.Metric { 72 | desc := c.GetDescriptor(name) 73 | metric := prometheus.MustNewConstMetric(desc, valueType, value, labelValues...)
74 | if c.timestamps { 75 | metric = prometheus.NewMetricWithTimestamp(c.Clock.Now(), metric) 76 | } 77 | return metric 78 | } 79 | 80 | // check that all the given paths exist and are executable files 81 | func CheckExecutables(paths ...string) error { 82 | for _, path := range paths { 83 | fileInfo, err := os.Stat(path) 84 | if err != nil { 85 | return errors.Errorf("'%s' does not exist", path) 86 | } 87 | if fileInfo.IsDir() { 88 | return errors.Errorf("'%s' is a directory", path) 89 | } 90 | if (fileInfo.Mode() & 0111) == 0 { 91 | return errors.Errorf("'%s' is not executable", path) 92 | } 93 | } 94 | return nil 95 | } 96 | -------------------------------------------------------------------------------- /collector/default_collector_test.go: -------------------------------------------------------------------------------- 1 | package collector 2 | 3 | import ( 4 | "testing" 5 | 6 | "github.com/go-kit/log" 7 | dto "github.com/prometheus/client_model/go" 8 | "github.com/stretchr/testify/assert" 9 | 10 | "github.com/ClusterLabs/ha_cluster_exporter/internal/clock" 11 | ) 12 | 13 | func TestMetricFactory(t *testing.T) { 14 | SUT := NewDefaultCollector("test", false, log.NewNopLogger()) 15 | SUT.SetDescriptor("test_metric", "", nil) 16 | 17 | metric := SUT.MakeGaugeMetric("test_metric", 1) 18 | 19 | assert.Equal(t, SUT.GetDescriptor("test_metric"), metric.Desc()) 20 | } 21 | 22 | func TestMetricFactoryWithTimestamp(t *testing.T) { 23 | 24 | SUT := NewDefaultCollector("test", true, log.NewNopLogger()) 25 | SUT.Clock = &clock.StoppedClock{} 26 | SUT.SetDescriptor("test_metric", "", nil) 27 | 28 | metric := SUT.MakeGaugeMetric("test_metric", 1) 29 | metricDto := &dto.Metric{} 30 | err := metric.Write(metricDto) 31 | 32 | assert.Nil(t, err, "Unexpected error") 33 | 34 | assert.Equal(t, int64(clock.TEST_TIMESTAMP), *metricDto.TimestampMs) 35 | } 36 | -------------------------------------------------------------------------------- /collector/drbd/drbd.go: -------------------------------------------------------------------------------- 1 | package drbd 2 | 3 | import ( 4 | "encoding/json" 5 | "os/exec" 6 | "path/filepath" 7 | "regexp" 8 | "strconv" 9 | "strings" 10 | 11 | "github.com/go-kit/log" 12 | "github.com/go-kit/log/level" 13 | "github.com/pkg/errors" 14 | "github.com/prometheus/client_golang/prometheus" 15 | 16 | "github.com/ClusterLabs/ha_cluster_exporter/collector" 17 | ) 18 | 19 | const subsystem = "drbd" 20 | 21 | // drbdStatus is for parsing relevant data we want to convert to metrics 22 | type drbdStatus struct { 23 | Name string `json:"name"` 24 | Role string `json:"role"` 25 | Devices []struct { 26 | Volume int `json:"volume"` 27 | Written int `json:"written"` 28 | Read int `json:"read"` 29 | AlWrites int `json:"al-writes"` 30 | BmWrites int `json:"bm-writes"` 31 | UpPending int `json:"upper-pending"` 32 | LoPending int `json:"lower-pending"` 33 | Quorum bool `json:"quorum"` 34 | DiskState string `json:"disk-state"` 35 | } `json:"devices"` 36 | Connections []struct { 37 | PeerNodeID int `json:"peer-node-id"` 38 | PeerRole string `json:"peer-role"` 39 | PeerDevices []struct { 40 | Volume int `json:"volume"` 41 | Received int `json:"received"` 42 | Sent int `json:"sent"` 43 | Pending int `json:"pending"` 44 | Unacked int `json:"unacked"` 45 | PeerDiskState string `json:"peer-disk-state"` 46 | PercentInSync float64 `json:"percent-in-sync"` 47 | } `json:"peer_devices"` 48 | } `json:"connections"` 49 | } 50 | 51 | func NewCollector(drbdSetupPath string,
drbdSplitBrainPath string, timestamps bool, logger log.Logger) (*drbdCollector, error) { 52 | err := collector.CheckExecutables(drbdSetupPath) 53 | if err != nil { 54 | return nil, errors.Wrapf(err, "could not initialize '%s' collector", subsystem) 55 | } 56 | 57 | c := &drbdCollector{ 58 | collector.NewDefaultCollector(subsystem, timestamps, logger), 59 | drbdSetupPath, 60 | drbdSplitBrainPath, 61 | } 62 | 63 | c.SetDescriptor("resources", "The DRBD resources; 1 line per name, per volume", []string{"resource", "role", "volume", "disk_state"}) 64 | c.SetDescriptor("written", "KiB written to DRBD; 1 line per res, per volume", []string{"resource", "volume"}) 65 | c.SetDescriptor("read", "KiB read from DRBD; 1 line per res, per volume", []string{"resource", "volume"}) 66 | c.SetDescriptor("al_writes", "Writes to activity log; 1 line per res, per volume", []string{"resource", "volume"}) 67 | c.SetDescriptor("bm_writes", "Writes to bitmap; 1 line per res, per volume", []string{"resource", "volume"}) 68 | c.SetDescriptor("upper_pending", "Upper pending; 1 line per res, per volume", []string{"resource", "volume"}) 69 | c.SetDescriptor("lower_pending", "Lower pending; 1 line per res, per volume", []string{"resource", "volume"}) 70 | c.SetDescriptor("quorum", "Quorum status per resource and per volume", []string{"resource", "volume"}) 71 | c.SetDescriptor("connections", "The DRBD resource connections; 1 line per resource, per peer_node_id", []string{"resource", "peer_node_id", "peer_role", "volume", "peer_disk_state"}) 72 | c.SetDescriptor("connections_sync", "The in sync percentage value for DRBD resource connections", []string{"resource", "peer_node_id", "volume"}) 73 | c.SetDescriptor("connections_received", "KiB received per connection", []string{"resource", "peer_node_id", "volume"}) 74 | c.SetDescriptor("connections_sent", "KiB sent per connection", []string{"resource", "peer_node_id", "volume"}) 75 | c.SetDescriptor("connections_pending", "Pending value per connection", []string{"resource", "peer_node_id", "volume"}) 76 | c.SetDescriptor("connections_unacked", "Unacked value per connection", []string{"resource", "peer_node_id", "volume"}) 77 | c.SetDescriptor("split_brain", "Whether a split brain has been detected; 1 line per resource, per volume.", []string{"resource", "volume"}) 78 | 79 | return c, nil 80 | } 81 | 82 | type drbdCollector struct { 83 | collector.DefaultCollector 84 | drbdsetupPath string 85 | drbdSplitBrainPath string 86 | } 87 | 88 | func (c *drbdCollector) CollectWithError(ch chan<- prometheus.Metric) error { 89 | level.Debug(c.Logger).Log("msg", "Collecting DRBD metrics...") 90 | 91 | c.recordDrbdSplitBrainMetric(ch) 92 | 93 | drbdStatusRaw, err := exec.Command(c.drbdsetupPath, "status", "--json").Output() 94 | if err != nil { 95 | return errors.Wrap(err, "drbdsetup command failed") 96 | } 97 | // populate structs and parse relevant info we will expose via metrics 98 | drbdDev, err := parseDrbdStatus(drbdStatusRaw) 99 | if err != nil { 100 | return errors.Wrap(err, "could not parse drbdsetup status output") 101 | } 102 | 103 | for _, resource := range drbdDev { 104 | for _, device := range resource.Devices { 105 | // the `resources` metric value is always 1, otherwise it's absent 106 | ch <- c.MakeGaugeMetric("resources", float64(1), resource.Name, resource.Role, strconv.Itoa(device.Volume), strings.ToLower(device.DiskState)) 107 | ch <- c.MakeGaugeMetric("written", float64(device.Written), resource.Name, strconv.Itoa(device.Volume)) 108 | ch <-
c.MakeGaugeMetric("read", float64(device.Read), resource.Name, strconv.Itoa(device.Volume)) 109 | ch <- c.MakeGaugeMetric("al_writes", float64(device.AlWrites), resource.Name, strconv.Itoa(device.Volume)) 110 | ch <- c.MakeGaugeMetric("bm_writes", float64(device.BmWrites), resource.Name, strconv.Itoa(device.Volume)) 111 | ch <- c.MakeGaugeMetric("upper_pending", float64(device.UpPending), resource.Name, strconv.Itoa(device.Volume)) 112 | ch <- c.MakeGaugeMetric("lower_pending", float64(device.LoPending), resource.Name, strconv.Itoa(device.Volume)) 113 | 114 | if device.Quorum { 115 | ch <- c.MakeGaugeMetric("quorum", float64(1), resource.Name, strconv.Itoa(device.Volume)) 116 | } else { 117 | ch <- c.MakeGaugeMetric("quorum", float64(0), resource.Name, strconv.Itoa(device.Volume)) 118 | } 119 | } 120 | if len(resource.Connections) == 0 { 121 | level.Warn(c.Logger).Log("msg", "Could not retrieve connection info for resource "+resource.Name) 122 | continue 123 | } 124 | // a resource can have multiple connections to different nodes 125 | for _, conn := range resource.Connections { 126 | if len(conn.PeerDevices) == 0 { 127 | level.Warn(c.Logger).Log("msg", "Could not retrieve any peer device info for connection "+resource.Name) 128 | continue 129 | } 130 | for _, peerDev := range conn.PeerDevices { 131 | ch <- c.MakeGaugeMetric("connections", float64(1), resource.Name, strconv.Itoa(conn.PeerNodeID), conn.PeerRole, strconv.Itoa(peerDev.Volume), strings.ToLower(peerDev.PeerDiskState)) 132 | ch <- c.MakeGaugeMetric("connections_sync", float64(peerDev.PercentInSync), resource.Name, strconv.Itoa(conn.PeerNodeID), strconv.Itoa(peerDev.Volume)) 133 | ch <- c.MakeGaugeMetric("connections_received", float64(peerDev.Received), resource.Name, strconv.Itoa(conn.PeerNodeID), strconv.Itoa(peerDev.Volume)) 134 | ch <- c.MakeGaugeMetric("connections_sent", float64(peerDev.Sent), resource.Name, strconv.Itoa(conn.PeerNodeID), strconv.Itoa(peerDev.Volume)) 135 | ch <- c.MakeGaugeMetric("connections_pending", float64(peerDev.Pending), resource.Name, strconv.Itoa(conn.PeerNodeID), strconv.Itoa(peerDev.Volume)) 136 | ch <- c.MakeGaugeMetric("connections_unacked", float64(peerDev.Unacked), resource.Name, strconv.Itoa(conn.PeerNodeID), strconv.Itoa(peerDev.Volume)) 137 | } 138 | } 139 | } 140 | 141 | return nil 142 | } 143 | 144 | func (c *drbdCollector) Collect(ch chan<- prometheus.Metric) { 145 | level.Debug(c.Logger).Log("msg", "Collecting DRBD metrics...") 146 | 147 | err := c.CollectWithError(ch) 148 | if err != nil { 149 | level.Warn(c.Logger).Log("msg", c.GetSubsystem()+" collector scrape failed", "err", err) 150 | } 151 | } 152 | 153 | func parseDrbdStatus(statusRaw []byte) ([]drbdStatus, error) { 154 | var drbdDevs []drbdStatus 155 | err := json.Unmarshal(statusRaw, &drbdDevs) 156 | if err != nil { 157 | return drbdDevs, err 158 | } 159 | return drbdDevs, nil 160 | } 161 | 162 | func (c *drbdCollector) recordDrbdSplitBrainMetric(ch chan<- prometheus.Metric) { 163 | // look for files created by the DRBD split brain hook 164 | files, _ := filepath.Glob(c.drbdSplitBrainPath + "/drbd-split-brain-detected-*") 165 | 166 | // prepare some pattern matching 167 | re := regexp.MustCompile(`drbd-split-brain-detected-(?P<resource>[\w-]+)-(?P<volume>[\w-]+)`) 168 | 169 | // for each of these files, we extract the resource and volume names from the file name and record the metric 170 | for _, f := range files { 171 | // matches[0] will be the whole file name, matches[1] the resource, matches[2] the volume
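// e.g. a hook-created file named "drbd-split-brain-detected-resource01-vol01" yields resource "resource01" and volume "vol01" (see the fixtures under test/drbd-splitbrain)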
172 | matches := re.FindStringSubmatch(f) 173 | if matches == nil { 174 | continue 175 | } 176 | 177 | ch <- c.MakeGaugeMetric("split_brain", float64(1), matches[1], matches[2]) 178 | } 179 | } 180 | -------------------------------------------------------------------------------- /collector/drbd/drbd_test.go: -------------------------------------------------------------------------------- 1 | package drbd 2 | 3 | import ( 4 | "strings" 5 | "testing" 6 | 7 | "github.com/go-kit/log" 8 | "github.com/prometheus/client_golang/prometheus/testutil" 9 | "github.com/stretchr/testify/assert" 10 | 11 | assertcustom "github.com/ClusterLabs/ha_cluster_exporter/internal/assert" 12 | ) 13 | 14 | func TestDrbdParsing(t *testing.T) { 15 | var drbdDataRaw = []byte(` 16 | [ 17 | { 18 | "name": "1-single-0", 19 | "node-id": 2, 20 | "role": "Secondary", 21 | "suspended": false, 22 | "write-ordering": "flush", 23 | "devices": [ 24 | { 25 | "volume": 0, 26 | "minor": 2, 27 | "disk-state": "UpToDate", 28 | "client": false, 29 | "quorum": true, 30 | "size": 409600, 31 | "read": 654321, 32 | "written": 123456, 33 | "al-writes": 123, 34 | "bm-writes": 321, 35 | "upper-pending": 1, 36 | "lower-pending": 2 37 | } 38 | ], 39 | "connections": [ 40 | { 41 | "peer-node-id": 1, 42 | "name": "SLE15-sp1-gm-drbd1145296-node1", 43 | "connection-state": "Connected", 44 | "congested": false, 45 | "peer-role": "Primary", 46 | "ap-in-flight": 0, 47 | "rs-in-flight": 0, 48 | "peer_devices": [ 49 | { 50 | "volume": 0, 51 | "replication-state": "Established", 52 | "peer-disk-state": "UpToDate", 53 | "peer-client": false, 54 | "resync-suspended": "no", 55 | "received": 456, 56 | "sent": 654, 57 | "out-of-sync": 0, 58 | "pending": 3, 59 | "unacked": 4, 60 | "has-sync-details": false, 61 | "has-online-verify-details": false, 62 | "percent-in-sync": 100 63 | } 64 | ] 65 | } 66 | ] 67 | }, 68 | { 69 | "name": "1-single-1", 70 | "node-id": 2, 71 | "role": "Secondary", 72 | "suspended": false, 73 | "write-ordering": "flush", 74 | "devices": [ 75 | { 76 | "volume": 0, 77 | "minor": 3, 78 | "disk-state": "UpToDate", 79 | "client": false, 80 | "quorum": false, 81 | "size": 10200, 82 | "read": 654321, 83 | "written": 123456, 84 | "al-writes": 123, 85 | "bm-writes": 321, 86 | "upper-pending": 1, 87 | "lower-pending": 2 88 | } 89 | ], 90 | "connections": [ 91 | { 92 | "peer-node-id": 1, 93 | "name": "SLE15-sp1-gm-drbd1145296-node1", 94 | "connection-state": "Connected", 95 | "congested": false, 96 | "peer-role": "Primary", 97 | "ap-in-flight": 0, 98 | "rs-in-flight": 0, 99 | "peer_devices": [ 100 | { 101 | "volume": 0, 102 | "replication-state": "Established", 103 | "peer-disk-state": "UpToDate", 104 | "peer-client": false, 105 | "resync-suspended": "no", 106 | "received": 456, 107 | "sent": 654, 108 | "out-of-sync": 0, 109 | "pending": 3, 110 | "unacked": 4, 111 | "has-sync-details": false, 112 | "has-online-verify-details": false, 113 | "percent-in-sync": 99.8 114 | } 115 | ] 116 | } 117 | ] 118 | } 119 | ]`) 120 | 121 | drbdDevs, err := parseDrbdStatus(drbdDataRaw) 122 | 123 | assert.Nil(t, err) 124 | assert.Equal(t, "1-single-0", drbdDevs[0].Name) 125 | assert.Equal(t, "Secondary", drbdDevs[0].Role) 126 | assert.Equal(t, "UpToDate", drbdDevs[0].Devices[0].DiskState) 127 | assert.Equal(t, 1, drbdDevs[0].Connections[0].PeerNodeID) 128 | assert.Equal(t, "UpToDate", drbdDevs[0].Connections[0].PeerDevices[0].PeerDiskState) 129 | assert.Equal(t, 0, drbdDevs[0].Devices[0].Volume) 130 | assert.Equal(t, 123456, drbdDevs[0].Devices[0].Written) 131 | 
assert.Equal(t, 654321, drbdDevs[0].Devices[0].Read) 132 | assert.Equal(t, 123, drbdDevs[0].Devices[0].AlWrites) 133 | assert.Equal(t, 321, drbdDevs[0].Devices[0].BmWrites) 134 | assert.Equal(t, 1, drbdDevs[0].Devices[0].UpPending) 135 | assert.Equal(t, 2, drbdDevs[0].Devices[0].LoPending) 136 | assert.Equal(t, true, drbdDevs[0].Devices[0].Quorum) 137 | assert.Equal(t, false, drbdDevs[1].Devices[0].Quorum) 138 | assert.Equal(t, 456, drbdDevs[0].Connections[0].PeerDevices[0].Received) 139 | assert.Equal(t, 654, drbdDevs[0].Connections[0].PeerDevices[0].Sent) 140 | assert.Equal(t, 3, drbdDevs[0].Connections[0].PeerDevices[0].Pending) 141 | assert.Equal(t, 4, drbdDevs[0].Connections[0].PeerDevices[0].Unacked) 142 | assert.Equal(t, 100.0, drbdDevs[0].Connections[0].PeerDevices[0].PercentInSync) 143 | assert.Equal(t, 99.8, drbdDevs[1].Connections[0].PeerDevices[0].PercentInSync) 144 | } 145 | 146 | func TestNewDrbdCollector(t *testing.T) { 147 | _, err := NewCollector("../../test/fake_drbdsetup.sh", "splitbrainpath", false, log.NewNopLogger()) 148 | 149 | assert.Nil(t, err) 150 | } 151 | 152 | func TestNewDrbdCollectorChecksDrbdsetupExistence(t *testing.T) { 153 | _, err := NewCollector("../../test/nonexistent", "splitbrainfake", false, log.NewNopLogger()) 154 | 155 | assert.Error(t, err) 156 | assert.Contains(t, err.Error(), "'../../test/nonexistent' does not exist") 157 | } 158 | 159 | func TestNewDrbdCollectorChecksDrbdsetupExecutableBits(t *testing.T) { 160 | _, err := NewCollector("../../test/dummy", "splibrainfake", false, log.NewNopLogger()) 161 | 162 | assert.Error(t, err) 163 | assert.Contains(t, err.Error(), "'../../test/dummy' is not executable") 164 | } 165 | 166 | func TestDRBDCollector(t *testing.T) { 167 | collector, _ := NewCollector("../../test/fake_drbdsetup.sh", "fake", false, log.NewNopLogger()) 168 | assertcustom.Metrics(t, collector, "drbd.metrics") 169 | } 170 | 171 | func TestDRBDSplitbrainCollector(t *testing.T) { 172 | collector, _ := NewCollector("../../test/fake_drbdsetup.sh", "../../test/drbd-splitbrain", false, log.NewNopLogger()) 173 | 174 | expect := ` 175 | # HELP ha_cluster_drbd_split_brain Whether a split brain has been detected; 1 line per resource, per volume. 
176 | # TYPE ha_cluster_drbd_split_brain gauge 177 | ha_cluster_drbd_split_brain{resource="resource01",volume="vol01"} 1 178 | ha_cluster_drbd_split_brain{resource="resource02",volume="vol02"} 1 179 | ` 180 | 181 | err := testutil.CollectAndCompare(collector, strings.NewReader(expect), "ha_cluster_drbd_split_brain") 182 | 183 | assert.NoError(t, err) 184 | } 185 | -------------------------------------------------------------------------------- /collector/instrumented_collector.go: -------------------------------------------------------------------------------- 1 | package collector 2 | 3 | import ( 4 | "github.com/ClusterLabs/ha_cluster_exporter/internal/clock" 5 | "github.com/go-kit/log" 6 | "github.com/go-kit/log/level" 7 | "github.com/prometheus/client_golang/prometheus" 8 | ) 9 | 10 | //go:generate go run -mod=mod github.com/golang/mock/mockgen --build_flags=-mod=mod -package mock_collector -destination ../test/mock_collector/instrumented_collector.go github.com/ClusterLabs/ha_cluster_exporter/collector InstrumentableCollector 11 | 12 | // describes a collector that can return errors from collection cycles, 13 | // unlike the standard prometheus.Collector, whose Collect method returns nothing 14 | type InstrumentableCollector interface { 15 | prometheus.Collector 16 | SubsystemCollector 17 | CollectWithError(ch chan<- prometheus.Metric) error 18 | } 19 | 20 | type InstrumentedCollector struct { 21 | collector InstrumentableCollector 22 | Clock clock.Clock 23 | scrapeDurationDesc *prometheus.Desc 24 | scrapeSuccessDesc *prometheus.Desc 25 | logger log.Logger 26 | } 27 | 28 | func NewInstrumentedCollector(collector InstrumentableCollector, logger log.Logger) *InstrumentedCollector { 29 | return &InstrumentedCollector{ 30 | collector, 31 | &clock.SystemClock{}, 32 | prometheus.NewDesc( 33 | prometheus.BuildFQName(NAMESPACE, "scrape", "duration_seconds"), 34 | "Duration of a collector scrape.", 35 | nil, 36 | prometheus.Labels{ 37 | "collector": collector.GetSubsystem(), 38 | }, 39 | ), 40 | prometheus.NewDesc( 41 | prometheus.BuildFQName(NAMESPACE, "scrape", "success"), 42 | "Whether a collector succeeded.", 43 | nil, 44 | prometheus.Labels{ 45 | "collector": collector.GetSubsystem(), 46 | }, 47 | ), 48 | logger, 49 | } 50 | } 51 | 52 | func (ic *InstrumentedCollector) Collect(ch chan<- prometheus.Metric) { 53 | var success float64 54 | begin := ic.Clock.Now() 55 | err := ic.collector.CollectWithError(ch) 56 | duration := ic.Clock.Since(begin) 57 | if err == nil { 58 | success = 1 59 | } else { 60 | level.Warn(ic.logger).Log("msg", ic.collector.GetSubsystem()+" collector scrape failed", "err", err) 61 | } 62 | ch <- prometheus.MustNewConstMetric(ic.scrapeDurationDesc, prometheus.GaugeValue, duration.Seconds()) 63 | ch <- prometheus.MustNewConstMetric(ic.scrapeSuccessDesc, prometheus.GaugeValue, success) 64 | } 65 | 66 | func (ic *InstrumentedCollector) Describe(ch chan<- *prometheus.Desc) { 67 | ic.collector.Describe(ch) 68 | ch <- ic.scrapeDurationDesc 69 | ch <- ic.scrapeSuccessDesc 70 | } 71 | 72 | func (ic *InstrumentedCollector) GetSubsystem() string { 73 | return ic.collector.GetSubsystem() 74 | } 75 | -------------------------------------------------------------------------------- /collector/instrumented_collector_test.go: -------------------------------------------------------------------------------- 1 | package collector 2 | 3 | import ( 4 | "errors" 5 | "strings" 6 | "testing" 7 | 8 | "github.com/go-kit/log" 9 | "github.com/golang/mock/gomock" 10 | "github.com/prometheus/client_golang/prometheus/testutil" 11 | "github.com/stretchr/testify/assert" 12 | 13 | "github.com/ClusterLabs/ha_cluster_exporter/internal/clock" 14 | "github.com/ClusterLabs/ha_cluster_exporter/test/mock_collector" 15 | ) 16 | 17 | func TestInstrumentedCollector(t *testing.T) { 18 | ctrl := gomock.NewController(t) 19 | defer ctrl.Finish() 20 | 21 | mockCollector := mock_collector.NewMockInstrumentableCollector(ctrl) 22 | mockCollector.EXPECT().GetSubsystem().Return("mock_collector").AnyTimes() 23 | mockCollector.EXPECT().Describe(gomock.Any()) 24 | mockCollector.EXPECT().CollectWithError(gomock.Any()) 25 | 26 | SUT := NewInstrumentedCollector(mockCollector, log.NewNopLogger()) 27 | SUT.Clock = &clock.StoppedClock{} 28 | 29 | metrics := `# HELP ha_cluster_scrape_duration_seconds Duration of a collector scrape. 30 | # TYPE ha_cluster_scrape_duration_seconds gauge 31 | ha_cluster_scrape_duration_seconds{collector="mock_collector"} 1.234 32 | # HELP ha_cluster_scrape_success Whether a collector succeeded. 33 | # TYPE ha_cluster_scrape_success gauge 34 | ha_cluster_scrape_success{collector="mock_collector"} 1 35 | ` 36 | 37 | err := testutil.CollectAndCompare(SUT, strings.NewReader(metrics)) 38 | assert.NoError(t, err) 39 | } 40 | 41 | func TestInstrumentedCollectorScrapeFailure(t *testing.T) { 42 | ctrl := gomock.NewController(t) 43 | defer ctrl.Finish() 44 | 45 | mockCollector := mock_collector.NewMockInstrumentableCollector(ctrl) 46 | mockCollector.EXPECT().GetSubsystem().Return("mock_collector").AnyTimes() 47 | mockCollector.EXPECT().Describe(gomock.Any()) 48 | collectWithError := mockCollector.EXPECT().CollectWithError(gomock.Any()) 49 | collectWithError.Return(errors.New("test error")) 50 | 51 | SUT := NewInstrumentedCollector(mockCollector, log.NewNopLogger()) 52 | 53 | metrics := `# HELP ha_cluster_scrape_success Whether a collector succeeded. 54 | # TYPE ha_cluster_scrape_success gauge 55 | ha_cluster_scrape_success{collector="mock_collector"} 0 56 | ` 57 | 58 | err := testutil.CollectAndCompare(SUT, strings.NewReader(metrics), "ha_cluster_scrape_success") 59 | assert.NoError(t, err) 60 | 61 | assert.NotNil(t, collectWithError) 62 | } 63 | -------------------------------------------------------------------------------- /collector/pacemaker/cib/data.go: -------------------------------------------------------------------------------- 1 | package cib 2 | 3 | /* 4 | The Cluster Information Base (CIB), modeled here by the Root struct, is an XML representation of the cluster’s configuration and the state of all nodes and resources. 5 | The CIB manager (pacemaker-based) keeps the CIB synchronized across the cluster and handles requests to modify it.
6 | 7 | https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Administration/index.html 8 | 9 | */ 10 | 11 | type Root struct { 12 | Configuration struct { 13 | CrmConfig struct { 14 | ClusterProperties []Attribute `xml:"cluster_property_set>nvpair"` 15 | } `xml:"crm_config"` 16 | Nodes []struct { 17 | Id string `xml:"id,attr"` 18 | Uname string `xml:"uname,attr"` 19 | InstanceAttributes []Attribute `xml:"instance_attributes>nvpair"` 20 | } `xml:"nodes>node"` 21 | Resources struct { 22 | Primitives []Primitive `xml:"primitive"` 23 | Masters []Clone `xml:"master"` 24 | Clones []Clone `xml:"clone"` 25 | } `xml:"resources"` 26 | Constraints struct { 27 | RscLocations []struct { 28 | Id string `xml:"id,attr"` 29 | Node string `xml:"node,attr"` 30 | Resource string `xml:"rsc,attr"` 31 | Role string `xml:"role,attr"` 32 | Score string `xml:"score,attr"` 33 | } `xml:"rsc_location"` 34 | } `xml:"constraints"` 35 | } `xml:"configuration"` 36 | } 37 | 38 | type Attribute struct { 39 | Id string `xml:"id,attr"` 40 | Name string `xml:"name,attr"` 41 | Value string `xml:"value,attr"` 42 | } 43 | 44 | type Primitive struct { 45 | Id string `xml:"id,attr"` 46 | Class string `xml:"class,attr"` 47 | Type string `xml:"type,attr"` 48 | Provider string `xml:"provider,attr"` 49 | InstanceAttributes []Attribute `xml:"instance_attributes>nvpair"` 50 | MetaAttributes []Attribute `xml:"meta_attributes>nvpair"` 51 | Operations []struct { 52 | Id string `xml:"id,attr"` 53 | Name string `xml:"name,attr"` 54 | Role string `xml:"role,attr"` 55 | // TODO: interval and timeout are time-based values; in the future we should parse them properly instead of keeping them as strings 56 | Interval string `xml:"interval,attr"` 57 | Timeout string `xml:"timeout,attr"` 58 | } `xml:"operations>op"` 59 | } 60 | 61 | type Clone struct { 62 | Id string `xml:"id,attr"` 63 | MetaAttributes []Attribute `xml:"meta_attributes>nvpair"` 64 | Primitive Primitive `xml:"primitive"` 65 | } 66 | -------------------------------------------------------------------------------- /collector/pacemaker/cib/parser.go: -------------------------------------------------------------------------------- 1 | package cib 2 | 3 | import ( 4 | "encoding/xml" 5 | "os/exec" 6 | 7 | "github.com/pkg/errors" 8 | ) 9 | 10 | type Parser interface { 11 | Parse() (Root, error) 12 | } 13 | 14 | type cibAdminParser struct { 15 | cibAdminPath string 16 | } 17 | 18 | func (p *cibAdminParser) Parse() (Root, error) { 19 | var CIB Root 20 | cibXML, err := exec.Command(p.cibAdminPath, "--query", "--local").Output() 21 | if err != nil { 22 | return CIB, errors.Wrap(err, "error while executing cibadmin") 23 | } 24 | 25 | err = xml.Unmarshal(cibXML, &CIB) 26 | if err != nil { 27 | return CIB, errors.Wrap(err, "could not parse cibadmin status from XML") 28 | } 29 | 30 | return CIB, nil 31 | } 32 | 33 | func NewCibAdminParser(cibAdminPath string) *cibAdminParser { 34 | return &cibAdminParser{cibAdminPath} 35 | } 36 | -------------------------------------------------------------------------------- /collector/pacemaker/cib/parser_test.go: -------------------------------------------------------------------------------- 1 | package cib 2 | 3 | import ( 4 | "testing" 5 | 6 | "github.com/stretchr/testify/assert" 7 | ) 8 | 9 | func TestConstructor(t *testing.T) { 10 | p := NewCibAdminParser("foo") 11 | assert.Equal(t, "foo", p.cibAdminPath) 12 | } 13 | 14 | func TestParse(t *testing.T) { 15 | p := NewCibAdminParser("../../../test/fake_cibadmin.sh") 16 | data, err := p.Parse() 17 |
assert.NoError(t, err) 18 | assert.Equal(t, 2, len(data.Configuration.Nodes)) 19 | assert.Equal(t, "cib-bootstrap-options-cluster-name", data.Configuration.CrmConfig.ClusterProperties[3].Id) 20 | assert.Equal(t, "hana_cluster", data.Configuration.CrmConfig.ClusterProperties[3].Value) 21 | assert.Equal(t, "node01", data.Configuration.Nodes[0].Uname) 22 | assert.Equal(t, "node02", data.Configuration.Nodes[1].Uname) 23 | assert.Equal(t, 4, len(data.Configuration.Resources.Primitives)) 24 | assert.Equal(t, 1, len(data.Configuration.Resources.Masters)) 25 | assert.Equal(t, 1, len(data.Configuration.Resources.Clones)) 26 | assert.Equal(t, "stonith-sbd", data.Configuration.Resources.Primitives[0].Id) 27 | assert.Equal(t, "stonith", data.Configuration.Resources.Primitives[0].Class) 28 | assert.Equal(t, "external/sbd", data.Configuration.Resources.Primitives[0].Type) 29 | assert.Equal(t, 1, len(data.Configuration.Resources.Primitives[0].InstanceAttributes)) 30 | assert.Equal(t, "pcmk_delay_max", data.Configuration.Resources.Primitives[0].InstanceAttributes[0].Name) 31 | assert.Equal(t, "stonith-sbd-instance_attributes-pcmk_delay_max", data.Configuration.Resources.Primitives[0].InstanceAttributes[0].Id) 32 | assert.Equal(t, "30s", data.Configuration.Resources.Primitives[0].InstanceAttributes[0].Value) 33 | assert.Equal(t, "msl_SAPHana_PRD_HDB00", data.Configuration.Resources.Masters[0].Id) 34 | assert.Equal(t, 3, len(data.Configuration.Resources.Masters[0].MetaAttributes)) 35 | assert.Equal(t, "rsc_SAPHana_PRD_HDB00", data.Configuration.Resources.Masters[0].Primitive.Id) 36 | assert.Equal(t, 5, len(data.Configuration.Resources.Masters[0].Primitive.Operations)) 37 | assert.Equal(t, "rsc_SAPHana_PRD_HDB00-start-0", data.Configuration.Resources.Masters[0].Primitive.Operations[0].Id) 38 | assert.Equal(t, "start", data.Configuration.Resources.Masters[0].Primitive.Operations[0].Name) 39 | assert.Equal(t, "0", data.Configuration.Resources.Masters[0].Primitive.Operations[0].Interval) 40 | assert.Equal(t, "3600", data.Configuration.Resources.Masters[0].Primitive.Operations[0].Timeout) 41 | assert.Equal(t, "rsc_SAPHana_PRD_HDB00-stop-0", data.Configuration.Resources.Masters[0].Primitive.Operations[1].Id) 42 | assert.Equal(t, "stop", data.Configuration.Resources.Masters[0].Primitive.Operations[1].Name) 43 | assert.Equal(t, "0", data.Configuration.Resources.Masters[0].Primitive.Operations[1].Interval) 44 | assert.Equal(t, "3600", data.Configuration.Resources.Masters[0].Primitive.Operations[1].Timeout) 45 | assert.Equal(t, "rsc_SAPHana_PRD_HDB00-promote-0", data.Configuration.Resources.Masters[0].Primitive.Operations[2].Id) 46 | assert.Equal(t, "promote", data.Configuration.Resources.Masters[0].Primitive.Operations[2].Name) 47 | assert.Equal(t, "0", data.Configuration.Resources.Masters[0].Primitive.Operations[2].Interval) 48 | assert.Equal(t, "3600", data.Configuration.Resources.Masters[0].Primitive.Operations[2].Timeout) 49 | assert.Equal(t, "rsc_SAPHana_PRD_HDB00-monitor-60", data.Configuration.Resources.Masters[0].Primitive.Operations[3].Id) 50 | assert.Equal(t, "monitor", data.Configuration.Resources.Masters[0].Primitive.Operations[3].Name) 51 | assert.Equal(t, "Master", data.Configuration.Resources.Masters[0].Primitive.Operations[3].Role) 52 | assert.Equal(t, "60", data.Configuration.Resources.Masters[0].Primitive.Operations[3].Interval) 53 | assert.Equal(t, "700", data.Configuration.Resources.Masters[0].Primitive.Operations[3].Timeout) 54 | assert.Equal(t, "rsc_SAPHana_PRD_HDB00-monitor-61", 
data.Configuration.Resources.Masters[0].Primitive.Operations[4].Id) 55 | assert.Equal(t, "monitor", data.Configuration.Resources.Masters[0].Primitive.Operations[4].Name) 56 | assert.Equal(t, "Slave", data.Configuration.Resources.Masters[0].Primitive.Operations[4].Role) 57 | assert.Equal(t, "61", data.Configuration.Resources.Masters[0].Primitive.Operations[4].Interval) 58 | assert.Equal(t, "700", data.Configuration.Resources.Masters[0].Primitive.Operations[4].Timeout) 59 | assert.Equal(t, "test", data.Configuration.Resources.Primitives[2].Id) 60 | assert.Equal(t, "ocf", data.Configuration.Resources.Primitives[2].Class) 61 | assert.Equal(t, "heartbeat", data.Configuration.Resources.Primitives[2].Provider) 62 | assert.Equal(t, "Dummy", data.Configuration.Resources.Primitives[2].Type) 63 | 64 | } 65 | -------------------------------------------------------------------------------- /collector/pacemaker/crmmon/data.go: -------------------------------------------------------------------------------- 1 | package crmmon 2 | 3 | // *** crm_mon XML deserialization structures 4 | 5 | type Root struct { 6 | Version string `xml:"version,attr"` 7 | Summary struct { 8 | Nodes struct { 9 | Number int `xml:"number,attr"` 10 | } `xml:"nodes_configured"` 11 | LastChange struct { 12 | Time string `xml:"time,attr"` 13 | } `xml:"last_change"` 14 | Resources struct { 15 | Number int `xml:"number,attr"` 16 | Disabled int `xml:"disabled,attr"` 17 | Blocked int `xml:"blocked,attr"` 18 | } `xml:"resources_configured"` 19 | ClusterOptions struct { 20 | StonithEnabled bool `xml:"stonith-enabled,attr"` 21 | MaintenanceMode bool `xml:"maintenance-mode,attr"` 22 | } `xml:"cluster_options"` 23 | } `xml:"summary"` 24 | Nodes []Node `xml:"nodes>node"` 25 | NodeAttributes struct { 26 | Nodes []struct { 27 | Name string `xml:"name,attr"` 28 | Attributes []struct { 29 | Name string `xml:"name,attr"` 30 | Value string `xml:"value,attr"` 31 | } `xml:"attribute"` 32 | } `xml:"node"` 33 | } `xml:"node_attributes"` 34 | NodeHistory struct { 35 | Nodes []struct { 36 | Name string `xml:"name,attr"` 37 | ResourceHistory []struct { 38 | Name string `xml:"id,attr"` 39 | MigrationThreshold int `xml:"migration-threshold,attr"` 40 | FailCount int `xml:"fail-count,attr"` 41 | } `xml:"resource_history"` 42 | } `xml:"node"` 43 | } `xml:"node_history"` 44 | Resources []Resource `xml:"resources>resource"` 45 | Clones []Clone `xml:"resources>clone"` 46 | Groups []Group `xml:"resources>group"` 47 | } 48 | 49 | type Node struct { 50 | Name string `xml:"name,attr"` 51 | Id string `xml:"id,attr"` 52 | Online bool `xml:"online,attr"` 53 | Standby bool `xml:"standby,attr"` 54 | StandbyOnFail bool `xml:"standby_onfail,attr"` 55 | Maintenance bool `xml:"maintenance,attr"` 56 | Pending bool `xml:"pending,attr"` 57 | Unclean bool `xml:"unclean,attr"` 58 | Shutdown bool `xml:"shutdown,attr"` 59 | ExpectedUp bool `xml:"expected_up,attr"` 60 | DC bool `xml:"is_dc,attr"` 61 | ResourcesRunning int `xml:"resources_running,attr"` 62 | Type string `xml:"type,attr"` 63 | } 64 | 65 | type Resource struct { 66 | Id string `xml:"id,attr"` 67 | Agent string `xml:"resource_agent,attr"` 68 | Role string `xml:"role,attr"` 69 | Active bool `xml:"active,attr"` 70 | Orphaned bool `xml:"orphaned,attr"` 71 | Blocked bool `xml:"blocked,attr"` 72 | Managed bool `xml:"managed,attr"` 73 | Failed bool `xml:"failed,attr"` 74 | FailureIgnored bool `xml:"failure_ignored,attr"` 75 | NodesRunningOn int `xml:"nodes_running_on,attr"` 76 | Node *struct { 77 | Name string `xml:"name,attr"`
78 | Id string `xml:"id,attr"` 79 | Cached bool `xml:"cached,attr"` 80 | } `xml:"node,omitempty"` 81 | } 82 | 83 | type Clone struct { 84 | Id string `xml:"id,attr"` 85 | MultiState bool `xml:"multi_state,attr"` 86 | Managed bool `xml:"managed,attr"` 87 | Failed bool `xml:"failed,attr"` 88 | FailureIgnored bool `xml:"failure_ignored,attr"` 89 | Unique bool `xml:"unique,attr"` 90 | Resources []Resource `xml:"resource"` 91 | } 92 | 93 | type Group struct { 94 | Id string `xml:"id,attr"` 95 | Resources []Resource `xml:"resource"` 96 | } 97 | -------------------------------------------------------------------------------- /collector/pacemaker/crmmon/parser.go: -------------------------------------------------------------------------------- 1 | package crmmon 2 | 3 | import ( 4 | "encoding/xml" 5 | "os/exec" 6 | 7 | "github.com/pkg/errors" 8 | ) 9 | 10 | type Parser interface { 11 | Parse() (Root, error) 12 | } 13 | 14 | type crmMonParser struct { 15 | crmMonPath string 16 | } 17 | 18 | func (c *crmMonParser) Parse() (crmMon Root, err error) { 19 | crmMonXML, err := exec.Command(c.crmMonPath, "-X", "--inactive").Output() 20 | if err != nil { 21 | return crmMon, errors.Wrap(err, "error while executing crm_mon") 22 | } 23 | 24 | err = xml.Unmarshal(crmMonXML, &crmMon) 25 | if err != nil { 26 | return crmMon, errors.Wrap(err, "error while parsing crm_mon XML output") 27 | } 28 | 29 | return crmMon, nil 30 | } 31 | 32 | func NewCrmMonParser(crmMonPath string) *crmMonParser { 33 | return &crmMonParser{crmMonPath} 34 | } 35 | -------------------------------------------------------------------------------- /collector/pacemaker/crmmon/parser_test.go: -------------------------------------------------------------------------------- 1 | package crmmon 2 | 3 | import ( 4 | "testing" 5 | 6 | "github.com/stretchr/testify/assert" 7 | ) 8 | 9 | func TestConstructor(t *testing.T) { 10 | p := NewCrmMonParser("foo") 11 | assert.Equal(t, "foo", p.crmMonPath) 12 | } 13 | 14 | func TestParse(t *testing.T) { 15 | p := NewCrmMonParser("../../../test/fake_crm_mon.sh") 16 | data, err := p.Parse() 17 | assert.NoError(t, err) 18 | assert.Equal(t, "2.0.0", data.Version) 19 | assert.Equal(t, 8, data.Summary.Resources.Number) 20 | assert.Equal(t, 1, data.Summary.Resources.Disabled) 21 | assert.Equal(t, 0, data.Summary.Resources.Blocked) 22 | assert.Equal(t, "Fri Oct 18 11:48:22 2019", data.Summary.LastChange.Time) 23 | assert.Equal(t, 2, data.Summary.Nodes.Number) 24 | assert.Equal(t, "node01", data.Nodes[0].Name) 25 | assert.Equal(t, "1084783375", data.Nodes[0].Id) 26 | assert.Equal(t, true, data.Nodes[0].Online) 27 | assert.Equal(t, true, data.Nodes[0].ExpectedUp) 28 | assert.Equal(t, true, data.Nodes[0].DC) 29 | assert.Equal(t, false, data.Nodes[0].Unclean) 30 | assert.Equal(t, false, data.Nodes[0].Shutdown) 31 | assert.Equal(t, false, data.Nodes[0].StandbyOnFail) 32 | assert.Equal(t, false, data.Nodes[0].Maintenance) 33 | assert.Equal(t, false, data.Nodes[0].Pending) 34 | assert.Equal(t, false, data.Nodes[0].Standby) 35 | assert.Equal(t, "node02", data.Nodes[1].Name) 36 | assert.Equal(t, "1084783376", data.Nodes[1].Id) 37 | assert.Equal(t, true, data.Nodes[1].Online) 38 | assert.Equal(t, true, data.Nodes[1].ExpectedUp) 39 | assert.Equal(t, false, data.Nodes[1].DC) 40 | assert.Equal(t, false, data.Nodes[1].Unclean) 41 | assert.Equal(t, false, data.Nodes[1].Shutdown) 42 | assert.Equal(t, false, data.Nodes[1].StandbyOnFail) 43 | assert.Equal(t, false, data.Nodes[1].Maintenance) 44 | assert.Equal(t, false, 
data.Nodes[1].Pending) 45 | assert.Equal(t, false, data.Nodes[1].Standby) 46 | assert.Equal(t, "node01", data.NodeHistory.Nodes[0].Name) 47 | assert.Equal(t, 5000, data.NodeHistory.Nodes[0].ResourceHistory[0].MigrationThreshold) 48 | assert.Equal(t, 2, data.NodeHistory.Nodes[0].ResourceHistory[1].FailCount) 49 | assert.Equal(t, "rsc_SAPHana_PRD_HDB00", data.NodeHistory.Nodes[0].ResourceHistory[0].Name) 50 | assert.Equal(t, 4, len(data.Resources)) 51 | assert.Equal(t, "test-stop", data.Resources[0].Id) 52 | assert.Equal(t, false, data.Resources[0].Active) 53 | assert.Equal(t, "Stopped", data.Resources[0].Role) 54 | } 55 | 56 | func TestParseClones(t *testing.T) { 57 | p := NewCrmMonParser("../../../test/fake_crm_mon.sh") 58 | data, err := p.Parse() 59 | assert.NoError(t, err) 60 | assert.Equal(t, 3, len(data.Clones)) 61 | assert.Equal(t, "msl_SAPHana_PRD_HDB00", data.Clones[0].Id) 62 | assert.Equal(t, "cln_SAPHanaTopology_PRD_HDB00", data.Clones[1].Id) 63 | assert.Equal(t, "c-clusterfs", data.Clones[2].Id) 64 | assert.Equal(t, 2, len(data.Clones[0].Resources)) 65 | assert.Equal(t, 2, len(data.Clones[1].Resources)) 66 | assert.Equal(t, "rsc_SAPHana_PRD_HDB00", data.Clones[0].Resources[0].Id) 67 | assert.Equal(t, "Master", data.Clones[0].Resources[0].Role) 68 | assert.Equal(t, "rsc_SAPHana_PRD_HDB00", data.Clones[0].Resources[1].Id) 69 | assert.Equal(t, "Slave", data.Clones[0].Resources[1].Role) 70 | } 71 | 72 | func TestParseGroups(t *testing.T) { 73 | p := NewCrmMonParser("../../../test/fake_crm_mon.sh") 74 | data, err := p.Parse() 75 | assert.NoError(t, err) 76 | assert.Equal(t, 2, len(data.Groups)) 77 | 78 | assert.Equal(t, "grp_HA1_ASCS00", data.Groups[0].Id) 79 | assert.Equal(t, 3, len(data.Groups[0].Resources)) 80 | assert.Equal(t, "rsc_ip_HA1_ASCS00", data.Groups[0].Resources[0].Id) 81 | assert.Equal(t, "rsc_fs_HA1_ASCS00", data.Groups[0].Resources[1].Id) 82 | assert.Equal(t, "rsc_sap_HA1_ASCS00", data.Groups[0].Resources[2].Id) 83 | 84 | assert.Equal(t, "grp_HA1_ERS10", data.Groups[1].Id) 85 | assert.Equal(t, 3, len(data.Groups[1].Resources)) 86 | assert.Equal(t, "rsc_ip_HA1_ERS10", data.Groups[1].Resources[0].Id) 87 | assert.Equal(t, "rsc_fs_HA1_ERS10", data.Groups[1].Resources[1].Id) 88 | assert.Equal(t, "rsc_sap_HA1_ERS10", data.Groups[1].Resources[2].Id) 89 | } 90 | 91 | func TestParseNodeAttributes(t *testing.T) { 92 | p := NewCrmMonParser("../../../test/fake_crm_mon.sh") 93 | data, err := p.Parse() 94 | assert.NoError(t, err) 95 | assert.Len(t, data.NodeAttributes.Nodes, 2) 96 | assert.Equal(t, "node01", data.NodeAttributes.Nodes[0].Name) 97 | assert.Equal(t, "node02", data.NodeAttributes.Nodes[1].Name) 98 | 99 | assert.Len(t, data.NodeAttributes.Nodes[0].Attributes, 11) 100 | assert.Equal(t, "hana_prd_clone_state", data.NodeAttributes.Nodes[0].Attributes[0].Name) 101 | assert.Equal(t, "hana_prd_op_mode", data.NodeAttributes.Nodes[0].Attributes[1].Name) 102 | assert.Equal(t, "hana_prd_remoteHost", data.NodeAttributes.Nodes[0].Attributes[2].Name) 103 | assert.Equal(t, "hana_prd_roles", data.NodeAttributes.Nodes[0].Attributes[3].Name) 104 | assert.Equal(t, "hana_prd_site", data.NodeAttributes.Nodes[0].Attributes[4].Name) 105 | assert.Equal(t, "hana_prd_srmode", data.NodeAttributes.Nodes[0].Attributes[5].Name) 106 | assert.Equal(t, "hana_prd_sync_state", data.NodeAttributes.Nodes[0].Attributes[6].Name) 107 | assert.Equal(t, "hana_prd_version", data.NodeAttributes.Nodes[0].Attributes[7].Name) 108 | assert.Equal(t, "hana_prd_vhost", data.NodeAttributes.Nodes[0].Attributes[8].Name) 
109 | assert.Equal(t, "lpa_prd_lpt", data.NodeAttributes.Nodes[0].Attributes[9].Name) 110 | assert.Equal(t, "master-rsc_SAPHana_PRD_HDB00", data.NodeAttributes.Nodes[0].Attributes[10].Name) 111 | 112 | assert.Equal(t, "PROMOTED", data.NodeAttributes.Nodes[0].Attributes[0].Value) 113 | assert.Equal(t, "logreplay", data.NodeAttributes.Nodes[0].Attributes[1].Value) 114 | assert.Equal(t, "node02", data.NodeAttributes.Nodes[0].Attributes[2].Value) 115 | assert.Equal(t, "4:P:master1:master:worker:master", data.NodeAttributes.Nodes[0].Attributes[3].Value) 116 | assert.Equal(t, "PRIMARY_SITE_NAME", data.NodeAttributes.Nodes[0].Attributes[4].Value) 117 | assert.Equal(t, "sync", data.NodeAttributes.Nodes[0].Attributes[5].Value) 118 | assert.Equal(t, "PRIM", data.NodeAttributes.Nodes[0].Attributes[6].Value) 119 | assert.Equal(t, "2.00.040.00.1553674765", data.NodeAttributes.Nodes[0].Attributes[7].Value) 120 | assert.Equal(t, "node01", data.NodeAttributes.Nodes[0].Attributes[8].Value) 121 | assert.Equal(t, "1571392102", data.NodeAttributes.Nodes[0].Attributes[9].Value) 122 | assert.Equal(t, "150", data.NodeAttributes.Nodes[0].Attributes[10].Value) 123 | 124 | assert.Len(t, data.NodeAttributes.Nodes[1].Attributes, 11) 125 | assert.Equal(t, "hana_prd_clone_state", data.NodeAttributes.Nodes[0].Attributes[0].Name) 126 | assert.Equal(t, "hana_prd_op_mode", data.NodeAttributes.Nodes[0].Attributes[1].Name) 127 | assert.Equal(t, "hana_prd_remoteHost", data.NodeAttributes.Nodes[0].Attributes[2].Name) 128 | assert.Equal(t, "hana_prd_roles", data.NodeAttributes.Nodes[0].Attributes[3].Name) 129 | assert.Equal(t, "hana_prd_site", data.NodeAttributes.Nodes[0].Attributes[4].Name) 130 | assert.Equal(t, "hana_prd_srmode", data.NodeAttributes.Nodes[0].Attributes[5].Name) 131 | assert.Equal(t, "hana_prd_sync_state", data.NodeAttributes.Nodes[0].Attributes[6].Name) 132 | assert.Equal(t, "hana_prd_version", data.NodeAttributes.Nodes[0].Attributes[7].Name) 133 | assert.Equal(t, "hana_prd_vhost", data.NodeAttributes.Nodes[0].Attributes[8].Name) 134 | assert.Equal(t, "lpa_prd_lpt", data.NodeAttributes.Nodes[0].Attributes[9].Name) 135 | assert.Equal(t, "master-rsc_SAPHana_PRD_HDB00", data.NodeAttributes.Nodes[0].Attributes[10].Name) 136 | 137 | assert.Equal(t, "DEMOTED", data.NodeAttributes.Nodes[1].Attributes[0].Value) 138 | assert.Equal(t, "logreplay", data.NodeAttributes.Nodes[1].Attributes[1].Value) 139 | assert.Equal(t, "node01", data.NodeAttributes.Nodes[1].Attributes[2].Value) 140 | assert.Equal(t, "4:S:master1:master:worker:master", data.NodeAttributes.Nodes[1].Attributes[3].Value) 141 | assert.Equal(t, "SECONDARY_SITE_NAME", data.NodeAttributes.Nodes[1].Attributes[4].Value) 142 | assert.Equal(t, "sync", data.NodeAttributes.Nodes[1].Attributes[5].Value) 143 | assert.Equal(t, "SOK", data.NodeAttributes.Nodes[1].Attributes[6].Value) 144 | assert.Equal(t, "2.00.040.00.1553674765", data.NodeAttributes.Nodes[1].Attributes[7].Value) 145 | assert.Equal(t, "node02", data.NodeAttributes.Nodes[1].Attributes[8].Value) 146 | assert.Equal(t, "30", data.NodeAttributes.Nodes[1].Attributes[9].Value) 147 | assert.Equal(t, "100", data.NodeAttributes.Nodes[1].Attributes[10].Value) 148 | } 149 | -------------------------------------------------------------------------------- /collector/pacemaker/pacemaker.go: -------------------------------------------------------------------------------- 1 | package pacemaker 2 | 3 | import ( 4 | "math" 5 | "strconv" 6 | "strings" 7 | "time" 8 | 9 | "github.com/ClusterLabs/ha_cluster_exporter/collector" 
10 | "github.com/ClusterLabs/ha_cluster_exporter/collector/pacemaker/cib" 11 | "github.com/ClusterLabs/ha_cluster_exporter/collector/pacemaker/crmmon" 12 | 13 | "github.com/go-kit/log" 14 | "github.com/go-kit/log/level" 15 | "github.com/pkg/errors" 16 | "github.com/prometheus/client_golang/prometheus" 17 | ) 18 | 19 | const subsystem = "pacemaker" 20 | 21 | func NewCollector(crmMonPath string, cibAdminPath string, timestamps bool, logger log.Logger) (*pacemakerCollector, error) { 22 | err := collector.CheckExecutables(crmMonPath, cibAdminPath) 23 | if err != nil { 24 | return nil, errors.Wrapf(err, "could not initialize '%s' collector", subsystem) 25 | } 26 | 27 | c := &pacemakerCollector{ 28 | collector.NewDefaultCollector(subsystem, timestamps, logger), 29 | crmmon.NewCrmMonParser(crmMonPath), 30 | cib.NewCibAdminParser(cibAdminPath), 31 | } 32 | c.SetDescriptor("nodes", "The status of each node in the cluster; 1 means the node is in that status, 0 otherwise", []string{"node", "type", "status"}) 33 | c.SetDescriptor("node_attributes", "Metadata attributes of each node; value is always 1", []string{"node", "name", "value"}) 34 | c.SetDescriptor("resources", "The status of each resource in the cluster; 1 means the resource is in that status, 0 otherwise", []string{"node", "resource", "role", "managed", "status", "agent", "group", "clone"}) 35 | c.SetDescriptor("stonith_enabled", "Whether or not stonith is enabled", nil) 36 | c.SetDescriptor("maintenance_mode_enabled", "Whether or not cluster wide maintenance-mode is enabled", nil) 37 | c.SetDescriptor("fail_count", "The Fail count number per node and resource id", []string{"node", "resource"}) 38 | c.SetDescriptor("migration_threshold", "The migration_threshold number per node and resource id", []string{"node", "resource"}) 39 | c.SetDescriptor("config_last_change", "The timestamp of the last change of the cluster configuration", nil) 40 | c.SetDescriptor("location_constraints", "Resource location constraints. 
The value indicates the score.", []string{"constraint", "node", "resource", "role"}) 41 | 42 | return c, nil 43 | } 44 | 45 | type pacemakerCollector struct { 46 | collector.DefaultCollector 47 | crmMonParser crmmon.Parser 48 | cibParser cib.Parser 49 | } 50 | 51 | func (c *pacemakerCollector) CollectWithError(ch chan<- prometheus.Metric) error { 52 | level.Debug(c.Logger).Log("msg", "Collecting pacemaker metrics...") 53 | 54 | crmMon, err := c.crmMonParser.Parse() 55 | if err != nil { 56 | return errors.Wrap(err, "crm_mon parser error") 57 | } 58 | 59 | CIB, err := c.cibParser.Parse() 60 | if err != nil { 61 | return errors.Wrap(err, "cibadmin parser error") 62 | } 63 | 64 | c.recordStonithStatus(crmMon, ch) 65 | c.recordMaintenanceModeStatus(crmMon, ch) 66 | c.recordNodes(crmMon, ch) 67 | c.recordNodeAttributes(crmMon, ch) 68 | c.recordResources(crmMon, ch) 69 | c.recordFailCounts(crmMon, ch) 70 | c.recordMigrationThresholds(crmMon, ch) 71 | c.recordConstraints(CIB, ch) 72 | 73 | err = c.recordCibLastChange(crmMon, ch) 74 | if err != nil { 75 | return errors.Wrap(err, "could not record CIB last change") 76 | } 77 | 78 | return nil 79 | } 80 | 81 | func (c *pacemakerCollector) Collect(ch chan<- prometheus.Metric) { 82 | level.Debug(c.Logger).Log("msg", "Collecting pacemaker metrics...") 83 | 84 | err := c.CollectWithError(ch) 85 | if err != nil { 86 | level.Warn(c.Logger).Log("msg", c.GetSubsystem()+" collector scrape failed", "err", err) 87 | } 88 | } 89 | 90 | func (c *pacemakerCollector) recordMaintenanceModeStatus(crmMon crmmon.Root, ch chan<- prometheus.Metric) { 91 | var maintenanceModeEnabled float64 92 | if crmMon.Summary.ClusterOptions.MaintenanceMode { 93 | maintenanceModeEnabled = 1 94 | } 95 | 96 | ch <- c.MakeGaugeMetric("maintenance_mode_enabled", maintenanceModeEnabled) 97 | } 98 | 99 | func (c *pacemakerCollector) recordStonithStatus(crmMon crmmon.Root, ch chan<- prometheus.Metric) { 100 | var stonithEnabled float64 101 | if crmMon.Summary.ClusterOptions.StonithEnabled { 102 | stonithEnabled = 1 103 | } 104 | 105 | ch <- c.MakeGaugeMetric("stonith_enabled", stonithEnabled) 106 | } 107 | 108 | func (c *pacemakerCollector) recordNodes(crmMon crmmon.Root, ch chan<- prometheus.Metric) { 109 | for _, node := range crmMon.Nodes { 110 | 111 | // this is a map of boolean flags for each possible status of the node 112 | nodeStatuses := map[string]bool{ 113 | "online": node.Online, 114 | "standby": node.Standby, 115 | "standby_onfail": node.StandbyOnFail, 116 | "maintenance": node.Maintenance, 117 | "pending": node.Pending, 118 | "unclean": node.Unclean, 119 | "shutdown": node.Shutdown, 120 | "expected_up": node.ExpectedUp, 121 | "dc": node.DC, 122 | } 123 | 124 | // since we have a combined cardinality of node * status, we cycle through all the possible statuses 125 | // and we record a metric for each one 126 | for nodeStatus, flag := range nodeStatuses { 127 | var statusValue float64 128 | if flag { 129 | statusValue = 1 130 | } 131 | ch <- c.MakeGaugeMetric("nodes", statusValue, node.Name, node.Type, nodeStatus) 132 | } 133 | } 134 | } 135 | 136 | func (c *pacemakerCollector) recordResources(crmMon crmmon.Root, ch chan<- prometheus.Metric) { 137 | for _, resource := range crmMon.Resources { 138 | c.recordResource(resource, "", "", ch) 139 | } 140 | for _, clone := range crmMon.Clones { 141 | recorded := make(map[crmmon.Resource]bool) // we need to track cloned resources to avoid duplicates 142 | for _, resource := range clone.Resources { 143 | // Avoid recording stopped cloned 
resources multiple times 144 | if recorded[resource] { 145 | continue 146 | } 147 | 148 | c.recordResource(resource, "", clone.Id, ch) 149 | 150 | recorded[resource] = true 151 | } 152 | } 153 | for _, group := range crmMon.Groups { 154 | for _, resource := range group.Resources { 155 | c.recordResource(resource, group.Id, "", ch) 156 | } 157 | } 158 | } 159 | 160 | func (c *pacemakerCollector) recordResource(resource crmmon.Resource, group string, clone string, ch chan<- prometheus.Metric) { 161 | 162 | // this is a map of boolean flags for each possible status of the resource 163 | resourceStatuses := map[string]bool{ 164 | "active": resource.Active, 165 | "orphaned": resource.Orphaned, 166 | "blocked": resource.Blocked, 167 | "failed": resource.Failed, 168 | "failure_ignored": resource.FailureIgnored, 169 | } 170 | 171 | var nodeName string 172 | if resource.Node != nil { 173 | nodeName = resource.Node.Name 174 | } 175 | 176 | // since we have a combined cardinality of resource * status, we cycle through all the possible statuses 177 | // and we record a new metric if the flag for that status is on 178 | for resourceStatus, flag := range resourceStatuses { 179 | var statusValue float64 180 | if flag { 181 | statusValue = 1 182 | } 183 | 184 | labels := []string{ 185 | nodeName, 186 | resource.Id, 187 | strings.ToLower(resource.Role), 188 | strconv.FormatBool(resource.Managed), 189 | resourceStatus, 190 | resource.Agent, 191 | group, 192 | clone, 193 | } 194 | 195 | ch <- c.MakeGaugeMetric("resources", statusValue, labels...) 196 | } 197 | } 198 | 199 | func (c *pacemakerCollector) recordFailCounts(crmMon crmmon.Root, ch chan<- prometheus.Metric) { 200 | for _, node := range crmMon.NodeHistory.Nodes { 201 | for _, resHistory := range node.ResourceHistory { 202 | failCount := float64(resHistory.FailCount) 203 | 204 | // Pacemaker uses 1000000 (INFINITY) as a special value meaning an unbounded fail count 205 | if resHistory.FailCount >= 1000000 { 206 | failCount = math.Inf(1) 207 | } 208 | 209 | ch <- c.MakeGaugeMetric("fail_count", failCount, node.Name, resHistory.Name) 210 | 211 | } 212 | } 213 | } 214 | 215 | func (c *pacemakerCollector) recordCibLastChange(crmMon crmmon.Root, ch chan<- prometheus.Metric) error { 216 | t, err := time.Parse(time.ANSIC, crmMon.Summary.LastChange.Time) 217 | if err != nil { 218 | return errors.Wrap(err, "could not parse date") 219 | } 220 | // we record the timestamp of the last change as a float counter metric 221 | ch <- c.MakeCounterMetric("config_last_change", float64(t.Unix())) 222 | 223 | return nil 224 | } 225 | 226 | func (c *pacemakerCollector) recordMigrationThresholds(crmMon crmmon.Root, ch chan<- prometheus.Metric) { 227 | for _, node := range crmMon.NodeHistory.Nodes { 228 | for _, resHistory := range node.ResourceHistory { 229 | ch <- c.MakeGaugeMetric("migration_threshold", float64(resHistory.MigrationThreshold), node.Name, resHistory.Name) 230 | } 231 | } 232 | } 233 | 234 | func (c *pacemakerCollector) recordConstraints(CIB cib.Root, ch chan<- prometheus.Metric) { 235 | for _, constraint := range CIB.Configuration.Constraints.RscLocations { 236 | var constraintScore float64 237 | switch constraint.Score { 238 | case "INFINITY": 239 | constraintScore = math.Inf(1) 240 | case "-INFINITY": 241 | constraintScore = math.Inf(-1) 242 | default: 243 | s, _ := strconv.Atoi(constraint.Score) 244 | constraintScore = float64(s) 245 | } 246 | 247 | ch <- c.MakeGaugeMetric("location_constraints", constraintScore, constraint.Id, constraint.Node,
constraint.Resource, strings.ToLower(constraint.Role)) 248 | } 249 | } 250 | 251 | func (c *pacemakerCollector) recordNodeAttributes(crmMon crmmon.Root, ch chan<- prometheus.Metric) { 252 | for _, node := range crmMon.NodeAttributes.Nodes { 253 | for _, attr := range node.Attributes { 254 | ch <- c.MakeGaugeMetric("node_attributes", 1, node.Name, attr.Name, attr.Value) 255 | } 256 | } 257 | } 258 | -------------------------------------------------------------------------------- /collector/pacemaker/pacemaker_test.go: -------------------------------------------------------------------------------- 1 | package pacemaker 2 | 3 | import ( 4 | "testing" 5 | 6 | "github.com/go-kit/log" 7 | "github.com/stretchr/testify/assert" 8 | 9 | assertcustom "github.com/ClusterLabs/ha_cluster_exporter/internal/assert" 10 | ) 11 | 12 | func TestNewPacemakerCollector(t *testing.T) { 13 | _, err := NewCollector("../../test/fake_crm_mon.sh", "../../test/fake_cibadmin.sh", false, log.NewNopLogger()) 14 | 15 | assert.Nil(t, err) 16 | } 17 | 18 | func TestNewPacemakerCollectorChecksCrmMonExistence(t *testing.T) { 19 | _, err := NewCollector("../../test/nonexistent", "", false, log.NewNopLogger()) 20 | 21 | assert.Error(t, err) 22 | assert.Contains(t, err.Error(), "'../../test/nonexistent' does not exist") 23 | } 24 | 25 | func TestNewPacemakerCollectorChecksCrmMonExecutableBits(t *testing.T) { 26 | _, err := NewCollector("../../test/dummy", "", false, log.NewNopLogger()) 27 | 28 | assert.Error(t, err) 29 | assert.Contains(t, err.Error(), "'../../test/dummy' is not executable") 30 | } 31 | 32 | func TestPacemakerCollector(t *testing.T) { 33 | collector, err := NewCollector("../../test/fake_crm_mon.sh", "../../test/fake_cibadmin.sh", false, log.NewNopLogger()) 34 | 35 | assert.Nil(t, err) 36 | assertcustom.Metrics(t, collector, "pacemaker.metrics") 37 | } 38 | -------------------------------------------------------------------------------- /collector/sbd/sbd.go: -------------------------------------------------------------------------------- 1 | package sbd 2 | 3 | import ( 4 | "fmt" 5 | "io/ioutil" 6 | "os" 7 | "os/exec" 8 | "regexp" 9 | "strconv" 10 | "strings" 11 | 12 | "github.com/go-kit/log" 13 | "github.com/go-kit/log/level" 14 | "github.com/pkg/errors" 15 | "github.com/prometheus/client_golang/prometheus" 16 | 17 | "github.com/ClusterLabs/ha_cluster_exporter/collector" 18 | ) 19 | 20 | const subsystem = "sbd" 21 | 22 | const SBD_STATUS_UNHEALTHY = "unhealthy" 23 | const SBD_STATUS_HEALTHY = "healthy" 24 | 25 | // NewCollector creates a new SBD collector 26 | func NewCollector(sbdPath string, sbdConfigPath string, timestamps bool, logger log.Logger) (*sbdCollector, error) { 27 | err := checkArguments(sbdPath, sbdConfigPath) 28 | if err != nil { 29 | return nil, errors.Wrapf(err, "could not initialize '%s' collector", subsystem) 30 | } 31 | 32 | c := &sbdCollector{ 33 | collector.NewDefaultCollector(subsystem, timestamps, logger), 34 | sbdPath, 35 | sbdConfigPath, 36 | } 37 | 38 | c.SetDescriptor("devices", "SBD devices; one line per device", []string{"device", "status"}) 39 | c.SetDescriptor("timeouts", "SBD timeouts for each device and type", []string{"device", "type"}) 40 | 41 | return c, nil 42 | } 43 | 44 | func checkArguments(sbdPath string, sbdConfigPath string) error { 45 | if err := collector.CheckExecutables(sbdPath); err != nil { 46 | return err 47 | } 48 | if _, err := os.Stat(sbdConfigPath); os.IsNotExist(err) { 49 | return errors.Errorf("'%s' does not exist", sbdConfigPath) 50 | } 51 | return nil
52 | } 53 | 54 | type sbdCollector struct { 55 | collector.DefaultCollector 56 | sbdPath string 57 | sbdConfigPath string 58 | } 59 | 60 | func (c *sbdCollector) CollectWithError(ch chan<- prometheus.Metric) error { 61 | level.Debug(c.Logger).Log("msg", "Collecting SBD metrics...") 62 | 63 | sbdConfiguration, err := readSdbFile(c.sbdConfigPath) 64 | if err != nil { 65 | return err 66 | } 67 | 68 | sbdDevices := getSbdDevices(sbdConfiguration) 69 | 70 | sbdStatuses := c.getSbdDeviceStatuses(sbdDevices) 71 | for sbdDev, sbdStatus := range sbdStatuses { 72 | ch <- c.MakeGaugeMetric("devices", 1, sbdDev, sbdStatus) 73 | } 74 | 75 | sbdWatchdogs, sbdMsgWaits := c.getSbdTimeouts(sbdDevices) 76 | for sbdDev, sbdWatchdog := range sbdWatchdogs { 77 | ch <- c.MakeGaugeMetric("timeouts", sbdWatchdog, sbdDev, "watchdog") 78 | } 79 | 80 | for sbdDev, sbdMsgWait := range sbdMsgWaits { 81 | ch <- c.MakeGaugeMetric("timeouts", sbdMsgWait, sbdDev, "msgwait") 82 | } 83 | 84 | return nil 85 | } 86 | 87 | func (c *sbdCollector) Collect(ch chan<- prometheus.Metric) { 88 | level.Debug(c.Logger).Log("msg", "Collecting SBD metrics...") 89 | 90 | err := c.CollectWithError(ch) 91 | if err != nil { 92 | level.Warn(c.Logger).Log("msg", c.GetSubsystem()+" collector scrape failed", "err", err) 93 | } 94 | } 95 | 96 | func readSdbFile(sbdConfigPath string) ([]byte, error) { 97 | sbdConfFile, err := os.Open(sbdConfigPath) 98 | if err != nil { 99 | return nil, fmt.Errorf("could not open sbd config file: %s", err) 100 | } 101 | 102 | defer sbdConfFile.Close() 103 | sbdConfigRaw, err := ioutil.ReadAll(sbdConfFile) 104 | 105 | if err != nil { 106 | return nil, fmt.Errorf("could not read sbd config file: %s", err) 107 | } 108 | return sbdConfigRaw, nil 109 | } 110 | 111 | // retrieve a list of sbd devices from the config file contents 112 | func getSbdDevices(sbdConfigRaw []byte) []string { 113 | // The following regex matches lines like SBD_DEVICE="/dev/foo" or SBD_DEVICE=/dev/foo;/dev/bar 114 | // It captures all the semicolon-separated device names, without double quotes, into a capture group 115 | // It allows for free indentation, trailing spaces and end of lines, and it will ignore commented lines 116 | // Unbalanced double quotes are not checked and they will still produce a match 117 | // If multiple matching lines are present, only the first will be used 118 | // The single device name pattern is `[\w-/]+`, which is pretty relaxed 119 | regex := regexp.MustCompile(`(?m)^\s*SBD_DEVICE="?((?:[\w-/]+;?\s?)+)"?\s*$`) 120 | sbdDevicesLine := regex.FindStringSubmatch(string(sbdConfigRaw)) 121 | 122 | // if SBD_DEVICE line could not be found, return 0 devices 123 | if sbdDevicesLine == nil { 124 | return nil 125 | } 126 | 127 | // split the first capture group, e.g.
`/dev/foo;/dev/bar`; the 0th element is always the whole line 128 | sbdDevices := strings.Split(strings.TrimRight(sbdDevicesLine[1], ";"), ";") 129 | for i := range sbdDevices { 130 | sbdDevices[i] = strings.TrimSpace(sbdDevices[i]) 131 | } 132 | 133 | return sbdDevices 134 | } 135 | 136 | // this function takes a list of sbd devices and returns 137 | // a map of SBD device names to their status, either "healthy" or "unhealthy" 138 | func (c *sbdCollector) getSbdDeviceStatuses(sbdDevices []string) map[string]string { 139 | sbdStatuses := make(map[string]string) 140 | for _, sbdDev := range sbdDevices { 141 | _, err := exec.Command(c.sbdPath, "-d", sbdDev, "dump").Output() 142 | 143 | // in case of error the device is not healthy 144 | if err != nil { 145 | sbdStatuses[sbdDev] = SBD_STATUS_UNHEALTHY 146 | } else { 147 | sbdStatuses[sbdDev] = SBD_STATUS_HEALTHY 148 | } 149 | } 150 | 151 | return sbdStatuses 152 | } 153 | 154 | // for each sbd device, extract the watchdog and msgwait timeout via regex 155 | func (c *sbdCollector) getSbdTimeouts(sbdDevices []string) (map[string]float64, map[string]float64) { 156 | sbdWatchdogs := make(map[string]float64) 157 | sbdMsgWaits := make(map[string]float64) 158 | for _, sbdDev := range sbdDevices { 159 | sbdDump, _ := exec.Command(c.sbdPath, "-d", sbdDev, "dump").Output() 160 | 161 | regexW := regexp.MustCompile(`Timeout \(msgwait\) *: \d+`) 162 | regex := regexp.MustCompile(`Timeout \(watchdog\) *: \d+`) 163 | 164 | msgWaitLine := regexW.FindStringSubmatch(string(sbdDump)) 165 | watchdogLine := regex.FindStringSubmatch(string(sbdDump)) 166 | 167 | if watchdogLine == nil || msgWaitLine == nil { 168 | continue 169 | } 170 | 171 | // get the timeout from the line 172 | regexNumber := regexp.MustCompile(`\d+`) 173 | watchdogTimeout := regexNumber.FindString(string(watchdogLine[0])) 174 | msgWaitTimeout := regexNumber.FindString(string(msgWaitLine[0])) 175 | 176 | // map the timeout to the device 177 | if s, err := strconv.ParseFloat(watchdogTimeout, 64); err == nil { 178 | sbdWatchdogs[sbdDev] = s 179 | } 180 | 181 | // map the timeout to the device 182 | if s, err := strconv.ParseFloat(msgWaitTimeout, 64); err == nil { 183 | sbdMsgWaits[sbdDev] = s 184 | } 185 | 186 | } 187 | return sbdWatchdogs, sbdMsgWaits 188 | } 189 | -------------------------------------------------------------------------------- /collector/sbd/sbd_test.go: -------------------------------------------------------------------------------- 1 | package sbd 2 | 3 | import ( 4 | "testing" 5 | 6 | "github.com/go-kit/log" 7 | "github.com/stretchr/testify/assert" 8 | 9 | assertcustom "github.com/ClusterLabs/ha_cluster_exporter/internal/assert" 10 | ) 11 | 12 | func TestReadSbdConfFileError(t *testing.T) { 13 | sbdConfFile, err := readSdbFile("../../test/nonexistent") 14 | 15 | assert.Nil(t, sbdConfFile) 16 | assert.Error(t, err) 17 | } 18 | 19 | func TestGetSbdDevicesWithoutDoubleQuotes(t *testing.T) { 20 | // this is more or less a full config file; in other tests it is cut down 21 | sbdConfig := ` 22 | # SBD_DEVICE specifies the devices to use for exchanging sbd messages 23 | # and to monitor. If specifying more than one path, use ";" as 24 | # separator. 25 | # 26 | #SBD_DEVICE="" 27 | 28 | ## Type: yesno 29 | ## Default: yes 30 | # 31 | # Whether to enable the pacemaker integration. 32 | # 33 | SBD_PACEMAKER=yes 34 | 35 | ## Type: list(always,clean) 36 | ## Default: always 37 | # 38 | # Specify the start mode for sbd.
Setting this to "clean" will only 39 | # allow sbd to start if it was not previously fenced. See the -S option 40 | # in the man page. 41 | # 42 | SBD_STARTMODE=always 43 | 44 | ## Type: yesno / integer 45 | ## Default: no 46 | # 47 | # Whether to delay after starting sbd on boot for "msgwait" seconds. 48 | # This may be necessary if your cluster nodes reboot so fast that the 49 | # other nodes are still waiting in the fence acknowledgement phase. 50 | # This is an occasional issue with virtual machines. 51 | # 52 | # This can also be enabled by being set to a specific delay value, in 53 | # seconds. Sometimes a longer delay than the default, "msgwait", is 54 | # needed, for example in the cases where it's considered to be safer to 55 | # wait longer than: 56 | # corosync token timeout + consensus timeout + pcmk_delay_max + msgwait 57 | # 58 | # Be aware that the special value "1" means "yes" rather than "1s". 59 | # 60 | # Consider that you might have to adapt the startup-timeout accordingly 61 | # if the default isn't sufficient. (TimeoutStartSec for systemd) 62 | # 63 | # This option may be ignored at a later point, once pacemaker handles 64 | # this case better. 65 | # 66 | SBD_DELAY_START=no 67 | 68 | ## Type: string 69 | ## Default: /dev/watchdog 70 | # 71 | # Watchdog device to use. If set to /dev/null, no watchdog device will 72 | # be used. 73 | # 74 | SBD_WATCHDOG_DEV=/dev/watchdog 75 | 76 | ## Type: integer 77 | ## Default: 5 78 | # 79 | # How long, in seconds, the watchdog will wait before panicking the 80 | # node if no-one tickles it. 81 | # 82 | # This depends mostly on your storage latency; the majority of devices 83 | # must be successfully read within this time, or else the node will 84 | # self-fence. 85 | # 86 | # If your sbd device(s) reside on a multipath setup or iSCSI, this 87 | # should be the time required to detect a path failure. 88 | # 89 | # Be aware that watchdog timeout set in the on-disk metadata takes 90 | # precedence. 91 | # 92 | SBD_WATCHDOG_TIMEOUT=5 93 | 94 | ## Type: string 95 | ## Default: "flush,reboot" 96 | # 97 | # Actions to be executed when the watchers don't timely report to the sbd 98 | # master process or one of the watchers detects that the master process 99 | # has died. 100 | # 101 | # Set timeout-action to comma-separated combination of 102 | # noflush|flush plus reboot|crashdump|off. 103 | # If just one of both is given the other stays at the default. 104 | # 105 | # This doesn't affect actions like off, crashdump, reboot explicitly 106 | # triggered via message slots. 107 | # And it does as well not configure the action a watchdog would 108 | # trigger should it run off (there is no generic interface). 109 | # 110 | SBD_TIMEOUT_ACTION=flush,reboot 111 | 112 | ## Type: string 113 | ## Default: "" 114 | # 115 | # Additional options for starting sbd 116 | # 117 | SBD_OPTS= 118 | SBD_DEVICE=/dev/vda;/dev/vdb;/dev/vdc 119 | ` 120 | 121 | sbdDevices := getSbdDevices([]byte(sbdConfig)) 122 | 123 | assert.Len(t, sbdDevices, 3) 124 | assert.Equal(t, "/dev/vda", sbdDevices[0]) 125 | assert.Equal(t, "/dev/vdb", sbdDevices[1]) 126 | assert.Equal(t, "/dev/vdc", sbdDevices[2]) 127 | } 128 | 129 | // test the other case with double quotes, and put the string in random place 130 | func TestGetSbdDevicesWithDoubleQuotes(t *testing.T) { 131 | sbdConfig := `## Type: string 132 | ## Default: "" 133 | # 134 | # SBD_DEVICE specifies the devices to use for exchanging sbd messages 135 | # and to monitor. 
If specifying more than one path, use ";" as 136 | # separator. 137 | # 138 | #SBD_DEVICE="" 139 | 140 | SBD_WATCHDOG_TIMEOUT=5 141 | 142 | SBD_DEVICE="/dev/vda;/dev/vdb;/dev/vdc" 143 | 144 | SBD_TIMEOUT_ACTION=flush,reboot 145 | 146 | ## Type: string 147 | ## Default: "" 148 | # 149 | # Additional options for starting sbd 150 | # 151 | SBD_OPTS=` 152 | 153 | sbdDevices := getSbdDevices([]byte(sbdConfig)) 154 | 155 | assert.Len(t, sbdDevices, 3) 156 | assert.Equal(t, "/dev/vda", sbdDevices[0]) 157 | assert.Equal(t, "/dev/vdb", sbdDevices[1]) 158 | assert.Equal(t, "/dev/vdc", sbdDevices[2]) 159 | } 160 | 161 | // test the case where only a single device, without double quotes, is configured 162 | func TestOnlyOneDeviceSbd(t *testing.T) { 163 | sbdConfig := `## Type: string 164 | ## Default: "" 165 | 166 | SBD_DEVICE=/dev/vdc 167 | 168 | ## Type: string 169 | ## Default: "flush,reboot" 170 | ` 171 | 172 | sbdDevices := getSbdDevices([]byte(sbdConfig)) 173 | 174 | assert.Len(t, sbdDevices, 1) 175 | assert.Equal(t, "/dev/vdc", sbdDevices[0]) 176 | } 177 | 178 | func TestSbdDeviceParserWithFullCommentBeforeActualSetting(t *testing.T) { 179 | sbdConfig := ` 180 | # SBD_DEVICE=/dev/foo 181 | SBD_DEVICE=/dev/vdc;/dev/vdd` 182 | 183 | sbdDevices := getSbdDevices([]byte(sbdConfig)) 184 | 185 | assert.Len(t, sbdDevices, 2) 186 | assert.Equal(t, "/dev/vdc", sbdDevices[0]) 187 | assert.Equal(t, "/dev/vdd", sbdDevices[1]) 188 | } 189 | 190 | func TestSbdDeviceParserWithSpaceAfterSemicolon(t *testing.T) { 191 | sbdConfig := `SBD_DEVICE=/dev/vdc; /dev/vdd` 192 | 193 | sbdDevices := getSbdDevices([]byte(sbdConfig)) 194 | 195 | assert.Len(t, sbdDevices, 2) 196 | assert.Equal(t, "/dev/vdc", sbdDevices[0]) 197 | assert.Equal(t, "/dev/vdd", sbdDevices[1]) 198 | } 199 | 200 | func TestSbdDeviceParserWithSemicolon(t *testing.T) { 201 | sbdConfig := `SBD_DEVICE=/dev/vdc;/dev/vdd;` 202 | 203 | sbdDevices := getSbdDevices([]byte(sbdConfig)) 204 | 205 | assert.Len(t, sbdDevices, 2) 206 | assert.Equal(t, "/dev/vdc", sbdDevices[0]) 207 | assert.Equal(t, "/dev/vdd", sbdDevices[1]) 208 | } 209 | 210 | func TestNewSbdCollector(t *testing.T) { 211 | _, err := NewCollector("../../test/fake_sbd.sh", "../../test/fake_sbdconfig", false, log.NewNopLogger()) 212 | 213 | assert.Nil(t, err) 214 | } 215 | 216 | func TestNewSbdCollectorChecksSbdConfigExistence(t *testing.T) { 217 | _, err := NewCollector("../../test/fake_sbd.sh", "../../test/nonexistent", false, log.NewNopLogger()) 218 | 219 | assert.Error(t, err) 220 | assert.Contains(t, err.Error(), "'../../test/nonexistent' does not exist") 221 | } 222 | 223 | func TestNewSbdCollectorChecksSbdExistence(t *testing.T) { 224 | _, err := NewCollector("../../test/nonexistent", "../../test/fake_sbdconfig", false, log.NewNopLogger()) 225 | 226 | assert.Error(t, err) 227 | assert.Contains(t, err.Error(), "'../../test/nonexistent' does not exist") 228 | } 229 | 230 | func TestNewSbdCollectorChecksSbdExecutableBits(t *testing.T) { 231 | _, err := NewCollector("../../test/dummy", "../../test/fake_sbdconfig", false, log.NewNopLogger()) 232 | 233 | assert.Error(t, err) 234 | assert.Contains(t, err.Error(), "'../../test/dummy' is not executable") 235 | } 236 | 237 | func TestSBDCollector(t *testing.T) { 238 | collector, _ := NewCollector("../../test/fake_sbd_dump.sh", "../../test/fake_sbdconfig", false, log.NewNopLogger()) 239 | assertcustom.Metrics(t, collector, "sbd.metrics") 240 | } 241 | 242 | func TestWatchdog(t *testing.T) { 243 | collector, err :=
NewCollector("../../test/fake_sbd_dump.sh", "../../test/fake_sbdconfig", false, log.NewNopLogger()) 244 | 245 | assert.Nil(t, err) 246 | assertcustom.Metrics(t, collector, "sbd.metrics") 247 | } 248 | -------------------------------------------------------------------------------- /dashboards/README.md: -------------------------------------------------------------------------------- 1 | # Grafana dashboards 2 | 3 | We provide two dashboards for Grafana, leveraging the exporter. 4 | 5 | In addition to `ha_cluster_exporter`, these dashboards require Prometheus `node_exporter` to be configured on the target nodes. 6 | 7 | They also assume that the target nodes in each cluster are grouped via the `job` label (see the example scrape configuration at the end of this page). 8 | 9 | ## Multi-Cluster overview 10 | 11 | This dashboard gives an overview of multiple clusters monitored by the same Prometheus server. 12 | 13 | ![Multi-Cluster overview](screenshot-multi.png) 14 | 15 | ## HA Cluster details 16 | 17 | This dashboard shows the details of a single cluster. 18 | 19 | ![HA Cluster details](screenshot-detail.png) 20 | 21 | 22 | ## Installation 23 | 24 | ### RPM 25 | 26 | On openSUSE and SUSE Linux Enterprise distributions, you can install the package via zypper on your Grafana host: 27 | ``` 28 | zypper in grafana-ha-cluster-dashboards 29 | systemctl restart grafana-server 30 | ``` 31 | 32 | For the latest development version, please refer to the [development upstream project in OBS](https://build.opensuse.org/project/show/network:ha-clustering:sap-deployments:devel), which is automatically updated every time we merge changes into this repository. 33 | 34 | ### Manual 35 | 36 | Copy the [provider configuration file](provider-sleha.yaml) into `/etc/grafana/provisioning/dashboards`, then copy the JSON files into `/var/lib/grafana/dashboards/sleha`. 37 | 38 | Once done, restart the Grafana server. 39 | 40 | ### Grafana.com 41 | 42 | Dashboards will soon be available on [grafana.com/dashboards](https://grafana.com/dashboards). 43 | 44 | ## Development notes 45 | 46 | - Please make sure the `version` field in the JSON is incremented just once per PR. 47 | - Unlike the exporter, OBS Submit Requests are not automated for the dashboard package. 48 | Once PRs are merged, you will have to manually perform a Submit Request, after updating the `version` field in the `_service` file and adding an entry to the `grafana-ha-cluster-dashboards.changes` file.
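
## Example Prometheus configuration

For reference, here is a minimal scrape configuration sketching the `job` grouping described above; the job name and host names are placeholders, while `9664` and `9100` are the default ports of `ha_cluster_exporter` and `node_exporter` respectively:

```
scrape_configs:
  # one scrape job per cluster, targeting both exporters on every node
  - job_name: hana_cluster_01
    static_configs:
      - targets:
          - node01.example.com:9664  # ha_cluster_exporter
          - node02.example.com:9664
          - node01.example.com:9100  # node_exporter
          - node02.example.com:9100
```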
49 | -------------------------------------------------------------------------------- /dashboards/provider-sleha.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: 1 2 | 3 | providers: 4 | - name: SUSE Linux Enterprise High Availability Extension 5 | folder: SUSE Linux Enterprise 6 | folderUid: 3b1e0b26-fc28-4254-88a1-2d3516b5e404 7 | type: file 8 | allowUiUpdates: true 9 | editable: true 10 | options: 11 | path: /var/lib/grafana/dashboards/sleha 12 | -------------------------------------------------------------------------------- /dashboards/screenshot-detail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ClusterLabs/ha_cluster_exporter/8cf20d5b3b3cafa8d1e5d349ddf7fbbbe681f1a8/dashboards/screenshot-detail.png -------------------------------------------------------------------------------- /dashboards/screenshot-multi.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ClusterLabs/ha_cluster_exporter/8cf20d5b3b3cafa8d1e5d349ddf7fbbbe681f1a8/dashboards/screenshot-multi.png -------------------------------------------------------------------------------- /doc/design.md: -------------------------------------------------------------------------------- 1 | # Design Notes 2 | 3 | This document describes the rationale behind design decisions taken during the development of this project. 4 | 5 | ## Goals 6 | 7 | - Export runtime statistics about the various ClusterLabs Linux HA cluster components from existing data sources, to be consumed by a Prometheus monitoring stack. 8 | 9 | ## Non-goals 10 | 11 | - Maintain an internal, consistent, persisting representation of the cluster state; since the original source of truth is distributed, we want to avoid the complexity of a stateful middleware. 12 | 13 | 14 | ## Structure 15 | 16 | The project consists of a small HTTP application that exposes runtime data in a line protocol. 17 | 18 | A series of "metric collectors" are consumed by the main application entry point, `main.go`, where they are registered with the Prometheus client and then exposed via its HTTP handler. 19 | 20 | Concurrency is handled internally by a worker pool provided by the Prometheus library, but this implementation detail is completely hidden from the consumers. 21 | 22 | The data sources are read every time an HTTP request comes in, and the collected metrics are not shared: their lifecycle corresponds with the request's. 23 | 24 | The `internal` package contains common code shared among all the other packages, but not intended for usage outside this project. 25 | 26 | ## Collectors 27 | 28 | Inside the `collector` package, you will find the main logic of the project: a number of [`prometheus.Collector`](https://github.com/prometheus/client_golang/blob/b25ce2693a6de99c3ea1a1471cd8f873301a452f/prometheus/collector.go#L16-L63) implementations, one for each cluster component (which we'll call _subsystems_), like Pacemaker or Corosync. 29 | 30 | Common functionality is provided by composing the [`DefaultCollector`](../collector/default_collector.go). 31 | 32 | Each subsystem collector has a dedicated package; some are very simple, some are a little more nuanced. In general, they depend on external, globally available system tools to introspect the subsystems.
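
As an illustrative sketch of that pattern (this is not actual project code: the `foo` subsystem, the `fooctl` tool and the parser are made up, and the `DefaultCollector` helpers are simplified):

```go
package foo

import (
	"os/exec"

	"github.com/prometheus/client_golang/prometheus"

	"github.com/ClusterLabs/ha_cluster_exporter/collector"
)

// fooCollector sketches the shape of a typical subsystem collector: it wraps
// an external system tool and turns its output into Prometheus metrics.
type fooCollector struct {
	collector.DefaultCollector
	fooctlPath string // path to the external tool that introspects the subsystem
}

// parseFooNodeCount is a stand-in for a bespoke output parser; real parsers
// live in dedicated subpackages when they grow big enough.
func parseFooNodeCount(output []byte) int {
	return len(output) // placeholder logic
}

func (c *fooCollector) CollectWithError(ch chan<- prometheus.Metric) error {
	// invoke the external system tool...
	output, err := exec.Command(c.fooctlPath, "status").Output()
	if err != nil {
		return err
	}

	// ...parse its output into a bespoke data structure...
	nodeCount := parseFooNodeCount(output)

	// ...and build the Prometheus metrics from that data.
	ch <- c.MakeGaugeMetric("nodes_total", float64(nodeCount))

	return nil
}
```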
33 | 34 | The collectors usually just invoke these system commands, parsing the output into bespoke data structures. 35 | When building these data structures involves a significant amount of code, for a better separation of concerns this responsibility is extracted into dedicated subpackages, like [`collector/pacemaker/cib`](../collector/pacemaker/cib). 36 | 37 | The data structures are then used by the collectors to build the Prometheus metrics. 38 | 39 | More details about the metrics themselves can be found in the [metrics](metrics.md) document. 40 | -------------------------------------------------------------------------------- /doc/development.md: -------------------------------------------------------------------------------- 1 | # Developer notes 2 | 3 | 1. [Makefile](#makefile) 4 | 2. [OBS packaging](#obs-packaging) 5 | 6 | 7 | ## Makefile 8 | 9 | Most development tasks can be accomplished via [make](../Makefile). 10 | 11 | For starters, you can run the default target with just `make`. 12 | 13 | The default target will clean, analyse, test and build the amd64 binary into the `build/bin` directory. 14 | 15 | You can also cross-compile to the various architectures we support with `make build-all`. 16 | 17 | 18 | ## OBS Packaging 19 | 20 | The CI will automatically publish GitHub releases to SUSE's Open Build Service: to perform a new release, just publish a new GH release. Tags must always follow the [SemVer](https://semver.org/) scheme. 21 | 22 | If you wish to produce an OBS working directory locally, having configured [`osc`](https://en.opensuse.org/openSUSE:OSC) already, you can run: 23 | ``` 24 | make exporter-obs-workdir 25 | ``` 26 | This will check out the OBS project and prepare a new OBS commit in the `build/obs` directory. 27 | 28 | You can use the `OBS_PROJECT`, `REPOSITORY`, `VERSION` and `REVISION` environment variables to change the behaviour of OBS-related make targets. 29 | 30 | By default, the current Git working directory is used to infer the values of `VERSION` and `REVISION`, which are used by OBS source services to generate a compressed archive of the sources. 31 | 32 | For example, if you were on a feature branch of your own fork, you may want to change these variables, so: 33 | ```bash 34 | git checkout feature/xyz 35 | git push johndoe feature/xyz # don't forget to push your changes to your own fork's remote 36 | export OBS_PROJECT=home:JohnDoe 37 | export REPOSITORY=johndoe/prometheus-ha_cluster_exporter 38 | make exporter-obs-workdir 39 | ``` 40 | will prepare to commit on OBS into `home:JohnDoe/prometheus-ha_cluster_exporter` by checking out the `feature/xyz` branch from `github.com/johndoe/prometheus-ha_cluster_exporter`. 41 | 42 | Finally, to actually perform the commit into OBS, run: 43 | ```bash 44 | make exporter-obs-commit 45 | ``` 46 | 47 | Note that the actual continuously deployed releases also involve an intermediate step that updates the changelog automatically with the markdown text of the GitHub release.
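
As a concrete illustration of the `VERSION` and `REVISION` overrides mentioned above (the values here are hypothetical):

```bash
export OBS_PROJECT=home:JohnDoe
export VERSION=1.2.3                  # version string used for the source archive
export REVISION=$(git rev-parse HEAD) # Git revision the OBS source services will check out
make exporter-obs-workdir
```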
48 | -------------------------------------------------------------------------------- /go.mod: -------------------------------------------------------------------------------- 1 | module github.com/ClusterLabs/ha_cluster_exporter 2 | 3 | go 1.23 4 | 5 | toolchain go1.23.0 6 | 7 | require ( 8 | github.com/alecthomas/kingpin/v2 v2.4.0 9 | github.com/go-kit/log v0.2.1 10 | github.com/golang/mock v1.6.0 11 | github.com/pkg/errors v0.9.1 12 | github.com/prometheus/client_golang v1.20.5 13 | github.com/prometheus/client_model v0.6.1 14 | github.com/prometheus/common v0.61.0 15 | github.com/prometheus/exporter-toolkit v0.10.0 16 | github.com/spf13/viper v1.19.0 17 | github.com/stretchr/testify v1.10.0 18 | ) 19 | 20 | require ( 21 | github.com/alecthomas/units v0.0.0-20211218093645-b94a6e3cc137 // indirect 22 | github.com/beorn7/perks v1.0.1 // indirect 23 | github.com/cespare/xxhash/v2 v2.3.0 // indirect 24 | github.com/coreos/go-systemd/v22 v22.5.0 // indirect 25 | github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect 26 | github.com/fsnotify/fsnotify v1.7.0 // indirect 27 | github.com/go-logfmt/logfmt v0.5.1 // indirect 28 | github.com/hashicorp/hcl v1.0.0 // indirect 29 | github.com/jpillora/backoff v1.0.0 // indirect 30 | github.com/klauspost/compress v1.17.9 // indirect 31 | github.com/kylelemons/godebug v1.1.0 // indirect 32 | github.com/magiconair/properties v1.8.7 // indirect 33 | github.com/mitchellh/mapstructure v1.5.0 // indirect 34 | github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect 35 | github.com/mwitkow/go-conntrack v0.0.0-20190716064945-2f068394615f // indirect 36 | github.com/pelletier/go-toml/v2 v2.2.2 // indirect 37 | github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect 38 | github.com/prometheus/procfs v0.15.1 // indirect 39 | github.com/sagikazarmark/locafero v0.4.0 // indirect 40 | github.com/sagikazarmark/slog-shim v0.1.0 // indirect 41 | github.com/sourcegraph/conc v0.3.0 // indirect 42 | github.com/spf13/afero v1.11.0 // indirect 43 | github.com/spf13/cast v1.6.0 // indirect 44 | github.com/spf13/pflag v1.0.5 // indirect 45 | github.com/subosito/gotenv v1.6.0 // indirect 46 | github.com/xhit/go-str2duration/v2 v2.1.0 // indirect 47 | go.uber.org/atomic v1.9.0 // indirect 48 | go.uber.org/multierr v1.9.0 // indirect 49 | golang.org/x/crypto v0.31.0 // indirect 50 | golang.org/x/exp v0.0.0-20230905200255-921286631fa9 // indirect 51 | golang.org/x/net v0.33.0 // indirect 52 | golang.org/x/oauth2 v0.24.0 // indirect 53 | golang.org/x/sync v0.10.0 // indirect 54 | golang.org/x/sys v0.28.0 // indirect 55 | golang.org/x/text v0.21.0 // indirect 56 | google.golang.org/protobuf v1.35.2 // indirect 57 | gopkg.in/ini.v1 v1.67.0 // indirect 58 | gopkg.in/yaml.v2 v2.4.0 // indirect 59 | gopkg.in/yaml.v3 v3.0.1 // indirect 60 | ) 61 | -------------------------------------------------------------------------------- /go.sum: -------------------------------------------------------------------------------- 1 | github.com/alecthomas/kingpin/v2 v2.4.0 h1:f48lwail6p8zpO1bC4TxtqACaGqHYA22qkHjHpqDjYY= 2 | github.com/alecthomas/kingpin/v2 v2.4.0/go.mod h1:0gyi0zQnjuFk8xrkNKamJoyUo382HRL7ATRpFZCw6tE= 3 | github.com/alecthomas/units v0.0.0-20211218093645-b94a6e3cc137 h1:s6gZFSlWYmbqAuRjVTiNNhvNRfY2Wxp9nhfyel4rklc= 4 | github.com/alecthomas/units v0.0.0-20211218093645-b94a6e3cc137/go.mod h1:OMCwj8VM1Kc9e19TLln2VL61YJF0x1XFtfdL4JdbSyE= 5 | github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM= 6 | 
github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw= 7 | github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs= 8 | github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs= 9 | github.com/coreos/go-systemd/v22 v22.5.0 h1:RrqgGjYQKalulkV8NGVIfkXQf6YYmOyiJKk8iXXhfZs= 10 | github.com/coreos/go-systemd/v22 v22.5.0/go.mod h1:Y58oyj3AT4RCenI/lSvhwexgC+NSVTIJ3seZv2GcEnc= 11 | github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 12 | github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 13 | github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1VwoXQT9A3Wy9MM3WgvqSxFWenqJduM= 14 | github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 15 | github.com/frankban/quicktest v1.14.6 h1:7Xjx+VpznH+oBnejlPUj8oUpdxnVs4f8XU8WnHkI4W8= 16 | github.com/frankban/quicktest v1.14.6/go.mod h1:4ptaffx2x8+WTWXmUCuVU6aPUX1/Mz7zb5vbUoiM6w0= 17 | github.com/fsnotify/fsnotify v1.7.0 h1:8JEhPFa5W2WU7YfeZzPNqzMP6Lwt7L2715Ggo0nosvA= 18 | github.com/fsnotify/fsnotify v1.7.0/go.mod h1:40Bi/Hjc2AVfZrqy+aj+yEI+/bRxZnMJyTJwOpGvigM= 19 | github.com/go-kit/log v0.2.1 h1:MRVx0/zhvdseW+Gza6N9rVzU/IVzaeE1SFI4raAhmBU= 20 | github.com/go-kit/log v0.2.1/go.mod h1:NwTd00d/i8cPZ3xOwwiv2PO5MOcx78fFErGNcVmBjv0= 21 | github.com/go-logfmt/logfmt v0.5.1 h1:otpy5pqBCBZ1ng9RQ0dPu4PN7ba75Y/aA+UpowDyNVA= 22 | github.com/go-logfmt/logfmt v0.5.1/go.mod h1:WYhtIu8zTZfxdn5+rREduYbwxfcBr/Vr6KEVveWlfTs= 23 | github.com/godbus/dbus/v5 v5.0.4/go.mod h1:xhWf0FNVPg57R7Z0UbKHbJfkEywrmjJnf7w5xrFpKfA= 24 | github.com/golang/mock v1.6.0 h1:ErTB+efbowRARo13NNdxyJji2egdxLGQhRaY+DUumQc= 25 | github.com/golang/mock v1.6.0/go.mod h1:p6yTPP+5HYm5mzsMV8JkE6ZKdX+/wYM6Hr+LicevLPs= 26 | github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI= 27 | github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY= 28 | github.com/hashicorp/hcl v1.0.0 h1:0Anlzjpi4vEasTeNFn2mLJgTSwt0+6sfsiTG8qcWGx4= 29 | github.com/hashicorp/hcl v1.0.0/go.mod h1:E5yfLk+7swimpb2L/Alb/PJmXilQ/rhwaUYs4T20WEQ= 30 | github.com/jpillora/backoff v1.0.0 h1:uvFg412JmmHBHw7iwprIxkPMI+sGQ4kzOWsMeHnm2EA= 31 | github.com/jpillora/backoff v1.0.0/go.mod h1:J/6gKK9jxlEcS3zixgDgUAsiuZ7yrSoa/FX5e0EB2j4= 32 | github.com/klauspost/compress v1.17.9 h1:6KIumPrER1LHsvBVuDa0r5xaG0Es51mhhB9BQB2qeMA= 33 | github.com/klauspost/compress v1.17.9/go.mod h1:Di0epgTjJY877eYKx5yC51cX2A2Vl2ibi7bDH9ttBbw= 34 | github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE= 35 | github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk= 36 | github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY= 37 | github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE= 38 | github.com/kylelemons/godebug v1.1.0 h1:RPNrshWIDI6G2gRW9EHilWtl7Z6Sb1BR0xunSBf0SNc= 39 | github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw= 40 | github.com/magiconair/properties v1.8.7 h1:IeQXZAiQcpL9mgcAe1Nu6cX9LLw6ExEHKjN0VQdvPDY= 41 | github.com/magiconair/properties v1.8.7/go.mod h1:Dhd985XPs7jluiymwWYZ0G4Z61jb3vdS329zhj2hYo0= 42 | github.com/mitchellh/mapstructure v1.5.0 h1:jeMsZIYE/09sWLaz43PL7Gy6RuMjD2eJVyuac5Z2hdY= 43 | github.com/mitchellh/mapstructure v1.5.0/go.mod h1:bFUtVrKA4DC2yAKiSyO/QUcy7e+RRV2QTWOzhPopBRo= 44 | github.com/munnerz/goautoneg 
v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq1c1nUAm88MOHcQC9l5mIlSMApZMrHA= 45 | github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ= 46 | github.com/mwitkow/go-conntrack v0.0.0-20190716064945-2f068394615f h1:KUppIJq7/+SVif2QVs3tOP0zanoHgBEVAwHxUSIzRqU= 47 | github.com/mwitkow/go-conntrack v0.0.0-20190716064945-2f068394615f/go.mod h1:qRWi+5nqEBWmkhHvq77mSJWrCKwh8bxhgT7d/eI7P4U= 48 | github.com/pelletier/go-toml/v2 v2.2.2 h1:aYUidT7k73Pcl9nb2gScu7NSrKCSHIDE89b3+6Wq+LM= 49 | github.com/pelletier/go-toml/v2 v2.2.2/go.mod h1:1t835xjRzz80PqgE6HHgN2JOsmgYu/h4qDAS4n929Rs= 50 | github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4= 51 | github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= 52 | github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= 53 | github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRIccs7FGNTlIRMkT8wgtp5eCXdBlqhYGL6U= 54 | github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= 55 | github.com/prometheus/client_golang v1.20.5 h1:cxppBPuYhUnsO6yo/aoRol4L7q7UFfdm+bR9r+8l63Y= 56 | github.com/prometheus/client_golang v1.20.5/go.mod h1:PIEt8X02hGcP8JWbeHyeZ53Y/jReSnHgO035n//V5WE= 57 | github.com/prometheus/client_model v0.6.1 h1:ZKSh/rekM+n3CeS952MLRAdFwIKqeY8b62p8ais2e9E= 58 | github.com/prometheus/client_model v0.6.1/go.mod h1:OrxVMOVHjw3lKMa8+x6HeMGkHMQyHDk9E3jmP2AmGiY= 59 | github.com/prometheus/common v0.61.0 h1:3gv/GThfX0cV2lpO7gkTUwZru38mxevy90Bj8YFSRQQ= 60 | github.com/prometheus/common v0.61.0/go.mod h1:zr29OCN/2BsJRaFwG8QOBr41D6kkchKbpeNH7pAjb/s= 61 | github.com/prometheus/exporter-toolkit v0.10.0 h1:yOAzZTi4M22ZzVxD+fhy1URTuNRj/36uQJJ5S8IPza8= 62 | github.com/prometheus/exporter-toolkit v0.10.0/go.mod h1:+sVFzuvV5JDyw+Ih6p3zFxZNVnKQa3x5qPmDSiPu4ZY= 63 | github.com/prometheus/procfs v0.15.1 h1:YagwOFzUgYfKKHX6Dr+sHT7km/hxC76UB0learggepc= 64 | github.com/prometheus/procfs v0.15.1/go.mod h1:fB45yRUv8NstnjriLhBQLuOUt+WW4BsoGhij/e3PBqk= 65 | github.com/rogpeppe/go-internal v1.10.0 h1:TMyTOH3F/DB16zRVcYyreMH6GnZZrwQVAoYjRBZyWFQ= 66 | github.com/rogpeppe/go-internal v1.10.0/go.mod h1:UQnix2H7Ngw/k4C5ijL5+65zddjncjaFoBhdsK/akog= 67 | github.com/sagikazarmark/locafero v0.4.0 h1:HApY1R9zGo4DBgr7dqsTH/JJxLTTsOt7u6keLGt6kNQ= 68 | github.com/sagikazarmark/locafero v0.4.0/go.mod h1:Pe1W6UlPYUk/+wc/6KFhbORCfqzgYEpgQ3O5fPuL3H4= 69 | github.com/sagikazarmark/slog-shim v0.1.0 h1:diDBnUNK9N/354PgrxMywXnAwEr1QZcOr6gto+ugjYE= 70 | github.com/sagikazarmark/slog-shim v0.1.0/go.mod h1:SrcSrq8aKtyuqEI1uvTDTK1arOWRIczQRv+GVI1AkeQ= 71 | github.com/sourcegraph/conc v0.3.0 h1:OQTbbt6P72L20UqAkXXuLOj79LfEanQ+YQFNpLA9ySo= 72 | github.com/sourcegraph/conc v0.3.0/go.mod h1:Sdozi7LEKbFPqYX2/J+iBAM6HpqSLTASQIKqDmF7Mt0= 73 | github.com/spf13/afero v1.11.0 h1:WJQKhtpdm3v2IzqG8VMqrr6Rf3UYpEF239Jy9wNepM8= 74 | github.com/spf13/afero v1.11.0/go.mod h1:GH9Y3pIexgf1MTIWtNGyogA5MwRIDXGUr+hbWNoBjkY= 75 | github.com/spf13/cast v1.6.0 h1:GEiTHELF+vaR5dhz3VqZfFSzZjYbgeKDpBxQVS4GYJ0= 76 | github.com/spf13/cast v1.6.0/go.mod h1:ancEpBxwJDODSW/UG4rDrAqiKolqNNh2DX3mk86cAdo= 77 | github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA= 78 | github.com/spf13/pflag v1.0.5/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg= 79 | github.com/spf13/viper v1.19.0 h1:RWq5SEjt8o25SROyN3z2OrDB9l7RPd3lwTWU8EcEdcI= 80 | github.com/spf13/viper v1.19.0/go.mod 
h1:GQUN9bilAbhU/jgc1bKs99f/suXKeUMct8Adx5+Ntkg= 81 | github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= 82 | github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw= 83 | github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo= 84 | github.com/stretchr/objx v0.5.2/go.mod h1:FRsXN1f5AsAjCGJKqEizvkpNtU+EGNCLh3NxZ/8L+MA= 85 | github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI= 86 | github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4= 87 | github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= 88 | github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU= 89 | github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo= 90 | github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY= 91 | github.com/stretchr/testify v1.10.0 h1:Xv5erBjTwe/5IxqUQTdXv5kgmIvbHo3QQyRwhJsOfJA= 92 | github.com/stretchr/testify v1.10.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY= 93 | github.com/subosito/gotenv v1.6.0 h1:9NlTDc1FTs4qu0DDq7AEtTPNw6SVm7uBMsUCUjABIf8= 94 | github.com/subosito/gotenv v1.6.0/go.mod h1:Dk4QP5c2W3ibzajGcXpNraDfq2IrhjMIvMSWPKKo0FU= 95 | github.com/xhit/go-str2duration/v2 v2.1.0 h1:lxklc02Drh6ynqX+DdPyp5pCKLUQpRT8bp8Ydu2Bstc= 96 | github.com/xhit/go-str2duration/v2 v2.1.0/go.mod h1:ohY8p+0f07DiV6Em5LKB0s2YpLtXVyJfNt1+BlmyAsU= 97 | github.com/yuin/goldmark v1.3.5/go.mod h1:mwnBkeHKe2W/ZEtQ+71ViKU8L12m81fl3OWwC1Zlc8k= 98 | go.uber.org/atomic v1.9.0 h1:ECmE8Bn/WFTYwEW/bpKD3M8VtR/zQVbavAoalC1PYyE= 99 | go.uber.org/atomic v1.9.0/go.mod h1:fEN4uk6kAWBTFdckzkM89CLk9XfWZrxpCo0nPH17wJc= 100 | go.uber.org/multierr v1.9.0 h1:7fIwc/ZtS0q++VgcfqFDxSBZVv/Xo49/SYnDFupUwlI= 101 | go.uber.org/multierr v1.9.0/go.mod h1:X2jQV1h+kxSjClGpnseKVIxpmcjrj7MNnI0bnlfKTVQ= 102 | golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= 103 | golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= 104 | golang.org/x/crypto v0.31.0 h1:ihbySMvVjLAeSH1IbfcRTkD/iNscyz8rGzjF/E5hV6U= 105 | golang.org/x/crypto v0.31.0/go.mod h1:kDsLvtWBEx7MV9tJOj9bnXsPbxwJQ6csT/x4KIN4Ssk= 106 | golang.org/x/exp v0.0.0-20230905200255-921286631fa9 h1:GoHiUyI/Tp2nVkLI2mCxVkOjsbSXD66ic0XW0js0R9g= 107 | golang.org/x/exp v0.0.0-20230905200255-921286631fa9/go.mod h1:S2oDrQGGwySpoQPVqRShND87VCbxmc6bL1Yd2oYrm6k= 108 | golang.org/x/mod v0.4.2/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= 109 | golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= 110 | golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= 111 | golang.org/x/net v0.0.0-20210405180319-a5a99cb37ef4/go.mod h1:p54w0d4576C0XHj96bSt6lcn1PtDYWL6XObtHCRCNQM= 112 | golang.org/x/net v0.33.0 h1:74SYHlV8BIgHIFC/LrYkOGIwL19eTYXQ5wc6TBuO36I= 113 | golang.org/x/net v0.33.0/go.mod h1:HXLR5J+9DxmrqMwG9qjGCxZ+zKXxBru04zlTvWlWuN4= 114 | golang.org/x/oauth2 v0.24.0 h1:KTBBxWqUa0ykRPLtV69rRto9TLXcqYkeswu48x/gvNE= 115 | golang.org/x/oauth2 v0.24.0/go.mod h1:XYTD2NtWslqkgxebSiOHnXEap4TF09sJSc7H1sXbhtI= 116 | golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= 117 | golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod 
h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= 118 | golang.org/x/sync v0.10.0 h1:3NQrjDixjgGwUOCaF8w2+VYHv0Ve/vGYSbdkTa98gmQ= 119 | golang.org/x/sync v0.10.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk= 120 | golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= 121 | golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= 122 | golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= 123 | golang.org/x/sys v0.0.0-20210330210617-4fbd30eecc44/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= 124 | golang.org/x/sys v0.0.0-20210510120138-977fb7262007/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= 125 | golang.org/x/sys v0.28.0 h1:Fksou7UEQUWlKvIdsqzJmUmCX3cZuD2+P3XyyzwMhlA= 126 | golang.org/x/sys v0.28.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= 127 | golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo= 128 | golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= 129 | golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= 130 | golang.org/x/text v0.21.0 h1:zyQAAkrwaneQ066sspRyJaG9VNi/YJ1NfzcGB3hZ/qo= 131 | golang.org/x/text v0.21.0/go.mod h1:4IBbMaMmOPCJ8SecivzSH54+73PCFmPWxNTLm+vZkEQ= 132 | golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= 133 | golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= 134 | golang.org/x/tools v0.1.1/go.mod h1:o0xws9oXOQQZyjljx8fwUC0k7L1pTE6eaCbjGeHmOkk= 135 | golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= 136 | golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= 137 | golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= 138 | google.golang.org/protobuf v1.35.2 h1:8Ar7bF+apOIoThw1EdZl0p1oWvMqTHmpA2fRTyZO8io= 139 | google.golang.org/protobuf v1.35.2/go.mod h1:9fA7Ob0pmnwhb644+1+CVWFRbNajQ6iRojtC/QF5bRE= 140 | gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= 141 | gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk= 142 | gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q= 143 | gopkg.in/ini.v1 v1.67.0 h1:Dgnx+6+nfE+IfzjUEISNeydPJh9AXNNsWbGP9KzCsOA= 144 | gopkg.in/ini.v1 v1.67.0/go.mod h1:pNLf8WUiyNEtQjuu5G5vTm06TEv9tsIgeAvK8hOrP4k= 145 | gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= 146 | gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY= 147 | gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ= 148 | gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= 149 | gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= 150 | gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= 151 | -------------------------------------------------------------------------------- /ha_cluster_exporter.service: -------------------------------------------------------------------------------- 1 | [Unit] 2 | Description=Prometheus exporter for Pacemaker HA clusters metrics 3 
| After=network.target 4 | 5 | [Service] 6 | Type=simple 7 | Restart=always 8 | EnvironmentFile=-/etc/sysconfig/prometheus-ha_cluster_exporter 9 | ExecStart=/usr/bin/ha_cluster_exporter $ARGS 10 | ExecReload=/bin/kill -HUP $MAINPID 11 | 12 | [Install] 13 | WantedBy=multi-user.target 14 | -------------------------------------------------------------------------------- /ha_cluster_exporter.sysconfig: -------------------------------------------------------------------------------- 1 | ## Path: Network/Monitors/Prometheus/ha_cluster_exporter 2 | ## Description: Prometheus ha_cluster_exporter startup parameters 3 | ## Type: string 4 | ## Default: '' 5 | # 6 | # Additional arguments for the ha_cluster_exporter. 7 | # Please call: /usr/bin/ha_cluster_exporter --help 8 | # for a full list of possible options. 9 | # Note: Please keep the list on one line, if possible. 10 | # 11 | ARGS='' 12 | -------------------------------------------------------------------------------- /ha_cluster_exporter.yaml: -------------------------------------------------------------------------------- 1 | # sample config 2 | web: 3 | listen-address: "0.0.0.0:9664" 4 | telemetry-path: "/metrics" 5 | config: 6 | file: "/etc/ha_cluster_exporter.web.yaml" 7 | log: 8 | level: "info" 9 | format: "logfmt" 10 | crm-mon-path: "/usr/sbin/crm_mon" 11 | cibadmin-path: "/usr/sbin/cibadmin" 12 | corosync-cfgtoolpath-path: "/usr/sbin/corosync-cfgtool" 13 | corosync-quorumtool-path: "/usr/sbin/corosync-quorumtool" 14 | sbd-path: "/usr/sbin/sbd" 15 | sbd-config-path: "/etc/sysconfig/sbd" 16 | drbdsetup-path: "/sbin/drbdsetup" 17 | -------------------------------------------------------------------------------- /internal/assert/assertions.go: -------------------------------------------------------------------------------- 1 | package assert 2 | 3 | import ( 4 | "os" 5 | "path" 6 | "testing" 7 | 8 | "github.com/prometheus/client_golang/prometheus" 9 | "github.com/prometheus/client_golang/prometheus/testutil" 10 | ) 11 | 12 | // borrowed from haproxy_exporter 13 | // https://github.com/prometheus/haproxy_exporter/blob/0ddc4bc5cb4074ba95d57257f63ab82ab451a45b/haproxy_exporter_test.go 14 | func Metrics(t *testing.T, c prometheus.Collector, fixture string) { 15 | exp, err := os.Open(path.Join("../../test", fixture)) 16 | if err != nil { 17 | t.Fatalf("Error opening fixture file %q: %v", fixture, err) 18 | } 19 | if err := testutil.CollectAndCompare(c, exp); err != nil { 20 | t.Fatal("Unexpected metrics returned:", err) 21 | } 22 | } 23 | -------------------------------------------------------------------------------- /internal/clock/clock.go: -------------------------------------------------------------------------------- 1 | package clock 2 | 3 | import "time" 4 | 5 | type Clock interface { 6 | Now() time.Time 7 | Since(t time.Time) time.Duration 8 | } 9 | -------------------------------------------------------------------------------- /internal/clock/stop_clock.go: -------------------------------------------------------------------------------- 1 | package clock 2 | 3 | import "time" 4 | 5 | type StoppedClock struct{} 6 | 7 | const TEST_TIMESTAMP = 1234 8 | 9 | func (StoppedClock) Now() time.Time { 10 | ms := TEST_TIMESTAMP * time.Millisecond 11 | return time.Date(1970, 1, 1, 0, 0, 0, int(ms.Nanoseconds()), time.UTC) 12 | // 1234 milliseconds after Unix epoch (1970-01-01 00:00:01.234 +0000 UTC) 13 | // this will allow us to use a fixed timestamp when running assertions 14 | } 15 | 16 | func (StoppedClock) Since(t time.Time) time.Duration {
17 | return TEST_TIMESTAMP * time.Millisecond 18 | } 19 | -------------------------------------------------------------------------------- /internal/clock/system_clock.go: -------------------------------------------------------------------------------- 1 | package clock 2 | 3 | import "time" 4 | 5 | type SystemClock struct{} 6 | 7 | func (SystemClock) Now() time.Time { 8 | return time.Now() 9 | } 10 | 11 | func (SystemClock) Since(t time.Time) time.Duration { 12 | return time.Since(t) 13 | } 14 | -------------------------------------------------------------------------------- /main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "fmt" 5 | "net/http" 6 | "os" 7 | "strings" 8 | 9 | "github.com/go-kit/log" 10 | "github.com/go-kit/log/level" 11 | "github.com/prometheus/client_golang/prometheus" 12 | "github.com/prometheus/client_golang/prometheus/promhttp" 13 | "github.com/prometheus/common/promlog" 14 | 15 | // cannot use promlog/flag here, as the setConfigDefault function would not work with it; 16 | // the log.level and log.format flags are set in vars/init instead 17 | // "github.com/prometheus/common/promlog/flag" 18 | "github.com/prometheus/common/version" 19 | "github.com/prometheus/exporter-toolkit/web" 20 | 21 | "github.com/spf13/viper" 22 | // we could use this but want to define our own defaults 23 | // webflag "github.com/prometheus/exporter-toolkit/web/kingpinflag" 24 | "github.com/alecthomas/kingpin/v2" 25 | 26 | "github.com/ClusterLabs/ha_cluster_exporter/collector" 27 | "github.com/ClusterLabs/ha_cluster_exporter/collector/corosync" 28 | "github.com/ClusterLabs/ha_cluster_exporter/collector/drbd" 29 | "github.com/ClusterLabs/ha_cluster_exporter/collector/pacemaker" 30 | "github.com/ClusterLabs/ha_cluster_exporter/collector/sbd" 31 | ) 32 | 33 | const ( 34 | namespace = "ha_cluster_exporter" 35 | ) 36 | 37 | var ( 38 | config *viper.Viper 39 | 40 | // general flags 41 | webListenAddress *string 42 | webTelemetryPath *string 43 | webConfig *string 44 | logLevel *string 45 | logFormat *string 46 | 47 | // collector flags 48 | haClusterCrmMonPath *string 49 | haClusterCibadminPath *string 50 | haClusterCorosyncCfgtoolpathPath *string 51 | haClusterCorosyncQuorumtoolPath *string 52 | haClusterSbdPath *string 53 | haClusterSbdConfigPath *string 54 | haClusterDrbdsetupPath *string 55 | haClusterDrbdsplitbrainPath *string 56 | 57 | // deprecated flags 58 | enableTimestampsDeprecated *bool 59 | portDeprecated *int 60 | addressDeprecated *string 61 | logLevelDeprecated *string 62 | 63 | promlogConfig = &promlog.Config{ 64 | Level: &promlog.AllowedLevel{}, 65 | Format: &promlog.AllowedFormat{}, 66 | } 67 | ) 68 | 69 | func init() { 70 | config = viper.New() 71 | config.SetConfigName("ha_cluster_exporter") 72 | config.AddConfigPath("./") 73 | config.AddConfigPath("$HOME/.config/") 74 | config.AddConfigPath("/etc/") 75 | config.AddConfigPath("/usr/etc/") 76 | config.ReadInConfig() // the error is deliberately ignored here: main() re-reads the config and logs any failure 77 | 78 | // general flags 79 | webListenAddress = kingpin.Flag( 80 | "web.listen-address", 81 | "Address to listen on for web interface and telemetry.", 82 | ).PlaceHolder(":9664").Default(setConfigDefault("web.listen-address", ":9664")).String() 83 | webTelemetryPath = kingpin.Flag( 84 | "web.telemetry-path", 85 | "Path under which to expose metrics.", 86 | ).PlaceHolder("/metrics").Default(setConfigDefault("web.telemetry-path", "/metrics")).String() 87 | // we could use this but want to define our own defaults 88 | // webConfig = webflag.AddFlags(kingpin.CommandLine) 89 |
webConfig = kingpin.Flag( 90 | "web.config.file", 91 | "[EXPERIMENTAL] Path to configuration file that can enable TLS or authentication.", 92 | ).PlaceHolder("/etc/" + namespace + ".web.yaml").Default(setConfigDefault("web.config.file", "/etc/"+namespace+".web.yaml")).String() 93 | 94 | // collector flags 95 | haClusterCrmMonPath = kingpin.Flag( 96 | "crm-mon-path", 97 | "path to crm_mon executable", 98 | ).PlaceHolder("/usr/sbin/crm_mon").Default(setConfigDefault("crm-mon-path", "/usr/sbin/crm_mon")).String() 99 | haClusterCibadminPath = kingpin.Flag( 100 | "cibadmin-path", 101 | "path to cibadmin executable", 102 | ).PlaceHolder("/usr/sbin/cibadmin").Default(setConfigDefault("cibadmin-path", "/usr/sbin/cibadmin")).String() 103 | haClusterCorosyncCfgtoolpathPath = kingpin.Flag( 104 | "corosync-cfgtoolpath-path", 105 | "path to corosync-cfgtool executable", 106 | ).PlaceHolder("/usr/sbin/corosync-cfgtool").Default(setConfigDefault("corosync-cfgtoolpath-path", "/usr/sbin/corosync-cfgtool")).String() 107 | haClusterCorosyncQuorumtoolPath = kingpin.Flag( 108 | "corosync-quorumtool-path", 109 | "path to corosync-quorumtool executable", 110 | ).PlaceHolder("/usr/sbin/corosync-quorumtool").Default(setConfigDefault("corosync-quorumtool-path", "/usr/sbin/corosync-quorumtool")).String() 111 | haClusterSbdPath = kingpin.Flag( 112 | "sbd-path", 113 | "path to sbd executable", 114 | ).PlaceHolder("/usr/sbin/sbd").Default(setConfigDefault("sbd-path", "/usr/sbin/sbd")).String() 115 | haClusterSbdConfigPath = kingpin.Flag( 116 | "sbd-config-path", 117 | "path to sbd configuration", 118 | ).PlaceHolder("/etc/sysconfig/sbd").Default(setConfigDefault("sbd-config-path", "/etc/sysconfig/sbd")).String() 119 | haClusterDrbdsetupPath = kingpin.Flag( 120 | "drbdsetup-path", 121 | "path to drbdsetup executable", 122 | ).PlaceHolder("/sbin/drbdsetup").Default(setConfigDefault("drbdsetup-path", "/sbin/drbdsetup")).String() 123 | haClusterDrbdsplitbrainPath = kingpin.Flag( 124 | "drbdsplitbrain-path", 125 | "path to drbd splitbrain hooks temporary files", 126 | ).PlaceHolder("/var/run/drbd/splitbrain").Default(setConfigDefault("drbdsplitbrain-path", "/var/run/drbd/splitbrain")).String() 127 | enableTimestampsDeprecated = kingpin.Flag( 128 | "enable-timestamps", 129 | "[DEPRECATED] server-side metric timestamping is discouraged by Prometheus best-practices and should be avoided", 130 | ).PlaceHolder("false").Default(setConfigDefault("enable-timestamps", "false")).Bool() 131 | addressDeprecated = kingpin.Flag( 132 | "address", 133 | "[DEPRECATED] please use --web.listen-address or --web.config.file to use Prometheus Exporter Toolkit", 134 | ).PlaceHolder("0.0.0.0").Default(setConfigDefault("address", "0.0.0.0")).String() 135 | portDeprecated = kingpin.Flag( 136 | "port", 137 | "[DEPRECATED] please use --web.listen-address or --web.config.file to use Prometheus Exporter Toolkit", 138 | ).PlaceHolder("9664").Default(setConfigDefault("port", "9664")).Int() 139 | logLevelDeprecated = kingpin.Flag( 140 | "log-level", 141 | "[DEPRECATED] please use log.level", 142 | ).PlaceHolder("info").Default(setConfigDefault("log-level", "info")).String() 143 | 144 | // cannot use promlog/flag here, as the setConfigDefault function would not work with it; 145 | // the log.level and log.format flags are set in vars/init instead 146 | // flag.AddFlags(kingpin.CommandLine, promlogConfig) 147 | logLevel = kingpin.Flag( 148 | "log.level", 149 | "Only log messages with the given severity or above.
One of: [debug, info, warn, error]", 150 | ).PlaceHolder("info").Default(setConfigDefault("log.level", "info")).String() 151 | logFormat = kingpin.Flag( 152 | "log.format", 153 | "Output format of log messages. One of: [logfmt, json]", 154 | ).PlaceHolder("logfmt").Default(setConfigDefault("log.format", "logfmt")).String() 155 | 156 | // detect unit testing and skip kingpin.Parse() in init. 157 | // see: https://github.com/alecthomas/kingpin/issues/187 158 | testing := (strings.HasSuffix(os.Args[0], ".test") || 159 | strings.HasSuffix(os.Args[0], "__debug_bin")) 160 | if testing { 161 | return 162 | } 163 | 164 | kingpin.Version(version.Print(namespace)) 165 | kingpin.HelpFlag.Short('h') 166 | 167 | var err error 168 | 169 | kingpin.Parse() 170 | 171 | // use deprecated log-level parameter if set 172 | if *logLevelDeprecated != "info" { 173 | *logLevel = *logLevelDeprecated 174 | } 175 | 176 | err = promlogConfig.Level.Set(*logLevel) 177 | if err != nil { 178 | fmt.Printf("%s: error: %s, try --help\n", namespace, err) 179 | os.Exit(1) 180 | } 181 | err = promlogConfig.Format.Set(*logFormat) 182 | if err != nil { 183 | fmt.Printf("%s: error: %s, try --help\n", namespace, err) 184 | os.Exit(1) 185 | } 186 | } 187 | 188 | // looks up whether a configName is defined in the viper config; 189 | // if it is not defined in the viper config, fall back to the passed configDefault 190 | func setConfigDefault(configName string, configDefault string) string { 191 | var result string 192 | if config.IsSet(configName) { 193 | result = config.GetString(configName) 194 | } else { 195 | result = configDefault 196 | } 197 | return result 198 | } 199 | 200 | func registerCollectors(logger log.Logger) (collectors []prometheus.Collector, errors []error) { 201 | pacemakerCollector, err := pacemaker.NewCollector( 202 | *haClusterCrmMonPath, 203 | *haClusterCibadminPath, 204 | *enableTimestampsDeprecated, 205 | logger, 206 | ) 207 | if err != nil { 208 | errors = append(errors, err) 209 | } else { 210 | collectors = append(collectors, pacemakerCollector) 211 | } 212 | 213 | corosyncCollector, err := corosync.NewCollector( 214 | *haClusterCorosyncCfgtoolpathPath, 215 | *haClusterCorosyncQuorumtoolPath, 216 | *enableTimestampsDeprecated, 217 | logger, 218 | ) 219 | if err != nil { 220 | errors = append(errors, err) 221 | } else { 222 | collectors = append(collectors, corosyncCollector) 223 | } 224 | 225 | sbdCollector, err := sbd.NewCollector( 226 | *haClusterSbdPath, 227 | *haClusterSbdConfigPath, 228 | *enableTimestampsDeprecated, 229 | logger, 230 | ) 231 | if err != nil { 232 | errors = append(errors, err) 233 | } else { 234 | collectors = append(collectors, sbdCollector) 235 | } 236 | 237 | drbdCollector, err := drbd.NewCollector( 238 | *haClusterDrbdsetupPath, 239 | *haClusterDrbdsplitbrainPath, 240 | *enableTimestampsDeprecated, 241 | logger, 242 | ) 243 | if err != nil { 244 | errors = append(errors, err) 245 | } else { 246 | collectors = append(collectors, drbdCollector) 247 | } 248 | 249 | for i, c := range collectors { 250 | if c, ok := c.(collector.InstrumentableCollector); ok { 251 | collectors[i] = collector.NewInstrumentedCollector(c, logger) 252 | } 253 | } 254 | 255 | prometheus.MustRegister(collectors...)
256 | 257 | return collectors, errors 258 | } 259 | 260 | func main() { 261 | var err error 262 | 263 | logger := promlog.New(promlogConfig) 264 | 265 | level.Info(logger).Log("msg", fmt.Sprintf("Starting %s %s", namespace, version.Info())) 266 | level.Info(logger).Log("msg", fmt.Sprintf("Build context %s", version.BuildContext())) 267 | 268 | // re-read only to display Info/Warn 269 | err = config.ReadInConfig() 270 | if err != nil { 271 | level.Warn(logger).Log("msg", "Reading config file failed", "err", err) 272 | level.Info(logger).Log("msg", "Default config values will be used") 273 | } else { 274 | level.Info(logger).Log("msg", "Using config file: "+config.ConfigFileUsed()) 275 | } 276 | 277 | // register collectors 278 | collectors, errors := registerCollectors(logger) 279 | for _, err = range errors { 280 | level.Warn(logger).Log("msg", "Registration failure", "err", err) 281 | } 282 | if len(collectors) == 0 { 283 | level.Error(logger).Log("msg", "No collector could be registered.") 284 | os.Exit(1) 285 | } 286 | for _, c := range collectors { 287 | if c, ok := c.(collector.SubsystemCollector); ok { 288 | level.Info(logger).Log("msg", c.GetSubsystem()+" collector registered.") 289 | } 290 | } 291 | 292 | // if we're not in debug log level, we unregister the Go runtime metrics collector that gets registered by default 293 | if *logLevel != "debug" { 294 | prometheus.Unregister(prometheus.NewGoCollector()) 295 | } 296 | 297 | var fullListenAddress string 298 | // use deprecated parameters 299 | if *addressDeprecated != "0.0.0.0" || *portDeprecated != 9664 { 300 | fullListenAddress = fmt.Sprintf("%s:%d", *addressDeprecated, *portDeprecated) 301 | // use new parameters 302 | } else { 303 | fullListenAddress = *webListenAddress 304 | } 305 | serveAddress := &http.Server{Addr: fullListenAddress} 306 | servePath := *webTelemetryPath 307 | 308 | landingPage := []byte(` 309 | 310 | ClusterLabs Linux HA Cluster Exporter 311 | 312 | 313 |
ClusterLabs Linux HA Cluster Exporter 314 | Prometheus exporter for Pacemaker based Linux HA clusters
315 | 319 | 320 | 321 | `) 322 | 323 | http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) { 324 | w.Write(landingPage) 325 | }) 326 | http.Handle(servePath, promhttp.Handler()) 327 | 328 | level.Info(logger).Log("msg", "Serving metrics on "+fullListenAddress+servePath) 329 | 330 | toolkitFlags := &web.FlagConfig{ 331 | WebListenAddresses: func() *[]string { 332 | r := []string{*webListenAddress} 333 | return &r 334 | }(), 335 | WebSystemdSocket: func() *bool { 336 | r := false 337 | return &r 338 | }(), 339 | WebConfigFile: func() *string { 340 | r := "" 341 | return &r 342 | }(), 343 | } 344 | 345 | var listen error 346 | _, err = os.Stat(*webConfig) 347 | 348 | if err != nil { 349 | level.Warn(logger).Log("msg", "Reading web config file failed", "err", err) 350 | level.Info(logger).Log("msg", "Default web config or commandline values will be used") 351 | listen = web.ListenAndServe(serveAddress, toolkitFlags, logger) 352 | } else { 353 | level.Info(logger).Log("msg", "Using web config file: "+*webConfig) 354 | toolkitFlags.WebConfigFile = webConfig 355 | listen = web.ListenAndServe(serveAddress, toolkitFlags, logger) 356 | } 357 | 358 | if err := listen; err != nil { 359 | level.Error(logger).Log("msg", "Error starting HTTP server", "err", err) 360 | os.Exit(1) 361 | } 362 | } 363 | -------------------------------------------------------------------------------- /main_test.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "context" 5 | "fmt" 6 | "io" 7 | "net" 8 | "net/http" 9 | "net/url" 10 | "os" 11 | "os/exec" 12 | "reflect" 13 | "runtime" 14 | "strings" 15 | "syscall" 16 | "testing" 17 | "time" 18 | 19 | "github.com/go-kit/log" 20 | "github.com/prometheus/client_golang/prometheus" 21 | "github.com/stretchr/testify/assert" 22 | ) 23 | 24 | func TestRegisterCollectors(t *testing.T) { 25 | *haClusterCrmMonPath = "test/fake_crm_mon.sh" 26 | *haClusterCibadminPath = "test/fake_cibadmin.sh" 27 | *haClusterCorosyncCfgtoolpathPath = "test/fake_corosync-cfgtool.sh" 28 | *haClusterCorosyncQuorumtoolPath = "test/fake_corosync-quorumtool.sh" 29 | *haClusterSbdPath = "test/fake_sbd.sh" 30 | *haClusterSbdConfigPath = "test/fake_sbdconfig" 31 | *haClusterDrbdsetupPath = "test/fake_drbdsetup.sh" 32 | *haClusterDrbdsplitbrainPath = "test/fake_drbdsplitbrain" 33 | 34 | t.Run("success", func(t *testing.T) { 35 | wantCollectors := 4 36 | wantErrors := 0 37 | prometheus.DefaultRegisterer = prometheus.NewRegistry() 38 | prometheus.DefaultGatherer = prometheus.NewRegistry() 39 | collectors, errors := registerCollectors(log.NewNopLogger()) 40 | assert.Len(t, collectors, wantCollectors) 41 | assert.Len(t, errors, wantErrors) 42 | }) 43 | 44 | *haClusterCrmMonPath = "does_not_exist" 45 | t.Run("1 failure", func(t *testing.T) { 46 | wantCollectors := 3 47 | wantErrors := 1 48 | prometheus.DefaultRegisterer = prometheus.NewRegistry() 49 | prometheus.DefaultGatherer = prometheus.NewRegistry() 50 | collectors, errors := registerCollectors(log.NewNopLogger()) 51 | assert.Len(t, collectors, wantCollectors) 52 | assert.Len(t, errors, wantErrors) 53 | }) 54 | 55 | *haClusterCorosyncCfgtoolpathPath = "does_not_exist" 56 | t.Run("2 failures", func(t *testing.T) { 57 | wantCollectors := 2 58 | wantErrors := 2 59 | prometheus.DefaultRegisterer = prometheus.NewRegistry() 60 | prometheus.DefaultGatherer = prometheus.NewRegistry() 61 | collectors, errors := registerCollectors(log.NewNopLogger()) 62 | assert.Len(t, collectors, 
wantCollectors) 63 | assert.Len(t, errors, wantErrors) 64 | }) 65 | 66 | *haClusterSbdPath = "does_not_exist" 67 | t.Run("3 failures", func(t *testing.T) { 68 | wantCollectors := 1 69 | wantErrors := 3 70 | prometheus.DefaultRegisterer = prometheus.NewRegistry() 71 | prometheus.DefaultGatherer = prometheus.NewRegistry() 72 | collectors, errors := registerCollectors(log.NewNopLogger()) 73 | assert.Len(t, collectors, wantCollectors) 74 | assert.Len(t, errors, wantErrors) 75 | }) 76 | 77 | *haClusterDrbdsetupPath = "does_not_exist" 78 | t.Run("4 failures", func(t *testing.T) { 79 | wantCollectors := 0 80 | wantErrors := 4 81 | prometheus.DefaultRegisterer = prometheus.NewRegistry() 82 | prometheus.DefaultGatherer = prometheus.NewRegistry() 83 | collectors, errors := registerCollectors(log.NewNopLogger()) 84 | assert.Len(t, collectors, wantCollectors) 85 | assert.Len(t, errors, wantErrors) 86 | }) 87 | } 88 | 89 | //// Kudos for the build/run tests to https://github.com/prometheus/mysqld_exporter 90 | // TestBin builds, runs and tests binary. 91 | 92 | // bin stores information about path of executable and attached port 93 | type bin struct { 94 | path string 95 | port int 96 | } 97 | 98 | func TestBin(t *testing.T) { 99 | var err error 100 | binName := "ha" 101 | 102 | binDir, err := os.MkdirTemp("/tmp", binName+"-test-bindir-") 103 | if err != nil { 104 | t.Fatal(err) 105 | } 106 | defer func() { 107 | err := os.RemoveAll(binDir) 108 | if err != nil { 109 | t.Fatal(err) 110 | } 111 | }() 112 | 113 | importpath := "github.com/prometheus/ha_cluster_exporter/vendor/github.com/prometheus/common" 114 | path := binDir + "/" + binName 115 | xVariables := map[string]string{ 116 | importpath + "/version.Version": "gotest-version", 117 | importpath + "/version.Branch": "gotest-branch", 118 | importpath + "/version.Revision": "gotest-revision", 119 | } 120 | var ldflags []string 121 | for x, value := range xVariables { 122 | ldflags = append(ldflags, fmt.Sprintf("-X %s=%s", x, value)) 123 | } 124 | cmd := exec.Command( 125 | "go", 126 | "build", 127 | "-o", 128 | path, 129 | "-ldflags", 130 | strings.Join(ldflags, " "), 131 | ) 132 | cmd.Stdout = os.Stdout 133 | cmd.Stderr = os.Stderr 134 | err = cmd.Run() 135 | if err != nil { 136 | t.Fatalf("Failed to build: %s", err) 137 | } 138 | 139 | tests := []func(*testing.T, bin){ 140 | testLandingPage, 141 | } 142 | 143 | portStart := 56000 144 | t.Run(binName, func(t *testing.T) { 145 | for _, f := range tests { 146 | f := f // capture range variable 147 | fName := runtime.FuncForPC(reflect.ValueOf(f).Pointer()).Name() 148 | portStart++ 149 | data := bin{ 150 | path: path, 151 | port: portStart, 152 | } 153 | t.Run(fName, func(t *testing.T) { 154 | t.Parallel() 155 | f(t, data) 156 | }) 157 | } 158 | }) 159 | } 160 | 161 | func testLandingPage(t *testing.T, data bin) { 162 | ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) 163 | defer cancel() 164 | 165 | // Run exporter. 166 | servePath := "/metrics" 167 | cmd := exec.CommandContext( 168 | ctx, 169 | data.path, 170 | "--web.listen-address", fmt.Sprintf(":%d", data.port), 171 | "--web.telemetry-path", fmt.Sprintf("%s", servePath), 172 | "--crm-mon-path=test/fake_crm_mon.sh", // needed to register at least one collector 173 | "--cibadmin-path=test/fake_cibadmin.sh", 174 | ) 175 | if err := cmd.Start(); err != nil { 176 | t.Fatal(err) 177 | } 178 | defer cmd.Wait() 179 | defer cmd.Process.Kill() 180 | 181 | // Get the main page. 
182 | 	urlToGet := fmt.Sprintf("http://127.0.0.1:%d", data.port)
183 | 	body, err := waitForBody(urlToGet)
184 | 	if err != nil {
185 | 		t.Fatal(err)
186 | 	}
187 | 	got := string(body)
188 | 	expected := `<html>
189 | <head>
190 | 	<title>ClusterLabs Linux HA Cluster Exporter</title>
191 | </head>
192 | <body>
193 | 	<h1>ClusterLabs Linux HA Cluster Exporter</h1>
194 | 	<h2>Prometheus exporter for Pacemaker based Linux HA clusters</h2>
195 | 	<ul>
196 | 		<li><a href="/metrics">Metrics</a></li>
197 | 	</ul>
198 | </body>
199 | </html>
200 | 
201 | `
202 | 
203 | 	if got != expected {
204 | 		t.Fatalf("got '%s' but expected '%s'", got, expected)
205 | 	}
206 | }
207 | 
208 | // waitForBody is a helper function which makes HTTP calls until the HTTP server is up,
209 | // then returns the body of the successful call.
210 | func waitForBody(urlToGet string) (body []byte, err error) {
211 | 	tries := 60
212 | 
213 | 	// Get data, but we need to wait a bit for the HTTP server to come up.
214 | 	for i := 0; i < tries; i++ {
215 | 		// Try to get the web page.
216 | 		body, err = getBody(urlToGet)
217 | 		if err == nil {
218 | 			return body, nil
219 | 		}
220 | 
221 | 		// If there is a syscall.ECONNREFUSED error (web server not available) then retry.
222 | 		if urlError, ok := err.(*url.Error); ok {
223 | 			if opError, ok := urlError.Err.(*net.OpError); ok {
224 | 				if osSyscallError, ok := opError.Err.(*os.SyscallError); ok {
225 | 					if osSyscallError.Err == syscall.ECONNREFUSED {
226 | 						time.Sleep(1 * time.Second)
227 | 						continue
228 | 					}
229 | 				}
230 | 			}
231 | 		}
232 | 
233 | 		// There was an error, and it wasn't syscall.ECONNREFUSED.
234 | 		return nil, err
235 | 	}
236 | 
237 | 	return nil, fmt.Errorf("failed to GET %s after %d tries: %s", urlToGet, tries, err)
238 | }
239 | 
240 | // getBody is a helper function which retrieves the HTTP response body from the given address.
241 | func getBody(urlToGet string) ([]byte, error) {
242 | 	resp, err := http.Get(urlToGet)
243 | 	if err != nil {
244 | 		return nil, err
245 | 	}
246 | 	defer resp.Body.Close()
247 | 
248 | 	body, err := io.ReadAll(resp.Body)
249 | 	if err != nil {
250 | 		return nil, err
251 | 	}
252 | 
253 | 	return body, nil
254 | }
255 | 
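Note: the nested type assertions in waitForBody predate Go 1.13 error wrapping. A minimal sketch of the same retry check using errors.Is is shown below; the helper name waitForBodyIs is hypothetical, the snippet reuses the file's getBody, assumes an extra "errors" import, and is illustrative only, not part of the repository:

// waitForBodyIs polls urlToGet until the server answers, retrying only on
// "connection refused"; errors.Is unwraps url.Error -> net.OpError ->
// os.SyscallError down to the underlying syscall.Errno.
func waitForBodyIs(urlToGet string) ([]byte, error) {
	var err error
	tries := 60
	for i := 0; i < tries; i++ {
		var body []byte
		body, err = getBody(urlToGet)
		if err == nil {
			return body, nil
		}
		if errors.Is(err, syscall.ECONNREFUSED) {
			time.Sleep(1 * time.Second)
			continue
		}
		return nil, err
	}
	return nil, fmt.Errorf("failed to GET %s after %d tries: %w", urlToGet, tries, err)
}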
--------------------------------------------------------------------------------
/packaging/obs/grafana-ha-cluster-dashboards/_service:
--------------------------------------------------------------------------------
1 | <services>
2 |   <service name="tar_scm" mode="disabled">
3 |     <param name="url">https://github.com/%%REPOSITORY%%.git</param>
4 |     <param name="scm">git</param>
5 |     <param name="revision">%%REVISION%%</param>
6 |     <param name="extract">dashboards</param>
7 |     <param name="extract">LICENSE</param>
8 |     <param name="versionformat">1.1.0+git.%ct.%h</param>
9 |     <param name="filename">grafana-ha-cluster-dashboards</param>
10 |   </service>
11 |   <service name="set_version" mode="disabled">
12 |     <param name="file">grafana-ha-cluster-dashboards.spec</param>
13 |   </service>
14 |   <service name="recompress" mode="disabled">
15 |     <param name="file">*.tar</param>
16 |     <param name="compression">gz</param>
17 |   </service>
18 | </services>
19 | 
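Note: mode="disabled" in the service definitions above means OBS runs them only on explicit request. Assuming the %%REPOSITORY%% and %%REVISION%% placeholders have been substituted and a reasonably recent osc client is installed, the tarball can be produced locally with something like:

osc service runall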
--------------------------------------------------------------------------------
/packaging/obs/grafana-ha-cluster-dashboards/grafana-ha-cluster-dashboards.changes:
--------------------------------------------------------------------------------
1 | -------------------------------------------------------------------
2 | Fri Jun 4 10:58:51 UTC 2021 - Stefano Torresi
3 | 
4 | - Rename grafana-sleha-cluster-provider subpackage to grafana-sleha-provider
5 | 
6 | -------------------------------------------------------------------
7 | Mon Nov 9 16:41:51 UTC 2020 - Witek Bedyk
8 | 
9 | - Release 1.1.0
10 |   * Split provider file to own package
11 | 
12 | -------------------------------------------------------------------
13 | Wed Sep 16 13:24:29 UTC 2020 - Dario Maiocchi
14 | 
15 | - Release 1.0.3 (jsc#SLE-10545)
16 |   * don't use require grafana, use recommends
17 |   * fix permissions accordingly
18 |   * fix minor typo on dashboard spec file
19 | 
20 | -------------------------------------------------------------------
21 | Wed Aug 5 11:33:51 UTC 2020 - Stefano Torresi
22 | 
23 | - Release 1.0.2
24 |   * update title and description
25 | 
26 | -------------------------------------------------------------------
27 | Tue Jul 14 13:47:20 UTC 2020 - Stefano Torresi
28 | 
29 | - Release 1.0.1
30 |   * fixed datasource variable initialization
31 |   * minor Grafana 7 compatibility fixes
32 | 
33 | 
34 | -------------------------------------------------------------------
35 | Fri Jun 12 10:39:33 UTC 2020 - Stefano Torresi
36 | 
37 | - First release
38 | 
--------------------------------------------------------------------------------
/packaging/obs/grafana-ha-cluster-dashboards/grafana-ha-cluster-dashboards.spec:
--------------------------------------------------------------------------------
1 | #
2 | # Copyright 2019-2020 SUSE LLC
3 | #
4 | # Licensed under the Apache License, Version 2.0 (the "License");
5 | # you may not use this file except in compliance with the License.
6 | # You may obtain a copy of the License at
7 | #
8 | #     https://www.apache.org/licenses/LICENSE-2.0
9 | #
10 | # Unless required by applicable law or agreed to in writing, software
11 | # distributed under the License is distributed on an "AS IS" BASIS,
12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 | # See the License for the specific language governing permissions and
14 | # limitations under the License.
15 | #
16 | Name: grafana-ha-cluster-dashboards
17 | # Version will be processed via set_version source service
18 | Version: 0
19 | Release: 0
20 | License: Apache-2.0
21 | Summary: Grafana Dashboards for Pacemaker/Corosync HA Clusters
22 | Group: System/Monitoring
23 | Url: https://github.com/ClusterLabs/ha_cluster_exporter
24 | Source: %{name}-%{version}.tar.gz
25 | BuildArch: noarch
26 | Requires(pre): shadow
27 | Requires: grafana-sleha-provider
28 | Recommends: grafana
29 | 
30 | # TECHNICAL NOTE:
31 | # Originally we used to require grafana but, for product management reasons, we use recommends now.
32 | # This impacts how we do packaging here: requiring shadow, creating the grafana user/group,
33 | # and modifying file attributes (this was done automagically when requiring grafana).
34 | 
35 | %description
36 | Grafana Dashboards displaying metrics about Pacemaker/Corosync High Availability Clusters.
37 | 
38 | %package -n grafana-sleha-provider
39 | Summary: Grafana configuration providers for the SLES HA Extension
40 | Group: System/Monitoring
41 | Recommends: grafana
42 | BuildArch: noarch
43 | Provides: grafana-sleha-cluster-provider = %version-%release
44 | Obsoletes: grafana-sleha-cluster-provider < %version-%release
45 | 
46 | %description -n grafana-sleha-provider
47 | Automated configuration provisioners leveraged by other packages to enable a zero-config installation of Grafana dashboards.
48 | 
49 | %prep
50 | %setup -q
51 | 
52 | %pre
53 | echo "Creating grafana user and group if not present"
54 | getent group grafana > /dev/null || groupadd -r grafana
55 | getent passwd grafana > /dev/null || useradd -r -g grafana -d %{_datadir}/grafana -s /sbin/nologin grafana
56 | 
57 | %build
58 | 
59 | %install
60 | install -d -m0755 %{buildroot}%{_localstatedir}/lib/grafana/dashboards/sleha
61 | install -m644 dashboards/*.json %{buildroot}%{_localstatedir}/lib/grafana/dashboards/sleha
62 | install -Dm644 dashboards/provider-sleha.yaml %{buildroot}%{_sysconfdir}/grafana/provisioning/dashboards/provider-sleha.yaml
63 | 
64 | %files
65 | %defattr(-,root,root)
66 | %doc dashboards/README.md
67 | %license LICENSE
68 | %attr(0644,grafana,grafana) %config %{_localstatedir}/lib/grafana/dashboards/sleha/*
69 | %attr(0755,grafana,grafana) %dir %{_localstatedir}/lib/grafana
70 | %attr(0755,grafana,grafana) %dir %{_localstatedir}/lib/grafana/dashboards
71 | %attr(0755,grafana,grafana) %dir %{_localstatedir}/lib/grafana/dashboards/sleha
72 | 
73 | %files -n grafana-sleha-provider
74 | %attr(0755,root,root) %dir %{_sysconfdir}/grafana
75 | %attr(0755,root,root) %dir %{_sysconfdir}/grafana/provisioning
76 | %attr(0755,root,root) %dir %{_sysconfdir}/grafana/provisioning/dashboards
77 | %attr(0644,root,root) %config %{_sysconfdir}/grafana/provisioning/dashboards/provider-sleha.yaml
78 | 
79 | %changelog
80 | 
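For context, the provider-sleha.yaml installed above is a Grafana file-based dashboard provider. Its actual contents are not reproduced in this dump; a hypothetical minimal sketch of such a provider, pointing at the directory the spec installs the dashboards into (%{_localstatedir}/lib/grafana/dashboards/sleha, i.e. /var/lib/grafana/dashboards/sleha), could look like:

apiVersion: 1
providers:
  - name: sleha
    orgId: 1
    type: file
    options:
      path: /var/lib/grafana/dashboards/sleha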
--------------------------------------------------------------------------------
/packaging/obs/prometheus-ha_cluster_exporter/_service:
--------------------------------------------------------------------------------
1 | <services>
2 |   <service name="tar_scm" mode="disabled">
3 |     <param name="url">https://github.com/%%REPOSITORY%%.git</param>
4 |     <param name="scm">git</param>
5 |     <param name="revision">%%REVISION%%</param>
6 |     <param name="exclude">.git</param>
7 |     <param name="exclude">.github</param>
8 |     <param name="exclude">dashboards</param>
9 |     <param name="exclude">packaging/obs/grafana-ha-cluster-dashboards</param>
10 |     <param name="versionformat">%%VERSION%%</param>
11 |     <param name="filename">prometheus-ha_cluster_exporter</param>
12 |   </service>
13 |   <service name="set_version" mode="disabled">
14 |     <param name="file">prometheus-ha_cluster_exporter.spec</param>
15 |   </service>
16 |   <service name="recompress" mode="disabled">
17 |     <param name="file">*.tar</param>
18 |     <param name="compression">gz</param>
19 |   </service>
20 | </services>
21 | 
--------------------------------------------------------------------------------
/packaging/obs/prometheus-ha_cluster_exporter/prometheus-ha_cluster_exporter.spec:
--------------------------------------------------------------------------------
1 | #
2 | # Copyright 2019-2024 SUSE LLC
3 | #
4 | # Licensed under the Apache License, Version 2.0 (the "License");
5 | # you may not use this file except in compliance with the License.
6 | # You may obtain a copy of the License at
7 | #
8 | #     https://www.apache.org/licenses/LICENSE-2.0
9 | #
10 | # Unless required by applicable law or agreed to in writing, software
11 | # distributed under the License is distributed on an "AS IS" BASIS,
12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 | # See the License for the specific language governing permissions and
14 | # limitations under the License.
15 | # 16 | 17 | 18 | Name: prometheus-ha_cluster_exporter 19 | # Version will be processed via set_version source service 20 | Version: 0 21 | Release: 0 22 | Summary: Prometheus exporter for Pacemaker HA clusters metrics 23 | License: Apache-2.0 24 | Group: System/Monitoring 25 | URL: https://github.com/ClusterLabs/ha_cluster_exporter 26 | Source: %{name}-%{version}.tar.gz 27 | Source1: vendor.tar.gz 28 | BuildRequires: golang(API) >= 1.23 29 | Requires(post): %fillup_prereq 30 | Provides: ha_cluster_exporter = %{version}-%{release} 31 | Provides: prometheus(ha_cluster_exporter) = %{version}-%{release} 32 | ExclusiveArch: aarch64 x86_64 ppc64le s390x 33 | 34 | #Compat macro for new _fillupdir macro introduced in Nov 2017 35 | %if ! %{defined _fillupdir} 36 | %define _fillupdir /var/adm/fillup-templates 37 | %endif 38 | 39 | %description 40 | Prometheus exporter for Pacemaker HA clusters metrics 41 | 42 | %prep 43 | %setup -q # unpack project sources 44 | %setup -q -T -D -a 1 # unpack go dependencies in vendor.tar.gz, which was prepared by the source services 45 | 46 | %define shortname ha_cluster_exporter 47 | 48 | %build 49 | %ifarch s390x 50 | export CGO_ENABLED=1 51 | %else 52 | export CGO_ENABLED=0 53 | %endif 54 | go build -mod=vendor \ 55 | -buildmode=pie \ 56 | -ldflags="-s -w -X github.com/prometheus/common/version.Version=%{version}" \ 57 | -o %{shortname} 58 | 59 | %install 60 | 61 | # Install the binary. 62 | install -D -m 0755 %{shortname} "%{buildroot}%{_bindir}/%{shortname}" 63 | 64 | # Install the systemd unit 65 | install -D -m 0644 %{shortname}.service %{buildroot}%{_unitdir}/%{name}.service 66 | 67 | # Install the environment file 68 | install -D -m 0644 %{shortname}.sysconfig %{buildroot}%{_fillupdir}/sysconfig.%{name} 69 | 70 | # Install compat wrapper for legacy init systems 71 | install -Dd -m 0755 %{buildroot}%{_sbindir} 72 | ln -s /usr/sbin/service %{buildroot}%{_sbindir}/rc%{name} 73 | 74 | # Install supportconfig plugin 75 | install -D -m 755 supportconfig-ha_cluster_exporter %{buildroot}%{_prefix}/lib/supportconfig/plugins/%{shortname} 76 | 77 | %pre 78 | %service_add_pre %{name}.service 79 | 80 | %post 81 | %service_add_post %{name}.service 82 | %fillup_only -n %{name} 83 | 84 | %preun 85 | %service_del_preun %{name}.service 86 | 87 | %postun 88 | %service_del_postun %{name}.service 89 | 90 | %files 91 | %doc *.md 92 | %doc doc/* 93 | %if 0%{?suse_version} >= 1500 94 | %license LICENSE 95 | %else 96 | %doc LICENSE 97 | %endif 98 | %{_bindir}/%{shortname} 99 | %{_unitdir}/%{name}.service 100 | %{_fillupdir}/sysconfig.%{name} 101 | %{_sbindir}/rc%{name} 102 | %dir %{_prefix}/lib/supportconfig 103 | %dir %{_prefix}/lib/supportconfig/plugins 104 | %{_prefix}/lib/supportconfig/plugins/%{shortname} 105 | 106 | %changelog 107 | -------------------------------------------------------------------------------- /supportconfig-ha_cluster_exporter: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | set -u 3 | 4 | # supportconfig plugin for ha_cluster_exporter 5 | # 6 | # v1.0 7 | # 8 | # February 2024 v1.0 first release 9 | 10 | SVER='1.0.0' 11 | TITLE="SUSE supportconfig plugin for ha_cluster_exporter" 12 | 13 | function display_package_info() { 14 | echo -e "\n#==[ Command ]======================================#" 15 | echo -e "# rpm -qi ${1}" 16 | rpm -qi "${1}" 17 | 18 | echo -e "\n#==[ Command ]======================================#" 19 | echo -e "# rpm -V ${1}" 20 | rpm -V "${1}" 21 | } 22 | 23 | function 
display_file_stat() {
24 | 	echo -e "\n#==[ File ]===========================#"
25 | 	echo -e "# ls -ld ${1} ; stat ${1} \n"
26 | 
27 | 	if [ -e "${1}" ] ; then
28 | 		ls -ld "${1}"
29 | 		echo
30 | 		stat "${1}"
31 | 	else
32 | 		echo "${1} does not exist!"
33 | 	fi
34 | }
35 | 
36 | function display_file() {
37 | 	echo -e "\n#==[ File Content ]===========================#"
38 | 	echo -e "# cat ${1}"
39 | 
40 | 	if [ -e "${1}" ] ; then
41 | 		cat "${1}"
42 | 	else
43 | 		echo "${1} does not exist!"
44 | 	fi
45 | }
46 | 
47 | function display_systemd_status() {
48 | 	echo -e "\n#==[ Command ]======================================#"
49 | 	echo -e "# systemctl status ${1}"
50 | 
51 | 	systemctl status "${1}" 2>&1
52 | }
53 | 
54 | function display_cmd() {
55 | 	ORG_CMDLINE="${@}"
56 | 	CMDBIN=${ORG_CMDLINE%% *}
57 | 	FULLCMD=$(\which $CMDBIN 2>/dev/null | awk '{print $1}')
58 | 	echo -e "\n#==[ Command ]======================================#"
59 | 	if [ -x "$FULLCMD" ]; then
60 | 		CMDLINE=$(echo $ORG_CMDLINE | sed -e "s!${CMDBIN}!${FULLCMD}!")
61 | 		echo -e "# $CMDLINE"
62 | 		echo "$CMDLINE" | bash
63 | 	else
64 | 		echo -e "# $ORG_CMDLINE"
65 | 		echo "Command not found or not executable"
66 | 	fi
67 | }
68 | 
69 | # ---- Main ----
70 | echo -e "Supportconfig Plugin for $TITLE, v${SVER}"
71 | 
72 | display_package_info prometheus-ha_cluster_exporter
73 | display_systemd_status prometheus-ha_cluster_exporter
74 | 
75 | for file in /usr/etc/ha_cluster_exporter.{yaml,json,toml} /etc/ha_cluster_exporter.{yaml,json,toml} /usr/etc/ha_cluster_exporter.web.yaml /etc/ha_cluster_exporter.web.yaml; do
76 | 	[ -e "${file}" ] && { display_file_stat "${file}" ; display_file "${file}" ; echo ; }
77 | done
78 | 
79 | display_file_stat /etc/sysconfig/prometheus-ha_cluster_exporter
80 | display_file /etc/sysconfig/prometheus-ha_cluster_exporter
81 | 
82 | # Log exporter entries from the system log
83 | display_cmd "grep -E -i 'ha_cluster_exporter\[.*\]:' /var/log/messages"
84 | display_cmd "ss -tulpan | grep exporter"
85 | 
86 | # Bye.
87 | exit 0
88 | 
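Since the plugin is a plain bash script, it can also be exercised standalone for a quick sanity check once the package is installed; the path below follows from the spec's install section (%{_prefix}/lib/supportconfig/plugins/ha_cluster_exporter):

bash /usr/lib/supportconfig/plugins/ha_cluster_exporter | less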
--------------------------------------------------------------------------------
/test/corosync.metrics:
--------------------------------------------------------------------------------
1 | # HELP ha_cluster_corosync_member_votes How many votes each member node has contributed to the current quorum
2 | # TYPE ha_cluster_corosync_member_votes gauge
3 | ha_cluster_corosync_member_votes{local="false",node="Qdevice",node_id="0"} 1
4 | ha_cluster_corosync_member_votes{local="false",node="stefanotorresi-hana02",node_id="1084783376"} 1
5 | ha_cluster_corosync_member_votes{local="true",node="stefanotorresi-hana01",node_id="1084783375"} 1
6 | # HELP ha_cluster_corosync_quorate Whether or not the cluster is quorate
7 | # TYPE ha_cluster_corosync_quorate gauge
8 | ha_cluster_corosync_quorate 1
9 | # HELP ha_cluster_corosync_quorum_votes Cluster quorum votes; one line per type
10 | # TYPE ha_cluster_corosync_quorum_votes gauge
11 | ha_cluster_corosync_quorum_votes{type="expected_votes"} 2
12 | ha_cluster_corosync_quorum_votes{type="highest_expected"} 2
13 | ha_cluster_corosync_quorum_votes{type="quorum"} 1
14 | ha_cluster_corosync_quorum_votes{type="total_votes"} 2
15 | # HELP ha_cluster_corosync_ring_errors The total number of faulty corosync rings
16 | # TYPE ha_cluster_corosync_ring_errors gauge
17 | ha_cluster_corosync_ring_errors 1
18 | # HELP ha_cluster_corosync_rings The status of each Corosync ring; 1 means healthy, 0 means faulty.
19 | # TYPE ha_cluster_corosync_rings gauge
20 | ha_cluster_corosync_rings{address="10.0.0.1",node_id="1084783375",number="0",ring_id="1084783375/40"} 0
21 | ha_cluster_corosync_rings{address="172.16.0.1",node_id="1084783375",number="1",ring_id="1084783375/40"} 1
22 | 
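Fixtures like this double as a reference for the exposed series. For instance, a hypothetical Prometheus alerting expression on the ring metrics above (illustrative only, not shipped with the project):

# fire when any corosync ring is reported faulty
ha_cluster_corosync_ring_errors > 0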
--------------------------------------------------------------------------------
/test/drbd-splitbrain/drbd-split-brain-detected-missingthingsWrongSkippedMetrics:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ClusterLabs/ha_cluster_exporter/8cf20d5b3b3cafa8d1e5d349ddf7fbbbe681f1a8/test/drbd-splitbrain/drbd-split-brain-detected-missingthingsWrongSkippedMetrics
--------------------------------------------------------------------------------
/test/drbd-splitbrain/drbd-split-brain-detected-resource01-vol01:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ClusterLabs/ha_cluster_exporter/8cf20d5b3b3cafa8d1e5d349ddf7fbbbe681f1a8/test/drbd-splitbrain/drbd-split-brain-detected-resource01-vol01
--------------------------------------------------------------------------------
/test/drbd-splitbrain/drbd-split-brain-detected-resource02-vol02:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ClusterLabs/ha_cluster_exporter/8cf20d5b3b3cafa8d1e5d349ddf7fbbbe681f1a8/test/drbd-splitbrain/drbd-split-brain-detected-resource02-vol02
--------------------------------------------------------------------------------
/test/drbd.metrics:
--------------------------------------------------------------------------------
1 | # HELP ha_cluster_drbd_al_writes Writes to activity log; 1 line per res, per volume
2 | # TYPE ha_cluster_drbd_al_writes gauge
3 | ha_cluster_drbd_al_writes{resource="1-single-0",volume="0"} 123
4 | ha_cluster_drbd_al_writes{resource="1-single-1",volume="0"} 123
5 | # HELP ha_cluster_drbd_bm_writes Writes to bitmap; 1 line per res, per volume
6 | # TYPE ha_cluster_drbd_bm_writes gauge
7 | ha_cluster_drbd_bm_writes{resource="1-single-0",volume="0"} 321
8 | ha_cluster_drbd_bm_writes{resource="1-single-1",volume="0"} 321
9 | # HELP ha_cluster_drbd_connections The DRBD resource connections; 1 line per resource, per peer_node_id
10 | # TYPE ha_cluster_drbd_connections gauge
11 | ha_cluster_drbd_connections{peer_disk_state="uptodate",peer_node_id="1",peer_role="Primary",resource="1-single-0",volume="0"} 1
12 | ha_cluster_drbd_connections{peer_disk_state="uptodate",peer_node_id="1",peer_role="Primary",resource="1-single-1",volume="0"} 1
13 | # HELP ha_cluster_drbd_connections_pending Pending value per connection
14 | # TYPE ha_cluster_drbd_connections_pending gauge
15 | ha_cluster_drbd_connections_pending{peer_node_id="1",resource="1-single-0",volume="0"} 3
16 | ha_cluster_drbd_connections_pending{peer_node_id="1",resource="1-single-1",volume="0"} 3
17 | # HELP ha_cluster_drbd_connections_received KiB received per connection
18 | # TYPE ha_cluster_drbd_connections_received gauge
19 | ha_cluster_drbd_connections_received{peer_node_id="1",resource="1-single-0",volume="0"} 456
20 | ha_cluster_drbd_connections_received{peer_node_id="1",resource="1-single-1",volume="0"} 456
21 | # HELP ha_cluster_drbd_connections_sent KiB sent per connection
22 | # TYPE ha_cluster_drbd_connections_sent gauge
23 | ha_cluster_drbd_connections_sent{peer_node_id="1",resource="1-single-0",volume="0"} 654
24 | ha_cluster_drbd_connections_sent{peer_node_id="1",resource="1-single-1",volume="0"} 654
25 | # HELP ha_cluster_drbd_connections_sync The in sync percentage value for DRBD resource connections
26 | # TYPE ha_cluster_drbd_connections_sync gauge
27 | ha_cluster_drbd_connections_sync{peer_node_id="1",resource="1-single-0",volume="0"} 100
28 | ha_cluster_drbd_connections_sync{peer_node_id="1",resource="1-single-1",volume="0"} 100
29 | # HELP ha_cluster_drbd_connections_unacked Unacked value per connection
30 | # TYPE ha_cluster_drbd_connections_unacked gauge
31 | ha_cluster_drbd_connections_unacked{peer_node_id="1",resource="1-single-0",volume="0"} 4
32 | ha_cluster_drbd_connections_unacked{peer_node_id="1",resource="1-single-1",volume="0"} 4
33 | # HELP ha_cluster_drbd_lower_pending Lower pending; 1 line per res, per volume
34 | # TYPE ha_cluster_drbd_lower_pending gauge
35 | ha_cluster_drbd_lower_pending{resource="1-single-0",volume="0"} 2
36 | ha_cluster_drbd_lower_pending{resource="1-single-1",volume="0"} 2
37 | # HELP ha_cluster_drbd_quorum Quorum status per resource and per volume
38 | # TYPE ha_cluster_drbd_quorum gauge
39 | ha_cluster_drbd_quorum{resource="1-single-0",volume="0"} 1
40 | ha_cluster_drbd_quorum{resource="1-single-1",volume="0"} 0
41 | # HELP ha_cluster_drbd_read KiB read from DRBD; 1 line per res, per volume
42 | # TYPE ha_cluster_drbd_read gauge
43 | ha_cluster_drbd_read{resource="1-single-0",volume="0"} 654321
44 | ha_cluster_drbd_read{resource="1-single-1",volume="0"} 654321
45 | # HELP ha_cluster_drbd_resources The DRBD resources; 1 line per name, per volume
46 | # TYPE ha_cluster_drbd_resources gauge
47 | ha_cluster_drbd_resources{disk_state="uptodate",resource="1-single-0",role="Secondary",volume="0"} 1
48 | ha_cluster_drbd_resources{disk_state="uptodate",resource="1-single-1",role="Secondary",volume="0"} 1
49 | # HELP ha_cluster_drbd_upper_pending Upper pending; 1 line per res, per volume
50 | # TYPE ha_cluster_drbd_upper_pending gauge
51 | ha_cluster_drbd_upper_pending{resource="1-single-0",volume="0"} 1
52 | ha_cluster_drbd_upper_pending{resource="1-single-1",volume="0"} 1
53 | # HELP ha_cluster_drbd_written KiB written to DRBD; 1 line per res, per volume
54 | # TYPE ha_cluster_drbd_written gauge
55 | ha_cluster_drbd_written{resource="1-single-0",volume="0"} 123456
56 | ha_cluster_drbd_written{resource="1-single-1",volume="0"} 123456
57 | 
--------------------------------------------------------------------------------
/test/dummy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ClusterLabs/ha_cluster_exporter/8cf20d5b3b3cafa8d1e5d349ddf7fbbbe681f1a8/test/dummy
--------------------------------------------------------------------------------
/test/fake_corosync-cfgtool.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | 
3 | cat <<EOF
[lines 4-147: heredoc body lost in extraction; the original script emits a fixed block of XML-like markup here]
148 | EOF
149 | 
--------------------------------------------------------------------------------
/test/fake_drbdsetup.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | 
3 | cat <