├── debian ├── compat ├── docs ├── source │ └── format ├── rules ├── control ├── copyright └── changelog ├── .gitignore ├── AUTHORS ├── BUILD_DEBIAN.md ├── COPYRIGHT ├── CHANGELOG ├── config └── ceph.cfg ├── Makefile ├── src ├── check_ceph_rgw ├── check_ceph_rgw_api ├── check_ceph_mon ├── check_ceph_osd_db ├── check_ceph_mgr ├── check_ceph_osd ├── check_ceph_osd_df ├── check_ceph_mds ├── check_ceph_health └── check_ceph_df ├── LICENSE └── README.md /debian/compat: -------------------------------------------------------------------------------- 1 | 9 2 | -------------------------------------------------------------------------------- /debian/docs: -------------------------------------------------------------------------------- 1 | README.md 2 | -------------------------------------------------------------------------------- /debian/source/format: -------------------------------------------------------------------------------- 1 | 3.0 (quilt) 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.swp 2 | *.tar.gz 3 | *.tar.xz 4 | *.deb 5 | *.dsc 6 | /.idea/ 7 | /tmp/ 8 | .history/ 9 | -------------------------------------------------------------------------------- /debian/rules: -------------------------------------------------------------------------------- 1 | #!/usr/bin/make -f 2 | # -*- makefile -*- 3 | 4 | # Uncomment this to turn on verbose mode. 
5 | #export DH_VERBOSE=1 6 | 7 | %: 8 | dh "$@" --with python2 9 | 10 | # install in official directories 11 | override_dh_auto_install: 12 | dh_auto_install -- prefix=/usr sysconfdir=/etc 13 | 14 | 15 | -------------------------------------------------------------------------------- /AUTHORS: -------------------------------------------------------------------------------- 1 | Valery Tschopp 2 | Ricardo Rocha 3 | Roland Scheike https://github.com/elliot64 4 | Walter Huf 5 | Sebiastian Nickel 6 | Roman Plessl 7 | Vincent 8 | Konstantin Shalygin 9 | -------------------------------------------------------------------------------- /debian/control: -------------------------------------------------------------------------------- 1 | Source: nagios-plugins-ceph 2 | Maintainer: Valery Tschopp 3 | Section: misc 4 | Priority: optional 5 | Standards-Version: 3.9.8 6 | Build-Depends: debhelper (>= 9), dh-python, python:any 7 | 8 | Package: nagios-plugins-ceph 9 | Architecture: all 10 | Depends: python, ${misc:Depends}, ${python:Depends} 11 | Description: Nagios plugins for Ceph 12 | This package provides a set of plugins to monitor a ceph cluster. 
13 | -------------------------------------------------------------------------------- /BUILD_DEBIAN.md: -------------------------------------------------------------------------------- 1 | # Build Debian Package 2 | 3 | You can build a Debian package (source and binary). 4 | 5 | ## Requirements 6 | 7 | To build the package, first install the required package-building dependencies: 8 | 9 | sudo apt-get install build-essential fakeroot devscripts python debhelper dh-python 10 | 11 | ## Build package 12 | 13 | Use the included `Makefile` to build the binary package: 14 | 15 | make deb 16 | 17 | And to build the source package: 18 | 19 | make deb-src 20 | 21 | And to clean the project: 22 | 23 | make clean 24 | 25 | -------------------------------------------------------------------------------- /COPYRIGHT: -------------------------------------------------------------------------------- 1 | 2 | Copyright (c) 2013 SWITCH http://www.switch.ch 3 | 4 | Licensed under the Apache License, Version 2.0 (the "License"); 5 | you may not use this file except in compliance with the License. 6 | You may obtain a copy of the License at 7 | 8 | http://www.apache.org/licenses/LICENSE-2.0 9 | 10 | Unless required by applicable law or agreed to in writing, software 11 | distributed under the License is distributed on an "AS IS" BASIS, 12 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | See the License for the specific language governing permissions and 14 | limitations under the License. 15 | 16 | -------------------------------------------------------------------------------- /CHANGELOG: -------------------------------------------------------------------------------- 1 | Nagios plugins for Ceph 2 | ----------------------- 3 | Version 1.5.6: 4 | * `check_ceph_rgw`, `check_ceph_rgw_api`: handled possible situations when 5 | bucket exists, but is empty (rgw bugs). 6 | 7 | Version 1.5.5: 8 | * `check_ceph_mgr` plugin added. 
9 | 10 | Version 1.5.4: 11 | * `check_ceph_rgw_api` 'insecure' option added. 12 | 13 | Version 1.5.3: 14 | * `check_ceph_rgw_api` plugin added. 15 | 16 | Version 1.5.0: 17 | * `check_ceph_df` plugin added. 18 | 19 | Version 1.4.0: 20 | * Refactored `check_ceph_mon` plugin (v1.1.0), parses ceph quorum_status output. 21 | 22 | Version 1.3.1: 23 | * `check_ceph_rgw` with detailed bucket stats as perf data (--detail). 24 | 25 | Version 1.3.0: 26 | * `check_ceph_rgw` with bucket stats as perf data. 27 | 28 | Version 1.2.0: 29 | * `check_ceph_rgw` plugin added. 30 | 31 | Version 1.1.0: 32 | * `check_ceph_mon` plugin added. 33 | * `check_ceph_osd` plugin added. 34 | 35 | Version 1.0.1: 36 | * HEALTH_OK doesn't always have additional information. 37 | 38 | Version 1.0: 39 | * Initial 'ceph health' nagios plugin. 40 | -------------------------------------------------------------------------------- /debian/copyright: -------------------------------------------------------------------------------- 1 | Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ 2 | Upstream-Name: ceph-nagios-plugins 3 | Upstream-Contact: Valery Tschopp 4 | Ricardo Rocha 5 | Source: https://github.com/ceph/ceph-nagios-plugins 6 | 7 | Files: * 8 | Copyright: 2013 SWITCH http://www.switch.ch 9 | 2013 Catalyst IT http://www.catalyst.net.nz 10 | License: Apache-2.0 11 | Licensed under the Apache License, Version 2.0 (the "License"); 12 | you may not use this file except in compliance with the License. 13 | You may obtain a copy of the License at 14 | . 15 | http://www.apache.org/licenses/LICENSE-2.0 16 | . 17 | Unless required by applicable law or agreed to in writing, software 18 | distributed under the License is distributed on an "AS IS" BASIS, 19 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 20 | See the License for the specific language governing permissions and 21 | limitations under the License. 22 | . 
23 | On Debian systems, the complete text of the Apache version 2.0 license 24 | can be found in "/usr/share/common-licenses/Apache-2.0". 25 | -------------------------------------------------------------------------------- /config/ceph.cfg: -------------------------------------------------------------------------------- 1 | # 'check_ceph_health' command definition 2 | define command{ 3 | command_name check_ceph_health 4 | command_line /usr/lib/nagios/plugins/check_ceph_health 5 | } 6 | define command{ 7 | command_name check_ceph_health_wargs 8 | command_line /usr/lib/nagios/plugins/check_ceph_health -H '$HOSTADDRESS$' 9 | } 10 | define command{ 11 | command_name check_ceph_health_filtered 12 | command_line /usr/lib/nagios/plugins/check_ceph_health -H '$HOSTADDRESS$' --check '$ARG1$' --whitelist '$ARG2$' 13 | } 14 | define command{ 15 | command_name check_ceph_mon 16 | command_line /usr/lib/nagios/plugins/check_ceph_mon -I '$ARG1$' 17 | } 18 | define command{ 19 | command_name check_ceph_mon_wargs 20 | command_line /usr/lib/nagios/plugins/check_ceph_mon -I '$ARG1$' -m '$ARG2$' -i '$ARG3$' -k '$ARG4$' 21 | } 22 | define command{ 23 | command_name check_ceph_osd 24 | command_line /usr/lib/nagios/plugins/check_ceph_osd -H '$HOSTADDRESS$' -I '$ARG1$' 25 | } 26 | define command{ 27 | command_name check_ceph_osd_wmon 28 | command_line /usr/lib/nagios/plugins/check_ceph_osd -H '$HOSTADDRESS$' -I '$ARG1$' -m '$ARG2$' -i '$ARG3$' -k '$ARG4$' 29 | } 30 | define command{ 31 | command_name check_ceph_rgw 32 | command_line /usr/lib/nagios/plugins/check_ceph_rgw 33 | } 34 | define command{ 35 | command_name check_ceph_rgw_api 36 | command_line /usr/lib/nagios/plugins/check_ceph_rgw_api -H '$HOSTADDRESS$' -a '$ARG1$' -s '$ARG2$' 37 | } 38 | define command{ 39 | command_name check_ceph_df 40 | command_line /usr/lib/nagios/plugins/check_ceph_df -m '$ARG1$' -i '$ARG2$' -k '$ARG3$' -W '$ARG4$' -C '$ARG5$' $ARG6$ 41 | } 42 | define command{ 43 | command_name check_ceph_osd_df 44 | 
command_line /usr/lib/nagios/plugins/check_ceph_osd_df -m '$ARG1$' -i '$ARG2$' -k '$ARG3$' -W '$ARG4$' -C '$ARG5$' 45 | } 46 | -------------------------------------------------------------------------------- /debian/changelog: -------------------------------------------------------------------------------- 1 | nagios-plugins-ceph (1.5.6-1) unstable; urgency=medium 2 | 3 | * Upstream version 1.5.6 4 | 5 | -- Valery Tschopp Mon, 04 Jan 2021 08:59:02 +0000 6 | 7 | nagios-plugins-ceph (1.5.5-1) unstable; urgency=medium 8 | 9 | * Upstream version 1.5.5 10 | 11 | -- Valery Tschopp Mon, 19 Nov 2018 14:53:02 +0100 12 | 13 | nagios-plugins-ceph (1.5.1-1) unstable; urgency=medium 14 | 15 | * Added whitelist regexp filter for good/unhandled ceph health warnings 16 | 17 | -- Roman Plessl Mon, 27 Jun 2016 13:45:00 +0200 18 | 19 | nagios-plugins-ceph (1.5.0-1) unstable; urgency=medium 20 | 21 | * Upstream version 1.5.0 22 | 23 | -- Valery Tschopp Tue, 15 Mar 2016 09:57:18 +0100 24 | 25 | nagios-plugins-ceph (1.4.0-1) unstable; urgency=medium 26 | 27 | * Upstream version 1.4.0 28 | 29 | -- Valery Tschopp Tue, 04 Aug 2015 17:32:19 +0200 30 | 31 | nagios-plugins-ceph (1.3.1-3) unstable; urgency=low 32 | 33 | * make it a native package 34 | 35 | -- Sebastian Nickel Thu, 16 Apr 2015 15:52:46 +0200 36 | 37 | nagios-plugins-ceph (1.3.1-2) unstable; urgency=low 38 | 39 | * fix quilt packaging 40 | 41 | -- Sebastian Nickel Thu, 16 Apr 2015 14:52:46 +0200 42 | 43 | nagios-plugins-ceph (1.3.1-1) unstable; urgency=low 44 | 45 | * new upstream version 1.3.1 46 | 47 | -- Sebastian Nickel Thu, 16 Apr 2015 10:12:09 +0200 48 | 49 | nagios-plugins-ceph (1.2.0-1) unstable; urgency=low 50 | 51 | * Upstream version 1.2.0 52 | 53 | -- Ricardo Rocha Wed, 12 Feb 2014 16:17:42 +0100 54 | 55 | nagios-plugins-ceph (1.1.0-1) unstable; urgency=low 56 | 57 | * Upstream version 1.1.0 58 | 59 | -- Ricardo Rocha Thu, 12 Dec 2013 16:17:42 +0100 60 | 61 | nagios-plugins-ceph (1.0.1-1) unstable; urgency=low 62 | 
63 | * Initial release 64 | 65 | -- Ricardo Rocha Wed, 11 Dec 2013 15:52:10 +1300 66 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2013 SWITCH http://www.switch.ch 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # 15 | # Authors: 16 | # Valery Tschopp 17 | # Ricardo Rocha 18 | 19 | name = nagios-plugins-ceph 20 | version = 1.5.6 21 | 22 | # install options (like configure) 23 | # ex: make libdir=/usr/lib64 sysconfdir=/etc install 24 | prefix = /usr 25 | libdir = $(prefix)/lib 26 | sysconfdir = $(prefix)/etc 27 | nagiosdir = $(libdir)/nagios/plugins 28 | nagiosconfdir = $(sysconfdir)/nagios-plugins/config 29 | 30 | tmp_dir = $(CURDIR)/tmp 31 | 32 | .PHONY: clean dist install pre_deb deb deb-src docker-build 33 | 34 | clean: 35 | rm -rf $(tmp_dir) 36 | rm -f *.tar.gz *.deb *.dsc *.xz 37 | 38 | dist: 39 | @echo "Packaging sources" 40 | test ! -d $(tmp_dir) || rm -fr $(tmp_dir) 41 | mkdir -p $(tmp_dir)/$(name)-$(version) 42 | cp Makefile $(tmp_dir)/$(name)-$(version) 43 | cp COPYRIGHT LICENSE README.md CHANGELOG $(tmp_dir)/$(name)-$(version) 44 | cp -R src $(tmp_dir)/$(name)-$(version) 45 | cp -R config $(tmp_dir)/$(name)-$(version) 46 | cp -R debian $(tmp_dir)/$(name)-$(version) 47 | test ! 
-f $(name)-$(version).tar.gz || rm $(name)-$(version).tar.gz 48 | tar -C $(tmp_dir) -czf $(name)-$(version).tar.gz $(name)-$(version) 49 | rm -fr $(tmp_dir) 50 | 51 | install: 52 | @echo "Installing Ceph Nagios plugins in $(DESTDIR)$(nagiosdir)" 53 | install -d $(DESTDIR)$(nagiosdir) 54 | install -m 0755 src/* $(DESTDIR)$(nagiosdir) 55 | install -d $(DESTDIR)$(nagiosconfdir) 56 | install -m 0644 config/* $(DESTDIR)$(nagiosconfdir) 57 | 58 | pre_deb: dist 59 | mkdir -p $(tmp_dir) 60 | cp $(name)-$(version).tar.gz $(tmp_dir)/$(name)_$(version).orig.tar.gz 61 | tar -C $(tmp_dir) -xzf $(tmp_dir)/$(name)_$(version).orig.tar.gz 62 | 63 | deb-src: pre_deb 64 | @echo "Debian source package..." 65 | cd $(tmp_dir) && dpkg-source -b $(name)-$(version) 66 | cp $(tmp_dir)/$(name)_$(version)* . 67 | rm -rf $(tmp_dir) 68 | 69 | deb: pre_deb 70 | @echo "Debian package..." 71 | cd $(tmp_dir)/$(name)-$(version) && debuild -uc -us 72 | cp $(tmp_dir)/$(name)*.deb . 73 | rm -rf $(tmp_dir) 74 | 75 | docker-build: 76 | echo " \ 77 | FROM debian:buster\n \ 78 | ARG DEBIAN_FRONTEND=noninteractive\n \ 79 | RUN apt-get update; \ 80 | apt-get --yes upgrade; \ 81 | apt-get --no-install-recommends --yes install build-essential fakeroot devscripts python debhelper dh-python" | docker build -t ceph-nagios-plugins-builder - 82 | docker run -v $(CURDIR):/opt ceph-nagios-plugins-builder /bin/bash -c "cd /opt; make deb; make deb-src" 83 | 84 | # vim: set noexpandtab: 85 | -------------------------------------------------------------------------------- /src/check_ceph_rgw: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # Copyright (c) 2014 Catalyst IT http://www.catalyst.net.nz 5 | # Copyright (c) 2015 SWITCH http://www.switch.ch 6 | # 7 | # Licensed under the Apache License, Version 2.0 (the "License"); 8 | # you may not use this file except in compliance with the License. 
9 | # You may obtain a copy of the License at 10 | # 11 | # http://www.apache.org/licenses/LICENSE-2.0 12 | # 13 | # Unless required by applicable law or agreed to in writing, software 14 | # distributed under the License is distributed on an "AS IS" BASIS, 15 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 16 | # See the License for the specific language governing permissions and 17 | # limitations under the License. 18 | 19 | 20 | from __future__ import print_function 21 | import argparse 22 | import os 23 | import re 24 | import subprocess 25 | import sys 26 | import json 27 | 28 | __version__ = '1.5.1' 29 | 30 | # default ceph values 31 | RGW_COMMAND = '/usr/bin/radosgw-admin' 32 | 33 | # nagios exit code 34 | STATUS_OK = 0 35 | STATUS_WARNING = 1 36 | STATUS_ERROR = 2 37 | STATUS_UNKNOWN = 3 38 | 39 | def main(): 40 | 41 | # parse args 42 | parser = argparse.ArgumentParser(description="'radosgw-admin bucket stats' nagios plugin.") 43 | parser.add_argument('-d','--detail', help='output perf data for all buckets', action='store_true') 44 | parser.add_argument('-B','--byte', help='output perf data in Byte instead of KB', action='store_true') 45 | parser.add_argument('-e','--exe', help='radosgw-admin executable [%s]' % RGW_COMMAND) 46 | parser.add_argument('-c','--conf', help='alternative ceph conf file') 47 | parser.add_argument('-i','--id', help='ceph client id') 48 | parser.add_argument('-n','--name', help='ceph client name (type.id)') 49 | parser.add_argument('-V','--version', help='show version and exit', action='store_true') 50 | args = parser.parse_args() 51 | 52 | # validate args 53 | rgw_exec = args.exe if args.exe else RGW_COMMAND 54 | if not os.path.exists(rgw_exec): 55 | print("RGW ERROR: radosgw-admin executable '%s' doesn't exist" % rgw_exec) 56 | return STATUS_UNKNOWN 57 | 58 | if args.version: 59 | print('version %s' % __version__) 60 | return STATUS_OK 61 | 62 | if args.conf and not os.path.exists(args.conf): 63 | 
print("RGW ERROR: ceph conf file '%s' doesn't exist" % args.conf) 64 | return STATUS_UNKNOWN 65 | 66 | # build command 67 | rgw_cmd = [rgw_exec] 68 | if args.conf: 69 | rgw_cmd.append('-c') 70 | rgw_cmd.append(args.conf) 71 | if args.id: 72 | rgw_cmd.append('--id') 73 | rgw_cmd.append(args.id) 74 | if args.name: 75 | rgw_cmd.append('-n') 76 | rgw_cmd.append(args.name) 77 | rgw_cmd.append('bucket') 78 | rgw_cmd.append('stats') 79 | 80 | # exec command 81 | p = subprocess.Popen(rgw_cmd,stdout=subprocess.PIPE,stderr=subprocess.PIPE) 82 | output, err = p.communicate() 83 | 84 | if p.returncode != 0 or not output: 85 | print("RGW ERROR: %s :: %s" % (output, err)) 86 | return STATUS_ERROR 87 | 88 | bucket_stats = json.loads(output) 89 | #print bucket_stats 90 | 91 | buckets = [] 92 | for i in bucket_stats: 93 | if type(i) is dict: 94 | bucket_name = i['bucket'] 95 | usage_dict = i['usage'] 96 | if usage_dict and 'rgw.main' in usage_dict: 97 | bucket_usage_kb = usage_dict['rgw.main']['size_kb_actual'] 98 | else: 99 | bucket_usage_kb = 0 100 | buckets.append((bucket_name, bucket_usage_kb)) 101 | buckets_total_kb = sum([b[1] for b in buckets]) 102 | 103 | if args.byte: 104 | status = "RGW OK: {} buckets, {} KB total | /={}B ".format(len(buckets),buckets_total_kb,buckets_total_kb*1024) 105 | else: 106 | status = "RGW OK: {} buckets, {} KB total | /={}KB ".format(len(buckets),buckets_total_kb,buckets_total_kb) 107 | #print buckets 108 | if buckets and args.detail: 109 | if args.byte: 110 | status = status + " ".join(["{}={}B".format(b[0],b[1]*1024) for b in buckets]) 111 | else: 112 | status = status + " ".join(["{}={}KB".format(b[0],b[1]) for b in buckets]) 113 | 114 | print(status) 115 | return STATUS_OK 116 | 117 | if __name__ == "__main__": 118 | sys.exit(main()) 119 | -------------------------------------------------------------------------------- /src/check_ceph_rgw_api: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env 
python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # Copyright (c) 2014 Catalyst IT http://www.catalyst.net.nz 5 | # Copyright (c) 2015 SWITCH http://www.switch.ch 6 | # 7 | # Licensed under the Apache License, Version 2.0 (the "License"); 8 | # you may not use this file except in compliance with the License. 9 | # You may obtain a copy of the License at 10 | # 11 | # http://www.apache.org/licenses/LICENSE-2.0 12 | # 13 | # Unless required by applicable law or agreed to in writing, software 14 | # distributed under the License is distributed on an "AS IS" BASIS, 15 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 16 | # See the License for the specific language governing permissions and 17 | # limitations under the License. 18 | 19 | from __future__ import print_function 20 | import requests 21 | import warnings 22 | import json 23 | import argparse 24 | import sys 25 | from awsauth import S3Auth 26 | 27 | __version__ = '1.7.2' 28 | 29 | # nagios exit code 30 | STATUS_OK = 0 31 | STATUS_WARNING = 1 32 | STATUS_CRITICAL = 2 33 | STATUS_UNKNOWN = 3 34 | 35 | def main(): 36 | 37 | # parse args 38 | parser = argparse.ArgumentParser(description="'radosgw api bucket stats' nagios plugin.") 39 | parser.add_argument('-H', '--host', help="Server URL for the radosgw api (example: http://objects.dreamhost.com/)", required=True) 40 | parser.add_argument('-k', '--insecure', help="Allow insecure server connections when using SSL", action="store_false") 41 | parser.add_argument('-e', '--admin_entry', help="The entry point for an admin request URL [default is '%(default)s']", default="admin") 42 | parser.add_argument('-a', '--access_key', help="S3 access key", required=True) 43 | parser.add_argument('-s', '--secret_key', help="S3 secret key", required=True) 44 | parser.add_argument('-d', '--detail', help="output perf data for all buckets", action="store_true") 45 | parser.add_argument('-b', '--byte', help="output perf data in Byte instead of KB", 
action="store_true") 46 | parser.add_argument('-v', '--version', help='show version and exit', action="store_true") 47 | args = parser.parse_args() 48 | 49 | if args.version: 50 | print("version {0}".format(__version__)) 51 | return STATUS_OK 52 | 53 | # helpers for default schema 54 | if not args.host.startswith("http"): 55 | args.host = "http://{0}".format(args.host) 56 | # and for request_uri 57 | if not args.host.endswith("/"): 58 | args.host = "{0}/".format(args.host) 59 | 60 | url = "{0}{1}/bucket?format=json&stats=True".format(args.host, 61 | args.admin_entry) 62 | 63 | try: 64 | # Inversion of condition, when '--insecure' is defined we disable 65 | # requests warning about certificate hostname mismatch. 66 | if not args.insecure: 67 | warnings.filterwarnings('ignore', message='Unverified HTTPS request') 68 | 69 | response = requests.get(url, verify=args.insecure, 70 | auth=S3Auth(args.access_key, args.secret_key, 71 | args.host)) 72 | 73 | if response.status_code == requests.codes.ok: 74 | bucket_stats = response.json() 75 | else: 76 | # no usage caps or wrong admin entry 77 | print("RGW ERROR [{0}]: {1}".format(response.status_code, 78 | response.content.decode('utf-8'))) 79 | return STATUS_WARNING 80 | 81 | # DNS, connection errors, etc 82 | except requests.exceptions.RequestException as e: 83 | print("RGW ERROR: {0}".format(e)) 84 | return STATUS_UNKNOWN 85 | 86 | #print(bucket_stats) 87 | buckets = [] 88 | for i in bucket_stats: 89 | if type(i) is dict: 90 | bucket_name = i['bucket'] 91 | usage_dict = i['usage'] 92 | if usage_dict and 'rgw.main' in usage_dict: 93 | bucket_usage_kb = usage_dict['rgw.main']['size_kb_actual'] 94 | else: 95 | bucket_usage_kb = 0 96 | buckets.append((bucket_name, bucket_usage_kb)) 97 | buckets_total_kb = sum([b[1] for b in buckets]) 98 | 99 | status = "RGW OK: {0} buckets, {1} KB total | /={2}{3} " 100 | 101 | if args.byte: 102 | status = status.format(len(buckets), buckets_total_kb, buckets_total_kb*1024, "B") 103 | else: 
104 | status = status.format(len(buckets), buckets_total_kb, buckets_total_kb, "KB") 105 | #print(buckets) 106 | if buckets and args.detail: 107 | if args.byte: 108 | status = status + " ".join(["{}={}B".format(b[0], b[1]*1024) for b in buckets]) 109 | else: 110 | status = status + " ".join(["{}={}KB".format(b[0], b[1]) for b in buckets]) 111 | 112 | print(status) 113 | return STATUS_OK 114 | 115 | if __name__ == "__main__": 116 | sys.exit(main()) 117 | -------------------------------------------------------------------------------- /src/check_ceph_mon: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # Copyright (c) 2013 Catalyst IT http://www.catalyst.net.nz 5 | # Copyright (c) 2015 SWITCH http://www.switch.ch 6 | # 7 | # Licensed under the Apache License, Version 2.0 (the "License"); 8 | # you may not use this file except in compliance with the License. 9 | # You may obtain a copy of the License at 10 | # 11 | # http://www.apache.org/licenses/LICENSE-2.0 12 | # 13 | # Unless required by applicable law or agreed to in writing, software 14 | # distributed under the License is distributed on an "AS IS" BASIS, 15 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 16 | # See the License for the specific language governing permissions and 17 | # limitations under the License. 
18 | # 19 | 20 | from __future__ import print_function 21 | import argparse 22 | import socket 23 | import os 24 | import re 25 | import subprocess 26 | import sys 27 | import json 28 | 29 | __version__ = '1.5.0' 30 | 31 | # default ceph values 32 | CEPH_EXEC = '/usr/bin/ceph' 33 | CEPH_COMMAND = 'quorum_status' 34 | 35 | # nagios exit code 36 | STATUS_OK = 0 37 | STATUS_WARNING = 1 38 | STATUS_ERROR = 2 39 | STATUS_UNKNOWN = 3 40 | 41 | ## 42 | # ceph quorum_status output example 43 | ## 44 | ceph_quorum_status_output_example = '''{ 45 | "quorum_leader_name" : "s0001", 46 | "monmap" : { 47 | "mons" : [ 48 | { 49 | "name" : "s0001", 50 | "addr" : "[2001:620:5ca1:8000::1001]:6789/0", 51 | "rank" : 0 52 | }, 53 | { 54 | "name" : "s0003", 55 | "addr" : "[2001:620:5ca1:8000::1003]:6789/0", 56 | "rank" : 1 57 | } 58 | ], 59 | "created" : "2014-12-15 08:28:35.153650", 60 | "epoch" : 2, 61 | "modified" : "2014-12-15 08:28:40.371878", 62 | "fsid" : "22348d2b-b69d-46cc-9a79-ca93cd6bae84" 63 | }, 64 | "quorum_names" : [ 65 | "s0001", 66 | "s0003" 67 | ], 68 | "quorum" : [ 69 | 0, 70 | 1 71 | ], 72 | "election_epoch" : 24 73 | }''' 74 | 75 | def main(): 76 | 77 | # parse args 78 | parser = argparse.ArgumentParser(description="'ceph quorum_status' nagios plugin.") 79 | parser.add_argument('-e','--exe', help='ceph executable [%s]' % CEPH_EXEC) 80 | parser.add_argument('-c','--conf', help='alternative ceph conf file') 81 | parser.add_argument('-m','--monaddress', help='ceph monitor to use for queries (address[:port])') 82 | parser.add_argument('-i','--id', help='ceph client id') 83 | parser.add_argument('-k','--keyring', help='ceph client keyring file') 84 | parser.add_argument('-V','--version', help='show version and exit', action='store_true') 85 | parser.add_argument('-I','--monid', help='mon ID to be checked for availability') 86 | args = parser.parse_args() 87 | 88 | if args.version: 89 | print('version %s' % __version__) 90 | return STATUS_OK 91 | 92 | # validate args 93 | 
ceph_exec = args.exe if args.exe else CEPH_EXEC 94 | if not os.path.exists(ceph_exec): 95 | print("MON ERROR: ceph executable '%s' doesn't exist" % ceph_exec) 96 | return STATUS_UNKNOWN 97 | 98 | if args.conf and not os.path.exists(args.conf): 99 | print("MON ERROR: ceph conf file '%s' doesn't exist" % args.conf) 100 | return STATUS_UNKNOWN 101 | 102 | if args.keyring and not os.path.exists(args.keyring): 103 | print("MON ERROR: keyring file '%s' doesn't exist" % args.keyring) 104 | return STATUS_UNKNOWN 105 | 106 | if not args.monid: 107 | print("MON ERROR: no MON ID given, use -I/--monid parameter") 108 | return STATUS_UNKNOWN 109 | 110 | # build command 111 | ceph_cmd = [ceph_exec] 112 | if args.monaddress: 113 | ceph_cmd.append('-m') 114 | ceph_cmd.append(args.monaddress) 115 | if args.conf: 116 | ceph_cmd.append('-c') 117 | ceph_cmd.append(args.conf) 118 | if args.id: 119 | ceph_cmd.append('--id') 120 | ceph_cmd.append(args.id) 121 | if args.keyring: 122 | ceph_cmd.append('--keyring') 123 | ceph_cmd.append(args.keyring) 124 | ceph_cmd.append(CEPH_COMMAND) 125 | 126 | # exec command 127 | p = subprocess.Popen(ceph_cmd,stdout=subprocess.PIPE,stderr=subprocess.PIPE) 128 | output, err = p.communicate() 129 | 130 | if p.returncode != 0 or not output: 131 | print("MON ERROR: %s" % err) 132 | return STATUS_ERROR 133 | 134 | # load json output and parse 135 | quorum_status = False 136 | try: 137 | quorum_status = json.loads(output) 138 | except Exception as e: 139 | print("MON ERROR: could not parse '%s' output: %s: %s" % (CEPH_COMMAND,output,e)) 140 | return STATUS_UNKNOWN 141 | 142 | #print "XXX: quorum_status['quorum_names']:", quorum_status['quorum_names'] 143 | 144 | # do our checks 145 | is_monitor = False 146 | for mon in quorum_status['monmap']['mons']: 147 | if mon['name'] == args.monid: 148 | is_monitor = True 149 | if not is_monitor: 150 | print("MON WARN: mon '%s' is not in monmap: %s" % (args.monid,quorum_status['monmap']['mons'])) 151 | return 
STATUS_WARNING 152 | 153 | in_quorum = args.monid in quorum_status['quorum_names'] 154 | if in_quorum: 155 | print("MON OK") 156 | return STATUS_OK 157 | else: 158 | print("MON WARN: no MON '%s' found in quorum" % args.monid) 159 | return STATUS_WARNING 160 | 161 | # main 162 | if __name__ == "__main__": 163 | sys.exit(main()) 164 | -------------------------------------------------------------------------------- /src/check_ceph_osd_db: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # Copyright (c) 2020 Binero AB https://binero.com 5 | # Copyright (c) 2013 Catalyst IT http://www.catalyst.net.nz 6 | # 7 | # Licensed under the Apache License, Version 2.0 (the "License"); 8 | # you may not use this file except in compliance with the License. 9 | # You may obtain a copy of the License at 10 | # 11 | # http://www.apache.org/licenses/LICENSE-2.0 12 | # 13 | # Unless required by applicable law or agreed to in writing, software 14 | # distributed under the License is distributed on an "AS IS" BASIS, 15 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 16 | # See the License for the specific language governing permissions and 17 | # limitations under the License. 
18 | 19 | import argparse 20 | import os 21 | import re 22 | import subprocess 23 | import sys 24 | import socket 25 | import json 26 | 27 | 28 | CEPH_COMMAND = '/usr/bin/ceph' 29 | 30 | STATUS_OK = 0 31 | STATUS_CRITICAL = 2 32 | STATUS_UNKNOWN = 3 33 | 34 | 35 | def main(): 36 | parser = argparse.ArgumentParser(description="'ceph osd' nagios plugin.") 37 | 38 | parser.add_argument('-e','--exe', help='ceph executable [%s]' % CEPH_COMMAND) 39 | parser.add_argument('-c','--conf', help='alternative ceph conf file') 40 | parser.add_argument('-m','--monaddress', help='ceph monitor address[:port]') 41 | parser.add_argument('-i','--id', help='ceph client id') 42 | parser.add_argument('-k','--keyring', help='ceph client keyring file') 43 | parser.add_argument('-H','--host', help='osd host', required=True) 44 | parser.add_argument('-C','--critical', help='critical threshold in percent [60]', type=float, default=60) 45 | 46 | args = parser.parse_args() 47 | 48 | ceph_exec = args.exe if args.exe else CEPH_COMMAND 49 | if not os.path.exists(ceph_exec): 50 | print("UNKNOWN: ceph executable '%s' doesn't exist" % ceph_exec) 51 | return STATUS_UNKNOWN 52 | 53 | if args.conf and not os.path.exists(args.conf): 54 | print("UNKNOWN: ceph conf file '%s' doesn't exist" % args.conf) 55 | return STATUS_UNKNOWN 56 | 57 | if args.keyring and not os.path.exists(args.keyring): 58 | print("UNKNOWN: keyring file '%s' doesn't exist" % args.keyring) 59 | return STATUS_UNKNOWN 60 | 61 | if not args.host: 62 | print("UNKNOWN: no OSD hostname given") 63 | return STATUS_UNKNOWN 64 | 65 | try: 66 | addrinfo = socket.getaddrinfo(args.host, None, 0, socket.SOCK_STREAM) 67 | args.host = addrinfo[0][-1][0] 68 | if addrinfo[0][0] == socket.AF_INET6: 69 | args.host = "[%s]" % args.host 70 | except Exception: 71 | print('UNKNOWN: could not resolve %s' % args.host) 72 | return STATUS_UNKNOWN 73 | 74 | ceph_cmd = [ceph_exec] 75 | if args.monaddress: 76 | ceph_cmd.append('-m') 77 | ceph_cmd.append(args.monaddress) 78 | if args.conf: 79 | 
ceph_cmd.append('-c') 80 | ceph_cmd.append(args.conf) 81 | if args.id: 82 | ceph_cmd.append('--id') 83 | ceph_cmd.append(args.id) 84 | if args.keyring: 85 | ceph_cmd.append('--keyring') 86 | ceph_cmd.append(args.keyring) 87 | 88 | ceph_cmd.append('osd') 89 | ceph_cmd.append('dump') 90 | 91 | p = subprocess.Popen(ceph_cmd,stdout=subprocess.PIPE,stderr=subprocess.PIPE) 92 | output, err = p.communicate() 93 | 94 | if err or not output: 95 | print("CRITICAL: %s" % err) 96 | return STATUS_CRITICAL 97 | 98 | # escape IPv4 host address 99 | osd_host = args.host.replace('.', r'\.') 100 | # escape IPv6 host address 101 | osd_host = osd_host.replace('[', r'\[') 102 | osd_host = osd_host.replace(']', r'\]') 103 | 104 | osds_up = re.findall(r"^(osd\.[^ ]*) up.*%s:" % (osd_host), output, re.MULTILINE) 105 | 106 | final_status = STATUS_OK 107 | lines = [] 108 | 109 | for osd in osds_up: 110 | daemon_ceph_cmd = [ceph_exec, '--format', 'json'] 111 | daemon_ceph_cmd.append('daemon') 112 | daemon_ceph_cmd.append(osd) 113 | daemon_ceph_cmd.append('perf') 114 | daemon_ceph_cmd.append('dump') 115 | 116 | p = subprocess.Popen(daemon_ceph_cmd,stdout=subprocess.PIPE,stderr=subprocess.PIPE) 117 | output, err = p.communicate() 118 | 119 | if err or not output: 120 | print("CRITICAL: %s" % err) 121 | return STATUS_CRITICAL 122 | 123 | try: 124 | data = json.loads(output) 125 | except Exception: 126 | print("CRITICAL: failed to load json") 127 | return STATUS_CRITICAL 128 | 129 | bluefs = data.get('bluefs', None) 130 | # skip OSDs without bluefs stats or with no DB size reported (avoids division by zero) 131 | if not bluefs or not bluefs.get('db_total_bytes'): 132 | continue 133 | 134 | db_total_bytes = bluefs.get('db_total_bytes') 135 | db_used_bytes = bluefs.get('db_used_bytes') 136 | perc = (float(db_used_bytes) / float(db_total_bytes) * 100) 137 | 138 | if perc >= args.critical and final_status == STATUS_OK: 139 | final_status = STATUS_CRITICAL 140 | 141 | lines.append("%s=%.2f%%" % (osd, perc)) 142 | 143 | if final_status == STATUS_OK: 144 | print("OK: %s" % (' '.join(lines))) 145 | else: 146 | print("CRITICAL: 
%s" % (' '.join(lines))) 147 | 148 | return final_status 149 | 150 | 151 | if __name__ == "__main__": 152 | sys.exit(main()) 153 | -------------------------------------------------------------------------------- /src/check_ceph_mgr: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # Copyright (c) 2018 SWITCH http://www.switch.ch 5 | # 6 | # Licensed under the Apache License, Version 2.0 (the "License"); 7 | # you may not use this file except in compliance with the License. 8 | # You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 17 | # 18 | 19 | from __future__ import print_function 20 | import argparse 21 | import os 22 | import subprocess 23 | import sys 24 | import json 25 | 26 | __version__ = '1.0.0' 27 | 28 | # default ceph values 29 | CEPH_EXEC = '/usr/bin/ceph' 30 | CEPH_COMMAND = 'mgr dump -f json' 31 | 32 | CEPH_MGR_DUMP_EXAMPLE = ''' 33 | $ ceph --version 34 | ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable) 35 | $ ceph mgr dump -f json|jq . 
36 | { 37 | "epoch": 165, 38 | "active_gid": 248001409, 39 | "active_name": "zhdk0013", 40 | "active_addr": "10.10.10.9:6800/810408", 41 | "available": true, 42 | "standbys": [ 43 | { 44 | "gid": 247991934, 45 | "name": "zhdk0009", 46 | "available_modules": [ 47 | "balancer", 48 | "dashboard", 49 | "influx", 50 | "localpool", 51 | "prometheus", 52 | "restful", 53 | "selftest", 54 | "status", 55 | "zabbix" 56 | ] 57 | }, 58 | { 59 | "gid": 248011196, 60 | "name": "zhdk0025", 61 | "available_modules": [ 62 | "balancer", 63 | "dashboard", 64 | "influx", 65 | "localpool", 66 | "prometheus", 67 | "restful", 68 | "selftest", 69 | "status", 70 | "zabbix" 71 | ] 72 | } 73 | ], 74 | "modules": [ 75 | "balancer", 76 | "restful", 77 | "status" 78 | ], 79 | "available_modules": [ 80 | "balancer", 81 | "dashboard", 82 | "influx", 83 | "localpool", 84 | "prometheus", 85 | "restful", 86 | "selftest", 87 | "status", 88 | "zabbix" 89 | ], 90 | "services": {} 91 | } 92 | ''' 93 | 94 | # nagios exit code 95 | STATUS_OK = 0 96 | STATUS_WARNING = 1 97 | STATUS_ERROR = 2 98 | STATUS_UNKNOWN = 3 99 | 100 | 101 | def main(): 102 | # parse args 103 | parser = argparse.ArgumentParser(description="'ceph mgr dump' nagios plugin.") 104 | parser.add_argument('-e', '--exe', help='ceph executable [%s]' % CEPH_EXEC) 105 | parser.add_argument('-c', '--conf', help='alternative ceph conf file') 106 | parser.add_argument('-m', '--monaddress', help='ceph monitor to use for queries (address[:port])') 107 | parser.add_argument('-i', '--id', help='ceph client id') 108 | parser.add_argument('-n', '--name', help='ceph client name') 109 | parser.add_argument('-k', '--keyring', help='ceph client keyring file') 110 | parser.add_argument('-V', '--version', help='show version and exit', action='store_true') 111 | args = parser.parse_args() 112 | 113 | if args.version: 114 | print("version {}".format(__version__)) 115 | return STATUS_OK 116 | 117 | # validate args 118 | ceph_exec = args.exe if args.exe else 
CEPH_EXEC 119 | if not os.path.exists(ceph_exec): 120 | print("MGR ERROR: ceph executable '{}' doesn't exist".format(ceph_exec)) 121 | return STATUS_UNKNOWN 122 | 123 | if args.conf and not os.path.exists(args.conf): 124 | print("MGR ERROR: ceph conf file '{}' doesn't exist".format(args.conf)) 125 | return STATUS_UNKNOWN 126 | 127 | if args.keyring and not os.path.exists(args.keyring): 128 | print("MGR ERROR: keyring file '{}' doesn't exist".format(args.keyring)) 129 | return STATUS_UNKNOWN 130 | 131 | # build command 132 | ceph_cmd = [ceph_exec] 133 | if args.monaddress: 134 | ceph_cmd.append('-m') 135 | ceph_cmd.append(args.monaddress) 136 | if args.conf: 137 | ceph_cmd.append('-c') 138 | ceph_cmd.append(args.conf) 139 | if args.id: 140 | ceph_cmd.append('--id') 141 | ceph_cmd.append(args.id) 142 | if args.name: 143 | ceph_cmd.append('--name') 144 | ceph_cmd.append(args.name) 145 | if args.keyring: 146 | ceph_cmd.append('--keyring') 147 | ceph_cmd.append(args.keyring) 148 | ceph_cmd.extend(CEPH_COMMAND.split(' ')) 149 | 150 | # exec command 151 | p = subprocess.Popen(ceph_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) 152 | output, err = p.communicate() 153 | 154 | if p.returncode != 0 or not output: 155 | print("MGR ERROR: {}".format(err)) 156 | return STATUS_UNKNOWN 157 | 158 | # load json output and parse 159 | mgr_dump = None 160 | try: 161 | mgr_dump = json.loads(output) 162 | except Exception as e: 163 | print("MGR ERROR: could not parse '{}' output: {}: {}".format(ceph_cmd, output, e)) 164 | return STATUS_UNKNOWN 165 | 166 | # check active 167 | if 'active_name' not in mgr_dump: 168 | print("MGR CRITICAL: no active mgr found") 169 | print("JSON: {}".format(json.dumps(mgr_dump))) 170 | return STATUS_ERROR 171 | 172 | active_mgr_name = mgr_dump['active_name'] 173 | # check standby 174 | standby_mgr_names = [] 175 | for standby_mgr in mgr_dump['standbys']: 176 | standby_mgr_names.append(standby_mgr['name']) 177 | 178 | if len(standby_mgr_names) <= 0: 
179 | print("MGR WARN: active: {} but no standbys".format(active_mgr_name)) 180 | return STATUS_WARNING 181 | else: 182 | print("MGR OK: active: {}, standbys: {}".format(active_mgr_name, 183 | ", ".join(standby_mgr_names))) 184 | return STATUS_OK 185 | 186 | # main 187 | if __name__ == "__main__": 188 | sys.exit(main()) 189 | -------------------------------------------------------------------------------- /src/check_ceph_osd: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # Copyright (c) 2013 Catalyst IT http://www.catalyst.net.nz 5 | # 6 | # Licensed under the Apache License, Version 2.0 (the "License"); 7 | # you may not use this file except in compliance with the License. 8 | # You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 
17 | # 18 | # 1.5.2 (2019-06-16) Martin Seener: fixed regex to work with Ceph Nautilus (14.2.x) 19 | 20 | from __future__ import print_function 21 | import argparse 22 | import os 23 | import re 24 | import subprocess 25 | import sys 26 | import socket 27 | 28 | __version__ = '1.5.2' 29 | 30 | # default ceph values 31 | CEPH_COMMAND = '/usr/bin/ceph' 32 | 33 | # nagios exit code 34 | STATUS_OK = 0 35 | STATUS_WARNING = 1 36 | STATUS_ERROR = 2 37 | STATUS_UNKNOWN = 3 38 | 39 | def main(): 40 | 41 | # parse args 42 | parser = argparse.ArgumentParser(description="'ceph osd' nagios plugin.") 43 | parser.add_argument('-e','--exe', help='ceph executable [%s]' % CEPH_COMMAND) 44 | parser.add_argument('-c','--conf', help='alternative ceph conf file') 45 | parser.add_argument('-m','--monaddress', help='ceph monitor address[:port]') 46 | parser.add_argument('-i','--id', help='ceph client id') 47 | parser.add_argument('-k','--keyring', help='ceph client keyring file') 48 | parser.add_argument('-V','--version', help='show version and exit', action='store_true') 49 | parser.add_argument('-H','--host', help='osd host', required=True) 50 | parser.add_argument('-I','--osdid', help='osd id', required=False) 51 | parser.add_argument('-C','--crit', help='Number of failed OSDs to trigger critical (default=2)',type=int,default=2, required=False) 52 | parser.add_argument('-o','--out', help='check osds that are set OUT', default=False, action='store_true', required=False) 53 | args = parser.parse_args() 54 | 55 | # validate args 56 | ceph_exec = args.exe if args.exe else CEPH_COMMAND 57 | if not os.path.exists(ceph_exec): 58 | print("OSD ERROR: ceph executable '%s' doesn't exist" % ceph_exec) 59 | return STATUS_UNKNOWN 60 | 61 | if args.version: 62 | print('version %s' % __version__) 63 | return STATUS_OK 64 | 65 | if args.conf and not os.path.exists(args.conf): 66 | print("OSD ERROR: ceph conf file '%s' doesn't exist" % args.conf) 67 | return STATUS_UNKNOWN 68 | 69 | if args.keyring and 
not os.path.exists(args.keyring): 70 | print("OSD ERROR: keyring file '%s' doesn't exist" % args.keyring) 71 | return STATUS_UNKNOWN 72 | 73 | if not args.osdid: 74 | args.osdid = '[^ ]*' 75 | 76 | if not args.host: 77 | print("OSD ERROR: no OSD hostname given") 78 | return STATUS_UNKNOWN 79 | 80 | try: 81 | addrinfo = socket.getaddrinfo(args.host, None, 0, socket.SOCK_STREAM) 82 | args.host = addrinfo[0][-1][0] 83 | if addrinfo[0][0] == socket.AF_INET6: 84 | args.host = "[%s]" % args.host 85 | except: 86 | print('OSD ERROR: could not resolve %s' % args.host) 87 | return STATUS_UNKNOWN 88 | 89 | 90 | # build command 91 | ceph_cmd = [ceph_exec] 92 | if args.monaddress: 93 | ceph_cmd.append('-m') 94 | ceph_cmd.append(args.monaddress) 95 | if args.conf: 96 | ceph_cmd.append('-c') 97 | ceph_cmd.append(args.conf) 98 | if args.id: 99 | ceph_cmd.append('--id') 100 | ceph_cmd.append(args.id) 101 | if args.keyring: 102 | ceph_cmd.append('--keyring') 103 | ceph_cmd.append(args.keyring) 104 | ceph_cmd.append('osd') 105 | ceph_cmd.append('dump') 106 | 107 | # exec command 108 | p = subprocess.Popen(ceph_cmd,stdout=subprocess.PIPE,stderr=subprocess.PIPE) 109 | output, err = p.communicate() 110 | output = output.decode('utf8') 111 | 112 | if err or not output: 113 | print("OSD ERROR: %s" % err) 114 | return STATUS_ERROR 115 | 116 | # escape IPv4 host address 117 | osd_host = args.host.replace('.', '\.') 118 | # escape IPv6 host address 119 | osd_host = osd_host.replace('[', '\[') 120 | osd_host = osd_host.replace(']', '\]') 121 | up = re.findall(r"^(osd\.%s) up.*%s:" % (args.osdid, osd_host), output, re.MULTILINE) 122 | if args.out: 123 | down = re.findall(r"^(osd\.%s) down.*%s:" % (args.osdid, osd_host), output, re.MULTILINE) 124 | down_in = re.findall(r"^(osd\.%s) down[ ]+in.*%s:" % (args.osdid, osd_host), output, re.MULTILINE) 125 | down_out = re.findall(r"^(osd\.%s) down[ ]+out.*%s:" % (args.osdid, osd_host), output, re.MULTILINE) 126 | else: 127 | down = 
re.findall(r"^(osd\.%s) down[ ]+in.*%s:" % (args.osdid, osd_host), output, re.MULTILINE) 128 | down_in = down 129 | down_out = re.findall(r"^(osd\.%s) down[ ]+out.*%s:" % (args.osdid, osd_host), output, re.MULTILINE) 130 | 131 | if down: 132 | print("OSD %s: Down OSD%s on %s: %s" % ('CRITICAL' if len(down)>=args.crit else 'WARNING' ,'s' if len(down)>1 else '', args.host, " ".join(down))) 133 | print("Up OSDs: " + " ".join(up)) 134 | print("Down+In OSDs: " + " ".join(down_in)) 135 | print("Down+Out OSDs: " + " ".join(down_out)) 136 | print("| 'osd_up'=%d 'osd_down_in'=%d;;%d 'osd_down_out'=%d;;%d" % (len(up), len(down_in), args.crit, len(down_out), args.crit)) 137 | if len(down)>=args.crit: 138 | return STATUS_ERROR 139 | else: 140 | return STATUS_WARNING 141 | 142 | if up: 143 | print("OSD OK") 144 | print("Up OSDs: " + " ".join(up)) 145 | print("Down+In OSDs: " + " ".join(down_in)) 146 | print("Down+Out OSDs: " + " ".join(down_out)) 147 | print("| 'osd_up'=%d 'osd_down_in'=%d;;%d 'osd_down_out'=%d;;%d" % (len(up), len(down_in), args.crit, len(down_out), args.crit)) 148 | return STATUS_OK 149 | 150 | print("OSD WARN: no OSD.%s found on host %s" % (args.osdid, args.host)) 151 | return STATUS_WARNING 152 | 153 | if __name__ == "__main__": 154 | sys.exit(main()) 155 | -------------------------------------------------------------------------------- /src/check_ceph_osd_df: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # check_ceph_osd_df - Check OSD DF output 5 | # Copyright (c) 2020 noris network AG https://www.noris.de 6 | # 7 | # This plugin will not output perfdata as there is likely a lot of output 8 | # which should be gathered using other tools. 
9 | # 10 | # Parts based on code from check_ceph_df which is 11 | # Copyright (c) 2013 SWITCH http://www.switch.ch 12 | # 13 | # Licensed under the Apache License, Version 2.0 (the "License"); 14 | # you may not use this file except in compliance with the License. 15 | # You may obtain a copy of the License at 16 | # 17 | # http://www.apache.org/licenses/LICENSE-2.0 18 | # 19 | # Unless required by applicable law or agreed to in writing, software 20 | # distributed under the License is distributed on an "AS IS" BASIS, 21 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 22 | # See the License for the specific language governing permissions and 23 | # limitations under the License. 24 | # 25 | 26 | from __future__ import print_function 27 | import argparse 28 | import os 29 | import subprocess 30 | import sys 31 | import json 32 | from operator import itemgetter 33 | 34 | # Semver 35 | __version__ = '1.0.0' 36 | 37 | # default ceph values 38 | CEPH_COMMAND = '/usr/bin/ceph' 39 | 40 | # nagios exit code 41 | STATUS_OK = 0 42 | STATUS_WARNING = 1 43 | STATUS_ERROR = 2 44 | STATUS_UNKNOWN = 3 45 | 46 | def main(): 47 | 48 | # parse args 49 | parser = argparse.ArgumentParser(description="'ceph osd df' nagios plugin.") 50 | parser.add_argument('-e','--exe', help='ceph executable [%s]' % CEPH_COMMAND) 51 | parser.add_argument('-c','--conf', help='alternative ceph conf file') 52 | parser.add_argument('-m','--monaddress', help='ceph monitor address[:port]') 53 | parser.add_argument('-i','--id', help='ceph client id') 54 | parser.add_argument('-n','--name', help='ceph client name') 55 | parser.add_argument('-k','--keyring', help='ceph client keyring file') 56 | parser.add_argument('-W','--warn', help="warn above this percent USED", type=float) 57 | parser.add_argument('-C','--critical', help="critical alert above this percent USED", type=float) 58 | parser.add_argument('-V','--version', help='show version and exit', action='store_true') 59 | args = 
parser.parse_args() 60 | 61 | # validate args 62 | ceph_exec = args.exe if args.exe else CEPH_COMMAND 63 | if not os.path.exists(ceph_exec): 64 | print("ERROR: ceph executable '%s' doesn't exist" % ceph_exec) 65 | return STATUS_UNKNOWN 66 | 67 | if args.version: 68 | print('version %s' % __version__) 69 | return STATUS_OK 70 | 71 | if args.conf and not os.path.exists(args.conf): 72 | print("ERROR: ceph conf file '%s' doesn't exist" % args.conf) 73 | return STATUS_UNKNOWN 74 | 75 | if args.keyring and not os.path.exists(args.keyring): 76 | print("ERROR: keyring file '%s' doesn't exist" % args.keyring) 77 | return STATUS_UNKNOWN 78 | 79 | if not args.warn or not args.critical or args.warn > args.critical: 80 | print("ERROR: warn and critical level must be set and critical must be greater than warn") 81 | return STATUS_UNKNOWN 82 | 83 | # build command 84 | ceph_osd_df = [ceph_exec] 85 | if args.monaddress: 86 | ceph_osd_df.append('-m') 87 | ceph_osd_df.append(args.monaddress) 88 | if args.conf: 89 | ceph_osd_df.append('-c') 90 | ceph_osd_df.append(args.conf) 91 | if args.id: 92 | ceph_osd_df.append('--id') 93 | ceph_osd_df.append(args.id) 94 | if args.name: 95 | ceph_osd_df.append('--name') 96 | ceph_osd_df.append(args.name) 97 | if args.keyring: 98 | ceph_osd_df.append('--keyring') 99 | ceph_osd_df.append(args.keyring) 100 | ceph_osd_df.append('osd') 101 | ceph_osd_df.append('df') 102 | ceph_osd_df.append('--format=json') 103 | 104 | # exec command 105 | p = subprocess.Popen(ceph_osd_df,stdout=subprocess.PIPE,stderr=subprocess.PIPE) 106 | output, err = p.communicate() 107 | 108 | # parse output 109 | # print "DEBUG: output:", output 110 | # print "DEBUG: err:", err 111 | if output: 112 | # parse output 113 | try: 114 | result = json.loads(output) 115 | check_return_value = STATUS_OK 116 | nodes_sorted = sorted(result["nodes"], key=itemgetter('utilization','id')) 117 | 118 | warn_crit_osds = [] 119 | 120 | for node in reversed(nodes_sorted): 121 | if 
node["utilization"] >= args.critical: 122 | check_return_value = STATUS_ERROR 123 | warn_crit_osds.append("{}={:04.2f}".format(node["name"], node["utilization"])) 124 | 125 | elif node["utilization"] >= args.warn: 126 | check_return_value = max(check_return_value, STATUS_WARNING) 127 | warn_crit_osds.append("{}={:04.2f}".format(node["name"], node["utilization"])) 128 | 129 | if check_return_value == STATUS_OK: 130 | print("OK: All OSDs within limits") 131 | return STATUS_OK 132 | elif check_return_value == STATUS_WARNING: 133 | print("WARNING: OSD usage above warn threshold: {:.4054}".format(", ".join(warn_crit_osds))) 134 | return STATUS_WARNING 135 | elif check_return_value == STATUS_ERROR: 136 | print("CRITICAL: OSD usage above critical or warn threshold: {:.4041}".format(", ".join(warn_crit_osds))) 137 | return STATUS_ERROR 138 | except: 139 | print("ERROR: {}".format(sys.exc_info()[0])) 140 | return STATUS_UNKNOWN 141 | elif err: 142 | # read only first line of error (decode so this also works on Python 3 bytes) 143 | one_line = err.decode('utf-8').split('\n')[0] 144 | if '-1 ' in one_line: 145 | idx = one_line.rfind('-1 ') 146 | print('ERROR: %s: %s' % (ceph_exec, one_line[idx+len('-1 '):])) 147 | else: 148 | print(one_line) 149 | 150 | return STATUS_UNKNOWN 151 | 152 | if __name__ == "__main__": 153 | sys.exit(main()) 154 | -------------------------------------------------------------------------------- /src/check_ceph_mds: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # Copyright (c) 2013 Catalyst IT http://www.catalyst.net.nz 5 | # Copyright (c) 2015 SWITCH http://www.switch.ch 6 | # 7 | # Licensed under the Apache License, Version 2.0 (the "License"); 8 | # you may not use this file except in compliance with the License. 
9 | # You may obtain a copy of the License at 10 | # 11 | # http://www.apache.org/licenses/LICENSE-2.0 12 | # 13 | # Unless required by applicable law or agreed to in writing, software 14 | # distributed under the License is distributed on an "AS IS" BASIS, 15 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 16 | # See the License for the specific language governing permissions and 17 | # limitations under the License. 18 | 19 | 20 | from __future__ import print_function 21 | import argparse 22 | import socket 23 | import os 24 | import re 25 | import subprocess 26 | import sys 27 | import json 28 | 29 | __version__ = '1.6.0' 30 | 31 | # default ceph values 32 | CEPH_EXEC = '/usr/bin/ceph' 33 | CEPH_COMMAND = 'mds stat -f json' 34 | 35 | # nagios exit code 36 | STATUS_OK = 0 37 | STATUS_WARNING = 1 38 | STATUS_ERROR = 2 39 | STATUS_UNKNOWN = 3 40 | 41 | def main(): 42 | # parse args 43 | parser = argparse.ArgumentParser(description="'ceph mds stat' nagios plugin.") 44 | parser.add_argument('-e','--exe', help='ceph executable [%s]' % CEPH_EXEC) 45 | parser.add_argument('-c','--conf', help='alternative ceph conf file') 46 | parser.add_argument('-m','--monaddress', help='ceph monitor to use for queries (address[:port])') 47 | parser.add_argument('-i','--id', help='ceph client id') 48 | parser.add_argument('-k','--keyring', help='ceph client keyring file') 49 | parser.add_argument('-V','--version', help='show version and exit', action='store_true') 50 | parser.add_argument('-n','--name', help='mds daemon name', required=True) 51 | parser.add_argument('-f','--filesystem', help='mds filesystem name', required=True) 52 | args = parser.parse_args() 53 | 54 | if args.version: 55 | print('version %s' % __version__) 56 | return STATUS_OK 57 | 58 | # validate args 59 | ceph_exec = args.exe if args.exe else CEPH_EXEC 60 | if not os.path.exists(ceph_exec): 61 | print("MDS ERROR: ceph executable '%s' doesn't exist" % ceph_exec) 62 | return 
STATUS_UNKNOWN 63 | 64 | if args.conf and not os.path.exists(args.conf): 65 | print("MDS ERROR: ceph conf file '%s' doesn't exist" % args.conf) 66 | return STATUS_UNKNOWN 67 | 68 | if args.keyring and not os.path.exists(args.keyring): 69 | print("MDS ERROR: keyring file '%s' doesn't exist" % args.keyring) 70 | return STATUS_UNKNOWN 71 | 72 | # build command 73 | ceph_cmd = [ceph_exec] 74 | if args.monaddress: 75 | ceph_cmd.append('-m') 76 | ceph_cmd.append(args.monaddress) 77 | if args.conf: 78 | ceph_cmd.append('-c') 79 | ceph_cmd.append(args.conf) 80 | if args.id: 81 | ceph_cmd.append('--id') 82 | ceph_cmd.append(args.id) 83 | if args.keyring: 84 | ceph_cmd.append('--keyring') 85 | ceph_cmd.append(args.keyring) 86 | ceph_cmd.extend(CEPH_COMMAND.split(' ')) 87 | 88 | # exec command 89 | p = subprocess.Popen(ceph_cmd,stdout=subprocess.PIPE,stderr=subprocess.PIPE) 90 | output, err = p.communicate() 91 | 92 | if p.returncode != 0 or not output: 93 | print("MDS ERROR: %s" % err) 94 | return STATUS_ERROR 95 | 96 | # load json output and parse 97 | mds_stat = None 98 | try: 99 | mds_stat = json.loads(output) 100 | except Exception as e: 101 | print("MDS ERROR: could not parse '%s' output: %s: %s" % (CEPH_COMMAND,output,e)) 102 | return STATUS_UNKNOWN 103 | 104 | return check_target_mds(mds_stat, args.filesystem, args.name) 105 | 106 | def check_target_mds(mds_stat, fs_name, name): 107 | # find mds from standby list 108 | standby_mdss = _get_standby_mds(mds_stat) 109 | for mds in standby_mdss: 110 | if mds.get_name() == name: 111 | print("MDS OK: %s" % (mds)) 112 | return STATUS_OK 113 | 114 | # find mds from active list 115 | active_mdss = _get_active_mds(mds_stat, fs_name) 116 | 117 | if active_mdss: 118 | for mds in active_mdss: 119 | if mds.get_name() != name: 120 | continue 121 | # target mds in active list 122 | print("MDS %s: %s" % ("WARN" if mds.is_laggy() else "OK", mds)) 123 | return STATUS_WARNING if mds.is_laggy() else STATUS_OK 124 | 125 | # mds not found 
126 | print("MDS ERROR: MDS '%s' is not found (offline?)" % (name)) 127 | return STATUS_ERROR 128 | else: 129 | # fs not found in map, perhaps user input error 130 | print("MDS ERROR: FS '%s' is not found in fsmap" % (fs_name)) 131 | return STATUS_ERROR 132 | 133 | def _get_standby_mds(mds_stat): 134 | mds_array = [] 135 | for mds in mds_stat['fsmap']['standbys']: 136 | name = mds['name'] 137 | state = mds['state'] 138 | laggy_since = mds['laggy_since'] if 'laggy_since' in mds else None 139 | mds_array.append(MDS(name, state, laggy_since)) 140 | 141 | return mds_array 142 | 143 | def _get_active_mds(mds_stat, fs_name): 144 | mds_fs = mds_stat['fsmap']['filesystems'] 145 | 146 | # find filesystem in stat 147 | for i in range(len(mds_fs)): 148 | mdsmap = mds_fs[i]['mdsmap'] 149 | if mdsmap['fs_name'] != fs_name: 150 | continue 151 | # put mds to array 152 | mds_array = [] 153 | infos = mds_stat['fsmap']['filesystems'][i]['mdsmap']['info'] 154 | for gid in infos: 155 | name = infos[gid]['name'] 156 | state = infos[gid]['state'] 157 | laggy_since = infos[gid]['laggy_since'] if 'laggy_since' in infos[gid] else None 158 | mds_array.append(MDS(name, state, laggy_since)) 159 | 160 | return mds_array 161 | 162 | # no fs found 163 | return None 164 | 165 | class MDS(object): 166 | def __init__(self, name, state, laggy_since=None): 167 | self.name = name 168 | self.state = state 169 | self.laggy_since = laggy_since 170 | 171 | def get_name(self): 172 | return self.name 173 | 174 | def get_state(self): 175 | return self.state 176 | 177 | def is_laggy(self): 178 | return self.laggy_since is not None 179 | 180 | def __str__(self): 181 | msg = "MDS '%s' is %s" % (self.name, self.state) 182 | if self.laggy_since is not None: 183 | msg += " (laggy or crashed)" 184 | return msg 185 | 186 | # main 187 | if __name__ == "__main__": 188 | sys.exit(main()) 189 | -------------------------------------------------------------------------------- /src/check_ceph_health: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # Copyright (c) 2013-2016 SWITCH http://www.switch.ch 5 | # 6 | # Licensed under the Apache License, Version 2.0 (the "License"); 7 | # you may not use this file except in compliance with the License. 8 | # You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 17 | # 18 | 19 | from __future__ import print_function 20 | import argparse 21 | import os 22 | import subprocess 23 | import sys 24 | import re 25 | import json 26 | 27 | __version__ = '1.7.0' 28 | 29 | # default ceph values 30 | CEPH_ADM_COMMAND = '/usr/sbin/cephadm' 31 | CEPH_COMMAND = '/usr/bin/ceph' 32 | 33 | # nagios exit code 34 | STATUS_OK = 0 35 | STATUS_WARNING = 1 36 | STATUS_ERROR = 2 37 | STATUS_UNKNOWN = 3 38 | 39 | 40 | def main(): 41 | 42 | # parse args 43 | parser = argparse.ArgumentParser(description="'ceph health' nagios plugin.") 44 | parser.add_argument('-e','--exe', help='ceph executable [%s]' % CEPH_COMMAND) 45 | parser.add_argument('-A','--admexe', help='cephadm executable [%s]' % CEPH_ADM_COMMAND) 46 | parser.add_argument('--cluster', help='ceph cluster name') 47 | parser.add_argument('-c','--conf', help='alternative ceph conf file') 48 | parser.add_argument('-m','--monaddress', help='ceph monitor address[:port]') 49 | parser.add_argument('-i','--id', help='ceph client id') 50 | parser.add_argument('-n','--name', help='ceph client name') 51 | parser.add_argument('-k','--keyring', help='ceph client keyring file') 52 | parser.add_argument('--check', help='regexp 
of which check(s) to check (luminous+) ' 53 | "Can be inverted, e.g. '^((?!(PG_DEGRADED|OBJECT_MISPLACED)$).)*$'") 54 | parser.add_argument('-w','--whitelist', help='whitelist regexp for ceph health warnings') 55 | parser.add_argument('-d','--detail', help="exec 'ceph health detail'", action='store_true') 56 | parser.add_argument('-V','--version', help='show version and exit', action='store_true') 57 | parser.add_argument('-a','--cephadm', help='uses cephadm to execute the command', action='store_true') 58 | parser.add_argument('-s','--skip-muted', help='skip muted checks', action='store_true') 59 | args = parser.parse_args() 60 | 61 | # validate args 62 | cephadm_exec = args.admexe if args.admexe else CEPH_ADM_COMMAND 63 | ceph_exec = args.exe if args.exe else CEPH_COMMAND 64 | 65 | if args.cephadm: 66 | if not os.path.exists(cephadm_exec): 67 | print("ERROR: cephadm executable '%s' doesn't exist" % cephadm_exec) 68 | return STATUS_UNKNOWN 69 | else: 70 | if not os.path.exists(ceph_exec): 71 | print("ERROR: ceph executable '%s' doesn't exist" % ceph_exec) 72 | return STATUS_UNKNOWN 73 | 74 | if args.version: 75 | print('version %s' % __version__) 76 | return STATUS_OK 77 | 78 | if args.conf and not os.path.exists(args.conf): 79 | print("ERROR: ceph conf file '%s' doesn't exist" % args.conf) 80 | return STATUS_UNKNOWN 81 | 82 | if args.keyring and not os.path.exists(args.keyring): 83 | print("ERROR: keyring file '%s' doesn't exist" % args.keyring) 84 | return STATUS_UNKNOWN 85 | 86 | # build command 87 | ceph_health = [ceph_exec] 88 | 89 | if args.cephadm: 90 | # Prepend the command with the cephadm binary and the shell command 91 | ceph_health = [cephadm_exec, 'shell'] + ceph_health 92 | 93 | if args.monaddress: 94 | ceph_health.append('-m') 95 | ceph_health.append(args.monaddress) 96 | if args.cluster: 97 | ceph_health.append('--cluster') 98 | ceph_health.append(args.cluster) 99 | if args.conf: 100 | ceph_health.append('-c') 101 | ceph_health.append(args.conf) 
102 | if args.id: 103 | ceph_health.append('--id') 104 | ceph_health.append(args.id) 105 | if args.name: 106 | ceph_health.append('--name') 107 | ceph_health.append(args.name) 108 | if args.keyring: 109 | ceph_health.append('--keyring') 110 | ceph_health.append(args.keyring) 111 | ceph_health.append('health') 112 | if args.detail: 113 | ceph_health.append('detail') 114 | 115 | ceph_health.append('--format') 116 | ceph_health.append('json') 117 | #print(ceph_health) 118 | 119 | # exec command 120 | p = subprocess.Popen(ceph_health,stdout=subprocess.PIPE,stderr=subprocess.PIPE) 121 | output, err = p.communicate() 122 | try: 123 | output = json.loads(output) 124 | except ValueError: 125 | output = dict() 126 | 127 | # parse output 128 | # print "output:", output 129 | #print "err:", err 130 | if output: 131 | ret = STATUS_OK 132 | msg = "" 133 | extended = [] 134 | if 'checks' in output: 135 | #luminous 136 | for check,status in output['checks'].items(): 137 | # skip check if not selected 138 | if args.check and not re.search(args.check, check): 139 | continue 140 | 141 | if args.skip_muted and ('muted' in status and status['muted']): 142 | continue 143 | 144 | check_detail = "%s( %s )" % (check, status['summary']['message']) 145 | 146 | if status["severity"] == "HEALTH_ERR": 147 | extended.append(msg) 148 | msg = "CRITICAL: %s" % check_detail 149 | ret = STATUS_ERROR 150 | continue 151 | 152 | if args.whitelist and re.search(args.whitelist,status['summary']['message']): 153 | continue 154 | 155 | check_msg = "WARNING: %s" % check_detail 156 | if not msg: 157 | msg = check_msg 158 | ret = STATUS_WARNING 159 | else: 160 | extended.append(check_msg) 161 | else: 162 | #pre-luminous 163 | for status in output["summary"]: 164 | if status != "HEALTH_OK": 165 | if status == "HEALTH_ERROR": 166 | msg = "CRITICAL: %s" % status['summary'] 167 | ret = STATUS_ERROR 168 | continue 169 | 170 | if args.whitelist and re.search(args.whitelist,status['summary']): 171 | continue 172 | 
173 | if not msg: 174 | msg = "WARNING: %s" % status['summary'] 175 | ret = STATUS_WARNING 176 | else: 177 | extended.append("WARNING: %s" % status['summary']) 178 | 179 | if msg: 180 | print(msg) 181 | else: 182 | print("HEALTH OK") 183 | if extended: print('\n'.join(extended)) 184 | return ret 185 | 186 | 187 | elif err: 188 | # read only first line of error 189 | one_line = err.split('\n')[0] 190 | if '-1 ' in one_line: 191 | idx = one_line.rfind('-1 ') 192 | print('ERROR: %s: %s' % (ceph_exec, one_line[idx+len('-1 '):])) 193 | else: 194 | print(one_line) 195 | 196 | return STATUS_UNKNOWN 197 | 198 | 199 | if __name__ == "__main__": 200 | sys.exit(main()) 201 | -------------------------------------------------------------------------------- /src/check_ceph_df: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # Copyright (c) 2013 SWITCH http://www.switch.ch 5 | # 6 | # Licensed under the Apache License, Version 2.0 (the "License"); 7 | # you may not use this file except in compliance with the License. 8 | # You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 
17 | # 18 | 19 | from __future__ import print_function 20 | import argparse 21 | import os 22 | import subprocess 23 | import sys 24 | 25 | __version__ = '1.7.1' 26 | 27 | # default ceph values 28 | CEPH_COMMAND = '/usr/bin/ceph' 29 | 30 | # nagios exit code 31 | STATUS_OK = 0 32 | STATUS_WARNING = 1 33 | STATUS_ERROR = 2 34 | STATUS_UNKNOWN = 3 35 | 36 | def main(): 37 | 38 | # parse args 39 | parser = argparse.ArgumentParser(description="'ceph df' nagios plugin.") 40 | parser.add_argument('-e','--exe', help='ceph executable [%s]' % CEPH_COMMAND) 41 | parser.add_argument('-c','--conf', help='alternative ceph conf file') 42 | parser.add_argument('-m','--monaddress', help='ceph monitor address[:port]') 43 | parser.add_argument('-i','--id', help='ceph client id') 44 | parser.add_argument('-n','--name', help='ceph client name') 45 | parser.add_argument('-k','--keyring', help='ceph client keyring file') 46 | parser.add_argument('-p','--pool', help='ceph pool name') 47 | parser.add_argument('-d','--detail', help="show pool details on warn and critical", action='store_true') 48 | parser.add_argument('-W','--warn', help="warn above this percent RAW USED", type=float) 49 | parser.add_argument('-C','--critical', help="critical alert above this percent RAW USED", type=float) 50 | parser.add_argument('-V','--version', help='show version and exit', action='store_true') 51 | args = parser.parse_args() 52 | 53 | # validate args 54 | ceph_exec = args.exe if args.exe else CEPH_COMMAND 55 | if not os.path.exists(ceph_exec): 56 | print("ERROR: ceph executable '%s' doesn't exist" % ceph_exec) 57 | return STATUS_UNKNOWN 58 | 59 | if args.version: 60 | print('version %s' % __version__) 61 | return STATUS_OK 62 | 63 | if args.conf and not os.path.exists(args.conf): 64 | print("ERROR: ceph conf file '%s' doesn't exist" % args.conf) 65 | return STATUS_UNKNOWN 66 | 67 | if args.keyring and not os.path.exists(args.keyring): 68 | print("ERROR: keyring file '%s' doesn't exist" % args.keyring) 
69 | return STATUS_UNKNOWN 70 | 71 | if not args.warn or not args.critical or args.warn > args.critical: 72 | print("ERROR: warn and critical level must be set and critical must be greater than warn") 73 | return STATUS_UNKNOWN 74 | 75 | # build command 76 | ceph_df = [ceph_exec] 77 | if args.monaddress: 78 | ceph_df.append('-m') 79 | ceph_df.append(args.monaddress) 80 | if args.conf: 81 | ceph_df.append('-c') 82 | ceph_df.append(args.conf) 83 | if args.id: 84 | ceph_df.append('--id') 85 | ceph_df.append(args.id) 86 | if args.name: 87 | ceph_df.append('--name') 88 | ceph_df.append(args.name) 89 | if args.keyring: 90 | ceph_df.append('--keyring') 91 | ceph_df.append(args.keyring) 92 | ceph_df.append('df') 93 | 94 | #print ceph_df 95 | 96 | # exec command 97 | p = subprocess.Popen(ceph_df,stdout=subprocess.PIPE,stderr=subprocess.PIPE) 98 | output, err = p.communicate() 99 | 100 | # parse output 101 | # print "DEBUG: output:", output 102 | # print "DEBUG: err:", err 103 | if output: 104 | output = output.decode('utf-8') 105 | # parse output 106 | # if detail switch was not set only show global values and compare to warning and critical 107 | # otherwise show space for pools too 108 | result=output.splitlines() 109 | # values for GLOBAL are in 3rd line of output 110 | globalline = result[2] 111 | globalvals = globalline.split() 112 | # Luminous vs Mimic output (27.3TiB vs 27.3 TiB) 113 | if len(globalvals) == 7: 114 | gv = [] 115 | gv.append("{}{}".format(globalvals[0], globalvals[1])) 116 | gv.append("{}{}".format(globalvals[2], globalvals[3])) 117 | gv.append("{}{}".format(globalvals[4], globalvals[5])) 118 | gv.append(globalvals[6]) 119 | globalvals = gv 120 | #print "XXX: globalvals: {} {}".format(len(globalvals), globalvals) 121 | # Nautilus output 122 | if len(globalvals) == 10: 123 | gv = [] 124 | gv.append("{}{}".format(globalvals[1], globalvals[2])) 125 | gv.append("{}{}".format(globalvals[3], globalvals[4])) 126 | gv.append("{}{}".format(globalvals[5], 
globalvals[6])) 127 | gv.append(globalvals[9]) 128 | globalvals = gv 129 | #print "XXX: globalvals: {} {}".format(len(globalvals), globalvals) 130 | 131 | # prepare pool values 132 | # pool output starts in line 4 with the bare word POOLS: followed by the output 133 | poollines = result[3:] 134 | 135 | if args.pool: 136 | for line in poollines: 137 | if args.pool in line: 138 | poolvals = line.split() 139 | # Luminous vs Mimic output (27.3TiB vs 27.3 TiB) 140 | if len(poolvals) == 8: 141 | pv = [] 142 | pv.append(poolvals[0]) # NAME 143 | pv.append(poolvals[1]) # ID 144 | pv.append("{}{}".format(poolvals[2], poolvals[3])) # USED 27.3 TiB 145 | pv.append(poolvals[4]) # %USED 146 | pv.append("{}{}".format(poolvals[5], poolvals[6])) # MAX AVAIL 27.3 TiB 147 | # pv.append(poolvals[7]) # OBJECTS 148 | poolvals = pv 149 | #print "XXX: poolvals: {} {}".format(len(poolvals), poolvals) 150 | # Nautilus output 151 | if len(poolvals) == 10: 152 | pv = [] 153 | pv.append(poolvals[0]) # NAME 154 | pv.append(poolvals[1]) # ID 155 | pv.append("{}{}".format(poolvals[2], poolvals[3])) # USED 27.3 TiB 156 | pv.append(poolvals[7]) # %USED 157 | pv.append("{}{}".format(poolvals[8], poolvals[9])) # MAX AVAIL 27.3 TiB 158 | # pv.append(poolvals[7]) # OBJECTS, not used 159 | poolvals = pv 160 | #print "XXX: poolvals: {} {}".format(len(poolvals), poolvals) 161 | # Octopus >= v15.2.8 (pgs added to ceph-df) 162 | if len(poolvals) == 11: 163 | pv = [] 164 | pv.append(poolvals[0]) # NAME 165 | pv.append(poolvals[1]) # ID 166 | #pv.append(poolvals[2]) # PGS, not used 167 | pv.append("{}{}".format(poolvals[3], poolvals[4])) # USED 27.3 TiB 168 | pv.append(poolvals[8]) # %USED 169 | pv.append("{}{}".format(poolvals[9], poolvals[10])) # MAX AVAIL 27.3 TiB 170 | # pv.append(poolvals[7]) # OBJECTS, not used 171 | poolvals = pv 172 | #print "XXX: poolvals: {} {}".format(len(poolvals), poolvals) 173 | 174 | 175 | pool_used = poolvals[2] 176 | pool_usage_percent = float(poolvals[3]) 177 | 
pool_available_space = poolvals[4] 178 | # pool_objects = float(poolvals[5]) # not used 179 | 180 | if pool_usage_percent > args.critical: 181 | print('CRITICAL: %s%% usage in Pool \'%s\' is above %s%% (%s used) | Usage=%s%%;%s;%s;;' % (pool_usage_percent, args.pool, args.critical, pool_used, pool_usage_percent, args.warn, args.critical)) 182 | return STATUS_ERROR 183 | if pool_usage_percent > args.warn: 184 | print('WARNING: %s%% usage in Pool \'%s\' is above %s%% (%s used) | Usage=%s%%;%s;%s;;' % (pool_usage_percent, args.pool, args.warn, pool_used, pool_usage_percent, args.warn, args.critical)) 185 | return STATUS_WARNING 186 | else: 187 | print('%s%% usage in Pool \'%s\' | Usage=%s%%;%s;%s;;' % (pool_usage_percent, args.pool, pool_usage_percent, args.warn, args.critical)) 188 | return STATUS_OK 189 | else: 190 | # print 'DEBUG:', globalvals 191 | # finally 4th element contains percentual value 192 | # print 'DEBUG USAGE:', globalvals[3] 193 | global_usage_percent = float(globalvals[3]) 194 | global_available_space = globalvals[1] 195 | global_total_space = globalvals[0] 196 | # print 'DEBUG WARNLEVEL:', args.warn 197 | # print 'DEBUG CRITICALLEVEL:', args.critical 198 | if global_usage_percent > args.critical: 199 | if args.detail: 200 | poollines.insert(0, '\n') 201 | poolout = '\n '.join(poollines) 202 | else: 203 | poolout = '' 204 | print('CRITICAL: global RAW usage of %s%% is above %s%% (%s of %s free)%s | Usage=%s%%;%s;%s;;' % (global_usage_percent, args.critical, global_available_space, global_total_space, poolout, global_usage_percent, args.warn, args.critical)) 205 | return STATUS_ERROR 206 | elif global_usage_percent > args.warn: 207 | if args.detail: 208 | poollines.insert(0, '\n') 209 | poolout = '\n '.join(poollines) 210 | else: 211 | poolout = '' 212 | print('WARNING: global RAW usage of %s%% is above %s%% (%s of %s free)%s | Usage=%s%%;%s;%s;;' % (global_usage_percent, args.warn, global_available_space, global_total_space, poolout, 
global_usage_percent, args.warn, args.critical)) 213 | return STATUS_WARNING 214 | else: 215 | print('RAW usage %s%% | Usage=%s%%;%s;%s;;' % (global_usage_percent, global_usage_percent, args.warn, args.critical)) 216 | return STATUS_OK 217 | 218 | #for 219 | elif err: 220 | # read only first line of error; decode first, since stderr is bytes on Python 3 221 | one_line = err.decode('utf-8').split('\n')[0] 222 | if '-1 ' in one_line: 223 | idx = one_line.rfind('-1 ') 224 | print('ERROR: %s: %s' % (ceph_exec, one_line[idx+len('-1 '):])) 225 | else: 226 | print(one_line) 227 | 228 | return STATUS_UNKNOWN 229 | 230 | 231 | if __name__ == "__main__": 232 | sys.exit(main()) 233 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 
26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. 
For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. 
If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. 
You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. 
(Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright [yyyy] [name of copyright owner] 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 194 | You may obtain a copy of the License at 195 | 196 | http://www.apache.org/licenses/LICENSE-2.0 197 | 198 | Unless required by applicable law or agreed to in writing, software 199 | distributed under the License is distributed on an "AS IS" BASIS, 200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 201 | See the License for the specific language governing permissions and 202 | limitations under the License. 203 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Nagios plugins for Ceph 2 | 3 | A collection of nagios plugins to monitor a [Ceph][] cluster. 4 | 5 | ## Authentication 6 | 7 | Ceph is normally configured to use [cephx] to authenticate its clients. 8 | 9 | To run `check_ceph_health` or the other plugins as user `nagios`, you have to create a dedicated keyring: 10 | 11 | root# ceph auth get-or-create client.nagios mon 'allow r' > ceph.client.nagios.keyring 12 | 13 | Then use this keyring with the plugin: 14 | 15 | nagios$ ./check_ceph_health --id nagios --keyring ceph.client.nagios.keyring 16 | 17 | ## check_ceph_health 18 | 19 | The `check_ceph_health` nagios plugin monitors the ceph cluster and reports its health. 20 | It can be filtered to only look at certain [health checks](https://docs.ceph.com/en/latest/rados/operations/health-checks/). 
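The filtering is driven by a regular expression passed with `--check`, and the pattern can be inverted with a negative lookahead. The sketch below only illustrates how such patterns select health check names; `filter_checks` is a hypothetical helper, and the use of `re.search` is an assumption rather than a description of the plugin's actual code.

```python
import re


def filter_checks(check_names, pattern):
    """Return the check names matched by the regular expression.

    Hypothetical helper for illustration; it assumes the pattern is
    applied to each health check name with re.search.
    """
    regex = re.compile(pattern)
    return [name for name in check_names if regex.search(name)]


# Health check names taken from the example output in this README.
active = ["MON_CLOCK_SKEW", "OBJECT_MISPLACED", "PG_DEGRADED"]

# A plain alternation keeps only the named checks.
selected = filter_checks(active, r"PG_DEGRADED|OBJECT_MISPLACED")

# A negative lookahead inverts the selection: a name matches only if
# neither alternative starts at any position in it.
others = filter_checks(active, r"^((?!PG_DEGRADED|OBJECT_MISPLACED).)*$")

print(selected)
print(others)
```

Quoting the pattern in single quotes on the command line keeps the shell from interpreting `|`, `(` and `!`.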
21 | 22 | ### Usage 23 | 24 | usage: check_ceph_health [-h] [-e EXE] [-c CONF] [-m MONADDRESS] [-n NAME] [-i ID] [-k KEYRING] [-w WHITELIST] [-d] 25 | 26 | 'ceph health' nagios plugin. 27 | 28 | optional arguments: 29 | -h, --help show this help message and exit 30 | -e EXE, --exe EXE ceph executable [/usr/bin/ceph] 31 | -c CONF, --conf CONF alternative ceph conf file 32 | -m MONADDRESS, --monaddress MONADDRESS 33 | ceph monitor address[:port] 34 | -i ID, --id ID ceph client id 35 | -n NAME, --name NAME ceph client name 36 | -k KEYRING, --keyring KEYRING 37 | ceph client keyring file 38 | --check CHECK regexp of which check(s) to check (luminous+) Can be 39 | inverted, e.g. '^((?!PG_DEGRADED|OBJECT_MISPLACED).)*$' 40 | -w, --whitelist REGEXP 41 | whitelist regexp for ceph health warnings 42 | -d, --detail exec 'ceph health detail' 43 | -V, --version show version and exit 44 | 45 | ### Example 46 | 47 | nagios$ ./check_ceph_health --name client.nagios --keyring ceph.client.nagios.keyring 48 | HEALTH WARNING: 1 pgs degraded; 1 pgs recovering; 1 pgs stuck unclean; recovery 4448/28924462 degraded (0.015%); 2/9857830 unfound (0.000%); 49 | nagios$ echo $? 
50 | 1 51 | nagios$ 52 | 53 | nagios$ ./check_ceph_health --id nagios --whitelist 'requests.are.blocked(\s)*32.sec' 54 | 55 | nagios$ ./check_ceph_health --id nagios 56 | WARNING: MON_CLOCK_SKEW( clock skew detected on mon.a ) 57 | OBJECT_MISPLACED( 1937172/695961284 objects misplaced (0.278%) ) 58 | PG_DEGRADED( Degraded data redundancy: 98/695961284 objects degraded (0.000%), 1 pg degraded ) 59 | 60 | nagios$ ./check_ceph_health --id nagios --check 'PG_DEGRADED|OBJECT_MISPLACED' 61 | WARNING: OBJECT_MISPLACED( 1937172/695961284 objects misplaced (0.278%) ) 62 | PG_DEGRADED( Degraded data redundancy: 98/695961284 objects degraded (0.000%), 1 pg degraded ) 63 | 64 | nagios$ ./check_ceph_health --id nagios --check '^((?!PG_DEGRADED|OBJECT_MISPLACED).)*$' 65 | WARNING: MON_CLOCK_SKEW( clock skew detected on mon.a ) 66 | 67 | 68 | ## check_ceph_mon 69 | 70 | The `check_ceph_mon` nagios plugin monitors an individual mon daemon, reporting its status. 71 | 72 | Possible result includes OK (up), WARN (missing). 73 | 74 | ### Usage 75 | 76 | usage: check_ceph_mon [-h] [-e EXE] [-c CONF] [-m MONADDRESS] [-i ID] 77 | [-k KEYRING] [-V] [-I MONID] 78 | 79 | 'ceph quorum_status' nagios plugin. 
80 | 81 | optional arguments: 82 | -h, --help show this help message and exit 83 | -e EXE, --exe EXE ceph executable [/usr/bin/ceph] 84 | -c CONF, --conf CONF alternative ceph conf file 85 | -m MONADDRESS, --monaddress MONADDRESS 86 | ceph monitor to use for queries (address[:port]) 87 | -i ID, --id ID ceph client id 88 | -k KEYRING, --keyring KEYRING 89 | ceph client keyring file 90 | -V, --version show version and exit 91 | -I MONID, --monid MONID 92 | mon ID to be checked for availability 93 | 94 | ### Example 95 | 96 | nagios$ ./check_ceph_mon -I node1 97 | MON OK 98 | 99 | nagios$ ./check_ceph_mon --monid node2 100 | MON WARN: no mon 'node2' found in quorum 101 | 102 | ## check_ceph_osd 103 | 104 | The `check_ceph_osd` nagios plugin monitors an individual osd daemon or host, reporting its status. 105 | 106 | Possible result includes OK (up), WARN (down or missing). 107 | 108 | ### Usage 109 | 110 | usage: check_ceph_osd [-h] [-e EXE] [-c CONF] [-m MONADDRESS] [-i ID] 111 | [-k KEYRING] [-V] -H HOST [-I OSDID] [-o] 112 | 113 | 'ceph osd' nagios plugin. 
114 | 115 | optional arguments: 116 | -h, --help show this help message and exit 117 | -e EXE, --exe EXE ceph executable [/usr/bin/ceph] 118 | -c CONF, --conf CONF alternative ceph conf file 119 | -m MONADDRESS, --monaddress MONADDRESS 120 | ceph monitor address[:port] 121 | -i ID, --id ID ceph client id 122 | -k KEYRING, --keyring KEYRING 123 | ceph client keyring file 124 | -V, --version show version and exit 125 | -H HOST, --host HOST osd host 126 | -I OSDID, --osdid OSDID 127 | osd id 128 | -o, --out check osds that are set OUT 129 | 130 | ### Example 131 | 132 | nagios$ ./check_ceph_osd -H 172.17.0.2 -I 0 133 | OSD OK 134 | 135 | nagios$ ./check_ceph_osd -H 172.17.0.2 -I 0 136 | OSD WARN: OSD.0 is down at 172.17.0.2 137 | 138 | nagios$ ./check_ceph_osd -H 172.17.0.2 -I 100 139 | OSD WARN: no OSD.100 found at host 172.17.0.2 140 | 141 | nagios$ ./check_ceph_osd -H 172.17.0.2 142 | OSD WARN: Down OSD on 172.17.0.2: osd.0 143 | 144 | ## check_ceph_rgw 145 | 146 | The `check_ceph_rgw` nagios plugin monitors a ceph rados gateway, reporting its status and buckets usage. 147 | 148 | Possible result includes OK (up), WARN (down or missing). 149 | 150 | ### Usage 151 | 152 | usage: check_ceph_rgw [-h] [-d] [-B] [-e EXE] [-c CONF] [-i ID] [-V] 153 | 154 | 'radosgw-admin bucket stats' nagios plugin. 
155 | 156 | optional arguments: 157 | -h, --help show this help message and exit 158 | -d, --detail output perf data for all buckets 159 | -B, --byte output perf data in Byte instead of KB 160 | -e EXE, --exe EXE radosgw-admin executable [/usr/bin/radosgw-admin] 161 | -c CONF, --conf CONF alternative ceph conf file 162 | -i ID, --id ID ceph client id 163 | -n NAME, --name NAME ceph client name 164 | -V, --version show version and exit 165 | 166 | ### Example 167 | 168 | nagios$ ./check_ceph_rgw 169 | RGW OK: 4 buckets, 102276 KB total | /=102276KB 170 | 171 | nagios$ ./check_ceph_rgw --detail --byte 172 | RGW OK: 4 buckets, 102276 KB total | /=104730624B bucket-test1=151552B bucket-test0=12288B bucket-test2=104566784B bucket-test=0B 173 | 174 | ## check_ceph_rgw_api 175 | 176 | The `check_ceph_rgw_api` nagios plugin monitors a ceph rados gateway, reporting 177 | its status and bucket usage. 178 | 179 | ##### Difference with `check_ceph_rgw`: 180 | 181 | `check_ceph_rgw` connects to the cluster, whereas `check_ceph_rgw_api` 182 | connects to radosgw directly via the 183 | [admin api](http://docs.ceph.com/docs/master/radosgw/adminops/). You can 184 | check each instance of radosgw or only one endpoint via proxy/balancer 185 | (or both). 186 | 187 | #### Possible results 188 | - OK - bucket info received from radosgw; 189 | - WARNING - connected, but wrong admin entry or usage caps; 190 | - UNKNOWN - can't connect to proxy/balancer or radosgw directly; 191 | 192 | #### Requirements 193 | 194 | 1. Install [requests-aws](//github.com/tax/python-requests-aws) python library: 195 | ``` 196 | pip install requests-aws 197 | ``` 198 | 199 | 2. Configure admin entry point (default is 'admin'): 200 | ``` 201 | rgw admin entry = "admin" 202 | ``` 203 | 204 | 3. Enable admin API (default is enabled): 205 | ``` 206 | rgw enable apis = "s3, admin" 207 | ``` 208 | 209 | 4. 
Add the capability `buckets=read` to the user that performs the checks; see the 210 | [Admin Guide](http://docs.ceph.com/docs/master/radosgw/admin/#add-remove-admin-capabilities) 211 | for more details. 212 | 213 | ### Usage 214 | 215 | usage: check_ceph_rgw_api [-h] -H HOST [-k] [-e ADMIN_ENTRY] -a ACCESS_KEY -s 216 | SECRET_KEY [-d] [-b] [-v] 217 | 218 | 'radosgw api bucket stats' nagios plugin. 219 | 220 | optional arguments: 221 | -h, --help show this help message and exit 222 | -H HOST, --host HOST Server URL for the radosgw api (example: 223 | http://objects.dreamhost.com/) 224 | -k, --insecure Allow insecure server connections when using SSL 225 | -e ADMIN_ENTRY, --admin_entry ADMIN_ENTRY 226 | The entry point for an admin request URL [default is 227 | 'admin'] 228 | -a ACCESS_KEY, --access_key ACCESS_KEY 229 | S3 access key 230 | -s SECRET_KEY, --secret_key SECRET_KEY 231 | S3 secret key 232 | -d, --detail output perf data for all buckets 233 | -b, --byte output perf data in Byte instead of KB 234 | -v, --version show version and exit 235 | 236 | ### Example 237 | 238 | nagios$ ./check_ceph_rgw_api -H https://objects.dreamhost.com/ -a JXUABTZZYHAFLCMF9VYV -s jjP8RDD0R156atS6ACSy2vNdJLdEPM0TJQ5jD1pw 239 | RGW OK: 1 buckets, 7696 KB total | /=7696KB 240 | 241 | nagios$ ./check_ceph_rgw_api -H objects.dreamhost.com -a JXUABTZZYHAFLCMF9VYV -s jjP8RDD0R156atS6ACSy2vNdJLdEPM0TJQ5jD1pw --detail --byte 242 | RGW OK: 1 buckets, 7696 KB total | /=7880704B k0ste=7880704B 243 | 244 | ## check_ceph_df 245 | 246 | The `check_ceph_df` nagios plugin monitors a ceph cluster, reporting its RAW capacity usage as a percentage, or the usage of a specific pool. 247 | 248 | Possible results are OK, WARN and CRITICAL. 249 | 250 | ### Usage 251 | 252 | usage: check_ceph_df [-h] [-e EXE] [-c CONF] [-m MONADDRESS] [-i ID] [-n NAME] 253 | [-k KEYRING] [-p POOL] [-d] [-W WARN] [-C CRITICAL] [-V] 254 | 255 | 'ceph df' nagios plugin. 
256 | 257 | optional arguments: 258 | -h, --help show this help message and exit 259 | -e EXE, --exe EXE ceph executable [/usr/bin/ceph] 260 | -c CONF, --conf CONF alternative ceph conf file 261 | -m MONADDRESS, --monaddress MONADDRESS 262 | ceph monitor address[:port] 263 | -i ID, --id ID ceph client id 264 | -n NAME, --name NAME ceph client name 265 | -k KEYRING, --keyring KEYRING 266 | ceph client keyring file 267 | -p POOL, --pool POOL ceph pool name 268 | -d, --detail show pool details on warn and critical 269 | -W WARN, --warn WARN warn above this percent RAW USED 270 | -C CRITICAL, --critical CRITICAL 271 | critical alert above this percent RAW USED 272 | -V, --version show version and exit 273 | 274 | ### Example 275 | 276 | nagios$ ./check_ceph_df -i nagios -k /etc/ceph/ceph.client.nagios.keyring -W 29.12 -C 30.22 -d 277 | RAW usage 28.36% 278 | 279 | nagios$ ./check_ceph_df -i nagios -k /etc/ceph/ceph.client.nagios.keyring -W 26.14 -C 30 280 | WARNING: global RAW usage of 28.36% is above 26.14% (783G of 1093G free) 281 | 282 | nagios$ ./check_ceph_df -i nagios -k /etc/ceph/ceph.client.nagios.keyring -W 60 -C 70 -p hdd 283 | CRITICAL: Pool 'hdd' usage of 71.71% is above 70.0% (9703G used) 284 | 285 | nagios$ ./check_ceph_df -i nagios -k /etc/ceph/ceph.client.nagios.keyring -W 60 -C 70 -p nvme 286 | CRITICAL: Pool 'nvme' usage of 76.08% is above 70.0% (223G used) 287 | 288 | nagios$ ./check_ceph_df -i nagios -k /etc/ceph/ceph.client.nagios.keyring -W 26.14 -C 30 -d 289 | WARNING: global RAW usage of 28.36% is above 26.14% (783G of 1093G free) 290 | 291 | POOLS: 292 | NAME ID USED %USED MAX AVAIL OBJECTS 293 | rbd 0 96137M 8.59 348G 24441 294 | cephfs_data 1 61785M 5.52 348G 99940 295 | cephfs_metadata 2 40380k 0 348G 8037 296 | libvirt-pool 3 145 0 348G 2 297 | 298 | ## check_ceph_mds 299 | 300 | The `check_ceph_mds` nagios plugin monitors an individual mds daemon, reporting its status. 
301 | 302 | Possible result includes OK, WARN (laggy) and Error (not found). 303 | 304 | ### Usage 305 | 306 | usage: check_ceph_mds [-h] [-e EXE] [-c CONF] [-m MONADDRESS] [-i ID] 307 | [-k KEYRING] [-V] -n NAME -f FILESYSTEM 308 | 309 | 'ceph mds stat' nagios plugin. 310 | 311 | optional arguments: 312 | -h, --help show this help message and exit 313 | -e EXE, --exe EXE ceph executable [/usr/bin/ceph] 314 | -c CONF, --conf CONF alternative ceph conf file 315 | -m MONADDRESS, --monaddress MONADDRESS 316 | ceph monitor to use for queries (address[:port]) 317 | -i ID, --id ID ceph client id 318 | -k KEYRING, --keyring KEYRING 319 | ceph client keyring file 320 | -V, --version show version and exit 321 | -n NAME, --name NAME mds daemon name 322 | -f FILESYSTEM, --filesystem FILESYSTEM 323 | mds filesystem name 324 | ### Example 325 | 326 | nagios$ ./check_ceph_mds -f cephfs -n ceph-mds-1 327 | MDS OK: MDS 'ceph-mds-1' is up:active 328 | 329 | nagios$ ./check_ceph_mds -f cephfs -n ceph-mds-2 330 | MDS OK: MDS 'ceph-mds-2' is up:standby 331 | 332 | nagios$ ./check_ceph_mds -f cephfs -n ceph-mds-1 333 | MDS WARN: MDS 'ceph-mds-1' is up:active (laggy or crashed) 334 | 335 | nagios$ ./check_ceph_mds -f cephfs -n ceph-mds-3 336 | MDS ERROR: MDS 'ceph-mds-3' is not found (offline?) 337 | 338 | ## check_ceph_mgr 339 | 340 | The `check_ceph_mgr` nagios plugin monitors the mgr. 341 | 342 | 343 | ### Usage 344 | 345 | usage: check_ceph_mgr [-h] [-e EXE] [-c CONF] [-m MONADDRESS] [-i ID] 346 | [-n NAME] [-k KEYRING] [-V] 347 | 348 | 'ceph mgr dump' nagios plugin. 
349 | 350 | optional arguments: 351 | -h, --help show this help message and exit 352 | -e EXE, --exe EXE ceph executable [/usr/bin/ceph] 353 | -c CONF, --conf CONF alternative ceph conf file 354 | -m MONADDRESS, --monaddress MONADDRESS 355 | ceph monitor to use for queries (address[:port]) 356 | -i ID, --id ID ceph client id 357 | -n NAME, --name NAME ceph client name 358 | -k KEYRING, --keyring KEYRING 359 | ceph client keyring file 360 | -V, --version show version and exit 361 | 362 | ### Example 363 | 364 | nagios$ ./check_ceph_mgr 365 | MGR OK: active: zhdk0013, standbys: zhdk0009, zhdk0025 366 | 367 | ## check_ceph_osd_db 368 | 369 | The `check_ceph_osd_db` checks the percentage usage of the BlueStore DB 370 | for the OSD and reports it as critical if it's above the threshold. 371 | 372 | [ceph]: http://www.ceph.com 373 | [cephx]: http://ceph.com/docs/master/rados/operations/authentication/ 374 | --------------------------------------------------------------------------------