├── LICENSE.txt ├── README.md ├── setup.cfg ├── setup.py └── supervisor_checks ├── __init__.py ├── bin ├── __init__.py ├── complex_check.py ├── cpu_check.py ├── file_check.py ├── http_check.py ├── memory_check.py ├── tcp_check.py └── xmlrpc_check.py ├── check_modules ├── __init__.py ├── base.py ├── cpu.py ├── file.py ├── http.py ├── memory.py ├── tcp.py └── xmlrpc.py ├── check_runner.py ├── compat.py ├── errors.py └── utils.py /LICENSE.txt: -------------------------------------------------------------------------------- 1 | Copyright 2015 Volodymyr Kuznetsov 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining 4 | a copy of this software and associated documentation files (the 5 | "Software"), to deal in the Software without restriction, including 6 | without limitation the rights to use, copy, modify, merge, publish, 7 | distribute, sublicense, and/or sell copies of the Software, and to 8 | permit persons to whom the Software is furnished to do so, subject to 9 | the following conditions: 10 | 11 | The above copyright notice and this permission notice shall be 12 | included in all copies or substantial portions of the Software. 13 | 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 15 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 16 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 17 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE 18 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 19 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION 20 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Supervisor Health Checks 2 | 3 | Framework to build health checks for Supervisor-based services. 
4 | 5 | Health check programs are supposed to run as event listeners in a [Supervisor](http://supervisord.org) 6 | environment. On check failure, Supervisor will attempt to restart the monitored 7 | process. 8 | 9 | Here's a typical configuration example: 10 | 11 | [eventlistener:example_check] 12 | command=python 13 | stderr_logfile = /var/log/supervisor/supervisor_example_check-stderr.log 14 | stdout_logfile = /var/log/supervisor/supervisor_example_check-stdout.log 15 | events=TICK_60 16 | 17 | Here's the list of check programs the package provides out of the box: 18 | 19 | * _supervisor_http_check_: process check based on an HTTP query. 20 | * _supervisor_tcp_check_: process check based on TCP connection status. 21 | * _supervisor_xmlrpc_check_: process check based on a call to an XML-RPC server. 22 | * _supervisor_memory_check_: process check based on the amount of memory consumed by the process. 23 | * _supervisor_cpu_check_: process check based on CPU percent usage within a time interval. 24 | * _supervisor_file_check_: process check based on a file update timeout (UNIX only). 25 | * _supervisor_complex_check_: complex check (runs multiple checks at once). 26 | 27 | For now, it is developed for, and primarily supposed to work with, Python 3 and the 28 | Supervisor 4 branch. There's nominal Python 2.x support, but it is not tested. 29 | 30 | ## Installation 31 | 32 | Install and update using [pip](https://pypi.org/project/pip/): 33 | 34 | pip install supervisor_checks 35 | 36 | ## Developing Custom Check Modules 37 | 38 | While the framework provides a good set of ready-for-use health check classes, 39 | it can easily be extended with application-specific custom health checks. 40 | 41 | To implement a custom check class, inherit from the 42 | _check_modules.base.BaseCheck_ class: 43 | 44 | ```python 45 | class BaseCheck(object): 46 | """Base class for checks. 47 | """ 48 | 49 | NAME = None 50 | 51 | def __call__(self, process_spec): 52 | """Run single check.
53 | 54 | :param dict process_spec: process specification dictionary as returned 55 | by the SupervisorD API. 56 | 57 | :return: True if the check succeeded, otherwise False. If the check failed, 58 | the monitored process will be automatically restarted. 59 | 60 | :rtype: bool 61 | """ 62 | 63 | def _validate_config(self): 64 | """Method may be implemented in subclasses. Should return None or 65 | raise InvalidCheckConfig if the configuration is invalid. 66 | 67 | Here's a typical example of a parameter check: 68 | 69 | if 'url' not in self._config: 70 | raise errors.InvalidCheckConfig( 71 | 'Required `url` parameter is missing in %s check config.' % ( 72 | self.NAME,)) 73 | """ 74 | ``` 75 | 76 | Here's an example of adding a custom check: 77 | 78 | ```python 79 | from supervisor_checks.check_modules import base 80 | from supervisor_checks import check_runner 81 | 82 | class ExampleCheck(base.BaseCheck): 83 | 84 | NAME = 'example' 85 | 86 | def __call__(self, process_spec): 87 | 88 | # Always return True 89 | return True 90 | 91 | if __name__ == '__main__': 92 | 93 | check_runner.CheckRunner(  # process_name is None, so the process group is used 94 | 'example_check', 'some_process_group', None, [(ExampleCheck, {})]).run() 95 | ``` 96 | 97 | ## Out-of-box checks 98 | 99 | ### HTTP Check 100 | 101 | Process check based on an HTTP query. 102 | 103 | #### CLI 104 | 105 | $ /usr/local/bin/supervisor_http_check -h 106 | usage: supervisor_http_check [-h] -n CHECK_NAME -g PROCESS_GROUP -u URL -p 107 | PORT [-t TIMEOUT] [-r NUM_RETRIES] 108 | 109 | Run HTTP check program. 110 | 111 | optional arguments: 112 | -h, --help show this help message and exit 113 | -n CHECK_NAME, --check-name CHECK_NAME 114 | Health check name. 115 | -g PROCESS_GROUP, --process-group PROCESS_GROUP 116 | Supervisor process group name. 117 | -N PROCESS_NAME, --process-name PROCESS_NAME 118 | Supervisor process name.
Process group argument is 119 | ignored if this is passed in 120 | -u URL, --url URL HTTP check url 121 | -m METHOD, --method METHOD 122 | HTTP request method (GET, POST, PUT...) 123 | -j JSON, --json JSON HTTP json body, auto sets content-type header to 124 | application/json 125 | -b BODY, --body BODY HTTP body, will be ignored if json body pass in 126 | -H HEADERS, --headers HEADERS 127 | HTTP headers as json 128 | -U USERNAME, --username USERNAME 129 | HTTP check username 130 | -P PASSWORD, --password PASSWORD 131 | HTTP check password 132 | -p PORT, --port PORT HTTP port to query. Can be integer or regular 133 | expression which will be used to extract port from a 134 | process name. 135 | -t TIMEOUT, --timeout TIMEOUT 136 | Connection timeout. Default: 15 137 | -r NUM_RETRIES, --num-retries NUM_RETRIES 138 | Connection retries. Default: 2 139 | 140 | #### Configuration Examples 141 | 142 | Query a process running on port 8080 using URL _/ping_: 143 | 144 | [eventlistener:example_check] 145 | command=/usr/local/bin/supervisor_http_check -g example_service -n example_check -u /ping -t 30 -r 3 -p 8080 146 | events=TICK_60 147 | 148 | Query a process group using URL _/ping_. Each process listens on its own port. 149 | Each process name has the form _some-process-name\_port_, so the port number can 150 | be extracted with a regular expression: 151 | 152 | [eventlistener:example_check] 153 | command=/usr/local/bin/supervisor_http_check -g example_service -n example_check -u /ping -t 30 -r 3 -p ".+_(\\d+)" 154 | events=TICK_60 155 | 156 | 157 | ### TCP Check 158 | 159 | Process check based on TCP connection status. 160 | 161 | #### CLI 162 | 163 | $ /usr/local/bin/supervisor_tcp_check -h 164 | usage: supervisor_tcp_check [-h] -n CHECK_NAME -g PROCESS_GROUP -p PORT 165 | [-t TIMEOUT] [-r NUM_RETRIES] 166 | 167 | Run TCP check program.
168 | 169 | optional arguments: 170 | -h, --help show this help message and exit 171 | -n CHECK_NAME, --check-name CHECK_NAME 172 | Check name. 173 | -g PROCESS_GROUP, --process-group PROCESS_GROUP 174 | Supervisor process group name. 175 | -N PROCESS_NAME, --process-name PROCESS_NAME 176 | Supervisor process name. Process group argument is 177 | ignored if this is passed in 178 | -p PORT, --port PORT TCP port to query. Can be integer or regular 179 | expression which will be used to extract port from a 180 | process name. 181 | -t TIMEOUT, --timeout TIMEOUT 182 | Connection timeout. Default: 15 183 | -r NUM_RETRIES, --num-retries NUM_RETRIES 184 | Connection retries. Default: 2 185 | 186 | #### Configuration Examples 187 | 188 | Connect to a process running on port 8080: 189 | 190 | [eventlistener:example_check] 191 | command=/usr/local/bin/supervisor_tcp_check -g example_service -n example_check -t 30 -r 3 -p 8080 192 | events=TICK_60 193 | 194 | Query a process group in which each process listens on its own port. 195 | Each process name has the form _some-process-name\_port_, so the port number can 196 | be extracted with a regular expression: 197 | 198 | [eventlistener:example_check] 199 | command=/usr/local/bin/supervisor_tcp_check -g example_service -n example_check -t 30 -r 3 -p ".+_(\\d+)" 200 | events=TICK_60 201 | 202 | 203 | ### XML-RPC Check 204 | 205 | Process check based on a call to an XML-RPC server. 206 | 207 | #### CLI 208 | 209 | $ /usr/local/bin/supervisor_xmlrpc_check -h 210 | usage: supervisor_xmlrpc_check [-h] -n CHECK_NAME -g PROCESS_GROUP [-u URL] 211 | [-s SOCK_PATH] [-S SOCK_DIR] [-p PORT] 212 | [-r NUM_RETRIES] 213 | 214 | Run XML RPC check program. 215 | 216 | optional arguments: 217 | -h, --help show this help message and exit 218 | -n CHECK_NAME, --check-name CHECK_NAME 219 | Health check name. 220 | -g PROCESS_GROUP, --process-group PROCESS_GROUP 221 | Supervisor process group name.
222 | -N PROCESS_NAME, --process-name PROCESS_NAME 223 | Supervisor process name. Process group argument is 224 | ignored if this is passed in 225 | -u URL, --url URL XML RPC check url 226 | -s SOCK_PATH, --socket-path SOCK_PATH 227 | Full path to XML RPC server local socket 228 | -S SOCK_DIR, --socket-dir SOCK_DIR 229 | Path to XML RPC server socket directory. Socket name 230 | will be constructed using process name: 231 | .sock. 232 | -m METHOD, --method METHOD 233 | XML RPC method name. Default is status 234 | -p PORT, --port PORT Port to query. Can be integer or regular 235 | expression which will be used to extract port from a 236 | process name. 237 | -r NUM_RETRIES, --num-retries NUM_RETRIES 238 | Connection retries. Default: 2 239 | 240 | #### Configuration Examples 241 | 242 | Call the process's XML-RPC server listening on port 8080, URL /status, RPC method get_status: 243 | 244 | [eventlistener:example_check] 245 | command=/usr/local/bin/supervisor_xmlrpc_check -g example_service -n example_check -r 3 -p 8080 -u /status -m get_status 246 | events=TICK_60 247 | 248 | Call the process's XML-RPC server listening on a UNIX socket: 249 | 250 | [eventlistener:example_check] 251 | command=/usr/local/bin/supervisor_xmlrpc_check -g example_service -n example_check -r 3 -s /var/run/example.sock -m get_status 252 | events=TICK_60 253 | 254 | Call the XML-RPC servers of a process group, each listening on a different UNIX socket. In this 255 | case the socket directory must be specified; each process socket name will be formed as .sock: 256 | 257 | [eventlistener:example_check] 258 | command=/usr/local/bin/supervisor_xmlrpc_check -g example_service -n example_check -r 3 -S /var/run/ -m get_status 259 | events=TICK_60 260 | 261 | ### Memory Check 262 | 263 | Process check based on the amount of memory consumed by the process.
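What a memory check measures can be illustrated with a minimal sketch. It is not the package's actual `MemoryCheck` implementation. It assumes a `psutil.Process`-like object that exposes `memory_info().rss` (in bytes) and `children(recursive=True)`. The helper names `total_rss_kb` and `memory_ok` are hypothetical.

```python
def total_rss_kb(proc, cumulative=False):
    """Return the resident set size of ``proc`` in KB.

    ``proc`` is assumed to behave like ``psutil.Process``:
    ``memory_info().rss`` is in bytes and ``children(recursive=True)``
    returns every descendant process.
    """
    procs = [proc]
    if cumulative:
        # Same idea as the -c/--cumulative flag: include children,
        # grandchildren, and so on.
        procs.extend(proc.children(recursive=True))
    return sum(p.memory_info().rss for p in procs) // 1024


def memory_ok(proc, max_rss_kb, cumulative=False):
    """Hypothetical check: passes while usage does not exceed max_rss_kb."""
    return total_rss_kb(proc, cumulative) <= max_rss_kb
```

With psutil installed, `memory_ok(psutil.Process(pid), 102400, cumulative=True)` would roughly correspond to the `-m 102400 -c` configuration example. A production version should also catch `psutil.NoSuchProcess` for children that exit between being listed and being measured.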
264 | 265 | #### CLI 266 | 267 | $ /usr/local/bin/supervisor_memory_check -h 268 | usage: supervisor_memory_check [-h] -n CHECK_NAME -g PROCESS_GROUP -m MAX_RSS 269 | [-c] 270 | 271 | Run memory check program. 272 | 273 | optional arguments: 274 | -h, --help show this help message and exit 275 | -n CHECK_NAME, --check-name CHECK_NAME 276 | Health check name. 277 | -g PROCESS_GROUP, --process-group PROCESS_GROUP 278 | Supervisor process group name. 279 | -N PROCESS_NAME, --process-name PROCESS_NAME 280 | Supervisor process name. Process group argument is 281 | ignored if this is passed in 282 | -m MAX_RSS, --msx-rss MAX_RSS 283 | Maximum memory allowed to use by process, KB. 284 | -c, --cumulative 285 | Recursively calculate memory used by all process 286 | children. 287 | 288 | #### Configuration Examples 289 | 290 | Restart the process if the total amount of memory consumed by the process and all its 291 | children is greater than 100 MB (102400 KB): 292 | 293 | [eventlistener:example_check] 294 | command=/usr/local/bin/supervisor_memory_check -n example_check -m 102400 -c -g example_service 295 | events=TICK_60 296 | 297 | ### CPU Check 298 | 299 | Process check based on CPU percent usage within a specified time interval. 300 | 301 | #### CLI 302 | 303 | $ /usr/local/bin/supervisor_cpu_check -h 304 | usage: supervisor_cpu_check [-h] -n CHECK_NAME -g PROCESS_GROUP -p MAX_CPU -i INTERVAL 305 | 306 | Run CPU check program. 307 | 308 | optional arguments: 309 | -h, --help show this help message and exit 310 | -n CHECK_NAME, --check-name CHECK_NAME 311 | Health check name. 312 | -g PROCESS_GROUP, --process-group PROCESS_GROUP 313 | Supervisor process group name. 314 | -N PROCESS_NAME, --process-name PROCESS_NAME 315 | Supervisor process name. Process group argument is 316 | ignored if this is passed in 317 | -p MAX_CPU, --max-cpu-percent MAX_CPU 318 | Maximum CPU percent usage allowed to use by process 319 | within time interval.
320 | -i INTERVAL, --interval INTERVAL 321 | How long process is allowed to use CPU over threshold, 322 | seconds. 323 | 324 | 325 | #### Configuration Examples 326 | 327 | Restart the process when it consumes more than 100% CPU within 30 minutes: 328 | 329 | [eventlistener:example_check] 330 | command=/usr/local/bin/supervisor_cpu_check -n example_check -p 100 -i 1800 -g example_service 331 | events=TICK_60 332 | 333 | 334 | ### File Check 335 | 336 | Process check based on a file update timeout. 337 | 338 | #### CLI 339 | 340 | $ /usr/local/bin/supervisor_file_check -h 341 | usage: supervisor_file_check [-h] -n CHECK_NAME [-g PROCESS_GROUP] [-N PROCESS_NAME] -t TIMEOUT [-x] [-f FILEPATH] [-d ROOT_DIR] 342 | 343 | Run File check program. 344 | 345 | options: 346 | -h, --help show this help message and exit 347 | -n CHECK_NAME, --check-name CHECK_NAME 348 | Health check name. 349 | -g PROCESS_GROUP, --process-group PROCESS_GROUP 350 | Supervisor process group name. 351 | -N PROCESS_NAME, --process-name PROCESS_NAME 352 | Supervisor process name. Process group argument is ignored if this is passed in 353 | -t TIMEOUT, --timeout TIMEOUT 354 | Timeout in seconds after no file change a process is considered dead. 355 | -x, --fail-on-error Fail the health check on any error. 356 | -f FILEPATH, --filepath FILEPATH 357 | Filepath of file to check (default: 358 | %(root_directory)/%(process_group)s-%(process_name)s-%(process_pid)s-*) 359 | -d ROOT_DIR, --root-dir ROOT_DIR 360 | Root Directory of Notification Files (default: tempfile.gettempdir()) 361 | #### Configuration Examples 362 | 363 | Perform passive health checks using the default root directory, allowing at most a 30-second delay between health notifications.
364 | 365 | [eventlistener:example_check] 366 | command=/usr/local/bin/supervisor_file_check -n example_check -t 30 367 | events=TICK_60 368 | 369 | #### NotificationFile Utility Class 370 | 371 | To simplify working with this health check module, a utility class is provided that updates the notification file associated with the current Supervisor subprocess, signalling a heartbeat. 372 | 373 | ```python 374 | from supervisor_checks.utils import NotificationFile 375 | 376 | tmp = NotificationFile() 377 | 378 | while True: 379 | tmp.notify()  # update the notification file (heartbeat) 380 | # do some work here 381 | ``` 382 | 383 | ### Complex Check 384 | 385 | Complex check (runs multiple checks at once). 386 | 387 | #### CLI 388 | 389 | $ /usr/local/bin/supervisor_complex_check -h 390 | usage: supervisor_complex_check [-h] -n CHECK_NAME -g PROCESS_GROUP -c 391 | CHECK_CONFIG 392 | 393 | Run SupervisorD check program. 394 | 395 | optional arguments: 396 | -h, --help show this help message and exit 397 | -n CHECK_NAME, --check-name CHECK_NAME 398 | Health check name. 399 | -g PROCESS_GROUP, --process-group PROCESS_GROUP 400 | Supervisor process group name. 401 | -N PROCESS_NAME, --process-name PROCESS_NAME 402 | Supervisor process name. Process group argument is 403 | ignored if this is passed in 404 | -c CHECK_CONFIG, --check-config CHECK_CONFIG 405 | Check config in JSON format 406 | 407 | #### Example configuration 408 | 409 | Here's an example configuration using memory and HTTP checks: 410 | 411 | [eventlistener:example_check] 412 | command=/usr/local/bin/supervisor_complex_check -n example_check -g example_service -c '{"memory":{"cumulative":true,"max_rss":4194304},"http":{"timeout":15,"port":8090,"url":"\/ping","num_retries":3}}' 413 | events=TICK_60 414 | 415 | 416 | ## Acknowledgement 417 | 418 | This is inspired by the [Superlance](https://superlance.readthedocs.org/en/latest/) plugin package.
419 | 420 | However, while [Superlance](https://superlance.readthedocs.org/en/latest/) is essentially a set 421 | of feature-rich health check programs, the `supervisor_checks` package focuses mostly on providing 422 | a framework for easily implementing application-specific health checks of any complexity. 423 | 424 | ## Bug reports 425 | 426 | Please file here: 427 | 428 | Or contact me directly: 429 | 430 | 431 | 433 | 434 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [metadata] 2 | description-file=README.md 3 | 4 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | from setuptools import setup, find_packages 4 | 5 | py_version = sys.version_info[:2] 6 | 7 | if py_version < (2, 6): 8 | raise RuntimeError('On Python 2, this package requires Python 2.6 or later') 9 | elif (3, 0) <= py_version < (3, 2): 10 | raise RuntimeError('On Python 3, this package requires Python 3.2 or later') 11 | 12 | 13 | install_requires = ['psutil', 14 | # 'supervisor>=4.0.0', temporarily disabled 15 | ] 16 | 17 | if py_version < (3, 2): 18 | install_requires.append('futures') 19 | 20 | tests_require = install_requires + [] 21 | 22 | setup( 23 | name='supervisor_checks', 24 | packages=find_packages(), 25 | version='0.8.1', 26 | description='Framework to build health checks for Supervisor-based services.', 27 | author='Vovan Kuznetsov', 28 | author_email='vovanec@gmail.com', 29 | maintainer_email='vovanec@gmail.com', 30 | url='https://github.com/vovanec/supervisor_checks', 31 | download_url='https://github.com/vovanec/supervisor_checks/tarball/0.8.1', 32 | keywords=['supervisor', 'event', 'listener', 'eventlistener', 33 | 'http', 'memory', 'xmlrpc', 'health', 'check', 'monitor', 'cpu'], 34 | 
license='MIT', 35 | classifiers=['License :: OSI Approved :: MIT License', 36 | 'Development Status :: 4 - Beta', 37 | 'Intended Audience :: Developers', 38 | 'Operating System :: POSIX', 39 | 'Topic :: System :: Boot', 40 | 'Topic :: System :: Monitoring', 41 | 'Topic :: System :: Systems Administration', 42 | 'Programming Language :: Python', 43 | 'Programming Language :: Python :: 2', 44 | 'Programming Language :: Python :: 2.6', 45 | 'Programming Language :: Python :: 2.7', 46 | 'Programming Language :: Python :: 3', 47 | 'Programming Language :: Python :: 3.2', 48 | 'Programming Language :: Python :: 3.3', 49 | 'Programming Language :: Python :: 3.4'], 50 | install_requires=install_requires, 51 | tests_require=tests_require, 52 | test_suite='nose.collector', 53 | extras_require={ 54 | 'test': tests_require, 55 | }, 56 | entry_points={ 57 | 'console_scripts': [ 58 | 'supervisor_memory_check=supervisor_checks.bin.memory_check:main', 59 | 'supervisor_cpu_check=supervisor_checks.bin.cpu_check:main', 60 | 'supervisor_http_check=supervisor_checks.bin.http_check:main', 61 | 'supervisor_tcp_check=supervisor_checks.bin.tcp_check:main', 62 | 'supervisor_xmlrpc_check=supervisor_checks.bin.xmlrpc_check:main', 63 | 'supervisor_complex_check=supervisor_checks.bin.complex_check:main', 64 | 'supervisor_file_check=supervisor_checks.bin.file_check:main'] 65 | } 66 | ) 67 | 68 | -------------------------------------------------------------------------------- /supervisor_checks/__init__.py: -------------------------------------------------------------------------------- 1 | """Framework to build health checks for Supervisor-based services. 2 | 3 | Health check programs are supposed to run as event listeners in Supervisor 4 | environment. 
Here's a typical configuration example: 5 | 6 | [eventlistener:example_check] 7 | command=/usr/local/bin/supervisor_example_check 8 | stderr_logfile = /var/log/supervisor/supervisor_example_check-stderr.log 9 | stdout_logfile = /var/log/supervisor/supervisor_example_check-stdout.log 10 | events=TICK_60 11 | 12 | While the framework provides a set of ready-for-use health check classes 13 | (tcp, http, xmlrpc, memory, etc.), it can easily be extended by adding custom 14 | health checks. Here's a really simple example of adding a custom check: 15 | 16 | from supervisor_checks.check_modules import base 17 | from supervisor_checks import check_runner 18 | 19 | class ExampleCheck(base.BaseCheck): 20 | 21 | NAME = 'example' 22 | 23 | def __call__(self, process_spec): 24 | 25 | # Always return True 26 | 27 | return True 28 | 29 | check_runner.CheckRunner(  # process_name is None, so the process group is used 30 | 'example_check', 'some_process_group', None, [(ExampleCheck, {})]).run() 31 | """ 32 | 33 | __author__ = 'vovanec@gmail.com' 34 | -------------------------------------------------------------------------------- /supervisor_checks/bin/__init__.py: -------------------------------------------------------------------------------- 1 | """Executable scripts. 2 | """ 3 | 4 | __author__ = 'vovanec@gmail.com' 5 | -------------------------------------------------------------------------------- /supervisor_checks/bin/complex_check.py: -------------------------------------------------------------------------------- 1 | #! 
/usr/bin/env python3 2 | 3 | r"""Example configuration: 4 | 5 | [eventlistener:example_check] 6 | command=/usr/local/bin/supervisor_complex_check -n example_check -g example_service -c '{"memory":{"cumulative":true,"max_rss":4194304},"http":{"timeout":15,"port":8090,"url":"\/ping","num_retries":3}}' 7 | events=TICK_60 8 | """ 9 | 10 | 11 | import argparse 12 | import json 13 | import sys 14 | 15 | from supervisor_checks import check_runner 16 | from supervisor_checks.check_modules import cpu 17 | from supervisor_checks.check_modules import http 18 | from supervisor_checks.check_modules import memory 19 | from supervisor_checks.check_modules import tcp 20 | from supervisor_checks.check_modules import xmlrpc 21 | 22 | __author__ = 'vovanec@gmail.com' 23 | 24 | 25 | CHECK_CLASSES = {http.HTTPCheck.NAME: http.HTTPCheck, 26 | memory.MemoryCheck.NAME: memory.MemoryCheck, 27 | tcp.TCPCheck.NAME: tcp.TCPCheck, 28 | xmlrpc.XMLRPCCheck.NAME: xmlrpc.XMLRPCCheck, 29 | cpu.CPUCheck.NAME: cpu.CPUCheck} 30 | 31 | 32 | def _make_argument_parser(): 33 | """Create the option parser. 34 | """ 35 | 36 | parser = argparse.ArgumentParser( 37 | description='Run SupervisorD check program.') 38 | parser.add_argument('-n', '--check-name', dest='check_name', 39 | type=str, required=True, default=None, 40 | help='Health check name.') 41 | parser.add_argument('-g', '--process-group', dest='process_group', 42 | type=str, default=None, 43 | help='Supervisor process group name.') 44 | parser.add_argument('-N', '--process-name', dest='process_name', 45 | type=str, default=None, 46 | help='Supervisor process name. 
Process group argument is ignored if this ' + 47 | 'is passed in') 48 | parser.add_argument('-c', '--check-config', dest='check_config', type=str, 49 | help='Check config JSON', required=True, default=None) 50 | 51 | return parser 52 | 53 | 54 | def main(): 55 | 56 | arg_parser = _make_argument_parser() 57 | args = arg_parser.parse_args() 58 | 59 | checks_config_dict = json.loads(args.check_config) 60 | if not isinstance(checks_config_dict, dict): 61 | raise ValueError('Check config must be dictionary type!') 62 | 63 | checks_config = [] 64 | for check_name, check_config in checks_config_dict.items(): 65 | checks_config.append((CHECK_CLASSES[check_name], check_config)) 66 | 67 | return check_runner.CheckRunner( 68 | args.check_name, args.process_group, args.process_name, checks_config).run() 69 | 70 | 71 | if __name__ == '__main__': 72 | 73 | sys.exit(main()) 74 | -------------------------------------------------------------------------------- /supervisor_checks/bin/cpu_check.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python3 2 | 3 | """Example configuration(restart process when it consumes more than 4 | 100% CPU within 30 minutes): 5 | 6 | [eventlistener:example_check] 7 | command=/usr/local/bin/supervisor_cpu_check -n example_check -p 100 -i 1800 -g example_service 8 | events=TICK_60 9 | """ 10 | 11 | import argparse 12 | import sys 13 | 14 | from supervisor_checks import check_runner 15 | from supervisor_checks.check_modules import cpu 16 | 17 | 18 | __author__ = 'vovanec@gmail.com' 19 | 20 | 21 | def _make_argument_parser(): 22 | """Create the option parser. 
23 | """ 24 | 25 | parser = argparse.ArgumentParser( 26 | description='Run CPU check program.') 27 | parser.add_argument('-n', '--check-name', dest='check_name', 28 | type=str, required=True, default=None, 29 | help='Health check name.') 30 | parser.add_argument('-g', '--process-group', dest='process_group', 31 | type=str, default=None, 32 | help='Supervisor process group name.') 33 | parser.add_argument('-N', '--process-name', dest='process_name', 34 | type=str, default=None, 35 | help='Supervisor process name. Process group argument is ignored if this ' + 36 | 'is passed in') 37 | parser.add_argument( 38 | '-p', '--max-cpu-percent', dest='max_cpu', type=int, required=True, 39 | help='Maximum CPU percent usage allowed to use by process ' 40 | 'within time interval.') 41 | 42 | parser.add_argument( 43 | '-i', '--interval', dest='interval', type=int, required=True, 44 | help='How long process is allowed to use CPU over threshold, seconds.') 45 | 46 | return parser 47 | 48 | 49 | def main(): 50 | 51 | arg_parser = _make_argument_parser() 52 | args = arg_parser.parse_args() 53 | 54 | checks_config = [(cpu.CPUCheck, {'max_cpu': args.max_cpu, 55 | 'interval': args.interval})] 56 | 57 | return check_runner.CheckRunner( 58 | args.check_name, args.process_group, args.process_name, checks_config).run() 59 | 60 | 61 | if __name__ == '__main__': 62 | 63 | sys.exit(main()) 64 | -------------------------------------------------------------------------------- /supervisor_checks/bin/file_check.py: -------------------------------------------------------------------------------- 1 | 2 | import argparse 3 | import sys 4 | 5 | from supervisor_checks import check_runner 6 | from supervisor_checks.check_modules import file 7 | 8 | 9 | def _make_argument_parser(): 10 | """Create the option parser.
11 | """ 12 | 13 | parser = argparse.ArgumentParser( 14 | description='Run File check program.') 15 | parser.add_argument('-n', '--check-name', dest='check_name', 16 | type=str, required=True, default=None, 17 | help='Health check name.') 18 | parser.add_argument('-g', '--process-group', dest='process_group', 19 | type=str, default=None, 20 | help='Supervisor process group name.') 21 | parser.add_argument('-N', '--process-name', dest='process_name', 22 | type=str, default=None, 23 | help='Supervisor process name. Process group argument is ignored if this ' + 24 | 'is passed in') 25 | parser.add_argument( 26 | '-t', '--timeout', dest='timeout', type=int, required=True, 27 | help='Timeout in seconds after no file change a process is considered dead.') 28 | parser.add_argument("-x", "--fail-on-error", dest="fail_on_error", action="store_true", help="Fail the health check on any error.") 29 | parser.add_argument("-f", "--filepath", dest="filepath", type=str, default=None, help="Filepath of file to check (default: %%(root_directory)/%%(process_group)s-%%(process_name)s-%%(process_pid)s-*)") 30 | parser.add_argument("-d", "--root-dir", dest="root_dir", type=str, default=None, help="Root Directory of Notification Files (default: tempfile.gettempdir())") 31 | return parser 32 | 33 | 34 | def main(): 35 | 36 | arg_parser = _make_argument_parser() 37 | args = arg_parser.parse_args() 38 | 39 | checks_config = [(file.FileCheck, {'timeout': args.timeout, 'fail_on_error': args.fail_on_error, 'filepath': args.filepath, 'root_dir': args.root_dir})] 40 | return check_runner.CheckRunner( 41 | args.check_name, args.process_group, args.process_name, checks_config).run() 42 | 43 | 44 | if __name__ == '__main__': 45 | 46 | sys.exit(main()) 47 | -------------------------------------------------------------------------------- /supervisor_checks/bin/http_check.py: -------------------------------------------------------------------------------- 1 | #! 
/usr/bin/env python3 2 | 3 | """Example configuration: 4 | 5 | [eventlistener:example_check] 6 | command=/usr/local/bin/supervisor_http_check -g example_service -n example_check -u /ping -t 30 -r 3 -p 8080 7 | events=TICK_60 8 | """ 9 | 10 | import argparse 11 | import json 12 | import sys 13 | 14 | from supervisor_checks import check_runner 15 | from supervisor_checks.check_modules import http 16 | 17 | __author__ = 'vovanec@gmail.com' 18 | 19 | 20 | def _make_argument_parser(): 21 | """Create the option parser. 22 | """ 23 | 24 | parser = argparse.ArgumentParser( 25 | description='Run HTTP check program.') 26 | parser.add_argument('-n', '--check-name', dest='check_name', 27 | type=str, required=True, default=None, 28 | help='Health check name.') 29 | parser.add_argument('-g', '--process-group', dest='process_group', 30 | type=str, default=None, 31 | help='Supervisor process group name.') 32 | parser.add_argument('-N', '--process-name', dest='process_name', 33 | type=str, default=None, 34 | help='Supervisor process name. 
Process group argument is ignored if this ' + 35 | 'is passed in') 36 | parser.add_argument('-u', '--url', dest='url', type=str, 37 | help='HTTP check url', required=True, default=None) 38 | parser.add_argument('-m', '--method', dest='method', type=str, 39 | help='HTTP request method (GET, POST, PUT...)', default='GET') 40 | parser.add_argument('-j', '--json', dest='json', type=json.loads, 41 | help='HTTP json body, auto sets content-type header to application/json', 42 | default=None) 43 | parser.add_argument('-b', '--body', dest='body', type=str, 44 | help='HTTP body, will be ignored if json body pass in', default=None) 45 | parser.add_argument('-H', '--headers', dest='headers', type=json.loads, 46 | help='HTTP headers as json', default=None) 47 | parser.add_argument('-U', '--username', dest='username', type=str, 48 | help='HTTP check username', required=False, 49 | default=None) 50 | parser.add_argument('-P', '--password', dest='password', type=str, 51 | help='HTTP check password', required=False, 52 | default=None) 53 | parser.add_argument( 54 | '-p', '--port', dest='port', type=str, 55 | default=None, required=True, 56 | help='HTTP port to query. Can be integer or regular expression which ' 57 | 'will be used to extract port from a process name.') 58 | parser.add_argument( 59 | '-t', '--timeout', dest='timeout', type=int, required=False, 60 | default=http.DEFAULT_TIMEOUT, 61 | help='Connection timeout. Default: %s' % (http.DEFAULT_TIMEOUT,)) 62 | parser.add_argument( 63 | '-r', '--num-retries', dest='num_retries', type=int, 64 | default=http.DEFAULT_RETRIES, required=False, 65 | help='Connection retries. 
Default: %s' % (http.DEFAULT_RETRIES,)) 66 | 67 | return parser 68 | 69 | 70 | def main(): 71 | 72 | arg_parser = _make_argument_parser() 73 | args = arg_parser.parse_args() 74 | 75 | checks_config = [(http.HTTPCheck, {'url': args.url, 76 | 'timeout': args.timeout, 77 | 'num_retries': args.num_retries, 78 | 'method': args.method, 79 | 'json': args.json, 80 | 'body': args.body, 81 | 'headers': args.headers, 82 | 'port': args.port, 83 | 'username': args.username, 84 | 'password': args.password, 85 | })] 86 | return check_runner.CheckRunner( 87 | args.check_name, args.process_group, args.process_name, checks_config).run() 88 | 89 | 90 | if __name__ == '__main__': 91 | 92 | sys.exit(main()) 93 | -------------------------------------------------------------------------------- /supervisor_checks/bin/memory_check.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python3 2 | 3 | """Example configuration: 4 | 5 | [eventlistener:example_check] 6 | command=/usr/local/bin/supervisor_memory_check -n example_check -m 102400 -c -g example_service 7 | events=TICK_60 8 | """ 9 | 10 | import argparse 11 | import sys 12 | 13 | from supervisor_checks import check_runner 14 | from supervisor_checks.check_modules import memory 15 | 16 | 17 | __author__ = 'vovanec@gmail.com' 18 | 19 | 20 | def _make_argument_parser(): 21 | """Create the option parser. 22 | """ 23 | 24 | parser = argparse.ArgumentParser( 25 | description='Run memory check program.') 26 | parser.add_argument('-n', '--check-name', dest='check_name', 27 | type=str, required=True, default=None, 28 | help='Health check name.') 29 | parser.add_argument('-g', '--process-group', dest='process_group', 30 | type=str, default=None, 31 | help='Supervisor process group name.') 32 | parser.add_argument('-N', '--process-name', dest='process_name', 33 | type=str, default=None, 34 | help='Supervisor process name. 
Process group argument is ignored if this ' + 35 | 'is passed in') 36 | parser.add_argument( 37 | '-m', '--max-rss', dest='max_rss', type=int, required=True, 38 | help='Maximum memory the process is allowed to use, in KB.') 39 | parser.add_argument( 40 | '-c', '--cumulative', dest='cumulative', action='store_true', 41 | help='Recursively calculate memory used by all process children.') 42 | 43 | return parser 44 | 45 | 46 | def main(): 47 | 48 | arg_parser = _make_argument_parser() 49 | args = arg_parser.parse_args() 50 | 51 | checks_config = [(memory.MemoryCheck, {'max_rss': args.max_rss, 52 | 'cumulative': args.cumulative})] 53 | 54 | return check_runner.CheckRunner( 55 | args.check_name, args.process_group, args.process_name, checks_config).run() 56 | 57 | 58 | if __name__ == '__main__': 59 | 60 | sys.exit(main()) 61 | -------------------------------------------------------------------------------- /supervisor_checks/bin/tcp_check.py: -------------------------------------------------------------------------------- 1 | """Health check based on TCP connection status. 2 | 3 | Example configuration: 4 | 5 | [eventlistener:example_check] 6 | command=/usr/local/bin/supervisor_tcp_check -n example_service_check -t 30 -r 3 -g example_service -p 8080 7 | events=TICK_60 8 | """ 9 | 10 | import argparse 11 | import sys 12 | 13 | from supervisor_checks import check_runner 14 | from supervisor_checks.check_modules import tcp 15 | 16 | 17 | __author__ = 'vovanec@gmail.net' 18 | 19 | 20 | def _make_argument_parser(): 21 | """Create the option parser.
22 | """ 23 | 24 | parser = argparse.ArgumentParser( 25 | description='Run TCP check program.') 26 | parser.add_argument('-n', '--check-name', dest='check_name', 27 | type=str, required=True, default=None, 28 | help='Check name.') 29 | parser.add_argument('-g', '--process-group', dest='process_group', 30 | type=str, default=None, 31 | help='Supervisor process group name.') 32 | parser.add_argument('-N', '--process-name', dest='process_name', 33 | type=str, default=None, 34 | help='Supervisor process name. Process group argument is ignored if this ' + 35 | 'is passed in') 36 | parser.add_argument( 37 | '-p', '--port', dest='port', type=str, 38 | default=None, required=True, 39 | help='TCP port to query. Can be integer or regular expression which ' 40 | 'will be used to extract port from a process name.') 41 | parser.add_argument( 42 | '-t', '--timeout', dest='timeout', type=int, required=False, 43 | default=tcp.DEFAULT_TIMEOUT, 44 | help='Connection timeout. Default: %s' % (tcp.DEFAULT_TIMEOUT,)) 45 | parser.add_argument( 46 | '-r', '--num-retries', dest='num_retries', type=int, 47 | default=tcp.DEFAULT_RETRIES, required=False, 48 | help='Connection retries. Default: %s' % (tcp.DEFAULT_RETRIES,)) 49 | 50 | return parser 51 | 52 | 53 | def main(): 54 | 55 | arg_parser = _make_argument_parser() 56 | args = arg_parser.parse_args() 57 | 58 | checks_config = [(tcp.TCPCheck, {'timeout': args.timeout, 59 | 'num_retries': args.num_retries, 60 | 'port': args.port})] 61 | 62 | return check_runner.CheckRunner( 63 | args.check_name, args.process_group, args.process_name, checks_config).run() 64 | 65 | 66 | if __name__ == '__main__': 67 | 68 | sys.exit(main()) 69 | -------------------------------------------------------------------------------- /supervisor_checks/bin/xmlrpc_check.py: -------------------------------------------------------------------------------- 1 | #! 
/usr/bin/env python3 2 | 3 | """Example configuration: 4 | 5 | [eventlistener:example_check] 6 | command=/usr/local/bin/supervisor_xmlrpc_check -g example_service -n example_check -u /ping -r 3 -p 8080 7 | events=TICK_60 8 | """ 9 | 10 | import argparse 11 | import sys 12 | 13 | from supervisor_checks import check_runner 14 | from supervisor_checks.check_modules import xmlrpc 15 | 16 | __author__ = 'vovanec@gmail.com' 17 | 18 | 19 | def _make_argument_parser(): 20 | """Create the option parser. 21 | """ 22 | 23 | parser = argparse.ArgumentParser( 24 | description='Run XML RPC check program.') 25 | parser.add_argument('-n', '--check-name', dest='check_name', 26 | type=str, required=True, default=None, 27 | help='Health check name.') 28 | parser.add_argument('-g', '--process-group', dest='process_group', 29 | type=str, default=None, 30 | help='Supervisor process group name.') 31 | parser.add_argument('-N', '--process-name', dest='process_name', 32 | type=str, default=None, 33 | help='Supervisor process name. Process group argument is ignored if this ' + 34 | 'is passed in') 35 | parser.add_argument('-u', '--url', dest='url', type=str, 36 | help='XML RPC check url', required=False, default=None) 37 | parser.add_argument('-s', '--socket-path', dest='sock_path', type=str, 38 | help='Full path to XML RPC server local socket', 39 | required=False, default=None) 40 | parser.add_argument('-S', '--socket-dir', dest='sock_dir', type=str, 41 | help='Path to XML RPC server socket directory. Socket ' 42 | 'name will be constructed using process name: ' 43 | '<process_name>.sock.', 44 | required=False, default=None) 45 | parser.add_argument('-m', '--method', dest='method', type=str, 46 | help='XML RPC method name.
Default is %s' % ( 47 | xmlrpc.DEFAULT_METHOD,), required=False, 48 | default=xmlrpc.DEFAULT_METHOD) 49 | parser.add_argument('-U', '--username', dest='username', type=str, 50 | help='XMLRPC check username', required=False, 51 | default=None) 52 | parser.add_argument('-P', '--password', dest='password', type=str, 53 | help='XMLRPC check password', required=False, 54 | default=None) 55 | parser.add_argument( 56 | '-p', '--port', dest='port', type=str, 57 | default=None, required=False, 58 | help='Port to query. Can be integer or regular expression which ' 59 | 'will be used to extract port from a process name.') 60 | parser.add_argument( 61 | '-r', '--num-retries', dest='num_retries', type=int, 62 | default=xmlrpc.DEFAULT_RETRIES, required=False, 63 | help='Connection retries. Default: %s' % (xmlrpc.DEFAULT_RETRIES,)) 64 | 65 | return parser 66 | 67 | 68 | def main(): 69 | 70 | arg_parser = _make_argument_parser() 71 | args = arg_parser.parse_args() 72 | 73 | checks_config = [(xmlrpc.XMLRPCCheck, {'url': args.url, 74 | 'sock_path': args.sock_path, 75 | 'sock_dir': args.sock_dir, 76 | 'num_retries': args.num_retries, 77 | 'port': args.port, 78 | 'method': args.method, 79 | 'username': args.username, 80 | 'password': args.password, 81 | })] 82 | 83 | return check_runner.CheckRunner( 84 | args.check_name, args.process_group, args.process_name, checks_config).run() 85 | 86 | 87 | if __name__ == '__main__': 88 | 89 | sys.exit(main()) 90 | 91 | -------------------------------------------------------------------------------- /supervisor_checks/check_modules/__init__.py: -------------------------------------------------------------------------------- 1 | """Health check modules. 2 | """ 3 | 4 | __author__ = 'vovanec@gmail.com' 5 | -------------------------------------------------------------------------------- /supervisor_checks/check_modules/base.py: -------------------------------------------------------------------------------- 1 | """Base class for checks. 
2 | """ 3 | 4 | __author__ = 'vovanec@gmail.com' 5 | 6 | 7 | class BaseCheck(object): 8 | """Base class for checks. 9 | """ 10 | 11 | NAME = None 12 | 13 | def __init__(self, check_config, log): 14 | """Constructor. 15 | 16 | :param dict check_config: implementation specific check config. 17 | :param (str) -> None log: logging function. 18 | """ 19 | 20 | self._config = check_config 21 | self.__log = log 22 | self._validate_config() 23 | 24 | def __call__(self, process_spec): 25 | """Run single check. 26 | 27 | :param dict process_spec: process specification dictionary as returned 28 | by the SupervisorD API. 29 | 30 | :return: True if the check succeeded, otherwise False. If the check 31 | failed, the monitored process will be restarted automatically. 32 | 33 | :rtype: bool 34 | """ 35 | 36 | raise NotImplementedError 37 | 38 | def _validate_config(self): 39 | """Method may be implemented in subclasses. Should return None or 40 | raise InvalidCheckConfig if the configuration is invalid. 41 | 42 | Here's a typical example of a parameter check: 43 | 44 | if 'url' not in self._config: 45 | raise errors.InvalidCheckConfig( 46 | 'Required `url` parameter is missing in %s check config.' % ( 47 | self.NAME,)) 48 | """ 49 | 50 | pass 51 | 52 | def _log(self, msg, *args): 53 | """Log check message. 54 | 55 | :param str msg: log message. 56 | """ 57 | 58 | self.__log('%s: %s' % (self.__class__.__name__, msg % args)) 59 | -------------------------------------------------------------------------------- /supervisor_checks/check_modules/cpu.py: -------------------------------------------------------------------------------- 1 | """Process check based on CPU usage.
2 | """ 3 | 4 | import psutil 5 | import time 6 | 7 | from supervisor_checks import errors 8 | from supervisor_checks.check_modules import base 9 | 10 | __author__ = 'vovanec@gmail.com' 11 | 12 | 13 | PSUTIL_CHECK_INTERVAL = 3.0 14 | DEF_CPU_CHECK_INTERVAL = 3600 15 | 16 | 17 | class CPUCheck(base.BaseCheck): 18 | """Process check based on CPU usage. 19 | """ 20 | 21 | NAME = 'cpu' 22 | 23 | def __init__(self, *args, **kwargs): 24 | 25 | super().__init__(*args, **kwargs) 26 | 27 | self._process_states = {} 28 | self._check_interval = self._config.get( 29 | 'interval', DEF_CPU_CHECK_INTERVAL) 30 | 31 | def __call__(self, process_spec): 32 | 33 | pid = process_spec['pid'] 34 | process_name = process_spec['name'] 35 | 36 | cpu_pct = self._get_cpu_percent(pid, process_name) 37 | self._log('CPU percent used by process %s is %s', 38 | process_name, cpu_pct) 39 | 40 | proc_state = self._process_states.setdefault( 41 | process_name, {'first_seen_over_threshold': float('inf'), 42 | 'over_threshold': False}) 43 | 44 | first_seen_over_threshold = proc_state['first_seen_over_threshold'] 45 | over_threshold = proc_state['over_threshold'] 46 | 47 | if cpu_pct > self._config['max_cpu']: 48 | if time.time() - first_seen_over_threshold > self._check_interval: 49 | self._log( 50 | 'CPU usage for process %s has been above the configured ' 51 | 'threshold %s for maximum allowed interval: %s seconds.', 52 | process_name, self._config['max_cpu'], self._check_interval) 53 | 54 | self._process_states[process_name] = { 55 | 'first_seen_over_threshold': float('inf'), 56 | 'over_threshold': False} 57 | 58 | return False 59 | elif not over_threshold: 60 | self._log('CPU usage for process %s is above the threshold %s.', 61 | process_name, self._config['max_cpu'],) 62 | 63 | self._process_states[process_name] = { 64 | 'first_seen_over_threshold': time.time(), 65 | 'over_threshold': True} 66 | else: 67 | self._log('CPU usage for process %s is above the threshold ' 68 | '%s for %s seconds.', 
process_name, 69 | self._config['max_cpu'], int(time.time() - first_seen_over_threshold)) 70 | else: 71 | if over_threshold: 72 | self._log('CPU usage for process %s dropped below the ' 73 | 'threshold %s after %s seconds.', process_name, 74 | self._config['max_cpu'], int(time.time() - first_seen_over_threshold)) 75 | 76 | self._process_states[process_name] = { 77 | 'first_seen_over_threshold': float('inf'), 78 | 'over_threshold': False} 79 | 80 | return True 81 | 82 | def _get_cpu_percent(self, pid, process_name): 83 | """Get CPU percent used by process. 84 | """ 85 | 86 | self._log('Checking for CPU percent used by process %s.', process_name) 87 | 88 | return psutil.Process(pid).cpu_percent(PSUTIL_CHECK_INTERVAL) 89 | 90 | def _validate_config(self): 91 | 92 | if 'max_cpu' not in self._config: 93 | raise errors.InvalidCheckConfig( 94 | 'Required `max_cpu` parameter is missing in %s check config.' 95 | % (self.NAME,)) 96 | 97 | if not isinstance(self._config['max_cpu'], (int, float)): 98 | raise errors.InvalidCheckConfig( 99 | '`max_cpu` parameter must be numeric type in %s check config.'
100 | % (self.NAME,)) 101 | -------------------------------------------------------------------------------- /supervisor_checks/check_modules/file.py: -------------------------------------------------------------------------------- 1 | import time 2 | import os 3 | from supervisor_checks import errors 4 | from supervisor_checks import utils 5 | from supervisor_checks.check_modules import base 6 | 7 | 8 | class FileCheck(base.BaseCheck): 9 | NAME = "file" 10 | 11 | def __call__(self, process_spec): 12 | notification_filepath = self._config["filepath"] 13 | if notification_filepath is None: 14 | notification_filepath = utils.NotificationFile.get_filepath(self._config["root_dir"], process_spec["group"], process_spec["name"], process_spec["pid"]) 15 | 16 | try: 17 | stat = os.stat(notification_filepath, follow_symlinks=False) 18 | return time.time() - stat.st_ctime <= self._config["timeout"] 19 | except OSError: 20 | self._log("ERROR: Could not stat file: %s", notification_filepath) 21 | return not self._config["fail_on_error"] 22 | 23 | 24 | def _validate_config(self): 25 | if "timeout" not in self._config: 26 | raise errors.InvalidCheckConfig( 27 | 'Required `timeout` parameter is missing in %s check config.' % ( 28 | self.NAME,)) 29 | 30 | if not isinstance(self._config['timeout'], int): 31 | raise errors.InvalidCheckConfig( 32 | '`timeout` parameter must be int type in %s check config.' % ( 33 | self.NAME,)) 34 | 35 | if "fail_on_error" not in self._config: 36 | raise errors.InvalidCheckConfig( 37 | 'Required `fail_on_error` parameter is missing in %s check config.' % ( 38 | self.NAME,)) 39 | 40 | if "filepath" not in self._config: 41 | raise errors.InvalidCheckConfig( 42 | 'Required `filepath` parameter is missing in %s check config.' % ( 43 | self.NAME,)) 44 | 45 | if "root_dir" not in self._config: 46 | raise errors.InvalidCheckConfig( 47 | 'Required `root_dir` parameter is missing in %s check config.'
% ( 48 | self.NAME,)) -------------------------------------------------------------------------------- /supervisor_checks/check_modules/http.py: -------------------------------------------------------------------------------- 1 | """Process check based on HTTP query. 2 | """ 3 | 4 | import base64 5 | import json 6 | 7 | from supervisor_checks import errors 8 | from supervisor_checks import utils 9 | from supervisor_checks.check_modules import base 10 | from supervisor_checks.compat import httplib 11 | 12 | __author__ = 'vovanec@gmail.com' 13 | 14 | 15 | DEFAULT_RETRIES = 2 16 | DEFAULT_TIMEOUT = 15 17 | 18 | LOCALHOST = '127.0.0.1' 19 | 20 | 21 | class HTTPCheck(base.BaseCheck): 22 | """Process check based on HTTP query. 23 | """ 24 | 25 | HEADERS = {'User-Agent': 'http_check'} 26 | NAME = 'http' 27 | 28 | def __call__(self, process_spec): 29 | 30 | try: 31 | port = utils.get_port(self._config['port'], process_spec['name']) 32 | return self._http_check(process_spec['name'], port) 33 | except errors.InvalidPortSpec: 34 | self._log('ERROR: Could not extract the HTTP port for process ' 35 | 'name %s using port specification %s.', 36 | process_spec['name'], self._config['port']) 37 | 38 | return True 39 | except Exception as exc: 40 | self._log('Check failed: %s', exc) 41 | 42 | return False 43 | 44 | def _http_check(self, process_name, port): 45 | 46 | self._log('Querying URL http://%s:%s%s for process %s', 47 | LOCALHOST, port, self._config['url'], 48 | process_name) 49 | 50 | host_port = '%s:%s' % (LOCALHOST, port,) 51 | num_retries = self._config.get('num_retries', DEFAULT_RETRIES) 52 | timeout = self._config.get('timeout', DEFAULT_TIMEOUT) 53 | username = self._config.get('username') 54 | password = self._config.get('password') 55 | 56 | with utils.retry_errors(num_retries, self._log).retry_context( 57 | self._make_http_request) as retry_http_request: 58 | res = retry_http_request( 59 | host_port, timeout, username=username, password=password) 60 | 61 | 
self._log('Status contacting URL http://%s%s for process %s: ' 62 | '%s %s', host_port, self._config['url'], process_name, 63 | res.status, res.reason) 64 | 65 | if res.status != httplib.OK: 66 | raise httplib.HTTPException( 67 | 'Bad HTTP status code: %s' % (res.status,)) 68 | 69 | return True 70 | 71 | def _make_http_request(self, host_port, timeout, 72 | username=None, password=None): 73 | 74 | connection = httplib.HTTPConnection(host_port, timeout=timeout) 75 | headers = self.HEADERS.copy() 76 | 77 | if username and password: 78 | auth_str = '%s:%s' % (username, password) 79 | headers['Authorization'] = 'Basic %s' % base64.b64encode( 80 | auth_str.encode()).decode() 81 | 82 | config_headers = self._config.get('headers') 83 | if config_headers: 84 | headers.update(config_headers) 85 | # Automatically set the content type if a JSON body is passed in. 86 | if self._config.get('json'): 87 | headers['Content-Type'] = 'application/json' 88 | 89 | body = self._config.get('body') 90 | json_body = self._config.get('json') 91 | if json_body: 92 | body = json.dumps(json_body) 93 | 94 | connection.request( 95 | self._config.get('method', 'GET'), self._config['url'], body, 96 | headers=headers) 97 | 98 | return connection.getresponse() 99 | 100 | def _validate_config(self): 101 | 102 | if 'url' not in self._config: 103 | raise errors.InvalidCheckConfig( 104 | 'Required `url` parameter is missing in %s check config.' % ( 105 | self.NAME,)) 106 | 107 | if not isinstance(self._config['url'], str): 108 | raise errors.InvalidCheckConfig( 109 | '`url` parameter must be string type in %s check config.' % ( 110 | self.NAME,)) 111 | 112 | if 'port' not in self._config: 113 | raise errors.InvalidCheckConfig( 114 | 'Required `port` parameter is missing in %s check config.'
% ( 115 | self.NAME,)) 116 | -------------------------------------------------------------------------------- /supervisor_checks/check_modules/memory.py: -------------------------------------------------------------------------------- 1 | """Process check based on RSS memory usage. 2 | """ 3 | 4 | import psutil 5 | 6 | from supervisor_checks import errors 7 | from supervisor_checks.check_modules import base 8 | 9 | __author__ = 'vovanec@gmail.com' 10 | 11 | 12 | class MemoryCheck(base.BaseCheck): 13 | """Process check based on memory usage. 14 | """ 15 | 16 | NAME = 'memory' 17 | 18 | def __call__(self, process_spec): 19 | 20 | pid = process_spec['pid'] 21 | process_name = process_spec['name'] 22 | 23 | if self._config.get('cumulative', False): 24 | rss = self._get_cumulative_rss(pid, process_name) 25 | else: 26 | rss = self._get_rss(pid, process_name) 27 | 28 | self._log('Total memory consumed by process %s is %s KB', 29 | process_name, rss) 30 | 31 | if rss > self._config['max_rss']: 32 | self._log('Memory usage for process %s is above the configured ' 33 | 'threshold: %s KB vs %s KB.', process_name, 34 | rss, self._config['max_rss']) 35 | 36 | return False 37 | 38 | return True 39 | 40 | def _get_rss(self, pid, process_name): 41 | """Get RSS used by process. 42 | """ 43 | 44 | self._log('Checking for RSS memory used by process %s', process_name) 45 | 46 | return int(psutil.Process(pid).memory_info().rss / 1024) 47 | 48 | def _get_cumulative_rss(self, pid, process_name): 49 | """Get cumulative RSS used by process and all its children. 
50 | """ 51 | 52 | self._log('Checking for cumulative RSS memory used by process %s', 53 | process_name) 54 | 55 | parent = psutil.Process(pid) 56 | rss_total = parent.memory_info().rss 57 | for child_process in parent.children(recursive=True): 58 | rss_total += child_process.memory_info().rss 59 | 60 | return int(rss_total / 1024) 61 | 62 | def _validate_config(self): 63 | 64 | if 'max_rss' not in self._config: 65 | raise errors.InvalidCheckConfig( 66 | 'Required `max_rss` parameter is missing in %s check config.' 67 | % (self.NAME,)) 68 | 69 | if not isinstance(self._config['max_rss'], (int, float)): 70 | raise errors.InvalidCheckConfig( 71 | '`max_rss` parameter must be numeric type in %s check config.' 72 | % (self.NAME,)) 73 | -------------------------------------------------------------------------------- /supervisor_checks/check_modules/tcp.py: -------------------------------------------------------------------------------- 1 | """Process check based on TCP connection status. 2 | """ 3 | 4 | import socket 5 | 6 | from supervisor_checks import errors 7 | from supervisor_checks import utils 8 | from supervisor_checks.check_modules import base 9 | 10 | __author__ = 'vovanec@gmail.com' 11 | 12 | 13 | DEFAULT_RETRIES = 2 14 | DEFAULT_TIMEOUT = 15 15 | 16 | LOCALHOST = '127.0.0.1' 17 | 18 | 19 | class TCPCheck(base.BaseCheck): 20 | """Process check based on TCP connection status. 
21 | """ 22 | 23 | NAME = 'tcp' 24 | 25 | def __call__(self, process_spec): 26 | 27 | timeout = self._config.get('timeout', DEFAULT_TIMEOUT) 28 | num_retries = self._config.get('num_retries', DEFAULT_RETRIES) 29 | 30 | try: 31 | port = utils.get_port(self._config['port'], process_spec['name']) 32 | with utils.retry_errors(num_retries, self._log).retry_context( 33 | self._tcp_check) as retry_tcp_check: 34 | return retry_tcp_check(process_spec['name'], port, timeout) 35 | except errors.InvalidPortSpec: 36 | self._log('ERROR: Could not extract the TCP port for process ' 37 | 'name %s using port specification %s.', 38 | process_spec['name'], self._config['port']) 39 | 40 | return True 41 | except Exception as exc: 42 | self._log('Check failed: %s', exc) 43 | 44 | return False 45 | 46 | def _tcp_check(self, process_name, port, timeout): 47 | 48 | self._log('Trying to connect to TCP port %s for process %s', 49 | port, process_name) 50 | with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: 51 | sock.settimeout(timeout) 52 | sock.connect((LOCALHOST, port)) 53 | # The socket is closed on leaving the with-block, even if connect() fails. 54 | 55 | self._log('Successfully connected to TCP port %s for process %s', 56 | port, process_name) 57 | 58 | return True 59 | 60 | def _validate_config(self): 61 | 62 | if 'port' not in self._config: 63 | raise errors.InvalidCheckConfig( 64 | 'Required `port` parameter is missing in %s check config.' % ( 65 | self.NAME,)) 66 | -------------------------------------------------------------------------------- /supervisor_checks/check_modules/xmlrpc.py: -------------------------------------------------------------------------------- 1 | """Process check based on call to XML RPC server.
2 | """ 3 | 4 | import supervisor.xmlrpc 5 | 6 | from supervisor_checks import errors 7 | from supervisor_checks import utils 8 | from supervisor_checks.check_modules import base 9 | from supervisor_checks.compat import xmlrpclib 10 | 11 | __author__ = 'vovanec@gmail.com' 12 | 13 | 14 | DEFAULT_RETRIES = 2 15 | DEFAULT_METHOD = 'status' 16 | 17 | LOCALHOST = '127.0.0.1' 18 | 19 | 20 | class XMLRPCCheck(base.BaseCheck): 21 | """Process check based on query to XML RPC server. 22 | """ 23 | 24 | NAME = 'xmlrpc' 25 | 26 | def __call__(self, process_spec): 27 | 28 | try: 29 | process_name = process_spec['name'] 30 | retries_left = self._config.get('num_retries', DEFAULT_RETRIES) 31 | method_name = self._config.get('method', DEFAULT_METHOD) 32 | username = self._config.get('username') 33 | password = self._config.get('password') 34 | 35 | server_url = self._get_server_url(process_name) 36 | if not server_url: 37 | return True 38 | 39 | self._log('Querying XML RPC server at %s, method %s for process %s', 40 | server_url, method_name, process_name) 41 | 42 | with utils.retry_errors(retries_left, self._log).retry_context( 43 | self._xmlrpc_check) as retry_xmlrpc_check: 44 | 45 | return retry_xmlrpc_check(process_name, server_url, method_name, 46 | username=username, password=password) 47 | except Exception as exc: 48 | self._log('Check failed: %s', exc) 49 | 50 | return False 51 | 52 | def _xmlrpc_check(self, process_name, server_url, method_name, 53 | username=None, password=None): 54 | 55 | try: 56 | xmlrpc_result = getattr( 57 | self._get_rpc_client(server_url, 58 | username=username, 59 | password=password), method_name)() 60 | 61 | self._log('Successfully contacted XML RPC server at %s, ' 62 | 'method %s for process %s. 
Result: %s', server_url, 63 | method_name, process_name, xmlrpc_result) 64 | 65 | return True 66 | except xmlrpclib.Fault as err: 67 | self._log('XML RPC server returned error: %s', err) 68 | 69 | return False 70 | 71 | def _validate_config(self): 72 | 73 | one_of_required = set(['url', 'sock_path', 'sock_dir']) 74 | # Count only parameters that are actually set; bin/xmlrpc_check.py passes None for unused ones. 75 | param_intersection = set(key for key in one_of_required if self._config.get(key)) 76 | if not param_intersection: 77 | raise errors.InvalidCheckConfig( 78 | 'One of required parameters: `url`, `sock_path` or `sock_dir` ' 79 | 'is missing in %s check config.' % (self.NAME,)) 80 | 81 | if len(param_intersection) > 1: 82 | raise errors.InvalidCheckConfig( 83 | '`url`, `sock_path` and `sock_dir` must be mutually exclusive ' 84 | 'in %s check config.' % (self.NAME,)) 85 | 86 | if self._config.get('url') and self._config.get('port') is None: 87 | raise errors.InvalidCheckConfig( 88 | 'When `url` parameter is specified, `port` parameter is ' 89 | 'required in %s check config.' % (self.NAME,)) 90 | 91 | def _get_server_url(self, process_name): 92 | """Construct XML RPC server URL. 93 | 94 | :param str process_name: process name. 95 | :rtype: str|None 96 | """ 97 | 98 | url = self._config.get('url') 99 | 100 | if url: 101 | try: 102 | port = utils.get_port(self._config['port'], 103 | process_name) 104 | 105 | return 'http://%s:%s%s' % (LOCALHOST, port, url) 106 | except errors.InvalidPortSpec: 107 | self._log('ERROR: Could not extract the HTTP port for ' 108 | 'process name %s using port specification %s.', 109 | process_name, self._config['port']) 110 | else: 111 | sock_path = self._config.get('sock_path') 112 | 113 | if not sock_path: 114 | sock_dir = self._config.get('sock_dir') 115 | 116 | if not sock_dir: 117 | self._log('ERROR: Could not construct XML RPC socket ' 118 | 'path using configuration provided.
sock_dir ' 119 | 'or sock_path argument must be specified.') 120 | return None 121 | 122 | sock_path = 'unix://%s/%s.sock' % (sock_dir, process_name,) 123 | 124 | if not sock_path.startswith('unix://'): 125 | sock_path = 'unix://%s' % (sock_path,) 126 | 127 | return sock_path 128 | 129 | @staticmethod 130 | def _get_rpc_client(server_url, username=None, password=None): 131 | 132 | return xmlrpclib.ServerProxy( 133 | 'http://127.0.0.1', supervisor.xmlrpc.SupervisorTransport( 134 | username, password, server_url)) 135 | -------------------------------------------------------------------------------- /supervisor_checks/check_runner.py: -------------------------------------------------------------------------------- 1 | """Instance of CheckRunner runs the set of pre-configured checks 2 | against the process running under SupervisorD. 3 | """ 4 | 5 | import concurrent.futures 6 | import datetime 7 | import os 8 | import select 9 | import signal 10 | import sys 11 | import threading 12 | 13 | from supervisor import childutils 14 | from supervisor.options import make_namespec, split_namespec 15 | from supervisor.states import ProcessStates 16 | 17 | from supervisor_checks.compat import xmlrpclib 18 | 19 | __author__ = 'vovanec@gmail.com' 20 | 21 | 22 | # Process spec keys 23 | STATE_KEY = 'state' 24 | NAME_KEY = 'name' 25 | GROUP_KEY = 'group' 26 | EVENT_NAME_KEY = 'eventname' 27 | 28 | MAX_THREADS = 16 29 | TICK_EVENTS = set(['TICK_5', 'TICK_60', 'TICK_3600']) 30 | 31 | 32 | class AboutToShutdown(Exception): 33 | """Raised from supervisor events read loop, when 34 | application is about to shutdown. 35 | """ 36 | 37 | pass 38 | 39 | 40 | class CheckRunner(object): 41 | """SupervisorD checks runner. 42 | """ 43 | 44 | def __init__(self, check_name, process_group, process_name, checks_config, env=None): 45 | """Constructor. 46 | 47 | :param str check_name: the name of check to display in log. 48 | :param str process_group: the name of the process group. 
49 | :param list checks_config: the list of check module configurations 50 | in format [(check_class, check_configuration_dictionary)] 51 | :param dict env: environment. 52 | """ 53 | 54 | self._environment = env or os.environ 55 | self._name = check_name 56 | self._checks_config = checks_config 57 | self._checks = self._init_checks() 58 | self._process_group = process_group 59 | # Specific process name; when set, the process group is ignored. 60 | self._process_name = process_name 61 | self._group_check_name = '%s_check' % (self._process_display_name(),) 62 | self._rpc_client = childutils.getRPCInterface(self._environment) 63 | self._stop_event = threading.Event() 64 | 65 | def run(self): 66 | """Run main check loop. 67 | """ 68 | 69 | self._log('Starting the health check for %s process. ' 70 | 'Checks config: %s', self._process_display_name(), self._checks_config) 71 | 72 | self._install_signal_handlers() 73 | 74 | while not self._stop_event.is_set(): 75 | 76 | try: 77 | event_type = self._wait_for_supervisor_event() 78 | except AboutToShutdown: 79 | self._log( 80 | 'Health check for %s process has been told to stop.', 81 | self._process_display_name()) 82 | 83 | break 84 | 85 | if event_type in TICK_EVENTS: 86 | self._check_processes() 87 | else: 88 | self._log('Received unsupported event type: %s', event_type) 89 | 90 | childutils.listener.ok(sys.stdout) 91 | 92 | self._log('Done.') 93 | 94 | def _check_processes(self): 95 | """Run single check loop for process group or name. 96 | """ 97 | 98 | process_specs = self._get_process_spec_list(ProcessStates.RUNNING) 99 | if process_specs: 100 | if len(process_specs) == 1: 101 | self._check_and_restart(process_specs[0]) 102 | else: 103 | # Query and restart in multiple threads simultaneously.
104 | with concurrent.futures.ThreadPoolExecutor(MAX_THREADS) as pool: 105 | for process_spec in process_specs: 106 | pool.submit(self._check_and_restart, process_spec) 107 | else: 108 | self._log( 109 | 'No processes in state RUNNING found for process %s', 110 | self._process_display_name()) 111 | 112 | def _check_and_restart(self, process_spec): 113 | """Run checks for the process and restart if needed. 114 | """ 115 | 116 | for check in self._checks: 117 | self._log('Performing `%s` check for process name %s', 118 | check.NAME, process_spec['name']) 119 | 120 | try: 121 | if not check(process_spec): 122 | self._log('`%s` check failed for process %s. Trying to ' 123 | 'restart.', check.NAME, process_spec['name']) 124 | 125 | return self._restart_process(process_spec) 126 | else: 127 | self._log('`%s` check succeeded for process %s', 128 | check.NAME, process_spec['name']) 129 | except Exception as exc: 130 | self._log('`%s` check raised error for process %s: %s', 131 | check.NAME, process_spec['name'], exc) 132 | 133 | def _init_checks(self): 134 | """Init check instances. 135 | 136 | :rtype: list 137 | """ 138 | 139 | checks = [] 140 | for check_class, check_cfg in self._checks_config: 141 | checks.append(check_class(check_cfg, self._log)) 142 | 143 | return checks 144 | 145 | def _get_process_spec_list(self, state=None): 146 | """Get the list of processes in a process group or name. 
147 | 148 | If process_name is not set, get all processes in the defined group. 149 | If process_name is set, get only the process(es) matching that name. 150 | """ 151 | 152 | process_specs = [] 153 | for process_spec in self._rpc_client.supervisor.getAllProcessInfo(): 154 | if not self._process_name: 155 | if (process_spec[GROUP_KEY] == self._process_group and 156 | (state is None or process_spec[STATE_KEY] == state)): 157 | process_specs.append(process_spec) 158 | else: 159 | if ((process_spec[GROUP_KEY], process_spec[NAME_KEY]) == 160 | split_namespec(self._process_name) and 161 | (state is None or process_spec[STATE_KEY] == state)): 162 | process_specs.append(process_spec) 163 | 164 | return process_specs 165 | 166 | def _restart_process(self, process_spec): 167 | """Restart a process. 168 | """ 169 | 170 | if not self._process_name: 171 | name_spec = make_namespec( 172 | process_spec[GROUP_KEY], process_spec[NAME_KEY]) 173 | else: 174 | name_spec_tuple = split_namespec(self._process_name) 175 | name_spec = make_namespec(name_spec_tuple[0], name_spec_tuple[1]) 176 | 177 | rpc_client = childutils.getRPCInterface(self._environment) 178 | 179 | process_spec = rpc_client.supervisor.getProcessInfo(name_spec) 180 | if process_spec[STATE_KEY] == ProcessStates.RUNNING: 181 | self._log('Trying to stop process %s', name_spec) 182 | 183 | try: 184 | rpc_client.supervisor.stopProcess(name_spec) 185 | self._log('Stopped process %s', name_spec) 186 | except xmlrpclib.Fault as exc: 187 | self._log('Failed to stop process %s: %s', name_spec, exc) 188 | 189 | try: 190 | self._log('Starting process %s', name_spec) 191 | rpc_client.supervisor.startProcess(name_spec, False) 192 | except xmlrpclib.Fault as exc: 193 | self._log('Failed to start process %s: %s', name_spec, exc) 194 | 195 | else: 196 | self._log('%s is not in RUNNING state, cannot restart', name_spec) 197 | 198 | def _log(self, msg, *args): 199 | """Write message to STDERR.

        :param str msg: log message, %-formatted with args when args are given.
        """

        curr_dt = datetime.datetime.now().strftime('%Y/%m/%d %H:%M:%S')

        # Interpolate only when args are given, so a pre-formatted message
        # that happens to contain '%' is written verbatim.
        text = msg % args if args else msg
        sys.stderr.write('%s [%s] %s\n' % (curr_dt, self._name, text))

        sys.stderr.flush()

    def _install_signal_handlers(self):
        """Install signal handlers.
        """

        self._log('Installing signal handlers.')

        for sig in (signal.SIGINT, signal.SIGUSR1, signal.SIGHUP,
                    signal.SIGTERM, signal.SIGQUIT):
            signal.signal(sig, self._signal_handler)

    def _signal_handler(self, signum, _):
        """Request a graceful stop; the event loop exits on its next poll.
        """

        self._stop_event.set()

    def _wait_for_supervisor_event(self):
        """Send READY to supervisor and wait for the next event.

        Polls stdin every 0.5 seconds so that a stop requested by the
        signal handler is noticed promptly.

        :return: event name from the event header, e.g. TICK_60.
        :rtype: str
        :raises AboutToShutdown: if a stop was requested before an event
            arrived.
        """

        childutils.listener.ready(sys.stdout)

        while not self._stop_event.is_set():
            try:
                rdfs, _, _ = select.select([sys.stdin], [], [], .5)
            except InterruptedError:
                continue

            if rdfs:
                headers = childutils.get_headers(rdfs[0].readline())
                # Read the payload to make read buffer empty.
                _ = sys.stdin.read(int(headers['len']))
                event_type = headers[EVENT_NAME_KEY]
                self._log('Received %s event from supervisor', event_type)

                return event_type

        raise AboutToShutdown

    def _process_display_name(self):
        """Return the process name if configured, else the group name."""
        return self._process_name or self._process_group
--------------------------------------------------------------------------------
/supervisor_checks/compat.py:
--------------------------------------------------------------------------------
"""Compatibility functions.
"""

__author__ = 'vovanec@gmail.com'

try:  # pragma: no cover
    from supervisor.compat import httplib
except ImportError:  # pragma: no cover
    import httplib

try:  # pragma: no cover
    from supervisor.compat import xmlrpclib
except ImportError:  # pragma: no cover
    import xmlrpclib
--------------------------------------------------------------------------------
/supervisor_checks/errors.py:
--------------------------------------------------------------------------------
"""Error classes.
"""

__author__ = 'vovanec@gmail.com'


class InvalidCheckConfig(ValueError):
    """Raised when an invalid configuration dictionary is passed to a check
    module.
    """

    pass


class InvalidPortSpec(InvalidCheckConfig):
    """Raised when an invalid port specification is provided in the config.
    """

    pass
--------------------------------------------------------------------------------
/supervisor_checks/utils.py:
--------------------------------------------------------------------------------
"""Utility functions.
"""


import contextlib
import functools
import re
import time
import os
import tempfile

from supervisor_checks import errors

__author__ = 'vovanec@gmail.com'


RETRY_SLEEP_TIME = 3


class retry_errors(object):
    """Decorator to retry a function on any exception.

    The wrapped function is retried up to num_retries times; before retry
    number N it sleeps N * RETRY_SLEEP_TIME seconds. Once the retries are
    exhausted, the last exception is re-raised.
    """

    def __init__(self, num_retries, log):

        self._num_retries = num_retries
        self._log = log

    def __call__(self, func):

        @functools.wraps(func)
        def wrap_it(*args, **kwargs):
            tries_count = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    tries_count += 1

                    if tries_count <= self._num_retries:
                        retry_in = tries_count * RETRY_SLEEP_TIME
                        self._log(
                            'Exception occurred: %s. Retry in %s seconds.' % (
                                exc, retry_in))

                        time.sleep(retry_in)
                    else:
                        raise

        return wrap_it

    @contextlib.contextmanager
    def retry_context(self, func):
        """Yield func wrapped with this retry decorator, for use in a
        with statement.

        :param func: function to decorate.
        """

        yield self(func)


def get_port(port_or_port_re, process_name):
    """Return a port number, given directly or extracted from the process name.

    :param str|int port_or_port_re: either a port number or a regular
        expression with exactly one capture group matching the port in the
        process name.
    :param str process_name: process name.

    :rtype: int
    :raises errors.InvalidCheckConfig: if no port can be extracted.
    """

    if isinstance(port_or_port_re, int):
        return port_or_port_re

    try:
        return int(port_or_port_re)
    except ValueError:
        pass

    match = re.match(port_or_port_re, process_name)

    if match:
        try:
            groups = match.groups()
            if len(groups) == 1:
                return int(groups[0])
        except (ValueError, TypeError) as err:
            raise errors.InvalidCheckConfig(err)

    raise errors.InvalidCheckConfig(
        'Could not extract port number for process name %s using regular '
        'expression %s' % (process_name, port_or_port_re))


class _TemporaryFileWrapper:
    """Temporary file wrapper.

    This class provides a wrapper around files opened for
    temporary use.  In particular, it seeks to automatically
    remove the file when it is no longer needed.
    """

    def __init__(self, file, name, delete=True):
        self.file = file
        self.name = name
        self.delete = delete
        self.close_called = False

    def __getattr__(self, name):
        return getattr(self.file, name)

    def close(self, unlink=os.unlink):
        if not self.close_called and self.file is not None:
            self.close_called = True
            try:
                self.file.close()
            finally:
                if self.delete:
                    unlink(self.name)

    def __del__(self):
        self.close()


_open_flags = os.O_CREAT
if hasattr(os, "O_NOFOLLOW"):
    _open_flags |= os.O_NOFOLLOW


class NotificationFile:
    """Heartbeat file that a supervised process updates to signal liveness.

    The file is named <group>-<name>-<pid>; group and name default to the
    SUPERVISOR_GROUP_NAME and SUPERVISOR_PROCESS_NAME environment variables,
    and the directory defaults to tempfile.gettempdir().
    """

    @staticmethod
    def get_filename(process_group, process_name, pid):
        return f"{process_group!s}-{process_name!s}-{pid!s}"

    @staticmethod
    def get_filepath(root_dir=None, process_group=None, process_name=None, pid=None):
        root_dir = root_dir if root_dir is not None else tempfile.gettempdir()
        process_group = process_group if process_group is not None else os.getenv("SUPERVISOR_GROUP_NAME")
        process_name = process_name if process_name is not None else os.getenv("SUPERVISOR_PROCESS_NAME")
        pid = pid if pid is not None else os.getpid()
        return os.path.join(root_dir, NotificationFile.get_filename(process_group, process_name, pid))

    def __init__(self, filepath=None, root_dir=None, delete=True):
        """
        Creates a NotificationFile object used to indicate a heartbeat.
        Only supports UNIX.

        :param str filepath: optional filepath to use as notification file
        :param str root_dir: optional root_dir to use for the notification file (default: tempfile.gettempdir())
        :param bool delete: whether to delete the notification file after the file is closed
        """
        if filepath is None:
            filepath = self.get_filepath(root_dir=root_dir)

        # Create the file with no permission bits (and refuse to follow a
        # symlink where O_NOFOLLOW exists); the descriptor stays open.
        def opener(file, flags):
            flags |= _open_flags
            return os.open(file, flags, mode=0o000)

        file_obj = open(filepath, "rb", buffering=0, opener=opener)
        self._tmp = _TemporaryFileWrapper(file_obj, filepath, delete=delete)

        self.spinner = 0

    def notify(self):
        """Signal a heartbeat by toggling the file mode between 0o000 and
        0o001; each fchmod also updates the file's status-change time.
        """
        self.spinner = (self.spinner + 1) % 2
        os.fchmod(self._tmp.fileno(), self.spinner)

    def close(self):
        return self._tmp.close()
--------------------------------------------------------------------------------
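The `NotificationFile` heartbeat never writes any data: each `notify()` call flips the file mode between `0o000` and `0o001` with `os.fchmod()`, and every mode change also updates the file's status-change time (`st_ctime`). Below is a minimal standalone sketch of that pattern, not the package's own code. `HeartbeatFile` and `is_stale` are hypothetical names, and the idea that a checker compares `st_ctime` against a timeout is an assumption made here for illustration; `file_check.py` is not shown in this section.

```python
import os
import tempfile
import time

# Hypothetical standalone sketch mirroring NotificationFile.notify():
# the monitored process keeps one descriptor open and flips the permission
# bits with os.fchmod(); each chmod updates the inode's st_ctime.


class HeartbeatFile:
    """Hypothetical minimal stand-in for NotificationFile (UNIX only)."""

    def __init__(self, path):
        self.path = path
        # Create the file with mode 0o000 and keep the descriptor open;
        # permission bits are checked at open() time, not on this fd.
        self._fd = os.open(path, os.O_RDONLY | os.O_CREAT, 0o000)
        self._spinner = 0

    def notify(self):
        # Alternate between mode 0o000 and 0o001 on every call.
        self._spinner = (self._spinner + 1) % 2
        os.fchmod(self._fd, self._spinner)

    def close(self):
        os.close(self._fd)
        os.unlink(self.path)


def is_stale(path, timeout):
    """Hypothetical checker side: True if no heartbeat within `timeout` seconds."""
    # os.stat() needs search permission on the directory, not read
    # permission on the file, so mode 0o000 does not block it.
    return time.time() - os.stat(path).st_ctime > timeout


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo-heartbeat")
    hb = HeartbeatFile(path)
    hb.notify()
    print(oct(os.stat(path).st_mode & 0o777))  # 0o1
    print(is_stale(path, timeout=60))  # False
    hb.close()
```

Keeping the descriptor open is what makes mode `0o000` workable: `fchmod` on an existing descriptor requires file ownership rather than read or write permission, so the owning process can keep signalling while the file stays unreadable to others.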