├── LICENSE ├── README.md ├── image.png ├── install.sh ├── threshold-3.2.tar.gz └── threshold.py /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 secureoptions 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, and distribute copies of the Software, 9 | and to permit persons to whom the Software is furnished to do so, subject to 10 | the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | 23 | The author of this software is Benjamin J Fowler 24 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # threshold 2 | A simple tool which allows you to set up a packet-loss/latency, TCP-handshake, or HTTP/HTTPs file transfer monitor against a network host. The monitor will execute a user-defined action if it detects failure to the host. 3 |
4 | - [Installation](#installation) 5 | - [Command-Line Structure](#structure) 6 | - [Parameters](#parameters) 7 | - [Examples and Scenarios](#examples) 8 | - [Tips and Best Practices](#tips) 9 | - [Threshold Logging](#logging) 10 |
11 | 12 | 13 | ## Installation 14 | \# *From a Linux terminal, download the source code from GitHub*
15 | `wget https://github.com/secureoptions/threshold/raw/master/threshold-3.2.tar.gz`
16 | 17 | \# *Unpack and change into installation directory*
18 | `tar -xvzf threshold-3.2.tar.gz`
19 | `cd threshold-3.2/`
20 | 21 | \# *Run the installation script*
22 | `sudo ./install.sh`
23 | 24 | \# *Verify installation*
25 | `threshold --version`
26 |
27 | 28 | 29 | ## Command Structure 30 | 31 | 32 | 33 |
34 |
35 | 36 | ## Parameters 37 | __-a | --action__
38 | The user-defined action to take if the threshold monitor is triggered. This can be just about any command that you can execute from a Linux command line. 39 | 40 | __-b | --backoff__
41 | Default is 60. Only used when the target host (-d) is prefixed with http:// or https://. This is the interval in seconds between consecutive download tests. It is sometimes needed if a target webserver throttles consecutive web requests from the same source. 42 | 43 | __-c | --count__
44 | Default is 3. The number of *consecutive* pings that must fail before the threshold monitor is considered to be in a failed state. If used with TCP handshakes (-P), it's the number of consecutive handshakes that must fail before the monitor is considered in a failed state. 45 | 46 | __-d | --destination__
47 | The target host IP or DNS hostname that you want to monitor. If this host becomes unresponsive for the parameters you define, then the action (-a "") is taken. __Important Note:__ if you use a prefix of http:// or https:// in the destination, this will create a download monitor which checks the actual download transfer time against the timeout (-t) that you have defined. If the download transfer time exceeds the timeout, then the monitor is considered in a failed state. 48 | 49 | __-i | --interval__
50 | Default is 5. The interval in seconds at which you want to send out individual ping packets. If used with TCP (-P), the interval in seconds at which TCP handshakes will be initiated. 51 | 52 | __-k | --kill__
53 | Use to kill either a specific threshold monitor job (e.g. threshold -k 3509) or ALL jobs (e.g. threshold -k) 54 | 55 | __-l | --list__
56 | List the active threshold monitor jobs 57 | 58 | __-L | --latency__
59 | The maximum latency in milliseconds that a ping monitor can *average*. Default is 500. A ping monitor's average is calculated over the number of pings it uses (-c), so the more pings it is configured to use, the more representative its average latency reading is. 60 | 61 | __-o | --logging__
62 | User-defined log location for threshold. This is logging for threshold monitoring (not to be confused with *action* output). Default location is /var/log/threshold.log 63 | 64 | __-p | --persist__
65 | When this argument is set, your threshold monitor remains persistent even after it has failed once. In this scenario, the monitor will execute the action you define, and then start itself again with the same job parameters. 66 | 67 | __-P | --port__
68 | The TCP port that will be used to establish TCP handshakes on. Using this flag will also cause threshold to use a TCP-handshake monitor rather than a ping monitor. 69 | 70 | __-t | --timeout__
71 | Default is 1. The time in seconds to wait for a response back to ping or TCP SYN/ACK from target. If used with (-P) then timeout is not only the amount of time to wait for response for TCP SYN/ACK, but also the time to wait before sending FIN on successful TCP connections. When used with HTTP/HTTPs monitor, it is the maximum amount of time a download has to complete before placing the monitor into a 'failed' state, triggering the action. 72 | 73 | __-u | --uninstall__
74 | Uninstall threshold from your system. This will also stop any current jobs you have running. 75 | 76 | __-v | --version__
77 | The current version of threshold 78 | 79 | __-6 | --ipv6__
80 | Uses ipv6 rather than ipv4. 81 | 82 |
83 |
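The flags and defaults listed above can be summarised as a small parser. The following is a minimal sketch, assuming Python 3 and the standard `argparse` module; threshold itself walks `sys.argv` by hand, and `build_parser` is a hypothetical helper name that is not part of the tool. The defaults are the ones documented above, plus the 500 ms latency default that appears in `threshold.py`. The `-v | --version` flag is left out for brevity.

```python
import argparse


def build_parser():
    """Hypothetical argparse mirror of threshold's documented flags."""
    p = argparse.ArgumentParser(prog="threshold")
    # Flags that take a value, with the documented defaults.
    p.add_argument("-d", "--destination")
    p.add_argument("-a", "--action")
    p.add_argument("-t", "--timeout", type=int, default=1)
    p.add_argument("-i", "--interval", type=int, default=5)
    p.add_argument("-c", "--count", type=int, default=3)
    p.add_argument("-b", "--backoff", type=int, default=60)
    p.add_argument("-L", "--latency", type=int, default=500)
    p.add_argument("-P", "--port", type=int, default=0)
    p.add_argument("-o", "--logging", default="/var/log/threshold.log")
    # -k takes an optional job id; with no id it means "kill all jobs".
    p.add_argument("-k", "--kill", nargs="?", const="killall", default=False)
    # Binary flags.
    p.add_argument("-p", "--persist", action="store_true")
    p.add_argument("-l", "--list", action="store_true")
    p.add_argument("-u", "--uninstall", action="store_true")
    p.add_argument("-6", "--ipv6", action="store_true")
    return p
```

Parsing `-c 5 -L 50 -d 192.168.3.10 -a "..." -p` with this sketch yields a count of 5, a latency ceiling of 50 ms and the persist flag set, while timeout, interval and backoff keep their defaults.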
84 | 85 | ## Use cases and examples 86 | __(Example: Setting a ping monitor)__ Client machines have been experiencing sporadic connection timeouts when trying to SSH into a Linux server (192.168.3.10). You suspect potential packet loss or high latency somewhere in the network. For troubleshooting you choose to use MTR to check the network path when the issue occurs again (credits: https://github.com/traviscross/mtr). MTR will run from one of the impacted client machines: 87 | 88 | sudo threshold -c 5 -L 50 -d 192.168.3.10 -a "mtr -r -c 100 192.168.3.10 >> mtr-results.txt" -p 89 | 90 | The above example sets a simple ping monitor against (-d) *192.168.3.10*. If the host fails to respond to 5 consecutive pings (-c) __*OR*__ if the average latency of the 5 pings exceeds 50ms (-L), the MTR tool will execute with its own arguments (-a). 91 | 92 | __(Example: Setting a TCP handshake monitor)__ After troubleshooting some application issues, you noticed that you are getting occasional connection timeouts between your app server and database, "mydb.organization.org" (SQL/TCP 1433). You want to determine if this problem is due to a network issue or perhaps something higher up the stack. A packet capture with tcpdump may be appropriate at the next occurrence of the issue (credits: http://www.tcpdump.org/): 93 | 94 | sudo threshold -c 6 -d mydb.organization.org -P 1433 -a "tcpdump -i eth0 host mydb.organization.org -c 10 -w db_capture.pcap" -p 95 | 96 | The above example will continually monitor TCP handshakes with *mydb.organization.org*. Aside from default values used, if this destination fails to respond to 6 consecutive handshake attempts (-c) on TCP port 1433 (-P) then a tcpdump packet capture will run and export results to a Wireshark-readable file (-a).
Note that setting the -P argument here is the only factor that tells threshold to use TCP handshakes instead of pings. 97 | 98 | __(Example: Setting an HTTP/HTTPS file transfer monitor)__ You noticed that when downloading content from your webserver to your workstation, it sometimes takes longer than expected. From your particular network it usually takes about 5 minutes to download a 100MB file, but lately this is less frequently the case. 99 | 100 | Since previous ping and TCP-handshake monitors have come back clean, you decided that using an HTTP file transfer monitor in conjunction with an iperf3 test (the *action*) would be appropriate here. The iperf3 test will give you an idea of the raw throughput capabilities of your network the next time the issue occurs (credits: https://iperf.fr/iperf-download.php) 101 | 102 | sudo threshold -d http://mywebserver.com/some/100MBfile.zip -t 300 -b 10 -a "iperf3 -c mywebserver.com --time 300 --logfile iperf3-results.txt" -p 103 | 104 | Aside from default values, the above monitor will download a "100MBfile.zip" file from mywebserver.com. The download must complete within 5 minutes, or 300 seconds (-t). If this time is exceeded, the iperf3 action will be taken (-a). The interval between downloads is 10 seconds (-b). 105 | 106 | Note that threshold knows to use downloads as the monitor, rather than pings or TCP handshakes, because you have prefixed the host with *http://*, telling it that it's monitoring a webserver. 107 | 108 |
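In the ping example above, the monitor fails when the pings are lost or when their average latency exceeds the `-L` ceiling. A minimal sketch of that decision, assuming Python 3 and the `min/avg/max` summary line printed by common `ping` implementations; the function names are hypothetical and are not taken from `threshold.py`.

```python
import re

# Assumed summary format, e.g. "rtt min/avg/max/mdev = 0.045/0.052/0.061/0.007 ms".
# The pattern anchors on "=" so the second field (avg) is the one captured.
_SUMMARY = re.compile(r"=\s*\d+\.?\d*/(\d+\.?\d*)/")


def average_latency_ms(ping_output):
    """Return the average round-trip time in ms, or None if no summary line is found."""
    match = _SUMMARY.search(ping_output)
    return float(match.group(1)) if match else None


def ping_failed(returncode, ping_output, max_avg_ms):
    """Fail on a non-zero exit code, a missing summary, or an average above the ceiling."""
    avg = average_latency_ms(ping_output)
    return returncode != 0 or avg is None or avg > max_avg_ms
```

With `-L 50`, a run that averages 0.052 ms passes, while the same run checked against a much lower ceiling, or a run in which ping exits non-zero, counts as a failure.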
109 |
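The TCP handshake example relies on a count of consecutive failures (-c). A minimal sketch of that idea, assuming Python 3 and a plain socket connect; threshold itself shells out to `nc` and evaluates its results in batches of `-c`, so the running counter below is an illustrative simplification, and `handshake_ok` and `first_trigger` are hypothetical names.

```python
import socket


def handshake_ok(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def first_trigger(outcomes, count=3):
    """Index of the probe that completes `count` consecutive failures, else None."""
    failures = 0
    for index, ok in enumerate(outcomes):
        # Any successful handshake resets the streak.
        failures = 0 if ok else failures + 1
        if failures >= count:
            return index
    return None
```

A single successful handshake resets the streak, which is why one lost SYN in an otherwise healthy path does not trigger the action.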
110 | 111 | ## Tips for successful monitoring 112 | 1) Make sure monitors will not fail immediately upon starting them. For example, make sure you can actually ping *example.com* before setting up a monitor against it 113 | 114 | 2) If you use system variables (e.g. $HOME), be sure to enclose the action in double quotes ("") and escape the variable (e.g. \$HOME) 115 | 116 | 3) Don't make the monitors too aggressive. There is a good chance they will fail based on a false alarm 117 | 118 | 4) Don't make the monitors too tolerant. You could miss critical events. To balance too aggressive vs. too tolerant, research and understand as much about the issue as you can prior to configuring a monitor. 119 | 120 | 5) Consider setting up multiple types of monitoring against one destination, with multiple criteria, settings and actions (followup analysis). This is helpful when you don't know which layer an issue is happening at. 121 | 122 | 6) Match appropriate actions with appropriate monitors. For example, it may make more sense to run an MTR as a followup action to a ping monitor than to run an iperf3 test. Conversely, it may make more sense to run an iperf3 test as the followup action to an HTTP file transfer monitor. 123 | 124 | 7) Your actions should have self-contained limits. For example, you might want to specify a max filesize of 10MB on pcaps, or a time limit on an iperf3 test, etc. These limits help reduce overall consumption and load on your system when you're away. 125 | 126 |
127 |
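Tip 7 is easiest to follow when the action's own tool has a limit (for example a packet count on tcpdump or a duration on iperf3). Where it does not, a small wrapper script can enforce one. A minimal sketch, assuming Python 3; `run_action` is a hypothetical helper and is not part of threshold.

```python
import subprocess


def run_action(argv, limit_seconds):
    """Run a follow-up action, stopping it if it runs longer than limit_seconds.

    Returns the action's exit code, or None if it was stopped at the limit.
    """
    try:
        # subprocess.run kills the child process when the timeout expires.
        return subprocess.run(argv, timeout=limit_seconds).returncode
    except subprocess.TimeoutExpired:
        return None
```

The wrapper would then be what you pass to `-a`, so the limit travels with the action rather than depending on someone being around to stop it.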
128 | 129 | ## Threshold Logging 130 | You can specify the path where you want threshold to log information about jobs by using the __-o__ flag. Logging contains information such as when jobs started, when and if they failed, or if they were stopped by a user. This log tells you what happened to threshold jobs, *NOT* the subsequent actions that were executed. If you want the output and results of actions, you will need to define those output parameters within the action itself. 131 | 132 | Logging is helpful with all monitors, but especially the HTTP/HTTPs monitor. Not only will it tell you if a download failed to complete within a given timeout (-t), it will also log whether the download failed due to a particular HTTP/HTTPs error response code or if there was a TCP connection error. 133 | -------------------------------------------------------------------------------- /image.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/secureoptions/threshold/cc98e4abf99f3a07058a4092cf3d3423ce7bc587/image.png -------------------------------------------------------------------------------- /install.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | 3 | # Installation script must be run with superuser 4 | if [ "$(id -u)" -ne 0 ] 5 | then 6 | echo "" 7 | echo "Permission denied (run with 'sudo'). Installation exiting..." 8 | echo "" 9 | exit 1 10 | fi 11 | 12 | # Check if python 2.7 is installed 13 | python --version > /dev/null 2>&1 14 | 15 | result=$? 16 | 17 | if [ $result -ne 0 ] 18 | then 19 | echo "" 20 | echo 'You must have Python 2.7 installed before running this script' 21 | echo "" 22 | exit 1 23 | fi 24 | 25 | # Check if local system is a Mac 26 | sw_vers > /dev/null 2>&1 27 | 28 | result=$?
29 | if [ $result -eq 0 ] 30 | then 31 | mv threshold.py /usr/local/bin/threshold 32 | else 33 | mv threshold.py /usr/bin/threshold 34 | fi 35 | 36 | mkdir -p /etc/threshold 37 | mv threshold.1.man /usr/share/man/man1/threshold.1 38 | gzip -f /usr/share/man/man1/threshold.1 39 | 40 | echo "" 41 | echo "Installation successful! You can now safely remove this directory and install script" 42 | echo "" -------------------------------------------------------------------------------- /threshold-3.2.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/secureoptions/threshold/cc98e4abf99f3a07058a4092cf3d3423ce7bc587/threshold-3.2.tar.gz -------------------------------------------------------------------------------- /threshold.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python 2 | 3 | import os 4 | import sys 5 | 6 | VERSION="threshold v3.2" 7 | 8 | # if user is checking version 9 | try: 10 | if sys.argv[1] in ('-v', '--version'): 11 | print(VERSION) 12 | sys.exit() 13 | except IndexError: 14 | pass 15 | 16 | # Check if user is sudo 17 | if os.getuid() != 0: 18 | print('\nPermission ERROR: You must run this tool as a superuser\n' ) 19 | sys.exit() 20 | 21 | import re 22 | import signal 23 | import shutil 24 | import sqlite3 25 | import subprocess 26 | import time 27 | import threading 28 | import logging 29 | import getpass 30 | from datetime import datetime 31 | 32 | 33 | PORT=0 34 | TIMEOUT=1 35 | INTERVAL=5 36 | BACKOFF=60 37 | COUNT=3 38 | LATENCY = 500 39 | KILL=False 40 | LIST=False 41 | PERSIST=False 42 | UNINSTALL=False 43 | VERS=False 44 | IPV6=False 45 | LOGGING='/var/log/threshold.log' 46 | 47 | 48 | # Verify that at least one argument is used with tool 49 | if len(sys.argv) == 1: 50 | print('{}: You must specify at least one argument when using this tool:\n\n' 51 | '\t-d|--destination\n' 52 | '\t-t|--timeout\n' 53 | '\t-i|--interval\n'
54 | '\t-c|--count\n' 55 | '\t-a|--action\n' 56 | '\t-P|--port\n' 57 | '\t-k|--kill\n' 58 | '\t-b|--backoff\n' 59 | '\t-p|--persist\n' 60 | '\t-l|--list\n' 61 | '\t-u|--uninstall\n' 62 | '\t-6|--ipv6\n' 63 | '\t-L|--latency\n' 64 | '\t-o|--logging\n' 65 | '\t-v|--version\n\n' 66 | 'USAGE SYNTAX\n' 67 | 'sudo threshold [MONITOR OPTIONS] -a "[ACTION TO TAKE]"\n\n' 68 | 'Check out \'man threshold\' for more info' .format (sys.argv[0])) 69 | sys.exit() 70 | 71 | 72 | 73 | else: 74 | # Create one dict and one list. The one dict is of flags that require user parameters, the other list is of binary flags 75 | parameter_flags = {} 76 | binary_flags = [] 77 | for flag in sys.argv[1::]: 78 | # Flags requiring user-provided parameters 79 | accepted_flags = ['-L', '--latency','-d', '--destination','-t','--timeout', '-i','--interval', '-c','--count','-a', '--action','-P', '--port', '-o','--logging','-b', '--backoff'] 80 | # Binary Flags 81 | next_accepted_flags = ['-p', '--persist', '-l', '--list', '-u', '--uninstall', '-6', '--ipv6'] 82 | 83 | if flag in accepted_flags: 84 | try: 85 | if sys.argv[sys.argv.index(flag) + 1] not in accepted_flags + next_accepted_flags + ['-k', '--kill']: 86 | parameter_flags[flag] = sys.argv[sys.argv.index(flag) + 1] 87 | except: 88 | print('ERROR: Not all parameters were provided. 
See \'man threshold\' for assistance.') 89 | sys.exit() 90 | 91 | # could take optional parameter as well as be binary 92 | elif flag in ['-k', '--kill']: 93 | try: 94 | if re.match(r'\d{2,8}', sys.argv[sys.argv.index(flag) + 1]): 95 | KILL = sys.argv[sys.argv.index(flag) + 1] 96 | else: 97 | KILL = 'killall' 98 | except: 99 | KILL = 'killall' 100 | 101 | 102 | elif flag in next_accepted_flags: 103 | binary_flags.append(flag) 104 | 105 | 106 | for flag in parameter_flags: 107 | if flag in ('-d', '--destination'): 108 | HOSTIP = parameter_flags[flag] 109 | 110 | elif flag in ('-t','--timeout'): 111 | TIMEOUT = int(parameter_flags[flag]) 112 | 113 | elif flag in ('-i','--interval'): 114 | INTERVAL = int(parameter_flags[flag]) 115 | 116 | elif flag in ('-c','--count'): 117 | COUNT = int(parameter_flags[flag]) 118 | 119 | elif flag in ('-a', '--action'): 120 | ACTION = parameter_flags[flag] 121 | 122 | elif flag in ('-P', '--port'): 123 | PORT = int(parameter_flags[flag]) 124 | 125 | elif flag in ('-b', '--backoff'): 126 | BACKOFF = int(parameter_flags[flag]) 127 | 128 | elif flag in ('-L', '--latency'): 129 | LATENCY = int(parameter_flags[flag]) 130 | 131 | elif flag in ('-o', '--logging'): 132 | LOGGING = parameter_flags[flag] 133 | 134 | 135 | for flag in binary_flags: 136 | if flag in ('-p', '--persist'): 137 | PERSIST = True 138 | 139 | elif flag in ('-l', '--list'): 140 | LIST = True 141 | 142 | elif flag in ('-u', '--uninstall'): 143 | UNINSTALL = True 144 | 145 | elif flag in ('-6', '--ipv6'): 146 | IPV6 = True 147 | 148 | 149 | # Setup DB 150 | username = getpass.getuser() 151 | conn = sqlite3.connect('.threshold.db') 152 | c = conn.cursor() 153 | 154 | 155 | def close_things(): 156 | conn.commit() 157 | conn.close() 158 | sys.exit() 159 | 160 | 161 | c.execute('''CREATE TABLE IF NOT EXISTS jobs 162 | (pid integer, monitor text, criteria text, action text, persistent text, time text, logging text)''') 163 | 164 | def killall(): 165 | c.execute('SELECT 
pid,logging FROM jobs') 166 | pids = c.fetchall() 167 | 168 | if pids: 169 | print("\nKilling All jobs...") 170 | for pid in pids: 171 | # Set up logging 172 | LOGGING = pid[1] 173 | logging.basicConfig(format='%(asctime)s %(levelname)s: %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p', filename=LOGGING,level=logging.DEBUG) 174 | try: 175 | c.execute('DELETE FROM jobs WHERE pid=?', (pid[0],)) 176 | os.kill(pid[0], signal.SIGTERM) 177 | logging.info('User killed threshold job: ' + str(pid[0])) 178 | except: 179 | pass 180 | else: 181 | print("\nThere are no current jobs running...Exiting") 182 | 183 | 184 | # If user wants to uninstall threshold 185 | if UNINSTALL: 186 | response = raw_input("\nAre you sure you want to uninstall threshold?\n" 187 | "Choose Y or N.\n" 188 | "---------------------------------------------\n" 189 | "y\\N> ") or "n" 190 | 191 | if response.lower() == 'y': 192 | killall() 193 | try: 194 | os.remove('/usr/bin/threshold') 195 | except: 196 | pass 197 | try: 198 | os.remove('/usr/local/bin/threshold') 199 | except: 200 | pass 201 | try: 202 | os.remove('/usr/share/man/man1/threshold.1') 203 | os.remove('/usr/share/man/man1/threshold.1.gz') 204 | except: 205 | pass 206 | print('\nthreshold has been uninstalled...') 207 | 208 | # If user wants to list existing jobs 209 | elif LIST: 210 | c.execute('SELECT * FROM jobs') 211 | jobs = c.fetchall() 212 | 213 | if jobs: 214 | for j in jobs: 215 | print('\nJobID: {}\n' 216 | 'Monitor Type: {}\n' 217 | 'Failure Criteria: {}\n' 218 | 'Action: {}\n' 219 | 'Persistent: {}\n' 220 | 'Monitor Running Since: {}\n' 221 | 'Threshold Logging: {}\n' 222 | .format(j[0], j[1], j[2], j[3], j[4], j[5], j[6])) 223 | else: 224 | print('\nNo registered jobs.') 225 | 226 | 227 | 228 | # If user wants to kill one or ALL jobs 229 | elif KILL == 'killall': 230 | response = raw_input("\nAre you sure you want to kill ALL threshold jobs?\n" 231 | "Choose Y or N.\n" 232 | "---------------------------------------------\n" 233 | 
"y\\N > ") or "n" 234 | 235 | if response.lower() == 'y': 236 | killall() 237 | 238 | elif KILL != False: 239 | try: 240 | # Set up logging 241 | c.execute('SELECT logging FROM jobs WHERE pid=?', (int(KILL),)) 242 | try: 243 | LOGGING = c.fetchall()[0][0] 244 | logging.basicConfig(format='%(asctime)s %(levelname)s: %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p', filename=LOGGING,level=logging.DEBUG) 245 | c.execute('DELETE FROM jobs WHERE pid=?', (int(KILL),)) 246 | os.kill(int(KILL), signal.SIGTERM) 247 | logging.info('User killed threshold job: ' + KILL) 248 | except IndexError: 249 | print('\nThere is no job id {} currently registered with threshold' .format(KILL)) 250 | except: 251 | pass 252 | 253 | # If host ip exists, assume user is setting up a monitor 254 | elif HOSTIP: 255 | # Set up logging 256 | logging.basicConfig(format='%(asctime)s %(levelname)s: %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p', filename=LOGGING,level=logging.DEBUG) 257 | # current date and time 258 | now = datetime.utcnow() 259 | 260 | # fork parent process to start child threshold jobs with 261 | fork = os.fork() 262 | 263 | # If user wants to set up HTTP/S monitor 264 | hostip_list = HOSTIP.split('://') 265 | if len(hostip_list) > 1: 266 | 267 | def http_monitor(fork_pid): 268 | global PERSIST 269 | # Check if necessary parameters have been filled out 270 | if TIMEOUT <= 1: 271 | print('\nERROR: It doesn\'t look like you specified TIMEOUT (\'-t | --timeout\') for the download monitor') 272 | close_things() 273 | 274 | # Check if user specified Ipv6 275 | if IPV6: 276 | args = 'curl -g -6 -o /dev/null -s -k -m {} "{}"' .format(TIMEOUT,HOSTIP) 277 | else: 278 | args = 'curl -o /dev/null -s -k -m {} {}' .format(TIMEOUT,HOSTIP) 279 | 280 | # Create DB entry 281 | try: 282 | c.execute('''INSERT INTO jobs VALUES 283 | (?,?,?,?,?,?,?)''', (fork_pid, 'Loop download of ' + HOSTIP, 'Completes in > ' + str(TIMEOUT) + ' second(s)', ACTION, str(PERSIST), str(now) + ' UTC', LOGGING)) 284 | 
conn.commit() 285 | except NameError: 286 | print('\nERROR: You haven\'t defined an action to take with \'-a | --action\'') 287 | close_things() 288 | 289 | while True: 290 | # Check that HTTP(s) response code is 2xx or 3xx, but fail if anything else 291 | p = subprocess.Popen('curl -I -s -k ' + HOSTIP, shell=True, stdout=subprocess.PIPE).communicate()[0].decode('utf-8') 292 | 293 | m = re.search(r'HTTP/1.1 (\d+)', p) 294 | if m: 295 | code = m.group(1) 296 | if int(code[0]) not in [2,3]: 297 | # Get defined action 298 | c.execute('SELECT action FROM jobs WHERE pid=?', (fork_pid,)) 299 | action = c.fetchall()[0][0] 300 | 301 | # execute it 302 | subprocess.Popen(action, shell=True) 303 | logging.warning(str(fork_pid) + ' Monitor failed due HTTP(s) code (' + code + '): ' + args) 304 | 305 | logging.warning('Took action for ' + str(fork_pid) + ': ' + action) 306 | 307 | # Remove previous monitor job from since it triggered 308 | c.execute('DELETE FROM jobs WHERE pid=?', (fork_pid,)) 309 | conn.commit() 310 | PERSIST = False 311 | break 312 | 313 | # Start download 314 | p = subprocess.Popen(args, shell=True) 315 | 316 | while p.poll() is None: 317 | time.sleep(0.1) 318 | 319 | if p.returncode == 0: 320 | pass 321 | 322 | else: 323 | # Download did not finish within user-defined time. 
Execute defined action 324 | c.execute('SELECT action FROM jobs WHERE pid=?', (fork_pid,)) 325 | action = c.fetchall()[0][0] 326 | 327 | # execute it 328 | subprocess.Popen(action, shell=True) 329 | 330 | # Log action 331 | if p.returncode == 28: 332 | logging.warning(str(fork_pid) + ' Monitor failed due to taking too long: ' + args) 333 | else: 334 | logging.warning(str(fork_pid) + ' failed with curl error code \'' + str(p.returncode) + '\'') 335 | logging.warning(str(fork_pid) + ' reference error codes at https://curl.haxx.se/libcurl/c/libcurl-errors.html') 336 | logging.warning('Took action for ' + str(fork_pid) + ': ' + action) 337 | 338 | # Remove previous monitor job since it triggered 339 | c.execute('DELETE FROM jobs WHERE pid=?', (fork_pid,)) 340 | conn.commit() 341 | break 342 | time.sleep(int(BACKOFF)) 343 | 344 | # Run the monitor 345 | if not fork: 346 | while True: 347 | logging.info('HTTP/HTTPS Download Monitor Job (' + str(os.getpid()) + ') Started...') 348 | http_monitor(os.getpid()) 349 | if PERSIST == False: 350 | break 351 | 352 | 353 | # If user wants to set up a TCP handshake monitor 354 | elif PORT != 0: 355 | 356 | # Check if user specified Ipv6 357 | if IPV6: 358 | args = 'nc -6 -w {} {} {} >/dev/null 2>&1' .format(TIMEOUT, HOSTIP, PORT) 359 | else: 360 | # Use ipv4 361 | args = 'nc -w {} {} {} >/dev/null 2>&1' .format(TIMEOUT, HOSTIP, PORT) 362 | 363 | def tcp_monitor(fork_pid): 364 | # Create DB entry 365 | try: 366 | c.execute('''INSERT INTO jobs VALUES 367 | (?,?,?,?,?,?,?)''', (fork_pid, 'TCP handshake with ' + HOSTIP, 'Fails ' + str(COUNT) +' consecutive *OR* ' + str(COUNT) + ' complete with average time > ' + str(TIMEOUT) + ' second(s)', ACTION, str(PERSIST), str(now) + ' UTC', LOGGING)) 368 | conn.commit() 369 | except NameError: 370 | print('\nERROR: You haven\'t defined an action to take with \'-a | --action\'') 371 | close_things() 372 | 373 | # Continuously loop through nc handshake until it fails 374 | handshake_results = []
375 | while True: 376 | time.sleep(INTERVAL) 377 | p = subprocess.Popen(args, shell=True) 378 | while p.poll() is None: 379 | time.sleep(0.1) 380 | 381 | handshake_results.append(p.returncode) 382 | if len(handshake_results) == COUNT: 383 | if 0 in handshake_results: 384 | handshake_results = [] 385 | pass 386 | else: 387 | # The monitor has alarmed, take action!!! 388 | # Get action from DB 389 | c.execute('SELECT action FROM jobs WHERE pid=?', (fork_pid,)) 390 | action = c.fetchall()[0][0] 391 | 392 | # execute it 393 | subprocess.Popen(action, shell=True) 394 | 395 | # Log action 396 | logging.warning(str(fork_pid) + ' Monitor failed: ' + args) 397 | logging.warning('Took action for ' + str(fork_pid) + ' : ' + action) 398 | 399 | # Remove previous monitor job from since it failed 400 | c.execute('DELETE FROM jobs WHERE pid=?', (fork_pid,)) 401 | conn.commit() 402 | break 403 | 404 | # Run the monitor 405 | if not fork: 406 | while True: 407 | logging.info('TCP Handshake Monitor Job (' + str(os.getpid()) + ') Started...') 408 | tcp_monitor(os.getpid()) 409 | if PERSIST == False: 410 | break 411 | 412 | # If user wants to set up a ping monitor 413 | else: 414 | 415 | # Check if user specified Ipv6 416 | if IPV6: 417 | args = 'ping6 -c {} -i {} -W {} {}' .format(COUNT, INTERVAL, TIMEOUT, HOSTIP) 418 | else: 419 | args = 'ping -c {} -i {} -W {} {}' .format(COUNT, INTERVAL, TIMEOUT, HOSTIP) 420 | 421 | def ping_monitor(fork_pid): 422 | # Create DB entry 423 | try: 424 | c.execute('''INSERT INTO jobs VALUES 425 | (?,?,?,?,?,?,?)''', (fork_pid, 'Ping ' + HOSTIP, 'Fails ' +str(COUNT) + ' consecutive *OR* ' +str(COUNT)+ ' complete with average latency > ' + str(LATENCY) + ' ms', ACTION, str(PERSIST), str(now) + ' UTC', LOGGING)) 426 | conn.commit() 427 | except NameError: 428 | print('\nERROR: You haven\'t defined an action to take with \'-a | --action\'') 429 | close_things() 430 | 431 | # Continuously loop through ping 432 | while True: 433 | try: 434 | p = 
subprocess.Popen(args, shell=True, stdout=subprocess.PIPE) 435 | while p.poll() is None: 436 | time.sleep(0.1) 437 | output = p.communicate()[0].decode('utf-8') 438 | # Capture the avg field of the "min/avg/max[/mdev] = a/b/c[/d] ms" summary line 439 | m = re.search(r'= \d+\.?\d*/(\d+\.?\d*)/', output) 439 | if m: 440 | average_latency = m.group(1) 441 | 442 | if p.returncode != 0 or float(average_latency) > float(LATENCY): 443 | # The monitor has alarmed, take action!!! 444 | # Get action from DB 445 | c.execute('SELECT action FROM jobs WHERE pid=?', (fork_pid,)) 446 | action = c.fetchall()[0][0] 447 | 448 | # execute it 449 | subprocess.Popen(action, shell=True) 450 | 451 | # Log action 452 | logging.warning(str(fork_pid) + ' Ping Latency Monitor failed (avg latency ' + str(average_latency) + ' ms): ' + args) 453 | logging.warning('Took action for ' + str(fork_pid) + ' : ' + action) 454 | 455 | # Remove previous monitor job since it failed 456 | c.execute('DELETE FROM jobs WHERE pid=?', (fork_pid,)) 457 | conn.commit() 458 | break 459 | except: 460 | c.execute('SELECT action FROM jobs WHERE pid=?', (fork_pid,)) 461 | action = c.fetchall()[0][0] 462 | 463 | # execute it 464 | subprocess.Popen(action, shell=True) 465 | 466 | # Log action 467 | logging.warning(str(fork_pid) + ' Ping Monitor failed: ' + args) 468 | logging.warning('Took action for ' + str(fork_pid) + ' : ' + action) 469 | 470 | # Remove previous monitor job since it failed 471 | c.execute('DELETE FROM jobs WHERE pid=?', (fork_pid,)) 472 | conn.commit() 473 | break 474 | 475 | # Run the monitor 476 | if not fork: 477 | while True: 478 | logging.info('Ping Monitor Job (' + str(os.getpid()) + ') Started...') 479 | ping_monitor(os.getpid()) 480 | if PERSIST == False: 481 | break 482 | 483 | close_things() --------------------------------------------------------------------------------