├── LICENSE
├── README.md
├── image.png
├── install.sh
├── threshold-3.2.tar.gz
└── threshold.py
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2018 secureoptions
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, and distribute copies of the Software,
9 | and to permit persons to whom the Software is furnished to do so, subject to
10 | the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
23 | The author of this software is Benjamin J Fowler
24 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # threshold
2 | A simple tool that sets up a packet-loss/latency, TCP-handshake, or HTTP/HTTPS file-transfer monitor against a network host. The monitor executes a user-defined action when it detects a failure reaching the host.
3 |
4 | - [Installation](#installation)
5 | - [Command-Line Structure](#command-structure)
6 | - [Parameters](#parameters)
7 | - [Examples and Scenarios](#use-cases-and-examples)
8 | - [Tips and Best Practices](#tips-for-successful-monitoring)
9 | - [Threshold Logging](#threshold-logging)
10 |
11 |
12 |
13 | ## Installation
14 | \# *From a Linux terminal, download the source code from GitHub*
15 | `wget https://github.com/secureoptions/threshold/raw/master/threshold-3.2.tar.gz`
16 |
17 | \# *Unpack and change into installation directory*
18 | `tar -xvzf threshold-3.2.tar.gz`
19 | `cd threshold-3.2/`
20 |
21 | \# *Run the installation script*
22 | `sudo ./install.sh`
23 |
24 | \# *Verify installation*
25 | `threshold --version`
26 |
27 |
28 |
29 | ## Command Structure
30 | `sudo threshold [MONITOR OPTIONS] -a "[ACTION TO TAKE]"`
31 |
32 |
33 |
34 |
35 |
36 | ## Parameters
37 | __-a | --action__
38 | The user-defined action to take if the threshold monitor is triggered. This can be just about any command that you can execute from a Linux system CLI.
39 |
40 | __-b | --backoff__
41 | Default is 60. Only used when the target host (-d) begins with http:// or https://. This is the interval in seconds between consecutive download tests. It is sometimes needed if a target webserver throttles consecutive web requests from the same source.
42 |
43 | __-c | --count__
44 | Default is 3. The number of *consecutive* pings that must fail before the threshold monitor is considered to be in a failed state. If used with TCP handshakes (-P), it's the number of consecutive handshakes that must fail before the monitor is considered in a failed state.
45 |
46 | __-d | --destination__
47 | The target host IP or DNS hostname that you want to monitor. If this host becomes unresponsive for the parameters you define, then the action (-a "") is taken. __Important Note:__ if you use a prefix of http:// or https:// in the destination, this will create a download monitor which checks the actual download transfer time against the timeout (-t) that you have defined. If the download transfer time exceeds the timeout, then the monitor is considered in a failed state.
48 |
49 | __-i | --interval__
50 | Default is 5. The interval in seconds at which individual ping packets are sent. If used with TCP (-P), the interval in seconds at which TCP handshakes are initiated.
51 |
52 | __-k | --kill__
53 | Use to kill either a specific threshold monitor job (e.g. threshold -k 3509) or ALL jobs (e.g. threshold -k).
54 |
55 | __-l | --list__
56 | List the active threshold monitor jobs
57 |
58 | __-L | --latency__
59 | Default is 500. The maximum *average* latency, in milliseconds, that a ping monitor tolerates. The average is calculated over the number of pings the monitor sends (-c), so a higher count gives a more representative reading.
60 |
61 | __-o | --logging__
62 | User-defined log location for threshold. This is logging for threshold monitoring (not to be confused with *action* output). Default location is /var/log/threshold.log
63 |
64 | __-p | --persist__
65 | When setting this argument, your threshold monitor will remain persistent even if it has failed once. In this scenario, the monitor will execute the action you define, and then start itself again with same job parameters.
66 |
67 | __-P | --port__
68 | The TCP port that will be used to establish TCP handshakes on. Using this flag will also cause threshold to use a TCP-handshake monitor rather than a ping monitor.
69 |
70 | __-t | --timeout__
71 | Default is 1. The time in seconds to wait for a ping reply or a TCP SYN/ACK from the target. With (-P), the timeout is also the time to wait before sending FIN on successful TCP connections. With an HTTP/HTTPS monitor, it is the maximum time a download may take before the monitor enters a 'failed' state and triggers the action; threshold requires a value greater than 1 in this mode.
72 |
73 | __-u | --uninstall__
74 | Uninstall threshold from your system. This will also stop any current jobs you have running.
75 |
76 | __-v | --version__
77 | The current version of threshold
78 |
79 | __-6 | --ipv6__
80 | Uses IPv6 rather than IPv4.
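
The flags and defaults listed above can be summarised in code. The following is only a sketch of an equivalent parser built on Python's standard `argparse` module; threshold itself parses `sys.argv` by hand, so the function name `build_parser` and the use of `argparse` are assumptions made for illustration. The `-u` and `-v` flags are left out for brevity.

```python
import argparse


def build_parser():
    """Hypothetical argparse mirror of the documented flags and defaults."""
    p = argparse.ArgumentParser(prog="threshold")
    p.add_argument("-a", "--action")
    p.add_argument("-b", "--backoff", type=int, default=60)
    p.add_argument("-c", "--count", type=int, default=3)
    p.add_argument("-d", "--destination")
    p.add_argument("-i", "--interval", type=int, default=5)
    # -k takes an optional job id; with no id it means "kill all jobs"
    p.add_argument("-k", "--kill", nargs="?", const="killall")
    p.add_argument("-l", "--list", action="store_true")
    p.add_argument("-L", "--latency", type=int, default=500)
    p.add_argument("-o", "--logging", default="/var/log/threshold.log")
    p.add_argument("-p", "--persist", action="store_true")
    p.add_argument("-P", "--port", type=int, default=0)
    p.add_argument("-t", "--timeout", type=int, default=1)
    p.add_argument("-6", "--ipv6", action="store_true")
    return p


args = build_parser().parse_args(
    ["-c", "5", "-L", "50", "-d", "192.168.3.10",
     "-a", "mtr -r -c 100 192.168.3.10", "-p"]
)
print(args.count, args.latency, args.interval, args.timeout, args.persist)
# prints: 5 50 5 1 True
```

Flags that are not passed keep the defaults documented above, which is why `interval` and `timeout` print as 5 and 1.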
81 |
82 |
83 |
84 |
85 | ## Use cases and examples
86 | __(Example: Setting a ping monitor)__ Client machines have been experiencing sporadic connection timeouts when trying to SSH into a Linux server (192.168.3.10). You suspect packet loss or high latency somewhere in the network. For troubleshooting, you choose to use MTR to check the network path when the issue occurs again (credits: https://github.com/traviscross/mtr). MTR will run from one of the impacted client machines:
87 |
88 |     sudo threshold -c 5 -L 50 -d 192.168.3.10 -a "mtr -r -c 100 192.168.3.10 >> mtr-results.txt" -p
89 |
90 | The above example sets a simple ping monitor against (-d) *192.168.3.10*. If the host fails to respond to 5 consecutive pings (-c) __*OR*__ if the average latency of the 5 pings exceeds 50ms (-L), the MTR tool will execute with its own arguments (-a).
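
The failure rule described above (no replies, *or* average latency over the limit) can be sketched as a small function. This is an illustration rather than threshold's own code: the name `ping_failed` is hypothetical, and the regular expression assumes the `min/avg/max` summary line that common `ping` implementations print.

```python
import re

# Captures the second field (avg) after the "=" in ping's summary line, e.g.
# "rtt min/avg/max/mdev = 10.1/62.4/120.9/30.2 ms"
AVG_RE = re.compile(r"=\s*\d+\.?\d*/(\d+\.?\d*)/")


def ping_failed(returncode, output, max_latency_ms):
    """Return True when a finished ping batch should trigger the action."""
    if returncode != 0:
        # ping reported a failure (for example, no replies at all)
        return True
    match = AVG_RE.search(output)
    if match is None:
        # No summary line to judge latency by; treat as not failed
        return False
    return float(match.group(1)) > float(max_latency_ms)


sample = "rtt min/avg/max/mdev = 10.1/62.4/120.9/30.2 ms"
print(ping_failed(0, sample, 50))   # True: 62.4 ms average exceeds 50 ms
print(ping_failed(0, sample, 100))  # False: average is under the limit
print(ping_failed(1, "", 50))       # True: ping itself reported failure
```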
91 |
92 | __(Example: Setting a TCP handshake monitor)__ After troubleshooting some application issues, you noticed that you are getting occasional connection timeouts between your app server and database, "mydb.organization.org" (SQL/TCP 1433). You want to determine if this problem is due to a network issue or perhaps something higher up the stack. A packet capture with tcpdump may be appropriate at the next occurrence of the issue (credits: http://www.tcpdump.org/):
93 |
94 |     sudo threshold -c 6 -d mydb.organization.org -P 1433 -a "tcpdump -i eth0 host mydb.organization.org -c 10 -w db_capture.pcap" -p
95 |
96 | The above example will continually monitor TCP handshakes with *mydb.organization.org*. Aside from default values used, if this destination fails to respond to 6 consecutive handshake attempts (-c) on TCP port 1433 (-P), then a tcpdump packet capture will run and export results to a Wireshark-readable file (-a). Note that setting the -P argument here is the only factor that tells threshold to use TCP handshakes instead of pings.
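
As a rough illustration of the consecutive-failure rule, the sketch below collects probe results in windows of `count` and reports failure only when a full window contains no success. The function names are hypothetical, the probe uses Python's `socket` module rather than the `nc` command threshold shells out to, and the sleep between probes (-i) is omitted so the example runs instantly.

```python
import socket


def tcp_handshake_ok(host, port, timeout=1.0):
    """Attempt one TCP connection; True if the handshake completes in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probes_until_failure(probe, count, max_probes):
    """Call probe() up to max_probes times.

    Results are grouped into windows of `count` probes. A window containing
    any success is discarded; a window with no success is a failure.
    Returns the 1-based probe number that completed the failing window,
    or None if no window failed.
    """
    window = []
    for n in range(1, max_probes + 1):
        window.append(probe())
        if len(window) == count:
            if any(window):
                window = []
            else:
                return n
    return None


# Simulated outcomes instead of real network probes: one success, then failures
outcomes = iter([True, False, False, False, False, False])
print(probes_until_failure(lambda: next(outcomes), count=3, max_probes=6))
# prints: 6
```

In the simulated run, the first window (success, fail, fail) is discarded because it contains a success, so the failure is only reported once the second window of three failures completes. To probe a real host, a callable such as `lambda: tcp_handshake_ok("mydb.organization.org", 1433)` could be passed as `probe`.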
97 |
98 | __(Example: Setting an HTTP/HTTPS file transfer monitor)__ You noticed that when downloading content from your webserver to your workstation, it sometimes takes longer than expected. From your particular network it usually takes about 5 minutes to download a 100MB file, but lately this is less frequently the case.
99 |
100 | Since previous ping and TCP-handshake monitors have come back clean, you decided that using an HTTP file transfer monitor in conjunction with an iperf3 test (the *action*) would be appropriate here. The iperf3 test will give you an idea of the raw throughput capabilities of your network the next time the issue occurs (credits: https://iperf.fr/iperf-download.php):
101 |
102 |     sudo threshold -d http://mywebserver.com/some/100MBfile.zip -t 300 -b 10 -a "iperf3 -c mywebserver.com --time 300 --logfile iperf3-results.txt" -p
103 |
104 | Aside from default values, the above monitor will download a "100MBfile.zip" file from mywebserver.com. The download must complete within 5 minutes, or 300 seconds (-t). If this time is exceeded, the iperf3 action will be taken (-a). Consecutive downloads are spaced 10 seconds apart (-b).
105 |
106 | Note that threshold knows to use downloads as the monitor, rather than pings or TCP handshakes, because you have prefixed the host with *http://*, telling it that it is monitoring a webserver.
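
The download check relies on `curl` and its exit status. The sketch below builds a comparable `curl` argument list and classifies the exit code. The helper names are hypothetical; the flags (`-o`, `-s`, `-k`, `-m`, `-g`, `-6`) and exit code 28 for a timeout are standard curl behaviour. The command is passed as a list rather than a shell string, which is a design choice of this sketch, not something the tool is known to do.

```python
CURL_TIMEOUT_EXIT = 28  # curl's exit code when the -m/--max-time limit is hit


def build_curl_command(url, timeout, ipv6=False):
    """Build a curl command that discards the body and enforces a time limit."""
    cmd = ["curl", "-o", "/dev/null", "-s", "-k", "-m", str(timeout)]
    if ipv6:
        # -g disables URL globbing so bracketed IPv6 literals pass through
        cmd += ["-g", "-6"]
    cmd.append(url)
    return cmd


def classify_curl_exit(returncode):
    """Map a curl exit code to a short status string."""
    if returncode == 0:
        return "ok"
    if returncode == CURL_TIMEOUT_EXIT:
        return "timeout"
    return "curl-error-%d" % returncode


print(" ".join(build_curl_command("http://mywebserver.com/some/100MBfile.zip", 300)))
# prints: curl -o /dev/null -s -k -m 300 http://mywebserver.com/some/100MBfile.zip
print(classify_curl_exit(28))  # prints: timeout
```

The resulting list could be handed to `subprocess.run(cmd).returncode`, with the monitor treating anything other than `"ok"` as a failure.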
107 |
108 |
109 |
110 |
111 | ## Tips for successful monitoring
112 | 1) Make sure monitors will not fail immediately upon starting them. For example, make sure you can actually ping *example.com* before setting up a monitor against it.
113 |
114 | 2) If you use system variables (e.g. $HOME), be sure to enclose the action in double quotes ("") and escape the variable (e.g. \$HOME).
115 |
116 | 3) Don't make the monitors too aggressive. Overly aggressive monitors are likely to trigger on false alarms.
117 |
118 | 4) Don't make the monitors too tolerant. You could miss critical events. To balance too aggressive vs. too tolerant, research and understand as much about the issue as you can prior to configuring a monitor.
119 |
120 | 5) Consider setting up multiple types of monitoring against one destination, with multiple criteria, settings and actions (followup analysis). This is helpful when you don't know which layer an issue is happening at.
121 |
122 | 6) Match appropriate actions with appropriate monitors. For example, it may make more sense to run an MTR as the followup action to a ping monitor than to run an iperf3 test. Conversely, it may make more sense to run an iperf3 test as the followup action to an HTTP file transfer monitor.
123 |
124 | 7) Your actions should have self-contained limits. For example, you might want to specify a max filesize of 10MB on pcaps, or a timelimit on iperf3 test, etc. These limits help reduce overall consumption and load on your system when you're away.
125 |
126 |
127 |
128 |
129 | ## Threshold Logging
130 | You can specify the path where threshold logs information about jobs by using the __-o__ flag. The log contains information such as when jobs started, when and if they failed, or if they were stopped by a user. This log tells you what happened to threshold jobs, *NOT* to the subsequent actions that were executed. If you want the output and results of actions, you will need to define those output parameters within the action itself.
131 |
132 | Logging is helpful with all monitors, but especially the HTTP/HTTPS monitor. Not only will it tell you if a download failed to complete within a given timeout (-t), it will also log whether the download failed due to a particular HTTP/HTTPS error response code or if there was a TCP connection error.
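
To show what entries in the log look like, the sketch below writes two sample lines using the same format and date strings that threshold.py passes to Python's `logging` module. The logger name, temporary file, and job id 3509 are stand-ins chosen for this example.

```python
import logging
import os
import tempfile

# Format strings as used in threshold.py's logging.basicConfig(...) calls
FORMAT = "%(asctime)s %(levelname)s: %(message)s"
DATEFMT = "%m/%d/%Y %I:%M:%S %p"

# Write to a throwaway file instead of /var/log/threshold.log
path = os.path.join(tempfile.mkdtemp(), "threshold.log")
handler = logging.FileHandler(path)
handler.setFormatter(logging.Formatter(FORMAT, datefmt=DATEFMT))

logger = logging.getLogger("threshold-demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)

logger.info("Ping Monitor Job (3509) Started...")
logger.warning("3509 Ping Monitor failed: ping -c 5 -i 5 -W 1 192.168.3.10")
handler.close()

with open(path) as fh:
    lines = fh.read().splitlines()

for line in lines:
    print(line)
```

Each line starts with a timestamp, followed by the level (`INFO` or `WARNING`) and the message, so failures can be picked out of the log by level.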
133 |
--------------------------------------------------------------------------------
/image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/secureoptions/threshold/cc98e4abf99f3a07058a4092cf3d3423ce7bc587/image.png
--------------------------------------------------------------------------------
/install.sh:
--------------------------------------------------------------------------------
1 | #! /bin/bash
2 |
3 | # Installation script must be run with superuser
4 | if [ "$(id -u)" -ne 0 ]
5 | then
6 |     echo ""
7 |     echo "Permission denied (run with 'sudo'). Installation exiting..."
8 |     echo ""
9 |     exit 1
10 | fi
11 |
12 | # Check if python 2.7 is installed
13 | python --version > /dev/null 2>&1
14 |
15 | result=$?
16 |
17 | if [ $result -ne 0 ]
18 | then
19 |     echo ""
20 |     echo 'You must have Python 2.7 installed before running this script'
21 |     echo ""
22 |     exit 1
23 | fi
24 |
25 | # Check if local system is a Mac
26 | sw_vers > /dev/null 2>&1
27 |
28 | result=$?
29 | if [ $result -eq 0 ]
30 | then
31 |     mv threshold.py /usr/local/bin/threshold
32 | else
33 |     mv threshold.py /usr/bin/threshold
34 | fi
35 |
36 | mkdir -p /etc/threshold
37 | mv threshold.1.man /usr/share/man/man1/threshold.1
38 | gzip -f /usr/share/man/man1/threshold.1
39 |
40 | echo ""
41 | echo "Installation successful! You can now safely remove this directory and install script"
42 | echo ""
--------------------------------------------------------------------------------
/threshold-3.2.tar.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/secureoptions/threshold/cc98e4abf99f3a07058a4092cf3d3423ce7bc587/threshold-3.2.tar.gz
--------------------------------------------------------------------------------
/threshold.py:
--------------------------------------------------------------------------------
1 | #! /usr/bin/env python
2 |
3 | import os
4 | import sys
5 |
6 | VERSION="threshold v3.2"
7 |
8 | # if user is checking version
9 | try:
10 |     if sys.argv[1] in ('-v', '--version'):
11 |         print(VERSION)
12 |         sys.exit()
13 | except IndexError:
14 |     pass
15 |
16 | # Check if user is sudo
17 | if os.getuid() != 0:
18 |     print('\nPermission ERROR: You must run this tool as a superuser\n' )
19 |     sys.exit()
20 |
21 | import re
22 | import signal
23 | import shutil
24 | import sqlite3
25 | import subprocess
26 | import time
27 | import threading
28 | import logging
29 | import getpass
30 | from datetime import datetime
31 |
32 | HOSTIP=None
33 | PORT=0
34 | TIMEOUT=1
35 | INTERVAL=5
36 | BACKOFF=60
37 | COUNT=3
38 | LATENCY = 500
39 | KILL=False
40 | LIST=False
41 | PERSIST=False
42 | UNINSTALL=False
43 | VERS=False
44 | IPV6=False
45 | LOGGING='/var/log/threshold.log'
46 |
47 |
48 | # Verify that at least one argument is used with tool
49 | if len(sys.argv) == 1:
50 |     print('{}: You must specify at least one argument when using this tool:\n\n'
51 |           '\t-d|--destination\n'
52 |           '\t-t|--timeout\n'
53 |           '\t-i|--interval\n'
54 |           '\t-c|--count\n'
55 |           '\t-a|--action\n'
56 |           '\t-P|--port\n'
57 |           '\t-k|--kill\n'
58 |           '\t-b|--backoff\n'
59 |           '\t-p|--persist\n'
60 |           '\t-l|--list\n'
61 |           '\t-u|--uninstall\n'
62 |           '\t-6|--ipv6\n'
63 |           '\t-L|--latency\n'
64 |           '\t-o|--logging\n'
65 |           '\t-v|--version\n\n'
66 |           'USAGE SYNTAX\n'
67 |           'sudo threshold [MONITOR OPTIONS] -a "[ACTION TO TAKE]"\n\n'
68 |           'Check out \'man threshold\' for more info' .format(sys.argv[0]))
69 |     sys.exit()
70 |
71 |
72 |
73 | else:
74 |     # Create one dict and one list. The dict holds flags that require user parameters, the list holds binary flags
75 |     parameter_flags = {}
76 |     binary_flags = []
77 |     for flag in sys.argv[1::]:
78 |         # Flags requiring user-provided parameters
79 |         accepted_flags = ['-L', '--latency','-d', '--destination','-t','--timeout', '-i','--interval', '-c','--count','-a', '--action','-P', '--port', '-o','--logging','-b', '--backoff']
80 |         # Binary Flags
81 |         next_accepted_flags = ['-p', '--persist', '-l', '--list', '-u', '--uninstall', '-6', '--ipv6']
82 |
83 |         if flag in accepted_flags:
84 |             try:
85 |                 if sys.argv[sys.argv.index(flag) + 1] not in accepted_flags + next_accepted_flags + ['-k', '--kill']:
86 |                     parameter_flags[flag] = sys.argv[sys.argv.index(flag) + 1]
87 |             except IndexError:
88 |                 print('ERROR: Not all parameters were provided. See \'man threshold\' for assistance.')
89 |                 sys.exit()
90 |
91 |         # could take optional parameter as well as be binary
92 |         elif flag in ['-k', '--kill']:
93 |             try:
94 |                 if re.match(r'\d{2,8}', sys.argv[sys.argv.index(flag) + 1]):
95 |                     KILL = sys.argv[sys.argv.index(flag) + 1]
96 |                 else:
97 |                     KILL = 'killall'
98 |             except IndexError:
99 |                 KILL = 'killall'
100 |
101 |
102 |         elif flag in next_accepted_flags:
103 |             binary_flags.append(flag)
104 |
105 |
106 | for flag in parameter_flags:
107 |     if flag in ('-d', '--destination'):
108 |         HOSTIP = parameter_flags[flag]
109 |
110 |     elif flag in ('-t','--timeout'):
111 |         TIMEOUT = int(parameter_flags[flag])
112 |
113 |     elif flag in ('-i','--interval'):
114 |         INTERVAL = int(parameter_flags[flag])
115 |
116 |     elif flag in ('-c','--count'):
117 |         COUNT = int(parameter_flags[flag])
118 |
119 |     elif flag in ('-a', '--action'):
120 |         ACTION = parameter_flags[flag]
121 |
122 |     elif flag in ('-P', '--port'):
123 |         PORT = int(parameter_flags[flag])
124 |
125 |     elif flag in ('-b', '--backoff'):
126 |         BACKOFF = int(parameter_flags[flag])
127 |
128 |     elif flag in ('-L', '--latency'):
129 |         LATENCY = int(parameter_flags[flag])
130 |
131 |     elif flag in ('-o', '--logging'):
132 |         LOGGING = parameter_flags[flag]
133 |
134 |
135 | for flag in binary_flags:
136 |     if flag in ('-p', '--persist'):
137 |         PERSIST = True
138 |
139 |     elif flag in ('-l', '--list'):
140 |         LIST = True
141 |
142 |     elif flag in ('-u', '--uninstall'):
143 |         UNINSTALL = True
144 |
145 |     elif flag in ('-6', '--ipv6'):
146 |         IPV6 = True
147 |
148 |
149 | # Setup DB
150 | username = getpass.getuser()
151 | conn = sqlite3.connect('.threshold.db')
152 | c = conn.cursor()
153 |
154 |
155 | def close_things():
156 |     conn.commit()
157 |     conn.close()
158 |     sys.exit()
159 |
160 |
161 | c.execute('''CREATE TABLE IF NOT EXISTS jobs
162 |              (pid integer, monitor text, criteria text, action text, persistent text, time text, logging text)''')
163 |
164 | def killall():
165 |     c.execute('SELECT pid,logging FROM jobs')
166 |     pids = c.fetchall()
167 |
168 |     if pids:
169 |         print("\nKilling All jobs...")
170 |         for pid in pids:
171 |             # Set up logging
172 |             LOGGING = pid[1]
173 |             logging.basicConfig(format='%(asctime)s %(levelname)s: %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p', filename=LOGGING,level=logging.DEBUG)
174 |             try:
175 |                 c.execute('DELETE FROM jobs WHERE pid=?', (pid[0],))
176 |                 os.kill(pid[0], signal.SIGTERM)
177 |                 logging.info('User killed threshold job: ' + str(pid[0]))
178 |             except:
179 |                 pass
180 |     else:
181 |         print("\nThere are no current jobs running...Exiting")
182 |
183 |
184 | # If user wants to uninstall threshold
185 | if UNINSTALL:
186 |     response = raw_input("\nAre you sure you want to uninstall threshold?\n"
187 |                          "Choose Y or N.\n"
188 |                          "---------------------------------------------\n"
189 |                          "y\\N> ") or "n"
190 |
191 |     if response.lower() == 'y':
192 |         killall()
193 |         try:
194 |             os.remove('/usr/bin/threshold')
195 |         except:
196 |             pass
197 |         try:
198 |             os.remove('/usr/local/bin/threshold')
199 |         except:
200 |             pass
201 |         try:
202 |             os.remove('/usr/share/man/man1/threshold.1')
203 |             os.remove('/usr/share/man/man1/threshold.1.gz')
204 |         except:
205 |             pass
206 |         print('\nthreshold has been uninstalled...')
207 |
208 | # If user wants to list existing jobs
209 | elif LIST:
210 |     c.execute('SELECT * FROM jobs')
211 |     jobs = c.fetchall()
212 |
213 |     if jobs:
214 |         for j in jobs:
215 |             print('\nJobID: {}\n'
216 |                   'Monitor Type: {}\n'
217 |                   'Failure Criteria: {}\n'
218 |                   'Action: {}\n'
219 |                   'Persistent: {}\n'
220 |                   'Monitor Running Since: {}\n'
221 |                   'Threshold Logging: {}\n'
222 |                   .format(j[0], j[1], j[2], j[3], j[4], j[5], j[6]))
223 |     else:
224 |         print('\nNo registered jobs.')
225 |
226 |
227 |
228 | # If user wants to kill one or ALL jobs
229 | elif KILL == 'killall':
230 |     response = raw_input("\nAre you sure you want to kill ALL threshold jobs?\n"
231 |                          "Choose Y or N.\n"
232 |                          "---------------------------------------------\n"
233 |                          "y\\N > ") or "n"
234 |
235 |     if response.lower() == 'y':
236 |         killall()
237 |
238 | elif KILL != False:
239 |     try:
240 |         # Set up logging
241 |         c.execute('SELECT logging FROM jobs WHERE pid=?', (int(KILL),))
242 |         try:
243 |             LOGGING = c.fetchall()[0][0]
244 |             logging.basicConfig(format='%(asctime)s %(levelname)s: %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p', filename=LOGGING,level=logging.DEBUG)
245 |             c.execute('DELETE FROM jobs WHERE pid=?', (int(KILL),))
246 |             os.kill(int(KILL), signal.SIGTERM)
247 |             logging.info('User killed threshold job: ' + KILL)
248 |         except IndexError:
249 |             print('\nThere is no job id {} currently registered with threshold' .format(KILL))
250 |     except:
251 |         pass
252 |
253 | # If host ip exists, assume user is setting up a monitor
254 | elif HOSTIP:
255 |     # Set up logging
256 |     logging.basicConfig(format='%(asctime)s %(levelname)s: %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p', filename=LOGGING,level=logging.DEBUG)
257 |     # current date and time
258 |     now = datetime.utcnow()
259 |
260 |     # fork parent process to start child threshold jobs with
261 |     fork = os.fork()
262 |
263 |     # If user wants to set up HTTP/S monitor
264 |     hostip_list = HOSTIP.split('://')
265 |     if len(hostip_list) > 1:
266 |
267 |         def http_monitor(fork_pid):
268 |             global PERSIST
269 |             # Check if necessary parameters have been filled out
270 |             if TIMEOUT <= 1:
271 |                 print('\nERROR: It doesn\'t look like you specified TIMEOUT (\'-t | --timeout\') for the download monitor')
272 |                 close_things()
273 |
274 |             # Check if user specified Ipv6 (the URL is quoted so shell metacharacters in it are not interpreted)
275 |             if IPV6:
276 |                 args = 'curl -g -6 -o /dev/null -s -k -m {} "{}"' .format(TIMEOUT,HOSTIP)
277 |             else:
278 |                 args = 'curl -o /dev/null -s -k -m {} "{}"' .format(TIMEOUT,HOSTIP)
279 |
280 |             # Create DB entry
281 |             try:
282 |                 c.execute('''INSERT INTO jobs VALUES
283 |                              (?,?,?,?,?,?,?)''', (fork_pid, 'Loop download of ' + HOSTIP, 'Completes in > ' + str(TIMEOUT) + ' second(s)', ACTION, str(PERSIST), str(now) + ' UTC', LOGGING))
284 |                 conn.commit()
285 |             except NameError:
286 |                 print('\nERROR: You haven\'t defined an action to take with \'-a | --action\'')
287 |                 close_things()
288 |
289 |             while True:
290 |                 # Check that HTTP(s) response code is 2xx or 3xx, but fail if anything else
291 |                 p = subprocess.Popen('curl -I -s -k "' + HOSTIP + '"', shell=True, stdout=subprocess.PIPE).communicate()[0].decode('utf-8')
292 |
293 |                 m = re.search(r'HTTP/\d(?:\.\d)? (\d+)', p)
294 |                 if m:
295 |                     code = m.group(1)
296 |                     if int(code[0]) not in [2,3]:
297 |                         # Get defined action
298 |                         c.execute('SELECT action FROM jobs WHERE pid=?', (fork_pid,))
299 |                         action = c.fetchall()[0][0]
300 |
301 |                         # execute it
302 |                         subprocess.Popen(action, shell=True)
303 |                         logging.warning(str(fork_pid) + ' Monitor failed due to HTTP(s) code (' + code + '): ' + args)
304 |
305 |                         logging.warning('Took action for ' + str(fork_pid) + ': ' + action)
306 |
307 |                         # Remove previous monitor job since it triggered
308 |                         c.execute('DELETE FROM jobs WHERE pid=?', (fork_pid,))
309 |                         conn.commit()
310 |                         PERSIST = False
311 |                         break
312 |
313 |                 # Start download
314 |                 p = subprocess.Popen(args, shell=True)
315 |
316 |                 while p.poll() is None:
317 |                     time.sleep(0.1)
318 |
319 |                 if p.returncode == 0:
320 |                     pass
321 |
322 |                 else:
323 |                     # Download failed or did not finish within user-defined time. Execute defined action
324 |                     c.execute('SELECT action FROM jobs WHERE pid=?', (fork_pid,))
325 |                     action = c.fetchall()[0][0]
326 |
327 |                     # execute it
328 |                     subprocess.Popen(action, shell=True)
329 |
330 |                     # Log action (curl exit code 28 means the operation timed out)
331 |                     if p.returncode == 28:
332 |                         logging.warning(str(fork_pid) + ' Monitor failed due to taking too long: ' + args)
333 |                     else:
334 |                         logging.warning(str(fork_pid) + ' failed with curl error code \'' + str(p.returncode) + '\'')
335 |                         logging.warning(str(fork_pid) + ' reference error codes at https://curl.haxx.se/libcurl/c/libcurl-errors.html')
336 |                     logging.warning('Took action for ' + str(fork_pid) + ': ' + action)
337 |
338 |                     # Remove previous monitor job since it triggered
339 |                     c.execute('DELETE FROM jobs WHERE pid=?', (fork_pid,))
340 |                     conn.commit()
341 |                     break
342 |                 time.sleep(int(BACKOFF))
343 |
344 |         # Run the monitor
345 |         if not fork:
346 |             while True:
347 |                 logging.info('HTTP/HTTPS Download Monitor Job (' + str(os.getpid()) + ') Started...')
348 |                 http_monitor(os.getpid())
349 |                 if PERSIST == False:
350 |                     break
351 |
352 |
353 |     # If user wants to set up a TCP handshake monitor
354 |     elif PORT != 0:
355 |
356 |         # Check if user specified Ipv6
357 |         if IPV6:
358 |             args = 'nc -6 -w {} {} {} >/dev/null 2>&1' .format(TIMEOUT, HOSTIP, PORT)
359 |         else:
360 |             # Use IPv4
361 |             args = 'nc -w {} {} {} >/dev/null 2>&1' .format(TIMEOUT, HOSTIP, PORT)
362 |
363 |         def tcp_monitor(fork_pid):
364 |             # Create DB entry
365 |             try:
366 |                 c.execute('''INSERT INTO jobs VALUES
367 |                              (?,?,?,?,?,?,?)''', (fork_pid, 'TCP handshake with ' + HOSTIP, 'Fails ' + str(COUNT) +' consecutive *OR* ' + str(COUNT) + ' complete with average time > ' + str(TIMEOUT) + ' second(s)', ACTION, str(PERSIST), str(now) + ' UTC', LOGGING))
368 |                 conn.commit()
369 |             except NameError:
370 |                 print('\nERROR: You haven\'t defined an action to take with \'-a | --action\'')
371 |                 close_things()
372 |
373 |             # Continuously loop through nc handshake until it fails
374 |             handshake_results = []
375 |             while True:
376 |                 time.sleep(INTERVAL)
377 |                 p = subprocess.Popen(args, shell=True)
378 |                 while p.poll() is None:
379 |                     time.sleep(0.1)
380 |
381 |                 handshake_results.append(p.returncode)
382 |                 if len(handshake_results) == COUNT:
383 |                     if 0 in handshake_results:
384 |                         handshake_results = []
385 |                         pass
386 |                     else:
387 |                         # The monitor has alarmed, take action!!!
388 |                         # Get action from DB
389 |                         c.execute('SELECT action FROM jobs WHERE pid=?', (fork_pid,))
390 |                         action = c.fetchall()[0][0]
391 |
392 |                         # execute it
393 |                         subprocess.Popen(action, shell=True)
394 |
395 |                         # Log action
396 |                         logging.warning(str(fork_pid) + ' Monitor failed: ' + args)
397 |                         logging.warning('Took action for ' + str(fork_pid) + ' : ' + action)
398 |
399 |                         # Remove previous monitor job since it failed
400 |                         c.execute('DELETE FROM jobs WHERE pid=?', (fork_pid,))
401 |                         conn.commit()
402 |                         break
403 |
404 |         # Run the monitor
405 |         if not fork:
406 |             while True:
407 |                 logging.info('TCP Handshake Monitor Job (' + str(os.getpid()) + ') Started...')
408 |                 tcp_monitor(os.getpid())
409 |                 if PERSIST == False:
410 |                     break
411 |
412 |     # If user wants to set up a ping monitor
413 |     else:
414 |
415 |         # Check if user specified Ipv6
416 |         if IPV6:
417 |             args = 'ping6 -c {} -i {} -W {} {}' .format(COUNT, INTERVAL, TIMEOUT, HOSTIP)
418 |         else:
419 |             args = 'ping -c {} -i {} -W {} {}' .format(COUNT, INTERVAL, TIMEOUT, HOSTIP)
420 |
421 |         def ping_monitor(fork_pid):
422 |             # Create DB entry
423 |             try:
424 |                 c.execute('''INSERT INTO jobs VALUES
425 |                              (?,?,?,?,?,?,?)''', (fork_pid, 'Ping ' + HOSTIP, 'Fails ' +str(COUNT) + ' consecutive *OR* ' +str(COUNT)+ ' complete with average latency > ' + str(LATENCY) + ' ms', ACTION, str(PERSIST), str(now) + ' UTC', LOGGING))
426 |                 conn.commit()
427 |             except NameError:
428 |                 print('\nERROR: You haven\'t defined an action to take with \'-a | --action\'')
429 |                 close_things()
430 |
431 |             # Continuously loop through ping
432 |             while True:
433 |                 try:
434 |                     p = subprocess.Popen(args, shell=True, stdout=subprocess.PIPE)
435 |                     while p.poll() is None:
436 |                         time.sleep(0.1)
437 |                     output = p.communicate()[0].decode('utf-8')
438 |                     m = re.search(r'=\s*\d+\.?\d*/(\d+\.?\d*)/', output)  # second field of min/avg/max is the average
439 |                     # None when ping printed no summary statistics (e.g. no replies were received)
440 |                     average_latency = m.group(1) if m else None
441 |
442 |                     if p.returncode != 0 or (average_latency is not None and float(average_latency) > float(LATENCY)):
443 |                         # The monitor has alarmed, take action!!!
444 |                         # Get action from DB
445 |                         c.execute('SELECT action FROM jobs WHERE pid=?', (fork_pid,))
446 |                         action = c.fetchall()[0][0]
447 |
448 |                         # execute it
449 |                         subprocess.Popen(action, shell=True)
450 |
451 |                         # Log action
452 |                         logging.warning(str(fork_pid) + ' Ping Latency Monitor failed (avg latency ' + str(average_latency) + ' ms): ' + args)
453 |                         logging.warning('Took action for ' + str(fork_pid) + ' : ' + action)
454 |
455 |                         # Remove previous monitor job since it failed
456 |                         c.execute('DELETE FROM jobs WHERE pid=?', (fork_pid,))
457 |                         conn.commit()
458 |                         break
459 |                 except:
460 |                     c.execute('SELECT action FROM jobs WHERE pid=?', (fork_pid,))
461 |                     action = c.fetchall()[0][0]
462 |
463 |                     # execute it
464 |                     subprocess.Popen(action, shell=True)
465 |
466 |                     # Log action
467 |                     logging.warning(str(fork_pid) + ' Ping Monitor failed: ' + args)
468 |                     logging.warning('Took action for ' + str(fork_pid) + ' : ' + action)
469 |
470 |                     # Remove previous monitor job since it failed
471 |                     c.execute('DELETE FROM jobs WHERE pid=?', (fork_pid,))
472 |                     conn.commit()
473 |                     break
474 |
475 |         # Run the monitor
476 |         if not fork:
477 |             while True:
478 |                 logging.info('Ping Monitor Job (' + str(os.getpid()) + ') Started...')
479 |                 ping_monitor(os.getpid())
480 |                 if PERSIST == False:
481 |                     break
482 |
483 | close_things()
--------------------------------------------------------------------------------