
├── .gitignore
├── LICENSE
├── Makefile
├── Makefile.freebsd
├── NOTES.md
├── README.md
├── dpinger.c
├── influx
│   ├── README.md
│   ├── dpinger_grafana_dashboard.json
│   ├── dpinger_influx_logger
│   └── dpinger_start.sh
└── rrd
    ├── README.md
    ├── dpinger_rrd_create
    ├── dpinger_rrd_gencgi
    ├── dpinger_rrd_graph
    ├── dpinger_rrd_update
    └── sample.html


/.gitignore:
--------------------------------------------------------------------------------
/dpinger
/dpinger.debug
/dpinger.full
/dpinger.o
/.*.swp
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) 2015-2022, Denny Page
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
#CC=clang
#CFLAGS=-Wall -Wextra -pthread -g -O2

all: dpinger

.PHONY: clean
clean:
	rm -f dpinger
--------------------------------------------------------------------------------
/Makefile.freebsd:
--------------------------------------------------------------------------------
PROG= dpinger
MAN=

BINDIR= ${PREFIX}/bin
WARNS= 6

LDADD= -lpthread

.include
--------------------------------------------------------------------------------
/NOTES.md:
--------------------------------------------------------------------------------
Loss accuracy

In general, dpinger works a bit differently than other latency monitors. Rather than a "probe" that fires off and processes a handful of echo requests and replies all at once, dpinger maintains a rolling array of echo requests spaced on the send interval. In other words, instead of waking up every second and sending 4 echo requests at once, dpinger sends an echo request every 250 milliseconds. When dpinger receives an echo reply, the time difference between the request packet and reply packet (latency) is recorded. There is nothing that times out an echo request/reply and records it as permanently lost.
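The rolling array described above can be sketched in C as a fixed-size ring indexed by sequence number modulo the array size, which is how `send_thread` and `recv_thread` in dpinger.c address it. This is a simplified model, not dpinger's code: the names `ring_send`, `ring_recv`, `ring_tally` and `RING_SIZE` are hypothetical, timestamps are passed in as microsecond counts instead of being read from `CLOCK_MONOTONIC`, and there is no synchronization between threads. `ring_tally` models the report-time pass: unanswered requests older than the loss interval count as lost, but only for that pass.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified model of a rolling echo-request array.
 * Times are plain microsecond counts supplied by the caller.
 */
#define RING_SIZE 8

enum slot_status { SLOT_EMPTY, SLOT_SENT, SLOT_RECEIVED };

struct slot {
    enum slot_status status;
    unsigned long sent_usec;     /* when the request was sent */
    unsigned long latency_usec;  /* meaningful only when SLOT_RECEIVED */
};

static struct slot ring[RING_SIZE];

/* Record request `seq` sent at `now_usec`, reusing the slot of request seq - RING_SIZE. */
static void ring_send(unsigned int seq, unsigned long now_usec)
{
    struct slot *s = &ring[seq % RING_SIZE];

    s->status = SLOT_SENT;
    s->sent_usec = now_usec;
    s->latency_usec = 0;
}

/* Record the reply for `seq`; returns -1 if the slot is not awaiting a reply. */
static int ring_recv(unsigned int seq, unsigned long now_usec)
{
    struct slot *s = &ring[seq % RING_SIZE];

    if (s->status != SLOT_SENT)
        return -1;
    s->latency_usec = now_usec - s->sent_usec;
    s->status = SLOT_RECEIVED;
    return 0;
}

/*
 * Walk the whole array: replies count as received, unanswered requests
 * older than loss_usec count as lost, younger ones are still pending.
 * Nothing is marked lost permanently; a late reply still updates the slot.
 */
static void ring_tally(unsigned long now_usec, unsigned long loss_usec,
                       unsigned int *received, unsigned int *lost)
{
    *received = 0;
    *lost = 0;
    for (size_t i = 0; i < RING_SIZE; i++) {
        if (ring[i].status == SLOT_RECEIVED)
            (*received)++;
        else if (ring[i].status == SLOT_SENT &&
                 now_usec - ring[i].sent_usec > loss_usec)
            (*lost)++;
    }
}
```

Because a late reply still flips its slot to received, a request counted as lost in one tally is counted as received in the next, which mirrors the non-permanent loss decision described in these notes.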

When the alert check is made, or a report is generated, dpinger goes through the array and examines each echo request. If a reply has been received, its latency is used as part of the overall latency calculation. If a reply has not yet been received, the time elapsed since the request was sent is compared against the loss interval. If it is greater than the loss interval, the request/reply is counted as lost in the current report. However, this loss determination is not permanent: if the missing reply arrives before a subsequent report, its latency is used in that report instead of being counted as lost.

It's important to keep in mind that latency and loss are reported as averages across the entire request set. The default time period for dpinger is 60 seconds, with an echo request being sent every 500 milliseconds. This means that latency and loss are reported as averages across 116-120 samples (requests sent within the last loss interval that have not yet been answered are counted neither as received nor as lost). The alert check runs every second by default, so at each check the 2 oldest entries in the set have been replaced by the 2 newest ones.

Note that if you want accurate loss reporting, it is important that the number of samples be sufficient. To achieve 1% loss resolution, you need at least 100 samples in the set. The calculation for loss resolution is:

100 / ((time_period - loss_interval) / send_interval)

With the default settings (60 second time period, 500 millisecond send interval, and a loss interval of 4x the send interval, or 2 seconds), dpinger reports loss with a resolution of approximately 0.86% (100 / 116).
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# dpinger

dpinger is a daemon for continuous monitoring of latency and loss on a network connection. It is
intended for use by firewalls to monitor link health, as well as for providing information to
various monitoring systems such as Cacti, Nagios, Zabbix, etc.
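The loss-resolution formula from NOTES.md, 100 / ((time_period - loss_interval) / send_interval), can be checked with a small helper. This is a sketch under stated assumptions: the function name `loss_resolution_pct` is hypothetical, all intervals are taken in milliseconds, and the sample count is truncated to a whole number of requests.

```c
/*
 * Hypothetical helper: loss resolution in percent, per
 * 100 / ((time_period - loss_interval) / send_interval).
 * All arguments are in milliseconds; returns -1.0 for unusable input.
 */
static double loss_resolution_pct(unsigned long time_period_ms,
                                  unsigned long loss_interval_ms,
                                  unsigned long send_interval_ms)
{
    unsigned long samples;

    if (send_interval_ms == 0 || time_period_ms <= loss_interval_ms)
        return -1.0;

    /* Requests old enough to be counted as either received or lost */
    samples = (time_period_ms - loss_interval_ms) / send_interval_ms;
    if (samples == 0)
        return -1.0;

    return 100.0 / (double) samples;
}
```

With a 60 s period, a 500 ms send interval and the 4x-send-interval loss default listed in dpinger's usage text (2 s), the helper counts 116 samples, about 0.86%. The `-s 5s -t 600s` usage example (default loss interval 20 s) also yields 116 samples and the same resolution.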

The output of dpinger can be either file or socket based, and consists of three numbers:

    <Average Latency in μs> <Standard Deviation in μs> <Percentage of Loss>

dpinger also provides for invocation of a command based upon threshold values
for Average Latency or Percentage of Loss. Arguments to the command are:

    <Target Address> <Alarm Flag (1 or 0)> <Average Latency in μs> <Standard Deviation in μs> <Percentage of Loss>

In addition to command invocation, dpinger can also log alerts via syslog.

If several instances of dpinger are being used to monitor different targets, or the same target
with different source addresses, etc., an Identifier can be added to the output to identify
which instance of dpinger is the source. This is particularly useful with syslog.
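A consumer of the file or socket output has to split each report into its three numbers. Below is a minimal sketch, assuming no identifier (`-i`) is configured so each line holds only `latency_avg latency_stddev loss_pct`; the struct and function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical container for one dpinger report line. */
struct dpinger_report {
    unsigned long latency_avg_usec;
    unsigned long latency_stddev_usec;
    unsigned long loss_pct;
};

/*
 * Parse "latency_avg latency_stddev loss_pct" (no identifier prefix).
 * Returns 0 on success, -1 if the line does not hold three numbers
 * or the loss percentage is outside 0-100.
 */
static int parse_report_line(const char *line, struct dpinger_report *out)
{
    if (sscanf(line, "%lu %lu %lu", &out->latency_avg_usec,
               &out->latency_stddev_usec, &out->loss_pct) != 3)
        return -1;
    if (out->loss_pct > 100)
        return -1;
    return 0;
}
```

If an identifier is configured, dpinger prepends it to each line, so that prefix would have to be skipped before calling a parser like this one.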

Usage examples:

    dpinger -t 300s -r 60s 192.168.0.1 >> /tmp/dpinger.out

Monitor IP address 192.168.0.1 for latency and loss. Average results over 5 minutes.
Produce a report every 60 seconds and append it to /tmp/dpinger.out.

    dpinger -r 0 -S -D 250m -L 20 -p /run/dpinger 192.168.0.1

Monitor IP address 192.168.0.1 for latency and loss. Log alerts via syslog if latency
exceeds 250 milliseconds or loss exceeds 20 percent. Record process id in /run/dpinger.

    dpinger -f -B 192.168.0.50 -r 10s 192.168.0.1

Monitor IP address 192.168.0.1 for latency and loss. Use 192.168.0.50 as the address
for sending and receiving ICMP packets. Run in the foreground and report status via
stdout every 10 seconds.

    dpinger -R -o /tmp/gw.status -L 35% -C "/var/etc/alert igb1" fe80::1

Monitor IP address fe80::1 for latency and loss. Maintain a status file in
/tmp/gw.status with the current status. If packet loss exceeds 35%, invoke the following
command:

    /var/etc/alert igb1 fe80::1 <Alarm Flag (1 or 0)> <Average Latency in μs> <Standard Deviation in μs> <Percentage of Loss>

The command will be invoked with an alarm value of 1 when loss exceeds 35%, and again
with an alarm value of 0 when loss drops back below 35% and the alarm hold time has elapsed.

    dpinger -r 0 -s 200m -u /tmp/igb1.status -p /run/dpinger fe80::1

Monitor IP address fe80::1 for latency and loss. Send echo requests every 200 milliseconds.
Make current status available on demand via a Unix domain socket /tmp/igb1.status. Record
process id in /run/dpinger.

    dpinger -S -i Comcast -s 5s -t 600s -r 0 -L 10% -p /run/dpinger 8.8.8.8

Monitor IP address 8.8.8.8 for latency and loss. Send echo requests every five seconds and
average results over 10 minutes. Log alerts via syslog including identifier string "Comcast"
if average loss exceeds 10 percent. Record process id in /run/dpinger.
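An alert command such as `/var/etc/alert` in the example above receives dpinger's values as trailing arguments. The sketch below is one hypothetical way for a C alert handler to read them. It assumes the argument order given in dpinger's usage text (dest_addr alarm_flag latency_avg latency_stddev loss_pct) and reads the last five arguments, so fixed arguments supplied with `-C` (like `igb1`) are skipped. All names are illustrative only.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical view of the arguments dpinger appends to an alert command. */
struct dpinger_alert {
    const char *target;               /* dest_addr as passed by dpinger */
    int alarm_on;                     /* 1 = alarm, 0 = clear */
    unsigned long latency_avg_usec;
    unsigned long latency_stddev_usec;
    unsigned long loss_pct;
};

/* Parse a whole decimal string; returns -1 on empty input, junk or overflow. */
static int parse_ul(const char *s, unsigned long *v)
{
    char *end;

    errno = 0;
    *v = strtoul(s, &end, 10);
    return (errno != 0 || end == s || *end != '\0') ? -1 : 0;
}

/*
 * Read the last five arguments, which are the ones dpinger appends;
 * anything between argv[0] and them came from the -C string itself.
 */
static int parse_alert_args(int argc, char *argv[], struct dpinger_alert *out)
{
    unsigned long flag;

    if (argc < 6)
        return -1;
    argv += argc - 5;

    out->target = argv[0];
    if (parse_ul(argv[1], &flag) || flag > 1 ||
        parse_ul(argv[2], &out->latency_avg_usec) ||
        parse_ul(argv[3], &out->latency_stddev_usec) ||
        parse_ul(argv[4], &out->loss_pct))
        return -1;
    out->alarm_on = (int) flag;
    return 0;
}
```

A handler's `main` would call `parse_alert_args(argc, argv, &alert)` and act on `alert.alarm_on`. Since dpinger runs the command via `system()` and waits for it to finish, a handler along these lines should return quickly.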
--------------------------------------------------------------------------------
/dpinger.c:
--------------------------------------------------------------------------------

//
// Copyright (c) 2015-2023, Denny Page
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
// TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//


// Silly that this is required for accept4 on Linux
#define _GNU_SOURCE


#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

#include
#include


// Who we are
static const char * progname;

// Process ID file
static const char * pidfile_name = NULL;

// Flags
static unsigned int foreground = 0;
static unsigned int flag_rewind = 0;
static unsigned int flag_syslog = 0;
static unsigned int flag_priority = 0;

// String representation of target
#define ADDR_STR_MAX (INET6_ADDRSTRLEN + IF_NAMESIZE + 1)
static char dest_str[ADDR_STR_MAX];

// Time period over which we are averaging results in ms
static unsigned long time_period_msec = 60000;

// Interval between sends in ms
static unsigned long send_interval_msec = 500;

// Interval before a sequence is initially treated as lost
// Input from command line in ms and used in us
static unsigned long loss_interval_msec = 0;
static unsigned long loss_interval_usec = 0;

// Interval between reports in ms
static unsigned long report_interval_msec = 1000;

// Interval between alert checks in ms
static unsigned long alert_interval_msec = 1000;

// Threshold for triggering alarms based on latency
// Input from command line in ms and used in us
static unsigned long latency_alarm_threshold_msec = 0;
static unsigned long latency_alarm_threshold_usec = 0;

// Threshold for triggering alarms based on loss percentage
static unsigned long loss_alarm_threshold_percent = 0;

// Command to invoke for alerts
static char * alert_cmd = NULL;
static size_t alert_cmd_offset;

// Interval before an alarm is cleared (hold time)
static unsigned long alarm_hold_msec = 0;
#define DEFAULT_HOLD_PERIODS 10

// Report file
static const char * report_name = NULL;
static int report_fd;

// Unix socket
static const char * usocket_name = NULL;
static int usocket_fd;

static char identifier[64] = "\0";

// Length of maximum output (dest_str alarm_flag average_latency_usec latency_deviation average_loss_percent)
#define OUTPUT_MAX (sizeof(identifier) + sizeof(dest_str) + sizeof(" 1 999999999999 999999999999 100\0"))


// Main ping status array
typedef struct
{
    enum
    {
        PACKET_STATUS_EMPTY = 0,
        PACKET_STATUS_SENT = 1,
        PACKET_STATUS_RECEIVED = 2
    } status;

    struct timespec time_sent;
    unsigned long latency_usec;
} ping_entry_t;

static ping_entry_t * array;
static unsigned int array_size;
static unsigned int next_slot = 0;


// Sockets used to send and receive
static int send_sock;
static int recv_sock;

// IPv4 / IPv6 parameters
static uint16_t af_family = AF_INET;                 // IPv6: AF_INET6
static uint8_t echo_request_type = ICMP_ECHO;        // IPv6: ICMP6_ECHO_REQUEST
static uint8_t echo_reply_type = ICMP_ECHOREPLY;     // IPv6: ICMP6_ECHO_REPLY
static int ip_proto = IPPROTO_ICMP;                  // IPv6: IPPROTO_ICMPV6

// Destination address
static struct sockaddr_storage dest_addr;
static socklen_t dest_addr_len;

// Source (bind) address
static struct sockaddr_storage bind_addr;
static socklen_t bind_addr_len = 0;

// ICMP echo request/reply header
//
// The physical layout of the ICMP is the same between IPv4 and IPv6 so we define our
// own type for convenience
typedef struct
{
    uint8_t type;
    uint8_t code;
    uint16_t cksum;
    uint16_t id;
    uint16_t sequence;
} icmphdr_t;

// Echo request/reply packet buffers
#define IPV4_ICMP_DATA_MAX (IP_MAXPACKET - sizeof(struct ip) - sizeof(icmphdr_t))
#define IPV6_ICMP_DATA_MAX (IP_MAXPACKET - sizeof(icmphdr_t))
#define PACKET_BUFLEN (IP_MAXPACKET + 256)

static unsigned long echo_data_len = 0;
static unsigned int echo_request_len = sizeof(icmphdr_t);
static unsigned int echo_reply_len = IP_MAXPACKET;
static icmphdr_t * echo_request;
static void * echo_reply;

// Echo id and Sequence information
static uint16_t echo_id;
static uint16_t next_sequence = 0;
static uint16_t sequence_limit;

// Receive thread ready
static unsigned int recv_ready = 0;


//
// Log for abnormal events
//
__attribute__ ((format (printf, 1, 2)))
static void
logger(
    const char * format,
    ...)
{
    va_list args;

    va_start(args, format);
    if (flag_syslog)
    {
        vsyslog(LOG_WARNING, format, args);
    }
    else
    {
        vfprintf(stderr, format, args);
    }
    va_end(args);
}


//
// Termination handler
//
__attribute__ ((noreturn))
static void
term_handler(
    int signum)
{
    // NB: This function may be simultaneously invoked by multiple threads
    if (usocket_name)
    {
        (void) unlink(usocket_name);
    }
    if (pidfile_name)
    {
        (void) unlink(pidfile_name);
    }
    logger("exiting on signal %d\n", signum);
    exit(0);
}


//
// Compute checksum for ICMP
//
static uint16_t
cksum(
    const uint16_t * p,
    int len)
{
    uint32_t sum = 0;

    while (len > 1)
    {
        sum += *p++;
        len -= sizeof(*p);
    }

    if (len == 1)
    {
        sum += (uint16_t) *((const uint8_t *) p);
    }

    sum = (sum >> 16) + (sum & 0xFFFF);
    sum += (sum >> 16);

    return (uint16_t) ~sum;
}


//
// sqrt function for standard deviation
//
static unsigned long
llsqrt(
    unsigned long long x)
{
    unsigned long long prev;
    unsigned long long s;

    s = x;
    if (s)
    {
        prev = ~((unsigned long long) 1 << 63);

        while (s < prev)
        {
            prev = s;
            s = (s + (x / s)) / 2;
        }
    }

    return (unsigned long) s;
}


//
// Compute delta between old time and new time in microseconds
//
static unsigned long
ts_elapsed_usec(
    const struct timespec * old,
    const struct timespec * new)
{
    long r_usec;

    // Note that we are using monotonic clock and time cannot run backwards
    if (new->tv_nsec >= old->tv_nsec)
    {
        r_usec = (new->tv_sec - old->tv_sec) * 1000000 + (new->tv_nsec - old->tv_nsec) / 1000;
    }
    else
    {
        r_usec = (new->tv_sec - old->tv_sec - 1) * 1000000 + (1000000000 + new->tv_nsec - old->tv_nsec) / 1000;
    }

    return (unsigned long) r_usec;
}


//
// Send thread
//
__attribute__ ((noreturn))
static void *
send_thread(
    __attribute__ ((unused))
    void * arg)
{
    struct timespec sleeptime;
    ssize_t len;
    int r;

    // Set up our echo request packet
    memset(echo_request, 0, echo_request_len);
    echo_request->type = echo_request_type;
    echo_request->code = 0;
    echo_request->id = echo_id;

    // Give the recv thread a moment to initialize
    sleeptime.tv_sec = 0;
    sleeptime.tv_nsec = 10000; // 10us
    do {
        r = nanosleep(&sleeptime, NULL);
        if (r == -1)
        {
            logger("nanosleep error in send thread waiting for recv thread: %d\n", errno);
        }
    } while (recv_ready == 0);

    // Set up the timespec for nanosleep
    sleeptime.tv_sec = send_interval_msec / 1000;
    sleeptime.tv_nsec = (send_interval_msec % 1000) * 1000000;

    while (1)
    {
        // Set sequence number and checksum
        echo_request->sequence = htons(next_sequence);
        echo_request->cksum = 0;
        echo_request->cksum = cksum((uint16_t *) echo_request, sizeof(icmphdr_t));

        array[next_slot].status = PACKET_STATUS_EMPTY;
        sched_yield();

        clock_gettime(CLOCK_MONOTONIC, &array[next_slot].time_sent);
        array[next_slot].status = PACKET_STATUS_SENT;
        len = sendto(send_sock, echo_request, echo_request_len, 0, (struct sockaddr *) &dest_addr, dest_addr_len);
        if (len == -1)
        {
            logger("%s%s: sendto error: %d\n", identifier, dest_str, errno);
        }

        next_slot = (next_slot + 1) % array_size;
        next_sequence = (next_sequence + 1) % sequence_limit;

        r = nanosleep(&sleeptime, NULL);
        if (r == -1)
        {
            logger("nanosleep error in send thread: %d\n", errno);
        }
    }
}


//
// Receive thread
//
__attribute__ ((noreturn))
static void *
recv_thread(
    __attribute__ ((unused))
    void * arg)
{
    struct sockaddr_storage src_addr;
    socklen_t src_addr_len;
    ssize_t len;
    icmphdr_t * icmp;
    struct timespec now;
    unsigned int array_slot;

    // Thread startup complete
    recv_ready = 1;

    while (1)
    {
        src_addr_len = sizeof(src_addr);
        len = recvfrom(recv_sock, echo_reply, echo_reply_len, 0, (struct sockaddr *) &src_addr, &src_addr_len);
        if (len == -1)
        {
            logger("%s%s: recvfrom error: %d\n", identifier, dest_str, errno);
            continue;
        }
        clock_gettime(CLOCK_MONOTONIC, &now);

        if (af_family == AF_INET)
        {
            struct ip * ip;
            size_t ip_len;

            // With IPv4, we get the entire IP packet
            if (len < (ssize_t) sizeof(struct ip))
            {
                logger("%s%s: received packet too small for IP header\n", identifier, dest_str);
                continue;
            }
            ip = echo_reply;
            ip_len = (size_t) ip->ip_hl << 2;

            icmp = (void *) ((char *) ip + ip_len);
            len -= ip_len;
        }
        else
        {
            // With IPv6, we just get the ICMP payload
            icmp = echo_reply;
        }

        // This should never happen
        if (len < (ssize_t) sizeof(icmphdr_t))
        {
            logger("%s%s: received packet too small for ICMP header\n", identifier, dest_str);
            continue;
        }

        // If it's not an echo reply for us, skip the packet
        if (icmp->type != echo_reply_type || icmp->id != echo_id)
        {
            continue;
        }

        array_slot = ntohs(icmp->sequence) % array_size;
        if (array[array_slot].status == PACKET_STATUS_RECEIVED)
        {
            logger("%s%s: duplicate echo reply received\n", identifier, dest_str);
            continue;
        }

        array[array_slot].latency_usec = ts_elapsed_usec(&array[array_slot].time_sent, &now);
        array[array_slot].status = PACKET_STATUS_RECEIVED;
    }
}


//
// Generate a report
//
static void
report(
    unsigned long *average_latency_usec,
    unsigned long *latency_deviation,
    unsigned long *average_loss_percent)
{
    struct timespec now;
    unsigned long packets_received = 0;
    unsigned long packets_lost = 0;
    unsigned long latency_usec = 0;
    unsigned long total_latency_usec = 0;
    unsigned long long total_latency_usec2 = 0;
    unsigned int slot;
    unsigned int i;

    clock_gettime(CLOCK_MONOTONIC, &now);

    slot = next_slot;
    for (i = 0; i < array_size; i++)
    {
        if (array[slot].status == PACKET_STATUS_RECEIVED)
        {
            packets_received++;
            latency_usec = array[slot].latency_usec;
            total_latency_usec += latency_usec;
            total_latency_usec2 += (unsigned long long) latency_usec * latency_usec;
        }
        else if (array[slot].status == PACKET_STATUS_SENT &&
                 ts_elapsed_usec(&array[slot].time_sent, &now) > loss_interval_usec)
        {
            packets_lost++;
        }

        slot = (slot + 1) % array_size;
    }

    if (packets_received)
    {
        unsigned long avg = total_latency_usec / packets_received;
        unsigned long long avg2 = total_latency_usec2 / packets_received;

        // stddev = sqrt((sum(rtt^2) / packets) - (sum(rtt) / packets)^2)
        *average_latency_usec = avg;
        *latency_deviation = llsqrt(avg2 - ((unsigned long long) avg * avg));
    }
    else
    {
        *average_latency_usec = 0;
        *latency_deviation = 0;
    }

    if (packets_lost)
    {
        *average_loss_percent = packets_lost * 100 / (packets_received + packets_lost);
    }
    else
    {
        *average_loss_percent = 0;
    }
}


//
// Report thread
//
__attribute__ ((noreturn))
static void *
report_thread(
    __attribute__ ((unused))
    void * arg)
{
    char buf[OUTPUT_MAX];
    struct timespec sleeptime;
    unsigned long average_latency_usec;
    unsigned long latency_deviation;
    unsigned long average_loss_percent;
    ssize_t len;
    ssize_t rs;
    int r;

    // Set up the timespec for nanosleep
    sleeptime.tv_sec = report_interval_msec / 1000;
    sleeptime.tv_nsec = (report_interval_msec % 1000) * 1000000;

    while (1)
    {
        r = nanosleep(&sleeptime, NULL);
        if (r == -1)
        {
            logger("nanosleep error in report thread: %d\n", errno);
        }

        report(&average_latency_usec, &latency_deviation, &average_loss_percent);

        len = snprintf(buf, sizeof(buf), "%s%lu %lu %lu\n", identifier, average_latency_usec, latency_deviation, average_loss_percent);
        if (len < 0 || (size_t) len >= sizeof(buf))
        {
            // Do not write a failed or truncated report
            logger("error formatting output in report thread\n");
            continue;
        }

        rs = write(report_fd, buf, (size_t) len);
        if (rs == -1)
        {
            logger("write error in report thread: %d\n", errno);
        }
        else if (rs != len)
        {
            logger("short write in report thread: %zd/%zd\n", rs, len);
        }

        if (flag_rewind)
        {
            (void) ftruncate(report_fd, len);
            (void) lseek(report_fd, 0, SEEK_SET);
        }
    }
}


//
// Alert thread
//
__attribute__ ((noreturn))
static void *
alert_thread(
    __attribute__ ((unused))
    void * arg)
{
    struct timespec sleeptime;
    unsigned long average_latency_usec;
    unsigned long latency_deviation;
    unsigned long average_loss_percent;
    unsigned int alarm_hold_periods;
    unsigned int latency_alarm_decay = 0;
    unsigned int loss_alarm_decay = 0;
    unsigned int alert = 0;
    unsigned int alarm_on;
    int r;

    // Set up the timespec for nanosleep
    sleeptime.tv_sec = alert_interval_msec / 1000;
    sleeptime.tv_nsec = (alert_interval_msec % 1000) * 1000000;

    // Set number of alarm hold periods
    alarm_hold_periods = (unsigned int) ((alarm_hold_msec + alert_interval_msec - 1) / alert_interval_msec);

    while (1)
    {
        r = nanosleep(&sleeptime, NULL);
        if (r == -1)
        {
            logger("nanosleep error in alert thread: %d\n", errno);
        }

        report(&average_latency_usec, &latency_deviation, &average_loss_percent);

        if (latency_alarm_threshold_usec)
        {
            if (average_latency_usec > latency_alarm_threshold_usec)
            {
                if (latency_alarm_decay == 0)
                {
                    alert = 1;
                }

                latency_alarm_decay = alarm_hold_periods;
            }
            else if (latency_alarm_decay)
            {
                latency_alarm_decay--;
                if (latency_alarm_decay == 0)
                {
                    alert = 1;
                }
            }
        }

        if (loss_alarm_threshold_percent)
        {
            if (average_loss_percent > loss_alarm_threshold_percent)
            {
                if (loss_alarm_decay == 0)
                {
                    alert = 1;
                }

                loss_alarm_decay = alarm_hold_periods;
            }
            else if (loss_alarm_decay)
            {
                loss_alarm_decay--;
                if (loss_alarm_decay == 0)
                {
                    alert = 1;
                }
            }
        }

        if (alert)
        {
            alert = 0;

            alarm_on = latency_alarm_decay || loss_alarm_decay;
            logger("%s%s: %s latency %luus stddev %luus loss %lu%%\n", identifier, dest_str, alarm_on ? "Alarm" : "Clear", average_latency_usec, latency_deviation, average_loss_percent);

            if (alert_cmd)
            {
                r = snprintf(alert_cmd + alert_cmd_offset, OUTPUT_MAX, " %s%s %u %lu %lu %lu", identifier, dest_str, alarm_on, average_latency_usec, latency_deviation, average_loss_percent);
                if (r < 0 || (size_t) r >= OUTPUT_MAX)
                {
                    logger("error formatting command in alert thread\n");
                    continue;
                }

                // Note that system waits for the alert command to finish before returning
                r = system(alert_cmd);
                if (r == -1)
                {
                    logger("error executing command in alert thread\n");
                }
            }
        }
    }
}

//
// Unix socket thread
//
__attribute__ ((noreturn))
static void *
usocket_thread(
    __attribute__ ((unused))
    void * arg)
{
    char buf[OUTPUT_MAX];
    unsigned long average_latency_usec;
    unsigned long latency_deviation;
    unsigned long average_loss_percent;
    int sock_fd;
    ssize_t len;
    ssize_t rs;
    int r;

    while (1)
    {
#if defined(DISABLE_ACCEPT4)
        // Legacy
        sock_fd = accept(usocket_fd, NULL, NULL);
        (void) fcntl(sock_fd, F_SETFD, FD_CLOEXEC);
        (void) fcntl(sock_fd, F_SETFL, fcntl(sock_fd, F_GETFL, 0) | O_NONBLOCK);
#else
        sock_fd = accept4(usocket_fd, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
#endif
        if (sock_fd == -1)
        {
            logger("accept error in usocket thread: %d\n", errno);
            continue;
        }

        report(&average_latency_usec, &latency_deviation, &average_loss_percent);

        len = snprintf(buf, sizeof(buf), "%s%lu %lu %lu\n", identifier, average_latency_usec, latency_deviation, average_loss_percent);
        if (len < 0 || (size_t) len >= sizeof(buf))
        {
            // Do not send a failed or truncated report
            logger("error formatting output in usocket thread\n");
            (void) close(sock_fd);
            continue;
        }

        rs = write(sock_fd, buf, (size_t) len);
        if (rs == -1)
        {
            logger("write error in usocket thread: %d\n", errno);
        }
        else if (rs != len)
        {
            logger("short write in usocket thread: %zd/%zd\n", rs, len);
        }

        r = close(sock_fd);
        if (r == -1)
        {
            logger("close error in usocket thread: %d\n", errno);
        }
    }
}



//
// Decode a time argument
//
static int
get_time_arg_msec(
    const char * arg,
    unsigned long * value)
{
    long t;
    char * suffix;

    t = strtol(arg, &suffix, 10);
    if (*suffix == 'm')
    {
        // Milliseconds
        suffix++;
    }
    else if (*suffix == 's')
    {
        // Seconds
        t *= 1000;
        suffix++;
    }

    // Invalid specification?
    if (t < 0 || *suffix != 0)
    {
        return 1;
    }

    *value = (unsigned long) t;
    return 0;
}


//
// Decode a percent argument
//
static int
get_percent_arg(
    const char * arg,
    unsigned long * value)
{
    long t;
    char * suffix;

    t = strtol(arg, &suffix, 10);
    if (*suffix == '%')
    {
        suffix++;
    }

    // Invalid specification?
    if (t < 0 || t > 100 || *suffix != 0)
    {
        return 1;
    }

    *value = (unsigned long) t;
    return 0;
}


//
// Decode a byte length argument
//
static int
get_length_arg(
    const char * arg,
    unsigned long * value)
{
    long t;
    char * suffix;

    t = strtol(arg, &suffix, 10);
    if (*suffix == 'b')
    {
        // Bytes
        suffix++;
    }
    else if (*suffix == 'k')
    {
        // Kilobytes
        t *= 1024;
        suffix++;
    }

    // Invalid specification?
    if (t < 0 || *suffix != 0)
    {
        return 1;
    }

    *value = (unsigned long) t;
    return 0;
}


//
// Output usage
//
static void
usage(void)
{
    fprintf(stderr, "Dpinger version 3.3\n\n");
    fprintf(stderr, "Usage:\n");
    fprintf(stderr, " %s [-f] [-R] [-S] [-P] [-h] [-B bind_addr] [-s send_interval] [-l loss_interval] [-t time_period] [-r report_interval] [-d data_length] [-o output_file] [-A alert_interval] [-D latency_alarm] [-L loss_alarm] [-H hold_interval] [-C alert_cmd] [-i identifier] [-u usocket] [-p pidfile] dest_addr\n\n", progname);
    fprintf(stderr, " options:\n");
    fprintf(stderr, " -f run in foreground\n");
    fprintf(stderr, " -R rewind output file between reports\n");
    fprintf(stderr, " -S log warnings via syslog\n");
    fprintf(stderr, " -P priority scheduling for receive thread (requires root)\n");
    fprintf(stderr, " -h display usage\n");
    fprintf(stderr, " -B bind (source) address\n");
    fprintf(stderr, " -s time interval between echo requests (default 500ms)\n");
    fprintf(stderr, " -l time interval before packets are treated as lost (default 4x send interval)\n");
    fprintf(stderr, " -t time period over which results are averaged (default 60s)\n");
    fprintf(stderr, " -r time interval between reports (default 1s)\n");
    fprintf(stderr, " -d data length (default 0)\n");
    fprintf(stderr, " -o output file for reports (default stdout)\n");
    fprintf(stderr, " -A time interval between alerts (default 1s)\n");
    fprintf(stderr, " -D time threshold for latency alarm (default none)\n");
    fprintf(stderr, " -L percent threshold for loss alarm (default none)\n");
    fprintf(stderr, " -H time interval to hold an alarm before clearing it (default 10x alert interval)\n");
    fprintf(stderr, " -C optional command to be invoked via system() for alerts\n");
    fprintf(stderr, " -i identifier text to include in output\n");
    fprintf(stderr, " -u unix socket name for polling\n");
    fprintf(stderr, " -p process id file name\n\n");
    fprintf(stderr, " notes:\n");
    fprintf(stderr, " IP addresses can be in either IPv4 or IPv6 format\n\n");
    fprintf(stderr, " time values can be expressed with a suffix of 'm' (milliseconds) or 's' (seconds)\n");
    fprintf(stderr, " if no suffix is specified, milliseconds is the default\n\n");
    fprintf(stderr, " the output format is \"latency_avg latency_stddev loss_pct\"\n");
    fprintf(stderr, " latency values are output in microseconds\n");
    fprintf(stderr, " loss percentage is reported in whole numbers of 0-100\n");
    fprintf(stderr, " resolution of loss calculation is: 100 / ((time_period - loss_interval) / send_interval)\n\n");
    fprintf(stderr, " the alert_cmd is invoked as \"alert_cmd dest_addr alarm_flag latency_avg latency_stddev loss_pct\"\n");
    fprintf(stderr, " alarm_flag is set to 1 if either latency or loss is in alarm state\n");
    fprintf(stderr, " alarm_flag will return to 0 when both have cleared alarm state\n");
    fprintf(stderr, " alarm hold time begins when the source of the alarm returns to normal\n\n");
}


//
// Fatal error
//
__attribute__ ((noreturn, format (printf, 1, 2)))
static void
fatal(
    const char * format,
    ...)
{
    va_list args;

    va_start(args, format);
    vfprintf(stderr, format, args);
    va_end(args);

    exit(EXIT_FAILURE);
}


//
// Parse command line arguments
//
static void
parse_args(
    int argc,
    char * const argv[])
{
    struct addrinfo hint;
    struct addrinfo * addr_info;
    const char * dest_arg;
    const char * bind_arg = NULL;
    size_t len;
    int opt;
    int r;

    progname = argv[0];

    while((opt = getopt(argc, argv, "fhRSPB:s:l:t:r:d:o:A:D:L:H:C:i:u:p:")) != -1)
    {
        switch (opt)
        {
        case 'f':
            foreground = 1;
            break;

        case 'R':
            flag_rewind = 1;
            break;

        case 'S':
            flag_syslog = 1;
            break;

        case 'P':
            flag_priority = 1;
            break;

        case 'B':
            bind_arg = optarg;
            break;

        case 's':
            r = get_time_arg_msec(optarg, &send_interval_msec);
            if (r || send_interval_msec == 0)
            {
                fatal("invalid send interval %s\n", optarg);
            }
            break;

        case 'l':
            r = get_time_arg_msec(optarg, &loss_interval_msec);
            if (r || loss_interval_msec == 0)
            {
                fatal("invalid loss interval %s\n", optarg);
            }
            break;

        case 't':
            r = get_time_arg_msec(optarg, &time_period_msec);
            if (r || time_period_msec == 0)
            {
                fatal("invalid averaging time period %s\n", optarg);
            }
            break;

        case 'r':
            r = get_time_arg_msec(optarg, &report_interval_msec);
            if (r)
            {
                fatal("invalid report interval %s\n", optarg);
            }
            break;

        case 'd':
            r = get_length_arg(optarg, &echo_data_len);
            if (r)
            {
                fatal("invalid data length %s\n", optarg);
            }
            break;

        case 'o':
            report_name = optarg;
            break;

        case 'A':
            r = get_time_arg_msec(optarg, &alert_interval_msec);
            if (r || alert_interval_msec == 0)
            {
                fatal("invalid alert interval %s\n", optarg);
            }
            break;

        case 'D':
            r = get_time_arg_msec(optarg, &latency_alarm_threshold_msec);
            if (r)
            {
                fatal("invalid latency alarm threshold %s\n", optarg);
            }
            latency_alarm_threshold_usec = latency_alarm_threshold_msec * 1000;
            break;

        case 'L':
            r = get_percent_arg(optarg, &loss_alarm_threshold_percent);
            if (r)
            {
                fatal("invalid loss alarm threshold %s\n", optarg);
            }
            break;

        case 'H':
            r = get_time_arg_msec(optarg, &alarm_hold_msec);
            if (r)
            {
                fatal("invalid alarm hold interval %s\n", optarg);
            }
            break;

        case 'C':
            alert_cmd_offset = strlen(optarg);
            alert_cmd = malloc(alert_cmd_offset + OUTPUT_MAX);
            if (alert_cmd == NULL)
            {
                fatal("malloc of alert command buffer failed\n");
            }
            memcpy(alert_cmd, optarg, alert_cmd_offset);
            break;

        case 'i':
            len = strlen(optarg);
            // Room is needed for the appended space and the terminating NUL
            if (len >= sizeof(identifier) - 1)
            {
                fatal("identifier argument too large (max %u bytes)\n", (unsigned) sizeof(identifier) - 2);
            }
            // optarg with a space appended
            memcpy(identifier, optarg, len);
            identifier[len] = ' ';
            identifier[len + 1] = '\0';
            break;

        case 'u':
            usocket_name = optarg;
            break;

        case 'p':
            pidfile_name = optarg;
            break;

        case 'h':
        default:
            usage();
            exit(EXIT_FAILURE);
        }
    }

    // Ensure we have the correct number of parameters
    if (argc != optind + 1)
    {
        usage();
        exit(EXIT_FAILURE);
    }
    dest_arg = argv[optind];

    // Ensure we have something to do: at least one of alarm, report, socket
    if (report_interval_msec == 0 && latency_alarm_threshold_msec == 0 && loss_alarm_threshold_percent == 0 && usocket_name == NULL)
    {
        fatal("no activity enabled\n");
    }

    // Ensure there is a minimum of one resolved slot at all times
    if (time_period_msec <= send_interval_msec * 2 + loss_interval_msec)
    {
        fatal("the time period must be greater than twice the send interval plus the loss interval\n");
    }

    // Ensure we don't have sequence space issues. This really should only be hit by
    // complete accident. Even a ratio of 16384:1 would be excessive.
    // The array size is later narrowed to a 16-bit sequence limit, so a ratio of
    // exactly 65536:1 would truncate to zero.
    if (time_period_msec / send_interval_msec > 65535)
    {
        fatal("the ratio of time period to send interval cannot exceed 65535:1\n");
    }

    // Check destination address
    memset(&hint, 0, sizeof(struct addrinfo));
    hint.ai_flags = AI_NUMERICHOST;
    hint.ai_family = AF_UNSPEC;
    hint.ai_socktype = SOCK_RAW;

    r = getaddrinfo(dest_arg, NULL, &hint, &addr_info);
    if (r != 0)
    {
        fatal("invalid destination IP address %s\n", dest_arg);
    }

    if (addr_info->ai_family == AF_INET6)
    {
        af_family = AF_INET6;
        ip_proto = IPPROTO_ICMPV6;
        echo_request_type = ICMP6_ECHO_REQUEST;
        echo_reply_type = ICMP6_ECHO_REPLY;
    }
    else if (addr_info->ai_family != AF_INET)
    {
        fatal("invalid destination IP address %s\n", dest_arg);
    }

    dest_addr_len = addr_info->ai_addrlen;
    memcpy(&dest_addr, addr_info->ai_addr, dest_addr_len);
    freeaddrinfo(addr_info);

    // Check bind address
    if (bind_arg)
    {
        // Address family must match
        hint.ai_family = af_family;

        r = getaddrinfo(bind_arg, NULL, &hint, &addr_info);
        if (r != 0)
        {
            fatal("invalid bind IP address %s\n", bind_arg);
        }

        bind_addr_len = addr_info->ai_addrlen;
        memcpy(&bind_addr, addr_info->ai_addr, bind_addr_len);
        freeaddrinfo(addr_info);
    }

    // Check requested data length
    if (echo_data_len)
    {
        if (af_family == AF_INET)
        {
            if (echo_data_len > IPV4_ICMP_DATA_MAX)
            {
                fatal("data length too large for IPv4 - maximum is %u bytes\n", (unsigned) IPV4_ICMP_DATA_MAX);
            }
        }
        else
        {
            if (echo_data_len > IPV6_ICMP_DATA_MAX)
            {
                fatal("data length too large for IPv6 - maximum is %u bytes\n", (unsigned) IPV6_ICMP_DATA_MAX);
            }
        }

        echo_request_len += echo_data_len;
    }
}


//
// Main
//
int
main(
    int argc,
    char *argv[])
{
    char bind_str[ADDR_STR_MAX] = "(none)";
    char pidbuf[64];
    int pidfile_fd = -1;
    pid_t pid;
    pthread_t thread;
    struct sigaction act;
    int buflen = PACKET_BUFLEN;
    ssize_t len;
    ssize_t rs;
    int r;

    // Handle command line args
    parse_args(argc, argv);

    // Set up our sockets
    send_sock = socket(af_family, SOCK_RAW, ip_proto);
    if (send_sock == -1)
    {
        perror("socket");
        fatal("cannot create send socket\n");
    }
    // FD_CLOEXEC is a descriptor flag, so it is set with F_SETFD (not F_SETFL)
    (void) fcntl(send_sock, F_SETFD, FD_CLOEXEC);
    (void) setsockopt(send_sock, SOL_SOCKET, SO_SNDBUF, &buflen, sizeof(buflen));

    recv_sock = socket(af_family, SOCK_RAW, ip_proto);
    if (recv_sock == -1)
    {
        perror("socket");
        fatal("cannot create recv socket\n");
    }
    (void) fcntl(recv_sock, F_SETFD, FD_CLOEXEC);
    (void) setsockopt(recv_sock, SOL_SOCKET, SO_RCVBUF, &buflen, sizeof(buflen));

    // Bind our sockets to an address if requested
    if (bind_addr_len)
    {
        r = bind(send_sock, (struct sockaddr *) &bind_addr, bind_addr_len);
        if (r == -1)
        {
            perror("bind");
            fatal("cannot bind send socket\n");
        }
        r = bind(recv_sock, (struct sockaddr *) &bind_addr, bind_addr_len);
        if (r == -1)
        {
            perror("bind");
            fatal("cannot bind recv socket\n");
        }
    }

    // Drop privileges
    (void) setgid(getgid());
    (void) setuid(getuid());

    // Create pid file
    if (pidfile_name)
    {
        pidfile_fd = open(pidfile_name, O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, 0644);
        if (pidfile_fd != -1)
        {
            // Lock the pid file
            r = flock(pidfile_fd, LOCK_EX | LOCK_NB);
            if (r == -1)
            {
                perror("flock");
                fatal("error locking pid file\n");
            }
        }
        else
        {
            // Pid file already exists?
            pidfile_fd = open(pidfile_name, O_RDWR | O_CREAT | O_CLOEXEC, 0644);
            if (pidfile_fd == -1)
            {
                perror("open");
                fatal("cannot create/open pid file %s\n", pidfile_name);
            }

            // Lock the pid file
            r = flock(pidfile_fd, LOCK_EX | LOCK_NB);
            if (r == -1)
            {
                fatal("pid file %s is in use by another process\n", pidfile_name);
            }

            // Check for existing pid
            rs = read(pidfile_fd, pidbuf, sizeof(pidbuf) - 1);
            if (rs > 0)
            {
                pidbuf[rs] = 0;

                pid = (pid_t) strtol(pidbuf, NULL, 10);
                if (pid > 0)
                {
                    // Is the pid still alive?
                    r = kill(pid, 0);
                    if (r == 0)
                    {
                        fatal("pid file %s is in use by process %u\n", pidfile_name, (unsigned int) pid);
                    }
                }
            }

            // Reset the pid file
            (void) lseek(pidfile_fd, 0, SEEK_SET);
            r = ftruncate(pidfile_fd, 0);
            if (r == -1)
            {
                perror("ftruncate");
                fatal("cannot write pid file %s\n", pidfile_name);
            }
        }
    }

    // Create report file
    if (report_name)
    {
        report_fd = open(report_name, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0644);
        if (report_fd == -1)
        {
            perror("open");
            fatal("cannot open/create report file %s\n", report_name);
        }
    }
    else
    {
        report_fd = fileno(stdout);
    }

    // Create unix socket
    if (usocket_name)
    {
        struct sockaddr_un uaddr;

        if (strlen(usocket_name) >= sizeof(uaddr.sun_path))
        {
            fatal("socket name too large\n");
        }

        usocket_fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (usocket_fd == -1)
        {
            perror("socket");
            fatal("cannot create unix domain socket\n");
        }
        // FD_CLOEXEC is a descriptor flag, so it is set with F_SETFD (not F_SETFL)
        (void) fcntl(usocket_fd, F_SETFD, FD_CLOEXEC);
        (void) unlink(usocket_name);

        memset(&uaddr, 0, sizeof(uaddr));
        uaddr.sun_family = AF_UNIX;
        strncpy(uaddr.sun_path, usocket_name, sizeof(uaddr.sun_path));
        r = bind(usocket_fd, (struct sockaddr *) &uaddr, sizeof(uaddr));
        if (r == -1)
        {
            perror("bind");
            fatal("cannot bind unix domain socket\n");
        }

        r = chmod(usocket_name, 0666);
        if (r == -1)
        {
            perror("chmod");
            fatal("cannot chmod unix domain socket\n");
        }

        r = listen(usocket_fd, 5);
        if (r == -1)
        {
            perror("listen");
            fatal("cannot listen on unix domain socket\n");
        }
    }

    // End of general errors from command line options

    // Self background
    if (foreground == 0)
    {
        pid = fork();

        if (pid == -1)
        {
            perror("fork");
            fatal("cannot background\n");
        }

        if (pid)
        {
            _exit(EXIT_SUCCESS);
        }

        (void) setsid();
    }

    // Termination handler
    memset(&act, 0, sizeof(act));
    act.sa_handler = (void (*)(int)) term_handler;
    (void) sigaction(SIGTERM, &act, NULL);
    (void) sigaction(SIGINT, &act, NULL);

    // Write pid file
    if (pidfile_fd != -1)
    {
        len = snprintf(pidbuf, sizeof(pidbuf), "%u\n", (unsigned) getpid());
        // snprintf returns the untruncated length; a value >= the buffer size means truncation
        if (len < 0 || (size_t) len >= sizeof(pidbuf))
        {
            fatal("error formatting pidfile\n");
        }

        rs = write(pidfile_fd, pidbuf, (size_t) len);
        if (rs == -1)
        {
            perror("write");
            fatal("error writing pidfile\n");
        }

        r = close(pidfile_fd);
        if (r == -1)
        {
            perror("close");
            fatal("error writing pidfile\n");
        }
    }

    // Create the array
    array_size = (unsigned int) (time_period_msec / send_interval_msec);
    array = calloc(array_size, sizeof(*array));
    if (array == NULL)
    {
        fatal("calloc of packet array failed\n");
    }

    // Allocate the echo request/reply packet buffers
    echo_request = (icmphdr_t *) malloc(echo_request_len);
    echo_reply = malloc(echo_reply_len);
    if (echo_request == NULL || echo_reply == NULL)
    {
        fatal("malloc of packet buffers failed\n");
    }

    // Set the default loss interval
    if (loss_interval_msec == 0)
    {
        loss_interval_msec = send_interval_msec * 4;
    }
    loss_interval_usec = loss_interval_msec * 1000;

    // Log our general parameters
    r = getnameinfo((struct sockaddr *) &dest_addr, dest_addr_len, dest_str, sizeof(dest_str), NULL, 0, NI_NUMERICHOST);
    if (r != 0)
    {
        fatal("getnameinfo of destination address failed\n");
    }

    // Default alarm hold if not explicitly set
    if (alarm_hold_msec == 0)
    {
        alarm_hold_msec = alert_interval_msec * DEFAULT_HOLD_PERIODS;
    }

    if (bind_addr_len)
    {
        r = getnameinfo((struct sockaddr *) &bind_addr, bind_addr_len, bind_str, sizeof(bind_str), NULL, 0, NI_NUMERICHOST);
        if (r != 0)
        {
            fatal("getnameinfo of bind address failed\n");
        }
    }

    logger("send_interval %lums loss_interval %lums time_period %lums report_interval %lums data_len %lu alert_interval %lums latency_alarm %lums loss_alarm %lu%% alarm_hold %lums dest_addr %s bind_addr %s identifier \"%s\"\n",
        send_interval_msec, loss_interval_msec, time_period_msec, report_interval_msec, echo_data_len,
        alert_interval_msec, latency_alarm_threshold_msec, loss_alarm_threshold_percent, alarm_hold_msec,
        dest_str, bind_str, identifier);

    // Set my echo id
    echo_id = htons((uint16_t) getpid());

    // Set the limit for sequence number to ensure a multiple of array size
    sequence_limit = (uint16_t) array_size;
    while ((sequence_limit & 0x8000) == 0)
    {
        sequence_limit <<= 1;
    }

    // Create recv thread
    r = pthread_create(&thread, NULL, &recv_thread, NULL);
    if (r != 0)
    {
        perror("pthread_create");
        fatal("cannot create recv thread\n");
    }

    // Set priority on recv thread if requested
    if (flag_priority)
    {
        struct sched_param thread_sched_param;

        r = sched_get_priority_min(SCHED_RR);
        if (r == -1)
        {
            perror("sched_get_priority_min");
            fatal("cannot determine minimum scheduling priority for SCHED_RR\n");
        }
        thread_sched_param.sched_priority = r;

        r = pthread_setschedparam(thread, SCHED_RR, &thread_sched_param);
        if (r != 0)
        {
            perror("pthread_setschedparam");
            fatal("cannot set receive thread priority\n");
        }
    }

    // Create send thread
    r = pthread_create(&thread, NULL, &send_thread, NULL);
    if (r != 0)
    {
        perror("pthread_create");
        fatal("cannot create send thread\n");
    }

    // Report thread
    if (report_interval_msec)
    {
        r = pthread_create(&thread, NULL, &report_thread, NULL);
        if (r != 0)
        {
            perror("pthread_create");
            fatal("cannot create report thread\n");
        }
    }

    // Create alert thread
    if (latency_alarm_threshold_msec || loss_alarm_threshold_percent)
    {
        r = pthread_create(&thread, NULL, &alert_thread, NULL);
        if (r != 0)
        {
            perror("pthread_create");
            fatal("cannot create alert thread\n");
        }
    }

    // Create usocket thread
    if (usocket_name)
    {
        r = pthread_create(&thread, NULL, &usocket_thread, NULL);
        if (r != 0)
        {
            perror("pthread_create");
            fatal("cannot create usocket thread\n");
        }
    }

    // Wait (forever) for last thread started
    pthread_join(thread, NULL);

    // notreached
    return 0;
}

--------------------------------------------------------------------------------
/influx/README.md:
--------------------------------------------------------------------------------
Examples for dpinger logging/monitoring with InfluxDB and Grafana
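Each report line dpinger writes has the form `latency_avg latency_stddev loss_pct`, with latency in microseconds and loss as a whole-number percentage; dpinger_influx_logger converts the two latency fields to milliseconds and posts the result as InfluxDB line protocol. The following is a minimal C sketch of that conversion, not part of dpinger itself: `format_influx_line` is a hypothetical helper, and it assumes the report carries no identifier prefix (no `-i` option) and that all three values are printed as whole numbers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of dpinger): convert one report line of the
 * form "latency_avg latency_stddev loss_pct" (latency in microseconds) into
 * an InfluxDB line-protocol record shaped like the datafmt string in
 * dpinger_influx_logger: latency and stddev in milliseconds with three
 * decimals, loss as an integer field ("i" suffix).
 * Returns 0 on success, -1 if the line does not parse or out is too small.
 */
static int
format_influx_line(const char *report, const char *host, const char *name,
                   const char *target, char *out, size_t out_len)
{
    unsigned long latency_us;
    unsigned long stddev_us;
    unsigned long loss_pct;
    int n;

    assert(report != NULL && out != NULL && out_len > 0);

    /* Assumes exactly three whitespace-separated whole numbers. */
    if (sscanf(report, "%lu %lu %lu", &latency_us, &stddev_us, &loss_pct) != 3)
    {
        return -1;
    }

    n = snprintf(out, out_len,
                 "dpinger,host=%s,name=%s,target=%s "
                 "latency=%.3f,stddev=%.3f,loss=%lui",
                 host, name, target,
                 latency_us / 1000.0, stddev_us / 1000.0, loss_pct);

    /* snprintf returns the length it wanted; >= out_len means truncation. */
    if (n < 0 || (size_t) n >= out_len)
    {
        return -1;
    }
    return 0;
}
```

With host `fw1`, name `wan` and target `8.8.8.8`, the report line `12345 678 2` becomes `dpinger,host=fw1,name=wan,target=8.8.8.8 latency=12.345,stddev=0.678,loss=2i`, the same shape the Python logger posts.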

Files:

dpinger_influx_logger

Python script for logging dpinger data in InfluxDB


dpinger_start.sh

Sample start script for dpinger influx logging


dpinger_grafana_dashboard.json

Example Grafana dashboard for monitoring dpinger data

--------------------------------------------------------------------------------
/influx/dpinger_grafana_dashboard.json:
--------------------------------------------------------------------------------
{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "datasource",
          "uid": "grafana"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "target": {
          "limit": 100,
          "matchAny": false,
          "tags": [],
          "type": "dashboard"
        },
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": 3,
  "iteration": 1652309379625,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "datasource": {
        "uid": "$source"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "min": 0,
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "ms"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "loss"
            },
            "properties": [
              {
                "id": "unit",
                "value": "percent"
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "loss"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "#e00000",
                  "mode": "fixed"
                }
              },
              {
                "id": "custom.fillOpacity",
                "value": 100
              },
              {
                "id": "custom.lineWidth",
                "value": 0
              },
              {
                "id": "unit",
                "value": "percent"
              },
              {
                "id": "max",
                "value": 100
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 19,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 2,
      "options": {
        "legend": {
          "calcs": [
            "mean",
            "lastNotNull",
            "max",
            "min"
          ],
          "displayMode": "table",
          "placement": "bottom"
        },
        "tooltip": {
          "mode": "multi",
          "sort": "none"
        }
      },
      "pluginVersion": "8.3.5",
      "targets": [
        {
          "alias": "latency",
          "groupBy": [
            {
              "params": [
                "$intervals"
              ],
              "type": "time"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "dpinger",
          "orderByTime": "ASC",
          "policy": "default",
          "query": "SELECT mean(\"latency\") FROM \"wan\" WHERE $timeFilter GROUP BY time($__interval) fill(null)",
          "queryType": "randomWalk",
          "rawQuery": false,
          "refId": "A",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "latency"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "mean"
              }
            ]
          ],
          "tags": [
            {
              "key": "name",
              "operator": "=~",
              "value": "/^$name$/"
            }
          ]
        },
        {
          "alias": "stddev",
          "groupBy": [
            {
              "params": [
                "$intervals"
              ],
              "type": "time"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "dpinger",
          "orderByTime": "ASC",
          "policy": "default",
          "queryType": "randomWalk",
          "refId": "B",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "stddev"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "mean"
              }
            ]
          ],
          "tags": [
            {
              "key": "name",
              "operator": "=~",
              "value": "/^$name$/"
            }
          ]
        },
        {
          "alias": "loss",
          "groupBy": [
            {
              "params": [
                "$intervals"
              ],
              "type": "time"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "dpinger",
          "orderByTime": "ASC",
          "policy": "default",
          "queryType": "randomWalk",
          "refId": "C",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "loss"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "mean"
              }
            ]
          ],
          "tags": [
            {
              "key": "name",
              "operator": "=~",
              "value": "/^$name$/"
            }
          ]
        }
      ],
      "title": "$name - ${intervals} intervals",
      "transformations": [],
      "type": "timeseries"
    }
  ],
  "refresh": "1m",
  "schemaVersion": 36,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "current": {
          "selected": false,
          "text": "dpinger",
          "value": "dpinger"
        },
        "hide": 0,
        "includeAll": false,
        "label": "Source",
        "multi": false,
        "name": "source",
        "options": [],
        "query": "influxdb",
        "queryValue": "",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      },
      {
        "current": {
          "selected": false,
          "text": "wan",
          "value": "wan"
        },
        "datasource": {
          "type": "influxdb",
          "uid": "$source"
        },
        "definition": "SHOW TAG VALUES WITH KEY = \"name\"",
        "hide": 0,
        "includeAll": false,
        "label": "Name",
        "multi": false,
        "name": "name",
        "options": [],
        "query": "SHOW TAG VALUES WITH KEY = \"name\"",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "tagValuesQuery": "",
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      },
      {
        "auto": true,
        "auto_count": 500,
        "auto_min": "10s",
        "current": {
          "selected": false,
          "text": "auto",
          "value": "$__auto_interval_intervals"
        },
        "hide": 0,
        "label": "Intervals",
        "name": "intervals",
        "options": [
          {
            "selected": true,
            "text": "auto",
            "value": "$__auto_interval_intervals"
          },
          {
            "selected": false,
            "text": "10s",
            "value": "10s"
          },
          {
            "selected": false,
            "text": "30s",
            "value": "30s"
          },
          {
            "selected": false,
            "text": "1m",
            "value": "1m"
          },
          {
            "selected": false,
            "text": "2m",
            "value": "2m"
          },
          {
            "selected": false,
            "text": "5m",
            "value": "5m"
          },
          {
            "selected": false,
            "text": "10m",
            "value": "10m"
          },
          {
            "selected": false,
            "text": "15m",
            "value": "15m"
          },
          {
            "selected": false,
            "text": "30m",
            "value": "30m"
          },
          {
            "selected": false,
            "text": "1h",
            "value": "1h"
          },
          {
            "selected": false,
            "text": "6h",
            "value": "6h"
          },
          {
            "selected": false,
            "text": "12h",
            "value": "12h"
          },
          {
            "selected": false,
            "text": "1d",
            "value": "1d"
          },
          {
            "selected": false,
            "text": "7d",
            "value": "7d"
          }
        ],
        "query": "10s,30s,1m,2m,5m,10m,15m,30m,1h,6h,12h,1d,7d",
        "queryValue": "",
        "refresh": 2,
        "skipUrlSync": false,
        "type": "interval"
      }
    ]
  },
  "time": {
    "from": "now-24h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "1m",
      "5m"
    ]
  },
  "timezone": "",
  "title": "WAN Latency",
  "uid": "ThwrgHYMk",
  "version": 46,
  "weekStart": ""
}

--------------------------------------------------------------------------------
/influx/dpinger_influx_logger:
--------------------------------------------------------------------------------
#!/usr/bin/python

dpinger_path = "/usr/local/bin/dpinger"

import os
import sys
import signal
import requests
from subprocess import Popen, PIPE
from requests import post

# Handle SIGINT
def signal_handler(signal, frame):
    try:
        dpinger.kill()
    except:
        pass
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)

# Handle command line args (five positional arguments are required)
progname = sys.argv.pop(0)
if (len(sys.argv) < 5):
    print('Usage: {0} influx_url influx_db host name target [additional dpinger options]'.format(progname))
    print(' influx_url URL of the Influx server')
    print(' influx_db name of the Influx database')
    print(' host value of "host" tag (example: output of hostname command)')
    print(' name value of "name" tag (example: a circuit name such as "wan")')
    print(' target IP address to monitor (also the value of the "target" tag)')
    sys.exit(1)
influx_url = sys.argv.pop(0)
influx_db = sys.argv.pop(0)
host = sys.argv.pop(0)
name = sys.argv.pop(0)
target = sys.argv.pop(0)

influx_user = os.getenv('INFLUX_USER')
influx_pass = os.getenv('INFLUX_PASS')

# Set up dpinger command
cmd = [dpinger_path, "-f"]
cmd.extend(sys.argv)
cmd.extend(["-r", "10s", target])

# Set up formats
url = '{0}/write?db={1}'.format(influx_url, influx_db)
datafmt = "dpinger,host={0},name={1},target={2} latency={{0:.3f}},stddev={{1:.3f}},loss={{2}}i".format(host, name, target)

# Start up dpinger
try:
    dpinger = Popen(cmd, stdout=PIPE, text=True, bufsize=0)
except:
    print("failed to start dpinger")
    sys.exit(1)

# Start the show
while True:
    line = dpinger.stdout.readline()
    if (len(line) == 0):
        print("dpinger exited")
        sys.exit(1)

    [latency, stddev, loss] = line.split()
    data = datafmt.format(float(latency) / 1000, float(stddev) / 1000, loss)
    #print(data)
    try:
        post(url = url, auth = (influx_user, influx_pass), data = data)
    except:
        print("post failed")

--------------------------------------------------------------------------------
/influx/dpinger_start.sh:
--------------------------------------------------------------------------------
#!/bin/sh

INFLUX_URL="http://myinfluxhost:8086"
export INFLUX_USER="dpinger"
export INFLUX_PASS="myinfluxpass"

exec /usr/local/dpinger_influx_logger $INFLUX_URL dpinger `hostname` wan 8.8.8.8

--------------------------------------------------------------------------------
/rrd/README.md:
--------------------------------------------------------------------------------
Example scripts for creating RRD graphs with dpinger
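The time span kept by each archive that dpinger_rrd_create defines follows from rows × primary data points per row × base step. A minimal C sketch of that arithmetic, using hypothetical helper names and checked against the `--step 60` form of the script (each `RRA:AVERAGE:0.5:<pdp_per_row>:<rows>`):

```c
#include <assert.h>

#define SECONDS_PER_DAY 86400UL

/*
 * Hypothetical helpers for an archive defined as
 * RRA:AVERAGE:0.5:<pdp_per_row>:<rows> in an rrd created with --step <step_sec>.
 */

/* Time span, in seconds, covered by one archive. */
static unsigned long
rra_retention_seconds(unsigned long step_sec, unsigned long pdp_per_row,
                      unsigned long rows)
{
    assert(step_sec > 0 && pdp_per_row > 0);
    return step_sec * pdp_per_row * rows;
}

/* Rows needed for one archive to span retention_sec (rounded down). */
static unsigned long
rra_rows_for(unsigned long step_sec, unsigned long pdp_per_row,
             unsigned long retention_sec)
{
    assert(step_sec > 0 && pdp_per_row > 0);
    return retention_sec / (step_sec * pdp_per_row);
}
```

For the three archives this gives 21600 × 1 × 60 s = 15 days, 25920 × 5 × 60 s = 90 days and 26352 × 60 × 60 s = 1098 days (about three years), which matches the `1m:15d`, `5m:90d` and `1h:3y` values in the commented-out duration form of the script.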
4 | 5 | Files and Usage: 6 | 7 | dpinger_rrd_create 8 | 9 | Create the rrd initial file. 10 | 11 | dpinger_rrd_update 12 | 13 | Daemon updater script. Runs dpinger and feeds the rrd file. 14 | 15 | dpinger_rrd_gencgi 16 | 17 | Generate a cgi script that displays graphs. 18 | 19 | dpinger_rrd_graph 20 | 21 | Generate png files for use with static html 22 | 23 | sample.html 24 | 25 | Sample static html to display graphs. 26 | -------------------------------------------------------------------------------- /rrd/dpinger_rrd_create: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | if [ $# -ne 1 ] 4 | then 5 | echo "usage: $0 name" 6 | exit 1 7 | fi 8 | name="$1" 9 | 10 | rrdfile="${name}.rrd" 11 | echo "Creating rrd file ${rrdfile}" 12 | 13 | 14 | # Time duration method doesn't work in all versions of rrdtool 15 | #rrdtool create "${rrdfile}" --step 1m \ 16 | # DS:latency:GAUGE:5m:0:U \ 17 | # DS:stddev:GAUGE:5m:0:U \ 18 | # DS:loss:GAUGE:5m:0:100 \ 19 | # RRA:AVERAGE:0.5:1m:15d \ 20 | # RRA:AVERAGE:0.5:5m:90d \ 21 | # RRA:AVERAGE:0.5:1h:3y 22 | 23 | # This method works in all versions 24 | rrdtool create "${rrdfile}" --step 60 \ 25 | DS:latency:GAUGE:300:0:U \ 26 | DS:stddev:GAUGE:300:0:U \ 27 | DS:loss:GAUGE:300:0:100 \ 28 | RRA:AVERAGE:0.5:1:21600 \ 29 | RRA:AVERAGE:0.5:5:25920 \ 30 | RRA:AVERAGE:0.5:60:26352 31 | -------------------------------------------------------------------------------- /rrd/dpinger_rrd_gencgi: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | if [ $# -ne 2 ] 4 | then 5 | echo "usage: $0 rrdname pngname" 6 | exit 1 7 | fi 8 | rrdname="${1}" 9 | pngname="${2}" 10 | 11 | # Prefixes for rrd and png files. 
Note that if the prefix is a directory, it must incldue the trailing slash 12 | # If no value is set, the files are located in the current directory when the cgi script runs 13 | rrdprefix= 14 | pngprefix=/tmp/ 15 | 16 | 17 | # Graph dimensions 18 | #graph_height=240 19 | #graph_width=720 20 | graph_height=280 21 | graph_width=840 22 | 23 | # Preferred font 24 | font="DejaVuSansMono" 25 | 26 | # Latency breakpoints in milliseconds 27 | latency_s0=20 28 | latency_s1=40 29 | latency_s2=80 30 | latency_s3=160 31 | latency_s4=320 32 | 33 | # Latency colors 34 | latency_c0="dddddd" 35 | latency_c1="ddbbbb" 36 | latency_c2="d4aaaa" 37 | latency_c3="cc9999" 38 | latency_c4="c38888" 39 | latency_c5="bb7777" 40 | 41 | # Standard deviation color & opacity 42 | stddev_c="55333355" 43 | 44 | # Loss color 45 | loss_c="ee0000" 46 | 47 | 48 | gen_graph() 49 | { 50 | png=$1 51 | rrd=$2 52 | start=$3 53 | end=$4 54 | step=$5 55 | description=$6 56 | 57 | echo "" 125 | echo "

"
}

(
    echo "#!/usr/bin/rrdcgi"
    echo " Latency Statistics for ${rrdname} "

    gen_graph "${pngprefix}${pngname}-1.png" "${rrdprefix}${rrdname}.rrd" "now-8h" "now" "60" "Last 8 hours - 1 minute intervals"
    gen_graph "${pngprefix}${pngname}-2.png" "${rrdprefix}${rrdname}.rrd" "now-36h" "now" "300" "Last 36 hours - 5 minute intervals"
    gen_graph "${pngprefix}${pngname}-3.png" "${rrdprefix}${rrdname}.rrd" "now-8d" "now" "1800" "Last 8 days - 30 minute intervals"
    gen_graph "${pngprefix}${pngname}-4.png" "${rrdprefix}${rrdname}.rrd" "now-60d" "now" "14400" "Last 60 days - 4 hour intervals"
    gen_graph "${pngprefix}${pngname}-5.png" "${rrdprefix}${rrdname}.rrd" "now-1y" "now" "86400" "Last 1 year - 1 day intervals"
    gen_graph "${pngprefix}${pngname}-6.png" "${rrdprefix}${rrdname}.rrd" "now-4y" "now" "86400" "Last 4 years - 1 day intervals"

    echo " "
) > "${pngname}.cgi"

--------------------------------------------------------------------------------
/rrd/dpinger_rrd_graph:
--------------------------------------------------------------------------------
#!/bin/sh

if [ $# -ne 2 ]
then
    echo "usage: $0 rrdname pngname"
    exit 1
fi
rrdname="${1}"
pngname="${2}"

# Prefixes for rrd and png files.
# Note that if the prefix is a directory, it must include the trailing slash.
# If no value is set, the files are located in the current directory.
rrdprefix=
pngprefix=/tmp/

# Graph dimensions
graph_height=240
graph_width=720
#graph_height=280
#graph_width=840

# Preferred font
font="DejaVuSansMono"

# Latency breakpoints in milliseconds
latency_s0=20
latency_s1=40
latency_s2=80
latency_s3=160
latency_s4=320

# Latency colors
latency_c0="dddddd"
latency_c1="ddbbbb"
latency_c2="d4aaaa"
latency_c3="cc9999"
latency_c4="c38888"
latency_c5="bb7777"

# Standard deviation color & opacity
stddev_c="55333355"

# Loss color
loss_c="ee0000"


gen_graph()
{
    png=$1
    rrd=$2
    start=$3
    end=$4
    step=$5
    description=$6

    rrdtool graph "${png}" \
        --lazy \
        --start "${start}" --end "${end}" --step "${step}" \
        --height "${graph_height}" --width "${graph_width}" \
        --title "Average Latency and Packet Loss - ${description}" \
        --disable-rrdtool-tag \
        --color BACK#ffffff \
        --font DEFAULT:9:"${font}" \
        --font AXIS:8:"${font}" \
        \
        DEF:latency_us="${rrd}":latency:AVERAGE:step="${step}" \
        CDEF:latency=latency_us,1000,/ \
        CDEF:latency_s0=latency,${latency_s0},MIN \
        CDEF:latency_s1=latency,${latency_s1},MIN \
        CDEF:latency_s2=latency,${latency_s2},MIN \
        CDEF:latency_s3=latency,${latency_s3},MIN \
        CDEF:latency_s4=latency,${latency_s4},MIN \
        VDEF:latency_min=latency,MINIMUM \
        VDEF:latency_max=latency,MAXIMUM \
        VDEF:latency_avg=latency,AVERAGE \
        VDEF:latency_last=latency,LAST \
        \
        DEF:stddev_us="${rrd}":stddev:AVERAGE:step="${step}" \
        CDEF:stddev=stddev_us,1000,/ \
        VDEF:stddev_min=stddev,MINIMUM \
        VDEF:stddev_max=stddev,MAXIMUM \
        VDEF:stddev_avg=stddev,AVERAGE \
        VDEF:stddev_last=stddev,LAST \
        \
        DEF:loss="${rrd}":loss:AVERAGE:step="${step}" \
        CDEF:loss_neg=loss,-1,* \
        VDEF:loss_min=loss,MINIMUM \
        VDEF:loss_max=loss,MAXIMUM \
        VDEF:loss_avg=loss,AVERAGE \
        VDEF:loss_last=loss,LAST \
        \
        COMMENT:" Min Max Avg Last\n" \
        \
        COMMENT:" " \
        AREA:latency#${latency_c5} \
        AREA:latency_s4#${latency_c4} \
        AREA:latency_s3#${latency_c3} \
        AREA:latency_s2#${latency_c2} \
        AREA:latency_s1#${latency_c1} \
        AREA:latency_s0#${latency_c0} \
        LINE1:latency#000000:"Latency " \
        GPRINT:"latency_min:%8.3lf ms\t" \
        GPRINT:"latency_max:%8.3lf ms\t" \
        GPRINT:"latency_avg:%8.3lf ms\t" \
        GPRINT:"latency_last:%8.3lf ms\n" \
        \
        COMMENT:" " \
        LINE1:stddev#${stddev_c}:"Stddev " \
        GPRINT:"stddev_min:%8.3lf ms\t" \
        GPRINT:"stddev_max:%8.3lf ms\t" \
        GPRINT:"stddev_avg:%8.3lf ms\t" \
        GPRINT:"stddev_last:%8.3lf ms\n" \
        \
        COMMENT:" " \
        AREA:loss_neg#${loss_c}:"Loss " \
        GPRINT:"loss_min:%4.1lf %%\t\t" \
        GPRINT:"loss_max:%4.1lf %%\t\t" \
        GPRINT:"loss_avg:%4.1lf %%\t\t" \
        GPRINT:"loss_last:%4.1lf %%\n" \
        \
        COMMENT:" \n" \
        GPRINT:"latency_last:Ending at %H\:%M on %B %d, %Y\r:strftime"
}


gen_graph "${pngprefix}${pngname}-1.png" "${rrdprefix}${rrdname}.rrd" "now-8h" "now" "60" "Last 8 hours - 1 minute intervals"
gen_graph "${pngprefix}${pngname}-2.png" "${rrdprefix}${rrdname}.rrd" "now-36h" "now" "300" "Last 36 hours - 5 minute intervals"
gen_graph "${pngprefix}${pngname}-3.png" "${rrdprefix}${rrdname}.rrd" "now-8d" "now" "1800" "Last 8 days - 30 minute intervals"
gen_graph "${pngprefix}${pngname}-4.png" "${rrdprefix}${rrdname}.rrd" "now-60d" "now" "14400" "Last 60 days - 4 hour intervals"
gen_graph "${pngprefix}${pngname}-5.png" "${rrdprefix}${rrdname}.rrd" "now-1y" "now" "86400" "Last 1 year - 1 day intervals"
gen_graph "${pngprefix}${pngname}-6.png" "${rrdprefix}${rrdname}.rrd" "now-4y" "now" "86400" "Last 4 years - 1 day intervals"

--------------------------------------------------------------------------------
/rrd/dpinger_rrd_update:
--------------------------------------------------------------------------------
#!/bin/sh

if [ $# -lt 2 ]
then
    echo "usage: $0 rrdname targetip [dpinger options]"
    exit 1
fi
name="$1"
targetip="$2"
shift 2
# Remaining arguments are passed through to dpinger; ${options} is left
# unquoted below on purpose so that it splits into separate arguments
options=$*

# Where the dpinger executable is located
dpinger=/usr/local/bin/dpinger


rrdfile="${name}.rrd"
if [ \! -w "${rrdfile}" ]
then
    echo "$0: file \"${rrdfile}\" does not exist or is not writable"
    exit 1
fi

# Run dpinger in the foreground; -s, -t and -r set the send interval, the
# averaging period and the report interval. Each report line carries
# latency, stddev and loss, which are fed to rrdtool one update per line.
"${dpinger}" -f ${options} -s 500m -t 60s -r 60s "${targetip}" |
while read -r latency stddev loss; do
    rrdtool update "${rrdfile}" -t latency:stddev:loss "N:$latency:$stddev:$loss"
done

--------------------------------------------------------------------------------
/rrd/sample.html:
--------------------------------------------------------------------------------

Latency Statistics for WAN

wan-1

wan-2

wan-3

wan-4

wan-5

wan-6

--------------------------------------------------------------------------------
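The numeric RRA definitions in `dpinger_rrd_create` encode the same retention periods as its commented-out duration syntax: with a 60-second step, an archive that averages N steps per row needs retention / (60 × N) rows. The sketch below reproduces that arithmetic so the numbers can be checked or adapted. The helper name `rra_rows` is hypothetical and is not part of the dpinger scripts; it assumes the `--step 60` value used by `dpinger_rrd_create`.

```sh
#!/bin/sh
# Hypothetical helper (not part of dpinger): compute how many rows an RRA
# needs to keep a given retention period, assuming the 60-second base step
# used by dpinger_rrd_create (--step 60).

step=60      # seconds per primary data point
day=86400    # seconds per day

# rra_rows STEPS_PER_ROW RETENTION_SECONDS
# Prints RETENTION_SECONDS / (step * STEPS_PER_ROW) using integer division.
rra_rows()
{
    echo $(( $2 / (${step} * $1) ))
}

# 1-minute averages (1 step per row) kept for 15 days
rra_rows 1 $((15 * day))
# 5-minute averages (5 steps per row) kept for 90 days
rra_rows 5 $((90 * day))
# 1-hour averages (60 steps per row) kept for 1098 days
rra_rows 60 $((1098 * day))
```

Running it prints 21600, 25920 and 26352, matching the three `RRA:AVERAGE` lines. Note that the last archive works out to 1098 days, slightly more than three 365-day years.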