├── README
└── linewatch

/README:
--------------------------------------------------------------------------------
In Phil Hagen's excellent SANS FOR572 class, he discusses the problem
of visually scanning through large data sets (netflow data, logs, etc).
One low-technology trick is what Phil calls a "visual grep":
scroll through the data rapidly and pause when you see an area of
irregularity. In other words, you're looking for sections of the data
which are different from the data around them.

Phil was bemoaning the fact that there was no tool to automatically
perform this task. At first I thought you could use the Linux "watch"
utility for this. But "watch" only looks for diffs between the output
of successive runs of a single command. It doesn't work on a file of data.

So I wrote "linewatch" as a tool to tell you when a given line of data
is more than X% different from the line that comes before (the threshold
is settable by the user). In its simplest form, you specify the
threshold and your input file:

    $ linewatch -t 20% testdata
    1!   0123456789012345678901234567890123456789012345678901234567890123456789
    7!   0123456789012345678901234567890123456789
    9!   012345678901234567890123456789
    11!  01234567890123456789
    13!  0123456789
    14 lines processed

Here we're asking for linewatch to show us any lines that differ by more
than 20% from the previous line (you can say "-t 20%", "-t 20", or "-t .2").
The output is the line number and the content of the line. You will always
see line #1 because it's 100% different from the "no data" that came before.
The output includes the total lines in the file so you can see if there's
a long gap between the last "different" line and the end of file.
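
As a rough illustration of the arithmetic behind the threshold, here is a
minimal sketch of a position-by-position comparison. It assumes that the
trailing newline counts as a character, that positions past the end of the
shorter line count as mismatches, and that the total is divided by the
length of the longer line. The name mismatch_ratio and the two sample
strings (50 and 40 digits, standing in for adjacent testdata lines) are
made up for this example and are not taken from linewatch itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fraction of character positions that differ between two lines,
# measured against the longer of the two (trailing newline included).
# Hypothetical helper for illustration only.
sub mismatch_ratio {
    my ($prev, $curr) = @_;
    my @last = split(//, $prev);
    my @this = split(//, $curr);
    my ($llen, $tlen) = (scalar(@last), scalar(@this));
    my $max = ($tlen > $llen) ? $tlen : $llen;
    return 0 unless $max;    # nothing to compare

    my $mismatch = 0;
    for my $i (0 .. $max - 1) {
        # a position past the end of the shorter line counts as a mismatch
        my $x = defined($last[$i]) ? $last[$i] : '';
        my $y = defined($this[$i]) ? $this[$i] : '';
        $mismatch++ if $x ne $y;
    }
    return $mismatch / $max;
}

# Assumed sample data: a 50-digit line followed by a 40-digit line
my $prev = ('0123456789' x 5) . "\n";
my $curr = ('0123456789' x 4) . "\n";
printf "%.1f%%\n", 100 * mismatch_ratio($prev, $curr);
```

Under those assumptions the two sample lines differ in 11 of 51 positions,
or about 21.6%, which is why a line like #7 can show up at "-t 20%" even
though it is only 10 digits shorter than a 50-digit neighbor.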

Some data, such as Unix logs or netflow data, start with a variable
timestamp value, which you may wish to ignore when doing your comparisons.
Or you might want to only compare certain portions of each line.
So linewatch has the -o, -c, -f options to "slice and dice" each input
line before processing.

-o is a simple offset. All characters prior to the offset are simply
ignored. So, for example, "-o 15" would ignore the normal Unix time/date
stamp that's in typical Unix log files.

-c gives you a more flexible way of selecting ranges of characters
from your input. The syntax is Perl-like: "-c 20..40,60..80".
Character position is numbered starting from zero, so "0..19" is
the first 20 characters of the line.

-f allows you to select fields using a delimiter (-d). The delimiter
can be a Perl regular expression. The default delimiter is whitespace
(in Perl regex language, "\s+"). Here's a trivial example which looks
for the login shell field changing in a Unix password file:

    $ linewatch -f6 -d: -t0 /etc/passwd
    11!  nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
    12!  root:*:0:0:System Administrator:/var/root:/bin/sh
    13!  daemon:*:1:1:System Services:/var/root:/usr/bin/false
    14!  _uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
    15!  _taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
    86 lines processed

We use ":" as the delimiter ("-d:"). Note that we're comparing on
field number 6 ("-f6"), because fields are also numbered from zero.
Our tolerance for change is zero ("-t0"), so any change at all will
trigger output.

It can be helpful to display context around the lines that meet
our change threshold. Similar to grep, "-B" and "-A" can be used
to specify a number of lines before and after the "hit" to display.
"-C" can be used to specify the same number of context lines to
be displayed before and after.

linewatch uses "!" to indicate the changing lines and a "." to show
the context lines that don't meet the threshold:

    $ linewatch -t30 -C2 testdata
    1!   0123456789012345678901234567890123456789012345678901234567890123456789
    2.   0123456789012345678901234567890123456789012345678901234567890123456789
    3.   012345678901234567890123456789012345678901234567890123456789
    9.   012345678901234567890123456789
    10.  012345678901234567890123456789
    11!  01234567890123456789
    12.  01234567890123456789
    13!  0123456789
    14.  0123456789
    14 lines processed

--------------------------------------------------------------------------------
/linewatch:
--------------------------------------------------------------------------------
#!/usr/bin/perl
#
# linewatch - display lines that are different from the previous line
#
# Hal Pomeranz (hal@deer-run.com), 2015-04-16
# This code released under Creative Commons Attribution license (CC BY)

use strict;
use Getopt::Std;

$Getopt::Std::STANDARD_HELP_VERSION = 1;
sub VERSION_MESSAGE {}
sub HELP_MESSAGE {
    my($msg) = @_;

    if (length($msg) && !ref($msg)) {
        warn "*** $msg\n";
    }

    die <<'EoUseMsg';
Usage: linewatch -t thresh [options] file ...

    -t thresh   Specify threshold as decimal (.20) or percent (20%)

Line splitting options:
    -o offset   Character offset from beginning of string (start from 0)
    -c chars    Char positions to compare (Ex: "0..20,40..80")
    -f fields   Delimited fields to compare (Ex: "1,2,5..8")
    -d delim    Field delimiter (default is "\s+")

Context options:
    -C n        show n lines before *and* after changed line
    -B n        show n lines before changed line
    -A n        show n lines after changed line
EoUseMsg
}

my %opts = ();
getopts('A:B:C:c:d:f:o:t:', \%opts);

my $threshold = $opts{'t'};
if (!defined($threshold)) {
    HELP_MESSAGE("Please define a change threshold with -t");
}
if ($threshold =~ /\%$/ || $threshold > 1) {
    $threshold =~ s/\%$//;
    $threshold = $threshold / 100;
}

my($offset, $charlist, $fields) = ($opts{'o'}, $opts{'c'}, $opts{'f'});
my($specs, $delim) = ();
if (length($offset)) {
    unless ($offset =~ /^\d+$/) {
        HELP_MESSAGE("Invalid offset value $offset");
    }
    $specs += 1;
}
if (length($fields)) {
    # count -f like the other selectors so "-o ... -f ..." is rejected too
    $specs += 1;
    $delim = $opts{'d'};
    $delim = '\s+' unless (length($delim));
}
if (length($charlist)) {
    $specs += 1;
}

# only allow one of -c/-f/-o -- this simplifies the main loop below
if ($specs > 1) {
    HELP_MESSAGE("Only pick one of -c/-f/-o");
}

# Users should only use -C or -A/-B. Nevertheless try to do something
# sensible if they use both.
#
my($lines_before, $lines_after) = ();
if ($opts{'C'} =~ /^\d+$/) {
    $lines_before = $lines_after = $opts{'C'};
}
if ($opts{'A'} =~ /^\d+$/) {
    $lines_after = $opts{'A'};
}
if ($opts{'B'} =~ /^\d+$/) {
    $lines_before = $opts{'B'};
}

my $after_lines_to_output = 0;   # tracks context lines left to output
my @before_lines = ();           # ring buffer holding previous lines to output
my $curr_line = -1;              # location of last line in @before_lines
my($llen, @lastchars, $tlen, @thischars) = ();
while (<>) {
    # $line is the characters from $_ we need to compare.
    my $line = $_;

    # if we're field-splitting, extract the fields and rebuild $line
    if (length($fields)) {
        my @f = split(/$delim/, $line);
        @f = eval("\@f[$fields]");
        HELP_MESSAGE("Invalid field spec '$fields'") if ($@);
        $line = join('', @f);
    }

    # now split line into individual characters (@thischars)
    @thischars = split(//, $line);

    # if "-o", then chop out the appropriate number of leading chars
    splice(@thischars, 0, $offset) if ($offset);

    # extract character vectors specified with "-c" option if necessary
    if (length($charlist)) {
        @thischars = eval("\@thischars[$charlist]");
        HELP_MESSAGE("Invalid character list '$charlist'") if ($@);
    }

    # $tlen is number of characters left to compare
    $tlen = scalar(@thischars);

    # figure out whether this character set or the previous one is longer
    my $max = ($tlen > $llen) ? $tlen : $llen;

    # count the number of character positions that are different
    my $mismatch = 0;
    for (my $i = 0; $i < $max; $i++) {
        $mismatch += 1 if ($thischars[$i] ne $lastchars[$i]);
    }

    # tracks whether we do output on this line so it doesn't get
    # into our context ring buffer
    my $line_was_output = 0;

    # if we've exceeded our tolerance, output some lines
    if ($max && (($mismatch / $max) > $threshold)) {
        # display any context lines before
        if (@before_lines) {
            my $increment = 0;
            if (scalar(@before_lines) == $lines_before) {   # ring buffer full
                $increment = $curr_line + 1;
            }

            for (my $i = 0; $i < @before_lines; $i++) {
                my $index = ($i + $increment) % $lines_before;
                print $before_lines[$index];
            }

            $curr_line = -1;
            @before_lines = ();
        }

        # display line that exceeded threshold, with "!" after line num
        print "$.!\t$_";
        $line_was_output = 1;

        # set a counter for the number of lines of context to show after
        $after_lines_to_output = $lines_after;
    }
    elsif ($after_lines_to_output) {   # we need to output lines of context
        print "$..\t$_";
        $line_was_output = 1;
        $after_lines_to_output--;
    }

    # current line becomes previous line
    $llen = $tlen;
    @lastchars = @thischars;

    # save current line in the ring buffer since it's now an old line

    if ($lines_before && !$line_was_output) {
        $curr_line = ($curr_line + 1) % $lines_before;
        $before_lines[$curr_line] = "$..\t$_";
    }
}

# output total number of lines seen
print "$. lines processed\n";

--------------------------------------------------------------------------------