├── .example_pd.png
├── .example_pt.png
├── .gitignore
├── LICENSE
├── README.md
├── dates.gp
├── fmd
├── fmt
├── indiv_d.templ.gp
├── indiv_t.templ.gp
├── times.gp
├── waconv
├── waextr
└── wastat

/.example_pd.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phikal/wastat/edc7e1fd0f1b0f3ae33a617b50f1cb1e9f446edb/.example_pd.png
--------------------------------------------------------------------------------

/.example_pt.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/phikal/wastat/edc7e1fd0f1b0f3ae33a617b50f1cb1e9f446edb/.example_pt.png
--------------------------------------------------------------------------------

/.gitignore:
--------------------------------------------------------------------------------
*.dat
*.png
*.txt
*.lis
indiv_d.gp
indiv_t.gp
--------------------------------------------------------------------------------

/LICENSE:
--------------------------------------------------------------------------------
Creative Commons Legal Code

CC0 1.0 Universal

    CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
    LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
    ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
    INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
    REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
    PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
    THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
    HEREUNDER.

Statement of Purpose

The laws of most jurisdictions throughout the world automatically confer
exclusive Copyright and Related Rights (defined below) upon the creator
and subsequent owner(s) (each and all, an "owner") of an original work of
authorship and/or a database (each, a "Work").

Certain owners wish to permanently relinquish those rights to a Work for
the purpose of contributing to a commons of creative, cultural and
scientific works ("Commons") that the public can reliably and without fear
of later claims of infringement build upon, modify, incorporate in other
works, reuse and redistribute as freely as possible in any form whatsoever
and for any purposes, including without limitation commercial purposes.
These owners may contribute to the Commons to promote the ideal of a free
culture and the further production of creative, cultural and scientific
works, or to gain reputation or greater distribution for their Work in
part through the use and efforts of others.

For these and/or other purposes and motivations, and without any
expectation of additional consideration or compensation, the person
associating CC0 with a Work (the "Affirmer"), to the extent that he or she
is an owner of Copyright and Related Rights in the Work, voluntarily
elects to apply CC0 to the Work and publicly distribute the Work under its
terms, with knowledge of his or her Copyright and Related Rights in the
Work and the meaning and intended legal effect of CC0 on those rights.

1. Copyright and Related Rights. A Work made available under CC0 may be
protected by copyright and related or neighboring rights ("Copyright and
Related Rights"). Copyright and Related Rights include, but are not
limited to, the following:

  i. the right to reproduce, adapt, distribute, perform, display,
     communicate, and translate a Work;
 ii. moral rights retained by the original author(s) and/or performer(s);
iii. publicity and privacy rights pertaining to a person's image or
     likeness depicted in a Work;
 iv. rights protecting against unfair competition in regards to a Work,
     subject to the limitations in paragraph 4(a), below;
  v. rights protecting the extraction, dissemination, use and reuse of data
     in a Work;
 vi. database rights (such as those arising under Directive 96/9/EC of the
     European Parliament and of the Council of 11 March 1996 on the legal
     protection of databases, and under any national implementation
     thereof, including any amended or successor version of such
     directive); and
vii. other similar, equivalent or corresponding rights throughout the
     world based on applicable law or treaty, and any national
     implementations thereof.

2. Waiver. To the greatest extent permitted by, but not in contravention
of, applicable law, Affirmer hereby overtly, fully, permanently,
irrevocably and unconditionally waives, abandons, and surrenders all of
Affirmer's Copyright and Related Rights and associated claims and causes
of action, whether now known or unknown (including existing as well as
future claims and causes of action), in the Work (i) in all territories
worldwide, (ii) for the maximum duration provided by applicable law or
treaty (including future time extensions), (iii) in any current or future
medium and for any number of copies, and (iv) for any purpose whatsoever,
including without limitation commercial, advertising or promotional
purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
member of the public at large and to the detriment of Affirmer's heirs and
successors, fully intending that such Waiver shall not be subject to
revocation, rescission, cancellation, termination, or any other legal or
equitable action to disrupt the quiet enjoyment of the Work by the public
as contemplated by Affirmer's express Statement of Purpose.

3. Public License Fallback. Should any part of the Waiver for any reason
be judged legally invalid or ineffective under applicable law, then the
Waiver shall be preserved to the maximum extent permitted taking into
account Affirmer's express Statement of Purpose. In addition, to the
extent the Waiver is so judged Affirmer hereby grants to each affected
person a royalty-free, non transferable, non sublicensable, non exclusive,
irrevocable and unconditional license to exercise Affirmer's Copyright and
Related Rights in the Work (i) in all territories worldwide, (ii) for the
maximum duration provided by applicable law or treaty (including future
time extensions), (iii) in any current or future medium and for any number
of copies, and (iv) for any purpose whatsoever, including without
limitation commercial, advertising or promotional purposes (the
"License"). The License shall be deemed effective as of the date CC0 was
applied by Affirmer to the Work. Should any part of the License for any
reason be judged legally invalid or ineffective under applicable law, such
partial invalidity or ineffectiveness shall not invalidate the remainder
of the License, and in such case Affirmer hereby affirms that he or she
will not (i) exercise any of his or her remaining Copyright and Related
Rights in the Work or (ii) assert any associated claims and causes of
action with respect to the Work, in either case contrary to Affirmer's
express Statement of Purpose.

4. Limitations and Disclaimers.

 a. No trademark or patent rights held by Affirmer are waived, abandoned,
    surrendered, licensed or otherwise affected by this document.
 b. Affirmer offers the Work as-is and makes no representations or
    warranties of any kind concerning the Work, express, implied,
    statutory or otherwise, including without limitation warranties of
    title, merchantability, fitness for a particular purpose, non
    infringement, or the absence of latent or other defects, accuracy, or
    the present or absence of errors, whether or not discoverable, all to
    the greatest extent permissible under applicable law.
 c. Affirmer disclaims responsibility for clearing rights of other persons
    that may apply to the Work or any use thereof, including without
    limitation any person's Copyright and Related Rights in the Work.
    Further, Affirmer disclaims responsibility for obtaining any necessary
    consents, permissions or other rights required for any use of the
    Work.
 d. Affirmer understands and acknowledges that Creative Commons is not a
    party to this document and has no duty or obligation with respect to
    this CC0 or use of the Work.
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
Wastat
======

Wastat is a toolkit for analysing WhatsApp chats, creating statistics
and plotting pretty graphs.

Setup
=====

It is assumed that you are using a *nix system, such as Linux, a BSD or
macOS. You will require Bash, Perl, Python 3 (for the `fmd` and `fmt`
helpers), [AWK] and [Gnuplot]. Basic acquaintance with shell operations
is also expected. Gnuplot is not required if the user doesn't wish to
create plots.

To work with a chat, one first has to export it by email. Later on, it
might be possible to extract the necessary information from a SQLite
database, which one can access when one's phone is rooted. Refer to the
official WhatsApp FAQ to find out how to [Email a chat].
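The requirements above can be checked mechanically by looking each program up on `PATH`. The following Python sketch is not part of wastat: the names `missing` and `report` are hypothetical, and the program list is an assumption read off the scripts' shebang lines (`bash` for `wastat`, `perl` for `waconv`, `awk` for `waextr`, `python3` for `fmd` and `fmt`), with `gnuplot` treated as optional.

```python
#!/usr/bin/env python3
# Hypothetical dependency check for wastat (not part of the toolkit).
import shutil

# Assumed program names, taken from the scripts' shebang lines.
REQUIRED = ["bash", "perl", "awk", "python3"]
# gnuplot is only needed by the plotting commands (pt, put, pd, pu).
OPTIONAL = ["gnuplot"]


def missing(programs, lookup=shutil.which):
    """Return the programs for which `lookup` finds no executable (returns None)."""
    return [p for p in programs if lookup(p) is None]


def report(lookup=shutil.which):
    """Build a short report; the second value is True if nothing required is missing."""
    req = missing(REQUIRED, lookup)
    opt = missing(OPTIONAL, lookup)
    lines = []
    if req:
        lines.append("missing required: " + ", ".join(req))
    if opt:
        lines.append("missing optional (only needed for plots): " + ", ".join(opt))
    return "\n".join(lines) or "all programs found", not req


if __name__ == "__main__":
    text, ok = report()
    print(text)
```

Passing a custom `lookup` function makes the check testable without touching the real `PATH`; by default it uses `shutil.which`.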

One should note that WhatsApp doesn't always allow exporting the full
chat, due to limits on the export size. This is an external limitation
this project can't do anything about.

Parts
=====

`waconv`
--------

Since different WhatsApp versions and languages export chats in
different, generally inconvenient formats, the `waconv` script converts
them into a simple-to-parse [TSV] structure with the columns date, time
(24-hour), user and message. User names are lower-cased and stripped of
all non-word characters; messages are lower-cased and their punctuation
is replaced by spaces. This means that tools like [AWK] can easily
process the chat from then on (`waextr`, for example).

Currently, four different formats are recognized, with the following
associated codes:

| Format  | Code  | Date         | Time                      |
|---------|-------|--------------|---------------------------|
| US      | `us`  | `MM/DD/YYYY` | 12-hour, `am`/`pm`        |
| UK      | `uk`  | `DD/MM/YYYY` | 12-hour, `a.m.`/`p.m.`    |
| German  | `de`  | `DD.MM.YYYY` | 12-hour, `vorm.`/`nachm.` |
| General | `gen` | `DD/MM/YYYY` | 24-hour                   |

Some of these formats might be out of date for newer WhatsApp versions;
they will be updated as soon as possible.

To actually process a file made up of lines like these (i.e. the `uk`
format):

    24/01/2018, 9:49 p.m. - Faust: What meaning to these riddling words applies?
    24/01/2016, 10:20 p.m. - Mephisto: I am the spirit, ever, that denies!
    And rightly so: since everything created
    In turn deserves to be annihilated.

one would run `./waconv uk [chatfile]` and redirect the output to a file
(e.g. `./waconv uk [chatfile] > chat.txt`). The above example would thus
become:

    24/01/2018 21:49 faust what meaning to these riddling words applies
    24/01/2016 22:20 mephisto i am the spirit ever that denies and rightly so since everything created in turn deserves to be annihilated

This step is necessary if one wants to work with the following two tools.

`waextr`
--------

`waextr` is basically just a helper script for `wastat`.
It requires one argument, which may contain any combination of the
following letters, each enabling the output of one column: `d` (the
date), `t` (the time), `u` (the user) and `m` (the message). For example,
`./waextr du [chatfile]`, processing the example from above, would output:

    24/01/2018 faust
    24/01/2016 mephisto

If one runs `waextr` through awk and sets the `usern` variable, only the
lines whose user matches that value are printed. Hence, to output

    24/01/2018 what meaning to these riddling words applies

one would run `awk -v usern=faust -f waextr dm [chatfile]`.

`wastat`
--------

This main script has multiple commands; an overview is printed if it is
called without any arguments or with the argument `help`. The same list
is also presented here:

- `wastat wc`: counts how often each word has been used in messages
- `wastat wo`: prints all words used in messages, each on its own line
- `wastat uc`: counts how many messages each _user_ (i.e. contact name or
  number) has sent
- `wastat uwc`: counts how many "words" each user has used
- `wastat aml`: computes the average message length (in words) per user
- `wastat pt`: plots how many messages have been sent in each minute of
  the day, summed over all days
- `wastat put`: plots the same as `pt`, separately for selected users
- `wastat pd`: plots how many messages have been sent each day
- `wastat pu`: plots how many messages selected users have sent each day

Unless another file is given as the second argument, each command reads
the converted chat from `chat.txt`.

The auxiliary command `wastat clean` deletes the data files (`*.dat`) and
gnuplot scripts generated by wastat.

Examples
--------

- `wastat pt`:

  ![`pt`](./.example_pt.png)
- `wastat pd`:

  ![`pd`](./.example_pd.png)

Legal and other information
===========================

This software has been placed into the public domain, or an approximation
of it, under [CC0].
If there are any issues with the software, contact the [author] or visit
the [GitHub repository].

The chat extract in this document has been taken from A. S. Kline's
[English Translation] of J. W. Goethe's _Faust_.

[AWK]: https://en.wikipedia.org/wiki/AWK
[Gnuplot]: http://www.gnuplot.info/
[Email a chat]: https://faq.whatsapp.com/en/android/23756533/
[TSV]: https://en.wikipedia.org/wiki/Tab-separated_values
[CC0]: ./LICENSE
[author]: https://dyst.ax.lt/~xat/
[GitHub repository]: https://github.com/phikal/wastat
[English Translation]: https://www.poetryintranslation.com/klinesfaust.php
--------------------------------------------------------------------------------

/dates.gp:
--------------------------------------------------------------------------------
set timefmt "%d/%m/%Y"
set format x "%d/%m"
set xdata time

set grid
set autoscale y
set title "Messages in \"Chat\" over time"
set xlabel "Date"
set ylabel "Message count"
set ytic auto
unset label
set style data lines

p "date.dat" u 2:1 notitle
--------------------------------------------------------------------------------

/fmd:
--------------------------------------------------------------------------------
#!/usr/bin/env python3

# fill missing dates (and sort)
# licence: CC0 (see LICENSE)

from datetime import datetime, timedelta
import sys
import re

if len(sys.argv) < 3:
    print("usage: prog | fmd [start date] [end date]", file=sys.stderr)
    sys.exit(1)

# waconv writes dates as DD/MM/YYYY, so the counted input and the
# start/end arguments use the same format
form = "%d/%m/%Y"
pat = re.compile(r'^\s*(\d+) (\d{2}/\d{2}/\d{4})')
vals = dict([(datetime.strptime(match.group(2), form),
              int(match.group(1)))
             for line in sys.stdin
             for match in [pat.match(line)]
             if match])

time = datetime.strptime(sys.argv[1], form)
end = datetime.strptime(sys.argv[2], form)
# include the end date itself
while time <= end:
    print("{}\t{}".format(vals.get(time, 0),
                          time.strftime(form)))
    time += timedelta(days=1)
--------------------------------------------------------------------------------

/fmt:
--------------------------------------------------------------------------------
#!/usr/bin/env python3

# fill missing times (and sort)
# licence: CC0 (see LICENSE)

import sys
import re
from datetime import datetime, timedelta

form = "%H:%M"
pat = re.compile(r'\s*(\d+) (\d{2}:\d{2})')
vals = dict([(datetime.strptime(match.group(2), form),
              int(match.group(1)))
             for line in sys.stdin
             for match in [pat.match(line)]
             if match])

time = datetime.strptime("00:00", form)
# include the last minute of the day (23:59)
while time <= datetime.strptime("23:59", form):
    print("{}\t{}".format(vals.get(time, 0),
                          time.strftime(form)))
    time += timedelta(minutes=1)
--------------------------------------------------------------------------------

/indiv_d.templ.gp:
--------------------------------------------------------------------------------
set term png
set output "indiv_d.png"

set timefmt "%d/%m/%Y"
set format x "%d/%m"
set xdata time

set grid
set autoscale y
set title "Number of messages in \"Chat\" over time"
set xlabel "Date"
set ylabel "Message count"
set ytic auto
unset log
unset label
set style data lines

--------------------------------------------------------------------------------

/indiv_t.templ.gp:
--------------------------------------------------------------------------------
set term png
set output "indiv_t.png"

set timefmt "%H:%M"
set format x "%H:%M"
set xdata time

set grid
set autoscale y
set title "Total messages in \"Chat\" over all days"
set xlabel "Time"
set ylabel "Message count"
set ytic auto
set xtic auto
set style data lines

--------------------------------------------------------------------------------

/times.gp:
--------------------------------------------------------------------------------
set timefmt "%H:%M"
set format x "%H:%M"
set xdata time

set grid
set autoscale y
set title "Total messages in \"Chat\" over all days"
set xlabel "Time"
set ylabel "Message count"
set ytic auto
set xtic auto
set style data lines

p "time.dat" u 2:1 notitle
--------------------------------------------------------------------------------

/waconv:
--------------------------------------------------------------------------------
#!/usr/bin/perl

use strict;
use warnings;

$|++;

# locale to convert from
my $L = lc(shift(@ARGV) // "");

print "$0: Converts WhatsApp chats from a locale-specific format to tab-separated 24h format
Usage: $0 [lang] [chat file]

Supported locales:
  de:  convert from German format (DD.MM.YYYY, vorm./nachm.)
  us:  convert from US format (MM/DD/YYYY, am/pm)
  uk:  convert from UK format (DD/MM/YYYY, a.m./p.m.)
  gen: convert from general format (DD/MM/YYYY, 24h)
" and exit if $L =~ /^(h|$)/ or not @ARGV;

# date, time and line regexes
my ($dre, $tre, $lre);

if ($L eq "de") {
    $dre = qr{(\d{2})\.(\d{2})\.(\d{4})};
    $tre = qr{(\d{1,2}):(\d{2}) (vorm|nachm)\.};
    $lre = qr{^(\d{2}\.\d{2}\.\d{4}), (\d{1,2}:\d{2} (?:vorm|nachm)\.) - (?:([^:]*):)? (.*)$};
} elsif ($L eq "us") {
    $dre = qr{(\d{1,2})/(\d{2})/(\d{4})};
    $tre = qr{(\d{1,2}):(\d{2}) (am|pm)};
    $lre = qr{^(\d{1,2}/\d{2}/\d{4}), (\d{1,2}:\d{2} (?:am|pm)) - (?:([^:]*):)? (.*)$};
} elsif ($L eq "uk") {
    $dre = qr{(\d{2})/(\d{2})/(\d{4})};
    $tre = qr{(\d{1,2}):(\d{2}) (a\.m\.|p\.m\.)};
    $lre = qr{^(\d{2}/\d{2}/\d{4}), (\d{1,2}:\d{2} (?:a\.m\.|p\.m\.)) - (?:([^:]*):)? (.*)$};
} elsif ($L eq "gen") {
    $dre = qr{(\d{2})/(\d{2})/(\d{4})};
    $tre = qr{(\d{1,2}):(\d{2})};
    $lre = qr{^(\d{2}/\d{2}/\d{4}), (\d{1,2}:\d{2}) - (?:([^:]*):)? (.*)$};
} else {
    die "Locale not supported\n";
}

# convert date to DD/MM/YYYY
sub cd {
    my ($date) = @_;
    if ($date =~ m/$dre/) {
        # US dates are MM/DD/YYYY, all other supported formats DD/MM/YYYY
        my ($day, $month, $year) = ($1, $2, $3);
        ($day, $month) = ($2, $1) if $L eq "us";
        return sprintf "%02d/%02d/%04d", $day, $month, $year;
    } else {
        return $date;
    }
}

# convert any variation of AM/PM to standard am/pm
sub cXXtAM {
    my $time = shift @_;
    if ($time =~ m/$tre/) {
        my $noon;
        $noon = $3 eq "vorm" ? "am" : "pm" if $L eq "de";
        $noon = $3 eq "a.m." ? "am" : "pm" if $L eq "uk";
        $noon = $3 if $L eq "us";
        return "$1:$2 $noon";
    } else { return $time; }
}

# convert 12h to 24h
sub c12t24 {
    chomp(my $x = shift @_);
    my @t = split(/[: ]/, $x);
    $t[0] = $t[0] + 12 if $t[2] eq "pm" and $t[0] < 12;
    $t[0] = 0 if $t[0] == 12 and $t[2] eq "am";
    return $t[0], $t[1];
}

# convert time to 24h HH:MM
sub ct {
    my ($time) = @_;
    return $time unless $time =~ m/$tre/;
    my ($hour, $min) = $L eq "gen" ? ($1, $2) : c12t24(cXXtAM($time));
    return sprintf "%02d:%02d", $hour, $min;
}

while (<>) {
    chomp;
    if (m/$lre/) {
        my ($date, $time, $user, $message) =
            (cd($1), ct($2), lc($3 // ""), $4);
        $user =~ s/\W+//g;
        $message = lc $message;
        $message =~ s/[\W\s]+/ /g;
        print "\n" if $. > 1;
        print "$date\t$time\t$user\t$message" if $user ne "error";
    } else {
        # continuation line of a multi-line message
        my $message = lc $_;
        $message =~ s/[\W\s]+/ /g;
        print " " . $message;
    }
}
print "\n" if $.;
--------------------------------------------------------------------------------

/waextr:
--------------------------------------------------------------------------------
#!/usr/bin/awk -f

# waextr - whatsapp extraction tool
# License: CC0 (see LICENSE)
#
# usage: waextr [dtum] [chat file]
#   d: date, t: time, u: user, m: message
# with "awk -v usern=NAME -f waextr ..." only lines from user NAME are printed

BEGIN {
    req = ARGV[1]
    date = (req ~ /d/)
    time = (req ~ /t/)
    user = (req ~ /u/)
    mess = (req ~ /m/)
    delete ARGV[1]

    FS = "\t"
}

usern && usern != $3 { next }

date { printf("%s\t", $1) }
time { printf("%s\t", $2) }
user { printf("%s\t", $3) }
mess { printf("%s", $4) }
{ print "" }
--------------------------------------------------------------------------------

/wastat:
--------------------------------------------------------------------------------
#!/bin/bash

# wastat toolkit script
# license: CC0 (see LICENSE)

errcho() {
    >&2 echo "$*"
}

show_help () {
    errcho "wastat toolkit v0.4, by phikal"
    errcho
    errcho "usage:"
    errcho "  word-count, wc:"
    errcho "    count word occurrences"
    errcho "    default chat file: chat.txt or 2nd argument"
    errcho "  user-word-count, uwc:"
    errcho "    count words used per user"
    errcho "    default chat file: chat.txt or 2nd argument"
    errcho "  avg-msg-length, aml:"
    errcho "    average length of messages (in words) per user"
    errcho "    default chat file: chat.txt or 2nd argument"
    errcho "  word-output, wo:"
    errcho "    output all words separately"
    errcho "    default chat file: chat.txt or 2nd argument"
    errcho "  user-count, uc:"
    errcho "    count number of messages per user"
    errcho "    default chat file: chat.txt or 2nd argument"
    errcho "  plot-dates, pd:"
    errcho "    plot all dates"
    errcho "    default chat file: chat.txt or 2nd argument"
    errcho "    the first and last date are extracted automatically,"
    errcho "    but can be set manually as 3rd and 4th argument"
    errcho "  plot-users, pu:"
    errcho "    plot all users' dates individually"
    errcho "    default chat file: chat.txt or 2nd argument"
    errcho "    the first and last date are extracted automatically,"
    errcho "    but can be set manually as 3rd and 4th argument"
    errcho "    the program will ask on stdin which users to plot"
    errcho "  plot-user-times, put:"
    errcho "    plot accumulated number of messages sent over all days, per user"
    errcho "    default chat file: chat.txt or 2nd argument"
    errcho "    the program will ask on stdin which users to plot"
    errcho "  plot-times, pt:"
    errcho "    plot accumulated number of messages sent over all days"
    errcho "    default chat file: chat.txt or 2nd argument"
    errcho "  clean:"
    errcho "    remove all generated content from directory"
    errcho "  help, -h:"
    errcho "    this message"
}

plot_times () {
    if ! type gnuplot > /dev/null; then
        errcho "gnuplot is needed to plot the data"
        exit 1
    fi

    INPUT=${1:-chat.txt}

    errcho "Using $INPUT to plot accumulated message count over all days"
    ./waextr t "$INPUT" | sort | uniq -c | ./fmt > time.dat
    errcho "Finished extraction, opening gnuplot..."
    gnuplot -p times.gp
}


plot_user_times () {
    if ! type gnuplot > /dev/null; then
        errcho "gnuplot is needed to plot the data"
        exit 1
    fi

    INPUT=${1:-chat.txt}

    errcho "Using $INPUT to plot accumulated message count over all days"
    errcho "You will be presented a list of all individual participants from $INPUT"
    errcho "Please [c]onfirm or [m]odify user names (anything else is ignored)"

    cp indiv_t.templ.gp indiv_t.gp

    COUNT=0
    mkdir -p user
    for user in $(./waextr u "$INPUT" | sort | uniq | sed 's/ /_/g'); do
        name=$(echo "$user" | sed 's/_/ /g')

        printf "%s\t\t[c/m/*]: " "$user"
        read -r choice
        case "$choice" in
            c) ;;
            m) printf "Enter alias: "
               read -r name;;
            *) continue ;;
        esac

        # per-minute message counts of this user (times, not dates)
        awk -v usern="$user" -f ./waextr t "$INPUT" |\
            sort | uniq -c | ./fmt > "user/$user.dat"
        printf " (%s)" "$(cat < "user/$user.dat" 2> /dev/null | wc -l)"

        if [ $COUNT -eq 0 ]; then
            printf "p" >> indiv_t.gp
        else
            printf "," >> indiv_t.gp
        fi

        printf '\\\n\t"user/%s.dat" u 2:1 t "%s"' "$user" "$name" >> indiv_t.gp
        COUNT=$((COUNT+1))

        echo
    done

    if [ $COUNT -eq 0 ]; then
        errcho "It seems like you either didn't select anyone, or nobody could be selected..."
        errcho "Quitting."
        exit 1
    fi

    errcho "Finished extraction, opening gnuplot..."
    gnuplot -p indiv_t.gp
}

plot_users () {
    if ! type gnuplot > /dev/null; then
        errcho "gnuplot is needed to plot the data"
        exit 1
    fi

    INPUT=${1:-chat.txt}

    FIRST=$(head -1 "$INPUT" | cut -f1)
    LAST=$(tail -1 "$INPUT" | cut -f1)

    START=${2:-$FIRST}
    END=${3:-$LAST}

    errcho "Using $INPUT to plot selected users over time from $START to $END"
    errcho "You will be presented a list of all individual participants from $INPUT"
    errcho "Please [c]onfirm or [m]odify user names (anything else is ignored)"

    cp indiv_d.templ.gp indiv_d.gp

    COUNT=0
    for user in $(./waextr u "$INPUT" | sort | uniq | sed 's/ /_/g'); do
        printf "%s\t\t[c/m/*]: " "$user"
        read -r choice

        ouser=$(echo "$user" | sed 's/_/ /g')
        name=$ouser

        case "$choice" in
            c) ;;
            m) printf "Enter alias: "
               read -r name;;
            *) continue ;;
        esac

        awk -v usern="$user" -f ./waextr d "$INPUT" |\
            sort | uniq -c | ./fmd "$START" "$END" > "$user.dat"
        printf " (%s)" "$(cat < "$user.dat" 2> /dev/null | wc -l)"

        if [ $COUNT -eq 0 ]; then
            printf "p" >> indiv_d.gp
        else
            printf "," >> indiv_d.gp
        fi

        printf '\\\n\t"%s.dat" u 2:1 t "%s"' "$user" "$name" >> indiv_d.gp
        COUNT=$((COUNT + 1))

        echo
    done

    if [ $COUNT -eq 0 ]; then
        errcho "It seems like you either didn't select anyone, or nobody could be selected..."
        errcho "Quitting."
        exit 1
    fi

    errcho "Finished extraction, opening gnuplot..."
    gnuplot -p indiv_d.gp
}

plot_dates () {
    if ! type gnuplot > /dev/null; then
        errcho "gnuplot is needed to plot the data"
        exit 1
    fi

    INPUT=${1:-chat.txt}

    FIRST=$(head -1 "$INPUT" | cut -f1)
    LAST=$(tail -1 "$INPUT" | cut -f1)

    START=${2:-$FIRST}
    END=${3:-$LAST}

    errcho "Using $INPUT to plot message count over time from $START to $END"
    ./waextr d "$INPUT" | sort | uniq -c | ./fmd "$START" "$END" > date.dat
    errcho "Finished extraction, opening gnuplot..."
    gnuplot -p dates.gp
}

user_count () {
    INPUT=${1:-chat.txt}
    ./waextr u "$INPUT" | sort | uniq -c | sort -nr
}

word_output () {
    INPUT=${1:-chat.txt}
    ./waextr m "$INPUT" | grep -E -o "\w+"
}

word_count () {
    INPUT=${1:-chat.txt}
    ./waextr m "$INPUT" | grep -E -o "\w+" |\
        sort | uniq -c | sort -nr
}

avg_msg_length () {
    INPUT=${1:-chat.txt}

    # split(s, a, " ") splits on runs of blanks, ignoring leading/trailing ones
    for user in $(./waextr u "$INPUT" | sort | uniq | sed 's/ /_/g'); do
        awk -v user="$user" \
            'BEGIN { FS="\t"; gsub(/_/, " ", user) }
             $3 == user { count += split($4, a, " "); msgs++ }
             END { printf("%8g %s\n", count/msgs, user) }' "$INPUT"
    done | sort -nr
}

user_word_count () {
    INPUT=${1:-chat.txt}

    for user in $(./waextr u "$INPUT" | sort | uniq | sed 's/ /_/g'); do
        awk -v user="$user" \
            'BEGIN { FS="\t"; gsub(/_/, " ", user) }
             $3 == user { count += split($4, a, " ") }
             END { printf("%8d %s\n", count, user) }' "$INPUT"
    done | sort -nr
}

clean() {
    rm ./*.dat 2> /dev/null
    rm -f ./user/*.dat 2> /dev/null
    rmdir ./user 2> /dev/null
    rm indiv_{t,d}.gp 2> /dev/null
    errcho "Cleaned up."
}

case "$1" in
    word-count|wc)        shift; word_count "$@";;
    user-word-count|uwc)  shift; user_word_count "$@";;
    avg-msg-length|aml)   shift; avg_msg_length "$@";;
    word-output|wo)       shift; word_output "$@";;
    user-count|uc)        shift; user_count "$@";;
    plot-dates|pd)        shift; plot_dates "$@";;
    plot-users|pu)        shift; plot_users "$@";;
    plot-times|pt)        shift; plot_times "$@";;
    plot-user-times|put)  shift; plot_user_times "$@";;
    clean)                clean;;
    help|-h)              show_help;;
    *)                    show_help; exit 1;;
esac
--------------------------------------------------------------------------------
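To make the `uk` conversion step concrete, the following Python sketch re-implements roughly what `waconv uk` does. It recognises message lines, keeps the date as `DD/MM/YYYY`, converts the 12-hour `a.m.`/`p.m.` time to 24-hour `HH:MM`, lower-cases user and message, replaces punctuation, and joins continuation lines. It then counts messages per user in the spirit of `wastat uc`. This is an illustrative sketch, not part of wastat. It assumes lines shaped like the README example, and `convert_uk`, `to_24h`, `normalize` and `count_users` are hypothetical names.

```python
import re
from collections import Counter

# Same shape as waconv's uk line regex: date, 12-hour time, optional "user:", message.
LINE_RE = re.compile(
    r"^(\d{2})/(\d{2})/(\d{4}), (\d{1,2}):(\d{2}) (a\.m\.|p\.m\.) - (?:([^:]*):)? (.*)$"
)


def normalize(text):
    """Lower-case, turn runs of non-word characters into single spaces, trim."""
    return " ".join(re.sub(r"\W+", " ", text.lower()).split())


def to_24h(hour, minute, suffix):
    """Convert a 12-hour clock time to 24-hour "HH:MM"."""
    h = int(hour) % 12  # 12 a.m. becomes 0; 12 p.m. becomes 0 + 12 below
    if suffix == "p.m.":
        h += 12
    return "%02d:%02d" % (h, int(minute))


def convert_uk(lines):
    """Yield (date, time, user, message) tuples from a uk-format WhatsApp export."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        m = LINE_RE.match(line)
        if m:
            # a new message starts, so the previous one is complete
            if current:
                yield current
            day, month, year, hh, mm, suffix, user, msg = m.groups()
            user = re.sub(r"\W+", "", (user or "").lower())
            current = ("%s/%s/%s" % (day, month, year),
                       to_24h(hh, mm, suffix), user, normalize(msg))
        elif current:
            # continuation line of a multi-line message
            date, tm, user, msg = current
            current = (date, tm, user, (msg + " " + normalize(line)).strip())
    if current:
        yield current


def count_users(rows):
    """Count messages per user, most active first (similar to `wastat uc`)."""
    return Counter(user for _, _, user, _ in rows).most_common()


if __name__ == "__main__":
    # sample lines adapted from the README example
    sample = [
        "24/01/2018, 9:49 p.m. - Faust: What meaning to these riddling words applies?",
        "24/01/2018, 10:20 p.m. - Mephisto: I am the spirit, ever, that denies!",
        "And rightly so: since everything created",
    ]
    rows = list(convert_uk(sample))
    for row in rows:
        print("\t".join(row))
    print(count_users(rows))
```

A generator fits here because a message is only known to be complete once the next message header (or the end of input) is seen, since continuation lines may follow it. Unlike `waconv`, this sketch also trims trailing spaces from messages.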