├── img
    ├── times.png
    ├── trend.png
    ├── network.png
    ├── shares.png
    └── activity.png
├── style.mplstyle
├── LICENSE
├── README.md
└── analyzer.py


/img/times.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/empicano/whatsapp-analyzer/HEAD/img/times.png
--------------------------------------------------------------------------------
/img/trend.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/empicano/whatsapp-analyzer/HEAD/img/trend.png
--------------------------------------------------------------------------------
/img/network.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/empicano/whatsapp-analyzer/HEAD/img/network.png
--------------------------------------------------------------------------------
/img/shares.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/empicano/whatsapp-analyzer/HEAD/img/shares.png
--------------------------------------------------------------------------------
/img/activity.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/empicano/whatsapp-analyzer/HEAD/img/activity.png
--------------------------------------------------------------------------------
/style.mplstyle:
--------------------------------------------------------------------------------
# For an explanation of the different attributes, see:
# https://matplotlib.org/tutorials/introductory/customizing.html#customizing-with-matplotlibrc-files

axes.titlepad : 15
axes.titleweight : bold
axes.titlesize : x-large
axes.xmargin : 0
axes.ymargin : 0
axes.spines.top : False
axes.spines.right : False
axes.axisbelow : True
axes.labelsize : medium

figure.subplot.hspace : 0.25

font.family : sans-serif
font.size : 8

grid.color : eeeeee

legend.loc : best
legend.frameon : False
legend.fontsize : large
legend.handletextpad: 2

lines.solid_capstyle : butt
lines.linewidth : 3

xtick.major.pad : 10
xtick.major.size : 5
xtick.labelsize : medium

ytick.major.pad : 10
ytick.major.size : 5
ytick.labelsize : medium

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 Felix Böhm

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## Introduction
WhatsApp-Analyzer is a statistical analysis tool for WhatsApp chats. Working on the chat files that can be exported from WhatsApp, it generates various plots showing, for example, which other participant a user responds to the most.

## List of Plots

### Trend
![](img/trend.png)
In this plot we see in gray the raw number of messages for every day (with the three busiest dates annotated), in blue the monthly mean of messages per day, and in red the overall mean of messages per day. The matplotlib zoom function in the lower left corner can be helpful for exploring the graph.

### Activity
![](img/activity.png)
The activity plot shows the weekly means of messages per day for every user. Based on this we can explore in which periods different participants were the most (or the least) active. For easier comparison, each subplot also shows the graphs of all other users in light gray. Users are sorted by total message count; their order and color stay the same across all following plots.

### Shares
![](img/shares.png)
This graphic shows three different plots. On the left side we see the shares of messages, words and media files per user, each annotated with the absolute value. On the right side we find two bar plots that show the relationship between those three values in more detail.

In this particular case we could, for example, discover that although the pink user has written fewer messages than the violet user, they have in total written more text and sent more media files, because they tend to write long messages containing an above-average number of media files.

### Times
![](img/times.png)
In the upper plot we see the average message count for every hour of the week. In our example graph we can spot, for instance, that the conversation normally comes to a halt during the night and that Friday and Saturday evenings are on average by far the busiest times in the group. The daily mean of messages is plotted in gray. Factoring this in, we could argue that while Friday and Saturday evenings are usually equally busy, Friday still sees the most messages overall.

The lower plot displays the hourly mean of messages over a day. It is additionally shown in the upper plot in the same color for easier comparison.

### Network
![](img/network.png)
This alluvial diagram shows how often users respond to each other. A line from left to right represents the number of responses of the user on the left to messages of the user on the right. A vertical line thus represents answers to oneself, that is, consecutive messages from the same user. The response to a message M is taken to be the message that follows M in the chat (the first message in the chat doesn't respond to anything).

## Instructions
To get started, export the chat file that you want to analyze to your computer. To do that, open WhatsApp on your mobile phone and select the desired chat. Under **group / contact info** you will find the button **export chat**. Choose **without media**.

WhatsApp is very inconsistent with the format of exported files. Depending on the phone's OS and language, the time, date and status message formats differ. This program expects the following format:

```
dd.mm.yy, hh:mm:ss: Third Witch: That will be ere the set of sun.
27.03.19, 06:03:56: First Witch: Where the place?
27.03.19, 06:03:59: Second Witch: Upon the heath, here:
27.03.19, 06:04:05: Third Witch: There to meet with Macbeth.
27.03.19, 06:04:09: First Witch: I come, Graymalkin!
27.03.19, 06:04:14: Second Witch: Paddock calls.
27.03.19, 06:04:16: Third Witch: Anon.
```

Run `python3 analyzer.py [file path]` in your terminal to start the analysis.

**Needed Dependencies:** [matplotlib](https://matplotlib.org), [numpy](http://www.numpy.org)

## Remarks
- Media files are recognized on the basis of their `<image omitted>` (video, sticker, ... respectively) tag in the message. Keep this in mind when preparing the chat file for analysis.
- You can optionally pass boundary dates (both included, format: **dd.mm.yyyy**) as arguments to the program. This makes it easier to, for example, compare outputs for different years. Both dates have to be given; otherwise the program falls back to using the dates of the first and last message as boundaries.
- As people sometimes go to bed (and thus write messages) after midnight, a notion of a day from 00:00 to 23:59 would not represent the data in an optimal way. For all statistics, days therefore begin at 04:00 in the morning and correspondingly end at 03:59 the day after. You can change this notion by setting the **DAYSTART** variable at the top of the **analyzer.py** file.
- If a group has more than eight participants, only the seven most active ones are shown in the plots; the rest are pooled together to keep the plots from getting confusing.
- The color palette can be modified at the top of the **analyzer.py** file.
- Newline characters in messages are not a problem; they are converted to whitespace before a message is analyzed.
- The program could easily be adjusted to analyse other means of communication than WhatsApp chats (e.g. emails). Either bring the data into the format discussed above, or change the way the program reads in the data (see the **Text** class in the **analyzer.py** file for this).
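
The expected line format and the **DAYSTART** remark can be illustrated with a few lines of Python. This is only a minimal sketch and not the code from **analyzer.py**: the helper names `parse_line` and `logical_day` are hypothetical, and the sketch assumes the `dd.mm.yy, hh:mm:ss: Name: text` layout shown in the Instructions.

```python
import datetime as dt

DAYSTART = 4  # hour at which a "day" begins (assumed to mirror the default above)


def parse_line(line):
    """Split 'dd.mm.yy, hh:mm:ss: Name: text' into (timestamp, name, text).

    Returns None for lines that do not match, e.g. status messages or
    continuation lines of a multi-line message.
    """
    try:
        stamp = dt.datetime.strptime(line[:18], '%d.%m.%y, %H:%M:%S')
        name, text = line[20:].split(': ', 1)
    except ValueError:
        return None
    return stamp, name, text.rstrip('\n')


def logical_day(stamp, daystart=DAYSTART):
    """Return the date a message is counted under for a given day start hour."""
    if stamp.hour < daystart:
        stamp -= dt.timedelta(days=1)
    return stamp.date()


if __name__ == '__main__':
    parsed = parse_line('27.03.19, 06:03:56: First Witch: Where the place?\n')
    print(parsed)
    # a message sent at 02:15 still belongs to the previous day
    print(logical_day(dt.datetime(2019, 3, 28, 2, 15)))
```

With this notion of a day, a message written at 02:15 on 28 March is counted under 27 March, which keeps late-night conversations together in the daily statistics.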

--------------------------------------------------------------------------------
/analyzer.py:
--------------------------------------------------------------------------------
import sys
import os
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from matplotlib.lines import Line2D
from numpy import mean


# path to chat file
PATH = sys.argv[1]
# define the hour in which a day starts and ends, set 0 for start at 00:00 and end at 23:59
DAYSTART = 4
# interval to consider
BOUND = [None, None]
if len(sys.argv) > 3:
    for i in range(2):
        BOUND[i] = dt.datetime.strptime(sys.argv[2+i], '%d.%m.%Y')
        BOUND[i] = BOUND[i] - dt.timedelta(days=1) if BOUND[i].hour < DAYSTART else BOUND[i]
# words excluded in all statistics; these are the tokens of media placeholders such
# as '<image omitted>' (assumed list, adjust it to the tags found in your export)
EXCLUDED = ['<image', '<video', '<sticker', 'omitted>']
# color scheme
COLORS = [
    '#1f2041',  # major plots
    '#dd0426',  # secondary plots
    '#bbbbbb',  # tertiary plots
    '#dddddd',  # background plots
    '#41b08e',  # user 1
    '#eb7232',  # user 2
    '#8188bf',  # user 3
    '#e75aa7',  # user 4
    '#86bf39',  # user 5
    '#f3c218',  # user 6
    '#c69d59',  # user 7
    '#8d8d8d',  # user 8
]

# display up to this number of users, if greater, add up the rest and display as one
MEMBERMAX = 8


class Member:
    """Represent a chat participant"""
    hours = [[0]*24 for _ in range(7)]  # messages at weekday in hour
    period = 0  # time frame of chat in days
    first = None  # date of first message
    days = []  # messages mapped on days (all users)

    def __init__(self, name):
        """Initialize member object"""
        self.name = name
        self.words = 0
        self.days = [0]*Member.period  # messages mapped on days (one user)
        self.media = 0  # number of media files sent
        self.answers = {}

    def add_message(self, message, date, predec):
        """Add data from one message to the user object"""
        Member.hours[date.weekday()][date.hour] += 1
        index = (date - Member.first).days
        Member.days[index] += 1
        self.days[index] += 1
        self.answers.setdefault(predec, 0)
        self.answers[predec] += 1
        for word in message.split():
            if word not in EXCLUDED:
                self.words += 1
            elif word == 'omitted>':
                self.media += 1


class Text:
    """Contain methods for working on the chat file"""

    @staticmethod
    def extract(line, members, predec):
        """Extract data out of one line"""
        try:
            date = dt.datetime(
                2000+int(line[6:8]),
                int(line[3:5]),
                int(line[:2]),
                hour=int(line[10:12]),
                minute=int(line[13:15]),
                second=int(line[16:18])
            )
            # shift date according to DAYSTART
            date = date - dt.timedelta(days=1) if date.hour < DAYSTART else date
            if BOUND[0] and not (BOUND[0] <= date < BOUND[1]): return
            if not Member.first:
                if BOUND[0]: Member.first = BOUND[0]
                # use midnight as reference: dates are already shifted by DAYSTART,
                # a reference at DAYSTART would move early-morning messages back twice
                else: Member.first = date.replace(hour=0, minute=0, second=0)
            line = line[20:]
            name, line = line.split(': ', 1)
        except ValueError:
            pass  # ignore status messages
        else:
            # check if we have to change the index in the days list
            while (max(date, BOUND[1] if BOUND[1] else date) - Member.first).days >= Member.period:
                Member.period += 1
                Member.days.append(0)
                for member in members:
                    member.days.append(0)
            # add data to member object
            if all(member.name != name for member in members):
                members.append(Member(name))
            for m in members:
                if m.name == name:
                    m.add_message(line, date, predec)
                    return m.name
        return predec

    @staticmethod
    def process(path):
        """Extract and order data out of given chat file"""
        members = []
        prev = None  # previous line in chat file
        predec = None  # author of previous message (predecessor)
        with open(path) as chat:
            for line in chat:
                if (
                    len(line) > 20
                    and line[2] == line[5] == '.'
                    and line[8:10] == ', '
                    and line[12] == line[15] == line[18] == ':'
                ):
                    if prev: predec = Text.extract(prev, members, predec)
                    prev = line
                else:
                    # continuation of a multi-line message: join with whitespace
                    if prev: prev = prev[:-1] + ' ' + line
        # process the last message (skipped if no line matched the expected format)
        if prev: Text.extract(prev, members, predec)
        members = sorted(members, key=lambda m: sum(m.days), reverse=True)
        # if number of members is greater than MEMBERMAX, add up the rest
        if len(members) > MEMBERMAX:
            others = Member('Others')
            for m in members[MEMBERMAX-1:]:
                for i, d in enumerate(m.days):
                    others.days[i] += d
                others.words += m.words
                others.media += m.media
                for n, c in m.answers.items():
                    others.answers.setdefault(n, 0)
                    others.answers[n] += c
            members = members[:MEMBERMAX-1] + [others]
            names = [m.name for m in members]
            for m in members:
                m.answers.setdefault('Others', 0)
                drop = []
                for n, c in m.answers.items():
                    if n not in names:
                        m.answers['Others'] += c
                        drop.append(n)  # remember the key, not the count
                for n in drop:
                    m.answers.pop(n, None)
        return members


def trend(members):
    """Visualize overall message count trend.

    This includes raw message count/day, mean count/day for every
    month and overall mean count/day.
    """
    # convert from daily message count to monthly average
    start = (
        Member.first if Member.first.day==1
        else Member.first.replace(
            day=1,
            month=Member.first.month%12+1,
            year=Member.first.year+1 if Member.first.month==12 else Member.first.year
        )
    )
    last = Member.first + dt.timedelta(days=Member.period)
    delta_months = (last.year - start.year) * 12 + last.month - start.month
    # get indexes of first day of every month in days list
    indexes = [(start-Member.first).days] + [(start.replace(
        month=(start.month+i) % 12 + 1,
        year=start.year + (start.month+i) // 12
    ) - Member.first).days for i in range(0, delta_months)]
    # get monthly messages/day mean
    months = [mean(Member.days[indexes[i]:indexes[i+1]]) for i in range(len(indexes)-1)]

    # plot total messages per day
    plt.figure()
    dates = [Member.first.date() + dt.timedelta(days=i) for i in range(Member.period)]
    s = plt.stem(dates, Member.days, markerfmt=' ', basefmt=' ', label='Total Messages per Day')
    plt.setp(s[1], linewidth=0.5, color=COLORS[2])
    # plot overall mean of messages per day
    mn = mean(Member.days)
    plt.axhline(mn, color=COLORS[1], label='Overall Mean of Messages per Day')
    # plot monthly mean of messages per day
    x = [dates[i] for i in indexes[:-1]]
    plt.plot(x, months, color=COLORS[0], label='Monthly Mean of Messages per Day')

    # set style attributes
    plt.xlim(
        Member.first.date() - dt.timedelta(days=1),
        Member.first.date() + dt.timedelta(Member.period)
    )
    plt.ylim(0, 1.05*max(Member.days))
    plt.gca().yaxis.grid(True)
    plt.legend()
    plt.title('Messages per Day (Over a Period of ' + str(Member.period) + ' Days)')
    # set formatter and locator (autolocator has problems setting good date xticks)
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
    plt.gca().xaxis.set_major_locator(mdates.MonthLocator(interval=(Member.period // 250) or 1))
    plt.gca().callbacks.connect(
        'xlim_changed',
        lambda ax: ax.xaxis.set_major_locator(
            mdates.MonthLocator(interval=(int(ax.get_xlim()[1] - ax.get_xlim()[0]) // 250) or 1)
        )
    )

    # annotate mean line
    plt.annotate(
        '{0:.{digits}f}'.format(mn, digits=2),
        xy=(Member.first.date() + dt.timedelta(days=Member.period), mn),
        xytext=(8, -3),
        textcoords='offset points',
    )
    # annotate maxima (keep the three days with the most messages, sorted descending)
    annotations = []
    for i, m in enumerate(Member.days):
        for j, a in enumerate(annotations):
            if m > a[1]:
                annotations.insert(j, (dates[i], m))
                if len(annotations) > 3: del annotations[-1]
                break
        else:
            if len(annotations) < 3:
                annotations.append((dates[i], m))
    plt.scatter(*zip(*annotations), color=COLORS[2], marker='.')
    for a, m in annotations:
        plt.annotate(
            a.strftime('%d.%m.%Y'),
            xy=(a, m),
            xytext=(-9, -10),
            rotation=90,
            textcoords='offset points',
        )


def activity(members):
    """Visualize member activity over whole chat period.

    Display weekly means for every user in a spaghetti plot emphasizing
    one user at a time.
    """
    # compute weekly means
    fig, axarr = plt.subplots(len(members), sharex=True, sharey=True, squeeze=False)
    axarr = [ax for lt in axarr for ax in lt]
    index = (7 - Member.first.weekday()) % 7
    weeks = [
        [mean(members[i].days[k:k+7]) for k in range(index, Member.period-6, 7)]
        for i in range(len(members))
    ]
    dates = [Member.first.date() + dt.timedelta(days=i) for i in range(index, Member.period-6, 7)]

    # plot multiple times with different emphasis
    for i, member in enumerate(members):
        for j in range(len(members)):
            axarr[i].plot(dates, weeks[j], color=COLORS[3], linewidth=0.5)
        axarr[i].plot(dates, weeks[i], color=COLORS[i+4])
        # set style attributes
        axarr[i].yaxis.grid(True)
        if weeks[0]: axarr[i].set_ylim(0, 1.1*max([max(l) for l in weeks]))
        axarr[i].set_ylabel(member.name, labelpad=20, rotation=0, ha='right')
        plt.xlim(
            Member.first.date() - dt.timedelta(days=1),
            Member.first.date() + dt.timedelta(Member.period)
        )
        # set formatter and locator (autolocator has problems setting good date xticks)
        axarr[i].xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
        axarr[i].xaxis.set_major_locator(mdates.MonthLocator(interval=(Member.period // 250) or 1))
        axarr[i].callbacks.connect(
            'xlim_changed',
            lambda ax: ax.xaxis.set_major_locator(
                mdates.MonthLocator(interval=(int(ax.get_xlim()[1] - ax.get_xlim()[0]) // 250) or 1)
            )
        )

    # set title
    fig.add_subplot(111, frameon=False)
    plt.tick_params(labelcolor='none', left=False, bottom=False)
    plt.title('User Activity (Messages / Day Weekly Means)')


def shares(members):
    """Visualize conversation shares.

    This includes number of messages as share, number of words as share
    and average words per message.
    """
    # plot stacked bar plots visualizing shares of messages, text and media files
    fig = plt.figure()
    members = members[::-1]
    count = [
        [sum(m.days) for m in members],
        [m.words for m in members],
        [m.media for m in members]
    ]
    for i in range(3):
        ax = fig.add_subplot(161 + i, xlim=[0, 1])
        c = count[i]
        total = sum(c)
        shares = [v / total if total else 1 / len(members) for v in c]
        for j, member in enumerate(members):
            x = plt.bar(0.6, shares[j], 0.6, bottom=sum(shares[:j]), color=COLORS[len(members)-j+3])
            p = x.patches[0]
            # annotate segments with total value
            if p.get_height() > 0.03:
                ax.text(0.6, p.get_y() + shares[j] / 2, str(c[j]), ha='center', va='center')
            # annotate segments with user names
            if i == 0:
                ax.text(-0.3, p.get_y() + shares[j] / 2, member.name, ha='right', va='center')

        # set style attributes
        ax.spines['bottom'].set_visible(False)
        ax.tick_params(direction='inout', length=10)
        ax.xaxis.set_visible(False)
        ax.set_yticks([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
        if i: ax.set_yticklabels([])
        else: ax.set_yticklabels(['{:.0f}%'.format(x*100) for x in plt.gca().get_yticks()])
        ax.text(
            p.get_x() + p.get_width() / 2,
            -0.04,
            ('Messages Sent', 'Words Written', 'Media Files Sent')[i],
            ha='center'
        )

    # set title
    fig.add_subplot(121, frameon=False)
    plt.tick_params(labelcolor='none', left=False, bottom=False)
    plt.title('Shares of Messages, Words and Media Files per User')

    # plot average number of words and media files per message
    averages = [
        [m.words / sum(m.days) for m in members],
        [m.media / sum(m.days) for m in members]
    ]
    titles = [
        'Average Words per Message',
        'Average Media Files per Message'
    ]
    for i in range(2):
        # plot overall mean (named so that it does not shadow numpy's mean)
        ax = fig.add_subplot(220 + (i+1)*2, xmargin=0.05, ymargin=0.15)
        overall_mean = sum(count[i+1]) / sum(Member.days)
        plt.axvline(overall_mean, color=COLORS[2], label='Overall Mean', zorder=0)
        plt.legend()
        # plot bar chart
        plt.barh(range(len(members)), averages[i], 0.5, color=COLORS[3+len(members):3:-1])
        plt.title(titles[i])
        # set style attributes
        ax.xaxis.grid(True)
        ax.yaxis.set_visible(False)


def times(members):
    """Visualize message count averages in different time frames.

    This includes message count mean per hour of the day and message
    count mean per day of the week.
    """
    weekday_counts = [0]*7
    for i in range(Member.period):
        weekday_counts[(Member.first + dt.timedelta(days=i)).weekday()] += 1

    # plot hourly message count mean (one week)
    fig = plt.figure()
    ax = fig.add_subplot(211, xmargin=0.05, ymargin=0.1)
    weekdays = [sum(Member.hours[i]) for i in range(7)]
    w_means = list(map(lambda w, c: w / c if c else 0, weekdays, weekday_counts))
    for i in range(7):
        plt.plot(
            [i*24, (i+1)*24],
            (w_means[i]/24,)*2,
            color=COLORS[2],
            label=None if i else 'Daily Mean'
        )
    div = sum(weekday_counts)
    d_means = [x / div if div else 0 for x in [sum(col) for col in zip(*Member.hours)]]
    plt.plot(
        range(24*7+1),
        d_means[DAYSTART:]+6*d_means+d_means[:DAYSTART+1],
        COLORS[1],
        lw=0.5,
        label='Hourly Mean (Overall)'
    )
    raw = [e / c if c else 0 for h, c in zip(Member.hours, weekday_counts) for e in h]
    plt.plot(range(24*7+1), raw[DAYSTART:] + raw[:DAYSTART+1], COLORS[0])

    # set style attributes
    ax.grid(True)
    ticks = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
    # eight tick positions need eight labels, the last one is Monday of the following week
    plt.xticks(
        range(0, 24*8, 24),
        [s + ' ' + str(DAYSTART).zfill(2) + ':00' for s in ticks + ticks[:1]]
    )
    ax.set_xticks(range(0, 24*7), minor=True)
    plt.title('Hourly Message Count Mean (One Week)')
    plt.legend()

    # plot hourly message count mean (overall)
    ax = fig.add_subplot(212, xmargin=0.05)
    d_means = d_means[DAYSTART:] + d_means[:DAYSTART+1]
    plt.plot(range(25), d_means, COLORS[1])

    # set style attributes
    ax.grid(True)
    plt.ylim(-0.1*max(d_means), 1.1*max(d_means))
    plt.xticks(range(25), list(range(DAYSTART, 24)) + list(range(DAYSTART+1)))
    plt.title('Hourly Message Count Mean (Overall)')


def network(members):
    """Visualize response network structures.

    Display how often users respond to each other user in an alluvial
    diagram.
    """
    class LineDataUnits(Line2D):
        """Line2D taking lw argument in y axis units instead of points"""
        def __init__(self, *args, **kwargs):
            _lw_data = kwargs.pop('lw', 1)
            super().__init__(*args, **kwargs)
            self._lw_data = _lw_data

        def _get_lw(self):
            if self.axes is not None:
                ppd = 72./self.axes.figure.dpi
                trans = self.axes.transData.transform
                return ((trans((1, self._lw_data)) - trans((0, 0))) * ppd)[1]
            else:
                return 1

        def _set_lw(self, lw):
            self._lw_data = lw

        _linewidth = property(_get_lw, _set_lw)

    def ease(y0, y1):
        """Return ease in out function from point (0, y0) to (1, y1)"""
        return y0 + (y1-y0) * x**2 / (x**2 + (1-x)**2)

    fig, ax = plt.subplots()
    x = np.linspace(0.002, 0.998)
    s = sum(Member.days) - 1
    net = [[m.answers[c.name]/s if c.name in m.answers else 0 for c in members] for m in members]
    spc = 0.05  # spacing between groups
    posr = 1 + len(members)*spc
    for i in range(len(members)):
        for j, m in enumerate(members):
            posl = 1 + (len(members)-1-j)*spc - sum([sum(net[k]) for k in range(j)])
            posl -= sum(net[j][:i]) + net[j][i]
            posr -= net[j][i] + (spc if j == 0 else 0)
            # draw limitations
            p = plt.bar(0, net[j][i], 0.002, posl, color='black', align='edge').patches[0]
            plt.bar(1, net[j][i], -0.002, posr, color='black', align='edge')
            # annotate segments with user names
            if i == 0:
                tpos = 1 + len(members)*spc - spc
                tpos -= sum([sum(net[k]) for k in range(j)]) + sum(net[j])/2 + j*spc
                ax.text(-0.043, tpos, m.name, ha='right', va='center')
            # draw alluvial lines
            ax.add_line(LineDataUnits(
                x,
                ease(posl+net[j][i]/2, posr+net[j][i]/2),
                lw=net[j][i],
                alpha=0.6,
                color=COLORS[j+4]
            ))

    # set style attributes
    plt.ylim(0, 1 + len(members)*spc - spc)
    plt.title('Response Network')
    ax.set_axis_off()


if __name__ == '__main__':
    members = Text.process(PATH)
    # set custom plot style
    plt.style.use(os.path.join(sys.path[0], 'style.mplstyle'))
    # show plots
    for plot in [trend, activity, shares, times, network]:
        plot(members)
        # the window title is set through the canvas manager in current matplotlib versions
        plt.gcf().canvas.manager.set_window_title('Whatsapp Analyzer')
        plt.show(block=False)
    plt.show()

--------------------------------------------------------------------------------