├── README.md
├── gc_log_visualizer.py
├── images
    ├── humongous-objects.png
    ├── ihop-and-reclaimable-unhealthy.png
    ├── ihop-and-reclaimable.png
    └── to-space-exhaustion.png
├── regionsize_vs_objectsize.sh
├── requirements.txt
└── setup.py

/README.md:
--------------------------------------------------------------------------------
1 | # Deprecated
2 |
3 | This visualizer worked great for very old java versions, but has not been kept up-to-date for newer versions. As HubSpot is mostly running newer (jdk11/17) versions,
4 | we will not be maintaining this anymore.
5 |
6 | We've tried out and had good luck with https://github.com/krzysztofslusarski/jvm-gc-logs-analyzer, which supports newer log formats and also has many different helpful views.
7 |
8 | Old Readme contents
9 |
10 | # Run a gc.log through gnuplot for multiple views of GC performance
11 |
12 | The python script `gc_log_visualizer.py` will use gnuplot to graph interesting characteristics
13 | and data from the given gc log.
14 |
15 | * pre/post gc amounts for total heap. Bar for `InitiatingHeapOccupancyPercent` if found.
16 | * mixed gc duration, from the start of the first event until not continued in a new minor event (g1gc)
17 | * count of sequential runs of mixed gc (g1gc)
18 | * stop-the-world pause times from GC events, other stw events ignored
19 | * Percentage of total time spent in GC stop-the-world
20 | * Count of GC stop-the-world pause times grouped by time taken
21 | * Multi-phase concurrent mark cycle duration (g1gc)
22 | * Line graph of pre-gc sizes, young, old and total. to-space exhaustion events added for g1gc. Bar for `InitiatingHeapOccupancyPercent` if found. Reclaimable (mb) amount per mixed gc event.
23 | * Eden size pre/post. For g1gc shows how the algorithm floats the target Eden size around.
24 | * Delta of Tenured data for each GC event for g1gc only.
25 |   The idea of this graph is to get a rough idea on the Tenured fill rate.
26 |   Not entirely sure of what's going on here, after a young gc event Tenured can drop significantly.
27 |
28 | The shell script `regionsize_vs_objectsize.sh` will take a gc.log
29 | as input and return the percent of Humongous Objects that would fit
30 | into various G1RegionSize's (2mb-32mb by powers of 2).
31 |
32 | ```
33 | ./regionsize_vs_objectsize.sh
34 | 1986 humongous objects referenced in
35 | 32% would not be humongous with a 2mb g1 region size
36 | 77% would not be humongous with a 4mb g1 region size
37 | 100% would not be humongous with a 8mb g1 region size
38 | 100% would not be humongous with a 16mb g1 region size
39 | 100% would not be humongous with a 32mb g1 region size
40 | ```
41 |
42 | ## How to run
43 | The start and end dates are optional and can be any format gnuplot understands.
44 | The second argument will be used as the base name for the created png files.
45 |
46 | ```
47 | python gc_log_visualizer.py
48 | python gc_log_visualizer.py gc.log
49 | python gc_log_visualizer.py gc.log.0.current user-app
50 | python gc_log_visualizer.py gc.log 3minwindow 2015-08-12:19:36:00 2015-08-12:19:39:00
51 | ```
52 |
53 | ## gc log preparation
54 | The script has been run on ParallelGC and G1GC logs. There may
55 | be some oddities/issues with ParallelGC as profiling it hasn't
56 | proven overly useful.
57 |
58 | The following gc params are required for full functionality.
59 |
60 | ```
61 | -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy
62 | ```
63 |
64 | ## required python libs
65 | The python libs that are required can be found in the setup.py
66 | and handled in the usual manner.
67 |
68 | ```
69 | # enter a virtualenv or not
70 | pip install -r requirements.txt
71 | ```
72 |
73 | ## gnuplot
74 | The gc.log is parsed into flat files which are then run through
75 | gnuplot.
76 |
77 | ```
78 | # osx
79 | brew install gnuplot
80 | brew unlink libjpeg
81 | brew install libjpeg
82 | brew link libjpeg
83 | ```
84 |
85 | ## Examples
86 |
87 | Line charts of generation sizes and total, bar for InitiatingHeapOccupancyPercent (in Mb),
88 | reclaimable amount per mixed gc event.
89 |
90 | ![example of main chart with InitiatingHeapOccupancyPercent and reclaimable](images/ihop-and-reclaimable.png)
91 |
92 | Another example of the same chart but with the `InitiatingHeapOccupancyPercent`
93 | set below working set, which results in lots of mixed gc events shown as the reclaimable squares.
94 |
95 | ![example of unhealthy main chart with InitiatingHeapOccupancyPercent and reclaimable](images/ihop-and-reclaimable-unhealthy.png)
96 |
97 | To-space exhaustion from traffic bursts on cache
98 | expiration events. Solution: use stampeding herd protection.
99 |
100 | ![example of to-space exhaustion](images/to-space-exhaustion.png)
101 |
102 | This visualization of humongous objects shows the sizes in KB,
103 | as well as the vertical groupings that have the potential to
104 | cause to-space exhaustion.
105 |
106 | ![example of humongous objects](images/humongous-objects.png)
107 |
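A rough sketch of the idea behind the humongous-object numbers, written in Python rather than the shell used by `regionsize_vs_objectsize.sh`. It assumes that G1 treats an allocation as humongous once it is at least half a region in size, and that each relevant log line carries an `allocation request: N bytes` field followed by `source: concurrent humongous allocation]`, as the `humongousObjectPattern` regex in `gc_log_visualizer.py` suggests. The function names are made up for illustration, and this is not claimed to be the shipped script's logic.

```python
import re

# Loosely mirrors humongousObjectPattern in gc_log_visualizer.py (assumed log shape).
HUMONGOUS = re.compile(
    r".*allocation request: ([0-9]+) bytes.*source: concurrent humongous allocation\]"
)

MB = 1024 * 1024


def humongous_sizes(lines):
    """Return the allocation size in bytes for each humongous allocation line."""
    sizes = []
    for line in lines:
        m = HUMONGOUS.match(line)
        if m:
            sizes.append(int(m.group(1)))
    return sizes


def percent_not_humongous(sizes, region_mb):
    """Percent of allocations smaller than half a region of region_mb megabytes.

    Assumption: an allocation is humongous when it is at least half a region.
    """
    if not sizes:
        return 0
    limit = region_mb * MB // 2
    fits = sum(1 for s in sizes if s < limit)
    return 100 * fits // len(sizes)


def report(lines):
    """Build report lines for region sizes 2mb-32mb, by powers of 2."""
    sizes = humongous_sizes(lines)
    out = ["%d humongous objects referenced" % len(sizes)]
    for region_mb in (2, 4, 8, 16, 32):
        out.append(
            "%d%% would not be humongous with a %dmb g1 region size"
            % (percent_not_humongous(sizes, region_mb), region_mb)
        )
    return out
```

Something like `print("\n".join(report(open("gc.log"))))` would then print lines shaped like the sample output of `regionsize_vs_objectsize.sh` shown earlier. The exact percentages depend on the half-region assumption stated above.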
108 |
--------------------------------------------------------------------------------
/gc_log_visualizer.py:
--------------------------------------------------------------------------------
1 | #!python
2 |
3 | import sys
4 | import re
5 | import tempfile
6 | import os
7 | import dateutil.parser
8 |
9 | class StwSubTimings:
10 |     def __init__(self):
11 |         self.reset()
12 |
13 |     def reset(self):
14 |         self.ext_root_scan = 0
15 |         self.update_rs = 0
16 |         self.scan_rs = 0
17 |         self.object_copy = 0
18 |         self.termination = 0
19 |         self.other = 0
20 |
21 |     def unknown_time(self, total):
22 |         if total:
23 |             return int((total * 1000) - self.ext_root_scan - self.update_rs - self.scan_rs - self.object_copy - self.termination - self.other)
24 |         else:
25 |             return 0
26 |
27 | class LogParser:
28 |     heapG1GCPattern = '\s*\[Eden: ([0-9.]+)([BKMG])\(([0-9.]+)([BKMG])\)->[0-9.BKMG()]+ Survivors: ([0-9.]+)([BKMG])->([0-9.]+)([BKMG]) Heap: ([0-9.]+)([BKMG])\([0-9.BKMG]+\)->([0-9.]+)([BKMG])\([0-9.BKMG]+\)'
29 |     parallelPattern = '\s*\[PSYoungGen: ([0-9.]+)([BKMG])->([0-9.]+)([BKMG])\([0-9.MKBG]+\)\] ([0-9.]+)([MKBG])->([0-9.]+)([MKBG])\([0-9.MKBG]+\),'
30 |     parallelFullPattern = '\s*\[PSYoungGen: ([0-9.]+)([BKMG])->([0-9.]+)([BKMG])\([0-9.MKBG]+\)\] \[ParOldGen: [0-9.BKMG]+->[0-9.BKMG]+\([0-9.MKBG]+\)\] ([0-9.]+)([MKBG])->([0-9.]+)([MKBG])\([0-9.MKBG]+\),'
31 |     heapCMSPattern = '.*\[ParNew: ([0-9.]+)([BKMG])->([0-9.]+)([BKMG])\([0-9.BKMG]+\), [.0-9]+ secs\] ([0-9.]+)([BKMG])->([0-9.]+)([BKMG])\([0-9.BKMG]+\).*'
32 |     rootScanStartPattern = '[0-9T\-\:\.\+]* ([0-9.]*): \[GC concurrent-root-region-scan-start\]'
33 |     rootScanMarkEndPattern = '[0-9T\-\:\.\+]* ([0-9.]*): \[GC concurrent-mark-end, .*'
34 |     rootScanEndPattern = '[0-9T\-\:\.\+]* ([0-9.]*): \[GC concurrent-cleanup-end, .*'
35 |     mixedStartPattern = '\s*([0-9.]*): \[G1Ergonomics \(Mixed GCs\) start mixed GCs, .*'
36 |     mixedContinuePattern = '\s*([0-9.]*): \[G1Ergonomics \(Mixed GCs\) continue mixed GCs, .*'
37 |     mixedEndPattern = 
'\s*([0-9.]*): \[G1Ergonomics \(Mixed GCs\) do not continue mixed GCs, .*'
38 |     exhaustionPattern = '.*\(to-space exhausted\).*'
39 |     humongousObjectPattern = '.*request concurrent cycle initiation, .*, allocation request: ([0-9]*) .*, source: concurrent humongous allocation]'
40 |     occupancyThresholdPattern = '.*threshold: ([0-9]*) bytes .*, source: end of GC\]'
41 |     reclaimablePattern = '.*reclaimable: ([0-9]*) bytes \(([0-9.]*) %\), threshold: ([0-9]*).00 %]'
42 |
43 |     def __init__(self, input_file):
44 |         self.timestamp = None
45 |         self.input_file = input_file
46 |         self.pause_file = open('pause.dat', "w+")  # text mode: the parser writes str, not bytes
47 |         self.young_pause_file = open('young-pause.dat', "w+")
48 |         self.mixed_pause_file = open('mixed-pause.dat', "w+")
49 |         self.pause_count_file = open('pause_count.dat', "w+")
50 |         self.full_gc_file = open('full_gc.dat', "w+")
51 |         self.gc_file = open('gc.dat', "w+")
52 |         self.young_file = open('young.dat', "w+")
53 |         self.root_scan_file = open('rootscan.dat', "w+")
54 |         self.cms_mark_file = open('cms_mark.dat', "w+")
55 |         self.cms_rescan_file = open('cms_rescan.dat', "w+")
56 |         self.mixed_duration_file = open('mixed_duration.dat', "w+")
57 |         self.exhaustion_file = open('exhaustion.dat', "w+")
58 |         self.humongous_objects_file = open('humongous_objects.dat', "w+")
59 |         self.reclaimable_file = open('reclaimable.dat', "w+")
60 |         self.gc_alg_g1gc = False
61 |         self.gc_alg_cms = False
62 |         self.gc_alg_parallel = False
63 |         self.pre_gc_total = 0
64 |         self.post_gc_total = 0
65 |         self.pre_gc_young = 0
66 |         self.pre_gc_young_target = 0
67 |         self.post_gc_young = 0
68 |         self.pre_gc_survivor = 0
69 |         self.post_gc_survivor = 0
70 |         self.tenured_delta = 0
71 |         self.full_gc = False
72 |         self.gc = False
73 |         self.root_scan_start_time = 0
74 |         self.root_scan_end_timestamp = 0
75 |         self.root_scan_mark_end_time = 0
76 |         self.mixed_duration_start_time = 0
77 |         self.mixed_duration_count = 0
78 |         self.total_pause_time = 0
79 |         self.size = '1024,768'
80 | 
self.last_minute = -1 81 | self.reset_pause_counts() 82 | self.occupancy_threshold = None 83 | self.stw = StwSubTimings() 84 | 85 | def cleanup(self): 86 | os.unlink(self.pause_file.name) 87 | os.unlink(self.young_pause_file.name) 88 | os.unlink(self.mixed_pause_file.name) 89 | os.unlink(self.pause_count_file.name) 90 | os.unlink(self.full_gc_file.name) 91 | os.unlink(self.gc_file.name) 92 | os.unlink(self.young_file.name) 93 | os.unlink(self.root_scan_file.name) 94 | os.unlink(self.cms_mark_file.name) 95 | os.unlink(self.cms_rescan_file.name) 96 | os.unlink(self.mixed_duration_file.name) 97 | os.unlink(self.exhaustion_file.name) 98 | os.unlink(self.humongous_objects_file.name) 99 | os.unlink(self.reclaimable_file.name) 100 | return 101 | 102 | def close_files(self): 103 | self.pause_file.close() 104 | self.young_pause_file.close() 105 | self.mixed_pause_file.close() 106 | self.pause_count_file.close() 107 | self.gc_file.close() 108 | self.full_gc_file.close() 109 | self.young_file.close() 110 | self.root_scan_file.close() 111 | self.cms_mark_file.close() 112 | self.cms_rescan_file.close() 113 | self.mixed_duration_file.close() 114 | self.exhaustion_file.close() 115 | self.humongous_objects_file.close() 116 | self.reclaimable_file.close() 117 | 118 | def gnuplot(self, name, start, end): 119 | if start is None: 120 | xrange = "" 121 | else: 122 | xrange = "set xrange [ \"%s\":\"%s\" ]; " % (start, end) 123 | 124 | # Add a line for the occupancy threshold if found 125 | occupancy_threshold_arrow = "" 126 | if self.occupancy_threshold: 127 | occupancy_threshold_arrow = "set arrow 10 from graph 0,first %d to graph 1, first %d nohead; " % (self.occupancy_threshold, self.occupancy_threshold) 128 | occupancy_threshold_arrow += "set label \"%s\" at graph 0,first %d offset 1,1; " % ('IOF' if self.gc_alg_cms else 'IHOP', self.occupancy_threshold) 129 | 130 | # example of how to cap the y-range of the graph at .2 131 | #gnuplot_cmd = "gnuplot -e 'set term png size %s; set 
yrange [0:0.2]; set output \"%s-stw-200ms-cap.png\"; set xdata time; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2'" % (self.size, name, xrange, self.pause_file.name) 132 | #os.system(gnuplot_cmd) 133 | 134 | if self.gc_alg_parallel: 135 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-stw.png\"; set xdata time; set ylabel \"Secs\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2 title \"all stw\"'" % (self.size, name, xrange, self.pause_file.name) 136 | os.system(gnuplot_cmd) 137 | 138 | # Separate young and mixed stw events 139 | if self.gc_alg_g1gc: 140 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-stw-young.png\"; set xdata time; set ylabel \"Secs\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2 title \"young\"'" % (self.size, name, xrange, self.young_pause_file.name) 141 | os.system(gnuplot_cmd) 142 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-stw-mixed.png\"; set xdata time; set ylabel \"Secs\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2 title \"mixed\"'" % (self.size, name, xrange, self.mixed_pause_file.name) 143 | os.system(gnuplot_cmd) 144 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-stw-all.png\"; set xdata time; " \ 145 | "set ylabel \"Secs\"; " \ 146 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \ 147 | "%s " \ 148 | "plot \"%s\" using 1:2 title \"young\"" \ 149 | ", \"%s\" using 1:2 title \"mixed\"'" % (self.size, name, xrange, self.young_pause_file.name, self.mixed_pause_file.name) 150 | os.system(gnuplot_cmd) 151 | 152 | # Separate young and mixed stw events 153 | if self.gc_alg_cms: 154 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-stw-young.png\"; set xdata time; set ylabel \"Secs\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2 title \"young\"'" % (self.size, name, xrange, self.pause_file.name) 155 | os.system(gnuplot_cmd) 156 | gnuplot_cmd = 
"gnuplot -e 'set term png size %s; set output \"%s-stw-all.png\"; set xdata time; " \ 157 | "set ylabel \"Secs\"; " \ 158 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \ 159 | "%s " \ 160 | "plot \"%s\" using 1:2 title \"young\"" \ 161 | ", \"%s\" using 1:2 title \"mark\"" \ 162 | ", \"%s\" using 1:2 title \"rescan\"'" % (self.size, name, xrange, self.pause_file.name, self.cms_mark_file.name, self.cms_rescan_file.name) 163 | os.system(gnuplot_cmd) 164 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-stw-old.png\"; set xdata time; " \ 165 | "set ylabel \"Secs\"; " \ 166 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \ 167 | "%s " \ 168 | "plot \"%s\" using 1:2 title \"mark\"" \ 169 | ", \"%s\" using 1:2 title \"rescan\"'" % (self.size, name, xrange, self.cms_mark_file.name, self.cms_rescan_file.name) 170 | os.system(gnuplot_cmd) 171 | 172 | # Stw sub-timings 173 | if self.gc_alg_g1gc: 174 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-substw-ext-root-scan.png\"; set xdata time; set ylabel \"millis\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:3 title \"ext-root-scan\"'" % (self.size, name, xrange, self.pause_file.name) 175 | os.system(gnuplot_cmd) 176 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-substw-update-rs.png\"; set xdata time; set ylabel \"millis\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:4 title \"update-rs\"'" % (self.size, name, xrange, self.pause_file.name) 177 | os.system(gnuplot_cmd) 178 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-substw-scan-rs.png\"; set xdata time; set ylabel \"millis\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:5 title \"scan-rs\"'" % (self.size, name, xrange, self.pause_file.name) 179 | os.system(gnuplot_cmd) 180 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-substw-object-copy.png\"; set xdata time; set ylabel \"millis\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; 
%s plot \"%s\" using 1:6 title \"object-copy\"'" % (self.size, name, xrange, self.pause_file.name) 181 | os.system(gnuplot_cmd) 182 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-substw-termination.png\"; set xdata time; set ylabel \"millis\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:7 title \"termination\"'" % (self.size, name, xrange, self.pause_file.name) 183 | os.system(gnuplot_cmd) 184 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-substw-other.png\"; set xdata time; set ylabel \"millis\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:8 title \"other\"'" % (self.size, name, xrange, self.pause_file.name) 185 | os.system(gnuplot_cmd) 186 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-substw-unknown.png\"; set xdata time; set ylabel \"millis\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:9 title \"unknown\"'" % (self.size, name, xrange, self.pause_file.name) 187 | os.system(gnuplot_cmd) 188 | 189 | # total pause time 190 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-total-pause.png\"; set xdata time; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:8 title \"%% of time in gc\"'" % (self.size, name, xrange, self.pause_count_file.name) 191 | os.system(gnuplot_cmd) 192 | 193 | # Note: This seems to have marginal utility as compared to the plot of wall time vs. 
pause time 194 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-pause-count.png\"; set xdata time; " \ 195 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \ 196 | "%s " \ 197 | "plot \"%s\" using 1:2 title \"under-50\" with lines" \ 198 | ", \"%s\" using 1:3 title \"50-90\" with lines" \ 199 | ", \"%s\" using 1:4 title \"90-120\" with lines" \ 200 | ", \"%s\" using 1:5 title \"120-150\" with lines" \ 201 | ", \"%s\" using 1:6 title \"150-200\" with lines" \ 202 | ", \"%s\" using 1:7 title \"200+\" with lines'" % (self.size, name, xrange, self.pause_count_file.name, self.pause_count_file.name, self.pause_count_file.name, self.pause_count_file.name, self.pause_count_file.name, self.pause_count_file.name) 203 | os.system(gnuplot_cmd) 204 | 205 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-heap.png\"; set xdata time; " \ 206 | "set ylabel \"MB\"; " \ 207 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \ 208 | "%s " \ 209 | "%s " \ 210 | "plot \"%s\" using 1:2 title \"pre-gc-amount\"" \ 211 | ", \"%s\" using 1:3 title \"post-gc-amount\"'" % (self.size, name, occupancy_threshold_arrow, xrange, self.gc_file.name, self.gc_file.name) 212 | os.system(gnuplot_cmd) 213 | 214 | # Add to-space exhaustion events if any are found 215 | if self.gc_alg_g1gc and os.stat(self.exhaustion_file.name).st_size > 0: 216 | to_space_exhaustion = ", \"%s\" using 1:2 title \"to-space-exhaustion\" pt 7 ps 3" % (self.exhaustion_file.name) 217 | else: 218 | to_space_exhaustion = "" 219 | 220 | # line graph of Eden, Tenured and the Total 221 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-totals.png\"; set xdata time; " \ 222 | "set ylabel \"MB\"; " \ 223 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \ 224 | "%s " \ 225 | "%s " \ 226 | "plot \"%s\" using 1:2 title \"Eden\" with lines" \ 227 | ", \"%s\" using 1:4 title \"Tenured\" with lines" \ 228 | "%s" \ 229 | ", \"%s\" using 1:5 title \"Total\" with lines" \ 230 | ", \"%s\" using 1:2 title 
\"Reclaimable\"'" % (self.size, name, xrange, occupancy_threshold_arrow, self.young_file.name, self.young_file.name, to_space_exhaustion, self.young_file.name, self.reclaimable_file.name) 231 | os.system(gnuplot_cmd) 232 | 233 | 234 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-young.png\"; set xdata time; " \ 235 | "set ylabel \"MB\"; " \ 236 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \ 237 | "%s " \ 238 | "plot \"%s\" using 1:2 title \"current\"" \ 239 | ", \"%s\" using 1:3 title \"max\"'" % (self.size, name, xrange, self.young_file.name, self.young_file.name) 240 | os.system(gnuplot_cmd) 241 | 242 | if self.gc_alg_g1gc: 243 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-tenured-delta.png\"; set xdata time; " \ 244 | "set ylabel \"MB\"; " \ 245 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \ 246 | "%s " \ 247 | "plot \"%s\" using 1:6 with lines title \"tenured-delta\"'" % (self.size, name, xrange, self.young_file.name) 248 | os.system(gnuplot_cmd) 249 | 250 | if self.gc_alg_g1gc: 251 | # root-scan times 252 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-root-scan.png\"; set xdata time; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2 title \"root-scan-duration(ms)\"'" % (self.size, name, xrange, self.root_scan_file.name) 253 | os.system(gnuplot_cmd) 254 | 255 | # time from first mixed-gc to last 256 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-mixed-duration.png\"; set xdata time; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2 title \"mixed-gc-duration(ms)\"'" % (self.size, name, xrange, self.mixed_duration_file.name) 257 | os.system(gnuplot_cmd) 258 | 259 | # count of mixed-gc runs before stopping mixed gcs, max is 8 by default 260 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-mixed-duration-count.png\"; set xdata time; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:3 title \"mixed-gc-count\"'" % 
(self.size, name, xrange, self.mixed_duration_file.name) 261 | os.system(gnuplot_cmd) 262 | 263 | # to-space exhaustion events 264 | if os.stat(self.exhaustion_file.name).st_size > 0: 265 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-exhaustion.png\"; set xdata time; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2'" % (self.size, name, xrange, self.exhaustion_file.name) 266 | os.system(gnuplot_cmd) 267 | 268 | # humongous object sizes 269 | if os.stat(self.humongous_objects_file.name).st_size > 0: 270 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-humongous.png\"; set xdata time; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2 title \"humongous-object-size(KB)\"'" % (self.size, name, xrange, self.humongous_objects_file.name) 271 | os.system(gnuplot_cmd) 272 | 273 | return 274 | 275 | def determine_gc_alg(self): 276 | with open(self.input_file) as f: 277 | for line in f: 278 | m = re.match('^CommandLine flags: .*', line, flags=0) 279 | if m: 280 | if re.match(".*-XX:\+UseG1GC.*", line, flags=0): 281 | self.gc_alg_g1gc = True 282 | pct = self.get_long_field(line, '-XX:InitiatingHeapOccupancyPercent', 45) 283 | max = self.get_long_field(line, '-XX:MaxHeapSize') 284 | if pct and max: 285 | self.occupancy_threshold = int(max * (pct / 100.0) / 1048576.0) 286 | return 287 | 288 | elif re.match(".*-XX:\+UseConcMarkSweepGC.*", line, flags=0): 289 | self.gc_alg_cms = True 290 | pct = self.get_long_field(line, '-XX:CMSInitiatingOccupancyFraction') 291 | max = self.get_long_field(line, '-XX:MaxHeapSize') 292 | if pct and max: 293 | self.occupancy_threshold = int(max * (pct / 100.0) / 1048576.0) 294 | return 295 | elif re.match(".*-XX:\+UseParallelGC.*", line, flags=0): 296 | self.gc_alg_parallel = True 297 | return 298 | 299 | m = re.match(LogParser.heapG1GCPattern, line, flags=0) 300 | if m: 301 | self.gc_alg_g1gc = True 302 | return 303 | 304 | m = re.match(LogParser.heapCMSPattern, line, flags=0) 
305 |                 if m:
306 |                     self.gc_alg_cms = True
307 |                     return
308 |
309 |                 m = re.match(LogParser.parallelPattern, line, flags=0)
310 |                 if m:
311 |                     self.gc_alg_parallel = True
312 |                     return
313 |
314 |     def get_long_field(self, line, field, def_value=0):
315 |         m = re.match(".*%s=([0-9]+).*" % field, line, flags=0)
316 |         if m:
317 |             return int(m.group(1))
318 |         else:
319 |             return int(def_value)
320 |
321 |     def parse_log(self):
322 |         with open(self.input_file) as f:
323 |             for line in f:
324 |                 # This needs to be first
325 |                 self.line_has_timestamp(line)
326 |
327 |                 self.line_has_gc(line)
328 |
329 |                 if self.gc_alg_g1gc:
330 |                     self.collect_root_scan_times(line)
331 |                     self.collect_mixed_duration_times(line)
332 |                     self.collect_to_space_exhaustion(line)
333 |                     self.collect_humongous_objects(line)
334 |                     self.collect_reclaimable(line)
335 |                     self.collect_stw_sub_timings(line)
336 |
337 |                     # find the occupancy threshold if CommandLine log line not present
338 |                     if not self.occupancy_threshold:
339 |                         self.collect_occupancy_threshold_pattern(line)
340 |
341 |                 if self.gc_alg_cms:
342 |                     self.write_cms_data(line)
343 |
344 |                 # This needs to be last
345 |                 if self.line_has_pause_time(line):
346 |                     self.output_data()
347 |                     self.stw.reset()
348 |
349 |     def output_data(self):
350 |         if self.mixed_duration_count == 0:
351 |             self.young_pause_file.write("%s %.6f\n" % (self.timestamp_string(), self.pause_time))
352 |         else:
353 |             self.mixed_pause_file.write("%s %.6f\n" % (self.timestamp_string(), self.pause_time))
354 |
355 |         self.pause_file.write("%s %.6f %d %d %d %d %d %d %d\n" % (self.timestamp_string(), self.pause_time, self.stw.ext_root_scan, self.stw.update_rs, self.stw.scan_rs, self.stw.object_copy, self.stw.termination, self.stw.other, self.stw.unknown_time(self.pause_time)))
356 |         self.young_file.write("%s %s %s %s %s %s\n" % (self.timestamp_string(), self.pre_gc_young, self.pre_gc_young_target, self.pre_gc_total - self.pre_gc_young, self.pre_gc_total, self.tenured_delta))
357 |
358 |         # clean 
this up, full_gc's should probably graph
359 |         # in the same chart as regular gc events if possible
360 |         if self.full_gc:
361 |             self.full_gc_file.write("%s %s %s\n" % (self.timestamp_string(), self.pre_gc_total, self.post_gc_total))
362 |             self.full_gc = False
363 |         elif self.gc:
364 |             self.gc_file.write("%s %s %s\n" % (self.timestamp_string(), self.pre_gc_total, self.post_gc_total))
365 |             self.gc = False
366 |
367 |     def output_pause_counts(self):
368 |         self.pause_count_file.write("%s %s %s %s %s %s %s %s\n" % (self.timestamp_string(), self.under_50, self.under_90, self.under_120, self.under_150, self.under_200, self.over_200, self.total_pause_time * 100 / 60))
369 |
370 |     def line_has_pause_time(self, line):
371 |         m = re.match("[0-9-]*T[0-9]+:([0-9]+):.* threads were stopped: ([0-9.]+) seconds", line, flags=0)
372 |         if not m or not (self.gc or self.full_gc):
373 |             return False
374 |
375 |         cur_minute = int(m.group(1))
376 |         self.pause_time = float(m.group(2))
377 |         self.increment_pause_counts(self.pause_time)
378 |
379 |         if cur_minute != self.last_minute:
380 |             self.last_minute = cur_minute
381 |             self.output_pause_counts()
382 |             self.reset_pause_counts()
383 |
384 |         return True
385 |
386 |     def line_has_timestamp(self, line):
387 |         t = line.split()
388 |         if t and len(t) > 0:
389 |             t = t[0]
390 |             if t:
391 |                 t = t[:-1]
392 |
393 |         if t and len(t) > 15: # 15 is mildly arbitrary
394 |             try:
395 |                 self.timestamp = dateutil.parser.parse(t)
396 |             except (ValueError, AttributeError):
397 |                 return
398 |         return
399 |
400 |     def timestamp_string(self):
401 |         return self.any_timestamp_string(self.timestamp)
402 |
403 |     def any_timestamp_string(self, ts):
404 |         return ts.strftime("%Y-%m-%d:%H:%M:%S")
405 |
406 |     def collect_root_scan_times(self, line):
407 |         m = re.match(LogParser.rootScanStartPattern, line, flags=0)
408 |         if m:
409 |             if self.root_scan_mark_end_time > 0:
410 |                 elapsed_time = self.root_scan_mark_end_time - self.root_scan_start_time
411 | 
self.root_scan_file.write("%s %s\n" % (self.any_timestamp_string(self.root_scan_end_timestamp), elapsed_time)) 412 | self.root_scan_mark_end_time = 0 413 | 414 | self.root_scan_start_time = int(float(m.group(1)) * 1000) 415 | return 416 | 417 | 418 | m = re.match(LogParser.rootScanMarkEndPattern, line, flags=0) 419 | if m and self.root_scan_start_time > 0: 420 | self.root_scan_mark_end_time = int(float(m.group(1)) * 1000) 421 | self.root_scan_end_timestamp = self.timestamp 422 | return 423 | 424 | m = re.match(LogParser.rootScanEndPattern, line, flags=0) 425 | if m and self.root_scan_start_time > 0: 426 | self.root_scan_end_timestamp = self.timestamp 427 | elapsed_time = int(float(m.group(1)) * 1000) - self.root_scan_start_time 428 | self.root_scan_file.write("%s %s\n" % (self.any_timestamp_string(self.root_scan_end_timestamp), elapsed_time)) 429 | self.root_scan_start_time = 0 430 | self.root_scan_mark_end_time = 0 431 | 432 | def collect_mixed_duration_times(self, line): 433 | m = re.match(LogParser.mixedStartPattern, line, flags=0) 434 | if m: 435 | self.mixed_duration_start_time = int(float(m.group(1)) * 1000) 436 | self.mixed_duration_count += 1 437 | return 438 | 439 | m = re.match(LogParser.mixedContinuePattern, line, flags=0) 440 | if m: 441 | self.mixed_duration_count += 1 442 | return 443 | 444 | m = re.match(LogParser.mixedEndPattern, line, flags=0) 445 | if m and self.mixed_duration_start_time > 0: 446 | elapsed_time = int(float(m.group(1)) * 1000) - self.mixed_duration_start_time 447 | self.mixed_duration_count += 1 448 | self.mixed_duration_file.write("%s %s %s\n" % (self.timestamp_string(), elapsed_time, self.mixed_duration_count)) 449 | self.mixed_duration_start_time = 0 450 | self.mixed_duration_count = 0 451 | 452 | def collect_to_space_exhaustion(self, line): 453 | m = re.match(LogParser.exhaustionPattern, line, flags=0) 454 | if m and self.timestamp: 455 | self.exhaustion_file.write("%s %s\n" % (self.timestamp_string(), 100)) 456 | 457 | def 
collect_humongous_objects(self, line):
458 |         m = re.match(LogParser.humongousObjectPattern, line, flags=0)
459 |         if m and self.timestamp:
460 |             self.humongous_objects_file.write("%s %s\n" % (self.timestamp_string(), int(m.group(1)) // 1024))
461 |
462 |     def collect_occupancy_threshold_pattern(self, line):
463 |         m = re.match(LogParser.occupancyThresholdPattern, line, flags=0)
464 |         if m:
465 |             self.occupancy_threshold = int(int(m.group(1)) / 1048576)
466 |
467 |     def collect_reclaimable(self, line):
468 |         m = re.match(LogParser.reclaimablePattern, line, flags=0)
469 |         if m and int(float(m.group(2))) >= int(m.group(3)) and self.timestamp:
470 |             self.reclaimable_file.write("%s %d\n" % (self.timestamp_string(), int(m.group(1)) // 1048576))
471 |
472 |     def collect_stw_sub_timings(self, line):
473 |         if re.match('^[ ]+\[.*', line):
474 |             self.stw.ext_root_scan = self.parseMaxTiming('Ext Root Scanning', line, self.stw.ext_root_scan)
475 |             self.stw.update_rs = self.parseMaxTiming('Update RS', line, self.stw.update_rs)
476 |             self.stw.scan_rs = self.parseMaxTiming('Scan RS', line, self.stw.scan_rs)
477 |             self.stw.object_copy = self.parseMaxTiming('Object Copy', line, self.stw.object_copy)
478 |             self.stw.termination = self.parseMaxTiming('Termination', line, self.stw.termination)
479 |             m = re.match('^[ ]+\[Other: ([0-9.]+).*', line)
480 |             if m:
481 |                 self.stw.other = int(float(m.group(1)))
482 |
483 |     def parseMaxTiming(self, term, line, current_value):
484 |         m = re.match("^[ ]+\[%s .* Max: ([0-9]+)\.[0-9],.*" % (term), line)
485 |         if m:
486 |             return int(float(m.group(1)))
487 |         else:
488 |             return current_value
489 |
490 |     def write_cms_data(self, line):
491 |         # collect stw times
492 |         # 1) initial marking step, checks from roots
493 |         # 2016-04-30T06:11:03.626+0000: 120634.808: [CMS-concurrent-mark: 0.922/0.922 secs] [Times: user=7.25 sys=0.59, real=0.93 secs]
494 |         m = re.match(".*\[CMS-concurrent-mark: .*, real=([.0-9]+) secs.*", line, flags=0)
495 |         if m:
496 | 
self.cms_mark_file.write("%s %.6f\n" % (self.timestamp_string(), float(m.group(1)))) 497 | 498 | # 2) rescan phase 499 | # 2016-04-30T06:11:09.341+0000: 120640.523: [GC (CMS Final Remark) [YG occupancy: 737574 K (996800 K)]2016-04-30T06:11:09.341+0000: 120640.523: [Rescan (parallel) , 0.0728015 secs]2016-04-30T06:11:09.414+0000: 120640.596: [weak refs processing, 0.0236183 secs]2016-04-30T06:11:09.437+0000: 120640.619: [class unloading, 0.0157037 secs]2016-04-30T06:11:09.453+0000: 120640.635: [scrub symbol table, 0.0069954 secs]2016-04-30T06:11:09.460+0000: 120640.642: [scrub string table, 0.0007916 secs][1 CMS-remark: 22933820K(30349760K)] 23671395K(31346560K), 0.1314855 secs] [Times: user=0.83 sys=0.17, real=0.13 secs] 500 | m = re.match(".*\[Rescan .*, real=([.0-9]+) secs.*", line, flags=0) 501 | if m: 502 | self.cms_rescan_file.write("%s %.6f\n" % (self.timestamp_string(), float(m.group(1)))) 503 | 504 | 505 | def line_has_gc(self, line): 506 | m = re.match(LogParser.heapG1GCPattern, line, flags=0) 507 | if m: 508 | self.store_gc_amount(m) 509 | self.gc = True 510 | return 511 | 512 | m = re.match(LogParser.parallelPattern, line, flags=0) 513 | if m: 514 | self.store_gc_amount(m) 515 | self.gc = True 516 | return 517 | 518 | m = re.match(LogParser.parallelFullPattern, line, flags=0) 519 | if m: 520 | self.store_gc_amount(m) 521 | self.full_gc = True 522 | 523 | m = re.match(LogParser.heapCMSPattern, line, flags=0) 524 | if m: 525 | self.store_gc_amount(m) 526 | self.gc = True 527 | 528 | return 529 | 530 | def store_gc_amount(self, matcher): 531 | i = 1 532 | self.pre_gc_young = self.scale(matcher.group(i), matcher.group(i+1)) 533 | 534 | if self.gc_alg_g1gc or self.gc_alg_parallel: 535 | i += 2 536 | self.pre_gc_young_target = self.scale(matcher.group(i), matcher.group(i+1)) 537 | 538 | if self.gc_alg_cms: 539 | i += 2 540 | self.post_gc_young = self.scale(matcher.group(i), matcher.group(i+1)) 541 | 542 | if self.gc_alg_g1gc: 543 | i += 2 544 | 
            self.pre_gc_survivor = self.scale(matcher.group(i), matcher.group(i+1))
            i += 2
            self.post_gc_survivor = self.scale(matcher.group(i), matcher.group(i+1))

        i += 2
        self.pre_gc_total = self.scale(matcher.group(i), matcher.group(i+1))
        i += 2
        self.post_gc_total = self.scale(matcher.group(i), matcher.group(i+1))

        if self.gc_alg_g1gc:
            self.tenured_delta = (self.post_gc_total - self.post_gc_survivor) - (self.pre_gc_total - self.pre_gc_young - self.pre_gc_survivor)

    def scale(self, amount, unit):
        # normalize an amount with unit B/K/M/G to megabytes
        rawValue = float(amount)
        if unit == 'B':
            return int(rawValue / (1024.0 * 1024.0))
        elif unit == 'K':
            return int(rawValue / 1024.0)
        elif unit == 'M':
            return int(rawValue)
        elif unit == 'G':
            return int(rawValue * 1024.0)
        return rawValue

    def increment_pause_counts(self, pause_time):
        # pause_time is in seconds; buckets are in milliseconds
        self.total_pause_time = self.total_pause_time + pause_time

        if pause_time < 0.050:
            self.under_50 = self.under_50 + 1
        elif pause_time < 0.090:
            self.under_90 = self.under_90 + 1
        elif pause_time < 0.120:
            self.under_120 = self.under_120 + 1
        elif pause_time < 0.150:
            self.under_150 = self.under_150 + 1
        elif pause_time < 0.200:
            self.under_200 = self.under_200 + 1
        else:
            self.over_200 = self.over_200 + 1

    def reset_pause_counts(self):
        self.under_50 = 0
        self.under_90 = 0
        self.under_120 = 0
        self.under_150 = 0
        self.under_200 = 0
        self.over_200 = 0
        self.total_pause_time = 0

def main():
    logParser = LogParser(sys.argv[1])
    try:
        logParser.determine_gc_alg()
        print("gc alg: parallel=%s, g1gc=%s, cms=%s" % (logParser.gc_alg_parallel, logParser.gc_alg_g1gc, logParser.gc_alg_cms))
        logParser.parse_log()
        logParser.close_files()
        basefilename = sys.argv[2] if len(sys.argv) > 2 else 'default'
        start = None
        end = None
        # start and end are only used when both are supplied;
        # the previous check (> 3) raised IndexError when only a start date was given
        if len(sys.argv) > 4:
            start = sys.argv[3]
            end = sys.argv[4]
        logParser.gnuplot(basefilename, start, end)
    finally:
        logParser.cleanup()


if __name__ == '__main__':
    main()


--------------------------------------------------------------------------------
/images/humongous-objects.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HubSpot/gc_log_visualizer/1bd0bf92b60e5db16c09985b77adb6519ea3aa68/images/humongous-objects.png

--------------------------------------------------------------------------------
/images/ihop-and-reclaimable-unhealthy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HubSpot/gc_log_visualizer/1bd0bf92b60e5db16c09985b77adb6519ea3aa68/images/ihop-and-reclaimable-unhealthy.png

--------------------------------------------------------------------------------
/images/ihop-and-reclaimable.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HubSpot/gc_log_visualizer/1bd0bf92b60e5db16c09985b77adb6519ea3aa68/images/ihop-and-reclaimable.png

--------------------------------------------------------------------------------
/images/to-space-exhaustion.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HubSpot/gc_log_visualizer/1bd0bf92b60e5db16c09985b77adb6519ea3aa68/images/to-space-exhaustion.png

--------------------------------------------------------------------------------
/regionsize_vs_objectsize.sh:
--------------------------------------------------------------------------------
#!/bin/bash

log=$1
if [ -z "${log}" ] ; then
  echo "Usage: ${0} "
  exit 1
fi

# An object is humongous when it is at least half a region, so each threshold is regionsize / 2.
total=`grep "source: concurrent humongous allocation" "${log}" | wc -l`
fit2mb=`grep "source: concurrent humongous allocation" "${log}" | sed 's/.*allocation request: \([0-9]*\) bytes.*/\1/' | awk '{if($1<1048576) print}' | wc -l`
fit4mb=`grep "source: concurrent humongous allocation" "${log}" | sed 's/.*allocation request: \([0-9]*\) bytes.*/\1/' | awk '{if($1<2097152) print}' | wc -l`
fit8mb=`grep "source: concurrent humongous allocation" "${log}" | sed 's/.*allocation request: \([0-9]*\) bytes.*/\1/' | awk '{if($1<4194304) print}' | wc -l`
fit16mb=`grep "source: concurrent humongous allocation" "${log}" | sed 's/.*allocation request: \([0-9]*\) bytes.*/\1/' | awk '{if($1<8388608) print}' | wc -l`
fit32mb=`grep "source: concurrent humongous allocation" "${log}" | sed 's/.*allocation request: \([0-9]*\) bytes.*/\1/' | awk '{if($1<16777216) print}' | wc -l`

echo "${total} humongous objects referenced in ${log}"
echo `echo ${fit2mb} ${total} | awk '{printf "%2.0f", 100 * $1 / $2}'`% would not be humongous with a 2mb g1 region size
echo `echo ${fit4mb} ${total} | awk '{printf "%2.0f", 100 * $1 / $2}'`% would not be humongous with a 4mb g1 region size
echo `echo ${fit8mb} ${total} | awk '{printf "%2.0f", 100 * $1 / $2}'`% would not be humongous with a 8mb g1 region size
echo `echo ${fit16mb} ${total} | awk '{printf "%2.0f", 100 * $1 / $2}'`% would not be humongous with a 16mb g1 region size
echo `echo ${fit32mb} ${total} | awk '{printf "%2.0f", 100 * $1 / $2}'`% would not be humongous with a 32mb g1 region size

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
-e .

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
from setuptools import setup, find_packages

setup(name='gc_log_visualizer',
      version='0.3',
      description='Generate multiple gnuplot graphs from java gc log data',
      author='Eric Abbott',
      author_email='eabbott@hubspot.com',
      url='https://github.com/HubSpot/gc_log_visualizer',
      packages=find_packages(),
      zip_safe=False,
      include_package_data=True,
      install_requires=[
          'python-dateutil'
      ],
      platforms=["any"]
)

--------------------------------------------------------------------------------