├── README.md
├── gc_log_visualizer.py
├── images
│   ├── humongous-objects.png
│   ├── ihop-and-reclaimable-unhealthy.png
│   ├── ihop-and-reclaimable.png
│   └── to-space-exhaustion.png
├── regionsize_vs_objectsize.sh
├── requirements.txt
└── setup.py
/README.md:
--------------------------------------------------------------------------------
1 | # Deprecated
2 |
3 | This visualizer worked great for very old Java versions, but has not been kept up to date for newer ones. As HubSpot is mostly running newer (JDK 11/17) versions,
4 | we will not be maintaining it anymore.
5 |
6 | We've tried out and had good luck with https://github.com/krzysztofslusarski/jvm-gc-logs-analyzer, which supports newer log formats and also has many different helpful views.
7 |
8 | Old Readme contents
9 |
10 | # Run a gc.log through gnuplot for multiple views of GC performance
11 |
12 | The python script `gc_log_visualizer.py` will use gnuplot to graph interesting characteristics
13 | and data from the given gc log.
14 |
15 | * pre/post gc amounts for total heap. Bar for `InitiatingHeapOccupancyPercent` if found.
16 | * mixed gc duration, from the start of the first event until not continued in a new minor event (g1gc)
17 | * count of sequential runs of mixed gc (g1gc)
18 | * stop-the-world pause times from GC events, other stw events ignored
19 | * Percentage of total time spent in GC stop-the-world
20 | * Count of GC stop-the-world pause times grouped by time taken
21 | * Multi-phase concurrent mark cycle duration (g1gc)
22 | * Line graph of pre-gc sizes: young, old and total. To-space exhaustion events added for g1gc. Bar for `InitiatingHeapOccupancyPercent` if found. Reclaimable (MB) amount per mixed gc event.
23 | * Eden size pre/post. For g1gc, shows how the algorithm floats the target Eden size around.
24 | * Delta of Tenured data for each GC event, for g1gc only.
25 |   The idea of this graph is to get a rough idea of the Tenured fill rate.
26 |   It is not entirely clear what is going on here: after a young gc event, Tenured can drop significantly.
27 |
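The two pause-time views in the list above (percentage of time spent in stop-the-world, and pause counts grouped by time taken) come down to per-minute bookkeeping. Below is a minimal Python 3 sketch of that idea. It uses the same bucket boundaries as `gc_log_visualizer.py` (50, 90, 120, 150 and 200 ms), but the names `bucket_for` and `summarize_minute` and the sample pauses are illustrative assumptions, not part of the script.

```python
from collections import Counter

# Upper bounds in seconds, paired with the labels used in the pause-count chart.
BUCKETS = [
    (0.050, "under-50"),
    (0.090, "50-90"),
    (0.120, "90-120"),
    (0.150, "120-150"),
    (0.200, "150-200"),
]


def bucket_for(pause_secs):
    """Return the label of the first bucket whose upper bound exceeds the pause."""
    for upper, label in BUCKETS:
        if pause_secs < upper:
            return label
    return "200+"


def summarize_minute(pauses_secs):
    """Return (counts per bucket, percent of the minute spent paused)."""
    counts = Counter(bucket_for(p) for p in pauses_secs)
    pct_in_gc = sum(pauses_secs) * 100.0 / 60.0
    return counts, pct_in_gc


# Hypothetical pauses (in seconds) observed within a single minute.
counts, pct = summarize_minute([0.030, 0.045, 0.110, 0.300])
print(dict(counts), round(pct, 2))
```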
28 | The shell script `regionsize_vs_objectsize.sh` will take a gc.log
29 | as input and return the percent of Humongous Objects that would no longer
30 | be humongous at various G1 region sizes (2mb-32mb by powers of 2).
31 |
32 | ```
33 | ./regionsize_vs_objectsize.sh
34 | 1986 humongous objects referenced in
35 | 32% would not be humongous with a 2mb g1 region size
36 | 77% would not be humongous with a 4mb g1 region size
37 | 100% would not be humongous with a 8mb g1 region size
38 | 100% would not be humongous with a 16mb g1 region size
39 | 100% would not be humongous with a 32mb g1 region size
40 | ```
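The arithmetic behind this kind of report can be sketched in a few lines of Python 3. This is not a port of the shell script, whose exact logic is not shown here. It assumes G1's rule that an allocation is humongous when it is at least half a region in size. The function name `percent_not_humongous` and the sample sizes are hypothetical.

```python
MB = 1024 * 1024


def percent_not_humongous(alloc_sizes_bytes, region_size_bytes):
    """Percent of allocations smaller than half a region (assumed humongous threshold)."""
    if not alloc_sizes_bytes:
        return 0.0
    threshold = region_size_bytes // 2
    fits = sum(1 for size in alloc_sizes_bytes if size < threshold)
    return 100.0 * fits / len(alloc_sizes_bytes)


# Hypothetical humongous allocation request sizes, in bytes.
sizes = [600 * 1024, 900 * 1024, 1500 * 1024, 3 * MB]

for region_mb in (2, 4, 8, 16, 32):
    pct = percent_not_humongous(sizes, region_mb * MB)
    print("%d%% would not be humongous with a %dmb g1 region size" % (pct, region_mb))
```

In practice the sizes would come from the `allocation request: N bytes` field of the humongous allocation lines in the gc.log, which is the field `gc_log_visualizer.py` also reads.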
41 |
42 | ## How to run
43 | The start and end dates are optional and can be any format gnuplot understands.
44 | The second argument will be used as the base name for the created png files.
45 |
46 | ```
47 | python gc_log_visualizer.py
48 | python gc_log_visualizer.py gc.log
49 | python gc_log_visualizer.py gc.log.0.current user-app
50 | python gc_log_visualizer.py gc.log 3minwindow 2015-08-12:19:36:00 2015-08-12:19:39:00
51 | ```
52 |
53 | ## gc log preparation
54 | The script has been run on ParallelGC and G1GC logs. There may
55 | be some oddities/issues with ParallelGC as profiling it hasn't
56 | proven overly useful.
57 |
58 | The following gc params are required for full functionality.
59 |
60 | ```
61 | -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy
62 | ```
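To illustrate why `-XX:+PrintGCApplicationStoppedTime` matters: the script takes its pause durations from the "threads were stopped" lines that this flag produces. The Python 3 sketch below reuses the expression from `line_has_pause_time`. The helper name `parse_stopped` is hypothetical, and the sample line is an illustrative JDK 8 style line rather than one copied from a real log.

```python
import re

# Same expression the script uses to find stop-the-world pause lines.
STOPPED = re.compile(
    r"[0-9-]*T[0-9]+:([0-9]+):.* threads were stopped: ([0-9.]+) seconds"
)


def parse_stopped(line):
    """Return (minute, pause_seconds) for a stopped-time line, else None."""
    m = STOPPED.match(line)
    if not m:
        return None
    return int(m.group(1)), float(m.group(2))


# Illustrative log line; real logs may differ in detail.
line = (
    "2015-08-12T19:36:00.123+0000: 12.345: Total time for which application "
    "threads were stopped: 0.0512345 seconds, Stopping threads took: 0.0000123 seconds"
)
print(parse_stopped(line))
```

The date stamp at the start of the line is what `-XX:+PrintGCDateStamps` adds. Without it the expression does not match and no pause data is collected.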
63 |
64 | ## required python libs
65 | The required Python libs are listed in `requirements.txt` (see also `setup.py`)
66 | and can be installed in the usual manner.
67 |
68 | ```
69 | # enter a virtualenv or not
70 | pip install -r requirements.txt
71 | ```
72 |
73 | ## gnuplot
74 | The gc.log is parsed into flat files which are then run through
75 | gnuplot.
76 |
77 | ```
78 | # osx
79 | brew install gnuplot
80 | brew unlink libjpeg
81 | brew install libjpeg
82 | brew link libjpeg
83 | ```
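The "flat file, then gnuplot" flow can be sketched as follows in Python 3. It mirrors the time format and the gnuplot commands that `gc_log_visualizer.py` builds, but the helper names `write_data_file` and `gnuplot_script`, the file names and the sample rows are assumptions made for illustration.

```python
import datetime
import os
import tempfile

TIMEFMT = "%Y-%m-%d:%H:%M:%S"


def write_data_file(path, rows):
    """Write (datetime, value) rows as 'timestamp value' lines for gnuplot."""
    with open(path, "w") as f:
        for ts, value in rows:
            f.write("%s %.6f\n" % (ts.strftime(TIMEFMT), value))


def gnuplot_script(data_path, png_path, title, size="1024,768"):
    """Build gnuplot commands that plot column 2 against the time in column 1."""
    return "; ".join([
        "set term png size %s" % size,
        'set output "%s"' % png_path,
        "set xdata time",
        'set timefmt "%s"' % TIMEFMT,
        'plot "%s" using 1:2 title "%s"' % (data_path, title),
    ])


# Hypothetical pause samples written to a scratch directory.
rows = [
    (datetime.datetime(2015, 8, 12, 19, 36, 0), 0.051),
    (datetime.datetime(2015, 8, 12, 19, 36, 5), 0.120),
]
data_path = os.path.join(tempfile.mkdtemp(), "pause-example.dat")
write_data_file(data_path, rows)
script = gnuplot_script(data_path, "example-stw.png", "all stw")
print(script)
# To render the chart, hand the script to gnuplot, for example:
#   subprocess.call(["gnuplot", "-e", script])
```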
84 |
85 | ## Examples
86 |
87 | Line charts of generation sizes and total, bar for InitiatingHeapOccupancyPercent (in MB),
88 | reclaimable amount per mixed gc event.
89 |
90 | 
91 |
92 | Another example of the same chart but with the `InitiatingHeapOccupancyPercent`
93 | set below working set, which results in lots of mixed gc events shown as the reclaimable squares.
94 |
95 | 
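The IHOP bar in the two charts above is a value in MB derived from the JVM flags. The Python 3 sketch below follows the same calculation as `determine_gc_alg` (max heap size times the occupancy percent, with 45 as the fallback for G1). The helper names `get_int_flag` and `ihop_threshold_mb` and the sample flags line are hypothetical.

```python
import re


def get_int_flag(flags_line, flag, default=0):
    """Return the integer value of '<flag>=<n>' in the line, or the default."""
    m = re.search(r"%s=([0-9]+)" % re.escape(flag), flags_line)
    return int(m.group(1)) if m else default


def ihop_threshold_mb(flags_line):
    """Heap occupancy (MB) at which a concurrent cycle is initiated, or None."""
    pct = get_int_flag(flags_line, "-XX:InitiatingHeapOccupancyPercent", 45)
    max_heap = get_int_flag(flags_line, "-XX:MaxHeapSize")
    if not (pct and max_heap):
        return None
    return int(max_heap * (pct / 100.0) / 1048576.0)


# Hypothetical 'CommandLine flags' line for an 8 GB heap with IHOP set to 60.
flags = (
    "CommandLine flags: -XX:InitiatingHeapOccupancyPercent=60 "
    "-XX:MaxHeapSize=8589934592 -XX:+UseG1GC"
)
print(ihop_threshold_mb(flags))
```

Comparing this number with the post-gc Tenured level is what separates the two charts: when the threshold sits below the working set, almost every cycle crosses it and mixed gc events fire continuously.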
96 |
97 | To-space exhaustion from traffic bursts on cache
98 | expiration events. Solution: use stampeding herd protection.
99 |
100 | 
101 |
102 | This visualization of humongous objects shows the sizes in KB,
103 | as well as the vertical groupings that have the potential to
104 | cause to-space exhaustion.
105 |
106 | 
107 |
108 |
--------------------------------------------------------------------------------
/gc_log_visualizer.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | import sys
4 | import re
5 | import tempfile
6 | import os
7 | import dateutil.parser
8 |
9 | class StwSubTimings:
10 |     def __init__(self):
11 |         self.reset()
12 |
13 |     def reset(self):
14 |         self.ext_root_scan = 0
15 |         self.update_rs = 0
16 |         self.scan_rs = 0
17 |         self.object_copy = 0
18 |         self.termination = 0
19 |         self.other = 0
20 |
21 |     def unknown_time(self, total):
22 |         if total:
23 |             return int((total * 1000) - self.ext_root_scan - self.update_rs - self.scan_rs - self.object_copy - self.termination - self.other)
24 |         else:
25 |             return 0
26 |
27 | class LogParser:
28 |     heapG1GCPattern = r'\s*\[Eden: ([0-9.]+)([BKMG])\(([0-9.]+)([BKMG])\)->[0-9.BKMG()]+ Survivors: ([0-9.]+)([BKMG])->([0-9.]+)([BKMG]) Heap: ([0-9.]+)([BKMG])\([0-9.BKMG]+\)->([0-9.]+)([BKMG])\([0-9.BKMG]+\)'
29 |     parallelPattern = r'\s*\[PSYoungGen: ([0-9.]+)([BKMG])->([0-9.]+)([BKMG])\([0-9.MKBG]+\)\] ([0-9.]+)([MKBG])->([0-9.]+)([MKBG])\([0-9.MKBG]+\),'
30 |     parallelFullPattern = r'\s*\[PSYoungGen: ([0-9.]+)([BKMG])->([0-9.]+)([BKMG])\([0-9.MKBG]+\)\] \[ParOldGen: [0-9.BKMG]+->[0-9.BKMG]+\([0-9.MKBG]+\)\] ([0-9.]+)([MKBG])->([0-9.]+)([MKBG])\([0-9.MKBG]+\),'
31 |     heapCMSPattern = r'.*\[ParNew: ([0-9.]+)([BKMG])->([0-9.]+)([BKMG])\([0-9.BKMG]+\), [.0-9]+ secs\] ([0-9.]+)([BKMG])->([0-9.]+)([BKMG])\([0-9.BKMG]+\).*'
32 |     rootScanStartPattern = r'[0-9T\-\:\.\+]* ([0-9.]*): \[GC concurrent-root-region-scan-start\]'
33 |     rootScanMarkEndPattern = r'[0-9T\-\:\.\+]* ([0-9.]*): \[GC concurrent-mark-end, .*'
34 |     rootScanEndPattern = r'[0-9T\-\:\.\+]* ([0-9.]*): \[GC concurrent-cleanup-end, .*'
35 |     mixedStartPattern = r'\s*([0-9.]*): \[G1Ergonomics \(Mixed GCs\) start mixed GCs, .*'
36 |     mixedContinuePattern = r'\s*([0-9.]*): \[G1Ergonomics \(Mixed GCs\) continue mixed GCs, .*'
37 |     mixedEndPattern = r'\s*([0-9.]*): \[G1Ergonomics \(Mixed GCs\) do not continue mixed GCs, .*'
38 |     exhaustionPattern = r'.*\(to-space exhausted\).*'
39 |     humongousObjectPattern = r'.*request concurrent cycle initiation, .*, allocation request: ([0-9]*) .*, source: concurrent humongous allocation]'
40 |     occupancyThresholdPattern = r'.*threshold: ([0-9]*) bytes .*, source: end of GC\]'
41 |     reclaimablePattern = r'.*reclaimable: ([0-9]*) bytes \(([0-9.]*) %\), threshold: ([0-9]*).00 %]'
42 |
43 | def __init__(self, input_file):
44 | self.timestamp = None
45 | self.input_file = input_file
46 | self.pause_file = open('pause.dat', "w+b")
47 | self.young_pause_file = open('young-pause.dat', "w+b")
48 | self.mixed_pause_file = open('mixed-pause.dat', "w+b")
49 | self.pause_count_file = open('pause_count.dat', "w+b")
50 | self.full_gc_file = open('full_gc.dat', "w+b")
51 | self.gc_file = open('gc.dat', "w+b")
52 | self.young_file = open('young.dat', "w+b")
53 | self.root_scan_file = open('rootscan.dat', "w+b")
54 | self.cms_mark_file = open('cms_mark.dat', "w+b")
55 | self.cms_rescan_file = open('cms_rescan.dat', "w+b")
56 | self.mixed_duration_file = open('mixed_duration.dat', "w+b")
57 | self.exhaustion_file = open('exhaustion.dat', "w+b")
58 | self.humongous_objects_file = open('humongous_objects.dat', "w+b")
59 | self.reclaimable_file = open('reclaimable.dat', "w+b")
60 | self.gc_alg_g1gc = False
61 | self.gc_alg_cms = False
62 | self.gc_alg_parallel = False
63 | self.pre_gc_total = 0
64 | self.post_gc_total = 0
65 | self.pre_gc_young = 0
66 | self.pre_gc_young_target = 0
67 | self.post_gc_young = 0
68 | self.pre_gc_survivor = 0
69 | self.post_gc_survivor = 0
70 | self.tenured_delta = 0
71 | self.full_gc = False
72 | self.gc = False
73 | self.root_scan_start_time = 0
74 | self.root_scan_end_timestamp = 0
75 | self.root_scan_mark_end_time = 0
76 | self.mixed_duration_start_time = 0
77 | self.mixed_duration_count = 0
78 | self.total_pause_time = 0
79 | self.size = '1024,768'
80 | self.last_minute = -1
81 | self.reset_pause_counts()
82 | self.occupancy_threshold = None
83 | self.stw = StwSubTimings()
84 |
85 | def cleanup(self):
86 | os.unlink(self.pause_file.name)
87 | os.unlink(self.young_pause_file.name)
88 | os.unlink(self.mixed_pause_file.name)
89 | os.unlink(self.pause_count_file.name)
90 | os.unlink(self.full_gc_file.name)
91 | os.unlink(self.gc_file.name)
92 | os.unlink(self.young_file.name)
93 | os.unlink(self.root_scan_file.name)
94 | os.unlink(self.cms_mark_file.name)
95 | os.unlink(self.cms_rescan_file.name)
96 | os.unlink(self.mixed_duration_file.name)
97 | os.unlink(self.exhaustion_file.name)
98 | os.unlink(self.humongous_objects_file.name)
99 | os.unlink(self.reclaimable_file.name)
100 | return
101 |
102 | def close_files(self):
103 | self.pause_file.close()
104 | self.young_pause_file.close()
105 | self.mixed_pause_file.close()
106 | self.pause_count_file.close()
107 | self.gc_file.close()
108 | self.full_gc_file.close()
109 | self.young_file.close()
110 | self.root_scan_file.close()
111 | self.cms_mark_file.close()
112 | self.cms_rescan_file.close()
113 | self.mixed_duration_file.close()
114 | self.exhaustion_file.close()
115 | self.humongous_objects_file.close()
116 | self.reclaimable_file.close()
117 |
118 | def gnuplot(self, name, start, end):
119 | if start is None:
120 | xrange = ""
121 | else:
122 | xrange = "set xrange [ \"%s\":\"%s\" ]; " % (start, end)
123 |
124 | # Add a line for the occupancy threshold if found
125 | occupancy_threshold_arrow = ""
126 | if self.occupancy_threshold:
127 | occupancy_threshold_arrow = "set arrow 10 from graph 0,first %d to graph 1, first %d nohead; " % (self.occupancy_threshold, self.occupancy_threshold)
128 | occupancy_threshold_arrow += "set label \"%s\" at graph 0,first %d offset 1,1; " % ('IOF' if self.gc_alg_cms else 'IHOP', self.occupancy_threshold)
129 |
130 | # example of how to cap the y-range of the graph at .2
131 | #gnuplot_cmd = "gnuplot -e 'set term png size %s; set yrange [0:0.2]; set output \"%s-stw-200ms-cap.png\"; set xdata time; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2'" % (self.size, name, xrange, self.pause_file.name)
132 | #os.system(gnuplot_cmd)
133 |
134 | if self.gc_alg_parallel:
135 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-stw.png\"; set xdata time; set ylabel \"Secs\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2 title \"all stw\"'" % (self.size, name, xrange, self.pause_file.name)
136 | os.system(gnuplot_cmd)
137 |
138 | # Separate young and mixed stw events
139 | if self.gc_alg_g1gc:
140 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-stw-young.png\"; set xdata time; set ylabel \"Secs\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2 title \"young\"'" % (self.size, name, xrange, self.young_pause_file.name)
141 | os.system(gnuplot_cmd)
142 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-stw-mixed.png\"; set xdata time; set ylabel \"Secs\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2 title \"mixed\"'" % (self.size, name, xrange, self.mixed_pause_file.name)
143 | os.system(gnuplot_cmd)
144 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-stw-all.png\"; set xdata time; " \
145 | "set ylabel \"Secs\"; " \
146 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \
147 | "%s " \
148 | "plot \"%s\" using 1:2 title \"young\"" \
149 | ", \"%s\" using 1:2 title \"mixed\"'" % (self.size, name, xrange, self.young_pause_file.name, self.mixed_pause_file.name)
150 | os.system(gnuplot_cmd)
151 |
152 | # Separate young, mark and rescan stw events
153 | if self.gc_alg_cms:
154 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-stw-young.png\"; set xdata time; set ylabel \"Secs\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2 title \"young\"'" % (self.size, name, xrange, self.pause_file.name)
155 | os.system(gnuplot_cmd)
156 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-stw-all.png\"; set xdata time; " \
157 | "set ylabel \"Secs\"; " \
158 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \
159 | "%s " \
160 | "plot \"%s\" using 1:2 title \"young\"" \
161 | ", \"%s\" using 1:2 title \"mark\"" \
162 | ", \"%s\" using 1:2 title \"rescan\"'" % (self.size, name, xrange, self.pause_file.name, self.cms_mark_file.name, self.cms_rescan_file.name)
163 | os.system(gnuplot_cmd)
164 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-stw-old.png\"; set xdata time; " \
165 | "set ylabel \"Secs\"; " \
166 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \
167 | "%s " \
168 | "plot \"%s\" using 1:2 title \"mark\"" \
169 | ", \"%s\" using 1:2 title \"rescan\"'" % (self.size, name, xrange, self.cms_mark_file.name, self.cms_rescan_file.name)
170 | os.system(gnuplot_cmd)
171 |
172 | # Stw sub-timings
173 | if self.gc_alg_g1gc:
174 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-substw-ext-root-scan.png\"; set xdata time; set ylabel \"millis\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:3 title \"ext-root-scan\"'" % (self.size, name, xrange, self.pause_file.name)
175 | os.system(gnuplot_cmd)
176 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-substw-update-rs.png\"; set xdata time; set ylabel \"millis\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:4 title \"update-rs\"'" % (self.size, name, xrange, self.pause_file.name)
177 | os.system(gnuplot_cmd)
178 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-substw-scan-rs.png\"; set xdata time; set ylabel \"millis\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:5 title \"scan-rs\"'" % (self.size, name, xrange, self.pause_file.name)
179 | os.system(gnuplot_cmd)
180 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-substw-object-copy.png\"; set xdata time; set ylabel \"millis\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:6 title \"object-copy\"'" % (self.size, name, xrange, self.pause_file.name)
181 | os.system(gnuplot_cmd)
182 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-substw-termination.png\"; set xdata time; set ylabel \"millis\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:7 title \"termination\"'" % (self.size, name, xrange, self.pause_file.name)
183 | os.system(gnuplot_cmd)
184 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-substw-other.png\"; set xdata time; set ylabel \"millis\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:8 title \"other\"'" % (self.size, name, xrange, self.pause_file.name)
185 | os.system(gnuplot_cmd)
186 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-substw-unknown.png\"; set xdata time; set ylabel \"millis\"; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:9 title \"unknown\"'" % (self.size, name, xrange, self.pause_file.name)
187 | os.system(gnuplot_cmd)
188 |
189 | # total pause time
190 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-total-pause.png\"; set xdata time; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:8 title \"%% of time in gc\"'" % (self.size, name, xrange, self.pause_count_file.name)
191 | os.system(gnuplot_cmd)
192 |
193 | # Note: This seems to have marginal utility as compared to the plot of wall time vs. pause time
194 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-pause-count.png\"; set xdata time; " \
195 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \
196 | "%s " \
197 | "plot \"%s\" using 1:2 title \"under-50\" with lines" \
198 | ", \"%s\" using 1:3 title \"50-90\" with lines" \
199 | ", \"%s\" using 1:4 title \"90-120\" with lines" \
200 | ", \"%s\" using 1:5 title \"120-150\" with lines" \
201 | ", \"%s\" using 1:6 title \"150-200\" with lines" \
202 | ", \"%s\" using 1:7 title \"200+\" with lines'" % (self.size, name, xrange, self.pause_count_file.name, self.pause_count_file.name, self.pause_count_file.name, self.pause_count_file.name, self.pause_count_file.name, self.pause_count_file.name)
203 | os.system(gnuplot_cmd)
204 |
205 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-heap.png\"; set xdata time; " \
206 | "set ylabel \"MB\"; " \
207 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \
208 | "%s " \
209 | "%s " \
210 | "plot \"%s\" using 1:2 title \"pre-gc-amount\"" \
211 | ", \"%s\" using 1:3 title \"post-gc-amount\"'" % (self.size, name, occupancy_threshold_arrow, xrange, self.gc_file.name, self.gc_file.name)
212 | os.system(gnuplot_cmd)
213 |
214 | # Add to-space exhaustion events if any are found
215 | if self.gc_alg_g1gc and os.stat(self.exhaustion_file.name).st_size > 0:
216 | to_space_exhaustion = ", \"%s\" using 1:2 title \"to-space-exhaustion\" pt 7 ps 3" % (self.exhaustion_file.name)
217 | else:
218 | to_space_exhaustion = ""
219 |
220 | # line graph of Eden, Tenured and the Total
221 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-totals.png\"; set xdata time; " \
222 | "set ylabel \"MB\"; " \
223 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \
224 | "%s " \
225 | "%s " \
226 | "plot \"%s\" using 1:2 title \"Eden\" with lines" \
227 | ", \"%s\" using 1:4 title \"Tenured\" with lines" \
228 | "%s" \
229 | ", \"%s\" using 1:5 title \"Total\" with lines" \
230 | ", \"%s\" using 1:2 title \"Reclaimable\"'" % (self.size, name, xrange, occupancy_threshold_arrow, self.young_file.name, self.young_file.name, to_space_exhaustion, self.young_file.name, self.reclaimable_file.name)
231 | os.system(gnuplot_cmd)
232 |
233 |
234 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-young.png\"; set xdata time; " \
235 | "set ylabel \"MB\"; " \
236 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \
237 | "%s " \
238 | "plot \"%s\" using 1:2 title \"current\"" \
239 | ", \"%s\" using 1:3 title \"max\"'" % (self.size, name, xrange, self.young_file.name, self.young_file.name)
240 | os.system(gnuplot_cmd)
241 |
242 | if self.gc_alg_g1gc:
243 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-tenured-delta.png\"; set xdata time; " \
244 | "set ylabel \"MB\"; " \
245 | "set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; " \
246 | "%s " \
247 | "plot \"%s\" using 1:6 with lines title \"tenured-delta\"'" % (self.size, name, xrange, self.young_file.name)
248 | os.system(gnuplot_cmd)
249 |
250 | if self.gc_alg_g1gc:
251 | # root-scan times
252 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-root-scan.png\"; set xdata time; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2 title \"root-scan-duration(ms)\"'" % (self.size, name, xrange, self.root_scan_file.name)
253 | os.system(gnuplot_cmd)
254 |
255 | # time from first mixed-gc to last
256 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-mixed-duration.png\"; set xdata time; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2 title \"mixed-gc-duration(ms)\"'" % (self.size, name, xrange, self.mixed_duration_file.name)
257 | os.system(gnuplot_cmd)
258 |
259 | # count of mixed-gc runs before stopping mixed gcs, max is 8 by default
260 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-mixed-duration-count.png\"; set xdata time; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:3 title \"mixed-gc-count\"'" % (self.size, name, xrange, self.mixed_duration_file.name)
261 | os.system(gnuplot_cmd)
262 |
263 | # to-space exhaustion events
264 | if os.stat(self.exhaustion_file.name).st_size > 0:
265 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-exhaustion.png\"; set xdata time; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2'" % (self.size, name, xrange, self.exhaustion_file.name)
266 | os.system(gnuplot_cmd)
267 |
268 | # humongous object sizes
269 | if os.stat(self.humongous_objects_file.name).st_size > 0:
270 | gnuplot_cmd = "gnuplot -e 'set term png size %s; set output \"%s-humongous.png\"; set xdata time; set timefmt \"%%Y-%%m-%%d:%%H:%%M:%%S\"; %s plot \"%s\" using 1:2 title \"humongous-object-size(KB)\"'" % (self.size, name, xrange, self.humongous_objects_file.name)
271 | os.system(gnuplot_cmd)
272 |
273 | return
274 |
275 | def determine_gc_alg(self):
276 | with open(self.input_file) as f:
277 | for line in f:
278 | m = re.match('^CommandLine flags: .*', line, flags=0)
279 | if m:
280 | if re.match(".*-XX:\+UseG1GC.*", line, flags=0):
281 | self.gc_alg_g1gc = True
282 | pct = self.get_long_field(line, '-XX:InitiatingHeapOccupancyPercent', 45)
283 | max = self.get_long_field(line, '-XX:MaxHeapSize')
284 | if pct and max:
285 | self.occupancy_threshold = int(max * (pct / 100.0) / 1048576.0)
286 | return
287 |
288 | elif re.match(".*-XX:\+UseConcMarkSweepGC.*", line, flags=0):
289 | self.gc_alg_cms = True
290 | pct = self.get_long_field(line, '-XX:CMSInitiatingOccupancyFraction')
291 | max = self.get_long_field(line, '-XX:MaxHeapSize')
292 | if pct and max:
293 | self.occupancy_threshold = int(max * (pct / 100.0) / 1048576.0)
294 | return
295 | elif re.match(".*-XX:\+UseParallelGC.*", line, flags=0):
296 | self.gc_alg_parallel = True
297 | return
298 |
299 | m = re.match(LogParser.heapG1GCPattern, line, flags=0)
300 | if m:
301 | self.gc_alg_g1gc = True
302 | return
303 |
304 | m = re.match(LogParser.heapCMSPattern, line, flags=0)
305 | if m:
306 | self.gc_alg_cms = True
307 | return
308 |
309 | m = re.match(LogParser.parallelPattern, line, flags=0)
310 | if m:
311 | self.gc_alg_parallel = True
312 | return
313 |
314 |     def get_long_field(self, line, field, def_value=0):
315 |         m = re.match(".*%s=([0-9]+).*" % field, line, flags=0)
316 |         if m:
317 |             return int(m.group(1))  # int handles arbitrarily large values; long() does not exist in Python 3
318 |         else:
319 |             return int(def_value)
320 |
321 | def parse_log(self):
322 | with open(self.input_file) as f:
323 | for line in f:
324 | # This needs to be first
325 | self.line_has_timestamp(line)
326 |
327 | self.line_has_gc(line)
328 |
329 | if self.gc_alg_g1gc:
330 | self.collect_root_scan_times(line)
331 | self.collect_mixed_duration_times(line)
332 | self.collect_to_space_exhaustion(line)
333 | self.collect_humongous_objects(line)
334 | self.collect_reclaimable(line)
335 | self.collect_stw_sub_timings(line)
336 |
337 | # find the occupancy threshold if CommandLine log line not present
338 | if not self.occupancy_threshold:
339 | self.collect_occupancy_threshold_pattern(line)
340 |
341 | if self.gc_alg_cms:
342 | self.write_cms_data(line)
343 |
344 | # This needs to be last
345 | if self.line_has_pause_time(line):
346 | self.output_data()
347 | self.stw.reset()
348 |
349 | def output_data(self):
350 | if self.mixed_duration_count == 0:
351 | self.young_pause_file.write("%s %.6f\n" % (self.timestamp_string(), self.pause_time))
352 | else:
353 | self.mixed_pause_file.write("%s %.6f\n" % (self.timestamp_string(), self.pause_time))
354 |
355 | self.pause_file.write("%s %.6f %d %d %d %d %d %d %d\n" % (self.timestamp_string(), self.pause_time, self.stw.ext_root_scan, self.stw.update_rs, self.stw.scan_rs, self.stw.object_copy, self.stw.termination, self.stw.other, self.stw.unknown_time(self.pause_time)))
356 | self.young_file.write("%s %s %s %s %s %s\n" % (self.timestamp_string(), self.pre_gc_young, self.pre_gc_young_target, self.pre_gc_total - self.pre_gc_young, self.pre_gc_total, self.tenured_delta))
357 |
358 | # clean this up, full_gc's should probably graph
359 | # in the same chart as regular gc events if possible
360 | if self.full_gc:
361 | self.full_gc_file.write("%s %s %s\n" % (self.timestamp_string(), self.pre_gc_total, self.post_gc_total))
362 | self.full_gc = False
363 | elif self.gc:
364 | self.gc_file.write("%s %s %s\n" % (self.timestamp_string(), self.pre_gc_total, self.post_gc_total))
365 | self.gc = False
366 |
367 | def output_pause_counts(self):
368 | self.pause_count_file.write("%s %s %s %s %s %s %s %s\n" % (self.timestamp_string(), self.under_50, self.under_90, self.under_120, self.under_150, self.under_200, self.over_200, self.total_pause_time * 100 / 60))
369 |
370 | def line_has_pause_time(self, line):
371 | m = re.match("[0-9-]*T[0-9]+:([0-9]+):.* threads were stopped: ([0-9.]+) seconds", line, flags=0)
372 | if not m or not (self.gc or self.full_gc):
373 | return False
374 |
375 | cur_minute = int(m.group(1))
376 | self.pause_time = float(m.group(2))
377 | self.increment_pause_counts(self.pause_time)
378 |
379 | if cur_minute != self.last_minute:
380 | self.last_minute = cur_minute
381 | self.output_pause_counts()
382 | self.reset_pause_counts()
383 |
384 | return True
385 |
386 |     def line_has_timestamp(self, line):
387 |         t = line.split()
388 |         if t and len(t) > 0:
389 |             t = t[0]
390 |             if t:
391 |                 t = t[:-1]
392 |
393 |         if t and len(t) > 15:  # 15 is mildly arbitrary
394 |             try:
395 |                 self.timestamp = dateutil.parser.parse(t)
396 |             except (ValueError, AttributeError):
397 |                 return
398 |         return
399 |
400 | def timestamp_string(self):
401 | return self.any_timestamp_string(self.timestamp)
402 |
403 | def any_timestamp_string(self, ts):
404 | return ts.strftime("%Y-%m-%d:%H:%M:%S")
405 |
406 | def collect_root_scan_times(self, line):
407 | m = re.match(LogParser.rootScanStartPattern, line, flags=0)
408 | if m:
409 | if self.root_scan_mark_end_time > 0:
410 | elapsed_time = self.root_scan_mark_end_time - self.root_scan_start_time
411 | self.root_scan_file.write("%s %s\n" % (self.any_timestamp_string(self.root_scan_end_timestamp), elapsed_time))
412 | self.root_scan_mark_end_time = 0
413 |
414 | self.root_scan_start_time = int(float(m.group(1)) * 1000)
415 | return
416 |
417 |
418 | m = re.match(LogParser.rootScanMarkEndPattern, line, flags=0)
419 | if m and self.root_scan_start_time > 0:
420 | self.root_scan_mark_end_time = int(float(m.group(1)) * 1000)
421 | self.root_scan_end_timestamp = self.timestamp
422 | return
423 |
424 | m = re.match(LogParser.rootScanEndPattern, line, flags=0)
425 | if m and self.root_scan_start_time > 0:
426 | self.root_scan_end_timestamp = self.timestamp
427 | elapsed_time = int(float(m.group(1)) * 1000) - self.root_scan_start_time
428 | self.root_scan_file.write("%s %s\n" % (self.any_timestamp_string(self.root_scan_end_timestamp), elapsed_time))
429 | self.root_scan_start_time = 0
430 | self.root_scan_mark_end_time = 0
431 |
432 | def collect_mixed_duration_times(self, line):
433 | m = re.match(LogParser.mixedStartPattern, line, flags=0)
434 | if m:
435 | self.mixed_duration_start_time = int(float(m.group(1)) * 1000)
436 | self.mixed_duration_count += 1
437 | return
438 |
439 | m = re.match(LogParser.mixedContinuePattern, line, flags=0)
440 | if m:
441 | self.mixed_duration_count += 1
442 | return
443 |
444 | m = re.match(LogParser.mixedEndPattern, line, flags=0)
445 | if m and self.mixed_duration_start_time > 0:
446 | elapsed_time = int(float(m.group(1)) * 1000) - self.mixed_duration_start_time
447 | self.mixed_duration_count += 1
448 | self.mixed_duration_file.write("%s %s %s\n" % (self.timestamp_string(), elapsed_time, self.mixed_duration_count))
449 | self.mixed_duration_start_time = 0
450 | self.mixed_duration_count = 0
451 |
452 | def collect_to_space_exhaustion(self, line):
453 | m = re.match(LogParser.exhaustionPattern, line, flags=0)
454 | if m and self.timestamp:
455 | self.exhaustion_file.write("%s %s\n" % (self.timestamp_string(), 100))
456 |
457 |     def collect_humongous_objects(self, line):
458 |         m = re.match(LogParser.humongousObjectPattern, line, flags=0)
459 |         if m and self.timestamp:
460 |             self.humongous_objects_file.write("%s %s\n" % (self.timestamp_string(), int(m.group(1)) // 1024))  # bytes -> KB
461 |
462 | def collect_occupancy_threshold_pattern(self, line):
463 | m = re.match(LogParser.occupancyThresholdPattern, line, flags=0)
464 | if m:
465 | self.occupancy_threshold = int(int(m.group(1)) / 1048576)
466 |
467 |     def collect_reclaimable(self, line):
468 |         m = re.match(LogParser.reclaimablePattern, line, flags=0)
469 |         if m and int(float(m.group(2))) >= int(m.group(3)) and self.timestamp:
470 |             self.reclaimable_file.write("%s %d\n" % (self.timestamp_string(), int(m.group(1)) // 1048576))  # bytes -> MB
471 |
472 | def collect_stw_sub_timings(self, line):
473 | if re.match('^[ ]+\[.*', line):
474 | self.stw.ext_root_scan = self.parseMaxTiming('Ext Root Scanning', line, self.stw.ext_root_scan)
475 | self.stw.update_rs = self.parseMaxTiming('Update RS', line, self.stw.update_rs)
476 | self.stw.scan_rs = self.parseMaxTiming('Scan RS', line, self.stw.scan_rs)
477 | self.stw.object_copy = self.parseMaxTiming('Object Copy', line, self.stw.object_copy)
478 | self.stw.termination = self.parseMaxTiming('Termination', line, self.stw.termination)
479 | m = re.match('^[ ]+\[Other: ([0-9.]+).*', line)
480 | if m:
481 | self.stw.other = int(float(m.group(1)))
482 |
483 | def parseMaxTiming(self, term, line, current_value):
484 | m = re.match("^[ ]+\[%s .* Max: ([0-9]+)\.[0-9],.*" % (term), line)
485 | if m:
486 | return int(float(m.group(1)))
487 | else:
488 | return current_value
489 |
490 | def write_cms_data(self, line):
491 | # collect stw times
492 | # 1) initial marking step, checks from roots
493 | # 2016-04-30T06:11:03.626+0000: 120634.808: [CMS-concurrent-mark: 0.922/0.922 secs] [Times: user=7.25 sys=0.59, real=0.93 secs]
494 | m = re.match(".*\[CMS-concurrent-mark: .*, real=([.0-9]+) secs.*", line, flags=0)
495 | if m:
496 | self.cms_mark_file.write("%s %.6f\n" % (self.timestamp_string(), float(m.group(1))))
497 |
498 | # 2) rescan phase
499 | # 2016-04-30T06:11:09.341+0000: 120640.523: [GC (CMS Final Remark) [YG occupancy: 737574 K (996800 K)]2016-04-30T06:11:09.341+0000: 120640.523: [Rescan (parallel) , 0.0728015 secs]2016-04-30T06:11:09.414+0000: 120640.596: [weak refs processing, 0.0236183 secs]2016-04-30T06:11:09.437+0000: 120640.619: [class unloading, 0.0157037 secs]2016-04-30T06:11:09.453+0000: 120640.635: [scrub symbol table, 0.0069954 secs]2016-04-30T06:11:09.460+0000: 120640.642: [scrub string table, 0.0007916 secs][1 CMS-remark: 22933820K(30349760K)] 23671395K(31346560K), 0.1314855 secs] [Times: user=0.83 sys=0.17, real=0.13 secs]
500 | m = re.match(".*\[Rescan .*, real=([.0-9]+) secs.*", line, flags=0)
501 | if m:
502 | self.cms_rescan_file.write("%s %.6f\n" % (self.timestamp_string(), float(m.group(1))))
503 |
504 |
505 | def line_has_gc(self, line):
506 | m = re.match(LogParser.heapG1GCPattern, line, flags=0)
507 | if m:
508 | self.store_gc_amount(m)
509 | self.gc = True
510 | return
511 |
512 | m = re.match(LogParser.parallelPattern, line, flags=0)
513 | if m:
514 | self.store_gc_amount(m)
515 | self.gc = True
516 | return
517 |
518 | m = re.match(LogParser.parallelFullPattern, line, flags=0)
519 | if m:
520 | self.store_gc_amount(m)
521 | self.full_gc = True
522 |
523 | m = re.match(LogParser.heapCMSPattern, line, flags=0)
524 | if m:
525 | self.store_gc_amount(m)
526 | self.gc = True
527 |
528 | return
529 |
530 |     def store_gc_amount(self, matcher):
531 |         i = 1
532 |         self.pre_gc_young = self.scale(matcher.group(i), matcher.group(i+1))
533 |
534 |         if self.gc_alg_g1gc or self.gc_alg_parallel:
535 |             i += 2
536 |             self.pre_gc_young_target = self.scale(matcher.group(i), matcher.group(i+1))
537 |
538 |         if self.gc_alg_cms:
539 |             i += 2
540 |             self.post_gc_young = self.scale(matcher.group(i), matcher.group(i+1))
541 |
542 |         if self.gc_alg_g1gc:
543 |             i += 2
544 |             self.pre_gc_survivor = self.scale(matcher.group(i), matcher.group(i+1))
545 |             i += 2
546 |             self.post_gc_survivor = self.scale(matcher.group(i), matcher.group(i+1))
547 |
548 |         i += 2
549 |         self.pre_gc_total = self.scale(matcher.group(i), matcher.group(i+1))
550 |         i += 2
551 |         self.post_gc_total = self.scale(matcher.group(i), matcher.group(i+1))
552 |
553 |         if self.gc_alg_g1gc:
554 |             self.tenured_delta = (self.post_gc_total - self.post_gc_survivor) - (self.pre_gc_total - self.pre_gc_young - self.pre_gc_survivor)
555 |
556 |     def scale(self, amount, unit):
557 |         rawValue = float(amount)  # every size is normalized to whole megabytes
558 |         if unit == 'B':
559 |             return int(rawValue / (1024.0 * 1024.0))
560 |         elif unit == 'K':
561 |             return int(rawValue / 1024.0)
562 |         elif unit == 'M':
563 |             return int(rawValue)
564 |         elif unit == 'G':
565 |             return int(rawValue * 1024.0)
566 |         return rawValue  # unrecognized unit: returned unscaled, as a float
567 |
568 |     def increment_pause_counts(self, pause_time):
569 |         self.total_pause_time += pause_time
570 |
571 |         if pause_time < 0.050:
572 |             self.under_50 += 1
573 |         elif pause_time < 0.090:
574 |             self.under_90 += 1
575 |         elif pause_time < 0.120:
576 |             self.under_120 += 1
577 |         elif pause_time < 0.150:
578 |             self.under_150 += 1
579 |         elif pause_time < 0.200:
580 |             self.under_200 += 1
581 |         else:
582 |             self.over_200 += 1
583 |
584 |     def reset_pause_counts(self):
585 |         self.under_50 = 0
586 |         self.under_90 = 0
587 |         self.under_120 = 0
588 |         self.under_150 = 0
589 |         self.under_200 = 0
590 |         self.over_200 = 0
591 |         self.total_pause_time = 0
592 |
593 | def main():
594 |     logParser = LogParser(sys.argv[1])
595 |     try:
596 |         logParser.determine_gc_alg()
597 |         print("gc alg: parallel=%s, g1gc=%s, cms=%s" % (logParser.gc_alg_parallel, logParser.gc_alg_g1gc, logParser.gc_alg_cms))
598 |         logParser.parse_log()
599 |         logParser.close_files()
600 |         basefilename = sys.argv[2] if len(sys.argv) > 2 else 'default'
601 |         start = None
602 |         end = None
603 |         if len(sys.argv) > 4:  # start and end must both be supplied; "> 3" raised IndexError when end was missing
604 |             start = sys.argv[3]
605 |             end = sys.argv[4]
606 |         logParser.gnuplot(basefilename, start, end)
607 |     finally:
608 |         logParser.cleanup()
609 |
610 |
611 | if __name__ == '__main__':
612 |     main()
613 |
614 |
--------------------------------------------------------------------------------
/images/humongous-objects.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HubSpot/gc_log_visualizer/1bd0bf92b60e5db16c09985b77adb6519ea3aa68/images/humongous-objects.png
--------------------------------------------------------------------------------
/images/ihop-and-reclaimable-unhealthy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HubSpot/gc_log_visualizer/1bd0bf92b60e5db16c09985b77adb6519ea3aa68/images/ihop-and-reclaimable-unhealthy.png
--------------------------------------------------------------------------------
/images/ihop-and-reclaimable.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HubSpot/gc_log_visualizer/1bd0bf92b60e5db16c09985b77adb6519ea3aa68/images/ihop-and-reclaimable.png
--------------------------------------------------------------------------------
/images/to-space-exhaustion.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HubSpot/gc_log_visualizer/1bd0bf92b60e5db16c09985b77adb6519ea3aa68/images/to-space-exhaustion.png
--------------------------------------------------------------------------------
/regionsize_vs_objectsize.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | log=$1
4 | if [ -z "${log}" ] ; then
5 |   echo "Usage: ${0} <gc.log>"
6 |   exit 1
7 | fi
8 |
9 | total=$(grep "source: concurrent humongous allocation" "${log}" | wc -l)
10 | fit2mb=$(grep "source: concurrent humongous allocation" "${log}" | sed 's/.*allocation request: \([0-9]*\) bytes.*/\1/' | awk '{if($1<1048576) print}' | wc -l)  # G1 treats objects of at least half a region as humongous
11 | fit4mb=$(grep "source: concurrent humongous allocation" "${log}" | sed 's/.*allocation request: \([0-9]*\) bytes.*/\1/' | awk '{if($1<2097152) print}' | wc -l)
12 | fit8mb=$(grep "source: concurrent humongous allocation" "${log}" | sed 's/.*allocation request: \([0-9]*\) bytes.*/\1/' | awk '{if($1<4194304) print}' | wc -l)
13 | fit16mb=$(grep "source: concurrent humongous allocation" "${log}" | sed 's/.*allocation request: \([0-9]*\) bytes.*/\1/' | awk '{if($1<8388608) print}' | wc -l)
14 | fit32mb=$(grep "source: concurrent humongous allocation" "${log}" | sed 's/.*allocation request: \([0-9]*\) bytes.*/\1/' | awk '{if($1<16777216) print}' | wc -l)
15 |
16 | echo "${total} humongous objects referenced in ${log}"
17 | echo $(echo ${fit2mb} ${total} | awk '{printf "%2.0f", ($2 > 0 ? 100 * $1 / $2 : 0)}')% would not be humongous with a 2mb g1 region size
18 | echo $(echo ${fit4mb} ${total} | awk '{printf "%2.0f", ($2 > 0 ? 100 * $1 / $2 : 0)}')% would not be humongous with a 4mb g1 region size
19 | echo $(echo ${fit8mb} ${total} | awk '{printf "%2.0f", ($2 > 0 ? 100 * $1 / $2 : 0)}')% would not be humongous with a 8mb g1 region size
20 | echo $(echo ${fit16mb} ${total} | awk '{printf "%2.0f", ($2 > 0 ? 100 * $1 / $2 : 0)}')% would not be humongous with a 16mb g1 region size
21 | echo $(echo ${fit32mb} ${total} | awk '{printf "%2.0f", ($2 > 0 ? 100 * $1 / $2 : 0)}')% would not be humongous with a 32mb g1 region size
22 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | -e .
2 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup, find_packages
2 |
3 | setup(name='gc_log_visualizer',
4 |       version='0.3',
5 |       description='Generate multiple gnuplot graphs from java gc log data',
6 |       author='Eric Abbott',
7 |       author_email='eabbott@hubspot.com',
8 |       url='https://github.com/HubSpot/gc_log_visualizer',
9 |       packages=find_packages(),
10 |       zip_safe=False,
11 |       include_package_data=True,
12 |       install_requires=[
13 |           'python-dateutil'
14 |       ],
15 |       platforms=["any"]
16 | )
17 |
--------------------------------------------------------------------------------