├── .gitignore
├── LICENSE
├── README.md
├── build.sh
├── heapprof.c
├── heapprof.py
├── ok.py
├── test.c
└── threads.c

/.gitignore:
--------------------------------------------------------------------------------
*.heapprof
test
threads
heapprof.so
core
*.core

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) 2014, Yossi Kreinin
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
heapprof
========

A 250 LOC heap profiler - easy to hack/port, works out of the box with Linux dynamically linked binaries.

For a description of how it works, see [How to make a heap profiler](https://yosefk.com/blog/how-to-make-a-heap-profiler.html).

Build
=====

```
./build.sh
```

Then:

* Copy **heapprof.so** to a directory in your $LD_LIBRARY_PATH.
* Copy **heapprof.py** to a directory in your $PATH.

Usage
=====

The following works on Linux; see [Porting](#porting) for how to adapt this to, say, a "bare metal" embedded platform.

To run your program with malloc instrumentation:

```
env LD_PRELOAD=heapprof.so GLIBCXX_FORCE_NEW=1 your-program args...
```

The program should then produce a core dump. Any way of getting one will do, for example:

```
int* p=0; *p=0; // a C program can just crash itself
os.kill(os.getpid(), 11) # a Python program can send itself a SIGSEGV
gdb program -ex "b func_name" -ex r -ex "gcore my.core" -ex q
# the above places a gdb breakpoint and dumps core when it's hit
```

The core dump contains metadata near each not-yet-freed malloc'd chunk of memory. To decode this metadata, use:

```
heapprof.py your-program core > output.heapprof
```

You will get a list of call stacks sorted by the sum of sizes of allocated blocks, along the lines of:

```
19% 10000 [1000, 2000, 3000, 1000, 3000]
 0x843f234 myfunc /my/file.c
 ...
```

Each entry shows the stack's share of the live heap (in percent), the total size in bytes of its live blocks, a list of the sizes of all those blocks (can be long...), and the call stack itself.
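The grouping behind such a report can be sketched in a few lines. This is only an illustration in Python 3, not the code in heapprof.py (which is a Python 2 script): `summarize` and the sample addresses are made-up names, and the input is assumed to be already decoded into `(size, stack)` pairs.

```python
from collections import defaultdict


def summarize(blocks):
    """Group live blocks by call stack, largest total first.

    `blocks` is an iterable of (size, stack) pairs, where `stack` is a tuple
    of return addresses. Returns a list of (percent, total, sizes, stack).
    """
    sizes_by_stack = defaultdict(list)
    for size, stack in blocks:
        sizes_by_stack[stack].append(size)

    heap_size = sum(sum(sizes) for sizes in sizes_by_stack.values())
    if heap_size == 0:
        return []  # nothing live (or only zero-sized blocks): avoid dividing by zero

    # Percentages are truncated to whole numbers, as in the sample output above.
    rows = [(100 * sum(sizes) // heap_size, sum(sizes), sizes, stack)
            for stack, sizes in sizes_by_stack.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    # Made-up addresses and sizes, for illustration only.
    stack_a = (0x1000, 0x2000)
    stack_b = (0x3000,)
    live = [(1000, stack_a), (2000, stack_a), (3000, stack_b), (500, stack_a)]
    for percent, total, sizes, stack in summarize(live):
        print("%d%% %d %s" % (percent, total, sizes))
        for addr in stack:
            print(" 0x%x" % addr)
```

Keying the dictionary on the whole tuple of return addresses means two allocations count as the same entry only if their entire recorded call stacks match, which is why a larger $HEAPPROF_FRAMES tends to split the report into more, smaller entries.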

Runtime options
===============

* When running with heapprof.so, use **$HEAPPROF_FRAMES** to configure the number of collected stack frames (default: 16).
* When running heapprof.py, set **$HEAPPROF_ADDR2LINE** if you prefer to use addr2line instead of gdb for function names and source line information (drawback: doesn't see inside shared libraries; advantage: doesn't look at the core dump so works with custom (non-ELF) core dumps that an embedded system could produce).
* **$GLIBCXX_FORCE_NEW** in the usage example above isn't a heapprof.so thing - it forces GNU libstdc++ to use malloc so its memory usage isn't obscured by its own memory allocator. If your program has other custom allocators, forcing it to use malloc instead under heapprof.so could be a good idea.

Porting
=======

* **heapprof.py** will work with a non-standard core dump format (such as a raw memory dump obtained from a JTAG probe or sent over any other channel) if you set $HEAPPROF_ADDR2LINE. If addr2line does not work with your executable format you'll need to hack heapprof.py to use whatever works.
* If your platform doesn't have a **backtrace()** function you'll need to roll your own. See some tips [here](http://www.yosefk.com/blog/getting-the-call-stack-without-a-frame-pointer.html).
* heapprof.c uses **dlsym** to get &malloc and &free - its own malloc/free are implemented on top of libc's functions. In a statically linked program, you'd need to pull in the code of malloc and free, rename them and call them by that new name instead.
* heapprof.c uses **pthread_mutex_lock** to handle recursive calls to malloc in a multithreaded environment. The reason it needs to do so is that backtrace() was observed to call malloc at times. If your backtrace() doesn't use malloc or you're in a single-threaded environment, you can just delete that code.
  If your backtrace() mallocs, *and* it's a multi-threaded system but it's not using pthreads, you'll need to reimplement the recursive mutex using your system's facilities.
* heapprof.c uses **sbrk** as a malloc replacement before its initialization is complete. If your system doesn't have sbrk, you'll need to roll your own memory pool - unless you also don't have dlsym and pthreads, in which case you don't need any of this stuff - it's those two that call malloc during init time...
* On systems requiring malloc blocks to be aligned to more than 8 bytes, change **ALIGNMENT** in heapprof.c accordingly.
* On big endian platforms, **find_blocks()** from heapprof.py will need tweaking to search for the right magic string - or heapprof.c's magic numbers can be modified.

--------------------------------------------------------------------------------
/build.sh:
--------------------------------------------------------------------------------
#!/bin/tcsh -f
setenv HEAPPROF_FRAMES 8
# use addr2line instead of gdb - just to test that code branch
setenv HEAPPROF_ADDR2LINE
unlimit coredumpsize
echo building the preloaded profiler library
gcc -o heapprof.so heapprof.c -shared -fPIC -ldl -lpthread -ansi -Wall
echo building and running a test program
echo
gcc -o test test.c -g
env LD_PRELOAD=./heapprof.so gdb -q ./test -ex 'b sample_state' -ex r -ex 'gcore first.core' -ex c -ex 'gcore second.core' -ex c -ex q
echo
echo running the heap profiler on the first snapshot
./heapprof.py test first.core > first.heapprof
echo running the heap profiler on the second snapshot
./heapprof.py test second.core > second.heapprof
echo
unsetenv HEAPPROF_ADDR2LINE
echo testing thread support - will dump core intentionally
gcc -o threads threads.c -g -lpthread
rm -f core
env LD_PRELOAD=./heapprof.so ./threads
echo
echo running the heap profiler on the snapshot
./heapprof.py threads core > threads.heapprof
echo
echo testing on Python - will make it dump core intentionally
echo
rm -f core
env LD_PRELOAD=./heapprof.so python -c 'import os; os.kill(os.getpid(), 11)'
./heapprof.py `which python` core > python.heapprof
echo
echo results are at '*.heapprof'
# test results at *.heapprof
./ok.py

--------------------------------------------------------------------------------
/heapprof.c:
--------------------------------------------------------------------------------
#define _GNU_SOURCE /* for RTLD_NEXT */
#include <stdio.h>    /* puts */
#include <stdlib.h>   /* getenv, atoi */
#include <string.h>   /* memset, memcpy */
#include <unistd.h>   /* sbrk */
#include <dlfcn.h>    /* dlsym */
#include <execinfo.h> /* backtrace */
#include <pthread.h>  /* recursive mutex */

/* NOTE: some systems may require a different alignment;
   ALIGN_UP assumes that ALIGNMENT is a power of 2 */
#define ALIGNMENT 8

#define ALIGN_UP(x) (((x) + (ALIGNMENT-1)) & ~(ALIGNMENT-1))
#define EXTRA ALIGN_UP((g_heapprof_frames+3)*sizeof(void*))

#define START_MAGIC ((void*)0x50616548) /* ASCII "HeaP" */
#define END_MAGIC ((void*)0x466f7250) /* ASCII "ProF" */

/* metadata block layout (g_heapprof_frames+3 pointer-sized cells):

   START_MAGIC
   size
   caller ret addr 1
   caller ret addr 2
   ...
   caller ret addr g_heapprof_frames
   END_MAGIC
*/
#define START_INDEX 0
#define SIZE_INDEX 1
#define END_INDEX (g_heapprof_frames+2)

typedef void* (*malloc_func)(size_t);
typedef void (*free_func)(void*); /* free() returns nothing */

static malloc_func g_malloc;
static free_func g_free;
static int g_heapprof_frames = 16;

static pthread_mutex_t g_backtrace_mutex;
static pthread_mutexattr_t g_mutex_attr;
static int g_mutex_init;

static char* g_pre_init_begin;
static char* g_pre_init_end;

static void init(void) {
  static int once = 1;
  if(once) {
    once = 0;

    /* these may call malloc...
       so malloc is designed to work with g_malloc==0 */
    g_malloc = (malloc_func)(size_t)dlsym(RTLD_NEXT, "malloc");
    g_free = (free_func)(size_t)dlsym(RTLD_NEXT, "free");

    char* frames = getenv("HEAPPROF_FRAMES");
    if(frames) g_heapprof_frames = atoi(frames);

    pthread_mutexattr_init(&g_mutex_attr);
    pthread_mutexattr_settype(&g_mutex_attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&g_backtrace_mutex, &g_mutex_attr);
    g_mutex_init = 1;

    puts("*** heapprof: initialized (remove heapprof.so from $LD_PRELOAD to disable) ***");
  }
}

/* if malloc is called before we have g_malloc (as it is by dlsym("malloc")...),
   get memory from sbrk(), and remember the range of pointers allocated by sbrk
   so that free() doesn't try to call *g_free on them */
static void* pre_init_malloc(size_t size) {
  if(!g_pre_init_begin) g_pre_init_begin = (char*)sbrk(0);
  char* p = (char*)sbrk(size);
  g_pre_init_end = p + size;
  return p;
}

void* malloc(size_t size) {
  static int inside_malloc = 0; /* this is needed since backtrace mallocs. protected by g_backtrace_mutex */
  init();

  if(g_mutex_init) pthread_mutex_lock(&g_backtrace_mutex);

  void** p = (void**)(g_malloc ? g_malloc(size+EXTRA) : pre_init_malloc(size+EXTRA));
  p[SIZE_INDEX] = (void*)size; /* write size even if we don't write the call stack [for realloc] */

  if(inside_malloc || !g_mutex_init) {
    if(g_mutex_init) pthread_mutex_unlock(&g_backtrace_mutex);
    return (char*)p + EXTRA; /* backtrace() calls malloc which causes infinite recursion.
                                unfortunately this workaround is thread-unsafe */
  }

  inside_malloc = 1;
  p[START_INDEX] = START_MAGIC;
  backtrace(p+SIZE_INDEX, g_heapprof_frames+1); /* 1 for &malloc, overwriting size */
  p[SIZE_INDEX] = (void*)size; /* overwrite &malloc back, with size */
  p[END_INDEX] = END_MAGIC;
  inside_malloc = 0;

  if(g_mutex_init) pthread_mutex_unlock(&g_backtrace_mutex);

  return (char*)p + EXTRA;
}

void free(void *ptr) {
  if(!ptr) return; /* free(NULL) is a legitimate no-op */
  init();
  void** p = (void**)((char*)ptr - EXTRA);
  p[START_INDEX] = 0; /* clear the magic numbers so heapprof.py doesn't find a free block */
  p[END_INDEX] = 0;
  if(!g_free) {
    return; /* we aren't fully initialized - so leak the block */
  }
  if((char*)p >= g_pre_init_begin && (char*)p <= g_pre_init_end) {
    return; /* a block allocated by sbrk() - can't free */
  }
  g_free(p);
}

void* calloc(size_t nmemb, size_t size) {
  void* ret = malloc(nmemb * size);
  memset(ret, 0, nmemb * size);
  return ret;
}

void* realloc(void* ptr, size_t size) {
  if(!ptr) return malloc(size);
  size_t* p = (size_t*)((char*)ptr - EXTRA);
  size_t prev_size = p[SIZE_INDEX]; /* that's why malloc saves size even when it doesn't save the call stack */
  void* new_ptr = malloc(size);
  size_t copy_size = prev_size < size ?
    prev_size : size;
  memcpy(new_ptr, ptr, copy_size);
  free(ptr);
  return new_ptr;
}

--------------------------------------------------------------------------------
/heapprof.py:
--------------------------------------------------------------------------------
#!/usr/bin/python
import sys, commands, struct, operator, subprocess, os

if len(sys.argv) != 3:
  print 'usage:',sys.argv[0],'<program> <core>'
  sys.exit(1)

prog, core = sys.argv[1:]

# finds out the size of void*/size_t. could be hardcoded for speed...
try:
  cell = int(commands.getoutput('gdb '+prog+r''' -ex 'printf "cell %d\n", sizeof(void*)' -ex q | grep cell''').split()[1])
except:
  print 'gdb failed to open',prog,core,'- assuming a 32b pointer'
  cell = 4

fmt = {4:'I',8:'Q'}[cell]

def gdb_sym_info(addrs,exe):
  gdb = subprocess.Popen(['gdb',prog,core], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
  info = {}
  found = 0
  for addr in addrs:
    if addr:
      gdb.stdin.write('info symbol 0x%x\n'%addr)
      gdb.stdin.write('list *0x%x\n'%addr)
      gdb.stdin.write('printf "\\ndone\\n"\n')
      gdb.stdin.flush()
      line = ''
      lineinfo = None
      syminfo = 'UNKNOWN'
      while line != 'done':
        line = gdb.stdout.readline().strip()
        if 'is in' in line: lineinfo = line.split('is in ')[1]
        if 'in section' in line: syminfo = line.split('(gdb) ')[1]
      if lineinfo:
        info[addr] = lineinfo
      else:
        info[addr] = syminfo
      found += int(info[addr] != 'UNKNOWN')

  return info, found

def addr2line_sym_info(addrs,exe):
  addr2line = subprocess.Popen('addr2line -f -e'.split()+[exe], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
  info = {}
  for addr in addrs:
    if addr:
      addr2line.stdin.write('0x%x\n'%addr)
      addr2line.stdin.flush()
      # with -f, addr2line prints two lines per address: function name, then file:line
      info[addr] = addr2line.stdout.readline().strip()+' '+addr2line.stdout.readline().strip()
  return info

def sym_info(addrs,exe):
  if 'HEAPPROF_ADDR2LINE' in os.environ:
    gdb_found = 0
  else:
    syminfo, gdb_found = gdb_sym_info(addrs, prog)
  if gdb_found < 1: # gdb didn't manage to find anything - perhaps the core dump is in a custom format
    syminfo = addr2line_sym_info(addrs, prog)
  return syminfo

# a silly guard against "non-blocks" - occurrences of HeaP and ProF
# in code instead of data
def is_block(s,e): return (e-s)%cell == 0 and (e-s)/cell < 100

class Block:
  def __init__(self, metadata):
    self.size = struct.unpack(fmt, metadata[0:cell])[0]
    self.stack = struct.unpack('%d'%(len(metadata)/cell - 1)+fmt, metadata[cell:])

def find_blocks(bytes):
  blocks = []
  end_index = 0
  while True:
    start_index = bytes.find('HeaP',end_index)
    end_index = bytes.find('ProF',start_index)
    if not is_block(start_index, end_index):
      end_index = start_index + cell # search again
    else:
      if min(start_index, end_index) < 0:
        break
      blocks.append(Block(bytes[start_index+cell:end_index])) # this assumes little endian...
  return blocks

def code_addrs(blocks):
  return list(reduce(operator.or_, [set(block.stack) for block in blocks]))

def report(blocks, syminfo):
  stack2sizes = {}
  for block in blocks:
    stack2sizes.setdefault(block.stack,list()).append(block.size)

  total = sorted([(sum(sizes), stack) for stack, sizes in stack2sizes.iteritems()])
  heapsize = sum([size for size, stack in total])

  for size, stack in reversed(total):
    print '%d%% %d %s'%(int(100.*size/heapsize), size, stack2sizes[stack])
    for addr in stack:
      if addr:
        print ' 0x%x'%addr, syminfo[addr]

blocks = find_blocks(open(core,'rb').read())
if not blocks:
  print 'no heap blocks found in the core dump (searched for metadata enclosed in the magic string HeaP...ProF)'
  sys.exit(1)
syminfo = sym_info(code_addrs(blocks), prog)
report(blocks, syminfo)

--------------------------------------------------------------------------------
/ok.py:
--------------------------------------------------------------------------------
#!/usr/bin/python
'''test that the build and run results are OK'''
# threads.heapprof should have 2 call stacks allocating 1024 blocks each
import commands, sys

print 'testing that *.heapprof files contain the expected results...'

assert commands.getoutput("grep ' 1024 ' threads.heapprof | wc -l").strip() == '2'

second = open('second.heapprof').read()
if 'no heap blocks found' in second:
  print "threads.heapprof is OK but second.heapprof is not - perhaps gdb's gcore command doesn't work? Is it gdb 7.2 and up?"
  print "anyway, this test failed but presumably heapprof itself works correctly."
  sys.exit()
assert '1048576 [1048576]' in second
assert '1048576 [131073, 131073, 131071, 131073, 131071, 131073, 131071, 131071]' in second
assert 'example_func' in second
assert 'another_func' in second

print 'ok.'

--------------------------------------------------------------------------------
/test.c:
--------------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h> /* malloc */

void sample_state() {}

void example_func() {
  int i;
  for(i=0; i<8; ++i) {
    int* p=(int*)malloc(128*1024 + (i%2 ? 1 : -1));
    p[1]=5;
  }
}

void another_func() {
  malloc(1024*1024);
}

int main() {
  example_func();
  sample_state();
  another_func();
  sample_state();
}

--------------------------------------------------------------------------------
/threads.c:
--------------------------------------------------------------------------------
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h> /* malloc, free */

void make_blocks(void) {
  int i;
  for(i=0; i<1024*16; ++i) {
    void* p = malloc(1);
    if(i%16) { free(p); }
  }
}

/* this function is run by the second thread */
void* thread_func(void* x_void_ptr)
{
  make_blocks();
  return 0;
}

int main() {
  pthread_t thread;
  pthread_create(&thread, 0, thread_func, 0);

  make_blocks();

  pthread_join(thread, NULL);

  int* p = 0;
  *p = 0;
}

--------------------------------------------------------------------------------