├── README.md
├── crash
    ├── Main.hs
    ├── Setup.hs
    ├── crash.c
    ├── crash.cabal
    └── crash.h
└── ghd
    ├── Makefile
    ├── atos.h
    ├── ghd.cpp
    ├── ghd.h
    ├── rts_wrapper.h
    ├── rts_wrapper_noprof.c
    └── rts_wrapper_prof.c


/README.md:
--------------------------------------------------------------------------------
   1 | 
   2 | # Hacking GHC's Stack for Fun and Profit
   3 | 
   4 | This document explains how to look at the stack of a [GHC](http://www.haskell.org/ghc/) compiled [Haskell](http://www.haskell.org/haskellwiki/Haskell) program, and why that might be of interest even to somebody who's not a [GHC hacker](http://www.haskell.org/ghc/contributors). Spoiler: It's a basic building block for many kinds of debugging and profiling tools.
   5 | 
   6 | The code / explanations here are for OS X with 32bit GHC 7.6.3 and gcc 4.2.1, but I'll mention relevant differences to make this apply on a different platform / GHC / architecture as well. Just be alert about different word and pointer sizes, register names and calling conventions.
   7 | 
   8 | ### Disclaimer
   9 | 
  10 | I'm not what you'd call an expert on the internals of GHC and its RTS. I simply hope this will be helpful, and I'd love to be corrected and find a better way of doing and explaining all of this.
  11 | 
  12 | ### Motivation
  13 | 
  14 | GHC's [debugging facilities](http://www.haskell.org/haskellwiki/Debugging) are rather limited. There is a debugger inside the [GHCi](http://www.haskell.org/haskellwiki/GHC/GHCi) REPL, but that requires our program to be build and linked by its bytecode compiler. Over the years, there have been many issues preventing otherwise perfectly fine Haskell code from working within GHCi, and even if none of those affect us, we're still left with a much slower running program with significantly changed runtime behavior. Another option would be the [-xc flag](http://www.haskell.org/ghc/docs/7.6.3/html/users_guide/runtime-control.html#rts-options-debugging) of GHC's [RTS](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts), doing a stack trace for every exception. That one also has its fair share of drawbacks (for instance [Ticket #8512](https://ghc.haskell.org/trac/ghc/ticket/8512#ticket)).
  15 | 
  16 | The situation goes from bad to worse as we move from Haskell exceptions to actual [segmentation faults](http://en.wikipedia.org/wiki/Segfault). Here we are completely on our own, as none of the facilities mentioned above work on them. Additionally, the way GHC implements Haskell's stack and generates debug information makes its compiled programs opaque to debuggers like [gdb](https://www.gnu.org/software/gdb/). For this reason, most other profiling and analysis programs will also have a rather hard time with GHC compiled code.
  17 | 
  18 | # Debugging
  19 | 
  20 | ### Introduction
  21 | 
  22 | First, yes, Haskell program's do crash (segfault). There are many APIs and primitives which offer no memory safety guarantees. They are generally used to implement the higher-level, safer systems, or to interface with system APIs like POSIX or OpenGL, or simply for performance reasons (hard to be competitive with C/C++/Fortran when there's a bounds check on every memory access). And even if we ban all of those things from our codebase, we'll very likely still use Haskell libraries which will use unsafe memory operations or call into C code. And this code will inevitably have bugs, and we'll be faced with having our program exit with
  23 | 
  24 |     Segmentation fault
  25 | 
  26 | or
  27 | 
  28 |     Bus error
  29 | 
  30 | and not much more to go on. This can be a potentially very distressing situation. Imagine having your mission critical 1M LoC application just crash after 30min with zero helpful information.
  31 | 
  32 | If this was a C/C++/ObjC etc. program, we could simply fire up our favorite debugger and get the full call stack of every thread, function arguments, all local and global variables, the line of code where the segfault happened, etc. Even for code obscured by the optimizer and after some memory corruption we'd still likely get some useful data. Even better, we could use tools like [Valgrind](http://valgrind.org/), pinpointing subtle memory errors before they cascade into the inevitable segfault. None of this works (well) with GHC compiled programs. Before resorting to debugging with [git bisect](https://www.kernel.org/pub/software/scm/git/docs/git-bisect.html) or a similar measure, please read on!
  33 | 
  34 | ### Sample Program
  35 | 
  36 | Before we get to actually poking around the stack, we start by writing a simple crashing Haskell program so we have something to work with. Here:
  37 | 
  38 | `Main.hs`
  39 | 
  40 | ```haskell
  41 | module Main where
  42 | 
  43 | import Data.Vector.Mutable
  44 | 
  45 | main :: IO ()
  46 | main = do
  47 |     someFuncA
  48 |     someFuncB
  49 | 
  50 | {-# NOINLINE someFuncA #-}
  51 | someFuncA :: IO ()
  52 | someFuncA = do
  53 |     v <- new 100
  54 |     unsafeWrite v 0 (0 :: Int)
  55 | 
  56 | {-# NOINLINE someFuncB #-}
  57 | someFuncB :: IO ()
  58 | someFuncB = do
  59 |     someFuncC
  60 |     someFuncC
  61 | 
  62 | {-# NOINLINE someFuncC #-}
  63 | someFuncC :: IO ()
  64 | someFuncC = do
  65 |     v <- new 100
  66 |     unsafeWrite v 1000000000 (0 :: Int) -- Crash (hopefully)
  67 | 
  68 | ```
  69 | 
  70 | `crash.cabal` should contain
  71 | 
  72 |     ghc-options:      -O2 -threaded
  73 | 
  74 | as these flags are likely what we'd likely be using for a real program.
  75 | 
  76 | Running `crash` yields the expected result:
  77 | 
  78 |     $ ./dist/build/crash/crash
  79 |     Segmentation fault
  80 | 
  81 | The full sample program can also be found in the [crash folder](https://github.com/blitzcode/ghc-stack/tree/master/crash).
  82 | 
  83 | ### A first try with gdb
  84 | 
  85 | As discussed during the intro, no (direct) help from GHCi / the RTS here. Let's look at what `gdb` (documentation [here](http://www.gnu.org/software/gdb/documentation/), if you need a little refresher) tells us:
  86 | 
  87 |     $ gdb ./dist/build/crash/crash
  88 |     GNU gdb 6.3.50-20050815 (Apple version gdb-1515) (Sat Jan 15 08:33:48 UTC 2011)
  89 |     Copyright 2004 Free Software Foundation, Inc.
  90 |     GDB is free software, covered by the GNU General Public License, and you are
  91 |     welcome to change it and/or distribute copies of it under certain conditions.
  92 |     Type "show copying" to see the conditions.
  93 |     There is absolutely no warranty for GDB.  Type "show warranty" for details.
  94 |     This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries ... done
  95 | 
  96 |     (gdb) r
  97 |     Starting program: /Users/Tim/crash/dist/build/crash/crash
  98 |     Reading symbols for shared libraries ++. done
  99 | 
 100 |     Program received signal EXC_BAD_ACCESS, Could not access memory.
 101 |     Reason: KERN_PROTECTION_FAILURE at address: 0xbc0ac520
 102 |     0x00001d94 in s1A7_info ()
 103 |     (gdb) bt
 104 |     #0  0x00001d94 in s1A7_info ()
 105 |     #1  0x00000000 in ?? ()
 106 |     (gdb)
 107 | 
 108 | Not helpful at all, at least at first sight!
 109 | 
 110 | The cryptic symbol `s1A7_info` is named by GHC's [STG](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/GeneratedCode) phase. It's worth having a look at the link if this is the first time you've heard that acronym. It's a very informative read in any case, and we'll be getting back to it some more. Also see [Debugging/CompiledCode#Decodingsymbolnames](https://ghc.haskell.org/trac/ghc/wiki/Debugging/CompiledCode#Decodingsymbolnames) and [Compiler/SymbolNames](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/SymbolNames) for additional information on how these symbol names are assigned.
 111 | 
 112 | We can compile our program with `-ddump-stg -ddump-cmm -ddump-to-file`, and if we're lucky we'll find the `s1A7` identifier somewhere in the dumped [C--](http://en.wikipedia.org/wiki/C--) intermediate code (inside `dist/build/crash/crash-tmp/Main.dump-*` for a cabal build of a program named 'crash'). This way we should at least be able to pin down the module of the crash location, maybe more. Better than nothing!
 113 | 
 114 | There is much more information inside the [GHC commentary](https://ghc.haskell.org/trac/ghc/wiki/Commentary), for instance [Debugging/CompiledCode](https://ghc.haskell.org/trac/ghc/wiki/Debugging/CompiledCode) (highly recommended read!). Also see these [helpful gdb macros](https://ghc.haskell.org/trac/ghc/attachment/wiki/Debugging/CompiledCode/.gdbinit). The commentary also has page on [C--](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/CmmType).
 115 | 
 116 | What we'd really like to see is a full stack trace, though.
 117 | 
 118 | ### The x86 stack
 119 | 
 120 | We start by doing a quick review how the stack works with most compiled languages on an x86 system. I'll just point to these two short articles for [x86](http://eli.thegreenplace.net/2011/02/04/where-the-top-of-the-stack-is-on-x86/) and [x86-64](http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/). All clear? Good!
 121 | 
 122 | Generating a stack trace for say a C/C++ program on x86 would simply involve looking at the program counter (also called 'PC' or 'instruction pointer', `eip` on x86, `rip` on x86-64) to get the top function and then following the singly linked-list of stack frames while collecting the return addresses along the way. The location of the first stack frame will be obtained from the frame pointer (`ebp` / `rbp` on x86 / x86-64).
 123 | 
 124 | Here are three different implementations of this idea, hopefully making it very clear:
 125 | 
 126 | * https://github.com/blitzcode/rsvp/blob/94c167bcf2a82092a8fe21eeb2e7a5a6b2d8fc77/src/prof.cpp#L1478
 127 | * https://bitbucket.org/edd/dbg/src/1abb9939664c/src/osx/frames.cpp?at=default
 128 | * http://opensource.apple.com/source/gdb/gdb-956/src/gdb/macosx/macosx-self-backtrace.c
 129 | 
 130 | Frame pointers are fast and conceptually simple, but come with inherent limitations and don't work with the default code generation parameters of many modern compilers (e.g. `-fomit-frame-pointer`). Debuggers like `gdb` generally use special debug information (Call Frame Information, or CFI) or fall back on function prologue analysis. It's slower and vastly more complicated, but more reliable and doesn't require programs to be compiled with frame pointers explicitly enabled.
 131 | 
 132 | Some information and source code regarding CFI:
 133 | 
 134 | * http://gnu.wildebeest.org/blog/mjw/2007/08/23/stack-unwinding/
 135 | * http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/dwarf/cfi_assembler.h
 136 | * http://www.nongnu.org/libunwind/
 137 | 
 138 | For simplicity, we'll assume the presence of frame pointers (`-fno-omit-frame-pointer` for gcc and clang) and use that as out stack traversal method for non-Haskell code.
 139 | 
 140 | ### It's not like that
 141 | 
 142 | Please (temporarily) forget everything you just read about x86 stack traversal, as GHC has a different stack and uses a [different](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/CallingConvention) [calling convention](http://en.wikipedia.org/wiki/X86_calling_conventions). Apart from those implementation differences, there is a major conceptual difference as well. Haskell is a lazy language, and its stack is in evaluation order, not source code call order. Add [tail call optimization](http://en.wikipedia.org/wiki/Tail_call_optimization) into the mix, and it becomes clear why generating a stack trace for a GHC compiled program is a non-trivial problem.
 143 | 
 144 | There's an ongoing discussion on how to improve GHC's diagnostic features with easily accessible stack traces (see [ExplicitCallStack](https://ghc.haskell.org/trac/ghc/wiki/ExplicitCallStack), [Ticket #3693](https://ghc.haskell.org/trac/ghc/ticket/3693) contains some very interesting work-in-progress patches to the compiler and RTS).
 145 | 
 146 | ### Notes on RTS headers
 147 | 
 148 | We'll soon start looking at various C header files from GHC's [RTS](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts). Links will be to HEAD versions of these files, including the in-line excerpts from them. That's perfectly fine for following the explanations here, but when it gets to actually poking around in memory you probably want to look at the header files shipping with your GHC. For me they're in `/Library/Frameworks/GHC.framework/Versions/7.6.3-i386/usr/lib/ghc-7.6.3/include/`, you might need to look elsewhere or download a [source tarball](http://www.haskell.org/ghc/download_ghc_7_6_3#sources).
 149 | 
 150 | There's also the issue of different versions of the RTS. There are debug / no-debug, single / multi-threaded, profiling enabled / disabled and 32 / 64 bit versions of it. You'll need to know what the program you're inspecting links against. See the [Rts/Config](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Config) page for more details.
 151 | 
 152 | Unlike most C APIs, the Haskell RTS one does not seem to be usable from C++. Among other issues, code in the headers uses identifiers like `new`. (Update: Fixed in HEAD, see [Ticket #8676](https://ghc.haskell.org/trac/ghc/ticket/8676#ticket))
 153 | 
 154 | Finally, [SourceTree/Includes](https://ghc.haskell.org/trac/ghc/wiki/Commentary/SourceTree/Includes) from the GHC commentary explains the general include structure.
 155 | 
 156 | ### Locating GHC's stack
 157 | 
 158 | How can we find GHC's stack in memory? The [STG](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/GeneratedCode) machine defines two registers of potential interest: `BaseReg` (or base pointer) and `Sp` (stack pointer).
 159 | 
 160 | Quoting the GHC commentary:
 161 | 
 162 | > * The STG registers
 163 | >   - There are rather a lot of registers here: more than can be practicably stored in actual available processor registers on most architectures.
 164 | >   - To deal with the lack of processor registers, most of the STG registers are actually kept on the stack in a block of memory pointed to by a special STG register called the "base register" (or BaseReg). To get or set values of registers which are not kept in processor registers, the STG machine generates an instruction to load or store from an address relative to the BaseReg.
 165 | >   - The most important four registers are the BaseReg, the stack pointer (Sp), the heap pointer (Hp), and the general purpose register R1 which is used for intermediate values, as well as for returning evaluated values when unwinding the stack. These are the four registers which are assigned actual processor registers when implementing the STG machine on x86.
 166 | > * The STG stack
 167 | >   - Stores function arguments and continuations (i.e. the stack frames which are executed when a function returns)
 168 | >   - Grows downwards in memory
 169 | >   - The top of the stack is pointed to by the STG register Sp, and the maximum available stack pointer is stored in SpLim. There is no frame pointer.
 170 | 
 171 | Also see [Rts/HaskellExecution/Registers](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/Registers).
 172 | 
 173 | We can look at [MachRegs.h](https://ghc.haskell.org/trac/ghc/browser/ghc/includes/stg/MachRegs.h) to see which architectural registers they map to.
 174 | 
 175 | Relevant excerpt for x86:
 176 | 
 177 |     Ok, we've only got 6 general purpose registers, a frame pointer and a
 178 |     stack pointer.  \tr{%eax} and \tr{%edx} are return values from C functions,
 179 |     hence they get trashed across ccalls and are caller saves. \tr{%ebx},
 180 |     \tr{%esi}, \tr{%edi}, \tr{%ebp} are all callee-saves.
 181 | 
 182 |     Reg     STG-Reg
 183 |     ---------------
 184 |     ebx     Base
 185 |     ebp     Sp
 186 |     esi     R1
 187 |     edi     Hp
 188 | 
 189 | And for x86-64:
 190 | 
 191 |     %rax          caller-saves, don't steal this one
 192 |     %rbx          YES
 193 |     %rcx          arg reg, caller-saves
 194 |     %rdx          arg reg, caller-saves
 195 |     %rsi          arg reg, caller-saves
 196 |     %rdi          arg reg, caller-saves
 197 |     %rbp          YES (our *prime* register)
 198 |     %rsp          (unavailable - stack pointer)
 199 |     %r8           arg reg, caller-saves
 200 |     %r9           arg reg, caller-saves
 201 |     %r10          caller-saves
 202 |     %r11          caller-saves
 203 |     %r12          YES
 204 |     %r13          YES
 205 |     %r14          YES
 206 |     %r15          YES
 207 | 
 208 |     %xmm0-7       arg regs, caller-saves
 209 |     %xmm8-15      caller-saves
 210 | 
 211 |     Use the caller-saves regs for Rn, because we don't always have to
 212 |     save those (as opposed to Sp/Hp/SpLim etc. which always have to be
 213 |     saved).
 214 | 
 215 | ### The Cost Center Stack
 216 | 
 217 | Before we attempt to traverse the more unusual and complicated STG stack used for actually evaluating expressions, let's look at the cost center stack (CCS) which GHC's profiling RTS maintains.
 218 | 
 219 | GHC's time profiler (`+RTS -p`) and its exception stack tracing (`+RTS -xc`) feature can show us conventional looking call stacks. This feature comes with with both time and space overhead, but it's probably in line with an unoptimized debug build of a C/C++ program. We can use this system to obtain better call stacks even in the case of a segfault. Now would be a good time to refresh your memory on GHC's profiling features (GHC Manual, [Chapter 5 - Profiling](http://www.haskell.org/ghc/docs/7.6.3/html/users_guide/profiling.html)) if you haven't looked at them in a while.
 220 | 
 221 | We'll start by recompiling our earlier crashing program with profiling (`cabal configure --enable-executable-profiling` or just `-prof` to GHC, also add `-fprof-auto -caf-all` to the `ghc-prof-options` field / pass them to GHC).
 222 | 
 223 | How do we get at the CCS when we're looking at a crashed program in `gdb`? Have a look at this excerpt from [Regs.h](https://ghc.haskell.org/trac/ghc/browser/ghc/includes/stg/Regs.h):
 224 | 
 225 | ```c
 226 | /*
 227 |  * This is the table that holds shadow-locations for all the STG
 228 |  * registers.  The shadow locations are used when:
 229 |  *
 230 |  *     1) the particular register isn't mapped to a real machine
 231 |  *        register, probably because there's a shortage of real registers.
 232 |  *     2) caller-saves registers are saved across a CCall
 233 |  */
 234 | typedef struct {
 235 |   StgUnion        rR1;
 236 |   StgUnion        rR2;
 237 |   StgUnion        rR3;
 238 |   StgUnion        rR4;
 239 |   StgUnion        rR5;
 240 |   StgUnion        rR6;
 241 |   StgUnion        rR7;
 242 |   StgUnion        rR8;
 243 |   StgUnion        rR9;          /* used occasionally by heap/stack checks */
 244 |   StgUnion        rR10;         /* used occasionally by heap/stack checks */
 245 |   StgFloat        rF1;
 246 |   StgFloat        rF2;
 247 |   StgFloat        rF3;
 248 |   StgFloat        rF4;
 249 |   StgDouble       rD1;
 250 |   StgDouble       rD2;
 251 |   StgWord64       rL1;
 252 |   StgPtr          rSp;
 253 |   StgPtr          rSpLim;
 254 |   StgPtr          rHp;
 255 |   StgPtr          rHpLim;
 256 |   struct CostCentreStack_ * rCCCS;  /* current cost-centre-stack */
 257 |   struct StgTSO_ *     rCurrentTSO;
 258 |   struct nursery_ *    rNursery;
 259 |   struct bdescr_ *     rCurrentNursery; /* Hp/HpLim point into this block */
 260 |   struct bdescr_ *     rCurrentAlloc;   /* for allocation using allocate() */
 261 |   StgWord         rHpAlloc;     /* number of *bytes* being allocated in heap */
 262 |   StgWord         rRet;  /* holds the return code of the thread */
 263 | } StgRegTable;
 264 | ```
 265 | 
 266 | This is the block of memory which `BaseReg` points at. From looking at `MachRegs.h` earlier we know this is `ebx` on x86 and `r13` on x86-x64. It contains references to everything from the heap to the STG stack (through the [TSO](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects#ThreadStateObjects)), but we're only interested in locating the CCS for now. We need to offset the base pointer so we're getting at the `rCCCS` field. If we're using the debug runtime we can directly use the structure above, but we might need to get by with an offset. To simplify this task, we can look at the special header `DerivedConstants.h` and find the offset constant we want:
 267 | 
 268 | ```c
 269 | #define OFFSET_StgRegTable_rCCCS 96
 270 | ```
 271 | 
 272 | If you can't find this header, it's because you're looking at a clean source tarball or repository. The header is generated when GHC is compiled to match the memory layout of the target architecture. Read [SourceTree/Includes#DerivedConstants]([https://ghc.haskell.org/trac/ghc/wiki/Commentary/SourceTree/Includes#DerivedConstants) for details.
 273 | 
 274 | Assuming we found our header and offset, let's proceed with `gdb`:
 275 | 
 276 |     (gdb) x/1a $ebx+96
 277 |     0x21352c <MainCapability+108>:      0x6fc578
 278 | 
 279 | We now have a pointer to `CostCentreStack`. From [CCS.h](https://ghc.haskell.org/trac/ghc/browser/ghc/includes/rts/prof/CCS.h):
 280 | 
 281 | ```c
 282 | typedef struct CostCentreStack_ {
 283 |     StgInt ccsID;               // unique ID, allocated by the RTS
 284 | 
 285 |     CostCentre *cc;             // Cost centre at the top of the stack
 286 | 
 287 |     struct CostCentreStack_ *prevStack;   // parent
 288 |     struct IndexTable_      *indexTable;  // children
 289 |     struct CostCentreStack_ *root;        // root of stack
 290 |     StgWord    depth;           // number of items in the stack
 291 | 
 292 |     StgWord64  scc_count;       // Count of times this CCS is entered
 293 |                                 // align 8 (Note [struct alignment])
 294 | 
 295 |     StgWord    selected;        // is this CCS shown in the heap
 296 |                                 // profile? (zero if excluded via -hc
 297 |                                 // -hm etc.)
 298 | 
 299 |     StgWord    time_ticks;      // number of time ticks accumulated by
 300 |                                 // this CCS
 301 | 
 302 |     StgWord64  mem_alloc;       // mem allocated by this CCS
 303 |                                 // align 8 (Note [struct alignment])
 304 | 
 305 |     StgWord64  inherited_alloc; // sum of mem_alloc over all children
 306 |                                 // (calculated at the end)
 307 |                                 // align 8 (Note [struct alignment])
 308 | 
 309 |     StgWord    inherited_ticks; // sum of time_ticks over all children
 310 |                                 // (calculated at the end)
 311 | } CostCentreStack;
 312 | ```
 313 | 
 314 | Lots of potentially interesting stuff, but we stay focussed and go one deeper, the pointer to `ConstCentre`:
 315 | 
 316 | ```c
 317 | typedef struct CostCentre_ {
 318 |     StgInt ccID;              // Unique Id, allocated by the RTS
 319 | 
 320 |     char * label;
 321 |     char * module;
 322 |     char * srcloc;
 323 | 
 324 |     // used for accumulating costs at the end of the run...
 325 |     StgWord64 mem_alloc;      // align 8 (Note [struct alignment])
 326 |     StgWord   time_ticks;
 327 | 
 328 |     StgInt is_caf;            // non-zero for a CAF cost centre
 329 | 
 330 |     struct CostCentre_ *link;
 331 | } CostCentre;
 332 | ```
 333 | 
 334 | The `label`, `module` and `srcloc` members look just like what we need! Continuing where we left of with `gdb`, looking at `ccsID` and the actual top `CostCentre` pointer (`cc` from the `CostCentreStack` pointer we just got)
 335 | 
 336 |     (gdb) x/2a 0x6fc578
 337 |     0x6fc578:   0x71    0x2095d8 <Main_someFuncC_C0_cc>
 338 | 
 339 | Diving into `ConstCentre`, now looking at `ccID`, `label`, `module`, and `srcloc`
 340 | 
 341 |     (gdb) x/4a 0x2095d8
 342 |     0x2095d8 <Main_someFuncC_C0_cc>:    0x1     0x19f5a0 <c1zU_str>     0x19f5ac <c1zW_str>     0x19f5b4 <c1zY_str>
 343 | 
 344 | Printing out the latter three as C strings
 345 | 
 346 |     (gdb) p (char *) 0x19f5a0
 347 |     $3 = 0x19f5a0 "someFuncC"
 348 |     (gdb) p (char *) 0x19f5ac
 349 |     $4 = 0x19f5ac "Main"
 350 |     (gdb) p (char *) 0x19f5b4
 351 |     $5 = 0x19f5b4 "Main.hs:(25,1)-(27,39)"
 352 | 
 353 | Just what we wanted! We're now going up the stack one level, using `CostCentreStack`'s `prevStack` pointer (8 bytes into the structure, as specified by `#define OFFSET_CostCentreStack_prevStack 8`)
 354 | 
 355 |     (gdb) x 0x6fc578+8
 356 |     0x6fc580:   0x6fc4d8
 357 |     (gdb) x 0x6fc4d8+4
 358 |     0x6fc4dc:   0x2095fc <Main_someFuncB_C1_cc>
 359 |     (gdb) x/4a 0x2095fc
 360 |     0x2095fc <Main_someFuncB_C1_cc>:    0x2     0x19f5cc <c1A0_str>     0x19f5d8 <c1A2_str>     0x19f5e0 <c1A4_str>
 361 |     (gdb) p (char *) 0x19f5cc
 362 |     $7 = 0x19f5cc "someFuncB"
 363 |     (gdb) p (char *) 0x19f5d8
 364 |     $8 = 0x19f5d8 "Main"
 365 |     (gdb) p (char *) 0x19f5e0
 366 |     $9 = 0x19f5e0 "Main.hs:(19,1)-(21,13)"
 367 | 
 368 | And one more
 369 | 
 370 |     (gdb) x 0x6fc4d8+8
 371 |     0x6fc4e0:   0x6fc398
 372 |     (gdb) x 0x6fc398+4
 373 |     0x6fc39c:   0x209644 <Main_main_C3_cc>
 374 |     (gdb) x/4a 0x209644
 375 |     0x209644 <Main_main_C3_cc>: 0x4     0x19f624 <c1Ac_str>     0x19f62c <c1Ae_str>     0x19f634 <c1Ag_str>
 376 |     (gdb) p (char *) 0x19f624
 377 |     $10 = 0x19f624 "main"
 378 |     (gdb) p (char *) 0x19f62c
 379 |     $11 = 0x19f62c "Main"
 380 |     (gdb) p (char *) 0x19f634
 381 |     $12 = 0x19f634 "Main.hs:(7,1)-(9,13)"
 382 | 
 383 | The `prevStack` pointer now points to `0x211b40 <CCS_MAIN>`, we've reached the bottom. We now know the call stack was
 384 | 
 385 |     Main.someFuncC   (Main.hs:(25,1)-(27,39))
 386 |       Main.someFuncB (Main.hs:(19,1)-(21,13))
 387 |         Main.main    (Main.hs:(7, 1)-(9, 13))
 388 | 
 389 | If `someFuncC` was very large, we could do a second run after sprinkling in a few `{-# SCC ... #-}` annotations (documentation for the [SCC Pragma](http://www.haskell.org/ghc/docs/7.6.3/html/users_guide/profiling.html#scc-pragma)) to narrow down the location of the crash even further. Also note that we now got access to not only the stack, but all kinds of other RTS profiling data which could be scraped and displayed in realtime by an external tool.
 390 | 
 391 | ### From Haskell into C and back
 392 | 
 393 | One situation we're not addressing with the previous discussion of CCS inspection is how we deal with a crash inside of a [Foreign Function Interface (FFI)](http://book.realworldhaskell.org/read/interfacing-with-c-the-ffi.html) call. We can debug C/C++ code called from Haskell as usual, but how do we get the Haskell part of the stack?
 394 | 
 395 | First, we'll modify the 'crash' test program by adding some C code.
 396 | 
 397 | `crash.h`:
 398 | 
 399 | ```c
 400 | #ifndef CRASH_H
 401 | #define CRASH_H
 402 | 
 403 | __attribute__((noinline)) void someCFuncA();
 404 | __attribute__((noinline)) void someCFuncB();
 405 | 
 406 | #endif // CRASH_H
 407 | ```
 408 | 
 409 | `crash.c`:
 410 | 
 411 | ```c
 412 | #include "crash.h"
 413 | 
 414 | void someCFuncA()
 415 | {
 416 |     someCFuncB();
 417 |     someCFuncB();
 418 | }
 419 | 
 420 | void someCFuncB()
 421 | {
 422 |     * (char *) 1 = 0;
 423 | }
 424 | ```
 425 | 
 426 | Additions to `crash.cabal`
 427 | 
 428 | ```
 429 |   c-sources:        crash.c
 430 |   cc-options:       -std=c99 -g -O3 -fno-omit-frame-pointer
 431 | ```
 432 | 
 433 | (we want debug symbols and frame pointers)
 434 | 
 435 | Changes in `Main.hs`
 436 | 
 437 | ```haskell
 438 |     -- unsafeWrite v 1000000000 (0 :: Int) -- Crash (hopefully)
 439 |     c_someCFuncA
 440 | 
 441 | foreign import ccall unsafe "crash.h someCFuncA"
 442 |     c_someCFuncA :: IO ()
 443 | ```
 444 | 
 445 | `gdb` correctly shows the C part of the call stack
 446 | 
 447 |     $ gdb ./dist/build/crash/crash
 448 |     GNU gdb 6.3.50-20050815 (Apple version gdb-1515) (Sat Jan 15 08:33:48 UTC 2011)
 449 |     Copyright 2004 Free Software Foundation, Inc.
 450 |     GDB is free software, covered by the GNU General Public License, and you are
 451 |     welcome to change it and/or distribute copies of it under certain conditions.
 452 |     Type "show copying" to see the conditions.
 453 |     There is absolutely no warranty for GDB.  Type "show warranty" for details.
 454 |     This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries ... done
 455 | 
 456 |     (gdb) r
 457 |     Starting program: /Users/Tim/crash/dist/build/crash/crash
 458 |     Reading symbols for shared libraries ++. done
 459 | 
 460 |     Program received signal EXC_BAD_ACCESS, Could not access memory.
 461 |     Reason: KERN_PROTECTION_FAILURE at address: 0x00000001
 462 |     someCFuncB () at crash.c:24
 463 |     24      * (char *) 1 = 0;
 464 |     (gdb) bt
 465 |     #0  someCFuncB () at crash.c:24
 466 |     #1  0x0000278b in someCFuncA () at crash.c:18
 467 |     #2  0x0000220e in s1Al_info ()
 468 |     Previous frame inner to this frame (gdb could not unwind past this frame)
 469 | 
 470 | The debugger could be used to walk the C part of the stack, but we're going to use frame pointers to demonstrate the method.
 471 | 
 472 |     (gdb) x/2a $ebp
 473 |     0xbfffd6e8: 0xbfffd6f8      0x278b <someCFuncA+11>
 474 | 
 475 | The frame pointer gives us our first frame link and return address pointers, establishing the someCFuncA -> someCFuncB call order. Moving on:
 476 | 
 477 |     (gdb) x/2a 0xbfffd6f8
 478 |     0xbfffd6f8: 0x603cc8        0x220e <s1Al_info+38>
 479 | 
 480 | The next frame shows we're retuning back into Haskell code (`s1Al_info`). Since Haskell does not use frame pointers, the space of the pointer to the next stack frame is simply occupied by the `Sp` register, pointing to the top of Haskell's STG stack. A detailed look at the STG stack will follow, but for now all we need to know is that the closure on the top will have a layout like this:
 481 | 
 482 | ```c
 483 | typedef struct StgClosure_ {
 484 |     StgHeader   header;
 485 |     struct StgClosure_ *payload[FLEXIBLE_ARRAY];
 486 | } *StgClosurePtr; // StgClosure defined in Rts.h
 487 | 
 488 | typedef struct {
 489 |     const StgInfoTable* info;
 490 | #ifdef PROFILING
 491 |     StgProfHeader         prof;
 492 | #endif
 493 | } StgHeader;
 494 | 
 495 | typedef struct {
 496 |   CostCentreStack *ccs;
 497 |   union {
 498 |     struct _RetainerSet *rs;  /* Retainer Set */
 499 |     StgWord ldvw;             /* Lag/Drag/Void Word */
 500 |   } hp;
 501 | } StgProfHeader;
 502 | ```
 503 | 
 504 | Basically, the second word on the stack will point us to the CCS. Here:
 505 | 
 506 |     (gdb) x/2a 0x603cc8
 507 |     0x603cc8:   0x21e8 <s1Al_info>      0x6fc578
 508 | 
 509 | And showing just enough of `CostCentreStack` to reveal the `CostCentre` pointer
 510 | 
 511 |     (gdb) x/2a 0x6fc578
 512 |     0x6fc578:   0x71    0x2095d8 <Main_someFuncC_C0_cc>
 513 | 
 514 | We found the CCS and the cost center for someFuncC, allowing us to proceed as before.
 515 | 
 516 | ### The STG stack
 517 | 
 518 | So far we've only been working with the CCS, which is not available unless we've compiled our program with profiling enabled. What can we learn from looking at Haskell's native STG stack?
 519 | 
 520 | Recall from our earlier discussion of registers (or read [Rts/HaskellExecution/Registers](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/Registers) again) that the virtual `Sp` register (`ebp` /  `rbp` on x86 / x86-64) always points to the top of the stack. We can also use the `BaseReg` and get access to the current [TSO](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects#ThreadStateObjects), containing a pointer to the `StgStack` structure. In any case, we'll have a look with `gdb`:
 521 | 
 522 |     (gdb) x/32a $ebp
 523 |     0x503cd8:   0x1d94 <s1A7_info>      0x1e2c <s1A8_info>      0xefc6c <stg_catch_frame_info>  0x0
 524 |     0x503ce8:   0x11bd82 <base_GHCziTopHandler_runIO2_closure+2>        0xf32d8 <stg_stop_thread_info>  0xf2f34 <stg_TSO_info>  0x11d79c <stg_END_TSO_QUEUE_closure>
 525 |     0x503cf8:   0x5038f0        0x503930        0x1     0x0
 526 |     0x503d08:   0x11d79c <stg_END_TSO_QUEUE_closure>    0x3     0x0     0x1
 527 |     0x503d18:   0x300670        0x11e440 <MainCapability>       0x11d798 <stg_NO_TREC_closure>  0x11d79c <stg_END_TSO_QUEUE_closure>
 528 |     0x503d28:   0x11d79c <stg_END_TSO_QUEUE_closure>    0xec    0xf321c <stg_MUT_ARR_PTRS_DIRTY_info>   0x41
 529 |     0x503d38:   0x42    0x11c0d5 <base_DataziMaybe_Nothing_closure+1>   0x11c0d5 <base_DataziMaybe_Nothing_closure+1>   0x502fba
 530 |     0x503d48:   0x11c0d5 <base_DataziMaybe_Nothing_closure+1>   0x11c0d5 <base_DataziMaybe_Nothing_closure+1>   0x11c0d5 <base_DataziMaybe_Nothing_closure+1>   0x11c0d5 <base_DataziMaybe_
 531 | 
 532 | The first 32 words of the stack (and what comes after it, perhaps)! How can we attempt to understand this further? There's a fair bit of material on the [STG machine](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/GeneratedCode) page on how C-- code uses the stack, but let's first look at an excerpt from the [Rts/Storage/Stack](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/Stack) page:
 533 | 
 534 | > The stack consists of a sequence of stack frames (also sometimes called activation records) where each frame has the same layout as a heap object:
 535 | > Header        Payload...
 536 | 
 537 | and
 538 | 
 539 | > Stack frames therefore have bitmap layout.
 540 | 
 541 | Ok, moving on to the documentation on heap objects, i.e. [Rts/Storage/HeapObjects](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects):
 542 | 
 543 | > All heap objects have the same basic layout, embodied by the type StgClosure in [Closures.h](https://ghc.haskell.org/trac/ghc/browser/ghc/includes/rts/storage/Closures.h).
 544 | 
 545 | ```c
 546 | /* All closures follow the generic format */
 547 | 
 548 | typedef struct StgClosure_ {
 549 |     StgHeader   header;
 550 |     struct StgClosure_ *payload[FLEXIBLE_ARRAY];
 551 | } *StgClosurePtr; // StgClosure defined in Rts.h
 552 | ```
 553 | 
 554 | > A heap object always begins with a header, defined by `StgHeader` in [Closures.h](https://ghc.haskell.org/trac/ghc/browser/ghc/includes/rts/storage/Closures.h)
 555 | 
 556 | ```c
 557 | typedef struct {
 558 |     const StgInfoTable* info;
 559 | #ifdef PROFILING
 560 |     StgProfHeader         prof;
 561 | #endif
 562 | } StgHeader;
 563 | ```
 564 | 
 565 | When looking at these structures, for now assume `PROFILING`, `TICKY` and `DEBUG_CLOSURE` to be undefined, while assuming `TABLES_NEXT_TO_CODE` to be defined. See the [Rts/Config](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Config) page for more details.
 566 | 
 567 | Also worth mentioning is the issue of pointer tagging, see [Rts/HaskellExecution/PointerTagging](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/PointerTagging). It shouldn't affect us much for our basic stack traversal, but it's important to know about when poking around further in GHC's heap.
 568 | 
 569 | Now we know that when we're looking at the top of our stack and see `0x1d94 <s1A7_info>`, we actually have a pointer to an `StgInfoTable`, followed by a 'payload' array of closures. What are 'info tables' anyway? Have a quick look at [Rts/Storage/HeapObjects#InfoTables](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects#InfoTables) and come back.
 570 | 
 571 | Let's look at the definitions, from [InfoTables.h](https://ghc.haskell.org/trac/ghc/browser/ghc/includes/rts/storage/InfoTables.h):
 572 | 
 573 | ```c
 574 | /*
 575 |  * The "standard" part of an info table.  Every info table has this bit.
 576 |  */
 577 | typedef struct StgInfoTable_ {
 578 | 
 579 | #if !defined(TABLES_NEXT_TO_CODE)
 580 |     StgFunPtr       entry;      /* pointer to the entry code */
 581 | #endif
 582 | 
 583 | #ifdef PROFILING
 584 |     StgProfInfo     prof;
 585 | #endif
 586 | #ifdef TICKY
 587 |   /* Ticky-specific stuff would go here. */
 588 | #endif
 589 | #ifdef DEBUG_CLOSURE
 590 |   /* Debug-specific stuff would go here. */
 591 | #endif
 592 | 
 593 |     StgClosureInfo  layout;     /* closure layout info (one word) */
 594 | 
 595 |     StgHalfWord     type;       /* closure type */
 596 |     StgHalfWord     srt_bitmap;
 597 |        /* In a CONSTR:
 598 |             - the constructor tag
 599 |           In a FUN/THUNK
 600 |             - a bitmap of SRT entries
 601 |        */
 602 | 
 603 | #ifdef TABLES_NEXT_TO_CODE
 604 |     StgCode         code[FLEXIBLE_ARRAY];
 605 | #endif
 606 | } *StgInfoTablePtr;
 607 | ```
 608 | 
 609 | Info tables for stack frames / return addresses have a special layout, though. It is defined by `StgRetInfoTable`:
 610 | 
 611 | ```c
 612 | typedef struct {
 613 | #if defined(TABLES_NEXT_TO_CODE)
 614 |     OFFSET_FIELD(srt_offset);   /* offset to the SRT table */
 615 |     StgInfoTable i;
 616 | #else
 617 |     StgInfoTable i;
 618 |     StgSRT      *srt;   /* pointer to the SRT table */
 619 | #endif
 620 | } StgRetInfoTable;
 621 | ```
 622 | 
 623 | Now, why does `StgRetInfoTable` prepend an additional field instead of append? It would seem more logical to tack it on at the end so we can first treat all info tables as an `StgInfoTable`, inspect the `type` field and then cast to the more specific structure and access additional fields. It has to do with `TABLES_NEXT_TO_CODE` (documented in [Rts/Config](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Config) and [Rts/Storage/HeapObjects#TABLES_NEXT_TO_CODE](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects#TABLES_NEXT_TO_CODE)). Have a look at this excerpt from [ClosuresMacros.h](https://ghc.haskell.org/trac/ghc/browser/ghc/includes/rts/storage/ClosureMacros.h):
 624 | 
 625 | ```c
 626 | /* -----------------------------------------------------------------------------
 627 |    Info tables are slammed up against the entry code, and the label
 628 |    for the info table is at the *end* of the table itself.  This
 629 |    inline function adjusts an info pointer to point to the beginning
 630 |    of the table, so we can use standard C structure indexing on it.
 631 | 
 632 |    Note: this works for SRT info tables as long as you don't want to
 633 |    access the SRT, since they are laid out the same with the SRT
 634 |    pointer as the first word in the table.
 635 | 
 636 |    NOTES ABOUT MANGLED C VS. MINI-INTERPRETER:
 637 | 
 638 |    A couple of definitions:
 639 | 
 640 |        "info pointer"    The first word of the closure.  Might point
 641 |                          to either the end or the beginning of the
 642 |                          info table, depending on whether we're using
 643 |                          the mini interpretter or not.  GET_INFO(c)
 644 |                          retrieves the info pointer of a closure.
 645 | 
 646 |        "info table"      The info table structure associated with a
 647 |                          closure.  This is always a pointer to the
 648 |                          beginning of the structure, so we can
 649 |                          use standard C structure indexing to pull out
 650 |                          the fields.  get_itbl(c) returns a pointer to
 651 |                          the info table for closure c.
 652 | 
 653 |    An address of the form xxxx_info points to the end of the info
 654 |    table or the beginning of the info table depending on whether we're
 655 |    mangling or not respectively.  So,
 656 | 
 657 |          c->header.info = xxx_info
 658 | 
 659 |    makes absolute sense, whether mangling or not.
 660 | 
 661 |    -------------------------------------------------------------------------- */
 662 | ...
 663 | #ifdef TABLES_NEXT_TO_CODE
 664 | #define INFO_PTR_TO_STRUCT(info) ((StgInfoTable *)(info) - 1)
 665 | #define RET_INFO_PTR_TO_STRUCT(info) ((StgRetInfoTable *)(info) - 1)
 666 | ...
 667 | #else
 668 | #define INFO_PTR_TO_STRUCT(info) ((StgInfoTable *)info)
 669 | #define RET_INFO_PTR_TO_STRUCT(info) ((StgRetInfoTable *)info)
 670 | ...
 671 | #endif
 672 | 
 673 | ```
 674 | 
 675 | Our pointer actually points to the closure's entry code (`StgInfoTable.code`), and the info table prepends it. We need to subtract the structure size from it. We can also see this in the [gdb macros](https://ghc.haskell.org/trac/ghc/attachment/wiki/Debugging/CompiledCode/.gdbinit) mentioned earlier:
 676 | 
 677 |     define pinfo
 678 |     p *((StgInfoTable *)$arg0-1)
 679 |     end
 680 | 
 681 |     define prinfo
 682 |     p *((StgRetInfoTable *)$arg0-1)
 683 |     end
 684 | 
 685 | GHC's runtime does 'append' the additional fields, it's just the definition of beginning and end that are reversed, like with top / bottom of the stack.
 686 | 
 687 | To go over the stack we need to know the payload size coming after the `StgHeader`. Remember that stack frames have [bitmap layout](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects#Bitmaplayout) (also please have a quick look at that). The `StgClosureInfo` structure from the info table tells us about the size of the payload:
 688 | 
 689 | ```c
 690 | /*
 691 |  * Stuff describing the closure layout.  Well, actually, it might
 692 |  * contain the selector index for a THUNK_SELECTOR.  This union is one
 693 |  * word long.
 694 |  */
 695 | typedef union {
 696 |     struct {                    /* Heap closure payload layout: */
 697 |         StgHalfWord ptrs;       /* number of pointers */
 698 |         StgHalfWord nptrs;      /* number of non-pointers */
 699 |     } payload;
 700 | 
 701 |     StgWord bitmap;               /* word-sized bit pattern describing */
 702 |                                   /*  a stack frame: see below */
 703 | 
 704 | #ifndef TABLES_NEXT_TO_CODE
 705 |     StgLargeBitmap* large_bitmap; /* pointer to large bitmap structure */
 706 | #else
 707 |     OFFSET_FIELD(large_bitmap_offset);  /* offset from info table to large bitmap structure */
 708 | #endif
 709 | 
 710 |     StgWord selector_offset;      /* used in THUNK_SELECTORs */
 711 | 
 712 | } StgClosureInfo;
 713 | ```
 714 | 
 715 | The payload size can be extracted from the size bits from `bitmap`, preferably using the `BITMAP_SIZE` macro as the layout of the word changes between 32 / 64 bit. This will get us pretty far, but there are some closures on the stack that require special treatment. Fortunately, the RTS provides a convenient function taking care of it all. Looking at `stack_frame_sizeW` from [ClosuresMacros.h](https://ghc.haskell.org/trac/ghc/browser/ghc/includes/rts/storage/ClosureMacros.h):
 716 | 
 717 | ```c
 718 | /* -----------------------------------------------------------------------------
 719 |    Sizes of stack frames
 720 |    -------------------------------------------------------------------------- */
 721 | 
 722 | EXTERN_INLINE StgWord stack_frame_sizeW( StgClosure *frame );
 723 | EXTERN_INLINE StgWord stack_frame_sizeW( StgClosure *frame )
 724 | {
 725 |     StgRetInfoTable *info;
 726 | 
 727 |     info = get_ret_itbl(frame);
 728 |     switch (info->i.type) {
 729 | 
 730 |     case RET_DYN:
 731 |     {
 732 |         StgRetDyn *dyn = (StgRetDyn *)frame;
 733 |         return  sizeofW(StgRetDyn) + RET_DYN_BITMAP_SIZE +
 734 |             RET_DYN_NONPTR_REGS_SIZE +
 735 |             RET_DYN_PTRS(dyn->liveness) + RET_DYN_NONPTRS(dyn->liveness);
 736 |     }
 737 | 
 738 |     case RET_FUN:
 739 |         return sizeofW(StgRetFun) + ((StgRetFun *)frame)->size;
 740 | 
 741 |     case RET_BIG:
 742 |         return 1 + GET_LARGE_BITMAP(&info->i)->size;
 743 | 
 744 |     case RET_BCO:
 745 |         return 2 + BCO_BITMAP_SIZE((StgBCO *)((P_)frame)[1]);
 746 | 
 747 |     default:
 748 |         return 1 + BITMAP_SIZE(info->i.layout.bitmap);
 749 |     }
 750 | }
 751 | ```
 752 | 
 753 | The W postfix means the function returns the size in words, not bytes.
 754 | 
 755 | How do we know when we reached the bottom of the stack? If we have access to the `StgStack` structure (i.e. from the TSO)
 756 | 
 757 | ```c
 758 | typedef struct StgStack_ {
 759 |     StgHeader header;
 760 |     StgWord32 stack_size;   // stack size in *words*
 761 |     StgWord32 dirty;        // non-zero => dirty
 762 |     StgPtr sp;              // current stack pointer
 763 |     StgWord stack[FLEXIBLE_ARRAY];
 764 | } StgStack;
 765 | ```
 766 | 
 767 | it's rather obvious as we have the stack pointer, start and size of the current stack chunk. If not, we can look for a closure of type `STOP_FRAME` and use this as our bottom marker.
 768 | 
 769 | One last type of stack frame we need to discuss is the underflow frame. The STG stack is implemented as a linked-list of chunks (see [Implement stack chunks and separate TSO/STACK objects](https://ghc.haskell.org/trac/ghc/changeset/f30d527344db528618f64a25250a3be557d9f287/ghc) and [this](https://ghc.haskell.org/trac/ghc/blog/stack-chunks) post from the [GHC blog](https://ghc.haskell.org/trac/ghc/blog/)). Here's the closure of [type](https://ghc.haskell.org/trac/ghc/browser/ghc/includes/rts/storage/ClosureTypes.h) `UNDERFLOW_FRAME` from [Closures.h](https://ghc.haskell.org/trac/ghc/browser/ghc/includes/rts/storage/Closures.h):
 770 | 
 771 | ```c
 772 | typedef struct {
 773 |     const StgInfoTable* info;
 774 |     struct StgStack_ *next_chunk;
 775 | } StgUnderflowFrame;
 776 | ```
 777 | 
 778 | The contained stack object tells us everything we need to know to continue traversing.
 779 | 
 780 | GHC already has some internal code for printing out various heap and stack objects, see [Printer.h](https://ghc.haskell.org/trac/ghc/browser/ghc/rts/Printer.h) and [Printer.c](https://ghc.haskell.org/trac/ghc/browser/ghc/rts/Printer.c). With functions like `printStackChunk` and `prettyPrintClosure` its basically a cookbook for how to traverse and trace the contents of RTS objects.
 781 | 
 782 | All basic building blocks for traversing the STG stack are in place. It's still fairly difficult to extract useful information from it, but it's the best we have without a profiling RTS.
 783 | 
 784 | ### Automatic stack capture
 785 | 
 786 | The last remaining task is to write some code to automate the often tedious steps required to extract the call stack. There are at least three different ways we could go about this.
 787 | 
 788 | One approach would be to automate `gdb` through its [scripting capabilities](http://stackoverflow.com/questions/10748501/automating-gdb-sessions) and [machine interface](https://sourceware.org/gdb/onlinedocs/gdb/GDB_002fMI.html). While this is certainly viable for debugging, it might be a bit slow for more frequent, performance sensitive extraction of call stack information (i.e. for a profiler).
 789 | 
 790 | Another implementation is installing a [signal handler](http://en.wikipedia.org/wiki/Unix_signal) inside our program, catching the `SIGBUS` / `SIGSEGV` / etc. signals and performing the stack walk right there. This is complicated by the short list of functions which can be [safely called from a signal handler](https://www.securecoding.cert.org/confluence/display/seccode/SIG30-C.+Call+only+asynchronous-safe+functions+within+signal+handlers). Given the nature of our signals, the heap might already be corrupted and any information retrieval difficult to carry out.
 791 | 
 792 | The method this document will actually describe is writing a minimal stand-alone debugger, inspecting the crashed process from the outside. A more complicated solution, but the most flexible and versatile one.
 793 | 
 794 | ### atos
 795 | 
 796 | Outside of `gdb`, we need a way for our program to translate addresses into symbol names. That gap will be filled by the OS X [atos](https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/atos.1.html) command line utility. With a live process, it can easily resolve any address.
 797 | 
 798 | Linux provides atos-like functionality through [`addr2line`](http://www.linuxcommand.org/man_pages/addr2line1.html), Windows has the `Sym*` APIs from [DbgHelp](http://msdn.microsoft.com/en-us/library/windows/desktop/ms679291%28v=vs.85%29.aspx).
 799 | 
 800 | The `atos` communication, wrapping and parsing code will be handled by the [ATOS_Pipe](https://github.com/blitzcode/rsvp/blob/94c167bcf2a82092a8fe21eeb2e7a5a6b2d8fc77/src/prof.cpp#L128) and [SymbolManager](https://github.com/blitzcode/rsvp/blob/94c167bcf2a82092a8fe21eeb2e7a5a6b2d8fc77/src/prof.cpp#L181) classes from [rsvp](https://github.com/blitzcode/rsvp/).
 801 | 
 802 | With those classes, getting the symbol name string for a program counter value is very simple:
 803 | 
 804 | ```c
 805 | symbol->SymbolIDToName(symbol->AddressToSymbolID(x86ts32.__eip))
 806 | ```
 807 | 
 808 | ### Inspecting another process
 809 | 
 810 | To apply the stack traversal and symbol resolution methods described, we need to have access to the memory, threads and registers of the debugged process. We also need to be notified once it segfaults. On Linux the [`ptrace`](http://linux.die.net/man/2/ptrace) call should be able to provide everything we need. Unfortunately, OS X has a rather [gimped implementation](https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/ptrace.2.html) of that API. See [uninformed.org](http://uninformed.org/index.cgi?v=4&a=3&p=14) for a discussion of workarounds for OS X, and this [Linux Journal](http://www.linuxjournal.com/article/6100) article for an explanation of how it works on Linux.
 811 | 
 812 | Our usage of `ptrace` boils down to initiating the tracing relationship with the child like this
 813 | 
 814 | ```c
 815 | pid_t child = fork();
 816 | if (child == 0)
 817 | {
 818 |     // Let parent trace us
 819 |     ptrace(PT_TRACE_ME, 0, NULL, NULL);
 820 | 
 821 |     if (execv(argv[1], &argv[1]) == -1)
 822 |         assert("execl() failed, can't load child executable");
 823 | }
 824 | else
 825 | {
 826 |     DebugLoop();
 827 |     kill(child, SIGKILL);
 828 | }
 829 | ```
 830 | 
 831 | and then letting the child continue every time we receive a signal from it (`waitpid`)
 832 | 
 833 | ```c
 834 | void PTraceContinue(pid_t pid)
 835 | {
 836 |     // Child process stopped after signalling us, let it proceed
 837 |     ptrace(PT_CONTINUE, pid, (caddr_t) 1, 0);
 838 | }
 839 | ```
 840 | 
 841 | The `DebugLoop` function handling events from the child / target process is structured as follows
 842 | 
 843 | ```c
 844 | void DebugLoop(const TargetInfo& ti)
 845 | {
 846 |     while (true)
 847 |     {
 848 |         int status;
 849 |         waitpid(ti.m_pid, &status, 0);
 850 | 
 851 |         // Handle signal
 852 |         if (WIFEXITED(status))
 853 |             break;
 854 |         else if (WIFSIGNALED(status))
 855 |             break;
 856 |         else if (WIFSTOPPED(status))
 857 |         {
 858 |             const int sig = WSTOPSIG(status);
 859 |             switch (sig)
 860 |             {
 861 |                 // Fault?
 862 |                 case SIGSEGV:
 863 |                 case SIGBUS:
 864 |                 case SIGILL:
 865 |                 case SIGFPE:
 866 |                 case SIGSYS:
 867 |                     TraceProcessState(ti);
 868 |                     return;
 869 |                 // Continue
 870 |                 default:
 871 |                     PTraceContinue(ti.m_pid);
 872 |                     break;
 873 |             }
 874 |         }
 875 |     }
 876 | }
 877 | ```
 878 | 
 879 | ### Mach
 880 | 
 881 | We'll need to replace some of the missing `ptrace` functionality through [Mach kernel](http://en.wikipedia.org/wiki/Mach_%28kernel%29) functions. Mach is the underlying kernel on which OS X implements its BSD / POSIX layer. Mach in OS X is somewhat sparsely documented, but see [uninformed.org](http://uninformed.org/index.cgi?v=4&a=3), [developer.apple.com](https://developer.apple.com/library/mac/documentation/Darwin/Conceptual/KernelProgramming/About/About.html#//apple_ref/doc/uid/TP30000905-CH204-TPXREF101), [Mach IPC Interface](http://web.mit.edu/darwin/src/modules/xnu/osfmk/man/) and [Mac OS X Internals: A Systems Approach](http://www.amazon.com/Mac-OS-Internals-Systems-Approach/dp/0321278542/) (a somewhat dated but still useful book). The Mach APIs can provide us with full access to the debugged process.
 882 | 
 883 | For Windows, similar functionality can be accessed through APIs like `CreateRemoteThread`, `SuspendThread`, `GetThreadContext`, `ResumeThread` and `StackWalk64`.
 884 | 
 885 | ### Mach crash course
 886 | 
 887 | A full Mach kernel API tutorial is out of scope for this document, but here are the bits we need for stack capture.
 888 | 
 889 | The Mach headers can be found in `/usr/include/mach/` and subdirectories. Include like this
 890 | 
 891 | ```c
 892 | #include <mach/mach.h>
 893 | #include <mach/mach_vm.h> // for mach_vm_ instead of vm_
 894 | ```
 895 | 
 896 | We need a port for the task to be debugged
 897 | 
 898 | ```c
 899 | task_for_pid(mach_task_self(), pid, &m_task_port);
 900 | ```
 901 | 
 902 | Obtain a list of threads from the target process
 903 | 
 904 | ```c
 905 | thread_act_port_array_t thread_list;
 906 | mach_msg_type_number_t thread_count;
 907 | task_threads(ti.m_task_port, &thread_list, &thread_count);
 908 | ```
 909 | 
 910 | Query thread scheduling etc. information
 911 | 
 912 | ```c
 913 | thread_basic_info tbi;
 914 | mach_msg_type_number_t thread_info_count = THREAD_BASIC_INFO_COUNT;
 915 | thread_info(thread_list[thread_idx],
 916 |             THREAD_BASIC_INFO,
 917 |             (thread_info_t) &tbi,
 918 |             &thread_info_count);
 919 | ```
 920 | 
 921 | Skip threads which were not running when we got the fault
 922 | 
 923 | ```c
 924 | if (tbi.flags & TH_FLAGS_SWAPPED)
 925 |     continue;
 926 | ```
 927 | 
 928 | Query thread exception information (trap number, fault address etc.)
 929 | 
 930 | ```c
 931 | x86_exception_state32_t x86es32;
 932 | mach_msg_type_number_t exception_state_count32 = x86_EXCEPTION_STATE32_COUNT;
 933 | thread_get_state(thread_list[thread_idx],
 934 |                  x86_EXCEPTION_STATE32,
 935 |                  (thread_state_t) &x86es32,
 936 |                  &exception_state_count32);
 937 | ```
 938 | 
 939 | Query thread registers (program counter, frame pointer, etc.)
 940 | 
 941 | ```c
 942 | x86_thread_state32_t x86ts32;
 943 | mach_msg_type_number_t thread_state_count32 = x86_THREAD_STATE32_COUNT;
 944 | thread_get_state(thread_list[thread_idx],
 945 |                  x86_THREAD_STATE32,
 946 |                  (thread_state_t) &x86ts32,
 947 |                  &thread_state_count32);
 948 | ```
 949 | 
 950 | And finally, read memory from the target process
 951 | 
 952 | ```c
 953 | kern_return_t ReadMemory(mach_port_name_t task, target_ptr_t address, target_ptr_t size, void *out)
 954 | {
 955 |     mach_vm_size_t inout_size = size;
 956 |     kern_return_t ret = mach_vm_read_overwrite(
 957 |         task,
 958 |         address,
 959 |         inout_size,
 960 |         (mach_vm_address_t) (uintptr_t) out,
 961 |         &inout_size);
 962 |     if (ret != KERN_SUCCESS)
 963 |         return ret;
 964 |     if (inout_size != size)
 965 |         return KERN_FAILURE;
 966 |     return KERN_SUCCESS;
 967 | }
 968 | ```
 969 | 
 970 | ### Other obstacles
 971 | 
 972 | How do we know if our target process is using a profiling RTS or not (CCS vs STG stack traversal)? We can't just call `rts_isProfiled`, but we can use [`nm`](http://unixhelp.ed.ac.uk/CGI/man-cgi?nm) to check for certain symbols, like this:
 973 | 
 974 | ```c++
 975 | bool DoesSymbolExist(const char *executable, const char *symbol)
 976 | {
 977 |     // Use nm to check if a symbol exists
 978 |     char buf[1024];
 979 |     std::snprintf(buf, sizeof(buf), "nm %s | grep %s", executable, symbol);
 980 |     std::FILE *pipe = popen(buf, "r");
 981 |     assert(pipe != NULL);
 982 |     bool ret = std::fread(buf, 1, sizeof(buf), pipe) != 0;
 983 |     pclose(pipe);
 984 |     return ret;
 985 | }
 986 | 
 987 | bool IsProfilingRTS(const char *exe) { return DoesSymbolExist(exe, "_CCS_MAIN"      ); }
 988 | 
 989 | bool IsThreadedRTS (const char *exe) { return DoesSymbolExist(exe, "_createOSThread"); }
 990 | ```
 991 | 
 992 | When our target process stops with a fault, how do we know if we're in Haskell or C etc. code? I don't really have a good answer for that, but this heuristic seems to work in pratice:
 993 | 
 994 | ```c++
 995 | bool IsHaskellSymbol(
 996 |     const TargetInfo& ti,
 997 |     SymbolManager *symbol,
 998 |     uint32_t sym_id,
 999 |     uint16_t file_name_id)
1000 | {
1001 |     // Haskell functions don't have source file debug information
1002 |     if (std::strcmp(symbol->FileIDToName(file_name_id), symbol->GetUnresolvedSymbolName()) != 0)
1003 |         return false;
1004 |     // Assume that all Haskell functions come from our executable
1005 |     if (strcasecmp(symbol->SymbolIDToModule(sym_id), ti.m_module.c_str()) != 0)
1006 |         return false;
1007 |     // Assume that all Haskell functions have certain tokens in their name
1008 |     if (std::strstr(symbol->SymbolIDToName(sym_id), "_info") == NULL)
1009 |         return false;
1010 | 
1011 |     return true;
1012 | }
1013 | ```
1014 | 
1015 | ### The Glorious Haskell Debugger v0.0.1 Pre-alpha
1016 | 
1017 | For the finale, here is a **proof-of-concept** implementation of everything we discussed so far. **It is kinda cobbled together, has many limitations and missing features, but can serve as an example and starting point**. With our discussion of stack traversal, the references given above for `atos` / `ptrace` and the Mach APIs, you will hopefully be able to follow along.
1018 | 
1019 | As with everything else so far, this is for OS X, GHC 7.6.3, 32bit. There's no support for the LLVM backend. The many hints for other platforms throughout this document should make porting fairly smooth, though.
1020 | 
1021 | The full source code is in the [ghd folder](https://github.com/blitzcode/ghc-stack/tree/master/ghd).
1022 | 
1023 | Build with `make`. Run with `sudo ghd ./myfaultyexecutable paramsformyfaultyexecutable`.
1024 | 
1025 | ### Example stack traces
1026 | 
1027 | Let's look at a few examples of stack traces produced by `ghd`.
1028 | 
1029 | Our 'crash' program, error in Haskell code, profiling (CCS) enabled:
1030 | 
1031 | ```
1032 | ghd: Debugging '../crash/dist/build/crash/crash', PID 50929
1033 | ghd: RTS - Threaded: Yes | Profiling: Yes
1034 | ghd: Received signal 'SIGTRAP - trace trap' from child, continuing
1035 | ghd: Child stopped on fault 'SIGSEGV - segmentation violation'
1036 | ghd: Attempting to run stack trace
1037 | ghd: ---------------
1038 | ghd: Thread 1 of 3
1039 | ghd: ---------------
1040 | ghd: Status - Waiting
1041 | ghd: ExceptionState - trapno: 14 (EXC_I386_PGFLT - page fault), err: 6, faultvaddr: 0xeecb89bc
1042 | ghd: Registers
1043 | ghd:   eax: 7325048, ebx: 2176268, ecx: 6316880, edx: 404, edi: 6314680, esi: 6316456, ebp: 0x603cc8
1044 | ghd:   esp: 0xbfffd81c, ss: 31, eflags: 66070, eip: 0x21ee <s1zZ_info>, cs: 23, ds: 31
1045 | ghd:   es: 31, fs: 0, gs: 55
1046 | ghd: Stack Trace
1047 | ghd:  0x21ee <s1zZ_info> [Haskell, switching to CCS]
1048 | ghd:   CCS:0x6fc578 <someFuncC> from Main (Main.hs:(25,1)-(27,39))
1049 | ghd:    CCS:0x6fc4d8 <someFuncB> from Main (Main.hs:(19,1)-(21,13))
1050 | ghd:     CCS:0x6fc398 <main> from Main (Main.hs:(7,1)-(9,13))
1051 | ghd:      CCS:0x211b60 <MAIN> from MAIN (<built-in>)
1052 | ghd: Stack trace done, exiting
1053 | ```
1054 | 
1055 | Our 'crash' program, error in C code, profiling (CCS) enabled:
1056 | 
1057 | ```
1058 | ghd: Debugging '../crash/dist/build/crash/crash', PID 50987
1059 | ghd: RTS - Threaded: Yes | Profiling: Yes
1060 | ghd: Received signal 'SIGTRAP - trace trap' from child, continuing
1061 | ghd: Child stopped on fault 'SIGBUS - bus error'
1062 | ghd: Attempting to run stack trace
1063 | ghd: ---------------
1064 | ghd: Thread 1 of 3
1065 | ghd: ---------------
1066 | ghd: Status - Waiting
1067 | ghd: ExceptionState - trapno: 14 (EXC_I386_PGFLT - page fault), err: 6, faultvaddr: 0x1
1068 | ghd: Registers
1069 | ghd:   eax: 6316476, ebx: 2176268, ecx: 100, edx: 404, edi: 6314680, esi: 6316456, ebp: 0xbfffd7f8
1070 | ghd:   esp: 0xbfffd7f8, ss: 31, eflags: 66182, eip: 0x2773 <someCFuncB>, cs: 23, ds: 31
1071 | ghd:   es: 31, fs: 0, gs: 55
1072 | ghd: Stack Trace
1073 | ghd:  0x2773 <someCFuncB> from crash (crash.c:12)
1074 | ghd:   0x278b <someCFuncA> from crash (crash.c:8)
1075 | ghd:    0x220e <s1Al_info> [Haskell, switching to CCS]
1076 | ghd:     CCS:0x6fc578 <someFuncC> from Main (Main.hs:(25,1)-(29,16))
1077 | ghd:      CCS:0x6fc4d8 <someFuncB> from Main (Main.hs:(19,1)-(21,13))
1078 | ghd:       CCS:0x6fc398 <main> from Main (Main.hs:(7,1)-(9,13))
1079 | ghd:        CCS:0x211b60 <MAIN> from MAIN (<built-in>)
1080 | ghd: Stack trace done, exiting
1081 | ```
1082 | 
1083 | Our 'crash' program, error in C code, profiling (CCS) disabled:
1084 | 
1085 | ```
1086 | ghd: Debugging '../crash/dist/build/crash/crash', PID 50779
1087 | ghd: RTS - Threaded: Yes | Profiling: No
1088 | ghd: Received signal 'SIGTRAP - trace trap' from child, continuing
1089 | ghd: Child stopped on fault 'SIGBUS - bus error'
1090 | ghd: Attempting to run stack trace
1091 | ghd: ---------------
1092 | ghd: Thread 1 of 2
1093 | ghd: ---------------
1094 | ghd: Status - Waiting
1095 | ghd: ExceptionState - trapno: 14 (EXC_I386_PGFLT - page fault), err: 6, faultvaddr: 0x1
1096 | ghd: Registers
1097 | ghd:   eax: 5267468, ebx: 1172556, ecx: 100, edx: 404, edi: 5263432, esi: 5267456, ebp: 0xbfffd7f8
1098 | ghd:   esp: 0xbfffd7f8, ss: 31, eflags: 66182, eip: 0x1f23 <someCFuncB>, cs: 23, ds: 31
1099 | ghd:   es: 31, fs: 0, gs: 55
1100 | ghd: Stack Trace
1101 | ghd:  0x1f23 <someCFuncB> from crash (crash.c:12)
1102 | ghd:   0x1f3b <someCFuncA> from crash (crash.c:8)
1103 | ghd:    0x1d90 <s1AP_info> [Haskell, switching to STG stack]
1104 | ghd:     0x503cd8 <s1AP_info> (RET_SMALL, 4b)
1105 | ghd:      0x503cdc <s1AQ_info> (RET_SMALL, 4b)
1106 | ghd:       0x503ce0 <stg_catch_frame_info> (CATCH_FRAME, 12b, <base_GHCziTopHandler_runIO2_info>)
1107 | ghd:        0x503cec <stg_stop_thread_info> (STOP_FRAME, 4b)
1108 | ghd: Stack trace done, exiting
1109 | ```
1110 | 
1111 | Our 'crash' program, error in Haskell code, profiling (CCS) disabled:
1112 | 
1113 | ```
1114 | ghd: Debugging '../crash/dist/build/crash/crash', PID 50828
1115 | ghd: RTS - Threaded: Yes | Profiling: No
1116 | ghd: Received signal 'SIGTRAP - trace trap' from child, continuing
1117 | ghd: Child stopped on fault 'SIGSEGV - segmentation violation'
1118 | ghd: Attempting to run stack trace
1119 | ghd: ---------------
1120 | ghd: Thread 1 of 2
1121 | ghd: ---------------
1122 | ghd: Status - Waiting
1123 | ghd: ExceptionState - trapno: 14 (EXC_I386_PGFLT - page fault), err: 6, faultvaddr: 0xeebb880c
1124 | ghd: Registers
1125 | ghd:   eax: 5267456, ebx: 1172556, ecx: 5267872, edx: 404, edi: 5263432, esi: 5267456, ebp: 0x503cd8
1126 | ghd:   esp: 0xbfffd81c, ss: 31, eflags: 66070, eip: 0x1d70 <s1Ax_info>, cs: 23, ds: 31
1127 | ghd:   es: 31, fs: 0, gs: 55
1128 | ghd: Stack Trace
1129 | ghd:  0x1d70 <s1Ax_info> [Haskell, switching to STG stack]
1130 | ghd:   0x503cd8 <s1Ax_info> (RET_SMALL, 4b)
1131 | ghd:    0x503cdc <s1Ay_info> (RET_SMALL, 4b)
1132 | ghd:     0x503ce0 <stg_catch_frame_info> (CATCH_FRAME, 12b, <base_GHCziTopHandler_runIO2_info>)
1133 | ghd:      0x503cec <stg_stop_thread_info> (STOP_FRAME, 4b)
1134 | ghd: Stack trace done, exiting
1135 | ```
1136 | 
1137 | My [Game of Life](https://github.com/blitzcode/haskell-gol/tree/master/parallel-glfwb) program, error in C++ code, profiling (CCS) disabled:
1138 | 
1139 | ```
1140 | ghd: Debugging '/Users/Tim/haskell-gol/parallel-glfwb/src/out', PID 51007
1141 | ghd: RTS - Threaded: Yes | Profiling: No
1142 | ghd: Received signal 'SIGTRAP - trace trap' from child, continuing
1143 | ghd: Child stopped on fault 'SIGBUS - bus error'
1144 | ghd: Attempting to run stack trace
1145 | ghd: ---------------
1146 | ghd: Thread 3 of 7
1147 | ghd: ---------------
1148 | ghd: Status - Waiting
1149 | ghd: ExceptionState - trapno: 14 (EXC_I386_PGFLT - page fault), err: 6, faultvaddr: 0x1
1150 | ghd: Registers
1151 | ghd:   eax: 256, ebx: 16486418, ecx: 16486162, edx: 16416529, edi: 16485906, esi: 255, ebp: 0xb0100d78
1152 | ghd:   esp: 0xb0100d4c, ss: 31, eflags: 66118, eip: 0x3bb44 <step_grid_interior_segment()>, cs: 23, ds: 31
1153 | ghd:   es: 31, fs: 31, gs: 55
1154 | ghd: Stack Trace
1155 | ghd:  0x3bb44 <step_grid_interior_segment()> from out (gol.cpp:36)
1156 | ghd:   0x3bf3d <step_grid> from out (gol.cpp:111)
1157 | ghd:    0x365b3 <s10vm_info> [Haskell, switching to STG stack]
1158 | ghd:     0x15163400 <s10vm_info> (RET_SMALL, 36b)
1159 | ghd:      0x15163424 <stg_maskAsyncExceptionszh_ret_info> (RET_SMALL, 4b)
1160 | ghd:       0x15163428 <stg_catch_frame_info> (CATCH_FRAME, 12b, <sAuF_info>)
1161 | ghd:        0x15163434 <sAEY_info> (RET_SMALL, 12b)
1162 | ghd:         0x15163440 <stg_unmaskAsyncExceptionszh_ret_info> (RET_SMALL, 4b)
1163 | ghd:          0x15163444 <sAF0_info> (RET_SMALL, 52b)
1164 | ghd:           0x15163478 <sAGo_info> (RET_SMALL, 52b)
1165 | ghd:            0x151634ac <stg_catch_frame_info> (CATCH_FRAME, 12b, <base_GHCziConcziSync_forkIO2_info>)
1166 | ghd:             0x151634b8 <stg_stop_thread_info> (STOP_FRAME, 4b)
1167 | ghd: Stack trace done, exiting
1168 | ```
1169 | 
1170 | My [twitter client](https://github.com/blitzcode/jacky), error in Haskell code, profiling
1171 | (CCS) enabled:
1172 | 
1173 | ```
1174 | ghd: Debugging './jacky', PID 47395
1175 | ghd: RTS - Threaded: Yes | Profiling: Yes
1176 | ghd: Received signal 'SIGTRAP - trace trap' from child, continuing
1177 | ghd: Received signal 'SIGCHLD - child status has changed' from child, continuing
1178 | ghd: Child stopped on fault 'SIGBUS - bus error'
1179 | ghd: Attempting to run stack trace
1180 | ghd: ---------------
1181 | ghd: Thread 1 of 8
1182 | ghd: ---------------
1183 | ghd: Status - Waiting
1184 | ghd: ExceptionState - trapno: 14 (EXC_I386_PGFLT - page fault), err: 7, faultvaddr: 0x18435000
1185 | ghd: Registers
1186 | ghd:   eax: 400244736, ebx: 75534156, ecx: 1704960, edx: 435625360, edi: 71898364, esi: 71898149, ebp: 0x43edbd4
1187 | ghd:   esp: 0xbfffd93c, ss: 31, eflags: 66050, eip: 0x1ad4f01 <s1bQ_info>, cs: 23, ds: 31
1188 | ghd:   es: 31, fs: 0, gs: 55
1189 | ghd: Stack Trace
1190 | ghd:  0x1ad4f01 <s1bQ_info> [Haskell, switching to CCS]
1191 | ghd:   CCS:0x19f71d90 <drawQuad.uw> from QuadRendering (QuadRendering.hs:379:15-49)
1192 | ghd:    CCS:0x19f71b60 <drawQuad> from QuadRendering (QuadRendering.hs:(334,1)-(423,38))
1193 | ghd:     CCS:0x19f71480 <drawText.\> from FontRendering (FontRendering.hs:(192,83)-(211,57))
1194 | ghd:      CCS:0x46737d0 <drawText> from FontRendering (FontRendering.hs:(189,1)-(212,46))
1195 | ghd:       CCS:0x4673500 <text> from UI (UI.hs:(190,1)-(193,61))
1196 | ghd:        CCS:0x4673460 <frameAbsolute> from UI (UI.hs:141:1-51)
1197 | ghd:         CCS:0x4673370 <split> from UI (UI.hs:(110,1)-(113,28))
1198 | ghd:          CCS:0x437be50 <fontRenderingTest> from App (App.hs:(123,1)-(162,16))
1199 | ghd:           CCS:0x4682700 <layer> from UI (UI.hs:147:1-53)
1200 | ghd:            CCS:0x4682000 <runUI> from UI (UI.hs:(160,1)-(161,31))
1201 | ghd:             CCS:0x4678f90 <draw.\> from App (App.hs:(95,9)-(120,21))
1202 | ghd:              CCS:0x4678f40 <withQuadRenderBuffer.\.\.\> from QuadRendering (QuadRendering.hs:(181,23)-(191,40))
1203 | ghd:               CCS:0x4678ef0 <withQuadRenderBuffer.\.\> from QuadRendering (QuadRendering.hs:(180,30)-(191,40))
1204 | ghd:                CCS:0x4678e50 <withQuadRenderBuffer.\> from QuadRendering (QuadRendering.hs:(173,28)-(198,17))
1205 | ghd:                 CCS:0x4678db0 <withQuadRenderBuffer> from QuadRendering (QuadRendering.hs:(169,1)-(205,52))
1206 | ghd:                  CCS:0x4678770 <draw> from App (App.hs:(82,1)-(120,21))
1207 | ghd:                   CCS:0x467af60 <run.loop> from App (App.hs:(361,9)-(382,27))
1208 | ghd:                    CCS:0x441eef0 <run> from App (App.hs:(342,1)-(383,12))
1209 | ghd:                     CCS:0x441ee40 <main.\.\.\.\.\.\> from Main (Main.hs:(270,60)-(306,85))
1210 | ghd:                      CCS:0x441edf0 <withFontRenderer.\.\> from FontRendering (FontRendering.hs:(62,13)-(75,17))
1211 | ghd:                       CCS:0x441eda0 <withFontRenderer.\> from FontRendering (FontRendering.hs:(54,9)-(75,17))
1212 | ghd:                        CCS:0x441ed50 <withFontRenderer> from FontRendering (FontRendering.hs:(52,1)-(75,17))
1213 | ghd:                         CCS:0x441ed00 <main.\.\.\.\.\> from Main (Main.hs:(266,21)-(306,85))
1214 | ghd:                          CCS:0x441ecb0 <withQuadRenderer> from QuadRendering (QuadRendering.hs:(77,1)-(134,28))
1215 | ghd:                           CCS:0x441ec60 <main.\.\.\.\> from Main (Main.hs:(262,19)-(306,85))
1216 | ghd:                            CCS:0x441ec10 <withTextureCache.\> from TextureCache (TextureCache.hs:(63,9)-(77,13))
1217 | ghd:                             CCS:0x441ebc0 <withTextureCache> from TextureCache (TextureCache.hs:(49,1)-(77,13))
1218 | ghd:                              CCS:0x441eb70 <main.\.\.\> from Main (Main.hs:(253,17)-(306,85))
1219 | ghd:                               CCS:0x441eb20 <withWindow> from GLFWHelpers (GLFWHelpers.hs:(16,1)-(33,9))
1220 | ghd:                                CCS:0x441ead0 <main.\.\> from Main (Main.hs:(247,52)-(306,85))
1221 | ghd:                                 CCS:0x441ea80 <withImageCache.\> from ImageCache (ImageCache.hs:130:16-19)
1222 | ghd:                                  CCS:0x441ea30 <withImageCache> from ImageCache (ImageCache.hs:(93,1)-(130,20))
1223 | ghd:                                   CCS:0x441e9e0 <withProcessStatusesAsync> from Main (Main.hs:(121,1)-(152,17))
1224 | ghd:                                    CCS:0x441e990 <main.\.withProcessStatusesAsync'> from Main (Main.hs:(232,15)-(237,77))
1225 | ghd:                                     CCS:0x441e940 <main.\.withPSAsync> from Main (Main.hs:(219,15)-(231,36))
1226 | ghd:                                      CCS:0x441e700 <main.\> from Main (Main.hs:(216,31)-(306,85))
1227 | ghd:                                       CCS:0x44f5000 <main> from Main (Main.hs:(155,1)-(307,32))
1228 | ghd:                                        CCS:0x24426a0 <MAIN> from MAIN (<built-in>)
1229 | ghd: Stack trace done, exiting
1230 | ```
1231 | 
1232 | Seems to work pretty well!
1233 | 
1234 | # Profiling
1235 | 
1236 | How is what we've discussed so far related to profiling? A profiler often inspects a program in a similar way to a debugger. Sometimes, [your debugger is the profiler](http://poormansprofiler.org/). In case the term 'sampling profiler' is unclear, head over to [Wikipedia](http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29#Statistical_profilers).
1237 | 
1238 | ### rsvp
1239 | 
1240 | I previously developed a realtime profiling tool called rsvp. It provides source level profiling with call tree analysis for C, C++ and Objective-C programs. It also monitors things like memory allocations, disk I/O and thread scheduling. The source and documentation might be worth a read if you want a working example of some of the things explained here.
1241 | 
1242 | ![rsvp](https://raw.github.com/blitzcode/rsvp/master/img/main.png)
1243 | 
1244 | You can get it [here](https://github.com/blitzcode/rsvp).
1245 | 
1246 | Unfortunately, it does not yet support GHC Haskell programs. The techniques described here should allow rsvp to apply its analysis to them, though. I haven't yet yet decided how to best approach this. Let me know if you have any suggestions or are simply interested in Haskell support for rsvp.
1247 | 
1248 | # Conclusion
1249 | 
1250 | I hope this was useful, perhaps even a bit fun? Maybe you got inspired and will give a shot at developing your own tools and libraries working with GHC's stack and heap, perhaps even hack on GHC itself. Please [contact me](http://www.blitzcode.net/contact.shtml) if you have any comments or spotted mistakes in this document.
1251 | 
1252 | # Legal
1253 | 
1254 | This program is published under the [MIT License](http://en.wikipedia.org/wiki/MIT_License).
1255 | 
1256 | # Author
1257 | 
1258 | Developed by Tim C. Schroeder, visit my [website](http://www.blitzcode.net) to learn more.
1259 | 
1260 | 


--------------------------------------------------------------------------------
/crash/Main.hs:
--------------------------------------------------------------------------------
 1 | 
 2 | module Main where
 3 | 
 4 | import Data.Vector.Mutable
 5 | 
 6 | main :: IO ()
 7 | main = do
 8 |     someFuncA
 9 |     someFuncB
10 | 
11 | {-# NOINLINE someFuncA #-}
12 | someFuncA :: IO ()
13 | someFuncA = do
14 |     v <- new 100
15 |     unsafeWrite v 0 (0 :: Int)
16 | 
17 | {-# NOINLINE someFuncB #-}
18 | someFuncB :: IO ()
19 | someFuncB = do
20 |     someFuncC
21 |     someFuncC
22 | 
23 | {-# NOINLINE someFuncC #-}
24 | someFuncC :: IO ()
25 | someFuncC = do
26 |     v <- new 100
27 |     --unsafeWrite v 1000000000 (0 :: Int) -- Crash (hopefully)
28 |     unsafeWrite v 0 (0 :: Int)
29 |     c_someCFuncA
30 | 
31 | foreign import ccall unsafe "crash.h someCFuncA"
32 |     c_someCFuncA :: IO ()
33 | 
34 | 


--------------------------------------------------------------------------------
/crash/Setup.hs:
--------------------------------------------------------------------------------
1 | import Distribution.Simple
2 | main = defaultMain
3 | 


--------------------------------------------------------------------------------
/crash/crash.c:
--------------------------------------------------------------------------------
 1 | 
 2 | #include "crash.h"
 3 | 
 4 | void someCFuncA()
 5 | {
 6 |     someCFuncB();
 7 |     someCFuncB();
 8 | }
 9 | 
10 | void someCFuncB()
11 | {
12 |     * (char *) 1 = 0;
13 | }
14 | 
15 | 


--------------------------------------------------------------------------------
/crash/crash.cabal:
--------------------------------------------------------------------------------
 1 | -- Initial crash.cabal generated by cabal init.  For further documentation,
 2 | --  see http://haskell.org/cabal/users-guide/
 3 | 
 4 | name:                crash
 5 | version:             0.1.0.0
 6 | -- synopsis:            
 7 | -- description:         
 8 | -- license:             
 9 | license-file:        LICENSE
10 | author:              Tim C. Schroeder
11 | maintainer:          tim@blitzcode.net
12 | -- copyright:           
13 | -- category:            
14 | build-type:          Simple
15 | -- extra-source-files:  
16 | cabal-version:       >=1.10
17 | 
18 | executable crash
19 |   main-is:             Main.hs
20 |   ghc-prof-options: -fprof-auto -caf-all
21 |   ghc-options:      -Wall -rtsopts -O2 -threaded -ddump-stg -ddump-cmm -ddump-to-file
22 | 
23 |   c-sources:        crash.c
24 |   --include-dirs:     /Library/Frameworks/GHC.framework/Versions/7.6.3-i386/usr/lib/ghc-7.6.3/include/
25 |   cc-options:       -std=c99 -g -O3 -fno-omit-frame-pointer -Wall -Wextra -Wno-unused-parameter
26 | 
27 | 
28 |   -- other-modules:       
29 |   -- other-extensions:    
30 |   build-depends:       base >=4.6 && <4.7, vector
31 |   -- hs-source-dirs:      
32 |   default-language:    Haskell2010
33 | 


--------------------------------------------------------------------------------
/crash/crash.h:
--------------------------------------------------------------------------------
 1 | 
 2 | #ifndef CRASH_H
 3 | #define CRASH_H
 4 | 
 5 | __attribute__((noinline)) void someCFuncA();
 6 | __attribute__((noinline)) void someCFuncB();
 7 | 
 8 | #endif // CRASH_H
 9 | 
10 | 


--------------------------------------------------------------------------------
/ghd/Makefile:
--------------------------------------------------------------------------------
 1 | 
 2 | # Compile all C++98 and C99 code
 3 | 
 4 | EXECUTABLE_NAME = ghd
 5 | 
 6 | # Default target depending on main output
 7 | .PHONY : all
 8 | all : $(EXECUTABLE_NAME) 
 9 | 
10 | # General compiler and linker flags
11 | FLAGS = -m32 -g -Wall -Wextra -Wno-unused-parameter -O3 -fno-omit-frame-pointer
12 | 
13 | # Flags specific to C or C++ code
14 | CFLAGS = $(FLAGS) -std=c99 \
15 | 	 -I/Library/Frameworks/GHC.framework/Versions/7.6.3-i386/usr/lib/ghc-7.6.3/include/
16 | CXXFLAGS = $(FLAGS) -std=c++98
17 | 
18 | # Sources and objects. We assume all .c / .cpp files in the directory are meant to
19 | # be compiled
20 | SRC = $(wildcard *.c) $(wildcard *.cpp)
21 | OBJ = $(patsubst %.cpp,%.o,$(patsubst %.c,%.o,$(SRC)))
22 | 
23 | # Rule for producing the main output
24 | LD_OBJ = $(OBJ)
25 | $(EXECUTABLE_NAME) : $(LD_OBJ)
26 | 	$(CXX) -o $@ $(FLAGS) $(LDFLAGS) $(LD_OBJ)
27 | 
28 | # Only include this if we actually build the main output, don't want to
29 | # generate dependencies for the clean targets etc.
30 | ifeq ($(MAKECMDGOALS), )
31 |  
32 | # Automatically generate dependencies with the compiler for each source file
33 | %.d: %.cpp
34 | 	@set -e; rm -f $@; \
35 | 	 $(CXX) -MM $(CXXFLAGS) $< > $@.$$$$; \
36 | 	 sed 's,\($*\)\.o[ :]*,\1.o $@ : ,g' < $@.$$$$ > $@; \
37 | 	 $(RM) $@.$$$$
38 | %.d: %.c
39 | 	@set -e; rm -f $@; \
40 | 	 $(CC) -MM $(CFLAGS) $< > $@.$$$$; \
41 | 	 sed 's,\($*\)\.o[ :]*,\1.o $@ : ,g' < $@.$$$$ > $@; \
42 | 	 $(RM) $@.$$$$
43 | 
44 | # Include the generated dependencies for object and dependency files. Be silent
45 | # during the first compile where the .d files have not been generated yet and
46 | # everything is recompiled
47 | -include $(patsubst %.cpp,%.d,$(patsubst %.c,%.d,$(SRC)))
48 | 
49 | endif
50 | 
51 | # Clean by deleting final output, dependency and object files
52 | .PHONY : clean
53 | clean:
54 | 	$(RM) $(EXECUTABLE_NAME) $(OBJ) $(OBJ:.o=.d)
55 | 
56 | 


--------------------------------------------------------------------------------
/ghd/atos.h:
--------------------------------------------------------------------------------
  1 | 
  2 | #ifndef ATOS_H
  3 | #define ATOS_H
  4 | 
  5 | 
  6 | //
  7 | // atos code reused from rsvp (https://github.com/blitzcode/rsvp)
  8 | //
  9 | 
 10 | 
 11 | // Wrapper around atos tool
 12 | //
 13 | // http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man1/atos.1.html
 14 | //
 15 | // Duplicating the full functionality of this tool would be rather difficult. Its source is not
 16 | // available, some of the underlying APIs are poorly documented, plus it does exactly what we need
 17 | // and seems reasonably fast and reliable
 18 | class ATOS_Pipe
 19 | {
 20 | public:
 21 |     ATOS_Pipe(pid_t pid)
 22 |     {
 23 |         // Check for atos
 24 |         //
 25 |         // Since 10.9 calling atos directly seems to be deprecated. The warning suggest
 26 |         // to invoke it through xcrun, which seems to work fine on 10.6 already
 27 |         //
 28 |         // Also see here:
 29 |         //
 30 |         // http://root.cern.ch/phpBB3/viewtopic.php?f=3&t=17190&start=30
 31 |         // https://github.com/allending/Kiwi/pull/365
 32 |         if (std::system("xcrun atos 2> /dev/null") != 0)
 33 |             assert(!"Can't find 'atos' command line utility - dev. tools not installed?");
 34 | 
 35 |         // TODO: The bi-directional popen() only works with this environment
 36 |         //       variable set to avoid a deadlock due to buffering, nasty
 37 |         if (setenv("NSUnbufferedIO", "YES", 1) != 0)
 38 |             assert(!"setenv failed");
 39 | 
 40 |         char buf[64];
 41 |         std::snprintf(buf, sizeof(buf), "xcrun atos -p %i", pid);
 42 |         m_pipe = popen(buf, "r+");
 43 |         assert(m_pipe != NULL);
 44 |     }
 45 | 
 46 |     ~ATOS_Pipe()
 47 |     {
 48 |         if (m_pipe != NULL)
 49 |             if (pclose(m_pipe ) != 0)
 50 |                 assert(!"pclose() failed");
 51 |     }
 52 | 
 53 |     void AddressToSymbol(uint64_t addr, char *buf, size_t buf_len) const
 54 |     {
 55 |         // Communicate with atos program for address resolution, needs to have
 56 |         // buffering disabled to not deadlock
 57 | 
 58 |         // The addresses need to be in hexadecimal, since 10.7 atos only resolves those
 59 |         if (std::fprintf(m_pipe, "0x%llx\n", addr) <= 0)
 60 |             assert(!"Writing to atos pipe failed");
 61 | 
 62 |         if (std::fgets(buf, buf_len, m_pipe) == NULL)
 63 |             assert(!"Reading from atos pipe failed");
 64 |     }
 65 | 
 66 | protected:
 67 |     std::FILE *m_pipe;
 68 | 
 69 | };
 70 | 
 71 | class SymbolManager
 72 | {
 73 | public:
 74 |     SymbolManager(pid_t pid) :
 75 |         m_atos(pid),
 76 |         m_cache_hit(0),
 77 |         m_cache_miss(0),
 78 |         m_unresolved_sym_name("(Unresolved)")
 79 |     { }
 80 | 
 81 |     uint32_t AddressToSymbolID(
 82 |         uint64_t addr,
 83 |         uint16_t *file_name_id_out = NULL, // Optional source & line information starts here
 84 |         uint16_t *line_number_out  = NULL)
 85 |     {
 86 |         // Resolve an address into a symbol ID common to all addresses resolving into that symbol.
 87 |         // We return source and line information on the spot. We can't cache them here as they are
 88 |         // tied to the address and not the symbol
 89 | 
 90 |         // Check address cache
 91 |         CacheEntry &cache = m_cache[addr % (sizeof(m_cache) / sizeof(CacheEntry))];
 92 |         if (cache.m_pc == addr &&
 93 |             cache.m_sym_id != uint32_t(-1)) // Sometimes we get a null address and a false hit
 94 |         {
 95 |             m_cache_hit++;
 96 |         }
 97 |         else
 98 |         {
 99 |             m_cache_miss++;
100 | 
101 |             // Obtain symbol string from atos
102 |             char symbol[8192];
103 |             m_atos.AddressToSymbol(addr, symbol, sizeof(symbol));
104 | 
105 |             // Module and file name / line
106 |             char module[1024];
107 |             char file_name[1024];
108 |             uint line_number;
109 |             ExtractModuleAndFileName(
110 |                 symbol,
111 |                 module,
112 |                 sizeof(module),
113 |                 file_name,
114 |                 sizeof(file_name),
115 |                 &line_number);
116 | 
117 |             // De-mangle atos output into clean and display friendly symbol name
118 |             PrettyPrintSymbol(symbol);
119 | 
120 |             // Just convert all hex addresses to a single unresolved token
121 |             if (std::strncmp(symbol, "0x", 2) == 0)
122 |                 std::strncpy(symbol, GetUnresolvedSymbolName(), sizeof(symbol));
123 | 
124 |             // Check if we already have that symbol name in the table
125 |             const uint64_t sym_hash = BernsteinHash(symbol) ^ BernsteinHash(module);
126 |             std::map<uint64_t, uint32_t>::iterator it_sym = m_hash_to_sym_id.find(sym_hash);
127 |             if (it_sym == m_hash_to_sym_id.end())
128 |             {
129 |                 // Add to symbol and module name string table
130 |                 uint32_t new_id = uint32_t(m_sym_table.size());
131 |                 m_sym_table.push_back(SymbolName());
132 |                 m_sym_table.back().m_symbol = std::string(symbol);
133 |                 m_sym_table.back().m_module = std::string(module);
134 | 
135 |                 // Hash-to-ID translation entry
136 |                 it_sym = m_hash_to_sym_id.insert
137 |                     (std::map<uint64_t, uint32_t>::value_type(sym_hash, new_id)).first;
138 |             }
139 | 
140 |             // Check if we already have that file name in the table
141 |             const uint64_t file_hash = BernsteinHash(file_name);
142 |             std::map<uint64_t, uint16_t>::iterator it_file = m_hash_to_file_name_id.find(file_hash);
143 |             if (it_file == m_hash_to_file_name_id.end())
144 |             {
145 |                 // Add to file name string table
146 |                 uint16_t new_id = uint16_t(m_file_name_table.size());
147 |                 m_file_name_table.push_back(std::string(file_name));
148 | 
149 |                 // Hash-to-ID translation entry
150 |                 it_file = m_hash_to_file_name_id.insert
151 |                     (std::map<uint64_t, uint16_t>::value_type(file_hash, new_id)).first;
152 |             }
153 | 
154 |             // Update cache
155 |             cache.m_pc = addr;
156 |             cache.m_sym_id = (* it_sym).second;
157 |             cache.m_file_name_id = (* it_file).second;
158 |             cache.m_line_number = line_number;
159 | 
160 |             assert(std::strcmp(symbol,    SymbolIDToName  ((* it_sym ).second)) == 0);
161 |             assert(std::strcmp(module,    SymbolIDToModule((* it_sym ).second)) == 0);
162 |             assert(std::strcmp(file_name, FileIDToName    ((* it_file).second)) == 0);
163 |         }
164 | 
165 |         // Return results from cache
166 |         if (file_name_id_out != NULL)
167 |             (* file_name_id_out) = cache.m_file_name_id;
168 |         if (line_number_out != NULL)
169 |             (* line_number_out) = cache.m_line_number;
170 |         const uint32_t sym_id = cache.m_sym_id;
171 | 
172 |         return sym_id;
173 |     }
174 | 
175 |     const char * SymbolIDToName(uint32_t id) const
176 |     {
177 |         assert(id < m_sym_table.size());
178 |         return m_sym_table[id].m_symbol.c_str();
179 |     }
180 | 
181 |     const char * SymbolIDToModule(uint32_t id) const
182 |     {
183 |         assert(id < m_sym_table.size());
184 |         return m_sym_table[id].m_module.c_str();
185 |     }
186 | 
187 |     const char * FileIDToName(uint16_t id) const
188 |     {
189 |         assert(id < m_file_name_table.size());
190 |         return m_file_name_table[id].c_str();
191 |     }
192 | 
193 |     float GetCacheHitPercentage() const
194 |     {
195 |         return float(m_cache_hit) / float(m_cache_hit + m_cache_miss) * 100.0f;
196 |     }
197 | 
198 |     const char * GetUnresolvedSymbolName() const { return m_unresolved_sym_name.c_str(); }
199 | 
200 | protected:
201 |     ATOS_Pipe m_atos;
202 | 
203 |     // Address -> Symbol ID cache
204 |     uint m_cache_hit;
205 |     uint m_cache_miss;
206 |     struct CacheEntry
207 |     {
208 |         CacheEntry() : m_pc(0), m_sym_id(-1), m_file_name_id(-1), m_line_number(-1) { }
209 |         uint64_t m_pc;
210 |         uint32_t m_sym_id;
211 |         // Have to store this here instead of the symbol table as they vary by address, not symbol
212 |         uint16_t m_file_name_id;
213 |         uint16_t m_line_number;
214 |     } m_cache[65536 * 32]; // 32MB
215 | 
216 |     // Table of unique symbol names and map to translate string hash -> table location
217 |     std::map<uint64_t, uint32_t> m_hash_to_sym_id;
218 |     struct SymbolName
219 |     {
220 |         std::string m_symbol;
221 |         std::string m_module;
222 |     };
223 |     std::vector<SymbolName> m_sym_table;
224 | 
225 |     // Table of unique file names and map to translate string hash -> table location
226 |     std::map<uint64_t, uint16_t> m_hash_to_file_name_id;
227 |     std::vector<std::string> m_file_name_table;
228 | 
229 |     std::string m_unresolved_sym_name;
230 | 
231 |     uint64_t BernsteinHash(const char *str_) const
232 |     {
233 |         // The original Bernstein hash eventually had some collisions, this is a simple
234 |         // 64 bit extension of it
235 | 
236 |         const uint8_t *str = reinterpret_cast<const uint8_t *> (str_);
237 |         uint32_t hash_a = 5381;
238 |         uint32_t hash_b = 5387;
239 |         int c;
240 | 
241 |         while ((c = *str++))
242 |         {
243 |             hash_a = hash_a * 33 ^ c;
244 |             hash_b = hash_b * 35 ^ c;
245 |         }
246 | 
247 |         return uint64_t(hash_a) | (uint64_t(hash_b) << 32);
248 |     }
249 | 
250 |     void ExtractModuleAndFileName(
251 |         const char *sym,
252 |         char *module,
253 |         size_t module_len,
254 |         char *file,
255 |         size_t file_len,
256 |         uint *line_number) const
257 |     {
258 |         // Extract the module and file / line from a symbol string. Can pass NULL for all out
259 |         // parameters to skip them
260 | 
261 |         // Initialize with failure defaults in case we abort
262 |         if (module != NULL)
263 |             std::strncpy(module, GetUnresolvedSymbolName(), module_len);
264 |         if (file != NULL)
265 |             std::strncpy(file, GetUnresolvedSymbolName(), file_len);
266 |         if (line_number != NULL)
267 |             (* line_number) = 0;
268 | 
269 |         // Find module name part
270 |         const char module_token[] = " (in ";
271 |         const char *module_begin = std::strstr(sym, module_token);
272 |         if (module_begin == NULL)
273 |             return; // Not present
274 |         module_begin += std::strlen(module_token);
275 | 
276 |         // Find end of module name part
277 |         const char *module_end = std::strchr(module_begin, ')');
278 |         if (module_end == NULL)
279 |             return; // Must be terminated by closing brace
280 |         const size_t module_idx = module_end - module_begin;
281 | 
282 |         // Extract module name
283 |         if (module != NULL)
284 |         {
285 |             std::strncpy(module, module_begin, module_len);
286 |             module[std::min(module_idx, module_len)] = '\0';
287 |         }
288 | 
289 |         // Find file name part
290 |         const char file_token[] = " (";
291 |         const char *file_begin = std::strstr(module_end, file_token);
292 |         if (file_begin == NULL)
293 |             return; // Not present
294 |         file_begin += std::strlen(file_token);
295 | 
296 |         // Find end of file name part
297 |         const char *file_end = std::strchr(file_begin, ':');
298 |         if (file_end == NULL)
299 |             return; // Need colon
300 |         const size_t file_idx = file_end - file_begin;
301 | 
302 |         // Extract file name
303 |         if (file != NULL)
304 |         {
305 |             std::strncpy(file, file_begin, file_len);
306 |             file[std::min(file_idx, file_len)] = '\0';
307 |         }
308 | 
309 |         // Extract line number
310 |         if (line_number != NULL)
311 |             std::sscanf(file_end + 1, "%i", line_number);
312 |     }
313 | 
314 |     void PrettyPrintSymbol(char *sym) const
315 |     {
316 |         // Convert the output of atos into a name that is readable and compact. We inevitably throw
317 |         // away some information like template arguments, function overloads and module information
318 |         // etc., it's a trade-off. This function also makes certain assumptions on how atos formats
319 |         // its symbols, will likely need to be tweaked if anything changes
320 | 
321 |         if (sym[0] == '+' || sym[0] == '-')
322 |         {
323 |             // Objective C. We just cut off the parameter list and everything after the brackets
324 |             while (*sym++ != '\0')
325 |             {
326 |                 if (*sym == ']' || *sym == ':')
327 |                 {
328 |                     *sym++ = ']';
329 |                     *sym = '\0';
330 |                 }
331 |             }
332 |         }
333 |         else
334 |         {
335 |             // Assume C / C++
336 | 
337 |             // Remove module, source file and offset information
338 |             {
339 |                 char *cut = std::strstr(sym, " (in ");
340 |                 if (cut != NULL)
341 |                     *cut = '\0';
342 |             }
343 | 
344 |             // Remove newline
345 |             if (sym[std::strlen(sym) - 1] == '\n')
346 |                 sym[std::strlen(sym) - 1] = '\0';
347 | 
348 |             // Shorten '(anonymous namespace)' to 'anon'
349 |             {
350 |                 char *anon = NULL;
351 |                 const char search[] = "(anonymous namespace)";
352 |                 const size_t len_s = sizeof(search) - 1;
353 |                 while ((anon = std::strstr(sym, search)))
354 |                 {
355 |                     const char replace[] = "anon";
356 |                     const size_t len_r = sizeof(replace) - 1;
357 |                     std::memcpy(anon, replace, len_r);
358 |                     std::memmove(anon + len_r, anon + len_s, std::strlen(sym + len_s) + 1);
359 |                 }
360 |             }
361 | 
362 |             char *orig_ptr = sym;
363 | 
364 |             // Compact braces and angle brackets
365 |             int angle_level = 0, brace_level = 0;
366 |             char *angle_start = NULL, *brace_start = NULL;
367 |             while (*sym != '\0')
368 |             {
369 |                 // Angle brackets confuse our parser, skip operators which have them
370 |                 const char ops[][16] =
371 |                 {
372 |                     "operator<<=", "operator <<=", "operator>>=", "operator >>=", // Shift Assign
373 |                     "operator<<",  "operator <<",  "operator>>",  "operator >>",  // Shift
374 |                     "operator<",   "operator <",   "operator>",   "operator >",   // Compare
375 |                     "operator->",  "operator ->"                                  // Dereference
376 |                 };
377 |                 for (uint i=0; i<sizeof(ops) / sizeof(ops[0]); i++)
378 |                     if (std::strncmp(sym, ops[i], std::strlen(ops[i])) == 0)
379 |                     {
380 |                         sym += std::strlen(ops[i]);
381 |                         break;
382 |                     }
383 | 
384 |                 // Don't bother inside braces, we just remove it all anyway
385 |                 if (brace_level == 0)
386 |                 {
387 |                     // Increment nesting level and store position of first open angle
388 |                     // bracket so we know where to start deleting
389 |                     if (*sym == '<')
390 |                         if (angle_level++ == 0)
391 |                             angle_start = sym;
392 | 
393 |                     // Decrement nesting level and replace on encountering final angle bracket
394 |                     if (*sym == '>')
395 |                         if (--angle_level == 0)
396 |                         {
397 |                             std::memmove(angle_start + 1, sym, strlen(sym) + 1);
398 |                             sym = angle_start + 1;
399 |                         }
400 |                     assert(angle_level >= 0);
401 |                 }
402 | 
403 |                 // Don't bother inside angle brackets, we just remove it all anyway
404 |                 if (angle_level == 0)
405 |                 {
406 |                     // Increment nesting level and store position of first open
407 |                     // brace so we know where to start deleting
408 |                     if (*sym == '(')
409 |                         if (brace_level++ == 0)
410 |                             brace_start = sym;
411 | 
412 |                     // Decrement nesting level and delete on encountering final brace
413 |                     if (*sym == ')')
414 |                         if (--brace_level == 0)
415 |                         {
416 |                             if (sym - brace_start > 1)
417 |                             {
418 |                                 std::memmove(brace_start + 1, sym, strlen(sym) + 1);
419 |                                 sym = brace_start + 1;
420 |                             }
421 |                         }
422 |                     assert(brace_level >= 0);
423 |                 }
424 | 
425 |                 sym++;
426 |             }
427 |             assert(angle_level == 0);
428 |             assert(brace_level == 0);
429 | 
430 |             // Remove leading types and trailing qualifiers
431 |             {
432 |                 sym = orig_ptr;
433 | 
434 |                 // Trailing const
435 |                 char *const_trail = std::strstr(sym, " const");
436 |                 if (const_trail != NULL)
437 |                     *const_trail = '\0';
438 | 
439 |                 // Leading types (return values) are sometimes included in the symbol, remove them
440 |                 while (*sym != '\0')
441 |                 {
442 |                     // Operator function names have spaces in them, don't throw them
443 |                     // away as leading segments
444 |                     const char ops[][16] = { " operator", ":operator", "$operator" };
445 |                     bool break_outer = false;
446 |                     for (uint i=0; i<sizeof(ops) / sizeof(ops[0]); i++)
447 |                         if (std::strncmp(sym, ops[i], std::strlen(ops[i])) == 0)
448 |                             break_outer = true;
449 |                     if (break_outer)
450 |                         break;
451 | 
452 |                     // Remove all space separated segments before the last one
453 |                     if (*sym == ' ')
454 |                     {
455 |                         std::memmove(orig_ptr, sym + 1, std::strlen(sym) + 1);
456 |                         sym = orig_ptr;
457 |                         continue;
458 |                     }
459 | 
460 |                     sym++;
461 |                 }
462 |             }
463 |         }
464 |     }
465 | };
466 | 
467 | 
468 | #endif // ATOS_H
469 | 
470 | 


--------------------------------------------------------------------------------
/ghd/ghd.cpp:
--------------------------------------------------------------------------------
  1 | 
  2 | #include "ghd.h"
  3 | 
  4 | #include <sys/wait.h>
  5 | #include <sys/types.h>
  6 | #include <sys/ptrace.h>
  7 | #include <sys/sysctl.h>
  8 | 
  9 | #include <signal.h>
 10 | 
 11 | #include <cstdio>
 12 | #include <cstdlib>
 13 | #include <cassert>
 14 | #include <stdint.h>
 15 | 
 16 | #include <vector>
 17 | #include <string>
 18 | #include <map>
 19 | 
 20 | #include "atos.h"
 21 | 
 22 | int main(int argc, char* argv[])
 23 | {
 24 |     if (argc == 1)
 25 |     {
 26 |         std::printf("Usage: %s executable [arguments]\n", argv[0]);
 27 |         return -1;
 28 |     }
 29 | 
 30 |     // Are we root?
 31 |     //
 32 |     // TODO: We could be a bit smarter about this, like explained here
 33 |     //       http://os-tres.net/blog/2010/02/17/mac-os-x-and-task-for-pid-mach-call/
 34 |     //       or just launch the child process with a Mach call, but for this simple
 35 |     //       proof-of-concept it is easiest just to require running as root to get
 36 |     //       the required access
 37 |     if (geteuid() != 0)
 38 |         assert(!"Need to run as root - forgot 'sudo'?");
 39 | 
 40 |     // We expect to be 32bit, like our target. Supporting all combinations of 32 and 64 bit
 41 |     // is absolutely possible, just this proof-of-concept debugger is fixed to a single
 42 |     // architecture to simplify things
 43 |     assert(sizeof(void *) == 4);
 44 | 
 45 |     // Launch process to be debugged, forward all arguments
 46 |     pid_t child = fork();
 47 |     if (child == 0)
 48 |     {
 49 |         // Let parent trace us
 50 |         //
 51 |         // TODO: We could alternatively also handle all of this entirely without ptrace()
 52 |         //       and rely entirely on Mach exception handling
 53 |         ptrace(PT_TRACE_ME, 0, NULL, NULL);
 54 | 
 55 |         if (execv(argv[1], &argv[1]) == -1)
 56 |             assert("execl() failed, can't load child executable");
 57 |     }
 58 |     else if (child == -1)
 59 |         assert(!"fork() failed");
 60 |     else
 61 |     {
 62 |         TargetInfo ti(child, argv[1]);
 63 | 
 64 |         if (ti.m_is_64_bit)
 65 |             assert(!"ghd: Expected 32bit executable");
 66 | 
 67 |         std::printf("ghd: Debugging '%s', PID %i\n", argv[1], child);
 68 | 
 69 |         std::printf("ghd: RTS - Threaded: %s | Profiling: %s\n",
 70 |             IsThreadedRTS(argv[1]) ? "Yes" : "No",
 71 |             IsProfilingRTS(argv[1]) ? "Yes" : "No");
 72 | 
 73 |         DebugLoop(ti);
 74 |         kill(child, SIGKILL);
 75 |     }
 76 | 
 77 |     return 0;
 78 | }
 79 | 
 80 | // Since we can't call rts_isProfiled() or such, check the flavor of the RTS by checking
 81 | // for the presence of symbols
 82 | //
 83 | // TODO: Could maybe just check for the library 'ribHSrts_thr_p.a(darwin.o)' etc.?
 84 | //
 85 | bool IsProfilingRTS(const char *exe) { return DoesSymbolExist(exe, "_CCS_MAIN"      ); }
 86 | bool IsThreadedRTS (const char *exe) { return DoesSymbolExist(exe, "_createOSThread"); }
 87 | bool DoesSymbolExist(const char *executable, const char *symbol)
 88 | {
 89 |     // Use nm to check if a symbol exists
 90 |     char buf[1024];
 91 |     std::snprintf(buf, sizeof(buf), "nm %s | grep %s", executable, symbol);
 92 |     std::FILE *pipe = popen(buf, "r");
 93 |     assert(pipe != NULL);
 94 |     bool ret = std::fread(buf, 1, sizeof(buf), pipe) != 0;
 95 |     pclose(pipe);
 96 |     return ret;
 97 | }
 98 | 
 99 | bool IsProcess64Bit(pid_t pid)
100 | {
101 |     int mib[4] = { CTL_KERN, KERN_PROC, KERN_PROC_PID, 0 };
102 |     mib[3] = pid;
103 |     size_t len = sizeof(kinfo_proc);
104 |     kinfo_proc kp;
105 | 
106 |     if (sysctl(mib, 4, &kp, &len, NULL, 0) == -1)
107 |         assert(!"sysctl() failed");
108 | 
109 |     return kp.kp_proc.p_flag & P_LP64 != 0;
110 | }
111 | 
112 | TargetInfo::TargetInfo(pid_t pid, const char *executable)
113 | {
114 |     // Process
115 |     m_pid = pid;
116 |     m_executable = executable;
117 |     // Configuration
118 |     m_threaded_rts = IsThreadedRTS(executable);
119 |     m_profiling_rts = IsProfilingRTS(executable);
120 |     m_is_64_bit = IsProcess64Bit(pid);
121 |     // Extract name of the executable module
122 |     const char *module_name = &executable[std::strlen(executable) - 1];
123 |     while (module_name != executable)
124 |     {
125 |         if (* module_name == '/')
126 |         {
127 |             module_name++;
128 |             break;
129 |         }
130 |         module_name--;
131 |     }
132 |     m_module = module_name;
133 |     // We need a port for the task to be debugged
134 |     if (task_for_pid(mach_task_self(), pid, &m_task_port) != KERN_SUCCESS)
135 |         assert(!"Can't get port for task");
136 | }
137 | 
138 | TargetInfo::~TargetInfo()
139 | {
140 |     if (mach_port_deallocate(mach_task_self(), m_task_port) != KERN_SUCCESS)
141 |         assert(!"Can't deallocate port");
142 | }
143 | 
144 | bool TargetInfo::ReadMemoryArg(target_ptr_t address, target_ptr_t size, void *buf) const
145 | {
146 |     kern_return_t ret = ::ReadMemory(m_task_port, address, size, buf);
147 |     return (ret == KERN_SUCCESS);
148 | }
149 | 
150 | void * TargetInfo::ReadMemoryReturn(target_ptr_t address, target_ptr_t size) const
151 | {
152 |     static char buf[65536];
153 |     if (size > sizeof(buf))
154 |         return NULL;
155 |     bool ret = ReadMemoryArg(address, size, buf);
156 |     return ret ? buf : NULL;
157 | }
158 | 
159 | target_ptr_t TargetInfo::ReadMemoryPtr(target_ptr_t address) const
160 | {
161 |     // Return the pointer value at the given address. Returns 0 both in case of
162 |     // a pointer to a null pointer and an actual failure to read the pointer value
163 |     // in the first place
164 |     target_ptr_t *ptr_val =
165 |         reinterpret_cast<target_ptr_t *> (ReadMemoryReturn(address, sizeof(target_ptr_t)));
166 |     return (ptr_val == NULL) ? 0 : (* ptr_val);
167 | }
168 | 
169 | bool TargetInfo::ReadMemoryString(target_ptr_t address, char *buf_out, size_t buf_size) const
170 | {   
171 |     unsigned int i;
172 |     for (i=0; i<buf_size; i++)
173 |     {
174 |         // TODO: Very slow, shouldn't matter for debugging, though
175 |         if (ReadMemoryArg(address + i, 1, &buf_out[i]) == false)
176 |            return false; 
177 |         if (buf_out[i] == 0)
178 |             break;
179 |         if (i == buf_size - 1) // Null terminate and fail if we're out of space
180 |         {
181 |             buf_out[buf_size - 1] = 0;
182 |             return false;
183 |         }
184 |     }
185 |     return true;
186 | }
187 | 
188 | void PTraceContinue(pid_t pid)
189 | {
190 |     // Child process stopped after signalling us, let it proceed
191 |     ptrace(PT_CONTINUE, pid, (caddr_t) 1, 0);
192 | }
193 | 
194 | void DebugLoop(const TargetInfo& ti)
195 | {
196 |     while (true)
197 |     {
198 |         // Wait for the child to exit or generate a signal requiring our attention
199 |         int status;
200 |         if (waitpid(ti.m_pid, &status, 0) == -1)
201 |         {
202 |             assert(!"waitpid() failed");
203 |             break;
204 |         }
205 | 
206 |         // Handle signal
207 |         if (WIFEXITED(status))
208 |         {
209 |             std::printf("ghd: Child done (%i), exiting\n", WEXITSTATUS(status));
210 |             break;
211 |         }
212 |         else if (WIFSIGNALED(status))
213 |         {
214 |             std::printf("ghd: Child terminated (%s), exiting\n",
215 |                 SignalToString(WTERMSIG(status)));
216 |             break;
217 |         }
218 |         else if (WIFSTOPPED(status))
219 |         {
220 |             // Child has stopped, forwarding us some a signal. Stack trace on fault or continue
221 |             const int sig = WSTOPSIG(status);
222 |             switch (sig)
223 |             {
224 |                 // Fault?
225 |                 case SIGSEGV:
226 |                 case SIGBUS:
227 |                 case SIGILL:
228 |                 case SIGFPE:
229 |                 case SIGSYS:
230 |                     std::printf("ghd: Child stopped on fault '%s'\n", SignalToString(sig));
231 |                     TraceProcessState(ti);
232 |                     return;
233 |                 case SIGALRM:
234 |                     // Just ignore this one silently
235 |                     PTraceContinue(ti.m_pid);
236 |                     break;
237 |                 // Continue
238 |                 default:
239 |                     std::printf(
240 |                         "ghd: Received signal '%s' from child, continuing\n",
241 |                         SignalToString(sig));
242 |                     PTraceContinue(ti.m_pid);
243 |                     break;
244 |             }
245 |         }
246 |         else
247 |             std::printf("ghd: Unknown event from child\n");
248 |     }
249 | }
250 | 
251 | void TraceProcessState(const TargetInfo& ti)
252 | {
253 |     std::printf("ghd: Attempting to run stack trace\n");
254 | 
255 |     // Initialize atos late to reduce startup time and make sure our child
256 |     // has actually run exec()
257 |     std::auto_ptr<SymbolManager> symbol =
258 |         std::auto_ptr<SymbolManager>(new SymbolManager(ti.m_pid));
259 | 
260 |     // List of threads
261 |     thread_act_port_array_t thread_list;
262 |     mach_msg_type_number_t thread_count;
263 |     if (task_threads(ti.m_task_port, &thread_list, &thread_count) != KERN_SUCCESS)
264 |         assert(!"Can't obtain thread list");
265 | 
266 |     for (uint thread_idx=0; thread_idx<thread_count; thread_idx++)
267 |     {
268 |         // Query thread scheduling etc. information
269 |         thread_basic_info tbi;
270 |         mach_msg_type_number_t thread_info_count = THREAD_BASIC_INFO_COUNT;
271 |         if (thread_info(thread_list[thread_idx],
272 |                         THREAD_BASIC_INFO,
273 |                         (thread_info_t) &tbi,
274 |                         &thread_info_count) != KERN_SUCCESS)
275 |         {
276 |             assert(!"thread_info() failed");
277 |         }
278 | 
279 |         // Only show threads which were running, remove to show all threads
280 |         if (tbi.flags & TH_FLAGS_SWAPPED)
281 |             continue;
282 | 
283 |         std::printf("ghd: ---------------\n");
284 |         std::printf("ghd: Thread %i of %i\n", thread_idx + 1, thread_count);
285 |         std::printf("ghd: ---------------\n");
286 | 
287 |         std::printf("ghd: Status - ");
288 |         switch (tbi.run_state)
289 |         {
290 |             case TH_STATE_RUNNING:         std::printf("Running");         break;
291 |             case TH_STATE_STOPPED:         std::printf("Stopped");         break;
292 |             case TH_STATE_WAITING:         std::printf("Waiting");         break;
293 |             case TH_STATE_UNINTERRUPTIBLE: std::printf("Uninterruptible"); break;
294 |             case TH_STATE_HALTED:          std::printf("Halted");          break;
295 |             default:                       std::printf("(Unknown)");       break;
296 |         }
297 |         if (tbi.flags & TH_FLAGS_SWAPPED)
298 |             std::printf(" (swapped out)");
299 |         if (tbi.flags & TH_FLAGS_IDLE)
300 |             std::printf(" (idle)");
301 |         std::printf("\n");
302 | 
303 |         // Query thread exception information
304 |         x86_exception_state32_t x86es32;
305 |         mach_msg_type_number_t exception_state_count32 = x86_EXCEPTION_STATE32_COUNT;
306 |         if (thread_get_state(thread_list[thread_idx],
307 |                              x86_EXCEPTION_STATE32,
308 |                              (thread_state_t) &x86es32,
309 |                              &exception_state_count32) != KERN_SUCCESS)
310 |         {
311 |             assert(!"thread_get_state() failed");
312 |         }
313 |         const char *trap_err_str = "";
314 |         switch (x86es32.__trapno)
315 |         {
316 |             case EXC_I386_DIVERR   :
317 |                 trap_err_str = " (EXC_I386_DIVERR - divide by 0 error)"; break;
318 |             case EXC_I386_SGLSTP   :
319 |                 trap_err_str = " (EXC_I386_SGLSTP - single step)"; break;
320 |             case EXC_I386_NMIFLT   :
321 |                 trap_err_str = " (EXC_I386_NMIFLT - NMI)"; break;
322 |             case EXC_I386_BPTFLT   :
323 |                 trap_err_str = " (EXC_I386_BPTFLT - breakpoint fault)"; break;
324 |             case EXC_I386_INTOFLT  :
325 |                 trap_err_str = " (EXC_I386_INTOFLT - INTO overflow fault)"; break;
326 |             case EXC_I386_BOUNDFLT :
327 |                 trap_err_str = " (EXC_I386_BOUNDFLT - BOUND instruction fault)"; break;
328 |             case EXC_I386_INVOPFLT :
329 |                 trap_err_str = " (EXC_I386_INVOPFLT - invalid opcode fault)"; break;
330 |             case EXC_I386_NOEXTFLT :
331 |                 trap_err_str = " (EXC_I386_NOEXTFLT - extension not available fault)"; break;
332 |             case EXC_I386_DBLFLT   :
333 |                 trap_err_str = " (EXC_I386_DBLFLT - double fault)"; break;
334 |             case EXC_I386_EXTOVRFLT:
335 |                 trap_err_str = " (EXC_I386_EXTOVRFLT - extension overrun fault)"; break;
336 |             case EXC_I386_INVTSSFLT:
337 |                 trap_err_str = " (EXC_I386_INVTSSFLT - invalid TSS fault)"; break;
338 |             case EXC_I386_SEGNPFLT :
339 |                 trap_err_str = " (EXC_I386_SEGNPFLT - segment not present fault)"; break;
340 |             case EXC_I386_STKFLT   :
341 |                 trap_err_str = " (EXC_I386_STKFLT - stack fault)"; break;
342 |             case EXC_I386_GPFLT    :
343 |                 trap_err_str = " (EXC_I386_GPFLT - general protection fault)"; break;
344 |             case EXC_I386_PGFLT    :
345 |                 trap_err_str = " (EXC_I386_PGFLT - page fault)"; break;
346 |             case EXC_I386_EXTERRFLT:
347 |                 trap_err_str = " (EXC_I386_EXTERRFLT - extension error fault)"; break;
348 |             case EXC_I386_ALIGNFLT :
349 |                 trap_err_str = " (EXC_I386_ALIGNFLT - alignment fault)"; break;
350 |             case EXC_I386_ENDPERR  :
351 |                 trap_err_str = " (EXC_I386_ENDPERR - emulated extension error flt)"; break;
352 |             case EXC_I386_ENOEXTFLT:
353 |                 trap_err_str = " (EXC_I386_ENOEXTFLT - emulated ext not present)"; break;
354 |         }
355 |         std::printf("ghd: ExceptionState - trapno: %u%s, err: %u, faultvaddr: 0x%x\n",
356 |             x86es32.__trapno,
357 |             trap_err_str,
358 |             x86es32.__err,
359 |             x86es32.__faultvaddr);
360 | 
361 |         // Query thread registers
362 |         x86_thread_state32_t x86ts32;
363 |         mach_msg_type_number_t thread_state_count32 = x86_THREAD_STATE32_COUNT;
364 |         if (thread_get_state(thread_list[thread_idx],
365 |                              x86_THREAD_STATE32,
366 |                              (thread_state_t) &x86ts32,
367 |                              &thread_state_count32) != KERN_SUCCESS)
368 |         {
369 |             assert(!"thread_get_state() failed");
370 |         }
371 |         std::printf("ghd: Registers\n" \
372 |                     "ghd:   eax: %u, ebx: %u, ecx: %u, edx: %u, edi: %u, esi: %u, ebp: 0x%x\n" \
373 |                     "ghd:   esp: 0x%x, ss: %u, eflags: %u, eip: 0x%x <%s>, cs: %u, ds: %u\n" \
374 |                     "ghd:   es: %u, fs: %u, gs: %u\n",
375 |             x86ts32.__eax,
376 |             x86ts32.__ebx,
377 |             x86ts32.__ecx,
378 |             x86ts32.__edx,
379 |             x86ts32.__edi,
380 |             x86ts32.__esi,
381 |             x86ts32.__ebp,
382 |             x86ts32.__esp,
383 |             x86ts32.__ss,
384 |             x86ts32.__eflags,
385 |             x86ts32.__eip,
386 |             symbol->SymbolIDToName(symbol->AddressToSymbolID(x86ts32.__eip)),
387 |             x86ts32.__cs,
388 |             x86ts32.__ds,
389 |             x86ts32.__es,
390 |             x86ts32.__fs,
391 |             x86ts32.__gs);
392 | 
393 |         // Traverse stack
394 |         StackTrace(ti, symbol.get(), x86ts32.__ebp, x86ts32.__eip);
395 |     }
396 | 
397 |     // Cleanup
398 |     std::printf("ghd: Stack trace done, exiting\n");
399 |     DeallocateThreadList(thread_list, thread_count);
400 | }
401 | 
402 | void Indent(unsigned int depth)
403 | {
404 |     std::printf("ghd: ");
405 |     for (unsigned int i=0; i<depth; i++)
406 |         std::printf(" ");
407 | }
408 | 
409 | void StackTrace(
410 |     const TargetInfo& ti,
411 |     SymbolManager *symbol,
412 |     target_ptr_t fp, // Frame pointer (or STG Sp register when we're in Haskell code)
413 |     target_ptr_t ip) // Instruction pointer
414 | {
415 |     std::printf("ghd: Stack Trace\n");
416 | 
417 |     // Frame pointer x86 stack traversal
418 |     struct StackFrame
419 |     {
420 |         target_ptr_t next;
421 |         target_ptr_t ret;
422 |     };
423 |     StackFrame frame = { fp, ip };
424 |     target_ptr_t old_fp = fp - 1;
425 |     for (unsigned int depth=1; depth<64; depth++)
426 |     {
427 |         uint16_t file_name_id, line_number;
428 |         const uint32_t sym_id =
429 |             symbol->AddressToSymbolID(frame.ret, &file_name_id, &line_number);
430 | 
431 |         // Are we in Haskell code now?
432 |         if (IsHaskellSymbol(ti, symbol, sym_id, file_name_id))
433 |         {
434 |             Indent(depth);
435 |             std::printf("0x%x <%s> [Haskell%s]\n",
436 |                 frame.ret,
437 |                 symbol->SymbolIDToName(sym_id),
438 |                 ti.m_profiling_rts ? ", switching to CCS" : ", switching to STG stack");
439 | 
440 |             if (ti.m_profiling_rts)
441 |             {
442 |                 // We have a CCS, pick it up from the Sp in the frame
443 |                 const target_ptr_t ccs = CCSPtrFromTopOfStack(ti, frame.next);
444 |                 if (ccs == 0)
445 |                 {
446 |                     Indent(depth + 1);
447 |                     std::printf("(Can't access CCS from STG stack at Sp = 0x%x)\n", frame.next);
448 |                 }
449 |                 else
450 |                     DumpCCS(ti, ccs, depth + 1);
451 |             }
452 |             else
453 |             {
454 |                 // No profiling RTS, proceed with the STG stack
455 |                 DumpSTG(ti, symbol, frame.next, depth + 1);
456 |             }
457 |             break; // Done, we don't handle any potential Haskell -> C transition yet
458 |         }
459 | 
460 |         // Current frame's function
461 |         Indent(depth);
462 |         std::printf("0x%x <%s> from %s (%s:%i)\n",
463 |             frame.ret,
464 |             symbol->SymbolIDToName(sym_id),
465 |             symbol->SymbolIDToModule(sym_id),
466 |             symbol->FileIDToName(file_name_id),
467 |             line_number);
468 | 
469 |         // Some sanity checks for the frame link pointer
470 |         bool stop_traversal = false;
471 |         if (frame.next < 1024)
472 |         {
473 |             Indent(depth + 1);
474 |             std::printf("(Next frame near zero)\n");
475 |             stop_traversal = true;
476 |         }
477 |         if (frame.next % sizeof(void *) != 0) // IIRC, OS X actually requires 16b alignment
478 |         {
479 |             Indent(depth + 1);
480 |             std::printf("(Next frame improperly aligned)\n");
481 |             stop_traversal = true;
482 |         }
483 |         if (frame.next == old_fp)
484 |         {
485 |             Indent(depth + 1);
486 |             std::printf("(Next frame identical to current)\n");
487 |             stop_traversal = true;
488 |         }
489 |         if (frame.next < old_fp)
490 |         {
491 |             Indent(depth + 1);
492 |             std::printf("(Next frame before current)\n");
493 |             stop_traversal = true;
494 |         }
495 |         if (frame.next - old_fp > 32 * 1024 * 1024)
496 |         {
497 |             Indent(depth + 1);
498 |             std::printf("(Next frame >32MB away from current)\n");
499 |             stop_traversal = true;
500 |         }
501 | 
502 |         // Next frame
503 |         old_fp = frame.next;
504 |         if (ti.ReadMemoryArg(frame.next, sizeof(StackFrame), &frame) == false)
505 |         {
506 |             // Likely a bad address, stop traversing
507 |             Indent(depth + 1);
508 |             std::printf("(Can't access next frame)\n");
509 |             stop_traversal = true;
510 |         }
511 | 
512 |         if (stop_traversal)
513 |             break;
514 |     }
515 | }
516 | 
517 | // Get a Cost Center Stack pointer from either the base register or the STG stack pointer
518 | target_ptr_t CCSPtrFromBaseReg(const TargetInfo& ti, target_ptr_t base_reg)
519 | {
520 |     return ti.ReadMemoryPtr(base_reg + OFFSET_StgRegTable_rCCCS);
521 | }
522 | target_ptr_t CCSPtrFromTopOfStack(const TargetInfo& ti, target_ptr_t sp_reg)
523 | {
524 |     return ti.ReadMemoryPtr
525 |         (sp_reg + OFFSET_StgClosure_header + OFFSET_StgHeader_prof + OFFSET_StgProfHeader_ccs);
526 | }
527 | 
528 | bool IsHaskellSymbol(
529 |     const TargetInfo& ti,
530 |     SymbolManager *symbol,
531 |     uint32_t sym_id,
532 |     uint16_t file_name_id)
533 | {
534 |     // One challenge with inspecting a stopped target process is figuring out if we're
535 |     // inside a Haskell function and need to do CCS / STG stack traversal, or inside
536 |     // a C/C++/ObjC function and need to do x86 frame pointer stack traversal.
537 |     //
538 |     // Here we simply look at the symbol information and decide based upon that. If
539 |     // our function has a matching name, comes from the executable (main module) and lacks
540 |     // source file debug information, we assume it's a Haskell function. This is rather
541 |     // crummy, but works well enough. We could also look at some other bits of machine
542 |     // state to make the decision, i.e.:
543 |     //
544 |     // C <- C
545 |     // -----------------------------
546 |     // ebp: Frame pointer (pointing to next + ret pointers)
547 |     // esp: Top of x86 stack
548 |     // 
549 |     // C <- Haskell
550 |     // -----------------------------
551 |     // ebp: Frame pointer (pointing to Haskell Sp + ret pointers)
552 |     // esp: Top of x86 stack
553 |     // 
554 |     // Haskell <- Haskell
555 |     // -----------------------------
556 |     // ebx: BaseReg
557 |     // edi: Hp
558 |     // ebp: Sp, Top of STG stack
559 |     // esp: Top of C stack? (Not used as a STG register)
560 |     // 
561 |     // Haskell <- C
562 |     // -----------------------------
563 |     // ebx: BaseReg
564 |     // edi: Hp
565 |     // ebp: Sp, Top of STG stack
566 |     // esp: Top of C stack? (Not used as a STG register)
567 | 
568 |     // Haskell functions don't have source file debug information
569 |     if (std::strcmp(symbol->FileIDToName(file_name_id), symbol->GetUnresolvedSymbolName()) != 0) 
570 |         return false;
571 |     // Assume that all Haskell functions come from our executable
572 |     if (strcasecmp(symbol->SymbolIDToModule(sym_id), ti.m_module.c_str()) != 0)
573 |         return false;
574 |     // Assume that all Haskell functions have certain tokens in their name
575 |     if (std::strstr(symbol->SymbolIDToName(sym_id), "_info") == NULL)
576 |         return false;
577 | 
578 |     return true;
579 | }
580 | 
581 | void DumpCCS(const TargetInfo& ti, target_ptr_t ccs, uint32_t starting_depth)
582 | {
583 |     // Traverse and print the passed Cost Center Stack
584 |     for (unsigned int depth=starting_depth; depth<64; depth++) 
585 |     {
586 |         Indent(depth);
587 | 
588 |         // Get CC pointer from CCS
589 |         const target_ptr_t cc = ti.ReadMemoryPtr(ccs + OFFSET_ConstCentreStack_cc);
590 |         if (cc == 0)
591 |         {
592 |             std::printf("(Can't read CC pointer)\n");
593 |             break;
594 |         }
595 | 
596 |         // Retrieve symbol information from CC
597 |         const target_ptr_t label_ptr  = ti.ReadMemoryPtr(cc + OFFSET_ConstCentre_label);
598 |         const target_ptr_t module_ptr = ti.ReadMemoryPtr(cc + OFFSET_ConstCentre_module);
599 |         const target_ptr_t srcloc_ptr = ti.ReadMemoryPtr(cc + OFFSET_ConstCentre_srcloc);
600 |         char label[256], module[256], srcloc[256];
601 |         if (ti.ReadMemoryString(label_ptr, label, sizeof(label)) == false)
602 |             std::strcpy(label, "(can't read label)");
603 |         if (ti.ReadMemoryString(module_ptr, module, sizeof(module)) == false)
604 |             std::strcpy(module, "(can't read module)");
605 |         if (ti.ReadMemoryString(srcloc_ptr, srcloc, sizeof(srcloc)) == false)
606 |             std::strcpy(srcloc, "(can't read srcloc)");
607 |         std::printf("CCS:0x%x <%s> from %s (%s)\n", ccs, label, module, srcloc);
608 | 
609 |         // Walk the CC stack
610 |         ccs = ti.ReadMemoryPtr(ccs + OFFSET_ConstCentreStack_prevStack);
611 |         if (ccs == 0)
612 |             break;
613 |     }
614 | }
615 | 
616 | void DumpSTG(
617 |     const TargetInfo& ti,
618 |     SymbolManager *symbol,
619 |     target_ptr_t sp,
620 |     uint32_t starting_depth)
621 | {
622 |     // TODO: See GHC's 'Printer.c' on how we could print more information about what's on
623 |     //       the stack
624 | 
625 |     // Traverse and print the passed STG stack
626 |     for (unsigned int depth=starting_depth; depth<128; depth++) 
627 |     {
628 |         // Top closure
629 |         const target_ptr_t info = ti.ReadMemoryPtr(sp);
630 |         const uint32_t sym_id = symbol->AddressToSymbolID(info);
631 | 
632 |         Indent(depth);
633 | 
634 |         // Let our wrapper collect the required information
635 |         uint32_t closure_type, closure_size;
636 |         target_ptr_t fun_ref;
637 |         if (GetClosureTypeAndSize(ti.m_task_port, sp, &closure_type, &closure_size, &fun_ref) != 0)
638 |         {
639 |             std::printf("0x%x (Can't read stack frame)\n", sp);
640 |             break;
641 |         }
642 | 
643 |         // Referenced closure
644 |         char ref_buf[256] = { 0 };
645 |         if (fun_ref != 0)
646 |         {
647 |             std::snprintf(
648 |                 ref_buf,
649 |                 sizeof(ref_buf),
650 |                 ", <%s>",
651 |                 symbol->SymbolIDToName(symbol->AddressToSymbolID(ti.ReadMemoryPtr(fun_ref))));
652 |         }
653 | 
654 |         std::printf("0x%x <%s> (%s, %ib%s)\n",
655 |             sp,
656 |             symbol->SymbolIDToName(sym_id),
657 |             ClosureTypeToString(closure_type),
658 |             closure_size,
659 |             ref_buf);
660 | 
661 |         if (closure_type == wrapper_STOP_FRAME)
662 |             break;
663 | 
664 |         // TODO: Handle underflow frames
665 |         if (closure_type == wrapper_UNDERFLOW_FRAME)
666 |             break;
667 | 
668 |         sp += closure_size;
669 |     }
670 | }
671 | 
672 | const char * SignalToString(int sig)
673 | {
674 |     switch (sig)
675 |     {
676 |         case SIGHUP   : return "SIGHUP - terminal line hangup";
677 |         case SIGINT   : return "SIGINT - interrupt program";
678 |         case SIGQUIT  : return "SIGQUIT - quit program";
679 |         case SIGILL   : return "SIGILL - illegal instruction";
680 |         case SIGTRAP  : return "SIGTRAP - trace trap";
681 |         case SIGABRT  : return "SIGABRT - abort program (formerly SIGIOT)";
682 |         case SIGEMT   : return "SIGEMT - emulate instruction executed";
683 |         case SIGFPE   : return "SIGFPE - floating-point exception";
684 |         case SIGKILL  : return "SIGKILL - kill program";
685 |         case SIGBUS   : return "SIGBUS - bus error";
686 |         case SIGSEGV  : return "SIGSEGV - segmentation violation";
687 |         case SIGSYS   : return "SIGSYS - non-existent system call invoked";
688 |         case SIGPIPE  : return "SIGPIPE - write on a pipe with no reader";
689 |         case SIGALRM  : return "SIGALRM - real-time timer expired";
690 |         case SIGTERM  : return "SIGTERM - software termination signal";
691 |         case SIGURG   : return "SIGURG - urgent condition present on socket";
692 |         case SIGSTOP  : return "SIGSTOP - stop (cannot be caught or ignored)";
693 |         case SIGTSTP  : return "SIGTSTP - stop signal generated from keyboard";
694 |         case SIGCONT  : return "SIGCONT - continue after stop";
695 |         case SIGCHLD  : return "SIGCHLD - child status has changed";
696 |         case SIGTTIN  : return "SIGTTIN - background read attempted from control terminal";
697 |         case SIGTTOU  : return "SIGTTOU - background write attempted to control terminal";
698 |         case SIGIO    : return "SIGIO - I/O is possible on a descriptor (see fcntl(2))";
699 |         case SIGXCPU  : return "SIGXCPU - cpu time limit exceeded (see setrlimit(2))";
700 |         case SIGXFSZ  : return "SIGXFSZ - file size limit exceeded (see setrlimit(2))";
701 |         case SIGVTALRM: return "SIGVTALRM - virtual time alarm (see setitimer(2))";
702 |         case SIGPROF  : return "SIGPROF - profiling timer alarm (see setitimer(2))";
703 |         case SIGWINCH : return "SIGWINCH - window size change";
704 |         case SIGINFO  : return "SIGINFO - status request from keyboard";
705 |         case SIGUSR1  : return "SIGUSR1 - user defined signal 1";
706 |         case SIGUSR2  : return "SIGUSR2 - user defined signal 2";
707 |         default       : return "unknown";
708 |     }
709 | }
710 | 
711 | void DeallocateThreadList(
712 |     thread_act_port_array_t thread_list,
713 |     mach_msg_type_number_t thread_count)
714 | {
715 |     for (uint i=0; i<thread_count; i++)
716 |         if (mach_port_deallocate(mach_task_self(), thread_list[i]) != KERN_SUCCESS)
717 |             assert(!"Error while deallocating thread port");
718 | 
719 |     if (vm_deallocate(mach_task_self(),
720 |                       (vm_offset_t) thread_list,
721 |                       sizeof(* thread_list) * thread_count) != KERN_SUCCESS)
722 |     {
723 |         assert(!"Error while deallocating thread list");
724 |     }
725 | }
726 | 
727 | 


--------------------------------------------------------------------------------
/ghd/ghd.h:
--------------------------------------------------------------------------------
 1 | 
 2 | #ifndef GHD_H
 3 | #define GHD_H
 4 | 
 5 | #include <string>
 6 | 
 7 | #include <mach/mach.h>
 8 | #include <mach/mach_vm.h> // for mach_vm_ instead of vm_
 9 | 
10 | class SymbolManager;
11 | 
12 | #include "rts_wrapper.h"
13 | 
14 | class TargetInfo
15 | {
16 | public:
17 |     TargetInfo(pid_t pid, const char *executable);
18 |     ~TargetInfo();
19 | 
20 |     void * ReadMemoryReturn(target_ptr_t address, target_ptr_t size) const;
21 |     bool ReadMemoryArg(target_ptr_t address, target_ptr_t size, void *buf) const;
22 |     target_ptr_t ReadMemoryPtr(target_ptr_t address) const;
23 |     bool ReadMemoryString(target_ptr_t address, char *buf_out, size_t buf_size) const;
24 | 
25 |     std::string m_executable;
26 |     std::string m_module; // Module name for our executable
27 |     bool m_threaded_rts;
28 |     bool m_profiling_rts;
29 |     pid_t m_pid;
30 |     bool m_is_64_bit;
31 |     mach_port_name_t m_task_port;
32 | 
33 | protected:
34 |     TargetInfo() { };
35 | };
36 | 
37 | bool DoesSymbolExist(const char *executable, const char *symbol);
38 | bool IsProfilingRTS(const char *exe);
39 | bool IsThreadedRTS (const char *exe);
40 | bool IsProcess64Bit(pid_t pid);
41 | 
42 | void DebugLoop(const TargetInfo& ti);
43 | void PTraceContinue(pid_t pid);
44 | void TraceProcessState(const TargetInfo& ti);
45 | void DeallocateThreadList(
46 |     thread_act_port_array_t thread_list,
47 |     mach_msg_type_number_t thread_count);
48 | const char * SignalToString(int sig);
49 | void Indent(unsigned int starting_depth);
50 | void StackTrace(
51 |     const TargetInfo& ti,
52 |     SymbolManager *symbol,
53 |     target_ptr_t fp,
54 |     target_ptr_t ip);
55 | bool IsHaskellSymbol(
56 |     const TargetInfo& ti,
57 |     SymbolManager *symbol,
58 |     uint32_t sym_id,
59 |     uint16_t file_name_id);
60 | void DumpCCS(const TargetInfo& ti, target_ptr_t ccs, uint32_t starting_depth);
61 | void DumpSTG(const TargetInfo& ti, SymbolManager *symbol, target_ptr_t sp, uint32_t starting_depth);
62 | target_ptr_t CCSPtrFromBaseReg(const TargetInfo& ti, target_ptr_t base_reg);
63 | target_ptr_t CCSPtrFromTopOfStack(const TargetInfo& ti, target_ptr_t sp_reg);
64 | 
65 | #endif // GHD_H
66 | 
67 | 


--------------------------------------------------------------------------------
/ghd/rts_wrapper.h:
--------------------------------------------------------------------------------
 1 | 
 2 | #ifndef RTS_WRAPPER_H
 3 | #define RTS_WRAPPER_H
 4 | 
 5 | //
 6 | // Since we can't include GHC's RTS headers from C++ code
 7 | // (https://ghc.haskell.org/trac/ghc/ticket/8676#ticket)
 8 | // we're writing a small C wrapper around the bits we need
 9 | //
10 | 
11 | #ifdef __cplusplus
12 | extern "C" {
13 | #endif // __cplusplus
14 | 
15 | #include <stdint.h>
16 | 
17 | #include <mach/mach.h>
18 | #include <mach/mach_vm.h> // for mach_vm_ instead of vm_
19 | 
20 | typedef uint32_t target_ptr_t;
21 | 
22 | kern_return_t ReadMemory(mach_port_name_t task, target_ptr_t address, target_ptr_t size, void *out);
23 | 
24 | const char * ClosureTypeToString(unsigned int type);
25 | int GetClosureTypeAndSize(
26 |     mach_port_name_t task,
27 |     target_ptr_t closure_addr,
28 |     uint32_t *closure_type_out,
29 |     uint32_t *closure_size_out,
30 |     target_ptr_t *fun_ref_out);
31 | 
32 | // Structure field offsets so we can traverse them in the memory space of the target process
33 | extern target_ptr_t OFFSET_StgRegTable_rCCCS;
34 | extern target_ptr_t OFFSET_ConstCentreStack_cc;
35 | extern target_ptr_t OFFSET_ConstCentreStack_prevStack;
36 | extern target_ptr_t OFFSET_ConstCentre_label;
37 | extern target_ptr_t OFFSET_ConstCentre_module;
38 | extern target_ptr_t OFFSET_ConstCentre_srcloc;
39 | extern target_ptr_t OFFSET_StgClosure_header;
40 | extern target_ptr_t OFFSET_StgHeader_prof;
41 | extern target_ptr_t OFFSET_StgProfHeader_ccs;
42 | 
43 | // Selected closure types
44 | extern uint32_t wrapper_STOP_FRAME;
45 | extern uint32_t wrapper_UNDERFLOW_FRAME;
46 | 
47 | #ifdef __cplusplus
48 | }
49 | #endif // __cplusplus
50 | 
51 | #endif // RTS_WRAPPER_H
52 | 
53 | 


--------------------------------------------------------------------------------
/ghd/rts_wrapper_noprof.c:
--------------------------------------------------------------------------------
  1 | 
  2 | #include "rts_wrapper.h"
  3 | 
  4 | // GHC RTS header, profiling disabled
  5 | #define THREADED
  6 | #include <Rts.h>
  7 | 
  8 | int GetClosureTypeAndSize(
  9 |     mach_port_name_t task,
 10 |     target_ptr_t closure_addr,
 11 |     uint32_t *closure_type_out,
 12 |     uint32_t *closure_size_out,
 13 |     target_ptr_t *fun_ref_out) // Optional closure referenced by closure_addr
 14 | {
 15 |     // Retrieve information about a closure stack object. We use this code during STG
 16 |     // stack traversal, which we only do on a non-profiling RTS, hence the reason why
 17 |     // it is located here
 18 | 
 19 |     (* fun_ref_out) = 0;
 20 | 
 21 |     // Copy closure
 22 |     StgClosure closure;
 23 |     if (ReadMemory(task, closure_addr, sizeof(StgClosure), &closure) != KERN_SUCCESS)
 24 |         return 1;
 25 | 
 26 |     // Copy info table for closure
 27 |     StgRetInfoTable info;
 28 |     const target_ptr_t info_addr = (target_ptr_t) get_ret_itbl(&closure);
 29 |     if (ReadMemory(task, info_addr, sizeof(StgRetInfoTable), &info) != KERN_SUCCESS)
 30 |         return 1;
 31 | 
 32 |     // Type
 33 |     (* closure_type_out) = info.i.type;
 34 | 
 35 |     // Compute closure size. Adapted from this code from 'ClosureMacros.h':
 36 | 
 37 |     // EXTERN_INLINE StgWord stack_frame_sizeW( StgClosure *frame )
 38 |     // {
 39 |     //     StgRetInfoTable *info;
 40 |     // 
 41 |     //     info = get_ret_itbl(frame);
 42 |     //     switch (info->i.type) {
 43 |     // 
 44 |     //     case RET_DYN:
 45 |     //     {
 46 |     //         StgRetDyn *dyn = (StgRetDyn *)frame;
 47 |     //         return  sizeofW(StgRetDyn) + RET_DYN_BITMAP_SIZE + 
 48 |     //             RET_DYN_NONPTR_REGS_SIZE +
 49 |     //             RET_DYN_PTRS(dyn->liveness) + RET_DYN_NONPTRS(dyn->liveness);
 50 |     //     }
 51 |     //             
 52 |     //     case RET_FUN:
 53 |     //         return sizeofW(StgRetFun) + ((StgRetFun *)frame)->size;
 54 |     // 
 55 |     //     case RET_BIG:
 56 |     //         return 1 + GET_LARGE_BITMAP(&info->i)->size;
 57 |     // 
 58 |     //     case RET_BCO:
 59 |     //         return 2 + BCO_BITMAP_SIZE((StgBCO *)((P_)frame)[1]);
 60 |     // 
 61 |     //     default:
 62 |     //         return 1 + BITMAP_SIZE(info->i.layout.bitmap);
 63 |     //     }
 64 |     // }
 65 | 
 66 |     switch (info.i.type)
 67 |     {
 68 |         case RET_DYN:
 69 |         {
 70 |             // typedef struct {
 71 |             //     const StgInfoTable* info;
 72 |             //     StgWord        liveness;
 73 |             //     StgWord        ret_addr;
 74 |             //     StgClosure *   payload[FLEXIBLE_ARRAY];
 75 |             // } StgRetDyn;
 76 |             StgRetDyn dyn;
 77 |             if (ReadMemory(task, closure_addr, sizeof(StgRetDyn), &dyn) != KERN_SUCCESS)
 78 |                 return 1;
 79 |             (* closure_size_out) = sizeofW(StgRetDyn) + RET_DYN_BITMAP_SIZE + 
 80 |                 RET_DYN_NONPTR_REGS_SIZE +
 81 |                 RET_DYN_PTRS(dyn.liveness) + RET_DYN_NONPTRS(dyn.liveness);
 82 |             break;
 83 |         }
 84 |                 
 85 |         case RET_FUN:
 86 |         {
 87 |             // typedef struct {
 88 |             //     const StgInfoTable* info;
 89 |             //     StgWord        size;
 90 |             //     StgClosure *   fun;
 91 |             //     StgClosure *   payload[FLEXIBLE_ARRAY];
 92 |             // } StgRetFun;
 93 |             StgRetFun fun;
 94 |             if (ReadMemory(task, closure_addr, sizeof(StgRetFun), &fun) != KERN_SUCCESS)
 95 |                 return 1;
 96 |             (* closure_size_out) = sizeofW(StgRetFun) + fun.size;
 97 |             break;
 98 |         }
 99 | 
100 |         case RET_BIG:
101 |         {
102 |             // #define GET_LARGE_BITMAP(info) ((StgLargeBitmap*) (((StgWord) ((info)+1))
103 |             //     + (info)->layout.large_bitmap_offset))
104 |             // return 1 + GET_LARGE_BITMAP(&info->i)->size;
105 |             StgLargeBitmap bm;
106 |             const target_ptr_t bm_addr =
107 |                 (info_addr + sizeof(StgRetInfoTable)) +
108 |                 info.i.layout.large_bitmap_offset;
109 |             if (ReadMemory(task, bm_addr, sizeof(StgLargeBitmap), &bm) != KERN_SUCCESS)
110 |                 return 1;
111 |             (* closure_size_out) = 1 + bm.size;
112 |             break;
113 |         }
114 | 
115 |         case RET_BCO:
116 |             // #define BCO_BITMAP(bco) ((StgLargeBitmap *)((StgBCO *)(bco))->bitmap)
117 |             // #define BCO_BITMAP_SIZE(bco) (BCO_BITMAP(bco)->size)
118 |             //
119 |             // typedef struct {
120 |             //     StgHeader      header;
121 |             //     StgArrWords   *instrs;	/* a pointer to an ArrWords */
122 |             //     StgArrWords   *literals;	/* a pointer to an ArrWords */
123 |             //     StgMutArrPtrs *ptrs;	/* a pointer to a  MutArrPtrs */
124 |             //     StgHalfWord   arity;        /* arity of this BCO */
125 |             //     StgHalfWord   size;         /* size of this BCO (in words) */
126 |             //     StgWord       bitmap[FLEXIBLE_ARRAY];  /* an StgLargeBitmap */
127 |             // } StgBCO;
128 |             //
129 |             // typedef struct {
130 |             //   StgWord size;
131 |             //   StgWord bitmap[FLEXIBLE_ARRAY];
132 |             // } StgLargeBitmap;
133 |             //
134 |             // (* closure_size_out) = 2 + BCO_BITMAP_SIZE((StgBCO *)((P_)frame)[1]);
135 |             //
136 |             (* closure_size_out) = 2; // TODO: Implement me
137 |             break;
138 |         
139 |         default:
140 |             (* closure_size_out) = 1 + BITMAP_SIZE(info.i.layout.bitmap);
141 |             break;
142 |     }
143 | 
144 |     (* closure_size_out) *= sizeof(StgWord);
145 | 
146 |     // TODO: We could do a lot more than just follow up a single reference
147 |     switch (info.i.type)
148 |     {
149 |         case CATCH_FRAME:
150 |         {
151 |             StgCatchFrame cframe;
152 |             if (ReadMemory(task, closure_addr, sizeof(StgCatchFrame), &cframe) != KERN_SUCCESS)
153 |                 return 1;
154 |             (* fun_ref_out) = (target_ptr_t) UNTAG_CLOSURE(cframe.handler);
155 |             break;
156 |         }
157 |     }
158 | 
159 |     return 0;
160 | }
161 | 
162 | uint32_t wrapper_STOP_FRAME = STOP_FRAME;
163 | uint32_t wrapper_UNDERFLOW_FRAME = UNDERFLOW_FRAME;
164 | 
165 | // Copied from GHC's 'Printer.c'
166 | 
167 | static const char *closure_type_names[] = {
168 |  [INVALID_OBJECT]        = "INVALID_OBJECT",
169 |  [CONSTR]                = "CONSTR",
170 |  [CONSTR_1_0]            = "CONSTR_1_0",
171 |  [CONSTR_0_1]            = "CONSTR_0_1",
172 |  [CONSTR_2_0]            = "CONSTR_2_0",
173 |  [CONSTR_1_1]            = "CONSTR_1_1",
174 |  [CONSTR_0_2]            = "CONSTR_0_2",
175 |  [CONSTR_STATIC]         = "CONSTR_STATIC",
176 |  [CONSTR_NOCAF_STATIC]   = "CONSTR_NOCAF_STATIC",
177 |  [FUN]                   = "FUN",
178 |  [FUN_1_0]               = "FUN_1_0",
179 |  [FUN_0_1]               = "FUN_0_1",
180 |  [FUN_2_0]               = "FUN_2_0",
181 |  [FUN_1_1]               = "FUN_1_1",
182 |  [FUN_0_2]               = "FUN_0_2",
183 |  [FUN_STATIC]            = "FUN_STATIC",
184 |  [THUNK]                 = "THUNK",
185 |  [THUNK_1_0]             = "THUNK_1_0",
186 |  [THUNK_0_1]             = "THUNK_0_1",
187 |  [THUNK_2_0]             = "THUNK_2_0",
188 |  [THUNK_1_1]             = "THUNK_1_1",
189 |  [THUNK_0_2]             = "THUNK_0_2",
190 |  [THUNK_STATIC]          = "THUNK_STATIC",
191 |  [THUNK_SELECTOR]        = "THUNK_SELECTOR",
192 |  [BCO]                   = "BCO",
193 |  [AP]                    = "AP",
194 |  [PAP]                   = "PAP",
195 |  [AP_STACK]              = "AP_STACK",
196 |  [IND]                   = "IND",
197 |  [IND_PERM]              = "IND_PERM",
198 |  [IND_STATIC]            = "IND_STATIC",
199 |  [RET_BCO]               = "RET_BCO",
200 |  [RET_SMALL]             = "RET_SMALL",
201 |  [RET_BIG]               = "RET_BIG",
202 |  [RET_DYN]               = "RET_DYN",
203 |  [RET_FUN]               = "RET_FUN",
204 |  [UPDATE_FRAME]          = "UPDATE_FRAME",
205 |  [CATCH_FRAME]           = "CATCH_FRAME",
206 |  [UNDERFLOW_FRAME]       = "UNDERFLOW_FRAME",
207 |  [STOP_FRAME]            = "STOP_FRAME",
208 |  [BLOCKING_QUEUE]        = "BLOCKING_QUEUE",
209 |  [BLACKHOLE]             = "BLACKHOLE",
210 |  [MVAR_CLEAN]            = "MVAR_CLEAN",
211 |  [MVAR_DIRTY]            = "MVAR_DIRTY",
212 |  [ARR_WORDS]             = "ARR_WORDS",
213 |  [MUT_ARR_PTRS_CLEAN]    = "MUT_ARR_PTRS_CLEAN",
214 |  [MUT_ARR_PTRS_DIRTY]    = "MUT_ARR_PTRS_DIRTY",
215 |  [MUT_ARR_PTRS_FROZEN0]  = "MUT_ARR_PTRS_FROZEN0",
216 |  [MUT_ARR_PTRS_FROZEN]   = "MUT_ARR_PTRS_FROZEN",
217 |  [MUT_VAR_CLEAN]         = "MUT_VAR_CLEAN",
218 |  [MUT_VAR_DIRTY]         = "MUT_VAR_DIRTY",
219 |  [WEAK]                  = "WEAK",
220 |  [PRIM]	                 = "PRIM",
221 |  [MUT_PRIM]              = "MUT_PRIM",
222 |  [TSO]                   = "TSO",
223 |  [STACK]                 = "STACK",
224 |  [TREC_CHUNK]            = "TREC_CHUNK",
225 |  [ATOMICALLY_FRAME]      = "ATOMICALLY_FRAME",
226 |  [CATCH_RETRY_FRAME]     = "CATCH_RETRY_FRAME",
227 |  [CATCH_STM_FRAME]       = "CATCH_STM_FRAME",
228 |  [WHITEHOLE]             = "WHITEHOLE"
229 | };
230 | 
231 | const char * ClosureTypeToString(unsigned int type)
232 | {
233 |     static const char unknown[] = "(Unknown Type)";
234 |     if (type >= N_CLOSURE_TYPES)
235 |         return unknown;
236 |     else
237 |         return closure_type_names[type];
238 | }
239 | 
240 | 


--------------------------------------------------------------------------------
/ghd/rts_wrapper_prof.c:
--------------------------------------------------------------------------------
 1 | 
 2 | #include "rts_wrapper.h"
 3 | 
 4 | // GHC RTS header, profiling enabled
 5 | #define PROFILING
 6 | #define THREADED
 7 | #include <Rts.h>
 8 | 
 9 | // We use those offsets during CCS traversal, so they need to be in the implementation
10 | // file where the profiling RTS
11 | target_ptr_t OFFSET_StgRegTable_rCCCS          = offsetof(StgRegTable    , rCCCS    );
12 | target_ptr_t OFFSET_ConstCentreStack_cc        = offsetof(CostCentreStack, cc       );
13 | target_ptr_t OFFSET_ConstCentreStack_prevStack = offsetof(CostCentreStack, prevStack);
14 | target_ptr_t OFFSET_ConstCentre_label          = offsetof(CostCentre     , label    );
15 | target_ptr_t OFFSET_ConstCentre_module         = offsetof(CostCentre     , module   );
16 | target_ptr_t OFFSET_ConstCentre_srcloc         = offsetof(CostCentre     , srcloc   );
17 | target_ptr_t OFFSET_StgClosure_header          = offsetof(StgClosure     , header   );
18 | target_ptr_t OFFSET_StgHeader_prof             = offsetof(StgHeader      , prof     );
19 | target_ptr_t OFFSET_StgProfHeader_ccs          = offsetof(StgProfHeader  , ccs      );
20 | 
21 | kern_return_t ReadMemory(mach_port_name_t task, target_ptr_t address, target_ptr_t size, void *out)
22 | {
23 |     mach_vm_size_t inout_size = size;
24 |     kern_return_t ret = mach_vm_read_overwrite(
25 |         task,
26 |         address,
27 |         inout_size,
28 |         (mach_vm_address_t) (uintptr_t) out,
29 |         &inout_size);
30 |     if (ret != KERN_SUCCESS)
31 |         return ret;
32 |     if (inout_size != size)
33 |         return KERN_FAILURE;
34 |     return KERN_SUCCESS;
35 | }
36 | 
37 | 


--------------------------------------------------------------------------------