├── 2019NAKubecon.pdf ├── coredumps ├── alpine.md ├── analyzing.md ├── coredumps.alpine.yaml ├── coredumps.yaml ├── generating.md └── readme.md ├── cpu-profiling ├── flamegraph.jpg ├── flamegraph.svg ├── profiling.yaml ├── readme.md └── trap.sh ├── dynamic-tracing-bcc.gif ├── dynamic-tracing ├── bcc │ └── readme.md ├── dynamic-tracing.yaml ├── kubernetes.md ├── mapgen.py ├── overview.md ├── perf │ └── readme.md ├── probes.md ├── readme.md └── runNative.sh ├── images ├── Dockerfile ├── Dockerfile.alpine ├── calc-offsets.py ├── netcore-bcc-trace.py ├── readme.md ├── setup.4.15.sh ├── setup.sh └── trace-hist.py ├── kernel-interactions └── readme.md ├── perfcollect ├── calltree.png ├── events.png ├── flamegraph.png ├── perfcollect.yaml └── readme.md ├── readme.md ├── static-tracepoints ├── readme.md └── static-tracepoints.yaml └── todo └── readme.md /2019NAKubecon.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joe-elliott/netcore-kubernetes-profiling/1c66cb84d8b9398292d46da34d4b2dba2dbe84ef/2019NAKubecon.pdf -------------------------------------------------------------------------------- /coredumps/alpine.md: -------------------------------------------------------------------------------- 1 | # Coredumps in Alpine 2 | 3 | This guide is meant to complement the [generating](./generating.md) and [analyzing](./analyzing.md) guides already available. These instructions are specific to Alpine and have been tested using [coredumps.alpine.yaml](./coredumps.alpine.yaml) 4 | 5 | Unfortunately the `createdump` utility is broken in Alpine containers. See the below thread for details. Due to this bug we will be forced to generate a full coredump. 6 | 7 | https://github.com/dotnet/coreclr/issues/24599 8 | 9 | ## Alpine coredumps 10 | 11 | Using the .NET Core pid run `createdump` in full mode. Note how large these dumps are. 
If you are running .NET Core 3.0+ then you should be able to use `dotnet dump` itself to generate the dump.
12 |
13 | ```
14 | # /usr/share/dotnet/shared/Microsoft.NETCore.App/2.2.5/createdump --full 7
15 | Writing full dump to file /tmp/coredump.7
16 | Written 11171069952 bytes (2727312 pages) to core file
17 |
18 | # ls -al /tmp
19 | ...
20 | -rw-r--r-- 1 root root 11171229696 Jun 23 12:25 coredump.7
21 | ```
22 |
23 | Use `dotnet dump` to analyze. See the [official documentation](https://github.com/dotnet/diagnostics/blob/master/documentation/dotnet-dump-instructions.md) for help.
24 |
25 | Below are some example commands run against a dump taken from the [sample app](https://github.com/joe-elliott/sample-netcore-app).
26 |
27 | ```
28 | / # dotnet dump analyze /tmp/coredump.7
29 |
30 | > clrstack
31 | OS Thread Id: 0x7 (0)
32 | Child SP IP Call Site
33 | 00007FFF276AE3C0 00007f38c66463ad [GCFrame: 00007fff276ae3c0]
34 | 00007FFF276AE4A0 00007f38c66463ad [HelperMethodFrame_1OBJ: 00007fff276ae4a0] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
35 | 00007FFF276AE5D0 00007F384C0CD4A2 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)
36 | 00007FFF276AE660 00007F384C0989E9 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [/root/coreclr/src/mscorlib/src/System/Threading/Tasks/Task.cs @ 2959]
37 | 00007FFF276AE6C0 00007F384C098879 System.Threading.Tasks.Task.InternalWaitCore(Int32, System.Threading.CancellationToken) [/root/coreclr/src/mscorlib/src/System/Threading/Tasks/Task.cs @ 2898]
38 | 00007FFF276AE720 00007F384C0B96B6 System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task) [/root/coreclr/src/mscorlib/src/System/Runtime/CompilerServices/TaskAwaiter.cs @ 146]
39 | 00007FFF276AE740 00007F384C765527
Microsoft.AspNetCore.Hosting.WebHostExtensions.Run(Microsoft.AspNetCore.Hosting.IWebHost) [/_/src/Microsoft.AspNetCore.Hosting/WebHostExtensions.cs @ 66] 40 | 00007FFF276AE760 00007F384C5B1B6E sample_netcore_app.Program.Main(System.String[]) 41 | 00007FFF276AEA48 00007f38c575cfcf [GCFrame: 00007fff276aea48] 42 | 00007FFF276AEF10 00007f38c575cfcf [GCFrame: 00007fff276aef10] 43 | 44 | > clrthreads 45 | ThreadCount: 10 46 | UnstartedThread: 0 47 | BackgroundThread: 7 48 | PendingThread: 0 49 | DeadThread: 2 50 | Hosted Runtime: no 51 | Lock 52 | DBG ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception 53 | 0 1 7 000055F9D46F0D20 2020020 Preemptive 00007F362BDD6E28:00007F362BDD7FD0 000055F9D46D1F60 0 Ukn 54 | 8 2 15 000055F9D4865C60 21220 Preemptive 0000000000000000:0000000000000000 000055F9D46D1F60 0 Ukn (Finalizer) 55 | 9 3 16 000055F9D487A580 1020220 Preemptive 0000000000000000:0000000000000000 000055F9D46D1F60 0 Ukn (Threadpool Worker) 56 | 10 4 17 000055F9D48EA400 21220 Preemptive 00007F372B8ED1D8:00007F372B8EDFD0 000055F9D46D1F60 0 Ukn 57 | XXXX 5 0 000055F9D48F3760 1031820 Preemptive 0000000000000000:0000000000000000 000055F9D46D1F60 0 Ukn (Threadpool Worker) 58 | 11 6 1a 000055F9D48F67E0 1021220 Preemptive 00007F372B903EE0:00007F372B903FD0 000055F9D46D1F60 0 Ukn (Threadpool Worker) 59 | 12 7 1b 000055F9D49A7C00 2021220 Preemptive 00007F362BC76408:00007F362BC77FD0 000055F9D46D1F60 0 Ukn 60 | XXXX 8 0 000055F9D4A99FE0 1031820 Preemptive 0000000000000000:0000000000000000 000055F9D46D1F60 0 Ukn (Threadpool Worker) 61 | 14 9 1e 000055F9D4A9BC80 21220 Preemptive 00007F362BDD80D0:00007F362BDD9FD0 000055F9D46D1F60 0 Ukn 62 | 15 10 87 000055F9D48F5E40 1021220 Preemptive 00007F362BDDCDC0:00007F362BDDDFD0 000055F9D46D1F60 0 Ukn (Threadpool Worker) 63 | 64 | > dumpheap -type FibonacciProvider 65 | Address MT Size 66 | 00007f372b950038 00007f384e086cd0 24 67 | 68 | Statistics: 69 | MT Count TotalSize Class Name 70 | 00007f384e086cd0 1 24 
sample_netcore_app.Providers.FibonacciProvider 71 | 72 | > gcroot 00007f372b950038 73 | 74 | Thread 7: 75 | 00007FFF276AE660 00007F384C0989E9 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [/root/coreclr/src/mscorlib/src/System/Threading/Tasks/Task.cs @ 2959] 76 | rbp-38: 00007fff276ae678 77 | -> 00007F362BDD6D58 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.AspNetCore.Hosting.WebHostExtensions+d__4, Microsoft.AspNetCore.Hosting]] 78 | -> 00007F362B961BA8 Microsoft.AspNetCore.Hosting.Internal.WebHost 79 | -> 00007F362B95FD60 Microsoft.Extensions.DependencyInjection.ServiceCollection 80 | -> 00007F362B95FD78 System.Collections.Generic.List`1[[Microsoft.Extensions.DependencyInjection.ServiceDescriptor, Microsoft.Extensions.DependencyInjection.Abstractions]] 81 | -> 00007F362BC28DC0 Microsoft.Extensions.DependencyInjection.ServiceDescriptor[] 82 | -> 00007F362B93FF38 Microsoft.Extensions.DependencyInjection.ServiceDescriptor 83 | -> 00007F362B93CA30 Microsoft.AspNetCore.Hosting.WebHostBuilderContext 84 | -> 00007F362B948C70 Microsoft.Extensions.Configuration.ConfigurationRoot 85 | -> 00007F362B948C90 Microsoft.Extensions.Configuration.ConfigurationReloadToken 86 | -> 00007F362B948CA8 System.Threading.CancellationTokenSource 87 | -> 00007F362BC3AFB8 System.Threading.CancellationTokenSource+CallbackPartition[] 88 | -> 00007F362BC3AFE0 System.Threading.CancellationTokenSource+CallbackPartition 89 | -> 00007F362BDCA070 System.Threading.CancellationTokenSource+CallbackNode 90 | -> 00007F362BDCA030 System.Action`1[[System.Object, System.Private.CoreLib]] 91 | -> 00007F362BDCA008 Microsoft.Extensions.Primitives.ChangeToken+<>c__DisplayClass1_0`1[[System.String, System.Private.CoreLib]] 92 | -> 00007F362BDC9FC8 System.Action`1[[System.String, System.Private.CoreLib]] 93 | -> 00007F362BDC9F20 
Microsoft.Extensions.Options.OptionsMonitor`1[[Microsoft.AspNetCore.HostFiltering.HostFilteringOptions, Microsoft.AspNetCore.HostFiltering]] 94 | -> 00007F362BDCA4F8 System.Action`2[[Microsoft.AspNetCore.HostFiltering.HostFilteringOptions, Microsoft.AspNetCore.HostFiltering],[System.String, System.Private.CoreLib]] 95 | -> 00007F362BDCA4D8 Microsoft.Extensions.Options.OptionsMonitor`1+ChangeTrackerDisposable[[Microsoft.AspNetCore.HostFiltering.HostFilteringOptions, Microsoft.AspNetCore.HostFiltering]] 96 | -> 00007F362BDCA498 System.Action`2[[Microsoft.AspNetCore.HostFiltering.HostFilteringOptions, Microsoft.AspNetCore.HostFiltering],[System.String, System.Private.CoreLib]] 97 | -> 00007F362BDCA480 Microsoft.Extensions.Options.OptionsMonitorExtensions+<>c__DisplayClass0_0`1[[Microsoft.AspNetCore.HostFiltering.HostFilteringOptions, Microsoft.AspNetCore.HostFiltering]] 98 | -> 00007F362BDCA440 System.Action`1[[Microsoft.AspNetCore.HostFiltering.HostFilteringOptions, Microsoft.AspNetCore.HostFiltering]] 99 | -> 00007F362BDCA110 Microsoft.AspNetCore.HostFiltering.HostFilteringMiddleware 100 | -> 00007F362BDC6588 Microsoft.AspNetCore.Http.RequestDelegate 101 | -> 00007F362BDC6550 Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware 102 | -> 00007F362BDC38D8 Microsoft.AspNetCore.Routing.Matching.DfaMatcherFactory 103 | -> 00007F362BC29F10 Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope 104 | -> 00007F362BC3FB10 System.Collections.Generic.List`1[[System.IDisposable, System.Private.CoreLib]] 105 | -> 00007F362BC4F670 System.IDisposable[] 106 | -> 00007F362BC4DF88 Microsoft.AspNetCore.Server.Kestrel.Core.KestrelServer 107 | -> 00007F362BC4F168 System.Collections.Generic.List`1[[Microsoft.AspNetCore.Server.Kestrel.Transport.Abstractions.Internal.ITransport, Microsoft.AspNetCore.Server.Kestrel.Transport.Abstractions]] 108 | -> 00007F362BDD46A8 Microsoft.AspNetCore.Server.Kestrel.Transport.Abstractions.Internal.ITransport[] 109 | -> 
00007F362BDD3738 Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.SocketTransport 110 | -> 00007F362BDD3080 Microsoft.AspNetCore.Server.Kestrel.Core.AnyIPListenOptions 111 | -> 00007F362BDD32D0 System.Collections.Generic.List`1[[System.Func`2[[Microsoft.AspNetCore.Connections.ConnectionDelegate, Microsoft.AspNetCore.Connections.Abstractions],[Microsoft.AspNetCore.Connections.ConnectionDelegate, Microsoft.AspNetCore.Connections.Abstractions]], System.Private.CoreLib]] 112 | -> 00007F362BDD3648 System.Func`2[[Microsoft.AspNetCore.Connections.ConnectionDelegate, Microsoft.AspNetCore.Connections.Abstractions],[Microsoft.AspNetCore.Connections.ConnectionDelegate, Microsoft.AspNetCore.Connections.Abstractions]][] 113 | -> 00007F362BDD3608 System.Func`2[[Microsoft.AspNetCore.Connections.ConnectionDelegate, Microsoft.AspNetCore.Connections.Abstractions],[Microsoft.AspNetCore.Connections.ConnectionDelegate, Microsoft.AspNetCore.Connections.Abstractions]] 114 | -> 00007F362BDD35C0 Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnectionBuilderExtensions+<>c__DisplayClass1_0`1[[Microsoft.AspNetCore.Hosting.Internal.HostingApplication+Context, Microsoft.AspNetCore.Hosting]] 115 | -> 00007F362BDD35D8 Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnectionMiddleware`1[[Microsoft.AspNetCore.Hosting.Internal.HostingApplication+Context, Microsoft.AspNetCore.Hosting]] 116 | -> 00007F362BDCE760 Microsoft.AspNetCore.Hosting.Internal.HostingApplication 117 | -> 00007F362BDCB2D8 Microsoft.AspNetCore.Http.RequestDelegate 118 | -> 00007F362BDCB2B8 Microsoft.AspNetCore.Hosting.Internal.RequestServicesContainerMiddleware 119 | -> 00007F362BC29E80 Microsoft.Extensions.DependencyInjection.ServiceLookup.DynamicServiceProviderEngine 120 | -> 00007F362BC32010 System.Collections.Concurrent.ConcurrentDictionary`2[[System.Type, System.Private.CoreLib],[System.Func`2[[Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope, 
Microsoft.Extensions.DependencyInjection],[System.Object, System.Private.CoreLib]], System.Private.CoreLib]] 121 | -> 00007F362BDBEDB0 System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[System.Type, System.Private.CoreLib],[System.Func`2[[Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope, Microsoft.Extensions.DependencyInjection],[System.Object, System.Private.CoreLib]], System.Private.CoreLib]] 122 | -> 00007F362BDBE5B8 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.Type, System.Private.CoreLib],[System.Func`2[[Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope, Microsoft.Extensions.DependencyInjection],[System.Object, System.Private.CoreLib]], System.Private.CoreLib]][] 123 | -> 00007F372B950008 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.Type, System.Private.CoreLib],[System.Func`2[[Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope, Microsoft.Extensions.DependencyInjection],[System.Object, System.Private.CoreLib]], System.Private.CoreLib]] 124 | -> 00007F372B9822D8 System.Func`2[[Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope, Microsoft.Extensions.DependencyInjection],[System.Object, System.Private.CoreLib]] 125 | -> 00007F372B9822C0 Microsoft.Extensions.DependencyInjection.ServiceLookup.ExpressionResolverBuilder+<>c__DisplayClass17_1 126 | -> 00007F372B950038 sample_netcore_app.Providers.FibonacciProvider 127 | ``` -------------------------------------------------------------------------------- /coredumps/analyzing.md: -------------------------------------------------------------------------------- 1 | # Analyzing Core Dumps 2 | 3 | This guide will cover the basics of loading a netcore core dump in lldb and analyzing it. It presumes you've followed [this guide](./generating.md). The below commands are all run in the sidecar from the previous guide after a dump has been generated. 
4 |
5 |
6 | ## Loading the Dump
7 | If you followed [the previous guide](./generating.md) then you have a core dump in your `/tmp` directory, generated one way or another.
8 |
9 | ```
10 | # ls /tmp/coredump*
11 | /tmp/coredump.6
12 | # lldb /usr/bin/dotnet --core /tmp/coredump.6
13 | ```
14 |
15 | Once you are in lldb, load the sos plugin and point it at the CLR. The sos plugin provides a set of commands that allow you to analyze the state of the managed application. The rest of the guide uses commands enabled by this plugin. Note that the locations of libsosplugin.so and the CLR depend on the framework version.
16 |
17 | ```
18 | (lldb) plugin load /usr/share/dotnet/shared/Microsoft.NETCore.App/2.2.5/libsosplugin.so
19 | (lldb) setclrpath /usr/share/dotnet/shared/Microsoft.NETCore.App/2.2.5
20 | ```
21 |
22 | #### Getting help
23 | The sos plugin provides some basic help through the `soshelp` command. Help is always a good place to start. After you have run `soshelp` and reviewed the available commands, see below for some basic guides on performing other tasks.
24 |
25 | #### Finding a Thrown Exception
26 | An unhandled exception is a common way for an application to crash unexpectedly. In our case we forced the application to crash by calling [Environment.FailFast()](https://github.com/joe-elliott/sample-netcore-app/blob/master/Controllers/FailController.cs#L15). Let's discover the exception that was thrown and inspect the call stack.
27 |
28 | First, let's just check out all of our CLR threads.
29 | 30 | ``` 31 | (lldb) sos Threads 32 | ThreadCount: 9 33 | UnstartedThread: 0 34 | BackgroundThread: 8 35 | PendingThread: 0 36 | DeadThread: 0 37 | Hosted Runtime: no 38 | Lock 39 | ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception 40 | 1 1 6 0000000002715340 2020020 Preemptive 0000000000000000:0000000000000000 00000000026C6FD0 0 Ukn 41 | 9 2 1a 00000000027BC340 21220 Preemptive 0000000000000000:0000000000000000 00000000026C6FD0 0 Ukn (Finalizer) 42 | 10 3 1b 00007FC6F00009F0 1020220 Preemptive 0000000000000000:0000000000000000 00000000026C6FD0 0 Ukn (Threadpool Worker) 43 | 11 4 1c 0000000002856A80 21220 Preemptive 0000000000000000:0000000000000000 00000000026C6FD0 0 Ukn 44 | 12 7 20 00000000028BC3D0 2021220 Preemptive 0000000000000000:0000000000000000 00000000026C6FD0 0 Ukn 45 | 14 9 23 00007FC6E4009B30 21220 Preemptive 00007FC7083BEE50:00007FC7083C0140 00000000026C6FD0 0 Ukn 46 | 15 12 e3 00007FC6D400E050 1021220 Preemptive 00007FC8085D0748:00007FC8085D1520 00000000026C6FD0 0 Ukn (Threadpool Worker) System.ExecutionEngineException 00007fc7081a71e0 47 | 16 13 e4 00007FC6D400F230 1021220 Preemptive 00007FC8085D7730:00007FC8085D9520 00000000026C6FD0 0 Ukn (Threadpool Worker) 48 | 17 16 e7 00007FC6D4011840 1021220 Preemptive 00007FC70843C5B0:00007FC70843E140 00000000026C6FD0 0 Ukn (Threadpool Worker) 49 | ``` 50 | 51 | Note that thread 15 has an unhandled exception. Let's switch to that thread and view the exception in detail. 
52 | 53 | ``` 54 | (lldb) thread select 15 55 | * thread #15, stop reason = signal SIGABRT 56 | frame #0: 0x00007fc9a378db5a libpthread.so.0`__new_sem_wait_slow + 106 57 | libpthread.so.0`__new_sem_wait_slow: 58 | -> 0x7fc9a378db5a <+106>: cmpq $-0x1000, %rax ; imm = 0xF000 59 | 0x7fc9a378db60 <+112>: ja 0x7fc9a378db7b ; <+139> 60 | 0x7fc9a378db62 <+114>: movl %r8d, %edi 61 | 0x7fc9a378db65 <+117>: movl %eax, 0xc(%rsp) 62 | (lldb) sos PrintException 63 | Exception object: 00007fc7081a71e0 64 | Exception type: System.ExecutionEngineException 65 | Message: 66 | InnerException: 67 | StackTrace (generated): 68 | 69 | StackTraceString: 70 | HResult: 80131506 71 | ``` 72 | 73 | Finally, view the current callstack on the thread. This should give us information about where the exception was called from. In this case it clearly calls out the exception was called from `FailController::Get` as expected. 74 | 75 | ``` 76 | (lldb) sos ClrStack 77 | OS Thread Id: 0xe3 (15) 78 | Child SP IP Call Site 79 | 00007FC98A919548 00007fc9a378db5a [GCFrame: 00007fc98a919548] 80 | 00007FC98A919628 00007fc9a378db5a [HelperMethodFrame_2OBJ: 00007fc98a919628] System.Environment.FailFast(System.String, System.Exception) 81 | 00007FC98A919760 00007FC92EB305EF sample_netcore_app.Controllers.FailController.Get() 82 | 00007FC98A919780 00007FC92C921C0D SOS Warning: Loading symbols for dynamic assemblies is not yet supported 83 | DynamicClass.lambda_method 84 | 00007FC98A919790 00007FC929CE3C93 /usr/share/dotnet/shared/Microsoft.AspNetCore.App/2.2.5/Microsoft.AspNetCore.Mvc.Core.dll!Unknown 85 | 00007FC98A9197A0 00007FC929CF1483 /usr/share/dotnet/shared/Microsoft.AspNetCore.App/2.2.5/Microsoft.AspNetCore.Mvc.Core.dll!Unknown 86 | 00007FC98A9197F0 00007FC929CF31C2 /usr/share/dotnet/shared/Microsoft.AspNetCore.App/2.2.5/Microsoft.AspNetCore.Mvc.Core.dll!Unknown 87 | ... 88 | ``` 89 | 90 | #### Inspecting Objects on the Heap 91 | Another common task is to inspect the heap to diagnose a memory leak. 
Here are some basic commands to inspect objects on the heap. 92 | 93 | First, let's find an object we are interested in and dump some basic information. 94 | ``` 95 | (lldb) sos DumpHeap -type FibonacciProvider 96 | Address MT Size 97 | 00007fc808277940 00007fc92a956cd0 24 98 | 99 | Statistics: 100 | MT Count TotalSize Class Name 101 | 00007fc92a956cd0 1 24 sample_netcore_app.Providers.FibonacciProvider 102 | Total 1 objects 103 | ``` 104 | 105 | ``` 106 | (lldb) sos DumpObj 00007fc808277940 107 | Name: sample_netcore_app.Providers.FibonacciProvider 108 | MethodTable: 00007fc92a956cd0 109 | EEClass: 00007fc92a962db8 110 | Size: 24(0x18) bytes 111 | File: /app/sample-netcore-app.dll 112 | Fields: 113 | None 114 | ``` 115 | 116 | ``` 117 | (lldb) sos DumpMT -md 00007fc92a956cd0 118 | EEClass: 00007FC92A962DB8 119 | Module: 00007FC9281F43C8 120 | Name: sample_netcore_app.Providers.FibonacciProvider 121 | mdToken: 0000000002000005 122 | File: /app/sample-netcore-app.dll 123 | BaseSize: 0x18 124 | ComponentSize: 0x0 125 | Slots in VTable: 7 126 | Number of IFaces in IFaceMap: 1 127 | -------------------------------------- 128 | MethodDesc Table 129 | Entry MethodDesc JIT Name 130 | 00007FC9288979A0 00007FC9284352F0 PreJIT System.Object.ToString() 131 | 00007FC9288979C0 00007FC9284352F8 PreJIT System.Object.Equals(System.Object) 132 | 00007FC928897A10 00007FC928435320 PreJIT System.Object.GetHashCode() 133 | 00007FC928897A20 00007FC928435340 PreJIT System.Object.Finalize() 134 | 00007FC92EB28CC0 00007FC92A956CB8 JIT sample_netcore_app.Providers.FibonacciProvider.calculateFibonacciValue(Int32) 135 | 00007FC92EB28CA0 00007FC92A956CB0 JIT sample_netcore_app.Providers.FibonacciProvider..ctor() 136 | 00007FC92EB28D00 00007FC92A956CC0 JIT sample_netcore_app.Providers.FibonacciProvider.calculateFibonacciValueRecursive(Int32, Int32, Int32, Int32) 137 | ``` 138 | 139 | You can even dump the IL code if you're so inclined. 
140 | 141 | ``` 142 | (lldb) sos DumpIL 00007FC92A956CC0 143 | ilAddr = 00007FC9A3BF02D3 144 | IL_0000: ldarg.3 145 | IL_0001: ldarg.s VAR OR ARG 4 146 | IL_0003: bgt.s IL_0015 147 | IL_0005: ldarg.0 148 | IL_0006: ldarg.2 149 | IL_0007: ldarg.1 150 | IL_0008: ldarg.2 151 | IL_0009: add 152 | IL_000a: ldarg.3 153 | IL_000b: ldc.i4.1 154 | IL_000c: add 155 | IL_000d: ldarg.s VAR OR ARG 4 156 | IL_000f: call sample_netcore_app.Providers.FibonacciProvider::calculateFibonacciValueRecursive 157 | IL_0014: ret 158 | IL_0015: ldarg.2 159 | IL_0016: ret 160 | ``` 161 | 162 | However, the most likely thing you're looking for is a path to a GCRoot. This will give you information about why the object is still in memory which will help you diagnose memory leaks. At this point I am unsure why some of the object names are `` and how to correct this. 163 | 164 | **Note:** Since this guide was written I have discovered the [dotnet dump](https://github.com/dotnet/diagnostics/blob/master/documentation/dotnet-dump-instructions.md) tool. I believe that some of the issues with the gcroot analysis are because lldb in the debian container is version 7.0. Using the [dotnet dump](https://github.com/dotnet/diagnostics/blob/master/documentation/dotnet-dump-instructions.md) tool might significantly improve analysis. 165 | 166 | ``` 167 | (lldb) sos GCRoot -all -nostacks 00007fc808277940 168 | HandleTable: 169 | 00007FC9A3DA1140 (strong handle) 170 | -> 00007FC7081F2B48 System.Object[] 171 | -> 00007FC7081E6B20 System.Threading.Tasks.Task 172 | ... 
173 | -> 00007FC80825EBA0 System.Action`1[[System.String, System.Private.CoreLib]] 174 | -> 00007FC80825EB18 175 | -> 00007FC80825EF18 System.Action`2[[Microsoft.AspNetCore.HostFiltering.HostFilteringOptions, Microsoft.AspNetCore.HostFiltering],[System.String, System.Private.CoreLib]] 176 | -> 00007FC80825EEF8 177 | -> 00007FC80825EEB8 System.Action`2[[Microsoft.AspNetCore.HostFiltering.HostFilteringOptions, Microsoft.AspNetCore.HostFiltering],[System.String, System.Private.CoreLib]] 178 | -> 00007FC80825EEA0 179 | -> 00007FC80825EE60 System.Action`1[[Microsoft.AspNetCore.HostFiltering.HostFilteringOptions, Microsoft.AspNetCore.HostFiltering]] 180 | -> 00007FC80825ECB0 Microsoft.AspNetCore.HostFiltering.HostFilteringMiddleware 181 | ... 182 | -> 00007FC808277940 sample_netcore_app.Providers.FibonacciProvider 183 | ``` -------------------------------------------------------------------------------- /coredumps/coredumps.alpine.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Pod 3 | metadata: 4 | name: sample-netcore-app 5 | labels: 6 | app: sample-netcore-app 7 | spec: 8 | shareProcessNamespace: true 9 | containers: 10 | - name: sample-netcore-app 11 | image: joeelliott/sample-netcore-app:v1.1.2-2.2.5-alpine 12 | imagePullPolicy: IfNotPresent 13 | env: 14 | - name: COMPlus_DbgEnableMiniDump 15 | value: "1" 16 | - name: COMPlus_DbgMiniDumpName 17 | value: "/tmp/coredump.%d" 18 | - name: ASPNETCORE_URLS 19 | value: http://*:8080 20 | - name: COMPlus_DbgMiniDumpType 21 | value: "4" 22 | volumeMounts: 23 | - mountPath: /tmp 24 | name: tmp 25 | - name: profile-sidecar 26 | image: joeelliott/netcore-debugging-tools:v0.0.14-2.2.5-alpine 27 | imagePullPolicy: IfNotPresent 28 | securityContext: 29 | privileged: true 30 | args: 31 | - sleep 32 | - "1d" 33 | volumeMounts: 34 | - mountPath: /tmp 35 | name: tmp 36 | volumes: 37 | - name: tmp 38 | emptyDir: {} 39 | 
-------------------------------------------------------------------------------- /coredumps/coredumps.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Pod 3 | metadata: 4 | name: sample-netcore-app 5 | labels: 6 | app: sample-netcore-app 7 | spec: 8 | shareProcessNamespace: true 9 | containers: 10 | - name: sample-netcore-app 11 | image: joeelliott/sample-netcore-app:v1.1.0-2.2.5 12 | imagePullPolicy: IfNotPresent 13 | env: 14 | - name: COMPlus_DbgEnableMiniDump 15 | value: "1" 16 | - name: COMPlus_DbgMiniDumpName 17 | value: "/tmp/coredump.%d" 18 | - name: ASPNETCORE_URLS 19 | value: http://*:8080 20 | volumeMounts: 21 | - mountPath: /tmp 22 | name: tmp 23 | - name: profile-sidecar 24 | image: joeelliott/netcore-debugging-tools:v0.0.11-2.2.5 25 | imagePullPolicy: IfNotPresent 26 | securityContext: 27 | privileged: true 28 | args: 29 | - sleep 30 | - infinity 31 | volumeMounts: 32 | - mountPath: /tmp 33 | name: tmp 34 | volumes: 35 | - name: tmp 36 | emptyDir: {} -------------------------------------------------------------------------------- /coredumps/generating.md: -------------------------------------------------------------------------------- 1 | # Generating Core Dumps 2 | 3 | This guide will walk you through capturing a core dump of a netcore application running in Kubernetes. The tools are designed to run in a sidecar next to the pod you want to debug. 4 | 5 | Most information pulled from: 6 | 7 | https://github.com/dotnet/coreclr/blob/master/Documentation/botr/xplat-minidump-generation.md 8 | 9 | ## Run your netcore app in K8s 10 | Create your pod with a [debugging sidecar](https://hub.docker.com/r/joeelliott/netcore-debugging-tools). The rest of this guide will use [coredumps.yaml](./coredumps.yaml) which runs a sidecar next to a simple [sample app](https://github.com/joe-elliott/sample-netcore-app). 
11 | 12 | #### Environment Variables 13 | Set the following environment variables for your main process. 14 | 15 | ``` 16 | env: 17 | - name: COMPlus_DbgEnableMiniDump 18 | value: "1" 19 | - name: COMPlus_DbgMiniDumpName 20 | value: "/tmp/coredump.%d" 21 | ``` 22 | 23 | `COMPlus_DbgEnableMiniDump` tells the netcore runtime to generate a core dump if the process exits unexpectedly. 24 | 25 | `COMPlus_DbgMiniDumpName` designates the file to place the core dump in when the process exits unexpectedly. We are placing it in `/tmp` so it is accessible in the sidecar. 26 | 27 | Another variable you could consider setting is `COMPlus_DbgMiniDumpType`. `COMPlus_DbgMiniDumpType` allows you to change the information that is captured in the core dump. See [here](https://github.com/dotnet/coreclr/blob/master/Documentation/botr/xplat-minidump-generation.md#configurationpolicy) for more information. The default value of `MiniDumpWithPrivateReadWriteMemory` has been sufficient to view threads, stack traces and explore the heap. 28 | 29 | #### shareProcessNamespace 30 | Setting `shareProcessNamespace` to true allows the sidecar to easily access the process you want to debug. 31 | 32 | #### Mount /tmp 33 | By sharing /tmp as an empty directory the debugging sidecar can easily access core dumps created when the application exits unexpectedly. 34 | 35 | ## Generate dump 36 | 37 | To generate a dump file first exec into the sidecar. 38 | 39 | ``` 40 | kubectl exec -it -c profile-sidecar sample-netcore-app bash 41 | ``` 42 | 43 | There are two different scenarios in which you'd generally like to generate a core dump. See below for details on generating a dump on demand or when an application crashes. 44 | 45 | #### On Demand 46 | 47 | On demand core dumps are useful when your application enters states you need to better understand that do not cause the application to crash. E.g. 48 | 49 | - Your application is deadlocking and you want to see the stack traces of all threads. 
50 | - Your application is consuming an unbounded amount of memory and you want to investigate the heap. 51 | 52 | To generate a core dump on demand we will use the `createdump` utility provided by Microsoft. This application is located in `/usr/share/dotnet/shared/Microsoft.NETCore.App/`. Note that you will need the pid of the dotnet process. 53 | 54 | ``` 55 | # ps aux | grep dotnet 56 | root 832 0.7 3.8 11927716 77536 ? SLsl 13:26 0:00 dotnet /app/sample-netcore-app.dll 57 | 58 | # /usr/share/dotnet/shared/Microsoft.NETCore.App/2.2.5/createdump 832 59 | Writing minidump with heap to file /tmp/coredump.832 60 | Written 168390656 bytes (41111 pages) to core file 61 | 62 | # ls -al /tmp/coredump* 63 | -rw-r--r-- 1 root root 168591360 Jun 8 13:29 /tmp/coredump.832 64 | ``` 65 | 66 | #### On Unexpected Exception 67 | 68 | If your application is crashing due to an unexpected exception then coredumps will be generated automatically due to the environment variables set above (`COMPlus_DbgEnableMiniDump` and `COMPlus_DbgMiniDumpName`). The [sample application](https://github.com/joe-elliott/sample-netcore-app) has an endpoint that calls [`Environment.FailFast()`](https://docs.microsoft.com/en-us/dotnet/api/system.environment.failfast?view=netcore-2.2) to force just such an unexpected exit. 69 | 70 | After connecting to the sidecar: 71 | 72 | ``` 73 | # ps aux | grep dotnet 74 | root 151 0.1 4.0 11796900 83064 ? SLsl 13:02 0:01 dotnet /app/sample-netcore-app.dll 75 | 76 | # curl http://localhost:8080/api/fail 77 | curl: (52) Empty reply from server 78 | 79 | # ls -al /tmp/coredump* 80 | -rw-r--r-- 1 root root 171085824 Jun 8 13:26 /tmp/coredump.151 81 | ``` 82 | 83 | ## Next Steps 84 | 85 | Now that you have generated a dump check out [this guide](./analyzing.md) for more information on analyzing it. 
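
The on-demand steps above can be wrapped in a small helper so the framework version does not have to be typed by hand. This is a minimal sketch and not part of this repository: the function names `find_createdump` and `dump_on_demand` are hypothetical, and it assumes `createdump` lives one directory below `/usr/share/dotnet/shared/Microsoft.NETCore.App`, as in the transcript above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of the repository.

# Print the path of the first executable createdump found under the shared
# framework directory ($1, defaulting to the location used in this guide).
find_createdump() {
  local root="${1:-/usr/share/dotnet/shared/Microsoft.NETCore.App}"
  local candidate
  for candidate in "$root"/*/createdump; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Run createdump against the pid given in $1, exactly as the manual
# invocation in the On Demand section does.
dump_on_demand() {
  local pid="$1" createdump
  createdump="$(find_createdump)" || {
    echo "createdump not found" >&2
    return 1
  }
  "$createdump" "$pid"
}
```

After sourcing the snippet in the sidecar, `dump_on_demand 832` would be the equivalent of the manual `createdump 832` call shown earlier. If several framework versions are installed, the first match is used, so pass the pid of a process that runs on that version or call `createdump` directly.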
--------------------------------------------------------------------------------
/coredumps/readme.md:
--------------------------------------------------------------------------------
1 | # core dumps
2 |
3 | Taking a core dump allows analysis of the state of the application at the time the dump was taken. This is useful for investigating the number and state of your application threads, viewing the last thrown exceptions, exploring the objects on the heap and more.
4 |
5 | The following guides show how to generate and analyze the core dump of an application running in Kubernetes from a sidecar.
6 |
7 | - [Generating](./generating.md)
8 | - Guide on how to generate a coredump in multiple scenarios.
9 | - [Analyzing](./analyzing.md)
10 | - Information about using lldb to analyze the captured dump.
11 |
12 | ### Alternative Methods
13 |
14 | The above guides depend on being able to install lldb 3.9 in a container. If this is not possible, Microsoft provides a [dotnet dump](https://github.com/dotnet/diagnostics/blob/master/documentation/dotnet-dump-instructions.md) tool that does not rely on a native debugger.
15 |
16 | Note that dotnet dump cannot create dumps of .NET Core applications before 3.0. We still need to use createdump for 2.2 and earlier.
17 |
18 | - [Alpine](./alpine.md)
19 | - Alpine specific notes.
-------------------------------------------------------------------------------- /cpu-profiling/flamegraph.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joe-elliott/netcore-kubernetes-profiling/1c66cb84d8b9398292d46da34d4b2dba2dbe84ef/cpu-profiling/flamegraph.jpg -------------------------------------------------------------------------------- /cpu-profiling/profiling.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Pod 3 | metadata: 4 | name: sample-netcore-app 5 | labels: 6 | app: sample-netcore-app 7 | spec: 8 | shareProcessNamespace: true 9 | containers: 10 | - name: sample-netcore-app 11 | image: joeelliott/sample-netcore-app:v1.0.0-2.2.5 12 | imagePullPolicy: IfNotPresent 13 | env: 14 | - name: COMPlus_PerfMapEnabled 15 | value: "1" 16 | - name: COMPlus_ZapDisable 17 | value: "1" 18 | - name: ASPNETCORE_URLS 19 | value: http://*:8080 20 | volumeMounts: 21 | - mountPath: /tmp 22 | name: tmp 23 | - name: profile-sidecar 24 | image: joeelliott/netcore-debugging-tools:v0.0.7-2.2.5 25 | imagePullPolicy: IfNotPresent 26 | securityContext: 27 | privileged: true 28 | args: 29 | - sleep 30 | - infinity 31 | volumeMounts: 32 | - mountPath: /tmp 33 | name: tmp 34 | volumes: 35 | - name: tmp 36 | emptyDir: {} -------------------------------------------------------------------------------- /cpu-profiling/readme.md: -------------------------------------------------------------------------------- 1 | # cpu-profiling 2 | 3 | This collection of scripts is designed to support cpu profiling of a netcore application running in Kubernetes cluster. The tools are designed to run in a sidecar next to the pod you want to debug. 
4 | 5 | Most information pulled from: 6 | 7 | - [Linux Performance Tracing](https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/linux-performance-tracing.md) 8 | - [Profiling Net Core App Linux](https://codeblog.dotsandbrackets.com/profiling-net-core-app-linux/) 9 | - [Flamegraphs](https://github.com/brendangregg/FlameGraph) 10 | 11 | ## Run your netcore app in K8s 12 | Create your pod with a [debugging sidecar](https://hub.docker.com/r/joeelliott/netcore-debugging-tools). The rest of this guide will use [profiling.yaml](./profiling.yaml) which runs a sidecar next to a simple [sample app](https://github.com/joe-elliott/sample-netcore-app). 13 | 14 | #### Environment Variables 15 | Set the following environment variables for your main process. 16 | 17 | ``` 18 | env: 19 | - name: COMPlus_PerfMapEnabled 20 | value: "1" 21 | - name: COMPlus_ZapDisable 22 | value: "1" 23 | ``` 24 | 25 | `COMPlus_PerfMapEnabled` creates a perf map in `/tmp` that perf can read to symbolicate stack traces. 26 | 27 | `COMPlus_ZapDisable` will force netcore runtime to be JITted. This is normally not desirable, but it will cause the netcore runtime dll symbols to be included in the perf maps. This will allow perf to gather symbols for both the runtime as well as your application. 28 | 29 | There are other ways to do this if you are interested. https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/linux-performance-tracing.md#resolving-framework-symbols 30 | 31 | #### Mount /tmp 32 | By sharing /tmp as an empty directory the debugging sidecar can easily access perf maps created by the netcore application. 33 | 34 | #### shareProcessNamespace 35 | Setting `shareProcessNamespace` to true allows the sidecar to easily access the process you want to debug. 36 | 37 | ## Profile! 38 | 39 | Exec into the sidecar and run `./setup.sh`. The tools we are using are very tightly coupled with the kernel version you want to debug. 
Because of this we can't install all of the tools we need directly in the container. They must be installed once the container is running and the kernel version is known. `./setup.sh` will attempt to install the rest. If you are having issues refer to the notes on [kernel interactions](../kernel-interactions) with the container. 40 | 41 | ``` 42 | kubectl exec -it -c profile-sidecar sample-netcore-app bash 43 | # ./setup.sh 44 | ``` 45 | 46 | Next, discover the pid of the dotnet process you want to profile. You will use it in the examples below. 47 | 48 | ``` 49 | # ps aux | grep dotnet 50 | root 6 0.5 4.2 11940308 87108 ? SLsl 02:46 0:06 dotnet /app/sample-netcore-app.dll 51 | ``` 52 | 53 | #### perf and FlameGraphs 54 | 55 | You can generate an interactive flamegraph svg by running the following: 56 | ``` 57 | perf record -g -p <pid> 58 | 59 | perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > flamegraph.svg 60 | ``` 61 | 62 | Exit the container and copy the svg to your local machine: 63 | ``` 64 | kubectl cp default/sample-netcore-app:flamegraph.svg flamegraph.svg -c profile-sidecar 65 | ``` 66 | 67 | Enjoy your [interactive flamegraph](./flamegraph.svg) 68 | 69 | ![flamegraph](./flamegraph.jpg) 70 | 71 | ## Traps 72 | 73 | Profiling for long periods of time can often generate too much data to be worthwhile. Often you only want to start tracing during certain events when a service is misbehaving. See the [`./trap.sh`](./trap.sh) script for an example. 74 | 75 | This script uses `docker stats` to trigger profiling only when the CPU usage dips below a threshold. This is useful if you have a netcore application that is experiencing thread starvation and causing the service to stall out.
76 | -------------------------------------------------------------------------------- /cpu-profiling/trap.sh: -------------------------------------------------------------------------------- 1 | # 2 | # This is a crappy little script that waits for a container's cpu usage to fall below a threshold and 3 | # then profiles it for one second. You can use something like it but way better to diagnose 4 | # thread starvation issues. 5 | # 6 | count=0 7 | threshold=3 8 | pid=1034 9 | containerid=3883c31f9b55 10 | 11 | while true; do 12 | perc=100 13 | 14 | while [ $perc -gt $threshold ]; do 15 | perc=$(docker stats --format '{{.CPUPerc}}' --no-stream $containerid | sed 's/.$//') 16 | perc=$(printf "%.0f" $perc) 17 | 18 | sleep 1 19 | done 20 | 21 | count=$((count+1)) 22 | 23 | ## Generate flamegraph 24 | perf record -p $pid -g -- sleep 1 25 | perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > $count.svg 26 | done -------------------------------------------------------------------------------- /dynamic-tracing-bcc.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joe-elliott/netcore-kubernetes-profiling/1c66cb84d8b9398292d46da34d4b2dba2dbe84ef/dynamic-tracing-bcc.gif -------------------------------------------------------------------------------- /dynamic-tracing/bcc/readme.md: -------------------------------------------------------------------------------- 1 | # bcc 2 | 3 | This document shows step by step examples on using bcc to dynamically trace [this application](https://github.com/joe-elliott/sample-netcore-app) in your cluster with [the sidecar](https://hub.docker.com/r/joeelliott/netcore-debugging-tools) generated from this repo. 4 | 5 | See [probes](../probes.md) for more information on setup. After you have followed the steps there come back to learn how to use bcc. 6 | 7 | #### netcore-bcc-trace.py 8 | 9 | bcc is mostly amazing. 
It allows for bpf programs to be run when dynamic tracepoints are hit. [netcore-bcc-trace.py](../../images/netcore-bcc-trace.py) is a utility I built to easily trace parameter and return values of functions. 10 | 11 | Tracing `calculateFibonacciValue`: 12 | ``` 13 | root@sample-netcore-app:~# python netcore-bcc-trace.py /app-profile/sample-netcore-app.ni.exe 0x1920 int 14 | dotnet-3438 [001] .... 903863.831439: : val 10 15 | dotnet-3438 [000] .... 903895.395103: : val 20 16 | dotnet-3740 [001] .... 903899.770254: : val 30 17 | ``` 18 | 19 | [netcore-bcc-trace.py](../../images/netcore-bcc-trace.py) dynamically prints out the values passed into the traced method as it is being called. In the above example the application was curled passing in the three values shown. 20 | 21 | Tracing `calculateEchoValue`: 22 | ``` 23 | root@sample-netcore-app:~# python netcore-bcc-trace.py /app-profile/sample-netcore-app.ni.exe 0x1900 str 24 | dotnet-5408 [001] .... 904117.897441: : len 11 : hello world 25 | ``` 26 | In this example the echo endpoint was called passing in "hello world". 27 | 28 | It should be noted that the string tracing does some hacky things to extract a string parameter value and display it as it's being traced. I'm just ditching the first byte of every character and displaying the second. I don't actually know what netcore's internal character encoding is and this just happens to work if all of your characters are 8-bit ASCII. 29 | 30 | #### trace-hist.py 31 | [trace-hist.py](../../images/trace-hist.py) 32 | This basic example traces `calculateFibonacciValue` and draws a histogram of the values that were passed to this function. Eventually I intend on rolling histogram functionality into the above script. 33 | 34 | Because bcc uses bpf to attach arbitrary code to dynamic tracepoints it can do so much more than the above! See some examples here: https://github.com/iovisor/bcc/tree/master/examples. 
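On the string tracing caveat above: .NET strings are sequences of UTF-16 code units, and x86-64 is little-endian, so each ASCII character shows up as its ASCII byte followed by a zero byte. The sketch below is not part of `netcore-bcc-trace.py`; the function names are hypothetical and it only illustrates how raw bytes captured by a probe could be decoded without dropping bytes by hand. The second helper assumes 64-bit values as printed by perf with the `x64` format, such as those shown in the perf guide.

```python
def decode_dotnet_chars(raw: bytes, length: int) -> str:
    """Decode `length` UTF-16 code units from raw little-endian bytes.

    `length` is the string length in characters as read from the probe,
    so only the first 2 * length bytes are meaningful.
    """
    return raw[: 2 * length].decode("utf-16-le", errors="replace")


def decode_perf_chunks(length: int, *chunks: int) -> str:
    """Decode 64-bit integers that each hold four UTF-16 code units.

    perf prints each chunk as one number, so the least significant byte
    (the first character in memory) appears last in the hex text.
    """
    raw = b"".join(chunk.to_bytes(8, "little") for chunk in chunks)
    return decode_dotnet_chars(raw, length)


if __name__ == "__main__":
    # Bytes as they would sit in memory for a .NET string "hello world".
    print(decode_dotnet_chars("hello world".encode("utf-16-le"), 11))
    # Values taken from the perf guide output: len=6, str and str2.
    print(decode_perf_chunks(6, 0x64006300620061, 0x660065))
```

Slicing by `2 * length` before decoding keeps padding bytes after the end of the string out of the result. This also explains why the hex values in the perf guide look reversed: the integer is printed most significant byte first, while memory holds the first character in the least significant byte.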
35 | -------------------------------------------------------------------------------- /dynamic-tracing/dynamic-tracing.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Pod 3 | metadata: 4 | name: sample-netcore-app 5 | labels: 6 | app: sample-netcore-app 7 | spec: 8 | shareProcessNamespace: true 9 | containers: 10 | - env: 11 | - name: COMPlus_PerfMapEnabled 12 | value: "1" 13 | - name: ASPNETCORE_URLS 14 | value: http://*:8080 15 | image: joeelliott/sample-netcore-app:v1.0.0-2.2.5 16 | imagePullPolicy: IfNotPresent 17 | name: sample-netcore-app 18 | command: ["/run-native/runNative.sh"] 19 | args: ["/app/sample-netcore-app.dll"] 20 | volumeMounts: 21 | - mountPath: /run-native 22 | name: run-native-volume 23 | - mountPath: /app-profile 24 | name: app 25 | - mountPath: /tmp 26 | name: tmp 27 | - name: profile-sidecar 28 | image: joeelliott/netcore-debugging-tools:v0.0.10-2.2.5 29 | imagePullPolicy: IfNotPresent 30 | securityContext: 31 | privileged: true 32 | args: 33 | - sleep 34 | - infinity 35 | volumeMounts: 36 | - mountPath: /app-profile 37 | name: app 38 | - mountPath: /tmp 39 | name: tmp 40 | - mountPath: /sys 41 | name: sys 42 | # - mountPath: /usr/src 43 | # name: modules 44 | # readOnly: true 45 | # - mountPath: /lib/modules 46 | # name: headers 47 | # readOnly: true 48 | volumes: 49 | - name: tmp 50 | emptyDir: {} 51 | - configMap: 52 | defaultMode: 0740 53 | name: profile-run-native 54 | name: run-native-volume 55 | - emptyDir: {} 56 | name: app 57 | - hostPath: 58 | path: /sys 59 | type: Directory 60 | name: sys 61 | # - hostPath: 62 | # path: /usr/src 63 | # type: Directory 64 | # name: modules 65 | # - hostPath: 66 | # path: /lib/modules 67 | # type: Directory 68 | # name: headers 69 | --- 70 | apiVersion: v1 71 | kind: ConfigMap 72 | metadata: 73 | name: profile-run-native 74 | data: 75 | runNative.sh: | 76 | #! 
/bin/sh 77 | # 78 | # runNative.sh 79 | # ./runNative /app/app.dll 80 | # 81 | 82 | APP_DLL=$1 83 | APP_DIR=$(dirname "$APP_DLL") 84 | DOTNET_VERSION=$(dotnet --info | grep Version | cut -f2 -d":" | xargs) 85 | DOTNET_FRAMEWORK_PATH=/usr/share/dotnet/shared/Microsoft.NETCore.App/$DOTNET_VERSION 86 | #todo: dynamically generate this with dotnet --list-runtimes 87 | ADDITIONAL_PATHS=/usr/share/dotnet/shared/Microsoft.AspNetCore.All/$DOTNET_VERSION:/usr/share/dotnet/shared/Microsoft.AspNetCore.App/$DOTNET_VERSION 88 | 89 | # using the shell name to guess the runtime id. can't find a better way to do this 90 | # bash => linux-x64 91 | # ash => linux-musl-x64 92 | if [ -f /bin/bash ]; then 93 | RUNTIME_ID=linux-x64 94 | else 95 | RUNTIME_ID=linux-musl-x64 96 | fi 97 | 98 | # get the netcore runtime package (it contains crossgen) 99 | echo -- Grabbing netcore sdk $DOTNET_VERSION for runtime $RUNTIME_ID 100 | 101 | # alpine containers have wget. standard have curl 102 | if which curl; then 103 | curl -L -o runtime.zip https://www.nuget.org/api/v2/package/runtime.$RUNTIME_ID.Microsoft.NETCore.App/$DOTNET_VERSION 104 | elif which wget; then 105 | wget -O runtime.zip https://www.nuget.org/api/v2/package/runtime.$RUNTIME_ID.Microsoft.NETCore.App/$DOTNET_VERSION 106 | else 107 | echo "Unable to pull runtime" 108 | exit 1 109 | fi 110 | 111 | # install unzip if necessary 112 | # alpine containers have unzip. others don't. use apt-get to bring it in. 113 | which unzip || { apt-get update && apt-get install unzip -y; } 114 | 115 | mkdir -p ./runtime 116 | unzip runtime.zip -d ./runtime 117 | cp ./runtime/tools/crossgen . 118 | chmod 744 ./crossgen 119 | rm -rf ./runtime 120 | 121 | # find libclrjit.so 122 | if [ -f $APP_DIR/libclrjit.so ]; then 123 | JIT_PATH=$APP_DIR/libclrjit.so 124 | elif [ -f $DOTNET_FRAMEWORK_PATH/libclrjit.so ]; then 125 | JIT_PATH=$DOTNET_FRAMEWORK_PATH/libclrjit.so 126 | else 127 | # look in other places? use find?
128 | echo "Unable to find libclrjit.so" 129 | exit 1 130 | fi 131 | 132 | # generate native image and perf map 133 | APP_BASE_NAME=${APP_DLL%.*} 134 | APP_NATIVE_IMAGE=$APP_BASE_NAME.ni.exe 135 | ./crossgen /JITPath $JIT_PATH \ 136 | /Platform_Assemblies_Paths $DOTNET_FRAMEWORK_PATH:$APP_DIR:$ADDITIONAL_PATHS \ 137 | $APP_DLL 138 | ./crossgen /Platform_Assemblies_Paths $DOTNET_FRAMEWORK_PATH:$APP_DIR:$ADDITIONAL_PATHS \ 139 | /CreatePerfMap /tmp \ 140 | $APP_NATIVE_IMAGE 141 | 142 | cp $APP_BASE_NAME.deps.json $APP_BASE_NAME.ni.deps.json 143 | cp $APP_BASE_NAME.runtimeconfig.json $APP_BASE_NAME.ni.runtimeconfig.json 144 | 145 | #cp to /app-profile 146 | cp -r $APP_DIR/* /app-profile 147 | 148 | # run native image 149 | dotnet /app-profile/$(basename $APP_NATIVE_IMAGE) 150 | -------------------------------------------------------------------------------- /dynamic-tracing/kubernetes.md: -------------------------------------------------------------------------------- 1 | # dynamic-tracing-kubernetes 2 | 3 | This guide will take you through setting up a pod to be dynamically traced in Kubernetes using a sidecar. See [dynamic-tracing.yaml](./dynamic-tracing.yaml) for the end product. It executes a fairly intimidating configmap at startup to generate a native image. See [the overview](./overview.md) for a description of what it does. 4 | 5 | #### Environment Variables 6 | Set the following environment variables for your main process. 7 | 8 | ``` 9 | env: 10 | - name: COMPlus_PerfMapEnabled 11 | value: "1" 12 | ``` 13 | 14 | `COMPlus_PerfMapEnabled` creates a perf map in `/tmp` that perf can read to symbolicate stack traces. 15 | 16 | #### Liveness Probe 17 | If you have a liveness probe you will want to increase the `initialDelaySeconds`. The startup script can take awhile to run. 
18 | 19 | ``` 20 | livenessProbe: 21 | initialDelaySeconds: 600 22 | ``` 23 | 24 | #### shareProcessNamespace 25 | Setting `shareProcessNamespace` to true allows the sidecar to easily access the process you want to debug. 26 | 27 | #### Use runNative on startup 28 | Adjust the command to run [runNative.sh](./runNative.sh) instead of the actual app. [runNative.sh](./runNative.sh) will generate a native image of the app dll and then run the native image. See [the overview](./overview.md) for a description of what it does. 29 | 30 | ``` 31 | command: ["/run-native/runNative.sh"] 32 | args: ["/app/app.dll"] 33 | ``` 34 | 35 | ``` 36 | - name: run-native-volume 37 | mountPath: /run-native 38 | ... 39 | - name: run-native-volume 40 | configMap: 41 | defaultMode: 0740 42 | name: run-native 43 | ``` 44 | 45 | #### Mount shared folders 46 | Both `/tmp` and `/app-profile` need to be mounted as `emptyDir` volumes and shared between your sidecar and your main container to allow for dynamic tracing. 47 | 48 | ``` 49 | - name: tmp 50 | emptyDir: {} 51 | - name: app 52 | emptyDir: {} 53 | ... 54 | - name: app 55 | mountPath: /app-profile 56 | - name: tmp 57 | mountPath: /tmp 58 | ``` 59 | 60 | Additionally, mount `/sys` from the host for bcc. 61 | 62 | ``` 63 | - name: sys 64 | mountPath: /sys 65 | ... 66 | - name: sys 67 | hostPath: 68 | path: /sys 69 | type: Directory 70 | ``` 71 | 72 | #### Set privileged 73 | The sidecar executes kernel functions that require elevated privileges. 74 | 75 | ``` 76 | securityContext: 77 | privileged: true 78 | ``` 79 | 80 | #### Mount host headers 81 | 82 | The `setup.sh` script will attempt to pull Linux headers when it is run, but in a lot of cases it will fail to find the correct Linux headers in the repos configured in the sidecar container. The following lines appear commented out in the `dynamic-tracing.yaml` and have been confirmed to work in GKE with an Ubuntu node.
If you're having trouble with the setup.sh script pulling Linux headers try this approach. 83 | 84 | ``` 85 | - mountPath: /usr/src 86 | name: modules 87 | readOnly: true 88 | - mountPath: /lib/modules 89 | name: headers 90 | readOnly: true 91 | ... 92 | - hostPath: 93 | path: /usr/src 94 | type: Directory 95 | name: modules 96 | - hostPath: 97 | path: /lib/modules 98 | type: Directory 99 | name: headers 100 | ``` 101 | 102 | ## Next Steps 103 | See [probes](./probes.md) for more information about the kinds of probes you can place. This document has examples of using both perf and bcc to place dynamic probes on [this app](https://github.com/joe-elliott/sample-netcore-app) -------------------------------------------------------------------------------- /dynamic-tracing/mapgen.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # 3 | # USAGE: dotnet-mapgen [-h] {generate,merge} PID 4 | # 5 | 6 | import argparse 7 | import glob 8 | import os 9 | import shutil 10 | import subprocess 11 | import tempfile 12 | 13 | def bail(error): 14 | print("ERROR: " + error) 15 | exit(1) 16 | 17 | def get_assembly_list(pid): 18 | assemblies = [] 19 | try: 20 | with open("/tmp/perfinfo-%d.map" % pid) as f: 21 | for line in f: 22 | parts = line.split(';') 23 | if len(parts) < 2 or parts[0] != "ImageLoad": 24 | continue 25 | assemblies.append(parts[1]) 26 | except IOError: 27 | bail("error opening /tmp/perfinfo-%d.map file" % pid) 28 | return assemblies 29 | 30 | def get_base_address(pid, assembly): 31 | hexaddr = subprocess.check_output( 32 | "cat /proc/%d/maps | grep %s | head -1 | cut -d '-' -f 1" % 33 | (pid, assembly), shell=True) 34 | if hexaddr == '': 35 | return -1 36 | return int(hexaddr, 16) 37 | 38 | def append_perf_map(assembly, asm_map, pid): 39 | base_address = get_base_address(pid, assembly) 40 | lines_to_add = "" 41 | with open(asm_map) as f: 42 | for line in f: 43 | parts = line.split() 44 | offset, size, 
symbol = parts[0], parts[1], str.join(" ", parts[2:]) 45 | offset = int(offset, 16) + base_address 46 | lines_to_add += "%016x %s %s\n" % (offset, size, symbol) 47 | with open("/tmp/perf-%d.map" % pid, "a") as perfmap: 48 | perfmap.write(lines_to_add) 49 | 50 | def merge(pid): 51 | assemblies = get_assembly_list(pid) 52 | succeeded, failed = (0, 0) 53 | for assembly in assemblies: 54 | # TODO The generated map files have a GUID embedded in them, which 55 | # allows multiple versions to coexist (probably). How do we get 56 | # this GUID? E.g.: 57 | # System.Runtime.ni.{819d412e-d773-4dbb-8d01-20d412b6cf09}.map 58 | # jpe - removed ni?: /tmp/%s.ni.{*}.map 59 | matches = glob.glob("/tmp/%s.{*}.map" % 60 | os.path.splitext(os.path.basename(assembly))[0]) 61 | if len(matches) == 0: 62 | failed += 1 63 | else: 64 | append_perf_map(assembly, matches[0], pid) 65 | succeeded += 1 66 | print("perfmap merging: %d succeeded, %d failed" % (succeeded, failed)) 67 | 68 | parser = argparse.ArgumentParser(description= 69 | "Generates map files for crossgen-compiled assemblies, and merges them " + 70 | "into the main perf map file. Built for use with .NET Core on Linux.") 71 | parser.add_argument("pid", type=int, help="the dotnet process id") 72 | args = parser.parse_args() 73 | 74 | merge(args.pid) -------------------------------------------------------------------------------- /dynamic-tracing/overview.md: -------------------------------------------------------------------------------- 1 | # dynamic-tracing 2 | 3 | Information about dynamically tracing netcore applications is sparse and sometimes incorrect. There is definitely still work to be done, but the below steps are a very good start. 
Most of the information contained in this document was pulled from: 4 | 5 | - [Using CrossGen to Create Native Images](https://github.com/dotnet/coreclr/blob/master/Documentation/building/crossgen.md) 6 | - [Dynamic Tracing of .NET Core Methods](https://blogs.microsoft.co.il/sasha/2018/02/08/dynamic-tracing-of-net-core-methods/) 7 | - Currently this link 404s, but leaving it here to document the work this is based on. Hopefully, we will see it again one day. 8 | - [perf Examples](http://www.brendangregg.com/perf.html) 9 | - [proc maps](https://stackoverflow.com/questions/1401359/understanding-linux-proc-id-maps) 10 | 11 | The below notes review generally how to dynamically trace a netcore application. See [this guide](./kubernetes.md) for a drop in method of dynamically tracing in Kubernetes from a sidecar. 12 | 13 | #### make a dotnet thing 14 | ``` 15 | ./dotnet new console 16 | ./dotnet publish . -o ./bin --self-contained --runtime linux-x64 17 | ``` 18 | 19 | #### generate native images using crossgen 20 | Crossgen is available in the appropriate runtime netcore nuget package. For instance if you have a 2.2.2 netcore app running on the `linux-musl-x64` runtime then you would download the following package. Unzip the package and look in the `./tools` directory to find crossgen. 21 | 22 | https://www.nuget.org/packages/runtime.linux-musl-x64.Microsoft.NETCore.App/2.2.2 23 | 24 | After you get a hold of the appropriate crossgen run the following commands. The first command generates the native image. The second generates a map file that we will use to determine the address at which to place a probe. 25 | 26 | ``` 27 | ./crossgen /JITPath bin/libclrjit.so /Platform_Assemblies_Paths bin bin/app.dll 28 | ./crossgen /Platform_Assemblies_Paths bin /CreatePerfMap /tmp bin/app.ni.exe 29 | ``` 30 | 31 | Do the above for every dll you want to place probes on. Presumably you can place probes on other dlls, but so far I have only done this with the primary dll or exe. 
32 | 33 | #### Find the address to trace 34 | 35 | To place a probe we have to find the offset into the native image. You will use the native image perf maps in `/tmp` and the process memory map located at `/proc/<pid>/maps`. 36 | 37 | Get the process id. 38 | ``` 39 | root@sample-netcore-app:~# ps aux | grep dotnet 40 | root 112 0.0 3.8 11804944 77704 ? SLl 11:14 0:03 dotnet /app-profile/sample-netcore-app.ni.exe 41 | ``` 42 | 43 | Note the location of the method you want to trace. 44 | ``` 45 | root@sample-netcore-app:~# cat /tmp/sample-netcore-app.ni.\{e46f1077-89cb-4add-94fd-a6ae91a035fc\}.map | grep calculateEcho 46 | 0000000000021900 9 instance string [sample-netcore-app] sample_netcore_app.Providers.EchoProvider::calculateEchoValue(string) 47 | ``` 48 | 49 | Note the mmap'ed sections of the native image in the process memory map. 50 | ``` 51 | root@sample-netcore-app:~# cat /proc/112/maps | grep sample 52 | 7ff03c8a0000-7ff03c8a1000 r--p 00000000 08:01 8410243 /app-profile/sample-netcore-app.ni.exe 53 | 7ff03c8b0000-7ff03c8b1000 rw-p 00000000 08:01 8410243 /app-profile/sample-netcore-app.ni.exe 54 | 7ff03c8c0000-7ff03c8c3000 r-xp 00000000 08:01 8410243 /app-profile/sample-netcore-app.ni.exe 55 | 7ff03c8d2000-7ff03c8d3000 r--p 00002000 08:01 8410243 /app-profile/sample-netcore-app.ni.exe 56 | ``` 57 | 58 | Choose the appropriate section from the above four. The correct section is both executable and contains the address we discovered above (0x21900 in our case). 59 | 60 | - The first section is not executable and contains offsets 0x00000->0x10000. 61 | - The second section is not executable and contains offsets 0x10000->0x20000. 62 | - The third section is executable and contains offsets 0x20000->0x30000. This is the correct section! 63 | - The fourth section is not executable and contains offsets 0x30000->0x40000 (See Notes below).
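As a rough illustration, the section choice above and the offset arithmetic that follows can be sketched in Python. This is not the logic of `calc-offsets.py`; the names `Mapping`, `parse_maps` and `probe_offset` are hypothetical, and the sketch assumes the `/proc/<pid>/maps` line layout shown above (address range, permissions, file offset, device, inode, path). Unlike the rounded ranges in the list, it tests the method address against each section's actual mapped range.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    start: int        # start address of the mapped section
    end: int          # end address of the mapped section (exclusive)
    perms: str        # permission string, e.g. "r-xp"
    file_offset: int  # third column of the maps line


def parse_maps(lines: List[str], image_name: str) -> List[Mapping]:
    """Keep only the maps lines whose path ends with the native image name."""
    mappings = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or not fields[5].endswith(image_name):
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        mappings.append(Mapping(start, end, fields[1], int(fields[2], 16)))
    return mappings


def probe_offset(mappings: List[Mapping], method_address: int) -> int:
    """Pick the executable section holding the method and compute the offset."""
    first_start = min(m.start for m in mappings)
    for m in mappings:
        relative_start = m.start - first_start
        relative_end = m.end - first_start
        if "x" in m.perms and relative_start <= method_address < relative_end:
            # MethodAddress - (ExeSectionStart - FirstSectionStart) + SectionOffset
            return method_address - relative_start + m.file_offset
    raise ValueError("no executable section contains the method address")


if __name__ == "__main__":
    maps_lines = [
        "7ff03c8a0000-7ff03c8a1000 r--p 00000000 08:01 8410243 /app-profile/sample-netcore-app.ni.exe",
        "7ff03c8b0000-7ff03c8b1000 rw-p 00000000 08:01 8410243 /app-profile/sample-netcore-app.ni.exe",
        "7ff03c8c0000-7ff03c8c3000 r-xp 00000000 08:01 8410243 /app-profile/sample-netcore-app.ni.exe",
        "7ff03c8d2000-7ff03c8d3000 r--p 00002000 08:01 8410243 /app-profile/sample-netcore-app.ni.exe",
    ]
    sections = parse_maps(maps_lines, "sample-netcore-app.ni.exe")
    print(hex(probe_offset(sections, 0x21900)))  # prints 0x1900
```

With the sample values from this guide the sketch selects the third section and yields `0x1900`, which matches the offset used when placing the probe.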
64 | 65 | Once you have all of the above values you can calculate the offset using the following formula: 66 | ``` 67 | MethodAddress - (ExeSectionStartAddress - FirstSectionStartAddress) + SectionOffset 68 | ``` 69 | In our case 70 | ``` 71 | 0x21900 - (0x7ff03c8c0000 - 0x7ff03c8a0000) + 0x0000 = 0x1900 72 | ``` 73 | 74 | *Note*: I'm honestly not sure how the SectionOffset works into the above calculations. The third column is an offset into the file (SectionOffset) that was passed when mmap was called. I have never seen a non-zero value land on the executable section, so I have not been able to test how it affects the offset calculation for dynamic tracing. [calc-offsets.py](../images/calc-offsets.py) uses the original calculations provided by Sasha Goldshtein. 75 | 76 | #### Add a probe, trace it, view it and remove it 77 | 78 | ``` 79 | perf probe -x ./bin/app.ni.exe --add 0x1900 80 | perf record -e probe_app:* -ag -- sleep 10 81 | perf script 82 | perf probe --del=* 83 | ``` 84 | 85 | #### Next Steps 86 | 87 | See [probes](./probes.md) for more information about the kinds of probes you can place. This document has examples of using both perf and bcc to place dynamic probes on [this app](https://github.com/joe-elliott/sample-netcore-app) 88 | 89 | #### calc-offsets.py 90 | 91 | The helper script [calc-offsets.py](../images/calc-offsets.py) is provided to perform the above calculations automatically.
92 | 93 | ``` 94 | root@sample-netcore-app:~# python calc-offsets.py 112 sample-netcore-app.ni.exe 95 | offset: 17e0 : void [sample-netcore-app] sample_netcore_app.Program::Main(string[]) 96 | offset: 1820 : class [Microsoft.AspNetCore.Hosting.Abstractions]Microsoft.AspNetCore.Hosting.IWebHostBuilder [sample-netcore-app] sample_netcore_app.Program::CreateWebHostBuilder(string[]) 97 | offset: 1840 : instance void [sample-netcore-app] sample_netcore_app.Program::.ctor() 98 | offset: 1850 : instance void [sample-netcore-app] sample_netcore_app.Startup::.ctor(class [Microsoft.Extensions.Configuration.Abstractions]Microsoft.Extensions.Configuration.IConfiguration) 99 | offset: 1870 : instance class [Microsoft.Extensions.Configuration.Abstractions]Microsoft.Extensions.Configuration.IConfiguration [sample-netcore-app] sample_netcore_app.Startup::get_Configuration() 100 | offset: 1880 : instance void [sample-netcore-app] sample_netcore_app.Startup::ConfigureServices(class [Microsoft.Extensions.DependencyInjection.Abstractions]Microsoft.Extensions.DependencyInjection.IServiceCollection) 101 | offset: 18d0 : instance void [sample-netcore-app] sample_netcore_app.Startup::Configure(class [Microsoft.AspNetCore.Http.Abstractions]Microsoft.AspNetCore.Builder.IApplicationBuilder,class [Microsoft.AspNetCore.Hosting.Abstractions]Microsoft.AspNetCore.Hosting.IHostingEnvironment) 102 | offset: 18f0 : instance void [sample-netcore-app] sample_netcore_app.Providers.EchoProvider::.ctor() 103 | offset: 1900 : instance string [sample-netcore-app] sample_netcore_app.Providers.EchoProvider::calculateEchoValue(string) 104 | offset: 1910 : instance void [sample-netcore-app] sample_netcore_app.Providers.FibonacciProvider::.ctor() 105 | offset: 1920 : instance int32 [sample-netcore-app] sample_netcore_app.Providers.FibonacciProvider::calculateFibonacciValue(int32) 106 | offset: 1950 : instance int32 [sample-netcore-app] 
sample_netcore_app.Providers.FibonacciProvider::calculateFibonacciValueRecursive(int32,int32,int32,int32) 107 | offset: 1980 : instance void [sample-netcore-app] sample_netcore_app.Controllers.EchoController::.ctor(class sample_netcore_app.Providers.IEchoProvider) 108 | offset: 19c0 : instance class [Microsoft.AspNetCore.Mvc.Core]Microsoft.AspNetCore.Mvc.ActionResult`1 [sample-netcore-app] sample_netcore_app.Controllers.EchoController::Get(string) 109 | offset: 1a00 : instance void [sample-netcore-app] sample_netcore_app.Controllers.FibonacciController::.ctor(class sample_netcore_app.Providers.IFibonacciProvider) 110 | offset: 1a40 : instance class [Microsoft.AspNetCore.Mvc.Core]Microsoft.AspNetCore.Mvc.ActionResult`1 [sample-netcore-app] sample_netcore_app.Controllers.FibonacciController::Get(int32) 111 | ``` -------------------------------------------------------------------------------- /dynamic-tracing/perf/readme.md: -------------------------------------------------------------------------------- 1 | # perf 2 | 3 | This document shows step by step examples on using perf to dynamically trace [this application](https://github.com/joe-elliott/sample-netcore-app) in your cluster with [the sidecar](https://hub.docker.com/r/joeelliott/netcore-debugging-tools) generated from this repo. 4 | 5 | See [probes](../probes.md) for more information on setup. After you have followed the steps there come back to learn how to use perf. 
6 | 7 | #### Add the probe 8 | ``` 9 | # perf probe -x /app-profile/sample-netcore-app.ni.exe --add '0x1920' 10 | ``` 11 | 12 | #### Record 13 | ``` 14 | # perf record -e probe_sample:* -ag -- sleep 10 15 | ``` 16 | 17 | #### Exercise 18 | ``` 19 | $ curl http://sample-netcore-app/api/fibonacci?pos=3 20 | 3 21 | ``` 22 | 23 | #### Dump Results 24 | ``` 25 | # perf script 26 | Failed to open /app-profile/sample-netcore-app.ni.exe, continuing without symbols 27 | Failed to open /usr/share/dotnet/shared/Microsoft.AspNetCore.App/2.2.5/Microsoft.AspNetCore.Mvc.Core.dll, continuing without symbols 28 | Failed to open /usr/share/dotnet/shared/Microsoft.AspNetCore.App/2.2.5/Microsoft.AspNetCore.Routing.dll, continuing without symbols 29 | Failed to open /usr/share/dotnet/shared/Microsoft.AspNetCore.App/2.2.5/Microsoft.AspNetCore.HostFiltering.dll, continuing without symbols 30 | Failed to open /usr/share/dotnet/shared/Microsoft.AspNetCore.App/2.2.5/Microsoft.AspNetCore.Hosting.dll, continuing without symbols 31 | Failed to open /usr/share/dotnet/shared/Microsoft.NETCore.App/2.2.5/System.Private.CoreLib.dll, continuing without symbols 32 | Failed to open /lib/x86_64-linux-gnu/libpthread-2.24.so, continuing without symbols 33 | dotnet 29638 [000] 930393.538484: probe_sample:abs_1920: (7f8ccc5a1920) 34 | 1920 [unknown] (/app-profile/sample-netcore-app.ni.exe) 35 | 123c93 [unknown] (/usr/share/dotnet/shared/Microsoft.AspNetCore.App/2.2.5/Microsoft.AspNetCore.Mvc.Core.dll) 36 | 1331c2 [unknown] (/usr/share/dotnet/shared/Microsoft.AspNetCore.App/2.2.5/Microsoft.AspNetCore.Mvc.Core.dll) 37 | 7f8cd21d0bbb void [System.Private.CoreLib] System.Runtime.CompilerServices.AsyncMethodBuilderCore::Start(!!0&)+0x3b (/tmp/perf-247.map) 38 | 7f8cd21d0b49 instance class [netstandard]System.Threading.Tasks.Task [Microsoft.AspNetCore.Mvc.Core] Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker::Invoke 39 | f11f2 [unknown] 
(/usr/share/dotnet/shared/Microsoft.AspNetCore.App/2.2.5/Microsoft.AspNetCore.Mvc.Core.dll) 40 | 132c64 [unknown] (/usr/share/dotnet/shared/Microsoft.AspNetCore.App/2.2.5/Microsoft.AspNetCore.Mvc.Core.dll) 41 | ... 42 | ``` 43 | The call stack is currently quite bad. We might be able to improve this by running crossgen on dependent dlls and using mapgen to merge them into the perfmap. There is definitely [work to be done](../../todo) in this area. 44 | 45 | #### Integer Parameters 46 | Parameters can be recorded by understanding which registers are used to pass various parameter types. See System V AMD64 ABI in https://en.wikipedia.org/wiki/X86_calling_conventions. 47 | 48 | Also, even though it's for kprobes, this (https://www.kernel.org/doc/Documentation/trace/kprobetrace.txt) is the best document I can find which shows how to request and format registers and memory locations. 49 | 50 | ``` 51 | # perf probe -x /app-profile/sample-netcore-app.ni.exe --add '0x1920 pos=%si:s32' 52 | ``` 53 | 54 | ``` 55 | $ curl http://sample-netcore-app/api/fibonacci?pos=3 56 | 3 57 | ``` 58 | 59 | Note the named parameter "pos" is formatted as a signed 32 bit integer: 60 | ``` 61 | # perf script 62 | Failed to open /app-profile/sample-netcore-app.ni.exe, continuing without symbols 63 | dotnet 22154 [000] 1762703.370019: probe_sample:abs_1920: (7f784eea1920) pos=3 64 | ``` 65 | 66 | ### Return Values (uretprobes) 67 | 68 | ``` 69 | # perf probe -x /app-profile/sample-netcore-app.ni.exe --add '0x1920%return ret=$retval:s32' 70 | ``` 71 | 72 | ``` 73 | $ curl http://sample-netcore-app/api/fibonacci?pos=10 74 | 89 75 | ``` 76 | 77 | Note that the return value of `89` is successfully recorded and displayed: 78 | ``` 79 | # perf script 80 | dotnet 22346 [000] 1762828.667743: probe_sample:abs_1920: (7f784eea1920 <- 7f78528413d5) ret=89 81 | ``` 82 | 83 | ### String Parameters 84 | 85 | Through trial and error I have found that the netcore string's length is a 32 bit int stored 8 bytes 
offset from the string pointer. Note that we are using `(%si)` to dereference the value in RSI because this is a string type. We are also pulling 128 bits of the string itself in two 64 bit chunks. 86 | 87 | Note that perf supports a string type directly. However, this requires a null terminated string. 88 | 89 | ``` 90 | perf probe -x /app-profile/sample-netcore-app.ni.exe --add '0x1900 len=+8(%si):u32 str=+12(%si):x64 str2=+20(%si)' 91 | ``` 92 | 93 | Exercise 94 | ``` 95 | $ curl http://sample-netcore-app/api/echo?echo=abc 96 | abc 97 | $ curl http://sample-netcore-app/api/echo?echo=abcdef 98 | abcdef 99 | $ curl http://sample-netcore-app/api/echo?echo=abcdefghi 100 | abcdefghi 101 | $ curl http://sample-netcore-app/api/echo?echo=abcdefghijkl 102 | abcdefghijkl 103 | $ curl http://sample-netcore-app/api/echo?echo=abcdefghijklmno 104 | abcdefghijklmno 105 | ``` 106 | 107 | Note that `str` and `str2`'s bytes are actually in reverse order. I am unsure why this is. 108 | ``` 109 | # perf script 110 | Failed to open /app-profile/sample-netcore-app.ni.exe, continuing without symbols 111 | dotnet 24162 [000] 33591.658838: probe_sample:abs_1900: (7fc028bd1900) len=3 str=0x6300620061 str2=0x0 112 | dotnet 24162 [000] 33593.727911: probe_sample:abs_1900: (7fc028bd1900) len=6 str=0x64006300620061 str2=0x660065 113 | dotnet 24162 [000] 33595.926689: probe_sample:abs_1900: (7fc028bd1900) len=9 str=0x64006300620061 str2=0x68006700660065 114 | dotnet 24162 [000] 33598.230186: probe_sample:abs_1900: (7fc028bd1900) len=12 str=0x64006300620061 str2=0x68006700660065 115 | dotnet 24162 [000] 33600.630045: probe_sample:abs_1900: (7fc028bd1900) len=15 str=0x64006300620061 str2=0x68006700660065 116 | ``` -------------------------------------------------------------------------------- /dynamic-tracing/probes.md: -------------------------------------------------------------------------------- 1 | # probes 2 | 3 | This document shows step by step examples on using both perf and bcc to 
dynamically trace [this application](https://github.com/joe-elliott/sample-netcore-app) in your cluster with [the sidecar](https://hub.docker.com/r/joeelliott/netcore-debugging-tools) generated from this repo. 4 | 5 | #### Run Application 6 | 7 | Create [dynamic-tracing.yaml](./dynamic-tracing.yaml) Kubernetes specs in your cluster. Exec into the sidecar and run `./setup.sh`. The tools we are using are very tightly coupled with the kernel version you want to debug. Because of this we can't install all of the tools we need directly in the container. They must be installed once the container is running and the kernel version is known. `./setup.sh` will attempt to install the rest. If you are having issues refer to the notes on [kernel interactions](../kernel-interactions) with the container. 8 | 9 | ``` 10 | kubectl exec -it -c profile-sidecar sample-netcore-app bash 11 | # ./setup.sh 12 | ``` 13 | 14 | ~Use [mapgen.py](./mapgen.py) to merge the native image perf map with the standard perf map. Adapted from this [script](https://gist.github.com/goldshtn/fe3f7c3b10ec7e5511ae755abaf52172).~ At this point I mostly think that mapgen doesn't work. There is a lot of work still [to be done](../todo) on building good stack traces in perf while dynamic tracing. 15 | 16 | #### Dump offsets 17 | 18 | Use [calc-offsets.py](../images/calc-offsets.py) to see method offsets for use in probing. Record these offsets for both the perf and bcc guides below. 
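The arithmetic that `calc-offsets.py` applies to each perf map entry can be sketched in isolation. This is only an illustration: `file_offset` is a hypothetical helper name, the mapping addresses are made-up values rather than a real `/proc/<pid>/maps` entry, and the perf map address is assumed to be relative to the start of the image's first mapping, as the script treats it.

```python
def file_offset(address, first_start, exec_start, exec_file_offset):
    """Translate a perf map address into a file offset usable by a uprobe.

    address          -- method address from the crossgen perf map
                        (assumed relative to the image's first mapping)
    first_start      -- start address of the image's first mapping
    exec_start       -- start address of the executable (r-xp) mapping
    exec_file_offset -- file offset of the executable mapping
    """
    # distance of the executable mapping from the start of the image
    offset_from_first = exec_start - first_start
    # same expression as final_address in calc-offsets.py
    return address - offset_from_first + exec_file_offset


# Hypothetical mapping: the executable section starts 0x1000 bytes into the
# image and is also stored 0x1000 bytes into the file.
print("offset: %x" % file_offset(0x1920, 0x7f784eea0000, 0x7f784eea1000, 0x1000))
```

When the executable mapping sits at the same distance in memory as in the file, the perf map address and the probe offset are identical. The translation only matters when the two differ.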
19 | 20 | ``` 21 | root@sample-netcore-app:~# python calc-offsets.py 112 sample-netcore-app.ni.exe 22 | offset: 17e0 : void [sample-netcore-app] sample_netcore_app.Program::Main(string[]) 23 | offset: 1820 : class [Microsoft.AspNetCore.Hosting.Abstractions]Microsoft.AspNetCore.Hosting.IWebHostBuilder [sample-netcore-app] sample_netcore_app.Program::CreateWebHostBuilder(string[]) 24 | offset: 1840 : instance void [sample-netcore-app] sample_netcore_app.Program::.ctor() 25 | offset: 1850 : instance void [sample-netcore-app] sample_netcore_app.Startup::.ctor(class [Microsoft.Extensions.Configuration.Abstractions]Microsoft.Extensions.Configuration.IConfiguration) 26 | offset: 1870 : instance class [Microsoft.Extensions.Configuration.Abstractions]Microsoft.Extensions.Configuration.IConfiguration [sample-netcore-app] sample_netcore_app.Startup::get_Configuration() 27 | offset: 1880 : instance void [sample-netcore-app] sample_netcore_app.Startup::ConfigureServices(class [Microsoft.Extensions.DependencyInjection.Abstractions]Microsoft.Extensions.DependencyInjection.IServiceCollection) 28 | offset: 18d0 : instance void [sample-netcore-app] sample_netcore_app.Startup::Configure(class [Microsoft.AspNetCore.Http.Abstractions]Microsoft.AspNetCore.Builder.IApplicationBuilder,class [Microsoft.AspNetCore.Hosting.Abstractions]Microsoft.AspNetCore.Hosting.IHostingEnvironment) 29 | offset: 18f0 : instance void [sample-netcore-app] sample_netcore_app.Providers.EchoProvider::.ctor() 30 | offset: 1900 : instance string [sample-netcore-app] sample_netcore_app.Providers.EchoProvider::calculateEchoValue(string) 31 | offset: 1910 : instance void [sample-netcore-app] sample_netcore_app.Providers.FibonacciProvider::.ctor() 32 | offset: 1920 : instance int32 [sample-netcore-app] sample_netcore_app.Providers.FibonacciProvider::calculateFibonacciValue(int32) 33 | offset: 1950 : instance int32 [sample-netcore-app] 
sample_netcore_app.Providers.FibonacciProvider::calculateFibonacciValueRecursive(int32,int32,int32,int32) 34 | offset: 1980 : instance void [sample-netcore-app] sample_netcore_app.Controllers.EchoController::.ctor(class sample_netcore_app.Providers.IEchoProvider) 35 | offset: 19c0 : instance class [Microsoft.AspNetCore.Mvc.Core]Microsoft.AspNetCore.Mvc.ActionResult`1 [sample-netcore-app] sample_netcore_app.Controllers.EchoController::Get(string) 36 | offset: 1a00 : instance void [sample-netcore-app] sample_netcore_app.Controllers.FibonacciController::.ctor(class sample_netcore_app.Providers.IFibonacciProvider) 37 | offset: 1a40 : instance class [Microsoft.AspNetCore.Mvc.Core]Microsoft.AspNetCore.Mvc.ActionResult`1 [sample-netcore-app] sample_netcore_app.Controllers.FibonacciController::Get(int32) 38 | ``` 39 | 40 | ## Examples 41 | After calculating the appropriate offset dynamic tracing can be accomplished with a number of tools. 42 | 43 | - [perf](./perf) 44 | - [bcc](./bcc) 45 | 46 | In both cases we will be dumping registers in order to inspect method parameters. See System V AMD64 ABI in https://en.wikipedia.org/wiki/X86_calling_conventions. 47 | -------------------------------------------------------------------------------- /dynamic-tracing/readme.md: -------------------------------------------------------------------------------- 1 | # dynamic-tracing 2 | 3 | Dynamic tracing allows instrumentation of code without recompiling. This includes the ability to not only record when specific methods are being called but also dump parameters or return values from methods. It can give you incredible insight into the behavior of a live application without any changes to the codebase. 4 | 5 | The following guides show how to generally perform dynamic tracing with netcore as well as specific details of how to trace an application running in Kubernetes from a sidecar. 6 | 7 | - [Overview](./overview.md) 8 | - General guide on how to perform dynamic tracing with netcore. 
9 | - [In Kubernetes](./kubernetes.md) 10 | - Specialized scripts and techniques for dynamic tracing netcore apps in Kubernetes. 11 | - [Probes](./probes.md) 12 | - Different kinds of probes that can be placed once the address of a function is determined. If you want to skip the details and get right to the live examples click here! -------------------------------------------------------------------------------- /dynamic-tracing/runNative.sh: -------------------------------------------------------------------------------- 1 | #! /bin/sh 2 | # 3 | # runNative.sh 4 | # ./runNative.sh /app/app.dll 5 | # 6 | 7 | APP_DLL=$1 8 | APP_DIR=$(dirname "$APP_DLL") 9 | DOTNET_VERSION=$(dotnet --info | grep Version | cut -f2 -d":" | xargs) 10 | DOTNET_FRAMEWORK_PATH=/usr/share/dotnet/shared/Microsoft.NETCore.App/$DOTNET_VERSION 11 | #todo: dynamically generate this with dotnet --list-runtimes 12 | ADDITIONAL_PATHS=/usr/share/dotnet/shared/Microsoft.AspNetCore.All/$DOTNET_VERSION:/usr/share/dotnet/shared/Microsoft.AspNetCore.App/$DOTNET_VERSION 13 | 14 | # using the shell name to guess the runtime id. can't find a better way to do this 15 | # bash => linux-x64 16 | # ash => linux-musl-x64 17 | if [ -f /bin/bash ]; then 18 | RUNTIME_ID=linux-x64 19 | else 20 | RUNTIME_ID=linux-musl-x64 21 | fi 22 | 23 | # get dotnet sdk 24 | echo -- Grabbing netcore sdk $DOTNET_VERSION for runtime $RUNTIME_ID 25 | 26 | # alpine containers have wget. standard have curl 27 | if which curl; then 28 | curl -L -o runtime.zip https://www.nuget.org/api/v2/package/runtime.$RUNTIME_ID.Microsoft.NETCore.App/$DOTNET_VERSION 29 | elif which wget; then 30 | wget -O runtime.zip https://www.nuget.org/api/v2/package/runtime.$RUNTIME_ID.Microsoft.NETCore.App/$DOTNET_VERSION 31 | else 32 | echo "Unable to pull runtime" 33 | exit 1 34 | fi 35 | 36 | # install unzip if necessary 37 | # alpine containers have unzip. others don't. use apt-get to bring it in.
38 | which unzip || { apt-get update && apt-get install unzip -y; } 39 | 40 | mkdir -p ./runtime 41 | unzip runtime.zip -d ./runtime 42 | cp ./runtime/tools/crossgen . 43 | chmod 744 ./crossgen 44 | rm -rf ./runtime 45 | 46 | # find libclrjit.so 47 | if [ -f "$APP_DIR/libclrjit.so" ]; then 48 | JIT_PATH=$APP_DIR/libclrjit.so 49 | elif [ -f "$DOTNET_FRAMEWORK_PATH/libclrjit.so" ]; then 50 | JIT_PATH=$DOTNET_FRAMEWORK_PATH/libclrjit.so 51 | else 52 | # look in other places? use find? 53 | echo "Unable to find libclrjit.so" 54 | exit 1 55 | fi 56 | 57 | # generate native image and perf map 58 | # todo: support netcore dependencies 59 | APP_BASE_NAME=${APP_DLL%.*} 60 | APP_NATIVE_IMAGE=$APP_BASE_NAME.ni.exe 61 | ./crossgen /JITPath $JIT_PATH \ 62 | /Platform_Assemblies_Paths $DOTNET_FRAMEWORK_PATH:$APP_DIR:$ADDITIONAL_PATHS \ 63 | $APP_DLL 64 | ./crossgen /Platform_Assemblies_Paths $DOTNET_FRAMEWORK_PATH:$APP_DIR:$ADDITIONAL_PATHS \ 65 | /CreatePerfMap /tmp \ 66 | $APP_NATIVE_IMAGE 67 | 68 | # todo: support self contained builds. this assumes a framework dependent build 69 | cp $APP_BASE_NAME.deps.json $APP_BASE_NAME.ni.deps.json 70 | cp $APP_BASE_NAME.runtimeconfig.json $APP_BASE_NAME.ni.runtimeconfig.json 71 | 72 | # required for dynamic tracing from the host machine.
the pod must mount /app-profile on the host to 73 | # /app-profile in container 74 | cp -r $APP_DIR/* /app-profile 75 | 76 | # run native image 77 | dotnet /app-profile/$(basename $APP_NATIVE_IMAGE) -------------------------------------------------------------------------------- /images/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM mcr.microsoft.com/dotnet/core/sdk:2.2.300-stretch 2 | 3 | WORKDIR /root 4 | 5 | ENV PATH="${PATH}:/root/.dotnet/tools" 6 | 7 | RUN echo deb http://cloudfront.debian.net/debian sid main >> /etc/apt/sources.list \ 8 | && apt-get update \ 9 | && apt-get install -y \ 10 | git \ 11 | bpfcc-tools \ 12 | lttng-tools \ 13 | liblttng-ust-dev \ 14 | procps \ 15 | lldb \ 16 | && rm -rf /var/lib/apt/lists/* \ 17 | && git clone --depth=1 https://github.com/BrendanGregg/FlameGraph \ 18 | && curl -OL https://aka.ms/perfcollect \ 19 | && chmod +x ./perfcollect \ 20 | && dotnet tool install -g dotnet-dump --version 3.0.0-preview7.19365.2 21 | 22 | ADD setup.sh \ 23 | setup.4.15.sh \ 24 | calc-offsets.py \ 25 | netcore-bcc-trace.py \ 26 | ./ 27 | 28 | RUN chmod +x setup.sh setup.4.15.sh -------------------------------------------------------------------------------- /images/Dockerfile.alpine: -------------------------------------------------------------------------------- 1 | FROM mcr.microsoft.com/dotnet/core/sdk:2.2.300-alpine3.9 2 | 3 | WORKDIR /root 4 | 5 | ENV PATH="${PATH}:/root/.dotnet/tools" 6 | 7 | RUN echo http://dl-cdn.alpinelinux.org/alpine/edge/testing >> /etc/apk/repositories \ 8 | && apk update \ 9 | && apk upgrade \ 10 | && apk add bash \ 11 | ncurses \ 12 | perf \ 13 | git \ 14 | perl \ 15 | && rm -rf /var/cache/apk/* \ 16 | && git clone --depth=1 https://github.com/BrendanGregg/FlameGraph \ 17 | && dotnet tool install -g dotnet-dump --version 3.0.0-preview7.19365.2 -------------------------------------------------------------------------------- /images/calc-offsets.py: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # 4 | # calc-offsets.py 5 | # e.g. calc-offsets.py 1234 sample-netcore-app.ni.exe 6 | # 7 | # Parses the maps file of the given process under /proc and the crossgen perf map in /tmp and dumps a list 8 | # of offsets for all methods in the native image map file generated by crossgen. 9 | # These offsets are then usable in tools like perf and bcc for dynamic tracing 10 | # 11 | 12 | import argparse 13 | import os 14 | import re 15 | import subprocess 16 | 17 | class Section(object): 18 | def __init__(self, start, end, perms, offset, path): 19 | self.start = int(start, 16) 20 | self.end = int(end, 16) 21 | self.perms = perms 22 | self.offset = int(offset, 16) 23 | self.path = path 24 | 25 | def all_sections(pid): 26 | sections = {} 27 | with open("/proc/%d/maps" % pid, "r") as maps: 28 | for line in maps: 29 | match = re.match(r"(\S+)-(\S+)\s+(\S+)\s+(\S+)\s+\S+\s+\S+\s+(\S+)", line.strip()) 30 | if match is None: 31 | continue 32 | start, end, perms, offset, path = match.group(1, 2, 3, 4, 5) 33 | if '/' not in path: 34 | continue 35 | filename = os.path.basename(path) 36 | section = Section(start, end, perms, offset, path) 37 | if filename in sections: 38 | sections[filename].append(section) 39 | else: 40 | sections[filename] = [section] 41 | return sections 42 | 43 | parser = argparse.ArgumentParser(description="Place dynamic tracing probes on a managed method " + 44 | "that resides in a crossgen-compiled assembly. 
For .NET Core on Linux.", 45 | epilog="EXAMPLE: ./calc-offsets.py 1234 sample-netcore-app.ni.exe") 46 | parser.add_argument("pid", type=int, help="the dotnet process id") 47 | parser.add_argument("nativeimage", type=str, help="name of the native image generated by crossgen") 48 | args = parser.parse_args() 49 | 50 | sections = all_sections(args.pid) 51 | 52 | # universal_newlines=True returns text instead of bytes so the split below also works on python 3 53 | output = subprocess.check_output("cat /tmp/%s*map" % os.path.splitext(args.nativeimage)[0], shell=True, universal_newlines=True) 54 | assembly = args.nativeimage 55 | 56 | for line in output.strip().split('\n'): 57 | parts = line.split() 58 | 59 | address = int(parts[0], 16) 60 | symbol = ' '.join(parts[2:]) 61 | 62 | first_section = sections[assembly][0] 63 | # exec section has to be executable and contain the method in question 64 | exec_section = [section for section in sections[assembly] 65 | if 'r-xp' == section.perms and 66 | (section.start - first_section.start) < address and 67 | (section.end - first_section.start) > address][0] 68 | 69 | offset_from_first = exec_section.start - first_section.start 70 | offset_in_file = exec_section.offset 71 | 72 | final_address = address - offset_from_first + offset_in_file 73 | 74 | print("offset: %x : %s" % 75 | (final_address, symbol)) -------------------------------------------------------------------------------- /images/netcore-bcc-trace.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | from bcc import BPF 3 | import argparse 4 | 5 | parser = argparse.ArgumentParser('netcore-bcc-trace') 6 | parser.add_argument('nativeImagePath', help='Full path to the native netcore image to trace.', type=str) 7 | parser.add_argument('methodOffset', help='The offset of the method to trace.', type=lambda x: int(x,0)) 8 | parser.add_argument('type', help='The type of parameter or return value.', choices=['int', 'str'], type=str) 9 | parser.add_argument('--len', help='The max length to print for string types.', default=50, type=int)
10 | parser.add_argument('--ret', help='Pass this flag if you want to trace a return value instead of a parameter.', default=False, action='store_true') 11 | args = parser.parse_args() 12 | 13 | def generateBPF(type, maxLength, isReturn): 14 | bpf=""" 15 | #include 16 | 17 | int trace(struct pt_regs *ctx) { 18 | if (!PT_REGS_PARM1(ctx) ) { 19 | bpf_trace_printk("arg error\\n"); 20 | return 0; 21 | } 22 | """ 23 | 24 | if type == 'int': 25 | if isReturn: 26 | bpf += 'bpf_trace_printk("val %d\\n", PT_REGS_RC(ctx));' 27 | else: 28 | bpf += 'bpf_trace_printk("val %d\\n", PT_REGS_PARM2(ctx));' 29 | elif type == 'str': 30 | # print a string up to maxLength characters 31 | if isReturn: 32 | bpf += 'void *str = (void *)PT_REGS_RC(ctx);' 33 | else: 34 | bpf += 'void *str = (void *)PT_REGS_PARM2(ctx);' 35 | 36 | bpf += """ 37 | if( !str ) { 38 | bpf_trace_printk("null pointer\\n"); 39 | return 0; 40 | } 41 | 42 | int len; 43 | bpf_probe_read(&len, sizeof(len), (void *)(str + 8)); 44 | """ 45 | 46 | # create a large enough char buff 47 | bpf += """ 48 | char buf[%d]; 49 | bpf_probe_read(buf, %d * sizeof(char), (void *)(str + 11)); 50 | 51 | """ % (maxLength * 2, maxLength * 2) 52 | 53 | pos = 0 54 | while pos < maxLength: 55 | bpf += """ 56 | buf[%d] = buf[%d]; 57 | """ % (pos, pos * 2 + 1) 58 | pos += 1 59 | 60 | bpf += """ 61 | bpf_trace_printk("len %d : %s \\n", len, buf); 62 | """ 63 | 64 | bpf += """ 65 | return 0; 66 | } 67 | """ 68 | 69 | return bpf 70 | 71 | bpf = generateBPF(args.type, args.len, args.ret) 72 | b = BPF(text=bpf) 73 | 74 | print('Begin tracing. 
Hit Ctrl+C to exit.') 75 | 76 | try: 77 | if args.ret: 78 | b.attach_uretprobe(name=args.nativeImagePath, addr=args.methodOffset, fn_name="trace").trace_print() 79 | else: 80 | b.attach_uprobe(name=args.nativeImagePath, addr=args.methodOffset, fn_name="trace").trace_print() 81 | except KeyboardInterrupt: 82 | print('Exiting...') 83 | 84 | -------------------------------------------------------------------------------- /images/readme.md: -------------------------------------------------------------------------------- 1 | # images 2 | 3 | A collection of Dockerfiles to build sidecar containers from which to profile a netcore application. These images will contain tooling necessary to profile using all the techniques discussed in this repo. 4 | 5 | Currently these images are only 2.2.5 which purposefully matches the runtime builds for the sample application: https://github.com/joe-elliott/sample-netcore-app. With additional work this could be extended to cover other netcore versions. 6 | 7 | #### ./Dockerfile.alpine 8 | 9 | The alpine image lacks bcc and lttng so any examples using those tools will not work. -------------------------------------------------------------------------------- /images/setup.4.15.sh: -------------------------------------------------------------------------------- 1 | #! /bin/sh 2 | 3 | echo deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/20180222 sid main contrib >> /etc/apt/sources.list 4 | 5 | apt-get update 6 | 7 | ./perfcollect install 8 | 9 | apt-get install -y --allow-downgrades \ 10 | linux-headers-4.15 \ 11 | linux-perf=4.15+90 12 | 13 | # Added this in an attempt to get bcc to work. It did not. 14 | # ln -s /lib/modules/4.15.0-1-amd64 /lib/modules/4.15.0 15 | 16 | -------------------------------------------------------------------------------- /images/setup.sh: -------------------------------------------------------------------------------- 1 | #! 
/bin/sh 2 | apt-get update 3 | 4 | ./perfcollect install 5 | apt-get install -y linux-headers-`uname -r` 6 | 7 | -------------------------------------------------------------------------------- /images/trace-hist.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | 3 | from bcc import BPF 4 | from time import sleep 5 | 6 | bpf=""" 7 | #include 8 | #include 9 | 10 | BPF_HISTOGRAM(dist); 11 | BPF_HISTOGRAM(dist_linear); 12 | 13 | int trace(struct pt_regs *ctx) { 14 | if (!PT_REGS_PARM1(ctx) ) { 15 | bpf_trace_printk("arg error\\n"); 16 | return 0; 17 | } 18 | 19 | dist.increment(bpf_log2l(ctx->si)); 20 | dist_linear.increment(ctx->si); 21 | 22 | return 0; 23 | } 24 | """ 25 | 26 | b = BPF(text=bpf) 27 | b.attach_uprobe(name="/app-profile/sample-netcore-app.ni.exe", addr=0x1920, fn_name="trace") 28 | 29 | print("Tracing .. Hit Ctrl-C to end.") 30 | 31 | # trace until Ctrl-C 32 | try: 33 | sleep(99999999) 34 | except KeyboardInterrupt: 35 | print() 36 | 37 | # output 38 | print("log2 histogram") 39 | print("~~~~~~~~~~~~~~") 40 | b["dist"].print_log2_hist("pos") 41 | 42 | print("\nlinear histogram") 43 | print("~~~~~~~~~~~~~~~~") 44 | b["dist_linear"].print_linear_hist("pos") -------------------------------------------------------------------------------- /kernel-interactions/readme.md: -------------------------------------------------------------------------------- 1 | # kernel-interactions 2 | 3 | When you run [setup.sh](../images/setup.sh) it pulls tools to help with the various debugging methods. Often these tools are compiled and packaged for a single kernel version. Unfortunately, the debian repos that the runtime containers are pointed at can be missing packages for the kernel you happen to be running on. At least kernel versions 4.19 and 4.9 appear to work out of the box. 4 | 5 | This document contains information about how to get the debug tooling working on kernels that do not immediately work. 
Some of these techniques may be dangerous or provide inconsistent results. 6 | 7 | ## Kernel 4.15 8 | 9 | A [setup script](../images/setup.4.15.sh) has been provided for 4.15. Note that it uses a snapshot of the unstable repo from 2018 to find 4.15 tooling and linux headers. 10 | 11 | ``` 12 | echo deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/20180222 sid main contrib >> /etc/apt/sources.list 13 | ``` 14 | 15 | Using this technique everything except dynamic tracing seems to work. 16 | 17 | ## Kernel 4.14 18 | 19 | Similar to 4.15 20 | 21 | ``` 22 | echo deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/20180222 sid main contrib >> /etc/apt/sources.list 23 | apt-get update 24 | apt-get install linux-tools-4.14 25 | ``` 26 | 27 | ## Kernel 4.9 28 | 29 | After running `setup.sh` run the following to install the correct version of perf. 30 | 31 | ``` 32 | apt-get install linux-perf-4.9 33 | ``` 34 | 35 | # Linux Headers 36 | 37 | Currently setup.sh attempts to pull the appropriate Linux headers. I believe this will only be successful in a small number of scenarios in which the container repos happen to have the appropriate headers for the host. Commented out lines have been added to [dynamic-tracing.yaml](../dynamic-tracing/dynamic-tracing.yaml) that successfully mount Linux headers in GKE on Ubuntu nodes and allow bpf/bcc dynamic tracing. With additional testing this may become the preferred method. 
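
The per-kernel notes in this document can be condensed into a small lookup that is run inside the sidecar before choosing a setup path. This is only a sketch: `KERNEL_NOTES` and `kernel_note` are hypothetical names, and the table merely restates the observations above rather than probing the package repos.

```python
import platform

# Notes transcribed from this document; keys are "major.minor" kernel versions.
KERNEL_NOTES = {
    "4.19": "setup.sh appears to work out of the box",
    "4.15": "run setup.4.15.sh; everything except dynamic tracing seems to work",
    "4.14": "add the Debian snapshot repo and install linux-tools-4.14",
    "4.9": "run setup.sh, then install linux-perf-4.9",
}


def kernel_note(release):
    """Return the note for a `uname -r` style string such as '4.15.0-1-amd64'."""
    major_minor = ".".join(release.split(".")[:2])
    return KERNEL_NOTES.get(major_minor, "untested; see the Linux Headers section")


if __name__ == "__main__":
    release = platform.release()
    print("%s: %s" % (release, kernel_note(release)))
```

A lookup keyed on `major.minor` is deliberately coarse. Distribution kernels with the same version can still ship different package names, so treat the result as a starting point.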
-------------------------------------------------------------------------------- /perfcollect/calltree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joe-elliott/netcore-kubernetes-profiling/1c66cb84d8b9398292d46da34d4b2dba2dbe84ef/perfcollect/calltree.png -------------------------------------------------------------------------------- /perfcollect/events.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joe-elliott/netcore-kubernetes-profiling/1c66cb84d8b9398292d46da34d4b2dba2dbe84ef/perfcollect/events.png -------------------------------------------------------------------------------- /perfcollect/flamegraph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joe-elliott/netcore-kubernetes-profiling/1c66cb84d8b9398292d46da34d4b2dba2dbe84ef/perfcollect/flamegraph.png -------------------------------------------------------------------------------- /perfcollect/perfcollect.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Pod 3 | metadata: 4 | name: sample-netcore-app 5 | labels: 6 | app: sample-netcore-app 7 | spec: 8 | shareProcessNamespace: true 9 | containers: 10 | - name: sample-netcore-app 11 | image: joeelliott/sample-netcore-app:v1.0.0-2.2.5 12 | imagePullPolicy: IfNotPresent 13 | env: 14 | - name: COMPlus_PerfMapEnabled 15 | value: "1" 16 | - name: COMPlus_EnableEventLog 17 | value: "1" 18 | - name: COMPlus_ZapDisable 19 | value: "1" 20 | - name: ASPNETCORE_URLS 21 | value: http://*:8080 22 | volumeMounts: 23 | - mountPath: /var/run/lttng 24 | name: lttng 25 | - mountPath: /tmp 26 | name: tmp 27 | - name: profile-sidecar 28 | image: joeelliott/netcore-debugging-tools:v0.0.7-2.2.5 29 | imagePullPolicy: IfNotPresent 30 | securityContext: 31 | privileged: true 32 | args: 33 | - sleep 34 | 
- infinity 35 | volumeMounts: 36 | - mountPath: /var/run/lttng 37 | name: lttng 38 | - mountPath: /tmp 39 | name: tmp 40 | volumes: 41 | - name: lttng 42 | emptyDir: {} 43 | - name: tmp 44 | emptyDir: {} -------------------------------------------------------------------------------- /perfcollect/readme.md: -------------------------------------------------------------------------------- 1 | # perfcollect 2 | 3 | [perfcollect](https://aka.ms/perfcollect) and [Perfview](https://github.com/Microsoft/perfview/blob/master/documentation/Downloading.md) are a collection of tools provided by Microsoft to analyze the behavior of running netcore processes. 4 | 5 | The following guide will walk you through using these tools to gather events and perform cpu profiling on a live container running in Kubernetes. Note that we will be performing our data collection from a sidecar deployed in the same pod as the container we want to debug. 6 | 7 | Check out these guides on [cpu profiling](../cpu-profiling) and [static-tracepoints](../static-tracepoints) without using PerfView. 8 | 9 | ## Run your netcore app in K8s 10 | Create your pod with a [debugging sidecar](https://hub.docker.com/r/joeelliott/netcore-debugging-tools). The rest of this guide will use [perfcollect.yaml](./perfcollect.yaml) which runs a sidecar next to a simple [sample app](https://github.com/joe-elliott/sample-netcore-app). 11 | 12 | #### Environment Variables 13 | 14 | ``` 15 | env: 16 | - name: COMPlus_PerfMapEnabled 17 | value: "1" 18 | - name: COMPlus_EnableEventLog 19 | value: "1" 20 | - name: COMPlus_ZapDisable 21 | value: "1" 22 | - name: COMPlus_ReadyToRun 23 | value: "0" 24 | ``` 25 | 26 | `COMPlus_EnableEventLog` Instructs netcore to produce LTTng events. 27 | 28 | `COMPlus_PerfMapEnabled` creates a perf map in `/tmp` that perf can read to symbolicate stack traces. 29 | 30 | `COMPlus_ZapDisable` will force netcore runtime to be JITted. 
This is normally not desirable, but it will cause the netcore runtime dll symbols to be included in the perf maps. This will allow perf to gather symbols for both the runtime as well as your application. 31 | 32 | There are other ways to do this if you are interested. https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/linux-performance-tracing.md#resolving-framework-symbols 33 | 34 | `COMPlus_ReadyToRun` will prevent the .NETCore runtime from using pre-compiled images. This can also be used to increase symbol coverage. Some details [here](https://docs.microsoft.com/en-us/visualstudio/debugger/jit-optimization-and-debugging?view=vs-2019#limitations-of-the-suppress-jit-optimization-option). 35 | 36 | #### Mount /tmp 37 | By sharing /tmp as an empty directory the debugging sidecar can easily access perf maps created by the netcore application. 38 | 39 | #### Mount /var/run/lttng 40 | LTTng uses a number of files in this folder to communicate with the running process. Sharing this folder between containers allows your sidecar to pick up events produced by your netcore app. 41 | 42 | #### shareProcessNamespace 43 | Setting `shareProcessNamespace` to true allows the sidecar to easily access the process you want to debug. 44 | 45 | ### 2. Run ./setup.sh 46 | SSH to the node and run [`./setup.sh `](./setup.sh) with the pid of the process you want to profile as root. This script will 47 | 48 | - Move map files out of the container's `/tmp` directory to the host so perf can pick them up. 49 | - Download and run `perfcollect install` 50 | 51 | ## Profile! 52 | 53 | Exec into the sidecar and run `./setup.sh`. The tools we are using are very tightly coupled with the kernel version you want to debug. Because of this we can't install all of the tools we need directly in the container. They must be installed once the container is running and the kernel version is known. `./setup.sh` will attempt to install the rest. 
If you are having issues refer to the notes on [kernel interactions](../kernel-interactions) with the container. 54 | 55 | ``` 56 | kubectl exec -it -c profile-sidecar sample-netcore-app bash 57 | # ./setup.sh 58 | ``` 59 | 60 | Next discover the pid of the dotnet process you want to profile. You will use it in the below examples. 61 | 62 | ``` 63 | # ps aux | grep dotnet 64 | root 6 0.5 4.2 11940308 87108 ? SLsl 02:46 0:06 dotnet /app/sample-netcore-app.dll 65 | ``` 66 | 67 | The perfcollect script itself will collect both stack traces and events at the same time. The below will collect for 5 seconds. If you leave the `collectsec` argument off you will need to Ctrl+C to interrupt `perfcollect`. This will create a `sample.trace.zip` file which can then be viewed with [PerfView](https://github.com/Microsoft/perfview/blob/master/documentation/Downloading.md) 68 | 69 | `./perfcollect collect sample -collectsec 5` 70 | 71 | Exit the container and copy it locally 72 | 73 | ``` 74 | kubectl cp default/sample-netcore-app:sample.trace.zip sample.trace.zip -c profile-sidecar 75 | ``` 76 | 77 | ### Warning 78 | 79 | Perfcollect is bad about swallowing errors. If you pull your sample.trace.zip locally and are not seeing stack traces I would recommend reviewing the `perfcollect.log` file contained in the zip. It will show you the raw perf commands run and their outputs. 80 | 81 | For example perfcollect supports a `-pid` parameter, but if you pass it perfcollect will fail silently: https://github.com/dotnet/corefx-tools/issues/84. 82 | 83 | 84 | ## PerfView 85 | 86 | Open up your `sample.trace.zip` in PerfView and explore some of the functionality. Some sample screenshots below. 
87 | 88 | ![Call Tree](./calltree.png) 89 | 90 | ![Events](./events.png) 91 | 92 | ![FlameGraph](./flamegraph.png) -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | # netcore-kubernetes-profiling 2 | 3 | This is my personal collection of notes, scripts and techniques developed to help debug live .NET Core applications. All of these techniques were performed from a sidecar in Kubernetes. If you are interested in profiling .NET Core applications running on Linux without Kubernetes then these guides still will contain a lot of useful information. 4 | 5 | Feel free to ask questions, suggest changes or submit pull requests. 6 | 7 | ## Demo! 8 | 9 | The below dynamic tracing demo was performed on [this application](https://github.com/joe-elliott/sample-netcore-app) built with [this Dockerfile](https://github.com/joe-elliott/sample-netcore-app/blob/master/Dockerfile). Even though this application was built normally and contains no special instrumentation we can still dynamically trace any method in the application using [bcc](https://github.com/iovisor/bcc). In the following demo we will trace [calculateFibonacciValue](https://github.com/joe-elliott/sample-netcore-app/blob/master/Providers/FibonacciProvider.cs#L9) and [calculateEchoValue](https://github.com/joe-elliott/sample-netcore-app/blob/master/Providers/EchoProvider.cs#L9). 10 | 11 | ![bcc demo](./dynamic-tracing-bcc.gif) 12 | 13 | Check out the below guides to get details on how to do this and more. 
14 | 15 | ## Debugging techniques 16 | 17 | - [cpu profiling](./cpu-profiling) 18 | - Building FlameGraphs from perf data 19 | - [static tracepoints](./static-tracepoints) 20 | - Recording and viewing LTTng events 21 | - [perfcollect](./perfcollect) 22 | - Static Tracepoints and CPU Profiling the Microsoft way 23 | - [dynamic tracing](./dynamic-tracing) 24 | - Perf events and BCC to trace any method in an application without instrumentation 25 | - [core dumps](./coredumps) 26 | - Multiple methods for collecting and analyzing coredumps. 27 | 28 | ## Other information 29 | 30 | - [images](./images) 31 | - A collection of Dockerfiles to build sidecar profiling containers. 32 | - [kernel interactions](./kernel-interactions) 33 | - The containers, tools, and the kernel can sometimes have weird interactions. This document contains information on how to get these tools working on a variety of kernel versions. 34 | - [todo](./todo) 35 | - Future work for this repo. 36 | 37 | Previously this repo was focused on executing these techniques from the node the application was running on. If you are interested in that approach you can check it out [here](https://github.com/joe-elliott/netcore-kubernetes-profiling/tree/54bacfeecb33de6bbc590768af9c276efd1b4e4c). 38 | 39 | ## Presentations 40 | 41 | - [NA Kubecon 2019 Slides](./2019NAKubecon.pdf) 42 | - This session focused specifically on the profiling and tracing techniques as executed from a sidecar. Check out [the video](https://www.youtube.com/watch?v=yNTc2-i9arg) and [a shot of the room!](https://flic.kr/p/2hNBGL4). 43 | - [CodePaLousa 2019 Slides](https://docs.google.com/presentation/d/1-OJtTSEGEWxYAIHhKDoociKJXL7CFH8BPl6xJDATSuI/edit?usp=sharing) 44 | - The session was [streamed](https://www.facebook.com/CodePaLOUsa/videos/487782252038255/) by [Switcher Studios](https://www.switcherstudio.com/). It only focused on Linux .NET Core debugging but used this repo as its source material. 
45 | -------------------------------------------------------------------------------- /static-tracepoints/readme.md: -------------------------------------------------------------------------------- 1 | # static-tracepoints 2 | 3 | Recording static tracepoints produced by the netcore framework is straightforward because netcore is already instrumented to produce framework-level events such as garbage collection and thread creation. 4 | 5 | If you are interested in both profiling and recording LTTng events see [perfcollect](../perfcollect). That guide walks you through generating data for the PerfView utility. 6 | 7 | ## Run your netcore app in K8s 8 | Create your pod with a [debugging sidecar](https://hub.docker.com/r/joeelliott/netcore-debugging-tools). The rest of this guide will use [static-tracepoints.yaml](./static-tracepoints.yaml), which runs a sidecar next to a simple [sample app](https://github.com/joe-elliott/sample-netcore-app). 9 | 10 | 11 | #### Environment Variables 12 | Set the following environment variable for your main process. 13 | 14 | ``` 15 | env: 16 | - name: COMPlus_EnableEventLog 17 | value: "1" 18 | ``` 19 | 20 | `COMPlus_EnableEventLog` instructs netcore to produce LTTng events. 21 | 22 | #### Mount /var/run/lttng 23 | LTTng uses a number of files in this folder to communicate with the running process. Sharing this folder between containers allows your sidecar to pick up events produced by your netcore app. 24 | 25 | #### shareProcessNamespace 26 | Setting `shareProcessNamespace` to `true` allows the sidecar to easily access the process you want to debug. 27 | 28 | ## Collect Events! 29 | 30 | Exec into the sidecar and discover the pid of the dotnet process you want to profile. You will use it in the examples below. 31 | 32 | ``` 33 | kubectl exec -it -c profile-sidecar sample-netcore-app bash 34 | # ps aux | grep dotnet 35 | root 7 0.4 3.9 11797376 80500 ? 
SLsl 00:55 0:01 dotnet /app/sample-netcore-app.dll 36 | ``` 37 | 38 | Start an LTTng session and collect events. 39 | 40 | ``` 41 | lttng create session --output=./lttng-events 42 | lttng enable-event --userspace --all 43 | lttng track --pid=<pid> -u 44 | lttng start 45 | # events are being recorded now 46 | lttng stop 47 | lttng destroy 48 | ``` 49 | 50 | Dump them to the terminal. 51 | 52 | ``` 53 | babeltrace ./lttng-events 54 | 55 | ... 56 | [01:00:37.588459510] (+0.000000481) sample-netcore-app DotNETRuntime:GCSampledObjectAllocationHigh: { cpu_id = 0 }, { Address = 139897548412480, TypeID = 139906679802464, ObjectCountForTypeSample = 1, TotalSizeForTypeSample = 122, ClrInstanceID = 0 } 57 | [01:00:37.588460717] (+0.000001207) sample-netcore-app DotNETRuntime:EventSource: { cpu_id = 0 }, { EventID = 25, EventName = "SetActivityId", EventSourceName = "System.Threading.Tasks.TplEventSource", Payload = "{\"NewId\":00000000-0000-0000-0000-000000000000}" } 58 | [01:00:37.588523375] (+0.000062658) sample-netcore-app DotNETRuntime:ThreadPoolWorkerThreadWait: { cpu_id = 0 }, { ActiveWorkerThreadCount = 2, RetiredWorkerThreadCount = 0, ClrInstanceID = 0 } 59 | [01:00:37.588524339] (+0.000000964) sample-netcore-app DotNETRuntime:ThreadPoolWorkerThreadWait: { cpu_id = 0 }, { ActiveWorkerThreadCount = 2, RetiredWorkerThreadCount = 0, ClrInstanceID = 0 } 60 | [01:00:38.586440792] (+0.997916453) sample-netcore-app DotNETRuntime:GCSampledObjectAllocationHigh: { cpu_id = 0 }, { Address = 139897548412608, TypeID = 139906670816858, ObjectCountForTypeSample = 1, TotalSizeForTypeSample = 48, ClrInstanceID = 0 } 61 | [01:00:38.586445343] (+0.000004551) sample-netcore-app DotNETRuntime:GCSampledObjectAllocationHigh: { cpu_id = 0 }, { Address = 139897548412656, TypeID = 139906679813328, ObjectCountForTypeSample = 1, TotalSizeForTypeSample = 24, ClrInstanceID = 0 } 62 | ... 
63 | ``` -------------------------------------------------------------------------------- /static-tracepoints/static-tracepoints.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Pod 3 | metadata: 4 | name: sample-netcore-app 5 | labels: 6 | app: sample-netcore-app 7 | spec: 8 | shareProcessNamespace: true 9 | containers: 10 | - name: sample-netcore-app 11 | image: joeelliott/sample-netcore-app:v1.0.0-2.2.5 12 | imagePullPolicy: IfNotPresent 13 | env: 14 | - name: COMPlus_EnableEventLog 15 | value: "1" 16 | - name: ASPNETCORE_URLS 17 | value: http://*:8080 18 | volumeMounts: 19 | - mountPath: /var/run/lttng 20 | name: lttng 21 | - name: profile-sidecar 22 | image: joeelliott/netcore-debugging-tools:v0.0.7-2.2.5 23 | imagePullPolicy: IfNotPresent 24 | args: 25 | - sleep 26 | - infinity 27 | volumeMounts: 28 | - mountPath: /var/run/lttng 29 | name: lttng 30 | volumes: 31 | - name: lttng 32 | emptyDir: {} -------------------------------------------------------------------------------- /todo/readme.md: -------------------------------------------------------------------------------- 1 | ## todo 2 | 3 | - dynamic tracing 4 | - improve call stacks 5 | - https://github.com/dotnet/ILMerge 6 | - Perf can't use perf maps for dlls? bcc can? 7 | - http://blogs.microsoft.co.il/sasha/2017/02/27/profiling-a-net-core-application-on-linux/ 8 | - review mapgen.py. make sure we can get stack traces 9 | - bcc/bpf 10 | - netcore-bcc-trace.py 11 | - support float types 12 | - support parameters besides the first 13 | - histogram support? 14 | - core dumps 15 | - understand why the GCRoot output is littered with `` 16 | - Attempt to install lldb-3.9? 
17 | - https://github.com/dotnet/coreclr/blob/master/Documentation/building/debugging-instructions.md 18 | - test these examples on other linux distros and kernel versions 19 | - build sidecars for other netcore versions 20 | 21 | ## to read 22 | 23 | - https://jvns.ca/blog/2017/07/05/linux-tracing-systems/ 24 | - http://man7.org/linux/man-pages/man1/perf-probe.1.html 25 | - https://linux.die.net/man/1/perf-probe 26 | - https://www.kernel.org/doc/Documentation/trace/kprobetrace.txt 27 | - http://www.brendangregg.com/blog/2018-10-08/dtrace-for-linux-2018.html 28 | - http://www.brendangregg.com/blog/2019-01-01/learn-ebpf-tracing.html 29 | - https://www.joyfulbikeshedding.com/blog/2019-01-31-full-system-dynamic-tracing-on-linux-using-ebpf-and-bpftrace.html 30 | --------------------------------------------------------------------------------
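Returning to the static-tracepoints guide: the `babeltrace` text output shown there gets long quickly, so it can help to summarize it by event name. The following is a minimal sketch, not part of this repo. It assumes the line layout matches the sample output in that guide (timestamp, delta, source name, then `Provider:EventName:`), and `EVENT_RE` and `count_events` are hypothetical names introduced here for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the sample output in the static-tracepoints guide:
#   [timestamp] (+delta) source-name Provider:EventName: { ... }
# The delta may be printed as question marks for the first event, so "?" is allowed.
EVENT_RE = re.compile(
    r"^\[(?P<timestamp>[\d:.]+)\]\s+"
    r"\(\+[\d.?]+\)\s+"
    r"(?P<source>\S+)\s+"
    r"(?P<event>\w+:\w+):"
)


def count_events(lines):
    """Count babeltrace text lines per 'Provider:EventName'; skip anything else."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            counts[match.group("event")] += 1
    return counts


# Small demo using shortened lines in the same shape as the guide's sample output.
sample_lines = [
    "[01:00:37.588459510] (+0.000000481) sample-netcore-app "
    "DotNETRuntime:GCSampledObjectAllocationHigh: { cpu_id = 0 }, { ClrInstanceID = 0 }",
    "[01:00:37.588523375] (+0.000062658) sample-netcore-app "
    "DotNETRuntime:ThreadPoolWorkerThreadWait: { cpu_id = 0 }, { ClrInstanceID = 0 }",
    "[01:00:38.586440792] (+0.997916453) sample-netcore-app "
    "DotNETRuntime:GCSampledObjectAllocationHigh: { cpu_id = 0 }, { ClrInstanceID = 0 }",
    "...",
]

for event_name, count in count_events(sample_lines).most_common():
    print(count, event_name)
```

To use this on a real trace, one option is to capture the `babeltrace ./lttng-events` output to a file and pass the open file object to `count_events`. A regex over the text output was chosen only to keep the sketch dependency-free; if the layout of your babeltrace version differs, the pattern will need adjusting.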