├── LICENSE ├── README.md ├── extra ├── perf.py_ └── perf_support.patch ├── flamegraph_sample.svg ├── probes ├── gazebo.csv ├── i915.csv ├── igdrcl.csv ├── intel_media.csv ├── opencl.csv ├── openvino.csv ├── orbslam.csv ├── probes.csv ├── va.csv └── xe.csv ├── requirements.txt ├── scripts ├── bashfulprofiler.sh ├── build_perf_with_ctf.sh └── parse_perf.py ├── setup_dependency.sh └── trace_sample.html /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 
29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 
61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Bash Scripting Meets Performance Analysis: 2 | ## BashfulProfiler for Linux application and Kernel 3 | 4 | Introducing BashfulProfiler: a robust, non-intrusive, and highly adaptable performance analysis tool built around Bash. Designed to offer developers flexible and detailed insights into the performance characteristics of their Linux-based applications, BashfulProfiler leverages the power of the Linux perf tool. It works best when the target binaries — executables or shared libraries — are compiled with symbols (e.g., using the -g flag). This implementation has been tested on Ubuntu 22.04 and 24.04, with Python versions 3.10 and 3.12. While the core logic is implemented in Bash, several third-party utilities integrated into the workflow are Python-based. 
5 | 6 | ## Overview 7 | The tool's design is split into two main components: the front end, built entirely using bash scripting, and the backend, which relies on the Linux Kernel Perf tool, ctf2ctf, trace2html and flamegraph. Probes or traces are defined in a configuration file (for instance, probes.csv), which is then parsed and passed to the Perf tool to set up the probes. Once set, a capturing script proceeds to record these probes over a predetermined duration (such as 8 seconds). After the recording phase, the captured data is processed and transformed into easily understandable time charts and flamegraphs, offering clear insights for performance analysis. Probes are set once per boot session, while captures can be repeated as often as needed for different run configurations. 8 | 9 | Flow Diagram: 10 | 11 | ![image](https://github.com/arshadlab/time_charting_with_perf/assets/85929438/71ae12ec-4c0f-4655-aa5d-9f2c97ef1220) 12 | 13 | 14 | ### Usage 15 | - Use the provided script to build the Linux perf tool with CTF support. The script downloads the Linux kernel source for the running kernel’s major version and compiles only the perf tool. The tool also requires certain kernel CONFIG options to be enabled, which are typically enabled in stock kernels. 16 | 17 | - Identify binaries and their exported functions or symbols to set probes on. Any .so file or kernel module with publicly exported methods can be traced. However, using binaries built with debug symbols (-g option, but still optimized for release) provides richer probe points. It is recommended to generate binaries with debug symbols included and not stripped. Regular expressions (grep syntax) are supported as symbol filters. 18 | 19 | - Set TRACE_ROOT to the local path of this repo and source bashfulprofiler.sh to make the Bash functions available. 20 | 21 | - Create a .csv recipe to define probes across multiple .so files and kernel modules. 
Alternatively, users can set ad hoc probe points by directly calling the script's functions. See probes directory for examples. 22 | 23 | - Start the workload in a separate console. 24 | 25 | - Run trace_capture_and_convert to capture traces while the workload is running. 26 | 27 | - Open the generated trace.html and flamegraph in a browser to analyze the results. 28 | 29 | - Once set, probes remain available throughout the boot session or until the binary is rebuilt. 30 | 31 | ### Features 32 | **Dynamic Tracepoint Injection**: No need to modify source code, simply provide the shared libraries and functions of interest, and BashfulProfiler will handle the rest. 33 | 34 | **Robust Data Collection** The tool captures comprehensive data from the tracepoints, such as execution start and stop timestamps, to provide a detailed timeline of function execution. 35 | 36 | **Data Processing** The raw trace data is processed through perf convert and ctf2ctf to convert it into a more readable and analyzable format. 37 | 38 | **Interactive Performance Charts** The processed trace data is then passed to trace2html to generate interactive performance time charts. These charts provide a visual representation of how functions within the application execute and interact over time, making it easier to identify potential bottlenecks or performance inefficiencies. 39 | 40 | **Flamegraph Generation**: In addition to time charts, BashfulProfiler also generates Flamegraph visualizations, offering a more consolidated and intuitive view of the program’s performance. This helps in quickly identifying hotspots and understanding function call hierarchies at a glance. 41 | 42 | Users can set up probes either by directly calling the probe-setting Bash functions or by using a .csv file as a recipe for a quick and consistent setup across different binaries and kernel modules. This approach allows teams to share recipes, ensuring uniform output and faster results. 
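To make the recipe idea concrete, here is a minimal sketch in Python of how a recipe file could be read. It is not the logic of `probe_set_csv` itself. It assumes the three columns named in the sample recipe headers (`.so name`, `process name`, `symbol filter`), skips `#` comment rows, and reuses the last process name for rows that give a library name without a path. The function name `read_recipe` is hypothetical, and kernel-module rows are not modeled.

```python
import csv
import io


def read_recipe(text):
    """Yield (binary, process, symbol_filter) tuples from recipe text.

    Hypothetical helper: it mirrors the documented column layout only,
    not the actual Bash implementation.
    """
    process = ""
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and comment rows, which start with '#'.
        if not row or row[0].lstrip().startswith("#"):
            continue
        # Pad short rows so that the three fields always unpack.
        binary, proc, pattern = (row + ["", ""])[:3]
        # A row without a process name reuses the last one seen.
        process = proc.strip() or process
        # An absolute path needs no process lookup.
        needed = "" if binary.startswith("/") else process
        yield binary.strip(), needed, pattern


sample = r"""
#,Header: ".so name","process name","symbol filter"
/usr/lib/x86_64-linux-gnu/libva.so.2.2200.0,,va
libopenvino_intel_gpu_plugin.so,benchmark_app,\bov::intel_gpu::Plugin::compile_model\(.*\)
libopenvino_intel_gpu_plugin.so,,\bcldnn::network::execute_impl\(.*\)$
"""

rows = list(read_recipe(sample))
for binary, process, pattern in rows:
    print(binary, process or "-", pattern, sep=" | ")
```

Because the rows are split on commas, a symbol filter that itself contains a comma would be cut short in this sketch.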
43 | 44 | 45 | ## Getting Started 46 | ### Installation 47 | 48 | #### Clone this repo and install dependencies 49 | First, clone the repo to a local folder. 50 | 51 | ``` 52 | git clone https://github.com/arshadlab/time_charting_with_perf 53 | cd time_charting_with_perf 54 | ./setup_dependency.sh 55 | export TRACE_ROOT=$PWD 56 | source ./scripts/bashfulprofiler.sh 57 | ``` 58 | 59 | #### Setup system with linux perf tool enabled 60 | 61 | The Linux Perf tool is a powerful utility for profiling and performance monitoring on Linux systems. Here are brief steps to set up a system with the perf tool. 62 | 63 | ##### Install perf tool 64 | 65 | Because perf is required, root access (for example, using sudo) is needed. Perf also needs to be compiled with CTF conversion support, which the default build does not include. 66 | A helper script is provided in the scripts folder to download the kernel source for the current major version and compile only the perf tool. This script also installs all necessary dependencies. 67 | For more control, users can choose to execute the commands manually. 68 | 69 | ``` 70 | $ ./build_perf_with_ctf.sh 71 | ``` 72 | 73 | Keep in mind that the perf tool requires root permissions (e.g. sudo) and capabilities to function properly. It also depends on certain kernel CONFIG options being enabled. In most cases, the stock kernel includes the necessary configurations by default. 74 | 75 | If the perf command doesn't work as expected, the kernel was most likely not built with the required configs. Please refer to existing manuals on how to rebuild the kernel. Make sure the configs below are enabled in the .config file. 76 | 77 | ``` 78 | CONFIG_PERF_EVENTS=y 79 | CONFIG_FRAME_POINTER=y 80 | CONFIG_KALLSYMS=y 81 | CONFIG_TRACEPOINTS=y 82 | CONFIG_KPROBES=y 83 | CONFIG_KPROBE_EVENTS=y 84 | # user-level dynamic tracing: 85 | CONFIG_UPROBES=y 86 | CONFIG_UPROBE_EVENTS=y 87 | ``` 88 | 89 | The given script builds perf with ctf support. 
This can be verified by checking if the --to-ctf option appears in the list of supported options: 90 | 91 | ``` 92 | $ perf data convert --help 93 | Usage: perf data convert [] 94 | 95 | -f, --force don't complain, do it 96 | -i, --input input file name 97 | -v, --verbose be more verbose 98 | --all Convert all events 99 | --to-ctf ... Convert to CTF format 100 | --to-json ... Convert to JSON format 101 | --tod Convert time to wall clock time 102 | 103 | ``` 104 | 105 | If --to-ctf is not listed, the required packages listed in the perf build script should be installed before rebuilding the perf tool from the kernel source. 106 | 107 | As long as the kernel headers for the target kernel are properly installed, there's no need to rebuild the entire kernel. The perf tool can be built separately by navigating to the /tools/perf directory and running make, provided that all required development packages are in place. 108 | 109 | ``` 110 | /tools/perf $ make 111 | ``` 112 | 113 | Once the perf binary is built, it can either be copied to /usr/bin/ for system-wide access, or the current directory can be added to the PATH environment variable for convenient use. 114 | 115 | #### Setup probes using csv 116 | 117 | It's important to start the workload or target process before setting up the probes. This is because the probes, as defined in the probes.csv file, rely on a running target process to determine the absolute locations of the .so files within the system. However, if the probes.csv file contains the full paths to the .so files, running the target process prior to setting up the probes is not necessary. 118 | ``` 119 | gst-launch-1.0 filesrc location= ! h265parse ! vah265dec ! queue ! gvafpscounter starting-frame=2000 ! fakesink async=false 120 | ``` 121 | 122 | The helper functions are provided in bashfulprofiler.sh, so make sure to source it before proceeding if you haven't already. 
123 | 124 | ``` 125 | source ./scripts/bashfulprofiler.sh 126 | ``` 127 | 128 | Setup probes using probe_set_csv bash function 129 | 130 | ``` 131 | $ probe_set_csv ./probes/intel_media.csv 132 | Setting probe for: vaDisplayIsValid at 0x00000000000043e0 133 | perf command: 134 | sudo perf probe -x /usr/lib/x86_64-linux-gnu/libva.so.2.2200.0 -a vaDisplayIsValid_entry=0x00000000000043e0 135 | Added new event: 136 | probe_libva:vaDisplayIsValid_entry (on 0x00000000000043e0 in /usr/lib/x86_64-linux-gnu/libva.so.2.2200.0) 137 | 138 | You can now use it in all perf tools, such as: 139 | 140 | perf record -e probe_libva:vaDisplayIsValid_entry -aR sleep 1 141 | 142 | perf command: 143 | sudo perf probe -x /usr/lib/x86_64-linux-gnu/libva.so.2.2200.0 -a vaDisplayIsValid=0x00000000000043e0%return 144 | Added new event: 145 | probe_libva:vaDisplayIsValid__return (on 0x00000000000043e0%return in /usr/lib/x86_64-linux-gnu/libva.so.2.2200.0) 146 | 147 | 148 | You can now use it in all perf tools, such as: 149 | 150 | perf record -e probe_librcl:rclpublish__return -aR sleep 1 151 | 152 | ``` 153 | 154 | #### Setup probes using lib 155 | 156 | Probes can also be set directly on .so files. Use the probe_set_all_from_binary script function with an optional filter to set probes. If no filter is provided, probes will be added to all exported symbols. Both, publicly available symbols (e.g., using -T) as well as debug symbols will be searched (e.g., using -t). 157 | 158 | This will place entry/exit probes on all function symbols in iHD_drv_video.so that contain 'Execute(' in their names. Note the escape '\' for special characters. 159 | 160 | ``` 161 | $ probe_set_all_from_binary /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so 'Execute\(' 162 | ``` 163 | 164 | Setting up probes is typically a one-time task, unless the system is rebooted or the target binary is modified or updated. 
New probes are appended to the existing list, and if a probe with the same name is added again, it will be overwritten. Once configured, these probes remain available, allowing multiple capture sessions to be conducted without the need for reconfiguration. This persistence across sessions provides the flexibility to perform repeated analyses efficiently. 165 | 166 | #### Start Capturing 167 | 168 | Start a capture with the trace_capture_and_convert function. Make sure the target process is running (e.g. gst-launch). The default capture duration is 8 seconds. 169 | ``` 170 | $ trace_capture_and_convert [capture duration in seconds] 171 | ``` 172 | trace.html and flamegraph.svg will be written to the output folder, ready to be viewed in a browser. 173 | 174 | ``` 175 | $ ./output/trace.html ./output/flamegraph.svg 176 | ``` 177 | 178 | 179 | #### Remove Probes 180 | Once established, probes remain created (though inactive) until the system is rebooted or the associated binary is modified. It’s important to note that these probes do not consume any CPU resources unless they are actively used by a perf record session. However, if the probes are no longer needed, it’s a good practice to remove them to keep the environment clean and avoid potential conflicts. The following command can be used to remove all probes: 181 | ``` 182 | $ probe_remove_all 183 | ``` 184 | 185 | ### Sample probes for Intel Media driver (intel_media.csv) 186 | A sample recipe file, intel_media.csv, is included in this repo as a starting point for Intel Media Driver profiling. It sets up probes on all libva functions that start with va, adds probes to the media driver for symbols containing CreateBuffer, and finally, includes probes for selected functions in the i915 module. 187 | 188 | ``` 189 | #,probe_set_csv will look into the given process to find library path from loaded .so files 190 | #,If absolute path is given then process name is ignored. 
191 | 192 | #,Header: ".so name","process name","symbol filter" 193 | # Add probes to libva's publicly exported symbols. 194 | /usr/lib/x86_64-linux-gnu/libva.so.2.2200.0,,va 195 | 196 | # Add probes to media driver's symbols containing CreateBuffer keyword. Requires media driver to be built with -g, otherwise no symbols will match 197 | /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so,,CreateBuffer 198 | 199 | # Add probe to i915 execbuffer call. Xe probes can be added accordingly. 200 | i915.ko,,\bi915_gem_do_execbuffer$ 201 | i915.ko,,\bi915_gem_wait_ioctl$ 202 | i915.ko,,\bi915_request_wait$ 203 | i915.ko,,\bi915_request_wait_timeout$ 204 | i915.ko,,\bflush_submission$ 205 | ... 206 | ``` 207 | 208 | Setting probes based on a recipe can be done with the probe_set_csv function. 209 | 210 | ``` 211 | $ probe_set_csv ./probes/intel_media.csv 212 | ``` 213 | 214 | The probes.csv is a comma-separated .csv file with four columns and no spaces around commas. The first column is designated for the .so/binary to be probed, and it can contain either just the name or the absolute path. If only the name is provided, the process name (the second entry) is used to determine the absolute location of the .so. The process, presumably running with the .so file loaded, should be active prior to setting up probes. However, if an absolute path is provided, there's no requirement for the process name, and probe setup can be conducted at any time. 215 | 216 | The third column is designated for the symbol on which the probe is to be set. This symbol can be either fully named or partially named with a wildcard, following the Linux grep regular expression pattern. If multiple entries match, probes will be set up on all of them. 217 | 218 | The process name is required for the first row without a path, and all subsequent rows use the same name for finding the .so path. 
Also if multiple symbols match the regular expression, the probe name is appended with the symbol address to ensure uniqueness and facilitate tracking. Additionally, the complete symbol line output by objdump is displayed in the script, which helps relate the captured probe to the exact symbol signature. 219 | 220 | The sample intel_media.csv for Intel media performance analysis leverages symbols exported by .so. However, if the binary is compiled with the -g option, more precise probing is possible as a larger set of symbols will be accessible for selection. 221 | 222 | Scripts are included in the repo to view loaded libraries and symbols exported by them. 223 | 224 | ![image](https://github.com/user-attachments/assets/10cc7e95-8de7-46bb-a049-43cc7d698667) 225 | 226 | 227 | ## Sample probes for OpenVino Run: 228 | Here is a sample probe CSV file designed to set trace points at key locations, including the entry points when a model is compiled and then sent for inference. At this stage, primitives are assembled into GPU kernels, and following a flush, the call waits for the GPU to complete execution. Having all this information visually presented at the forefront provides a holistic view of how the framework internally handles requests and the time spent at each stage. The included probes offer a comprehensive picture, though users can add more probes as needed. Note: The probes below require OpenVino to be built from source with the RelWithDebugInfo option. 229 | 230 | Sample probe file for OpenVino analysis: 231 | ``` 232 | #, Header: ".so name","process name","symbol filter" 233 | #, No space before and after commas 234 | #, Openvino library with debug symbol included 235 | #, probe_set_csv will look into the given process to find library path from loaded .so files 236 | #, If absolute path is given then process name is ignored. 237 | #, Below probes assume openvino plugins are compiled with debug symbols included (e.g -g). 
238 | 239 | # GPU 240 | libopenvino_intel_gpu_plugin.so,benchmark_app,\bov::intel_gpu::SyncInferRequest::infer\(\)\s*$ 241 | libopenvino_intel_gpu_plugin.so,,\bov::intel_gpu::Plugin::compile_model\(.*\) 242 | libopenvino_intel_gpu_plugin.so,,ov::intel_gpu::SyncInferRequest::enqueue\(\)\s*$ 243 | libopenvino_intel_gpu_plugin.so,,ov::intel_gpu::SyncInferRequest::wait\(\)\s*$ 244 | libopenvino_intel_gpu_plugin.so,,\bcldnn::network::execute_impl\(.*\)$ 245 | libopenvino_intel_gpu_plugin.so,,\bcldnn::ocl::ocl_stream::flush\(\)\sconst$ 246 | libopenvino_intel_gpu_plugin.so,,\bcldnn::ocl::typed_primitive_impl_ocl<.*>::execute_impl 247 | libopenvino_intel_gpu_plugin.so,,\bcldnn::onednn::typed_primitive_onednn_impl<.*>::build_primitive 248 | libopenvino_intel_gpu_plugin.so,,\bcldnn::onednn::typed_primitive_onednn_impl<.*>::execute_impl 249 | libopenvino_auto_batch_plugin.so,,\bov::autobatch_plugin::Plugin::compile_model\(.*\) 250 | 251 | # CPU 252 | libopenvino_intel_cpu_plugin.so,benchmark_app,\bov::intel_cpu::SyncInferRequest::infer\(\)\s*$ 253 | libopenvino_intel_cpu_plugin.so,,\bov::intel_cpu::Plugin::compile_model\(.*\) 254 | 255 | # Kernel mode driver. i915.ko 256 | i915.ko,,\bi915_gem_do_execbuffer 257 | i915.ko,,\bi915_gem_wait_ioctl 258 | i915.ko,,\bi915_request_wait_timeout 259 | i915.ko,,\bflush_submission 260 | 261 | ``` 262 | 263 | 264 | 
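The symbol filters in recipes such as the OpenVINO sample above are regular expressions, so it can help to try them against a few symbol lines before setting probes. The sketch below uses Python's `re` module as a stand-in for grep; `\b`, `\s` and `$` behave the same way for these patterns. The symbol lines are made up for illustration and are simplified compared with real objdump output, which also carries addresses and flags.

```python
import re

# Hypothetical demangled symbol lines, invented for this illustration.
symbols = [
    "ov::intel_gpu::Plugin::compile_model(std::shared_ptr<ov::Model const> const&, ov::AnyMap const&) const",
    "ov::intel_gpu::SyncInferRequest::infer()",
    "ov::intel_gpu::SyncInferRequest::infer() [clone .cold]",
    "i915_request_wait",
    "i915_request_wait_timeout",
    "__i915_request_wait",
]


def matching(pattern):
    """Return the symbol lines in which the pattern is found."""
    return [s for s in symbols if re.search(pattern, s)]


# \(.*\) accepts any argument list between the parentheses.
compile_hits = matching(r"\bov::intel_gpu::Plugin::compile_model\(.*\)")

# \s*$ rejects lines that carry anything but whitespace after "()".
infer_hits = matching(r"\bov::intel_gpu::SyncInferRequest::infer\(\)\s*$")

# \b rejects the "__" prefixed variant, $ rejects the "_timeout" variant.
wait_hits = matching(r"\bi915_request_wait$")

for name, hits in [("compile", compile_hits), ("infer", infer_hits), ("wait", wait_hits)]:
    print(name, len(hits), hits)
```

Each of the three patterns selects exactly one line here, which shows how the anchors narrow a filter that would otherwise match several related symbols.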
265 | 266 | To further explain the regular expressions, the pattern **\bov::intel_gpu::Plugin::compile_model\(.*\)** for the compile_model probe is designed to match symbol lines for the compile_model() method of the Plugin class in the ov::intel_gpu namespace. The \(.*\) part matches whatever argument list the symbol carries, and the leading \b anchors the match at a word boundary to prevent partial matches. Additionally, the $ character in some probes ensures that matching occurs only for those symbols where there is no extra word or character at the end. 267 | 268 | 269 | ![image](https://github.com/user-attachments/assets/243aef33-3058-4049-9750-e23697ad6185) 270 | 271 | 272 | ## Analyzing Gazebo ROS2: 273 | 274 | The default probes included with the tool are specifically chosen to analyze Gazebo simulations that involve ROS 2, including interactions with the Navigation2 and MoveIt2 stacks. These probes help uncover performance characteristics such as: 275 | 276 | **Time chart of key events in Gazebo and ROS2:** Visualize the timeline of critical events, helping to understand the system's operation and identify potential performance issues. 277 | 278 | **Publish/Subscribe events by ROS2 nodes:** Analyze the Pub/Sub event dynamics between ROS2 nodes, enabling a closer look at communication efficiency and latencies within the system. 279 | 280 | **Path planning latencies by Nav2 and MoveIt2 stacks:** Understand and optimize the time taken for path planning, an essential aspect of robotic navigation in ROS2 systems. 281 | 282 | **FlameGraph of system hotspots:** Gain a clear visual representation of the system's performance hotspots, helping focus optimization efforts effectively. 283 | 284 | **Compatible with all ROS2 Distributions:** The tool should work on all ROS2 distributions as long as the exported symbols are the same. 285 | 286 | Sample trace.html and flamegraph.svg are provided in the repo. 
287 | 288 | 289 | 290 | Simulation update breakup 291 | 292 | 293 | 294 | 295 | 296 | 297 | 298 | 299 | 300 | 301 | 302 | BashfulProfiler acts as a seamless conduit between applications and the Linux Perf tool, offering a user-friendly and efficient way to gain insights into a system's performance and take action where necessary. 303 | 304 | ## Troubleshoot 305 | If the number of trace samples becomes too large, loading the generated .html file in a browser may become difficult or unresponsive. In such cases, there are a few practical options: 306 | 307 | - Reduce the number of trace probes by adjusting symbol filter criteria. 308 | - Shorten the capture duration to limit the volume of collected data. 309 | - Remove the probe with the highest count to reduce the capture size in subsequent runs. 310 | 311 | Use the command below to count samples by probe name in the captured trace. 312 | ``` 313 | $ perf script -i ./output/instrace.data | sed 's/^[ \t]*//;s/[ \t]*$//' | tr -s ' ' | awk -F'[ ]' '{print $5}' | awk -F'[:]' '{print$2}' | sort | uniq -c | sort -nr 314 | 42868 vaDisplayIsValid__return 315 | 42868 vaDisplayIsValid_entry 316 | 10845 vaDestroyBuffer__return 317 | 10845 vaDestroyBuffer_entry 318 | 10845 vaCreateBuffer__return 319 | 10845 vaCreateBuffer_entry 320 | 10845 DdiMediaDecode_CreateBuffer__return 321 | 10845 DdiMediaDecode_CreateBuffer_entry 322 | 10845 DdiMedia_CreateBuffer__return 323 | 10845 DdiMedia_CreateBuffer_entry 324 | 10845 DdiDecode_CreateBuffer__return 325 | 10845 DdiDecode_CreateBuffer_entry 326 | 8223 i915_gem_wait_ioctl__return 327 | 8223 i915_gem_wait_ioctl_entry 328 | ``` 329 | Once unnecessary probes are identified, either remove them from the .csv by adjusting the symbol filter criteria or use the probe_remove call to remove their entry and return probes. 330 | ``` 331 | $ probe_remove vaDisplayIsValid 332 | ``` 333 | 334 | Probe names have a length restriction. 
If issues arise while setting probes, it may help to shorten the probe names defined in the .csv file to ensure compatibility. 335 | 336 | -------------------------------------------------------------------------------- /extra/perf.py_: -------------------------------------------------------------------------------- 1 | import subprocess 2 | import csv 3 | #objdump -t $(cat /proc/$(pgrep gzserver)/maps | grep ros_init | tr -s ' ' | cut -d ' ' -f 6 | sort | uniq) | c++filt | grep 'GazeboRosInitPrivate::Publish' | cut -d ' ' -f 1 4 | def main(): 5 | process = 'gzserver' 6 | ''' 7 | probes = [ 8 | ['libgazebo_ros_init.so', 'GazeboRosInitPrivate::Publish', 'ros_init_pubtime'], 9 | ['libgazebo_ros_init.so', 'GazeboRosInitPrivate::UpdateEnd', 'ros_init_updateend'], 10 | ['libgazebo_ros_vacuum_gripper.so', 'GazeboRosVacuumGripperPrivate::OnUpdate', 'vac_gripper_update'], 11 | ['libgazebo_ros2_control.so', 'GazeboRosControlPrivate::Update', 'ros2_control_update'], 12 | ['libgazebo_ros_joint_state_publisher.so', 'GazeboRosJointStatePublisherPrivate::OnUpdate', 'state_pub_update'], 13 | ['libgazebo_ros_diff_drive.so', 'GazeboRosDiffDrivePrivate::OnUpdate', 'diffdrive_update'], 14 | ['libgazebo_ros_diff_drive.so', 'GazeboRosDiffDrivePrivate::PublishOdometryTf', 'diffdrive_publish_tf'], 15 | ['libgazebo_ros_diff_drive.so', 'GazeboRosDiffDrivePrivate::PublishOdometryMsg', 'diffdrive_publish_msg'], 16 | ['libgazebo_ros_diff_drive.so', 'GazeboRosDiffDrivePrivate::PublishWheelsTf', 'diffdrive_publish_whltf'], 17 | ['libgazebo_ros_diff_drive.so', 'GazeboRosDiffDrivePrivate::UpdateWheelVelocities', 'diffdrive_update_whl_vel'], 18 | ['libgazebo_ros_state.so', 'GazeboRosStatePrivate::OnUpdate', 'ros_state_update'], 19 | ['libtf2_ros.so','::TransformBroadcaster::sendTransform(geo', 'tf2_ros_sendtransform'], 20 | ['libcontroller_manager.so', 'ControllerManager::update', 'control_manager_update'], 21 | ['libcontroller_manager.so', 'ControllerManager::read', 'control_manager_read'], 22 | 
['libcontroller_manager.so', 'ControllerManager::write', 'control_manager_write'], 23 | ['libjoint_trajectory_controller.so', 'JointTrajectoryController::update', 'joint_traj_update'], 24 | ['libjoint_state_broadcaster.so', 'JointStateBroadcaster::update', 'joint_state_bcaster_update'], 25 | ['librclcpp.so', '::publish(std::unique_ptr::execute_impl 15 | libopenvino_intel_gpu_plugin.so,,\bcldnn::onednn::typed_primitive_onednn_impl<.*>::build_primitive 16 | libopenvino_intel_gpu_plugin.so,,\bcldnn::onednn::typed_primitive_onednn_impl<.*>::execute_impl 17 | libopenvino_auto_batch_plugin.so,,\bov::autobatch_plugin::Plugin::compile_model\(.*\) 18 | # CPU 19 | libopenvino_intel_cpu_plugin.so,benchmark_app,\bov::intel_cpu::SyncInferRequest::infer\(\)\s*$ 20 | libopenvino_intel_cpu_plugin.so,,\bov::intel_cpu::Plugin::compile_model\(.*\) 21 | -------------------------------------------------------------------------------- /probes/orbslam.csv: -------------------------------------------------------------------------------- 1 | #, Header: ".so name", "process name", "symbol filter", "probe name" 2 | #, ROS2 libraries 3 | #, set_probes_csv.sh will look into the given process to find library path from loaded .so files 4 | #, If absolute path is given then process name is ignored. 5 | #, ### Disabling below due to too much traffic generating. 6 | libORB_SLAM3.so,slam,TrackRGBD 7 | libORB_SLAM3.so,slam,ExtractORB 8 | -------------------------------------------------------------------------------- /probes/probes.csv: -------------------------------------------------------------------------------- 1 | #,Header: ".so name","process name","symbol filter" 2 | #,ROS2 libraries 3 | #,set_probes_csv.sh will look into the given process to find library path from loaded .so files 4 | #,If absolute path is given then process name is ignored. 5 | /opt/ros/humble/lib/librcl.so,,rcl_publish$,rclpublish 6 | #,### Disabling below due to too much traffic generating. 
7 | #,librcl.so,gzserver,rcl_take$,rcl_take_topic_subscription 8 | #,librcl.so,gzserver,rcl_take_request$,rcl_take_request 9 | #,librmw_implementation.so,gzserver,rmw_publish$ 10 | #,librmw_implementation.so,gzserver,rmw_take_request 11 | #,librmw_implementation.so,gzserver,rmw_take_with_info 12 | #,librmw_implementation.so,gzserver,rmw_take$ 13 | #,Gazebo native libraries 14 | libgazebo_physics.so,gzserver,physics::World::Update() 15 | libgazebo_physics.so,gzserver,gazebo::physics::Model::Update() 16 | libgazebo_physics.so,gzserver,ODEPhysics::UpdatePhysics 17 | libgazebo_physics.so,gzserver,ODEPhysics::UpdateCollision 18 | libgazebo_physics.so,gzserver,ContactManager::PublishContacts 19 | libgazebo_physics.so,gzserver,gazebo::physics::Joint::Update() 20 | #,### Disabling below due to too much traffic generating. 21 | #,libgazebo_physics.so,gzserver,physics::Link::Update( 22 | #,libgazebo_physics.so,gzserver,physics::Entity::SetWorldPose(ignition 23 | #,Gazebo Plugins 24 | libgazebo_ros_init.so,gzserver,GazeboRosInitPrivate::Publish 25 | libgazebo_ros_init.so,gzserver,GazeboRosInitPrivate::UpdateEnd 26 | libgazebo_ros_vacuum_gripper.so,gzserver,GazeboRosVacuumGripperPrivate::OnUpdate 27 | libgazebo_ros2_control.so,gzserver,GazeboRosControlPrivate::Update 28 | libgazebo_ros_joint_state_publisher.so,gzserver,GazeboRosJointStatePublisherPrivate::OnUpdate 29 | libgazebo_ros_diff_drive.so,gzserver,GazeboRosDiffDrivePrivate::OnUpdate 30 | libgazebo_ros_diff_drive.so,gzserver,GazeboRosDiffDrivePrivate::PublishOdometryTf 31 | libgazebo_ros_diff_drive.so,gzserver,GazeboRosDiffDrivePrivate::PublishOdometryMsg 32 | libgazebo_ros_diff_drive.so,gzserver,GazeboRosDiffDrivePrivate::PublishWheelsTf 33 | libgazebo_ros_diff_drive.so,gzserver,GazeboRosDiffDrivePrivate::UpdateWheelVelocities 34 | libgazebo_ros_state.so,gzserver,GazeboRosStatePrivate::OnUpdate 35 | libtf2_ros.so,gzserver,::TransformBroadcaster::sendTransform(geo 36 | 
libcontroller_manager.so,gzserver,ControllerManager::update 37 | libcontroller_manager.so,gzserver,ControllerManager::read 38 | libcontroller_manager.so,gzserver,ControllerManager::write 39 | libjoint_trajectory_controller.so,gzserver,JointTrajectoryController::update 40 | libjoint_state_broadcaster.so,gzserver,JointStateBroadcaster::update 41 | #,nav2 stack 42 | libplanner_server_core.so,planner_server,computePlan 43 | #,moveit2 44 | libmoveit_move_group_default_capabilities.so,move_group,MoveGroupCartesianPathService::computeService 45 | libmoveit_move_group_default_capabilities.so,move_group,MoveGroupPlanService::computePlanService 46 | -------------------------------------------------------------------------------- /probes/va.csv: -------------------------------------------------------------------------------- 1 | #,probe_set_csv will look into the given process to find library path from loaded .so files 2 | #,If absolute path is given then process name is ignored. 3 | 4 | #,Header: ".so name","process name","symbol filter" 5 | # Add probes to libva's publically exported symbols. 
6 | /usr/local/lib/libva.so,,vaCreateBuffer 7 | ,,vaInitialize 8 | ,,vaCreateConfig 9 | ,,vaCreateSurfaces$ 10 | ,,vaCreateBuffer 11 | ,,vaCreateContext 12 | ,,vaCreateImage 13 | ,,vaGetImage 14 | ,,vaBeginPicture 15 | ,,vaRenderPicture 16 | ,,vaEndPicture 17 | ,,vaSyncBuffer 18 | ,,vaSyncSurface 19 | ,,vaLockSurface 20 | ,,vaMapBuffer 21 | ,,vaUnmapBuffer 22 | ,,vaUnlockSurface 23 | ,,vaPutImage 24 | ,,vaReleaseBufferHandle 25 | ,,vaDeriveImage 26 | ,,vaDestroyImage 27 | ,,vaDestroyBuffer 28 | ,,vaDestroyConfig 29 | ,,vaDestroyContext 30 | ,,vaDestroySurfaces 31 | ,,vaTerminate 32 | -------------------------------------------------------------------------------- /probes/xe.csv: -------------------------------------------------------------------------------- 1 | #,probe_set_csv will look into the given process to find library path from loaded .so files 2 | #,If absolute path is given then process name is ignored. 3 | 4 | #,Header: ".so name","process name","symbol filter" 5 | # Add probes to xe publically exported symbols. 
6 | xe.ko,,event_xe_pm_runtime_get_ioctl 7 | ,,intel_get_pipe_from_crtc_id_ioctl 8 | ,,vm_bind_ioctl_ops_execute 9 | ,,xe_drm_compat_ioctl 10 | ,,xe_exec_ioctl 11 | ,,xe_exec_queue_create_ioctl 12 | ,,xe_exec_queue_destroy_ioctl 13 | ,,xe_exec_queue_get_property_ioctl 14 | ,,xe_gem_create_ioctl 15 | ,,xe_gem_mmap_offset_ioctl 16 | ,,xe_ioctls 17 | ,,xe_oa_add_config_ioctl 18 | ,,xe_oa_ioctl 19 | ,,xe_oa_ioctl.cold 20 | ,,xe_oa_remove_config_ioctl 21 | ,,xe_oa_stream_open_ioctl 22 | ,,xe_observation_ioctl 23 | ,,xe_pm_runtime_get_ioctl 24 | ,,xe_query_ioctl 25 | ,,xe_vm_bind_ioctl 26 | ,,xe_vm_create_ioctl 27 | ,,xe_vm_destroy_ioctl 28 | ,,xe_wait_user_fence_ioctl 29 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | html5lib==1.1 2 | six==1.17.0 3 | bs4==0.0.2 4 | beautifulsoup4==4.12.3 5 | -------------------------------------------------------------------------------- /scripts/bashfulprofiler.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Copyright 2025 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | 16 | # Author: Arshad Mehmood 17 | 18 | # Set TRACE_ROOT to repo root folder. 19 | SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" 20 | export TRACE_ROOT="$(cd "$SCRIPT_DIR/.." 
&& pwd)" 21 | 22 | 23 | #************************************** 24 | # Shows loaded dynamic libraries with their paths by a process. 25 | # Syntax: 26 | # probe_show_loaded > [namefilter] 27 | # probe_show_loaded gzserver rcl 28 | #************************************** 29 | probe_show_loaded() { 30 | local param1=$1 31 | local filter=$2 32 | local pid="$param1" 33 | 34 | # If param1 is not all digits, assume it's a process name and get the PID 35 | if [[ -z "${param1##*[!0-9]*}" ]]; then 36 | pid=$(pgrep -o "$param1") 37 | if [[ -z "$pid" ]]; then 38 | echo "Error: No process found with name '$param1'" 39 | return 1 40 | fi 41 | fi 42 | 43 | if [[ ! -r /proc/$pid/maps ]]; then 44 | echo "Error: Cannot access /proc/$pid/maps" 45 | return 1 46 | fi 47 | 48 | cat /proc/$pid/maps | grep '\.so' | grep "$filter" | tr -s ' ' | cut -d ' ' -f 6 | sort | uniq 49 | } 50 | 51 | #************************************** 52 | # Shows symbols exported by .so file. Includes debug symbols if present. 53 | # Returns Demangled name followed by mangled name per line 54 | # Syntax: 55 | # probe_show_symbols [namefilter] 56 | # probe_show_symbols /opt/ros/foxy/lib/librcl.so publish 57 | #************************************** 58 | probe_show_symbols() { 59 | local lib_path="$1" 60 | local filter="$2" 61 | local probe_filter="(!__k???tab_* & !__crc_* & !__* & !*@plt)" 62 | 63 | if [[ "$filter" == *"::"* ]]; then 64 | # Filter contains ::, so search demangled names efficiently 65 | perf probe -x "$lib_path" -F --no-demangle --filter "$probe_filter" | sort | uniq | 66 | while IFS= read -r mangled_name; do 67 | printf "%s\n" "$mangled_name" 68 | done | 69 | c++filt | 70 | paste -d '\t' - <(perf probe -x "$lib_path" -F --no-demangle --filter "$probe_filter") | 71 | while IFS=$'\t' read -r demangled_name mangled_name_2; do 72 | if [[ "$demangled_name" == *"$filter"* ]]; then 73 | printf "%-60s -> %s\n" "$mangled_name_2" "$demangled_name" 74 | fi 75 | done 76 | else 77 | # Filter is a plain 
name, so search mangled names directly 78 | perf probe -x "$lib_path" -F --no-demangle --filter "$probe_filter" | sort | uniq | grep -E "$filter" | 79 | while IFS= read -r mangled_name; do 80 | demangled_name=$(echo "$mangled_name" | c++filt) 81 | printf "%-60s -> %s\n" "$mangled_name" "$demangled_name" 82 | done 83 | fi 84 | } 85 | 86 | #************************************** 87 | # Set probes on exported functions by a dynamic library/kernel module/executable. probe_name could be an exported function but in case of just a string, address in hex must be provided. 88 | # Syntax: 89 | # probe_set_from_binary [symbol_filter] [probe_name] 90 | # 91 | # probe_set_from_binary /opt/ros/foxy/lib/librcl.so 92 | # probe_set_from_binary /opt/ros/foxy/lib/librcl.so pub 93 | # 94 | # Using grep -E capability e.g \b for word boundary 95 | # probe_set_from_binary /opt/ros/foxy/lib/librcl.so '\bTracking\b|\bFrame\b' 96 | # 97 | # Tips: 98 | # ## Get unique names of triggered functions 99 | # $ perf script -i ./output/instrace.data | sed 's/^[ \t]*//;s/[ \t]*$//' | tr -s ' ' | awk -F'[ ]' '{print $5}' | awk -F'[:]' '{print$2}' | sort | uniq > function.txt 100 | # May use "sed 's/_entry$//'" 101 | # # Get count of each function 102 | # $ sudo perf script | sed 's/^[ \t]*//;s/[ \t]*$//' | tr -s ' ' | awk -F'[ ]' '{print $5}' | awk -F'[:]' '{print$2}' | sort | uniq -c | sort -nr 103 | 104 | # ## Or from successful probe insertion points 105 | # $ sudo perf probe -l | cut -d':' -f2 | cut -d ' ' -f1 > function.txt 106 | #************************************** 107 | probe_set_from_binary() { 108 | local lib_path="$1" 109 | local filter="$2" 110 | # Get address and demangled symbol names, and filter only valid symbol lines 111 | local symbols 112 | symbols=$(probe_show_symbols "$lib_path" "$filter") 113 | 114 | local symbol_count=$(echo "$symbols" | wc -l) 115 | 116 | local current_count=1 117 | # Loop through each symbol line 118 | echo "$symbols" | while read -r line; do 119 | local 
mangled_name demangled_name 120 | mangled_name=$(echo "$line" | tr -s ' ' | cut -d ' ' -f 1) 121 | demangled_name=$(echo "$line" | tr -s ' ' | cut -d ' ' -f 3-) 122 | function_name=$(echo "$demangled_name" | sed 's/::/_/g' | sed 's/(.*//') # Replace :: with _ and function params 123 | 124 | echo "($current_count/$symbol_count) Setting probe for: $function_name $mangled_name" 125 | probe_set_with_duration $lib_path $function_name $mangled_name 126 | current_count=$((current_count + 1)) 127 | done 128 | } 129 | 130 | #************************************** 131 | # Adds an entry and exit probes for a symbol 132 | # Syntax: 133 | # probe_set_with_duration <.so path> function_name [mangled name/address] 134 | #************************************** 135 | probe_set_with_duration() { 136 | local address=$3 137 | local function_with_signature="$2" 138 | local function_name="${function_with_signature%%(*}" 139 | 140 | # Remove trailing underscores 141 | function_name="${function_name%_}" 142 | 143 | # Remove ALL leading underscores using a loop 144 | while [[ "${function_name:0:1}" == "_" ]]; do 145 | function_name="${function_name:1}" 146 | done 147 | 148 | if [[ -z "$3" ]]; then address=$2; fi 149 | 150 | local probe_target="" 151 | if [[ "$1" == *.ko ]]; then 152 | probe_target="-m $1" 153 | else 154 | probe_target="-x $1" 155 | fi 156 | 157 | sudo perf probe -q -d ${function_name}_entry 158 | local perf_cmd="sudo perf probe $probe_target --no-demangle -a ${function_name}_entry='$address'" 159 | echo -e "perf command:\n\t $perf_cmd" 160 | echo $perf_cmd >> perf_cmd.sh 161 | eval "$perf_cmd" 162 | 163 | sudo perf probe -q -d ${function_name}__return 164 | local perf_cmd="sudo perf probe $probe_target --no-demangle -a ${function_name}='$address%return'" 165 | echo -e "perf command:\n\t $perf_cmd" 166 | echo $perf_cmd >> perf_cmd.sh 167 | eval "$perf_cmd" 168 | } 169 | 170 | #************************************** 171 | # Sets a single probe 172 | # Syntax: 173 | # 
probe_set_single 174 | #************************************** 175 | probe_set_single() { 176 | sudo perf probe -q -d $2 177 | 178 | local address=$3 179 | 180 | if [ -z "$3" ]; then address=$2; fi 181 | 182 | local probe_target="" 183 | if [[ "$1" == *.ko ]]; then 184 | probe_target="-m $1" 185 | else 186 | probe_target="-x $1" 187 | fi 188 | 189 | local perf_cmd="sudo perf probe $probe_target --no-demangle -a $2=$address" 190 | echo -e "perf command:\n\t $perf_cmd" 191 | eval "$perf_cmd" 192 | } 193 | 194 | #************************************** 195 | # Delete all probes 196 | # Syntax: 197 | # probe_remove_all 198 | #************************************** 199 | probe_remove_all() { 200 | sudo perf probe -d '*' 201 | } 202 | 203 | #************************************** 204 | # Delete the entry and return probes of a single function 205 | # Syntax: 206 | # probe_remove 207 | #************************************** 208 | probe_remove() { 209 | sudo perf probe -d $1_entry 210 | sudo perf probe -d $1__return 211 | } 212 | 213 | #************************************** 214 | # This bash function reads probe requests from probe.csv and sets up entry and exit probes for each requested function 215 | # 216 | # probe.csv fields: '.so name','process name','symbol filter' (without quotes) 217 | # No spaces before or after commas. The process name from a previous row is used if none is given for a non-absolute lib name. 
218 | # Sample csv format: 219 | # libgazebo_ros_init.so,benchmark_app,GazeboRosInitPrivate::Publish 220 | # /libopenvino_intel_gpu_plugin.so,,ov::intel_gpu::SyncInferRequest::infer\(\)\s*$ 221 | # /libopenvino_intel_gpu_plugin.so,,\bcldnn::network::execute_impl\(.*\)\s$ 222 | # /i915.ko,,\bi915_gem_do_execbuffer$ 223 | # 224 | # Grep regex format for symbol filter: 225 | # e.g \bcldnn::network::execute_impl\(.*\)$ 226 | # This regular expression matches lines that begin with the word boundary of the function 227 | # cldnn::network::execute_impl() called with any arguments, ensuring it's a separate word, 228 | # followed by optional whitespace and ending precisely at the line's end. 229 | # 230 | # TIPS: 231 | # Extract names from .C file using ctags and add absolute path to .so for first row only. 232 | # $ ctags --c-kinds=f -x --fields=+n mos_bufmgr_xe.c | awk '{print ",," $1}' > function.csv 233 | # $ probe_set_csv probes.csv 234 | # 235 | # If the .so names are not absolute paths, the process name must include the process that utilizes 236 | # these .so files, and /proc/pid/maps is used to determine their absolute paths. Therefore, the 237 | # process must be active during the execution of the probe_set_csv script. This requirement 238 | # is unnecessary when all .so names are provided as absolute paths. 239 | #************************************** 240 | probe_set_from_csv() { 241 | probe_file=$1 242 | # Initialize previous_process_name outside the loop 243 | previous_process_name="" 244 | previous_library_name="" 245 | 246 | # Add info to perf command log file. 
247 | echo -e "\n# Adding probes from $probe_file\n" >> perf_cmd.sh 248 | 249 | local line_count=$(wc -l < "$probe_file") 250 | local current_line=0 251 | while IFS=, read -r library_name process_name symbol_filter 252 | do 253 | current_line=$((current_line + 1)) 254 | 255 | # Skip completely empty lines and rows that start with # 256 | if [[ ( -z "$library_name" && -z "$process_name" && -z "$symbol_filter" ) || \ 257 | ( "$library_name" == \#* ) ]]; then 258 | echo "Skipping empty line $current_line" 259 | continue 260 | fi 261 | 262 | # Use previous library_name if the current one is empty 263 | if [ -z "$library_name" ] && [ -n "$previous_library_name" ]; then 264 | library_name=$previous_library_name 265 | elif [ -n "$library_name" ]; then 266 | previous_library_name=$library_name # Update previous_library_name 267 | fi 268 | 269 | # Skip rows when no library name has been seen yet, or when it is commented out 270 | if [ -z "$previous_library_name" ] || [[ "$previous_library_name" =~ ^# ]]; then 271 | echo "Skipping empty or commented line $current_line" 272 | continue 273 | fi 274 | 275 | library_path=$library_name 276 | 277 | # Kernel modules: resolve the module path directly; no process lookup is needed 278 | if [[ "$library_name" == *.ko ]]; then 279 | 280 | # Use modinfo to get the full path of the kernel module 281 | library_path=$(modinfo -n -m "$library_name") 282 | if [ -z "$library_path" ]; then 283 | echo "Error: Could not find full path for kernel module $library_name" 284 | continue 285 | fi 286 | 287 | elif ! 
[[ "$library_name" =~ ^/ ]]; then 288 | # Use previous process_name if the current one is empty 289 | if [ -z "$process_name" ] && [ -n "$previous_process_name" ]; then 290 | process_name=$previous_process_name 291 | elif [ -n "$process_name" ]; then 292 | previous_process_name=$process_name # Update previous_process_name 293 | fi 294 | 295 | # Retrieve the PID of the process 296 | pid=$(pgrep -o "$process_name") 297 | if [ -z "$pid" ]; then 298 | echo "Process $process_name not running" 299 | continue 300 | fi 301 | 302 | echo "PID $pid ($process_name) will be used to locate library path for $library_name" 303 | 304 | # Find out library path from loaded list 305 | library_path=$(cat /proc/$pid/maps | grep "$library_name" | tr -s ' ' | cut -d ' ' -f 6 | sort | uniq) 306 | 307 | if [ -z "$library_path" ]; then 308 | echo "Library $library_name not found" 309 | continue 310 | fi 311 | fi 312 | 313 | printf "(%d/%d) Processing Line: %s,%s,%s\n" "$current_line" "$line_count" "$library_name" "$process_name" "$symbol_filter" 314 | probe_set_from_binary $library_path $symbol_filter 315 | 316 | done < "$probe_file" 317 | } 318 | 319 | #************************************** 320 | # This bash functions removes probe listed in csv file 321 | #************************************** 322 | probe_remove_from_csv() { 323 | probe_file=$1 324 | 325 | local line_count=$(wc -l < "$probe_file") 326 | local current_line=0 327 | while IFS=, read -r library_name process_name symbol_filter 328 | do 329 | current_line=$((current_line + 1)) 330 | printf "(%d/%d) Processing Line: %s\n" "$current_line" "$line_count" "$symbol_filter" 331 | probe_remove $symbol_filter 332 | 333 | done < "$probe_file" 334 | } 335 | 336 | #************************************** 337 | # This bash funtion captures perf events from already added probes 338 | # A process name or pid can be given to capture events for that process only. 
Otherwise system wide events are captured 339 | # trace_capture_and_convert [duration] [processname|pid] 340 | # trace_capture_and_convert 8 gzserver 341 | # trace_capture_and_convert 4 342 | #************************************** 343 | trace_capture_and_convert() { 344 | local root_dir 345 | #root_dir=$(dirname "$(dirname "$(realpath "$0")")") 346 | 347 | local capture_duration=${1:-8} 348 | local p_cmd="" 349 | local ctf_cmd="" 350 | 351 | # Determine project root from environment variable or fallback 352 | if [[ -n "$TRACE_ROOT" ]]; then 353 | root_dir="$TRACE_ROOT" 354 | elif [[ -d "./3rdparty" ]]; then 355 | root_dir="$(pwd)" 356 | else 357 | echo "Error: TRACE_ROOT environment variable not set and 3rdparty directory not found in current path." 358 | return 1 359 | fi 360 | 361 | local thirdparty="$root_dir/3rdparty" 362 | 363 | if [ ! -f "$thirdparty/ctf2ctf/build/ctf2ctf" ]; then 364 | echo "ctf2ctf binary not found. Building for the first run" 365 | mkdir -p "$thirdparty/ctf2ctf/build" 366 | cmake -B "$thirdparty/ctf2ctf/build" -S "$thirdparty/ctf2ctf/" # sources are cloned into 3rdparty/ctf2ctf by setup_dependency.sh 367 | make -j$(nproc) -C "$thirdparty/ctf2ctf/build" 368 | fi 369 | 370 | if [ ! -d "$thirdparty/FlameGraph" ]; then 371 | echo "Flame Graph not found. Cloning" 372 | git -C "$thirdparty" clone https://github.com/brendangregg/FlameGraph 373 | fi 374 | 375 | if [ ! 
-z "$2" ]; then 376 | local target_pid="$2" 377 | if [[ -z "${target_pid##*[!0-9]*}" ]]; then 378 | target_pid=$(pgrep -o "$2") 379 | echo "Target process $2 with pid $target_pid" 380 | fi 381 | p_cmd="-p $target_pid" 382 | ctf_cmd="--pid-whitelist=$target_pid" 383 | fi 384 | 385 | echo "Capturing for $capture_duration seconds" 386 | 387 | local processes 388 | processes=$(ps -ao pid,comm --sort=start_time) 389 | 390 | local OUTPUT_DIR="$PWD/output" 391 | mkdir -p "$OUTPUT_DIR" 392 | rm -rf "$OUTPUT_DIR"/* 393 | 394 | sudo bash -c "set -x; perf record $p_cmd --running-time --timestamp-boundary -B --namespaces -m 2048 -r50 -e probe*:* -o $OUTPUT_DIR/instrace.data -aR sleep $capture_duration > /dev/null 2>&1" & 395 | sudo bash -c "set -x; perf record $p_cmd -B --namespaces -m 2048 -F 1000 -r50 -o $OUTPUT_DIR/systrace.data -g -aR sleep $capture_duration > /dev/null 2>&1" 396 | 397 | stty sane 398 | while pgrep -x "perf" > /dev/null; do 399 | echo "Waiting for perf record to complete..." 400 | sleep 1 401 | done 402 | stty sane 403 | sync && sleep 2 404 | echo "Recording completed" 405 | 406 | local USER 407 | USER=$(whoami) 408 | sudo chown "$USER:$USER" "$OUTPUT_DIR"/*.data 409 | 410 | set -x 411 | perf data -i "$OUTPUT_DIR/systrace.data" convert --to-ctf "$OUTPUT_DIR/systrace_data" 412 | perf data -i "$OUTPUT_DIR/instrace.data" convert --to-ctf "$OUTPUT_DIR/instrace_data" 413 | { set +x; } 2>/dev/null 414 | 415 | echo "CTF conversion completed" 416 | 417 | set -x 418 | "$thirdparty/ctf2ctf/build/ctf2ctf" "$OUTPUT_DIR/systrace_data/" $ctf_cmd > "$OUTPUT_DIR/systrace.json" 419 | "$thirdparty/ctf2ctf/build/ctf2ctf" "$OUTPUT_DIR/instrace_data/" $ctf_cmd > "$OUTPUT_DIR/instrace.json" 420 | { set +x; } 2>/dev/null 421 | 422 | echo "JSON conversion completed" 423 | 424 | set -x 425 | "$thirdparty/catapult/tracing/bin/trace2html" "$OUTPUT_DIR/systrace.json" "$OUTPUT_DIR/instrace.json" --output "$OUTPUT_DIR/trace.html" --config full 426 | { set +x; } 2>/dev/null 427 | 428 | 
echo "HTML conversion completed" 429 | 430 | if [ -f "$thirdparty/FlameGraph/stackcollapse-perf.pl" ]; then 431 | echo "Generating FlameGraph" 432 | perf script -i "$OUTPUT_DIR/systrace.data" | "$thirdparty/FlameGraph/stackcollapse-perf.pl" > "$OUTPUT_DIR/flamegraph.perf-folded" 433 | "$thirdparty/FlameGraph/flamegraph.pl" "$OUTPUT_DIR/flamegraph.perf-folded" > "$OUTPUT_DIR/flamegraph.svg" 434 | fi 435 | 436 | echo "Capture completed. Use a web browser to open:" 437 | echo " - $OUTPUT_DIR/trace.html" 438 | echo " - $OUTPUT_DIR/flamegraph.svg" 439 | echo 440 | echo "Running Processes:" 441 | echo "$processes" 442 | } 443 | 444 | -------------------------------------------------------------------------------- /scripts/build_perf_with_ctf.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -e # Exit on error 4 | 5 | # Install dependencies 6 | echo "Installing required packages..." 7 | sudo apt update 8 | sudo apt-get install -y libncurses-dev flex bison openssl libssl-dev dkms libelf-dev libudev-dev libpci-dev libiberty-dev autoconf 9 | sudo apt-get install -y libpfm4-dev elfutils libdw-dev systemtap-sdt-dev libunwind-dev libslang2-dev libcap-dev libcapstone-dev libbabeltrace-ctf-dev libtraceevent-dev libbfd-dev libperl-dev 10 | sudo apt-get install -y libbabeltrace-ctf-dev libbabeltrace-ctf1 libbabeltrace1 libbabeltrace-dev python3-babeltrace libtracefs-dev libbabeltrace2-dev 11 | 12 | # Get the current kernel version 13 | KERNEL_VERSION=$(uname -r) 14 | KERNEL_MAJOR=$(echo "$KERNEL_VERSION" | cut -d. -f1,2) 15 | 16 | if [ -d "linux-src" ]; then 17 | echo "Using existing linux-src directory. If running kernel version is changed then remove this directory for updated code" 18 | else 19 | echo "Cloning perf tool sources for v$KERNEL_MAJOR ..." 
20 | git clone --depth=1 --branch v$KERNEL_MAJOR https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git linux-src 21 | fi 22 | cd linux-src/tools/perf 23 | 24 | # Build perf with CTF support 25 | echo "Building perf with CTF support..." 26 | make clean 27 | make -j$(nproc) EXTRA_CFLAGS="-I/usr/include/x86_64-linux-gnu" EXTRA_LDFLAGS='-L /usr/lib/x86_64-linux-gnu/ -lbabeltrace2' 28 | 29 | # Check if the build was successful 30 | if [ -f "./perf" ]; then 31 | echo "perf built successfully!" 32 | cp ./perf ../../.. 33 | echo "You can now use ./perf or move it to /usr/bin (preferred)." 34 | echo " sudo cp ./perf /usr/bin/" 35 | echo "linux-src directory can be deleted to save space" 36 | 37 | else 38 | echo "Build failed. Check for missing dependencies." 39 | exit 1 40 | fi 41 | 42 | -------------------------------------------------------------------------------- /scripts/parse_perf.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # Parse a perf script call stack enabled output to generate dot file. 3 | # Syntax/Flow: 4 | # $ sudo perf script > perf.output 5 | # $ python parse_perf.py perf.output perf.dot 6 | # $ xdot perf.dot 7 | # 8 | 9 | import re 10 | import sys 11 | from collections import defaultdict 12 | 13 | # Whether to include offset in function name. 
Results in more function nodes 14 | include_offset = False 15 | 16 | def parse_perf_output(perf_files): 17 | edges = defaultdict(int) 18 | call_counts = defaultdict(int) 19 | skip = False 20 | for perf_file in perf_files: 21 | with open(perf_file, 'r') as file: 22 | current_stack = {} 23 | record = [] # Temporary list to hold function calls for a record 24 | 25 | for line in file: 26 | line = line.strip() 27 | if line and skip: 28 | continue 29 | if not line: 30 | if skip: 31 | skip = False 32 | continue 33 | # Process the record, then reset for the next record 34 | if record: process_record(record, current_stack, edges, call_counts) # guard: an empty record would raise IndexError in process_record 35 | record = [] # Reset the record 36 | skip = False 37 | continue 38 | 39 | if ':' in line: 40 | match = re.search(r'probe(?:_[^:]*)?:([^:]+):', line) 41 | if not match: 42 | continue 43 | function_name = match.group(1).strip() 44 | if function_name.endswith("_return"): 45 | skip = True 46 | continue 47 | if function_name.endswith("_entry"): 48 | function_name = function_name[:-6] #strip "_entry" 49 | 50 | else: 51 | #match = re.search(r'\s+(.*?)\+0x[0-9a-fA-F]+', line) if include_offset else re.search(r'\s+(.*?)\+0x', line) 52 | if include_offset: 53 | match = re.search(r'\s+([a-zA-Z0-9_:]+)(?:\([^)]*\))?(?:\+0x([0-9a-fA-F]+))?', line) 54 | else: 55 | match = re.search(r'\s+([a-zA-Z0-9_:]+)(?:\([^)]*\))?(?:\+0x[0-9a-fA-F]+)?', line) 56 | 57 | if not match: 58 | continue 59 | function_name = match.group(0).split()[-1] if include_offset else match.group(1).strip() 60 | if function_name.endswith("_entry"): 61 | function_name = function_name[:-6] #strip "_entry" 62 | record.append(function_name) 63 | 64 | # Process any remaining record after the last line 65 | if record: 66 | process_record(record, current_stack, edges, call_counts) 67 | 68 | return edges, call_counts 69 | 70 | def process_record(record, current_stack, edges, call_counts): 71 | record.reverse() # Reverse the record so the parent call is at index 0 72 | 73 | # 
Find the point of divergence where the current record differs from the current stack 74 | min_length = 0 75 | if record[0] in current_stack: 76 | min_length = min(len(record), len(current_stack[record[0]])) 77 | 78 | divergence_point = min_length # Assume divergence at the end of the shortest list 79 | for i in range(min_length): 80 | if record[i] != current_stack[record[0]][i]: 81 | divergence_point = i 82 | break 83 | 84 | previous_function = record[divergence_point - 1] if divergence_point > 0 else None 85 | for function in record[divergence_point:]: 86 | call_counts[function] += 1 87 | if previous_function: 88 | edges[(previous_function, function)] += 1 89 | previous_function = function 90 | 91 | # Update the current stack to the new stack 92 | current_stack[record[0]] = record 93 | 94 | def generate_dot(edges, call_counts, output_file): 95 | with open(output_file, 'w') as f: 96 | f.write("digraph G {\n") 97 | for function, count in call_counts.items(): 98 | f.write(f' "{function}" [label="{function} (Calls: {count})"];\n') 99 | for (caller, callee), count in edges.items(): 100 | f.write(f' "{caller}" -> "{callee}" [label="{count} calls"];\n') 101 | f.write("}\n") 102 | 103 | if __name__ == "__main__": 104 | if len(sys.argv) < 3: 105 | print("Usage: python parse_perf.py ... 
") 106 | sys.exit(1) 107 | edges, call_counts = parse_perf_output(sys.argv[1:-1]) 108 | generate_dot(edges, call_counts, sys.argv[-1]) 109 | 110 | -------------------------------------------------------------------------------- /setup_dependency.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Tested with python 3.10 & 3.12 4 | 5 | # Get the absolute path of the directory where the script resides 6 | root_dir=$(dirname "$(realpath "$0")") 7 | 8 | # Install babel trace 9 | sudo apt-get install libbabeltrace-dev 10 | mkdir -p $root_dir/3rdparty 11 | 12 | cd $root_dir/3rdparty 13 | # Clone, patch and build ctf2ctf 14 | git clone --depth 1 https://github.com/KDAB/ctf2ctf.git 15 | cd ctf2ctf 16 | git fetch --depth 1 origin 489cc5e8dd5ecf51ed3e37a012a287a16c16b51c 17 | git checkout 489cc5e8dd5ecf51ed3e37a012a287a16c16b51c 18 | git apply $root_dir/extra/perf_support.patch 19 | git submodule update --init --recursive 20 | mkdir -p ./build 21 | cmake -B ./build -S ./ 22 | make -j$(nproc) -C ./build 23 | 24 | cd $root_dir/3rdparty 25 | 26 | # Clone trace2html tool 27 | git clone --depth 1 https://chromium.googlesource.com/catapult 28 | 29 | # Delete these two packages and install them via requirements.txt. This due to incompatibility with python 3.12 30 | rm -rf ./catapult/third_party/six 31 | rm -rf ./catapult/third_party/beautifulsoup4 32 | 33 | # Clone flamegraph implementation 34 | git clone --depth 1 https://github.com/brendangregg/FlameGraph.git 35 | 36 | cd $root_dir 37 | pip install -r requirements.txt 38 | --------------------------------------------------------------------------------