├── .gitignore ├── .gitmodules ├── CHANGELOG.md ├── LICENSE ├── Makefile ├── README.md ├── bin ├── cpuactturbo ├── cpumhz ├── cpumhzturbo ├── lsds ├── psn ├── run_xcapture.sh ├── run_xcpu.sh ├── schedlat ├── syscallargs ├── tracepointargs ├── vmtop └── xtop ├── doc └── licenses │ └── Python-license.txt ├── experiments ├── README.md ├── faster-biolatency │ ├── biolatency.bpf.c │ ├── biolatency.c │ ├── biolatency.h │ └── fio │ │ ├── allmulti.sh │ │ └── onessd.sh ├── oracle │ ├── README.md │ ├── xcapture-bpf │ └── xcapture-bpf.c ├── xcapture-v0-proc │ ├── 0xtools.spec │ ├── Makefile │ ├── release.sh │ ├── src │ │ └── xcapture.c │ ├── xcapture-restart.service │ ├── xcapture-restart.timer │ ├── xcapture.default │ └── xcapture.service ├── xcapture-v1-bpftrace │ ├── transform.jq │ └── xcapture.bt └── xcapture-v2-bcc │ ├── xcapture-bpf │ └── xcapture-bpf.c ├── include ├── syscall_64.h ├── syscall_64_2.6.18.h ├── syscall_64_2.6.32.h └── syscall_names.h ├── lib └── 0xtools │ ├── argparse.py │ ├── psnproc.py │ └── psnreport.py ├── tools ├── README.md ├── sql │ └── sclathist.sql └── xq └── xcapture ├── Makefile ├── include ├── blk_types.h ├── syscall_aarch64.h ├── syscall_aarch64.tbl ├── syscall_arg1_is_fd.txt ├── syscall_arm64.h ├── syscall_fd_bitmap_aarch64.h ├── syscall_fd_bitmap_x86_64.h ├── syscall_names_aarch64.h ├── syscall_names_x86_64.h ├── syscall_x86_64.h ├── syscall_x86_64.tbl ├── xcapture.h └── xcapture_user.h ├── src ├── filters │ └── README.md ├── helpers │ └── file_helpers.h ├── maps │ └── xcapture_maps.h ├── probes │ ├── io │ │ ├── iorq.bpf.c │ │ └── iorq.bpf.h │ ├── syscall │ │ ├── syscall.bpf.c │ │ └── syscall.bpf.h │ └── task │ │ └── task.bpf.c ├── retrievers │ └── README.md ├── user │ ├── main.c │ ├── task_handler.c │ ├── task_handler.h │ ├── tracking_handler.c │ └── tracking_handler.h └── utils │ ├── md5.c │ ├── md5.h │ └── xcapture_helpers.h └── tests ├── Makefile ├── README.md ├── md5_test.c └── test_md5.py /.gitignore: 
-------------------------------------------------------------------------------- 1 | *.pyc 2 | *.swp 3 | bin/xcapture 4 | bin/xcap/* 5 | *.tar.gz 6 | *.o 7 | xcapture/.output/ 8 | xcapture/out/ 9 | xcapture/xcapture 10 | xcapture/tests/md5_test 11 | xcapture/.clangd 12 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "libbpf-bootstrap"] 2 | path = libbpf-bootstrap 3 | url = https://github.com/libbpf/libbpf-bootstrap.git 4 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # 0x.tools changelog 2 | 3 | 3.0.0-alpha-preview 4 | ===================== 5 | * Moved older xcapture versions (and experimental prototypes) to the `experiments/` directory 6 | - xcapture-v0-proc still works just fine (reading /proc files) and is useful for always-on thread sampling on older Linux kernels without eBPF 7 | - xcapture-v1-bpftrace is a simple hack + prototype 8 | - xcapture-v2-bcc is a more functional version, but also prototype-grade (but you can see how the "xtop" tool could work with stack tracing etc) 9 | - xcapture-v3: The current `bin/xcapture` is now the final, modern eBPF + libbpf based tool for efficient, always-on thread activity tracking that will eventually graduate to production grade (my current plan is to launch it with my talk at [P99CONF 2025](https://www.p99conf.io) on 22-23 October 2025). 
10 | 11 | 1.2.3 12 | ====================== 13 | * OS support 14 | - add unistd.h lookup for arm64 and ppc64le platforms 15 | 16 | * Installation & running 17 | - add RPMspec for creating RPMs 18 | - add systemd service definitions 19 | 20 | * Features 21 | - add more "single file descriptor" syscalls that can report filenames accessed 22 | - report /proc/PID/ns/pid for container namespace info 23 | - systems with python3 now print out extra newlines like intended 24 | 25 | 1.1.0 26 | ====================== 27 | * general 28 | - using semantic versioning now (major.minor.patch) 29 | - in the future, will update version numbers in a specific tool only when it was updated 30 | 31 | * pSnapper 32 | - `psn` works with python 3 now too (uses wherever the "/usr/bin/env python" command points to) 33 | 34 | * xcapture 35 | - Fixed xcapture compiler warnings shown on newer gcc versions 36 | - More precise sampling interval (account for sampling busy-time and subtract that from next sleep duration) 37 | - Under 1 sec sleep durations supported (For example `-d 0.1` for sampling at 10 Hz) 38 | 39 | * make/install 40 | - by default, executables go to `/usr/bin` now 41 | - python libraries go under PREFIX/lib/0xtools now 42 | - use PREFIX option in makefile to adjust the installation root 43 | - makefile uses the `install` command instead of the `ln -s` hack for installing files 44 | - `make uninstall` removes installed files 45 | 46 | 0.18 47 | ====================== 48 | * New column 49 | - `filenamesum` column strips numbers out of filenames to summarize events against similar files 50 | 51 | 0.16 52 | ====================== 53 | * New script 54 | - schedlat.py - show scheduling latency of a single process 55 | 56 | 0.15 57 | ====================== 58 | * Minor changes only 59 | - Handle SIGPIPE to not get `IOError: [Errno 32] Broken pipe` error when piping pSnapper output to other tools like "head" 60 | - Change the info link tp.dev/psnapper to tanelpoder.com/psnapper 61 | 
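
The SIGPIPE fix in 0.15 can be illustrated with a short sketch. Python ignores SIGPIPE at startup, so writing to a pipe whose reader has exited (for example `head`) raises a broken-pipe error instead of ending the process quietly. Restoring the default signal disposition avoids that. This mirrors the `signal.signal(signal.SIGPIPE, signal.SIG_DFL)` call visible in `bin/syscallargs`; whether pSnapper handles it in exactly the same way is an assumption, and the function name `restore_default_sigpipe` is hypothetical. POSIX only.

```python
import signal


def restore_default_sigpipe():
    """Restore the default SIGPIPE action so a closed pipe ends the process quietly."""
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)


if __name__ == "__main__":
    restore_default_sigpipe()
    # Try: python3 this_script.py | head -3
    for i in range(1000):
        print(f"line {i}")
```

Without the call, the same pipeline would typically end with a `BrokenPipeError` traceback (`IOError` on Python 2) once `head` exits.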
62 | 0.14 63 | ====================== 64 | * report file names that are accessed with I/O syscalls with arg0 as the file descriptor 65 | - example: `sudo psn -G syscall,filename` 66 | - works with read, write, pread, fsync, recvmsg, sendmsg etc, but not with batch io syscalls like io_submit(), select() that may submit multiple fds per call 67 | 68 | * no need to install kernel-headers package anymore as pSnapper now has the unistd.h file bundled with the install 69 | - no more exceptions complaining about missing unistd_64.h file 70 | - pSnapper still tries to use the unistd.h file from a standard /usr/include location, but falls back to the bundled one if the file is missing. This should help with using pSnapper on other platforms too (different processor architectures, including 32bit vs 64bit versions of the same architecture, have different syscall numbers) 71 | 72 | * pSnapper can now run on RHEL5 equivalents (2.6.18 kernel), however with separately installed python26 or later, as I haven't "downgraded" pSnapper's python code to work with python 2.4 (yet) 73 | - you could install python 2.6 or 2.7 manually in your own directory or use the EPEL package: (yum install epel-release ; yum install python26 ) 74 | - you will also need to uncomment the 2nd line in psn script (use #!/usr/bin/env python26 instead of python) 75 | - note that 2.6.18 kernel doesn't provide syscall, filename and kstack sampling (but wchan is available) 76 | 77 | 78 | 79 | 0.13 80 | ====================== 81 | * kernel stack summary reporting - new column `kstack` 82 | * wider max column length (for kstack) 83 | * add `--list` option to list all available columns 84 | * replace digits from `comm` column by default to collapse different threads of the same thing into one. you can use `comm2` to see the unedited process comm. 
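
To illustrate the default `comm` behaviour described in the 0.13 notes, here is a minimal sketch of digit collapsing. The `*` placeholder, the `collapse_digits` name and the sample thread names are assumptions made for illustration; pSnapper's own replacement rule may differ.

```python
import re
from collections import Counter


def collapse_digits(comm, placeholder="*"):
    """Replace each run of digits with a placeholder (hypothetical rule)."""
    return re.sub(r"\d+", placeholder, comm)


# Made-up thread names, as they might appear in an unedited comm column
samples = ["kworker/0:1", "kworker/12:3", "mysqld", "ora_p001_db", "ora_p002_db"]

# Threads that differ only by numbers now fall into the same group
summary = Counter(collapse_digits(c) for c in samples)
for name, count in summary.most_common():
    print(f"{count:3d}  {name}")
```

Grouping on the collapsed name is what lets many worker threads of the same kind show up as one row in a summary, while the raw name stays available for drill-down.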
85 | 86 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | CC ?= gcc 2 | PREFIX ?= /usr 3 | 4 | # Default target 5 | .PHONY: all 6 | all: 7 | @echo "To install the 0xtools utilities, please run:" 8 | @echo " sudo make install" 9 | @echo "" 10 | @echo "To remove the 0xtools utilities, please run:" 11 | @echo " sudo make uninstall" 12 | 13 | install: 14 | install -m 0755 -d ${PREFIX}/bin 15 | install -m 0755 bin/psn ${PREFIX}/bin/psn 16 | install -m 0755 -d ${PREFIX}/lib/0xtools 17 | install -m 0644 lib/0xtools/psnproc.py ${PREFIX}/lib/0xtools/psnproc.py 18 | install -m 0644 lib/0xtools/psnreport.py ${PREFIX}/lib/0xtools/psnreport.py 19 | install -m 0644 lib/0xtools/argparse.py ${PREFIX}/lib/0xtools/argparse.py 20 | install -m 0755 bin/schedlat ${PREFIX}/bin/schedlat 21 | install -m 0755 bin/vmtop ${PREFIX}/bin/vmtop 22 | install -m 0755 bin/syscallargs ${PREFIX}/bin/syscallargs 23 | install -m 0755 bin/tracepointargs ${PREFIX}/bin/tracepointargs 24 | install -m 0755 bin/cpumhz ${PREFIX}/bin/cpumhz 25 | install -m 0755 bin/cpumhzturbo ${PREFIX}/bin/cpumhzturbo 26 | install -m 0755 bin/cpuactturbo ${PREFIX}/bin/cpuactturbo 27 | install -m 0755 bin/lsds ${PREFIX}/bin/lsds 28 | 29 | uninstall: 30 | rm -fv ${PREFIX}/bin/psn 31 | rm -fv ${PREFIX}/bin/schedlat ${PREFIX}/bin/vmtop ${PREFIX}/bin/syscallargs ${PREFIX}/bin/tracepointargs 32 | rm -fv ${PREFIX}/bin/cpumhz ${PREFIX}/bin/cpumhzturbo ${PREFIX}/bin/cpuactturbo ${PREFIX}/bin/lsds 33 | rm -fv ${PREFIX}/lib/0xtools/psnproc.py ${PREFIX}/lib/0xtools/psnreport.py ${PREFIX}/lib/0xtools/argparse.py 34 | rm -rfv ${PREFIX}/lib/0xtools 35 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## XCapture v3.0.0-alpha 2 | _By Tanel Poder_ 3 | _2025-04-22_ 4 | 5 | This is the 
first ever release of the [0x.tools](https://0x.tools) XCapture tool that is built with **modern eBPF**! My previous tools and prototypes were using either _bcc_, _bpftrace_ or were just sampling and aggregating thread level info from _/proc_ files. 6 | 7 | * [Announcing xCapture v3: Linux Performance Analysis with Modern eBPF and DuckDB](https://tanelpoder.com/posts/xcapture-v3-alpha-ebpf-performance-analysis-with-duckdb/) 8 | 9 | ## Requirements 10 | 11 | Modern eBPF means `libbpf`, `CO-RE`, `BTF`, `BPF iterators`, etc. I'll write about my learning journey with proper thank you notes soon. 12 | 13 | In practice this means you'll need to be on a **Linux kernel 5.14** or up. XCapture v3 is a future-facing tool, so I'll invest the time in that direction and not worry about all the legacy systems out there (unlike my approach with all my previous tools). 14 | 15 | This means RHEL9+ on Linux 5.14, or Oracle Enterprise Linux 8+, as long as you run at least their UEK7 Linux kernel (5.15). Ubuntu has pretty new kernels (and they have the HWE versions), so Ubuntu 20+ with the latest HWE kernel available for it should work. I have done my latest tests on Ubuntu 24.04 on Linux 6.8 though (will keep you updated once I test more). 
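
As a quick pre-flight check for the requirements above, the sketch below compares a kernel release string against 5.14 and looks for `/sys/kernel/btf/vmlinux`, which is where BTF-enabled kernels expose their type information. The helper names (`parse_release`, `kernel_at_least`, `has_btf`) are hypothetical and not part of XCapture, and a passing check does not guarantee that every eBPF feature XCapture relies on is available.

```python
import os
import platform
import re

MIN_KERNEL = (5, 14)  # minimum version stated in the requirements above
BTF_PATH = "/sys/kernel/btf/vmlinux"  # present when the kernel exposes BTF


def parse_release(release):
    """Extract (major, minor) from a release string such as '6.8.0-59-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release, minimum=MIN_KERNEL):
    """True if the release string is at or above the minimum (major, minor)."""
    return parse_release(release) >= minimum


def has_btf(path=BTF_PATH):
    """True if the kernel's BTF type information file exists."""
    return os.path.exists(path)


if __name__ == "__main__":
    release = platform.release()
    print(f"kernel {release}: >= 5.14? {kernel_at_least(release)}")
    print(f"BTF available at {BTF_PATH}? {has_btf()}")
```

Comparing `(major, minor)` tuples rather than strings avoids the classic mistake of treating `5.4` as newer than `5.14`.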
16 | 17 | ## Building xcapture-next (v3) 18 | 19 | ``` 20 | git clone https://github.com/tanelpoder/0xtools.git 21 | cd 0xtools 22 | ``` 23 | 24 | To install the system packages (on Ubuntu 24.04) for compiling the binary, run: 25 | 26 | ``` 27 | sudo apt install make gcc pkg-config libbpf-dev libbpf-tools clang llvm libbfd-dev libelf1 libelf-dev zlib1g-dev 28 | ``` 29 | 30 | On RHEL9: 31 | 32 | ``` 33 | sudo dnf install libbpf libbpf-tools clang llvm-devel binutils-devel elfutils-libelf elfutils-libelf-devel zlib-devel 34 | ``` 35 | 36 | To install required libbpf dependencies for the GitHub repo, run: 37 | 38 | ``` 39 | git submodule update --init --recursive 40 | ``` 41 | 42 | ## Running xcapture in developer mode 43 | 44 | By default, xcapture prints some of its fields as formatted output to your terminal screen: 45 | 46 | 47 | ``` 48 | cd xcapture 49 | make 50 | sudo ./xcapture 51 | ``` 52 | 53 | The eventual "always-on" production mode for appending samples to hourly CSV files is enabled by the `-o DIRNAME` option. You can use `-o .` to output to your current directory. 54 | 55 | > While XCapture requires root privileges to load its eBPF programs and do its sampling, the consumers of the output CSV files **do not have to be root**! They can be any regular user who has the Unix filesystem permissions to read the output directory and CSV files. This provides a nice separation of duties. And you can analyze the "dimensional data warehouse" of Linux thread activity from any angle _you_ want, without having to update or change XCapture itself. 56 | 57 | You can also run `./xcapture --help` to get some idea of its current functionality. 58 | 59 | **NB!** While all the syscall & IO _tracking_ action happens automatically in the kernel space, the simultaneous _sampling_ of the tracked events is driven by the userspace `xcapture` program. 
The thread state sampling loop actually runs completely inside the kernel too, thanks to eBPF _task iterators_, but the invocation and frequency of the sampling are driven by the userspace program. 60 | 61 | Therefore it makes sense to schedule the userspace "sampling driver" with a high scheduling priority, to get consistently recurring samples from it. I run it like this and recommend that you do too: 62 | 63 | ``` 64 | $ sudo TZ=:/etc/localtime chrt -r 30 ./xcapture -vo DIRNAME 65 | ``` 66 | 67 | The `chrt` puts the userspace xcapture program into real-time scheduling class. It's a single, single-threaded process; you only need to run one per host, and it can monitor all threads in the system. By default it wakes up once per second and tells the eBPF task iterator to do its sampling, gets results via an eBPF ringbuf and writes the records either to STDOUT or CSV files. 68 | 69 | The entire sampling loop itself is very quick, from ~100us in my laptop VMs, to ~20ms per wakeup in a large NUMA machine with 384 CPUs. So, XCapture _passive sampling_ at 1Hz without _active tracking_ of event latencies has only taken between 0.01% and 2% _**of a single CPU**_ in my servers! (The _2% of-a-single-CPU_ result is from my AMD EPYC server with 384 CPUs :-) 70 | 71 | The `TZ=:/etc/localtime` setting gives you two things: 72 | 73 | 1) You can choose your own human wall-clock timezone in which to print out various timestamps. You can set `TZ=` (to empty value) to get times in UTC. The kernel and eBPF programs don't deal with human time internally, the CLOCK\_MONOTONIC clock-source I'm using is just stored as number of nanoseconds from some arbitrary point in the past. 74 | 2) The timezone environment variable also reduces `xcapture` userspace CPU usage, as otherwise the C library would go and re-check the `/etc/localtime` file on every timestamp-formatting library call. 75 | 76 | 77 | ## That's all! 
78 | 79 | Back to [0x.tools](https://0x.tools) 80 | 81 | -------------------------------------------------------------------------------- /bin/cpuactturbo: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Copyright 2022-2025, Tanel Poder [0x.tools] 4 | # SPDX-License-Identifier: GPL-2.0-or-later 5 | # 6 | # Program: cpuactturbo 7 | # 8 | # Purpose: Show a summary of actual CPUs "busyness" (being in C0-state and not idle), 9 | # in somewhat tricky units of MHz, grouped by 100MHz increments. 10 | # 11 | # The "act" in cpuactturbo stands for actual (or active) running time for the CPU. 12 | # 13 | # Since there are different levels of "idle", this output is too high 14 | # level for *really understanding* your CPU (power) usage. For that, 15 | # you'd need to use raw turbostat output to see the C1,C2,... states too 16 | # (man turbostat helps) 17 | # 18 | # Notes: You need sudo/root access for this tool as turbostat requires it (setcap 19 | # is also possible) 20 | # 21 | # Important: This only shows you something about how often (and how fast) the CPU clocks 22 | # ran during the measurement period. This tool does not drill down into 23 | # actual CPU frontend/backend/latency metrics within the CPU. You'd need 24 | # "perf stat -d" for that or similar tools. 25 | # 26 | # One example of "perf stat -d" usage is in my blog: 27 | # https://tanelpoder.com/2015/09/21/ram-is-the-new-disk-and-how-to-measure-its-performance-part-2-tools/ 28 | 29 | # As this script needs to be run as root (with sudo), /usr/bin might not be in root's path 30 | TURBOSTAT=/usr/bin/turbostat 31 | 32 | # turbostat requires root 33 | if [ "$USER" != root ]; then 34 | echo "Error: This script uses turbostat which requires root privs. 
Run sudo $0" >&2 35 | exit 1 36 | fi 37 | 38 | # create bars for barchart 39 | print_blocks() { 40 | local count=$1 41 | case $count in 42 | 1) printf "▏";; # 1/8 block 43 | 2) printf "▎";; # 1/4 block 44 | 3) printf "▍";; # 3/8 block 45 | 4) printf "▌";; # 1/2 block 46 | 5) printf "▋";; # 5/8 block 47 | 6) printf "▊";; # 3/4 block 48 | 7) printf "▉";; # 7/8 block 49 | 8) printf "█";; # full block 50 | *) 51 | # For numbers > 8, print full blocks and then a partial block 52 | local full_blocks=$((count / 8)) 53 | local remainder=$((count % 8)) 54 | 55 | for ((i=0; i 8, print full blocks and then a partial block 31 | local full_blocks=$((count / 8)) 32 | local remainder=$((count % 8)) 33 | 34 | for ((i=0; i&2 27 | exit 1 28 | fi 29 | 30 | # create bars for barchart 31 | print_blocks() { 32 | local count=$1 33 | case $count in 34 | 1) printf "▏";; # 1/8 block 35 | 2) printf "▎";; # 1/4 block 36 | 3) printf "▍";; # 3/8 block 37 | 4) printf "▌";; # 1/2 block 38 | 5) printf "▋";; # 5/8 block 39 | 6) printf "▊";; # 3/4 block 40 | 7) printf "▉";; # 7/8 block 41 | 8) printf "█";; # full block 42 | *) 43 | # For numbers > 8, print full blocks and then a partial block 44 | local full_blocks=$((count / 8)) 45 | local remainder=$((count % 8)) 46 | 47 | for ((i=0; i= 3: 64 | field_type = ' '.join(field_info[1:-1]) 65 | field_name = field_info[-1] 66 | if 'args' not in syscall_info: 67 | syscall_info['args'] = [] 68 | syscall_info['args'].append((field_type, field_name)) 69 | 70 | return syscall_info 71 | 72 | def list_syscalls(syscalls_path): 73 | syscalls = [] 74 | for root, dirs, files in os.walk(syscalls_path): 75 | dirs[:] = [d for d in dirs if d.startswith("sys_enter_")] 76 | for file in files: 77 | if file == 'format': 78 | file_path = os.path.join(root, file) 79 | syscall_info = parse_syscall_format(file_path) 80 | syscalls.append(syscall_info) 81 | 82 | if not syscalls: 83 | if os.geteuid() != 0: 84 | print("Error: No syscalls found. 
Please run this as root or mount the debugfs with proper permissions.", file=sys.stderr) 85 | else: 86 | print(f"Error: No syscalls found in the specified path {syscalls_path}", file=sys.stderr) 87 | sys.exit(1) 88 | 89 | return syscalls 90 | 91 | 92 | def main(): 93 | signal.signal(signal.SIGPIPE, signal.SIG_DFL) # for things like: ./syscallargs | head 94 | 95 | parser = argparse.ArgumentParser(description='List kernel system calls and their arguments from Linux debugfs.') 96 | parser.add_argument('-l', '--newlines', action='store_true', help='Print each system call and its arguments on a new line') 97 | parser.add_argument('-i', '--id', action='store_true', help='Print syscall ID (not the same thing as internal syscall_nr)') 98 | parser.add_argument('-t', '--typeinfo', action='store_true', help='Include type information for syscall arguments in the output') 99 | parser.add_argument('-V', '--version', action='version', version=f"%(prog)s {__version__} by {__author__} [{__url__}]", help='Show the program version and exit') 100 | parser.add_argument('--path', type=str, default=DEFAULT_PATH, help=f'Path to the debugfs syscalls directory: {DEFAULT_PATH}') 101 | 102 | args = parser.parse_args() 103 | syscalls_path = args.path 104 | syscalls = list_syscalls(syscalls_path) 105 | 106 | for syscall in syscalls: 107 | if args.id: 108 | print(f"{syscall['name']}", end=f" // {syscall['id']}\n" if args.newlines else "(") 109 | else: 110 | print(f"{syscall['name']}", end="\n" if args.newlines else "(") 111 | 112 | args_list = [] 113 | for index, (arg_type, arg_name) in enumerate(syscall['args'][1:]): 114 | if args.typeinfo: 115 | argout = f"({arg_type}) {arg_name}" 116 | else: 117 | argout = arg_name 118 | if args.newlines: 119 | print(f" {index}: {argout}") 120 | else: 121 | args_list.append(argout) 122 | if not args.newlines: 123 | print(", ".join(args_list), end="") 124 | 125 | if args.id: 126 | print("" if args.newlines else f"); // {syscall['id']}") 127 | else: 128 | 
print("" if args.newlines else ");") 129 | 130 | 131 | if __name__ == "__main__": 132 | main() 133 | 134 | 135 | -------------------------------------------------------------------------------- /bin/tracepointargs: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # tracepointargs -- List Linux tracepoint events, their arguments 4 | # and expand related structures if an argument is a structure 5 | # 6 | # Copyright 2025 Tanel Poder [https://0x.tools] 7 | # 8 | # This program is free software; you can redistribute it and/or modify 9 | # it under the terms of the GNU General Public License as published by 10 | # the Free Software Foundation; either version 2 of the License, or 11 | # (at your option) any later version. 12 | # 13 | # This program is distributed in the hope that it will be useful, 14 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 15 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 16 | # GNU General Public License for more details. 
17 | # 18 | # SPDX-License-Identifier: GPL-2.0-or-later 19 | 20 | import os 21 | import sys 22 | import argparse 23 | import signal 24 | import re 25 | import json 26 | from typing import Dict, List, Tuple, Optional 27 | 28 | __version__ = "1.0.0" 29 | __description__ = "List tracepoint event arguments from debugfs" 30 | __author__ = "Tanel Poder" 31 | __date__ = "2025-01-30" 32 | __url__ = "https://0x.tools" 33 | 34 | DEFAULT_PATH = '/sys/kernel/debug/tracing/events' 35 | 36 | # Parse C struct definitions from vmlinux.h 37 | def parse_vmlinux_structs(vmlinux_path: str) -> Dict[str, List[Tuple[str, str]]]: 38 | if not os.path.exists(vmlinux_path): 39 | print(f"Error: vmlinux.h file not found: {vmlinux_path}", file=sys.stderr) 40 | sys.exit(1) 41 | 42 | structs = {} 43 | current_struct = None 44 | current_members = [] 45 | 46 | with open(vmlinux_path, 'r') as f: 47 | for line in f: 48 | line = line.strip() 49 | 50 | # Start of struct definition 51 | if line.startswith('struct ') and '{' in line: 52 | struct_name = line.split()[1].split('{')[0].strip() 53 | current_struct = struct_name 54 | current_members = [] 55 | 56 | # Struct member 57 | elif current_struct and ';' in line and not line.startswith('}'): 58 | # Remove comments if present 59 | line = line.split('//')[0].strip() 60 | if line: 61 | # Split type and name, handling pointers and arrays 62 | parts = line.rstrip(';').strip().split() 63 | if len(parts) >= 2: 64 | member_type = ' '.join(parts[:-1]) 65 | member_name = parts[-1].split('[')[0] # Remove array dimensions 66 | current_members.append((member_type, member_name)) 67 | 68 | # End of struct 69 | elif line.startswith('};') and current_struct: 70 | if current_members: # Only add if struct has members 71 | structs[current_struct] = current_members 72 | current_struct = None 73 | current_members = [] 74 | 75 | return structs 76 | 77 | def parse_event_format(file_path): 78 | event_info = {} 79 | with open(file_path, 'r') as file: 80 | lines = 
file.readlines() 81 | for line in lines: 82 | line = line.strip() 83 | if line.startswith('name:'): 84 | event_info['name'] = line.split(':')[1].strip() 85 | elif line.startswith('ID:'): 86 | event_info['id'] = int(line.split()[1].strip()) 87 | elif line.startswith('field:'): 88 | # example: field:unsigned short common_type; offset:0; size:2; signed:0; 89 | field_info = line.split(';')[0].replace(':', ' ').split() 90 | if len(field_info) >= 3: 91 | field_type = ' '.join(field_info[1:-1]) 92 | field_name = field_info[-1] 93 | if 'args' not in event_info: 94 | event_info['args'] = [] 95 | # Skip common_* fields as they're present in all events 96 | if not field_name.startswith('common_'): 97 | event_info['args'].append((field_type, field_name)) 98 | 99 | return event_info 100 | 101 | def list_tracepoints(events_path): 102 | tracepoints = [] 103 | for event_type in os.listdir(events_path): 104 | type_path = os.path.join(events_path, event_type) 105 | if not os.path.isdir(type_path): 106 | continue 107 | 108 | for event_name in os.listdir(type_path): 109 | event_dir = os.path.join(type_path, event_name) 110 | if not os.path.isdir(event_dir): 111 | continue 112 | 113 | format_file = os.path.join(event_dir, 'format') 114 | if os.path.isfile(format_file): 115 | event_info = parse_event_format(format_file) 116 | event_info['type'] = event_type 117 | tracepoints.append(event_info) 118 | 119 | if not tracepoints: 120 | if os.geteuid() != 0: 121 | print("Error: No tracepoints found. 
Please run this as root or mount the debugfs with proper permissions.", 122 | file=sys.stderr) 123 | else: 124 | print(f"Error: No tracepoints found in the specified path {events_path}", file=sys.stderr) 125 | sys.exit(1) 126 | 127 | return tracepoints 128 | 129 | def format_struct_members(struct_name: str, members: List[Tuple[str, str]], structs: Dict[str, List[Tuple[str, str]]], 130 | indent: str = '', follow_structs: bool = False, depth: int = 0, 131 | visited: Optional[set] = None) -> str: 132 | if not members: 133 | return f"{struct_name} {{}}" 134 | 135 | # Prevent infinite recursion 136 | if visited is None: 137 | visited = set() 138 | if struct_name in visited: 139 | return f"{struct_name} /* recursive reference */" 140 | visited.add(struct_name) 141 | 142 | # Limit maximum recursion depth (linked lists, etc) 143 | MAX_DEPTH = 2 144 | if depth > MAX_DEPTH: 145 | return f"{struct_name} /* max depth reached */" 146 | 147 | result = [f"{struct_name} {{"] 148 | for type_name, member_name in members: 149 | line = f"{indent} {type_name} {member_name};" 150 | result.append(line) 151 | 152 | # If following structs and this is a struct type, expand it 153 | if follow_structs and 'struct ' in type_name: 154 | # Extract the struct name, handling pointers and const 155 | sub_struct = type_name.split('struct ')[-1].split()[0] 156 | sub_struct = sub_struct.replace('*', '').replace('const', '').strip() 157 | 158 | if sub_struct in structs and sub_struct not in visited: 159 | # Expand the sub-struct with increased indentation 160 | sub_expansion = format_struct_members( 161 | f"struct {sub_struct}", 162 | structs[sub_struct], 163 | structs, 164 | indent + ' ', 165 | follow_structs, 166 | depth + 1, 167 | visited.copy() # Create a new copy for each branch 168 | ) 169 | result.append(f"{indent} {sub_expansion}") 170 | 171 | result.append(f"{indent}}}") 172 | return '\n'.join(result) 173 | 174 | def main(): 175 | signal.signal(signal.SIGPIPE, signal.SIG_DFL) # for things 
like: ./tracepointargs | head 176 | 177 | parser = argparse.ArgumentParser(description='List kernel tracepoint events and their arguments from Linux debugfs.') 178 | parser.add_argument('-l', '--newlines', action='store_true', 179 | help='Print each tracepoint and its arguments on a new line') 180 | parser.add_argument('-i', '--id', action='store_true', 181 | help='Print tracepoint ID') 182 | parser.add_argument('-t', '--typeinfo', action='store_true', 183 | help='Include type information for tracepoint arguments in the output') 184 | parser.add_argument('-a', '--expand-structs', action='store_true', 185 | help='Access and expand struct definitions from vmlinux.h (requires -t)') 186 | parser.add_argument('-f', '--follow-structs', action='store_true', 187 | help='Recursively expand nested struct definitions (requires -a)') 188 | parser.add_argument('--vmlinux', type=str, 189 | help='Path to vmlinux.h file for struct definitions') 190 | parser.add_argument('-s', '--sort', choices=['name', 'type'], default='name', 191 | help='Sort output by event name or type (default: name)') 192 | parser.add_argument('-V', '--version', action='version', 193 | version=f"%(prog)s {__version__}", 194 | help='Show the program version and exit') 195 | parser.add_argument('pattern', nargs='?', type=str, 196 | help='Regex pattern to filter events (e.g., "block/*done*" or ".*write.*")') 197 | parser.add_argument('--path', type=str, default=DEFAULT_PATH, 198 | help=f'Path to the debugfs events directory: {DEFAULT_PATH}') 199 | parser.add_argument('--type', type=str, help='Filter events by type (e.g., "syscalls", "block")') 200 | 201 | args = parser.parse_args() 202 | 203 | # Load struct definitions if requested 204 | structs = {} 205 | if args.expand_structs: 206 | if not args.typeinfo: 207 | print("Error: --expand-structs (-a) requires --typeinfo (-t)", file=sys.stderr) 208 | sys.exit(1) 209 | if not args.vmlinux: 210 | print("Error: --expand-structs (-a) requires --vmlinux ", 
file=sys.stderr) 211 | sys.exit(1) 212 | if args.follow_structs and not args.expand_structs: 213 | print("Error: --follow-structs (-f) requires --expand-structs (-a)", file=sys.stderr) 214 | sys.exit(1) 215 | structs = parse_vmlinux_structs(args.vmlinux) 216 | 217 | tracepoints = list_tracepoints(args.path) 218 | 219 | # Filter by type if specified 220 | if args.type: 221 | tracepoints = [tp for tp in tracepoints if tp['type'] == args.type] 222 | if not tracepoints: 223 | print(f"No tracepoints found for type: {args.type}", file=sys.stderr) 224 | sys.exit(1) 225 | 226 | # Filter by regex pattern if specified 227 | if args.pattern: 228 | try: 229 | # Convert shell-style wildcards to regex 230 | pattern = args.pattern.replace('*', '.*') 231 | regex = re.compile(pattern) 232 | tracepoints = [tp for tp in tracepoints 233 | if regex.search(f"{tp['type']}/{tp['name']}")] 234 | if not tracepoints: 235 | print(f"No tracepoints found matching pattern: {args.pattern}", file=sys.stderr) 236 | sys.exit(1) 237 | except re.error as e: 238 | print(f"Invalid regex pattern: {e}", file=sys.stderr) 239 | sys.exit(1) 240 | 241 | # Sort tracepoints 242 | if args.sort == 'name': 243 | tracepoints.sort(key=lambda x: x['name']) 244 | else: # sort by type 245 | tracepoints.sort(key=lambda x: (x['type'], x['name'])) 246 | 247 | for tp in tracepoints: 248 | event_name = f"{tp['type']}/{tp['name']}" 249 | 250 | if args.id: 251 | print(f"{event_name}", end=f" // {tp['id']}\n" if args.newlines else "(") 252 | else: 253 | print(f"{event_name}", end="\n" if args.newlines else "(") 254 | 255 | args_list = [] 256 | for index, (arg_type, arg_name) in enumerate(tp.get('args', [])): 257 | if args.typeinfo: 258 | # Check if this type is a struct and we should expand it 259 | struct_expanded = "" 260 | if args.expand_structs: 261 | # Extract struct name, handling pointers 262 | struct_name = arg_type.strip() 263 | if struct_name.startswith('struct '): 264 | struct_name = struct_name.split()[1] 265 | # 
Remove pointer asterisks and const 266 | struct_name = struct_name.replace('*', '').replace('const', '').strip() 267 | if struct_name in structs: 268 | # Add extra indentation if using newlines mode 269 | base_indent = ' ' if args.newlines else ' ' 270 | struct_expanded = f"\n{base_indent}{format_struct_members(f'struct {struct_name}', structs[struct_name], structs, base_indent, args.follow_structs)}" 271 | 272 | argout = f"({arg_type}) {arg_name}{struct_expanded}" 273 | else: 274 | argout = arg_name 275 | if args.newlines: 276 | print(f" {index}: {argout}") 277 | else: 278 | args_list.append(argout) 279 | 280 | if not args.newlines: 281 | print(", ".join(args_list), end="") 282 | 283 | if args.id: 284 | print("" if args.newlines else f"); // {tp['id']}") 285 | else: 286 | print("" if args.newlines else ");") 287 | 288 | if __name__ == "__main__": 289 | main() 290 | -------------------------------------------------------------------------------- /bin/vmtop: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # This tool is part of https://0x.tools 4 | 5 | if [ $# -ne 1 ]; then 6 | echo "Usage: $0 SLEEP_SECONDS" 7 | exit 1 8 | fi 9 | 10 | F1=/tmp/vmtop1.$$.tmp 11 | F2=/tmp/vmtop2.$$.tmp 12 | 13 | cat /proc/vmstat > $F2 14 | 15 | while true ; do 16 | clear 17 | echo `date` " [0x.tools vmtop]" 18 | echo 19 | printf "%-32s %16s %16s %16s %16s\n" "METRIC" "DELTA" "DELTA_KB" "CURRENT" "CURRENT_MB" 20 | printf "%-32s %16s %16s %16s %16s\n" "-------------------------------" "----------------" "----------------" "----------------" "----------------" 21 | mv $F2 $F1 22 | cat /proc/vmstat > $F2 23 | join $F1 $F2 | grep ^nr | awk '{ printf("%-32s %16d %\47 16i %\47 16i %\47 16i\n", $1,$3-$2, ($3-$2)*4, $3, $3*4/1024) }' | grep -v ' 0 ' 24 | sleep $1 25 | done 26 | 27 | 28 | # TODO trap CTRL-C remove file 29 | 30 | -------------------------------------------------------------------------------- /bin/xtop: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | CURDIR="$(dirname "$(realpath "$0")")" 4 | 5 | "${CURDIR}/xcapture-bpf" --xtop --clear-screen "$@" 6 | 7 | -------------------------------------------------------------------------------- /experiments/README.md: -------------------------------------------------------------------------------- 1 | # 0x.Tools: X-Ray vision for Linux systems 2 | 3 | **0x.tools** is a set of open-source utilities for analyzing application performance on Linux. It has a goal of deployment simplicity and minimal dependencies, to reduce friction of systematic troubleshooting. There’s no need to upgrade the OS, install kernel modules, heavy monitoring frameworks, Java agents or databases. Some of these tools also work on over-decade-old Linux kernels, like version 2.6.18 from 18 years ago. 4 | 5 | **0x.tools** allow you to measure individual thread level activity, like thread sleep states, currently executing system calls and kernel wait locations. Additionally, you can drill down into CPU usage of any thread or the system as a whole. You can be systematic in your troubleshooting - no need for guessing or genius wizard tricks with traditional system utilization stats. 6 | 7 | ## Announcing xcapture v3.0.0-alpha using modern eBPF and DuckDB! (2025-04-23) 8 | 9 | * https://tanelpoder.com/posts/xcapture-v3-alpha-ebpf-performance-analysis-with-duckdb/ 10 | 11 | ## OLD: xcapture-bpf and xtop v2.0.2 announced! (2024-07-03) 12 | 13 | xcapture-bpf (and xtop) are like the Linux top tool, but extended with x-ray vision and the ability to view your performance data from any chosen angle (that eBPF allows you to instrument). You can use it for a system-level overview and drill down into individual threads' activity and soon even into individual kernel events like lock waits or memory stalls. eBPF is not only customizable, it's completely programmable and I plan to take full advantage of it. 
I have so far implemented less than 5% of everything this method and the new tool is capable of, stay tuned for more! 14 | 15 | * https://0x.tools 16 | 17 | ### xcapture-bpf demo 18 | This is one of the things that you get: 19 | 20 | [![asciicast](https://asciinema.org/a/666715.svg)](https://asciinema.org/a/666715) 21 | 22 | ### xcapture-bpf screenshot 23 | A screenshot that illustrates how xcapture-bpf output and stacktiles work with terminal search/highlighting and scroll-back ability: 24 | 25 | ![xcapture-bpf screenshot with terminal highlighting](https://0x.tools/images/xcapture-bpf-stacktiles.png) 26 | 27 | ### xcapture-bpf install instructions and info 28 | 29 | * Go to https://0x.tools for more info and the installation instructions of the latest eBPF-based tool 30 | 31 | ## Other tools 32 | 33 | An example of _one of_ the tools `psn` (that doesn't use eBPF, just reads the usual `/proc` files) is here: 34 | 35 | ``` 36 | $ sudo psn -p "mysqld|kwork" -G syscall,wchan 37 | 38 | Linux Process Snapper v0.14 by Tanel Poder [https://0x.tools] 39 | Sampling /proc/syscall, stat, wchan for 5 seconds... finished. 
40 | 41 | 42 | === Active Threads ======================================================================================== 43 | 44 | samples | avg_threads | comm | state | syscall | wchan 45 | ----------------------------------------------------------------------------------------------------------- 46 | 25 | 3.12 | (mysqld) | Disk (Uninterruptible) | fsync | _xfs_log_force_lsn 47 | 16 | 2.00 | (mysqld) | Running (ON CPU) | [running] | 0 48 | 14 | 1.75 | (mysqld) | Disk (Uninterruptible) | pwrite64 | call_rwsem_down_write_failed 49 | 8 | 1.00 | (mysqld) | Disk (Uninterruptible) | fsync | submit_bio_wait 50 | 4 | 0.50 | (mysqld) | Disk (Uninterruptible) | pread64 | io_schedule 51 | 4 | 0.50 | (mysqld) | Disk (Uninterruptible) | pwrite64 | io_schedule 52 | 3 | 0.38 | (mysqld) | Disk (Uninterruptible) | pread64 | 0 53 | 3 | 0.38 | (mysqld) | Running (ON CPU) | [running] | io_schedule 54 | 3 | 0.38 | (mysqld) | Running (ON CPU) | pread64 | 0 55 | 2 | 0.25 | (mysqld) | Disk (Uninterruptible) | [running] | 0 56 | 1 | 0.12 | (kworker/*:*) | Running (ON CPU) | read | worker_thread 57 | 1 | 0.12 | (mysqld) | Disk (Uninterruptible) | fsync | io_schedule 58 | 1 | 0.12 | (mysqld) | Disk (Uninterruptible) | futex | call_rwsem_down_write_failed 59 | 1 | 0.12 | (mysqld) | Disk (Uninterruptible) | poll | 0 60 | 1 | 0.12 | (mysqld) | Disk (Uninterruptible) | pwrite64 | _xfs_log_force_lsn 61 | 1 | 0.12 | (mysqld) | Running (ON CPU) | fsync | submit_bio_wait 62 | 1 | 0.12 | (mysqld) | Running (ON CPU) | futex | futex_wait_queue_me 63 | ``` 64 | **Usage info** and more details here: 65 | * https://0x.tools 66 | 67 | **Twitter:** 68 | * https://twitter.com/0xtools 69 | 70 | **Author:** 71 | * https://tanelpoder.com 72 | 73 | -------------------------------------------------------------------------------- /experiments/faster-biolatency/biolatency.bpf.c: -------------------------------------------------------------------------------- 1 | // SPDX-License-Identifier: GPL-2.0 2 | // 
Copyright (c) 2020 Wenbo Zhang 3 | // Modified for PERCPU_HASH usage and timestamp selection 4 | // 5 | // 02-Apr-2025 Tanel Poder Changes below: 6 | // Rely on built-in [io_]start_time_ns fields in Linux kernel 7 | // Remove IO insert/issue TPs and starts map 8 | // Mark "hists" map as per-CPU map 9 | 10 | #include <vmlinux.h> 11 | #include <bpf/bpf_helpers.h> 12 | #include <bpf/bpf_core_read.h> 13 | #include <bpf/bpf_tracing.h> 14 | 15 | #include "biolatency.h" 16 | #include "bits.bpf.h" 17 | #include "core_fixes.bpf.h" 18 | 19 | #define MAX_ENTRIES 10240 20 | 21 | extern int LINUX_KERNEL_VERSION __kconfig; 22 | 23 | const volatile bool filter_cg = false; 24 | const volatile bool targ_per_disk = false; 25 | const volatile bool targ_per_flag = false; 26 | const volatile bool targ_queued = false; 27 | const volatile bool targ_ms = false; 28 | const volatile bool filter_dev = false; 29 | const volatile __u32 targ_dev = 0; 30 | const volatile bool targ_single = true; 31 | 32 | struct { 33 | __uint(type, BPF_MAP_TYPE_CGROUP_ARRAY); 34 | __type(key, u32); 35 | __type(value, u32); 36 | __uint(max_entries, 1); 37 | } cgroup_map SEC(".maps"); 38 | 39 | static struct hist initial_hist; 40 | 41 | struct { 42 | __uint(type, BPF_MAP_TYPE_PERCPU_HASH); 43 | __uint(max_entries, MAX_ENTRIES); 44 | __type(key, struct hist_key); 45 | __type(value, struct hist); 46 | } hists SEC(".maps"); 47 | 48 | static int handle_block_rq_complete(struct request *rq, int error, unsigned int nr_bytes) 49 | { 50 | struct hist_key hkey = {}; 51 | struct hist *histp; 52 | u64 slot, start_ns, ts; 53 | s64 delta; // signed, so the delta < 0 check below can actually detect a negative interval 54 | int ret; 55 | 56 | if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0)) { 57 | return 0; 58 | } 59 | 60 | ts = bpf_ktime_get_ns(); 61 | 62 | if (targ_queued) 63 | start_ns = BPF_CORE_READ(rq, start_time_ns); 64 | else 65 | start_ns = BPF_CORE_READ(rq, io_start_time_ns); 66 | 67 | delta = (s64)(ts - start_ns); 68 | 69 | if (delta < 0) { 70 | return 0; 71 | } 72 | 73 | hkey.dev = 0; // Initialize 74 | if (targ_per_disk) { 75 | struct gendisk *disk = 
NULL; 76 | struct request_queue *q = BPF_CORE_READ(rq, q); 77 | if (q) 78 | disk = BPF_CORE_READ(q, disk); 79 | 80 | if (disk) { 81 | u32 major = BPF_CORE_READ(disk, major); 82 | u32 minor = BPF_CORE_READ(disk, first_minor); 83 | hkey.dev = MKDEV(major, minor); 84 | } else { 85 | } 86 | } 87 | 88 | if (filter_dev && hkey.dev != targ_dev) { 89 | return 0; 90 | } 91 | 92 | hkey.cmd_flags = 0; 93 | if (targ_per_flag) 94 | hkey.cmd_flags = BPF_CORE_READ(rq, cmd_flags); 95 | 96 | histp = bpf_map_lookup_elem(&hists, &hkey); 97 | if (!histp) { 98 | ret = bpf_map_update_elem(&hists, &hkey, &initial_hist, BPF_ANY); 99 | if (ret < 0) { 100 | return 0; // Exit if update failed 101 | } 102 | histp = bpf_map_lookup_elem(&hists, &hkey); 103 | if (!histp) { 104 | return 0; 105 | } 106 | } else { 107 | } 108 | 109 | // Calculate log2 histogram slot 110 | if (targ_ms) 111 | delta /= 1000000U; 112 | else 113 | delta /= 1000U; 114 | 115 | slot = log2l(delta); 116 | if (slot >= MAX_SLOTS) 117 | slot = MAX_SLOTS - 1; 118 | 119 | histp->slots[slot]++; 120 | 121 | return 0; 122 | } 123 | 124 | SEC("tp_btf/block_rq_complete") 125 | int BPF_PROG(block_rq_complete_btf, struct request *rq, int error, unsigned int nr_bytes) 126 | { 127 | return handle_block_rq_complete(rq, error, nr_bytes); 128 | } 129 | 130 | char LICENSE[] SEC("license") = "GPL"; 131 | -------------------------------------------------------------------------------- /experiments/faster-biolatency/biolatency.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ 2 | #ifndef __BIOLATENCY_H 3 | #define __BIOLATENCY_H 4 | 5 | #define DISK_NAME_LEN 32 6 | #define MAX_SLOTS 27 7 | 8 | #define MINORBITS 20 9 | #define MINORMASK ((1U << MINORBITS) - 1) 10 | 11 | #define MKDEV(ma, mi) (((ma) << MINORBITS) | (mi)) 12 | 13 | struct hist_key { 14 | __u32 cmd_flags; 15 | __u32 dev; 16 | }; 17 | 18 | struct hist { 19 | unsigned long long 
slots[MAX_SLOTS]; 20 | }; 21 | 22 | #endif /* __BIOLATENCY_H */ 23 | -------------------------------------------------------------------------------- /experiments/faster-biolatency/fio/allmulti.sh: -------------------------------------------------------------------------------- 1 | ./onessd.sh $1 /dev/nvme0n1 $2 & 2 | ./onessd.sh $1 /dev/nvme1n1 $2 & 3 | ./onessd.sh $1 /dev/nvme2n1 $2 & 4 | ./onessd.sh $1 /dev/nvme3n1 $2 & 5 | ./onessd.sh $1 /dev/nvme4n1 $2 & 6 | ./onessd.sh $1 /dev/nvme5n1 $2 & 7 | ./onessd.sh $1 /dev/nvme6n1 $2 & 8 | ./onessd.sh $1 /dev/nvme7n1 $2 & 9 | ./onessd.sh $1 /dev/nvme8n1 $2 & 10 | ./onessd.sh $1 /dev/nvme9n1 $2 & 11 | ./onessd.sh $1 /dev/nvme10n1 $2 & 12 | ./onessd.sh $1 /dev/nvme11n1 $2 & 13 | ./onessd.sh $1 /dev/nvme12n1 $2 & 14 | ./onessd.sh $1 /dev/nvme13n1 $2 & 15 | ./onessd.sh $1 /dev/nvme14n1 $2 & 16 | ./onessd.sh $1 /dev/nvme15n1 $2 & 17 | ./onessd.sh $1 /dev/nvme16n1 $2 & 18 | ./onessd.sh $1 /dev/nvme17n1 $2 & 19 | ./onessd.sh $1 /dev/nvme18n1 $2 & 20 | ./onessd.sh $1 /dev/nvme19n1 $2 & 21 | ./onessd.sh $1 /dev/nvme20n1 $2 & 22 | 23 | jobs 24 | wait 25 | -------------------------------------------------------------------------------- /experiments/faster-biolatency/fio/onessd.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | [ $# -ne 3 ] && echo Usage $0 numjobs /dev/DEVICENAME BLOCKSIZE && exit 1 4 | 5 | fio --readonly --name=onessd \ 6 | --filename=$2 \ 7 | --filesize=100% --bs=$3 --direct=1 --overwrite=0 \ 8 | --rw=randread --random_generator=lfsr \ 9 | --numjobs=$1 --time_based=1 --runtime=3600 \ 10 | --ioengine=io_uring --registerfiles --fixedbufs \ 11 | --iodepth=256 --iomem=shmhuge --thread \ 12 | --iodepth_batch_submit=16 --iodepth_batch_complete_min=16 --iodepth_batch_complete_max=16 \ 13 | --gtod_reduce=1 --group_reporting --minimal 14 | 15 | 16 | -------------------------------------------------------------------------------- /experiments/oracle/README.md: 
-------------------------------------------------------------------------------- 1 | This is an example of using `xcapture` to read Oracle's `kskthbwt()` function arg2 to report the Oracle wait event name. 2 | 3 | This is just a proof-of-concept experiment, so you need to change the `KSLEDT_SYM_ADDRESS` value in `xcapture-bpf.c` to the location of the `ksledt_` symbol in your Oracle binary. 4 | 5 | You can get the value with: 6 | 7 | ``` 8 | nm $ORACLE_HOME/bin/oracle | grep ksledt 9 | ``` 10 | 11 | Example command (executed in the experiments/oracle directory): 12 | 13 | ``` 14 | sudo ./xcapture-bpf --xtop -G oracle_wait_event 15 | ``` 16 | 17 | Output: 18 | 19 | 20 | ``` 21 | === Active Threads ============================================================================================ 22 | 23 | seconds | avg_thr | visual_pct | st | username | comm | syscall | oracle_wait_event 24 | --------------------------------------------------------------------------------------------------------------- 25 | 2.60 | 0.52 | ████▊ | R | oracle | oracle_*_li | - | - 26 | 2.07 | 0.41 | ███▊ | D | oracle | oracle_*_li | pread64 | db file scattered read 27 | 0.13 | 0.03 | ▎ | R | oracle | oracle_*_li | mmap | - 28 | 0.13 | 0.03 | ▎ | R | oracle | ora_vkrm_lin*m | - | 29 | 0.13 | 0.03 | ▎ | R | oracle | ora_m*_lin*c | semtimedop | 30 | 0.07 | 0.01 | ▏ | R | oracle | oracle_*_li | pread64 | db file scattered read 31 | 0.07 | 0.01 | ▏ | R | oracle | ora_vktm_lin*m | clock_nanosleep | 32 | 0.07 | 0.01 | ▏ | R | oracle | ora_p*_lin*c | semtimedop | 33 | 0.07 | 0.01 | ▏ | R | tanel | python* | - | 34 | 0.07 | 0.01 | ▏ | R | oracle | ora_vkrm_lin*m | clock_nanosleep | 35 | 0.07 | 0.01 | ▏ | S | oracle | ora_ckpt_lin*m | io_getevents | control file parallel write 36 | 0.07 | 0.01 | ▏ | R | oracle | oracle_*_li | pread64 | - 37 | 0.07 | 0.01 | ▏ | R | oracle | oracle_*_li | io_submit | - 38 | 39 | 40 | sampled: 75 times, avg_thr: 1.12 41 | start: 2024-10-12 20:57:27, duration: 5s 42 | 43 | 44 | 45 
| === Active Threads ================================================================================== 46 | 47 | seconds | avg_thr | visual_pct | st | username | comm | syscall | oracle_wait_event 48 | ----------------------------------------------------------------------------------------------------- 49 | 0.68 | 0.14 | ██▊ | R | oracle | oracle_*_li | - | - 50 | 0.41 | 0.08 | █▋ | S | oracle | oracle_*_li | io_getevents | direct path read 51 | 0.27 | 0.05 | █▏ | R | oracle | ora_vkrm_lin*m | - | 52 | 0.14 | 0.03 | ▋ | R | oracle | ora_dia*_lin*c | semtimedop | - 53 | 0.14 | 0.03 | ▋ | R | oracle | ora_mmnl_lin*m | - | - 54 | 0.07 | 0.01 | ▍ | R | oracle | ora_m*_lin*m | - | 55 | 0.07 | 0.01 | ▍ | R | oracle | ora_p*_lin*c | semtimedop | 56 | 0.07 | 0.01 | ▍ | R | oracle | ora_vktm_lin*m | clock_nanosleep | 57 | 0.07 | 0.01 | ▍ | R | oracle | ora_p*p_lin*c | semtimedop | 58 | 0.07 | 0.01 | ▍ | R | root | pmdaxfs | openat | 59 | 0.07 | 0.01 | ▍ | R | root | pmdalinux | read | 60 | 0.07 | 0.01 | ▍ | R | root | pmdaproc | getdents64 | 61 | 0.07 | 0.01 | ▍ | R | pcp | pmdaproc | openat | 62 | 0.07 | 0.01 | ▍ | R | pcp | pmdaproc | read | 63 | 0.07 | 0.01 | ▍ | R | pcp | pmlogger | - | 64 | 0.07 | 0.01 | ▍ | R | oracle | ora_p*q_lin*c | semtimedop | 65 | 0.07 | 0.01 | ▍ | R | oracle | ora_mmnl_lin*m | semtimedop | - 66 | 0.07 | 0.01 | ▍ | R | oracle | ora_dbrm_lin*m | - | - 67 | 68 | 69 | sampled: 74 times, avg_thr: 0.5 70 | start: 2024-10-12 20:57:32, duration: 5s 71 | ``` 72 | 73 | 74 | -------------------------------------------------------------------------------- /experiments/xcapture-v0-proc/0xtools.spec: -------------------------------------------------------------------------------- 1 | %define debug_package %{nil} 2 | # %define _unpackaged_files_terminate_build 0 3 | 4 | %define ReleaseNumber 5 5 | %define VERSION 1.2.3 6 | 7 | Name: 0xtools 8 | Version: %{VERSION} 9 | Release: %{ReleaseNumber}%{?dist} 10 | Source0: 0xtools-%{VERSION}.tar.gz 11 | URL: 
https://0x.tools/ 12 | Packager: Tanel Poder 13 | Summary: Always-on Profiling for Production Systems 14 | License: GPL 15 | Group: System Environment 16 | BuildArch: %{_arch} 17 | 18 | # Creators of these rpmspec/service files are below (Tanel just merged/customized them): 19 | 20 | # Liyong 21 | # Bart Sjerps (https://github.com/bsjerps) 22 | 23 | # RPM build instructions: 24 | # Have rpmbuild, gcc, make and rpmdevtools installed 25 | # Optionally, update Version above to the latest release 26 | # 27 | # Download 0xtools source archive into SOURCES: 28 | # spectool -g -R 0xtools.spec 29 | # 30 | # Build package: 31 | # rpmbuild -bb 0xtools.spec 32 | 33 | # Prevent compiling .py files 34 | %define __python false 35 | 36 | %description 37 | 0x.tools is a set of open-source utilities for analyzing application performance on Linux. 38 | It has a goal of deployment simplicity and minimal dependencies, to reduce friction of systematic troubleshooting. 39 | There’s no need to upgrade the OS, install kernel modules, heavy monitoring frameworks, Java agents or databases. 40 | These tools also work on over-decade-old Linux kernels, like version 2.6.18 from 15 years ago. 41 | 42 | 0x.tools allow you to measure individual thread level activity, like thread sleep states, 43 | currently executing system calls and kernel wait locations. 
44 | 45 | %prep 46 | %setup -q 47 | 48 | %build 49 | make PREFIX=%{buildroot}/usr 50 | make install PREFIX=%{buildroot}/usr 51 | 52 | %install 53 | install -m 0755 -d -p %{buildroot}/usr/bin 54 | install -m 0755 -d -p %{buildroot}/usr/bin/%{name} 55 | install -m 0755 -d -p %{buildroot}/usr/lib/%{name} 56 | install -m 0755 -d -p %{buildroot}/usr/share/%{name} 57 | install -m 0755 -d -p %{buildroot}/var/log/xcapture 58 | 59 | install -m 0755 bin/run_xcpu.sh %{buildroot}/usr/bin/run_xcpu.sh 60 | install -m 0755 bin/run_xcapture.sh %{buildroot}/usr/bin/run_xcapture.sh 61 | install -m 0755 bin/schedlat %{buildroot}/usr/bin/schedlat 62 | install -m 0755 bin/vmtop %{buildroot}/usr/bin/vmtop 63 | 64 | cp -p doc/licenses/* %{buildroot}/usr/share/%{name} 65 | cp -p LICENSE %{buildroot}/usr/share/%{name} 66 | 67 | 68 | ## empty files to please %ghost section (we don't want precompiled) 69 | ## This ensures the object files also get cleaned up if we uninstall the RPM 70 | #touch %{buildroot}/usr/lib/%{name}/{psnreport,psnproc,argparse}.pyc 71 | #touch %{buildroot}/usr/lib/%{name}/{psnreport,psnproc,argparse}.pyo 72 | 73 | 74 | # systemd service 75 | install -Dp -m 0644 xcapture.default $RPM_BUILD_ROOT/etc/default/xcapture 76 | install -Dp -m 0644 xcapture.service %{buildroot}/usr/lib/systemd/system/xcapture.service 77 | install -Dp -m 0644 xcapture-restart.service %{buildroot}/usr/lib/systemd/system/xcapture-restart.service 78 | install -Dp -m 0644 xcapture-restart.timer %{buildroot}/usr/lib/systemd/system/xcapture-restart.timer 79 | 80 | %clean 81 | rm -rf %{buildroot} 82 | 83 | %post 84 | /bin/systemctl daemon-reload 85 | /bin/systemctl enable --now xcapture 86 | /bin/systemctl enable --now xcapture-restart.timer 87 | 88 | %preun 89 | if [ "$1" -eq "0" ] 90 | then 91 | /bin/systemctl disable --now xcapture 92 | /bin/systemctl disable --now xcapture-restart.timer 93 | fi 94 | 95 | %files 96 | %defattr(0755,root,root,0755) 97 | %{_bindir}/psn 98 | %{_bindir}/run_xcapture.sh 99 
| %{_bindir}/run_xcpu.sh 100 | %{_bindir}/schedlat 101 | %{_bindir}/xcapture 102 | %{_bindir}/vmtop 103 | /usr/lib/0xtools/* 104 | /usr/lib/systemd/system/xcapture.service 105 | /usr/lib/systemd/system/xcapture-restart.service 106 | /usr/lib/systemd/system/xcapture-restart.timer 107 | 108 | %defattr(0644,root,root,0755) 109 | /usr/share/%{name} 110 | %ghost /usr/lib/%{name}/*.pyc 111 | %ghost /usr/lib/%{name}/*.pyo 112 | 113 | %config(noreplace) /etc/default/xcapture 114 | %dir /var/log/xcapture/ 115 | 116 | -------------------------------------------------------------------------------- /experiments/xcapture-v0-proc/Makefile: -------------------------------------------------------------------------------- 1 | CC ?= gcc 2 | PREFIX ?= /usr 3 | 4 | # build 5 | CFLAGS ?= -Wall 6 | 7 | # debuginfo included 8 | CFLAGS_DEBUG=-I include -ggdb -Wall 9 | 10 | # debug without compiler optimizations 11 | CFLAGS_DEBUG0=-I include -ggdb -O0 12 | 13 | all: 14 | $(CC) $(CFLAGS) -I include -o bin/xcapture src/xcapture.c 15 | 16 | debug: 17 | $(CC) $(CFLAGS_DEBUG) -o bin/xcapture src/xcapture.c 18 | 19 | debug0: 20 | $(CC) $(CFLAGS_DEBUG0) -o bin/xcapture src/xcapture.c 21 | 22 | install: 23 | install -m 0755 -d ${PREFIX}/bin 24 | install -m 0755 bin/xcapture ${PREFIX}/bin/xcapture 25 | 26 | uninstall: 27 | rm -fv ${PREFIX}/bin/xcapture 28 | 29 | clean: 30 | rm -fv bin/xcapture 31 | -------------------------------------------------------------------------------- /experiments/xcapture-v0-proc/release.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | PROJECT_NAME="0xtools" 4 | 5 | if [ 0 -eq $# ]; then 6 | echo "" 7 | echo " Usage: ./release.sh tag_or_commitid [tag_or_commitid...]" 8 | echo "" 9 | exit 1 10 | fi 11 | 12 | for name in "$@"; do 13 | target_type=$(git cat-file -t "${name}" 2>/dev/null) 14 | if [[ -z "${target_type}" ]]; then 15 | echo "${name} is invalid, ignored." 
16 | continue 17 | fi 18 | 19 | suffix="" 20 | if expr "${target_type}" : "^commit" >/dev/null; then 21 | suffix=$(git rev-parse --short=8 "${name}") 22 | elif expr "${target_type}" : "^tag" >/dev/null; then 23 | suffix="${name}" 24 | else 25 | echo "${name} is neither a commit nor a tag!" 26 | continue 27 | fi 28 | target_name="${PROJECT_NAME}-${suffix}" 29 | echo "archiving ${target_name}" 30 | git archive -9 --format=tar.gz --prefix="${target_name}"/ "${name}" >"${target_name}".tar.gz 31 | echo "finish ${target_name}" 32 | done 33 | -------------------------------------------------------------------------------- /experiments/xcapture-v0-proc/xcapture-restart.service: -------------------------------------------------------------------------------- 1 | [Unit] 2 | Description=restart xcapture 3 | 4 | [Service] 5 | Type=oneshot 6 | ExecStart=/usr/bin/systemctl restart xcapture 7 | -------------------------------------------------------------------------------- /experiments/xcapture-v0-proc/xcapture-restart.timer: -------------------------------------------------------------------------------- 1 | [Unit] 2 | Description=xcapture restart 3 | 4 | [Timer] 5 | OnCalendar=hourly 6 | 7 | [Install] 8 | WantedBy=timers.target 9 | -------------------------------------------------------------------------------- /experiments/xcapture-v0-proc/xcapture.default: -------------------------------------------------------------------------------- 1 | SAMPLEINTERVAL=10 2 | LOGDIRPATH=/var/log/xcapture 3 | ADDITIONALOPTIONS="exe,cmdline,syscall,wchan,kstack" 4 | MINUTES=1440 5 | -------------------------------------------------------------------------------- /experiments/xcapture-v0-proc/xcapture.service: -------------------------------------------------------------------------------- 1 | [Unit] 2 | Description=0x.Tools xcapture 3 | 4 | [Service] 5 | Environment="SAMPLEINTERVAL=1" 6 | Environment="LOGDIRPATH=/var/log/xcapture" 7 | Environment="ADDITIONALOPTIONS=syscall,wchan,exe,cmdline" 
8 | Environment="MINUTES=59520" 9 | EnvironmentFile=/etc/default/xcapture 10 | ExecStartPre=/bin/sh -c 'test -d "$LOGDIRPATH"' 11 | ExecStartPre=/bin/sh -c 'test "$SAMPLEINTERVAL" -ge 1' 12 | ExecStart=/bin/sh -c '/usr/bin/xcapture -d ${SAMPLEINTERVAL} -c ${ADDITIONALOPTIONS} -o ${LOGDIRPATH}' 13 | KillSignal=SIGTERM 14 | 15 | [Install] 16 | WantedBy=multi-user.target 17 | -------------------------------------------------------------------------------- /experiments/xcapture-v1-bpftrace/transform.jq: -------------------------------------------------------------------------------- 1 | (["SAMPLE_TIME", "TID", "PID", "COMM", "TASK_STATE", "SYSCALL_ID", 2 | "SYSCALL_ARG0", "SYSCALL_ARG1", "SYSCALL_ARG2", 3 | "SYSCALL_ARG3", "SYSCALL_ARG4", "SYSCALL_ARG5", 4 | "CMDLINE", "PROFILE_USTACK", "PROFILE_KSTACK", 5 | "SYSCALL_USTACK", "OFFCPU_USTACK", "OFFCPU_KSTACK", 6 | "SCHED_WAKEUP", "ORACLE_WAIT_EVENT"], 7 | (.samples[] | 8 | .SAMPLE_TIME as $time | 9 | .comm as $comm_map | 10 | .task_state as $task_state_map | 11 | .syscall_id as $syscall_id_map | 12 | .syscall_args as $syscall_args_map | 13 | .cmdline as $cmdline_map | 14 | .profile_ustack as $profile_ustack_map | 15 | .profile_kstack as $profile_kstack_map | 16 | .syscall_ustack as $syscall_ustack_map | 17 | .offcpu_ustack as $offcpu_ustack_map | 18 | .offcpu_kstack as $offcpu_kstack_map | 19 | .sched_wakeup as $sched_wakeup_map | 20 | .oracle_wait_event as $oracle_wait_event_map | 21 | .pid | to_entries[] | .key as $key | 22 | [$time, $key, .value, 23 | ($comm_map [$key] ), # // "-" 24 | ($task_state_map [$key] ), 25 | ($syscall_id_map [$key] ), 26 | ($syscall_args_map [$key][0] ), 27 | ($syscall_args_map [$key][1] ), 28 | ($syscall_args_map [$key][2] ), 29 | ($syscall_args_map [$key][3] ), 30 | ($syscall_args_map [$key][4] ), 31 | ($syscall_args_map [$key][5] ), 32 | ($cmdline_map [$key] ), 33 | ($profile_ustack_map [$key] ), 34 | ($profile_kstack_map [$key] ), 35 | ($syscall_ustack_map [$key] ), 36 | 
($offcpu_ustack_map [$key] ), 37 | ($offcpu_kstack_map [$key] ), 38 | ($sched_wakeup_map [$key] ), 39 | ($oracle_wait_event_map [$key] ) 40 | ] 41 | )) | @csv 42 | 43 | # vi:syntax=zsh 44 | 45 | -------------------------------------------------------------------------------- /experiments/xcapture-v1-bpftrace/xcapture.bt: -------------------------------------------------------------------------------- 1 | /* 2 | * 0x.Tools xcapture.bt v0.4 - Proof-of-concept prototype for sampling 3 | * Linux thread activity using eBPF [0x.tools] 4 | * 5 | * Copyright 2019-2023 Tanel Poder 6 | * 7 | * This program is free software; you can redistribute it and/or modify 8 | * it under the terms of the GNU General Public License as published by 9 | * the Free Software Foundation; either version 2 of the License, or 10 | * (at your option) any later version. 11 | * 12 | * This program is distributed in the hope that it will be useful, 13 | * but WITHOUT ANY WARRANTY; without even the implied warranty of 14 | * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 15 | * GNU General Public License for more details. 16 | * 17 | * You should have received a copy of the GNU General Public License along 18 | * with this program; if not, write to the Free Software Foundation, Inc., 19 | * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 20 | * 21 | * SPDX-License-Identifier: GPL-2.0-or-later 22 | * 23 | */ 24 | 25 | // This is a PoC prototype for demonstrating feasibility of the custom, programmable 26 | // task state object populating + sampling approach. This script is not complete and 27 | // it probably has bugs. I have plenty of improvements in mind. It's not a finished tool or a product. 
28 | // 29 | // To avoid the extremely slow stack address to symbol resolution in bpftrace, enable 30 | // symbol caching, for example: 31 | // 32 | // sudo BPFTRACE_CACHE_USER_SYMBOLS=1 bpftrace xcapture.bt 33 | // or 34 | // sudo BPFTRACE_CACHE_USER_SYMBOLS=1 bpftrace -f json xcapture.bt > out.json 35 | 36 | BEGIN { 37 | @TASK_STATES[0x00] = "R"; // "(running)" 38 | @TASK_STATES[0x01] = "S"; // "(sleeping)" 39 | @TASK_STATES[0x02] = "D"; // "(disk sleep)" 40 | @TASK_STATES[0x04] = "T"; // "(stopped)" 41 | @TASK_STATES[0x08] = "t"; // "(tracing stop)" 42 | @TASK_STATES[0x10] = "X"; // "(dead)" 43 | @TASK_STATES[0x20] = "Z"; // "(zombie)" 44 | @TASK_STATES[0x40] = "P"; // "(parked)" 45 | @TASK_STATES[0x80] = "I"; // "(idle)" 46 | } 47 | 48 | 49 | // record system calls by threads into the thread state array 50 | // ideally/eventually need to move pid/uid/gid (and perhaps comm) assignment out of the syscall probe 51 | tracepoint:raw_syscalls:sys_enter { 52 | // [tid] uses thread local storage, cleaned out automatically on thread exit 53 | @pid [tid] = pid; // *in bpftrace* tid means thread ID (task ID), pid means Process ID (thread group ID) 54 | @uid [tid] = uid; 55 | @gid [tid] = gid; 56 | @comm [tid] = comm; 57 | @cmdline [tid] = str(uptr(curtask->mm->arg_start)); 58 | @task_state [tid] = @TASK_STATES[curtask->__state & 0xff]; 59 | @syscall_id [tid] = args->id; 60 | @syscall_args [tid] = (args->args[0], args->args[1], args->args[2], args->args[3], args->args[4], args->args[5]); 61 | @syscall_ustack [tid] = ustack(); 62 | } 63 | 64 | tracepoint:raw_syscalls:sys_exit { 65 | delete(@syscall_id[tid]) // @syscall_id [tid] = -1; 66 | } 67 | 68 | 69 | // thread requests going off CPU 70 | // by the time schedule() is called, the caller has set the new task state 71 | kprobe:schedule { 72 | @task_state [tid] = @TASK_STATES[curtask->__state & 0xff]; 73 | @offcpu_ustack [tid] = ustack(); 74 | @offcpu_kstack [tid] = kstack(); 75 | } 76 | 77 | // thread has been put back on 
CPU 78 | // newer kernels have the "isra" version of this function name, thus the * wildcard 79 | kprobe:finish_task_switch* { 80 | @task_state [tid] = @TASK_STATES[curtask->__state & 0xff]; 81 | delete(@offcpu_ustack[tid]); 82 | delete(@offcpu_kstack[tid]); 83 | } 84 | 85 | // sampled profiling of on-CPU threads 86 | // update the stack id of threads currently running on (any) cpu 87 | profile:hz:1 { 88 | @task_state [tid] = @TASK_STATES[curtask->__state & 0xff]; 89 | @profile_ustack[tid] = ustack(); 90 | @profile_kstack[tid] = kstack(); 91 | } 92 | 93 | // Context enrichment example (kernel): tasks waiting in the CPU runqueue 94 | tracepoint:sched:sched_wakeup, 95 | tracepoint:sched:sched_wakeup_new { 96 | @sched_wakeup[args->pid] = 1; 97 | } 98 | 99 | tracepoint:sched:sched_switch { 100 | delete(@sched_wakeup[args->next_pid]); // or: @sched_wakeup[args->next_pid] = -1; 101 | } 102 | 103 | tracepoint:sched:sched_process_exit { 104 | delete(@pid [args->pid]); 105 | delete(@uid [args->pid]); 106 | delete(@gid [args->pid]); 107 | delete(@comm [args->pid]); 108 | delete(@cmdline [args->pid]); 109 | delete(@task_state [args->pid]); 110 | delete(@syscall_id [args->pid]); 111 | delete(@syscall_args [args->pid]); 112 | delete(@syscall_ustack [args->pid]); 113 | delete(@sched_wakeup [args->pid]); 114 | } 115 | 116 | 117 | // Context enrichment example (application): Oracle database wait events 118 | uprobe:/u01/app/oracle/product/19.0.0/dbhome_1/bin/oracle:kskthbwt { 119 | $EVENT_NAME_ARRAY_START=(uint64 *) *uptr(0x600069f0); // uaddr("ksledt_") gave error... 
120 | $EVENT_NAME_SLOT_SIZE=(uint64) 56; // sizeof(struct) 121 | 122 | @oracle_wait_event[tid] = str(*uptr($EVENT_NAME_ARRAY_START + ($EVENT_NAME_SLOT_SIZE * arg1)/8)); 123 | } 124 | 125 | uprobe:/u01/app/oracle/product/19.0.0/dbhome_1/bin/oracle:kskthewt { 126 | delete(@oracle_wait_event[tid]); // @oracle_wait_event[tid] = -1; 127 | } 128 | 129 | 130 | // write out SAMPLES of thread states & activity 131 | // interval is executed on 1 CPU only, so we won't emit duplicates 132 | interval:hz:1 { 133 | @SAMPLE_TIME=strftime("\"%Y-%m-%dT%H:%M:%S.%f\"", nsecs); // extra "" for json output 134 | print(@SAMPLE_TIME); 135 | 136 | print(@pid); 137 | print(@comm); 138 | print(@cmdline); 139 | print(@task_state); 140 | print(@syscall_id); 141 | print(@syscall_args); 142 | print(@profile_ustack); 143 | print(@profile_kstack); 144 | print(@syscall_ustack); 145 | print(@offcpu_ustack); 146 | print(@offcpu_kstack); 147 | print(@sched_wakeup); 148 | print(@oracle_wait_event); 149 | } 150 | 151 | END { 152 | clear(@SAMPLE_TIME); 153 | clear(@TASK_STATES); 154 | clear(@pid); 155 | clear(@uid); 156 | clear(@gid); 157 | clear(@comm); 158 | clear(@cmdline); 159 | clear(@profile_ustack); 160 | clear(@profile_kstack); 161 | clear(@syscall_ustack); 162 | clear(@offcpu_ustack); 163 | clear(@offcpu_kstack); 164 | clear(@sched_wakeup); 165 | clear(@syscall_id); 166 | clear(@syscall_args); 167 | clear(@task_state); 168 | clear(@oracle_wait_event); 169 | } 170 | 171 | // TODO: 172 | // --------------------------------------------------------------------------------------------- 173 | // There's *plenty* to do! 
If you know bcc/libbpf and are interested in helping out, ping me :-) 174 | // 175 | // Email: tanel@tanelpoder.com 176 | // 177 | // PRINTOUT NOTES: 178 | // ---------------------------------------------------------------------------------------------- 179 | // "Kernel: 5.3 bpftrace supports C style while loops: 180 | // bpftrace -e 'i:ms:100 { $i = 0; while ($i <= 100) { printf("%d ", $i); $i++} exit(); }' 181 | // Loops can be short circuited by using the continue and break keywords." 182 | // 183 | // Unfortunately bpftrace doesn't (yet?) support iterating through only the existing (populated) 184 | // elements in hash maps, we don't want to loop from 1 to pid_max every time we emit output! 185 | // 186 | // Thus, we need to use bcc/libbpf for the sampling loops or use bpftool to dump or mount 187 | // the kernel ebpf maps as files and do our reading / sampling from there. 188 | // 189 | // Since we don't want to always emit/print every single task, but would rather have some 190 | // conditional logic & intelligence of what threads are interesting (a'la only print R & D states 191 | // and some specific syscalls under S state), it's better to push this decision logic down to 192 | // kernel. This means bcc or more likely libbpf as mentioned above. 193 | // 194 | // DATA STRUCTURE NOTES: 195 | // ---------------------------------------------------------------------------------------------- 196 | // With bcc/libbpf it's likely possible to use a hashmap of structs (or hashmap of maps) for 197 | // storing each thread's complete state in a single thread state "array", under a single TID key. 198 | // This should reduce any timing & "read consistency" issues when sampling/emitting records too. 
199 | // 200 | /* vi:syntax=c */ 201 | /* vi:filetype=c */ 202 | -------------------------------------------------------------------------------- /include/syscall_64.h: -------------------------------------------------------------------------------- 1 | #define __NR_read 0 2 | #define __NR_write 1 3 | #define __NR_open 2 4 | #define __NR_close 3 5 | #define __NR_stat 4 6 | #define __NR_fstat 5 7 | #define __NR_lstat 6 8 | #define __NR_poll 7 9 | #define __NR_lseek 8 10 | #define __NR_mmap 9 11 | #define __NR_mprotect 10 12 | #define __NR_munmap 11 13 | #define __NR_brk 12 14 | #define __NR_rt_sigaction 13 15 | #define __NR_rt_sigprocmask 14 16 | #define __NR_rt_sigreturn 15 17 | #define __NR_ioctl 16 18 | #define __NR_pread64 17 19 | #define __NR_pwrite64 18 20 | #define __NR_readv 19 21 | #define __NR_writev 20 22 | #define __NR_access 21 23 | #define __NR_pipe 22 24 | #define __NR_select 23 25 | #define __NR_sched_yield 24 26 | #define __NR_mremap 25 27 | #define __NR_msync 26 28 | #define __NR_mincore 27 29 | #define __NR_madvise 28 30 | #define __NR_shmget 29 31 | #define __NR_shmat 30 32 | #define __NR_shmctl 31 33 | #define __NR_dup 32 34 | #define __NR_dup2 33 35 | #define __NR_pause 34 36 | #define __NR_nanosleep 35 37 | #define __NR_getitimer 36 38 | #define __NR_alarm 37 39 | #define __NR_setitimer 38 40 | #define __NR_getpid 39 41 | #define __NR_sendfile 40 42 | #define __NR_socket 41 43 | #define __NR_connect 42 44 | #define __NR_accept 43 45 | #define __NR_sendto 44 46 | #define __NR_recvfrom 45 47 | #define __NR_sendmsg 46 48 | #define __NR_recvmsg 47 49 | #define __NR_shutdown 48 50 | #define __NR_bind 49 51 | #define __NR_listen 50 52 | #define __NR_getsockname 51 53 | #define __NR_getpeername 52 54 | #define __NR_socketpair 53 55 | #define __NR_setsockopt 54 56 | #define __NR_getsockopt 55 57 | #define __NR_clone 56 58 | #define __NR_fork 57 59 | #define __NR_vfork 58 60 | #define __NR_execve 59 61 | #define __NR_exit 60 62 | #define __NR_wait4 61 
63 | #define __NR_kill 62 64 | #define __NR_uname 63 65 | #define __NR_semget 64 66 | #define __NR_semop 65 67 | #define __NR_semctl 66 68 | #define __NR_shmdt 67 69 | #define __NR_msgget 68 70 | #define __NR_msgsnd 69 71 | #define __NR_msgrcv 70 72 | #define __NR_msgctl 71 73 | #define __NR_fcntl 72 74 | #define __NR_flock 73 75 | #define __NR_fsync 74 76 | #define __NR_fdatasync 75 77 | #define __NR_truncate 76 78 | #define __NR_ftruncate 77 79 | #define __NR_getdents 78 80 | #define __NR_getcwd 79 81 | #define __NR_chdir 80 82 | #define __NR_fchdir 81 83 | #define __NR_rename 82 84 | #define __NR_mkdir 83 85 | #define __NR_rmdir 84 86 | #define __NR_creat 85 87 | #define __NR_link 86 88 | #define __NR_unlink 87 89 | #define __NR_symlink 88 90 | #define __NR_readlink 89 91 | #define __NR_chmod 90 92 | #define __NR_fchmod 91 93 | #define __NR_chown 92 94 | #define __NR_fchown 93 95 | #define __NR_lchown 94 96 | #define __NR_umask 95 97 | #define __NR_gettimeofday 96 98 | #define __NR_getrlimit 97 99 | #define __NR_getrusage 98 100 | #define __NR_sysinfo 99 101 | #define __NR_times 100 102 | #define __NR_ptrace 101 103 | #define __NR_getuid 102 104 | #define __NR_syslog 103 105 | #define __NR_getgid 104 106 | #define __NR_setuid 105 107 | #define __NR_setgid 106 108 | #define __NR_geteuid 107 109 | #define __NR_getegid 108 110 | #define __NR_setpgid 109 111 | #define __NR_getppid 110 112 | #define __NR_getpgrp 111 113 | #define __NR_setsid 112 114 | #define __NR_setreuid 113 115 | #define __NR_setregid 114 116 | #define __NR_getgroups 115 117 | #define __NR_setgroups 116 118 | #define __NR_setresuid 117 119 | #define __NR_getresuid 118 120 | #define __NR_setresgid 119 121 | #define __NR_getresgid 120 122 | #define __NR_getpgid 121 123 | #define __NR_setfsuid 122 124 | #define __NR_setfsgid 123 125 | #define __NR_getsid 124 126 | #define __NR_capget 125 127 | #define __NR_capset 126 128 | #define __NR_rt_sigpending 127 129 | #define __NR_rt_sigtimedwait 128 130 | 
#define __NR_rt_sigqueueinfo 129 131 | #define __NR_rt_sigsuspend 130 132 | #define __NR_sigaltstack 131 133 | #define __NR_utime 132 134 | #define __NR_mknod 133 135 | #define __NR_uselib 134 136 | #define __NR_personality 135 137 | #define __NR_ustat 136 138 | #define __NR_statfs 137 139 | #define __NR_fstatfs 138 140 | #define __NR_sysfs 139 141 | #define __NR_getpriority 140 142 | #define __NR_setpriority 141 143 | #define __NR_sched_setparam 142 144 | #define __NR_sched_getparam 143 145 | #define __NR_sched_setscheduler 144 146 | #define __NR_sched_getscheduler 145 147 | #define __NR_sched_get_priority_max 146 148 | #define __NR_sched_get_priority_min 147 149 | #define __NR_sched_rr_get_interval 148 150 | #define __NR_mlock 149 151 | #define __NR_munlock 150 152 | #define __NR_mlockall 151 153 | #define __NR_munlockall 152 154 | #define __NR_vhangup 153 155 | #define __NR_modify_ldt 154 156 | #define __NR_pivot_root 155 157 | #define __NR__sysctl 156 158 | #define __NR_prctl 157 159 | #define __NR_arch_prctl 158 160 | #define __NR_adjtimex 159 161 | #define __NR_setrlimit 160 162 | #define __NR_chroot 161 163 | #define __NR_sync 162 164 | #define __NR_acct 163 165 | #define __NR_settimeofday 164 166 | #define __NR_mount 165 167 | #define __NR_umount2 166 168 | #define __NR_swapon 167 169 | #define __NR_swapoff 168 170 | #define __NR_reboot 169 171 | #define __NR_sethostname 170 172 | #define __NR_setdomainname 171 173 | #define __NR_iopl 172 174 | #define __NR_ioperm 173 175 | #define __NR_create_module 174 176 | #define __NR_init_module 175 177 | #define __NR_delete_module 176 178 | #define __NR_get_kernel_syms 177 179 | #define __NR_query_module 178 180 | #define __NR_quotactl 179 181 | #define __NR_nfsservctl 180 182 | #define __NR_getpmsg 181 /* reserved for LiS/STREAMS */ 183 | #define __NR_putpmsg 182 /* reserved for LiS/STREAMS */ 184 | #define __NR_afs_syscall 183 /* reserved for AFS */ 185 | #define __NR_tuxcall 184 /* reserved for tux */ 186 | 
#define __NR_security 185 187 | #define __NR_gettid 186 188 | #define __NR_readahead 187 189 | #define __NR_setxattr 188 190 | #define __NR_lsetxattr 189 191 | #define __NR_fsetxattr 190 192 | #define __NR_getxattr 191 193 | #define __NR_lgetxattr 192 194 | #define __NR_fgetxattr 193 195 | #define __NR_listxattr 194 196 | #define __NR_llistxattr 195 197 | #define __NR_flistxattr 196 198 | #define __NR_removexattr 197 199 | #define __NR_lremovexattr 198 200 | #define __NR_fremovexattr 199 201 | #define __NR_tkill 200 202 | #define __NR_time 201 203 | #define __NR_futex 202 204 | #define __NR_sched_setaffinity 203 205 | #define __NR_sched_getaffinity 204 206 | #define __NR_set_thread_area 205 207 | #define __NR_io_setup 206 208 | #define __NR_io_destroy 207 209 | #define __NR_io_getevents 208 210 | #define __NR_io_submit 209 211 | #define __NR_io_cancel 210 212 | #define __NR_get_thread_area 211 213 | #define __NR_lookup_dcookie 212 214 | #define __NR_epoll_create 213 215 | #define __NR_epoll_ctl_old 214 216 | #define __NR_epoll_wait_old 215 217 | #define __NR_remap_file_pages 216 218 | #define __NR_getdents64 217 219 | #define __NR_set_tid_address 218 220 | #define __NR_restart_syscall 219 221 | #define __NR_semtimedop 220 222 | #define __NR_fadvise64 221 223 | #define __NR_timer_create 222 224 | #define __NR_timer_settime 223 225 | #define __NR_timer_gettime 224 226 | #define __NR_timer_getoverrun 225 227 | #define __NR_timer_delete 226 228 | #define __NR_clock_settime 227 229 | #define __NR_clock_gettime 228 230 | #define __NR_clock_getres 229 231 | #define __NR_clock_nanosleep 230 232 | #define __NR_exit_group 231 233 | #define __NR_epoll_wait 232 234 | #define __NR_epoll_ctl 233 235 | #define __NR_tgkill 234 236 | #define __NR_utimes 235 237 | #define __NR_vserver 236 238 | #define __NR_mbind 237 239 | #define __NR_set_mempolicy 238 240 | #define __NR_get_mempolicy 239 241 | #define __NR_mq_open 240 242 | #define __NR_mq_unlink 241 243 | #define 
__NR_mq_timedsend 242 244 | #define __NR_mq_timedreceive 243 245 | #define __NR_mq_notify 244 246 | #define __NR_mq_getsetattr 245 247 | #define __NR_kexec_load 246 248 | #define __NR_waitid 247 249 | #define __NR_add_key 248 250 | #define __NR_request_key 249 251 | #define __NR_keyctl 250 252 | #define __NR_ioprio_set 251 253 | #define __NR_ioprio_get 252 254 | #define __NR_inotify_init 253 255 | #define __NR_inotify_add_watch 254 256 | #define __NR_inotify_rm_watch 255 257 | #define __NR_migrate_pages 256 258 | #define __NR_openat 257 259 | #define __NR_mkdirat 258 260 | #define __NR_mknodat 259 261 | #define __NR_fchownat 260 262 | #define __NR_futimesat 261 263 | #define __NR_newfstatat 262 264 | #define __NR_unlinkat 263 265 | #define __NR_renameat 264 266 | #define __NR_linkat 265 267 | #define __NR_symlinkat 266 268 | #define __NR_readlinkat 267 269 | #define __NR_fchmodat 268 270 | #define __NR_faccessat 269 271 | #define __NR_pselect6 270 272 | #define __NR_ppoll 271 273 | #define __NR_unshare 272 274 | #define __NR_set_robust_list 273 275 | #define __NR_get_robust_list 274 276 | #define __NR_splice 275 277 | #define __NR_tee 276 278 | #define __NR_sync_file_range 277 279 | #define __NR_vmsplice 278 280 | #define __NR_move_pages 279 281 | #define __NR_utimensat 280 282 | #define __NR_epoll_pwait 281 283 | #define __NR_signalfd 282 284 | #define __NR_timerfd_create 283 285 | #define __NR_eventfd 284 286 | #define __NR_fallocate 285 287 | #define __NR_timerfd_settime 286 288 | #define __NR_timerfd_gettime 287 289 | #define __NR_accept4 288 290 | #define __NR_signalfd4 289 291 | #define __NR_eventfd2 290 292 | #define __NR_epoll_create1 291 293 | #define __NR_dup3 292 294 | #define __NR_pipe2 293 295 | #define __NR_inotify_init1 294 296 | #define __NR_preadv 295 297 | #define __NR_pwritev 296 298 | #define __NR_rt_tgsigqueueinfo 297 299 | #define __NR_perf_event_open 298 300 | #define __NR_recvmmsg 299 301 | #define __NR_prlimit64 300 302 | 
-------------------------------------------------------------------------------- /lib/0xtools/psnreport.py: -------------------------------------------------------------------------------- 1 | # psn -- Linux Process Snapper by Tanel Poder [https://0x.tools] 2 | # Copyright 2019-2021 Tanel Poder 3 | # 4 | # This program is free software; you can redistribute it and/or modify 5 | # it under the terms of the GNU General Public License as published by 6 | # the Free Software Foundation; either version 2 of the License, or 7 | # (at your option) any later version. 8 | # 9 | # This program is distributed in the hope that it will be useful, 10 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 11 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 12 | # GNU General Public License for more details. 13 | # 14 | # You should have received a copy of the GNU General Public License along 15 | # with this program; if not, write to the Free Software Foundation, Inc., 16 | # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
17 | # 18 | # SPDX-License-Identifier: GPL-2.0-or-later 19 | 20 | # query/report code 21 | 22 | from itertools import groupby 23 | from datetime import datetime 24 | 25 | import psnproc as proc 26 | import logging 27 | 28 | def flatten(li): 29 | return [item for sublist in li for item in sublist] 30 | 31 | 32 | ### ASCII table output ### 33 | def output_table_report(report, dataset): 34 | max_field_width = 500 35 | header_fmts, field_fmts = [], [] 36 | total_field_width = 0 37 | total_field_width_without_kstack = 0 38 | 39 | if dataset: 40 | col_idx = 0 41 | for source, cols, expr, token in report.full_projection(): 42 | if token in ('pid', 'task', 'samples'): 43 | col_type = int 44 | elif token == 'event_time': 45 | col_type = str 46 | elif token == 'avg_threads': 47 | col_type = float 48 | elif cols: 49 | col = [c for c in source.available_columns if c[0] == cols[0]][0] 50 | col_type = col[1] 51 | else: 52 | col_type = str 53 | 54 | if col_type in (str, int, int): 55 | max_field_length = max([len(str(row[col_idx])) for row in dataset]) 56 | elif col_type == float: 57 | max_field_length = max([len(str(int(row[col_idx]))) for row in dataset]) + 3 # arbitrary! 
58 | 59 | field_width = min(max_field_width, max(len(token), max_field_length)) 60 | 61 | # left-align strings both in header and data 62 | if col_type == str: 63 | header_fmts.append('%%-%s.%ss' % (field_width, field_width)) 64 | else: 65 | header_fmts.append('%%%s.%ss' % (field_width, field_width)) 66 | 67 | if col_type == str: 68 | field_fmts.append('%%-%s.%ss' % (field_width, field_width)) 69 | elif col_type in (int, int): 70 | field_fmts.append('%%%sd' % field_width) 71 | elif col_type == float: 72 | field_fmts.append('%%%s.%sf' % (field_width, 2)) # arbitrary 73 | 74 | total_field_width += field_width 75 | total_field_width_without_kstack += field_width if token != 'kstack' else 0 76 | col_idx += 1 77 | 78 | report_width = total_field_width + (3 * (len(header_fmts) -1)) + 2 79 | hr = '-' * report_width 80 | title_pad = report_width - len(report.name) - 2 81 | #title = '=== ' + report.name + ' ' + '=' * (title_pad - 29) + ' [' + datetime.now().strftime("%Y-%m-%d %H:%M:%S") + '] ===' 82 | title = '=== ' + report.name + ' ' + '=' * (title_pad - 3) 83 | header_fmt = ' ' + ' | '.join(header_fmts) + ' ' 84 | field_fmt = ' ' + ' | '.join(field_fmts) + ' ' 85 | 86 | print("") 87 | print(title) 88 | print("") 89 | if dataset: 90 | print(header_fmt % tuple([c[3] for c in report.full_projection()])) 91 | print(hr) 92 | for row in dataset: 93 | print(field_fmt % row) 94 | else: 95 | print('query returned no rows') 96 | print("") 97 | print("") 98 | 99 | 100 | 101 | class Report: 102 | def __init__(self, name, projection, dimensions=[], where=[], order=[], output_fn=output_table_report): 103 | def reify_column_token(col_token): 104 | if col_token == 'samples': 105 | return (None, [], 'COUNT(1)', col_token) 106 | elif col_token == 'avg_threads': 107 | return (None, [], 'CAST(COUNT(1) AS REAL) / %(num_sample_events)s', col_token) 108 | elif col_token in ('pid', 'task', 'event_time'): 109 | return ('first_source', [col_token], col_token, col_token) 110 | 111 | for t in 
proc.all_sources: 112 | for c in t.schema_columns: 113 | if col_token.lower() == c[0].lower(): 114 | return (t, [c[0]], c[0], c[0]) 115 | 116 | raise Exception('projection/dimension column %s not found.\nUse psn --list to see all available columns' % col_token) 117 | 118 | def process_filter_sql(filter_sql): 119 | idle_filter = "stat.state_id IN ('S', 'Z', 'I', 'P')" 120 | 121 | if filter_sql == 'active': 122 | return (proc.stat, ['state_id'], 'not(%s)' % idle_filter, filter_sql) 123 | elif filter_sql == 'idle': 124 | return (proc.stat, ['state_id'], idle_filter, filter_sql) 125 | else: 126 | raise Exception('arbitrary filtering not implemented') 127 | 128 | self.name = name 129 | self.projection = [reify_column_token(t) for t in projection if t] 130 | self.dimensions = [reify_column_token(t) for t in dimensions if t] 131 | self.order = [reify_column_token(t) for t in order if t] 132 | self.where = [process_filter_sql(t) for t in where if t] 133 | self.output_fn = output_fn 134 | 135 | # columns without a specific source are assigned the first source 136 | first_source = [c[0] for c in (self.projection + self.dimensions + self.order + self.where) if c[0] and c[0] != 'first_source'][0] 137 | self.projection = [(first_source if c[0] == 'first_source' else c[0], c[1], c[2], c[3]) for c in self.projection] 138 | self.dimensions = [(first_source if c[0] == 'first_source' else c[0], c[1], c[2], c[3]) for c in self.dimensions] 139 | self.order = [(first_source if c[0] == 'first_source' else c[0], c[1], c[2], c[3]) for c in self.order] 140 | self.where = [(first_source if c[0] == 'first_source' else c[0], c[1], c[2], c[3]) for c in self.where] 141 | 142 | self.sources = {} # source -> [cols] 143 | for d in [self.projection, self.dimensions, self.order, self.where]: 144 | for source, column_names, expr, token in d: 145 | source_columns = self.sources.get(source, ['pid', 'task', 'event_time']) 146 | source_columns.extend(column_names) 147 | self.sources[source] = 
source_columns 148 | if None in self.sources: 149 | del self.sources[None] 150 | 151 | 152 | def full_projection(self): 153 | return self.projection + [c for c in self.dimensions if c not in self.projection] 154 | 155 | 156 | def query(self): 157 | def render_col(c): 158 | return '%s.%s' % (c[0].name, c[2]) if c[0] else c[2] 159 | 160 | # build join conditions 161 | first_source_name = list(self.sources.keys())[0].name 162 | join_where = flatten([['%s.%s = %s.%s' % (s.name, c, first_source_name, c) for c in ['pid', 'task', 'event_time']] for s in list(self.sources.keys())[1:]]) 163 | 164 | attr = { 165 | 'projection': '\t' + ',\n\t'.join([render_col(c) for c in self.full_projection()]), 166 | 'tables': '\t' + ',\n\t'.join([s.name for s in self.sources]), 167 | 'where': '\t' + ' AND\n\t'.join([c[2] for c in self.where] + join_where), 168 | 'dimensions': '\t' + ',\n\t'.join([render_col(c) for c in self.dimensions]), 169 | 'order': '\t' + ',\n\t'.join([render_col(c) + ' DESC' for c in self.order]), 170 | 'num_sample_events': '(SELECT COUNT(DISTINCT(event_time)) FROM %s)' % first_source_name 171 | } 172 | 173 | logging.debug('attr where=%s#end' % attr['where']) 174 | 175 | sql = 'SELECT\n%(projection)s\nFROM\n%(tables)s' % attr 176 | # tanel changed from self.where to attr['where'] 177 | # TODO think through the logic of using self.where vs attr.where (in the context of allowing pid/tid to be not part of group by) 178 | if attr['where'].strip(): 179 | sql += '\nWHERE\n%(where)s' % attr 180 | if attr['dimensions']: 181 | sql += '\nGROUP BY\n%(dimensions)s' % attr 182 | if attr['order']: 183 | sql += '\nORDER BY\n%(order)s' % attr 184 | 185 | # final substitution allows things like avg_threads to work 186 | return sql % attr 187 | 188 | 189 | def dataset(self, conn): 190 | logging.debug(self.query()) 191 | r = conn.execute(self.query()).fetchall() 192 | logging.debug('Done') 193 | return r 194 | 195 | def output_report(self, conn): 196 | self.output_fn(self, 
self.dataset(conn)) 197 | 198 | 199 | 200 | 201 | 202 | -------------------------------------------------------------------------------- /tools/README.md: -------------------------------------------------------------------------------- 1 | Look inside `xq` for usage examples and my launch blog for now: 2 | 3 | * https://tanelpoder.com/posts/xcapture-v3-alpha-ebpf-performance-analysis-with-duckdb/ 4 | 5 | -------------------------------------------------------------------------------- /tools/sql/sclathist.sql: -------------------------------------------------------------------------------- 1 | -- SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause 2 | -- Copyright 2024-2038 Tanel Poder [0x.tools] 3 | 4 | -- 5 | -- This is a demo of an overengineered SQL query, simply so I could display histogram latency 6 | -- buckets using SQL (and fill any empty buckets) all using just SQL. Normally the histogram 7 | -- rendering will be done in the frontend UI code, but I couldn't help it and wanted to fit 8 | -- it all in DuckDB to avoid an additional Python or JS dependency. 9 | -- 10 | -- The queries in the xcapture v3 beta and production release will be much simpler (and faster)! 
11 | -- 12 | -- Depending on your terminal size, you need the .maxwidth setting to see all columns of interest 13 | 14 | .nullvalue '' 15 | .width connection2=50 16 | .width connectionsum=50 17 | .width connectionsum2=50 18 | .maxwidth 300 19 | .maxrows 1000 20 | 21 | SET enable_progress_bar = false; 22 | -- SET memory_limit = '1GB'; 23 | -- SET enable_profiling = 'json'; -- query_tree, query_tree_optimizer, json 24 | -- SET profiling_mode = 'standard'; -- standard, detailed 25 | -- SET profiling_output = 'duckdb.prof'; 26 | 27 | WITH part AS ( -- block device id to name mapping 28 | SELECT 29 | LIST_EXTRACT(field_list, 1)::int AS DEV_MAJ, 30 | LIST_EXTRACT(field_list, 2)::int AS DEV_MIN, 31 | TRIM(LIST_EXTRACT(field_list, 4)) AS DEVNAME 32 | FROM ( 33 | SELECT 34 | REGEXP_EXTRACT_ALL(column0, ' +(\w+)') field_list 35 | FROM 36 | read_csv('#XTOP_DATADIR#/partitions', skip=1, header=false) 37 | WHERE 38 | field_list IS NOT NULL 39 | ) 40 | ), 41 | base_samples AS ( 42 | SELECT 43 | t.timestamp AS SAMPLE_TIMESTAMP 44 | , t.exe 45 | , t.username 46 | , CASE 47 | WHEN t.comm LIKE 'ora_p%' -- for oracle process naming that also use letters in addition to digits 48 | THEN regexp_replace(t.comm, '(?:p[0-9a-z]+_)', 'p*_', 'g') 49 | ELSE regexp_replace(t.comm, '[0-9]+', '*', 'g') 50 | END as COMM 51 | , t.state 52 | , t.syscall 53 | , t.syscall_active 54 | , t.sysc_arg1 55 | , t.sysc_arg2 56 | , t.sysc_arg3 57 | , t.sysc_arg4 58 | , t.sysc_arg5 59 | , t.sysc_arg6 60 | , t.filename 61 | , COALESCE(REGEXP_REPLACE(t.filename, '[0-9]+', '*', 'g'), '-') AS FILENAMESUM 62 | , CASE 63 | WHEN t.filename IS NULL THEN '-' 64 | WHEN REGEXP_MATCHES(t.filename, '\.([^\.]+)$') THEN REGEXP_EXTRACT(t.filename, '(\.[^\.]+)$', 1) 65 | ELSE '-' 66 | END AS FEXT 67 | , connection 68 | , connection2: COALESCE(REGEXP_REPLACE(connection, '::ffff:', '', 'g'), '-') 69 | , connectionsum: COALESCE(REGEXP_REPLACE(connection2, '(->.*:)[0-9]+', '\1[*]'), '-') 70 | , connectionsum2: 
COALESCE(REGEXP_REPLACE(connection2, '(:)[0-9]+', '\1[*]', 'g'), '-') 71 | , t.tid 72 | , t.tgid 73 | , t.sysc_seq_num 74 | , sc.duration_ns AS SYSC_DURATION_NS 75 | , COALESCE(io.iorq_seq_num , 0) AS IORQ_SEQ_NUM 76 | , COALESCE(io.duration_ns , 0) AS IORQ_DURATION_NS 77 | , COALESCE(io.service_ns , 0) AS IORQ_SERVICE_NS 78 | , COALESCE(io.queued_ns , 0) AS IORQ_QUEUED_NS 79 | , COALESCE(io.bytes , 0) AS IORQ_BYTES 80 | , COALESCE(io.iorq_flags , '-') AS IORQ_FLAGS 81 | , COALESCE(io.dev_maj, 0) AS DEV_MAJ 82 | , COALESCE(io.dev_min, 0) AS DEV_MIN 83 | , CASE 84 | WHEN io.dev_maj IS NULL OR io.dev_min IS NULL THEN '-' 85 | ELSE io.dev_maj||':'||io.dev_min 86 | END AS DEV 87 | , COALESCE(part.devname, '-') AS DEVNAME 88 | FROM 89 | read_csv_auto('#XTOP_DATADIR#/xcapture_samples_*.csv') AS t 90 | LEFT OUTER JOIN 91 | read_csv_auto('#XTOP_DATADIR#/xcapture_syscend_*.csv') AS sc 92 | ON t.tid = sc.tid 93 | AND t.sysc_seq_num = sc.sysc_seq_num 94 | LEFT OUTER JOIN read_csv_auto('#XTOP_DATADIR#/xcapture_iorqend_*.csv') AS io 95 | ON t.tid = io.insert_tid 96 | AND t.iorq_seq_num = io.iorq_seq_num 97 | LEFT OUTER JOIN part 98 | ON io.dev_maj = part.dev_maj 99 | AND io.dev_min = part.dev_min 100 | WHERE 101 | (#XTOP_WHERE#) -- the parentheses are important here if there are OR statements in filter clause 102 | AND timestamp >= TIMESTAMP '#XTOP_LOW_TIME#' 103 | AND timestamp < TIMESTAMP '#XTOP_HIGH_TIME#' 104 | ), 105 | grouped_lat_buckets AS ( 106 | SELECT 107 | seconds: COUNT(*) 108 | , SUM(seconds) OVER (PARTITION BY #XTOP_GROUP_COLS#) total_group_seconds 109 | , LEAST( 110 | POWER(2, CEIL(LOG2(CASE WHEN sysc_duration_ns <= 0 THEN NULL ELSE CEIL(sysc_duration_ns / 1000) END)))::bigint 111 | , POWER(2, 25)::bigint 112 | ) AS SC_LAT_BKT_US 113 | , #XTOP_GROUP_COLS# 114 | , ROUND(SUM(1000000000 / sysc_duration_ns)) AS EST_SC_CNT 115 | , ROUND(SUM(1000000000 / iorq_duration_ns)) AS EST_IORQ_CNT 116 | , MIN(sysc_duration_ns) AS MIN_SCLAT_NS 117 | , MAX(sysc_duration_ns) AS 
MAX_SCLAT_NS 118 | , MIN(sample_timestamp) AS FIRST_SEEN_TS 119 | , MAX(sample_timestamp) AS LAST_SEEN_TS 120 | FROM 121 | base_samples 122 | GROUP BY 123 | #XTOP_GROUP_COLS# 124 | , LEAST( 125 | POWER(2, CEIL(LOG2(CASE WHEN sysc_duration_ns <= 0 THEN NULL ELSE CEIL(sysc_duration_ns / 1000) END)))::bigint 126 | , POWER(2, 25)::bigint 127 | ) 128 | ), 129 | max_group_seconds AS ( 130 | SELECT MAX(total_group_seconds) AS max_seconds FROM grouped_lat_buckets 131 | ), 132 | max_lat_bkt_seconds AS ( 133 | SELECT MAX(sum_seconds) AS max_seconds FROM ( 134 | SELECT SUM(seconds) AS sum_seconds FROM grouped_lat_buckets 135 | WHERE syscall != '-' 136 | GROUP BY 137 | #XTOP_GROUP_COLS# 138 | , sc_lat_bkt_us 139 | ) 140 | ), 141 | -- generate all bucket numbers to also list buckets with no rows in histogram 142 | -- the following pattern is a hack to abuse SQL for generating the entire histogram in SQL 143 | -- in the final xtop (and web UI) this SQL hack is not needed as histograms are rendered in 144 | -- frontend application code 145 | gen_all_lat_buckets AS ( 146 | SELECT nr, POWER(2,nr)::bigint AS sc_lat_bkt_us 147 | FROM (SELECT * FROM generate_series(0, 25) AS buckets(nr)) 148 | ), 149 | distinct_dimensions AS ( 150 | SELECT DISTINCT #XTOP_GROUP_COLS# 151 | FROM grouped_lat_buckets 152 | ), 153 | all_dims_buckets AS ( 154 | SELECT distinct_dimensions.*, gen_all_lat_buckets.sc_lat_bkt_us 155 | FROM distinct_dimensions CROSS JOIN gen_all_lat_buckets 156 | ), 157 | full_data AS ( 158 | SELECT 159 | ab.* 160 | , COALESCE(g.seconds, 0) AS SECONDS 161 | , COALESCE(g.est_sc_cnt, NULL) AS EST_SC_CNT 162 | , COALESCE(g.min_sclat_ns, NULL) AS MIN_SCLAT_NS 163 | , COALESCE(g.max_sclat_ns, NULL) AS MAX_SCLAT_NS 164 | , g.first_seen_ts 165 | , g.last_seen_ts 166 | FROM 167 | all_dims_buckets ab LEFT OUTER JOIN grouped_lat_buckets g USING (#XTOP_GROUP_COLS#,sc_lat_bkt_us) 168 | ) 169 | SELECT 170 | SUM(seconds) AS SECONDS 171 | , ROUND(SUM(seconds) / EPOCH (TIMESTAMP 
'#XTOP_HIGH_TIME#' - TIMESTAMP '#XTOP_LOW_TIME#'), 1) AVG_THR 172 | , BAR(SUM(seconds), 0, (SELECT max_seconds FROM max_group_seconds), 10) AS TIME_BAR 173 | , #XTOP_GROUP_COLS# 174 | , CASE WHEN syscall != '-' OR syscall IS NULL THEN 175 | -- the LPAD below isn't really needed anymore as we generate all latency buckets 176 | LPAD('', COALESCE(MIN(CEIL(LOG2(sc_lat_bkt_us)))::int, 0), ' ') || STRING_AGG( 177 | CASE -- for debugging 178 | WHEN seconds IS NULL THEN 'n' -- if you see this, something funky going on 179 | ELSE 180 | CASE(CEIL(8 * seconds / (SELECT max_seconds FROM max_lat_bkt_seconds)))::int 181 | WHEN 0 THEN ' ' 182 | WHEN 1 THEN '▁' 183 | WHEN 2 THEN '▂' 184 | WHEN 3 THEN '▃' 185 | WHEN 4 THEN '▄' 186 | WHEN 5 THEN '▅' 187 | WHEN 6 THEN '▆' 188 | WHEN 7 THEN '▇' 189 | WHEN 8 THEN '█' 190 | ELSE 191 | '?' -- bar overflow (would be a bug) 192 | END 193 | END, 194 | '' ORDER BY sc_lat_bkt_us ASC 195 | ) 196 | ELSE '' END AS "<1us__32us_1ms__32ms_1s_8+" -- latency_distribution 197 | , LPAD(FORMAT('{:,}', CASE WHEN syscall != '-' THEN MIN(min_sclat_ns) END), 15, ' ') AS MIN_SC_LAT_NS 198 | , LPAD(FORMAT('{:,}', CASE WHEN syscall != '-' THEN MAX(max_sclat_ns) END), 18, ' ') AS MAX_SC_LAT_NS 199 | , LPAD(FORMAT('{:,}', CASE WHEN syscall != '-' THEN SUM(est_sc_cnt)::bigint END), 13, ' ') AS EST_SC_CNT 200 | -- , MIN(first_seen_ts) AS FIRST_SEEN 201 | -- , MAX(last_seen_ts) AS LAST_SEEN 202 | FROM 203 | full_data 204 | GROUP BY 205 | #XTOP_GROUP_COLS# 206 | HAVING 207 | SUM(seconds) > 0 -- filter out empty buckets post-aggregation and histogram rendering 208 | ORDER BY 209 | seconds DESC 210 | , #XTOP_GROUP_COLS# 211 | LIMIT 30 212 | ; 213 | -------------------------------------------------------------------------------- /tools/xq: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Rudimentary "frontend" for querying xcapture output CSV data using DuckDB and 4 | # any template queries in the 
/next/tools/sql directory 5 | 6 | # bail out on any error 7 | set -e 8 | 9 | if [ $# -lt 3 ]; then 10 | echo "Usage: $0 TEMPLATE GROUPBY FILTER [LOW_TIME] [HIGH_TIME]" 11 | echo "Example: $0 sclathist \"state,username,exe,syscall,filename\" \"state='DISK'\"" 12 | echo 13 | echo "Time arguments can be:" 14 | echo " - Date command offsets (e.g., -5M for 5 minutes ago)" 15 | echo " - Explicit timestamps (e.g., \"2025-03-17 09:00:00\")" 16 | echo 17 | echo "Examples with time:" 18 | echo " $0 sclathist \"state,exe,syscall,filename\" \"username='mysql'\" -5M" 19 | echo " $0 sclathist \"state,exe,syscall,filename\" \"syscall LIKE 'p%64'\" -2H -1H" 20 | echo " $0 sclathist \"state,exe,syscall,filename\" \"exe='oracle'\" \"2025-03-17 09:00:00\" \"2025-03-17 10:00:00\"" 21 | exit 1 22 | fi 23 | 24 | TEMPLATE="$1" 25 | GROUPBY="$2" 26 | FILTER="$3" 27 | 28 | # if datadir env var is not set, read CSV files from current dir 29 | DATADIR="${DATADIR:-.}" 30 | 31 | # where the xq script resides, the sql templates are in "sql" dir under it for now 32 | BASENAME="$(dirname "$(realpath "$0")")" 33 | 34 | # Function to process timestamp - handles both date offsets and explicit timestamps 35 | process_timestamp() { 36 | local time_arg="$1" 37 | 38 | # Check if the argument looks like a date offset (starts with + or -) 39 | if [[ "$time_arg" =~ ^[+-] ]]; then 40 | # It's a date offset, use the date command 41 | # date --date "$time_arg" +"%Y-%m-%d %H:%M:%S" 42 | date -v "$time_arg" +"%Y-%m-%d %H:%M:%S" 43 | else 44 | # It's an explicit timestamp, return as is 45 | echo "$time_arg" 46 | fi 47 | } 48 | 49 | # Process low timestamp if provided, otherwise read from 1 minute ago 50 | if [ $# -ge 4 ]; then 51 | LOW_TIME=$(process_timestamp "$4") 52 | else 53 | LOW_TIME=$(process_timestamp "-1M") 54 | fi 55 | 56 | # Process high timestamp if provided, otherwise use current time 57 | if [ $# -ge 5 ]; then 58 | HIGH_TIME=$(process_timestamp "$5") 59 | else 60 | HIGH_TIME=$(process_timestamp "+0S") 61 | fi 62 | 63 | 
echo 64 | echo "[0x.tools] Xcapture Query: Time from $LOW_TIME to $HIGH_TIME" 65 | echo 66 | 67 | # slashes in DATADIR path must be escaped for sed variable substitution 68 | ESCAPED_DATADIR=$(printf '%s\n' "$DATADIR" | sed 's/\//\\\//g') 69 | 70 | sed -e 's/#XTOP_GROUP_COLS#/'"${GROUPBY}"'/g' \ 71 | -e 's/#XTOP_WHERE#/'"${FILTER}"'/g' \ 72 | -e 's/#XTOP_LOW_TIME#/'"${LOW_TIME}"'/g' \ 73 | -e 's/#XTOP_HIGH_TIME#/'"${HIGH_TIME}"'/g' \ 74 | -e 's/#XTOP_DATADIR#/'"${ESCAPED_DATADIR}"'/g' \ 75 | "${BASENAME}/sql/${TEMPLATE}.sql" > ${BASENAME}/sql/out.sql 76 | 77 | # cat ${BASENAME}/sql/out.sql 78 | 79 | duckdb -f ${BASENAME}/sql/out.sql 80 | 81 | # rm ${BASENAME}/sql/out.sql 82 | echo 83 | -------------------------------------------------------------------------------- /xcapture/Makefile: -------------------------------------------------------------------------------- 1 | # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) 2 | OUTPUT := .output 3 | CLANG ?= clang 4 | LIBBPF_SRC := $(abspath ../libbpf-bootstrap/libbpf/src) 5 | BPFTOOL_SRC := $(abspath ../libbpf-bootstrap/bpftool/src) 6 | LIBBPF_OBJ := $(abspath $(OUTPUT)/libbpf.a) 7 | BPFTOOL_OUTPUT ?= $(abspath $(OUTPUT)/bpftool) 8 | BPFTOOL ?= $(BPFTOOL_OUTPUT)/bootstrap/bpftool 9 | LIBBLAZESYM_SRC := $(abspath ../libbpf-bootstrap/blazesym/) 10 | LIBBLAZESYM_INC := $(abspath $(LIBBLAZESYM_SRC)/capi/include) 11 | LIBBLAZESYM_OBJ := $(abspath $(OUTPUT)/libblazesym_c.a) 12 | 13 | # 0xtools main include dir for syscall_nr to name translation etc 14 | XTOOLS_INC := $(abspath ../../include) 15 | 16 | ARCH ?= $(shell uname -m | sed 's/x86_64/x86/' \ 17 | | sed 's/arm.*/arm/' \ 18 | | sed 's/aarch64/arm64/' \ 19 | | sed 's/ppc64le/powerpc/' \ 20 | | sed 's/mips.*/mips/' \ 21 | | sed 's/riscv64/riscv/' \ 22 | | sed 's/loongarch64/loongarch/') 23 | VMLINUX := ../libbpf-bootstrap/vmlinux.h/include/$(ARCH)/vmlinux.h 24 | 25 | INCLUDES := -I$(OUTPUT) \ 26 | -I../libbpf-bootstrap/libbpf/include/uapi \ 27 | -I$(dir $(VMLINUX)) \ 
28 | -I$(LIBBLAZESYM_INC) \ 29 | -I$(XTOOLS_INC) \ 30 | -I. \ 31 | -Iinclude \ 32 | -Isrc \ 33 | -Isrc/probes \ 34 | -Isrc/maps \ 35 | -Isrc/utils \ 36 | -Isrc/user \ 37 | -I$(OUTPUT)/src/probes 38 | 39 | CFLAGS := -g -Wall 40 | ALL_LDFLAGS := $(LDFLAGS) $(EXTRA_LDFLAGS) 41 | 42 | APPS = xcapture 43 | 44 | # Define BPF source files with their full paths 45 | BPF_SRCS := src/probes/task/task.bpf.c \ 46 | src/probes/syscall/syscall.bpf.c \ 47 | src/probes/io/iorq.bpf.c 48 | 49 | # Define BPF object and skeleton paths explicitly to avoid confusion 50 | BPF_OBJS := $(patsubst %.bpf.c,$(OUTPUT)/%.bpf.o,$(BPF_SRCS)) 51 | BPF_SKELS := $(patsubst $(OUTPUT)/%.bpf.o,$(OUTPUT)/%.skel.h,$(BPF_OBJS)) 52 | 53 | # Explicitly define the object files needed for the final binary 54 | OBJS := $(OUTPUT)/src/user/main.o $(OUTPUT)/src/utils/md5.o \ 55 | $(OUTPUT)/src/user/task_handler.o $(OUTPUT)/src/user/tracking_handler.o 56 | 57 | # Get Clang's default includes on this system 58 | CLANG_BPF_SYS_INCLUDES ?= $(shell $(CLANG) -v -E - </dev/null 2>&1 \ 59 | | sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }') 60 | 61 | ifeq ($(V),1) 62 | Q = 63 | msg = 64 | else 65 | Q = @ 66 | msg = @printf ' %-8s %s%s\n' \ 67 | "$(1)" \ 68 | "$(patsubst $(abspath $(OUTPUT))/%,%,$(2))" \ 69 | "$(if $(3), $(3))"; 70 | MAKEFLAGS += --no-print-directory 71 | endif 72 | 73 | define allow-override 74 | $(if $(or $(findstring environment,$(origin $(1))),\ 75 | $(findstring command line,$(origin $(1)))),,\ 76 | $(eval $(1) = $(2))) 77 | endef 78 | 79 | $(call allow-override,CC,$(CROSS_COMPILE)cc) 80 | $(call allow-override,LD,$(CROSS_COMPILE)ld) 81 | 82 | .PHONY: all 83 | all: $(APPS) 84 | 85 | .PHONY: clean 86 | clean: 87 | $(call msg,CLEAN) 88 | $(Q)rm -rf $(OUTPUT) $(APPS) 89 | 90 | # remove only xcapture output (not libbpf, bpftool etc) 91 | .PHONY: cleanx 92 | cleanx: 93 | $(call msg,CLEANX, xcapture artifacts) 94 | $(Q)rm -f $(APPS) 95 | $(Q)rm -rf $(OUTPUT)/src 96 | 97 | # 
Create all needed output directories 98 | $(OUTPUT) $(OUTPUT)/libbpf $(BPFTOOL_OUTPUT): 99 | $(call msg,MKDIR,$@) 100 | $(Q)mkdir -p $@ 101 | 102 | # Output directories for various components 103 | $(OUTPUT)/src/user: 104 | $(call msg,MKDIR,$@) 105 | $(Q)mkdir -p $@ 106 | 107 | $(OUTPUT)/src/probes/task: 108 | $(call msg,MKDIR,$@) 109 | $(Q)mkdir -p $@ 110 | 111 | $(OUTPUT)/src/probes/syscall: 112 | $(call msg,MKDIR,$@) 113 | $(Q)mkdir -p $@ 114 | 115 | $(OUTPUT)/src/probes/io: 116 | $(call msg,MKDIR,$@) 117 | $(Q)mkdir -p $@ 118 | 119 | $(OUTPUT)/src/helpers: 120 | $(call msg,MKDIR,$@) 121 | $(Q)mkdir -p $@ 122 | 123 | $(OUTPUT)/src/utils: 124 | $(call msg,MKDIR,$@) 125 | $(Q)mkdir -p $@ 126 | 127 | # Build libbpf 128 | $(LIBBPF_OBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(OUTPUT)/libbpf 129 | $(call msg,LIB,$@) 130 | $(Q)$(MAKE) -C $(LIBBPF_SRC) BUILD_STATIC_ONLY=1 \ 131 | OBJDIR=$(patsubst %/,%,$(dir $@))"/libbpf" \ 132 | DESTDIR=$(patsubst %/,%,$(dir $@)) \ 133 | INCLUDEDIR= LIBDIR= UAPIDIR= \ 134 | install 135 | 136 | # Build bpftool 137 | $(BPFTOOL): | $(BPFTOOL_OUTPUT) 138 | $(call msg,BPFTOOL,$@) 139 | $(Q)$(MAKE) ARCH= CROSS_COMPILE= OUTPUT=$(BPFTOOL_OUTPUT)/ -C $(BPFTOOL_SRC) bootstrap 140 | 141 | # Generic rule for building BPF objects 142 | $(OUTPUT)/src/probes/task/task.bpf.o: src/probes/task/task.bpf.c \ 143 | include/xcapture.h src/maps/xcapture_maps.h src/utils/xcapture_helpers.h src/helpers/file_helpers.h \ 144 | $(LIBBPF_OBJ) $(VMLINUX) | $(OUTPUT)/src/probes/task $(BPFTOOL) 145 | $(call msg,BPF,$@) 146 | $(Q)$(CLANG) -g -O2 -target bpf -D__TARGET_ARCH_$(ARCH) \ 147 | $(INCLUDES) $(CLANG_BPF_SYS_INCLUDES) \ 148 | -c $< -o $(patsubst %.bpf.o,%.tmp.bpf.o,$@) 149 | $(Q)$(BPFTOOL) gen object $@ $(patsubst %.bpf.o,%.tmp.bpf.o,$@) 150 | 151 | $(OUTPUT)/src/probes/syscall/syscall.bpf.o: src/probes/syscall/syscall.bpf.c include/xcapture.h src/maps/xcapture_maps.h $(LIBBPF_OBJ) $(VMLINUX) | $(OUTPUT)/src/probes/syscall $(BPFTOOL) 152 | 
$(call msg,BPF,$@) 153 | $(Q)$(CLANG) -g -O2 -target bpf -D__TARGET_ARCH_$(ARCH) \ 154 | $(INCLUDES) $(CLANG_BPF_SYS_INCLUDES) \ 155 | -c $< -o $(patsubst %.bpf.o,%.tmp.bpf.o,$@) 156 | $(Q)$(BPFTOOL) gen object $@ $(patsubst %.bpf.o,%.tmp.bpf.o,$@) 157 | 158 | $(OUTPUT)/src/probes/io/iorq.bpf.o: src/probes/io/iorq.bpf.c include/xcapture.h src/maps/xcapture_maps.h $(LIBBPF_OBJ) $(VMLINUX) | $(OUTPUT)/src/probes/io $(BPFTOOL) 159 | $(call msg,BPF,$@) 160 | $(Q)$(CLANG) -g -O2 -target bpf -D__TARGET_ARCH_$(ARCH) \ 161 | $(INCLUDES) $(CLANG_BPF_SYS_INCLUDES) \ 162 | -c $< -o $(patsubst %.bpf.o,%.tmp.bpf.o,$@) 163 | $(Q)$(BPFTOOL) gen object $@ $(patsubst %.bpf.o,%.tmp.bpf.o,$@) 164 | 165 | # Generate BPF skeletons 166 | $(OUTPUT)/src/probes/task/task.skel.h: $(OUTPUT)/src/probes/task/task.bpf.o | $(BPFTOOL) 167 | $(call msg,GEN,$@) 168 | $(Q)$(BPFTOOL) gen skeleton $< > $@ 169 | 170 | $(OUTPUT)/src/probes/syscall/syscall.skel.h: $(OUTPUT)/src/probes/syscall/syscall.bpf.o | $(BPFTOOL) 171 | $(call msg,GEN,$@) 172 | $(Q)$(BPFTOOL) gen skeleton $< > $@ 173 | 174 | $(OUTPUT)/src/probes/io/iorq.skel.h: $(OUTPUT)/src/probes/io/iorq.bpf.o | $(BPFTOOL) 175 | $(call msg,GEN,$@) 176 | $(Q)$(BPFTOOL) gen skeleton $< > $@ 177 | 178 | # Build md5.o 179 | $(OUTPUT)/src/utils/md5.o: src/utils/md5.c src/utils/md5.h | $(OUTPUT)/src/utils 180 | $(call msg,CC,$@) 181 | $(Q)$(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@ 182 | 183 | # Define specific dependencies for main.o 184 | MAIN_DEPS := src/user/main.c \ 185 | include/xcapture.h \ 186 | include/xcapture_user.h \ 187 | src/utils/md5.h \ 188 | $(OUTPUT)/src/probes/task/task.skel.h \ 189 | $(OUTPUT)/src/probes/syscall/syscall.skel.h \ 190 | $(OUTPUT)/src/probes/io/iorq.skel.h 191 | 192 | 193 | # Ringbuf event handlers 194 | $(OUTPUT)/src/user/task_handler.o: src/user/task_handler.c src/user/task_handler.h | $(OUTPUT)/src/user 195 | $(call msg,CC,$@) 196 | $(Q)$(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@ 197 | 198 | 
$(OUTPUT)/src/user/tracking_handler.o: src/user/tracking_handler.c src/user/tracking_handler.h | $(OUTPUT)/src/user 199 | $(call msg,CC,$@) 200 | $(Q)$(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@ 201 | 202 | # Build main.o with explicit dependency on all BPF skeletons 203 | $(OUTPUT)/src/user/main.o: $(MAIN_DEPS) | $(OUTPUT)/src/user 204 | $(call msg,CC,$@) 205 | $(Q)$(CC) $(CFLAGS) $(INCLUDES) -D__TARGET_ARCH_$(ARCH) -c src/user/main.c -o $@ 206 | 207 | # Build final binary with explicit objects 208 | xcapture: $(OBJS) $(LIBBPF_OBJ) | $(OUTPUT) 209 | $(call msg,BINARY,$@) 210 | $(Q)$(CC) $(CFLAGS) -D__TARGET_ARCH_$(ARCH) $^ $(ALL_LDFLAGS) -lelf -lz -o $@ 211 | 212 | # Print debug info (useful for troubleshooting) 213 | .PHONY: debug 214 | debug: 215 | @echo "BPF_SRCS: $(BPF_SRCS)" 216 | @echo "BPF_OBJS: $(BPF_OBJS)" 217 | @echo "BPF_SKELS: $(BPF_SKELS)" 218 | @echo "OBJS: $(OBJS)" 219 | @echo "MAIN_DEPS: $(MAIN_DEPS)" 220 | @ls -la $(OUTPUT)/src/probes/task/ 2>/dev/null || echo "Task probe directory not created yet" 221 | 222 | # delete failed targets 223 | .DELETE_ON_ERROR: 224 | 225 | # keep intermediate (.skel.h, .bpf.o, etc) targets 226 | .SECONDARY: 227 | -------------------------------------------------------------------------------- /xcapture/include/blk_types.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ 2 | #ifndef __BLK_TYPES_H 3 | #define __BLK_TYPES_H 4 | 5 | /* From include/linux/blk_types.h */ 6 | 7 | /* 8 | * Operations and flags common to the bio and request structures. 9 | * We use 8 bits for encoding the operation, and the remaining 24 for flags. 
10 | * 11 | * The least significant bit of the operation number indicates the data 12 | * transfer direction: 13 | * 14 | * - if the least significant bit is set transfers are TO the device 15 | * - if the least significant bit is not set transfers are FROM the device 16 | * 17 | * If a operation does not transfer data the least significant bit has no 18 | * meaning. 19 | */ 20 | #define REQ_OP_BITS 8 21 | #define REQ_OP_MASK ((1 << REQ_OP_BITS) - 1) 22 | #define REQ_FLAG_BITS 24 23 | 24 | enum req_opf { 25 | /* read sectors from the device */ 26 | REQ_OP_READ = 0, 27 | /* write sectors to the device */ 28 | REQ_OP_WRITE = 1, 29 | /* flush the volatile write cache */ 30 | REQ_OP_FLUSH = 2, 31 | /* discard sectors */ 32 | REQ_OP_DISCARD = 3, 33 | /* securely erase sectors */ 34 | REQ_OP_SECURE_ERASE = 5, 35 | /* reset a zone write pointer */ 36 | REQ_OP_ZONE_RESET = 6, 37 | /* write the same sector many times */ 38 | REQ_OP_WRITE_SAME = 7, 39 | /* reset all the zone present on the device */ 40 | REQ_OP_ZONE_RESET_ALL = 8, 41 | /* write the zero filled sector many times */ 42 | REQ_OP_WRITE_ZEROES = 9, 43 | /* Open a zone */ 44 | REQ_OP_ZONE_OPEN = 10, 45 | /* Close a zone */ 46 | REQ_OP_ZONE_CLOSE = 11, 47 | /* Transition a zone to full */ 48 | REQ_OP_ZONE_FINISH = 12, 49 | 50 | /* SCSI passthrough using struct scsi_request */ 51 | REQ_OP_SCSI_IN = 32, 52 | REQ_OP_SCSI_OUT = 33, 53 | /* Driver private requests */ 54 | REQ_OP_DRV_IN = 34, 55 | REQ_OP_DRV_OUT = 35, 56 | 57 | REQ_OP_LAST, 58 | }; 59 | 60 | enum req_flag_bits { 61 | __REQ_FAILFAST_DEV = /* no driver retries of device errors */ 62 | REQ_OP_BITS, 63 | __REQ_FAILFAST_TRANSPORT, /* no driver retries of transport errors */ 64 | __REQ_FAILFAST_DRIVER, /* no driver retries of driver errors */ 65 | __REQ_SYNC, /* request is sync (sync write or read) */ 66 | __REQ_META, /* metadata io request */ 67 | __REQ_PRIO, /* boost priority in cfq */ 68 | __REQ_NOMERGE, /* don't touch this for merging */ 69 | __REQ_IDLE, 
/* anticipate more IO after this one */ 70 | __REQ_INTEGRITY, /* I/O includes block integrity payload */ 71 | __REQ_FUA, /* forced unit access */ 72 | __REQ_PREFLUSH, /* request for cache flush */ 73 | __REQ_RAHEAD, /* read ahead, can fail anytime */ 74 | __REQ_BACKGROUND, /* background IO */ 75 | __REQ_NOWAIT, /* Don't wait if request will block */ 76 | __REQ_NOWAIT_INLINE, /* Return would-block error inline */ 77 | /* 78 | * When a shared kthread needs to issue a bio for a cgroup, doing 79 | * so synchronously can lead to priority inversions as the kthread 80 | * can be trapped waiting for that cgroup. CGROUP_PUNT flag makes 81 | * submit_bio() punt the actual issuing to a dedicated per-blkcg 82 | * work item to avoid such priority inversions. 83 | */ 84 | __REQ_CGROUP_PUNT, 85 | 86 | /* command specific flags for REQ_OP_WRITE_ZEROES: */ 87 | __REQ_NOUNMAP, /* do not free blocks when zeroing */ 88 | 89 | __REQ_HIPRI, 90 | 91 | /* for driver use */ 92 | __REQ_DRV, 93 | __REQ_SWAP, /* swapping request. 
*/ 94 | __REQ_NR_BITS, /* stops here */ 95 | }; 96 | 97 | #define REQ_FAILFAST_DEV (1ULL << __REQ_FAILFAST_DEV) 98 | #define REQ_FAILFAST_TRANSPORT (1ULL << __REQ_FAILFAST_TRANSPORT) 99 | #define REQ_FAILFAST_DRIVER (1ULL << __REQ_FAILFAST_DRIVER) 100 | #define REQ_SYNC (1ULL << __REQ_SYNC) 101 | #define REQ_META (1ULL << __REQ_META) 102 | #define REQ_PRIO (1ULL << __REQ_PRIO) 103 | #define REQ_NOMERGE (1ULL << __REQ_NOMERGE) 104 | #define REQ_IDLE (1ULL << __REQ_IDLE) 105 | #define REQ_INTEGRITY (1ULL << __REQ_INTEGRITY) 106 | #define REQ_FUA (1ULL << __REQ_FUA) 107 | #define REQ_PREFLUSH (1ULL << __REQ_PREFLUSH) 108 | #define REQ_RAHEAD (1ULL << __REQ_RAHEAD) 109 | #define REQ_BACKGROUND (1ULL << __REQ_BACKGROUND) 110 | #define REQ_NOWAIT (1ULL << __REQ_NOWAIT) 111 | #define REQ_NOWAIT_INLINE (1ULL << __REQ_NOWAIT_INLINE) 112 | #define REQ_CGROUP_PUNT (1ULL << __REQ_CGROUP_PUNT) 113 | 114 | #define REQ_NOUNMAP (1ULL << __REQ_NOUNMAP) 115 | #define REQ_HIPRI (1ULL << __REQ_HIPRI) 116 | 117 | #define REQ_DRV (1ULL << __REQ_DRV) 118 | #define REQ_SWAP (1ULL << __REQ_SWAP) 119 | 120 | #define REQ_FAILFAST_MASK \ 121 | (REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER) 122 | 123 | #define REQ_NOMERGE_FLAGS \ 124 | (REQ_NOMERGE | REQ_PREFLUSH | REQ_FUA) 125 | 126 | #endif /* __BLK_TYPES_H */ 127 | -------------------------------------------------------------------------------- /xcapture/include/syscall_arg1_is_fd.txt: -------------------------------------------------------------------------------- 1 | 0 // read((unsigned int) fd, (char *) buf, (size_t) count); // nr 2 | 1 // write((unsigned int) fd, (const char *) buf, (size_t) count); // nr 3 | 5 // newfstat((unsigned int) fd, (struct stat *) statbuf); // nr 4 | 8 // lseek((unsigned int) fd, (off_t) offset, (unsigned int) whence); // nr 5 | 16 // ioctl((unsigned int) fd, (unsigned int) cmd, (unsigned long) arg); // nr 6 | 17 // pread64((unsigned int) fd, (char *) buf, (size_t) count, (loff_t) pos); // 
nr 7 | 18 // pwrite64((unsigned int) fd, (const char *) buf, (size_t) count, (loff_t) pos); // nr 8 | 19 // readv((unsigned long) fd, (const struct iovec *) vec, (unsigned long) vlen); // nr 9 | 20 // writev((unsigned long) fd, (const struct iovec *) vec, (unsigned long) vlen); // nr 10 | 42 // connect((int) fd, (struct sockaddr *) uservaddr, (int) addrlen); // nr 11 | 43 // accept((int) fd, (struct sockaddr *) upeer_sockaddr, (int *) upeer_addrlen); // nr 12 | 44 // sendto((int) fd, (void *) buff, (size_t) len, (unsigned int) flags, (struct sockaddr *) addr, (int) addr_len); // nr 13 | 45 // recvfrom((int) fd, (void *) ubuf, (size_t) size, (unsigned int) flags, (struct sockaddr *) addr, (int *) addr_len); // nr 14 | 46 // sendmsg((int) fd, (struct user_msghdr *) msg, (unsigned int) flags); // nr 15 | 47 // recvmsg((int) fd, (struct user_msghdr *) msg, (unsigned int) flags); // nr 16 | 48 // shutdown((int) fd, (int) how); // nr 17 | 49 // bind((int) fd, (struct sockaddr *) umyaddr, (int) addrlen); // nr 18 | 50 // listen((int) fd, (int) backlog); // nr 19 | 51 // getsockname((int) fd, (struct sockaddr *) usockaddr, (int *) usockaddr_len); // nr 20 | 52 // getpeername((int) fd, (struct sockaddr *) usockaddr, (int *) usockaddr_len); // nr 21 | 54 // setsockopt((int) fd, (int) level, (int) optname, (char *) optval, (int) optlen); // nr 22 | 55 // getsockopt((int) fd, (int) level, (int) optname, (char *) optval, (int *) optlen); // nr 23 | 72 // fcntl((unsigned int) fd, (unsigned int) cmd, (unsigned long) arg); // nr 24 | 73 // flock((unsigned int) fd, (unsigned int) cmd); // nr 25 | 77 // ftruncate((unsigned int) fd, (off_t) length); // nr 26 | 78 // getdents((unsigned int) fd, (struct linux_dirent *) dirent, (unsigned int) count); // nr 27 | 91 // fchmod((unsigned int) fd, (umode_t) mode); // nr 28 | 93 // fchown((unsigned int) fd, (uid_t) user, (gid_t) group); // nr 29 | 138 // fstatfs((unsigned int) fd, (struct statfs *) buf); // nr 30 | 187 // readahead((int) fd, 
(loff_t) offset, (size_t) count); // nr 31 | 190 // fsetxattr((int) fd, (const char *) name, (const void *) value, (size_t) size, (int) flags); // nr 32 | 193 // fgetxattr((int) fd, (const char *) name, (void *) value, (size_t) size); // nr 33 | 196 // flistxattr((int) fd, (char *) list, (size_t) size); // nr 34 | 199 // fremovexattr((int) fd, (const char *) name); // nr 35 | 217 // getdents64((unsigned int) fd, (struct linux_dirent64 *) dirent, (unsigned int) count); // nr 36 | 221 // fadvise64((int) fd, (loff_t) offset, (size_t) len, (int) advice); // nr 37 | 254 // inotify_add_watch((int) fd, (const char *) pathname, (u32) mask); // nr 38 | 255 // inotify_rm_watch((int) fd, (__s32) wd); // nr 39 | 277 // sync_file_range((int) fd, (loff_t) offset, (loff_t) nbytes, (unsigned int) flags); // nr 40 | 278 // vmsplice((int) fd, (const struct iovec *) uiov, (unsigned long) nr_segs, (unsigned int) flags); // nr 41 | 285 // fallocate((int) fd, (int) mode, (loff_t) offset, (loff_t) len); // nr 42 | 288 // accept4((int) fd, (struct sockaddr *) upeer_sockaddr, (int *) upeer_addrlen, (int) flags); // nr 43 | 295 // preadv((unsigned long) fd, (const struct iovec *) vec, (unsigned long) vlen, (unsigned long) pos_l, (unsigned long) pos_h); // nr 44 | 296 // pwritev((unsigned long) fd, (const struct iovec *) vec, (unsigned long) vlen, (unsigned long) pos_l, (unsigned long) pos_h); // nr 45 | 299 // recvmmsg((int) fd, (struct mmsghdr *) mmsg, (unsigned int) vlen, (unsigned int) flags, (struct __kernel_timespec *) timeout); // nr 46 | 307 // sendmmsg((int) fd, (struct mmsghdr *) mmsg, (unsigned int) vlen, (unsigned int) flags); // nr 47 | 308 // setns((int) fd, (int) flags); // nr 48 | 313 // finit_module((int) fd, (const char *) uargs, (int) flags); // nr 49 | 322 // execveat((int) fd, (const char *) filename, (const char *const *) argv, (const char *const *) envp, (int) flags); // nr 50 | 327 // preadv2((unsigned long) fd, (const struct iovec *) vec, (unsigned long) vlen, 
(unsigned long) pos_l, (unsigned long) pos_h, (rwf_t) flags); // nr 51 | 328 // pwritev2((unsigned long) fd, (const struct iovec *) vec, (unsigned long) vlen, (unsigned long) pos_l, (unsigned long) pos_h, (rwf_t) flags); // nr 52 | 426 // io_uring_enter((unsigned int) fd, (u32) to_submit, (u32) min_complete, (u32) flags, (const void *) argp, (size_t) argsz); // nr 53 | 427 // io_uring_register((unsigned int) fd, (unsigned int) opcode, (void *) arg, (unsigned int) nr_args); // nr 54 | 431 // fsconfig((int) fd, (unsigned int) cmd, (const char *) _key, (const void *) _value, (int) aux); // nr 55 | 436 // close_range((unsigned int) fd, (unsigned int) max_fd, (unsigned int) flags); // nr 56 | 438 // pidfd_getfd((int) pidfd, (int) fd, (unsigned int) flags); // nr 57 | 443 // quotactl_fd((unsigned int) fd, (unsigned int) cmd, (qid_t) id, (void *) addr); // nr 58 | 451 // cachestat((unsigned int) fd, (struct cachestat_range *) cstat_range, (struct cachestat *) cstat, (unsigned int) flags); // nr 59 | -------------------------------------------------------------------------------- /xcapture/include/syscall_fd_bitmap_aarch64.h: -------------------------------------------------------------------------------- 1 | #ifndef __XCAPTURE_SYSCALL_FD_BITMAP_H 2 | #define __XCAPTURE_SYSCALL_FD_BITMAP_H 3 | 4 | /* 5 | * Define bitmap for syscalls that have fd as first argument 6 | * Each bit position corresponds to a syscall number 7 | * Using 8 * uint64_t for syscalls 0-511 8 | */ 9 | #define SYSCALL_FD_BITMAP_SIZE (512 / 64) 10 | 11 | /* 12 | * Initialize the bitmap with syscalls that have fd as first argument 13 | * Each entry in this array represents a 64-bit chunk of the bitmap 14 | * 15 | * The list is generated using latest 0x.tools syscallargs (1.0.4) 16 | * using the syscall_aarch64.tbl file from latest linux kernel source 17 | * 18 | * sudo syscallargs --gen-fd-bitmap --syscalltbl include/syscall_aarch64.tbl 19 | */ 20 | 21 | static const uint64_t 
syscall_fd_bitmap[SYSCALL_FD_BITMAP_SIZE] = { 22 | (1ULL << ( 7 - 0)) /* fsetxattr((int) fd) */ 23 | | (1ULL << ( 10 - 0)) /* fgetxattr((int) fd) */ 24 | | (1ULL << ( 13 - 0)) /* flistxattr((int) fd) */ 25 | | (1ULL << ( 16 - 0)) /* fremovexattr((int) fd) */ 26 | | (1ULL << ( 21 - 0)) /* epoll_ctl((int) epfd) */ 27 | | (1ULL << ( 22 - 0)) /* epoll_pwait((int) epfd) */ 28 | | (1ULL << ( 24 - 0)) /* dup3((unsigned int) oldfd) */ 29 | | (1ULL << ( 25 - 0)) /* fcntl((unsigned int) fd) */ 30 | | (1ULL << ( 27 - 0)) /* inotify_add_watch((int) fd) */ 31 | | (1ULL << ( 28 - 0)) /* inotify_rm_watch((int) fd) */ 32 | | (1ULL << ( 29 - 0)) /* ioctl((unsigned int) fd) */ 33 | | (1ULL << ( 32 - 0)) /* flock((unsigned int) fd) */ 34 | | (1ULL << ( 33 - 0)) /* mknodat((int) dfd) */ 35 | | (1ULL << ( 34 - 0)) /* mkdirat((int) dfd) */ 36 | | (1ULL << ( 35 - 0)) /* unlinkat((int) dfd) */ 37 | | (1ULL << ( 37 - 0)) /* linkat((int) olddfd) */ 38 | | (1ULL << ( 38 - 0)) /* renameat((int) olddfd) */ 39 | | (1ULL << ( 44 - 0)) /* fstatfs((unsigned int) fd) */ 40 | | (1ULL << ( 46 - 0)) /* ftruncate((unsigned int) fd) */ 41 | | (1ULL << ( 47 - 0)) /* fallocate((int) fd) */ 42 | | (1ULL << ( 48 - 0)) /* faccessat((int) dfd) */ 43 | | (1ULL << ( 50 - 0)) /* fchdir((unsigned int) fd) */ 44 | | (1ULL << ( 52 - 0)) /* fchmod((unsigned int) fd) */ 45 | | (1ULL << ( 53 - 0)) /* fchmodat((int) dfd) */ 46 | | (1ULL << ( 54 - 0)) /* fchownat((int) dfd) */ 47 | | (1ULL << ( 55 - 0)) /* fchown((unsigned int) fd) */ 48 | | (1ULL << ( 56 - 0)) /* openat((int) dfd) */ 49 | | (1ULL << ( 57 - 0)) /* close((unsigned int) fd) */ 50 | | (1ULL << ( 61 - 0)) /* getdents64((unsigned int) fd) */ 51 | | (1ULL << ( 62 - 0)) /* lseek((unsigned int) fd) */ 52 | | (1ULL << ( 63 - 0)) /* read((unsigned int) fd) */ 53 | , (1ULL << ( 64 - 64)) /* write((unsigned int) fd) */ 54 | | (1ULL << ( 65 - 64)) /* readv((unsigned long) fd) */ 55 | | (1ULL << ( 66 - 64)) /* writev((unsigned long) fd) */ 56 | | (1ULL << ( 67 - 64)) 
/* pread64((unsigned int) fd) */ 57 | | (1ULL << ( 68 - 64)) /* pwrite64((unsigned int) fd) */ 58 | | (1ULL << ( 69 - 64)) /* preadv((unsigned long) fd) */ 59 | | (1ULL << ( 70 - 64)) /* pwritev((unsigned long) fd) */ 60 | | (1ULL << ( 71 - 64)) /* sendfile64((int) out_fd) */ 61 | | (1ULL << ( 74 - 64)) /* signalfd4((int) ufd) */ 62 | | (1ULL << ( 75 - 64)) /* vmsplice((int) fd) */ 63 | | (1ULL << ( 76 - 64)) /* splice((int) fd_in) */ 64 | | (1ULL << ( 77 - 64)) /* tee((int) fdin) */ 65 | | (1ULL << ( 78 - 64)) /* readlinkat((int) dfd) */ 66 | | (1ULL << ( 79 - 64)) /* newfstatat((int) dfd) */ 67 | | (1ULL << ( 80 - 64)) /* newfstat((unsigned int) fd) */ 68 | | (1ULL << ( 82 - 64)) /* fsync((unsigned int) fd) */ 69 | | (1ULL << ( 83 - 64)) /* fdatasync((unsigned int) fd) */ 70 | | (1ULL << ( 84 - 64)) /* sync_file_range((int) fd) */ 71 | , 0ULL /* syscalls 128-191: none listed; placeholder keeps later chunks at index nr / 64 */ , (1ULL << (200 - 192)) /* bind((int) fd) */ 72 | | (1ULL << (201 - 192)) /* listen((int) fd) */ 73 | | (1ULL << (202 - 192)) /* accept((int) fd) */ 74 | | (1ULL << (203 - 192)) /* connect((int) fd) */ 75 | | (1ULL << (204 - 192)) /* getsockname((int) fd) */ 76 | | (1ULL << (205 - 192)) /* getpeername((int) fd) */ 77 | | (1ULL << (206 - 192)) /* sendto((int) fd) */ 78 | | (1ULL << (207 - 192)) /* recvfrom((int) fd) */ 79 | | (1ULL << (208 - 192)) /* setsockopt((int) fd) */ 80 | | (1ULL << (209 - 192)) /* getsockopt((int) fd) */ 81 | | (1ULL << (210 - 192)) /* shutdown((int) fd) */ 82 | | (1ULL << (211 - 192)) /* sendmsg((int) fd) */ 83 | | (1ULL << (212 - 192)) /* recvmsg((int) fd) */ 84 | | (1ULL << (213 - 192)) /* readahead((int) fd) */ 85 | | (1ULL << (223 - 192)) /* fadvise64_64((int) fd) */ 86 | | (1ULL << (242 - 192)) /* accept4((int) fd) */ 87 | , (1ULL << (263 - 256)) /* fanotify_mark((int) fanotify_fd) */ 88 | | (1ULL << (264 - 256)) /* name_to_handle_at((int) dfd) */ 89 | | (1ULL << (265 - 256)) /* open_by_handle_at((int) mountdirfd) */ 90 | | (1ULL << (267 - 256)) /* syncfs((int) fd) */ 91 | | (1ULL << (268 - 256)) /* 
setns((int) fd) */ 92 | | (1ULL << (269 - 256)) /* sendmmsg((int) fd) */ 93 | | (1ULL << (273 - 256)) /* finit_module((int) fd) */ 94 | | (1ULL << (276 - 256)) /* renameat2((int) olddfd) */ 95 | | (1ULL << (281 - 256)) /* execveat((int) fd) */ 96 | | (1ULL << (285 - 256)) /* copy_file_range((int) fd_in) */ 97 | | (1ULL << (286 - 256)) /* preadv2((unsigned long) fd) */ 98 | | (1ULL << (287 - 256)) /* pwritev2((unsigned long) fd) */ 99 | | (1ULL << (291 - 256)) /* statx((int) dfd) */ 100 | | (1ULL << (294 - 256)) /* kexec_file_load((int) kernel_fd) */ 101 | , 0ULL /* syscalls 320-383: none listed; placeholder keeps later chunks at index nr / 64 */ , (1ULL << (410 - 384)) /* timerfd_gettime((int) ufd) */ 102 | | (1ULL << (411 - 384)) /* timerfd_settime((int) ufd) */ 103 | | (1ULL << (412 - 384)) /* utimensat((int) dfd) */ 104 | | (1ULL << (417 - 384)) /* recvmmsg((int) fd) */ 105 | | (1ULL << (424 - 384)) /* pidfd_send_signal((int) pidfd) */ 106 | | (1ULL << (426 - 384)) /* io_uring_enter((unsigned int) fd) */ 107 | | (1ULL << (427 - 384)) /* io_uring_register((unsigned int) fd) */ 108 | | (1ULL << (428 - 384)) /* open_tree((int) dfd) */ 109 | | (1ULL << (429 - 384)) /* move_mount((int) from_dfd) */ 110 | | (1ULL << (431 - 384)) /* fsconfig((int) fd) */ 111 | | (1ULL << (432 - 384)) /* fsmount((int) fs_fd) */ 112 | | (1ULL << (433 - 384)) /* fspick((int) dfd) */ 113 | | (1ULL << (436 - 384)) /* close_range((unsigned int) fd) */ 114 | | (1ULL << (437 - 384)) /* openat2((int) dfd) */ 115 | | (1ULL << (438 - 384)) /* pidfd_getfd((int) pidfd) */ 116 | | (1ULL << (439 - 384)) /* faccessat2((int) dfd) */ 117 | | (1ULL << (440 - 384)) /* process_madvise((int) pidfd) */ 118 | | (1ULL << (441 - 384)) /* epoll_pwait2((int) epfd) */ 119 | | (1ULL << (442 - 384)) /* mount_setattr((int) dfd) */ 120 | | (1ULL << (443 - 384)) /* quotactl_fd((unsigned int) fd) */ 121 | | (1ULL << (445 - 384)) /* landlock_add_rule((const int) ruleset_fd) */ 122 | | (1ULL << (446 - 384)) /* landlock_restrict_self((const int) ruleset_fd) */ 123 | , (1ULL << (448 - 448)) /* 
process_mrelease((int) pidfd) */ 124 | | (1ULL << (451 - 448)) /* cachestat((unsigned int) fd) */ 125 | | (1ULL << (452 - 448)) /* fchmodat2((int) dfd) */ 126 | }; 127 | 128 | 129 | #define SYSCALL_HAS_FD_ARG1(syscall_nr) ({ \ 130 | int __has_fd = 0; \ 131 | if ((syscall_nr) < 512) { \ 132 | __has_fd = !!(syscall_fd_bitmap[(syscall_nr) / 64] & (1ULL << ((syscall_nr) % 64))); \ 133 | } \ 134 | __has_fd; \ 135 | }) 136 | 137 | #endif /* __XCAPTURE_SYSCALL_FD_BITMAP_H */ 138 | -------------------------------------------------------------------------------- /xcapture/include/syscall_fd_bitmap_x86_64.h: -------------------------------------------------------------------------------- 1 | #ifndef __XCAPTURE_SYSCALL_FD_BITMAP_H 2 | #define __XCAPTURE_SYSCALL_FD_BITMAP_H 3 | 4 | /* 5 | * Define bitmap for syscalls that have fd as first argument 6 | * Each bit position corresponds to a syscall number 7 | * Using 8 * uint64_t for syscalls 0-511 8 | */ 9 | #define SYSCALL_FD_BITMAP_SIZE (512 / 64) 10 | 11 | /* 12 | * Initialize the bitmap with syscalls that have fd as first argument 13 | * Each entry in this array represents a 64-bit chunk of the bitmap 14 | * 15 | * The list is generated using latest 0x.tools syscallargs (1.0.4) 16 | * using the syscall_x86_64.tbl file from latest linux kernel source 17 | * 18 | * sudo ./bin/syscallargs --gen-fd-bitmap --syscalltbl ./next/xcapture/include/syscall_x86_64.tbl 19 | */ 20 | 21 | static const uint64_t syscall_fd_bitmap[SYSCALL_FD_BITMAP_SIZE] = { 22 | (1ULL << ( 0 - 0)) /* read((unsigned int) fd) */ 23 | | (1ULL << ( 1 - 0)) /* write((unsigned int) fd) */ 24 | | (1ULL << ( 3 - 0)) /* close((unsigned int) fd) */ 25 | | (1ULL << ( 5 - 0)) /* newfstat((unsigned int) fd) */ 26 | | (1ULL << ( 8 - 0)) /* lseek((unsigned int) fd) */ 27 | | (1ULL << ( 16 - 0)) /* ioctl((unsigned int) fd) */ 28 | | (1ULL << ( 17 - 0)) /* pread64((unsigned int) fd) */ 29 | | (1ULL << ( 18 - 0)) /* pwrite64((unsigned int) fd) */ 30 | | (1ULL << ( 19 - 0)) /* 
readv((unsigned long) fd) */ 31 | | (1ULL << ( 20 - 0)) /* writev((unsigned long) fd) */ 32 | | (1ULL << ( 33 - 0)) /* dup2((unsigned int) oldfd) */ 33 | | (1ULL << ( 40 - 0)) /* sendfile64((int) out_fd) */ 34 | | (1ULL << ( 42 - 0)) /* connect((int) fd) */ 35 | | (1ULL << ( 43 - 0)) /* accept((int) fd) */ 36 | | (1ULL << ( 44 - 0)) /* sendto((int) fd) */ 37 | | (1ULL << ( 45 - 0)) /* recvfrom((int) fd) */ 38 | | (1ULL << ( 46 - 0)) /* sendmsg((int) fd) */ 39 | | (1ULL << ( 47 - 0)) /* recvmsg((int) fd) */ 40 | | (1ULL << ( 48 - 0)) /* shutdown((int) fd) */ 41 | | (1ULL << ( 49 - 0)) /* bind((int) fd) */ 42 | | (1ULL << ( 50 - 0)) /* listen((int) fd) */ 43 | | (1ULL << ( 51 - 0)) /* getsockname((int) fd) */ 44 | | (1ULL << ( 52 - 0)) /* getpeername((int) fd) */ 45 | | (1ULL << ( 54 - 0)) /* setsockopt((int) fd) */ 46 | | (1ULL << ( 55 - 0)) /* getsockopt((int) fd) */ 47 | , (1ULL << ( 72 - 64)) /* fcntl((unsigned int) fd) */ 48 | | (1ULL << ( 73 - 64)) /* flock((unsigned int) fd) */ 49 | | (1ULL << ( 74 - 64)) /* fsync((unsigned int) fd) */ 50 | | (1ULL << ( 75 - 64)) /* fdatasync((unsigned int) fd) */ 51 | | (1ULL << ( 77 - 64)) /* ftruncate((unsigned int) fd) */ 52 | | (1ULL << ( 78 - 64)) /* getdents((unsigned int) fd) */ 53 | | (1ULL << ( 81 - 64)) /* fchdir((unsigned int) fd) */ 54 | | (1ULL << ( 91 - 64)) /* fchmod((unsigned int) fd) */ 55 | | (1ULL << ( 93 - 64)) /* fchown((unsigned int) fd) */ 56 | , (1ULL << (138 - 128)) /* fstatfs((unsigned int) fd) */ 57 | | (1ULL << (187 - 128)) /* readahead((int) fd) */ 58 | | (1ULL << (190 - 128)) /* fsetxattr((int) fd) */ 59 | , (1ULL << (193 - 192)) /* fgetxattr((int) fd) */ 60 | | (1ULL << (196 - 192)) /* flistxattr((int) fd) */ 61 | | (1ULL << (199 - 192)) /* fremovexattr((int) fd) */ 62 | | (1ULL << (217 - 192)) /* getdents64((unsigned int) fd) */ 63 | | (1ULL << (221 - 192)) /* fadvise64((int) fd) */ 64 | | (1ULL << (232 - 192)) /* epoll_wait((int) epfd) */ 65 | | (1ULL << (233 - 192)) /* epoll_ctl((int) epfd) 
*/ 66 | | (1ULL << (254 - 192)) /* inotify_add_watch((int) fd) */ 67 | | (1ULL << (255 - 192)) /* inotify_rm_watch((int) fd) */ 68 | , (1ULL << (257 - 256)) /* openat((int) dfd) */ 69 | | (1ULL << (258 - 256)) /* mkdirat((int) dfd) */ 70 | | (1ULL << (259 - 256)) /* mknodat((int) dfd) */ 71 | | (1ULL << (260 - 256)) /* fchownat((int) dfd) */ 72 | | (1ULL << (261 - 256)) /* futimesat((int) dfd) */ 73 | | (1ULL << (262 - 256)) /* newfstatat((int) dfd) */ 74 | | (1ULL << (263 - 256)) /* unlinkat((int) dfd) */ 75 | | (1ULL << (264 - 256)) /* renameat((int) olddfd) */ 76 | | (1ULL << (265 - 256)) /* linkat((int) olddfd) */ 77 | | (1ULL << (267 - 256)) /* readlinkat((int) dfd) */ 78 | | (1ULL << (268 - 256)) /* fchmodat((int) dfd) */ 79 | | (1ULL << (269 - 256)) /* faccessat((int) dfd) */ 80 | | (1ULL << (275 - 256)) /* splice((int) fd_in) */ 81 | | (1ULL << (276 - 256)) /* tee((int) fdin) */ 82 | | (1ULL << (277 - 256)) /* sync_file_range((int) fd) */ 83 | | (1ULL << (278 - 256)) /* vmsplice((int) fd) */ 84 | | (1ULL << (280 - 256)) /* utimensat((int) dfd) */ 85 | | (1ULL << (281 - 256)) /* epoll_pwait((int) epfd) */ 86 | | (1ULL << (282 - 256)) /* signalfd((int) ufd) */ 87 | | (1ULL << (285 - 256)) /* fallocate((int) fd) */ 88 | | (1ULL << (286 - 256)) /* timerfd_settime((int) ufd) */ 89 | | (1ULL << (287 - 256)) /* timerfd_gettime((int) ufd) */ 90 | | (1ULL << (288 - 256)) /* accept4((int) fd) */ 91 | | (1ULL << (289 - 256)) /* signalfd4((int) ufd) */ 92 | | (1ULL << (292 - 256)) /* dup3((unsigned int) oldfd) */ 93 | | (1ULL << (295 - 256)) /* preadv((unsigned long) fd) */ 94 | | (1ULL << (296 - 256)) /* pwritev((unsigned long) fd) */ 95 | | (1ULL << (299 - 256)) /* recvmmsg((int) fd) */ 96 | | (1ULL << (301 - 256)) /* fanotify_mark((int) fanotify_fd) */ 97 | | (1ULL << (303 - 256)) /* name_to_handle_at((int) dfd) */ 98 | | (1ULL << (304 - 256)) /* open_by_handle_at((int) mountdirfd) */ 99 | | (1ULL << (306 - 256)) /* syncfs((int) fd) */ 100 | | (1ULL << (307 - 256)) 
/* sendmmsg((int) fd) */ 101 | | (1ULL << (308 - 256)) /* setns((int) fd) */ 102 | | (1ULL << (313 - 256)) /* finit_module((int) fd) */ 103 | | (1ULL << (316 - 256)) /* renameat2((int) olddfd) */ 104 | , (1ULL << (320 - 320)) /* kexec_file_load((int) kernel_fd) */ 105 | | (1ULL << (322 - 320)) /* execveat((int) fd) */ 106 | | (1ULL << (326 - 320)) /* copy_file_range((int) fd_in) */ 107 | | (1ULL << (327 - 320)) /* preadv2((unsigned long) fd) */ 108 | | (1ULL << (328 - 320)) /* pwritev2((unsigned long) fd) */ 109 | | (1ULL << (332 - 320)) /* statx((int) dfd) */ 110 | , (1ULL << (424 - 384)) /* pidfd_send_signal((int) pidfd) */ 111 | | (1ULL << (426 - 384)) /* io_uring_enter((unsigned int) fd) */ 112 | | (1ULL << (427 - 384)) /* io_uring_register((unsigned int) fd) */ 113 | | (1ULL << (428 - 384)) /* open_tree((int) dfd) */ 114 | | (1ULL << (429 - 384)) /* move_mount((int) from_dfd) */ 115 | | (1ULL << (431 - 384)) /* fsconfig((int) fd) */ 116 | | (1ULL << (432 - 384)) /* fsmount((int) fs_fd) */ 117 | | (1ULL << (433 - 384)) /* fspick((int) dfd) */ 118 | | (1ULL << (436 - 384)) /* close_range((unsigned int) fd) */ 119 | | (1ULL << (437 - 384)) /* openat2((int) dfd) */ 120 | | (1ULL << (438 - 384)) /* pidfd_getfd((int) pidfd) */ 121 | | (1ULL << (439 - 384)) /* faccessat2((int) dfd) */ 122 | | (1ULL << (440 - 384)) /* process_madvise((int) pidfd) */ 123 | | (1ULL << (441 - 384)) /* epoll_pwait2((int) epfd) */ 124 | | (1ULL << (442 - 384)) /* mount_setattr((int) dfd) */ 125 | | (1ULL << (443 - 384)) /* quotactl_fd((unsigned int) fd) */ 126 | | (1ULL << (445 - 384)) /* landlock_add_rule((const int) ruleset_fd) */ 127 | | (1ULL << (446 - 384)) /* landlock_restrict_self((const int) ruleset_fd) */ 128 | , (1ULL << (448 - 448)) /* process_mrelease((int) pidfd) */ 129 | | (1ULL << (451 - 448)) /* cachestat((unsigned int) fd) */ 130 | | (1ULL << (452 - 448)) /* fchmodat2((int) dfd) */ 131 | }; 132 | 133 | 134 | #define SYSCALL_HAS_FD_ARG1(syscall_nr) ({ \ 135 | int __has_fd = 
0; \ 136 | if ((syscall_nr) < 512) { \ 137 | __has_fd = !!(syscall_fd_bitmap[(syscall_nr) / 64] & (1ULL << ((syscall_nr) % 64))); \ 138 | } \ 139 | __has_fd; \ 140 | }) 141 | 142 | #endif /* __XCAPTURE_SYSCALL_FD_BITMAP_H */ 143 | -------------------------------------------------------------------------------- /xcapture/include/xcapture.h: -------------------------------------------------------------------------------- 1 | // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) 2 | // Copyright 2024-2038 Tanel Poder [0x.tools] 3 | 4 | #ifndef __XCAPTURE_H 5 | #define __XCAPTURE_H 6 | 7 | #ifdef __BPF__ 8 | #define UINT32_MAX 0xFFFFFFFF 9 | #include "vmlinux.h" 10 | #endif 11 | 12 | #define TASK_COMM_LEN 16 13 | #define MAX_STACK_LEN 127 14 | #define MAX_FILENAME_LEN 256 15 | #define MAX_CMDLINE_LEN 64 16 | #define MAX_CONN_INFO_LEN 128 // "TCP 1.2.3.4:80->5.6.7.8:12345" for IPv4 17 | 18 | // kernel task states here so we don't have to include kernel headers 19 | #define TASK_RUNNING 0x00000000 20 | #define TASK_INTERRUPTIBLE 0x00000001 21 | #define TASK_UNINTERRUPTIBLE 0x00000002 22 | #define TASK_STOPPED 0x00000004 23 | #define TASK_TRACED 0x00000008 24 | /* Used in tsk->exit_state: */ 25 | #define EXIT_DEAD 0x00000010 26 | #define EXIT_ZOMBIE 0x00000020 27 | #define EXIT_TRACE (EXIT_ZOMBIE | EXIT_DEAD) 28 | /* Used in tsk->state again: */ 29 | #define TASK_PARKED 0x00000040 30 | #define TASK_DEAD 0x00000080 31 | #define TASK_WAKEKILL 0x00000100 32 | #define TASK_WAKING 0x00000200 33 | #define TASK_NOLOAD 0x00000400 34 | #define TASK_NEW 0x00000800 35 | #define TASK_RTLOCK_WAIT 0x00001000 36 | #define TASK_FREEZABLE 0x00002000 37 | #define TASK_FREEZABLE_UNSAFE 0x00004000 38 | #define TASK_FROZEN 0x00008000 39 | #define TASK_STATE_MAX 0x00010000 40 | 41 | #define TASK_IDLE (TASK_UNINTERRUPTIBLE | TASK_NOLOAD) 42 | 43 | // task flags from linux/sched.h 44 | #define PF_KSWAPD 0x00020000 /* I am kswapd */ 45 | #define PF_KTHREAD 0x00200000 /* I am a kernel thread */ 
46 | 47 | // linux/vmalloc.h 48 | #define VM_IOREMAP 0x00000001 /* ioremap() and friends */ 49 | #define VM_ALLOC 0x00000002 /* vmalloc() */ 50 | #define VM_MAP 0x00000004 /* vmap()ed pages */ 51 | #define VM_USERMAP 0x00000008 /* suitable for remap_vmalloc_range */ 52 | #define VM_DMA_COHERENT 0x00000010 /* dma_alloc_coherent */ 53 | #define VM_UNINITIALIZED 0x00000020 /* vm_struct is not fully initialized */ 54 | #define VM_NO_GUARD 0x00000040 /* *** DANGEROUS*** don't add guard page */ 55 | #define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */ 56 | #define VM_FLUSH_RESET_PERMS 0x00000100 /* reset direct map and flush TLB on unmap, can't be freed in atomic context */ 57 | #define VM_MAP_PUT_PAGES 0x00000200 /* put pages and free array in vfree */ 58 | #define VM_ALLOW_HUGE_VMAP 0x00000400 /* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */ 59 | #define VM_DEFER_KMEMLEAK 0x00000800 /* defer kmemleak object creation */ 60 | #define VM_SPARSE 0x00001000 /* sparse vm_area. not all pages are present. 
*/ 61 | 62 | 63 | // devices 64 | #define MINORBITS 20 65 | #define MINORMASK ((1U << MINORBITS) - 1) 66 | #define MKDEV(ma,mi) (((ma) << MINORBITS) | (mi)) 67 | #define MAJOR(dev) ((unsigned int) ((dev) >> MINORBITS)) 68 | #define MINOR(dev) ((unsigned int) ((dev) & MINORMASK)) 69 | 70 | // tracking if there are any IO requests in aio rings for heuristic reasoning 71 | // if the later io_[p]getevents calls are blocked or in an idle loop at app level 72 | struct aio_ctx_key { 73 | __u32 tgid; 74 | __u64 ctx_id; 75 | } __attribute__((packed)); 76 | 77 | struct aio_ctx_info { 78 | pid_t tid; // Thread ID that last called io_submit 79 | __u64 last_submit_ts; // Timestamp of last io_submit call 80 | __u32 submit_count; // Number of io_submit calls for this context 81 | }; 82 | 83 | // ringbuf struct type 84 | enum event_type { 85 | EVENT_TASK_INFO = 1, 86 | EVENT_SYSCALL_COMPLETION = 2, 87 | EVENT_IORQ_COMPLETION = 3 88 | }; 89 | 90 | // Structure for tracking in-flight block I/O requests 91 | struct iorq_info { 92 | bool iorq_sampled:1; // Whether this was caught by task_iter sampler and must be emitted on completion 93 | __u64 iorq_sequence_num; // Sequence number from submitting task 94 | pid_t insert_pid; // Task that queued the I/O (if queuing was needed) 95 | pid_t insert_tgid; // Process that queued the I/O 96 | pid_t issue_pid; // Task that issued the I/O to block device driver 97 | pid_t issue_tgid; // Process that issued the I/O 98 | }; 99 | 100 | // This is the central "extended Task State Array" (eTSA) 101 | // to be used with BPF_MAP_TYPE_TASK_STORAGE 102 | struct task_storage { 103 | pid_t pid; // having pid/tgid duplicated here allow tracking probes 104 | pid_t tgid; // to avoid looking up the task_struct if they get the task_storage anyway 105 | __u64 sample_start_ktime; // CLOCK_MONOTONIC ns (all tasks have same sample_time) 106 | __u64 sample_actual_ktime; // CLOCK_MONOTONIC ns (debug for sample duration analysis) 107 | 108 | bool sc_sampled:1; // 
task iterator will set the following fields only if it catches a task in syscall 109 | __s32 in_syscall_nr; 110 | __u64 sc_enter_time; 111 | __u64 sc_sequence_num; // any syscall entry in a task will increment this single counter (tracepoint) 112 | __u64 prev_sc_sequence_num; // edge case: deal with long idle aio getevents calls ongoing before xcapture start 113 | 114 | __u64 iorq_sequence_num; // sequence number for all iorq submissions by this task 115 | struct request *last_iorq_rq; // Last iorq submitted, task_iter updates iorq_sampled=true for this 116 | struct request *last_iorq_sampled; // save the rq address that was ongoing during sample (for emitting later) 117 | 118 | __u32 aio_inflight_reqs; // number of inflight requests in aio ring (0 means idle, waiting for work) 119 | }; 120 | 121 | // Syscall completion event structure for ringbuf 122 | struct sc_completion_event { 123 | enum event_type type; 124 | pid_t pid; 125 | pid_t tgid; 126 | __u64 completed_sc_sequence_num; 127 | __u64 completed_sc_enter_time; 128 | __u64 completed_sc_exit_time; 129 | __s64 completed_sc_ret_val; 130 | __s32 completed_syscall_nr; 131 | }; 132 | 133 | // Block I/O completion event structure for ringbuf 134 | struct iorq_completion_event { 135 | enum event_type type; 136 | pid_t insert_pid; 137 | pid_t insert_tgid; 138 | pid_t issue_pid; 139 | pid_t issue_tgid; 140 | pid_t complete_pid; 141 | pid_t complete_tgid; 142 | __u64 iorq_sequence_num; 143 | __u64 iorq_insert_time; 144 | __u64 iorq_issue_time; 145 | __u64 iorq_complete_time; 146 | __u32 iorq_dev; 147 | __u64 iorq_sector; 148 | __u32 iorq_bytes; 149 | __u32 iorq_cmd_flags; 150 | __s32 iorq_error; 151 | }; 152 | 153 | // network connection tracking 154 | struct socket_info { 155 | __u16 family; // AF_INET or AF_INET6 156 | __u16 protocol; // IPPROTO_TCP or IPPROTO_UDP 157 | union { 158 | __u32 saddr_v4; 159 | __u8 saddr_v6[16]; // Changed from __int128 160 | }; 161 | union { 162 | __u32 daddr_v4; 163 | __u8 daddr_v6[16]; 
// Changed from __int128 164 | }; 165 | __u16 sport; 166 | __u16 dport; 167 | }; 168 | 169 | 170 | // This gets emitted to userspace via ringbuf 171 | struct task_output_event { 172 | enum event_type type; 173 | 174 | // Task struct fields 175 | pid_t pid; 176 | pid_t tgid; 177 | __u32 state; 178 | __u32 flags; 179 | uid_t euid; 180 | char comm[TASK_COMM_LEN]; 181 | 182 | // Task's additional data 183 | __s32 syscall_nr; 184 | __u64 syscall_args[6]; 185 | char filename[MAX_FILENAME_LEN]; 186 | char exe_file[MAX_FILENAME_LEN]; 187 | 188 | // Socket info 189 | struct socket_info sock_info; 190 | bool has_socket_info:1; 191 | 192 | // Task's scheduler state 193 | int on_cpu; 194 | int on_rq; 195 | void *migration_pending; 196 | bool in_execve:1; 197 | bool in_iowait:1; 198 | bool in_thrashing:1; 199 | bool sched_remote_wakeup:1; 200 | 201 | // Extended task state storage 202 | struct task_storage storage; 203 | 204 | // Stack trace info 205 | int kstack_len; 206 | __u64 kstack[MAX_STACK_LEN]; 207 | }; 208 | 209 | 210 | // task filtering based on command line options 211 | struct filter_config { 212 | bool show_all; // Show all tasks including sleeping ones when true 213 | __u32 state_mask; // Bitmap of states to show 214 | }; 215 | 216 | 217 | #endif /* __XCAPTURE_H */ 218 | -------------------------------------------------------------------------------- /xcapture/include/xcapture_user.h: -------------------------------------------------------------------------------- 1 | // Update xcapture_user.h 2 | #ifndef __XCAPTURE_USER_H 3 | #define __XCAPTURE_USER_H 4 | 5 | #include 6 | #include 7 | 8 | #define DEFAULT_OUTPUT_DIR "." 
9 | #define SAMPLE_CSV_FILENAME "xcapture_samples" // .csv will be appended later 10 | #define KSTACK_CSV_FILENAME "xcapture_kstacks" 11 | #define SYSC_COMPLETION_CSV_FILENAME "xcapture_syscend" 12 | #define IORQ_COMPLETION_CSV_FILENAME "xcapture_iorqend" 13 | 14 | // For converting BPF ktime to wallclock time 15 | struct time_correlation { 16 | struct timespec wall_time; // CLOCK_REALTIME 17 | struct timespec mono_time; // CLOCK_MONOTONIC: what bpf_ktime_get_ns() returns 18 | }; 19 | 20 | // Hourly csv output files 21 | struct output_files { 22 | FILE *sample_file; 23 | FILE *sc_completion_file; 24 | FILE *iorq_completion_file; 25 | FILE *kstack_file; 26 | int current_year; // Track full timestamp in case of long VM pauses 27 | int current_month; // that may cause the timestamp to jump by 24 hours or more 28 | int current_day; 29 | int current_hour; 30 | }; 31 | 32 | // Shared variables 33 | extern pid_t mypid; 34 | extern struct time_correlation tcorr; 35 | extern struct output_files files; 36 | extern bool output_csv; 37 | extern bool output_verbose; 38 | extern bool dump_stack_traces; 39 | 40 | // Shared function declarations 41 | extern const char *getusername(uid_t uid); 42 | extern const char *format_task_state(__u32 state); 43 | extern const char *safe_syscall_name(__s32 syscall_nr); 44 | extern const char *get_syscall_info_desc(__u32 syscall_nr); 45 | extern const char *get_iorq_op_flags(__u32 cmd_flags); 46 | extern const char *format_connection(const struct socket_info *si, char *buf, size_t buflen); 47 | extern struct timespec get_wall_from_mono(struct time_correlation *tcorr, __u64 bpf_time); 48 | extern struct timespec sub_ns_from_ts(struct timespec ts, __u64 ns); 49 | extern void get_str_from_ts(struct timespec ts, char *buf, size_t bufsize); 50 | extern int check_and_rotate_files(struct output_files *files); 51 | 52 | #endif /* __XCAPTURE_USER_H */ 53 | -------------------------------------------------------------------------------- 
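Note on the time correlation declared in xcapture_user.h: the body of get_wall_from_mono() is not part of this listing. The following is only a minimal sketch of how such a conversion could work, assuming the two readings in the correlation pair were captured at (nearly) the same instant, for example with back-to-back clock_gettime() calls. The names tcorr_sketch and wall_from_mono_sketch are hypothetical and are not taken from xcapture.

```c
#include <stdint.h>
#include <time.h>

#define SK_NSEC_PER_SEC 1000000000LL

/* Hypothetical mirror of struct time_correlation (illustration only) */
struct tcorr_sketch {
    struct timespec wall_time; /* CLOCK_REALTIME reading */
    struct timespec mono_time; /* CLOCK_MONOTONIC reading taken at the same moment */
};

/* Convert a monotonic timestamp in nanoseconds (such as a BPF ktime value)
 * to wall-clock time by applying its signed offset from the correlation point.
 * Assumes the resulting wall-clock time is not before the epoch. */
struct timespec wall_from_mono_sketch(const struct tcorr_sketch *tc, uint64_t mono_ns)
{
    int64_t base_mono_ns = (int64_t)tc->mono_time.tv_sec * SK_NSEC_PER_SEC + tc->mono_time.tv_nsec;
    int64_t base_wall_ns = (int64_t)tc->wall_time.tv_sec * SK_NSEC_PER_SEC + tc->wall_time.tv_nsec;

    /* The offset may be negative: the event can predate the correlation point */
    int64_t wall_ns = base_wall_ns + ((int64_t)mono_ns - base_mono_ns);

    struct timespec ts;
    ts.tv_sec = (time_t)(wall_ns / SK_NSEC_PER_SEC);
    ts.tv_nsec = (long)(wall_ns % SK_NSEC_PER_SEC);
    return ts;
}
```

Keeping a single correlation pair and working with signed nanosecond offsets means that events recorded slightly before the pair was taken still convert cleanly. A long-running collector would presumably need to refresh the pair from time to time, since CLOCK_REALTIME can be adjusted while CLOCK_MONOTONIC cannot.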
/xcapture/src/filters/README.md: -------------------------------------------------------------------------------- 1 | This is where we put the various .h files that hold "always inline" functions and macros for filtering records of interest 2 | -------------------------------------------------------------------------------- /xcapture/src/helpers/file_helpers.h: -------------------------------------------------------------------------------- 1 | // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) 2 | // Copyright 2024-2038 Tanel Poder [0x.tools] 3 | 4 | #ifndef __FILE_HELPERS_H 5 | #define __FILE_HELPERS_H 6 | 7 | #include 8 | #include 9 | #include 10 | 11 | #include "xcapture.h" 12 | 13 | #define MAX_PATH_DEPTH 16 14 | // socket file type 15 | #define AF_INET 2 16 | #define AF_INET6 10 17 | #define S_IFMT 0170000 18 | #define S_IFSOCK 0140000 19 | 20 | // get file dentry name only 21 | static void __always_inline get_file_name(struct file *file, char *dest, size_t size, const char *fallback) { 22 | if (file) { 23 | struct path file_path; 24 | BPF_CORE_READ_INTO(&file_path, file, f_path); 25 | struct dentry *dentry = BPF_CORE_READ(file, f_path.dentry); 26 | 27 | if (dentry) { 28 | struct qstr d_name = BPF_CORE_READ(dentry, d_name); 29 | bpf_probe_read_kernel_str(dest, size, d_name.name); 30 | return; 31 | } 32 | } 33 | 34 | // Handle error/fallback message case 35 | __builtin_memcpy(dest, fallback, __builtin_strlen(fallback) + 1); 36 | } 37 | 38 | 39 | // get inet socket info from file object 40 | static __always_inline bool get_socket_info(struct file *file, struct socket_info *si) 41 | { 42 | struct socket *sock; 43 | struct sock *sk; 44 | struct inet_sock *inet; 45 | 46 | if (!file) 47 | return false; 48 | 49 | sock = BPF_CORE_READ(file, private_data); 50 | if (!sock) 51 | return false; 52 | 53 | sk = BPF_CORE_READ(sock, sk); 54 | if (!sk) 55 | return false; 56 | 57 | si->family = BPF_CORE_READ(sk, __sk_common.skc_family); 58 | if (si->family != AF_INET && 
si->family != AF_INET6) 59 | return false; 60 | 61 | si->protocol = BPF_CORE_READ(sk, sk_protocol); 62 | if (si->protocol != IPPROTO_TCP && si->protocol != IPPROTO_UDP) 63 | return false; 64 | 65 | inet = (struct inet_sock *)sk; 66 | 67 | if (si->family == AF_INET) { 68 | si->saddr_v4 = BPF_CORE_READ(sk, __sk_common.skc_rcv_saddr); 69 | si->daddr_v4 = BPF_CORE_READ(sk, __sk_common.skc_daddr); 70 | } else { 71 | unsigned __int128 saddr, daddr; 72 | BPF_CORE_READ_INTO(&saddr, sk, __sk_common.skc_v6_rcv_saddr.in6_u.u6_addr32); 73 | BPF_CORE_READ_INTO(&daddr, sk, __sk_common.skc_v6_daddr.in6_u.u6_addr32); 74 | __builtin_memcpy(si->saddr_v6, &saddr, sizeof(si->saddr_v6)); 75 | __builtin_memcpy(si->daddr_v6, &daddr, sizeof(si->daddr_v6)); 76 | } 77 | 78 | // Read ports 79 | si->sport = BPF_CORE_READ(inet, inet_sport); 80 | si->dport = BPF_CORE_READ(sk, __sk_common.skc_dport); 81 | 82 | return true; 83 | } 84 | 85 | #endif /* __FILE_HELPERS_H */ 86 | -------------------------------------------------------------------------------- /xcapture/src/maps/xcapture_maps.h: -------------------------------------------------------------------------------- 1 | #ifndef XCAPTURE_MAPS_H 2 | #define XCAPTURE_MAPS_H 3 | 4 | // LIBBPF_PIN_BY_NAME is needed so that different object 5 | // files would be able to share the same map (by design) 6 | // and not create their own disconnected private copies 7 | 8 | // Task storage map for maintaining per-task state (extended thread state) 9 | struct { 10 | __uint(type, BPF_MAP_TYPE_TASK_STORAGE); 11 | __uint(map_flags, BPF_F_NO_PREALLOC); 12 | __type(key, int); 13 | __type(value, struct task_storage); 14 | __uint(pinning, LIBBPF_PIN_BY_NAME); 15 | } task_storage SEC(".maps"); 16 | 17 | // Command line option passing for "tasks of interest" filtering 18 | struct { 19 | __uint(type, BPF_MAP_TYPE_ARRAY); 20 | __uint(max_entries, 1); 21 | __type(key, __u32); 22 | __type(value, struct filter_config); 23 | __uint(pinning, LIBBPF_PIN_BY_NAME); 24 | } 
filter_config_map SEC(".maps"); 25 | 26 | // Map for tracking block I/O operations 27 | struct { 28 | __uint(type, BPF_MAP_TYPE_HASH); 29 | __uint(max_entries, 1024 * 1024); 30 | __type(key, struct request *); 31 | __type(value, struct iorq_info); 32 | __uint(pinning, LIBBPF_PIN_BY_NAME); 33 | } iorq_tracking SEC(".maps"); 34 | 35 | // Ring buffers for event completion records and sampled task info 36 | struct { 37 | __uint(type, BPF_MAP_TYPE_RINGBUF); 38 | __uint(max_entries, 16 * 1024 * 1024); // bytes 39 | __uint(pinning, LIBBPF_PIN_BY_NAME); 40 | } completion_events SEC(".maps"); 41 | 42 | // The task_samples map has periodic bursts of task iterator record writes 43 | struct { 44 | __uint(type, BPF_MAP_TYPE_RINGBUF); 45 | __uint(max_entries, 16 * 1024 * 1024); // bytes 46 | __uint(pinning, LIBBPF_PIN_BY_NAME); 47 | } task_samples SEC(".maps"); 48 | 49 | #endif /* XCAPTURE_MAPS_H */ 50 | -------------------------------------------------------------------------------- /xcapture/src/probes/io/iorq.bpf.c: -------------------------------------------------------------------------------- 1 | #include "io/iorq.bpf.h" 2 | #include "xcapture.h" 3 | #include "xcapture_helpers.h" 4 | 5 | char LICENSE[] SEC("license") = "Dual BSD/GPL"; 6 | 7 | // In the "simple" IO request tracking mode, we try to avoid work or defer it 8 | // as much as possible to the request completion tracepoint, as we only 9 | // need to process a completion if the request happened to be sampled 10 | // while in-flight 11 | 12 | // Block I/O request insertion handler (does not fire if it's bypassed for direct dispatch) 13 | SEC("tp_btf/block_rq_insert") 14 | int BPF_PROG(block_rq_insert, struct request *rq) 15 | { 16 | struct task_struct *task = bpf_get_current_task_btf(); 17 | struct task_storage *storage; 18 | storage = bpf_task_storage_get(&task_storage, task, NULL, BPF_LOCAL_STORAGE_GET_F_CREATE); 19 | // if we can't get the task storage object, something is broken, no point in accounting iorqs 20 | if 
(!storage) 21 | return 0; 22 | 23 | // ensure zero fill of reused memory as we don't populate all members here 24 | // this also sets iorq_info.iorq_sampled = false 25 | struct iorq_info iorq_info = {0}; 26 | 27 | storage->last_iorq_rq = rq; 28 | 29 | // every task has its private sequence counting, incremented only here or iorq issue tp 30 | iorq_info.iorq_sequence_num = ++storage->iorq_sequence_num; 31 | iorq_info.insert_pid = task->pid; 32 | iorq_info.insert_tgid = task->tgid; 33 | 34 | bpf_map_update_elem(&iorq_tracking, &rq, &iorq_info, BPF_ANY); 35 | 36 | return 0; 37 | } 38 | 39 | // Block I/O request *issue* handler (may run under a different task than request inserter) 40 | SEC("tp_btf/block_rq_issue") 41 | int BPF_PROG(block_rq_issue, struct request *rq) 42 | { 43 | struct task_struct *task = bpf_get_current_task_btf(); 44 | struct task_storage *storage; 45 | storage = bpf_task_storage_get(&task_storage, task, NULL, BPF_LOCAL_STORAGE_GET_F_CREATE); 46 | if (!storage) 47 | return 0; 48 | 49 | // iorq INSERT tracepoint may not always get hit, some I/Os can go directly to ISSUE stage 50 | // check if request is already inserted into the OS I/O queue previously 51 | struct iorq_info *iorq_info = bpf_map_lookup_elem(&iorq_tracking, &rq); 52 | 53 | if (iorq_info) { 54 | // iorq_info->issue_time = rq->io_start_time_ns; // issue to device driver 55 | iorq_info->issue_pid = task->pid; 56 | iorq_info->issue_tgid = task->tgid; 57 | } 58 | // request issued/dispatched directly without inserting to a queue first 59 | // so set the iorq map entry up from scratch (insert_pid same as issue_pid) 60 | else { 61 | struct iorq_info new_iorq_info = {0}; // deliberate zero fill 62 | 63 | storage->last_iorq_rq = rq; 64 | 65 | // as we skipped insert, increment the task-private iorq sequence counter here 66 | new_iorq_info.iorq_sequence_num = ++storage->iorq_sequence_num; 67 | new_iorq_info.insert_pid = task->pid; 68 | new_iorq_info.insert_tgid = task->tgid; 69 | 
new_iorq_info.issue_pid = task->pid; 70 | new_iorq_info.issue_tgid = task->tgid; 71 | 72 | bpf_map_update_elem(&iorq_tracking, &rq, &new_iorq_info, BPF_ANY); 73 | } 74 | 75 | return 0; 76 | } 77 | 78 | // block IORQ completion tracepoint that emits a completion record to ringbuf only 79 | // if the task sampler has marked the currently completing I/O as "sampled" 80 | SEC("tp_btf/block_rq_complete") 81 | int BPF_PROG(block_rq_complete, struct request *rq, int error, unsigned int nr_bytes) 82 | { 83 | // process completion event only after all bios under this IO request are completed 84 | if (nr_bytes < rq->__data_len) 85 | return 0; 86 | 87 | // check if this I/O was sampled 88 | struct iorq_info *iorq_info = bpf_map_lookup_elem(&iorq_tracking, &rq); 89 | // no tracked IO found, nothing to do 90 | if (!iorq_info) 91 | return 0; 92 | 93 | // if this I/O request wasn't sampled by task iterator while in-flight, then do not emit 94 | if (!iorq_info->iorq_sampled) 95 | goto cleanup; 96 | 97 | // allocate ringbuf memory for emitting event 98 | struct iorq_completion_event *event; 99 | event = bpf_ringbuf_reserve(&completion_events, sizeof(*event), 0); 100 | if (!event) 101 | goto cleanup; 102 | 103 | // populate all output struct fields to avoid stale garbage values in ringbuf 104 | // need CORE for rq for RHEL9 / kernel 5.14 verifier 105 | event->type = EVENT_IORQ_COMPLETION; 106 | event->insert_pid = iorq_info->insert_pid; 107 | event->insert_tgid = iorq_info->insert_tgid; 108 | event->issue_pid = iorq_info->issue_pid; 109 | event->issue_tgid = iorq_info->issue_tgid; 110 | event->iorq_sequence_num = iorq_info->iorq_sequence_num; 111 | event->iorq_complete_time = bpf_ktime_get_ns(); // current time for completion 112 | event->iorq_sector = BPF_CORE_READ(rq, __sector); 113 | event->iorq_bytes = BPF_CORE_READ(rq, __data_len); 114 | event->iorq_cmd_flags = BPF_CORE_READ(rq, cmd_flags); 115 | event->iorq_insert_time = BPF_CORE_READ(rq, start_time_ns); 116 | 
event->iorq_issue_time = BPF_CORE_READ(rq, io_start_time_ns); 117 | event->iorq_error = error; // tracepoint argument 118 | // if rq->q->disk is found, convert it to device maj/min numbers 119 | event->iorq_dev = 0; 120 | struct gendisk *disk = BPF_CORE_READ(rq, q, disk); 121 | if (disk) 122 | event->iorq_dev = MKDEV(BPF_CORE_READ(disk, major), BPF_CORE_READ(disk, first_minor)); 123 | 124 | bpf_ringbuf_submit(event, 0); 125 | 126 | cleanup: 127 | // delete the iorq tracking map element regardless of its sampling status 128 | bpf_map_delete_elem(&iorq_tracking, &rq); 129 | return 0; 130 | } 131 | 132 | 133 | 134 | /* 135 | This is the original completion tracepoint I was using until RHEL 9.5 tests showed its 136 | verifier didn't like it (function calls were "spilling" the *rq argument from registers to 137 | stack/memory and it lost its BTF type info afterwards). I'll make this switch dynamic 138 | as a special case for RHEL 9's 5.14.x kernels (or just RHEL's kernels if needed). 139 | 140 | You can comment out the tracepoint above and uncomment the tracepoint below if you want 141 | to play with it. 
142 | */ 143 | 144 | /* 145 | SEC("tp_btf/block_rq_complete") 146 | int BPF_PROG(block_rq_complete, struct request *rq, int error, unsigned int nr_bytes) 147 | { 148 | // process completion event only after all bios under this IO request are completed 149 | if (nr_bytes < rq->__data_len) 150 | return 0; 151 | 152 | struct iorq_info *iorq_info = bpf_map_lookup_elem(&iorq_tracking, &rq); 153 | // no tracked IO found, nothing to do 154 | if (!iorq_info) 155 | return 0; 156 | 157 | // if this I/O request wasn't sampled by task iterator while in-flight, then do not emit 158 | if (!iorq_info->iorq_sampled) 159 | goto cleanup; 160 | 161 | // allocate ringbuf memory for emitting event 162 | struct iorq_completion_event *event; 163 | event = bpf_ringbuf_reserve(&completion_events, sizeof(*event), 0); 164 | if (!event) 165 | goto cleanup; 166 | 167 | event->type = EVENT_IORQ_COMPLETION; 168 | 169 | event->insert_pid = iorq_info->insert_pid; 170 | event->insert_tgid = iorq_info->insert_tgid; 171 | event->issue_pid = iorq_info->issue_pid; 172 | event->issue_tgid = iorq_info->issue_tgid; 173 | event->iorq_sequence_num = iorq_info->iorq_sequence_num; 174 | event->iorq_insert_time = rq->start_time_ns; 175 | event->iorq_issue_time = rq->io_start_time_ns; 176 | event->iorq_complete_time = bpf_ktime_get_ns(); // current time for completion 177 | 178 | // if rq->q->disk is found, convert it to device maj/min numbers 179 | struct gendisk *disk = rq->q->disk; 180 | if (disk) 181 | event->iorq_dev = MKDEV(disk->major, disk->first_minor); 182 | 183 | event->iorq_sector = rq->__sector; 184 | event->iorq_bytes = rq->__data_len; 185 | event->iorq_cmd_flags = rq->cmd_flags; 186 | event->iorq_error = error; // tracepoint argument 187 | 188 | bpf_ringbuf_submit(event, 0); 189 | 190 | cleanup: 191 | // delete the iorq tracking map element regardless of its sampling status 192 | bpf_map_delete_elem(&iorq_tracking, &rq); 193 | return 0; 194 | } 195 | */ 196 | 
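
The completion record emitted by the active tracepoint above carries three timestamps and a packed device number. As a hedged illustration of how a userspace consumer might turn those into queue and service latencies, here is a small sketch. The struct iorq_times_sketch and all helper names are hypothetical (only the 20-bit minor layout is taken from the MINORBITS macros in xcapture.h), and the guards against zero or out-of-order timestamps are an assumption about defensive handling, not something the source prescribes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same device-number layout as MINORBITS/MAJOR/MINOR in xcapture.h */
#define SK_MINORBITS 20
#define SK_MINORMASK ((1U << SK_MINORBITS) - 1)
#define SK_MAJOR(dev) ((unsigned int)((dev) >> SK_MINORBITS))
#define SK_MINOR(dev) ((unsigned int)((dev) & SK_MINORMASK))

/* Hypothetical subset of the fields in struct iorq_completion_event */
struct iorq_times_sketch {
    uint64_t insert_time;   /* ns, request start */
    uint64_t issue_time;    /* ns, issued to the device driver */
    uint64_t complete_time; /* ns, completion tracepoint fired */
    uint32_t dev;           /* packed major/minor */
};

/* Return end - start, or 0 if start is unset or the pair is out of order */
static uint64_t delta_or_zero(uint64_t start, uint64_t end)
{
    return (start != 0 && end >= start) ? end - start : 0;
}

/* Time spent between request start and issue to the driver */
uint64_t iorq_queue_ns(const struct iorq_times_sketch *e)
{
    return delta_or_zero(e->insert_time, e->issue_time);
}

/* Time spent between issue to the driver and completion */
uint64_t iorq_service_ns(const struct iorq_times_sketch *e)
{
    return delta_or_zero(e->issue_time, e->complete_time);
}

/* Render the packed device number as "major:minor" */
int iorq_format_dev(uint32_t dev, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%u:%u", SK_MAJOR(dev), SK_MINOR(dev));
}
```

Splitting the latency into a queue part and a service part mirrors the insert/issue/complete stages that the three tracepoints in this file follow.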
-------------------------------------------------------------------------------- /xcapture/src/probes/io/iorq.bpf.h: -------------------------------------------------------------------------------- 1 | // Common definitions for block I/O request tracing 2 | #ifndef IORQ_BPF_H 3 | #define IORQ_BPF_H 4 | 5 | #include 6 | #include 7 | #include 8 | #include 9 | 10 | #include 11 | 12 | #endif /* IORQ_BPF_H */ 13 | -------------------------------------------------------------------------------- /xcapture/src/probes/syscall/syscall.bpf.c: -------------------------------------------------------------------------------- 1 | // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) 2 | // Copyright 2024-2038 Tanel Poder [0x.tools] 3 | 4 | #include "syscall/syscall.bpf.h" 5 | #include "xcapture.h" 6 | #include "xcapture_helpers.h" 7 | 8 | char LICENSE[] SEC("license") = "Dual BSD/GPL"; 9 | 10 | // If a userspace task is entering io_*getevents, check if there are any 11 | // I/Os inflight in that ring or it's just an "idle" wait for work to show up 12 | // NOTE: this won't work in the task_iterator program context (yet) as it needs 13 | // to read userspace memory of other processes. 
TODO: 5.18 has bpf_copy_from_user_task() 14 | static __u32 __always_inline get_num_inflight_aios_ring(__u64 ctx_id) 15 | { 16 | if (!ctx_id) return 0; 17 | struct aio_ring *ring = (void *)ctx_id; 18 | __u32 head = 0, tail = 0; 19 | 20 | // Read ring head and tail (and bail on error; -1/-2 wrap to large non-zero __u32 values) 21 | if (BPF_CORE_READ_USER_INTO(&head, ring, head)) return -1; 22 | if (BPF_CORE_READ_USER_INTO(&tail, ring, tail)) return -2; 23 | 24 | if (tail >= head) { 25 | return tail - head; 26 | } else { 27 | // When tail has wrapped but head hasn't yet 28 | return (UINT32_MAX - head) + tail + 1; 29 | } 30 | } 31 | 32 | 33 | // syscall entry & exit handlers for active tracking mode 34 | SEC("tp_btf/sys_enter") 35 | int BPF_PROG(handle_sys_enter, struct pt_regs *regs, long syscall_nr) 36 | { 37 | struct task_storage *storage; 38 | struct task_struct *task = bpf_get_current_task_btf(); 39 | if (!task) 40 | return 0; 41 | 42 | storage = bpf_task_storage_get(&task_storage, task, NULL, BPF_LOCAL_STORAGE_GET_F_CREATE); 43 | if (!storage) 44 | return 0; 45 | 46 | storage->sc_enter_time = bpf_ktime_get_ns(); 47 | storage->in_syscall_nr = syscall_nr; 48 | storage->sc_sequence_num++; 49 | 50 | if (syscall_nr == __NR_io_getevents || syscall_nr == __NR_io_pgetevents) { 51 | __u64 ctx_id = PT_REGS_PARM1_CORE_SYSCALL(regs); // aio ctx_id (process-wide mem addr) 52 | storage->aio_inflight_reqs = get_num_inflight_aios_ring(ctx_id); 53 | } 54 | 55 | return 0; 56 | } 57 | 58 | SEC("tp_btf/sys_exit") 59 | int BPF_PROG(handle_sys_exit, struct pt_regs *regs, long ret) 60 | { 61 | struct task_storage *storage; 62 | struct task_struct *task = bpf_get_current_task_btf(); 63 | storage = bpf_task_storage_get(&task_storage, task, NULL, 64 | BPF_LOCAL_STORAGE_GET_F_CREATE); 65 | if (!storage) 66 | return 0; 67 | 68 | if (!storage->sc_sampled) { // only emit syscalls caught by task sampler 69 | return 0; 70 | } else { 71 | struct sc_completion_event *scevent; 72 | scevent = bpf_ringbuf_reserve(&completion_events, 
sizeof(*scevent), 0); 73 | 74 | if (scevent) { 75 | scevent->type = EVENT_SYSCALL_COMPLETION; // Set scevent type 76 | 77 | // if storage->sc_sampled above is true, then storage->pid/tgid 78 | // have been put in place by task sampler already too 79 | scevent->pid = storage->pid; 80 | scevent->tgid = storage->tgid; 81 | scevent->completed_syscall_nr = storage->in_syscall_nr; 82 | scevent->completed_sc_sequence_num = storage->sc_sequence_num; 83 | scevent->completed_sc_enter_time = storage->sc_enter_time; 84 | scevent->completed_sc_exit_time = bpf_ktime_get_ns(); 85 | scevent->completed_sc_ret_val = ret; 86 | 87 | bpf_ringbuf_submit(scevent, 0); 88 | } 89 | } 90 | 91 | // clear sampled status as the syscall exits 92 | storage->sc_sampled = false; 93 | storage->in_syscall_nr = -1; 94 | storage->sc_enter_time = 0; 95 | return 0; 96 | } 97 | -------------------------------------------------------------------------------- /xcapture/src/probes/syscall/syscall.bpf.h: -------------------------------------------------------------------------------- 1 | // Common definitions for syscall tracing 2 | #ifndef SYSCALL_BPF_H 3 | #define SYSCALL_BPF_H 4 | 5 | #include 6 | #include 7 | #include 8 | #include 9 | 10 | #include "xcapture_maps.h" 11 | 12 | // platform specific syscall stuff 13 | #if defined(__TARGET_ARCH_arm64) 14 | #include "syscall_aarch64.h" 15 | #include "syscall_fd_bitmap_aarch64.h" 16 | #elif defined(__TARGET_ARCH_x86) 17 | #include "syscall_x86_64.h" 18 | #include "syscall_fd_bitmap_x86_64.h" 19 | #endif 20 | 21 | #endif /* SYSCALL_BPF_H */ 22 | -------------------------------------------------------------------------------- /xcapture/src/probes/task/task.bpf.c: -------------------------------------------------------------------------------- 1 | // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause 2 | // Copyright 2024-2038 Tanel Poder [0x.tools] 3 | 4 | #include "vmlinux.h" 5 | #include 6 | #include 7 | #include 8 | #include "xcapture.h" 9 | #include 
"xcapture_maps.h" 10 | #include "xcapture_helpers.h" 11 | #include "helpers/file_helpers.h" 12 | 13 | #if defined(__TARGET_ARCH_arm64) 14 | #include "syscall_aarch64.h" 15 | #include "syscall_fd_bitmap_aarch64.h" 16 | #elif defined(__TARGET_ARCH_x86) 17 | #include "syscall_x86_64.h" 18 | #include "syscall_fd_bitmap_x86_64.h" 19 | #endif 20 | 21 | 22 | char LICENSE[] SEC("license") = "Dual BSD/GPL"; 23 | char VERSION[] = "3.0.0"; 24 | extern int LINUX_KERNEL_VERSION __kconfig; 25 | 26 | #define PAGE_SIZE 4096 27 | #define EAGAIN 11 28 | 29 | // Version-adaptive task state field retrieval 30 | static __u32 __always_inline get_task_state(void *arg) 31 | { 32 | if (bpf_core_field_exists(struct task_struct___pre514, state)) { 33 | struct task_struct___pre514 *task = arg; 34 | return task->state; 35 | } else { 36 | struct task_struct___post514 *task = arg; 37 | return task->__state; 38 | } 39 | } 40 | 41 | // Interesting task filtering for task iterator 42 | static bool __always_inline should_emit_task(__u32 task_state, struct filter_config *cfg, 43 | __s32 syscall_nr, __u32 aio_inflight_reqs) 44 | { 45 | if (!cfg) 46 | return true; // If we can't find config, show everything 47 | 48 | if (cfg->show_all) 49 | return true; 50 | 51 | // Filter out TASK_INTERRUPTIBLE state tasks by default unless a task in 52 | // SLEEP state is waiting for a recently submitted async I/O completion 53 | if ((syscall_nr == __NR_io_getevents || syscall_nr == __NR_io_pgetevents) && aio_inflight_reqs) 54 | { 55 | return true; 56 | } 57 | 58 | if (task_state & TASK_INTERRUPTIBLE) { 59 | return false; 60 | } 61 | 62 | return true; 63 | } 64 | 65 | 66 | SEC("iter/task") 67 | int get_tasks(struct bpf_iter__task *ctx) 68 | { 69 | // we can replace this map with libbpf program arguments 70 | struct filter_config *cfg; 71 | const __u32 key = 0; 72 | cfg = bpf_map_lookup_elem(&filter_config_map, &key); 73 | 74 | // use the same timestamp for each record returned from a task iterator loop 75 | static 
__u64 this_iter_loop_start_ktime; 76 | 77 | if (ctx->meta->seq_num == 0) { 78 | this_iter_loop_start_ktime = bpf_ktime_get_ns(); 79 | } 80 | 81 | struct task_struct *task = ctx->task; 82 | if (!task) 83 | return 0; 84 | 85 | // Early filtering of uninteresting tasks 86 | __u32 task_state = get_task_state(task); 87 | __u32 task_flags = task->flags; 88 | 89 | // exclude idle kernel threads 90 | if ((task_flags & PF_KTHREAD) && (task_state & TASK_IDLE)) 91 | return 0; 92 | 93 | // Get task storage early to check for interesting tasks 94 | struct task_storage *storage; 95 | storage = bpf_task_storage_get(&task_storage, task, NULL, BPF_LOCAL_STORAGE_GET_F_CREATE); 96 | if (!storage) 97 | return 0; 98 | 99 | if (!storage->pid) storage->pid = task->pid; 100 | if (!storage->tgid) storage->tgid = task->tgid; 101 | 102 | // Check if we are in a syscall (as active syscall probes may be disabled) 103 | __s32 passive_syscall_nr = -1; // uninitialized (valid syscall numbers start from 0) 104 | struct pt_regs *passive_regs = NULL; 105 | 106 | // kernel threads don't issue syscalls 107 | if (!(task_flags & PF_KTHREAD) && task->stack) { 108 | passive_regs = (struct pt_regs *) bpf_task_pt_regs(task); 109 | 110 | if (passive_regs) { 111 | 112 | #if defined(__TARGET_ARCH_x86) 113 | // max val 511 to make verifier on older kernels happy 114 | // this issue showed up probably due to refactoring to separate x86/arm .h files 115 | // so that some macros are behind ifdefs now - or it's just an older kernel issue (TODO) 116 | passive_syscall_nr = (__s32) passive_regs->orig_ax & 0x1ffUL; 117 | #elif defined(__TARGET_ARCH_arm64) 118 | passive_syscall_nr = (__s32) passive_regs->syscallno & 0x1ffUL; 119 | #endif 120 | 121 | } 122 | } 123 | 124 | // Apply user-controlled interesting task filtering 125 | if (!should_emit_task(task_state, cfg, passive_syscall_nr, storage->aio_inflight_reqs)) 126 | return 0; 127 | 128 | // By this point we know the task/state is interesting 129 | // Sample task, 
populate sample timestamps first 130 | storage->sample_start_ktime = this_iter_loop_start_ktime; 131 | storage->sample_actual_ktime = bpf_ktime_get_ns(); 132 | 133 | // Mark any ongoing tracepoint-captured syscall as "sampled" so we get completion events later 134 | if (passive_syscall_nr >= 0) { 135 | storage->in_syscall_nr = passive_syscall_nr; 136 | storage->sc_sampled = true; 137 | 138 | // syscall entry time is 0 only for syscalls already ongoing when xcapture started 139 | // so set it to current sample timestamp, so we'll know the partial duration when sc ends 140 | if (!storage->sc_enter_time) { 141 | storage->sc_enter_time = storage->sample_actual_ktime; 142 | } 143 | } 144 | 145 | // Reserve space in output ring buffer and start populating it 146 | // Important: We are reusing existing ringbuf space, so pages are not zero-filled 147 | struct task_output_event *event; 148 | event = bpf_ringbuf_reserve(&task_samples, sizeof(*event), 0); 149 | if (!event) { 150 | return 0; 151 | } 152 | 153 | // Basic task information 154 | event->type = EVENT_TASK_INFO; 155 | event->pid = task->pid; // to avoid confusion, use TID and TGID in userspace output 156 | event->tgid = task->tgid; // kernel pid == TID (thread ID) in userspace for threads 157 | // kernel tgid == TGID (thread group ID) for processes 158 | 159 | // Scheduler state info (volatile fast changing values) 160 | event->flags = task_flags; 161 | event->state = task_state; 162 | event->on_cpu = task->on_cpu; 163 | event->on_rq = task->on_rq; 164 | event->migration_pending = task->migration_pending; 165 | event->in_execve = BPF_CORE_READ_BITFIELD(task, in_execve); 166 | event->in_iowait = BPF_CORE_READ_BITFIELD(task, in_iowait); 167 | 168 | // Conditional reads based on kernel version 169 | if (bpf_core_field_exists(task->sched_remote_wakeup)) 170 | event->sched_remote_wakeup = BPF_CORE_READ_BITFIELD(task, sched_remote_wakeup); 171 | 172 | // Reset syscall nr and args due to ringbuffer reuse 173 | 
    event->syscall_nr = passive_syscall_nr;
    __builtin_memset(event->syscall_args, 0, sizeof(event->syscall_args));
    __u64 sc1_arg = 0;

    // if in syscall, read syscall number and args, otherwise skip
    if (passive_syscall_nr >= 0 && passive_regs) {
#if defined(__TARGET_ARCH_x86)
        // x86_64 passes syscall args in rdi, rsi, rdx, r10, r8, r9
        // TODO both sections below can be optimized (yay!)
        bpf_probe_read_kernel(&sc1_arg, sizeof(sc1_arg), &passive_regs->di);
        bpf_probe_read_kernel(&event->syscall_args[0], sizeof(event->syscall_args[0]), &passive_regs->di);
        bpf_probe_read_kernel(&event->syscall_args[1], sizeof(event->syscall_args[1]), &passive_regs->si);
        bpf_probe_read_kernel(&event->syscall_args[2], sizeof(event->syscall_args[2]), &passive_regs->dx);
        bpf_probe_read_kernel(&event->syscall_args[3], sizeof(event->syscall_args[3]), &passive_regs->r10);
        bpf_probe_read_kernel(&event->syscall_args[4], sizeof(event->syscall_args[4]), &passive_regs->r8);
        bpf_probe_read_kernel(&event->syscall_args[5], sizeof(event->syscall_args[5]), &passive_regs->r9);
#elif defined(__TARGET_ARCH_arm64)
        // arm64 passes syscall args in x0..x5, which are regs[0]..regs[5] in pt_regs
        bpf_probe_read_kernel(&sc1_arg, sizeof(sc1_arg), &passive_regs->regs[0]);
        bpf_probe_read_kernel(&event->syscall_args[0], sizeof(event->syscall_args[0]), &passive_regs->regs[0]);
        bpf_probe_read_kernel(&event->syscall_args[1], sizeof(event->syscall_args[1]), &passive_regs->regs[1]);
        bpf_probe_read_kernel(&event->syscall_args[2], sizeof(event->syscall_args[2]), &passive_regs->regs[2]);
        bpf_probe_read_kernel(&event->syscall_args[3], sizeof(event->syscall_args[3]), &passive_regs->regs[3]);
        bpf_probe_read_kernel(&event->syscall_args[4], sizeof(event->syscall_args[4]), &passive_regs->regs[4]);
        bpf_probe_read_kernel(&event->syscall_args[5], sizeof(event->syscall_args[5]), &passive_regs->regs[5]);
#endif
    }

    // username, comm and other slower changing metadata
    // (TODO: add more metadata like: namespace id, cgroup name, etc)
    event->euid = task->cred->euid.val;
BPF_CORE_READ_STR_INTO(&event->comm, task, comm); 204 | 205 | // executable file name for userspace apps (kernel tasks don't set task->mm) 206 | if (task->mm) { 207 | get_file_name(task->mm->exe_file, event->exe_file, sizeof(event->exe_file), "[NO_EXE]"); 208 | } else { 209 | __builtin_memcpy(event->exe_file, "[NO_MM]", 8); 210 | } 211 | 212 | // first reset the output values due to conditional population below (and ringbuf reuse!) 213 | event->kstack_len = 0; 214 | event->filename[0] = '-'; 215 | event->filename[1] = '\0'; 216 | event->has_socket_info = false; 217 | 218 | // Read file descriptor information for current syscall 219 | if (passive_syscall_nr >=0 && SYSCALL_HAS_FD_ARG1(passive_syscall_nr)) { 220 | struct file *file = NULL; 221 | struct files_struct *files = BPF_CORE_READ(task, files); 222 | 223 | if (files) { 224 | struct fdtable *fdt = BPF_CORE_READ(files, fdt); 225 | struct file **fd_array = BPF_CORE_READ(fdt, fd); 226 | 227 | if (fd_array) { 228 | if (event->syscall_args[0] >= 0 && event->syscall_args[0] < 1024) { // TODO remove this 229 | bpf_probe_read_kernel(&file, sizeof(file), &fd_array[event->syscall_args[0]]); 230 | } 231 | } 232 | } 233 | 234 | if (file) { 235 | get_file_name(file, event->filename, sizeof(event->filename), "-"); 236 | } 237 | 238 | // Try to get socket information 239 | if (file) { 240 | struct inode *inode = BPF_CORE_READ(file, f_path.dentry, d_inode); 241 | if (inode) { 242 | // unsigned short i_mode = BPF_CORE_READ(inode, i_mode); 243 | unsigned short i_mode = BPF_CORE_READ(inode, i_mode); 244 | // Check if file is of socket type (S_IFSOCK == 0140000) 245 | if ((i_mode & S_IFMT) == S_IFSOCK) { 246 | event->has_socket_info = get_socket_info(file, &event->sock_info); 247 | } 248 | } 249 | } 250 | } 251 | 252 | // Track iorq info if relevant: iorq struct addresses get quickly reused in kernel 253 | // by any task in the system. 
    // iorq pointers are not unique over time, so we need to compare the state tracked in
    // the iorq map with this task's storage before trusting the pointer. This is because
    // we don't clear storage->last_iorq_rq in the iorq completion tracepoint.
    if (storage->last_iorq_rq) {
        storage->last_iorq_sampled = storage->last_iorq_rq;
        // storage->last_iorq_sampled_insert_ns = storage->last_iorq_insert_ns;
        // storage->last_iorq_sampled_issue_ns = storage->last_iorq_issue_ns;

        struct iorq_info *iorq_info = bpf_map_lookup_elem(&iorq_tracking, &storage->last_iorq_sampled);

        // First make sure that the iorq struct at this memory address belongs to *this* task's
        // last iorq (and has not been reused by another request). The check below matches on
        // iorq addr (the map key) + insert pid + iorq sequence number. If these agree between
        // the iorq map and the task storage, then this is indeed our I/O at this iorq memory
        // address (populated by the independently running tracepoints), so it's ok to mark
        // the iorq sampled.

        if (iorq_info && iorq_info->insert_pid == task->pid &&
            iorq_info->iorq_sequence_num == storage->iorq_sequence_num) {
            iorq_info->iorq_sampled = true;
        }
    }
    // if (iorq_info->insert_time == storage->last_iorq_sampled_insert_ns) {

    __builtin_memcpy(&event->storage, storage, sizeof(event->storage));
    bpf_ringbuf_submit(event, 0);

    return 0;
}

--------------------------------------------------------------------------------
/xcapture/src/retrievers/README.md:
--------------------------------------------------------------------------------
This is where we put the various .h files that hold "always inline" functions and macros for retrieving struct fields of interest.
2 | -------------------------------------------------------------------------------- /xcapture/src/user/task_handler.c: -------------------------------------------------------------------------------- 1 | // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) 2 | // Copyright 2024-2038 Tanel Poder [0x.tools] 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | 9 | #include 10 | #include 11 | 12 | #include "xcapture.h" 13 | #include "task_handler.h" 14 | #include "xcapture_user.h" 15 | #include "md5.h" 16 | 17 | // External variables from main.c 18 | extern struct output_files files; 19 | extern struct time_correlation tcorr; 20 | extern pid_t mypid; 21 | extern bool output_csv; 22 | extern bool output_verbose; 23 | extern bool dump_stack_traces; 24 | 25 | // Function declarations for common functions 26 | extern const char *getusername(uid_t uid); 27 | extern const char *format_task_state(__u32 state); 28 | extern const char *safe_syscall_name(__s32 syscall_nr); 29 | extern const char *format_connection(const struct socket_info *si, char *buf, size_t buflen); 30 | extern struct timespec get_wall_from_mono(struct time_correlation *tcorr, __u64 bpf_time); 31 | extern struct timespec sub_ns_from_ts(struct timespec ts, __u64 ns); 32 | extern void get_str_from_ts(struct timespec ts, char *buf, size_t bufsize); 33 | extern int check_and_rotate_files(struct output_files *files); 34 | 35 | int handle_task_event(void *ctx, void *data, size_t data_sz) 36 | { 37 | enum event_type *type_ptr = (enum event_type *)data; 38 | enum event_type event_type = *type_ptr; 39 | 40 | // Safety check - only task info events should be in this ring buffer 41 | if (event_type != EVENT_TASK_INFO) { 42 | fprintf(stderr, "Unexpected event type in task samples ring buffer: %d\n", event_type); 43 | return 0; 44 | } 45 | 46 | const struct task_output_event *event = data; 47 | // struct bpf_print_ctx *printer_ctx = (struct bpf_print_ctx *)ctx; 48 | 49 | // Skip processing xcapture itself (it's 
always on CPU when sampling) 50 | if (event->pid == mypid) 51 | return 0; 52 | 53 | // get sample_start timestamp from when this task loop iteration started 54 | char timestamp[64]; 55 | struct timespec current_sample_ts_iter_start = get_wall_from_mono(&tcorr, event->storage.sample_start_ktime); 56 | get_str_from_ts(current_sample_ts_iter_start, timestamp, sizeof(timestamp)); 57 | 58 | // Process task info 59 | __u64 sc_duration_ns = 0; 60 | if (event->storage.sc_enter_time > 0) { 61 | sc_duration_ns = event->storage.sample_actual_ktime - event->storage.sc_enter_time; 62 | } 63 | 64 | char sc_start_time_str[64] = ""; 65 | // when this task struct was actually read 66 | // struct timespec current_sample_ts_this_task = get_wall_from_mono(&tcorr, event->storage.sample_actual_ktime); 67 | 68 | // get syscall start timestamp string from ktime ns 69 | if (event->storage.sc_enter_time > 0) { 70 | struct timespec current_sc_start_ts = get_wall_from_mono( 71 | &tcorr, event->storage.sample_actual_ktime - sc_duration_ns); 72 | get_str_from_ts(current_sc_start_ts, sc_start_time_str, sizeof(sc_start_time_str)); 73 | } 74 | 75 | char conn_buf[256] = ""; 76 | if (event->has_socket_info) { 77 | format_connection(&event->sock_info, conn_buf, sizeof(conn_buf)); 78 | } 79 | 80 | if (output_csv) { 81 | if (check_and_rotate_files(&files) < 0) { 82 | fprintf(stderr, "Failed to rotate output files\n"); 83 | return -1; 84 | } 85 | 86 | fprintf(files.sample_file, 87 | "%s,%d,%d,%s,\"%s\",\"%s\",\"%s\",%s,%s,%s,%lld,%lld,%lld,%llx,%llx,%llx,%llx,%llx,%llx,\"%s\",\"%s\",%s,%d\n", 88 | timestamp, 89 | event->pid, 90 | event->tgid, 91 | format_task_state(event->state), 92 | getusername(event->euid), 93 | (event->flags & PF_KTHREAD) ? "[kernel]" : event->exe_file, 94 | event->comm, 95 | (event->flags & PF_KTHREAD) ? "-" : safe_syscall_name(event->syscall_nr), 96 | (event->flags & PF_KTHREAD) ? "-" : ( 97 | event->storage.sc_enter_time ? safe_syscall_name(event->storage.in_syscall_nr) : "?" 
98 | ), 99 | event->storage.sc_enter_time > 0 ? sc_start_time_str : "", // todo validate bug 100 | sc_duration_ns, 101 | event->storage.sc_sequence_num, 102 | event->storage.iorq_sequence_num, 103 | event->syscall_args[0], 104 | event->syscall_args[1], 105 | event->syscall_args[2], 106 | event->syscall_args[3], 107 | event->syscall_args[4], 108 | event->syscall_args[5], 109 | event->filename[0] ? event->filename : "", 110 | event->has_socket_info ? conn_buf : "", 111 | get_syscall_info_desc(event->syscall_nr), 112 | event->storage.aio_inflight_reqs 113 | ); 114 | } 115 | else { 116 | printf("%-26s %'6lld %7d %7d %-6s %-6d %-6d %-4d %-16s %-20s %-16s %-20s %-20s %'16lld %16llx " 117 | "%-20s %-40s %-26s %12lld %-12s %12d\n", 118 | timestamp, 119 | (event->storage.sample_actual_ktime - event->storage.sample_start_ktime) / 1000, // microsec for dev mode 120 | event->pid, 121 | event->tgid, 122 | format_task_state(event->state), 123 | event->on_cpu, 124 | event->on_rq, 125 | (bool) event->migration_pending, 126 | getusername(event->euid), 127 | (event->flags & PF_KTHREAD) ? "[kernel]" : event->exe_file, 128 | event->comm, 129 | event->flags & PF_KTHREAD ? "-" : safe_syscall_name(event->syscall_nr), 130 | (event->flags & PF_KTHREAD) ? "-" : ( 131 | event->storage.sc_enter_time > 0 ? safe_syscall_name(event->storage.in_syscall_nr) : "?" 132 | ), 133 | (sc_duration_ns / 1000), // microseconds 134 | event->syscall_args[0], 135 | event->filename[0] ? event->filename : "-", 136 | event->has_socket_info ? conn_buf : "-", 137 | event->storage.sc_enter_time > 0 ? 
sc_start_time_str : "-", // todo validate bug 138 | event->storage.sc_sequence_num, 139 | get_syscall_info_desc(event->syscall_nr), 140 | event->storage.aio_inflight_reqs 141 | ); 142 | } 143 | 144 | if (dump_stack_traces && files.kstack_file && event->kstack_len > 0) { 145 | // Get md5 hash of stack addresses (lower 64 bits) 146 | uint64_t stack_hash = hash_stack((uint64_t*)event->kstack, event->kstack_len); 147 | 148 | // Comma-separated list of stack trace addresses in hex 149 | fprintf(files.kstack_file, "%s,%d,%d,%lx,['", timestamp, event->pid, event->tgid, stack_hash); 150 | 151 | for (int i = 0; i < event->kstack_len; i++) { 152 | fprintf(files.kstack_file, "0x%llx", event->kstack[i]); 153 | if (i < event->kstack_len - 1) { 154 | fprintf(files.kstack_file, "','"); 155 | } 156 | } 157 | 158 | fprintf(files.kstack_file, "']\n"); 159 | } 160 | 161 | return 0; 162 | } 163 | -------------------------------------------------------------------------------- /xcapture/src/user/task_handler.h: -------------------------------------------------------------------------------- 1 | #ifndef __TASK_HANDLER_H 2 | #define __TASK_HANDLER_H 3 | 4 | #include 5 | #include "xcapture.h" 6 | #include "xcapture_user.h" 7 | 8 | int handle_task_event(void *ctx, void *data, size_t data_sz); 9 | 10 | #endif /* __TASK_HANDLER_H */ 11 | -------------------------------------------------------------------------------- /xcapture/src/user/tracking_handler.c: -------------------------------------------------------------------------------- 1 | // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) 2 | // Copyright 2024-2038 Tanel Poder [0x.tools] 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | 11 | #include 12 | #include 13 | 14 | #include "xcapture.h" 15 | #include "tracking_handler.h" 16 | #include "xcapture_user.h" 17 | 18 | // External variables from main.c that you need to access 19 | extern struct output_files files; 20 | extern struct time_correlation 
tcorr; 21 | extern pid_t mypid; 22 | extern bool output_csv; 23 | extern bool output_verbose; 24 | 25 | // Function declarations for functions you'll call 26 | extern const char *safe_syscall_name(__s32 syscall_nr); 27 | extern const char *get_iorq_op_flags(__u32 cmd_flags); 28 | extern struct timespec get_wall_from_mono(struct time_correlation *tcorr, __u64 bpf_time); 29 | extern void get_str_from_ts(struct timespec ts, char *buf, size_t bufsize); 30 | extern int check_and_rotate_files(struct output_files *files); 31 | 32 | int handle_tracking_event(void *ctx, void *data, size_t data_sz) 33 | { 34 | // Implement the tracking event handler (copied from your current code) 35 | enum event_type *type_ptr = (enum event_type *)data; 36 | enum event_type event_type = *type_ptr; 37 | 38 | switch (event_type) { 39 | case EVENT_SYSCALL_COMPLETION: 40 | { 41 | const struct sc_completion_event *e = data; 42 | 43 | if (e->pid == mypid) 44 | return 0; 45 | 46 | __u64 duration_ns = (e->completed_sc_exit_time - e->completed_sc_enter_time); 47 | char ts_enter[64], ts_exit[64]; 48 | get_str_from_ts(get_wall_from_mono(&tcorr, e->completed_sc_enter_time), ts_enter, sizeof(ts_enter)); 49 | get_str_from_ts(get_wall_from_mono(&tcorr, e->completed_sc_exit_time), ts_exit, sizeof(ts_exit)); 50 | 51 | // syscall error vs large value formatting 52 | const char *csv_format_str; 53 | const char *printf_format_str; 54 | 55 | // CSV header: TYPE,TID,TGID,SYSCALL_NAME,DURATION_NS,SYSC_RET_VAL,SYSC_SEQ_NUM,SYSC_ENTER_TIME 56 | // stdout currently prints microseconds (CSV prints ns) 57 | if (e->completed_sc_ret_val >= -4095 && e->completed_sc_ret_val <= (1024*1024*16)) { 58 | csv_format_str = "SYSC_END,%d,%d,%s,%llu,%lld,%llu,%s\n"; 59 | printf_format_str = "SYSC_END %7d %7d %-20s dur= %-'10llu ret= %-10lld seq= %-10llu %s\n"; 60 | } else { 61 | csv_format_str = "SYSC_END,%d,%d,%s,%llu,0x%llx,%lld,%s\n"; 62 | printf_format_str = "SYSC_END %7d %7d %-20s dur= %-'10llu ret= 0x%llx seq= %-10llu %s\n"; 
63 | } 64 | 65 | if (output_csv) { 66 | if (check_and_rotate_files(&files) < 0) { 67 | fprintf(stderr, "Failed to rotate output files\n"); 68 | return -1; 69 | } 70 | 71 | fprintf(files.sc_completion_file, csv_format_str, 72 | e->pid, 73 | e->tgid, 74 | safe_syscall_name(e->completed_syscall_nr), 75 | duration_ns, 76 | e->completed_sc_ret_val, 77 | e->completed_sc_sequence_num, 78 | ts_enter); 79 | } else { 80 | printf(printf_format_str, 81 | e->pid, 82 | e->tgid, 83 | safe_syscall_name(e->completed_syscall_nr), 84 | duration_ns / 1000, // print microsec in dev mode for narrower output 85 | e->completed_sc_ret_val, 86 | e->completed_sc_sequence_num, 87 | ts_exit); // seeing syscall completion ts is more useful in dev mode 88 | } 89 | } 90 | break; 91 | 92 | case EVENT_IORQ_COMPLETION: 93 | { 94 | const struct iorq_completion_event *e = data; 95 | 96 | // if insert_time == issue_time then io queue was bypassed (no queuing time) 97 | __u64 duration_ns = (e->iorq_complete_time - e->iorq_insert_time); 98 | __u64 service_ns = (e->iorq_complete_time - e->iorq_issue_time); 99 | char iorq_insert_str[64], iorq_issue_str[64], iorq_complete_str[64]; 100 | 101 | get_str_from_ts(get_wall_from_mono(&tcorr, e->iorq_insert_time), iorq_insert_str, sizeof(iorq_insert_str)); 102 | get_str_from_ts(get_wall_from_mono(&tcorr, e->iorq_issue_time), iorq_issue_str, sizeof(iorq_issue_str)); 103 | get_str_from_ts(get_wall_from_mono(&tcorr, e->iorq_complete_time), iorq_complete_str, sizeof(iorq_complete_str)); 104 | 105 | if (output_csv) { 106 | if (check_and_rotate_files(&files) < 0) { 107 | fprintf(stderr, "Failed to rotate output files\n"); 108 | return -1; 109 | } 110 | // nanosec granularity for csv 111 | fprintf(files.iorq_completion_file, 112 | "IORQ_END,%d,%d,%d,%d,%d,%d,%u,%u,%llu,%u,%s,%llu,%llu,%llu,%llu,%s,%d\n", 113 | e->insert_pid, e->insert_tgid, e->issue_pid, e->issue_tgid, 114 | e->complete_pid, e->complete_tgid, 115 | MAJOR(e->iorq_dev), MINOR(e->iorq_dev), e->iorq_sector, 
e->iorq_bytes, 116 | get_iorq_op_flags(e->iorq_cmd_flags), e->iorq_sequence_num, 117 | duration_ns, service_ns, (duration_ns - service_ns), 118 | iorq_insert_str, e->iorq_error); 119 | } else { 120 | // microsec granularity for dev display mode 121 | printf("IORQ_END %7d %7d %7d %7d %7d %7d %-20s dur= %-'10llu que= %-'10llu svc= %-'10llu " 122 | "%3u:%-3u %26s %7d %7d %10llu %12llu %8u err= %-5d\n", 123 | e->insert_pid, e->insert_tgid, e->issue_pid, e->issue_tgid, 124 | e->complete_pid, e->complete_tgid, 125 | get_iorq_op_flags(e->iorq_cmd_flags), 126 | duration_ns / 1000, (duration_ns - service_ns) / 1000, service_ns / 1000, 127 | MAJOR(e->iorq_dev), MINOR(e->iorq_dev), 128 | iorq_insert_str, e->issue_pid, e->issue_tgid, 129 | e->iorq_sector, e->iorq_sequence_num, e->iorq_bytes, e->iorq_error 130 | ); 131 | } 132 | } 133 | break; 134 | 135 | default: 136 | fprintf(stderr, "Unknown event type in tracking ring buffer: %d\n", event_type); 137 | break; 138 | } 139 | 140 | return 0; 141 | } 142 | -------------------------------------------------------------------------------- /xcapture/src/user/tracking_handler.h: -------------------------------------------------------------------------------- 1 | #ifndef __TRACKING_HANDLER_H 2 | #define __TRACKING_HANDLER_H 3 | 4 | #include 5 | #include "xcapture.h" 6 | #include "xcapture_user.h" 7 | 8 | int handle_tracking_event(void *ctx, void *data, size_t data_sz); 9 | 10 | #endif /* __TRACKING_HANDLER_H */ 11 | -------------------------------------------------------------------------------- /xcapture/src/utils/md5.c: -------------------------------------------------------------------------------- 1 | #include "md5.h" 2 | 3 | // MD5 transformation constants 4 | #define S11 7 5 | #define S12 12 6 | #define S13 17 7 | #define S14 22 8 | #define S21 5 9 | #define S22 9 10 | #define S23 14 11 | #define S24 20 12 | #define S31 4 13 | #define S32 11 14 | #define S33 16 15 | #define S34 23 16 | #define S41 6 17 | #define S42 10 18 | 
#define S43 15 19 | #define S44 21 20 | 21 | // Basic MD5 functions 22 | #define F(x, y, z) (((x) & (y)) | ((~x) & (z))) 23 | #define G(x, y, z) (((x) & (z)) | ((y) & (~z))) 24 | #define H(x, y, z) ((x) ^ (y) ^ (z)) 25 | #define I(x, y, z) ((y) ^ ((x) | (~z))) 26 | 27 | // ROTATE_LEFT rotates x left n bits 28 | #define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32-(n)))) 29 | 30 | // Transformation macros 31 | #define FF(a, b, c, d, x, s, ac) { \ 32 | (a) += F ((b), (c), (d)) + (x) + (uint32_t)(ac); \ 33 | (a) = ROTATE_LEFT ((a), (s)); \ 34 | (a) += (b); \ 35 | } 36 | #define GG(a, b, c, d, x, s, ac) { \ 37 | (a) += G ((b), (c), (d)) + (x) + (uint32_t)(ac); \ 38 | (a) = ROTATE_LEFT ((a), (s)); \ 39 | (a) += (b); \ 40 | } 41 | #define HH(a, b, c, d, x, s, ac) { \ 42 | (a) += H ((b), (c), (d)) + (x) + (uint32_t)(ac); \ 43 | (a) = ROTATE_LEFT ((a), (s)); \ 44 | (a) += (b); \ 45 | } 46 | #define II(a, b, c, d, x, s, ac) { \ 47 | (a) += I ((b), (c), (d)) + (x) + (uint32_t)(ac); \ 48 | (a) = ROTATE_LEFT ((a), (s)); \ 49 | (a) += (b); \ 50 | } 51 | 52 | // MD5 internal function declarations 53 | static void MD5Transform(uint32_t state[4], const unsigned char block[64]); 54 | static void Encode(unsigned char *output, uint32_t *input, unsigned int len); 55 | static void Decode(uint32_t *output, const unsigned char *input, unsigned int len); 56 | 57 | // Encodes input uint32_t into output unsigned char 58 | static void Encode(unsigned char *output, uint32_t *input, unsigned int len) { 59 | unsigned int i, j; 60 | for (i = 0, j = 0; j < len; i++, j += 4) { 61 | output[j] = (unsigned char)(input[i] & 0xff); 62 | output[j+1] = (unsigned char)((input[i] >> 8) & 0xff); 63 | output[j+2] = (unsigned char)((input[i] >> 16) & 0xff); 64 | output[j+3] = (unsigned char)((input[i] >> 24) & 0xff); 65 | } 66 | } 67 | 68 | // Decodes input unsigned char into output uint32_t 69 | static void Decode(uint32_t *output, const unsigned char *input, unsigned int len) { 70 | unsigned int i, j; 71 | 
for (i = 0, j = 0; j < len; i++, j += 4) 72 | output[i] = ((uint32_t)input[j]) | (((uint32_t)input[j+1]) << 8) | 73 | (((uint32_t)input[j+2]) << 16) | (((uint32_t)input[j+3]) << 24); 74 | } 75 | 76 | // MD5 initialization 77 | void MD5_Init(MD5_CTX *context) { 78 | context->count[0] = context->count[1] = 0; 79 | 80 | // Load magic initialization constants 81 | context->state[0] = 0x67452301; 82 | context->state[1] = 0xefcdab89; 83 | context->state[2] = 0x98badcfe; 84 | context->state[3] = 0x10325476; 85 | } 86 | 87 | // MD5 block update operation 88 | void MD5_Update(MD5_CTX *context, const unsigned char *input, unsigned int inputLen) { 89 | unsigned int i, index, partLen; 90 | 91 | // Compute number of bytes mod 64 92 | index = (unsigned int)((context->count[0] >> 3) & 0x3F); 93 | 94 | // Update number of bits 95 | if ((context->count[0] += ((uint32_t)inputLen << 3)) < ((uint32_t)inputLen << 3)) 96 | context->count[1]++; 97 | context->count[1] += ((uint32_t)inputLen >> 29); 98 | 99 | partLen = 64 - index; 100 | 101 | // Transform as many times as possible 102 | if (inputLen >= partLen) { 103 | memcpy(&context->buffer[index], input, partLen); 104 | MD5Transform(context->state, context->buffer); 105 | 106 | for (i = partLen; i + 63 < inputLen; i += 64) 107 | MD5Transform(context->state, &input[i]); 108 | 109 | index = 0; 110 | } else 111 | i = 0; 112 | 113 | // Buffer remaining input 114 | memcpy(&context->buffer[index], &input[i], inputLen-i); 115 | } 116 | 117 | // MD5 finalization 118 | void MD5_Final(unsigned char digest[16], MD5_CTX *context) { 119 | unsigned char bits[8]; 120 | unsigned int index, padLen; 121 | static unsigned char PADDING[64] = { 122 | 0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 123 | 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 124 | 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 125 | }; 126 | 127 | // Save number of bits 128 | Encode(bits, context->count, 8); 129 | 130 | // Pad out 
to 56 mod 64 131 | index = (unsigned int)((context->count[0] >> 3) & 0x3f); 132 | padLen = (index < 56) ? (56 - index) : (120 - index); 133 | MD5_Update(context, PADDING, padLen); 134 | 135 | // Append length (before padding) 136 | MD5_Update(context, bits, 8); 137 | 138 | // Store state in digest 139 | Encode(digest, context->state, 16); 140 | } 141 | 142 | // MD5 basic transformation 143 | static void MD5Transform(uint32_t state[4], const unsigned char block[64]) { 144 | uint32_t a = state[0], b = state[1], c = state[2], d = state[3], x[16]; 145 | 146 | Decode(x, block, 64); 147 | 148 | // Round 1 149 | FF(a, b, c, d, x[ 0], S11, 0xd76aa478); 150 | FF(d, a, b, c, x[ 1], S12, 0xe8c7b756); 151 | FF(c, d, a, b, x[ 2], S13, 0x242070db); 152 | FF(b, c, d, a, x[ 3], S14, 0xc1bdceee); 153 | FF(a, b, c, d, x[ 4], S11, 0xf57c0faf); 154 | FF(d, a, b, c, x[ 5], S12, 0x4787c62a); 155 | FF(c, d, a, b, x[ 6], S13, 0xa8304613); 156 | FF(b, c, d, a, x[ 7], S14, 0xfd469501); 157 | FF(a, b, c, d, x[ 8], S11, 0x698098d8); 158 | FF(d, a, b, c, x[ 9], S12, 0x8b44f7af); 159 | FF(c, d, a, b, x[10], S13, 0xffff5bb1); 160 | FF(b, c, d, a, x[11], S14, 0x895cd7be); 161 | FF(a, b, c, d, x[12], S11, 0x6b901122); 162 | FF(d, a, b, c, x[13], S12, 0xfd987193); 163 | FF(c, d, a, b, x[14], S13, 0xa679438e); 164 | FF(b, c, d, a, x[15], S14, 0x49b40821); 165 | 166 | // Round 2 167 | GG(a, b, c, d, x[ 1], S21, 0xf61e2562); 168 | GG(d, a, b, c, x[ 6], S22, 0xc040b340); 169 | GG(c, d, a, b, x[11], S23, 0x265e5a51); 170 | GG(b, c, d, a, x[ 0], S24, 0xe9b6c7aa); 171 | GG(a, b, c, d, x[ 5], S21, 0xd62f105d); 172 | GG(d, a, b, c, x[10], S22, 0x2441453); 173 | GG(c, d, a, b, x[15], S23, 0xd8a1e681); 174 | GG(b, c, d, a, x[ 4], S24, 0xe7d3fbc8); 175 | GG(a, b, c, d, x[ 9], S21, 0x21e1cde6); 176 | GG(d, a, b, c, x[14], S22, 0xc33707d6); 177 | GG(c, d, a, b, x[ 3], S23, 0xf4d50d87); 178 | GG(b, c, d, a, x[ 8], S24, 0x455a14ed); 179 | GG(a, b, c, d, x[13], S21, 0xa9e3e905); 180 | GG(d, a, b, c, x[ 2], S22, 
0xfcefa3f8); 181 | GG(c, d, a, b, x[ 7], S23, 0x676f02d9); 182 | GG(b, c, d, a, x[12], S24, 0x8d2a4c8a); 183 | 184 | // Round 3 185 | HH(a, b, c, d, x[ 5], S31, 0xfffa3942); 186 | HH(d, a, b, c, x[ 8], S32, 0x8771f681); 187 | HH(c, d, a, b, x[11], S33, 0x6d9d6122); 188 | HH(b, c, d, a, x[14], S34, 0xfde5380c); 189 | HH(a, b, c, d, x[ 1], S31, 0xa4beea44); 190 | HH(d, a, b, c, x[ 4], S32, 0x4bdecfa9); 191 | HH(c, d, a, b, x[ 7], S33, 0xf6bb4b60); 192 | HH(b, c, d, a, x[10], S34, 0xbebfbc70); 193 | HH(a, b, c, d, x[13], S31, 0x289b7ec6); 194 | HH(d, a, b, c, x[ 0], S32, 0xeaa127fa); 195 | HH(c, d, a, b, x[ 3], S33, 0xd4ef3085); 196 | HH(b, c, d, a, x[ 6], S34, 0x4881d05); 197 | HH(a, b, c, d, x[ 9], S31, 0xd9d4d039); 198 | HH(d, a, b, c, x[12], S32, 0xe6db99e5); 199 | HH(c, d, a, b, x[15], S33, 0x1fa27cf8); 200 | HH(b, c, d, a, x[ 2], S34, 0xc4ac5665); 201 | 202 | // Round 4 203 | II(a, b, c, d, x[ 0], S41, 0xf4292244); 204 | II(d, a, b, c, x[ 7], S42, 0x432aff97); 205 | II(c, d, a, b, x[14], S43, 0xab9423a7); 206 | II(b, c, d, a, x[ 5], S44, 0xfc93a039); 207 | II(a, b, c, d, x[12], S41, 0x655b59c3); 208 | II(d, a, b, c, x[ 3], S42, 0x8f0ccc92); 209 | II(c, d, a, b, x[10], S43, 0xffeff47d); 210 | II(b, c, d, a, x[ 1], S44, 0x85845dd1); 211 | II(a, b, c, d, x[ 8], S41, 0x6fa87e4f); 212 | II(d, a, b, c, x[15], S42, 0xfe2ce6e0); 213 | II(c, d, a, b, x[ 6], S43, 0xa3014314); 214 | II(b, c, d, a, x[13], S44, 0x4e0811a1); 215 | II(a, b, c, d, x[ 4], S41, 0xf7537e82); 216 | II(d, a, b, c, x[11], S42, 0xbd3af235); 217 | II(c, d, a, b, x[ 2], S43, 0x2ad7d2bb); 218 | II(b, c, d, a, x[ 9], S44, 0xeb86d391); 219 | 220 | state[0] += a; 221 | state[1] += b; 222 | state[2] += c; 223 | state[3] += d; 224 | } 225 | 226 | // Main stack hashing function that uses MD5 227 | uint64_t hash_stack(uint64_t *stack, int stack_len) { 228 | MD5_CTX context; 229 | unsigned char digest[16]; 230 | uint64_t hash_value; 231 | 232 | MD5_Init(&context); 233 | MD5_Update(&context, (const unsigned 
char*)stack, stack_len * sizeof(uint64_t)); 234 | MD5_Final(digest, &context); 235 | 236 | memcpy(&hash_value, digest + 8, sizeof(uint64_t)); 237 | return hash_value; 238 | } 239 | -------------------------------------------------------------------------------- /xcapture/src/utils/md5.h: -------------------------------------------------------------------------------- 1 | #ifndef __MD5_H 2 | #define __MD5_H 3 | 4 | #include 5 | #include 6 | 7 | // MD5 context structure 8 | typedef struct { 9 | uint32_t state[4]; // State (ABCD) 10 | uint32_t count[2]; // Number of bits, mod 2^64 (LSB first) 11 | unsigned char buffer[64]; // Input buffer 12 | } MD5_CTX; 13 | 14 | // MD5 functions - public interface 15 | void MD5_Init(MD5_CTX *context); 16 | void MD5_Update(MD5_CTX *context, const unsigned char *input, unsigned int inputLen); 17 | void MD5_Final(unsigned char digest[16], MD5_CTX *context); 18 | 19 | // Stack hash function - use standard C uint64_t 20 | uint64_t hash_stack(uint64_t *stack, int stack_len); 21 | 22 | #endif /* __MD5_H */ 23 | -------------------------------------------------------------------------------- /xcapture/src/utils/xcapture_helpers.h: -------------------------------------------------------------------------------- 1 | // Handle kernel version differences in task state field 2 | struct task_struct___post514 { 3 | unsigned int __state; 4 | } __attribute__((preserve_access_index)); 5 | 6 | struct task_struct___pre514 { 7 | long state; 8 | } __attribute__((preserve_access_index)); 9 | 10 | struct fred_info___check { 11 | long unsigned int edata; 12 | } __attribute__((preserve_access_index)); 13 | 14 | // Helper function to get disk information from request 15 | static struct gendisk __always_inline *get_disk(struct request *rq) 16 | { 17 | struct gendisk *disk = NULL; 18 | struct request_queue *q = BPF_CORE_READ(rq, q); 19 | 20 | if (q) { 21 | disk = BPF_CORE_READ(q, disk); 22 | } 23 | 24 | return disk; // will be NULL if (!q) 25 | } 26 | 
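
Note on reproducing `hash_stack()` outside of C: the stack hash written next to each kernel stack is the second half of an MD5 digest computed over the raw `uint64_t` address array (see `memcpy(&hash_value, digest + 8, ...)` in md5.c). The sketch below mirrors that in Python as a cross-check. It assumes a little-endian host (true for x86_64 and typical aarch64 Linux), and the helper name `format_stack_hash` is made up for this illustration; it is not part of the repository.

```python
import hashlib
import struct


def hash_stack(addresses):
    """Mirror hash_stack() from md5.c, assuming a little-endian host."""
    # The C code hashes the in-memory uint64_t array, so pack each
    # address as an 8-byte little-endian unsigned integer.
    raw = struct.pack(f"<{len(addresses)}Q", *addresses)
    digest = hashlib.md5(raw).digest()
    # md5.c copies digest bytes 8..15 into a uint64_t.
    return int.from_bytes(digest[8:16], "little")


def format_stack_hash(addresses):
    """Hypothetical helper: format the hash like the '%lx' used in task_handler.c."""
    return format(hash_stack(addresses), "x")


if __name__ == "__main__":
    sample = [0xFFFFFFFF81000000, 0xFFFFFFFF81234567]
    print(format_stack_hash(sample))
```

Comparing this value with the hash column of the stack trace output for the same list of addresses is a quick way to check that a consumer of the file interprets the addresses the same way xcapture does.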
--------------------------------------------------------------------------------
/xcapture/tests/Makefile:
--------------------------------------------------------------------------------
CC = gcc
# md5.c and md5.h live in xcapture/src/utils
MD5_DIR = ../src/utils
CFLAGS = -Wall -Wextra -O2 -I$(MD5_DIR)
LDFLAGS =

# Source files
SRCS = md5_test.c $(MD5_DIR)/md5.c
OBJS = md5_test.o $(MD5_DIR)/md5.o
TARGET = md5_test

.PHONY: all clean clean-all

all: $(TARGET)

$(TARGET): $(OBJS)
	$(CC) $(CFLAGS) $(OBJS) -o $(TARGET) $(LDFLAGS)

md5_test.o: md5_test.c $(MD5_DIR)/md5.h
	$(CC) $(CFLAGS) -c $< -o $@

$(MD5_DIR)/md5.o: $(MD5_DIR)/md5.c $(MD5_DIR)/md5.h
	$(CC) $(CFLAGS) -c $< -o $@

clean:
	rm -f md5_test.o $(TARGET)
# Don't remove md5.o from the utils directory as it may be used elsewhere

# Special clean-all target if you want to clean everything including the md5.o object
clean-all: clean
	rm -f $(MD5_DIR)/md5.o

--------------------------------------------------------------------------------
/xcapture/tests/README.md:
--------------------------------------------------------------------------------
# MD5 Implementation Test Suite

This directory contains a test suite that verifies the bundled MD5 hash implementation against Python's built-in `hashlib` implementation.

## Components

- **md5.h / md5.c**: The MD5 implementation to be tested (the existing files in `../src/utils/`)
- **md5_test.c**: C program that reads strings and calculates their MD5 hashes
- **test_md5.py**: Python script that generates test strings and compares results

## Building the C Test Program

```bash
make
```

This will compile the `md5_test` executable using the existing md5.c and md5.h files.
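
Once `md5_test` is built, each line it prints has the form `<md5 hex> <original string>`. As a rough sketch of the reference side of the comparison (this is not the code in `test_md5.py`; the function names `reference_line` and `find_mismatches` are invented for this example), the expected lines can be produced with `hashlib`, stripping only a single trailing newline the way `strip_newline()` in `md5_test.c` does:

```python
import hashlib


def reference_line(line):
    """Build the expected '<md5 hex> <text>' output for one input line."""
    # Like strip_newline() in md5_test.c: drop one trailing '\n' and nothing
    # else, so leading/trailing spaces stay part of the hashed string.
    text = line[:-1] if line.endswith("\n") else line
    return f"{hashlib.md5(text.encode('utf-8')).hexdigest()} {text}"


def find_mismatches(input_lines, c_output_lines):
    """Return (line number, expected, actual) for every differing line."""
    mismatches = []
    pairs = zip(input_lines, c_output_lines)
    for lineno, (src, got) in enumerate(pairs, start=1):
        expected = reference_line(src)
        actual = got.rstrip("\n")
        if actual != expected:
            mismatches.append((lineno, expected, actual))
    return mismatches


if __name__ == "__main__":
    # "abc" is one of the RFC 1321 test vectors.
    print(reference_line("abc\n"))
```

Stripping only the newline matters: using a general whitespace strip on the Python side would hash a different string than the C program whenever a test string starts or ends with a space.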

## Running the Tests

```bash
# Run with default settings (100 test strings)
python3 test_md5.py

# Run with more test strings and larger sizes
python3 test_md5.py -n 1000 --min-length 10 --max-length 5000

# Keep temporary test files for inspection
python3 test_md5.py --keep-files
```

### Command-line Options

- `-n, --count`: Number of test strings to generate (default: 100)
- `--min-length`: Minimum string length (default: 0)
- `--max-length`: Maximum string length (default: 1000)
- `--keep-files`: Keep temporary test files after running
- `--exe`: Path to the C MD5 test executable (default: ./md5_test)

## Testing Process

The test script does the following:

1. Generates random test strings of varying lengths, including edge cases
2. Calculates MD5 hashes using the C implementation
3. Calculates MD5 hashes using Python's hashlib.md5
4. Compares the results to ensure they match

## Manual Testing

If you want to run the tests manually:

1. Generate test strings (letters, digits, punctuation and spaces only, so that no string contains a line break or other control character):
   ```bash
   python3 -c "import random, string; chars = string.ascii_letters + string.digits + string.punctuation + ' '; print('\n'.join(''.join(random.choice(chars) for _ in range(random.randint(0, 100))) for _ in range(20)))" > test_strings.txt
   ```

2. Run the C implementation:
   ```bash
   ./md5_test test_strings.txt > c_output.txt
   ```

3. Run the Python implementation (strip only the trailing newline, as `md5_test` does):
   ```bash
   python3 -c "import sys, hashlib; [print(hashlib.md5(s.encode()).hexdigest(), s) for s in (line.rstrip('\n') for line in open(sys.argv[1], newline=''))]" test_strings.txt > py_output.txt
   ```

4.
Compare the results: 69 | ```bash 70 | diff c_output.txt py_output.txt 71 | ``` 72 | -------------------------------------------------------------------------------- /xcapture/tests/md5_test.c: -------------------------------------------------------------------------------- 1 | #include <stdio.h> 2 | #include <stdlib.h> 3 | #include <string.h> 4 | #include "md5.h" 5 | 6 | #define MAX_LINE_LENGTH 10000 7 | 8 | /* Helper function to remove a single trailing newline character */ 9 | void strip_newline(char *str) { 10 | size_t len = strlen(str); 11 | if (len > 0 && str[len-1] == '\n') { 12 | str[len-1] = '\0'; 13 | } 14 | } 15 | 16 | /* Helper function to convert raw MD5 digest to hex string (hex_output must hold 33 bytes) */ 17 | void md5_to_hex(const unsigned char digest[16], char *hex_output) { 18 | for (int i = 0; i < 16; i++) { 19 | sprintf(&hex_output[i*2], "%02x", digest[i]); 20 | } 21 | hex_output[32] = '\0'; 22 | } 23 | 24 | int main(int argc, char *argv[]) { 25 | FILE *input_file; 26 | char line[MAX_LINE_LENGTH]; 27 | unsigned char digest[16]; /* Raw MD5 digest (16 bytes) */ 28 | char md5_hex[33]; /* MD5 hash as hex string (32 chars + null terminator) */ 29 | MD5_CTX context; 30 | 31 | /* Check arguments */ 32 | if (argc != 2) { 33 | fprintf(stderr, "Usage: %s <input_file>\n", argv[0]); 34 | return 1; 35 | } 36 | 37 | /* Open the input file */ 38 | input_file = fopen(argv[1], "r"); 39 | if (!input_file) { 40 | perror("Error opening input file"); 41 | return 1; 42 | } 43 | 44 | /* Process each line */ 45 | while (fgets(line, MAX_LINE_LENGTH, input_file)) { 46 | strip_newline(line); 47 | 48 | /* Calculate MD5 hash */ 49 | MD5_Init(&context); 50 | MD5_Update(&context, (const unsigned char *)line, strlen(line)); 51 | MD5_Final(digest, &context); 52 | 53 | /* Convert binary digest to hex string */ 54 | md5_to_hex(digest, md5_hex); 55 | 56 | /* Output hash and original string */ 57 | printf("%s %s\n", md5_hex, line); 58 | } 59 | 60 | fclose(input_file); 61 | return 0; 62 | } 63 | 
-------------------------------------------------------------------------------- /xcapture/tests/test_md5.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | Test MD5 hash implementation by: 4 | 1. Generating random test strings 5 | 2. Computing MD5 hashes using both C and Python implementations 6 | 3. Comparing the results 7 | """ 8 | import argparse 9 | import hashlib 10 | import os 11 | import random 12 | import string 13 | import subprocess 14 | import sys 15 | import tempfile 16 | from pathlib import Path 17 | 18 | def generate_random_string(min_length, max_length): 19 | """Generate a random string with length between min_length and max_length""" 20 | length = random.randint(min_length, max_length) 21 | chars = string.ascii_letters + string.digits + string.punctuation + " " 22 | return ''.join(random.choice(chars) for _ in range(length)) 23 | 24 | def generate_test_strings(count, min_length, max_length, output_file): 25 | """Generate 'count' test strings with varying lengths and write to file""" 26 | test_cases = [] 27 | 28 | # Generate completely random strings 29 | for _ in range(count - 5): 30 | test_cases.append(generate_random_string(min_length, max_length)) 31 | 32 | # Add some edge cases 33 | test_cases.extend([ 34 | "", # Empty string 35 | "a" * min_length if min_length > 0 else "a", # Minimum length with same character 36 | "a" * max_length, # Maximum length with same character 37 | string.ascii_letters + string.digits, # Alphanumeric 38 | "".join(chr(i) for i in range(32, 127)) # ASCII printable characters 39 | ]) 40 | 41 | # Write to file 42 | with open(output_file, 'w', encoding='utf-8') as f: 43 | for test_case in test_cases: 44 | f.write(f"{test_case}\n") 45 | 46 | return len(test_cases) 47 | 48 | def compute_python_md5(input_file, output_file): 49 | """Compute MD5 hashes using Python's hashlib and write to file""" 50 | with open(input_file, 'r', encoding='utf-8') as infile, 
\ 51 | open(output_file, 'w', encoding='utf-8') as outfile: 52 | for line in infile: 53 | line = line.rstrip('\n') 54 | md5_hash = hashlib.md5(line.encode('utf-8')).hexdigest() 55 | outfile.write(f"{md5_hash} {line}\n") 56 | 57 | def run_c_md5(input_file, output_file, executable="./md5_test"): 58 | """Run the C implementation of MD5 hash and capture output""" 59 | try: 60 | with open(output_file, 'w', encoding='utf-8') as out: 61 | subprocess.run( 62 | [executable, input_file], 63 | stdout=out, 64 | stderr=subprocess.PIPE, 65 | text=True, 66 | check=True 67 | ) 68 | except subprocess.CalledProcessError as e: 69 | print(f"Error running C MD5 implementation: {e.stderr}", file=sys.stderr) 70 | sys.exit(1) 71 | 72 | def compare_results(file1, file2): 73 | """Compare two files with MD5 hashes and return (match, list of differences)""" 74 | with open(file1, 'r', encoding='utf-8') as f1, \ 75 | open(file2, 'r', encoding='utf-8') as f2: 76 | lines1 = f1.readlines() 77 | lines2 = f2.readlines() 78 | 79 | if len(lines1) != len(lines2): 80 | return False, [f"Files have different number of lines: {len(lines1)} vs {len(lines2)}"] # list, so the caller can slice and iterate it 81 | 82 | differences = [] 83 | for i, (line1, line2) in enumerate(zip(lines1, lines2), 1): 84 | if line1.strip() != line2.strip(): 85 | differences.append(f"Line {i}:\n C: {line1.strip()}\n Python: {line2.strip()}") 86 | 87 | return len(differences) == 0, differences 88 | 89 | def main(): 90 | parser = argparse.ArgumentParser(description='Test MD5 hash implementation') 91 | parser.add_argument('-n', '--count', type=int, default=100, 92 | help='Number of test strings to generate (default: 100)') 93 | parser.add_argument('--min-length', type=int, default=0, 94 | help='Minimum string length (default: 0)') 95 | parser.add_argument('--max-length', type=int, default=1000, 96 | help='Maximum string length (default: 1000)') 97 | parser.add_argument('--keep-files', action='store_true', 98 | help='Keep temporary test files after running') 99 | parser.add_argument('--exe', 
type=str, default='./md5_test', 100 | help='Path to the C MD5 test executable (default: ./md5_test)') 101 | 102 | args = parser.parse_args() 103 | 104 | # Ensure the C executable exists 105 | if not os.path.isfile(args.exe): 106 | print(f"Error: C executable '{args.exe}' not found. Make sure to compile it first.", file=sys.stderr) 107 | return 1 108 | 109 | # Create temporary directory for test files 110 | temp_dir = tempfile.mkdtemp() 111 | input_file = os.path.join(temp_dir, "test_strings.txt") 112 | c_output = os.path.join(temp_dir, "c_md5_output.txt") 113 | py_output = os.path.join(temp_dir, "python_md5_output.txt") 114 | 115 | try: 116 | # Step 1: Generate test strings 117 | print(f"Generating {args.count} test strings...") 118 | num_strings = generate_test_strings(args.count, args.min_length, args.max_length, input_file) 119 | print(f"Generated {num_strings} test strings in {input_file}") 120 | 121 | # Step 2: Run both implementations 122 | print("Computing MD5 hashes using Python implementation...") 123 | compute_python_md5(input_file, py_output) 124 | 125 | print("Computing MD5 hashes using C implementation...") 126 | run_c_md5(input_file, c_output, args.exe) 127 | 128 | # Step 3: Compare results 129 | print("Comparing results...") 130 | match, differences = compare_results(c_output, py_output) 131 | 132 | if match: 133 | print("SUCCESS: All MD5 hashes match!") 134 | return 0 135 | else: 136 | print("FAILURE: MD5 hash differences found:") 137 | for diff in differences[:10]: # Show only first 10 differences 138 | print(diff) 139 | print() 140 | 141 | if len(differences) > 10: 142 | print(f"... 
and {len(differences) - 10} more differences.") 143 | 144 | return 1 145 | 146 | finally: 147 | if not args.keep_files: 148 | # Clean up temporary files 149 | for file in [input_file, c_output, py_output]: 150 | if os.path.exists(file): 151 | os.unlink(file) 152 | try: 153 | os.rmdir(temp_dir) 154 | except OSError: 155 | pass # Directory not empty or already removed; nothing more to clean up 156 | else: 157 | print(f"Test files kept in {temp_dir}") 158 | 159 | if __name__ == "__main__": 160 | sys.exit(main()) 161 | --------------------------------------------------------------------------------