├── BPF-bcc_Userspace_Oracle
    ├── README.md
    ├── eventsname.sql
    ├── ora_logicalIO_histogram.py
    ├── ora_logicalIO_histogram_example.txt
    ├── ora_sqlparse_trace.py
    ├── ora_sqlparse_trace_12c_18c.py
    ├── ora_sqlparse_trace_example.txt
    ├── ora_wait_histogram.py
    ├── ora_wait_histogram_12c_18c.py
    ├── ora_wait_histogram_example.txt
    ├── ora_wait_trace.py
    ├── ora_wait_trace_12c_18c.py
    └── ora_wait_trace_example.txt
├── Ftrace
    ├── README
    └── iolatency_micro
├── LICENSE
├── Perf_probes
    └── README
├── README.md
├── SystemTap_Linux_IO
    ├── README
    ├── blockio_ioblock_latencyhistogram.stp
    ├── blockio_ioscheduler_latencyhistogram.stp
    ├── blockio_latency_outliers_per_device.stp
    ├── blockio_rq_all_TRACE_EXPERIMENTAL.stp
    ├── blockio_rq_filter_TRACE_EXPERIMENTAL.stp
    ├── blockio_rq_issue_basic_latencyhistogram.stp
    ├── blockio_rq_issue_filter_latencyhistogram.stp
    ├── blockio_rq_issue_filter_latencyhistogram_new.stp
    ├── blockio_rq_issue_latencyhistogram.stp
    ├── blockio_rq_issue_latencyhistogram_new.stp
    ├── pread_latencyhistogram.stp
    ├── read_latencyhistogram.stp
    ├── read_latencyhistogram_filterPID.stp
    └── sync_asyncio_and_libaio_TRACE.stp
└── SystemTap_Userspace_Oracle
    ├── README
    ├── eventsname.sql
    ├── experimental
        ├── logical_io_latency.stp
        ├── oracle_events_12102_resolve_eventnames_xksled.stp
        └── trace_oracle_events_debug.stp
    ├── histograms_oracle_events_11204.stp
    ├── histograms_oracle_events_12102.stp
    ├── histograms_oracle_events_version_independent.stp
    ├── ksuse_find_offsets.sql
    ├── livepatch_oracle
        ├── README
        ├── example_change_ret_val.stp
        ├── filterSQL_opiprs.stp
        ├── livepatch_basic_opiprs.stp
        └── livepatch_opiprs.stp
    ├── measure_io_patterns
        ├── Oracle_read_profile.stp
        ├── Oracle_read_profile_drilldown_file.stp
        └── Oracle_read_profile_drilldown_objectnum.stp
    ├── oracle_event_latencyhistogram.stp
    ├── trace_oracle_events_11204.stp
    ├── trace_oracle_events_12102.stp
    ├── trace_oracle_iocalls_12102.stp
    ├── trace_oracle_logical_io_basic.stp
    ├── trace_oracle_logical_io_count.stp
    ├── trace_oracle_logicalio_wait_events_physicalio_11204.stp
    ├── trace_oracle_logicalio_wait_events_physicalio_12102.stp
    ├── trace_oracle_wait_events_asyncio_libaio_11204.stp
    └── trace_oracle_wait_events_asyncio_libaio_12102.stp

/BPF-bcc_Userspace_Oracle/README.md:
--------------------------------------------------------------------------------
# BPF/bcc scripts for Oracle tracing

Author: Luca.Canali@cern.ch
First release: April 2016
Update February 2019

See also: http://externaltable.blogspot.com/2016/05/linux-bpfbcc-for-oracle-tracing.html

This folder contains example scripts for tracing Oracle database processes with bcc/BPF.
The scripts work by attaching probes to Oracle userspace functions, using Linux eBPF with bcc and uprobes.
The goal is to provide tools and methods that combine userspace tracing of Oracle
with the instrumentation power of dynamic OS probes provided by bcc/BPF.
The scripts are ports of previous work done with SystemTap and/or Perf and are intended as learning
material, not for production use.


| Script                     | Short description
| -------------------------- | -------------------------------------------------------------------------------------
| [ora_sqlparse_trace.py](ora_sqlparse_trace.py) | Tracing of Oracle SQL parsing. This script traces SQL hard parsing on Oracle binaries by hooking on opiprs, and reads from function arguments (CPU registers) and from process memory.
| [ora_sqlparse_trace_12c_18c.py](ora_sqlparse_trace_12c_18c.py) | For 12c and higher. Tracing of Oracle SQL parsing. This script traces SQL hard parsing on Oracle binaries by hooking on opiprs, and reads from function arguments (CPU registers) and from process memory.
| [ora_wait_trace.py](ora_wait_trace.py) | Tracing of Oracle wait events. This probe traces Oracle sessions by hooking on the functions kskthewt and kews_update_wait_time, and reads from function arguments (CPU registers).
| [ora_wait_trace_12c_18c.py](ora_wait_trace_12c_18c.py) | For 12c and higher. Tracing of Oracle wait events. This probe traces Oracle sessions by hooking on the functions kskthewt and kews_update_wait_time, and reads from function arguments (CPU registers).
| [ora_wait_histogram_12c_18c.py](ora_wait_histogram_12c_18c.py) | For 12c and higher. Wait event latency histograms. This probe traces Oracle sessions by hooking on the functions kskthewt and kews_update_wait_time, and reads from function arguments (CPU registers).
| [ora_wait_histogram.py](ora_wait_histogram.py) | Wait event latency histograms. This probe traces Oracle sessions by hooking on the functions kskthewt and kews_update_wait_time, and reads from function arguments (CPU registers).
| [ora_logicalIO_histogram.py](ora_logicalIO_histogram.py) | Logical IO latency histograms. This probe measures the latency between call and return for the Oracle function kcbgtcr, which is an important part of the logical IO processing for consistent reads.

Compatibility and issues:

- Requires a kernel with BPF enabled. You can use Red Hat/Oracle Linux 7.6 or higher (`yum install bcc*`)
  or a system with Linux kernel version 4.5 or higher.
- Oracle version: these scripts have been developed and tested on Oracle 11.2.0.4.
- New in February 2019: versions of three scripts for Oracle 12c and higher implement a workaround for the issues
  between Oracle and uprobes in those versions. Tested with Oracle 18c.
- The workaround is simple and consists of tracing the next instruction (for example kskthewt+2 instead of kskthewt),
  see [Oracle12_and_Perf](https://mahmoudhatem.wordpress.com/2017/03/22/workaround-for-linux-perf-probes-issue-for-oracle-tracing/)
- The scripts provided here are experimental and may cause unwanted effects, especially on busy systems. They may also be incompatible with your current setup and/or need some tweaking before running.

Credits and acknowledgements:

- [Brendan Gregg](https://twitter.com/brendangregg) for writing many of the example tools/scripts for bcc, which were used as a guide for writing the scripts in this folder
- BPF and [bcc](https://github.com/iovisor/bcc) development teams
- [Frits Hoogland](https://twitter.com/fritshoogland) for collaboration on investigating Oracle internals and userspace tracing.
- [Sasha Goldshtein](https://twitter.com/goldshtn) has developed scripts for bcc tracing of MySQL and PostgreSQL, see for example [dbstat.py](https://github.com/iovisor/bcc/blob/master/tools/dbstat.py)
- [Hatem Mahmoud](https://twitter.com/Hatem__Mahmoud) has investigated the issue with uprobes and Oracle 12c and higher, providing a simple workaround
--------------------------------------------------------------------------------
/BPF-bcc_Userspace_Oracle/eventsname.sql:
--------------------------------------------------------------------------------
--
-- eventsname.sql
--
-- This sqlplus script generates a sed script file to replace Oracle wait event numbers with event names,
-- intended to be used together with the trace scripts (SystemTap and BPF/bcc)
--
-- L.C. Aug 2014
--

set echo off pages 0 lines 200 feed off head off sqlblanklines off trimspool on trimout on

spool eventsname.sed

select 's/\<event#='||event#||'\>/'||'event='||replace(name,'/','\/')||'/g' SED from v$event_name order by event# desc;

spool off
exit
--------------------------------------------------------------------------------
/BPF-bcc_Userspace_Oracle/ora_logicalIO_histogram.py:
--------------------------------------------------------------------------------
#!/usr/bin/python
#
# ora_logicalIO_histogram.py - Oracle logical IO latency histogram using BPF/bcc and uprobes
#
# This script measures the latency between call and return time for the Oracle function "kcbgtcr",
# which is an important part of the logical IO processing for consistent reads.
# This code is experimental and a proof of concept. Use at your own risk.
#
# Usage: ora_logicalIO_histogram.py [-h] [-p PID] [-T] [interval] [count]
#
# Author: Luca.Canali@cern.ch - April 2016
# Licensed under the Apache License, Version 2.0 (the "License")
#
# Credits:
#   example scripts in bcc repository, in particular by @BrendanGregg
#   @FritsHoogland for collaboration on investigations of Oracle internals
#   and userspace tracing
#

from __future__ import print_function
from bcc import BPF
from time import sleep, strftime
import ctypes as ct
import argparse
import os

examples = """examples:
    ./ora_logicalIO_histogram.py 1 10          # Oracle kcbgtcr latency histograms, 1-second summary, repeat 10 times
    ./ora_logicalIO_histogram.py -p 123 1 10   # trace PID 123 only
"""

parser = argparse.ArgumentParser(
    description="Oracle logical IO latency histogram for consistent reads\nrequires the environment variable ORACLE_HOME\nrun as root",
    formatter_class=argparse.RawDescriptionHelpFormatter,
    epilog=examples)
parser.add_argument("-p", "--pid",
    help="trace PID only")
parser.add_argument("-T", "--timestamp", action="store_true",
    help="include timestamp on output")
parser.add_argument("interval", nargs="?", default=99999999,
    help="measurement interval, in seconds")
parser.add_argument("count", nargs="?", default=99999999,
    help="number of outputs")
args = parser.parse_args()
countdown = int(args.count)

# full path of the oracle executable
oracle_executable = os.path.expandvars("$ORACLE_HOME/bin/oracle")
if not os.path.isfile(oracle_executable):
    exit("Oracle executable not found.\nPlease set the environment variable ORACLE_HOME.")

# load BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>

BPF_HASH(hasht_duration, u32);
BPF_HISTOGRAM(hist_logicalio);

// function entry: record the start timestamp for this process
int in_kcbgtcr(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid();
    if (FILTER)
        return 0;
    u64 ts = bpf_ktime_get_ns();
    hasht_duration.update(&pid, &ts);
    return 0;
};

// function return: compute the elapsed time and add it to the log2 histogram
int out_kcbgtcr(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid();
    if (FILTER)
        return 0;
    u64 t1 = bpf_ktime_get_ns();
    u64 *t0 = hasht_duration.lookup(&pid);
    if (t0) {
        u64 delta_time = t1 - *t0;
        hist_logicalio.increment(bpf_log2l(delta_time));
    }
    return 0;
};
"""

# code substitutions
if args.pid:
    bpf_text = bpf_text.replace('FILTER', 'pid != %s' % args.pid)
else:
    bpf_text = bpf_text.replace('FILTER', '0')

b = BPF(text=bpf_text)
b.attach_uprobe(name=oracle_executable, sym="kcbgtcr", fn_name="in_kcbgtcr")
b.attach_uretprobe(name=oracle_executable, sym="kcbgtcr", fn_name="out_kcbgtcr")

# output

exiting = 0 if args.interval else 1
hist_logicalio = b.get_table("hist_logicalio")

# Start tracing
print("Latency histograms for kcbgtcr, Oracle logical IO for consistent read... Hit Ctrl-C to end.")

while (1):
    try:
        sleep(int(args.interval))
    except KeyboardInterrupt:
        exiting = 1

    print()
    if args.timestamp:
        print("%-8s\n" % strftime("%H:%M:%S"), end="")

    hist_logicalio.print_log2_hist("kcbgtcr latency, ns", "", section_print_fn=int)
    hist_logicalio.clear()

    countdown -= 1
    if exiting or countdown == 0:
        exit()

--------------------------------------------------------------------------------
/BPF-bcc_Userspace_Oracle/ora_logicalIO_histogram_example.txt:
--------------------------------------------------------------------------------
ora_logicalIO_histogram.py measures the latency between call and return for the Oracle function kcbgtcr,
which is an important part of the logical IO processing for consistent reads,
and reports the values as a series of latency histograms.

Example:

# ./ora_logicalIO_histogram.py -p 123 3 10
Latency histograms for kcbgtcr, Oracle logical IO for consistent read... Hit Ctrl-C to end.

     kcbgtcr latency, ns : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 0        |                                        |
      4096 -> 8191       : 1040     |****************************************|
      8192 -> 16383      : 143      |*****                                   |
     16384 -> 32767      : 20       |                                        |
     32768 -> 65535      : 12       |                                        |
     65536 -> 131071     : 4        |                                        |

--------------------------------------------------------------------------------
/BPF-bcc_Userspace_Oracle/ora_sqlparse_trace.py:
--------------------------------------------------------------------------------
#!/usr/bin/python
#
# ora_sqlparse_trace.py - Basic tracing of Oracle hard parsing using BPF/bcc and uprobes
#
# This script traces SQL hard parsing on Oracle binaries by hooking on the Oracle function
# "opiprs" and reads from function arguments (CPU registers) and from process memory.
# This code is experimental and a proof of concept. Use at your own risk.
#
# Usage: ora_sqlparse_trace.py [-h] [-p PID]
#
# Author: Luca.Canali@cern.ch - April 2016
# Licensed under the Apache License, Version 2.0 (the "License")
#
# Credits:
#   example scripts in bcc repository, in particular by @BrendanGregg
#   @FritsHoogland for collaboration on investigations of Oracle internals
#   and userspace tracing
#

from __future__ import print_function
from bcc import BPF
from time import strftime
import ctypes as ct
import argparse
import os

examples = """examples:
    export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/rdbms
    ./ora_sqlparse_trace.py          # trace Oracle sql hard parsing
    ./ora_sqlparse_trace.py -p 123   # trace PID 123 only
"""

parser = argparse.ArgumentParser(
    description="Oracle sql hard parse tracing\nrequires the environment variable ORACLE_HOME\nrun as root",
    formatter_class=argparse.RawDescriptionHelpFormatter,
    epilog=examples)
parser.add_argument("-p", "--pid",
    help="trace PID only")
args = parser.parse_args()

# full path of the oracle executable
oracle_executable = os.path.expandvars("$ORACLE_HOME/bin/oracle")
if not os.path.isfile(oracle_executable):
    exit("Oracle executable not found.\nPlease set the environment variable ORACLE_HOME.")

# load BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>

struct str_t {
    u32 pid;
    u64 len;
    char sql[400];
};

BPF_PERF_OUTPUT(events);

int trace_opiprs(struct pt_regs *ctx) {
    struct str_t data = {};
    u32 pid;
    pid = bpf_get_current_pid_tgid();
    if (FILTER)
        return 0;
    data.pid = pid;
    data.len = ctx->dx;    // 3rd function argument: length of the SQL text
    // 2nd function argument: pointer to the SQL text in process memory
    bpf_probe_read(&data.sql, sizeof(data.sql), (void *)ctx->si);
    events.perf_submit(ctx, &data, sizeof(data));
    return 0;
};
"""
STR_DATA = 400  # must match the definition of str_t.sql; SQL text display is truncated at this length

# code substitutions
if args.pid:
    bpf_text = bpf_text.replace('FILTER', 'pid != %s' % args.pid)
else:
    bpf_text = bpf_text.replace('FILTER', '0')

# must match the layout of struct str_t in the BPF program
class Data(ct.Structure):
    _fields_ = [
        ("pid", ct.c_uint),          # u32
        ("len", ct.c_ulonglong),     # u64
        ("sql", ct.c_char * STR_DATA)
    ]

b = BPF(text=bpf_text)
b.attach_uprobe(name=oracle_executable, sym="opiprs", fn_name="trace_opiprs")

# Start tracing
print("Start tracing Oracle hard parsing... Hit Ctrl-C to end.")

def print_event(cpu, data, size):
    event = ct.cast(data, ct.POINTER(Data)).contents
    print("%-9s pid=%d len=%d sql=%s" % (strftime("%H:%M:%S"), event.pid, event.len, event.sql))

b["events"].open_perf_buffer(print_event)
while 1:
    b.kprobe_poll()

--------------------------------------------------------------------------------
/BPF-bcc_Userspace_Oracle/ora_sqlparse_trace_12c_18c.py:
--------------------------------------------------------------------------------
#!/usr/bin/python
#
# ora_sqlparse_trace_12c_18c.py - Basic tracing of Oracle hard parsing using BPF/bcc and uprobes
#
# This script traces SQL hard parsing on Oracle binaries by hooking on the Oracle function
# "opiprs" and reads from function arguments (CPU registers) and from process memory.
# This code is experimental and a proof of concept. Use at your own risk.
#
# Usage: ora_sqlparse_trace_12c_18c.py [-h] [-p PID]
#
# Author: Luca.Canali@cern.ch - April 2016
# Licensed under the Apache License, Version 2.0 (the "License")
#
# Credits:
#   example scripts in bcc repository, in particular by @BrendanGregg
#   @FritsHoogland for collaboration on investigations of Oracle internals
#   and userspace tracing
#

from __future__ import print_function
from bcc import BPF
from time import strftime
import ctypes as ct
import argparse
import os

examples = """examples:
    export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/rdbms
    ./ora_sqlparse_trace_12c_18c.py          # trace Oracle sql hard parsing
    ./ora_sqlparse_trace_12c_18c.py -p 123   # trace PID 123 only
"""

parser = argparse.ArgumentParser(
    description="Oracle sql hard parse tracing\nrequires the environment variable ORACLE_HOME\nrun as root",
    formatter_class=argparse.RawDescriptionHelpFormatter,
    epilog=examples)
parser.add_argument("-p", "--pid",
    help="trace PID only")
args = parser.parse_args()

# full path of the oracle executable
oracle_executable = os.path.expandvars("$ORACLE_HOME/bin/oracle")
if not os.path.isfile(oracle_executable):
    exit("Oracle executable not found.\nPlease set the environment variable ORACLE_HOME.")

# load BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>

struct str_t {
    u32 pid;
    u64 len;
    char sql[400];
};

BPF_PERF_OUTPUT(events);

int trace_opiprs(struct pt_regs *ctx) {
    struct str_t data = {};
    u32 pid;
    pid = bpf_get_current_pid_tgid();
    if (FILTER)
        return 0;
    data.pid = pid;
    data.len = ctx->dx;    // 3rd function argument: length of the SQL text
    // 2nd function argument: pointer to the SQL text in process memory
    bpf_probe_read(&data.sql, sizeof(data.sql), (void *)ctx->si);
    events.perf_submit(ctx, &data, sizeof(data));
    return 0;
};
"""
STR_DATA = 400  # must match the definition of str_t.sql; SQL text display is truncated at this length

# code substitutions
if args.pid:
    bpf_text = bpf_text.replace('FILTER', 'pid != %s' % args.pid)
else:
    bpf_text = bpf_text.replace('FILTER', '0')

# must match the layout of struct str_t in the BPF program
class Data(ct.Structure):
    _fields_ = [
        ("pid", ct.c_uint),          # u32
        ("len", ct.c_ulonglong),     # u64
        ("sql", ct.c_char * STR_DATA)
    ]

b = BPF(text=bpf_text)

address_list = BPF.get_user_functions_and_addresses(oracle_executable, "^opiprs$")

# workaround for Oracle 12c and higher:
# attach the uprobe at the function address + 2 (the next instruction)
paddr = address_list[0][1] + 2

b.attach_uprobe(name=oracle_executable, addr=paddr, fn_name="trace_opiprs")

# This is the original code, it works on Oracle 11g but breaks on higher versions
# b.attach_uprobe(name=oracle_executable, sym="opiprs", fn_name="trace_opiprs")

# This is a test with a possible patch to bcc to add an offset parameter to attach_uprobe
# b.attach_uprobe(name=oracle_executable, sym="opiprs", fn_name="trace_opiprs", offset=2)

# Start tracing
print("Start tracing Oracle hard parsing... Hit Ctrl-C to end.")

def print_event(cpu, data, size):
    event = ct.cast(data, ct.POINTER(Data)).contents
    print("%-9s pid=%d len=%d sql=%s" % (strftime("%H:%M:%S"), event.pid, event.len, event.sql))

b["events"].open_perf_buffer(print_event)
while 1:
    b.kprobe_poll()

--------------------------------------------------------------------------------
/BPF-bcc_Userspace_Oracle/ora_sqlparse_trace_example.txt:
--------------------------------------------------------------------------------
The script ora_sqlparse_trace.py attaches uprobes to the Oracle function opiprs, one of the functions in the call chain for Oracle hard parsing (parsing of new statements not yet in the cache).
The script reads the SQL text and its length from the function call parameters (CPU registers) and from process memory, and prints them to stdout for tracing purposes.
Run the script as root.
Requires the environment variable ORACLE_HOME.

Example:
# export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/rdbms
# ./ora_sqlparse_trace.py

Optionally specify -p PID to limit tracing to a single Oracle process.

Test: run SQL on the target Oracle session. Example output:

Start tracing Oracle hard parsing... Hit Ctrl-C to end.
12:47:15 pid=123 len=25 sql=select sysdate from dual
...

--------------------------------------------------------------------------------
/BPF-bcc_Userspace_Oracle/ora_wait_histogram.py:
--------------------------------------------------------------------------------
#!/usr/bin/python
#
# ora_wait_histogram.py - Oracle wait event latency histograms using BPF/bcc and uprobes
#
# This script traces Oracle sessions by hooking on the functions "kskthewt" and
# "kews_update_wait_time" and reads from function arguments (CPU registers). BPF computes
# the latency histogram for the wait events and the script prints the values on stdout.
# This code is experimental and a proof of concept. Use at your own risk.
#
# Usage: ora_wait_histogram.py [-h] [-p PID] [-T] [interval] [count]
#
# Use together with eventsname.sql and eventsname.sed to resolve event# into the event name.
# Generate eventsname.sed from sqlplus using the script eventsname.sql
# Example:
#   ./ora_wait_histogram.py -p 123 | sed -e 's/event# = /event#=/g' -f eventsname.sed
#
# Example for streaming mode:
#   stdbuf -oL ./ora_wait_histogram.py -p 123 | sed -e 's/event# = /event#=/g' -f eventsname.sed
#
# Author: Luca.Canali@cern.ch - April 2016
# Licensed under the Apache License, Version 2.0 (the "License")
#
# Credits:
#   example scripts in bcc repository, in particular by @BrendanGregg
#   @FritsHoogland for collaboration on investigations of Oracle internals
#   and userspace tracing
#

from __future__ import print_function
from bcc import BPF
from time import sleep, strftime
import ctypes as ct
import argparse
import os

examples = """examples:
    ./ora_wait_histogram.py 1 10          # Oracle wait event histograms, 1-second summaries, 10 times
    ./ora_wait_histogram.py -p 123 1 10   # trace PID 123 only
"""

parser = argparse.ArgumentParser(
    description="Oracle wait event histograms\nrequires the environment variable ORACLE_HOME\nrun as root",
    formatter_class=argparse.RawDescriptionHelpFormatter,
    epilog=examples)
parser.add_argument("-p", "--pid",
    help="trace PID only")
parser.add_argument("-T", "--timestamp", action="store_true",
    help="include timestamp on output")
parser.add_argument("interval", nargs="?", default=99999999,
    help="output interval, in seconds")
parser.add_argument("count", nargs="?", default=99999999,
    help="number of outputs")
args = parser.parse_args()
countdown = int(args.count)

# full path of the oracle executable
oracle_executable = os.path.expandvars("$ORACLE_HOME/bin/oracle")
if not os.path.isfile(oracle_executable):
    exit("Oracle executable not found.\nPlease set the environment variable ORACLE_HOME.")

# load BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>

// histogram key: wait event number (section) and log2 bucket of the wait time (slot)
typedef struct str_event {
    u64 event;
    u64 wait_time;
} str_event_t;

BPF_HASH(wait_time, u32);
BPF_HISTOGRAM(eventhist, str_event_t);

int trace_kskthewt(struct pt_regs *ctx) {
    str_event_t data = {};
    u32 pid;
    u64 *wt;
    pid = bpf_get_current_pid_tgid();
    if (FILTER)
        return 0;
    data.event = ctx->si;
    wt = wait_time.lookup(&pid);
    if (wt)
        data.wait_time = bpf_log2l(*wt);
    else
        data.wait_time = 0;

    eventhist.increment(data);
    return 0;
};

// saves the wait time (microseconds) passed in the 2nd argument, read later by trace_kskthewt
int trace_kews_update_wait_time(struct pt_regs *ctx) {
    u32 pid;
    u64 rsi;
    pid = bpf_get_current_pid_tgid();
    if (FILTER)
        return 0;
    rsi = ctx->si;
    if (rsi > 0)
        wait_time.update(&pid, &rsi);
    return 0;
};

"""

# code substitutions
if args.pid:
    bpf_text = bpf_text.replace('FILTER', 'pid != %s' % args.pid)
else:
    bpf_text = bpf_text.replace('FILTER', '0')

b = BPF(text=bpf_text)
b.attach_uprobe(name=oracle_executable, sym="kskthewt", fn_name="trace_kskthewt")
b.attach_uprobe(name=oracle_executable, sym="kews_update_wait_time", fn_name="trace_kews_update_wait_time")

# output

exiting = 0 if args.interval else 1
eventhist = b.get_table("eventhist")

# Start tracing
print("Start tracing oracle wait events... Hit Ctrl-C to end.")

while (1):
    try:
        sleep(int(args.interval))
    except KeyboardInterrupt:
        exiting = 1

    print()
    if args.timestamp:
        print("Time = %-8s\n" % strftime("%H:%M:%S"), end="")

    eventhist.print_log2_hist("wait time, microsec", "event#", section_print_fn=int)
    eventhist.clear()

    countdown -= 1
    if exiting or countdown == 0:
        exit()


--------------------------------------------------------------------------------
/BPF-bcc_Userspace_Oracle/ora_wait_histogram_12c_18c.py:
--------------------------------------------------------------------------------
#!/usr/bin/python
#
# ora_wait_histogram_12c_18c.py - Oracle wait event latency histograms using BPF/bcc and uprobes
#
# This script traces Oracle sessions by hooking on the functions "kskthewt" and
# "kews_update_wait_time" and reads from function arguments (CPU registers). BPF computes
# the latency histogram for the wait events and the script prints the values on stdout.
# This code is experimental and a proof of concept. Use at your own risk.
#
# Usage: ora_wait_histogram_12c_18c.py [-h] [-p PID] [-T] [interval] [count]
#
# Use together with eventsname.sql and eventsname.sed to resolve event# into the event name.
# Generate eventsname.sed from sqlplus using the script eventsname.sql
# Example:
#   ./ora_wait_histogram_12c_18c.py -p 123 | sed -e 's/event# = /event#=/g' -f eventsname.sed
#
# Example for streaming mode:
#   stdbuf -oL ./ora_wait_histogram_12c_18c.py -p 123 | sed -e 's/event# = /event#=/g' -f eventsname.sed
#
# Author: Luca.Canali@cern.ch - April 2016
# Licensed under the Apache License, Version 2.0 (the "License")
#
# Credits:
#   example scripts in bcc repository, in particular by @BrendanGregg
#   @FritsHoogland for collaboration on investigations of Oracle internals
#   and userspace tracing
#

from __future__ import print_function
from bcc import BPF
from time import sleep, strftime
import ctypes as ct
import argparse
import os

examples = """examples:
    ./ora_wait_histogram_12c_18c.py 1 10          # Oracle wait event histograms, 1-second summaries, 10 times
    ./ora_wait_histogram_12c_18c.py -p 123 1 10   # trace PID 123 only
"""

parser = argparse.ArgumentParser(
    description="Oracle wait event histograms\nrequires the environment variable ORACLE_HOME\nrun as root",
    formatter_class=argparse.RawDescriptionHelpFormatter,
    epilog=examples)
parser.add_argument("-p", "--pid",
    help="trace PID only")
parser.add_argument("-T", "--timestamp", action="store_true",
    help="include timestamp on output")
parser.add_argument("interval", nargs="?", default=99999999,
    help="output interval, in seconds")
parser.add_argument("count", nargs="?", default=99999999,
    help="number of outputs")
args = parser.parse_args()
countdown = int(args.count)

# full path of the oracle executable
oracle_executable = os.path.expandvars("$ORACLE_HOME/bin/oracle")
if not os.path.isfile(oracle_executable):
    exit("Oracle executable not found.\nPlease set the environment variable ORACLE_HOME.")

# load BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>

// histogram key: wait event number (section) and log2 bucket of the wait time (slot)
typedef struct str_event {
    u64 event;
    u64 wait_time;
} str_event_t;

BPF_HASH(wait_time, u32);
BPF_HISTOGRAM(eventhist, str_event_t);

int trace_kskthewt(struct pt_regs *ctx) {
    str_event_t data = {};
    u32 pid;
    u64 *wt;
    pid = bpf_get_current_pid_tgid();
    if (FILTER)
        return 0;
    data.event = ctx->si;
    wt = wait_time.lookup(&pid);
    if (wt)
        data.wait_time = bpf_log2l(*wt);
    else
        data.wait_time = 0;

    eventhist.increment(data);
    return 0;
};

// saves the wait time (microseconds) passed in the 2nd argument, read later by trace_kskthewt
int trace_kews_update_wait_time(struct pt_regs *ctx) {
    u32 pid;
    u64 rsi;
    pid = bpf_get_current_pid_tgid();
    if (FILTER)
        return 0;
    rsi = ctx->si;
    if (rsi > 0)
        wait_time.update(&pid, &rsi);
    return 0;
};

"""

# code substitutions
if args.pid:
    bpf_text = bpf_text.replace('FILTER', 'pid != %s' % args.pid)
else:
    bpf_text = bpf_text.replace('FILTER', '0')

b = BPF(text=bpf_text)

# workaround for Oracle 12c and higher:
# attach the uprobe at the function address + 2 (the next instruction)
address_list = BPF.get_user_functions_and_addresses(oracle_executable, "^kskthewt$")
paddr = address_list[0][1] + 2
b.attach_uprobe(name=oracle_executable, addr=paddr, fn_name="trace_kskthewt")


# This is the original code, it works on Oracle 11g but breaks on higher versions
# b.attach_uprobe(name=oracle_executable, sym="kskthewt", fn_name="trace_kskthewt")

# workaround for Oracle 12c and higher:
# attach the uprobe at the function address + 2 (the next instruction)
address_list = BPF.get_user_functions_and_addresses(oracle_executable, "^kews_update_wait_time$")
paddr = address_list[0][1] + 2
b.attach_uprobe(name=oracle_executable, addr=paddr, fn_name="trace_kews_update_wait_time")

# This is the original code, it works on Oracle 11g but breaks on higher versions
# b.attach_uprobe(name=oracle_executable, sym="kews_update_wait_time", fn_name="trace_kews_update_wait_time")

# output

exiting = 0 if args.interval else 1
eventhist = b.get_table("eventhist")

# Start tracing
print("Start tracing oracle wait events... Hit Ctrl-C to end.")

while (1):
    try:
        sleep(int(args.interval))
    except KeyboardInterrupt:
        exiting = 1

    print()
    if args.timestamp:
        print("Time = %-8s\n" % strftime("%H:%M:%S"), end="")

    eventhist.print_log2_hist("wait time, microsec", "event#", section_print_fn=int)
    eventhist.clear()

    countdown -= 1
    if exiting or countdown == 0:
        exit()


--------------------------------------------------------------------------------
/BPF-bcc_Userspace_Oracle/ora_wait_histogram_example.txt:
--------------------------------------------------------------------------------
The script ora_wait_histogram.py attaches uprobes to two relevant functions in the call chain for Oracle wait events: kskthewt and kews_update_wait_time.
It reads the wait event details from the function call parameters (CPU registers) and reports the values as a series of latency histograms.

Optional step:
create eventsname.sed by running (as oracle) the supplied script eventsname.sql;
this provides the translation from wait event number to event name.
stdbuf can be used when running in "streaming mode".

Example:

# stdbuf -oL ./ora_wait_histogram.py 10 10 | sed -e 's/event# = /event#=/g' -f eventsname.sed
Start tracing oracle wait events... Hit Ctrl-C to end.

event=db file sequential read
     wait time, microsec : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 25       |                                        |
       128 -> 255        : 24521    |********************                    |
       256 -> 511        : 46788    |****************************************|
       512 -> 1023       : 12169    |**********                              |
      1024 -> 2047       : 1132     |                                        |
      2048 -> 4095       : 660      |                                        |
      4096 -> 8191       : 248      |                                        |
      8192 -> 16383      : 29       |                                        |

--------------------------------------------------------------------------------
/BPF-bcc_Userspace_Oracle/ora_wait_trace.py:
--------------------------------------------------------------------------------
#!/usr/bin/python
#
# ora_wait_trace.py - Basic Oracle wait event tracing using BPF/bcc and uprobes
#
# This script traces Oracle sessions by hooking on the functions "kskthewt" and
# "kews_update_wait_time" and reads from function arguments (CPU registers).
# This code is experimental and a proof of concept. Use at your own risk.
8 | # 9 | # Usage: ora_wait_trace.py [-h] [-p PID] 10 | # 11 | # Use together with eventsname.sql and eventsname.sed for resolving event# into event name 12 | # generate eventsname.sed from sqlplus using the script eventsname.sql 13 | # Example: 14 | # ./ora_wait_trace.py -p 123| sed -f eventsname.sed 15 | # 16 | # Example for the streaming mode using stdbuf to avoid buffering effects: 17 | # stdbuf -oL ./ora_wait_trace.py -p 123| sed -f ~oracle/luca/eventsname.sed 18 | # 19 | # Author: Luca.Canali@cern.ch - April 2016 20 | # Licensed under the Apache License, Version 2.0 (the "License") 21 | # 22 | # Credits: 23 | # example scripts in bcc repository, in particular by @BrendanGregg 24 | # @FritsHoogland for collaboration on investigations of Oracle internals 25 | # and userspace tracing 26 | # 27 | 28 | from __future__ import print_function 29 | from bcc import BPF 30 | from time import strftime 31 | import ctypes as ct 32 | import argparse 33 | import os 34 | 35 | examples = """examples: 36 | export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/rdbms 37 | ./ora_wait_trace.py # trace Oracle wait events 38 | ./ora_wait_trace.py -p 123 # trace PID 123 only 39 | """ 40 | 41 | parser = argparse.ArgumentParser( 42 | description="Trace Oracle wait events with BPF/bcc\nrequires the environment variable ORACLE_HOME\nrun as root", 43 | formatter_class=argparse.RawDescriptionHelpFormatter, 44 | epilog=examples) 45 | parser.add_argument("-p", "--pid", 46 | help="trace PID only") 47 | args = parser.parse_args() 48 | 49 | # full path of the oracle executable 50 | oracle_executable = os.path.expandvars("$ORACLE_HOME/bin/oracle") 51 | if not os.path.isfile(oracle_executable): 52 | exit("Oracle executable not found.\nPlease set the environment variable ORACLE_HOME.") 53 | 54 | # load BPF program 55 | bpf_text = """ 56 | #include <uapi/linux/ptrace.h> 57 | 58 | struct str_t { 59 | u32 pid; 60 | u64 event; 61 | u64 wait_time; 62 | }; 63 | 64 | BPF_HASH(wait_time, u32); 65 | BPF_PERF_OUTPUT(events); 66 | 67
| int trace_kskthewt(struct pt_regs *ctx) { 68 | struct str_t data = {}; 69 | u32 pid; 70 | u64 *wt; 71 | pid = bpf_get_current_pid_tgid(); 72 | if (FILTER) 73 | return 0; 74 | data.pid = pid; 75 | data.event = ctx->si; 76 | wt = wait_time.lookup(&pid); 77 | if (wt) 78 | data.wait_time = *wt; 79 | else 80 | data.wait_time = 0; 81 | events.perf_submit(ctx, &data, sizeof(data)); 82 | return 0; 83 | }; 84 | 85 | int trace_kews_update_wait_time(struct pt_regs *ctx) { 86 | u32 pid; 87 | u64 rsi; 88 | pid = bpf_get_current_pid_tgid(); 89 | if (FILTER) 90 | return 0; 91 | rsi = ctx->si; 92 | if (rsi > 0) 93 | wait_time.update(&pid, &rsi); 94 | return 0; 95 | }; 96 | 97 | """ 98 | 99 | # code substitutions 100 | if args.pid: 101 | bpf_text = bpf_text.replace('FILTER', 'pid != %s' % args.pid) 102 | else: 103 | bpf_text = bpf_text.replace('FILTER', '0') 104 | 105 | class Data(ct.Structure): 106 | _fields_ = [ 107 | ("pid", ct.c_uint), # matches u32 pid in struct str_t 108 | ("event", ct.c_ulonglong), 109 | ("wait_time", ct.c_ulonglong) 110 | ] 111 | 112 | b = BPF(text=bpf_text) 113 | b.attach_uprobe(name=oracle_executable, sym="kskthewt", fn_name="trace_kskthewt") 114 | b.attach_uprobe(name=oracle_executable, sym="kews_update_wait_time", fn_name="trace_kews_update_wait_time") 115 | 116 | # Start tracing 117 | print("Start tracing oracle wait events...
Hit Ctrl-C to end.") 118 | 119 | def print_event(cpu, data, size): 120 | event = ct.cast(data, ct.POINTER(Data)).contents 121 | print("%-9s pid=%d event#=%d wait_time=%d" % (strftime("%H:%M:%S"), event.pid, event.event, event.wait_time)) 122 | 123 | b["events"].open_perf_buffer(print_event) 124 | while 1: 125 | b.perf_buffer_poll() 126 | 127 | -------------------------------------------------------------------------------- /BPF-bcc_Userspace_Oracle/ora_wait_trace_12c_18c.py: -------------------------------------------------------------------------------- 1 | head 1.1; 2 | access; 3 | symbols; 4 | locks; strict; 5 | comment @# @; 6 | 7 | 8 | 1.1 9 | date 2019.02.14.15.03.33; author root; state Exp; 10 | branches; 11 | next ; 12 | 13 | 14 | desc 15 | @@ 16 | 17 | 18 | 1.1 19 | log 20 | @Initial revision 21 | @ 22 | text 23 | @#!/usr/bin/python 24 | # 25 | # ora_wait_trace.py - Basic oracle wait event tracing using BPF/bcc and uprobes 26 | # 27 | # This script traces Oracle sessions by hooking on the functions "kskthewt" and 28 | # "kews_update_wait_time" and reads from function arguments (CPU registers). 29 | # This code is experimental and a proof of concept. Use at your own risk.
30 | # 31 | # Usage: ora_wait_trace.py [-h] [-p PID] 32 | # 33 | # Use together with eventsname.sql and eventsname.sed for resolving event# into event name 34 | # generate eventsname.sed from sqlplus using the script eventsname.sql 35 | # Example: 36 | # ./ora_wait_trace.py -p 123| sed -f eventsname.sed 37 | # 38 | # Example for the streaming mode using stdbuf to avoid buffering effects: 39 | # stdbuf -oL ./ora_wait_trace.py -p 123| sed -f ~oracle/luca/eventsname.sed 40 | # 41 | # Author: Luca.Canali@@cern.ch - April 2016 42 | # Licensed under the Apache License, Version 2.0 (the "License") 43 | # 44 | # Credits: 45 | # example scripts in bcc repository, in particular by @@BrendanGregg 46 | # @@FritsHoogland for collaboration on investigations of Oracle internals 47 | # and userspace tracing 48 | # 49 | 50 | from __future__ import print_function 51 | from bcc import BPF 52 | from time import strftime 53 | import ctypes as ct 54 | import argparse 55 | import os 56 | 57 | examples = """examples: 58 | export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/rdbms 59 | ./ora_wait_trace.py # trace Oracle wait events 60 | ./ora_wait_trace.py -p 123 # trace PID 123 only 61 | """ 62 | 63 | parser = argparse.ArgumentParser( 64 | description="Trace Oracle wait events with BPF/bcc\nrequires the environment variable ORACLE_HOME\nrun as root", 65 | formatter_class=argparse.RawDescriptionHelpFormatter, 66 | epilog=examples) 67 | parser.add_argument("-p", "--pid", 68 | help="trace PID only") 69 | args = parser.parse_args() 70 | 71 | # full path of the oracle executable 72 | oracle_executable = os.path.expandvars("$ORACLE_HOME/bin/oracle") 73 | if not os.path.isfile(oracle_executable): 74 | exit("Oracle executable not found.\nPlease set the environment variable ORACLE_HOME.") 75 | 76 | # load BPF program 77 | bpf_text = """ 78 | #include <uapi/linux/ptrace.h> 79 | 80 | struct str_t { 81 | u32 pid; 82 | u64 event; 83 | u64 wait_time; 84 | }; 85 | 86 | BPF_HASH(wait_time, u32); 87 | BPF_PERF_OUTPUT(events); 88
| 89 | int trace_kskthewt(struct pt_regs *ctx) { 90 | struct str_t data = {}; 91 | u32 pid; 92 | u64 *wt; 93 | pid = bpf_get_current_pid_tgid(); 94 | if (FILTER) 95 | return 0; 96 | data.pid = pid; 97 | data.event = ctx->si; 98 | wt = wait_time.lookup(&pid); 99 | if (wt) 100 | data.wait_time = *wt; 101 | else 102 | data.wait_time = 0; 103 | events.perf_submit(ctx, &data, sizeof(data)); 104 | return 0; 105 | }; 106 | 107 | int trace_kews_update_wait_time(struct pt_regs *ctx) { 108 | u32 pid; 109 | u64 rsi; 110 | pid = bpf_get_current_pid_tgid(); 111 | if (FILTER) 112 | return 0; 113 | rsi = ctx->si; 114 | if (rsi > 0) 115 | wait_time.update(&pid, &rsi); 116 | return 0; 117 | }; 118 | 119 | """ 120 | 121 | # code substitutions 122 | if args.pid: 123 | bpf_text = bpf_text.replace('FILTER', 'pid != %s' % args.pid) 124 | else: 125 | bpf_text = bpf_text.replace('FILTER', '0') 126 | 127 | class Data(ct.Structure): 128 | _fields_ = [ 129 | ("pid", ct.c_uint), # matches u32 pid in struct str_t 130 | ("event", ct.c_ulonglong), 131 | ("wait_time", ct.c_ulonglong) 132 | ] 133 | 134 | b = BPF(text=bpf_text) 135 | 136 | # workaround for Oracle 12c and higher 137 | # need to attach the address of the probe + 2 138 | address_list = BPF.get_user_functions_and_addresses(oracle_executable, "^kskthewt$") 139 | paddr = address_list[0][1] + 2 140 | b.attach_uprobe(name=oracle_executable, addr=paddr, fn_name="trace_kskthewt") 141 | 142 | # This is the original code, works on Oracle 11g but breaks on higher versions 143 | # b.attach_uprobe(name=oracle_executable, sym="kskthewt", fn_name="trace_kskthewt") 144 | 145 | # workaround for Oracle 12c and higher 146 | # need to attach the address of the probe + 2 147 | address_list = BPF.get_user_functions_and_addresses(oracle_executable, "^kews_update_wait_time$") 148 | paddr = address_list[0][1] + 2 149 | b.attach_uprobe(name=oracle_executable, addr=paddr, fn_name="trace_kews_update_wait_time") 150 | 151 | # This is the original code, works on Oracle 11g but breaks on higher
versions 152 | # b.attach_uprobe(name=oracle_executable, sym="kews_update_wait_time", fn_name="trace_kews_update_wait_time") 153 | 154 | # Start tracing 155 | print("Start tracing oracle wait events... Hit Ctrl-C to end.") 156 | 157 | def print_event(cpu, data, size): 158 | event = ct.cast(data, ct.POINTER(Data)).contents 159 | print("%-9s pid=%d event#=%d wait_time=%d" % (strftime("%H:%M:%S"), event.pid, event.event, event.wait_time)) 160 | 161 | b["events"].open_perf_buffer(print_event) 162 | while 1: 163 | b.perf_buffer_poll() 164 | 165 | @ 166 | -------------------------------------------------------------------------------- /BPF-bcc_Userspace_Oracle/ora_wait_trace_example.txt: -------------------------------------------------------------------------------- 1 | The script ora_wait_trace.py attaches uprobes to relevant functions in the call chain for Oracle wait events: kskthewt and kews_update_wait_time 2 | The script reads the wait event details from the function call parameters (CPU registers) and prints them to stdout 3 | Run the script as root 4 | Requires the environment variable ORACLE_HOME 5 | 6 | Optional step: 7 | create eventsname.sed by running (as oracle) the supplied script eventsname.sql 8 | this provides translation from the wait event number to event name 9 | stdbuf can be used when running in "streaming mode" to avoid buffering effects in the output 10 | 11 | Example: 12 | # export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/rdbms 13 | 14 | # stdbuf -oL ./ora_wait_trace.py -p 123| sed -f eventsname.sed 15 | 16 | Start tracing oracle wait events... Hit Ctrl-C to end.
17 | 15:02:27 pid=10456 event=SQL*Net message from client wait_time=19430133 18 | 15:02:27 pid=10456 event=Disk file operations I/O wait_time=869 19 | 15:02:27 pid=10456 event=db file sequential read wait_time=434 20 | 15:02:27 pid=10456 event=db file sequential read wait_time=446 21 | 15:02:27 pid=10456 event=db file sequential read wait_time=430 22 | 15:02:27 pid=10456 event=SQL*Net message to client wait_time=6 23 | 15:02:27 pid=10456 event=db file sequential read wait_time=2494 24 | 15:02:27 pid=10456 event=db file scattered read wait_time=829 25 | ... 26 | 27 | -------------------------------------------------------------------------------- /Ftrace/README: -------------------------------------------------------------------------------- 1 | Ftrace and iolatency_micro 2 | 3 | Measure I/O latency from the block interface, using tracepoints and ftrace 4 | The script is a minor modification and extension of Brendan Gregg's iolatency 5 | https://github.com/brendangregg/perf-tools 6 | 7 | Modifications by Luca.Canali@cern.ch, July 2015: 8 | - The script now reports latency in microseconds (was milliseconds in the original script) 9 | - In addition: the estimated time waited per bucket is now reported 10 | - The option -m has been added in particular to avoid double counting of IOPS when using device mapper 11 | - Renamed the script to iolatency_micro 12 | 13 | USAGE: iolatency [-hQT] [-d device] [-m major_device] [-i iotype] [interval [count]] 14 | 15 | Example: 16 | 17 | [root@myserver luca]# ./iolatency_micro -m 253 18 | Tracing block I/O. Output every DeltaT = 1 seconds. Ctrl-C to end. 19 | 20 | >=(mus) ..
<(mus) : IOPS IO_latency/DeltaT |IOPS Distribution | 21 | 0 -> 1 : 14 0 |# | 22 | 1 -> 2 : 0 0 | | 23 | 2 -> 4 : 1 3 |# | 24 | 4 -> 8 : 0 0 | | 25 | 8 -> 16 : 0 0 | | 26 | 16 -> 32 : 1 24 |# | 27 | 32 -> 64 : 6 288 |# | 28 | 64 -> 128 : 12 1152 |# | 29 | 128 -> 256 : 45 8640 |# | 30 | 256 -> 512 : 1396 536064 |## | 31 | 512 -> 1024 : 23945 18389760 |########################### | 32 | 1024 -> 2048 : 34846 53523456 |######################################| 33 | 2048 -> 4096 : 3584 11010048 |#### | 34 | 4096 -> 8192 : 243 1492992 |# | 35 | 8192 -> 16384 : 116 1425408 |# | 36 | 37 | >=(mus) .. <(mus) : IOPS IO_latency/DeltaT |IOPS Distribution | 38 | 0 -> 1 : 14 0 |# | 39 | 1 -> 2 : 0 0 | | 40 | 2 -> 4 : 1 3 |# | 41 | 4 -> 8 : 0 0 | | 42 | 8 -> 16 : 2 24 |# | 43 | 16 -> 32 : 2 48 |# | 44 | 32 -> 64 : 5 240 |# | 45 | 64 -> 128 : 5 480 |# | 46 | 128 -> 256 : 38 7296 |# | 47 | 256 -> 512 : 1551 595584 |## | 48 | 512 -> 1024 : 28627 21985536 |############################# | 49 | 1024 -> 2048 : 37716 57931776 |######################################| 50 | 2048 -> 4096 : 2255 6927360 |### | 51 | 4096 -> 8192 : 392 2408448 |# | 52 | 8192 -> 16384 : 108 1327104 |# | 53 | 16384 -> 32768 : 0 0 | | 54 | 32768 -> 65536 : 41 2015232 |# | 55 | ^C 56 | Ending tracing... 57 | -------------------------------------------------------------------------------- /Ftrace/iolatency_micro: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | # iolatency_micro - summarize block device I/O latency as a histogram. 4 | # Written using Linux ftrace. 5 | # 6 | # This shows the distribution of latency, allowing modes and latency outliers 7 | # to be identified and studied. 8 | # 9 | # USAGE: ./iolatency_micro [-hQT] [-d device] [-i iotype] [interval [count]] 10 | # 11 | # REQUIREMENTS: FTRACE CONFIG and block:block_rq_* tracepoints, which you may 12 | # already have on recent kernels. 
13 | # 14 | # OVERHEAD: block device I/O issue and completion events are traced and buffered 15 | # in-kernel, then processed and summarized in user space. There may be 16 | # measurable overhead with this approach, relative to the block device IOPS. 17 | # 18 | # This was written as a proof of concept for ftrace. 19 | # 20 | # A modified version of perf-tools iolatency, originally at: https://github.com/brendangregg/perf-tools 21 | # 22 | # COPYRIGHT: Copyright (c) 2014 Brendan Gregg. 23 | # Modified and extended in July 2015 by Luca Canali. 24 | # 25 | # This program is free software; you can redistribute it and/or 26 | # modify it under the terms of the GNU General Public License 27 | # as published by the Free Software Foundation; either version 2 28 | # of the License, or (at your option) any later version. 29 | # 30 | # This program is distributed in the hope that it will be useful, 31 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 32 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 33 | # GNU General Public License for more details. 34 | # 35 | # You should have received a copy of the GNU General Public License 36 | # along with this program; if not, write to the Free Software Foundation, 37 | # Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 38 | # 39 | # (http://www.gnu.org/copyleft/gpl.html) 40 | # 41 | # 20-Jul-2014 Brendan Gregg Created this. 
42 | # July 2015 Luca.Canali@cern.ch made minor changes and additions 43 | # and renamed the script to iolatency_micro 44 | # The script now reports latency in microseconds 45 | # IOPS and time waited per bucket are now reported 46 | # The option -m has been added in particular to avoid 47 | # double counting of IOPS when using device mapper 48 | 49 | ### default variables 50 | tracing=/sys/kernel/debug/tracing 51 | flock=/var/tmp/.ftrace-lock 52 | bufsize_kb=4096 53 | opt_device=0; device=; opt_major=0; majordev=; opt_iotype=0; iotype=; opt_timestamp=0 54 | opt_interval=0; interval=1; opt_count=0; count=0; opt_queue=0 55 | trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section 56 | 57 | function usage { 58 | cat <<-END >&2 59 | USAGE: iolatency [-hQT] [-d device] [-m major_device] [-i iotype] [interval [count]] 60 | -d device # device string (eg, "202,1") 61 | -m major_device # major device number (eg, 253 for device mapper) 62 | -i iotype # match type (eg, '*R*' for all reads) 63 | -Q # use queue insert as start time 64 | -T # timestamp on output 65 | -h # this usage message 66 | interval # summary interval, seconds (default 1) 67 | count # number of summaries 68 | eg, 69 | iolatency # summarize latency every second 70 | iolatency -Q # include block I/O queue time 71 | iolatency 5 2 # 2 x 5 second summaries 72 | iolatency -i '*R*' # trace reads 73 | iolatency -d 202,1 # trace device 202,1 only 74 | 75 | See the man page and example file for more info. 76 | END 77 | exit 78 | } 79 | 80 | function warn { 81 | if ! eval "$@"; then 82 | echo >&2 "WARNING: command failed \"$@\"" 83 | fi 84 | } 85 | 86 | function end { 87 | # disable tracing 88 | echo 2>/dev/null 89 | echo "Ending tracing..."
2>/dev/null 90 | cd $tracing 91 | warn "echo 0 > events/block/$b_start/enable" 92 | warn "echo 0 > events/block/block_rq_complete/enable" 93 | if (( opt_device || opt_iotype || opt_major )); then 94 | warn "echo 0 > events/block/$b_start/filter" 95 | warn "echo 0 > events/block/block_rq_complete/filter" 96 | fi 97 | warn "echo > trace" 98 | (( wroteflock )) && warn "rm $flock" 99 | } 100 | 101 | function die { 102 | echo >&2 "$@" 103 | exit 1 104 | } 105 | 106 | function edie { 107 | # die with a quiet end() 108 | echo >&2 "$@" 109 | exec >/dev/null 2>&1 110 | end 111 | exit 1 112 | } 113 | 114 | ### process options 115 | while getopts m:d:hi:QT opt 116 | do 117 | case $opt in 118 | d) opt_device=1; device=$OPTARG ;; 119 | m) opt_major=1; majordev=$OPTARG ;; 120 | i) opt_iotype=1; iotype=$OPTARG ;; 121 | Q) opt_queue=1 ;; 122 | T) opt_timestamp=1 ;; 123 | h|?) usage ;; 124 | esac 125 | done 126 | shift $(( $OPTIND - 1 )) 127 | if (( $# )); then 128 | opt_interval=1 129 | interval=$1 130 | shift 131 | fi 132 | if (( $# )); then 133 | opt_count=1 134 | count=$1 135 | fi 136 | if (( opt_device )); then 137 | major=${device%,*} 138 | minor=${device#*,} 139 | dev=$(( (major << 20) + minor )) 140 | fi 141 | if (( opt_major )); then 142 | devmin=$(( $majordev << 20 )) 143 | devmax=$(( ($majordev + 1) << 20 )) 144 | fi 145 | if (( opt_queue )); then 146 | b_start=block_rq_insert 147 | else 148 | b_start=block_rq_issue 149 | fi 150 | 151 | ### select awk 152 | [[ -x /usr/bin/mawk ]] && awk='mawk -W interactive' || awk=awk 153 | 154 | ### check permissions 155 | cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE? 156 | debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)" 157 | 158 | ### ftrace lock 159 | [[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock" 160 | echo $$ > $flock || die "ERROR: unable to write $flock."
161 | wroteflock=1 162 | 163 | ### setup and begin tracing 164 | warn "echo nop > current_tracer" 165 | warn "echo $bufsize_kb > buffer_size_kb" 166 | filter= 167 | if (( opt_iotype )); then 168 | filter="rwbs ~ \"$iotype\"" 169 | fi 170 | if (( opt_device )); then 171 | [[ "$filter" != "" ]] && filter="$filter && " 172 | filter="${filter}dev == $dev" 173 | fi 174 | if (( opt_major )); then 175 | [[ "$filter" != "" ]] && filter="$filter && " 176 | filter="${filter}((dev >= $devmin) && (dev < $devmax))" 177 | fi 178 | if (( opt_iotype || opt_device || opt_major )); then 179 | if ! echo "$filter" > events/block/$b_start/filter || \ 180 | ! echo "$filter" > events/block/block_rq_complete/filter 181 | then 182 | edie "ERROR: setting -d, -m or -i filter. Exiting." 183 | fi 184 | fi 185 | if ! echo 1 > events/block/$b_start/enable || \ 186 | ! echo 1 > events/block/block_rq_complete/enable; then 187 | edie "ERROR: enabling block I/O tracepoints. Exiting." 188 | fi 189 | etext= 190 | (( !opt_count )) && etext=" Ctrl-C to end." 191 | echo "Tracing block I/O. Output every DeltaT = $interval seconds.$etext" 192 | 193 | # 194 | # Determine output format. It may be one of the following (newest first): 195 | # TASK-PID CPU# |||| TIMESTAMP FUNCTION 196 | # TASK-PID CPU# TIMESTAMP FUNCTION 197 | # To differentiate between them, the number of header fields is counted, 198 | # and an offset set, to skip the extra column when needed.
199 | # 200 | offset=$($awk 'BEGIN { o = 0; } 201 | $1 == "#" && $2 ~ /TASK/ && NF == 6 { o = 1; } 202 | $2 ~ /TASK/ { print o; exit }' trace) 203 | 204 | ### print trace buffer 205 | warn "echo > trace" 206 | i=0 207 | while (( !opt_count || (i < count) )); do 208 | (( i++ )) 209 | sleep $interval 210 | 211 | # snapshots were added in 3.10 212 | if [[ -x snapshot ]]; then 213 | echo 1 > snapshot 214 | echo > trace 215 | cat snapshot 216 | else 217 | cat trace 218 | echo > trace 219 | fi 220 | 221 | (( opt_timestamp )) && printf "time %(%H:%M:%S)T:\n" -1 222 | echo "tick" 223 | done | \ 224 | $awk -v o=$offset -v opt_timestamp=$opt_timestamp -v b_start=$b_start -v interval=$interval ' 225 | function star(sval, smax, swidth) { 226 | stars = "" 227 | if (smax == 0) return "" 228 | for (si = 0; si < (swidth * sval / smax); si++) { 229 | stars = stars "#" 230 | } 231 | return stars 232 | } 233 | 234 | BEGIN { max_i = 0 } 235 | 236 | # common fields 237 | $1 != "#" { 238 | time = $(3+o); sub(":", "", time) 239 | dev = $(5+o) 240 | } 241 | 242 | # block I/O request 243 | $1 != "#" && $0 ~ b_start { 244 | # 245 | # example: (fields1..4+o) 202,1 W 0 () 12862264 + 8 [tar] 246 | # The cmd field "()" might contain multiple words (hex), 247 | # hence stepping from the right (NF-3). 
248 | # 249 | loc = $(NF-3) 250 | starts[dev, loc] = time 251 | next 252 | } 253 | 254 | # block I/O completion 255 | $1 != "#" && $0 ~ /rq_complete/ { 256 | # 257 | # example: (fields1..4+o) 202,1 W () 12862256 + 8 [0] 258 | # 259 | dir = $(6+o) 260 | loc = $(NF-3) 261 | 262 | if (starts[dev, loc] > 0) { 263 | latency_mus = 1000000 * (time - starts[dev, loc]) #changed from 1000 to 1000000 264 | i = 0 265 | for (mus = 1; latency_mus > mus; mus *= 2) { i++ } 266 | hist[i]++ 267 | if (i > max_i) 268 | max_i = i 269 | delete starts[dev, loc] 270 | } 271 | next 272 | } 273 | 274 | # timestamp 275 | $1 == "time" { 276 | lasttime = $2 277 | } 278 | 279 | # print summary 280 | $1 == "tick" { 281 | print "" 282 | if (opt_timestamp) 283 | print lasttime 284 | 285 | # find min_i value 286 | for (min_i = 0; min_i <= max_i; min_i++) { 287 | if (hist[min_i] > 0) 288 | break 289 | } 290 | 291 | # find max value 292 | max_v = 0 293 | for (i = 0; i <= max_i; i++) { 294 | if (hist[i] > max_v) 295 | max_v = hist[i] 296 | } 297 | 298 | # print histogram 299 | printf "%8s .. 
%-8s: %-10s %-17s |%-38s|\n", ">=(mus)", "<(mus)", 300 | "IOPS", "IO_latency/DeltaT", "IOPS Distribution" 301 | mus = 1 302 | from = 0 303 | for (i = 0; i <= max_i; i++) { 304 | if (i >= min_i-1) { 305 | printf "%8d -> %-8d: %-10d %-17d |%-38s|\n", from, mus, 306 | hist[i]/interval, hist[i]*from*1.5/interval, star(hist[i], max_v, 38) 307 | } 308 | from = mus 309 | mus *= 2 310 | } 311 | fflush() 312 | delete hist 313 | delete starts # invalid if events missed between snapshots 314 | max_i = 0 315 | } 316 | 317 | $0 ~ /LOST.*EVENTS/ { print "WARNING: " $0 > "/dev/stderr" } 318 | ' 319 | 320 | ### end tracing 321 | end 322 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 
25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. 
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. 
If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. 
You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "{}" 182 | replaced with your own identifying information. 
(Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright {yyyy} {name of copyright owner} 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | 203 | -------------------------------------------------------------------------------- /Perf_probes/README: -------------------------------------------------------------------------------- 1 | 2 | Linux perf and uprobes for tracing and profiling Oracle workloads. 
3 | Author: Luca.Canali@cern.ch 4 | January 2016 5 | 6 | See details at: 7 | http://externaltable.blogspot.com/2016/02/linux-perf-probes-for-oracle-tracing.html 8 | 9 | ---- 10 | Selected examples: 11 | 12 | Probes on the wait event interface: 13 | 14 | # export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/rdbms 15 | 16 | # perf probe -x $ORACLE_HOME/bin/oracle kskthewt timestamp=%di event=%si 17 | # perf probe -x $ORACLE_HOME/bin/oracle kskthbwt timestamp=%si event=%dx 18 | # perf probe -x $ORACLE_HOME/bin/oracle kews_update_wait_time wait_time=%si event=%r13 19 | 20 | # perf record -e probe_oracle:kews_update_wait_time -e probe_oracle:kskthbwt -e probe_oracle:kskthewt -p <pid> 21 | (press Ctrl-C to stop gathering data) 22 | # perf script 23 | 24 | Probe on SQL hard parsing: 25 | 26 | # perf probe -x $ORACLE_HOME/bin/oracle opiprs length=%dx sql='+0(%si)':"string" 27 | # perf record -e probe_oracle:opiprs -p <pid> 28 | # perf script 29 | 30 | ---- 31 | Note: a workaround is needed for Oracle 12c when using Perf on recent kernels. The workaround is simple and consists of tracing the next instruction (for example, kskthewt+2 instead of kskthewt); see [Oracle12_and_Perf](https://mahmoudhatem.wordpress.com/2017/03/22/workaround-for-linux-perf-probes-issue-for-oracle-tracing/) 32 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Linux Tracing Scripts 2 | Author: Luca.Canali@cern.ch 3 | 4 | This repository contains example scripts and tools for troubleshooting and performance analysis on Linux systems. It includes dynamic tracing scripts for Ftrace, Perf, SystemTap and BPF/bcc.
5 | 6 | | Directory | Short description 7 | | -------------------------- | ------------------------------------------------------------------------------------- 8 | | [Ftrace](Ftrace) | I/O latency histograms at microsecond resolution using ftrace 9 | | [Perf](Perf_probes) | Linux Perf and uprobes for Oracle tracing and profiling 10 | | [SystemTap_Linux_IO](SystemTap_Linux_IO) | SystemTap scripts for Linux I/O tracing and I/O latency measurements 11 | | [SystemTap_Userspace_Oracle](SystemTap_Userspace_Oracle) | SystemTap scripts for Oracle RDBMS troubleshooting and internals investigations using userspace dynamic tracing 12 | | [BPF-bcc_Userspace_Oracle](BPF-bcc_Userspace_Oracle) | BPF/bcc scripts for Oracle userspace tracing, mostly ports from previous SystemTap and Perf work 13 | 14 | Disclaimer: 15 | Many of the scripts provided here are experimental: they may cause unwanted effects, especially on busy production systems, and may be incompatible with your current setup or need some tweaking before running. 16 | 17 | Acknowledgements: 18 | - [Brendan Gregg](https://twitter.com/brendangregg) for many original ideas and tools that have inspired large parts of this work 19 | - [Frits Hoogland](https://twitter.com/fritshoogland) for collaboration on investigating Oracle internals and userspace tracing 20 | - Dev teams for Ftrace, SystemTap, Perf, BPF and bcc 21 | 22 | -------------------------------------------------------------------------------- /SystemTap_Linux_IO/README: -------------------------------------------------------------------------------- 1 | 2 | SystemTap Linux I/O probes 3 | Luca.Canali@cern.ch 4 | 5 | This directory contains SystemTap scripts to help troubleshoot and investigate Linux I/O. 6 | The following scripts investigate I/O latency and/or trace I/O at the level of the block I/O interface. They are variations of the same idea: measuring I/O latency from kernel.trace("block_rq_issue") to kernel.trace("block_rq_complete").
7 | This has the added advantage of working *without* kernel debuginfo, as in the original ideas of Brendan Gregg's systemtap-lwtools (https://github.com/brendangregg/systemtap-lwtools) 8 | The scripts based on kernel.trace("block_rq_issue") and kernel.trace("block_rq_complete") have been tested on RHEL/OL 5,6,7 (i.e. up to kernel 3.10); however, they will most likely not work with more recent kernels, for example because of changes to kernel.trace probes and to struct bio. 9 | 10 | SCRIPT NAME -> SHORT DESCRIPTION 11 | blockio_rq_issue_basic_latencyhistogram.stp -> basic script to measure latency histograms for 12 | block I/O 13 | blockio_rq_issue_latencyhistogram.stp -> a more refined script for block I/O histograms 14 | blockio_rq_issue_latencyhistogram_new.stp -> same as above, with additional optimizations; 15 | requires SystemTap 2.6 or higher 16 | blockio_rq_issue_filter_latencyhistogram.stp -> block I/O histograms with additional configurable 17 | filters 18 | blockio_rq_issue_filter_latencyhistogram_new.stp -> same as above, with additional optimizations; 19 | requires SystemTap 2.6 or higher 20 | blockio_latency_outliers_per_device.stp -> gather stats on block I/O latency per disk and 21 | report events with latency greater than a threshold 22 | blockio_rq_all_TRACE_EXPERIMENTAL.stp -> trace block I/O calls, use for troubleshooting, 23 | beware of the overhead. SystemTap 2.6 or higher 24 | blockio_rq_filter_TRACE_EXPERIMENTAL.stp -> tracing of block I/O with filters on device and 25 | operation, use for troubleshooting, 26 | beware of the overhead 27 | pread_latencyhistogram.stp -> histograms of the "pread" system call latency 28 | read_latencyhistogram.stp -> histograms and total latency of the "read" call 29 | read_latencyhistogram_filterPID.stp -> histograms and total latency of the "read" call with 30 | additional filter on target process 31 | 32 | The scripts below produce block I/O histograms using the ioblock and ioscheduler providers.
33 | They are similar to the scripts above, but in this case kernel debuginfo is needed. 34 | 35 | blockio_ioblock_latencyhistogram.stp -> latency histograms using the ioblock provider 36 | blockio_ioscheduler_latencyhistogram.stp -> latency histograms using the ioscheduler provider 37 | 38 | Tracing scripts for block I/O and for sync and async calls (including tracing libaio) 39 | 40 | sync_asyncio_and_libaio_TRACE.stp -> tracing of I/O calls including asynchronous I/O 41 | and (optionally) libaio. 42 | Additional references: 43 | http://externaltable.blogspot.com/2015/07/diagnose-high-latency-io-operations.html 44 | 45 | -------------------------------------------------------------------------------- /SystemTap_Linux_IO/blockio_ioblock_latencyhistogram.stp: -------------------------------------------------------------------------------- 1 | #!/usr/bin/stap 2 | # 3 | # blockio_ioblock_latencyhistogram.stp 4 | # 5 | # This is a SystemTap script to gather block I/O latency from the Linux kernel block I/O interface 6 | # and print I/O latency histograms 7 | # 8 | # By Luca.Canali@cern.ch, March 2015 9 | # 10 | # Note: this script requires kernel debuginfo 11 | # 12 | 13 | global LatencyTimes, RequestTime 14 | 15 | probe ioblock_trace.request { 16 | if ($bio) 17 | RequestTime[$bio] = gettimeofday_us() 18 | } 19 | 20 | probe ioblock.end { 21 | t = gettimeofday_us() 22 | s = RequestTime[$bio] 23 | delete RequestTime[$bio] 24 | if (s > 0) 25 | LatencyTimes <<< (t-s) 26 | } 27 | 28 | probe timer.sec(3) { 29 | if (@count(LatencyTimes) > 0) 30 | println(@hist_log(LatencyTimes)) 31 | delete(LatencyTimes) 32 | } 33 | 34 | -------------------------------------------------------------------------------- /SystemTap_Linux_IO/blockio_ioscheduler_latencyhistogram.stp: -------------------------------------------------------------------------------- 1 | #!/usr/bin/stap 2 | # 3 | # blockio_ioscheduler_latencyhistogram.stp 4 | # 5 | # This is a SystemTap script to gather block I/O
latency from the Linux kernel block I/O interface 6 | # and print I/O latency histograms 7 | # 8 | # By Luca.Canali@cern.ch, March 2015 9 | # 10 | # Note: this script requires kernel debuginfo 11 | # 12 | 13 | global LatencyTimes, RequestTime 14 | 15 | probe ioscheduler.elv_next_request.return { 16 | rq = $return 17 | if (rq != 0) { 18 | RequestTime[rq] = gettimeofday_us() 19 | } 20 | } 21 | 22 | probe ioscheduler.elv_completed_request { 23 | t = gettimeofday_us() 24 | s = RequestTime[$rq] 25 | delete RequestTime[$rq] 26 | if (s > 0) 27 | LatencyTimes <<< (t-s) 28 | } 29 | 30 | probe timer.sec(3) { 31 | if (@count(LatencyTimes) > 0) 32 | println(@hist_log(LatencyTimes)) 33 | delete(LatencyTimes) 34 | } 35 | 36 | -------------------------------------------------------------------------------- /SystemTap_Linux_IO/blockio_latency_outliers_per_device.stp: -------------------------------------------------------------------------------- 1 | #!/usr/bin/stap 2 | # 3 | # blockio_latency_outliers_per_device.stp 4 | # 5 | # This is a SystemTap script to drill down on latency outliers from the Linux kernel block I/O interface 6 | # 7 | # By Luca.Canali@cern.ch, July 2015 8 | # 9 | # Note: the use of kernel.trace("block_rq_issue") and kernel.trace("block_rq_complete") allows the 10 | # script to work without the need to install kernel debuginfo. 11 | # 12 | # Notes: struct bio, details at http://lxr.free-electrons.com/source/include/linux/blk_types.h 13 | # struct request, details at http://lxr.free-electrons.com/source/include/linux/blkdev.h 14 | # 15 | # Usage: stap -v blockio_latency_outliers_per_device.stp 16 | # 17 | # Tested on SystemTap versions 1.8, 2.5 and 2.6, RHEL/OL 5,6,7 (i.e.
up to kernel 3.10) 18 | # Note: this script will not work on some recent kernels due for example to changes to 19 | # kernel.trace probes and to struct bio 20 | # 21 | 22 | global RequestDevice, RequestSector, RequestTime[100000], LatencyTimes[1000] 23 | 24 | # variables used to define filters, edit as needed 25 | global IO_size = -1 # this will be used as a filter for the I/O request size 26 | # the value 8192 targets 8KB operations for Oracle single-block I/O 27 | # use the value -1 to disable this filter 28 | global IO_operation = -1 # filter on operation type, tested on bit 0 of bi_rw (bi_rw & 0x1) 29 | # a value of 0 considers only read operations (the value 1 is for write) 30 | # use the value -1 to disable this filter 31 | global IO_devmaj = -1 # this will be used as a filter: device major number ( -1 means no filter) 32 | # for example, the value 253 considers only devices with major number 253 (device mapper block devices) 33 | global IO_devmin = -1 # this will be used as a filter: device minor number (or -1 if no filter) 34 | 35 | global LatencyThreshold = 500000 # latency threshold (microseconds) after which a warning is printed to stdout, 36 | # it is overwritten by script parameter 2, if present, via the begin probe 37 | 38 | probe begin { 39 | %( $# > 1 %? LatencyThreshold = $2 %) # if a second parameter is provided, use it to overwrite the latency warning threshold 40 | printf("Measuring block I/O latency and statistics\n") 41 | printf("A warning will be printed for I/Os with latency higher than %d microseconds\n", LatencyThreshold) 42 | printf("Statistics will be printed every %d seconds.
Press CTRL-C to stop\n\n", $1) 43 | } 44 | 45 | # probe on block I/O as it is issued, record the I/O if it matches the filters 46 | probe kernel.trace("block_rq_issue") { 47 | t = gettimeofday_us() 48 | # examine only the first bio record for simplicity (it seems that more than 1 bio is rare anyways) 49 | # rq type is struct request 50 | if ($rq->bio) # discard entries without a bio record 51 | if ($rq->bio->bi_bdev) # discard entries without a device associated 52 | if ($rq->bio->bi_flags & 8) # check BIO_SEG_VALID, introduced to avoid double counting with device mapper 53 | if ($rq->bio->bi_size > 0) # discard entries with size<=0 54 | if ((IO_operation == -1) ||(($rq->bio->bi_rw & 0x1) == IO_operation)) # filter on operation type (read or write) 55 | if ((IO_size == -1) || ($rq->bio->bi_size == IO_size)) { # filter on I/O size 56 | devmaj = $rq->bio->bi_bdev->bd_dev >> 20 57 | devmin = $rq->bio->bi_bdev->bd_dev - (devmaj << 20) 58 | if ((devmaj == IO_devmaj ) || (IO_devmaj == -1)) # optional filter on device major number 59 | if ((devmin == IO_devmin ) || (IO_devmin == -1)) { # optional filter on device minor number 60 | RequestTime[$rq->bio] = t # record the start time of this block I/O 61 | RequestDevice[$rq->bio] = kernel_string($rq->bio->bi_bdev->bd_disk->disk_name) 62 | RequestSector[$rq->bio] = $rq->bio->bi_sector 63 | } 64 | } 65 | } 66 | 67 | probe kernel.trace("block_rq_complete") { 68 | t = gettimeofday_us() 69 | if ($rq->bio) { # discard entries without a bio record 70 | s = RequestTime[$rq->bio] 71 | if (s > 0) { 72 | delta = t-s 73 | LatencyTimes[RequestDevice[$rq->bio]] <<< delta # populates latency histogram 74 | if (delta > LatencyThreshold) { 75 | printf("latency warning, >%d microsec: device=%s, sector=%d, latency=%d\n", 76 | LatencyThreshold, RequestDevice[$rq->bio], RequestSector[$rq->bio], delta) 77 | } 78 | delete RequestTime[$rq->bio] # clear the stored info for this $rq 79 | delete RequestDevice[$rq->bio] 80 | delete 
RequestSector[$rq->bio] 81 | } 82 | } 83 | } 84 | 85 | probe timer.sec($1) { 86 | printf("I/O latency basic statistics per device, measurement time: %d seconds\nLatency measured in microseconds\n", $1) 87 | printf("%-12s %14s %14s %14s %14s\n\n", "Disk name", "Num I/Os", "Min latency", "Avg latency", "Max latency") 88 | foreach ([disk] in LatencyTimes) { 89 | printf("%-12s %14d %14d %14d %14d\n", 90 | disk, @count(LatencyTimes[disk]), @min(LatencyTimes[disk]), @avg(LatencyTimes[disk]), @max(LatencyTimes[disk])) 91 | } 92 | printf("-----\n\n") 93 | delete(LatencyTimes) # resets all stats 94 | } 95 | 96 | -------------------------------------------------------------------------------- /SystemTap_Linux_IO/blockio_rq_all_TRACE_EXPERIMENTAL.stp: -------------------------------------------------------------------------------- 1 | #!/usr/bin/stap 2 | # 3 | # blockio_rq_all_TRACE_EXPERIMENTAL.stp 4 | # 5 | # Trace all block I/O calls, USE WITH CARE, avoid running this on I/O-busy systems 6 | # 7 | # Luca.Canali@cern.ch 8 | # July 2015 9 | # Requires SystemTap 2.6 or higher 10 | # Tested on RHEL/EL 6.x,7.x 11 | # Note: this script will not work on some recent kernels due for example to changes to 12 | # kernel.trace probes and struct bio 13 | # 14 | # Usage: stap -v blockio_rq_all_TRACE_EXPERIMENTAL.stp 15 | # 16 | 17 | probe kernel.trace("block_rq_issue") { 18 | printf("pid=%d, block_rq_issue, rq=%lu...\n",pid(), $rq) 19 | for (curr_bio = $rq->bio; curr_bio; curr_bio = curr_bio->bi_next ) 20 | if (curr_bio->bi_bdev) { 21 | devmaj = curr_bio->bi_bdev->bd_dev >> 20 22 | devmin = curr_bio->bi_bdev->bd_dev - (devmaj << 20) 23 | printf("pid=%d, ....bio=%lu, bi_size=%d,bi_sector=%d,dev=%d,%d\n", pid(), curr_bio, curr_bio->bi_size, curr_bio->bi_sector, devmaj, devmin ) 24 | } 25 | else 26 | printf("pid=%d, ....bio=%lu, bi_size=%d,bi_sector=%d,dev=%d\n", pid(), curr_bio, curr_bio->bi_size, curr_bio->bi_sector, 0 ) 27 | } 28 | 29 | probe kernel.trace("block_rq_insert") { 30 | 
printf("pid=%d, block_rq_insert, rq=%lu...\n",pid(), $rq) 31 | for (curr_bio = $rq->bio; curr_bio; curr_bio = curr_bio->bi_next ) 32 | if (curr_bio->bi_bdev) { 33 | devmaj = curr_bio->bi_bdev->bd_dev >> 20 34 | devmin = curr_bio->bi_bdev->bd_dev - (devmaj << 20) 35 | printf("pid=%d, ....bio=%lu, bi_size=%d,bi_sector=%d,dev=%d,%d\n", pid(), curr_bio, curr_bio->bi_size, curr_bio->bi_sector, devmaj, devmin ) 36 | } 37 | else 38 | printf("pid=%d, ....bio=%lu, bi_size=%d,bi_sector=%d,dev=%d\n", pid(), curr_bio, curr_bio->bi_size, curr_bio->bi_sector, 0 ) 39 | } 40 | 41 | probe kernel.trace("block_rq_complete") { 42 | printf("pid=%d, block_rq_complete, rq=%lu...\n",pid(), $rq) 43 | for (curr_bio = $rq->bio; curr_bio; curr_bio = curr_bio->bi_next ) 44 | if (curr_bio->bi_bdev) { 45 | devmaj = curr_bio->bi_bdev->bd_dev >> 20 46 | devmin = curr_bio->bi_bdev->bd_dev - (devmaj << 20) 47 | printf("pid=%d, ....bio=%lu, bi_size=%d,bi_sector=%d,dev=%d,%d\n", pid(), curr_bio, curr_bio->bi_size, curr_bio->bi_sector, devmaj, devmin ) 48 | } 49 | else 50 | printf("pid=%d, ....bio=%lu, bi_size=%d,bi_sector=%d,dev=%d\n", pid(), curr_bio, curr_bio->bi_size, curr_bio->bi_sector, 0 ) 51 | } 52 | 53 | -------------------------------------------------------------------------------- /SystemTap_Linux_IO/blockio_rq_filter_TRACE_EXPERIMENTAL.stp: -------------------------------------------------------------------------------- 1 | #!/usr/bin/stap 2 | # 3 | # Usage: stap -v blockio_rq_filter_TRACE_EXPERIMENTAL.stp 4 | # 5 | # Trace all block I/O calls for a given device, USE WITH CARE, avoid running this on I/O-busy systems 6 | # Edit the script with the details of the device to trace before running 7 | # 8 | # Luca.Canali@cern.ch 9 | # July 2015 10 | # Requires SystemTap 2.6 or higher 11 | # Tested on RHEL/EL 6.x,7.x 12 | # Note: this script will not work on some recent kernels due for example to changes to 13 | # kernel.trace probes and to struct bio 14 | # 15 | # Usage: stap -v 
blockio_rq_filter_TRACE_EXPERIMENTAL.stp 16 | # 17 | 18 | global RequestTime[1000] 19 | 20 | # variables used to define filters, edit as needed 21 | global IO_size = -1 # this will be used as a filter for the I/O request size 22 | # the value 8192 targets 8KB operations for Oracle single-block I/O 23 | # use the value -1 to disable this filter 24 | global IO_operation = -1 # filter on operation type, tested on bit 0 of bi_rw (bi_rw & 0x1) 25 | # a value of 0 considers only read operations (the value 1 is for write) 26 | # use the value -1 to disable this filter 27 | global IO_devmaj = -1 # this will be used as a filter: device major number ( -1 means no filter) 28 | # for example, the value 253 considers only devices with major number 253 (device mapper block devices) 29 | # or set here the major number of the device you want to filter 30 | # (use ls -l to find the major and minor numbers) 31 | global IO_devmin = -1 # this will be used as a filter: device minor number (or -1 if no filter) 32 | 33 | 34 | # probe on block I/O as it is issued, record the I/O if it matches the filters 35 | probe kernel.trace("block_rq_issue") { 36 | t = gettimeofday_us() 37 | # examine only the first bio record for simplicity (it seems that more than 1 bio is rare anyways) 38 | # rq type is struct request 39 | if ($rq->bio) # discard entries without a bio record 40 | if ($rq->bio->bi_bdev) # discard entries without a device associated 41 | if ($rq->bio->bi_flags & 8) # check BIO_SEG_VALID, introduced to avoid double counting with device mapper 42 | if ($rq->bio->bi_size > 0) # discard entries with size<=0 43 | if ((IO_operation == -1) ||(($rq->bio->bi_rw & 0x1) == IO_operation)) # filter on operation type (read or write) 44 | if ((IO_size == -1) || ($rq->bio->bi_size == IO_size)) { # filter on I/O size 45 | devmaj = $rq->bio->bi_bdev->bd_dev >> 20 46 | devmin = $rq->bio->bi_bdev->bd_dev - (devmaj << 20) 47 | if ((devmaj == IO_devmaj ) || (IO_devmaj == -1)) # optional filter on device major
number 48 | if ((devmin == IO_devmin ) || (IO_devmin == -1)) { # optional filter on device minor number 49 | printf("block_io rq, bio=%lu, dev=%s, sector=%d\n",$rq->bio, 50 | kernel_string($rq->bio->bi_bdev->bd_disk->disk_name), $rq->bio->bi_sector) 51 | RequestTime[$rq->bio] = t # record the start time of this block I/O 52 | } 53 | } 54 | } 55 | 56 | probe kernel.trace("block_rq_complete") { 57 | t = gettimeofday_us() 58 | if ($rq->bio) { # discard entries without a bio record 59 | s = RequestTime[$rq->bio] 60 | if (s > 0) { 61 | delta = t-s 62 | printf(".....block_rq_complete, latency microsecond:%d, bio=%lu\n",delta,$rq->bio) 63 | } 64 | delete RequestTime[$rq->bio] # clear the stored info for this $rq 65 | } 66 | } 67 | 68 | -------------------------------------------------------------------------------- /SystemTap_Linux_IO/blockio_rq_issue_basic_latencyhistogram.stp: -------------------------------------------------------------------------------- 1 | #!/usr/bin/stap 2 | # 3 | # blockio_rq_issue_basic_latencyhistogram.stp 4 | # 5 | # This is a SystemTap script to gather block I/O latency from the Linux kernel block I/O interface 6 | # and print I/O latency histograms 7 | # 8 | # See also blockio_rq_issue_latencyhistogram.stp and blockio_rq_issue_filter_latencyhistogram.stp 9 | # for more sophisticated versions of this type of latency measurement 10 | # 11 | # By Luca.Canali@cern.ch, March 2015 12 | # 13 | # This script is based on original ideas of biolatency-nd.stp of systemtap-lwtools 14 | # by Brendan Gregg 15 | # 16 | # Note the use of kernel.trace("block_rq_issue") and kernel.trace("block_rq_complete") 17 | # this allows the probe to work without the need to install kernel debuginfo 18 | # 19 | 20 | global LatencyTimes, RequestTime 21 | 22 | probe kernel.trace("block_rq_issue") { 23 | RequestTime[$rq] = gettimeofday_us() 24 | } 25 | 26 | probe kernel.trace("block_rq_complete") { 27 | t = gettimeofday_us() 28 | s = RequestTime[$rq] 29 | if (s > 0) { 30 | 
LatencyTimes <<< (t-s) 31 | delete RequestTime[$rq] 32 | } 33 | } 34 | 35 | probe timer.sec(3) { 36 | if (@count(LatencyTimes) > 0) 37 | println(@hist_log(LatencyTimes)) 38 | delete(LatencyTimes) 39 | } 40 | -------------------------------------------------------------------------------- /SystemTap_Linux_IO/blockio_rq_issue_filter_latencyhistogram.stp: -------------------------------------------------------------------------------- 1 | #!/usr/bin/stap 2 | # 3 | # blockio_rq_issue_filter_latencyhistogram.stp 4 | # 5 | # This is a SystemTap script to gather block I/O latency from the Linux kernel block I/O interface 6 | # 7 | # By Luca.Canali@cern.ch, July 2015 8 | # This script is based on original ideas by Brendan Gregg; see also biolatency-nd.stp and systemtap-lwtools 9 | # 10 | # The script blockio_rq_issue_filter_latencyhistogram.stp records latency histograms for block I/Os with filters; 11 | # by default, the filters select 8KB read operations 12 | # 13 | # Note: the use of kernel.trace("block_rq_issue") and kernel.trace("block_rq_complete") allows the 14 | # script to work without the need to install kernel debuginfo. 15 | # 16 | # Notes: struct bio, details at http://lxr.free-electrons.com/source/include/linux/blk_types.h 17 | # struct request, details at http://lxr.free-electrons.com/source/include/linux/blkdev.h 18 | # 19 | # Usage: stap -v blockio_rq_issue_filter_latencyhistogram.stp 20 | # 21 | # Tested on SystemTap versions 1.8, 2.5 and 2.6, RHEL/OL 5,6,7 (i.e.
up to kernel 3.10) 22 | # Note: this script will not work on some recent kernels due for example to changes to 23 | # kernel.trace probes and struct bio 24 | # 25 | 26 | global LatencyTimes, RequestTime[100000] 27 | 28 | # variables used to define filters, edit as needed 29 | global IO_size = 8192 # this will be used as a filter for the I/O request size 30 | # the value 8192 targets 8KB operations for Oracle single-block I/O 31 | # use the value -1 to disable this filter 32 | global IO_operation = 0 # filter on operation type, tested on bit 0 of bi_rw (bi_rw & 0x1) 33 | # a value of 0 considers only read operations (the value 1 is for write) 34 | # use the value -1 to disable this filter 35 | global IO_devmaj = -1 # this will be used as a filter: device major number ( -1 means no filter) 36 | # for example, the value 253 considers only devices with major number 253 (device mapper block devices) 37 | global IO_devmin = -1 # this will be used as a filter: device minor number (or -1 if no filter) 38 | 39 | probe begin { 40 | printf("Block I/O latency histograms from kernel trace points\n") 41 | printf("Filters:\n") 42 | printf(" IO_size = %d\n", IO_size) 43 | printf(" IO_operation = %d (0=read, 1=write, -1=disable filter)\n", IO_operation ) 44 | printf(" IO_devmaj = %d (-1=disable filter)\n", IO_devmaj) 45 | printf(" IO_devmin = %d (-1=disable filter)\n\n", IO_devmin) 46 | } 47 | 48 | # probe on block I/O as it is issued, record the I/O if it matches the filters 49 | probe kernel.trace("block_rq_issue") { 50 | t = gettimeofday_us() 51 | # examine only the first bio record for simplicity (it seems that more than 1 bio is rare anyways) 52 | # rq type is struct request 53 | if ($rq->bio) # discard entries without a bio record 54 | if ($rq->bio->bi_bdev) # discard entries without a device associated 55 | if ($rq->bio->bi_flags & 8) # check BIO_SEG_VALID, introduced to avoid double counting with device mapper 56 | if ($rq->bio->bi_size > 0) # discard entries with size<=0 57
| if ((IO_operation == -1) ||(($rq->bio->bi_rw & 0x1) == IO_operation)) # filter on operation type (read or write) 58 | if ((IO_size == -1) || ($rq->bio->bi_size == IO_size)) { # filter on I/O size 59 | devmaj = $rq->bio->bi_bdev->bd_dev >> 20 60 | devmin = $rq->bio->bi_bdev->bd_dev - (devmaj << 20) 61 | if ((devmaj == IO_devmaj ) || (IO_devmaj == -1)) # optional filter on device major number 62 | if ((devmin == IO_devmin ) || (IO_devmin == -1)) # optional filter on device minor number 63 | RequestTime[$rq->bio] = t # record the start time of this block I/O 64 | } 65 | } 66 | 67 | # I/O is finished, measure end time and add to histogram 68 | probe kernel.trace("block_rq_complete") { 69 | t = gettimeofday_us() 70 | if ($rq->bio) { # discard entries without a bio record 71 | s = RequestTime[$rq->bio] 72 | if (s > 0) { 73 | LatencyTimes <<< (t-s) # populates latency histogram 74 | delete RequestTime[$rq->bio] # clears the timer for this block I/O 75 | } 76 | } 77 | } 78 | 79 | # print histogram every $1 seconds ($1 is the first script parameter) 80 | probe timer.sec($1) { 81 | printf("Block I/O latency histogram, measurement time: %d seconds, I/O count: %d\n", $1, @count(LatencyTimes)) 82 | printf("Value = latency bucket (microseconds), count=I/O operations in %d seconds\n", $1) 83 | if (@count(LatencyTimes) > 0) 84 | println(@hist_log(LatencyTimes)) 85 | delete(LatencyTimes) 86 | } 87 | 88 | 89 | -------------------------------------------------------------------------------- /SystemTap_Linux_IO/blockio_rq_issue_filter_latencyhistogram_new.stp: -------------------------------------------------------------------------------- 1 | #!/usr/bin/stap 2 | # 3 | # blockio_rq_issue_filter_latencyhistogram_new.stp 4 | # 5 | # This is a SystemTap script to gather block I/O latency from the Linux kernel block I/O interface 6 | # 7 | # By Luca.Canali@cern.ch, July 2015 8 | # This script is based on original ideas by Brendan Gregg; see also biolatency-nd.stp and systemtap-lwtools 9 |
# 10 | # The script blockio_rq_issue_filter_latencyhistogram_new.stp records latency histograms for block I/Os with filters; 11 | # by default, the filters select 8KB read operations 12 | # 13 | # Note: the use of kernel.trace("block_rq_issue") and kernel.trace("block_rq_complete") allows the 14 | # script to work without the need to install kernel debuginfo. 15 | # 16 | # Notes: struct bio, details at http://lxr.free-electrons.com/source/include/linux/blk_types.h 17 | # struct request, details at http://lxr.free-electrons.com/source/include/linux/blkdev.h 18 | # *this version requires SystemTap v2.6 or higher* 19 | # 20 | # Usage: stap -v blockio_rq_issue_filter_latencyhistogram_new.stp 21 | # 22 | # Tested on SystemTap version 2.6 and RHEL/OL 7 23 | # Note: this script will not work on some recent kernels due for example to changes to 24 | # kernel.trace probes and to struct bio 25 | # 26 | 27 | global LatencyTimes, RequestTime[100000] 28 | 29 | # variables used to define filters, edit as needed 30 | global IO_size = 8192 # this will be used as a filter for the I/O request size 31 | # the value 8192 targets 8KB operations for Oracle single-block I/O 32 | # use the value -1 to disable this filter 33 | global IO_operation = 0 # filter on operation type, tested on bit 0 of bi_rw (bi_rw & 0x1)
34 | # a value of 0 considers only read operations (the value 1 is for write) 35 | # use the value -1 to disable this filter 36 | global IO_devmaj = -1 # this will be used as a filter: device major number ( -1 means no filter) 37 | # for example, the value 253 considers only devices with major number 253 (device mapper block devices) 38 | global IO_devmin = -1 # this will be used as a filter: device minor number (or -1 if no filter) 39 | 40 | probe begin { 41 | printf("Block I/O latency histograms from kernel trace points\n") 42 | printf("Filters:\n") 43 | printf(" IO_size = %d\n", IO_size) 44 | printf(" IO_operation = %d (0=read, 1=write, -1=disable filter)\n", IO_operation ) 45 | printf(" IO_devmaj = %d (-1=disable filter)\n", IO_devmaj) 46 | printf(" IO_devmin = %d (-1=disable filter)\n\n", IO_devmin) 47 | } 48 | 49 | # probe on block I/O as it is issued, record the I/O if it matches the filters 50 | probe kernel.trace("block_rq_issue") { 51 | t = gettimeofday_us() 52 | # loop on all bio records in this $rq. Normally there is only 1, but there could be more.
53 | # only collect info if operation type and size match requirements 54 | # rq type is struct request 55 | for (curr_bio = $rq->bio; curr_bio; curr_bio = curr_bio->bi_next) { # discard entries without a bio record 56 | if (curr_bio->bi_bdev) # discard entries without a device associated 57 | if (curr_bio->bi_flags & 8) # check BIO_SEG_VALID, introduced to avoid double counting with device mapper 58 | if (curr_bio->bi_size > 0) # discard entries with size<=0 59 | if ((IO_operation == -1) || ((curr_bio->bi_rw & 0x1) == IO_operation)) # filter on operation type (read or write) 60 | if ((IO_size == -1) || (curr_bio->bi_size == IO_size)) { # filter on I/O size 61 | devmaj = curr_bio->bi_bdev->bd_dev >> 20 62 | devmin = curr_bio->bi_bdev->bd_dev - (devmaj << 20) 63 | if ((devmaj == IO_devmaj) || (IO_devmaj == -1)) # optional filter on device major number 64 | if ((devmin == IO_devmin) || (IO_devmin == -1)) # optional filter on device minor number 65 | RequestTime[curr_bio] = t # record the start time of this block I/O 66 | } 67 | } 68 | } 69 | 70 | # I/O is finished, measure end time and add to histogram 71 | probe kernel.trace("block_rq_complete") { 72 | t = gettimeofday_us() 73 | for (curr_bio = $rq->bio; curr_bio; curr_bio = curr_bio->bi_next) { # discard entries without a bio record 74 | s = RequestTime[curr_bio] 75 | if (s > 0) { 76 | LatencyTimes <<< (t-s) # populates latency histogram 77 | delete RequestTime[curr_bio] # clears the timer for this block I/O 78 | } 79 | } 80 | } 81 | 82 | # print histogram every $1 seconds ($1 is the first script parameter) 83 | probe timer.sec($1) { 84 | printf("Block I/O latency histogram, measurement time: %d seconds, I/O count: %d\n", $1, @count(LatencyTimes)) 85 | printf("Value = latency bucket (microseconds), count=I/O operations in %d seconds\n", $1) 86 | if (@count(LatencyTimes) > 0) 87 | println(@hist_log(LatencyTimes)) 88 | delete(LatencyTimes) 89 | } 90 | 91 |
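The filter chain in the block_rq_issue probe of the script above can be modelled in user space, for example to check which bios a given combination of IO_size, IO_operation, IO_devmaj and IO_devmin would keep. The following Python sketch is an illustration only, under stated assumptions: the function names (decode_dev, matches, log2_bucket, histogram) are hypothetical and not part of this repository, the bio fields bi_rw, bi_size and bd_dev are passed in as plain integers instead of being read from struct bio, the bi_bdev and BIO_SEG_VALID checks are left out, and the 20-bit major/minor split simply mirrors the shifts the script applies to bd_dev.

```python
# Hypothetical user-space model of the per-bio filters in
# blockio_rq_issue_filter_latencyhistogram_new.stp (illustration only).
from collections import Counter

MINORBITS = 20  # mirrors the script: major = bd_dev >> 20, minor = low 20 bits


def decode_dev(bd_dev):
    """Split a kernel-internal dev_t value into (major, minor), as the script does."""
    devmaj = bd_dev >> MINORBITS
    devmin = bd_dev - (devmaj << MINORBITS)
    return devmaj, devmin


def matches(bi_rw, bi_size, bd_dev, io_size=8192, io_operation=0, io_devmaj=-1, io_devmin=-1):
    """Return True if a bio passes the same filters as the block_rq_issue probe.
    A value of -1 disables the corresponding filter, as with the script's globals."""
    if bi_size <= 0:  # discard entries with size <= 0
        return False
    if io_operation != -1 and (bi_rw & 0x1) != io_operation:  # 0 = read, 1 = write
        return False
    if io_size != -1 and bi_size != io_size:
        return False
    devmaj, devmin = decode_dev(bd_dev)
    if io_devmaj != -1 and devmaj != io_devmaj:
        return False
    if io_devmin != -1 and devmin != io_devmin:
        return False
    return True


def log2_bucket(latency_us):
    """Power-of-two bucket (0, 1, 2, 4, 8, ...) for a non-negative latency,
    similar in spirit to the rows printed by @hist_log."""
    if latency_us <= 0:
        return 0
    return 1 << (latency_us.bit_length() - 1)


def histogram(latencies_us):
    """Count latencies (in microseconds) per power-of-two bucket."""
    return Counter(log2_bucket(x) for x in latencies_us)


if __name__ == "__main__":
    dm_dev = (253 << MINORBITS) | 3  # hypothetical device-mapper device 253:3
    print(decode_dev(dm_dev))                          # (253, 3)
    print(matches(0, 8192, dm_dev))                    # 8KB read: True
    print(matches(1, 8192, dm_dev))                    # write is filtered out: False
    print(sorted(histogram([1, 3, 3, 700]).items()))   # [(1, 1), (2, 2), (512, 1)]
```

With the default settings (IO_size = 8192, IO_operation = 0, no device filters) only 8KB reads pass, which corresponds to the Oracle single-block read case the script targets.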
-------------------------------------------------------------------------------- /SystemTap_Linux_IO/blockio_rq_issue_latencyhistogram.stp: -------------------------------------------------------------------------------- 1 | #!/usr/bin/stap 2 | # 3 | # blockio_rq_issue_latencyhistogram.stp 4 | # 5 | # This is a SystemTap script to gather block I/O latency from the Linux kernel block I/O interface 6 | # 7 | # By Luca.Canali@cern.ch, July 2015 8 | # This script is based on original ideas by Brendan Gregg; see also biolatency-nd.stp and systemtap-lwtools 9 | # 10 | # Note: the use of kernel.trace("block_rq_issue") and kernel.trace("block_rq_complete") allows the 11 | # script to work without the need to install kernel debuginfo. 12 | # 13 | # Notes: struct bio, details at http://lxr.free-electrons.com/source/include/linux/blk_types.h 14 | # struct request, details at http://lxr.free-electrons.com/source/include/linux/blkdev.h 15 | # 16 | # Usage: stap -v blockio_rq_issue_latencyhistogram.stp 17 | # 18 | # Tested on Systemtap versions 1.8, 2.5 and 2.6, RHEL/OL 5, 6, 7 19 | # Note: this script will not work on some recent kernels due, for example, to changes to 20 | # kernel.trace probes and struct bio 21 | # 22 | 23 | global LatencyTimes, RequestTime[100000] 24 | 25 | probe begin { 26 | printf("Block I/O latency histograms from kernel trace points\n\n") 27 | } 28 | 29 | # probe on block I/O as it is issued 30 | probe kernel.trace("block_rq_issue") { 31 | t = gettimeofday_us() 32 | # examine only the first bio record for simplicity (requests with more than one bio appear to be rare) 33 | # rq type is struct request 34 | if ($rq->bio) # discard entries without a bio record 35 | if ($rq->bio->bi_bdev) # discard entries without a device associated 36 | if ($rq->bio->bi_flags & 8) # check BIO_SEG_VALID, introduced to avoid double counting with device mapper 37 | if ($rq->bio->bi_size > 0) # discard entries with size<=0 38 | RequestTime[$rq->bio] = t # record the start time of this
block I/O 39 | } 40 | 41 | # I/O is finished, measure end time and add to histogram 42 | probe kernel.trace("block_rq_complete") { 43 | t = gettimeofday_us() 44 | if ($rq->bio) { # discard entries without a bio record 45 | s = RequestTime[$rq->bio] 46 | if (s > 0) { 47 | LatencyTimes <<< (t-s) # populates latency histogram 48 | delete RequestTime[$rq->bio] # clears the timer for this block I/O 49 | } 50 | } 51 | } 52 | 53 | # print histogram every $1 seconds ($1 is the first script parameter) 54 | probe timer.sec($1) { 55 | printf("Block I/O latency histogram, measurement time: %d seconds, I/O count: %d\n", $1, @count(LatencyTimes)) 56 | printf("Value = latency bucket (microseconds), count=I/O operations in %d seconds\n", $1) 57 | if (@count(LatencyTimes) > 0) 58 | println(@hist_log(LatencyTimes)) 59 | delete(LatencyTimes) 60 | } 61 | 62 | 63 | -------------------------------------------------------------------------------- /SystemTap_Linux_IO/blockio_rq_issue_latencyhistogram_new.stp: -------------------------------------------------------------------------------- 1 | #!/usr/bin/stap 2 | # 3 | # blockio_rq_issue_latencyhistogram_new.stp 4 | # 5 | # This is a SystemTap script to gather block I/O latency from the Linux kernel block I/O interface 6 | # 7 | # By Luca.Canali@cern.ch, June 2015 8 | # This script is based on original ideas by Brendan Gregg; see also biolatency-nd.stp and systemtap-lwtools 9 | # 10 | # Note: the use of kernel.trace("block_rq_issue") and kernel.trace("block_rq_complete") allows the 11 | # script to work without the need to install kernel debuginfo.
12 | # 13 | # Notes: struct bio, details at http://lxr.free-electrons.com/source/include/linux/blk_types.h 14 | # struct request, details at http://lxr.free-electrons.com/source/include/linux/blkdev.h 15 | # *this version requires Systemtap 2.6 or higher* 16 | # 17 | # Usage: stap -v blockio_rq_issue_latencyhistogram_new.stp 18 | # 19 | # Tested on Systemtap version 2.6 and RHEL/OL 7 20 | # Note: this script will not work on some recent kernels due, for example, to changes to 21 | # kernel.trace probes and to struct bio 22 | # 23 | 24 | global LatencyTimes, RequestTime[100000] 25 | 26 | probe begin { 27 | printf("Block I/O latency histograms from kernel trace points\n\n") 28 | } 29 | 30 | # probe on block I/O as it is issued 31 | probe kernel.trace("block_rq_issue") { 32 | t = gettimeofday_us() 33 | # loop on all bio records in this $rq. Normally there is only 1, but there could be more. 34 | # rq type is struct request 35 | for (curr_bio = $rq->bio; curr_bio; curr_bio = curr_bio->bi_next) { # discard entries without a bio record 36 | if (!(curr_bio->bi_bdev)) # discard entries without a device associated 37 | continue 38 | if (curr_bio->bi_size <= 0) # discard entries with size<=0 39 | continue 40 | if (!(curr_bio->bi_flags & 8)) # check BIO_SEG_VALID, introduced to avoid double counting with device mapper 41 | continue 42 | RequestTime[curr_bio] = t # record the start time of this block I/O 43 | } 44 | } 45 | 46 | # I/O is finished, measure end time and add to histogram 47 | probe kernel.trace("block_rq_complete") { 48 | t = gettimeofday_us() 49 | for (curr_bio = $rq->bio; curr_bio; curr_bio = curr_bio->bi_next) { # discard entries without a bio record 50 | s = RequestTime[curr_bio] 51 | if (s > 0) { 52 | LatencyTimes <<< (t-s) # populates latency histogram 53 | delete RequestTime[curr_bio] # clears the timer for this block I/O 54 | } 55 | } 56 | } 57 | 58 | # print histogram every $1 seconds ($1 is the first script parameter) 59 | probe timer.sec($1) { 60 |
printf("Block I/O latency histogram, measurement time: %d seconds, I/O count: %d\n", $1, @count(LatencyTimes)) 61 | printf("Value = latency bucket (microseconds), count=I/O operations in %d seconds\n", $1) 62 | if (@count(LatencyTimes) > 0) 63 | println(@hist_log(LatencyTimes)) 64 | delete(LatencyTimes) 65 | } 66 | 67 | -------------------------------------------------------------------------------- /SystemTap_Linux_IO/pread_latencyhistogram.stp: -------------------------------------------------------------------------------- 1 | #!/usr/bin/stap 2 | # 3 | # pread_latencyhistogram.stp 4 | # 5 | # This is a SystemTap script to gather I/O latency from the Linux kernel syscall pread 6 | # and print I/O latency histograms 7 | # 8 | # By Luca.Canali@cern.ch, March 2015 9 | # 10 | # This script is based on original ideas of biolatency-nd.stp of systemtap-lwtools 11 | # by Brendan Gregg 12 | # 13 | # Note: this probe does not need to have kernel debuginfo installed 14 | # 15 | # Usage: stap -v pread_latencyhistogram.stp 16 | # 17 | # Example: stap -v pread_latencyhistogram.stp 3 18 | # 19 | 20 | global LatencyTimes, RequestTime 21 | 22 | probe nd_syscall.pread { 23 | RequestTime[tid()] = gettimeofday_us() 24 | } 25 | 26 | probe nd_syscall.pread.return { 27 | t = gettimeofday_us() 28 | s = RequestTime[tid()] 29 | if (s > 0) { 30 | LatencyTimes <<< (t-s) 31 | delete RequestTime[tid()] 32 | } 33 | } 34 | 35 | probe timer.sec($1) { 36 | if (@count(LatencyTimes) > 0) 37 | println(@hist_log(LatencyTimes)) 38 | delete(LatencyTimes) 39 | } 40 | -------------------------------------------------------------------------------- /SystemTap_Linux_IO/read_latencyhistogram.stp: -------------------------------------------------------------------------------- 1 | #!/usr/bin/stap 2 | # 3 | # read_latencyhistogram.stp 4 | # 5 | # This is a SystemTap script to gather I/O latency from the Linux kernel syscall read 6 | # and print I/O latency histograms 7 | # 8 | # By Luca.Canali@cern.ch, 
September 2017 9 | # Based, with minor modifications, on pread_latencyhistogram.stp (March 2015) 10 | # 11 | # This script is based on original ideas of biolatency-nd.stp of systemtap-lwtools 12 | # by Brendan Gregg 13 | # 14 | # Note: this probe does not need to have kernel debuginfo installed 15 | # 16 | # Usage: stap -v read_latencyhistogram.stp 17 | # 18 | # Example: stap -v read_latencyhistogram.stp 3 19 | # 20 | 21 | global LatencyTimes, RequestTime, TotalLatency 22 | 23 | probe nd_syscall.read { 24 | RequestTime[tid()] = gettimeofday_us() 25 | } 26 | 27 | probe nd_syscall.read.return { 28 | t = gettimeofday_us() 29 | s = RequestTime[tid()] 30 | if (s > 0) { 31 | deltaT = t-s 32 | TotalLatency += deltaT 33 | LatencyTimes <<< deltaT 34 | delete RequestTime[tid()] 35 | } 36 | } 37 | 38 | probe timer.sec($1) { 39 | if (@count(LatencyTimes) > 0) { 40 | println("Latency histogram of read calls in the interval") 41 | println(@hist_log(LatencyTimes)) 42 | print("Summed latency in the interval (microseconds): ") 43 | println(TotalLatency) 44 | } 45 | delete(LatencyTimes) 46 | TotalLatency = 0 47 | } 48 | 49 | -------------------------------------------------------------------------------- /SystemTap_Linux_IO/read_latencyhistogram_filterPID.stp: -------------------------------------------------------------------------------- 1 | #!/usr/bin/stap 2 | # 3 | # read_latencyhistogram_filterPID.stp 4 | # 5 | # This is a SystemTap script to gather I/O latency from the Linux kernel syscall read 6 | # and print I/O latency histograms 7 | # Add a filter on the traced process with the -x option 8 | # 9 | # By Luca.Canali@cern.ch, September 2017 10 | # Based, with minor modifications, on pread_latencyhistogram.stp (March 2015) 11 | # 12 | # This script is based on original ideas of biolatency-nd.stp of systemtap-lwtools 13 | # by Brendan Gregg 14 | # 15 | # Note: this probe does not need to have kernel debuginfo installed 16 | # 17 | # Usage: stap -v 
read_latencyhistogram_filterPID.stp -x 18 | # 19 | # Example: stap -v read_latencyhistogram_filterPID.stp -x 1234 3 20 | # 21 | 22 | global LatencyTimes, RequestTime, TotalLatency 23 | 24 | probe nd_syscall.read { 25 | if (pid() == target()) { 26 | RequestTime[tid()] = gettimeofday_us() 27 | } 28 | } 29 | 30 | probe nd_syscall.read.return { 31 | if (pid() == target()) { 32 | t = gettimeofday_us() 33 | s = RequestTime[tid()] 34 | if (s > 0) { 35 | deltaT = t-s 36 | TotalLatency += deltaT 37 | LatencyTimes <<< deltaT 38 | delete RequestTime[tid()] 39 | } 40 | } 41 | } 42 | 43 | probe timer.sec($1) { 44 | if (@count(LatencyTimes) > 0) { 45 | println("Latency histogram of read calls in the interval") 46 | println(@hist_log(LatencyTimes)) 47 | print("Summed latency in the interval (microseconds): ") 48 | println(TotalLatency) 49 | } 50 | delete(LatencyTimes) 51 | TotalLatency = 0 52 | } 53 | 54 | -------------------------------------------------------------------------------- /SystemTap_Linux_IO/sync_asyncio_and_libaio_TRACE.stp: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/stap 2 | # 3 | # sync_asyncio_and_libaio_TRACE.stp 4 | # 5 | # This is a SystemTap probe to trace Linux I/O system calls, including async I/O, originally 6 | # developed to trace Oracle system calls for I/O processing. 7 | # 8 | # Dependencies: 9 | # needs kernel debuginfo 10 | # additional dependencies are listed for tracing libaio 11 | # by default the libaio probe for io_getevents_0_4 is commented out; uncomment it if you need it 12 | # 13 | # Compatibility: tested on RHEL/OL 5.10, 6.x and 7.x 14 | # 15 | # Usage: stap -v sync_asyncio_and_libaio_TRACE.stp -x 16 | # 17 | # Version 1.0, Oct 2014 by Luca.Canali@cern.ch 18 | # Latest updates, August 2015.
19 | # Additional credits for original contributions: @FritsHoogland 20 | # 21 | # Note: this is experimental code, use at your own risk 22 | # 23 | 24 | #################### 25 | # Trace libaio # 26 | #################### 27 | 28 | # Dependencies: needs libaio debuginfo 29 | # Use systemtap 2.5 or higher (for Oracle userspace tracing) or comment out 30 | # On some systems you need to replace "/usr/lib64/libaio.so" with "/lib64/libaio.so.1" 31 | # Kernel must have support for uprobes or utrace (for example RHEL/OL 7.x and 6.x) 32 | # Uncomment the probe code below if you want to trace libaio.io_getevents calls 33 | 34 | # probe process("/usr/lib64/libaio.so").function("io_getevents_0_4") { 35 | # printf ("LIBAIO:->io_getevents_0_4: timestamp=%d, program=%s, pid=%d, min_nr=%d, timeout.tv_sec=%d\n", 36 | # local_clock_us(), execname(), pid(), $min_nr, @cast($timeout,"struct timespec","")->tv_sec) 37 | # } 38 | 39 | ###################### 40 | # Trace Physical IO # 41 | ###################### 42 | 43 | probe syscall.pread, syscall.pwrite { 44 | if (pid() == target()) { 45 | printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, fd=%d, offset=%d, count(bytes)=%d\n", name, local_clock_us(), execname(), pid(), fd, offset, count) 46 | } 47 | } 48 | 49 | probe syscall.pread.return, syscall.pwrite.return { 50 | if (pid() == target()) { 51 | printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(bytes)=%d\n", name, local_clock_us(), execname(), pid(), $return) 52 | } 53 | } 54 | 55 | # some ugly tricks added here for compatibility, in particular there seem to be problems in RHEL/EL 7.x debuginfo related to 56 | # $iocbpp, which is incorrectly reported as long int on those systems instead of struct iocb**; also we use a trick to resolve the array 57 | # the use of a probe on kernel.function instead of syscall.io_submit is also needed for some platforms (RHEL 6.7) 58 | # See below for a more basic probe that works on RHEL and RHEL6.6 59 | probe kernel.function("sys_io_submit") {
60 | if (pid() == target()) { 61 | printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, nr(num I/O)=%d\n", "io_submit", local_clock_us(), execname(), pid(), $nr) 62 | for (i=0; i<$nr; i++) { 63 | printf(" %d: file descriptor=%d, offset=%d, bytes=%d, opcode=%d\n", i+1, @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_fildes, 64 | @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_offset, @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_nbytes, 65 | @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_lio_opcode) 66 | } 67 | } 68 | } 69 | 70 | # For reference this is the original probe on io_submit without the compatibility tricks used above 71 | # probe syscall.io_submit { 72 | # if (pid() == target()) { 73 | # printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, nr(num I/O)=%d\n", name, local_clock_us(), execname(), pid(), nr) 74 | # for (i=0; i<$nr; i++) { 75 | # printf(" %d: file descriptor=%d, offset=%d, bytes=%d, opcode=%d\n", i+1, $iocbpp[i]->aio_fildes, 76 | # $iocbpp[i]->aio_offset, $iocbpp[i]->aio_nbytes, $iocbpp[i]->aio_lio_opcode) 77 | # } 78 | # } 79 | # } 80 | 81 | 82 | probe syscall.io_submit.return { 83 | if (pid() == target()) { 84 | printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(num I/O)=%ld\n", name, local_clock_us(), execname(), pid(), $return) 85 | } 86 | } 87 | 88 | probe syscall.io_getevents { 89 | if (pid() == target()) { 90 | printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, min_nr=%d, timeout=%s\n", name, local_clock_us(), execname(), pid(), min_nr, $timeout$) 91 | } 92 | } 93 | 94 | # need to explicitly cast $events for RHEL/EL 7.x kernels, where debuginfo issues report $events as long int instead of struct io_event* 95 | probe syscall.io_getevents.return { 96 | if (pid() == target()) { 97 | printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(num I/O)=%ld\n", name, local_clock_us(), execname(), pid(), $return) 98 | for (i=0; i<$return; i++) { # cycle over the reaped I/Os 99 | obj_addr = @cast($events, "struct 
io_event")[i]->obj # details of struct iocb in /usr/include/libaio.h 100 | fildes = user_uint32(obj_addr +20) 101 | bytes = user_uint64(obj_addr +32) 102 | offset = user_uint64(obj_addr +40) 103 | printf(" %d: fildes=%d, offset=%lu, bytes=%lu\n", i+1, fildes, offset, bytes) 104 | } 105 | } 106 | } 107 | -------------------------------------------------------------------------------- /SystemTap_Userspace_Oracle/README: -------------------------------------------------------------------------------- 1 | 2 | This directory contains SystemTap scripts for userspace probing of Oracle database processes, originally developed for Oracle troubleshooting and investigations of Oracle internals. 3 | 4 | Author: Luca.Canali@cern.ch 5 | First release: September 2014 6 | 7 | Relevant blog posts: 8 | http://externaltable.blogspot.com/2014/11/life-of-oracle-io-tracing-logical-and.html 9 | http://externaltable.blogspot.com/2014/09/systemtap-into-oracle-for-fun-and-profit.html 10 | http://externaltable.blogspot.com/2016/03/systemtap-guru-mode-and-oracle-sql.html 11 | 12 | Acknowledgments: 13 | @FritsHoogland for original work and collaboration on Oracle userspace tracing. 14 | 15 | Compatibility and notes: 16 | The userspace-trace scripts require Linux UTRACE or UPROBES. 17 | This is the case for RHEL/CentOS/Oracle Linux 6.x (UTRACE) and 7.x (UPROBES) 18 | These scripts will not work with old kernels such as RHEL 5. 19 | SystemTap 2.5 or higher is needed.
20 | 21 | Notable exceptions and issues with UPROBES and Oracle 12c 22 | When using UPROBES to trace Oracle 12c, the scripts will throw "inode-offset registration error" 23 | Oracle 11g is not affected by this issue 24 | This issue is relevant for RHEL/OL version 7.1 or higher 25 | @Hatem__Mahmoud has described this issue in detail and proposes a workaround at: 26 | https://mahmoudhatem.wordpress.com/2017/03/27/workaround-for-systemtap-issue-oracle-tracing-registration-error-rc-0 27 | Another workaround is to use an older Linux kernel (kernel 3.10.0-123.x or older) 28 | 29 | Kernel debuginfo must be installed for several of the scripts that also have OS probes. See details in the scripts' headers. 30 | 31 | 32 | -------------------------------------------------------------------------------- /SystemTap_Userspace_Oracle/eventsname.sql: -------------------------------------------------------------------------------- 1 | -- 2 | -- eventsname.sql 3 | -- 4 | -- This sqlplus script generates a sed script file to replace Oracle wait event numbers with event names, 5 | -- intended to be used together with the systemtap trace scripts 6 | -- 7 | -- L.C.
Aug 2014 8 | -- 9 | 10 | set echo off pages 0 lines 200 feed off head off sqlblanklines off trimspool on trimout on 11 | 12 | spool eventsname.sed 13 | 14 | select 's/\/'||'event='||replace(name,'/','\/')||'/g' SED from v$event_name order by event# desc; 15 | 16 | spool off 17 | exit 18 | -------------------------------------------------------------------------------- /SystemTap_Userspace_Oracle/experimental/logical_io_latency.stp: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/stap 2 | # 3 | # This is an example script for measuring latency of logical I/O (for consistent read) 4 | # This is just example code, neither complete nor precise 5 | # Do not run in production as the overhead can be very high 6 | # 7 | # Use: stap -v logical_io_latency.stp (optionally add -x to trace 1 pid only) 8 | # 9 | # needs Linux kernel 3.10 or higher, tested on: OL7.0 and Oracle 12.1.0.2 10 | # 11 | # Version 1.0, Nov 2014 by Luca.Canali@cern.ch 12 | # Additional credits for original contributions: @FritsHoogland 13 | # 14 | 15 | global track_time[10000] 16 | global latency_logicalio 17 | 18 | probe process("oracle").function("kcbgtcr") { # consistent reads 19 | track_time[pid()] = local_clock_us() 20 | } 21 | 22 | probe process("oracle").function("kcbgtcr").return { # consistent reads 23 | latency_logicalio <<< (local_clock_us() - track_time[pid()]) 24 | } 25 | 26 | 27 | probe timer.sec(10) { 28 | printf("\n") 29 | printf("Sample period: 10 seconds\n") 30 | printf("\n") 31 | printf("Total logical IOs for consistent read: %10d\n", @count(latency_logicalio)) 32 | printf("\n") 33 | if (@count(latency_logicalio)) { 34 | printf("Logical IO latency in microseconds (us):\n") 35 | printf("\n") 36 | printf("Min:%10d\n", @min(latency_logicalio)) 37 | printf("Max:%10d\n", @max(latency_logicalio)) 38 | printf("Avg:%10d\n", @avg(latency_logicalio)) 39 | printf("\n") 40 | printf("Histogram representation of IO latency in
microseconds (us):\n") 41 | printf("\n") 42 | print(@hist_log(latency_logicalio)) 43 | delete latency_logicalio 44 | } 45 | } 46 | 47 | -------------------------------------------------------------------------------- /SystemTap_Userspace_Oracle/experimental/oracle_events_12102_resolve_eventnames_xksled.stp: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/stap 2 | # 3 | # stap script tracing oracle wait events 4 | # 5 | # Use: stap -v oracle_events_12102_resolve_eventnames_xksled.stp -x 6 | # 7 | # tested on: OL7 and Oracle 12.1.0.2 8 | # 9 | # wait event names are read from memory 10 | # - this works only under some circumstances: Oracle has to page in the memory pages containing the event name strings, as stored in the oracle binary 11 | # - for example, if you have access to the session being traced, page-in of the strings can be triggered by running: select * from v$event_name; 12 | # - another method is to briefly enable 10046 trace for the session under study: 13 | # exec dbms_monitor.session_trace_enable(,) 14 | # exec dbms_monitor.session_trace_disable(,) 15 | # 16 | # Version 1.0, Oct 2014 by Luca.Canali@cern.ch 17 | # Additional credits for original contributions: @FritsHoogland 18 | # Note: this is experimental code, use at your own risk 19 | # 20 | 21 | probe process("oracle").function("kskthewt") { 22 | xksuse = register("r13")-3928 23 | ksuudnam = user_string(xksuse + 140) 24 | ksusenum = user_uint16(xksuse + 1704) 25 | ksuseopc = user_uint16(xksuse + 1602) 26 | ksusep1 = user_uint64(xksuse + 1608) 27 | ksusep2 = user_uint64(xksuse + 1616) 28 | ksusep3 = user_uint64(xksuse + 1624) 29 | ksusetim = user_uint32(xksuse + 1632) 30 | ksusesqh = user_uint32(xksuse + 1868) 31 | 32 | xksled = 0x015F0E6F00 # found by querying X$KSLED in the DB on 12.1.0.2 33 | xksled_record_size = 0x38 34 | event_string_pointer = user_int64(xksled+ksuseopc*xksled_record_size) 35 | event_string = user_string(event_string_pointer) 36 |
printf("timestamp=%ld, pid=%d, sid=%d, name=%s, event#=%u, event=%s, p1=%lu, p2=%lu, p3=%lu, wait_time=%u, sql_hash=%u\n", 38 | u64_arg(1), pid(), ksusenum, ksuudnam, ksuseopc, event_string, ksusep1, ksusep2, ksusep3, ksusetim, ksusesqh) 39 | } 40 | -------------------------------------------------------------------------------- /SystemTap_Userspace_Oracle/experimental/trace_oracle_events_debug.stp: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/stap 2 | # 3 | # trace_oracle_events_debug.stp 4 | # 5 | # trace oracle wait events in systemtap by hooking on the oracle function kskthewt 6 | # kskthewt is called when Oracle updates the wait time at the end of the wait. 7 | # The register R13 is found by trial and error to be a pointer into the segmented array underlying X$KSUSE 8 | # (roughly speaking a pointer to V$SESSION.SADDR). The offset between the value of the register R13 and 9 | # the value of V$SESSION.SADDR for the process/session under examination depends on the Oracle version and port; 10 | # use this script to find the offset value.
11 | # Note also that register RDI (that is, arg1) is set to the timestamp value (as seen in the 10046 trace file) 12 | # and register RSI (that is, arg2) is set to the wait event number 13 | # 14 | # How to run: stap -v trace_oracle_events_debug.stp -x 15 | # Note: if the -x option is not used, this script will trace all running oracle processes 16 | # 17 | # Dependencies: 18 | # systemtap 2.5 or higher 19 | # kernel must be built with support for uprobes or utrace (for example RHEL7.0 and 6.5 respectively) 20 | # add $ORACLE_HOME/bin to $PATH 21 | # 22 | # tested on: RHEL6.5 and OL7, Oracle 11.2.0.4 and 12.1.0.2 23 | # 24 | # Version 1.0, Aug 2014 by @LucaCanaliDB 25 | # Based on previous work by the author on DTrace probes in 2013 26 | # Additional credits: @FritsHoogland 27 | # 28 | 29 | probe process("oracle").function("kskthewt") { 30 | printf("pid: %d, function: %s, arg1=%lx, arg2=%lx \n", pid(),ppfunc(),u64_arg(1),u64_arg(2)) 31 | printf("pid: %d, function: %s, arg1=%ld, arg2=%ld \n", pid(),ppfunc(),u64_arg(1),u64_arg(2)) 32 | print_regs() 33 | } 34 | -------------------------------------------------------------------------------- /SystemTap_Userspace_Oracle/histograms_oracle_events_11204.stp: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/stap 2 | # 3 | # histograms_oracle_events_11204.stp 4 | # 5 | # This script reads Oracle wait event details from memory using SystemTap probes 6 | # See more details of how this works in the script trace_oracle_events_11204.stp 7 | # 8 | # Dependencies and prerequisites: 9 | # Use SystemTap 2.5 or higher 10 | # Kernel must have support for uprobes or utrace (for example RHEL/OL 7.x and 6.x) 11 | # The oracle DB executable should be in the path: add $ORACLE_HOME/bin to $PATH 12 | # 13 | # Software versions and compatibility: 14 | # Linux RHEL/OL 6.x and 7.x 15 | # Oracle RDBMS 11.2.0.4 16 | # 17 | # Version 1.0, Aug 2014 by Luca.Canali@cern.ch 18 | # Additional credits for
original contributions: @FritsHoogland 19 | # 20 | # Note: this is experimental code, use at your own risk 21 | # 22 | 23 | global eventlatency[2000] 24 | 25 | # gather and aggregate wait event latency details into a histogram 26 | probe process("oracle").function("kskthewt") { 27 | # this is the base value to read x$ksuse/v$session data from userspace memory 28 | xksuse = register("r13")-7912 29 | 30 | # uncomment the variable/xksuse fields as needed 31 | 32 | # ksuudnam = user_string(xksuse + 132) 33 | # ksusenum = user_uint16(xksuse + 5920) 34 | ksuseopc = user_uint16(xksuse + 5826) 35 | # ksusep1 = user_uint64(xksuse + 5832) 36 | # ksusep2 = user_uint64(xksuse + 5840) 37 | # ksusep3 = user_uint64(xksuse + 5848) 38 | ksusetim = user_uint32(xksuse + 5856) 39 | # ksusesqh = user_uint32(xksuse + 6084) 40 | 41 | # this creates an aggregation of event wait latency detailed per event 42 | # add filters on the available fields and/or add aggregation columns if needed 43 | eventlatency[ksuseopc] <<< ksusetim 44 | 45 | #debug code 46 | #printf("event#=%d, wait_time=%d\n",ksuseopc, ksusetim) 47 | } 48 | 49 | 50 | # print histogram details every 10 seconds and reset the counters in eventlatency[] 51 | probe timer.sec(10) { 52 | printf("\nDate: %s\n\n",tz_ctime(gettimeofday_s())) 53 | foreach ([event] in eventlatency) { 54 | printf("Latency histogram (value=latency in microsec) for event#=%d\n",event) 55 | print(@hist_log(eventlatency[event])) 56 | } 57 | # comment out delete eventlatency if you prefer cumulative histograms instead 58 | delete eventlatency 59 | } 60 | 61 | -------------------------------------------------------------------------------- /SystemTap_Userspace_Oracle/histograms_oracle_events_12102.stp: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/stap 2 | # 3 | # histograms_oracle_events_12102.stp 4 | # 5 | # This script reads Oracle wait event details from memory using SystemTap probes 6 | #
See more details in the script trace_oracle_events_12102.stp 7 | # 8 | # Software versions and compatibility: 9 | # Linux RHEL/OL 6.x and 7.x 10 | # Oracle RDBMS 11.2.0.x and 12.1.0.x 11 | # 12 | # Notable exception and issue with Oracle 12.1.0.2: 13 | # this script will throw "inode-offset registration error" when run against 12.1.0.2 on 14 | # RHEL/OL7.1 (i.e. kernel 3.10.0-229.x). The workaround is to use an older kernel 15 | # such as RHEL/OL7.0 (kernel 3.10.0-123.x). It seems to work fine on RHEL/OL 7.1 and 11g. 16 | # 17 | # Version 1.0, Aug 2014 by Luca.Canali@cern.ch 18 | # Additional credits for original contributions: @FritsHoogland 19 | # Note: this is experimental code, use at your own risk 20 | # 21 | 22 | global eventlatency[2000] 23 | 24 | # gather and aggregate wait event latency details into a histogram 25 | probe process("oracle").function("kskthewt") { 26 | # this is the base value to read x$ksuse/v$session data from userspace memory 27 | xksuse = register("r13")-3928 28 | 29 | # uncomment the variable/xksuse fields as needed 30 | # ksuudnam = user_string(xksuse + 140) 31 | # ksusenum = user_uint16(xksuse + 1704) 32 | ksuseopc = user_uint16(xksuse + 1602) 33 | # ksusep1 = user_uint64(xksuse + 1608) 34 | # ksusep2 = user_uint64(xksuse + 1616) 35 | # ksusep3 = user_uint64(xksuse + 1624) 36 | ksusetim = user_uint32(xksuse + 1632) 37 | # ksusesqh = user_uint32(xksuse + 1868) 38 | 39 | # this creates an aggregation of event wait latency detailed per event 40 | # add filters on the available fields and/or add aggregation columns if needed 41 | eventlatency[ksuseopc] <<< ksusetim 42 | 43 | #debug code 44 | #printf("event#=%d, wait_time=%d\n",ksuseopc, ksusetim) 45 | } 46 | 47 | 48 | # print histogram details every 10 seconds and reset the counters in eventlatency[] 49 | probe timer.sec(10) { 50 | printf("\nDate: %s\n\n",tz_ctime(gettimeofday_s())) 51 | foreach ([event] in eventlatency) { 52 | printf("Latency histogram (value=latency in microsec) for
event#=%d\n",event) 53 | print(@hist_log(eventlatency[event])) 54 | } 55 | # comment out delete eventlatency if you prefer that histograms grow cumulative instead 56 | delete eventlatency 57 | } 58 | 59 | -------------------------------------------------------------------------------- /SystemTap_Userspace_Oracle/histograms_oracle_events_version_independent.stp: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/stap 2 | # 3 | # histograms_oracle_events_version_independent.stp 4 | # 5 | # This is a SystemTap script to gather wait event details from Oracle and print 6 | # wait event latency histograms 7 | # 8 | # Dependencies and prerequisites: 9 | # Use SystemTap 2.5 or higher 10 | # Kernel must have support for uprobes or utrace (for example RHEL/OL 7.x and 6.x) 11 | # The oracle DB executable should be in the path: add $ORACLE_HOME/bin in $PATH 12 | # 13 | # Software versions and compatibility: 14 | # Linux RHEL/OL 6.x and 7.x 15 | # Oracle RDBMS 11.2.0.x and 12.1.0.x 16 | # 17 | # Notable exception and issue with Oracle 12.1.0.2: 18 | # this script will throw "inode-offset registration error" when run against 12.1.0.2 on 19 | # RHEL/OL7.1 (i.e. kernel 3.10.0-229.x). The workaround is to use an older kernel 20 | # such as RHEL/OL7.0 (kernel 3.10.0-123.x). It seems to work fine on RHEL/OL 7.1 and 11g. 
#
# How to run: stap -v histograms_oracle_events_version_independent.stp
# Note: optionally add -x PID to limit data collection to a single process
#
# Version 1.0, Aug 2014 by Luca.Canali@cern.ch
# Additional credits for original contributions: @FritsHoogland
#
# Note: this is experimental code, use at your own risk
#

global eventlatency[2000]
global waittime[2000]

# gather and aggregate wait event latency details into a histogram
probe process("oracle").function("kews_update_wait_time") {
    # update the wait time, the wait event number is captured in the call to kskthewt
    wait_time = u32_arg(2)
    # some false positives have wait_time = 0, in particular for background processes: ignore them
    if (wait_time > 0) {
        waittime[pid()] = wait_time
    }
}


probe process("oracle").function("kskthewt") {
    # the event number is in arg2
    event=u32_arg(2)
    # the wait_time was previously recorded into the waittime array
    eventlatency[event] <<< waittime[pid()]
}


# print histogram details every 10 seconds and reset the counters in eventlatency[]
probe timer.sec(10) {
    printf("\nDate: %s\n\n",tz_ctime(gettimeofday_s()))
    foreach ([event] in eventlatency) {
        printf("Latency histogram (value=latency in microsec) for event#=%d\n",event)
        print(@hist_log(eventlatency[event]))
    }
    # comment this out if you prefer histograms to grow cumulatively instead
    delete eventlatency
}

--------------------------------------------------------------------------------
/SystemTap_Userspace_Oracle/ksuse_find_offsets.sql:
--------------------------------------------------------------------------------
--
-- ksuse_find_offsets.sql
--
-- Script to find the offsets of data fields in the X$KSUSE data structure (segmented array)
-- The output of this script has been used to build the stap probes trace_oracle_events_12102 and trace_oracle_events_11204
-- Author: Luca.Canali@cern.ch, Aug 2014
-- Thanks to @FritsHoogland for pointing this method out to me
--

select c.kqfconam FIELD_NAME, c.kqfcooff OFFSET from x$kqfco c, x$kqfta t
where t.indx = c.kqfcotab
and t.kqftanam='X$KSUSE'
and c.kqfconam in ('KSUSEOPC','KSUSEP1','KSUSEP2','KSUSEP3','KSUSETIM','KSUSESQH','KSUSENUM','KSUUDNAM','KSUSEOBJ')
order by c.kqfcooff;

--------------------------------------------------------------------------------
/SystemTap_Userspace_Oracle/livepatch_oracle/README:
--------------------------------------------------------------------------------
This directory contains example SystemTap scripts that can be used to modify userspace data and CPU registers on the fly at runtime.
Examples are provided on how to apply these techniques to Oracle SQL parsing.

Author: Luca.Canali@cern.ch
First released: February 2016

Additional info at the blog entry:
http://externaltable.blogspot.com/2016/03/systemtap-guru-mode-and-oracle-sql.html

Scripts:

filterSQL_opiprs.stp: live filtering of Oracle SQL parsing

livepatch_basic_opiprs.stp: basic on-the-fly modification of Oracle hard parsing operations

livepatch_opiprs.stp: on-the-fly modification of Oracle hard parsing operations

example_change_ret_val.stp: example code to change function return values


Disclaimer: The tools and techniques presented here are intended for learning/reference only and are best used on a sandbox, as they are unsupported and can potentially put system stability and integrity at risk. Administrator privileges are needed to run SystemTap probes.
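The scripts listed above share one mechanism. When the probe fires, the probed process's user-space registers are available to embedded C code as a saved copy (CONTEXT->uregs). Writing bytes into that copy at a register's byte offset changes the register when the process goes back to userspace.

The Python sketch below models that saved copy as a plain bytearray. It does not touch a real process. It illustrates two points:

* why the width of the value copied into RAX matters;
* what replaceSQL2 in livepatch_opiprs.stp does to RSI and RDX.

The offsets are the ones #defined in the scripts (RAX 80, RDX 96, RSI 104, RSP 152). The 168-byte size is an assumption based on the x86_64 struct pt_regs layout (21 eight-byte slots). The helper names are hypothetical.

```python
import struct

# Byte offsets of saved registers, copied from the #define lines in the scripts.
# Assumption: they index an x86_64 struct pt_regs (21 slots of 8 bytes each).
OFFSET_RAX = 80
OFFSET_RDX = 96
OFFSET_RSI = 104
OFFSET_RSP = 152
PT_REGS_SIZE = 21 * 8


def read_reg(regs: bytearray, offset: int) -> int:
    """Return the full 64-bit register stored at `offset` (little-endian, unsigned)."""
    return struct.unpack_from("<Q", regs, offset)[0]


def write_reg(regs: bytearray, offset: int, value: int, width: int = 8) -> None:
    """Copy `width` bytes of `value` into the saved register, like memcpy() into CONTEXT->uregs."""
    fmt = {4: "<I", 8: "<Q"}[width]
    struct.pack_into(fmt, regs, offset, value & ((1 << (8 * width)) - 1))


def redirect_sql_args(regs: bytearray, new_sql_addr: int, new_sql_len: int) -> None:
    """Hypothetical model of replaceSQL2(): RSI (arg2) -> new SQL address, RDX (arg3) -> its length."""
    write_reg(regs, OFFSET_RDX, new_sql_len)
    write_reg(regs, OFFSET_RSI, new_sql_addr)


if __name__ == "__main__":
    regs = bytearray(PT_REGS_SIZE)

    # example_change_ret_val.stp: a 4-byte copy leaves the upper half of RAX untouched
    write_reg(regs, OFFSET_RAX, -1)
    write_reg(regs, OFFSET_RAX, 1, width=4)
    print(hex(read_reg(regs, OFFSET_RAX)))   # 0xffffffff00000001
    write_reg(regs, OFFSET_RAX, 1, width=8)
    print(hex(read_reg(regs, OFFSET_RAX)))   # 0x1

    # livepatch_opiprs.stp: the new text is placed at RSP - 0x2000 (the script's own guess)
    write_reg(regs, OFFSET_RSP, 0x7FFD12345000)
    new_sql = b"select sysdate -1 from dual"
    redirect_sql_args(regs, read_reg(regs, OFFSET_RSP) - 0x2000, len(new_sql))
    print(hex(read_reg(regs, OFFSET_RSI)), read_reg(regs, OFFSET_RDX))   # 0x7ffd12343000 27
```

The model only changes bytes in a buffer. In the real scripts the same writes redirect a live Oracle process, which is why they require guru mode.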


--------------------------------------------------------------------------------
/SystemTap_Userspace_Oracle/livepatch_oracle/example_change_ret_val.stp:
--------------------------------------------------------------------------------
#
# example_change_ret_val.stp
# Change function return value by modifying the register RAX on the fly using SystemTap in guru mode
# This is a stub, edit the process name and function name as relevant.
#
# Usage: run in guru mode
# stap -g -v example_change_ret_val.stp (optionally add -x PID to limit to a single process)
#
# Author: Luca Canali, Feb 2016
#

# find register offset here https://github.com/jav/systemtap/blob/master/tapset/x86_64/registers.stp
%{
#define OFFSET_RAX 80
%}

function change_rax() %{
    long myint;   /* 64-bit, so all 8 bytes of RAX are overwritten (an int would only set the low 4 bytes, EAX) */
    myint = 1;    /* customize here with an expression for the value the function should return */
    memcpy( ((char *)CONTEXT->uregs) + OFFSET_RAX, &myint, sizeof(myint));
%}


probe process("....").function("....").return {
    printf("pre-change: regrax=%d\n", register("rax"))
    change_rax();
    printf("post-change: regrax=%d\n", register("rax"))
}

--------------------------------------------------------------------------------
/SystemTap_Userspace_Oracle/livepatch_oracle/filterSQL_opiprs.stp:
--------------------------------------------------------------------------------
#!/usr/local/bin/stap
#
# This is a SystemTap script for filtering Oracle SQL, hooking on the parsing function opiprs in guru mode.
# 1. A SystemTap probe on the Oracle function opiprs is triggered at every hard parse operation
# 2. The SQL statement is matched against the user-defined KEYWORD_TO_BLOCK
# 3. If a match is found the SQL parsing is blocked (the Oracle session will receive an error)
#    This is implemented using an embedded C function: block_parse
#
# Note: This is example code, not intended for production use. It is unsupported and may cause instabilities.
# Requires SystemTap in guru mode
#
# Use: stap -g -v filterSQL_opiprs.stp
# Note: optionally add -x PID to limit the action to a single process (default is all oracle sessions)
#
# The Oracle binary is expected to be in the PATH: export PATH=$PATH:$ORACLE_HOME/bin
# Run as privileged user (root)
#
# Compatibility: Oracle 11.2.0.4 on RHEL/OL 6 and 7
#                Oracle 12.1.0.2 on RHEL/OL 6 and 7 with kernel up to 3.10.0-123.x
#                issue: will not work on 12c with Linux kernel version higher than 3.10.0-123.x
#                the incompatibility comes from uprobes
#                Use SystemTap version 2.5 or higher.
#                Kernel debuginfo not needed for this probe.
#
# Author: Luca.Canali@cern.ch
# Created: November 2014, last updated: March 2016.
#

global KEYWORD_TO_BLOCK = "UNWANTED SQL"

function block_parse(pointersql:long) %{
    char *sqltext;
    sqltext = (char *) STAP_ARG_pointersql;
    /* Modify the SQL text by writing a 0 as first character, therefore forcing an empty string */
    /* This will cause Oracle to throw an error: ORA-00900 invalid SQL statement */
    sqltext[0] = 0;
%}

probe process("oracle").function("opiprs") {
    sqltext = user_string2(register("rsi"),"error")
    # debug code
    # sqllength = register("rdx")
    # printf("opiParse: arg2=%s, arg3=%d\n",sqltext,sqllength)
    if (isinstr(sqltext, KEYWORD_TO_BLOCK)) {
        printf("FOUND!\n")
        block_parse(register("rsi"))
    }
}

--------------------------------------------------------------------------------
/SystemTap_Userspace_Oracle/livepatch_oracle/livepatch_basic_opiprs.stp:
--------------------------------------------------------------------------------
#!/usr/local/bin/stap
#
# This is a SystemTap script for modifying Oracle SQL parsing on the fly by
# hooking on the parsing function opiprs in guru mode.
# 1. A SystemTap probe on the Oracle function opiprs is triggered at every hard parse operation
# 2. The SQL statement is matched against the user-defined TARGET_SQL
# 3. If a match is found the SQL is replaced with REPLACEMENT_SQL
#    This is implemented using an embedded C function: replace_SQL
#    The probe only works if REPLACEMENT_SQL is not longer than TARGET_SQL
#
# Note: This is example code, not intended for production use. It is unsupported and may cause instabilities.
# Requires SystemTap in guru mode
#
# Use: stap -g -v livepatch_basic_opiprs.stp
# Note: optionally add -x PID to limit the action to a single process (default is all oracle sessions)
#
# The Oracle binary is expected to be in the PATH: export PATH=$PATH:$ORACLE_HOME/bin
# Run as privileged user (root)
#
# Note: this is a simplified version of livepatch_opiprs.stp
#
# Compatibility: Oracle 11.2.0.4 on RHEL/OL 6 and 7
#                Oracle 12.1.0.2 on RHEL/OL 6 and 7 with kernel up to 3.10.0-123.x
#                issue: will not work on 12c with Linux kernel version higher than 3.10.0-123.x
#                the incompatibility comes from uprobes
#                Use SystemTap version 2.5 or higher.
#                Kernel debuginfo not needed for this probe.
#
# Author: Luca.Canali@cern.ch
# Created: November 2014, last updated: March 2016.
#

%{
/* Customize here as desired: SQL that will replace TARGET_SQL */
/* Note: REPLACEMENT_SQL must not be longer than TARGET_SQL */

#define REPLACEMENT_SQL "select power(count(*),3) from dba_objects"
%}

global TARGET_SQL = "select count(*) from dba_objects, dba_objects, dba_objects"

function replace_SQL(pointersql:long) %{
    char *sqltext;

    sqltext = (char *) STAP_ARG_pointersql;
    /* This changes in memory (stack) the SQL text that will be parsed */
    strcpy(sqltext, REPLACEMENT_SQL);
%}

probe process("oracle").function("opiprs") {
    sqltext = user_string2(register("rsi"),"error")
    # debug code
    # sqllength = register("rdx")
    # printf("opiParse: arg2=%s, arg3=%d\n",sqltext,sqllength)
    if (sqltext == TARGET_SQL) {
        printf("FOUND!\n")  # debug code
        replace_SQL(register("rsi"))
    }
}


--------------------------------------------------------------------------------
/SystemTap_Userspace_Oracle/livepatch_oracle/livepatch_opiprs.stp:
--------------------------------------------------------------------------------
#!/usr/local/bin/stap
#
# This is a SystemTap script for modifying Oracle SQL parsing on the fly by
# hooking on the parsing function opiprs in guru mode.
# 1. A SystemTap probe on the Oracle function opiprs is triggered at every hard parse operation
# 2. The SQL statement is matched against the user-defined TARGET_SQL
# 3. If a match is found the SQL is replaced with REPLACEMENT_SQL
#    This is implemented using an embedded C function: replaceSQL2
#    This makes use of techniques for modifying userspace memory and CPU registers using SystemTap
#
# Note: This is example code, not intended for production use. It is unsupported and may cause instabilities.
# Requires SystemTap in guru mode
#
# Use: stap -g -v livepatch_opiprs.stp
# Note: optionally add -x PID to limit the action to a single process (default is all oracle sessions)
#
# The Oracle binary is expected to be in the PATH: export PATH=$PATH:$ORACLE_HOME/bin
# Run as privileged user (root)
#
# Compatibility: tested for Oracle 11.2.0.4 on RHEL/OL 6 and 7
#                tested for Oracle 12.1.0.2 on RHEL/OL 6 and 7 with kernel up to 3.10.0-123.x
#                issue: uprobes for kernel higher than 3.10.0-123.x will not work on Oracle 12c
#                SystemTap version 2.5 or higher. Kernel debuginfo not needed for this probe.
#
# Author: Luca.Canali@cern.ch
# Created: February 2016, last updated: March 2016.
#

# The following offset values for CPU registers can be found at this link:
# https://github.com/jav/systemtap/blob/master/tapset/x86_64/registers.stp
# see definitions in function _stp_get_register_by_offset()

%{
#define OFFSET_RDX 96
#define OFFSET_RSI 104
#define OFFSET_RSP 152

/* Customize here as desired: REPLACEMENT_SQL will be parsed instead of TARGET_SQL */
#define REPLACEMENT_SQL "select sysdate -1 from dual"
%}

# SQL statement that will be replaced
global TARGET_SQL = "select sysdate from dual"

function replaceSQL2() %{
    char *sqltext;
    char *new_sqltext;
    long new_sqllength;

    /* Retrieves the pointer to the original sql text from register rsi,
       or rather from its copy in CONTEXT->uregs
    */
    memcpy(&sqltext, ((char *)CONTEXT->uregs) + OFFSET_RSI, sizeof(sqltext));

    /* Retrieves the stack pointer from rsp, or rather from its copy in CONTEXT->uregs */
    memcpy(&new_sqltext, ((char *)CONTEXT->uregs) + OFFSET_RSP, sizeof(new_sqltext));

    /* This copies the new SQL into a new memory location. The choice is %rsp - 0x2000:
       this is a guess, hoping the memory is allocated, free and not used by branch or leaf
       functions of opiprs. From a few tests this seems to work OK, although it may cause
       crashes, use with caution
    */
    new_sqltext -= 0x2000;
    strcpy(new_sqltext, REPLACEMENT_SQL);

    /* This updates the sql string length into register rdx and the pointer to the new SQL in
       register rsi.
       The technique is to update the relevant SystemTap memory area in CONTEXT->uregs:
       it will be copied back to registers rdx and rsi when SystemTap returns to userspace
    */
    new_sqllength = strlen(REPLACEMENT_SQL);
    memcpy( ((char *)CONTEXT->uregs) + OFFSET_RDX, &new_sqllength, sizeof(new_sqllength));
    memcpy( ((char *)CONTEXT->uregs) + OFFSET_RSI, &new_sqltext, sizeof(new_sqltext));

%}

probe process("oracle").function("opiprs") {
    sqltext = user_string2(register("rsi"),"error")
    # debug code
    # sqllength = register("rdx")
    # printf("opiParse: arg2=%s, arg3=%d\n",sqltext,sqllength)
    # printf("_reg_offsets['rdx']=%d\n",_reg_offsets["rdx"])
    # printf("rsi = %lx\n", register("rsi"))
    if (sqltext == TARGET_SQL) {
        printf("FOUND!\n")  # debug code
        replaceSQL2()
    }
}

--------------------------------------------------------------------------------
/SystemTap_Userspace_Oracle/measure_io_patterns/Oracle_read_profile.stp:
--------------------------------------------------------------------------------
#!/usr/local/bin/stap
#
# Measure oracle I/O by hooking on kcbzvb and reporting counts of distinct and repeated I/O
# The overhead of this code can be high, preferably use it on test systems
#
# Luca.Canali@cern.ch, Nov 2014
#

/* creates the C arrays; the hash table (bitmap) myhashtable is static, hence zero-initialized */
%{
#define NUMBITS 64              /* 64 bits in a long */
#define ARRAYSIZE 134217728     /* hash table size */
#define HASHPRIME 8589934583    /* prime number for hashing, just below ARRAYSIZE * NUMBITS */

static unsigned long myhashtable[ARRAYSIZE];
static unsigned long mybitmasks[] = {
    0x1, 0x2, 0x4, 0x8,
    0x10, 0x20, 0x40, 0x80,
    0x100, 0x200, 0x400, 0x800,
    0x1000, 0x2000, 0x4000, 0x8000,
    0x10000, 0x20000, 0x40000, 0x80000,
    0x100000, 0x200000, 0x400000, 0x800000,
    0x1000000, 0x2000000, 0x4000000, 0x8000000,
    0x10000000, 0x20000000, 0x40000000, 0x80000000,
    0x100000000, 0x200000000, 0x400000000, 0x800000000,
    0x1000000000, 0x2000000000, 0x4000000000, 0x8000000000,
    0x10000000000, 0x20000000000, 0x40000000000, 0x80000000000,
    0x100000000000, 0x200000000000, 0x400000000000, 0x800000000000,
    0x1000000000000, 0x2000000000000, 0x4000000000000, 0x8000000000000,
    0x10000000000000, 0x20000000000000, 0x40000000000000, 0x80000000000000,
    0x100000000000000, 0x200000000000000, 0x400000000000000, 0x800000000000000,
    0x1000000000000000, 0x2000000000000000, 0x4000000000000000, 0x8000000000000000};
%}

function my_keepcount(filenum:long, blocknum:long) %{
    unsigned long myarrayindex;
    unsigned long mybitmask;
    unsigned long myindex;

    myindex = (STAP_ARG_blocknum + (STAP_ARG_filenum * 0x100000000)) % HASHPRIME;

    myarrayindex = myindex / NUMBITS;
    mybitmask = mybitmasks[(myindex % NUMBITS)];

    if (myarrayindex >= ARRAYSIZE)
        STAP_RETURN(-1);  /* error */

    if ( (myhashtable[myarrayindex] & mybitmask) == 0 ) {
        myhashtable[myarrayindex] |= mybitmask;
        STAP_RETURN(1);   /* new block read */
    }
    else
        STAP_RETURN(0);   /* repeated read */
%}

global num_total_ios
global num_distinct_blocks

probe process("oracle").function("kcbzvb") {  # data blocks that have undergone an I/O operation
    filenum=register("rsi")
    blocknum=register("rdx")
    result=my_keepcount(filenum,blocknum)
    num_total_ios++
    if (result == 1)
        num_distinct_blocks++
    if (result == -1)
        printf("Error!\n")
}

probe timer.s(3)
{
    printf("number of distinct blocks read: %ld, total number of blocks read: %ld\n",
           num_distinct_blocks, num_total_ios)
}

--------------------------------------------------------------------------------
/SystemTap_Userspace_Oracle/measure_io_patterns/Oracle_read_profile_drilldown_file.stp:
--------------------------------------------------------------------------------
#!/usr/local/bin/stap
#
# Measure oracle I/O by hooking on kcbzvb and reporting counts of distinct and repeated I/O
# With drill down by file number
# The overhead of this code can be high, preferably use it on test systems
#
# Luca.Canali@cern.ch, Nov 2014
#

/* creates the C arrays; the hash table (bitmap) myhashtable is static, hence zero-initialized */
%{
#define NUMBITS 64              /* 64 bits in a long */
#define ARRAYSIZE 134217728     /* hash table size */
#define HASHPRIME 8589934583    /* prime number for hashing, just below ARRAYSIZE * NUMBITS */

static unsigned long myhashtable[ARRAYSIZE];
static unsigned long mybitmasks[] = {
    0x1, 0x2, 0x4, 0x8,
    0x10, 0x20, 0x40, 0x80,
    0x100, 0x200, 0x400, 0x800,
    0x1000, 0x2000, 0x4000, 0x8000,
    0x10000, 0x20000, 0x40000, 0x80000,
    0x100000, 0x200000, 0x400000, 0x800000,
    0x1000000, 0x2000000, 0x4000000, 0x8000000,
    0x10000000, 0x20000000, 0x40000000, 0x80000000,
    0x100000000, 0x200000000, 0x400000000, 0x800000000,
    0x1000000000, 0x2000000000, 0x4000000000, 0x8000000000,
    0x10000000000, 0x20000000000, 0x40000000000, 0x80000000000,
    0x100000000000, 0x200000000000, 0x400000000000, 0x800000000000,
    0x1000000000000, 0x2000000000000, 0x4000000000000, 0x8000000000000,
    0x10000000000000, 0x20000000000000, 0x40000000000000, 0x80000000000000,
    0x100000000000000, 0x200000000000000, 0x400000000000000, 0x800000000000000,
    0x1000000000000000, 0x2000000000000000, 0x4000000000000000, 0x8000000000000000};
%}

function my_keepcount(filenum:long, blocknum:long) %{
    unsigned long myarrayindex;
    unsigned long mybitmask;
    unsigned long myindex;

    myindex = (STAP_ARG_blocknum + (STAP_ARG_filenum * 0x100000000)) % HASHPRIME;

    myarrayindex = myindex / NUMBITS;
    mybitmask = mybitmasks[(myindex % NUMBITS)];

    if (myarrayindex >= ARRAYSIZE)
        STAP_RETURN(-1);  /* error */

    if ( (myhashtable[myarrayindex] & mybitmask) == 0 ) {
        myhashtable[myarrayindex] |= mybitmask;
        STAP_RETURN(1);   /* new block read */
    }
    else
        STAP_RETURN(0);   /* repeated read */
%}

global num_total_ios[2000]
global num_distinct_blocks[2000]

probe process("oracle").function("kcbzvb") {  # data blocks that have undergone an I/O operation
    filenum=register("rsi")
    blocknum=register("rdx")
    result=my_keepcount(filenum,blocknum)
    num_total_ios[filenum]++
    if (result == 1)
        num_distinct_blocks[filenum]++
    if (result == -1)
        printf("Error!\n")
}

probe timer.s(10)
{
    printf("\nDate: %s\n",tz_ctime(gettimeofday_s()))
    printf("\nRead profile drilled down by file number\n")
    foreach ([file] in num_total_ios)
        printf("file_num: %d, number of distinct and new blocks read: %d, total number of blocks read: %d\n",
               file, num_distinct_blocks[file], num_total_ios[file])
    delete num_total_ios
    delete num_distinct_blocks
}

--------------------------------------------------------------------------------
/SystemTap_Userspace_Oracle/measure_io_patterns/Oracle_read_profile_drilldown_objectnum.stp:
--------------------------------------------------------------------------------
#!/usr/local/bin/stap
#
# Measure oracle I/O by hooking on kcbzvb and reporting counts of distinct and repeated I/O
# With drill down by data_object_id
# The overhead of this code can be high, preferably use it on test systems
#
# Luca.Canali@cern.ch, Nov 2014
#

/* creates the C arrays; the hash table (bitmap) myhashtable is static, hence zero-initialized */
%{
#define NUMBITS 64              /* 64 bits in a long */
#define ARRAYSIZE 134217728     /* hash table size */
#define HASHPRIME 8589934583    /* prime number for hashing, just below ARRAYSIZE * NUMBITS */

static unsigned long myhashtable[ARRAYSIZE];
static unsigned long mybitmasks[] = {
    0x1, 0x2, 0x4, 0x8,
    0x10, 0x20, 0x40, 0x80,
    0x100, 0x200, 0x400, 0x800,
    0x1000, 0x2000, 0x4000, 0x8000,
    0x10000, 0x20000, 0x40000, 0x80000,
    0x100000, 0x200000, 0x400000, 0x800000,
    0x1000000, 0x2000000, 0x4000000, 0x8000000,
    0x10000000, 0x20000000, 0x40000000, 0x80000000,
    0x100000000, 0x200000000, 0x400000000, 0x800000000,
    0x1000000000, 0x2000000000, 0x4000000000, 0x8000000000,
    0x10000000000, 0x20000000000, 0x40000000000, 0x80000000000,
    0x100000000000, 0x200000000000, 0x400000000000, 0x800000000000,
    0x1000000000000, 0x2000000000000, 0x4000000000000, 0x8000000000000,
    0x10000000000000, 0x20000000000000, 0x40000000000000, 0x80000000000000,
    0x100000000000000, 0x200000000000000, 0x400000000000000, 0x800000000000000,
    0x1000000000000000, 0x2000000000000000, 0x4000000000000000, 0x8000000000000000};
%}

function my_keepcount(filenum:long, blocknum:long) %{
    unsigned long myarrayindex;
    unsigned long mybitmask;
    unsigned long myindex;

    myindex = (STAP_ARG_blocknum + (STAP_ARG_filenum * 0x100000000)) % HASHPRIME;

    myarrayindex = myindex / NUMBITS;
    mybitmask = mybitmasks[(myindex % NUMBITS)];

    if (myarrayindex >= ARRAYSIZE)
        STAP_RETURN(-1);  /* error */

    if ( (myhashtable[myarrayindex] & mybitmask) == 0 ) {
        myhashtable[myarrayindex] |= mybitmask;
        STAP_RETURN(1);   /* new block read */
    }
    else
        STAP_RETURN(0);   /* repeated read */
%}

global num_total_ios[2000]
global num_distinct_blocks[2000]

probe process("oracle").function("kcbzvb") {  # data blocks that have undergone an I/O operation
    filenum=register("rsi")
    blocknum=register("rdx")
    data_object_id=user_int32(register("rdi")+24)

    result=my_keepcount(filenum,blocknum)
    num_total_ios[data_object_id]++
    if (result == 1)
        num_distinct_blocks[data_object_id]++
    if (result == -1)
        printf("Error!\n")
}

probe timer.s(10)
{
    printf("\nDate: %s\n",tz_ctime(gettimeofday_s()))
    printf("\nRead profile drilled down by data_object_id\n")
    foreach ([obj] in num_total_ios)
        printf("data_object_id: %d, number of distinct and new blocks read: %d, total number of blocks read: %d\n",
               obj, num_distinct_blocks[obj], num_total_ios[obj])
    delete num_total_ios
    delete num_distinct_blocks
}

--------------------------------------------------------------------------------
/SystemTap_Userspace_Oracle/oracle_event_latencyhistogram.stp:
--------------------------------------------------------------------------------
#!/usr/bin/stap
#
# oracle_event_latencyhistogram.stp
#
# This is a SystemTap script to gather Oracle wait event measurements directly from Oracle binaries
# and print wait event latency histograms to be consumed by PyLatencyMap for heatmap visualization
#
# Use: stap -v oracle_event_latencyhistogram.stp EVENT_NUM
#      (EVENT_NUM is the event# of the wait event to sample, see the prerequisite below)
#
# Note: optionally add -x PID to limit data collection to a single process
#
# Note: in case of "ERROR: Skipped too many probes" on a system with many Oracle processes,
# increase the max number of UPROBES. For example:
# stap -v -DMAXUPROBES=1500 oracle_event_latencyhistogram.stp EVENT_NUM
#
# Prerequisite: find the event# of the wait event of interest using Oracle SQL*Plus.
# Example SQL:
# select event#,name from v$event_name where name in ('db file sequential read', 'log file sync');
# 12.1.0.2 output:
# EVENT# NAME
# ------- ------------------------
# 146 log file sync
# 153 db file sequential read
#
# Dependencies:
# Needs SystemTap version 2.5 or higher
# Kernel must have support for uprobes or utrace (use RHEL7.x or RHEL6.x)
# The oracle executable needs to be in the path, i.e. add $ORACLE_HOME/bin to $PATH
#
# Software versions and compatibility:
# Linux RHEL/OL 6.x and 7.x
# Oracle RDBMS 11.2.0.x and 12.1.0.x
#
# Notable exception and issue with Oracle 12.1.0.2:
# this script will throw "inode-offset registration error" when run against 12.1.0.2 on
# RHEL/OL7.1 (i.e. kernel 3.10.0-229.x). The workaround is to use an older kernel
# such as RHEL/OL7.0 (kernel 3.10.0-123.x). It seems to work fine on RHEL/OL 7.1 and 11g.
#
# Author: Luca.Canali@cern.ch (@LucaCanaliDB)
# Additional credits for original contributions: @FritsHoogland
# Version 1.0, March 2015.
# Based on previous work on Oracle tracing with SystemTap by Luca.Canali@cern.ch, Aug 2014
#
# Note: this is experimental code, use at your own risk
#

global eventlatency
global waittime[10000]
global eventnum

probe begin {
    if (argv_1 != "") {
        eventnum = strtol(argv_1, 10)
        printf("Now sampling event N# %d\n", eventnum)
    }
    else {
        printf("Usage: stap -v oracle_event_latencyhistogram.stp EVENT_NUM\n")
        exit()
    }
}

# gather and aggregate wait event latency details into a histogram
probe process("oracle").function("kews_update_wait_time") {
    waittime[pid()] = u32_arg(2)  # update the wait time, the wait event number is captured in the call to kskthewt
}


probe process("oracle").function("kskthewt") {
    # the event number is in arg2
    if ((u32_arg(2) == eventnum) && (waittime[pid()] > 0)) {
        eventlatency <<< waittime[pid()]  # the wait_time was previously recorded into the waittime array
        delete waittime[pid()]
    }
}


# print histogram details every 3 seconds in a format recognized by PyLatencyMap
# change to a different repetition rate if you prefer
probe timer.sec(3) {
    if (@count(eventlatency) > 0)
        println(@hist_log(eventlatency))
    delete eventlatency
}
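The script above splits each measurement across two probes:

* kews_update_wait_time stores the wait time (its second argument) per PID.
* kskthewt receives the event number as its second argument. It adds the stored value to the aggregate only for the selected event and a non-zero wait time, then deletes it.

The Python sketch below replays that bookkeeping on made-up calls, so the logic can be checked without a database. The class and method names are hypothetical. The power-of-two buckets only approximate the bucket layout that @hist_log prints (0, 1, 2, 4, ...), not its exact output format.

```python
from collections import Counter


def log2_bucket(value: int) -> int:
    """Lower bound of the power-of-two bucket for a non-negative value (0 -> 0, 300 -> 256)."""
    return 0 if value <= 0 else 1 << (value.bit_length() - 1)


class EventLatencyHistogram:
    """Hypothetical model of the kews_update_wait_time/kskthewt probe pair for one event#."""

    def __init__(self, event_num: int):
        self.event_num = event_num
        self.waittime = {}        # pid -> last wait time in microseconds, like waittime[pid()]
        self.buckets = Counter()  # bucket lower bound -> count, like eventlatency <<< ...

    def on_update_wait_time(self, pid: int, wait_time_us: int) -> None:
        # probe on kews_update_wait_time: arg2 carries the wait time
        self.waittime[pid] = wait_time_us

    def on_end_wait(self, pid: int, event_num: int) -> None:
        # probe on kskthewt: arg2 carries the event number; a missing key reads as 0, as in stap
        wait_time = self.waittime.get(pid, 0)
        if event_num == self.event_num and wait_time > 0:
            self.buckets[log2_bucket(wait_time)] += 1
            del self.waittime[pid]

    def flush(self) -> dict:
        # timer probe: report and reset, like println(@hist_log(...)) followed by delete
        snapshot = dict(sorted(self.buckets.items()))
        self.buckets.clear()
        return snapshot


if __name__ == "__main__":
    hist = EventLatencyHistogram(153)   # 153 = db file sequential read on 12.1.0.2 (from the header)
    for pid, wait_us, event in [(101, 300, 153), (102, 700, 146), (103, 5000, 153)]:
        hist.on_update_wait_time(pid, wait_us)
        hist.on_end_wait(pid, event)
    print(hist.flush())                 # {256: 1, 4096: 1}
```

Deleting the per-PID entry after use means a later kskthewt call cannot reuse a stale wait time for the same event.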

--------------------------------------------------------------------------------
/SystemTap_Userspace_Oracle/trace_oracle_events_11204.stp:
--------------------------------------------------------------------------------
#!/usr/local/bin/stap
#
# trace_oracle_events_11204.stp
#
# Trace oracle wait events using SystemTap by hooking on the Oracle RDBMS function kskthewt
# kskthewt is called when Oracle updates the wait-related performance counters at the end of the wait.
# The register R13 was found by trial and error to be a pointer into the segmented array underlying X$KSUSE
# (roughly speaking a pointer to V$SESSION data). The offset between the value of the register R13 and
# the value of V$SESSION.SADDR for the process/session under examination depends on the Oracle version
# and patchset.
# Additional scripts used to generate this script:
# trace_oracle_events_debug.stp -> use to find the offset between R13 and the base value of X$KSUSE
# ksuse_find_offsets.sql -> use to find the offsets of the X$KSUSE fields to gather/display
# eventsname.sql -> optionally use to create a sed script converting event# to event names
# Note also that register RDI (that is arg1) is set to the timestamp value
# and register RSI (that is arg2) is set to the wait event number
#
# How to run: stap -v trace_oracle_events_11204.stp -x PID
# Note: if the -x option is not used this script will trace all running oracle processes
#
# Dependencies:
# Use systemtap 2.5 or higher
# Kernel must have support for uprobes or utrace (this appears to be the case, for example, on RHEL7.x and 6.x)
# The oracle executable should be in the path: add $ORACLE_HOME/bin in $PATH
#
# Software versions and compatibility:
# Linux RHEL/OL 6.x and 7.x
# Oracle RDBMS 11.2.0.4
#
# Version 1.0, Aug 2014 by Luca.Canali@cern.ch
# Additional credits for original contributions: @FritsHoogland
#
# Note: this is experimental code, use at your own risk
#

probe process("oracle").function("kskthewt") {
    xksuse = register("r13")-7912
    ksuudnam = user_string(xksuse + 132)
    ksusenum = user_uint16(xksuse + 5920)
    ksuseopc = user_uint16(xksuse + 5826)
    ksusep1 = user_uint64(xksuse + 5832)
    ksusep2 = user_uint64(xksuse + 5840)
    ksusep3 = user_uint64(xksuse + 5848)
    ksusetim = user_uint32(xksuse + 5856)
    ksusesqh = user_uint32(xksuse + 6084)
    printf("timestamp=%ld, pid=%d, sid=%d, name=%s, event#=%u, p1=%lu, p2=%lu, p3=%lu, wait_time=%u, sql_hash=%u\n",
           u64_arg(1), pid(), ksusenum, ksuudnam, ksuseopc, ksusep1, ksusep2, ksusep3, ksusetim, ksusesqh)
}

--------------------------------------------------------------------------------
/SystemTap_Userspace_Oracle/trace_oracle_events_12102.stp:
--------------------------------------------------------------------------------
#!/usr/local/bin/stap
#
# trace_oracle_events_12102.stp
#
# Trace oracle wait events using SystemTap by hooking on the Oracle RDBMS function kskthewt
# kskthewt is called when Oracle updates the wait-related performance counters at the end of the wait.
# The register R13 was found by trial and error to be a pointer into the segmented array underlying X$KSUSE
# (roughly speaking a pointer to V$SESSION data). The offset between the value of the register R13 and
# the value of V$SESSION.SADDR for the process/session under examination depends on the Oracle version
# and patchset.
# Additional scripts used to generate this script:
# trace_oracle_events_debug.stp -> use to find the offset between R13 and the base value of X$KSUSE
# ksuse_find_offsets.sql -> use to find the offsets of the X$KSUSE fields to gather/display
# eventsname.sql -> optionally use to create a sed script converting event# to event names
# Note also that register RDI (that is arg1) is set to the timestamp value
# and register RSI (that is arg2) is set to the wait event number
#
# How to run: stap -v trace_oracle_events_12102.stp -x PID
# Note: if the -x option is not used this script will trace all running oracle processes
#
# Dependencies:
# Use systemtap 2.5 or higher
# Kernel must have support for uprobes or utrace (this appears to be the case, for example, on RHEL7.0 and 6.5)
# The oracle executable should be in the path: add $ORACLE_HOME/bin in $PATH
#
# Software versions and compatibility:
# Linux RHEL/OL 6.x and 7.x
# Oracle RDBMS 12.1.0.2
#
# Version 1.0, Aug 2014 by Luca.Canali@cern.ch
# Additional credits for original contributions: @FritsHoogland
#
# Note: this is experimental code, use at your own risk
#

probe process("oracle").function("kskthewt") {
    xksuse = register("r13")-3928
    ksuudnam = user_string(xksuse + 140)
    ksusenum = user_uint16(xksuse + 1704)
    ksuseopc = user_uint16(xksuse + 1602)
    ksusep1 = user_uint64(xksuse + 1608)
    ksusep2 = user_uint64(xksuse + 1616)
    ksusep3 = user_uint64(xksuse + 1624)
    ksusetim = user_uint32(xksuse + 1632)
    ksusesqh = user_uint32(xksuse + 1868)
    printf("timestamp=%ld, pid=%d, sid=%d, name=%s, event#=%u, p1=%lu, p2=%lu, p3=%lu, wait_time=%u, sql_hash=%u\n",
           u64_arg(1), pid(), ksusenum, ksuudnam, ksuseopc, ksusep1, ksusep2, ksusep3, ksusetim, ksusesqh)
}

--------------------------------------------------------------------------------
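The two trace_oracle_events scripts follow one recipe:

1. Take R13 at the entry of kskthewt.
2. Subtract a version-specific constant (7912 for 11.2.0.4, 3928 for 12.1.0.2) to get the base of the session's X$KSUSE row.
3. Read each field at the offset reported by ksuse_find_offsets.sql, using a read of matching width (user_uint16, user_uint32, user_uint64, user_string).

The Python sketch below replays that recipe against a synthetic byte buffer standing in for process memory, using the 12.1.0.2 offsets from the script. The buffer, `mem_base` and `decode_ksuse` are hypothetical. On another Oracle version or patchset, both the R13 constant and the field offsets would need to be re-derived.

```python
import struct

# 12.1.0.2 values copied from trace_oracle_events_12102.stp
R13_TO_KSUSE = 3928
KSUSE_FIELDS = [               # (field, offset from the X$KSUSE base, struct format)
    ("ksusenum", 1704, "<H"),  # user_uint16: SID
    ("ksuseopc", 1602, "<H"),  # user_uint16: event#
    ("ksusep1", 1608, "<Q"),   # user_uint64: p1
    ("ksusep2", 1616, "<Q"),   # user_uint64: p2
    ("ksusep3", 1624, "<Q"),   # user_uint64: p3
    ("ksusetim", 1632, "<I"),  # user_uint32: wait time
    ("ksusesqh", 1868, "<I"),  # user_uint32: SQL hash
]
KSUUDNAM_OFFSET = 140          # user_string: NUL-terminated user name


def decode_ksuse(mem: bytes, mem_base: int, r13: int) -> dict:
    """Decode one X$KSUSE row from `mem`, a memory snapshot that starts at address `mem_base`."""
    base = r13 - R13_TO_KSUSE - mem_base   # position of the row inside the snapshot
    row = {name: struct.unpack_from(fmt, mem, base + off)[0] for name, off, fmt in KSUSE_FIELDS}
    name_bytes = mem[base + KSUUDNAM_OFFSET:].split(b"\0", 1)[0]
    row["ksuudnam"] = name_bytes.decode("ascii", errors="replace")
    return row


if __name__ == "__main__":
    mem_base, row_start = 0x10000000, 100
    mem = bytearray(4096)
    struct.pack_into("<H", mem, row_start + 1704, 42)    # SID
    struct.pack_into("<H", mem, row_start + 1602, 153)   # event#
    struct.pack_into("<I", mem, row_start + 1632, 850)   # wait time
    mem[row_start + KSUUDNAM_OFFSET:row_start + KSUUDNAM_OFFSET + 6] = b"SCOTT\0"
    r13 = mem_base + row_start + R13_TO_KSUSE
    print(decode_ksuse(bytes(mem), mem_base, r13))
```

Keeping the offsets in one table mirrors how the stap scripts are maintained: porting to another version means replacing the constants, not the decoding logic.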
/SystemTap_Userspace_Oracle/trace_oracle_iocalls_12102.stp: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/stap 2 | # 3 | # trace_oracle_iocalls_12102.stp 4 | # 5 | # This is a SystemTap probe to trace physical I/O to block devices and Oracle wait events. 6 | # The script works by hooking into the OS syscalls and into the Oracle kernel functions kskthbwt and kskthewt 7 | # for Oracle 12.1.0.2 8 | # For more info on how this works, see also trace_oracle_events_12102.stp 9 | # 10 | # Dependencies (for Oracle userspace tracing): 11 | # Use systemtap 2.5 or higher 12 | # Kernel must have support for uprobes or utrace (for example RHEL/OL 7.x and 6.x) 13 | # The oracle executable should be in the path: add $ORACLE_HOME/bin to $PATH 14 | # Needs kernel debug info 15 | # This is for Oracle 12.1.0.2 (edit the offsets in the probes on kskthbwt and kskthewt to port it to a different version) 16 | # 17 | # Known issue with Oracle 12.1.0.2: 18 | # this script throws an "inode-offset registration error" when run against 12.1.0.2 on 19 | # RHEL/OL 7.1 (i.e. kernel 3.10.0-229.x). The workaround is to use an older kernel, 20 | # such as RHEL/OL 7.0 (kernel 3.10.0-123.x). It seems to work fine on RHEL/OL 7.1 with Oracle 11g.
21 | # 22 | # Version 1.0, Aug 2014 by Luca.Canali@cern.ch 23 | # Additional credits for original contributions: @FritsHoogland 24 | # 25 | # Note: this is experimental code, use at your own risk 26 | 27 | ############################## 28 | # Trace wait events 12.1.0.2 # 29 | ############################## 30 | 31 | probe process("oracle").function("kskthbwt") { 32 | xksuse = register("r13")-3928 33 | ksusenum = user_uint16(xksuse + 1704) 34 | printf("==========\nDB WAIT EVENT BEGIN: timestamp_ora=%ld, pid=%d, sid=%d, event#=%u\n", 35 | register("rsi"), pid(), ksusenum, register("rdx")) 36 | } 37 | 38 | probe process("oracle").function("kskthewt") { 39 | xksuse = register("r13") - 3928 40 | ksuudnam = user_string(xksuse + 140) 41 | ksusenum = user_uint16(xksuse + 1704) 42 | ksuseopc = user_uint16(xksuse + 1602) 43 | ksusep1 = user_uint64(xksuse + 1608) 44 | ksusep2 = user_uint64(xksuse + 1616) 45 | ksusep3 = user_uint64(xksuse + 1624) 46 | ksusetim = user_uint32(xksuse + 1632) 47 | ksusesqh = user_uint32(xksuse + 1868) 48 | ksuseobj = user_uint32(xksuse + 2312) 49 | printf("DB WAIT EVENT END: timestamp_ora=%ld, pid=%d, sid=%d, name=%s, event#=%u, p1=%lu, p2=%lu, p3=%lu, wait_time=%u, obj=%d, sql_hash=%u\n==========\n", 50 | register("rdi"), pid(), ksusenum, ksuudnam, ksuseopc, ksusep1, ksusep2, ksusep3, ksusetim, ksuseobj, ksusesqh) 51 | } 52 | 53 | 54 | ########################### 55 | # Trace Physical IO - ASM # 56 | ########################### 57 | 58 | probe syscall.pread, syscall.pwrite { 59 | if (pid() == target()) { 60 | printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, fd=%d, offset=%d, count(bytes)=%d\n", name, local_clock_us(), execname(), pid(), fd, offset, count) 61 | } 62 | } 63 | 64 | probe syscall.pread.return, syscall.pwrite.return { 65 | if (pid() == target()) { 66 | printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(bytes)=%d\n", name, local_clock_us(), execname(), pid(), $return) 67 | } 68 | } 69 | 70 | # some ugly tricks added here 
for compatibility, in particular there seem to be problems in RHEL/EL 7.x debuginfo related to 71 | # $iocbpp which is incorrectly reported as long int in those systems instead of struct iocb**; we also use a trick to resolve the array 72 | # the use of a probe on kernel.function instead of syscall.io_submit is also needed for some platforms (RHEL 6.7) 73 | # See below for a more basic probe that works on RHEL and RHEL6.6 74 | probe kernel.function("sys_io_submit") { 75 | if (pid() == target()) { 76 | printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, nr(num I/O)=%d\n", "io_submit", local_clock_us(), execname(), pid(), $nr) 77 | for (i=0; i<$nr; i++) { 78 | printf(" %d: file descriptor=%d, offset=%d, bytes=%d, opcode=%d\n", i+1, @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_fildes, 79 | @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_offset, @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_nbytes, 80 | @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_lio_opcode) 81 | } 82 | } 83 | } 84 | 85 | # For reference, this is the original probe on io_submit without the compatibility tricks used above 86 | # probe syscall.io_submit { 87 | # if (pid() == target()) { 88 | # printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, nr(num I/O)=%d\n", name, local_clock_us(), execname(), pid(), nr) 89 | # for (i=0; i<$nr; i++) { 90 | # printf(" %d: file descriptor=%d, offset=%d, bytes=%d, opcode=%d\n", i+1, $iocbpp[i]->aio_fildes, 91 | # $iocbpp[i]->aio_offset, $iocbpp[i]->aio_nbytes, $iocbpp[i]->aio_lio_opcode) 92 | # } 93 | # } 94 | # } 95 | 96 | 97 | probe syscall.io_submit.return { 98 | if (pid() == target()) { 99 | printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(num I/O)=%ld\n", name, local_clock_us(), execname(), pid(), $return) 100 | } 101 | } 102 | 103 | probe syscall.io_getevents { 104 | if (pid() == target()) { 105 | printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, min_nr=%d, timeout=%s\n", name, local_clock_us(), execname(), pid(), min_nr,
$timeout$) 106 | } 107 | } 108 | 109 | # need to explicitly cast $events for RHEL/EL 7.x kernels, where debuginfo issues report $events as long int instead of struct io_event* 110 | probe syscall.io_getevents.return { 111 | if (pid() == target()) { 112 | printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(num I/O)=%ld\n", name, local_clock_us(), execname(), pid(), $return) 113 | for (i=0; i<$return; i++) { # cycle over the reaped I/Os 114 | obj_addr = @cast($events, "struct io_event")[i]->obj # details of struct iocb in /usr/include/libaio.h 115 | fildes = user_uint32(obj_addr +20) 116 | bytes = user_uint64(obj_addr +32) 117 | offset = user_uint64(obj_addr +40) 118 | printf(" %d:, fildes=%d, offset=%lu, bytes=%lu\n", i+1, fildes, offset, bytes) 119 | } 120 | } 121 | } 122 | 123 | global active_bio[10000] 124 | 125 | probe ioblock.request { 126 | if (pid() == target()) { 127 | printf("OS: ->%s, timestamp=%d, pid=%d, devname=%s, sector=%d, size=%d, rw=%d, address_bio=%lu\n", 128 | name, local_clock_us(), pid(), devname, sector, size, rw, $bio ) 129 | active_bio[$bio] += 1 130 | } 131 | } 132 | 133 | probe ioblock_trace.request { 134 | if (pid() == target()) { 135 | printf("OS: ->%s, timestamp=%d, pid=%d, devname=%s, sector=%d, size=%d, rw=%d, address_bio=%lu\n", 136 | name, local_clock_us(), pid(), devname, sector, size, rw, $bio ) 137 | } 138 | } 139 | 140 | # the use of active_bio[] is a workaround as pid is not populated in ioblock.end and therefore cannot be used for filtering 141 | probe ioblock.end { 142 | if (active_bio[$bio] >= 1) { 143 | printf("OS: <-%s, timestamp=%d, pid=%d, devname=%s, sector=%d, rw=%d, address_bio=%lu\n", 144 | name, local_clock_us(), pid(), devname, sector, rw, $bio) 145 | active_bio[$bio] -= 1 146 | } 147 | } 148 | -------------------------------------------------------------------------------- /SystemTap_Userspace_Oracle/trace_oracle_logical_io_basic.stp: 
-------------------------------------------------------------------------------- 1 | #!/usr/local/bin/stap 2 | # 3 | # trace_oracle_logical_io_basic.stp 4 | # 5 | # This is a SystemTap probe to trace logical I/O by hooking into the Oracle kernel function kcbgtcr 6 | # This probe is a port to SystemTap of DTrace work by Tanel Poder, see qer_trace.sh 7 | # A detailed analysis of logical IO using DTrace and the script dtracelio.d can be found in 8 | # Alexander Anokhin's blog http://alexanderanokhin.wordpress.com/2011/11/13/dynamic-tracing-of-oracle-logical-io/ 9 | # 10 | # Dependencies: 11 | # Use systemtap 2.5 or higher 12 | # Kernel must have support for uprobes or utrace (this seems to be the case, for example, for RHEL 7.x and 6.x) 13 | # The oracle executable should be in the path: add $ORACLE_HOME/bin to $PATH 14 | # 15 | # How to run: stap -v trace_oracle_logical_io_basic.stp 16 | # Note: optionally add -x <pid> to limit data collection to one process 17 | # 18 | # By Luca.Canali@cern.ch, Aug 2014 based on original work by Tanel Poder and Alexander Anokhin 19 | # Note: This is experimental code, use at your own risk 20 | # The overhead of this code can be high; preferably use it on test systems 21 | # 22 | 23 | probe process("oracle").function("kcbgtcr") { 24 | printf("tbs#=%d, rfile=%d, block#=%d, bigfile_block#=%d, obj#=%d\n",user_int32(u64_arg(1)), user_int32(u64_arg(1)+4) >> 22 & 0x003FFFFF, 25 | user_int32(u64_arg(1)+4) & 0x003FFFFF, user_int32(u64_arg(1)+4), user_int32(u64_arg(1)+8)) 26 | } 27 | -------------------------------------------------------------------------------- /SystemTap_Userspace_Oracle/trace_oracle_logical_io_count.stp: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/stap 2 | # 3 | # trace_oracle_logical_io_count.stp 4 | # 5 | # This is a SystemTap probe to trace and aggregate statistics of Oracle logical I/O. 6 | # The script works by hooking into the Oracle kernel function kcbgtcr 7 | # See also the
probe trace_oracle_logical_io_basic.stp for more information 8 | # 9 | # Dependencies: 10 | # Use systemtap 2.5 or higher 11 | # Kernel must have support for uprobes or utrace (this is the case, for example, for RHEL 7.x and 6.x) 12 | # The oracle executable should be in the path: add $ORACLE_HOME/bin to $PATH 13 | # 14 | # How to run: stap -v trace_oracle_logical_io_count.stp 15 | # Note: optionally add -x <pid> to limit data collection to one process 16 | # 17 | # By Luca.Canali@cern.ch, Aug 2014 based on original work by Tanel Poder and Alexander Anokhin 18 | # Note: This is experimental code, use at your own risk 19 | # The overhead of this code can be high; preferably use it on test systems 20 | 21 | global obj_iocount[1000] 22 | global tbs_iocount[200] 23 | global pid_iocount[200] 24 | 25 | probe process("oracle").function("kcbgtcr") { 26 | obj = user_int32(u64_arg(1)+8) 27 | tbs = user_int32(u64_arg(1)) 28 | pid=pid() 29 | obj_iocount[obj] += 1 30 | tbs_iocount[tbs] += 1 31 | pid_iocount[pid] += 1 32 | } 33 | 34 | # print logical io count statistics every 10 seconds and reset the counters 35 | probe timer.sec(10) { 36 | printf("\nDate: %s\n",tz_ctime(gettimeofday_s())) 37 | printf("\nObject number and logical I/O count\n") 38 | foreach ([obj] in obj_iocount) { 39 | printf("obj#=%d, count=%d\n",obj, obj_iocount[obj]) 40 | } 41 | printf("\nTablespace number and logical I/O count\n") 42 | foreach ([tbs] in tbs_iocount) { 43 | printf("tbs#=%d, count=%d\n", tbs, tbs_iocount[tbs]) 44 | } 45 | printf("\nuser process pid and logical I/O count\n") 46 | foreach ([pid] in pid_iocount) { 47 | printf("pid#=%d, count=%d\n", pid, pid_iocount[pid]) 48 | } 49 | # comment out the lines below if you prefer cumulative statistics values 50 | delete obj_iocount 51 | delete tbs_iocount 52 | delete pid_iocount 53 | } 54 | -------------------------------------------------------------------------------- /SystemTap_Userspace_Oracle/trace_oracle_logicalio_wait_events_physicalio_11204.stp:
-------------------------------------------------------------------------------- 1 | #!/usr/local/bin/stap 2 | # 3 | # trace_oracle_logicalio_wait_events_physicalio_11204.stp 4 | # 5 | # This is a SystemTap probe to trace logical I/O, wait events and physical I/O. 6 | # The script works by hooking into the OS syscalls and into Oracle kernel functions 7 | # 8 | # Dependencies (for Oracle userspace tracing): 9 | # Use systemtap 2.5 or higher 10 | # Kernel must have support for uprobes or utrace (for example RHEL/OL 7.x and 6.x) 11 | # The oracle executable should be in the path: add $ORACLE_HOME/bin to $PATH 12 | # Needs kernel debuginfo 13 | # 14 | # Usage: stap -v trace_oracle_logicalio_wait_events_physicalio_11204.stp -x <pid> 15 | # 16 | # This script version is for Oracle 11.2.0.4; support for other Oracle versions is also available, 17 | # see also trace_oracle_logicalio_wait_events_physicalio_12102.stp 18 | # 19 | # Version 1.0, Oct 2014 by Luca.Canali@cern.ch 20 | # Additional credits for original contributions: @FritsHoogland 21 | # Logical I/O probes kcbgtcr and kcbgcur, from original work by Alexander Anokhin and @TanelPoder 22 | # 23 | # Note: This is experimental code, use at your own risk 24 | # The overhead of this code can be high; preferably use it on test systems 25 | # 26 | 27 | 28 | #################### 29 | # Trace Logical IO # 30 | #################### 31 | 32 | probe process("oracle").function("kcbgtcr") { # consistent reads 33 | pointer_lio=register("rdi") 34 | printf("DB LOGICAL IO Consistent Read (kcbgtcr) for block: tbs#=%ld, rfile#=%d, block#=%d, bigfile_block#=%d, obj#=%ld\n", 35 | user_int32(pointer_lio), user_int32(pointer_lio+4) >> 22 & 0x003FFFFF, user_int32(pointer_lio+4) & 0x003FFFFF, 36 | user_int32(pointer_lio+4), user_int32(pointer_lio+8)) 37 | } 38 | 39 | probe process("oracle").function("kcbgcur") { # current reads 40 | pointer_lio=register("rdi") 41 | printf("DB LOGICAL IO Current Read (kcbgcur) for block: tbs#=%ld, rfile#=%d, block#=%d, bigfile_block#=%d,
obj#=%ld\n", 42 | user_int32(pointer_lio), user_int32(pointer_lio+4) >> 22 & 0x003FFFFF, user_int32(pointer_lio+4) & 0x003FFFFF, 43 | user_int32(pointer_lio+4), user_int32(pointer_lio+8)) 44 | } 45 | 46 | probe process("oracle").function("kcbzib") { # initiates physical reads into buffer cache 47 | printf(" ->kcbzib, Oracle logical read operations require physical reads into the buffer cache\n") 48 | } 49 | 50 | probe process("oracle").function("kcbzgb") { # prepares physical read into buffer cache 51 | printf(" -> kcbzgb, Oracle has allocated buffer cache space for block: tbs#=%ld, rfile#=%d, block#=%d, bigfile_block#=%d, obj#=%ld\n", 52 | register("rsi"), register("rdx") >> 22 & 0x003FFFFF, register("rdx") & 0x003FFFFF, register("rdx"), register("r8")) 53 | } 54 | 55 | probe process("oracle").function("kcbzvb") { # data blocks that have undergone an I/O operation 56 | printf(" ->kcbzvb, Oracle has performed I/O on: file#=%ld, block#=%d, rfile#=%d, bigfile_block#=%d\n", 57 | register("rsi"), register("rdx") & 0x003FFFFF, register("rdx") >> 22 & 0x003FFFFF, register("rdx")) 58 | } 59 | 60 | 61 | ############################# 62 | # Trace wait events 11.2.0.4# 63 | ############################# 64 | 65 | probe process("oracle").function("kskthbwt") { 66 | xksuse = register("r13")-7912 67 | ksusenum = user_uint16(xksuse + 5920) 68 | printf("==========\nDB WAIT EVENT BEGIN: timestamp_ora=%ld, pid=%d, sid=%d, event#=%u\n", 69 | register("rsi"), pid(), ksusenum, register("rdx")) 70 | } 71 | 72 | probe process("oracle").function("kskthewt") { 73 | xksuse = register("r13")-7912 74 | ksuudnam = user_string(xksuse + 132) 75 | ksusenum = user_uint16(xksuse + 5920) 76 | ksuseopc = user_uint16(xksuse + 5826) 77 | ksusep1 = user_uint64(xksuse + 5832) 78 | ksusep2 = user_uint64(xksuse + 5840) 79 | ksusep3 = user_uint64(xksuse + 5848) 80 | ksusetim = user_uint32(xksuse + 5856) 81 | ksusesqh = user_uint32(xksuse + 6084) 82 | ksuseobj = user_uint32(xksuse + 6464) 83 | printf("DB
WAIT EVENT END: timestamp_ora=%ld, pid=%d, sid=%d, name=%s, event#=%u, p1=%lu, p2=%lu, p3=%lu, wait_time=%u, obj=%d, sql_hash=%u\n==========\n", 84 | register("rdi"), pid(), ksusenum, ksuudnam, ksuseopc, ksusep1, ksusep2, ksusep3, ksusetim, ksuseobj, ksusesqh) 85 | } 86 | 87 | 88 | ########################### 89 | # Trace Physical IO - ASM # 90 | ########################### 91 | 92 | probe syscall.pread, syscall.pwrite { 93 | if (pid() == target()) { 94 | printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, fd=%d, offset=%d, count(bytes)=%d\n", name, local_clock_us(), execname(), pid(), fd, offset, count) 95 | } 96 | } 97 | 98 | probe syscall.pread.return, syscall.pwrite.return { 99 | if (pid() == target()) { 100 | printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(bytes)=%d\n", name, local_clock_us(), execname(), pid(), $return) 101 | } 102 | } 103 | 104 | # some ugly tricks added here for compatibility, in particular there seem to be problems in RHEL/EL 7.x debuginfo related to 105 | # $iocbpp which is incorrectly reported as long int in those systems instead of struct iocb**; we also use a trick to resolve the array 106 | # the use of a probe on kernel.function instead of syscall.io_submit is also needed for some platforms (RHEL 6.7) 107 | # See below for a more basic probe that works on RHEL and RHEL6.6 108 | probe kernel.function("sys_io_submit") { 109 | if (pid() == target()) { 110 | printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, nr(num I/O)=%d\n", "io_submit", local_clock_us(), execname(), pid(), $nr) 111 | for (i=0; i<$nr; i++) { 112 | printf(" %d: file descriptor=%d, offset=%d, bytes=%d, opcode=%d\n", i+1, @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_fildes, 113 | @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_offset, @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_nbytes, 114 | @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_lio_opcode) 115 | } 116 | } 117 | } 118 | 119 | # For reference, this is the original probe on io_submit
without the compatibility tricks used above 120 | # probe syscall.io_submit { 121 | # if (pid() == target()) { 122 | # printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, nr(num I/O)=%d\n", name, local_clock_us(), execname(), pid(), nr) 123 | # for (i=0; i<$nr; i++) { 124 | # printf(" %d: file descriptor=%d, offset=%d, bytes=%d, opcode=%d\n", i+1, $iocbpp[i]->aio_fildes, 125 | # $iocbpp[i]->aio_offset, $iocbpp[i]->aio_nbytes, $iocbpp[i]->aio_lio_opcode) 126 | # } 127 | # } 128 | # } 129 | 130 | 131 | probe syscall.io_submit.return { 132 | if (pid() == target()) { 133 | printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(num I/O)=%ld\n", name, local_clock_us(), execname(), pid(), $return) 134 | } 135 | } 136 | 137 | probe syscall.io_getevents { 138 | if (pid() == target()) { 139 | printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, min_nr=%d, timeout=%s\n", name, local_clock_us(), execname(), pid(), min_nr, $timeout$) 140 | } 141 | } 142 | 143 | # need to explicitly cast $events for RHEL/EL 7.x kernels, where debuginfo issues report $events as long int instead of struct io_event* 144 | probe syscall.io_getevents.return { 145 | if (pid() == target()) { 146 | printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(num I/O)=%ld\n", name, local_clock_us(), execname(), pid(), $return) 147 | for (i=0; i<$return; i++) { # cycle over the reaped I/Os 148 | obj_addr = @cast($events, "struct io_event")[i]->obj # details of struct iocb in /usr/include/libaio.h 149 | fildes = user_uint32(obj_addr +20) 150 | bytes = user_uint64(obj_addr +32) 151 | offset = user_uint64(obj_addr +40) 152 | printf(" %d:, fildes=%d, offset=%lu, bytes=%lu\n", i+1, fildes, offset, bytes) 153 | } 154 | } 155 | } 156 | 157 | 158 | global active_bio[10000] 159 | 160 | probe ioblock.request { 161 | if (pid() == target()) { 162 | printf("OS: ->%s, timestamp=%d, pid=%d, devname=%s, sector=%d, size=%d, rw=%d, address_bio=%lu\n", 163 | name, local_clock_us(), pid(), devname, sector, size, rw, 
$bio ) 164 | active_bio[$bio] += 1 165 | } 166 | } 167 | 168 | probe ioblock_trace.request { 169 | if (pid() == target()) { 170 | printf("OS: ->%s, timestamp=%d, pid=%d, devname=%s, sector=%d, size=%d, rw=%d, address_bio=%lu\n", 171 | name, local_clock_us(), pid(), devname, sector, size, rw, $bio ) 172 | } 173 | } 174 | 175 | # the use of active_bio[] is a workaround as pid is not populated in ioblock.end and therefore cannot be used for filtering 176 | probe ioblock.end { 177 | if (active_bio[$bio] >= 1) { 178 | printf("OS: <-%s, timestamp=%d, pid=%d, devname=%s, sector=%d, rw=%d, address_bio=%lu\n", 179 | name, local_clock_us(), pid(), devname, sector, rw, $bio) 180 | active_bio[$bio] -= 1 181 | } 182 | } 183 | -------------------------------------------------------------------------------- /SystemTap_Userspace_Oracle/trace_oracle_logicalio_wait_events_physicalio_12102.stp: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/stap 2 | # 3 | # trace_oracle_logicalio_wait_events_physicalio_12102.stp 4 | # 5 | # This is a SystemTap probe to trace logical I/O, wait events and physical I/O. 6 | # The script works by hooking into the OS syscalls and into Oracle kernel functions 7 | # 8 | # Dependencies (for Oracle userspace tracing): 9 | # Use systemtap 2.5 or higher 10 | # Kernel must have support for uprobes or utrace (for example RHEL/EL 7.x and 6.x) 11 | # The oracle executable should be in the path: add $ORACLE_HOME/bin to $PATH 12 | # Needs kernel debuginfo 13 | # 14 | # Known issue with Oracle 12.1.0.2: 15 | # this script throws an "inode-offset registration error" when run against 12.1.0.2 on 16 | # RHEL/OL 7.1 (i.e. kernel 3.10.0-229.x). The workaround is to use an older kernel, 17 | # such as RHEL/OL 7.0 (kernel 3.10.0-123.x). It seems to work fine on RHEL/EL 7.1 with Oracle 11g.
18 | # 19 | # Usage: stap -v trace_oracle_logicalio_wait_events_physicalio_12102.stp -x <pid> 20 | # 21 | # This script version is for Oracle 12.1.0.2; support for other Oracle versions is also available, 22 | # see also trace_oracle_logicalio_wait_events_physicalio_11204.stp 23 | # 24 | # Version 1.0, Oct 2014 by Luca.Canali@cern.ch 25 | # Additional credits for original contributions: @FritsHoogland 26 | # Logical I/O probes kcbgtcr and kcbgcur, from original work by Alexander Anokhin and @TanelPoder 27 | # 28 | # Note: This is experimental code, use at your own risk 29 | # The overhead of this code can be high; preferably use it on test systems 30 | # 31 | 32 | #################### 33 | # Trace Logical IO # 34 | #################### 35 | 36 | probe process("oracle").function("kcbgtcr") { # consistent reads 37 | pointer_lio=register("rdi") 38 | printf("DB LOGICAL IO Consistent Read (kcbgtcr) for block: tbs#=%ld, rfile#=%d, block#=%d, bigfile_block#=%d, obj#=%ld\n", 39 | user_int32(pointer_lio), user_int32(pointer_lio+4) >> 22 & 0x003FFFFF, user_int32(pointer_lio+4) & 0x003FFFFF, 40 | user_int32(pointer_lio+4), user_int32(pointer_lio+8)) 41 | } 42 | 43 | probe process("oracle").function("kcbgcur") { # current reads 44 | pointer_lio=register("rdi") 45 | printf("DB LOGICAL IO Current Read (kcbgcur) for block: tbs#=%ld, rfile#=%d, block#=%d, bigfile_block#=%d, obj#=%ld\n", 46 | user_int32(pointer_lio), user_int32(pointer_lio+4) >> 22 & 0x003FFFFF, user_int32(pointer_lio+4) & 0x003FFFFF, 47 | user_int32(pointer_lio+4), user_int32(pointer_lio+8)) 48 | } 49 | 50 | probe process("oracle").function("kcbzib") { # initiates physical reads into buffer cache 51 | printf(" ->kcbzib, Oracle logical read operations require physical reads into the buffer cache\n") 52 | } 53 | 54 | probe process("oracle").function("kcbzgb") { # prepares physical read into buffer cache 55 | printf(" -> kcbzgb, Oracle has allocated buffer cache space for block: tbs#=%ld, rfile#=%d, block#=%d, bigfile_block#=%d, obj#=%ld\n", 56 |
register("rsi"), register("rdx") >> 22 & 0x003FFFFF, register("rdx") & 0x003FFFFF, register("rdx"), register("r8")) 57 | } 58 | 59 | probe process("oracle").function("kcbzvb") { # data blocks that have undergone an I/O operation 60 | printf(" ->kcbzvb, Oracle has performed I/O on: file#=%ld, block#=%d, rfile#=%d, bigfile_block#=%d\n", 61 | register("rsi"), register("rdx") & 0x003FFFFF, register("rdx") >> 22 & 0x003FFFFF, register("rdx")) 62 | } 63 | 64 | 65 | ############################## 66 | # Trace wait events 12.1.0.2 # 67 | ############################## 68 | 69 | probe process("oracle").function("kskthbwt") { 70 | xksuse = register("r13")-3928 71 | ksusenum = user_uint16(xksuse + 1704) 72 | printf("==========\nDB WAIT EVENT BEGIN: timestamp_ora=%ld, pid=%d, sid=%d, event#=%u\n", 73 | register("rsi"), pid(), ksusenum, register("rdx")) 74 | } 75 | 76 | probe process("oracle").function("kskthewt") { 77 | xksuse = register("r13") - 3928 78 | ksuudnam = user_string(xksuse + 140) 79 | ksusenum = user_uint16(xksuse + 1704) 80 | ksuseopc = user_uint16(xksuse + 1602) 81 | ksusep1 = user_uint64(xksuse + 1608) 82 | ksusep2 = user_uint64(xksuse + 1616) 83 | ksusep3 = user_uint64(xksuse + 1624) 84 | ksusetim = user_uint32(xksuse + 1632) 85 | ksusesqh = user_uint32(xksuse + 1868) 86 | ksuseobj = user_uint32(xksuse + 2312) 87 | printf("DB WAIT EVENT END: timestamp_ora=%ld, pid=%d, sid=%d, name=%s, event#=%u, p1=%lu, p2=%lu, p3=%lu, wait_time=%u, obj=%d, sql_hash=%u\n==========\n", 88 | register("rdi"), pid(), ksusenum, ksuudnam, ksuseopc, ksusep1, ksusep2, ksusep3, ksusetim, ksuseobj, ksusesqh) 89 | } 90 | 91 | 92 | ########################### 93 | # Trace Physical IO - ASM # 94 | ########################### 95 | 96 | probe syscall.pread, syscall.pwrite { 97 | if (pid() == target()) { 98 | printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, fd=%d, offset=%d, count(bytes)=%d\n", name, local_clock_us(), execname(), pid(), fd, offset, count) 99 | } 100 | } 101 | 102 | probe
syscall.pread.return, syscall.pwrite.return { 103 | if (pid() == target()) { 104 | printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(bytes)=%d\n", name, local_clock_us(), execname(), pid(), $return) 105 | } 106 | } 107 | 108 | # some ugly tricks added here for compatibility, in particular there seem to be problems in RHEL/EL 7.x debuginfo related to 109 | # $iocbpp which is incorrectly reported as long int in those systems instead of struct iocb**; we also use a trick to resolve the array 110 | # the use of a probe on kernel.function instead of syscall.io_submit is also needed for some platforms (RHEL 6.7) 111 | # See below for a more basic probe that works on RHEL and RHEL6.6 112 | probe kernel.function("sys_io_submit") { 113 | if (pid() == target()) { 114 | printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, nr(num I/O)=%d\n", "io_submit", local_clock_us(), execname(), pid(), $nr) 115 | for (i=0; i<$nr; i++) { 116 | printf(" %d: file descriptor=%d, offset=%d, bytes=%d, opcode=%d\n", i+1, @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_fildes, 117 | @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_offset, @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_nbytes, 118 | @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_lio_opcode) 119 | } 120 | } 121 | } 122 | 123 | # For reference, this is the original probe on io_submit without the compatibility tricks used above 124 | # probe syscall.io_submit { 125 | # if (pid() == target()) { 126 | # printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, nr(num I/O)=%d\n", name, local_clock_us(), execname(), pid(), nr) 127 | # for (i=0; i<$nr; i++) { 128 | # printf(" %d: file descriptor=%d, offset=%d, bytes=%d, opcode=%d\n", i+1, $iocbpp[i]->aio_fildes, 129 | # $iocbpp[i]->aio_offset, $iocbpp[i]->aio_nbytes, $iocbpp[i]->aio_lio_opcode) 130 | # } 131 | # } 132 | # } 133 | 134 | 135 | probe syscall.io_submit.return { 136 | if (pid() == target()) { 137 | printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(num
I/O)=%ld\n", name, local_clock_us(), execname(), pid(), $return) 138 | } 139 | } 140 | 141 | probe syscall.io_getevents { 142 | if (pid() == target()) { 143 | printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, min_nr=%d, timeout=%s\n", name, local_clock_us(), execname(), pid(), min_nr, $timeout$) 144 | } 145 | } 146 | 147 | # need to explicitly cast $events for RHEL/EL 7.x kernels, where debuginfo issues report $events as long int instead of struct io_event* 148 | probe syscall.io_getevents.return { 149 | if (pid() == target()) { 150 | printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(num I/O)=%ld\n", name, local_clock_us(), execname(), pid(), $return) 151 | for (i=0; i<$return; i++) { # cycle over the reaped I/Os 152 | obj_addr = @cast($events, "struct io_event")[i]->obj # details of struct iocb in /usr/include/libaio.h 153 | fildes = user_uint32(obj_addr +20) 154 | bytes = user_uint64(obj_addr +32) 155 | offset = user_uint64(obj_addr +40) 156 | printf(" %d:, fildes=%d, offset=%lu, bytes=%lu\n", i+1, fildes, offset, bytes) 157 | } 158 | } 159 | } 160 | 161 | global active_bio[10000] 162 | 163 | probe ioblock.request { 164 | if (pid() == target()) { 165 | printf("OS: ->%s, timestamp=%d, pid=%d, devname=%s, sector=%d, size=%d, rw=%d, address_bio=%lu\n", 166 | name, local_clock_us(), pid(), devname, sector, size, rw, $bio ) 167 | active_bio[$bio] += 1 168 | } 169 | } 170 | 171 | probe ioblock_trace.request { 172 | if (pid() == target()) { 173 | printf("OS: ->%s, timestamp=%d, pid=%d, devname=%s, sector=%d, size=%d, rw=%d, address_bio=%lu\n", 174 | name, local_clock_us(), pid(), devname, sector, size, rw, $bio ) 175 | } 176 | } 177 | 178 | # the use of active_bio[] is a workaround as pid is not populated in ioblock.end and therefore cannot be used for filtering 179 | probe ioblock.end { 180 | if (active_bio[$bio] >= 1) { 181 | printf("OS: <-%s, timestamp=%d, pid=%d, devname=%s, sector=%d, rw=%d, address_bio=%lu\n", 182 | name, local_clock_us(), pid(), 
devname, sector, rw, $bio) 183 | active_bio[$bio] -= 1 184 | } 185 | } 186 | -------------------------------------------------------------------------------- /SystemTap_Userspace_Oracle/trace_oracle_wait_events_asyncio_libaio_11204.stp: -------------------------------------------------------------------------------- 1 | #!/usr/local/bin/stap 2 | # 3 | # trace_oracle_wait_events_asyncio_libaio_11204.stp 4 | # 5 | # This is a SystemTap probe to trace Oracle wait events, libaio calls and physical I/O. 6 | # The script works by hooking into the OS syscalls, the libaio library and Oracle kernel functions 7 | # 8 | # Dependencies (for Oracle userspace tracing): 9 | # Use systemtap 2.5 or higher 10 | # Kernel must have support for uprobes or utrace (for example RHEL/EL 7.x and 6.x) 11 | # The oracle executable should be in the path: add $ORACLE_HOME/bin to $PATH 12 | # Needs kernel debuginfo 13 | # Needs libaio debuginfo 14 | # 15 | # Usage: stap -v trace_oracle_wait_events_asyncio_libaio_11204.stp -x <pid> 16 | # 17 | # This script version is for Oracle 11.2.0.4; support for other Oracle versions is also available, 18 | # see trace_oracle_wait_events_asyncio_libaio_12102.stp 19 | # 20 | # Version 1.0, Oct 2014 by Luca.Canali@cern.ch 21 | # Additional credits for original contributions: @FritsHoogland 22 | # 23 | # Note: This is experimental code, use at your own risk 24 | # The overhead of this code can be high; preferably use it on test systems 25 | # 26 | 27 | #################### 28 | # Trace libaio # 29 | #################### 30 | 31 | # Dependencies: needs libaio debuginfo 32 | # Use systemtap 2.5 or higher (for Oracle userspace tracing) or comment out 33 | # On some systems you need to change "/usr/lib64/libaio.so" to "/lib64/libaio.so.1" 34 | # Kernel must have support for uprobes or utrace (for example RHEL/OL 7.x and 6.x) 35 | 36 | probe process("/usr/lib64/libaio.so").function("io_getevents_0_4") { 37 | printf ("LIBAIO:->io_getevents_0_4: timestamp=%d, program=%s, pid=%d, min_nr=%d, timeout.tv_sec=%d\n", 38
        local_clock_us(), execname(), pid(), $min_nr, @cast($timeout,"struct timespec","")->tv_sec)
}

##############################
# Trace wait events 11.2.0.4 #
##############################


probe process("oracle").function("kskthbwt") {
  xksuse = register("r13")-7912
  ksusenum = user_uint16(xksuse + 5920)
  printf("==========\nDB WAIT EVENT BEGIN: timestamp_ora=%ld, pid=%d, sid=%d, event#=%u\n",
         register("rsi"), pid(), ksusenum, register("rdx"))
}

probe process("oracle").function("kskthewt") {
  xksuse = register("r13")-7912
  ksuudnam = user_string(xksuse + 132)
  ksusenum = user_uint16(xksuse + 5920)
  ksuseopc = user_uint16(xksuse + 5826)
  ksusep1 = user_uint64(xksuse + 5832)
  ksusep2 = user_uint64(xksuse + 5840)
  ksusep3 = user_uint64(xksuse + 5848)
  ksusetim = user_uint32(xksuse + 5856)
  ksusesqh = user_uint32(xksuse + 6084)
  ksuseobj = user_uint32(xksuse + 6464)
  printf("DB WAIT EVENT END: timestamp_ora=%ld, pid=%d, sid=%d, name=%s, event#=%u, p1=%lu, p2=%lu, p3=%lu, wait_time=%u, obj=%d, sql_hash=%u\n==========\n",
         register("rdi"), pid(), ksusenum, ksuudnam, ksuseopc, ksusep1, ksusep2, ksusep3, ksusetim, ksuseobj, ksusesqh)
}

###########################
# Trace Physical IO - ASM #
###########################

probe syscall.pread, syscall.pwrite {
  if (pid() == target()) {
    printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, fd=%d, offset=%d, count(bytes)=%d\n", name, local_clock_us(), execname(), pid(), fd, offset, count)
  }
}

probe syscall.pread.return, syscall.pwrite.return {
  if (pid() == target()) {
    printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(bytes)=%d\n", name, local_clock_us(), execname(), pid(), $return)
  }
}

# Some ugly tricks are added here for compatibility. In particular, RHEL/EL 7.x debuginfo seems to have problems with
# $iocbpp, which is incorrectly reported as long int on those systems instead of struct iocb**; we also use a trick to resolve the array.
# Probing kernel.function instead of syscall.io_submit is also needed on some platforms (RHEL 6.7).
# See below for a more basic probe that works on RHEL and RHEL6.6
probe kernel.function("sys_io_submit") {
  if (pid() == target()) {
    printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, nr(num I/O)=%d\n", "io_submit", local_clock_us(), execname(), pid(), $nr)
    for (i=0; i<$nr; i++) {
      printf(" %d: file descriptor=%d, offset=%d, bytes=%d, opcode=%d\n", i+1, @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_fildes,
             @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_offset, @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_nbytes,
             @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_lio_opcode)
    }
  }
}

# For reference, this is the original probe on io_submit without the compatibility tricks used above
# probe syscall.io_submit {
#   if (pid() == target()) {
#     printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, nr(num I/O)=%d\n", name, local_clock_us(), execname(), pid(), nr)
#     for (i=0; i<$nr; i++) {
#       printf(" %d: file descriptor=%d, offset=%d, bytes=%d, opcode=%d\n", i+1, $iocbpp[i]->aio_fildes,
#              $iocbpp[i]->aio_offset, $iocbpp[i]->aio_nbytes, $iocbpp[i]->aio_lio_opcode)
#     }
#   }
# }


probe syscall.io_submit.return {
  if (pid() == target()) {
    printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(num I/O)=%ld\n", name, local_clock_us(), execname(), pid(), $return)
  }
}

probe syscall.io_getevents {
  if (pid() == target()) {
    printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, min_nr=%d, timeout=%s\n", name, local_clock_us(), execname(), pid(), min_nr, $timeout$)
  }
}

# $events must be cast explicitly on RHEL/EL 7.x kernels, where debuginfo issues report $events as long int instead of struct io_event*
probe syscall.io_getevents.return {
  if (pid() == target()) {
    printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(num I/O)=%ld\n", name, local_clock_us(), execname(), pid(), $return)
    for (i=0; i<$return; i++) { # loop over the reaped I/Os
      obj_addr = @cast($events, "struct io_event")[i]->obj # details of struct iocb in /usr/include/libaio.h
      fildes = user_uint32(obj_addr +20)
      bytes = user_uint64(obj_addr +32)
      offset = user_uint64(obj_addr +40)
      printf(" %d: fildes=%d, offset=%lu, bytes=%lu\n", i+1, fildes, offset, bytes)
    }
  }
}

--------------------------------------------------------------------------------
/SystemTap_Userspace_Oracle/trace_oracle_wait_events_asyncio_libaio_12102.stp:
--------------------------------------------------------------------------------
#!/usr/local/bin/stap
#
# trace_oracle_wait_events_asyncio_libaio_12102.stp
#
# This is a SystemTap script to trace Oracle wait events and physical I/O.
# The script works by hooking into OS syscalls and Oracle kernel functions.
#
# Dependencies (for Oracle userspace tracing):
#   Use SystemTap 2.5 or higher
#   Kernel must have support for uprobes or utrace (for example RHEL/OL 7.x and 6.x)
#   The oracle executable should be in the path: add $ORACLE_HOME/bin to $PATH
#   Needs kernel debuginfo
#   Needs libaio debuginfo
#
# Known issue with Oracle 12.1.0.2:
#   this script throws "inode-offset registration error" when run against 12.1.0.2 on
#   RHEL/OL 7.1 (i.e. kernel 3.10.0-229.x). The workaround is to use an older kernel
#   such as RHEL/OL 7.0 (kernel 3.10.0-123.x). It seems to work fine on RHEL/OL 7.1 with Oracle 11g.
#
# Usage: stap -v trace_oracle_wait_events_asyncio_libaio_12102.stp -x <pid>
#
# This version of the script is for Oracle 12.1.0.2; versions for other Oracle releases are also available,
# see trace_oracle_wait_events_asyncio_libaio_11204.stp
#
# Version 1.0, Oct 2014 by Luca.Canali@cern.ch
# Additional credits for original contributions: @FritsHoogland
#
# Note: This is experimental code, use at your own risk
# The overhead of this code can be high, preferably use it on test systems
#

####################
# Trace libaio     #
####################

# Dependencies: needs libaio debuginfo
# Use SystemTap 2.5 or higher (for Oracle userspace tracing) or comment out this probe
# On some systems you need to replace "/usr/lib64/libaio.so" with "/lib64/libaio.so.1"
# Kernel must have support for uprobes or utrace (for example RHEL/OL 7.x and 6.x)

probe process("/usr/lib64/libaio.so").function("io_getevents_0_4") {
  printf ("LIBAIO:->io_getevents_0_4: timestamp=%d, program=%s, pid=%d, min_nr=%d, timeout.tv_sec=%d\n",
          local_clock_us(), execname(), pid(), $min_nr, @cast($timeout,"struct timespec","")->tv_sec)
}

##############################
# Trace wait events 12.1.0.2 #
##############################

probe process("oracle").function("kskthbwt") {
  xksuse = register("r13")-3928
  ksusenum = user_uint16(xksuse + 1704)
  printf("==========\nDB WAIT EVENT BEGIN: timestamp_ora=%ld, pid=%d, sid=%d, event#=%u\n",
         register("rsi"), pid(), ksusenum, register("rdx"))
}

probe process("oracle").function("kskthewt") {
  xksuse = register("r13") - 3928
  ksuudnam = user_string(xksuse + 140)
  ksusenum = user_uint16(xksuse + 1704)
  ksuseopc = user_uint16(xksuse + 1602)
  ksusep1 = user_uint64(xksuse + 1608)
  ksusep2 = user_uint64(xksuse + 1616)
  ksusep3 = user_uint64(xksuse + 1624)
  ksusetim = user_uint32(xksuse + 1632)
  ksusesqh = user_uint32(xksuse + 1868)
  ksuseobj = user_uint32(xksuse + 2312)
  printf("DB WAIT EVENT END: timestamp_ora=%ld, pid=%d, sid=%d, name=%s, event#=%u, p1=%lu, p2=%lu, p3=%lu, wait_time=%u, obj=%d, sql_hash=%u\n==========\n",
         register("rdi"), pid(), ksusenum, ksuudnam, ksuseopc, ksusep1, ksusep2, ksusep3, ksusetim, ksuseobj, ksusesqh)
}

###########################
# Trace Physical IO - ASM #
###########################

probe syscall.pread, syscall.pwrite {
  if (pid() == target()) {
    printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, fd=%d, offset=%d, count(bytes)=%d\n", name, local_clock_us(), execname(), pid(), fd, offset, count)
  }
}

probe syscall.pread.return, syscall.pwrite.return {
  if (pid() == target()) {
    printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(bytes)=%d\n", name, local_clock_us(), execname(), pid(), $return)
  }
}

# Some ugly tricks are added here for compatibility. In particular, RHEL/EL 7.x debuginfo seems to have problems with
# $iocbpp, which is incorrectly reported as long int on those systems instead of struct iocb**; we also use a trick to resolve the array.
# Probing kernel.function instead of syscall.io_submit is also needed on some platforms (RHEL 6.7).
# See below for a more basic probe that works on RHEL and RHEL6.6
probe kernel.function("sys_io_submit") {
  if (pid() == target()) {
    printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, nr(num I/O)=%d\n", "io_submit", local_clock_us(), execname(), pid(), $nr)
    for (i=0; i<$nr; i++) {
      printf(" %d: file descriptor=%d, offset=%d, bytes=%d, opcode=%d\n", i+1, @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_fildes,
             @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_offset, @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_nbytes,
             @cast(user_int64($iocbpp+8*i), "struct iocb")->aio_lio_opcode)
    }
  }
}

# For reference, this is the original probe on io_submit without the compatibility tricks used above
# probe syscall.io_submit {
#   if (pid() == target()) {
#     printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, nr(num I/O)=%d\n", name, local_clock_us(), execname(), pid(), nr)
#     for (i=0; i<$nr; i++) {
#       printf(" %d: file descriptor=%d, offset=%d, bytes=%d, opcode=%d\n", i+1, $iocbpp[i]->aio_fildes,
#              $iocbpp[i]->aio_offset, $iocbpp[i]->aio_nbytes, $iocbpp[i]->aio_lio_opcode)
#     }
#   }
# }


probe syscall.io_submit.return {
  if (pid() == target()) {
    printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(num I/O)=%ld\n", name, local_clock_us(), execname(), pid(), $return)
  }
}

probe syscall.io_getevents {
  if (pid() == target()) {
    printf ("OS: ->%s: timestamp=%d, program=%s, pid=%d, min_nr=%d, timeout=%s\n", name, local_clock_us(), execname(), pid(), min_nr, $timeout$)
  }
}

# $events must be cast explicitly on RHEL/EL 7.x kernels, where debuginfo issues report $events as long int instead of struct io_event*
probe syscall.io_getevents.return {
  if (pid() == target()) {
    printf ("OS: <-%s: timestamp=%d, program=%s, pid=%d, return(num I/O)=%ld\n", name, local_clock_us(), execname(), pid(), $return)
    for (i=0; i<$return; i++) { # loop over the reaped I/Os
      obj_addr = @cast($events, "struct io_event")[i]->obj # details of struct iocb in /usr/include/libaio.h
      fildes = user_uint32(obj_addr +20)
      bytes = user_uint64(obj_addr +32)
      offset = user_uint64(obj_addr +40)
      printf(" %d: fildes=%d, offset=%lu, bytes=%lu\n", i+1, fildes, offset, bytes)
    }
  }
}

--------------------------------------------------------------------------------