├── Install-BCC
│   └── Ubuntu 18.04 LTS源码构建bcc.md
├── README.md
├── USDT
│   └── usdt.py
├── docs
│   └── README.md
├── img
│   ├── systrace.png
│   └── 动态追踪技术漫谈.png
├── interface
│   ├── BPF
│   │   └── BPF_FUNC_INTERFACE.md
│   └── python
│       └── BCC_PYTHON.md
├── kprobe
│   ├── README.md
│   ├── kprobe_blk.py
│   └── kprobe_multi_func.py
├── temp
│   ├── code
│   │   ├── BPF_func_use
│   │   │   ├── maps
│   │   │   │   ├── BPF_HASH
│   │   │   │   │   └── bpfhash.py
│   │   │   │   ├── README.md
│   │   │   │   └── key.py
│   │   │   ├── print
│   │   │   │   ├── BPF_PERF_OUTPUT.py
│   │   │   │   ├── README.md
│   │   │   │   ├── print_custom_fields.py
│   │   │   │   └── print_fields.py
│   │   │   └── time
│   │   │       └── start_end.py
│   │   ├── ebpf_prog_for_test
│   │   │   ├── for_limited.py
│   │   │   └── for_unlimited.py
│   │   ├── hello.py
│   │   ├── histogram.py
│   │   ├── index
│   │   │   ├── README.md
│   │   │   └── mm
│   │   │       ├── hw
│   │   │       │   ├── bpfhash.py
│   │   │       │   └── start_end.py
│   │   │       └── pro_past_1s_mmap_size
│   │   │           └── pmmapsz.py
│   │   └── trace_sync.py
│   ├── idea
│   │   └── idea.md
│   └── note
│       ├── BccLesson
│       │   ├── lessonNote
│       │   │   └── disksnoop.md
│       │   └── lessonRecord
│       │       └── README.md
│       ├── README.md
│       └── eBPFnote
│           ├── eBPFPROGNOTE.md
│           └── image
│               └── eBPF程序原理.png
├── tracepoint
│   ├── HASH_OUTPUT.py
│   ├── tracepoint.py
│   ├── tracepoint_kmalloc.py
│   └── tracepoint_mm_vmscan_writepage.py
└── uprobe
    └── uprobe_strlen.py

--------------------------------------------------------------------------------
/Install-BCC/Ubuntu 18.04 LTS源码构建bcc.md:
--------------------------------------------------------------------------------
### Building bcc from source on Ubuntu 18.04 LTS

There are two ways to install bcc. The first is the package shipped by your distribution (bpfcc-tools on Ubuntu, bcc-tools on CentOS 7). The second is to build and install it from source. Building from source is recommended.

> Choose one method only. Installing both causes conflicts that can leave bcc unusable.

Some users report assorted BPF module errors after installing bcc with the first method. Building from source is currently the most stable and reliable option, and the steps are described in detail below.

#### Pitfalls to avoid

- Prefer Ubuntu as the Linux distribution; avoid CentOS 7.
- Avoid systems whose kernel was upgraded by hand; use the distribution's stock kernel.
- Use either the package install or the source install, never both, or bcc may stop working.
- Do not clone the official repository directly, because the build will be missing files. Use a bcc release tarball instead.
- The officially listed dependencies lack the extra Python 3 modules; you have to add them yourself.

#### Building and installing bcc from source

1. Check the environment (you can skip this step on recent kernels)

**Kernel configuration**: on recent kernels these options are enabled by default and normally need no attention, but you can check them if you want to be sure, using:

```shell
less /boot/config-
```

Configuration options:

```
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
# [optional, for tc filters]
CONFIG_NET_CLS_BPF=m
# [optional, for tc actions]
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
# [for Linux kernel versions 4.1 through 4.6]
CONFIG_HAVE_BPF_JIT=y
# [for Linux kernel versions 4.7 and later]
CONFIG_HAVE_EBPF_JIT=y
# [optional, for kprobes]
CONFIG_BPF_EVENTS=y
```

```
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_DUMMY=m
CONFIG_VXLAN=m
```

**Build tools**: these are the tools the build depends on and their minimum versions. They are installed in the next step; this list is only for checking versions.

![Tool check](C:\Users\hds\Desktop\工具检查.png)

2. Install

First install all the required tools:

```shell
sudo apt-get -y install bison build-essential cmake flex git libedit-dev \
  libllvm6.0 llvm-6.0-dev libclang-6.0-dev python zlib1g-dev libelf-dev
```

In addition to the tools required by the official instructions, install a few Python 3 packages:

```shell
sudo apt-get install python3-distutils
sudo apt-get install python3-pip
sudo apt-get install python3-setuptools
```

> The official instructions do not list these, but my build stopped partway through, and looking into it showed that Python 3 dependencies were missing. The base Python interpreter needs some add-on modules that Ubuntu 18.04 does not install by default.

Next, create a directory in your home directory to hold the bcc source:

```shell
mkdir ebpf; cd ebpf
```

Download a bcc release: [download here](https://github.com/iovisor/bcc/releases)

![Release packages](C:\Users\hds\Desktop\release包.png)

Choose **bcc-src-with-submodule.tar.gz** and extract it. Then run the following commands in order to complete the installation:

```shell
mkdir bcc/build; cd bcc/build
cmake ..
make
sudo make install
cmake -DPYTHON_CMD=python3 .. # build python3 binding
pushd src/python/
make
sudo make install
popd
```

3. Use the tools in bcc tools

> bcc tools are installed to /usr/share/bcc/tools by default.

Use cachestat to view page-cache hit statistics:

![bcc tools 2](C:\Users\hds\Desktop\bcc工具2.png)

Run the simplest eBPF program, which prints `hello world` whenever a `clone` system call is detected:

![bcc tools 1](C:\Users\hds\Desktop\bcc工具1.png)

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## eBPF/BCC principles and programming study notes

#### Dynamic tracing technology map

![Dynamic tracing overview](/img/动态追踪技术漫谈.png)

- References

  [动态追踪技术漫谈 (Notes on dynamic tracing)](https://blog.openresty.com.cn/cn/dynamic-tracing/)

#### System tracing frameworks

![systrace](/img/systrace.png)

#### Recommended eBPF articles

- The complete LWN.net series on eBPF/BCC
  1. [A thorough introduction to eBPF](https://lwn.net/Articles/740157/)
  2. [An introduction to the BPF Compiler Collection](https://lwn.net/Articles/742082/)
  3. [Some advanced BCC topics](https://lwn.net/Articles/747640/)
  4. [Using user-space tracepoints with BPF](https://lwn.net/Articles/753601/)
- Blog and WeChat official-account articles
  1. [eBPF 简史 (A brief history of eBPF)](https://linux.cn/article-9032-1.html)
  2. [From High Ceph Latency to Kernel Patch with eBPF/BCC](https://mp.weixin.qq.com/s/9MWwwsuB1jgfQ8HTvkh7TA)

#### References for the code in this repository

- [badman250's column: low-level performance diagnosis](https://blog.csdn.net/notbaron/category_7425420.html)

#### Work in progress...
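The kernel-configuration check in the Ubuntu 18.04 build guide earlier in this repository can be scripted instead of read by eye in `less`. The sketch below is an illustration, not part of the original guide. It assumes a 4.7+ kernel and checks only the `=y` options from the guide's first list. `parse_kconfig`, `missing_options` and `check_running_kernel` are hypothetical helper names, and the `/boot/config-<release>` location is an assumption that holds on Ubuntu.

```python
import platform

# Options from the build guide's first list that should be built in ("y").
# CONFIG_HAVE_EBPF_JIT is the 4.7+ name; 4.1-4.6 kernels use CONFIG_HAVE_BPF_JIT.
REQUIRED = {
    "CONFIG_BPF": "y",
    "CONFIG_BPF_SYSCALL": "y",
    "CONFIG_BPF_JIT": "y",
    "CONFIG_HAVE_EBPF_JIT": "y",
    "CONFIG_BPF_EVENTS": "y",
}


def parse_kconfig(text):
    """Parse the KEY=value lines of a kernel config into a dict.

    Comment lines such as '# CONFIG_FOO is not set' are skipped,
    so options that are not set are simply absent from the result.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key] = value.strip('"')
    return options


def missing_options(options, required=REQUIRED):
    """Return, sorted, the required options whose value differs from the expected one."""
    return sorted(k for k, v in required.items() if options.get(k) != v)


def check_running_kernel(boot_dir="/boot"):
    """Check the running kernel's config.

    Assumption: the distribution installs it as <boot_dir>/config-<uname -r>.
    """
    path = "%s/config-%s" % (boot_dir, platform.release())
    with open(path) as f:
        return missing_options(parse_kconfig(f.read()))
```

On the build machine, `print(check_running_kernel() or "all required options are set")` lists any option that is unset or built as a module. Options marked `=m` in the guide (such as `CONFIG_NET_CLS_BPF`) are deliberately left out, because a module is an acceptable value for them.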
--------------------------------------------------------------------------------
/USDT/usdt.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

# USDT is supported in bcc's Python front end
import sys
from bcc import BPF, USDT

pid = sys.argv[1]  # PID of the target process, passed on the command line

# Minimal placeholder program (an assumption; the original snippet did not define
# bpf_text): do_trace() just logs each time the probe fires
bpf_text = '''
int do_trace(void *ctx) {
    bpf_trace_printk("http_server_request fired\\n");
    return 0;
}
'''

u = USDT(pid = int(pid))  # initialize USDT for the specified process

# Attach the BPF C function do_trace to the http_server_request USDT probe
u.enable_probe(probe = "http_server_request", fn_name = "do_trace")

# Pass the USDT object to BPF
b = BPF(text = bpf_text, usdt_contexts = [u])
b.trace_print()
--------------------------------------------------------------------------------
/docs/README.md:
--------------------------------------------------------------------------------
### bpf_design_QA

BPF extensibility and applicability to networking, tracing, security
in the linux kernel and several user space implementations of BPF
virtual machine led to a number of misunderstandings on what BPF actually is.
This short QA is an attempt to address that and outline a direction
of where BPF is heading long term.

**Q: Is BPF a generic instruction set similar to x64 and arm64?**

A: NO.

**Q: Is BPF a generic virtual machine ?**

A: NO.

BPF is a generic instruction set _with_ C calling convention.

**Q: Why was the C calling convention chosen?**

A: Because BPF programs are designed to run in the linux kernel
which is written in C, hence BPF defines instruction set compatible
with two most used architectures x64 and arm64 (and takes into
consideration important quirks of other architectures) and
defines calling convention that is compatible with C calling
convention of the linux kernel on those architectures.

**Q: Can multiple return values be supported in the future?**

A: NO. BPF allows only register R0 to be used as return value.

**Q: Can more than 5 function arguments be supported in the future?**

A: NO. BPF calling convention only allows registers R1-R5 to be used
as arguments. BPF is not a standalone instruction set.
(unlike x64 ISA that allows msft, cdecl and other conventions)

**Q: Can BPF programs access instruction pointer or return address?**

A: NO.

**Q: Can BPF programs access stack pointer ?**

A: NO. Only frame pointer (register R10) is accessible.
From compiler point of view it's necessary to have stack pointer.
For example LLVM defines register R11 as stack pointer in its
BPF backend, but it makes sure that generated code never uses it.

**Q: Does C-calling convention diminish possible use cases?**

A: YES. BPF design forces addition of major functionality in the form
of kernel helper functions and kernel objects like BPF maps with
seamless interoperability between them. It lets kernel call into
BPF programs and programs call kernel helpers with zero overhead,
as if all of them were native C code. That is particularly the case
for JITed BPF programs that are indistinguishable from
native kernel C code.

**Q: Does it mean that 'innovative' extensions to BPF code are disallowed?**

A: Soft yes. At least for now until BPF core has support for
bpf-to-bpf calls, indirect calls, loops, global variables,
jump tables, read only sections and all other normal constructs
that C code can produce.

**Q: Can loops be supported in a safe way?**

A: It's not clear yet. BPF developers are trying to find a way to
support bounded loops where the verifier can guarantee that
the program terminates in less than 4096 instructions.

**Q: How come LD_ABS and LD_IND instructions are present in BPF whereas
C code cannot express them and has to use builtin intrinsics?**

A: This is an artifact of compatibility with classic BPF. Modern
networking code in BPF performs better without them.
See 'direct packet access'.

**Q: It seems not all BPF instructions are one-to-one to native CPU.
For example why BPF_JNE and other compare and jumps are not cpu-like?**

A: This was necessary to avoid introducing flags into ISA which are
impossible to make generic and efficient across CPU architectures.

**Q: Why doesn't the BPF_DIV instruction map to x64 div?**

A: Because if we picked one-to-one relationship to x64 it would have made
it more complicated to support on arm64 and other archs. Also it
needs div-by-zero runtime check.

**Q: Why is there no BPF_SDIV for signed divide operation?**

A: Because it would be rarely used. llvm errors in such case and
prints a suggestion to use unsigned divide instead.

**Q: Why does BPF have implicit prologue and epilogue?**

A: Because architectures like sparc have register windows and in general
there are enough subtle differences between architectures, so naive
store return address into stack won't work. Another reason is BPF has
to be safe from division by zero (and legacy exception path
of LD_ABS insn). Those instructions need to invoke epilogue and
return implicitly.

**Q: Why were BPF_JLT and BPF_JLE instructions not introduced in the beginning?**

A: Because classic BPF didn't have them and BPF authors felt that compiler
workaround would be acceptable. Turned out that programs lose performance
due to lack of these compare instructions and they were added.
These two instructions are a perfect example of what kind of new BPF
instructions are acceptable and can be added in the future.
These two already had equivalent instructions in native CPUs.
New instructions that don't have one-to-one mapping to HW instructions
will not be accepted.
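The BPF_DIV and BPF_SDIV answers above depend on how eBPF division behaves at run time. The Python below is a sketch of that behavior. It models only the 64-bit (BPF_ALU64) unsigned divide and modulo semantics described in the kernel's BPF instruction-set documentation. A zero divisor does not trap as x86 `div` would. Instead, BPF_DIV writes 0 to the destination and BPF_MOD leaves the destination unchanged. The function names are hypothetical, and the 32-bit (BPF_ALU) variants are not modeled.

```python
MASK64 = (1 << 64) - 1  # BPF registers are 64 bits wide


def bpf_alu64_div(dst, src):
    """Model BPF_ALU64 | BPF_DIV: unsigned 64-bit dst / src.

    Division by zero does not trap; the destination becomes 0.
    """
    dst &= MASK64
    src &= MASK64
    return 0 if src == 0 else dst // src


def bpf_alu64_mod(dst, src):
    """Model BPF_ALU64 | BPF_MOD: unsigned 64-bit dst % src.

    Modulo by zero leaves the destination register unchanged.
    """
    dst &= MASK64
    src &= MASK64
    return dst if src == 0 else dst % src
```

The operands are treated as unsigned, so `bpf_alu64_div(-8, 2)` gives 9223372036854775804 rather than -4. LLVM avoids silently producing that result for signed C code: as the answer above notes, it errors out and suggests an unsigned divide.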

**Q: BPF 32-bit subregisters have a requirement to zero upper 32-bits of BPF
registers which makes BPF inefficient virtual machine for 32-bit
CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
be added to BPF in the future?**

A: NO. The first thing to improve performance on 32-bit archs is to teach
LLVM to generate code that uses 32-bit subregisters. Then second step
is to teach verifier to mark operations where zero-ing upper bits
is unnecessary. Then JITs can take advantage of those markings and
drastically reduce size of generated code and improve performance.

**Q: Does BPF have a stable ABI?**

A: YES. BPF instructions, arguments to BPF programs, set of helper
functions and their arguments, recognized return codes are all part
of ABI. However when tracing programs are using bpf_probe_read() helper
to walk kernel internal datastructures and compile with kernel
internal headers these accesses can and will break with newer
kernels. The union bpf_attr -> kern_version is checked at load time
to prevent accidentally loading kprobe-based bpf programs written
for a different kernel. Networking programs don't do kern_version check.

**Q: How much stack space does a BPF program use?**

A: Currently all program types are limited to 512 bytes of stack
space, but the verifier computes the actual amount of stack used
and both interpreter and most JITed code consume necessary amount.

**Q: Can BPF be offloaded to HW?**

A: YES. BPF HW offload is supported by NFP driver.

**Q: Does classic BPF interpreter still exist?**

A: NO. Classic BPF programs are converted into extended BPF instructions.

**Q: Can BPF call arbitrary kernel functions?**

A: NO. BPF programs can only call a set of helper functions which
is defined for every program type.

**Q: Can BPF overwrite arbitrary kernel memory?**

A: NO. Tracing bpf programs can _read_ arbitrary memory with bpf_probe_read()
and bpf_probe_read_str() helpers. Networking programs cannot read
arbitrary memory, since they don't have access to these helpers.
Programs can never read or write arbitrary memory directly.

**Q: Can BPF overwrite arbitrary user memory?**

A: Sort-of. Tracing BPF programs can overwrite the user memory
of the current task with bpf_probe_write_user(). Every time such
program is loaded the kernel will print warning message, so
this helper is only useful for experiments and prototypes.
Tracing BPF programs are root only.

**Q: When bpf_trace_printk() helper is used the kernel prints nasty
warning message. Why is that?**

A: This is done to nudge program authors into better interfaces when
programs need to pass data to user space. Like bpf_perf_event_output()
can be used to efficiently stream data via perf ring buffer.
BPF maps can be used for asynchronous data sharing between kernel
and user space. bpf_trace_printk() should only be used for debugging.

**Q: Can BPF functionality such as new program or map types, new
helpers, etc be added out of kernel module code?**

A: NO.
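The stable-ABI answer above says that `union bpf_attr -> kern_version` is checked when kprobe programs are loaded. That field holds the running kernel's version code, encoded the way the kernel's `KERNEL_VERSION(a, b, c)` macro encodes it: `(a << 16) + (b << 8) + c`, with recent kernels clamping the sublevel `c` to 255. The sketch below reproduces that encoding in Python and applies the same clamp. The helper names are hypothetical, and the release-string parsing is an assumption based on the `major.minor.sublevel-extra` format that `uname -r` prints.

```python
import re


def kernel_version(major, minor, sublevel):
    """Python equivalent of KERNEL_VERSION(a, b, c), with the sublevel clamped to 255."""
    return (major << 16) + (minor << 8) + min(sublevel, 255)


def kern_version_from_release(release):
    """Encode a release string such as '4.15.0-112-generic'.

    Only the leading 'major.minor[.sublevel]' numbers are used;
    a missing sublevel is treated as 0.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError("unrecognized release string: %r" % release)
    major, minor, sub = m.groups()
    return kernel_version(int(major), int(minor), int(sub or 0))
```

For the 4.15 kernel that Ubuntu 18.04 ships, `kern_version_from_release("4.15.0-112-generic")` returns 265984 (0x040F00).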
--------------------------------------------------------------------------------
/img/systrace.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linuxkerneltravel/eBPF/6de3c7f05e3f1abed745b8e0e4133fb9e414a749/img/systrace.png
--------------------------------------------------------------------------------
/img/动态追踪技术漫谈.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linuxkerneltravel/eBPF/6de3c7f05e3f1abed745b8e0e4133fb9e414a749/img/动态追踪技术漫谈.png
--------------------------------------------------------------------------------
/interface/BPF/BPF_FUNC_INTERFACE.md:
--------------------------------------------------------------------------------
#### BPF interfaces (functions/macros)

| Function | Purpose |
| :------------------------- | :----------------------------------------------------------- |
| bpf_ktime_get_ns() | Returns the current time in nanoseconds |
| BPF_HASH(last) | Creates a BPF map object (a hash) named last. No further arguments are given, so keys and values default to unsigned 64-bit integers |
| bpf_trace_printk() | Prints a string, similar to printf(). Use it for debugging; real tools should use BPF_PERF_OUTPUT() |
| bpf_get_current_pid_tgid() | Returns the current task's IDs: the lower 32 bits hold the thread ID (the kernel's pid), the upper 32 bits hold the thread group ID (the PID seen from user space) |
| BPF_PERF_OUTPUT(events) | Names the output channel events |
| bpf_get_current_comm() | Fills the buffer passed as the first argument with the current process name |
| events.perf_submit() | Submits an event to user space through the perf ring buffer |

#### Interface description

1. BPF_HASH and BPF_TABLE

Although BCC provides access to all the data structures exported by the kernel, the two most commonly used are BPF_HASH and BPF_TABLE.

Fundamentally, every BCC data structure is a map, and higher-level data structures are built on top of maps. The most basic one is BPF_TABLE.

The BPF_TABLE macro takes a table type (hash, percpu_array, or array) as a parameter. Other macros, such as BPF_HASH and BPF_ARRAY, are just wrappers around BPF_TABLE.

Because all of these data structures are maps, they all support the same core set of operations, including map.lookup(), map.update(), and map.delete(). Some maps also have type-specific operations, such as map.perf_read() for BPF_PERF_ARRAY and map.call() for BPF_PROG_ARRAY.
--------------------------------------------------------------------------------
/interface/python/BCC_PYTHON.md:
--------------------------------------------------------------------------------
### bcc python

Notes on bcc's Python interface.

#### I. Initialization

1. BPF

Syntax: BPF({text = BPF_program | src_file = filename}, [usdt_contexts = [USDT_object, ...]])

Creates (instantiates) a BPF object. You interact with this object to produce output.

2. USDT

Syntax: USDT({pid = pid | path = path})

Creates an object for using USDT probes. You can specify either a process ID or a binary path.

#### II. Events

1. attach_kprobe

Syntax: BPF.attach_kprobe(event = "event", fn_name = "name")

Uses kernel dynamic tracing to instrument the entry of the kernel function event(), and attaches the C function name to it.

2. attach_kretprobe

Syntax: BPF.attach_kretprobe(event = "event", fn_name = "name")

Attaches the C function name to the kernel function event; name is called when the kernel function returns.

3. attach_tracepoint

Syntax: BPF.attach_tracepoint(tp = "tracepoint", fn_name = "name")

Attaches a BPF function defined in C to a kernel tracepoint. You can also use the TRACEPOINT_PROBE macro, which provides a self-describing args structure containing the tracepoint arguments. With attach_tracepoint, the arguments must be declared in the BPF program.

4. attach_uprobe

Syntax: BPF.attach_uprobe(name = "location", sym = "symbol", fn_name = "name")

Attaches the C-defined function name to the function symbol in location; name is called whenever symbol is called.

For example:

```python
b.attach_uprobe(name = "c", sym = "strlen", fn_name = "count")
```

This attaches the C function count to strlen() in the C library, so count runs whenever strlen() is called.

5. attach_uretprobe

Syntax: BPF.attach_uretprobe(name = "location", sym = "symbol", fn_name = "name")

Same as attach_uprobe, except that name is called when the function returns.

6. USDT.enable_probe

Syntax: USDT.enable_probe(probe = probe, fn_name = name)

Attaches a BPF C function to a USDT probe.

For example:

```python
u = USDT(pid = int(pid))
u.enable_probe(probe = "http_server_request", fn_name = "do_trace")
```

To check whether a binary contains USDT probes, look for its stap debug notes with:

```shell
# readelf -n binary
```

#### III. Debug output

1. trace_print

Syntax: BPF.trace_print(fmt = "fields")

Continuously reads the globally shared /sys/kernel/debug/tracing/trace_pipe file and prints its contents. BPF programs write to this file through bpf_trace_printk().

For example:

```python
# print trace_pipe output as-is:
b.trace_print()
# print PID and message
b.trace_print(fmt = "{1} {5}")
```

2. trace_fields

Syntax: BPF.trace_fields(nonblocking = False)

Reads one line from the globally shared /sys/kernel/debug/tracing/trace_pipe file and returns it as fields. The argument controls whether the call blocks while waiting for a line to be written.

#### IV. Maps

1. print_log2_hist

Syntax: table.print_log2_hist(val_type = "value", section_header = "Bucket ptr", section_print_fn = None)

Prints a table as an ASCII log2 histogram. The table must store log2 values, which bpf_log2() (or bpf_log2l() for 64-bit values) produces.

val_type: optional, the column header.

section_header: if the histogram has a secondary key, one table is printed per key, and section_header is used as the header describing each one.

section_print_fn: if not None, it is passed the bucket value.

#### V. BPF errors

In BPF, all memory reads must go through bpf_probe_read(), which copies the memory onto the BPF stack. Reading memory directly fails with "invalid mem access". For simple dereferences, such as req->__data_len, the bcc rewriter inserts the bpf_probe_read() call automatically.
--------------------------------------------------------------------------------
/kprobe/README.md:
--------------------------------------------------------------------------------
kprobes
--------------------------------------------------------------------------------
/kprobe/kprobe_blk.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8
from __future__ import print_function
from bcc import BPF

REQ_WRITE = 1

# load eBPF program
# What it does:
#   traces disk (block I/O) kernel functions
# How it works:
#   defines two C functions, trace_start() and trace_completion(),
#   attached to blk_start_request()/blk_mq_start_request() and
#   blk_account_io_completion() respectively
b = BPF(text = '''
#include 
#include 

BPF_HASH(start, struct request *); // the request pointer is the hash key: unique, and fast to look up

void trace_start(struct pt_regs *ctx,
                 struct request *req) {
    u64 ts = bpf_ktime_get_ns();
    start.update(&req, &ts);
}

void trace_completion(struct pt_regs *ctx,
                      struct request *req) {
    u64 *tsp, delta;

    tsp = start.lookup(&req);
    if (tsp != 0) {
        delta = bpf_ktime_get_ns() - *tsp;
        bpf_trace_printk("%d %x %d\\n",
                         req->__data_len,
                         req->cmd_flags, delta/1000);

        start.delete(&req);
    }
}
''')

# blk_start_request() no longer exists on newer kernels, so attach to it only if present
if BPF.get_kprobe_functions(b'blk_start_request'):
    b.attach_kprobe(event = "blk_start_request", fn_name = "trace_start")
b.attach_kprobe(event = "blk_mq_start_request", fn_name = "trace_start")
b.attach_kprobe(event = "blk_account_io_completion", fn_name = "trace_completion")

# print the "bytes flags latency(us)" lines from trace_pipe until Ctrl-C
b.trace_print()
--------------------------------------------------------------------------------
/kprobe/kprobe_multi_func.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8
from bcc import BPF
from bcc.utils import printb

# ebpf program
program = '''
int hello(void *ctx)
{
    bpf_trace_printk("hello, world!\\n");

    return 0;
}
'''

# load ebpf program
b = BPF(text = program)

# attach_kprobe() creates a kprobe on the clone syscall and runs hello when it fires.
# Call attach_kprobe() several times to attach the C function to several kernel functions.
b.attach_kprobe(event = b.get_syscall_fnname("clone"), fn_name = "hello")

print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))

# output
while 1:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()  # trace_fields() returns the fields of one trace_pipe line
        # real tools should use BPF_PERF_OUTPUT() instead
    except ValueError:
        continue
    printb(b"%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
--------------------------------------------------------------------------------
/temp/code/BPF_func_use/maps/BPF_HASH/bpfhash.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8


# analyze disksnoop.py

from __future__ import print_function
from bcc import BPF
'''
printb(s)

print a bytes object to stdout and flush
'''
from bcc.utils import printb

REQ_WRITE = 1

b = BPF(text = '''
/*
 * Defines the register-set structure struct pt_regs
 *   Kernel path: /arch/x86/include/uapi/asm/ptrace.h
 * eBPF loads BPF code into the kernel with bpf_load_program().
 * Through its arguments, bpf_load_program() passes three kinds of information to the kernel:
 *   . the BPF program type
 *   . the BPF code
 *   . the address of a user-space buffer that receives the log produced while the code is loaded
 * The bpf_prog_type argument of bpf_load_program() lists the program types eBPF supports, e.g.
 *   bpf_prog_type: BPF_PROG_TYPE_KPROBE
 *   BPF prog entry argument: struct pt_regs
 *   program type: BPF code for kprobes
 */
#include 

// Defines the kernel data structure struct request
//   Kernel path: /include/linux/blkdev.h
#include 

// define hash table name: start
// define key pointer type: struct request *
BPF_HASH(start, struct request *);
// The hash table uses a pointer to struct request as the key, which is common in tracing.
// Struct pointers make good keys because they are unique: two structs that exist at the same time cannot share an address.
// Two keys are commonly used for storing timestamps: struct pointers and thread IDs.

/* define probe_1 */
/*
 * two arguments: ctx points to the register context
 *                req is the key pointer, pointing to 'struct request'
 */
// fires when the request starts
void trace_start(struct pt_regs *ctx, struct request *req) {
    // struct pt_regs *ctx carries the registers and BPF context
    // store the start timestamp
    u64 ts = bpf_ktime_get_ns();

    // update timestamp
    start.update(&req, &ts); // overwrites any previous value for this key
}

/* define probe_2 */
// fires when the request completes
void trace_completion(struct pt_regs *ctx, struct request *req) {
    u64 *tsp, delta;

    // returns a pointer to the value if it exists, else NULL
    tsp = start.lookup(&req);
    if (tsp) {
        // u64 bpf_ktime_get_ns(void)
        delta = bpf_ktime_get_ns() - *tsp;

        /* struct request {
         *     ...
         *     unsigned int cmd_flags;
         *     ...
         *     unsigned int __data_len;
         *     ...
         * };
         */
        bpf_trace_printk("%d %x %d\\n",
                         req->__data_len,
                         req->cmd_flags,
                         delta / 1000);
        // __data_len, cmd_flags and delta/1000 are appended to the end of trace_pipe.
        // Note the bpf_trace_printk() format: the three values are separated by spaces,
        // which the Python code relies on later.
        // Python reads each trace_pipe line with trace_fields(); these three values arrive together as msg.
        // msg is then split with split(), which produces a list of 3 strings,
        // and the list is unpacked into a 3-tuple, one string per element.

        start.delete(&req); // delete key
    }
}
''')

# call class BPF's method get_kprobe_functions
# get_kprobe_functions() is a static method of the BPF class, so it is called on BPF itself
if BPF.get_kprobe_functions(b'blk_start_request'):
    # The leading b marks a bytes literal.
    # Python 2 treats str as raw bytes, not unicode.
    # In Python 3 every str is unicode.
    # Python 3's default str is Python 2's unicode, and Python 3's bytes is Python 2's str.
    # A b"" or b'' prefix denotes bytes.
    # In Python 2 the b prefix in b'blk_start_request' has no effect; it is there for Python 3 compatibility.
    b.attach_kprobe(event = 'blk_start_request',
                    fn_name = 'trace_start')

b.attach_kprobe(event = 'blk_mq_start_request', fn_name = 'trace_start')
b.attach_kprobe(event = 'blk_account_io_completion',
                fn_name = 'trace_completion')

# header
# '-' left-aligns; right alignment is the default, so '+' is not needed
print('%-18s %-2s %-7s %8s' % ('TIME(s)', 'T', 'BYTES', 'LAT(ms)'))

# format output
while True:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
        # split() splits on whitespace (spaces, newlines \n, tabs \t)
        # msg holds the message text; its split() method is called here
        (bytes_s, bflags_s, us_s) = msg.split()
        # (bytes_s, bflags_s, us_s) is a tuple that receives the split results

        # int(): converts a string or number to an integer
        # int(x, base=10): x is a string or number, base is the radix (default 10)
        # int(3.6) = 3
        # int('12', 16): when base is given, the value must be passed as a string
        if int(bflags_s, 16) & REQ_WRITE:  # parse the hexadecimal bflags_s
            type_s = b'W'
        elif bytes_s == b'0':  # see blk_fill_rwbs() for logic; msg fields are bytes on Python 3
            type_s = b'M'
        else:
            type_s = b'R'
        ms = float(int(us_s, 10)) / 1000

        # print(b'%-18.9f %-2s %-7s %8.2f' % (ts, type_s, bytes_s, ms))
        printb(b'%-18.9f %-2s %-7s %8.2f' % (ts, type_s, bytes_s, ms))
    except KeyboardInterrupt:
        exit()
--------------------------------------------------------------------------------
/temp/code/BPF_func_use/maps/README.md:
--------------------------------------------------------------------------------
#### Maps

The main data structure used by eBPF programs is the eBPF map. An eBPF map is a generic data structure that passes data within the kernel, or back and forth between the kernel and user space. As the name "map" suggests, it stores and retrieves data by key.

Maps are the foundation for storing BPF data and for the higher-level objects built on them: tables, hashes, and histograms.

1. BPF_TABLE

- Syntax: BPF_TABLE(_table_type, _key_type, _leaf_type, _name, _max_entries)

Creates a map named _name. Most of the time it is used through higher-level macros such as BPF_HASH and BPF_HISTOGRAM. Methods: map.lookup(), map.lookup_or_init(), map.delete(), map.update(), map.insert(), map.increment().

2. BPF_HASH

- Syntax: BPF_HASH(name, [key_type], [leaf_type], [size])

Creates a hash table named name; the arguments in brackets are optional.

Defaults: BPF_HASH(name, key_type = u64, leaf_type = u64, size = 10240)

Related methods: map.lookup(), map.lookup_or_init(), map.delete(), map.update(), map.insert(), map.increment().

(The entries of a BPF_ARRAY, by contrast, are preallocated and cannot be deleted, so arrays have no delete operation.)

3. BPF_HISTOGRAM

- Syntax: BPF_HISTOGRAM(name, [key_type], [size])

Creates a histogram map named name. By default it has 64 buckets indexed by an int key.

4. map.lookup

- Syntax: *val map.lookup(&key)

Looks up key in the map and returns a pointer to its value if it exists, otherwise NULL.

5. map.lookup_or_init

- Syntax: *val map.lookup_or_init(&key, &zero)

Looks up key in the map and returns a pointer to its value if found; otherwise it initializes the entry with the second argument. (Newer bcc versions replace this method with map.lookup_or_try_init().)

6. map.delete

- Deletes a key and its value from the map.

7. map.update

- Syntax: map.update(&key, &val)

Updates the value associated with key.

8. map.insert

- Inserts a key/value pair.

9. map.increment

- Increments the value stored for the given key; used for histograms.
--------------------------------------------------------------------------------
/temp/code/BPF_func_use/maps/key.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

from bcc import BPF

b = BPF(text = '''
#include 

// BPF_HASH(name[, key_type[, leaf_type[, size]]])
// defaults: BPF_HASH(name, key_type=u64, leaf_type=u64, size=10240)
BPF_HASH(event);
// event is the name of the hash table.
// No other arguments follow event, so the key uses the default type u64.
// To change the key type, pass it explicitly, e.g. struct request *,
// which gives the key the new type 'struct request *'.
// Remember: that key is a pointer!

int kprobe__sys_clone(void *ctx) {
    u64 key = 0;
    u64 count = 0;
    u64 *val_ptr;

    bpf_trace_printk("first key value:%llu count:%llu\\n", key, count);

    val_ptr = event.lookup(&key);
    if (val_ptr) {
        count++;
        event.delete(&key);
        bpf_trace_printk("after delete key value:%llu count:%llu\\n", key, count);
    }

    event.update(&key, &count); // map methods take the variable's address: &variable_name

    bpf_trace_printk("after update key value:%llu count:%llu\\n", key, count);

    return 0;
}
''')

# b.trace_print(fmt = '{5}')
b.trace_print()
--------------------------------------------------------------------------------
/temp/code/BPF_func_use/print/BPF_PERF_OUTPUT.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

# BPF_PERF_OUTPUT()
#   Creates a BPF table for pushing custom event data to user space via a perf ring buffer.
#   This is the recommended way to send data to user space.
#
# perf_submit:
#   Prototype: int perf_submit((void *)ctx, (void *)data, u32 data_size)
#   A method of the BPF_PERF_OUTPUT table (here, events) that pushes the event data to user space.

from bcc import BPF

# load eBPF program
b = BPF(text = '''
#include 

struct data_t {
    u32 pid;
    u64 ts;
    char comm[TASK_COMM_LEN];
};

BPF_PERF_OUTPUT(events); // defines an output table named events

int hello(struct pt_regs *ctx) {
    struct data_t data = {};
    data.pid = bpf_get_current_pid_tgid(); // keeps only the lower 32 bits (thread ID)
    data.ts = bpf_ktime_get_ns();
    bpf_get_current_comm(&data.comm,
                         sizeof(data.comm));

    /* The output table is events; data is pushed with events.perf_submit */
    events.perf_submit(ctx, &data,
                       sizeof(data)); // push the data to the BPF table

    return 0;
}
''')

# The original snippet stopped at the BPF() call; the lines below attach the probe
# and read the buffer, following the pattern of bcc's hello_perf_output.py example.
b.attach_kprobe(event = b.get_syscall_fnname('clone'), fn_name = 'hello')

def print_event(cpu, data, size):
    event = b['events'].event(data)  # decode the raw data as struct data_t
    print('%-18.9f %-16s %-6d' % (event.ts / 1e9,
                                  event.comm.decode('utf-8', 'replace'), event.pid))

b['events'].open_perf_buffer(print_event)
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        exit()
--------------------------------------------------------------------------------
/temp/code/BPF_func_use/print/README.md:
--------------------------------------------------------------------------------
### Output functions
--------------------------------------------------------------------------------
/temp/code/BPF_func_use/print/print_custom_fields.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

# this is an exercise in printing custom fields

from __future__ import print_function  # __future__ imports must precede all other imports
from bcc import BPF  # import the BPF class from the bcc package

b = BPF(text = '''
int hello(void *ctx) {
    bpf_trace_printk("hello, world\\n");

    return 0;
}
''')

b.attach_kprobe(event = b.get_syscall_fnname('clone'),
                fn_name = 'hello')

print('PID MESSAGE')
try:
    # b.trace_print()  # print trace_pipe output as-is
    # Select fields by their index in the (task, pid, cpu, flags, ts, msg) tuple
    b.trace_print(fmt = '{1} {5}')  # {1} is the PID, {5} the message; printed separated by one space
except KeyboardInterrupt:
    exit()
--------------------------------------------------------------------------------
/temp/code/BPF_func_use/print/print_fields.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

from bcc import BPF

b = BPF(text = '''
int hello(void *ctx) {
    bpf_trace_printk("hello, world\\n");

    return 0;
}
''')

b.attach_kprobe(event = b.get_syscall_fnname('clone'),
                fn_name = 'hello')

# print header
print('%-18s %-16s %-6s %s' %
      ('TIME(s)', 'COMM', 'PID', 'MESSAGE'))
# Python 2 also accepts the statement form below, which is invalid in Python 3:
# print '%-18s %-16s %-6s %s' % \
#     ('TIME(s)', 'COMM', 'PID', 'MESSAGE')

# format output
while 1:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
        # The first five fields (task, pid, cpu, flags, ts) are present in every trace_pipe line.
        # From the sixth on is the text that bpf_trace_printk() appended to trace_pipe.
    except ValueError:
        continue
    # '-' left-aligns; in 18.9, 18 is the field width and 9 the number of digits after the decimal point
    # task and msg are bytes on Python 3, so decode them before formatting
    print('%-18.9f %-16s %-6d %s %-6d' % (ts, task.decode('utf-8', 'replace'), pid,
                                          msg.decode('utf-8', 'replace'), cpu))
--------------------------------------------------------------------------------
/temp/code/BPF_func_use/time/start_end.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

from __future__ import print_function
from bcc import BPF

b = BPF(text = '''
#include 

BPF_HASH(last);

int do_trace(struct pt_regs *ctx) {
    u64 ts, *tsp, delta, key = 0;
    // *val map.lookup(&key)
    // lookup() looks up key in the map; if it exists it returns
    // a pointer to its value, otherwise NULL

    // attempt to read stored timestamp
    tsp = last.lookup(&key);
    if (tsp != NULL) {
        delta = bpf_ktime_get_ns() - *tsp;
        // u64 bpf_ktime_get_ns(void)
        // returns the current time in nanoseconds

        if (delta < 1000000000) {
            // output if time is less than 1 second; delta is converted from ns to ms
            bpf_trace_printk("%d\\n", delta / 1000000);
        }

        last.delete(&key);
    }

    // bpf_trace_printk("%d", key);  // debug only: it has no trailing newline, so it corrupts trace_fields() parsing
    // update stored timestamp
    ts = bpf_ktime_get_ns();
    last.update(&key, &ts);
    // map.update(&key, &value)
    // associates the value in the second argument with the key, overwriting any previous value

    return 0;
}
''')

b.attach_kprobe(event = b.get_syscall_fnname('mmap'),
                fn_name = 'do_trace')

print('Tracing for quick syncs... Ctrl-C to end')

#format output
start = 0 #global variable
while 1:
    #this unpacking is the standard pattern for trace_fields():
    #trace_fields() reads one line from trace_pipe
    #and returns it split into fields
    (task, pid, cpu, flags, ts, ms) = b.trace_fields()
    #ms holds the delta that bpf_trace_printk() in the BPF program
    #"appended" to trace_pipe
    if start == 0:
        start = ts #ts is the timestamp read by trace_fields()
    ts = ts - start
    print('At time %.2fs: multiple syncs detected, last %s ms ago' % (ts, ms))
    #in %.2fs, f formats a float with 2 decimal places and the trailing s is a literal "s" after the number
--------------------------------------------------------------------------------
/temp/code/ebpf_prog_for_test/for_limited.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

#Check whether eBPF programs may contain loops
# Result: a loop with a fixed iteration count loads and runs;
#         an infinite loop such as for(;;) is rejected.

from bcc import BPF

b = BPF(text = '''
int kprobe__sys_clone(void *ctx) {
    int i;
    for (i = 0; i < 5; i++)
        bpf_trace_printk("hello world\\n");

    return 0;
}
''')

b.trace_print()
--------------------------------------------------------------------------------
/temp/code/ebpf_prog_for_test/for_unlimited.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

#Check whether eBPF programs may contain loops
# Result: a loop with a fixed iteration count loads and runs;
#         an infinite loop such as for(;;) is rejected.

from bcc import BPF

b = BPF(text = '''
int kprobe__sys_clone(void *ctx) {
    for (;;)
        bpf_trace_printk("hello world\\n");

    return 0;
}
''')

b.trace_print()
--------------------------------------------------------------------------------
/temp/code/hello.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

from bcc import BPF
#Import the BPF class from the bcc package; BPF is the base class.
#    Calling BPF() instantiates an object, and that object is the eBPF program.
#    Instantiating BPF() invokes its constructor, __init__(self).
#bcc is a Python package, installed at:
#    /usr/lib/python2.7/dist-packages/bcc
#A typical trait of a Python package is that it always contains an __init__.py module.

#ebpf program
#    The whole eBPF program is held in the program variable; this is the
#    code that runs in the kernel on the eBPF virtual machine
program = '''
/*
 * The kprobe__ prefix tells the bcc toolchain to attach a kprobe
 * to the kernel symbol that follows it.
 *
 * This inserts a probe (uprobe, kprobe, tracepoint or USDT) on the
 * given function, here sys_clone(); a probed function can live
 * in the kernel or in user-space code.
 *
 * The kprobe syntax is:
 *     kprobe__kernel_function_name
 * where kprobe__ is the prefix that creates a kprobe (dynamic tracing
 * of a kernel function call) for that kernel function.
 *
 * Alternatively, define an ordinary C function and attach it to the
 * kernel function with Python's BPF.attach_kprobe().
 *
 *  kretprobes:
 *  kretprobes dynamically trace the return of a kernel function:
 *      kretprobe__kernel_function_name
 *  The prefix is kretprobe__. A C function can also be attached to the
 *  kernel function with Python's BPF.attach_kretprobe().
 *
 *  Example: int kretprobe__tcp_v4_connect(struct pt_regs *ctx)
 *       {
 *           int ret = PT_REGS_RC(ctx); // the return value is stored in ret
 *           [...]
 *       }
 */

int kprobe__sys_clone(void *ctx) // kprobe__ is the prefix that creates a kprobe on the kernel function
{ // ctx carries arguments, but since we don't use them here we declare it as void *
  // the eBPF program runs whenever sys_clone is called and fires this kprobe
  // bpf_trace_printk() prints "hello, world!" to the kernel's trace buffer,
  // which is read through /sys/kernel/debug/tracing/trace_pipe

    bpf_trace_printk("hello, world!\\n");
    // bpf_trace_printk() appends the string to trace_pipe
    // trace_print(fmt='{5}') prints only the message field written by bpf_trace_printk()

    return 0; // return 0 is required: different kernel hook points handle the return value
              // differently, and leaving it undefined can cause unexpected behavior
}
'''

#load eBPF program
#the rest of the Python program loads the eBPF code into the kernel and runs it
b = BPF(text = program) #instantiate a new BPF object, b

#trace_print() does a blocking read of the kernel trace buffer (/sys/kernel/debug/tracing/trace_pipe)
#    and prints what it reads to standard output
b.trace_print() #trace_print() is part of bcc's Python BPF interface

#The tedious work of compiling the program to eBPF bytecode and loading it into the kernel
#    is done entirely by instantiating a new BPF object.
#    All the low-level work happens behind the scenes in bcc's Python bindings and its C library:
#    the C text is compiled with clang/LLVM at run time, then loaded and checked by the verifier.
#    (libbpf, in contrast, is a loader library that takes an already-compiled BPF ELF object,
#    adapts it to the running system, and triggers loading and verification of the BPF program.)
--------------------------------------------------------------------------------
/temp/code/histogram.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

#histogram exercise
from bcc import BPF
from time import sleep

#load eBPF program
b = BPF(text = '''
#include 
#include 

/* histogram */
BPF_HISTOGRAM(dist); // BPF_HISTOGRAM(dist) defines a BPF histogram map object named dist

/* attach a kprobe to blk_account_io_completion() */
int kprobe__blk_account_io_completion(struct pt_regs *ctx,
                                      struct request *req) {
    // dist.increment(x) adds 1 to the histogram bucket x given as the argument
    dist.increment(bpf_log2l(req->__data_len / 1024)); // bpf_log2l() maps the value to a power-of-2 bucket

    return 0;
}
''')

#header
print("Tracing... Hit Ctrl-C to end.") #print() appends a newline automatically; no \n needed

#trace until Ctrl-C
try:
    sleep(99999999)
except KeyboardInterrupt:
    print("")

#output
b["dist"].print_log2_hist("kbytes") #print_log2_hist("kbytes") prints the histogram with the value column labelled kbytes
--------------------------------------------------------------------------------
/temp/code/index/README.md:
--------------------------------------------------------------------------------
### Metric extraction code
--------------------------------------------------------------------------------
/temp/code/index/mm/hw/bpfhash.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8


#analyze disksnoop.py

from __future__ import print_function
from bcc import BPF
from bcc.utils import printb

REQ_WRITE = 1

b = BPF(text = '''
/*
 * defines the register-set structure struct pt_regs
 *     kernel path: /arch/x86/include/uapi/asm/ptrace.h
 */
#include 

// defines the kernel data structure struct request
//     kernel path: /include/linux/blkdev.h
#include 

// define hash table name: start
// define key pointer type: struct request *
BPF_HASH(start, struct request *);

/* define probe_1 */
/*
 * two arguments: ctx points to the register context
 *                req is the key pointer, pointing to 'struct request'
 */
// fires when the request starts
void trace_start(struct pt_regs *ctx, struct request *req) {
    // store start timestamp
    u64 ts = bpf_ktime_get_ns();

    // update timestamp
    start.update(&req, &ts); // overwrites any previous value for this key
}

/* define probe_2 */
// fires when the request completes
void trace_completion(struct pt_regs *ctx, struct request *req) {
    u64 *tsp, delta;

    // return a pointer to its value if it exists, else NULL
    tsp = start.lookup(&req);
    if (tsp) {
        // u64 bpf_ktime_get_ns(void)
        delta = bpf_ktime_get_ns() - *tsp;

        /* struct request {
         *     ...
         *     unsigned int cmd_flags;
         *     ...
         *     unsigned int __data_len;
         *     ...
         * };
         */
        bpf_trace_printk("%d %x %d\\n",
                         req->__data_len,
                         req->cmd_flags,
                         delta / 1000);
        // __data_len, cmd_flags and delta/1000 are appended to the end of trace_pipe.
        // Note the format: the 3 values are separated by spaces, which the Python side relies on.
        // Python reads each trace_pipe line with trace_fields(); these three values arrive
        // together as the msg field.
        // msg is then split with split(), which yields a list of 3 strings,
        // and the list is unpacked into a 3-tuple,
        // one string per tuple element.

        start.delete(&req); // delete key
    }
}
''')

#call class BPF's method: get_kprobe_functions
if BPF.get_kprobe_functions(b'blk_start_request'):
    #The leading b marks a bytes literal (a byte string).
    #Python 2 treats str as raw bytes rather than unicode.
    #In Python 3 every str is unicode.
    #Python 3's default str is Python 2's unicode; Python 3's bytes is Python 2's str.
    #A b"" or b'' prefix denotes bytes.
    #In Python 2 the b prefix in b'blk_start_request' has no real effect; it is there for Python 3 compatibility.
    b.attach_kprobe(event = 'blk_start_request',
                    fn_name = 'trace_start')

b.attach_kprobe(event = 'blk_mq_start_request', fn_name = 'trace_start')
b.attach_kprobe(event = 'blk_account_io_completion',
                fn_name = 'trace_completion')

#header
#'-' left-aligns; without it the field is right-aligned by default ('+' would force a sign instead)
print('%-18s %-2s %-7s %8s' % ('TIME(s)', 'T', 'BYTES', 'LAT(ms)'))

#format output
while True:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
        #split() splits the string on whitespace (spaces, newlines \n, tabs \t)
        #msg refers to the message string, and split() is called on it
        (bytes_s, bflags_s, us_s) = msg.split()
        #(bytes_s, bflags_s, us_s) is a tuple
        #that receives the values returned by the split.

        #int(): converts a string or number to an integer
        #int(x, base=10): x is a string or number, base is the radix (default 10)
        #int(3.6) = 3
        #int('12', 16): when base is given, 12 must be passed as a string
        if int(bflags_s, 16) & REQ_WRITE: #parse bflags_s as hexadecimal
            type_s = b'W'
        elif bytes_s == b'0': #see blk_fill_rwbs() for logic; b'0' matches under both Python 2 and 3
            type_s = b'M'
        else:
            type_s = b'R'
        ms = float(int(us_s, 10)) / 1000
        printb(b'%-18.9f %-2s %-7s %8.2f' % (ts, type_s, bytes_s, ms))
        #plain print() would show the b'...' prefix under Python 3, so printb() is used instead
    except KeyboardInterrupt:
        exit()
--------------------------------------------------------------------------------
/temp/code/index/mm/hw/start_end.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

from __future__ import print_function
from bcc import BPF

b = BPF(text = '''
#include 

BPF_HASH(last);

int do_trace(struct pt_regs *ctx) {
    u64 ts, *tsp, delta, key = 0;
    // *val map.lookup(&key)
    // lookup() looks up key in the map and returns a pointer
    // to its value if it exists, otherwise NULL

    // attempt to read stored timestamp
    tsp = last.lookup(&key);
    if (tsp != NULL) {
        delta = bpf_ktime_get_ns() - *tsp;
        // u64 bpf_ktime_get_ns(void)
        // returns the current time in nanoseconds

        if (delta < 1000000000) {
            // output if time is less than 1 second; convert ns to ms
            bpf_trace_printk("%d\\n", delta / 1000000);
        }

        last.delete(&key);
    }

    // debug only: this would write an extra line ("0") to trace_pipe,
    // which the Python loop below would misreport as a delta
    // bpf_trace_printk("%d\\n", key);

    // update stored timestamp
    ts = bpf_ktime_get_ns();
    last.update(&key, &ts);
    // map.update(&key, &value)
    // associates the value in the second argument with the key, overwriting any previous value

    return 0;
}
''')

b.attach_kprobe(event = b.get_syscall_fnname('mmap'),
                fn_name = 'do_trace')

print('Tracing for quick syncs... Ctrl-C to end')

#format output
start = 0 #global variable
while 1:
    #this unpacking is the standard pattern for trace_fields():
    #trace_fields() reads one line from trace_pipe
    #and returns it split into fields
    (task, pid, cpu, flags, ts, ms) = b.trace_fields()
    #ms holds the delta that bpf_trace_printk() in the BPF program
    #"appended" to trace_pipe
    if start == 0:
        start = ts #ts is the timestamp read by trace_fields()
    ts = ts - start
    print('At time %.2fs: multiple syncs detected, last %s ms ago' % (ts, ms))
    #in %.2fs, f formats a float with 2 decimal places and the trailing s is a literal "s" after the number
--------------------------------------------------------------------------------
/temp/code/index/mm/pro_past_1s_mmap_size/pmmapsz.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

#Purpose: report how much memory each process mmap()ed during the past 1 s
#         (keyed by process ID; there is no single-process filter yet)

from __future__ import print_function
from bcc import BPF
from time import sleep

#load eBPF program
b = BPF(text = '''
BPF_HASH(ctl, u32, u64);
//BPF_PERF_OUTPUT(events);

TRACEPOINT_PROBE(syscalls, sys_enter_mmap) {
    // the upper 32 bits of bpf_get_current_pid_tgid() are the process ID (tgid)
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 sum = args->len;
    u64 *prev = ctl.lookup(&pid);

    // accumulate: update() alone would keep only the length of the last mmap()
    if (prev != 0)
        sum += *prev;
    ctl.update(&pid, &sum);

    return 0;
}
''')
while True:
    try:
        sleep(1)
        for k,v in sorted(b["ctl"].items()):
            print("%d %d" % (k.value, v.value))   # PID, bytes mmap()ed in the last second
        print()
        b["ctl"].clear()   # start a fresh 1 s window
    except KeyboardInterrupt:
        exit()
--------------------------------------------------------------------------------
/temp/code/trace_sync.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

from bcc import BPF

b = BPF(text = '''
int kprobe__sys_sync(void *ctx) {
    /* Every function declared in a BPF program is expected to run on a probe,
     * so each needs struct pt_regs *ctx as its first argument.
     *
     * Helper functions that will not run on a probe must be defined
     * as static inline so the compiler inlines them;
     * sometimes the __always_inline function attribute must be added too.
     */
    bpf_trace_printk("sys_sync() called\\n");

    return 0;
}
''')

print('Tracing sys_sync()... Ctrl-C to end')
while True:
    try:
        b.trace_print()
    except KeyboardInterrupt:
        exit()

--------------------------------------------------------------------------------
/temp/idea/idea.md:
--------------------------------------------------------------------------------
### idea

Notes on thoughts and lessons from eBPF/BCC programming

#### eBPF programs

1. Triggered on every event

   While analyzing the structure of an eBPF program today, I noticed two ways eBPF programs differ from ordinary user programs (these are the two I can think of so far).

   First, an eBPF program is written and loaded from user space but executes in kernel space. This is a highlight of the BPF/eBPF mechanism: it lets user-supplied code run inside the kernel. The price is a fairly strict safety check, because the kernel must be able to catch any potential hazard in that code before running it.

   Second, a user program normally runs once, or whenever we decide to run it; put plainly, it runs only when told to. An eBPF program runs automatically. Its purpose is to trace an event or a function such as kmalloc(): if kmalloc() is called 5 times in some interval, the eBPF program also runs 5 times. So an eBPF program runs every time its trigger fires. An ordinary user program can be summarized as run -> exit, whereas an eBPF program goes run -> run -> run -> ... -> run -> exit, and each run executes exactly the same code again.

2. Loops

   While working through tutorial exercises, I noticed that many eBPF programs in the BCC examples contain no loops, so I guessed that loops are not allowed in eBPF programs and wrote an example to check.

   +++++++++++++++++++++++++++++++++ time divider +++++++++++++++++++++++++++++++++++++++

   Testing showed that loops can be used in eBPF programs, but the iteration count must be fixed; an infinite loop such as for(;;) fails to load. (A likely explanation: clang unrolls a loop with a constant trip count at compile time, and kernels from 5.3 on can also verify bounded loops directly, whereas the verifier rejects an unbounded loop, which produces the `Invalid argument` error shown below.)

   **Code 1: a bounded loop, 5 iterations**

   ```python
   #!/usr/bin/env python
   # coding=utf-8

   from bcc import BPF

   b = BPF(text = '''
   int kprobe__sys_clone(void *ctx) {
       int i;
       for (i = 0; i < 5; i++)
           bpf_trace_printk("hello world\\n");

       return 0;
   }
   ''')

   b.trace_print()

   ```

   Output:

   ```shell
   $ sudo python ./for_limited.py

   ThreadPoolForeg-7898  [001] .... 288815.687851: 0: hello world
   ThreadPoolForeg-7898  [001] .... 288815.687910: 0: hello world
   ThreadPoolForeg-7898  [001] .... 288815.687912: 0: hello world
   ThreadPoolForeg-7898  [001] .... 288815.687912: 0: hello world
   ThreadPoolForeg-7898  [001] .... 288815.687913: 0: hello world
   ManagementAgent-1405  [003] .... 288827.365195: 0: hello world
   ManagementAgent-1405  [003] .... 288827.365241: 0: hello world
   ManagementAgent-1405  [003] .... 288827.365242: 0: hello world
   ManagementAgent-1405  [003] .... 288827.365243: 0: hello world
   ManagementAgent-1405  [003] .... 288827.365243: 0: hello world
   ...
   ```

   **Code 2: an infinite loop in the eBPF program**

   ```python
   #!/usr/bin/env python
   # coding=utf-8

   from bcc import BPF

   b = BPF(text = '''
   int kprobe__sys_clone(void *ctx) {
       for (;;)
           bpf_trace_printk("hello world\\n");

       return 0;
   }
   ''')

   b.trace_print()

   ```

   Output:

   ```shell
   $ sudo python ./for_unlimited.py

   bpf: Failed to load program: Invalid argument
   ```

--------------------------------------------------------------------------------
/temp/note/BccLesson/lessonNote/disksnoop.md:
--------------------------------------------------------------------------------
nore
--------------------------------------------------------------------------------
/temp/note/BccLesson/lessonRecord/README.md:
--------------------------------------------------------------------------------
### `bcc` Python Developer Tutorial - lesson record

Lessons I have worked through so far

- [x] Lesson 1. Hello World
- [x] Lesson 2. sys_sync()
- [x] Lesson 3. `hello_fields.py`
- [x] Lesson 4. `sync_timing.py`
- [x] Lesson 5. `sync_count.py`
- [x] Lesson 6. `disksnoop.py`
- [ ] Lesson 7. `hello_perf_output.py`
- [ ] Lesson 8. `sync_perf_output.py`
- [ ] Lesson 9. `bitehist.py`
- [ ] Lesson 10. `disklatency.py`
- [ ] Lesson 11. `vfsreadlat.py`
- [ ] Lesson 12. `urandomread.py`
- [ ] Lesson 13. `disksnoop.py fixed`
- [ ] Lesson 14. `strlen_count.py`
- [ ] Lesson 15. `nodejs_http_server.py`
- [ ] Lesson 16. `task_switch.c`
- [ ] Lesson 17. Further Study

--------------------------------------------------------------------------------
/temp/note/README.md:
--------------------------------------------------------------------------------
Notes on bcc programs
--------------------------------------------------------------------------------
/temp/note/eBPFnote/eBPFPROGNOTE.md:
--------------------------------------------------------------------------------
### Summary of eBPF program analysis

Reference: [A Brief History of eBPF (eBPF简史)](https://linux.cn/article-9032-1.html)

#### eBPF example

- sockex1_user.c

  ```c
  #include <…>
  // For brevity, only the key parts of these listings are shown; for the full source see http://elixir.free-electrons.com/linux/v4.12.6/source/samples/bpf
  int main(int ac, char **argv)
  {
      // 1. The eBPF bytecode lives in sockex1_kern.o, an ELF file generated by llvm using the bpf instruction set;
      snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
      if (load_bpf_file(filename)) {
          // load_bpf_file() is defined in bpf_load.c; it uses libelf to parse sockex1_kern.o
          // and bpf_load_program to load the parsed bytecode into the kernel;
      }
      // 2. The bpf program in sockex1_kern.o is of type BPF_PROG_TYPE_SOCKET_FILTER,
      //    so SO_ATTACH_BPF is used to choose the socket the program is attached to as its sk_filter
      sock = open_raw_sock("lo");
      assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, prog_fd,
          sizeof(prog_fd[0])) == 0);
      //……
      for (i = 0; i < 5; i++) {
          // 3. Use the map to read the total length of TCP packets sent through lo
          key = IPPROTO_TCP;
          assert(bpf_map_lookup_elem(map_fd[0], &key, &tcp_cnt) == 0);
          // ……
      }
      return 0;
  }
  ```

- sockex1_kern.c

  ```c
  #include <……>
  // Predefined map object.
  // Note that a map is really created by the user-space program calling bpf_create_map();
  // the map object defined here is parsed and created while load_bpf_file() parses the ELF file.
  // The SEC(NAME) macro places the object in a new section of the current object file.
  struct bpf_map_def SEC("maps") my_map = {
      .type = BPF_MAP_TYPE_ARRAY,
      .key_size = sizeof(u32),
      .value_size = sizeof(long),
      .max_entries = 256,
  };
  SEC("socket1")
  int bpf_prog1(struct __sk_buff *skb)
  {
      // This example is simple: it only reads the protocol field from the packet's IP header.
      // load_byte maps to the llvm built-in asm(llvm.bpf.load.byte),
      // which generates the eBPF instructions BPF_LD_ABS and BPF_LD_IND
      int index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
      long *value;
      // ……
      // Look up the value for the key (&index is a pointer to the local variable index)
      value = bpf_map_lookup_elem(&my_map, &index);
      if (value)
          __sync_fetch_and_add(value, skb->len); // __sync_fetch_and_add is a compiler (clang/LLVM) built-in that performs an atomic add
      return 0;
  }
  // The license section declares GPL: the kernel lets only programs with a GPL-compatible license call GPL-only helpers
  char _license[] SEC("license") = "GPL";
  ```

- Summary

  ![How an eBPF program works](/temp/note/eBPFnote/image/eBPF程序原理.png)

--------------------------------------------------------------------------------
/temp/note/eBPFnote/image/eBPF程序原理.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linuxkerneltravel/eBPF/6de3c7f05e3f1abed745b8e0e4133fb9e414a749/temp/note/eBPFnote/image/eBPF程序原理.png
--------------------------------------------------------------------------------
/tracepoint/HASH_OUTPUT.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8


bpf_text = """
#include 
#include 

struct val_t {
   u64 pid;
   int sig;
   int tpid;
   char comm[TASK_COMM_LEN];
};

struct data_t {
   u64 pid;
   int tpid;
   int sig;
   int ret;
   char comm[TASK_COMM_LEN];
};

BPF_HASH(infotmp, u32, struct val_t);
BPF_PERF_OUTPUT(events);

int syscall__kill(struct pt_regs *ctx, int tpid, int sig)
{
    u32 pid = bpf_get_current_pid_tgid();
    FILTER

    struct val_t val = {.pid = pid};
    if (bpf_get_current_comm(&val.comm, sizeof(val.comm)) == 0) {
        val.tpid = tpid;
        val.sig = sig;
        infotmp.update(&pid, &val);
    }

    return 0;
}

int do_ret_sys_kill(struct pt_regs *ctx)
{
    struct data_t data = {};
    struct val_t *valp;
    u32 pid = bpf_get_current_pid_tgid();

    valp = infotmp.lookup(&pid);
    if (valp == 0) {
        // missed entry
        return 0;
    }

    bpf_probe_read(&data.comm, sizeof(data.comm), valp->comm);
    data.pid = pid;
    data.tpid = valp->tpid;
    data.ret = PT_REGS_RC(ctx);
    data.sig = valp->sig;

    events.perf_submit(ctx, &data, sizeof(data));
    infotmp.delete(&pid);

    return 0;
}
"""
# FILTER above is a text placeholder, not C: it must be replaced before the text is
# compiled, e.g. bpf_text = bpf_text.replace('FILTER', '') to trace all processes.
--------------------------------------------------------------------------------
/tracepoint/tracepoint.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

#tracepoints are relatively stable; where possible, prefer tracepoints over kprobes
#
#the perf list command lists the available tracepoints
#
#attaching eBPF programs to tracepoints requires Linux 4.7 or later
#
# TRACEPOINT_PROBE() macro: declares a function to attach to a tracepoint;
#       the function is called every time the tracepoint fires
#
# The C snippet below is an empty eBPF program that runs every time kmalloc() is called in the kernel:
#     TRACEPOINT_PROBE(kmem, kmalloc) {
#         return 0;
#     }
# The macro's arguments are the tracepoint category and the event itself.
# They map directly onto the debugfs hierarchy
# (e.g. /sys/kernel/debug/tracing/events/category/event)
#

#Purpose:
#     trace reads from /dev/urandom (the random:urandom_read tracepoint)
from __future__ import print_function
from bcc import BPF

#load eBPF program
b = BPF(text = '''
/* attach the eBPF program to a tracepoint */
/* TRACEPOINT_PROBE(random, urandom_read)
   is the kernel tracepoint random:urandom_read.
   Its format is in /sys/kernel/debug/tracing/events/random/urandom_read/format
*/
TRACEPOINT_PROBE(random, urandom_read) {
    bpf_trace_printk("%d\\n", args->got_bits);

    return 0;
}
''')

#header
print("%-18s %-16s %-6s %s" % ("TIME(s)",
      "COMM", "PID", "GOTBITS"))

#format output
while 1:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    except ValueError:
        continue
    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))

--------------------------------------------------------------------------------
/tracepoint/tracepoint_kmalloc.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

#Purpose: use a BPF_HASH map to store the kernel instruction address of each kmalloc() call site
#         and its call count, then post-process the counts in Python

from bcc import BPF
from time import sleep

#eBPF program
b = BPF(text = '''
/*
 * The BPF_HASH() macro takes several optional arguments,
 * but for most uses you only need to give the name of this
 * hash table instance (callers here), the key type (u64) and the value type (unsigned long)
 */
BPF_HASH(callers, u64, unsigned long); // once the counts are in the hash table they can be processed in Python
                                       // by indexing the BPF object (b in this example) to reach the table
TRACEPOINT_PROBE(kmem, kmalloc) {
    // tracepoint arguments:
    // in the eBPF program, tracepoint arguments are accessed through the magic args variable
    u64 ip = args->call_site; // kernel instruction address of the kmalloc() call site
    unsigned long *count;
    unsigned long c = 1;

    count = callers.lookup((u64 *)&ip); // lookup() reads a hash table entry; returns NULL if no entry exists for the key
    if (count != 0)
        c += *count;

    callers.update(&ip, &c); // map.update() inserts the key if absent, or updates the existing key's value

    return 0;
}
''')

while True:
    try:
        sleep(1)
        #iterate over every entry in the callers hash table
        for k,v in sorted(b["callers"].items()): #the resulting Python object is a HashTable (defined in bcc's Python front end),
            # whose entries can be accessed with items()
            print("%s %u" % (b.ksym(k.value), v.value)) #BPF.ksym() translates a kernel address into a symbol name
        print("")
    except KeyboardInterrupt:
        exit()

--------------------------------------------------------------------------------
/tracepoint/tracepoint_mm_vmscan_writepage.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

from bcc import BPF

b = BPF(text = '''
BPF_HASH(callers, u64, unsigned long);

/* The arguments of TRACEPOINT_PROBE() are the tracepoint category and the event itself.
 * Tracepoint categories live under /sys/kernel/debug/tracing/events;
 * vmscan is one category in events, and mm_vmscan_writepage is an event in vmscan/.
 * TRACEPOINT_PROBE() maps the category and event directly onto the debugfs hierarchy.
 */
TRACEPOINT_PROBE(vmscan, mm_vmscan_writepage) {
    bpf_trace_printk("hello world\\n"); /* bpf_trace_printk() writes from the BPF program to /sys/kernel/debug/tracing/trace_pipe;
                                         * the messages can then be printed from Python with BPF.trace_print().
                                         *
                                         * Its main drawback is that trace_pipe is a global resource, so it
                                         *  contains messages from all concurrent writers, which makes it hard to filter the output of a single BPF program.
                                         *
                                         * The preferred approach is to submit messages through a BPF_PERF_OUTPUT map in the BPF program and
                                         *  process them with open_perf_buffer() and perf_buffer_poll() (kprobe_poll() in older bcc releases).
                                         */
    return 0;
}
''')

b.trace_print()
--------------------------------------------------------------------------------
/uprobe/uprobe_strlen.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

#Purpose:
#    trace the user-space function strlen()
#
#user-space functions are traced with uprobes; the corresponding BPF method is attach_uprobe
#    for example:
#        b.attach_uprobe(name = "c", sym = "strlen", fn_name = "count")
#    meaning:
#        attach to the C library's strlen(), handled by the function count

from __future__ import print_function
from bcc import BPF
from time import sleep

#load eBPF program
b = BPF(text = '''
#include 

struct key_t {
    char c[80];
};

BPF_HASH(counts, struct key_t);

int count(struct pt_regs *ctx) {
    if (!PT_REGS_PARM1(ctx))
        return 0;
    struct key_t key = {};
    u64 zero = 0, *val;

    bpf_probe_read(&key.c, sizeof(key.c),
                   (void *)PT_REGS_PARM1(ctx));
    val = counts.lookup_or_init(&key, &zero);
    if (val)    // defensive NULL check before dereferencing the map value
        (*val)++;

    return 0;
}
''')

b.attach_uprobe(name = "c", sym = "strlen", fn_name = "count")

#header
print("Tracing strlen()... Hit Ctrl-C to end.")

#sleep until Ctrl-C
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass

#print output
print("%10s %s" % ("COUNT", "STRING"))

counts = b.get_table("counts")

for k,v in sorted(counts.items(), key = lambda counts: counts[1].value):
    # k.c holds raw bytes; decode instead of the Python 2-only 'string-escape' codec
    print('%10d "%s"' % (v.value, k.c.decode('utf-8', 'replace')))
--------------------------------------------------------------------------------
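Several scripts above (print_fields.py, both start_end.py copies, bpfhash.py, tracepoint.py) unpack `b.trace_fields()` into six values and then work with `msg` themselves. To make that concrete, here is a small pure-Python sketch that parses one trace_pipe line shaped like the sample output in idea.md. `parse_trace_line()` and its regular expression are hypothetical helpers written for illustration, not bcc's implementation. They assume the default trace_pipe column layout shown in that sample; other kernels may print extra columns.

```python
import re

# Hypothetical helper (not part of bcc): splits one trace_pipe line of the form
#   "<task>-<pid> [<cpu>] <flags> <timestamp>: <symbol or addr>: <message>"
# into the same six fields the scripts above unpack from b.trace_fields().
TRACE_LINE = re.compile(
    r"^\s*(?P<task>.+?)-(?P<pid>\d+)\s+"   # task name may itself contain '-'
    r"\[(?P<cpu>\d+)\]\s+"                 # CPU number in brackets
    r"(?P<flags>\S+)\s+"                   # irq/preempt flags, e.g. "...."
    r"(?P<ts>\d+\.\d+):\s+"                # timestamp in seconds
    r"[^:]*:\s?"                           # symbol/address column, e.g. "0"
    r"(?P<msg>.*)$"                        # text written by bpf_trace_printk()
)


def parse_trace_line(line):
    """Return (task, pid, cpu, flags, ts, msg), or None for lines that do
    not match, such as lost-event notices."""
    m = TRACE_LINE.match(line)
    if m is None:
        return None
    return (m.group("task"), int(m.group("pid")), int(m.group("cpu")),
            m.group("flags"), float(m.group("ts")), m.group("msg"))


if __name__ == "__main__":
    sample = "  ThreadPoolForeg-7898  [001] .... 288815.687851: 0: hello world"
    print(parse_trace_line(sample))
```

Returning `None` for lines that do not fit plays the same role as the `except ValueError: continue` pattern in the scripts: the caller skips what it cannot unpack instead of crashing.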
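bpfhash.py packs three space-separated values into each `bpf_trace_printk()` message and decodes them in its Python loop. That decoding can be pulled out into a plain function and checked without a kernel. `classify_request()` is a hypothetical name. The `REQ_WRITE = 1` bit test and the zero-bytes "M" rule simply mirror bpfhash.py (and the disksnoop example it analyzes), so they inherit the same assumptions about `cmd_flags`.

```python
REQ_WRITE = 1  # same constant bpfhash.py uses for the write bit of cmd_flags


def classify_request(msg):
    """Turn the "<bytes> <cmd_flags in hex> <latency in us>" payload that
    bpfhash.py writes with bpf_trace_printk() into (type, bytes, latency_ms).

    Accepts str or bytes so it behaves the same under Python 2 and 3.
    """
    if isinstance(msg, bytes):
        msg = msg.decode("ascii")
    bytes_s, flags_s, us_s = msg.split()
    if int(flags_s, 16) & REQ_WRITE:
        kind = "W"   # write request
    elif bytes_s == "0":
        kind = "M"   # zero-length, non-write request (labelled "M" as in bpfhash.py)
    else:
        kind = "R"   # everything else is treated as a read
    return kind, int(bytes_s), int(us_s) / 1000.0


if __name__ == "__main__":
    for payload in (b"4096 8801 1500", b"0 0 250", b"8192 0 2000"):
        print("%-2s %-7d %8.2f" % classify_request(payload))
```

The function accepts both `str` and `bytes` deliberately. `trace_fields()` yields `bytes` under Python 3 but `str` under Python 2, and that difference is why the `bytes_s == '0'` comparison in the original loop needed care.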