├── .gitignore ├── LICENSE ├── README.md ├── bpf ├── Makefile ├── README.md ├── bpf-maps │ ├── README.md │ └── examples-in-kernel │ │ ├── Makefile_addmyown │ │ ├── README.md │ │ ├── xdp_ip_tracker_common.h │ │ ├── xdp_ip_tracker_kern.c │ │ └── xdp_ip_tracker_user.c ├── bpf_trace_printk_definition.pdf └── perf-sys.h ├── bpftrace └── README.md ├── btf ├── README.md └── btf-xdp-cnt.c ├── bumblebee └── tcp_kprobe.c ├── libbpf ├── README.md └── libbpfgo-example │ ├── const-x64.go │ ├── hellokprobe.c │ ├── hellokprobe.go │ └── hellokprobe.h ├── tc ├── README.md ├── debug-tc-xdp-drop-tcp.c ├── headers │ ├── bpf_endian.h │ └── bpf_helpers.h ├── tc-xdp-drop-tcp.c └── tc-xdp-statistics.c └── xdp ├── README.md └── xdp-drop-world.c /.gitignore: -------------------------------------------------------------------------------- 1 | log 2 | .vscode 3 | *.o 4 | test -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Wen-Quan Li 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Learning Linux BPF/eBPF Programming 2 | 3 | Building a Chinese-language community for learning BPF. The learning plan is as follows: 4 | ![bpf-learning-path](https://davidlovezoe.club/wordpress/wp-content/uploads/2020/06/eBPF-learning-002-2048x1528.png) 5 | 6 | ## Related blog posts 7 | - Chinese version: https://davidlovezoe.club/wordpress/archives/tag/bpf 8 | 9 | ## Preparing the lab environment 10 | - A Linux operating system (if you like trying out the latest kernels, the [vagrantfile](https://github.com/solo-io/bumblebee/blob/main/Vagrantfile) from [bumblebee](https://bumblebee.io/ZH) is recommended) 11 | 12 | My own lab environment is a stock `Ubuntu 18.04` vagrant virtual machine with kernel version `4.15.0`. You can download this vagrant VM environment, with the bcc tool collection preinstalled, from here: 13 | 14 | > Download link: https://pan.baidu.com/s/11dsEU6Yk6KGDGNor-fbsgQ Extraction code: qvhc. 15 | > For usage, see [this article](https://davidlovezoe.club/ebpf-learning-bcc-intro). 16 | 17 | Unless otherwise noted, the following commands were tested on Ubuntu. 18 | 19 | - Preinstall clang, LLVM, iproute2, and libelf-dev 20 | ```bash 21 | # for ubuntu 22 | apt install clang llvm libelf-dev iproute2 23 | # test clang 24 | clang -v 25 | # test llvm 26 | llc --version 27 | # test iproute2 28 | ip link 29 | ``` 30 | - Installing the `bpftool` command-line tool 31 | 32 | Download the Linux kernel source and build it locally. 33 | ```bash 34 | # confirm the kernel version 35 | uname -r 36 | # find the source package matching your kernel version 37 | apt-cache search linux-source 38 | apt install linux-source-5.3.0 39 | apt install libelf-dev 40 | 41 | cd /usr/src/linux-source-5.3.0 42 | tar xjf linux-source-5.3.0.tar.bz2 43 | cd linux-source-5.3.0/tools 44 | make -C bpf/bpftool/ 45 | cd bpf/bpftool/ 46 | ./bpftool prog # or: ./bpftool net 47 | ``` 48 | 49 | ## Directory guide 50 | - [BPF knowledge notes](./bpf/README.md) 51 | - Translations of classic articles 52 | - Compiling all BPF sample code in the Linux kernel 53 | - [XDP network programming](./xdp/README.md) 54 | - [TC network programming](./tc/README.md) 55 | - [Getting started with BTF](./btf/README.md) 56 
| - [Getting started with bpftrace](./bpftrace/README.md) 57 | 58 | ## FAQ 59 | ### 1. 'asm/types.h' file not found 60 | 61 | - Symptom 62 | 63 | When compiling with the command below, you may hit an error saying that a header file cannot be found: 64 | 65 | ```shell 66 | clang -I ./headers/ -O2 -target bpf -c tc-xdp-drop-tcp.c -o tc-xdp-drop-tcp.o 67 | 68 | In file included from tc-xdp-drop-tcp.c:2: 69 | In file included from /usr/include/linux/bpf.h:11: 70 | /usr/include/linux/types.h:5:10: fatal error: 'asm/types.h' file not found 71 | #include <asm/types.h> 72 | ^~~~~~~~~~~~~ 73 | 1 error generated. 74 | ``` 75 | 76 | - Cause 77 | 78 | The source file includes headers from a system directory (usually `/usr/include/`), but those headers are not present at the expected path, so compilation fails. 79 | 80 | Take the asm files in the error above. `asm` (expanded here as `Architecture Specific Macros`, that is, machine-architecture-specific macro files) is, as the name suggests, closely tied to the machine architecture, and its implementation differs across architectures such as x86, x64, and arm. The operating system does not provide a generic `/usr/include/asm/` directory, only architecture-specific ones such as `/usr/include/x86_64-linux-gnu/asm/`, so the include cannot be resolved. 81 | 82 | - Solution 83 | 84 | Add a symlink `/usr/include/asm/` that points to the asm directory shipped with the operating system: 85 | ```shell 86 | cd /usr/include 87 | ln -s ./x86_64-linux-gnu/asm asm 88 | ``` 89 | ### 2. 'bpf/bpf_helpers.h' file not found 90 | - Solution 91 | 92 | ```shell 93 | apt-get install libbpf-dev 94 | # run `apt-file update` if needed 95 | apt-file list libbpf-dev | grep bpf_helpers.h 96 | # you will get the result like: libbpf-dev: /usr/include/bpf/bpf_helpers.h 97 | ``` 98 | 99 | ## Slack community 100 | 101 | If you are interested, you are welcome to [join](https://join.slack.com/t/learning-bpf/shared_invite/zt-1bgiyr7rm-AHMOlzjqJXKGYMcOzjP4XQ) 102 | 103 | ## References 104 | 105 | -------------------------------------------------------------------------------- /bpf/Makefile: -------------------------------------------------------------------------------- 1 | # For v4.15.0 2 | # SPDX-License-Identifier: GPL-2.0 3 | # List of programs to build 4 | hostprogs-y := test_lru_dist 5 | hostprogs-y += sock_example 6 | hostprogs-y += fds_example 7 | hostprogs-y += sockex1 8 | hostprogs-y += sockex2 9 | hostprogs-y += sockex3 10 | hostprogs-y += tracex1 11 | hostprogs-y += tracex2 12 | hostprogs-y += tracex3 13 | hostprogs-y += tracex4 14 | 
hostprogs-y += tracex5 15 | hostprogs-y += tracex6 16 | hostprogs-y += test_probe_write_user 17 | hostprogs-y += trace_output 18 | hostprogs-y += lathist 19 | hostprogs-y += offwaketime 20 | hostprogs-y += spintest 21 | hostprogs-y += map_perf_test 22 | hostprogs-y += test_overhead 23 | hostprogs-y += test_cgrp2_array_pin 24 | hostprogs-y += test_cgrp2_attach 25 | hostprogs-y += test_cgrp2_attach2 26 | hostprogs-y += test_cgrp2_sock 27 | hostprogs-y += test_cgrp2_sock2 28 | hostprogs-y += xdp1 29 | hostprogs-y += xdp2 30 | hostprogs-y += xdp_router_ipv4 31 | hostprogs-y += test_current_task_under_cgroup 32 | hostprogs-y += trace_event 33 | hostprogs-y += sampleip 34 | hostprogs-y += tc_l2_redirect 35 | hostprogs-y += lwt_len_hist 36 | hostprogs-y += xdp_tx_iptunnel 37 | hostprogs-y += test_map_in_map 38 | hostprogs-y += per_socket_stats_example 39 | hostprogs-y += load_sock_ops 40 | hostprogs-y += xdp_redirect 41 | hostprogs-y += xdp_redirect_map 42 | hostprogs-y += xdp_redirect_cpu 43 | hostprogs-y += xdp_monitor 44 | hostprogs-y += syscall_tp 45 | # new 0: 46 | hostprogs-y += my_bpf_101 47 | 48 | # Libbpf dependencies 49 | LIBBPF := ../../tools/lib/bpf/bpf.o 50 | CGROUP_HELPERS := ../../tools/testing/selftests/bpf/cgroup_helpers.o 51 | 52 | test_lru_dist-objs := test_lru_dist.o $(LIBBPF) 53 | sock_example-objs := sock_example.o $(LIBBPF) 54 | fds_example-objs := bpf_load.o $(LIBBPF) fds_example.o 55 | sockex1-objs := bpf_load.o $(LIBBPF) sockex1_user.o 56 | sockex2-objs := bpf_load.o $(LIBBPF) sockex2_user.o 57 | sockex3-objs := bpf_load.o $(LIBBPF) sockex3_user.o 58 | tracex1-objs := bpf_load.o $(LIBBPF) tracex1_user.o 59 | tracex2-objs := bpf_load.o $(LIBBPF) tracex2_user.o 60 | tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o 61 | tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o 62 | tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o 63 | tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o 64 | load_sock_ops-objs := bpf_load.o $(LIBBPF) load_sock_ops.o 
65 | test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o 66 | trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o 67 | lathist-objs := bpf_load.o $(LIBBPF) lathist_user.o 68 | offwaketime-objs := bpf_load.o $(LIBBPF) offwaketime_user.o 69 | spintest-objs := bpf_load.o $(LIBBPF) spintest_user.o 70 | map_perf_test-objs := bpf_load.o $(LIBBPF) map_perf_test_user.o 71 | test_overhead-objs := bpf_load.o $(LIBBPF) test_overhead_user.o 72 | test_cgrp2_array_pin-objs := $(LIBBPF) test_cgrp2_array_pin.o 73 | test_cgrp2_attach-objs := $(LIBBPF) test_cgrp2_attach.o 74 | test_cgrp2_attach2-objs := $(LIBBPF) test_cgrp2_attach2.o $(CGROUP_HELPERS) 75 | test_cgrp2_sock-objs := $(LIBBPF) test_cgrp2_sock.o 76 | test_cgrp2_sock2-objs := bpf_load.o $(LIBBPF) test_cgrp2_sock2.o 77 | xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o 78 | # reuse xdp1 source intentionally 79 | xdp2-objs := bpf_load.o $(LIBBPF) xdp1_user.o 80 | xdp_router_ipv4-objs := bpf_load.o $(LIBBPF) xdp_router_ipv4_user.o 81 | test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) $(CGROUP_HELPERS) \ 82 | test_current_task_under_cgroup_user.o 83 | trace_event-objs := bpf_load.o $(LIBBPF) trace_event_user.o 84 | sampleip-objs := bpf_load.o $(LIBBPF) sampleip_user.o 85 | tc_l2_redirect-objs := bpf_load.o $(LIBBPF) tc_l2_redirect_user.o 86 | lwt_len_hist-objs := bpf_load.o $(LIBBPF) lwt_len_hist_user.o 87 | xdp_tx_iptunnel-objs := bpf_load.o $(LIBBPF) xdp_tx_iptunnel_user.o 88 | test_map_in_map-objs := bpf_load.o $(LIBBPF) test_map_in_map_user.o 89 | per_socket_stats_example-objs := $(LIBBPF) cookie_uid_helper_example.o 90 | xdp_redirect-objs := bpf_load.o $(LIBBPF) xdp_redirect_user.o 91 | xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o 92 | xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o 93 | xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o 94 | syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o 95 | # new 1: 96 | 
my_bpf_101-objs := bpf_load.o $(LIBBPF) my_bpf_101_user.o 97 | 98 | # Tell kbuild to always build the programs 99 | always := $(hostprogs-y) 100 | always += sockex1_kern.o 101 | always += sockex2_kern.o 102 | always += sockex3_kern.o 103 | always += tracex1_kern.o 104 | always += tracex2_kern.o 105 | always += tracex3_kern.o 106 | always += tracex4_kern.o 107 | always += tracex5_kern.o 108 | always += tracex6_kern.o 109 | always += sock_flags_kern.o 110 | always += test_probe_write_user_kern.o 111 | always += trace_output_kern.o 112 | always += tcbpf1_kern.o 113 | always += tcbpf2_kern.o 114 | always += tc_l2_redirect_kern.o 115 | always += lathist_kern.o 116 | always += offwaketime_kern.o 117 | always += spintest_kern.o 118 | always += map_perf_test_kern.o 119 | always += test_overhead_tp_kern.o 120 | always += test_overhead_kprobe_kern.o 121 | always += parse_varlen.o parse_simple.o parse_ldabs.o 122 | always += test_cgrp2_tc_kern.o 123 | always += xdp1_kern.o 124 | always += xdp2_kern.o 125 | always += xdp_router_ipv4_kern.o 126 | always += test_current_task_under_cgroup_kern.o 127 | always += trace_event_kern.o 128 | always += sampleip_kern.o 129 | always += lwt_len_hist_kern.o 130 | always += xdp_tx_iptunnel_kern.o 131 | always += test_map_in_map_kern.o 132 | always += cookie_uid_helper_example.o 133 | always += tcp_synrto_kern.o 134 | always += tcp_rwnd_kern.o 135 | always += tcp_bufs_kern.o 136 | always += tcp_cong_kern.o 137 | always += tcp_iw_kern.o 138 | always += tcp_clamp_kern.o 139 | always += tcp_basertt_kern.o 140 | always += xdp_redirect_kern.o 141 | always += xdp_redirect_map_kern.o 142 | always += xdp_redirect_cpu_kern.o 143 | always += xdp_monitor_kern.o 144 | always += syscall_tp_kern.o 145 | # new 3: 146 | always += my_bpf_101_kern.o 147 | 148 | 149 | HOSTCFLAGS += -I$(objtree)/usr/include 150 | HOSTCFLAGS += -I$(srctree)/tools/lib/ 151 | HOSTCFLAGS += -I$(srctree)/tools/testing/selftests/bpf/ 152 | HOSTCFLAGS += -I$(srctree)/tools/lib/ 
-I$(srctree)/tools/include 153 | HOSTCFLAGS += -I$(srctree)/tools/perf 154 | HOSTCFLAGS += -DHAVE_ATTR_TEST=0 155 | 156 | HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable 157 | HOSTLOADLIBES_fds_example += -lelf 158 | HOSTLOADLIBES_sockex1 += -lelf 159 | HOSTLOADLIBES_sockex2 += -lelf 160 | HOSTLOADLIBES_sockex3 += -lelf 161 | HOSTLOADLIBES_tracex1 += -lelf 162 | HOSTLOADLIBES_tracex2 += -lelf 163 | HOSTLOADLIBES_tracex3 += -lelf 164 | HOSTLOADLIBES_tracex4 += -lelf -lrt 165 | HOSTLOADLIBES_tracex5 += -lelf 166 | HOSTLOADLIBES_tracex6 += -lelf 167 | HOSTLOADLIBES_test_cgrp2_sock2 += -lelf 168 | HOSTLOADLIBES_load_sock_ops += -lelf 169 | HOSTLOADLIBES_test_probe_write_user += -lelf 170 | HOSTLOADLIBES_trace_output += -lelf -lrt 171 | HOSTLOADLIBES_lathist += -lelf 172 | HOSTLOADLIBES_offwaketime += -lelf 173 | HOSTLOADLIBES_spintest += -lelf 174 | HOSTLOADLIBES_map_perf_test += -lelf -lrt 175 | HOSTLOADLIBES_test_overhead += -lelf -lrt 176 | HOSTLOADLIBES_xdp1 += -lelf 177 | HOSTLOADLIBES_xdp2 += -lelf 178 | HOSTLOADLIBES_xdp_router_ipv4 += -lelf 179 | HOSTLOADLIBES_test_current_task_under_cgroup += -lelf 180 | HOSTLOADLIBES_trace_event += -lelf 181 | HOSTLOADLIBES_sampleip += -lelf 182 | HOSTLOADLIBES_tc_l2_redirect += -l elf 183 | HOSTLOADLIBES_lwt_len_hist += -l elf 184 | HOSTLOADLIBES_xdp_tx_iptunnel += -lelf 185 | HOSTLOADLIBES_test_map_in_map += -lelf 186 | HOSTLOADLIBES_xdp_redirect += -lelf 187 | HOSTLOADLIBES_xdp_redirect_map += -lelf 188 | HOSTLOADLIBES_xdp_redirect_cpu += -lelf 189 | HOSTLOADLIBES_xdp_monitor += -lelf 190 | HOSTLOADLIBES_syscall_tp += -lelf 191 | 192 | # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline: 193 | # make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang 194 | LLC ?= llc 195 | CLANG ?= clang 196 | 197 | # Detect that we're cross compiling and use the cross compiler 198 | ifdef CROSS_COMPILE 199 | HOSTCC = $(CROSS_COMPILE)gcc 200 | CLANG_ARCH_ARGS = 
-target $(ARCH) 201 | endif 202 | 203 | # Trick to allow make to be run from this directory 204 | all: $(LIBBPF) 205 | $(MAKE) -C ../../ $(CURDIR)/ 206 | 207 | clean: 208 | $(MAKE) -C ../../ M=$(CURDIR) clean 209 | @find $(CURDIR) -type f -name '*~' -delete 210 | 211 | $(LIBBPF): FORCE 212 | $(MAKE) -C $(dir $@) $(notdir $@) 213 | 214 | $(obj)/syscall_nrs.s: $(src)/syscall_nrs.c 215 | $(call if_changed_dep,cc_s_c) 216 | 217 | $(obj)/syscall_nrs.h: $(obj)/syscall_nrs.s FORCE 218 | $(call filechk,offsets,__SYSCALL_NRS_H__) 219 | 220 | clean-files += syscall_nrs.h 221 | 222 | FORCE: 223 | 224 | 225 | # Verify LLVM compiler tools are available and bpf target is supported by llc 226 | .PHONY: verify_cmds verify_target_bpf $(CLANG) $(LLC) 227 | 228 | verify_cmds: $(CLANG) $(LLC) 229 | @for TOOL in $^ ; do \ 230 | if ! (which -- "$${TOOL}" > /dev/null 2>&1); then \ 231 | echo "*** ERROR: Cannot find LLVM tool $${TOOL}" ;\ 232 | exit 1; \ 233 | else true; fi; \ 234 | done 235 | 236 | verify_target_bpf: verify_cmds 237 | @if ! (${LLC} -march=bpf -mattr=help > /dev/null 2>&1); then \ 238 | echo "*** ERROR: LLVM (${LLC}) does not support 'bpf' target" ;\ 239 | echo " NOTICE: LLVM version >= 3.7.1 required" ;\ 240 | exit 2; \ 241 | else true; fi 242 | 243 | $(src)/*.c: verify_target_bpf 244 | 245 | $(obj)/tracex5_kern.o: $(obj)/syscall_nrs.h 246 | 247 | # asm/sysreg.h - inline assembly used by it is incompatible with llvm. 248 | # But, there is no easy way to fix it, so just exclude it since it is 249 | # useless for BPF samples. 
250 | $(obj)/%.o: $(src)/%.c 251 | $(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \ 252 | -I$(srctree)/tools/testing/selftests/bpf/ \ 253 | -D__KERNEL__ -Wno-unused-value -Wno-pointer-sign \ 254 | -D__TARGET_ARCH_$(ARCH) -Wno-compare-distinct-pointer-types \ 255 | -Wno-gnu-variable-sized-type-not-at-end \ 256 | -Wno-address-of-packed-member -Wno-tautological-compare \ 257 | -Wno-unknown-warning-option $(CLANG_ARCH_ARGS) \ 258 | -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@ 259 | -------------------------------------------------------------------------------- /bpf/README.md: -------------------------------------------------------------------------------- 1 | # BPF knowledge notes 2 | 3 | ## Translations of classic articles 4 | - [A thorough introduction to eBPF](https://davidlovezoe.club/ebpf-learning-001) 5 | - [An introduction to the BPF Compiler Collection](https://davidlovezoe.club/ebpf-learning-bcc-intro) 6 | - [Early packet drop — and more — with BPF](https://davidlovezoe.club/ebpf-learning-xdp-init-intro) 7 | 8 | ## Compiling the BPF sample code in the Linux kernel source (samples/bpf) 9 | 10 | 0. For a detailed walkthrough, see the blog post: [https://davidlovezoe.club/compile-bpf-examples](https://davidlovezoe.club/compile-bpf-examples) 11 | 12 | 1. Download the Linux kernel source that matches the kernel version of your lab environment 13 | 2. Make sure `clang` and `llvm` are installed in your lab environment 14 | * clang >= version 3.4.0 15 | * llvm >= version 3.7.1 16 | 3. Compile the BPF sample code 17 | ```bash 18 | # switch to the root of the kernel source tree 19 | cd linux_sourcecode/ 20 | # generate the header files needed for the kernel build 21 | make headers_install 22 | # generate the .config file in preparation for the commands below 23 | make menuconfig 24 | # use make to build all BPF samples under samples/bpf/; note that the trailing / is required 25 | make samples/bpf/ # or make M=samples/bpf 26 | 27 | # My lab environment is Ubuntu 18.04 with a 4.15.0 kernel; running the make command above 28 | # produced the following errors 29 | ... 
30 | In file included from ./tools/perf/perf-sys.h:9:0, 31 | from samples/bpf/bpf_load.c:28: 32 | ./tools/perf/perf-sys.h: In function ‘sys_perf_event_open’: 33 | ./tools/perf/perf-sys.h:68:15: error: ‘test_attr__enabled’ undeclared (first use in this function) 34 | if (unlikely(test_attr__enabled)) 35 | ^ 36 | ./tools/include/linux/compiler.h:74:43: note: in definition of macro ‘unlikely’ 37 | # define unlikely(x) __builtin_expect(!!(x), 0) 38 | ^ 39 | ./tools/perf/perf-sys.h:68:15: note: each undeclared identifier is reported only once for each function it appears in 40 | if (unlikely(test_attr__enabled)) 41 | ^ 42 | ./tools/include/linux/compiler.h:74:43: note: in definition of macro ‘unlikely’ 43 | # define unlikely(x) __builtin_expect(!!(x), 0) 44 | ^ 45 | In file included from samples/bpf/bpf_load.c:28:0: 46 | ./tools/perf/perf-sys.h:69:3: warning: implicit declaration of function ‘test_attr__open’ [-Wimplicit-function-declaration] 47 | test_attr__open(attr, pid, cpu, fd, group_fd, flags); 48 | ^~~~~~~~~~~~~~~ 49 | scripts/Makefile.host:107: recipe for target 'samples/bpf/bpf_load.o' failed 50 | make[1]: *** [samples/bpf/bpf_load.o] Error 1 51 | Makefile:1823: recipe for target 'samples/bpf/' failed 52 | make: *** [samples/bpf/] Error 2 53 | 54 | # Looking at ./tools/perf/perf-sys.h, the failing line starts with test 55 | # A Google search turned up a mailing-list thread among kernel developers: https://www.spinics.net/lists/netdev/msg608676.html 56 | # They suggest that, since this is test-related code, it can be skipped. 57 | # The modified file is the perf-sys.h in this directory; use it as a reference at your discretion 58 | # Rerun the command 59 | make samples/bpf/ # and it works 60 | ``` 61 | 62 | ## bpftool cheatsheet 63 | ```bash 64 | 65 | bpftool prog tracelog # short for “cat /sys/kernel/debug/tracing/trace_pipe” 66 | 67 | ``` 68 | -------------------------------------------------------------------------------- /bpf/bpf-maps/README.md: -------------------------------------------------------------------------------- 1 | # BPF Map: the bridge for data exchange and message passing 2 | 3 | ## Why BPF Maps are needed 4 | 5 | 
Triggering behavior in a program through message passing is a widely used technique in software engineering. One program can modify another program's behavior by sending it messages, which also lets the programs exchange information this way. 6 | 7 | One of the most appealing aspects of BPF is that the code running in the kernel and the program that loaded it can communicate with each other at runtime through message passing. 8 | 9 | BPF Maps are the bridge for data exchange and message passing between user space and kernel space. 10 | 11 | ## What a BPF Map is 12 | 13 | A BPF Map is essentially a key/value data structure that resides in the kernel and can be accessed by any BPF program that knows about it. Programs running in user space can also access BPF Maps through file descriptors. You can store any type of data in a BPF Map, as long as you specify the data size correctly in advance. Inside the kernel, keys and values are both stored as binary blobs. 14 | -------------------------------------------------------------------------------- /bpf/bpf-maps/examples-in-kernel/Makefile_addmyown: -------------------------------------------------------------------------------- 1 | # SPDX-License-Identifier: GPL-2.0 2 | # List of programs to build 3 | hostprogs-y := test_lru_dist 4 | hostprogs-y += sock_example 5 | hostprogs-y += fds_example 6 | hostprogs-y += sockex1 7 | hostprogs-y += sockex2 8 | hostprogs-y += sockex3 9 | hostprogs-y += tracex1 10 | hostprogs-y += tracex2 11 | hostprogs-y += tracex3 12 | hostprogs-y += tracex4 13 | hostprogs-y += tracex5 14 | hostprogs-y += tracex6 15 | hostprogs-y += test_probe_write_user 16 | hostprogs-y += trace_output 17 | hostprogs-y += lathist 18 | hostprogs-y += offwaketime 19 | hostprogs-y += spintest 20 | hostprogs-y += map_perf_test 21 | hostprogs-y += test_overhead 22 | hostprogs-y += test_cgrp2_array_pin 23 | hostprogs-y += test_cgrp2_attach 24 | hostprogs-y += test_cgrp2_attach2 25 | hostprogs-y += test_cgrp2_sock 26 | hostprogs-y += test_cgrp2_sock2 27 | hostprogs-y += xdp1 28 | hostprogs-y += xdp2 29 | hostprogs-y += xdp_router_ipv4 30 | hostprogs-y += test_current_task_under_cgroup 31 | hostprogs-y += trace_event 32 | hostprogs-y += sampleip 33 | hostprogs-y += tc_l2_redirect 34 | hostprogs-y += lwt_len_hist 35 | hostprogs-y += xdp_tx_iptunnel 36 | hostprogs-y += test_map_in_map 37 | hostprogs-y += per_socket_stats_example 38 | hostprogs-y += load_sock_ops 39 | hostprogs-y += xdp_redirect 40 | hostprogs-y += xdp_redirect_map 41 | hostprogs-y += xdp_redirect_cpu 42 | hostprogs-y += xdp_monitor 43 
| hostprogs-y += syscall_tp 44 | hostprogs-y += xdp_ip_tracker 45 | 46 | # Libbpf dependencies 47 | LIBBPF := ../../tools/lib/bpf/bpf.o 48 | CGROUP_HELPERS := ../../tools/testing/selftests/bpf/cgroup_helpers.o 49 | 50 | test_lru_dist-objs := test_lru_dist.o $(LIBBPF) 51 | sock_example-objs := sock_example.o $(LIBBPF) 52 | fds_example-objs := bpf_load.o $(LIBBPF) fds_example.o 53 | sockex1-objs := bpf_load.o $(LIBBPF) sockex1_user.o 54 | sockex2-objs := bpf_load.o $(LIBBPF) sockex2_user.o 55 | sockex3-objs := bpf_load.o $(LIBBPF) sockex3_user.o 56 | tracex1-objs := bpf_load.o $(LIBBPF) tracex1_user.o 57 | tracex2-objs := bpf_load.o $(LIBBPF) tracex2_user.o 58 | tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o 59 | tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o 60 | tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o 61 | tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o 62 | load_sock_ops-objs := bpf_load.o $(LIBBPF) load_sock_ops.o 63 | test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o 64 | trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o 65 | lathist-objs := bpf_load.o $(LIBBPF) lathist_user.o 66 | offwaketime-objs := bpf_load.o $(LIBBPF) offwaketime_user.o 67 | spintest-objs := bpf_load.o $(LIBBPF) spintest_user.o 68 | map_perf_test-objs := bpf_load.o $(LIBBPF) map_perf_test_user.o 69 | test_overhead-objs := bpf_load.o $(LIBBPF) test_overhead_user.o 70 | test_cgrp2_array_pin-objs := $(LIBBPF) test_cgrp2_array_pin.o 71 | test_cgrp2_attach-objs := $(LIBBPF) test_cgrp2_attach.o 72 | test_cgrp2_attach2-objs := $(LIBBPF) test_cgrp2_attach2.o $(CGROUP_HELPERS) 73 | test_cgrp2_sock-objs := $(LIBBPF) test_cgrp2_sock.o 74 | test_cgrp2_sock2-objs := bpf_load.o $(LIBBPF) test_cgrp2_sock2.o 75 | xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o 76 | # reuse xdp1 source intentionally 77 | xdp2-objs := bpf_load.o $(LIBBPF) xdp1_user.o 78 | xdp_router_ipv4-objs := bpf_load.o $(LIBBPF) xdp_router_ipv4_user.o 79 | 
xdp_ip_tracker-objs := bpf_load.o $(LIBBPF) xdp_ip_tracker_user.o 80 | test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) $(CGROUP_HELPERS) \ 81 | test_current_task_under_cgroup_user.o 82 | trace_event-objs := bpf_load.o $(LIBBPF) trace_event_user.o 83 | sampleip-objs := bpf_load.o $(LIBBPF) sampleip_user.o 84 | tc_l2_redirect-objs := bpf_load.o $(LIBBPF) tc_l2_redirect_user.o 85 | lwt_len_hist-objs := bpf_load.o $(LIBBPF) lwt_len_hist_user.o 86 | xdp_tx_iptunnel-objs := bpf_load.o $(LIBBPF) xdp_tx_iptunnel_user.o 87 | test_map_in_map-objs := bpf_load.o $(LIBBPF) test_map_in_map_user.o 88 | per_socket_stats_example-objs := $(LIBBPF) cookie_uid_helper_example.o 89 | xdp_redirect-objs := bpf_load.o $(LIBBPF) xdp_redirect_user.o 90 | xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o 91 | xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o 92 | xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o 93 | syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o 94 | 95 | # Tell kbuild to always build the programs 96 | always := $(hostprogs-y) 97 | always += sockex1_kern.o 98 | always += sockex2_kern.o 99 | always += sockex3_kern.o 100 | always += tracex1_kern.o 101 | always += tracex2_kern.o 102 | always += tracex3_kern.o 103 | always += tracex4_kern.o 104 | always += tracex5_kern.o 105 | always += tracex6_kern.o 106 | always += sock_flags_kern.o 107 | always += test_probe_write_user_kern.o 108 | always += trace_output_kern.o 109 | always += tcbpf1_kern.o 110 | always += tcbpf2_kern.o 111 | always += tc_l2_redirect_kern.o 112 | always += lathist_kern.o 113 | always += offwaketime_kern.o 114 | always += spintest_kern.o 115 | always += map_perf_test_kern.o 116 | always += test_overhead_tp_kern.o 117 | always += test_overhead_kprobe_kern.o 118 | always += parse_varlen.o parse_simple.o parse_ldabs.o 119 | always += test_cgrp2_tc_kern.o 120 | always += xdp1_kern.o 121 | always += xdp2_kern.o 122 | always += 
xdp_router_ipv4_kern.o 123 | always += test_current_task_under_cgroup_kern.o 124 | always += trace_event_kern.o 125 | always += sampleip_kern.o 126 | always += lwt_len_hist_kern.o 127 | always += xdp_tx_iptunnel_kern.o 128 | always += test_map_in_map_kern.o 129 | always += cookie_uid_helper_example.o 130 | always += tcp_synrto_kern.o 131 | always += tcp_rwnd_kern.o 132 | always += tcp_bufs_kern.o 133 | always += tcp_cong_kern.o 134 | always += tcp_iw_kern.o 135 | always += tcp_clamp_kern.o 136 | always += tcp_basertt_kern.o 137 | always += xdp_redirect_kern.o 138 | always += xdp_redirect_map_kern.o 139 | always += xdp_redirect_cpu_kern.o 140 | always += xdp_monitor_kern.o 141 | always += syscall_tp_kern.o 142 | always += xdp_ip_tracker_kern.o 143 | 144 | HOSTCFLAGS += -I$(objtree)/usr/include 145 | HOSTCFLAGS += -I$(srctree)/tools/lib/ 146 | HOSTCFLAGS += -I$(srctree)/tools/testing/selftests/bpf/ 147 | HOSTCFLAGS += -I$(srctree)/tools/lib/ -I$(srctree)/tools/include 148 | HOSTCFLAGS += -I$(srctree)/tools/perf 149 | HOSTCFLAGS += -DHAVE_ATTR_TEST=0 150 | 151 | HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable 152 | HOSTLOADLIBES_fds_example += -lelf 153 | HOSTLOADLIBES_sockex1 += -lelf 154 | HOSTLOADLIBES_sockex2 += -lelf 155 | HOSTLOADLIBES_sockex3 += -lelf 156 | HOSTLOADLIBES_tracex1 += -lelf 157 | HOSTLOADLIBES_tracex2 += -lelf 158 | HOSTLOADLIBES_tracex3 += -lelf 159 | HOSTLOADLIBES_tracex4 += -lelf -lrt 160 | HOSTLOADLIBES_tracex5 += -lelf 161 | HOSTLOADLIBES_tracex6 += -lelf 162 | HOSTLOADLIBES_test_cgrp2_sock2 += -lelf 163 | HOSTLOADLIBES_load_sock_ops += -lelf 164 | HOSTLOADLIBES_test_probe_write_user += -lelf 165 | HOSTLOADLIBES_trace_output += -lelf -lrt 166 | HOSTLOADLIBES_lathist += -lelf 167 | HOSTLOADLIBES_offwaketime += -lelf 168 | HOSTLOADLIBES_spintest += -lelf 169 | HOSTLOADLIBES_map_perf_test += -lelf -lrt 170 | HOSTLOADLIBES_test_overhead += -lelf -lrt 171 | HOSTLOADLIBES_xdp1 += -lelf 172 | HOSTLOADLIBES_xdp2 += -lelf 173 | 
HOSTLOADLIBES_xdp_router_ipv4 += -lelf 174 | HOSTLOADLIBES_test_current_task_under_cgroup += -lelf 175 | HOSTLOADLIBES_trace_event += -lelf 176 | HOSTLOADLIBES_sampleip += -lelf 177 | HOSTLOADLIBES_tc_l2_redirect += -l elf 178 | HOSTLOADLIBES_lwt_len_hist += -l elf 179 | HOSTLOADLIBES_xdp_tx_iptunnel += -lelf 180 | HOSTLOADLIBES_test_map_in_map += -lelf 181 | HOSTLOADLIBES_xdp_redirect += -lelf 182 | HOSTLOADLIBES_xdp_redirect_map += -lelf 183 | HOSTLOADLIBES_xdp_redirect_cpu += -lelf 184 | HOSTLOADLIBES_xdp_monitor += -lelf 185 | HOSTLOADLIBES_syscall_tp += -lelf 186 | HOSTLOADLIBES_xdp_ip_tracker += -lelf -lrt 187 | 188 | # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline: 189 | # make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang 190 | LLC ?= llc 191 | CLANG ?= clang 192 | 193 | # Detect that we're cross compiling and use the cross compiler 194 | ifdef CROSS_COMPILE 195 | HOSTCC = $(CROSS_COMPILE)gcc 196 | CLANG_ARCH_ARGS = -target $(ARCH) 197 | endif 198 | 199 | # Trick to allow make to be run from this directory 200 | all: $(LIBBPF) 201 | $(MAKE) -C ../../ $(CURDIR)/ 202 | 203 | clean: 204 | $(MAKE) -C ../../ M=$(CURDIR) clean 205 | @find $(CURDIR) -type f -name '*~' -delete 206 | 207 | $(LIBBPF): FORCE 208 | $(MAKE) -C $(dir $@) $(notdir $@) 209 | 210 | $(obj)/syscall_nrs.s: $(src)/syscall_nrs.c 211 | $(call if_changed_dep,cc_s_c) 212 | 213 | $(obj)/syscall_nrs.h: $(obj)/syscall_nrs.s FORCE 214 | $(call filechk,offsets,__SYSCALL_NRS_H__) 215 | 216 | clean-files += syscall_nrs.h 217 | 218 | FORCE: 219 | 220 | 221 | # Verify LLVM compiler tools are available and bpf target is supported by llc 222 | .PHONY: verify_cmds verify_target_bpf $(CLANG) $(LLC) 223 | 224 | verify_cmds: $(CLANG) $(LLC) 225 | @for TOOL in $^ ; do \ 226 | if ! 
(which -- "$${TOOL}" > /dev/null 2>&1); then \ 227 | echo "*** ERROR: Cannot find LLVM tool $${TOOL}" ;\ 228 | exit 1; \ 229 | else true; fi; \ 230 | done 231 | 232 | verify_target_bpf: verify_cmds 233 | @if ! (${LLC} -march=bpf -mattr=help > /dev/null 2>&1); then \ 234 | echo "*** ERROR: LLVM (${LLC}) does not support 'bpf' target" ;\ 235 | echo " NOTICE: LLVM version >= 3.7.1 required" ;\ 236 | exit 2; \ 237 | else true; fi 238 | 239 | $(src)/*.c: verify_target_bpf 240 | 241 | $(obj)/tracex5_kern.o: $(obj)/syscall_nrs.h 242 | 243 | # asm/sysreg.h - inline assembly used by it is incompatible with llvm. 244 | # But, there is no easy way to fix it, so just exclude it since it is 245 | # useless for BPF samples. 246 | $(obj)/%.o: $(src)/%.c 247 | $(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \ 248 | -I$(srctree)/tools/testing/selftests/bpf/ \ 249 | -D__KERNEL__ -Wno-unused-value -Wno-pointer-sign \ 250 | -D__TARGET_ARCH_$(ARCH) -Wno-compare-distinct-pointer-types \ 251 | -Wno-gnu-variable-sized-type-not-at-end \ 252 | -Wno-address-of-packed-member -Wno-tautological-compare \ 253 | -Wno-unknown-warning-option $(CLANG_ARCH_ARGS) \ 254 | -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@ -------------------------------------------------------------------------------- /bpf/bpf-maps/examples-in-kernel/README.md: -------------------------------------------------------------------------------- 1 | # Adding your own code to the BPF samples in the Linux kernel source to test BPF Maps 2 | The Linux kernel source contains a large number of mature BPF samples, most of which use BPF Maps for data exchange. 3 | It is recommended to first read how to [compile and run the BPF samples in the Linux kernel source](https://davidlovezoe.club/compile-bpf-examples), so that you understand how the kernel builds and uses these samples. 4 | -------------------------------------------------------------------------------- /bpf/bpf-maps/examples-in-kernel/xdp_ip_tracker_common.h: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | // define the struct for the key of bpf map 4 | struct pair { 5 | __u32 src_ip; 6 | __u32 
dest_ip; 7 | }; 8 | 9 | struct stats { 10 | __u64 tx_cnt; // number of requests sent 11 | __u64 rx_cnt; // number of requests received 12 | __u64 tx_bytes; // bytes sent 13 | __u64 rx_bytes; // bytes received 14 | }; -------------------------------------------------------------------------------- /bpf/bpf-maps/examples-in-kernel/xdp_ip_tracker_kern.c: -------------------------------------------------------------------------------- 1 | #define KBUILD_MODNAME "foo" 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include "bpf_helpers.h" 11 | #include "bpf_endian.h" 12 | #include "xdp_ip_tracker_common.h" 13 | 14 | #define bpf_printk(fmt, ...) \ 15 | ({ \ 16 | char ____fmt[] = fmt; \ 17 | bpf_trace_printk(____fmt, sizeof(____fmt), \ 18 | ##__VA_ARGS__); \ 19 | }) 20 | 21 | struct bpf_map_def SEC("maps") tracker_map = { 22 | .type = BPF_MAP_TYPE_HASH, 23 | .key_size = sizeof(struct pair), 24 | .value_size = sizeof(struct stats), 25 | .max_entries = 2048, 26 | }; 27 | 28 | static __always_inline bool parse_and_track(bool is_rx, void *data_begin, void *data_end, struct pair *pair) 29 | { 30 | struct ethhdr *eth = data_begin; 31 | 32 | if ((void *)(eth + 1) > data_end) 33 | return false; 34 | 35 | if (eth->h_proto == bpf_htons(ETH_P_IP)) 36 | { 37 | struct iphdr *iph = (struct iphdr *)(eth + 1); 38 | if ((void *)(iph + 1) > data_end) 39 | return false; 40 | 41 | pair->src_ip = is_rx ? iph->daddr : iph->saddr; 42 | pair->dest_ip = is_rx ? 
iph->saddr : iph->daddr; 43 | 44 | // update the map for track 45 | struct stats *stats, newstats = {0, 0, 0, 0}; 46 | long long bytes = data_end - data_begin; 47 | 48 | stats = bpf_map_lookup_elem(&tracker_map, pair); 49 | if (stats) 50 | { 51 | if (is_rx) 52 | { 53 | stats->rx_cnt++; 54 | stats->rx_bytes += bytes; 55 | } 56 | else 57 | { 58 | stats->tx_cnt++; 59 | stats->tx_bytes += bytes; 60 | } 61 | } 62 | else 63 | { 64 | if (is_rx) 65 | { 66 | newstats.rx_cnt = 1; 67 | newstats.rx_bytes = bytes; 68 | } 69 | else 70 | { 71 | newstats.tx_cnt = 1; 72 | newstats.tx_bytes = bytes; 73 | } 74 | bpf_map_update_elem(&tracker_map, pair, &newstats, BPF_NOEXIST); 75 | } 76 | return true; 77 | } 78 | return false; 79 | } 80 | 81 | SEC("xdp_ip_tracker") 82 | int _xdp_ip_tracker(struct xdp_md *ctx) 83 | { 84 | // the struct to store the ip address as the keys of bpf map 85 | struct pair pair; 86 | 87 | bpf_printk("starting xdp ip tracker...\n"); 88 | 89 | void *data_end = (void *)(long)ctx->data_end; 90 | void *data = (void *)(long)ctx->data; 91 | // pass if the network packet is not ipv4 92 | if (!parse_and_track(true, data, data_end, &pair)) 93 | return XDP_PASS; 94 | 95 | return XDP_DROP; 96 | } 97 | 98 | char _license[] SEC("license") = "GPL"; -------------------------------------------------------------------------------- /bpf/bpf-maps/examples-in-kernel/xdp_ip_tracker_user.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include "bpf_load.h" 15 | #include 16 | #include "bpf_util.h" 17 | #include "xdp_ip_tracker_common.h" 18 | 19 | static int ifindex = 6; // target network interface to attach, you can find it via `ip a` 20 | static __u32 xdp_flags = 0; 21 | 22 | // unlink the xdp program and exit 23 | static void int_exit(int sig) 24 | { 25 | 
printf("stopping\n"); 26 | set_link_xdp_fd(ifindex, -1, xdp_flags); 27 | exit(0); 28 | } 29 | 30 | // An XDP program that tracks packets by IP address 31 | // Usage: ./xdp_ip_tracker 32 | int main(int argc, char **argv) 33 | { 34 | char *filename = "xdp_ip_tracker_kern.o"; 35 | // change limits 36 | struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; 37 | if (setrlimit(RLIMIT_MEMLOCK, &r)) 38 | { 39 | perror("setrlimit(RLIMIT_MEMLOCK, RLIM_INFINITY)"); 40 | return 1; 41 | } 42 | 43 | // load the bpf kern file 44 | if (load_bpf_file(filename)) 45 | { 46 | printf("error %s", bpf_log_buf); 47 | return 1; 48 | } 49 | 50 | // confirm the bpf prog fd is available 51 | if (!prog_fd[0]) 52 | { 53 | printf("load_bpf_file: %s\n", strerror(errno)); 54 | return 1; 55 | } 56 | 57 | // add signal handlers 58 | signal(SIGINT, int_exit); 59 | signal(SIGTERM, int_exit); 60 | 61 | // link the xdp program to the network interface 62 | if (set_link_xdp_fd(ifindex, prog_fd[0], xdp_flags) < 0) 63 | { 64 | printf("link set xdp fd failed\n"); 65 | return 1; 66 | } 67 | 68 | int result; 69 | struct pair next_key, lookup_key = {0, 0}; 70 | struct stats value = {}; 71 | while (1) 72 | { 73 | sleep(2); 74 | // retrieve the bpf map of statistics 75 | while (bpf_map_get_next_key(map_fd[0], &lookup_key, &next_key) != -1) 76 | { 77 | //printf("The local ip of next key in the map is: '%d'\n", next_key.src_ip); 78 | //printf("The remote ip of next key in the map is: '%d'\n", next_key.dest_ip); 79 | struct in_addr local = {next_key.src_ip}; 80 | struct in_addr remote = {next_key.dest_ip}; 81 | printf("The local ip of next key in the map is: '%s'\n", inet_ntoa(local)); 82 | printf("The remote ip of next key in the map is: '%s'\n", inet_ntoa(remote)); 83 | 84 | // get the value via the key 85 | // TODO: change to assert 86 | // assert(bpf_map_lookup_elem(map_fd[0], &next_key, &value) == 0) 87 | result = bpf_map_lookup_elem(map_fd[0], &next_key, &value); 88 | if (result == 0) 89 | { 90 | // print the 
value 91 | printf("rx_cnt value read from the map: '%llu'\n", value.rx_cnt); 92 | printf("rx_bytes value read from the map: '%llu'\n", value.rx_bytes); 93 | } 94 | else 95 | { 96 | printf("Failed to read value from the map: %d (%s)\n", result, strerror(errno)); 97 | } 98 | lookup_key = next_key; 99 | printf("\n\n"); 100 | } 101 | printf("start a new loop...\n"); 102 | // reset the lookup key for a fresh start 103 | lookup_key.src_ip = 0; 104 | lookup_key.dest_ip = 0; 105 | } 106 | 107 | printf("end\n"); 108 | // unlink the xdp program 109 | set_link_xdp_fd(ifindex, -1, xdp_flags); 110 | return 0; 111 | } -------------------------------------------------------------------------------- /bpf/bpf_trace_printk_definition.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nevermosby/linux-bpf-learning/4c407a218576d56797decf311ff2f5cab1333804/bpf/bpf_trace_printk_definition.pdf -------------------------------------------------------------------------------- /bpf/perf-sys.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: GPL-2.0 */ 2 | #ifndef _PERF_SYS_H 3 | #define _PERF_SYS_H 4 | 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | 13 | #ifdef __powerpc__ 14 | #define CPUINFO_PROC {"cpu"} 15 | #endif 16 | 17 | #ifdef __s390__ 18 | #define CPUINFO_PROC {"vendor_id"} 19 | #endif 20 | 21 | #ifdef __sh__ 22 | #define CPUINFO_PROC {"cpu type"} 23 | #endif 24 | 25 | #ifdef __hppa__ 26 | #define CPUINFO_PROC {"cpu"} 27 | #endif 28 | 29 | #ifdef __sparc__ 30 | #define CPUINFO_PROC {"cpu"} 31 | #endif 32 | 33 | #ifdef __alpha__ 34 | #define CPUINFO_PROC {"cpu model"} 35 | #endif 36 | 37 | #ifdef __arm__ 38 | #define CPUINFO_PROC {"model name", "Processor"} 39 | #endif 40 | 41 | #ifdef __mips__ 42 | #define CPUINFO_PROC {"cpu model"} 43 | #endif 44 | 45 | #ifdef __arc__ 46 | #define CPUINFO_PROC 
{"Processor"} 47 | #endif 48 | 49 | #ifdef __metag__ 50 | #define CPUINFO_PROC {"CPU"} 51 | #endif 52 | 53 | #ifdef __xtensa__ 54 | #define CPUINFO_PROC {"core ID"} 55 | #endif 56 | 57 | #ifndef CPUINFO_PROC 58 | #define CPUINFO_PROC { "model name", } 59 | #endif 60 | 61 | #ifndef HAVE_ATTR_TEST 62 | #define HAVE_ATTR_TEST 0 63 | #endif 64 | 65 | static inline int 66 | sys_perf_event_open(struct perf_event_attr *attr, 67 | pid_t pid, int cpu, int group_fd, 68 | unsigned long flags) 69 | { 70 | int fd; 71 | 72 | fd = syscall(__NR_perf_event_open, attr, pid, cpu, 73 | group_fd, flags); 74 | 75 | 76 | #if HAVE_ATTR_TEST 77 | if (unlikely(test_attr__enabled)) 78 | test_attr__open(attr, pid, cpu, fd, group_fd, flags); 79 | #endif 80 | return fd; 81 | } 82 | 83 | #endif /* _PERF_SYS_H */ -------------------------------------------------------------------------------- /bpftrace/README.md: -------------------------------------------------------------------------------- 1 | # Getting started with bpftrace (In Progress) 2 | 3 | ## Tracing the bpf() system calls that bpftrace makes 4 | ```shell 5 | > strace -e bpf /usr/bin/bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }' 6 | bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=4, max_entries=1, map_flags=0, inner_map_fd=0, map_name="", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 120) = 3 7 | bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_PERCPU_HASH, key_size=16, value_size=8, max_entries=4096, map_flags=0, inner_map_fd=0, map_name="@", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 120) = -1 EINVAL (Invalid argument) 8 | bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_PERCPU_HASH, key_size=16, value_size=8, max_entries=4096, map_flags=0, inner_map_fd=0, map_name="", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 120) = 3 9 | bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_PERF_EVENT_ARRAY, key_size=4, 
value_size=4, max_entries=2, map_flags=0, inner_map_fd=0, map_name="printf", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 120) = 4 10 | Attaching 1 probe... 11 | bpf(BPF_MAP_UPDATE_ELEM, {map_fd=4, key=0x7ffeae5cd18c, value=0x7ffeae5cd190, flags=BPF_ANY}, 120) = 0 12 | bpf(BPF_MAP_UPDATE_ELEM, {map_fd=4, key=0x7ffeae5cd18c, value=0x7ffeae5cd190, flags=BPF_ANY}, 120) = 0 13 | bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_TRACEPOINT, insn_cnt=27, insns=0x7fa74c50c000, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 13, 19), prog_flags=0, prog_name="sys_enter", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 120) = 9 14 | ``` -------------------------------------------------------------------------------- /btf/README.md: -------------------------------------------------------------------------------- 1 | # Getting started with BTF (BPF Type Format) (In Progress) 2 | 3 | ## Why BTF exists 4 | Start with [what the Linux kernel documentation says](https://www.kernel.org/doc/html/latest/bpf/btf.html): 5 | 6 | > BTF (BPF Type Format) is the metadata format which encodes the debug info 7 | related to BPF program/map. The name BTF was used initially to describe data 8 | types. The BTF was later extended to include function info for defined 9 | subroutines, and line info for source/line information. 10 | 11 | 12 | 13 | > The debug info is used for map pretty print, function signature, etc. The function signature enables better bpf program/function kernel symbol. The line info helps generate source annotated translated byte code, jited code and verifier log. 
14 | 15 | 16 | 17 | ### What BTF is, and what it is not 18 | BTF (BPF Type Format) was created as an alternative to the more generic and more verbose DWARF debug information. It is a space-efficient, compact format that is still expressive enough to describe all the type information of a C program. 19 | ### Advantages 20 | - BTF type information exposes key facts about the types and code of the kernel and of BPF programs, which in turn makes it possible to solve the remaining pieces of the BPF CO-RE puzzle. 21 | - Thanks to its simplicity and the deduplication algorithm it uses, BTF can be up to 100 times smaller than DWARF. 22 | 23 | ## Using BTF 24 | 1. Enable BTF in the kernel 25 | Build the kernel with the `CONFIG_DEBUG_INFO_BTF=y` option. The kernel itself can then use BTF to strengthen the BPF verifier. 26 | 27 | 2. Declare BTF-defined maps in the program 28 | Define the map with the `SEC(".maps")` section annotation (introduced in this [commit](https://github.com/libbpf/libbpf/commit/ec13b303499c881496116881784883c9e44e436b)) 29 | 30 | ### Comparing the object file before and after 31 | 32 | ```s 33 | > llvm-objdump -h btf_xdp_cnt.o 34 | 35 | btf_xdp_cnt.o: file format elf64-bpf 36 | 37 | Sections: 38 | Idx Name Size VMA Type 39 | 0 00000000 0000000000000000 40 | 1 .strtab 000000e3 0000000000000000 41 | 2 .text 00000000 0000000000000000 TEXT 42 | 3 xdp_count 00000120 0000000000000000 TEXT 43 | 4 .relxdp_count 00000020 0000000000000000 44 | 5 .maps 00000020 0000000000000000 DATA 45 | 6 license 00000004 0000000000000000 DATA 46 | 7 .debug_loc 0000017c 0000000000000000 DEBUG 47 | 8 .debug_abbrev 00000119 0000000000000000 DEBUG 48 | 9 .debug_info 0000028e 0000000000000000 DEBUG 49 | 10 .rel.debug_info 000003d0 0000000000000000 50 | 11 .debug_str 000001b7 0000000000000000 DEBUG 51 | 12 .BTF 000004d4 0000000000000000 52 | 13 .rel.BTF 00000020 0000000000000000 53 | 14 .BTF.ext 00000140 0000000000000000 54 | 15 .rel.BTF.ext 00000110 0000000000000000 55 | 16 .debug_frame 00000028 0000000000000000 DEBUG 56 | 17 .rel.debug_frame 00000020 0000000000000000 57 | 18 .debug_line 00000146 0000000000000000 DEBUG 58 | 19 .rel.debug_line 00000010 0000000000000000 59 | 20 .llvm_addrsig 00000003 0000000000000000 60 | 21 .symtab 00000150 0000000000000000 61 | ``` 62 | 63 | ## How BTF works 64 | 65 | 1. 
Initialization 66 | bpf_object__init_maps(https://github.com/libbpf/libbpf/blob/master/src/libbpf.c#L2568) -> 67 | bpf_object__init_user_btf_maps(https://github.com/libbpf/libbpf/blob/master/src/libbpf.c#L2516) -> 68 | bpf_object__init_user_btf_map -> 69 | bpf_object__add_map: https://github.com/libbpf/libbpf/blob/master/src/libbpf.c#L1401 70 | 71 | 72 | ## Working together with BPF CO-RE 73 | 74 | ## References 75 | - https://www.containiq.com/post/btf-bpf-type-format 76 | - https://github.com/libbpf/libbpf/blob/master/src/btf.h 77 | - https://github.com/libbpf/libbpf/blob/master/src/libbpf.c#L2516 78 | - https://github.com/libbpf/libbpf/blob/master/src/libbpf.c#L2579 -------------------------------------------------------------------------------- /btf/btf-xdp-cnt.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | 6 | /* 7 | * define map without btf 8 | struct bpf_map_def SEC("maps") cnt = { 9 | .type = BPF_MAP_TYPE_ARRAY, 10 | .key_size = sizeof(__u32), 11 | .value_size = sizeof(long), 12 | .max_entries = 2, 13 | }; 14 | */ 15 | 16 | // define map with btf 17 | struct { 18 | __uint(type, BPF_MAP_TYPE_ARRAY); 19 | __type(key, __u32); 20 | __type(value, long); 21 | __uint(max_entries, 2); 22 | } cnt SEC(".maps"); 23 | 24 | /* 25 | another define example 26 | struct { 27 | __u32 type; 28 | __u32 max_entries; 29 | __u32 map_flags; 30 | __u32 key_size; 31 | __u32 value_size; 32 | } stackmap SEC(".maps") = { 33 | .type = BPF_MAP_TYPE_STACK_TRACE, 34 | .max_entries = 128, 35 | .map_flags = BPF_F_STACK_BUILD_ID, 36 | .key_size = sizeof(__u32), 37 | .value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id), 38 | }; 39 | */ 40 | 41 | SEC("xdp_count") 42 | int xdp_count_prog(struct xdp_md *ctx) 43 | { 44 | void *data_end = (void *)(long)ctx->data_end; 45 | void *data = (void *)(long)ctx->data; 46 | __u32 ipv6_key = 0; 47 | __u32 ipv4_key = 1; 48 | long *value; 49 | __u16 h_proto; 50 | struct ethhdr *eth = 
data; 51 | if (data + sizeof(struct ethhdr) > data_end) 52 | return XDP_DROP; 53 | 54 | h_proto = eth->h_proto; 55 | if (h_proto == htons(ETH_P_IPV6)) { 56 | value = bpf_map_lookup_elem(&cnt, &ipv6_key); 57 | if (value) 58 | *value += 1; 59 | return XDP_PASS; 60 | } 61 | value = bpf_map_lookup_elem(&cnt, &ipv4_key); 62 | if (value) 63 | *value += 1; 64 | return XDP_PASS; 65 | 66 | } 67 | 68 | char _license[] SEC("license") = "GPL"; -------------------------------------------------------------------------------- /bumblebee/tcp_kprobe.c: -------------------------------------------------------------------------------- 1 | #include "vmlinux.h" 2 | #include "bpf/bpf_helpers.h" 3 | #include "bpf/bpf_core_read.h" 4 | #include "bpf/bpf_tracing.h" 5 | #include "solo_types.h" 6 | 7 | // 1. Change the license if necessary 8 | char __license[] SEC("license") = "Dual MIT/GPL"; 9 | 10 | struct event_t { 11 | ipv4_addr daddr; 12 | u32 pid; 13 | } __attribute__((packed)); 14 | 15 | struct dimensions_t { 16 | ipv4_addr daddr; 17 | } __attribute__((packed)); 18 | 19 | struct { 20 | __uint(type, BPF_MAP_TYPE_HASH); 21 | __uint(max_entries, 8192); 22 | __type(key, struct dimensions_t); 23 | __type(value, u64); 24 | } connection_count SEC(".maps.counter"); 25 | 26 | // This is the definition for the global map which both our 27 | // bpf program and user space program can access. 
28 | // More info and map types can be found here: https://www.man7.org/linux/man-pages/man2/bpf.2.html 29 | struct { 30 | __uint(max_entries, 1 << 24); 31 | __uint(type, BPF_MAP_TYPE_RINGBUF); 32 | __type(value, struct event_t); 33 | } events SEC(".maps.print"); 34 | 35 | SEC("kprobe/tcp_v4_connect") 36 | int BPF_KPROBE(tcp_v4_connect, struct sock *sk, struct sockaddr *uaddr) { 37 | // Init event pointer 38 | struct event_t *event; 39 | struct dimensions_t hash_key = {}; 40 | __u32 daddr; 41 | u64 counter; 42 | u64 *counterp; 43 | 44 | // read in the destination address 45 | struct sockaddr_in *usin = (struct sockaddr_in *)uaddr; 46 | daddr = BPF_CORE_READ(usin, sin_addr.s_addr); 47 | 48 | // Reserve a spot in the ringbuffer for our event 49 | event = bpf_ringbuf_reserve(&events, sizeof(struct event_t), 0); 50 | if (!event) { 51 | return 0; 52 | } 53 | // 3. set data for our event 54 | event->pid = bpf_get_current_pid_tgid(); 55 | event->daddr = daddr; 56 | // submit the event (this makes it available for consumption) 57 | bpf_ringbuf_submit(event, 0); 58 | 59 | // increment the counter for this address 60 | hash_key.daddr = daddr; 61 | counterp = bpf_map_lookup_elem(&connection_count, &hash_key); 62 | if (counterp) { 63 | __sync_fetch_and_add(counterp, 1); 64 | } else { 65 | // we may miss N events, where N is number of CPUs. We may want to 66 | // fix this for prod, by adding another lookup/update calls here. 
67 | // we skipped these for brevity 68 | counter = 1; 69 | bpf_map_update_elem(&connection_count, &hash_key, &counter, BPF_NOEXIST); 70 | } 71 | 72 | return 0; 73 | } -------------------------------------------------------------------------------- /libbpf/README.md: -------------------------------------------------------------------------------- 1 | # Programming with libbpf 2 | 3 | The original [libbpf library](https://github.com/libbpf/libbpf) is written in C. As BPF keeps growing in popularity, BPF libraries for other languages have been springing up as well. 4 | 5 | ## BPF libraries for Go 6 | - [gobpf](https://github.com/iovisor/gobpf): from the iovisor project; unfortunately it is no longer actively maintained 7 | - [libbpfgo](https://github.com/aquasecurity/libbpfgo): a Go wrapper around the C libbpf library 8 | - [ebpf](https://github.com/cilium/ebpf): a pure-Go library maintained jointly by Cilium and Cloudflare that abstracts all the bpf system calls behind a native Go interface. 9 | 10 | 11 | 12 | 13 | -------------------------------------------------------------------------------- /libbpf/libbpfgo-example/const-x64.go: -------------------------------------------------------------------------------- 1 | // +build amd64 2 | 3 | package main 4 | const sys_execve="__x64_sys_execve" -------------------------------------------------------------------------------- /libbpf/libbpfgo-example/hellokprobe.c: -------------------------------------------------------------------------------- 1 | // +build ignore 2 | #include "hellokprobe.h" 3 | 4 | // Example: tracing a message on a kprobe 5 | SEC("kprobe/sys_execve") 6 | int hello(void *ctx) 7 | { 8 | bpf_printk("Hello from kprobe"); 9 | return 0; 10 | } -------------------------------------------------------------------------------- /libbpf/libbpfgo-example/hellokprobe.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "C" 5 | // not working for "github.com/aquasecurity/libbpfgo" 6 | // use the old version instead 7 | bpf "github.com/aquasecurity/tracee/libbpfgo" 8 | ) 9 | import ( 10 | "fmt" 11 | "os" 12 | "os/signal" 13 | ) 14 | 15 | func main() { 16 | // use signal to handle the exit finalizer 17 | sig 
:= make(chan os.Signal, 1) 18 | signal.Notify(sig, os.Interrupt) 19 | 20 | bpfModule, err := bpf.NewModuleFromFile("hellokprobe.o") 21 | must(err) 22 | defer bpfModule.Close() 23 | 24 | err = bpfModule.BPFLoadObject() 25 | must(err) 26 | 27 | prog, err := bpfModule.GetProgram("hellokprobe") 28 | must(err) 29 | _, err = prog.AttachKprobe(sys_execve) 30 | must(err) 31 | 32 | // print the output from the default bpf pipe 33 | go bpf.TracePrint() 34 | 35 | <-sig 36 | fmt.Println("Shutting down...") 37 | } 38 | 39 | func must(err error) { 40 | if err != nil { 41 | panic(err) 42 | } 43 | } 44 | -------------------------------------------------------------------------------- /libbpf/libbpfgo-example/hellokprobe.h: -------------------------------------------------------------------------------- 1 | /* In Linux 5.4 asm_inline was introduced, but it's not supported by clang. 2 | * Redefine it to just asm to enable successful compilation. 3 | * see https://github.com/iovisor/bcc/commit/2d1497cde1cc9835f759a707b42dea83bee378b8 for more details 4 | */ 5 | #include 6 | #ifdef asm_inline 7 | #undef asm_inline 8 | #define asm_inline asm 9 | #endif 10 | 11 | #include 12 | #include 13 | 14 | typedef __u64 u64; 15 | 16 | char LICENSE[] SEC("license") = "Dual BSD/GPL"; 17 | -------------------------------------------------------------------------------- /tc/README.md: -------------------------------------------------------------------------------- 1 | ## TC BPF program examples 2 | 3 | ## Related blog post 4 | - Chinese version: [https://davidlovezoe.club/bpf-tc-101](https://davidlovezoe.club/bpf-tc-101) 5 | 6 | ## Recommended reading: the tc man pages 7 | The tc man pages are thorough, and reading them from start to finish is well worth it. 8 | - man tc 9 | - man tc-bpf 10 | 11 | ## Running your first TC program 12 | 13 | 1. 
Design the code 14 | 15 | XDP is the first hook on the RX path and TC is the first hook on the TX path, so this program uses both hooks together. This time the traffic control is finer-grained: it drops TCP traffic in both directions. The target is a container, so start an Nginx service in the lab environment with docker run. 16 | ```bash 17 | docker run -d -p 80:80 --name=nginx-xdp-tc nginx:alpine 18 | 19 | # inspect the veth pair 20 | > ip a | grep veth 21 | 6: veth09e1d2e@if5: mtu 1500 qdisc noqueue master docker0 state UP group default 22 | ``` 23 | 24 | 2. Write the code 25 | 26 | The source code is in [tc-xdp-drop-tcp.c](./tc-xdp-drop-tcp.c) 27 | 28 | 3. Compile the code 29 | ```bash 30 | clang -I ./headers/ -O2 -target bpf -c tc-xdp-drop-tcp.c -o tc-xdp-drop-tcp.o 31 | ``` 32 | 4. Load the code 33 | 34 | ```bash 35 | # initial state 36 | > tc qdisc show dev veth09e1d2e 37 | qdisc noqueue 0: root refcnt 2 38 | # create the clsact qdisc 39 | > tc qdisc add dev veth09e1d2e clsact 40 | # check again and note what changed 41 | > tc qdisc show dev veth09e1d2e 42 | qdisc noqueue 0: root refcnt 2 43 | qdisc clsact ffff: parent ffff:fff1 44 | # attach the TC BPF program to the container's veth interface 45 | > tc filter add dev veth09e1d2e egress bpf da obj tc-xdp-drop-tcp.o sec tc 46 | # check again and note what changed 47 | > tc qdisc show dev veth09e1d2e 48 | qdisc noqueue 0: root refcnt 2 49 | qdisc clsact ffff: parent ffff:fff1 50 | > tc filter show dev veth09e1d2e egress 51 | filter protocol all pref 49152 bpf chain 0 52 | filter protocol all pref 49152 bpf chain 0 handle 0x1 tc-xdp-drop-tcp.o:[tc] direct-action not_in_hw id 24 tag 9c60324798bac8be jited 53 | ``` 54 | 5. Result 55 | 56 | Demo video: 57 | 58 | [![tc-xdp-drop-tcp-docker-demo](https://img.youtube.com/vi/NSoK9rCuGP8/0.jpg)](https://www.youtube.com/watch?v=NSoK9rCuGP8) -------------------------------------------------------------------------------- /tc/debug-tc-xdp-drop-tcp.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | 9 | #include "bpf_endian.h" 10 | #include "bpf_helpers.h" 11 | 12 | typedef unsigned int u32; 13 | #define bpfprint(fmt, ...) 
\ 14 | ({ \ 15 | char ____fmt[] = fmt; \ 16 | bpf_trace_printk(____fmt, sizeof(____fmt), \ 17 | ##__VA_ARGS__); \ 18 | }) 19 | 20 | /* 21 | check whether the packet is of TCP protocol 22 | */ 23 | static __inline bool is_TCP(void *data_begin, void *data_end){ 24 | bpfprint("Entering is_TCP\n"); 25 | struct ethhdr *eth = data_begin; 26 | 27 | // Check packet's size 28 | // the pointer arithmetic is based on the size of data type, current_address plus int(1) means: 29 | // new_address= current_address + size_of(data type) 30 | if ((void *)(eth + 1) > data_end) // 31 | return false; 32 | 33 | // Check if Ethernet frame has IP packet 34 | if (eth->h_proto == bpf_htons(ETH_P_IP)) 35 | { 36 | struct iphdr *iph = (struct iphdr *)(eth + 1); // or (struct iphdr *)( ((void*)eth) + ETH_HLEN ); 37 | if ((void *)(iph + 1) > data_end) 38 | return false; 39 | 40 | // extract src ip and destination ip 41 | u32 ip_src = iph->saddr; 42 | u32 ip_dst = iph->daddr; 43 | 44 | // 45 | bpfprint("src ip addr1: %d.%d.%d\n",(ip_src) & 0xFF,(ip_src >> 8) & 0xFF,(ip_src >> 16) & 0xFF); 46 | bpfprint("src ip addr2:.%d\n",(ip_src >> 24) & 0xFF); 47 | 48 | bpfprint("dest ip addr1: %d.%d.%d\n",(ip_dst) & 0xFF,(ip_dst >> 8) & 0xFF,(ip_dst >> 16) & 0xFF); 49 | bpfprint("dest ip addr2: .%d\n",(ip_dst >> 24) & 0xFF); 50 | 51 | // Check if IP packet contains a TCP segment 52 | if (iph->protocol == IPPROTO_TCP) 53 | return true; 54 | } 55 | return false; 56 | } 57 | 58 | SEC("xdp") 59 | int xdp_drop_tcp(struct xdp_md *ctx) 60 | { 61 | 62 | void *data_end = (void *)(long)ctx->data_end; 63 | void *data = (void *)(long)ctx->data; 64 | 65 | if (is_TCP(data, data_end)) 66 | return XDP_DROP; 67 | 68 | return XDP_PASS; 69 | } 70 | 71 | SEC("tc") 72 | int tc_drop_tcp(struct __sk_buff *skb) 73 | { 74 | 75 | bpfprint("Entering tc section\n"); 76 | void *data = (void *)(long)skb->data; 77 | void *data_end = (void *)(long)skb->data_end; 78 | 79 | 80 | if (is_TCP(data, data_end)) 81 | return TC_ACT_SHOT; 82 | else 
83 | return TC_ACT_OK; 84 | } 85 | 86 | char _license[] SEC("license") = "GPL"; -------------------------------------------------------------------------------- /tc/headers/bpf_endian.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: GPL-2.0 */ 2 | /* Copied from $(LINUX)/tools/testing/selftests/bpf/bpf_endian.h */ 3 | #ifndef __BPF_ENDIAN__ 4 | #define __BPF_ENDIAN__ 5 | 6 | #include 7 | 8 | /* LLVM's BPF target selects the endianness of the CPU 9 | * it compiles on, or the user specifies (bpfel/bpfeb), 10 | * respectively. The used __BYTE_ORDER__ is defined by 11 | * the compiler, we cannot rely on __BYTE_ORDER from 12 | * libc headers, since it doesn't reflect the actual 13 | * requested byte order. 14 | * 15 | * Note, LLVM's BPF target has different __builtin_bswapX() 16 | * semantics. It does map to BPF_ALU | BPF_END | BPF_TO_BE 17 | * in bpfel and bpfeb case, which means below, that we map 18 | * to cpu_to_be16(). We could use it unconditionally in BPF 19 | * case, but better not rely on it, so that this header here 20 | * can be used from application and BPF program side, which 21 | * use different targets. 
22 | */ 23 | #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ 24 | # define __bpf_ntohs(x) __builtin_bswap16(x) 25 | # define __bpf_htons(x) __builtin_bswap16(x) 26 | # define __bpf_constant_ntohs(x) ___constant_swab16(x) 27 | # define __bpf_constant_htons(x) ___constant_swab16(x) 28 | # define __bpf_ntohl(x) __builtin_bswap32(x) 29 | # define __bpf_htonl(x) __builtin_bswap32(x) 30 | # define __bpf_constant_ntohl(x) ___constant_swab32(x) 31 | # define __bpf_constant_htonl(x) ___constant_swab32(x) 32 | #elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ 33 | # define __bpf_ntohs(x) (x) 34 | # define __bpf_htons(x) (x) 35 | # define __bpf_constant_ntohs(x) (x) 36 | # define __bpf_constant_htons(x) (x) 37 | # define __bpf_ntohl(x) (x) 38 | # define __bpf_htonl(x) (x) 39 | # define __bpf_constant_ntohl(x) (x) 40 | # define __bpf_constant_htonl(x) (x) 41 | #else 42 | # error "Fix your compiler's __BYTE_ORDER__?!" 43 | #endif 44 | 45 | #define bpf_htons(x) \ 46 | (__builtin_constant_p(x) ? \ 47 | __bpf_constant_htons(x) : __bpf_htons(x)) 48 | #define bpf_ntohs(x) \ 49 | (__builtin_constant_p(x) ? \ 50 | __bpf_constant_ntohs(x) : __bpf_ntohs(x)) 51 | #define bpf_htonl(x) \ 52 | (__builtin_constant_p(x) ? \ 53 | __bpf_constant_htonl(x) : __bpf_htonl(x)) 54 | #define bpf_ntohl(x) \ 55 | (__builtin_constant_p(x) ? \ 56 | __bpf_constant_ntohl(x) : __bpf_ntohl(x)) 57 | 58 | #endif /* __BPF_ENDIAN__ */ -------------------------------------------------------------------------------- /tc/headers/bpf_helpers.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: GPL-2.0 */ 2 | /* Copied from $(LINUX)/tools/testing/selftests/bpf/bpf_helpers.h */ 3 | #ifndef __BPF_HELPERS_H 4 | #define __BPF_HELPERS_H 5 | 6 | /* helper macro to place programs, maps, license in 7 | * different sections in elf_bpf file. 
Section names 8 | * are interpreted by elf_bpf loader 9 | */ 10 | #define SEC(NAME) __attribute__((section(NAME), used)) 11 | 12 | #ifndef __inline 13 | # define __inline \ 14 | inline __attribute__((always_inline)) 15 | #endif 16 | 17 | /* helper functions called from eBPF programs written in C */ 18 | static void *(*bpf_map_lookup_elem)(void *map, void *key) = 19 | (void *) BPF_FUNC_map_lookup_elem; 20 | static int (*bpf_map_update_elem)(void *map, void *key, void *value, 21 | unsigned long long flags) = 22 | (void *) BPF_FUNC_map_update_elem; 23 | static int (*bpf_map_delete_elem)(void *map, void *key) = 24 | (void *) BPF_FUNC_map_delete_elem; 25 | static int (*bpf_probe_read)(void *dst, int size, void *unsafe_ptr) = 26 | (void *) BPF_FUNC_probe_read; 27 | static unsigned long long (*bpf_ktime_get_ns)(void) = 28 | (void *) BPF_FUNC_ktime_get_ns; 29 | static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) = 30 | (void *) BPF_FUNC_trace_printk; 31 | static void (*bpf_tail_call)(void *ctx, void *map, int index) = 32 | (void *) BPF_FUNC_tail_call; 33 | static unsigned long long (*bpf_get_smp_processor_id)(void) = 34 | (void *) BPF_FUNC_get_smp_processor_id; 35 | static unsigned long long (*bpf_get_current_pid_tgid)(void) = 36 | (void *) BPF_FUNC_get_current_pid_tgid; 37 | static unsigned long long (*bpf_get_current_uid_gid)(void) = 38 | (void *) BPF_FUNC_get_current_uid_gid; 39 | static int (*bpf_get_current_comm)(void *buf, int buf_size) = 40 | (void *) BPF_FUNC_get_current_comm; 41 | static unsigned long long (*bpf_perf_event_read)(void *map, 42 | unsigned long long flags) = 43 | (void *) BPF_FUNC_perf_event_read; 44 | static int (*bpf_clone_redirect)(void *ctx, int ifindex, int flags) = 45 | (void *) BPF_FUNC_clone_redirect; 46 | static int (*bpf_redirect)(int ifindex, int flags) = 47 | (void *) BPF_FUNC_redirect; 48 | static int (*bpf_perf_event_output)(void *ctx, void *map, 49 | unsigned long long flags, void *data, 50 | int size) = 51 | (void *) 
BPF_FUNC_perf_event_output; 52 | static int (*bpf_get_stackid)(void *ctx, void *map, int flags) = 53 | (void *) BPF_FUNC_get_stackid; 54 | static int (*bpf_probe_write_user)(void *dst, void *src, int size) = 55 | (void *) BPF_FUNC_probe_write_user; 56 | static int (*bpf_current_task_under_cgroup)(void *map, int index) = 57 | (void *) BPF_FUNC_current_task_under_cgroup; 58 | static int (*bpf_skb_get_tunnel_key)(void *ctx, void *key, int size, int flags) = 59 | (void *) BPF_FUNC_skb_get_tunnel_key; 60 | static int (*bpf_skb_set_tunnel_key)(void *ctx, void *key, int size, int flags) = 61 | (void *) BPF_FUNC_skb_set_tunnel_key; 62 | static int (*bpf_skb_get_tunnel_opt)(void *ctx, void *md, int size) = 63 | (void *) BPF_FUNC_skb_get_tunnel_opt; 64 | static int (*bpf_skb_set_tunnel_opt)(void *ctx, void *md, int size) = 65 | (void *) BPF_FUNC_skb_set_tunnel_opt; 66 | static unsigned long long (*bpf_get_prandom_u32)(void) = 67 | (void *) BPF_FUNC_get_prandom_u32; 68 | static int (*bpf_xdp_adjust_head)(void *ctx, int offset) = 69 | (void *) BPF_FUNC_xdp_adjust_head; 70 | 71 | /* llvm builtin functions that eBPF C program may use to 72 | * emit BPF_LD_ABS and BPF_LD_IND instructions 73 | */ 74 | struct sk_buff; 75 | unsigned long long load_byte(void *skb, 76 | unsigned long long off) asm("llvm.bpf.load.byte"); 77 | unsigned long long load_half(void *skb, 78 | unsigned long long off) asm("llvm.bpf.load.half"); 79 | unsigned long long load_word(void *skb, 80 | unsigned long long off) asm("llvm.bpf.load.word"); 81 | 82 | /* a helper structure used by eBPF C program 83 | * to describe map attributes to elf_bpf loader 84 | */ 85 | struct bpf_map_def { 86 | unsigned int type; 87 | unsigned int key_size; 88 | unsigned int value_size; 89 | unsigned int max_entries; 90 | unsigned int map_flags; 91 | unsigned int inner_map_idx; 92 | }; 93 | 94 | static int (*bpf_skb_load_bytes)(void *ctx, int off, void *to, int len) = 95 | (void *) BPF_FUNC_skb_load_bytes; 96 | static int 
(*bpf_skb_store_bytes)(void *ctx, int off, void *from, int len, int flags) = 97 | (void *) BPF_FUNC_skb_store_bytes; 98 | static int (*bpf_l3_csum_replace)(void *ctx, int off, int from, int to, int flags) = 99 | (void *) BPF_FUNC_l3_csum_replace; 100 | static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flags) = 101 | (void *) BPF_FUNC_l4_csum_replace; 102 | static int (*bpf_skb_under_cgroup)(void *ctx, void *map, int index) = 103 | (void *) BPF_FUNC_skb_under_cgroup; 104 | static int (*bpf_skb_change_head)(void *, int len, int flags) = 105 | (void *) BPF_FUNC_skb_change_head; 106 | 107 | #if defined(__x86_64__) 108 | 109 | #define PT_REGS_PARM1(x) ((x)->di) 110 | #define PT_REGS_PARM2(x) ((x)->si) 111 | #define PT_REGS_PARM3(x) ((x)->dx) 112 | #define PT_REGS_PARM4(x) ((x)->cx) 113 | #define PT_REGS_PARM5(x) ((x)->r8) 114 | #define PT_REGS_RET(x) ((x)->sp) 115 | #define PT_REGS_FP(x) ((x)->bp) 116 | #define PT_REGS_RC(x) ((x)->ax) 117 | #define PT_REGS_SP(x) ((x)->sp) 118 | #define PT_REGS_IP(x) ((x)->ip) 119 | 120 | #elif defined(__s390x__) 121 | 122 | #define PT_REGS_PARM1(x) ((x)->gprs[2]) 123 | #define PT_REGS_PARM2(x) ((x)->gprs[3]) 124 | #define PT_REGS_PARM3(x) ((x)->gprs[4]) 125 | #define PT_REGS_PARM4(x) ((x)->gprs[5]) 126 | #define PT_REGS_PARM5(x) ((x)->gprs[6]) 127 | #define PT_REGS_RET(x) ((x)->gprs[14]) 128 | #define PT_REGS_FP(x) ((x)->gprs[11]) /* Works only with CONFIG_FRAME_POINTER */ 129 | #define PT_REGS_RC(x) ((x)->gprs[2]) 130 | #define PT_REGS_SP(x) ((x)->gprs[15]) 131 | #define PT_REGS_IP(x) ((x)->psw.addr) 132 | 133 | #elif defined(__aarch64__) 134 | 135 | #define PT_REGS_PARM1(x) ((x)->regs[0]) 136 | #define PT_REGS_PARM2(x) ((x)->regs[1]) 137 | #define PT_REGS_PARM3(x) ((x)->regs[2]) 138 | #define PT_REGS_PARM4(x) ((x)->regs[3]) 139 | #define PT_REGS_PARM5(x) ((x)->regs[4]) 140 | #define PT_REGS_RET(x) ((x)->regs[30]) 141 | #define PT_REGS_FP(x) ((x)->regs[29]) /* Works only with CONFIG_FRAME_POINTER */ 142 | 
#define PT_REGS_RC(x) ((x)->regs[0])
#define PT_REGS_SP(x) ((x)->sp)
#define PT_REGS_IP(x) ((x)->pc)

#elif defined(__powerpc__)

#define PT_REGS_PARM1(x) ((x)->gpr[3])
#define PT_REGS_PARM2(x) ((x)->gpr[4])
#define PT_REGS_PARM3(x) ((x)->gpr[5])
#define PT_REGS_PARM4(x) ((x)->gpr[6])
#define PT_REGS_PARM5(x) ((x)->gpr[7])
#define PT_REGS_RC(x) ((x)->gpr[3])
#define PT_REGS_SP(x) ((x)->sp)
#define PT_REGS_IP(x) ((x)->nip)

#elif defined(__sparc__)

#define PT_REGS_PARM1(x) ((x)->u_regs[UREG_I0])
#define PT_REGS_PARM2(x) ((x)->u_regs[UREG_I1])
#define PT_REGS_PARM3(x) ((x)->u_regs[UREG_I2])
#define PT_REGS_PARM4(x) ((x)->u_regs[UREG_I3])
#define PT_REGS_PARM5(x) ((x)->u_regs[UREG_I4])
#define PT_REGS_RET(x) ((x)->u_regs[UREG_I7])
#define PT_REGS_RC(x) ((x)->u_regs[UREG_I0])
#define PT_REGS_SP(x) ((x)->u_regs[UREG_FP])
#if defined(__arch64__)
#define PT_REGS_IP(x) ((x)->tpc)
#else
#define PT_REGS_IP(x) ((x)->pc)
#endif

#endif

#ifdef __powerpc__
#define BPF_KPROBE_READ_RET_IP(ip, ctx) ({ (ip) = (ctx)->link; })
#define BPF_KRETPROBE_READ_RET_IP BPF_KPROBE_READ_RET_IP
#elif defined(__sparc__)
#define BPF_KPROBE_READ_RET_IP(ip, ctx) ({ (ip) = PT_REGS_RET(ctx); })
#define BPF_KRETPROBE_READ_RET_IP BPF_KPROBE_READ_RET_IP
#else
#define BPF_KPROBE_READ_RET_IP(ip, ctx) ({ \
	bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); })
#define BPF_KRETPROBE_READ_RET_IP(ip, ctx) ({ \
	bpf_probe_read(&(ip), sizeof(ip), \
		(void *)(PT_REGS_FP(ctx) + sizeof(ip))); })
#endif

#endif
--------------------------------------------------------------------------------
/tc/tc-xdp-drop-tcp.c:
--------------------------------------------------------------------------------
#include <stdbool.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/pkt_cls.h>

#include "bpf_endian.h"
#include "bpf_helpers.h"

// static bool is_TCP(void *data_begin, void *data_end);

/*
  Check whether the packet carries TCP over IPv4.
*/
static bool is_TCP(void *data_begin, void *data_end){
    struct ethhdr *eth = data_begin;

    // Check the packet's size.
    // Pointer arithmetic is scaled by the pointed-to type, so (eth + 1) is
    // the address just past the Ethernet header:
    // new_address = current_address + sizeof(struct ethhdr)
    if ((void *)(eth + 1) > data_end)
        return false;

    // Check whether the Ethernet frame carries an IPv4 packet
    if (eth->h_proto == bpf_htons(ETH_P_IP))
    {
        struct iphdr *iph = (struct iphdr *)(eth + 1); // or (struct iphdr *)( ((void*)eth) + ETH_HLEN );
        if ((void *)(iph + 1) > data_end)
            return false;

        // Check whether the IP packet contains a TCP segment
        if (iph->protocol == IPPROTO_TCP)
            return true;
    }

    return false;
}

SEC("xdp")
int xdp_drop_tcp(struct xdp_md *ctx)
{

    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;

    if (is_TCP(data, data_end))
        return XDP_DROP;

    return XDP_PASS;
}

SEC("tc")
int tc_drop_tcp(struct __sk_buff *skb)
{

    void *data = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;

    if (is_TCP(data, data_end))
        return TC_ACT_SHOT;

    return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";

--------------------------------------------------------------------------------
/tc/tc-xdp-statistics.c:
--------------------------------------------------------------------------------
/* Header names are inferred from the types and constants used below;
 * the original list of includes is not shown in this copy. */
#include <stdbool.h>
#include <stdint.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/pkt_cls.h>
#include <iproute2/bpf_elf.h>

#include "bpf_endian.h"
#include "bpf_helpers.h"

struct pair {
    uint32_t lip; // local IP
    uint32_t rip; // remote IP
};

struct stats {
    uint64_t tx_cnt;
    uint64_t rx_cnt;
    uint64_t tx_bytes;
    uint64_t rx_bytes;
};

struct bpf_elf_map SEC("maps") trackers = {
    .type = BPF_MAP_TYPE_HASH,
    .size_key = sizeof(struct pair),
    .size_value = sizeof(struct stats),
    .max_elem = 2048,
    .pinning = 2, // PIN_GLOBAL_NS
};

// Parse the Ethernet and IPv4 headers and fill in the local/remote address pair.
static bool parse_ipv4(bool is_rx, void* data, void* data_end, struct pair *pair){
    struct ethhdr *eth = data;
    struct iphdr *ip;

    if(data + sizeof(struct ethhdr) > data_end)
        return false;

    if(bpf_ntohs(eth->h_proto) != ETH_P_IP)
        return false;

    ip = data + sizeof(struct ethhdr);

    if ((void*) ip + sizeof(struct iphdr) > data_end)
        return false;

    // On RX the destination is the local side; on TX the source is.
    pair->lip = is_rx ? ip->daddr : ip->saddr;
    pair->rip = is_rx ? ip->saddr : ip->daddr;

    return true;
}

// Note: the counters are updated without atomic operations, so concurrent
// updates from several CPUs can lose counts.
static void update_stats(bool is_rx, struct pair *key, long long bytes){
    struct stats *stats, newstats = {0,0,0,0};

    stats = bpf_map_lookup_elem(&trackers, key);
    if(stats){
        if(is_rx){
            stats->rx_cnt++;
            stats->rx_bytes += bytes;
        }else{
            stats->tx_cnt++;
            stats->tx_bytes += bytes;
        }
    }else{
        if(is_rx){
            newstats.rx_cnt = 1;
            newstats.rx_bytes = bytes;
        }else{
            newstats.tx_cnt = 1;
            newstats.tx_bytes = bytes;
        }

        bpf_map_update_elem(&trackers, key, &newstats, BPF_NOEXIST);
    }
}

SEC("rx")
int track_rx(struct xdp_md *ctx)
{
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;
    struct pair pair;

    if(!parse_ipv4(true,data,data_end,&pair))
        return XDP_PASS;

    // Update RX statistics
    update_stats(true,&pair,data_end-data);

    return XDP_PASS;
}

SEC("tx")
int track_tx(struct __sk_buff *skb)
{
    void *data_end = (void *)(long)skb->data_end;
    void *data = (void *)(long)skb->data;
    struct pair pair;

    if(!parse_ipv4(false,data,data_end,&pair))
        return TC_ACT_OK;

    // Update TX statistics
    update_stats(false,&pair,data_end-data);

    return TC_ACT_OK;
}

--------------------------------------------------------------------------------
/xdp/README.md:
--------------------------------------------------------------------------------
# XDP BPF Program Examples

## Related blog post
- Chinese: [https://davidlovezoe.club/bpf-xdp-101](https://davidlovezoe.club/bpf-xdp-101)

## Running a first XDP program

1. Write the code in a file named `xdp_drop_all.c`
    ```C
    #include <linux/bpf.h>
    int main() {
        // Drop every packet, whatever it contains
        return XDP_DROP;
    }
    ```
2. Compile the code into an object file
    ```bash
    clang -O2 -target bpf -c xdp_drop_all.c -o xdp_drop_all.o
    ```
3. Attach the XDP program to a host network interface
    ```bash
    # Use `ip link` to list the available interfaces; usually pick the one that holds the host's externally reachable IP
    ip link set dev [network-device-name] xdp obj xdp_drop_all.o sec .text
    ```
4. Test
    - Use `tcpdump` to watch packet traffic
    - Use `ping` to generate test packets
5. Detach the XDP program from the host network interface
    ```bash
    ip link set dev [network-device-name] xdp off
    ```
- Full demo video

    [![xdp-bpf-demo](https://img.youtube.com/vi/GD6pJLPd08U/0.jpg)](https://www.youtube.com/watch?v=GD6pJLPd08U)

- Summary

    This is a very simple XDP program. It is attached to the host network interface in `xdpgeneric` mode and drops every packet delivered to that interface.

## Running a second XDP program with Docker

Run a Docker container with Nginx to serve HTTP, then write an XDP program that drops every TCP packet.

1. Install Docker and start a web server container
    ```bash
    docker run -d -p 80:80 --name=nginx-xdp nginx:alpine
    ```
2. Write the XDP code

    ```c
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/in.h>
    #include <linux/ip.h>

    #define SEC(NAME) __attribute__((section(NAME), used))

    SEC("drop_tcp")
    int dropper(struct xdp_md *ctx) {
        int ipsize = 0;

        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        struct ethhdr *eth = data;
        ipsize = sizeof(*eth);
        struct iphdr *ip = data + ipsize;
        ipsize += sizeof(struct iphdr);

        // Both headers must fit inside the packet before any field is read
        if (data + ipsize > data_end) {
            return XDP_PASS;
        }

        // Note: eth->h_proto is not checked, so every frame is read as IPv4.
        // Check whether the packet is TCP
        if (ip->protocol == IPPROTO_TCP) {
            // Drop the packet
            return XDP_DROP;
        }

        return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";
    ```
3. Compile the program
    ```bash
    clang -O2 -target bpf -c xdp-drop-tcp.c -o xdp-drop-tcp.o
    ```
4. Find the host-side interface of the container's veth pair
    ```bash
    sandkey=$(docker inspect nginx-xdp -f "{{.NetworkSettings.SandboxKey}}")
    mkdir -p /var/run/netns
    ln -s "$sandkey" /var/run/netns/httpserver
    ip netns exec httpserver ip a
    # Then list the veth devices on the host:
    ip a | grep veth
    # 20: veth5722074@if19: mtu 1500 qdisc noqueue master docker0 state UP group default
    ```
5. Attach the XDP program to the host-side veth interface, and detach it afterwards
    ```bash
    ip link set dev veth5722074 xdp obj xdp-drop-tcp.o sec drop_tcp
    ip link set dev veth5722074 xdp off
    ```
- Full demo videos
    - Accessing the NGINX HTTP service from outside the container

        [![xdp-bpf-docker-ingress-demo](https://img.youtube.com/vi/SFDIsDoJG60/0.jpg)](https://www.youtube.com/watch?v=SFDIsDoJG60)

    - Accessing an HTTP service from inside the container

        [![xdp-bpf-docker-engress-demo](https://img.youtube.com/vi/9O6PBnkxMOM/0.jpg)](https://www.youtube.com/watch?v=9O6PBnkxMOM)


- Summary

    As the example above shows, the XDP BPF program only drops packets delivered to the target interface; packets leaving through that interface are unaffected. In other words, XDP acts on ingress traffic only. Can egress traffic be controlled as well?
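
One detail of the `dropper` program in step 2 is worth spelling out: it reads `ip->protocol` after the bounds check but never looks at the EtherType, so an ARP or IPv6 frame whose byte at that offset happens to equal 6 would also be dropped. The order of checks can be sketched in plain userspace C as below. This is only an illustration under stated assumptions: `frame_is_ipv4_tcp` and the constants are hypothetical names that appear nowhere in this repository, and raw byte offsets for an untagged Ethernet II frame stand in for the kernel header structs so the function can be compiled and exercised outside the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of an untagged Ethernet II frame carrying IPv4 (illustrative names). */
enum {
    ETH_HDR_LEN      = 14,     /* dst MAC (6) + src MAC (6) + EtherType (2) */
    ETHERTYPE_OFF    = 12,     /* offset of the EtherType field             */
    IPV4_MIN_HDR_LEN = 20,     /* IPv4 header without options               */
    IPV4_PROTO_OFF   = 9,      /* offset of the protocol byte in IPv4       */
    ETHERTYPE_IPV4   = 0x0800,
    PROTO_TCP        = 6
};

/* Hypothetical helper: returns 1 when [data, data_end) holds an IPv4 frame
 * whose protocol field is TCP, and 0 otherwise. */
static int frame_is_ipv4_tcp(const uint8_t *data, const uint8_t *data_end)
{
    /* 1. Both headers must lie inside the buffer before any field is read. */
    if (data_end < data ||
        (size_t)(data_end - data) < ETH_HDR_LEN + IPV4_MIN_HDR_LEN)
        return 0;

    /* 2. The EtherType is stored big-endian on the wire. */
    uint16_t ethertype =
        (uint16_t)((data[ETHERTYPE_OFF] << 8) | data[ETHERTYPE_OFF + 1]);
    if (ethertype != ETHERTYPE_IPV4)
        return 0;

    /* 3. Only now is the protocol byte known to belong to an IPv4 header. */
    return data[ETH_HDR_LEN + IPV4_PROTO_OFF] == PROTO_TCP;
}
```

In a BPF program the same three steps would be written with `struct ethhdr`, `struct iphdr` and a byte-order helper such as `bpf_htons`, as `is_TCP` in `tc/tc-xdp-drop-tcp.c` does.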

## Testing ingress and egress under XDP
**Note**:
curl's `--dns-servers` option makes curl use a custom DNS resolver. Without it, domain names cannot be resolved in this setup and only IP addresses can be reached.
Many Linux distributions ship a curl that was built without this option (it requires a build with c-ares), so curl has to be rebuilt with it enabled. This article describes how to build a custom curl: https://davidlovezoe.club/build-curl-from-source

- ingress
```bash
curl localhost
```
- egress
```bash
# curl-new is a self-built curl that supports --dns-servers
# The next line points to the libraries of the self-built curl; without it curl-new will not run
export LD_LIBRARY_PATH=/usr/local/curl/fromsource/lib
ip netns exec httpserver curl-new --dns-servers 8.8.8.8 www.baidu.com
```

--------------------------------------------------------------------------------
/xdp/xdp-drop-world.c:
--------------------------------------------------------------------------------
#include <linux/bpf.h>

/*
 * Helper macro to place programs, maps, license in
 * different sections in elf_bpf file. Section names
 * are interpreted by elf_bpf loader.
 * Alternatively, include the helper header below
 * so that you don't need to define it yourself:
 * #include <bpf/bpf_helpers.h>
 */
#define SEC(NAME) __attribute__((section(NAME), used))

// entry point of this program
SEC("xdp")
int xdp_drop_the_world(struct xdp_md *ctx) {
    // Drop every packet, whatever it contains
    return XDP_DROP;
}

// required by the BPF verifier
char _license[] SEC("license") = "GPL";

// another, simpler version of this program
/*
#include <linux/bpf.h>
int xdp_drop_the_world() {
    return XDP_DROP;
}
*/

--------------------------------------------------------------------------------