├── .gitignore ├── README.md ├── ansible ├── README.md └── provision.yml ├── examples ├── Makefile ├── chacha8.c ├── dropworld.c ├── layercoop.c ├── portfilter.c └── tcpfilter.c ├── headers ├── bpf_endian.h ├── bpf_helpers.h └── common.h └── images ├── vbox-create.png ├── vbox-disk.png ├── vbox-hostonly.png ├── vbox-memory.png └── vbox-nat.png /.gitignore: -------------------------------------------------------------------------------- 1 | .vagrant/* 2 | *.log 3 | *.retry 4 | .vscode/* -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Fast Packet Processing with eBPF and XDP: Concepts, Code, Challenges and Applications 2 | 3 | ## About 4 | 5 | This repository presents complementary material to the paper "Fast Packet Processing with eBPF and XDP: Concepts, Code, Challenges and Applications" submitted to ACM Computing Surveys (CSUR). 6 | 7 | The contents are divided as follows: 8 | - `ansible/`: ansible script used to install required dependencies during VM creation 9 | - `examples/`: examples of eBPF programs 10 | - `headers/`: header files needed to compile the examples 11 | - `images/`: images used in this README 12 | 13 | ## Virtual Machine 14 | 15 | We created a virtual machine to be used in this tutorial. It contains all the code and tools required to complete the tutorial step-by-step. 
16 | 17 | - [Download VirtualBox VM](https://www.winet.dcc.ufmg.br/ebpf/minicurso-ebpf-sbrc2019.rar) (user: *ebpf*, password: *ebpf*) 18 | 19 | The virtual machine contains the following items: 20 | - kernel v5.0.0 21 | - iproute2-ss190319 22 | - llvm 6.0.0 23 | - bpftool 24 | 25 | The directory `/home/ebpf` includes a copy of this repository and also local copies of the following projects: 26 | - [Linux kernel net-next](https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git) 27 | - [iproute2](https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git) 28 | - [prototype-kernel](https://github.com/netoptimizer/prototype-kernel.git) 29 | 30 | ## Import the virtual machine 31 | 32 | The following steps have been tested with VirtualBox 5.2.18 on Ubuntu. 33 | 34 | After downloading the VM image, extract the downloaded `.rar` archive. You should then see a file named `ebpf-vm.vdi`. 35 | 36 | Open the VirtualBox app and then create a new VM by pressing the `New` button and picking a name for it: 37 | 38 |

39 | Create virtual machine 40 |

41 | 42 | Next, VirtualBox will allow modifications to the machine specification, such as the amount of RAM (this value can be modified later). 43 | 44 |

45 | Adjust machine 46 |

47 | 48 | In the next step, VirtualBox will ask for the desired hard disk option. Here you must use an existing disk, which corresponds to the downloaded VM image: 49 | 50 |

51 | Import disk 52 |

53 | 54 | Finally, it is necessary to configure the machine with two network interfaces: one in NAT mode (`eth0` - Internet access) and another in HostOnly mode (`eth1` - SSH access). 55 | After the VM creation, right-click on the VM name and then select the `Settings` option. 56 | In the `Network` tab, make sure these two interfaces are created: 57 | 58 |

59 | Import appliance 60 |
61 |
62 | Import appliance 63 |

64 | 65 | Finished! The machine is now ready for the tutorial. 66 | 67 | ## Compiling kernel examples 68 | 69 | The kernel source code has several sample programs, available in the following directories: 70 | - `samples/bpf` 71 | - `tools/testing/selftests/bpf` 72 | 73 | Here we present two examples from `samples/bpf` folder. To compile them, run the following commands: 74 | 75 | cd ~/net-next/ 76 | make headers_install 77 | make samples/bpf/ 78 | 79 | ## Compiling local examples 80 | 81 | The examples provided in this repository in the `examples/` folder are accompanied by a Makefile. To compile them, run: 82 | 83 | cd examples/ 84 | make 85 | 86 | **P.S.**: The dependencies required for compilation are already installed on the virtual machine, so we recommend compiling the examples in that environment. 87 | 88 | ## Examples 89 | 90 | Below are the step-by-step instructions on how to compile and run each of the examples presented in the ACM CSUR paper, as well as some extra ones. In-depth explanations of each example are present in the paper. 91 | 92 | ### Example 0: **Drop World!** 93 | 94 | File location: `./examples/dropworld.c` 95 | 96 | This example is one of the simplest programs possible. It just discards all received packets. 97 | 98 | To compile it, run: 99 | 100 | cd ./examples/ 101 | make 102 | 103 | Next, the compiled program can be loaded using the `ip` tool: 104 | 105 | sudo ip -force link set dev eth0 xdp obj dropworld.o sec .text 106 | 107 | The `.text` argument refers to the ELF section in which the program is located. Check out the paper for more details. 
108 | 109 | It is possible to check the status of the program by using the following command: 110 | 111 | ip link show eth0 112 | 113 | Expected output: 114 | 115 | ebpf@ebpf-vm:~/bpf-tutorial/examples$ ip link show eth0 116 | 2: eth0: mtu 1500 xdpgeneric qdisc fq_codel state UP mode DEFAULT group default qlen 1000 117 | link/ether 08:00:27:58:07:42 brd ff:ff:ff:ff:ff:ff 118 | prog/xdp id 19 119 | 120 | To remove the program, just run: 121 | 122 | sudo ip link set dev eth0 xdp off 123 | 124 | After the removal, the interface status will be as follows: 125 | 126 | ebpf@ebpf-vm:~/bpf-tutorial/examples$ ip link show eth0 127 | 2: eth0: mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000 128 | link/ether 08:00:27:58:07:42 brd ff:ff:ff:ff:ff:ff 129 | 130 | Since in this case the `eth0` interface is used for Internet access, discarding packets received by this interface will effectively cut off web access. 131 | 132 | On another terminal, start a process to ping some domain on the internet: 133 | 134 | ping google.com 135 | 136 | Then load the `dropworld.o` program into the `eth0` interface and notice that the ping responses will be interrupted. This interruption will happen because all response messages sent to the `eth0` interface will be discarded by the loaded program. 137 | 138 | **Extra**: Modify the `dropworld.c` file by changing the return value from `XDP_DROP` to `XDP_PASS`. Then compile and repeat the loading process. Observe that, in this case, the ping responses will still be received. Thus, this new program is effectively an empty operation, which merely receives and passes packets up to the kernel stack. 139 | 140 | ### Example 1: **TCP filter** 141 | 142 | File location: `./examples/tcpfilter.c` 143 | 144 | This example parses packets received on an interface and only accepts the ones with TCP segments. Filtering is done by parsing the IP header protocol field. 
Only packets with a protocol equal to 6, which corresponds to TCP, are accepted. 145 | 146 | Similar to the previous example, compile the program by running: 147 | 148 | cd ./examples/ 149 | make 150 | 151 | Before loading the program, try pinging a domain name and test the access to a web page: 152 | 153 | ping google.com 154 | curl http://www.google.com 155 | 156 | The ping must be successful and the output of the second command should be a print of the requested page's HTML code. Since `ping` uses ICMP packets and HTTP operates over TCP, once we load the program, we should continue to receive responses to curl requests and ping responses should be interrupted. 157 | 158 | Load the program using the `ip` tool: 159 | 160 | sudo ip -force link set dev eth0 xdp obj tcpfilter.o sec .text 161 | 162 | Now, try to access the same page again and then try to ping the same domain: 163 | 164 | curl http://www.google.com 165 | ping google.com 166 | 167 | Because of program *tcpfilter.o*, packets are discarded as soon as they reach the interface `eth0`, preventing access to any service that does not operate over TCP. 168 | 169 | **Extra**: Modify the program in `tcpfilter.c` so that it only accepts ICMP packets (used by `ping` utility). Also check the program in `portfilter.c`, which drops packets based on the application layer protocol used. 170 | 171 | ### Example 2: **User and kernel space interaction** 172 | 173 | File locations: `xdp1_kern.c` and `xdp1_user.c` in `samples/bpf/` in kernel source code (`~/net-next/samples/bpf/` in the VM). 174 | 175 | This example shows how to use maps in eBPF programs and how to interact with user space. The program in `xdp1_kern.c` extracts the layer 4 protocol number (TCP = 6, UDP = 17, ICMP = 1, etc) from each received packet, updates counters for each protocol and then discards the packets. 176 | The counter values are stored in a map named `rxcnt` and later consulted by the program `xdp1_user.c`, which executes in user space. 
Through the use of a map, both programs (one in the kernel and another in user space) can exchange information. 177 | 178 | To compile the programs, follow the instructions given earlier on how to compile sample programs from the Linux kernel. 179 | 180 | Unlike the previous examples, here the eBPF program is loaded into the kernel by the program `xdp1_user.c`, in user space, without requiring the `ip` tool. 181 | 182 | After the program compilation, the `samples/bpf/` directory will contain the executable file `xdp1` (generated from `xdp1_user.c`). 183 | 184 | ebpf@ebpf-vm:~/net-next/samples/bpf$ ./xdp1 185 | usage: xdp1 [OPTS] IFACE 186 | 187 | OPTS: 188 | -S use skb-mode 189 | -N enforce native mode 190 | 191 | To load the program in the `eth0` interface, just pass it as a parameter to `xdp1`: 192 | 193 | sudo ./xdp1 eth0 194 | 195 | The program will go into an infinite loop, printing the rate of received packets (in packets per second) for each protocol number. 196 | 197 | ebpf@ebpf-vm:~/net-next/samples/bpf$ sudo ./xdp1 eth0 198 | proto 17: 1 pkt/s 199 | proto 17: 1 pkt/s 200 | proto 17: 1 pkt/s 201 | proto 0: 1 pkt/s 202 | proto 17: 1 pkt/s 203 | 204 | In another terminal, generate some traffic with tools such as `ping`, `curl` or `wget` to get packets to pass through the interface. 205 | 206 | It is possible to analyze map content using `bpftool`, already compiled and installed on the provided VM. 
To do this, it is first necessary to check the eBPF programs loaded on the system: 207 | 208 | sudo bpftool prog show 209 | 210 | Expected output: 211 | 212 | ebpf@ebpf-vm:~$ sudo bpftool prog show 213 | 2: cgroup_skb tag 7be49e3934a125ba gpl 214 | loaded_at 2019-04-23T12:24:29-0400 uid 0 215 | xlated 296B jited 229B memlock 4096B map_ids 2,3 216 | 3: cgroup_skb tag 2a142ef67aaad174 gpl 217 | loaded_at 2019-04-23T12:24:29-0400 uid 0 218 | xlated 296B jited 229B memlock 4096B map_ids 2,3 219 | 4: cgroup_skb tag 7be49e3934a125ba gpl 220 | loaded_at 2019-04-23T12:24:29-0400 uid 0 221 | xlated 296B jited 229B memlock 4096B map_ids 4,5 222 | 5: cgroup_skb tag 2a142ef67aaad174 gpl 223 | loaded_at 2019-04-23T12:24:29-0400 uid 0 224 | xlated 296B jited 229B memlock 4096B map_ids 4,5 225 | 6: cgroup_skb tag 7be49e3934a125ba gpl 226 | loaded_at 2019-04-23T12:24:35-0400 uid 0 227 | xlated 296B jited 229B memlock 4096B map_ids 6,7 228 | 7: cgroup_skb tag 2a142ef67aaad174 gpl 229 | loaded_at 2019-04-23T12:24:35-0400 uid 0 230 | xlated 296B jited 229B memlock 4096B map_ids 6,7 231 | 28: xdp name xdp_prog1 tag 539ec6ce11b52f98 gpl 232 | loaded_at 2019-04-23T14:34:06-0400 uid 0 233 | xlated 488B jited 336B memlock 4096B map_ids 14 234 | 235 | The last program listed corresponds to the XDP program loaded by `xdp1`. The output also indicates that it has a map with id 14. We can use this value to query the map content: 236 | 237 | sudo bpftool map dump id 14 238 | 239 | Expected output: 240 | 241 | ebpf@ebpf-vm:~$ sudo bpftool map dump id 14 242 | key: 243 | 00 00 00 00 244 | value (CPU 00): 4b 00 00 00 00 00 00 00 245 | key: 246 | 01 00 00 00 247 | value (CPU 00): 00 00 00 00 00 00 00 00 248 | key: 249 | 02 00 00 00 250 | value (CPU 00): 00 00 00 00 00 00 00 00 251 | ... 252 | (rest of output omitted) 253 | 254 | The map used is of type `BPF_MAP_TYPE_PERCPU_ARRAY`. As the name implies, it has one array per CPU used. 
In the map declaration, the number of elements has been set to `256`, so the output of command `bpftool` shows the `256` entries corresponding to CPU 0, the only one on the VM. 255 | 256 | **Extra**: Change the program to let packets pass, rather than being dropped. Also, change the map type to `BPF_MAP_TYPE_HASH` and check its content using `bpftool`. 257 | 258 | ### Example 3: **Cooperation between XDP and TC** 259 | 260 | File location: `./examples/layercoop.c` 261 | 262 | This example uses two eBPF programs in different layers (XDP and TC) to collect joint statistics about communication between any pair of IP addresses whose traffic crosses the corresponding interface. 263 | 264 | As before, to compile the example just run: 265 | 266 | cd ./examples/ 267 | make 268 | 269 | Now, load and attach the program from section `rx` to the XDP layer on a chosen interface, say `eth0`: 270 | 271 | sudo ip link set dev eth0 xdp obj layercoop.o sec rx 272 | 273 | The extra flag `-force` after `ip` might be necessary if another XDP program was already attached to that interface. 274 | 275 | Next, we need to load the program responsible for handling the stats collection on TX (ELF section `tx`). But before that, we need to create the `clsact` qdisc on TC: 276 | 277 | sudo tc qdisc add dev eth0 clsact 278 | 279 | Now we can load the program on the TC `egress` hook, to run it on TX: 280 | 281 | sudo tc filter add dev eth0 egress bpf da obj layercoop.o sec tx 282 | 283 | From now on, every pair of communicating IPs will have an entry in the map shared by these programs, which can be inspected using `bpftool` as explained in the previous example. 
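
When reading a `bpftool map dump`, keys and values are printed as raw bytes in memory order, as in the dump shown in Example 2. On the little-endian x86-64 VM, a value such as `4b 00 00 00 00 00 00 00` therefore corresponds to the counter 75. The helper below is a minimal userspace sketch of that conversion, assuming a little-endian host; the name `decode_le` is hypothetical and is not part of `bpftool` or of this repository.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: turn the bytes printed by `bpftool map dump`
 * (at most 8 of them) into an integer, assuming a little-endian host
 * such as the x86-64 tutorial VM. */
static uint64_t decode_le(const uint8_t *bytes, size_t len)
{
    uint64_t value = 0;

    /* Byte i carries bits [8*i, 8*i + 7] of the result. */
    for (size_t i = 0; i < len && i < 8; i++)
        value |= (uint64_t)bytes[i] << (8 * i);

    return value;
}
```

Passing the eight value bytes of the first entry of the Example 2 dump gives 75, and the key bytes `02 00 00 00` give 2. For a per-CPU map on a machine with several CPUs, each CPU's value would be decoded separately and the results summed.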
284 | 285 | Finally, to unload both programs: 286 | 287 | sudo ip link set dev eth0 xdp off 288 | sudo tc filter del dev eth0 egress 289 | 290 | ## Extra examples 291 | 292 | ### Extra example 1: **Packet filtering by TCP port** 293 | 294 | File location: `./examples/portfilter.c` 295 | 296 | This example parses packets received on an interface and discards the ones with the HTTP protocol. Discarding is done by parsing the TCP header source and destination port fields. Packets in which one of these values is 80 are discarded. 297 | 298 | Similar to the previous example, compile the program by running: 299 | 300 | cd ./examples/ 301 | make 302 | 303 | Before loading the program, test the access to a web page: 304 | 305 | curl http://www.google.com 306 | 307 | The output of this command should be a print of the requested page's HTML code. 308 | 309 | Load the program using the `ip` tool: 310 | 311 | sudo ip -force link set dev eth0 xdp obj portfilter.o sec filter 312 | 313 | Now, try to access the same page again: 314 | 315 | curl http://www.google.com 316 | 317 | Because of program *portfilter.o*, packets are discarded as soon as they reach the interface `eth0`, preventing access to the web. 318 | 319 | **Extra**: Modify the program in `portfilter.c` so that it discards all ICMP packets (used by `ping` utility). 320 | 321 | ### Extra example 2: **Interaction between XDP and TC through metadata field** 322 | 323 | File location: `linux/samples/bpf/`: files `xdp2skb_meta_kern.c` and `xdp2skb_meta.sh` 324 | 325 | This example aims to demonstrate how the XDP and TC layers can interact through the use of metadata associated with a packet. File `xdp2skb_meta_kern.c` contains two separate programs, one to be loaded into XDP and one to TC, both on reception. Packets received by XDP receive custom metadata, which is read at the TC layer. Script `xdp2skb_meta.sh` is used to load the programs on their respective hooks and configure the system. 
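
The idea of metadata travelling in front of the packet can be sketched in plain userspace C. The code below is only a model, not the kernel sample: `fake_pkt`, `xdp_side_mark` and `tc_side_read` are hypothetical names, and a byte buffer stands in for the packet. In a real XDP program the metadata area is typically grown with the `bpf_xdp_adjust_meta()` helper, and the TC program must bounds-check `data_meta` against `data` before reading it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Models the metadata carried in front of the packet; only the
 * `mark` field used in this example is represented. */
struct meta_info {
    uint32_t mark;
};

/* Hypothetical stand-in for the packet pointers seen by a program. */
struct fake_pkt {
    uint8_t *data_meta; /* start of metadata (== data when empty) */
    uint8_t *data;      /* first packet byte */
    uint8_t *data_end;  /* one past the last packet byte */
};

/* "XDP side": reserve room in front of the packet and store the mark.
 * `hard_start` is the lowest address the metadata may grow down to. */
static int xdp_side_mark(struct fake_pkt *p, uint8_t *hard_start, uint32_t mark)
{
    struct meta_info meta = { .mark = mark };

    if ((size_t)(p->data_meta - hard_start) < sizeof(meta))
        return -1; /* not enough headroom */

    p->data_meta -= sizeof(meta);
    memcpy(p->data_meta, &meta, sizeof(meta));
    return 0;
}

/* "TC side": check that a full meta_info fits between data_meta and
 * data before reading it. */
static int tc_side_read(const struct fake_pkt *p, uint32_t *mark)
{
    struct meta_info meta;

    if (p->data_meta + sizeof(meta) > p->data)
        return -1; /* no metadata attached */

    memcpy(&meta, p->data_meta, sizeof(meta));
    *mark = meta.mark;
    return 0;
}
```

With a 64-byte buffer and `data` placed 16 bytes in, `tc_side_read` fails until `xdp_side_mark` has run, after which it returns the stored mark. The bounds check on the reading side plays the same role as the check the verifier demands in the kernel programs.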
326 | 327 | To help analyze these programs as well as demonstrate an alternative way to debug eBPF programs, let's modify the `xdp2skb_meta_kern.c` file to print log messages after packet processing on each layer. 328 | 329 | To do so, we will use the helper function `bpf_trace_printk`. For ease of use, we can add the following macro to the file: 330 | 331 | ```c 332 | // Nicer way to call bpf_trace_printk() 333 | #define bpf_custom_printk(fmt, ...) \ 334 | ({ \ 335 | char ____fmt[] = fmt; \ 336 | bpf_trace_printk(____fmt, sizeof(____fmt), \ 337 | ##__VA_ARGS__); \ 338 | }) 339 | ``` 340 | Through this macro, we can use function `bpf_trace_printk` indirectly, but with a syntax similar to function `printf`. 341 | 342 | Having added the macro, we can now use it to print metadata values on TC and XDP layers. 343 | 344 | Add to end of function *_xdp_mark()*: 345 | 346 | ```c 347 | SEC("xdp_mark") 348 | int _xdp_mark(struct xdp_md *ctx) 349 | { 350 | struct meta_info *meta; 351 | void *data, *data_end; 352 | int ret; 353 | 354 | <...> // code omitted 355 | 356 | meta->mark = 42; 357 | 358 | bpf_custom_printk("[XDP] metadata = %d\n",meta->mark); // <-- Add this line 359 | 360 | return XDP_PASS; 361 | } 362 | ``` 363 | 364 | Add to end of function *_tc_mark*: 365 | 366 | ```c 367 | SEC("tc_mark") 368 | int _tc_mark(struct __sk_buff *ctx) 369 | { 370 | void *data = (void *)(unsigned long)ctx->data; 371 | void *data_end = (void *)(unsigned long)ctx->data_end; 372 | void *data_meta = (void *)(unsigned long)ctx->data_meta; 373 | struct meta_info *meta = data_meta; 374 | 375 | <...> // code omitted 376 | 377 | ctx->mark = meta->mark; /* Transfer XDP-mark to SKB-mark */ 378 | 379 | bpf_custom_printk("[TC] metadata = %d\n",meta->mark); // <-- Add this line 380 | 381 | return TC_ACT_OK; 382 | } 383 | ``` 384 | 385 | The `bpf_trace_printk` function requires programs that use it to be declared using GPL license. 
Otherwise, the program will be rejected by the verifier during kernel loading. The error message generated by the verifier is as follows: 386 | 387 | ebpf@ebpf-vm:~/net-next/samples/bpf$ sudo ./xdp2skb_meta.sh --dev eth0 [16/1675] 388 | 389 | Prog section 'tc_mark' rejected: Invalid argument (22)! 390 | - Type: 3 391 | - Instructions: 25 (0 over limit) 392 | - License: 393 | 394 | Verifier analysis: 395 | 396 | 0: (61) r3 = *(u32 *)(r1 +76) 397 | 1: (61) r2 = *(u32 *)(r1 +140) 398 | 2: (bf) r4 = r2 399 | 3: (07) r4 += 4 400 | 4: (3d) if r3 >= r4 goto pc+3 401 | R1=ctx(id=0,off=0,imm=0) R2=pkt_meta(id=0,off=0,r=0,imm=0) R3=pkt(id=0,off=0,r=0,imm=0) R4=pkt_meta(id=0,off=4,r=0,imm=0) R10=fp0,call_-1 402 | 5: (b7) r2 = 41 403 | 6: (63) *(u32 *)(r1 +8) = r2 404 | 7: (05) goto pc+15 405 | 23: (b7) r0 = 0 406 | 24: (95) exit 407 | 408 | from 4 to 8: R1=ctx(id=0,off=0,imm=0) R2=pkt_meta(id=0,off=0,r=4,imm=0) R3=pkt(id=0,off=0,r=0,imm=0) R4=pkt_meta(id=0,off=4,r=4,imm=0) R10=fp0,call_-1 409 | 8: (61) r3 = *(u32 *)(r2 +0) 410 | 9: (63) *(u32 *)(r1 +8) = r3 411 | 10: (b7) r1 = 680997 412 | 11: (63) *(u32 *)(r10 -16) = r1 413 | 12: (18) r1 = 0x203d206f64616461 414 | 14: (7b) *(u64 *)(r10 -24) = r1 415 | 15: (18) r1 = 0x74656d205d43545b 416 | 17: (7b) *(u64 *)(r10 -32) = r1 417 | 18: (61) r3 = *(u32 *)(r2 +0) 418 | 19: (bf) r1 = r10 419 | 20: (07) r1 += -32 420 | 21: (b7) r2 = 20 421 | 22: (85) call bpf_trace_printk#6 422 | cannot call GPL-restricted function from non-GPL compatible program 423 | 424 | Error fetching program/map! 425 | Unable to load program 426 | ERROR: Exec error(1) occurred cmd: "tc filter add dev eth0 ingress prio 1 handle 1 bpf da obj ./xdp2skb_meta_kern.o sec tc_mark" 427 | 428 | To overcome this limitation, it is necessary to declare a special global variable in the `license` ELF section with this information. This can be done by adding the following line at the end of `xdp2skb_meta_kern.c` file. 
429 | 430 | ```c 431 | char _license[] SEC("license") = "GPL"; 432 | ``` 433 | 434 | Finally, recompile the example: 435 | 436 | cd ~/net-next 437 | make samples/bpf/ 438 | 439 | Next, execute the script `xdp2skb_meta.sh` to load the programs into the kernel: 440 | 441 | ebpf@ebpf-vm:~/net-next/samples/bpf$ sudo ./xdp2skb_meta.sh 442 | 443 | Usage: ./xdp2skb_meta.sh [-vfh] --dev ethX 444 | -d | --dev : Network device (required) 445 | --flush : Cleanup flush TC and XDP progs 446 | --list : ($LIST) List TC and XDP progs 447 | -v | --verbose : ($VERBOSE) Verbose 448 | --dry-run : ($DRYRUN) Dry-run only (echo commands) 449 | 450 | ERROR: Please specify network device -- required option --dev 451 | 452 | Load the programs in interface `eth0`: 453 | 454 | ./xdp2skb_meta.sh --dev eth0 455 | 456 | We can also load the programs directly using tools as `ip` for the XDP program (`sudo ip -force link set dev eth0 xdp obj xdp2skb_meta_kern.o sec xdp_mark`), just as before, and `tc` for the TC hook program. In the latter case, it is necessary to create a special `qdisc` in the Linux traffic controller, called `clsact`. All this process can be done using the following commands: 457 | 458 | sudo tc qdisc add dev eth0 clsact 459 | sudo tc filter add dev eth0 ingress bpf da obj xdp2skb_meta_kern.o sec tc_mark 460 | 461 | For more information about eBPF on the TC hook, check out the command `man tc-bpf`. 
462 | 463 | Once the programs have been loaded on their respective hooks, we can analyze the log messages generated by each one in the file `/sys/kernel/debug/tracing/trace`: 464 | 465 | sudo cat /sys/kernel/debug/tracing/trace 466 | 467 | For continuous reading, use the file `trace_pipe`: 468 | 469 | sudo cat /sys/kernel/debug/tracing/trace_pipe 470 | 471 | With the eBPF programs loaded in the kernel and some traffic flowing through the interface, we can observe the generated messages: 472 | 473 | ebpf@ebpf-vm:~/net-next/samples/bpf$ sudo cat /sys/kernel/debug/tracing/trace 474 | # tracer: nop 475 | # 476 | # entries-in-buffer/entries-written: 40/40 #P:1 477 | # 478 | # _-----=> irqs-off 479 | # / _----=> need-resched 480 | # | / _---=> hardirq/softirq 481 | # || / _--=> preempt-depth 482 | # ||| / delay 483 | # TASK-PID CPU# |||| TIMESTAMP FUNCTION 484 | # | | | |||| | | 485 | -0 [000] ..s. 13699.213984: 0: [XDP] metadata = 42 486 | -0 [000] ..s. 13699.214009: 0: [TC] metadata = 42 487 | -0 [000] ..s. 13699.421529: 0: [XDP] metadata = 42 488 | -0 [000] ..s. 13699.421542: 0: [TC] metadata = 42 489 | -0 [000] ..s. 13704.450195: 0: [XDP] metadata = 42 490 | -0 [000] ..s. 13704.450205: 0: [TC] metadata = 42 491 | -0 [000] ..s. 13704.450216: 0: [XDP] metadata = 42 492 | 493 | By looking at the messages, we can see that the metadata added on the XDP hook could be successfully received by the program on the TC hook, effectively sharing information between the two kernel stack layers. 494 | -------------------------------------------------------------------------------- /ansible/README.md: -------------------------------------------------------------------------------- 1 | # Environment setup script for eBPF 2 | 3 | Instead of using the VM provided for the short course, it is also possible to provision another machine with all the required dependencies. 
For this purpose we provide the script `provision.yml`, which can be used with the [ansible](https://github.com/ansible/ansible) tool. 4 | 5 | **WARNING**: This script installs version 5.0 of the Linux kernel, along with several other dependencies. We recommend using it only on test/development machines. 6 | 7 | This tool connects over SSH to a remote machine and performs all the installation steps specified in the script. On your local machine, run: 8 | 9 | ansible-playbook -i "," -u -k -K -e "ansible_python_interpreter=/usr/bin/python3" provision.yml 10 | 11 | Where `` and `` must be replaced with the values corresponding to the target machine. Ansible will ask for the SSH access password and for the password used to run operations as superuser on the remote machine. -------------------------------------------------------------------------------- /ansible/provision.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: Provision environment to play with BPF 3 | hosts: all 4 | gather_facts: yes 5 | become: yes 6 | become_user: root 7 | tasks: 8 | - name: Add user ebpf 9 | user: 10 | name: ebpf 11 | groups: sudo 12 | append: yes 13 | - name: Create dir for kernel files 14 | file: 15 | path: /usr/src/kernel-v5.0 16 | state: directory 17 | - name: Download kernel v5.0 18 | get_url: 19 | url: "{{ item }}" 20 | dest: /usr/src/kernel-v5.0/ 21 | loop: 22 | - "https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/linux-headers-5.0.0-050000_5.0.0-050000.201903032031_all.deb" 23 | - "https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/linux-headers-5.0.0-050000-generic_5.0.0-050000.201903032031_amd64.deb" 24 | - "https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/linux-image-unsigned-5.0.0-050000-generic_5.0.0-050000.201903032031_amd64.deb" 25 | - "https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/linux-modules-5.0.0-050000-generic_5.0.0-050000.201903032031_amd64.deb" 26 | when: ansible_kernel is not 
match("5\.0.*") 27 | - name: Install kernel v5.0 28 | shell: dpkg -i *.deb 29 | args: 30 | chdir: /usr/src/kernel-v5.0/ 31 | register: install_result 32 | when: ansible_kernel is not match("5\.0.*") 33 | - name: Reboot and wait 34 | reboot: 35 | when: install_result is changed 36 | - name: Get kernel version 37 | shell: uname -r 38 | register: result 39 | - name: Fail if kernel 5.0 is not loaded 40 | fail: 41 | msg: "Wrong kernel loaded ({{ kernel_version }}). Expected 5.0" 42 | vars: 43 | kernel_version: "{{ result.stdout }}" 44 | when: kernel_version is not match("5\.0.*") 45 | - name: Install dependencies 46 | apt: 47 | name: "{{ item }}" 48 | update_cache: yes # Run 'apt update' before install 49 | loop: 50 | - make 51 | - gcc 52 | - g++ 53 | - pkg-config 54 | - libssl-dev 55 | - bc 56 | - libelf-dev 57 | - libcap-dev 58 | - gcc-multilib 59 | - libncurses5-dev 60 | - git 61 | - pkg-config 62 | - graphviz 63 | - llvm 64 | - clang 65 | - elfutils 66 | - libmnl-dev 67 | - bison 68 | - flex 69 | - ifupdown 70 | - python-scapy 71 | - python-netifaces 72 | - binutils-dev 73 | - hping3 74 | - net-tools 75 | - python-pip 76 | - autoconf 77 | - automake 78 | - libtool 79 | - unzip 80 | - curl 81 | - python-twisted 82 | tags: install 83 | - name: Clone newest iproute2 repo 84 | git: 85 | repo: https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git 86 | dest: /home/ebpf/iproute2/ 87 | tags: [clone, iproute] 88 | ignore_errors: yes 89 | - name: Install iproute2 90 | shell: ./configure && make && make install 91 | args: 92 | chdir: /home/ebpf/iproute2 93 | ignore_errors: yes 94 | tags: iproute 95 | - name: Clone prototype-kernel project 96 | git: 97 | repo: https://github.com/netoptimizer/prototype-kernel.git 98 | dest: /home/ebpf/prototype-kernel 99 | ignore_errors: yes 100 | tags: clone 101 | - name: Clone linux kernel net-next tree 102 | git: 103 | repo: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 104 | dest: /home/ebpf/net-next 105 | 
version: v5.0 106 | depth: 1 # Reduce commit history to download faster 107 | ignore_errors: yes 108 | tags: clone 109 | - name: Install bpftool 110 | shell: make && make install 111 | args: 112 | chdir: /home/ebpf/net-next/tools/bpf/bpftool 113 | ignore_errors: yes 114 | tags: kernel 115 | - name: Clone protobuf 116 | git: 117 | repo: https://github.com/protocolbuffers/protobuf.git 118 | dest: /usr/src/protobuf-3.5.0 119 | version: v3.5.0 120 | - name: Compile protobuf 121 | shell: ./autogen.sh && ./configure && make && make check && make install && ldconfig 122 | args: 123 | chdir: /usr/src/protobuf-3.5.0 124 | - name: Clone protobuf-c 125 | git: 126 | repo: https://github.com/protobuf-c/protobuf-c.git 127 | dest: /usr/src/protobuf-c 128 | version: v1.3.0 129 | - name: Compile protobuf-c 130 | shell: ./autogen.sh && ./configure && make && make install 131 | args: 132 | chdir: /usr/src/protobuf-c 133 | - name: Clone BPFabric 134 | git: 135 | repo: https://github.com/UofG-netlab/BPFabric 136 | dest: /home/ebpf/BPFabric 137 | tags: clone 138 | - name: Patch BPFabric to use latest clang instead of clang-3.9 139 | replace: 140 | path: /home/ebpf/BPFabric/examples/Makefile 141 | regexp: 'clang-3\.9' 142 | replace: 'clang' 143 | tags: bpfabric 144 | - name: Compile BPFabric 145 | shell: make 146 | args: 147 | chdir: /home/ebpf/BPFabric 148 | tags: bpfabric -------------------------------------------------------------------------------- /examples/Makefile: -------------------------------------------------------------------------------- 1 | # SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) 2 | 3 | LLVM_VERSION ?= 4 | LLVM := $(shell clang$(LLVM_VERSION) --version) 5 | CLANG_FLAGS ?= -W -Wall -Wno-compare-distinct-pointer-types -g 6 | 7 | SRCS=$(wildcard *.c) 8 | OBJS=$(patsubst %.c,%.o,$(SRCS)) 9 | Q ?= @ 10 | 11 | INCLUDE_DIRS ?= -I../headers/ 12 | 13 | %.o: %.c 14 | @echo "\tLLVM CC $@" 15 | $(Q) clang$(LLVM_VERSION) $(INCLUDE_DIRS) -O2 -emit-llvm -c $< $(CLANG_FLAGS) 
-o $(patsubst %.o,%.llvm,$@) 16 | $(Q) llc$(LLVM_VERSION) -march=bpf -filetype=obj -o $@ $(patsubst %.o,%.llvm,$@) 17 | $(Q) rm $(patsubst %.o,%.llvm,$@) 18 | 19 | ifeq ($(LLVM),) 20 | all: 21 | $(warning Install LLVM to compile BPF sources) 22 | else 23 | all: $(OBJS) 24 | endif 25 | 26 | clean: 27 | rm -f *.llvm 28 | rm -f *.o 29 | 30 | .PHONY: all clean 31 | -------------------------------------------------------------------------------- /examples/chacha8.c: -------------------------------------------------------------------------------- 1 | 2 | 3 | //------------------------------------------------------------------ 4 | // Includes. 5 | //------------------------------------------------------------------ 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include "bpf_endian.h" 18 | #include "bpf_helpers.h" 19 | 20 | #ifndef IP_FRAGMENTED 21 | #define IP_FRAGMENTED 65343 22 | #endif 23 | 24 | #ifndef __inline 25 | #define __inline inline __attribute__((always_inline)) 26 | #endif 27 | 28 | #ifndef CHACHA_ROUNDS 29 | #define CHACHA_ROUNDS 8 30 | #endif 31 | 32 | #ifndef memset 33 | #define memset(dest, c_int, n) __builtin_memset((dest), (c_int), (n)) 34 | #endif 35 | 36 | //------------------------------------------------------------------ 37 | // Types. 38 | //------------------------------------------------------------------ 39 | // The chacha state context. 40 | typedef struct 41 | { 42 | uint32_t state[16]; 43 | uint8_t rounds; 44 | } chacha_ctx; 45 | 46 | 47 | //------------------------------------------------------------------ 48 | // Macros. 49 | //------------------------------------------------------------------ 50 | // Basic 32-bit operators. 
51 | #define ROTATE(v,c) ((uint32_t)((v) << (c)) | ((v) >> (32 - (c)))) 52 | #define XOR(v,w) ((v) ^ (w)) 53 | #define PLUS(v,w) ((uint32_t)((v) + (w))) 54 | #define PLUSONE(v) (PLUS((v), 1)) 55 | 56 | // Little endian machine assumed (x86-64). 57 | #define U32TO8_LITTLE(p, v) (((uint32_t*)(p))[0] = v) 58 | #define U8TO32_LITTLE(p) (((uint32_t*)(p))[0]) 59 | 60 | #define QUARTERROUND(a, b, c, d) \ 61 | x[a] = PLUS(x[a],x[b]); x[d] = ROTATE(XOR(x[d],x[a]),16); \ 62 | x[c] = PLUS(x[c],x[d]); x[b] = ROTATE(XOR(x[b],x[c]),12); \ 63 | x[a] = PLUS(x[a],x[b]); x[d] = ROTATE(XOR(x[d],x[a]), 8); \ 64 | x[c] = PLUS(x[c],x[d]); x[b] = ROTATE(XOR(x[b],x[c]), 7); 65 | 66 | 67 | //------------------------------------------------------------------ 68 | // Constants. 69 | //------------------------------------------------------------------ 70 | //static const uint8_t SIGMA[16] = "expand 32-byte k"; 71 | static const uint8_t TAU[16] = "expand 16-byte k"; 72 | 73 | 74 | //------------------------------------------------------------------ 75 | // doublerounds() 76 | // 77 | // Perform rounds/2 number of doublerounds. 78 | // TODO: Change output format to 16 words. 
79 | //------------------------------------------------------------------ 80 | static __inline void doublerounds(uint8_t output[64], const uint32_t input[16], uint8_t rounds) 81 | { 82 | uint32_t x[16]; 83 | int32_t i; 84 | 85 | #pragma clang loop unroll (full) 86 | for (i = 0;i < 16;++i) { 87 | x[i] = input[i]; 88 | } 89 | 90 | #pragma clang loop unroll (full) 91 | for (i = rounds ; i > 0 ; i -= 2) { 92 | QUARTERROUND( 0, 4, 8,12) 93 | QUARTERROUND( 1, 5, 9,13) 94 | QUARTERROUND( 2, 6,10,14) 95 | QUARTERROUND( 3, 7,11,15) 96 | 97 | QUARTERROUND( 0, 5,10,15) 98 | QUARTERROUND( 1, 6,11,12) 99 | QUARTERROUND( 2, 7, 8,13) 100 | QUARTERROUND( 3, 4, 9,14) 101 | } 102 | 103 | #pragma clang loop unroll (full) 104 | for (i = 0;i < 16;++i) { 105 | x[i] = PLUS(x[i], input[i]); 106 | } 107 | 108 | #pragma clang loop unroll (full) 109 | for (i = 0;i < 16;++i) { 110 | U32TO8_LITTLE(output + 4 * i, x[i]); 111 | } 112 | } 113 | 114 | 115 | 116 | //------------------------------------------------------------------ 117 | // init() 118 | // 119 | // Initializes the given cipher context with key, iv and constants. 120 | // This also resets the block counter. 121 | //------------------------------------------------------------------ 122 | static __inline void init(chacha_ctx *x, uint8_t *key, uint32_t keylen, uint8_t *iv) 123 | { 124 | /* 125 | if (keylen == 256) { 126 | // 256 bit key. 127 | x->state[0] = U8TO32_LITTLE(SIGMA + 0); 128 | x->state[1] = U8TO32_LITTLE(SIGMA + 4); 129 | x->state[2] = U8TO32_LITTLE(SIGMA + 8); 130 | x->state[3] = U8TO32_LITTLE(SIGMA + 12); 131 | x->state[4] = U8TO32_LITTLE(key + 0); 132 | x->state[5] = U8TO32_LITTLE(key + 4); 133 | x->state[6] = U8TO32_LITTLE(key + 8); 134 | x->state[7] = U8TO32_LITTLE(key + 12); 135 | x->state[8] = U8TO32_LITTLE(key + 16); 136 | x->state[9] = U8TO32_LITTLE(key + 20); 137 | x->state[10] = U8TO32_LITTLE(key + 24); 138 | x->state[11] = U8TO32_LITTLE(key + 28); 139 | } 140 | 141 | else { 142 | // 128 bit key. 
    x->state[0] = U8TO32_LITTLE(TAU + 0);
    x->state[1] = U8TO32_LITTLE(TAU + 4);
    x->state[2] = U8TO32_LITTLE(TAU + 8);
    x->state[3] = U8TO32_LITTLE(TAU + 12);
    x->state[4] = U8TO32_LITTLE(key + 0);
    x->state[5] = U8TO32_LITTLE(key + 4);
    x->state[6] = U8TO32_LITTLE(key + 8);
    x->state[7] = U8TO32_LITTLE(key + 12);
    x->state[8] = U8TO32_LITTLE(key + 0);
    x->state[9] = U8TO32_LITTLE(key + 4);
    x->state[10] = U8TO32_LITTLE(key + 8);
    x->state[11] = U8TO32_LITTLE(key + 12);
  }
  */
  // 128 bit key.
  x->state[0] = U8TO32_LITTLE(TAU + 0);
  x->state[1] = U8TO32_LITTLE(TAU + 4);
  x->state[2] = U8TO32_LITTLE(TAU + 8);
  x->state[3] = U8TO32_LITTLE(TAU + 12);
  x->state[4] = U8TO32_LITTLE(key + 0);
  x->state[5] = U8TO32_LITTLE(key + 4);
  x->state[6] = U8TO32_LITTLE(key + 8);
  x->state[7] = U8TO32_LITTLE(key + 12);
  x->state[8] = U8TO32_LITTLE(key + 0);
  x->state[9] = U8TO32_LITTLE(key + 4);
  x->state[10] = U8TO32_LITTLE(key + 8);
  x->state[11] = U8TO32_LITTLE(key + 12);

  // Reset block counter and add IV to state.
  x->state[12] = 0;
  x->state[13] = 0;
  x->state[14] = U8TO32_LITTLE(iv + 0);
  x->state[15] = U8TO32_LITTLE(iv + 4);
}


//------------------------------------------------------------------
// next()
//
// Given a pointer m to the next block of 64 cleartext bytes, uses
// the given context to transform (encrypt/decrypt) the block.
// The result is written back to m in place.
//------------------------------------------------------------------
static __inline void next(chacha_ctx *ctx, uint8_t *m, const uint8_t *m_end)
{
  // Temporary internal state x.
  uint8_t x[64];
  uint8_t i;

  // Update the internal state and increase the block counter.
  doublerounds(x, ctx->state, ctx->rounds);
  ctx->state[12] = PLUSONE(ctx->state[12]);
  if (!ctx->state[12]) {
    ctx->state[13] = PLUSONE(ctx->state[13]);
  }

  // XOR the input block with the new temporal state to
  // create the transformed block.
  /*
  if (m+64 > m_end) {
    return;
  }

#pragma clang loop unroll (full)
  for (i = 0 ; i < 64 ; ++i) {
    //c[i] = m[i] ^ x[i];
    m[i] ^= x[i];
  }
  */
  uint64_t * m_pos;
  uint64_t * x_pos;
#pragma clang loop unroll (full)
  for (i = 0 ; i < 8 ; ++i) {
    //c[i] = m[i] ^ x[i];
    m_pos = (uint64_t*)(m) + i;
    x_pos = (uint64_t*)(x) + i;
    *m_pos ^= *x_pos;
  }
}


//------------------------------------------------------------------
// init_ctx()
//
// Init a given ChaCha context by setting state to zero and
// setting the given number of rounds.
//------------------------------------------------------------------
static __inline void init_ctx(chacha_ctx *ctx, uint8_t rounds)
{
  uint8_t i;

#pragma clang loop unroll (full)
  for (i = 0 ; i < 16 ; i++) {
    ctx->state[i] = 0;
  }
  ctx->rounds = rounds;
}

/*
static __inline int parse_tcp(struct xdp_md *ctx, __u64 nf_off) {
  void *data_end = (void*)(long)ctx->data_end;
  void *data = (void*)(long)ctx->data;
  chacha_ctx cha_ctx;
  //struct tcphdr *tcph;
  uint8_t *pkt_data;
  __u32 tcp_off;
  //tcph = data + nf_off;
  // pkt_data is the tcp payload
  tcp_off = sizeof(struct tcphdr);
  nf_off += tcp_off;
  if (data + nf_off > data_end) {
    return XDP_DROP;
  }
  pkt_data = data + nf_off;
  return XDP_PASS;
}*/

static __always_inline bool parse_transport(void *data, __u64 off, void *data_end) {
  struct udphdr *tudp;
  tudp = data + off;
  if
(tudp + 1 > data_end) {
    return false;
  }
  else {
    return true;
  }
}


static __inline int parse_ip(struct xdp_md *ctx, __u64 nf_off) {
  void *data_end = (void*)(long)ctx->data_end;
  void *data = (void*)(long)ctx->data;
  struct iphdr *iph;
  __u32 ip_off;
  __u8 ip_protocol;

  iph = data + nf_off;
  if (iph + 1 > data_end) {
    return XDP_DROP;
  }
  if (iph->ihl != 5) {
    return XDP_DROP;
  }
  ip_protocol = iph->protocol;
  ip_off = sizeof(struct iphdr);
  nf_off += ip_off;

  if (iph->frag_off & IP_FRAGMENTED) {
    return XDP_DROP;
  }

  if (ip_protocol == IPPROTO_TCP) {
    if (!parse_transport(data, nf_off, data_end)) {
      return XDP_DROP;
    }
    else {
      nf_off += sizeof(struct tcphdr);
    }
  }
  else if (ip_protocol == IPPROTO_UDP) {
    if (!parse_transport(data, nf_off, data_end)) {
      return XDP_DROP;
    }
    else {
      nf_off += sizeof(struct udphdr);
    }
  }
  else {
    return XDP_PASS;
  }

  chacha_ctx cha_ctx;
  uint8_t *pkt_data = data + nf_off;
  /*
  uint8_t t_result[64] = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                          0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                          0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                          0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                          0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                          0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                          0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                          0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
  */
  uint8_t t_key[32] = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                       0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                       0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                       0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

  uint8_t t_iv[8] = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
  init_ctx(&cha_ctx, CHACHA_ROUNDS);
  init(&cha_ctx, t_key, 128, t_iv);

  // loop here
  int32_t i;
#pragma clang loop unroll (full)
  for(i=0; i <= 23; i++) {
    if (pkt_data + 64 > data_end) {
      break;
    }
    //next(&cha_ctx, pkt_data, data_end, t_result);
    next(&cha_ctx, pkt_data, data_end);
    //memcpy(pkt_data, t_result, sizeof(t_result));
    pkt_data += (__u64)64;
  }
  return XDP_TX;
}


SEC("chacha")
int cha(struct xdp_md *ctx){
  void *data_end = (void *)(long)ctx->data_end;
  void *data = (void *)(long)ctx->data;
  struct ethhdr *eth = data;
  __u32 eth_proto;
  __u32 nh_off;
  nh_off = sizeof(struct ethhdr);
  if (data + nh_off > data_end)
    return XDP_PASS;
  eth_proto = eth->h_proto;

  // the demo program only accepts IPv4 packets.
  if (eth_proto == bpf_htons(ETH_P_IP)) {
    return parse_ip(ctx, nh_off);
  }
  else {
    return XDP_PASS;
  }
}


--------------------------------------------------------------------------------
/examples/dropworld.c:
--------------------------------------------------------------------------------
#include

int prog(struct xdp_md *ctx){
    return XDP_DROP;
}
--------------------------------------------------------------------------------
/examples/layercoop.c:
--------------------------------------------------------------------------------
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

#include "bpf_endian.h"
#include "bpf_helpers.h"

struct pair {
    uint32_t lip; // local IP
    uint32_t rip; // remote IP
};

struct stats {
    uint64_t tx_cnt;
    uint64_t rx_cnt;
    uint64_t tx_bytes;
    uint64_t rx_bytes;
};

struct bpf_elf_map SEC("maps") trackers = {
    .type = BPF_MAP_TYPE_HASH,
    .size_key = sizeof(struct pair),
    .size_value = sizeof(struct stats),
    .max_elem = 2048,
    .pinning = 2, // PIN_GLOBAL_NS
};

static bool parse_ipv4(bool is_rx, void* data, void* data_end, struct pair *pair){
    struct ethhdr *eth = data;
    struct iphdr *ip;

    if(data + sizeof(struct ethhdr) > data_end)
        return false;

    if(bpf_ntohs(eth->h_proto) != ETH_P_IP)
        return false;

    ip = data + sizeof(struct ethhdr);

    if ((void*) ip + sizeof(struct iphdr) > data_end)
        return false;

    pair->lip = is_rx ? ip->daddr : ip->saddr;
    pair->rip = is_rx ? ip->saddr : ip->daddr;

    return true;
}

static void update_stats(bool is_rx, struct pair *key, long long bytes){
    struct stats *stats, newstats = {0,0,0,0};

    stats = bpf_map_lookup_elem(&trackers, key);
    if(stats){
        if(is_rx){
            stats->rx_cnt++;
            stats->rx_bytes += bytes;
        }else{
            stats->tx_cnt++;
            stats->tx_bytes += bytes;
        }
    }else{
        if(is_rx){
            newstats.rx_cnt = 1;
            newstats.rx_bytes = bytes;
        }else{
            newstats.tx_cnt = 1;
            newstats.tx_bytes = bytes;
        }

        bpf_map_update_elem(&trackers, key, &newstats, BPF_NOEXIST);
    }
}

SEC("rx")
int track_rx(struct xdp_md *ctx)
{
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;
    struct pair pair;

    if(!parse_ipv4(true,data,data_end,&pair))
        return XDP_PASS;

    // Update RX statistics
    update_stats(true,&pair,data_end-data);

    return XDP_PASS;
}

SEC("tx")
int track_tx(struct __sk_buff *skb)
{
    void *data_end = (void *)(long)skb->data_end;
    void *data = (void *)(long)skb->data;
    struct pair pair;

    if(!parse_ipv4(false,data,data_end,&pair))
        return TC_ACT_OK;

    // Update TX statistics
    update_stats(false,&pair,data_end-data);

    return TC_ACT_OK;
}
--------------------------------------------------------------------------------
/examples/portfilter.c:
--------------------------------------------------------------------------------
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include "bpf_endian.h"
#include "bpf_helpers.h"

/* 0x3FFF mask (MF flag + fragment offset) for the frag_off field.
 * Stored byte-swapped (65343 == 0xFF3F) so it can be applied directly
 * to the network-order field on a little-endian host.
 */
#define IP_FRAGMENTED 65343

/* Port number to be dropped */
#define PORT_DROP 80

static __always_inline int process_packet(struct xdp_md *ctx, __u64 off){

    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;
    struct iphdr *iph;
    struct tcphdr *tcp;
    __u16 payload_len;
    __u8 protocol;

    iph = data + off;
    if (iph + 1 > data_end)
        return XDP_PASS;
    if (iph->ihl != 5)
        return XDP_PASS;

    protocol = iph->protocol;
    payload_len = bpf_ntohs(iph->tot_len);
    off += sizeof(struct iphdr);

    /* do not support fragmented packets as L4 headers may be missing */
    if (iph->frag_off & IP_FRAGMENTED)
        return XDP_PASS;

    if (protocol == IPPROTO_TCP) {
        tcp = data + off;
        if(tcp + 1 > data_end)
            return XDP_PASS;

        /* Drop if using port PORT_DROP */
        if(tcp->source == bpf_htons(PORT_DROP) || tcp->dest == bpf_htons(PORT_DROP))
            return XDP_DROP;
        else
            return XDP_PASS;

    } else if (protocol == IPPROTO_UDP) {
        return XDP_PASS;
    }

    return XDP_PASS;
}


SEC("filter")
int pfilter(struct xdp_md *ctx){

    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;
    struct ethhdr *eth = data;
    __u32 eth_proto;
    __u32 nh_off;

    nh_off = sizeof(struct ethhdr);
    if (data + nh_off > data_end)
        return XDP_PASS;
    eth_proto =
eth->h_proto;

    /* demo program only accepts ipv4 packets */
    if (eth_proto == bpf_htons(ETH_P_IP))
        return process_packet(ctx, nh_off);
    else
        return XDP_PASS;
}
--------------------------------------------------------------------------------
/examples/tcpfilter.c:
--------------------------------------------------------------------------------
#include
#include
#include
#include
#include
#include "bpf_endian.h"

int isTCP( struct xdp_md *ctx ) {
    void *data_end = (void *)(long) ctx->data_end;
    void *data_begin = (void *)(long) ctx->data;
    struct ethhdr* eth = data_begin;

    // Check packet's size
    if(eth + 1 > data_end)
        return XDP_PASS;

    // Check if Ethernet frame has IPv4 packet
    if (eth->h_proto == bpf_htons( ETH_P_IP )) {
        struct iphdr *ipv4 = (struct iphdr *)( ((void*)eth) + ETH_HLEN );

        if(ipv4 + 1 > data_end)
            return XDP_PASS;

        // Check if IPv4 packet contains a TCP segment
        if (ipv4->protocol == IPPROTO_TCP)
            return XDP_PASS;
    }
    return XDP_DROP;
}
--------------------------------------------------------------------------------
/headers/bpf_endian.h:
--------------------------------------------------------------------------------
/* SPDX-License-Identifier: GPL-2.0 */
/* Copied from $(LINUX)/tools/testing/selftests/bpf/bpf_endian.h */
#ifndef __BPF_ENDIAN__
#define __BPF_ENDIAN__

#include

/* LLVM's BPF target selects the endianness of the CPU
 * it compiles on, or the user specifies (bpfel/bpfeb),
 * respectively. The used __BYTE_ORDER__ is defined by
 * the compiler, we cannot rely on __BYTE_ORDER from
 * libc headers, since it doesn't reflect the actual
 * requested byte order.
 *
 * Note, LLVM's BPF target has different __builtin_bswapX()
 * semantics.
It does map to BPF_ALU | BPF_END | BPF_TO_BE
 * in bpfel and bpfeb case, which means below, that we map
 * to cpu_to_be16(). We could use it unconditionally in BPF
 * case, but better not rely on it, so that this header here
 * can be used from application and BPF program side, which
 * use different targets.
 */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
# define __bpf_ntohs(x)             __builtin_bswap16(x)
# define __bpf_htons(x)             __builtin_bswap16(x)
# define __bpf_constant_ntohs(x)    ___constant_swab16(x)
# define __bpf_constant_htons(x)    ___constant_swab16(x)
# define __bpf_ntohl(x)             __builtin_bswap32(x)
# define __bpf_htonl(x)             __builtin_bswap32(x)
# define __bpf_constant_ntohl(x)    ___constant_swab32(x)
# define __bpf_constant_htonl(x)    ___constant_swab32(x)
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
# define __bpf_ntohs(x)             (x)
# define __bpf_htons(x)             (x)
# define __bpf_constant_ntohs(x)    (x)
# define __bpf_constant_htons(x)    (x)
# define __bpf_ntohl(x)             (x)
# define __bpf_htonl(x)             (x)
# define __bpf_constant_ntohl(x)    (x)
# define __bpf_constant_htonl(x)    (x)
#else
# error "Fix your compiler's __BYTE_ORDER__?!"
#endif

#define bpf_htons(x)                \
    (__builtin_constant_p(x) ?      \
     __bpf_constant_htons(x) : __bpf_htons(x))
#define bpf_ntohs(x)                \
    (__builtin_constant_p(x) ?      \
     __bpf_constant_ntohs(x) : __bpf_ntohs(x))
#define bpf_htonl(x)                \
    (__builtin_constant_p(x) ?      \
     __bpf_constant_htonl(x) : __bpf_htonl(x))
#define bpf_ntohl(x)                \
    (__builtin_constant_p(x) ?      \
     __bpf_constant_ntohl(x) : __bpf_ntohl(x))

#endif /* __BPF_ENDIAN__ */

--------------------------------------------------------------------------------
/headers/bpf_helpers.h:
--------------------------------------------------------------------------------
/* SPDX-License-Identifier: GPL-2.0 */
/* Copied from $(LINUX)/tools/testing/selftests/bpf/bpf_helpers.h */
#ifndef __BPF_HELPERS_H
#define __BPF_HELPERS_H

/* helper macro to place programs, maps, license in
 * different sections in elf_bpf file. Section names
 * are interpreted by elf_bpf loader
 */
#define SEC(NAME) __attribute__((section(NAME), used))

/* helper functions called from eBPF programs written in C */
static void *(*bpf_map_lookup_elem)(void *map, void *key) =
    (void *) BPF_FUNC_map_lookup_elem;
static int (*bpf_map_update_elem)(void *map, void *key, void *value,
                                  unsigned long long flags) =
    (void *) BPF_FUNC_map_update_elem;
static int (*bpf_map_delete_elem)(void *map, void *key) =
    (void *) BPF_FUNC_map_delete_elem;
static int (*bpf_probe_read)(void *dst, int size, void *unsafe_ptr) =
    (void *) BPF_FUNC_probe_read;
static unsigned long long (*bpf_ktime_get_ns)(void) =
    (void *) BPF_FUNC_ktime_get_ns;
static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...)
=
    (void *) BPF_FUNC_trace_printk;
static void (*bpf_tail_call)(void *ctx, void *map, int index) =
    (void *) BPF_FUNC_tail_call;
static unsigned long long (*bpf_get_smp_processor_id)(void) =
    (void *) BPF_FUNC_get_smp_processor_id;
static unsigned long long (*bpf_get_current_pid_tgid)(void) =
    (void *) BPF_FUNC_get_current_pid_tgid;
static unsigned long long (*bpf_get_current_uid_gid)(void) =
    (void *) BPF_FUNC_get_current_uid_gid;
static int (*bpf_get_current_comm)(void *buf, int buf_size) =
    (void *) BPF_FUNC_get_current_comm;
static unsigned long long (*bpf_perf_event_read)(void *map,
                                                 unsigned long long flags) =
    (void *) BPF_FUNC_perf_event_read;
static int (*bpf_clone_redirect)(void *ctx, int ifindex, int flags) =
    (void *) BPF_FUNC_clone_redirect;
static int (*bpf_redirect)(int ifindex, int flags) =
    (void *) BPF_FUNC_redirect;
static int (*bpf_perf_event_output)(void *ctx, void *map,
                                    unsigned long long flags, void *data,
                                    int size) =
    (void *) BPF_FUNC_perf_event_output;
static int (*bpf_get_stackid)(void *ctx, void *map, int flags) =
    (void *) BPF_FUNC_get_stackid;
static int (*bpf_probe_write_user)(void *dst, void *src, int size) =
    (void *) BPF_FUNC_probe_write_user;
static int (*bpf_current_task_under_cgroup)(void *map, int index) =
    (void *) BPF_FUNC_current_task_under_cgroup;
static int (*bpf_skb_get_tunnel_key)(void *ctx, void *key, int size, int flags) =
    (void *) BPF_FUNC_skb_get_tunnel_key;
static int (*bpf_skb_set_tunnel_key)(void *ctx, void *key, int size, int flags) =
    (void *) BPF_FUNC_skb_set_tunnel_key;
static int (*bpf_skb_get_tunnel_opt)(void *ctx, void *md, int size) =
    (void *) BPF_FUNC_skb_get_tunnel_opt;
static int (*bpf_skb_set_tunnel_opt)(void *ctx, void *md, int size) =
    (void *) BPF_FUNC_skb_set_tunnel_opt;
static unsigned long long (*bpf_get_prandom_u32)(void) =
    (void *) BPF_FUNC_get_prandom_u32;
static int (*bpf_xdp_adjust_head)(void *ctx, int offset) =
    (void *) BPF_FUNC_xdp_adjust_head;

/* llvm builtin functions that eBPF C program may use to
 * emit BPF_LD_ABS and BPF_LD_IND instructions
 */
struct sk_buff;
unsigned long long load_byte(void *skb,
                             unsigned long long off) asm("llvm.bpf.load.byte");
unsigned long long load_half(void *skb,
                             unsigned long long off) asm("llvm.bpf.load.half");
unsigned long long load_word(void *skb,
                             unsigned long long off) asm("llvm.bpf.load.word");

/* a helper structure used by eBPF C program
 * to describe map attributes to elf_bpf loader
 */
struct bpf_map_def {
    unsigned int type;
    unsigned int key_size;
    unsigned int value_size;
    unsigned int max_entries;
    unsigned int map_flags;
    unsigned int inner_map_idx;
};

static int (*bpf_skb_load_bytes)(void *ctx, int off, void *to, int len) =
    (void *) BPF_FUNC_skb_load_bytes;
static int (*bpf_skb_store_bytes)(void *ctx, int off, void *from, int len, int flags) =
    (void *) BPF_FUNC_skb_store_bytes;
static int (*bpf_l3_csum_replace)(void *ctx, int off, int from, int to, int flags) =
    (void *) BPF_FUNC_l3_csum_replace;
static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flags) =
    (void *) BPF_FUNC_l4_csum_replace;
static int (*bpf_skb_under_cgroup)(void *ctx, void *map, int index) =
    (void *) BPF_FUNC_skb_under_cgroup;
static int (*bpf_skb_change_head)(void *, int len, int flags) =
    (void *) BPF_FUNC_skb_change_head;

#if defined(__x86_64__)

#define PT_REGS_PARM1(x) ((x)->di)
#define PT_REGS_PARM2(x) ((x)->si)
#define PT_REGS_PARM3(x) ((x)->dx)
#define PT_REGS_PARM4(x) ((x)->cx)
#define PT_REGS_PARM5(x) ((x)->r8)
#define PT_REGS_RET(x) ((x)->sp)
#define PT_REGS_FP(x) ((x)->bp)
#define
PT_REGS_RC(x) ((x)->ax)
#define PT_REGS_SP(x) ((x)->sp)
#define PT_REGS_IP(x) ((x)->ip)

#elif defined(__s390x__)

#define PT_REGS_PARM1(x) ((x)->gprs[2])
#define PT_REGS_PARM2(x) ((x)->gprs[3])
#define PT_REGS_PARM3(x) ((x)->gprs[4])
#define PT_REGS_PARM4(x) ((x)->gprs[5])
#define PT_REGS_PARM5(x) ((x)->gprs[6])
#define PT_REGS_RET(x) ((x)->gprs[14])
#define PT_REGS_FP(x) ((x)->gprs[11]) /* Works only with CONFIG_FRAME_POINTER */
#define PT_REGS_RC(x) ((x)->gprs[2])
#define PT_REGS_SP(x) ((x)->gprs[15])
#define PT_REGS_IP(x) ((x)->psw.addr)

#elif defined(__aarch64__)

#define PT_REGS_PARM1(x) ((x)->regs[0])
#define PT_REGS_PARM2(x) ((x)->regs[1])
#define PT_REGS_PARM3(x) ((x)->regs[2])
#define PT_REGS_PARM4(x) ((x)->regs[3])
#define PT_REGS_PARM5(x) ((x)->regs[4])
#define PT_REGS_RET(x) ((x)->regs[30])
#define PT_REGS_FP(x) ((x)->regs[29]) /* Works only with CONFIG_FRAME_POINTER */
#define PT_REGS_RC(x) ((x)->regs[0])
#define PT_REGS_SP(x) ((x)->sp)
#define PT_REGS_IP(x) ((x)->pc)

#elif defined(__powerpc__)

#define PT_REGS_PARM1(x) ((x)->gpr[3])
#define PT_REGS_PARM2(x) ((x)->gpr[4])
#define PT_REGS_PARM3(x) ((x)->gpr[5])
#define PT_REGS_PARM4(x) ((x)->gpr[6])
#define PT_REGS_PARM5(x) ((x)->gpr[7])
#define PT_REGS_RC(x) ((x)->gpr[3])
#define PT_REGS_SP(x) ((x)->sp)
#define PT_REGS_IP(x) ((x)->nip)

#elif defined(__sparc__)

#define PT_REGS_PARM1(x) ((x)->u_regs[UREG_I0])
#define PT_REGS_PARM2(x) ((x)->u_regs[UREG_I1])
#define PT_REGS_PARM3(x) ((x)->u_regs[UREG_I2])
#define PT_REGS_PARM4(x) ((x)->u_regs[UREG_I3])
#define PT_REGS_PARM5(x) ((x)->u_regs[UREG_I4])
#define PT_REGS_RET(x) ((x)->u_regs[UREG_I7])
#define PT_REGS_RC(x) ((x)->u_regs[UREG_I0])
#define PT_REGS_SP(x)
((x)->u_regs[UREG_FP])
#if defined(__arch64__)
#define PT_REGS_IP(x) ((x)->tpc)
#else
#define PT_REGS_IP(x) ((x)->pc)
#endif

#endif

#ifdef __powerpc__
#define BPF_KPROBE_READ_RET_IP(ip, ctx) ({ (ip) = (ctx)->link; })
#define BPF_KRETPROBE_READ_RET_IP BPF_KPROBE_READ_RET_IP
#elif defined(__sparc__)
#define BPF_KPROBE_READ_RET_IP(ip, ctx) ({ (ip) = PT_REGS_RET(ctx); })
#define BPF_KRETPROBE_READ_RET_IP BPF_KPROBE_READ_RET_IP
#else
#define BPF_KPROBE_READ_RET_IP(ip, ctx) ({ \
    bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); })
#define BPF_KRETPROBE_READ_RET_IP(ip, ctx) ({ \
    bpf_probe_read(&(ip), sizeof(ip), \
                   (void *)(PT_REGS_FP(ctx) + sizeof(ip))); })
#endif

#endif

--------------------------------------------------------------------------------
/headers/common.h:
--------------------------------------------------------------------------------
#ifndef COMMON_H_
#define COMMON_H_

// Nicer way to call bpf_trace_printk()
#define bpf_custom_printk(fmt, ...)
\
({ \
    char ____fmt[] = fmt; \
    bpf_trace_printk(____fmt, sizeof(____fmt), \
                     ##__VA_ARGS__); \
})

#endif /* COMMON_H_ */
--------------------------------------------------------------------------------
/images/vbox-create.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/racyusdelanoo/bpf-tutorial/d2c78e7abe416ae91efbe296b1a85384df94baed/images/vbox-create.png

--------------------------------------------------------------------------------
/images/vbox-disk.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/racyusdelanoo/bpf-tutorial/d2c78e7abe416ae91efbe296b1a85384df94baed/images/vbox-disk.png

--------------------------------------------------------------------------------
/images/vbox-hostonly.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/racyusdelanoo/bpf-tutorial/d2c78e7abe416ae91efbe296b1a85384df94baed/images/vbox-hostonly.png

--------------------------------------------------------------------------------
/images/vbox-memory.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/racyusdelanoo/bpf-tutorial/d2c78e7abe416ae91efbe296b1a85384df94baed/images/vbox-memory.png

--------------------------------------------------------------------------------
/images/vbox-nat.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/racyusdelanoo/bpf-tutorial/d2c78e7abe416ae91efbe296b1a85384df94baed/images/vbox-nat.png

--------------------------------------------------------------------------------
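A note on `IP_FRAGMENTED` in `examples/portfilter.c`: the comment there describes a `0x3FFF` mask, while the macro is defined as `65343`. The host-side sketch below is not part of the repository and only illustrates one plausible reading: `65343` is `0xFF3F`, which is `0x3FFF` with its two bytes swapped, so the mask can be applied to the network-order `frag_off` field without converting it first on a little-endian machine. The names `swap16`, `FRAG_MASK_HOST` and `is_fragment_le` are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Hypothetical helper (not in the repository): swap the two bytes
 * of a 16-bit value. */
static inline uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* More-Fragments flag (0x2000) plus the 13-bit fragment offset
 * (0x1FFF), written in host byte order. */
#define FRAG_MASK_HOST 0x3FFFu

/* Assumes a little-endian host: frag_off_net holds the raw
 * network-order field as it would be read from the IPv4 header.
 * Returns 1 if the MF flag or a non-zero offset is set, else 0. */
static inline int is_fragment_le(uint16_t frag_off_net)
{
    return (frag_off_net & swap16(FRAG_MASK_HOST)) != 0;
}
```

Under these assumptions, a header carrying only the Don't-Fragment bit (`0x4000` in host order) is not treated as a fragment, while one with the More-Fragments bit (`0x2000`) is. On a big-endian target the swapped constant would not apply, which is consistent with the "little endian machine assumed" remark in `examples/chacha8.c`.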