├── .gitignore
├── README.md
├── ansible
│   ├── README.md
│   └── provision.yml
├── examples
│   ├── Makefile
│   ├── chacha8.c
│   ├── dropworld.c
│   ├── layercoop.c
│   ├── portfilter.c
│   └── tcpfilter.c
├── headers
│   ├── bpf_endian.h
│   ├── bpf_helpers.h
│   └── common.h
└── images
    ├── vbox-create.png
    ├── vbox-disk.png
    ├── vbox-hostonly.png
    ├── vbox-memory.png
    └── vbox-nat.png
/.gitignore:
--------------------------------------------------------------------------------
1 | .vagrant/*
2 | *.log
3 | *.retry
4 | .vscode/*
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Fast Packet Processing with eBPF and XDP: Concepts, Code, Challenges and Applications
2 |
3 | ## About
4 |
5 | This repository presents complementary material to the paper "Fast Packet Processing with eBPF and XDP: Concepts, Code, Challenges and Applications" submitted to ACM Computing Surveys (CSUR).
6 |
7 | The contents are organized as follows:
8 | - `ansible/`: Ansible playbook used to install the required dependencies during VM creation
9 | - `examples/`: examples of eBPF programs
10 | - `headers/`: header files needed to compile the examples
11 | - `images/`: images used in this README
12 |
13 | ## Virtual Machine
14 |
15 | We created a virtual machine to be used in this tutorial. It contains all the code and tools required to complete the tutorial step-by-step.
16 |
17 | - [Download VirtualBox VM](https://www.winet.dcc.ufmg.br/ebpf/minicurso-ebpf-sbrc2019.rar) (user: *ebpf*, password: *ebpf*)
18 |
19 | The virtual machine contains the following items:
20 | - kernel v5.0.0
21 | - iproute2-ss190319
22 | - llvm 6.0.0
23 | - bpftool
24 |
25 | The directory `/home/ebpf` includes a copy of this repository and also local copies of the following projects:
26 | - [Linux kernel net-next](https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git)
27 | - [iproute2](https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git)
28 | - [prototype-kernel](https://github.com/netoptimizer/prototype-kernel.git)
29 |
30 | ## Import the virtual machine
31 |
32 | The following steps have been tested with VirtualBox 5.2.18 on Ubuntu.
33 |
34 | After downloading the VM image, extract the `.rar` archive. You should then see a file named `ebpf-vm.vdi`.
35 |
36 | Open the VirtualBox app and then create a new VM by pressing the `New` button and picking a name for it:
37 |
38 |
39 |
40 |
41 |
42 | Next, VirtualBox will allow modifications to the machine specification, such as the amount of RAM (this value can be modified later).
43 |
44 |
45 |
46 |
47 |
48 | In the next step, VirtualBox will ask for the desired hard disk option. Here you must use an existing disk, which corresponds to the downloaded VM image:
49 |
50 |
51 |
52 |
53 |
54 | Finally, it is necessary to configure the machine with two network interfaces: one in NAT mode (`eth0` - Internet access) and another in HostOnly mode (`eth1` - SSH access).
55 | After the VM is created, right-click on the VM name and then select the `Settings` option.
56 | In the `Network` tab, make sure these two interfaces are created:
57 |
58 |
59 |
60 |
61 |
62 |
63 |
64 |
65 | Finished! The machine is now ready for the tutorial.
66 |
67 | ## Compiling kernel examples
68 |
69 | The kernel source code has several sample programs, available in the following directories:
70 | - `samples/bpf`
71 | - `tools/testing/selftests/bpf`
72 |
73 | Here we present two examples from the `samples/bpf` folder. To compile them, run the following commands:
74 |
75 | cd ~/net-next/
76 | make headers_install
77 | make samples/bpf/
78 |
79 | ## Compiling local examples
80 |
81 | The examples provided in this repository in the `examples/` folder are accompanied by a Makefile. To compile them, run:
82 |
83 | cd examples/
84 | make
85 |
86 | **P.S.**: The dependencies required for compilation are already installed on the virtual machine, so we recommend compiling the examples in that environment.
87 |
88 | ## Examples
89 |
90 | Below are the step-by-step instructions on how to compile and run each of the examples presented in the ACM CSUR paper, as well as some extra ones. In-depth explanations of each example are present in the paper.
91 |
92 | ### Example 0: **Drop World!**
93 |
94 | File location: `./examples/dropworld.c`
95 |
96 | This example is one of the simplest programs possible. It just discards all received packets.
97 |
98 | To compile it, run:
99 |
100 | cd ./examples/
101 | make
102 |
103 | Next, the compiled program can be loaded using the `ip` tool:
104 |
105 | sudo ip -force link set dev eth0 xdp obj dropworld.o sec .text
106 |
107 | The `.text` argument refers to the ELF section in which the program is located. Check out the paper for more details.
108 |
109 | It is possible to check the status of the program by using the following command:
110 |
111 | ip link show eth0
112 |
113 | Expected output:
114 |
115 | ebpf@ebpf-vm:~/bpf-tutorial/examples$ ip link show eth0
116 | 2: eth0: mtu 1500 xdpgeneric qdisc fq_codel state UP mode DEFAULT group default qlen 1000
117 | link/ether 08:00:27:58:07:42 brd ff:ff:ff:ff:ff:ff
118 | prog/xdp id 19
119 |
120 | To remove the program, just run:
121 |
122 | sudo ip link set dev eth0 xdp off
123 |
124 | After the removal, the interface status will be as follows:
125 |
126 | ebpf@ebpf-vm:~/bpf-tutorial/examples$ ip link show eth0
127 | 2: eth0: mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
128 | link/ether 08:00:27:58:07:42 brd ff:ff:ff:ff:ff:ff
129 |
130 | Since in this case the `eth0` interface is used for Internet access, discarding packets received by this interface will effectively cut off web access.
131 |
132 | In another terminal, start a process to ping some domain on the Internet:
133 |
134 | ping google.com
135 |
136 | Then load the `dropworld.o` program into the `eth0` interface and notice that the ping responses will be interrupted. This interruption will happen because all response messages sent to the `eth0` interface will be discarded by the loaded program.
137 |
138 | **Extra**: Modify the `dropworld.c` file by changing the return value from `XDP_DROP` to `XDP_PASS`. Then compile and repeat the loading process. Observe that, in this case, the ping responses will still be received. Thus, this new program is effectively an empty operation, which merely receives and passes packets up to the kernel stack.
139 |
140 | ### Example 1: **TCP filter**
141 |
142 | File location: `./examples/tcpfilter.c`
143 |
144 | This example parses packets received on an interface and only accepts the ones with TCP segments. Filtering is done by parsing the IP header protocol field. Only packets with a protocol equal to 6, which corresponds to TCP, are accepted.
145 |
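The filtering decision described above can be sketched in plain user-space C. This is not the code from `tcpfilter.c`: the function name `tcp_only`, the `verdict` enum, and the choice to let non-IPv4 frames (such as ARP) through are assumptions made for illustration. The sketch works on a byte buffer holding an Ethernet frame rather than on an XDP context.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts standing in for XDP_DROP and XDP_PASS. */
enum verdict { VERDICT_DROP = 1, VERDICT_PASS = 2 };

#define ETH_HDR_LEN     14      /* destination MAC + source MAC + EtherType */
#define ETHERTYPE_IPV4  0x0800
#define IPV4_MIN_HDR    20      /* IPv4 header without options */
#define IPV4_PROTO_OFF  9       /* offset of the protocol field in the IPv4 header */
#define PROTO_TCP       6

/* Accept a frame only if it carries TCP over IPv4. */
static enum verdict tcp_only(const uint8_t *frame, size_t len)
{
    /* Bounds check before touching the Ethernet header. */
    if (len < ETH_HDR_LEN)
        return VERDICT_DROP;

    /* EtherType is stored in network byte order (big endian). */
    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype != ETHERTYPE_IPV4)
        return VERDICT_PASS;    /* assumption: non-IPv4 traffic is left alone */

    /* Bounds check before touching the IPv4 header. */
    if (len < ETH_HDR_LEN + IPV4_MIN_HDR)
        return VERDICT_DROP;

    /* Protocol 6 is TCP; everything else (ICMP = 1, UDP = 17, ...) is dropped. */
    return frame[ETH_HDR_LEN + IPV4_PROTO_OFF] == PROTO_TCP ? VERDICT_PASS
                                                            : VERDICT_DROP;
}
```

For a frame whose IPv4 protocol byte is 1 (ICMP), `tcp_only` returns `VERDICT_DROP`, which corresponds to `ping` replies being discarded once the filter is loaded. The explicit length checks play the role of the bounds checks that the eBPF verifier requires before any packet access.
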
146 | Similar to the previous example, compile the program by running:
147 |
148 | cd ./examples/
149 | make
150 |
151 | Before loading the program, try pinging a domain name and test the access to a web page:
152 |
153 | ping google.com
154 | curl http://www.google.com
155 |
156 | The ping should succeed, and the second command should print the HTML code of the requested page. Since `ping` uses ICMP packets and HTTP operates over TCP, once the program is loaded the curl requests should keep receiving responses, while the ping responses should stop.
157 |
158 | Load the program using the `ip` tool:
159 |
160 | sudo ip -force link set dev eth0 xdp obj tcpfilter.o sec .text
161 |
162 | Now, try to access the same page again and then try to ping the same domain:
163 |
164 | curl http://www.google.com
165 | ping google.com
166 |
167 | Because of the program *tcpfilter.o*, non-TCP packets are discarded as soon as they reach the `eth0` interface, which prevents access to any service that does not operate over TCP.
168 |
169 | **Extra**: Modify the program in `tcpfilter.c` so that it only accepts ICMP packets (used by the `ping` utility). Also check the program in `portfilter.c`, which drops packets based on the application layer protocol used.
170 |
171 | ### Example 2: **User and kernel space interaction**
172 |
173 | File locations: `xdp1_kern.c` and `xdp1_user.c` in `samples/bpf/` in kernel source code (`~/net-next/samples/bpf/` in the VM).
174 |
175 | This example shows how to use maps in eBPF programs and how to interact with user space. The program in `xdp1_kern.c` extracts the layer 4 protocol number (TCP = 6, UDP = 17, ICMP = 1, etc) from each received packet, updates counters for each protocol and then discards the packets.
176 | The counter values are stored in a map named `rxcnt` and later consulted by the program `xdp1_user.c`, which executes in user space. Through the use of a map, both programs (one in the kernel and another in user space) can exchange information.
177 |
178 | To compile the programs, follow the instructions given earlier on how to compile sample programs from the Linux kernel.
179 |
180 | Unlike the previous examples, here the eBPF program is loaded into the kernel by the user-space program `xdp1_user.c`, without requiring the `ip` tool.
181 |
182 | After compilation, the `samples/bpf/` directory will contain the executable file `xdp1` (generated from `xdp1_user.c`).
183 |
184 | ebpf@ebpf-vm:~/net-next/samples/bpf$ ./xdp1
185 | usage: xdp1 [OPTS] IFACE
186 |
187 | OPTS:
188 | -S use skb-mode
189 | -N enforce native mode
190 |
191 | To load the program in the `eth0` interface, just pass it as a parameter to `xdp1`:
192 |
193 | sudo ./xdp1 eth0
194 |
195 | The program will go into an infinite loop, printing the rate of received packets (in packets per second) for each protocol number.
196 |
197 | ebpf@ebpf-vm:~/net-next/samples/bpf$ sudo ./xdp1 eth0
198 | proto 17: 1 pkt/s
199 | proto 17: 1 pkt/s
200 | proto 17: 1 pkt/s
201 | proto 0: 1 pkt/s
202 | proto 17: 1 pkt/s
203 |
204 | In another terminal, generate some traffic with tools such as `ping`, `curl`, or `wget` so that packets pass through the interface.
205 |
206 | It is possible to analyze the map content using `bpftool`, which is already compiled and installed on the provided VM. To do this, first check which eBPF programs are loaded on the system:
207 |
208 | sudo bpftool prog show
209 |
210 | Expected output:
211 |
212 | ebpf@ebpf-vm:~$ sudo bpftool prog show
213 | 2: cgroup_skb tag 7be49e3934a125ba gpl
214 | loaded_at 2019-04-23T12:24:29-0400 uid 0
215 | xlated 296B jited 229B memlock 4096B map_ids 2,3
216 | 3: cgroup_skb tag 2a142ef67aaad174 gpl
217 | loaded_at 2019-04-23T12:24:29-0400 uid 0
218 | xlated 296B jited 229B memlock 4096B map_ids 2,3
219 | 4: cgroup_skb tag 7be49e3934a125ba gpl
220 | loaded_at 2019-04-23T12:24:29-0400 uid 0
221 | xlated 296B jited 229B memlock 4096B map_ids 4,5
222 | 5: cgroup_skb tag 2a142ef67aaad174 gpl
223 | loaded_at 2019-04-23T12:24:29-0400 uid 0
224 | xlated 296B jited 229B memlock 4096B map_ids 4,5
225 | 6: cgroup_skb tag 7be49e3934a125ba gpl
226 | loaded_at 2019-04-23T12:24:35-0400 uid 0
227 | xlated 296B jited 229B memlock 4096B map_ids 6,7
228 | 7: cgroup_skb tag 2a142ef67aaad174 gpl
229 | loaded_at 2019-04-23T12:24:35-0400 uid 0
230 | xlated 296B jited 229B memlock 4096B map_ids 6,7
231 | 28: xdp name xdp_prog1 tag 539ec6ce11b52f98 gpl
232 | loaded_at 2019-04-23T14:34:06-0400 uid 0
233 | xlated 488B jited 336B memlock 4096B map_ids 14
234 |
235 | The last program listed corresponds to the XDP program loaded by `xdp1`. The output also indicates that it has a map with id 14. We can use this value to query the map content:
236 |
237 | sudo bpftool map dump id 14
238 |
239 | Expected output:
240 |
241 | ebpf@ebpf-vm:~$ sudo bpftool map dump id 14
242 | key:
243 | 00 00 00 00
244 | value (CPU 00): 4b 00 00 00 00 00 00 00
245 | key:
246 | 01 00 00 00
247 | value (CPU 00): 00 00 00 00 00 00 00 00
248 | key:
249 | 02 00 00 00
250 | value (CPU 00): 00 00 00 00 00 00 00 00
251 | ...
252 | (rest of output omitted)
253 |
254 | The map used is of type `BPF_MAP_TYPE_PERCPU_ARRAY`. As the name implies, it keeps a separate copy of the array for each CPU. In the map declaration, the number of elements has been set to `256`, so the `bpftool` output shows the `256` entries corresponding to CPU 0, the only one on the VM.
255 |
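To read the dump, note that each value is an 8-byte counter printed byte by byte, with the least significant byte first on the x86-64 VM. The following user-space sketch decodes one such value and adds up the per-CPU slots of a key. The helper names `le64_decode` and `percpu_total` are hypothetical; they are not part of `bpftool` or of the kernel samples.

```c
#include <stddef.h>
#include <stdint.h>

#define VALUE_SIZE 8   /* bytes per counter, as shown in the dump */

/* Decode 8 bytes stored least significant byte first into a 64-bit counter. */
static uint64_t le64_decode(const uint8_t b[VALUE_SIZE])
{
    uint64_t v = 0;
    for (int i = VALUE_SIZE - 1; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/*
 * Sum the counters of one key across CPUs. `values` holds ncpus
 * consecutive 8-byte slots, one per CPU, in CPU order.
 */
static uint64_t percpu_total(const uint8_t *values, size_t ncpus)
{
    uint64_t total = 0;
    for (size_t cpu = 0; cpu < ncpus; cpu++)
        total += le64_decode(values + cpu * VALUE_SIZE);
    return total;
}
```

With the bytes shown for key 0 (`4b 00 00 00 00 00 00 00`), `le64_decode` yields 75, that is, 75 packets counted under key 0 on CPU 0. On a machine with several CPUs, the total for a key is the sum of its per-CPU values.
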
256 | **Extra**: Change the program to let packets pass, rather than being dropped. Also, change the map type to `BPF_MAP_TYPE_HASH` and check its content using `bpftool`.
257 |
258 | ### Example 3: **Cooperation between XDP and TC**
259 |
260 | File location: `./examples/layercoop.c`
261 |
262 | This example uses two eBPF programs in different layers (XDP and TC) to collect joint statistics about communication between any pair of IPs that cross the corresponding interface.
263 |
264 | As before, to compile the example just run:
265 |
266 | cd ./examples/
267 | make
268 |
269 | Now, load and attach the program from section `rx` to the XDP layer on a chosen interface, say `eth0`:
270 |
271 | sudo ip link set dev eth0 xdp obj layercoop.o sec rx
272 |
273 | The extra flag `-force` after `ip` might be necessary if another XDP program was already attached to that interface.
274 |
275 | Next, we need to load the program responsible for handling the stats collection on TX (ELF section `tx`). But before that, we need to create the `clsact` qdisc on TC:
276 |
277 | sudo tc qdisc add dev eth0 clsact
278 |
279 | Now we can load the program on the TC `egress` hook, to run it on TX:
280 |
281 | sudo tc filter add dev eth0 egress bpf da obj layercoop.o sec tx
282 |
283 | From now on, every pair of communicating IPs will have an entry in the map shared by these programs, which can be inspected using `bpftool` as explained in the previous example.
284 |
285 | Finally, to unload both programs:
286 |
287 | sudo ip link set dev eth0 xdp off
288 | sudo tc filter del dev eth0 egress
289 |
290 | ## Extra examples
291 |
292 | ### Extra example 1: **Packet filtering by TCP port**
293 |
294 | File location: `./examples/portfilter.c`
295 |
296 | This example parses packets received on an interface and discards the ones with the HTTP protocol. Discarding is done by parsing the TCP header source and destination port fields. Packets in which one of these values is 80 are discarded.
297 |
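The port check described above can be sketched in plain user-space C. This is not the code from `portfilter.c`: the names `be16` and `uses_http_port` are hypothetical, and the sketch works on a byte buffer holding an Ethernet frame rather than on an XDP context. It also assumes IPv4 without VLAN tags.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN    14
#define ETHERTYPE_IPV4 0x0800
#define IPV4_MIN_HDR   20
#define PROTO_TCP      6
#define HTTP_PORT      80

/* Read a 16-bit value stored in network byte order (big endian). */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Return 1 if the frame is TCP over IPv4 with source or destination port 80. */
static int uses_http_port(const uint8_t *frame, size_t len)
{
    if (len < ETH_HDR_LEN + IPV4_MIN_HDR || be16(frame + 12) != ETHERTYPE_IPV4)
        return 0;

    const uint8_t *ip = frame + ETH_HDR_LEN;
    /* The low nibble of the first byte is the header length in 32-bit words. */
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
    if (ihl < IPV4_MIN_HDR || ip[9] != PROTO_TCP)
        return 0;

    /* Make sure both 16-bit port fields lie inside the buffer. */
    if (len < ETH_HDR_LEN + ihl + 4)
        return 0;

    const uint8_t *tcp = ip + ihl;
    return be16(tcp) == HTTP_PORT || be16(tcp + 2) == HTTP_PORT;
}
```

A filter built on this check would drop the frame when `uses_http_port` returns 1 and pass it otherwise. Reading the header length from the packet matters because IPv4 options shift the position of the TCP header.
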
298 | Similar to the previous example, compile the program by running:
299 |
300 | cd ./examples/
301 | make
302 |
303 | Before loading the program, test the access to a web page:
304 |
305 | curl http://www.google.com
306 |
307 | The output of this command should be a print of the requested page's HTML code.
308 |
309 | Load the program using the `ip` tool:
310 |
311 | sudo ip -force link set dev eth0 xdp obj portfilter.o sec filter
312 |
313 | Now, try to access the same page again:
314 |
315 | curl http://www.google.com
316 |
317 | Because of the program *portfilter.o*, packets with TCP port 80 are discarded as soon as they reach the `eth0` interface, preventing access to web pages served over HTTP.
318 |
319 | **Extra**: Modify the program in `portfilter.c` so that it discards all ICMP packets (used by the `ping` utility).
320 |
321 | ### Extra example 2: **Interaction between XDP and TC through metadata field**
322 |
323 | File locations: `xdp2skb_meta_kern.c` and `xdp2skb_meta.sh` in `samples/bpf/` in the kernel source code (`~/net-next/samples/bpf/` in the VM).
324 |
325 | This example demonstrates how the XDP and TC layers can interact through metadata associated with a packet. File `xdp2skb_meta_kern.c` contains two separate programs, one to be attached to the XDP hook and one to the TC hook, both on the receive path. The XDP program attaches custom metadata to each packet it receives, and the TC program reads that metadata. Script `xdp2skb_meta.sh` is used to load the programs on their respective hooks and configure the system.
326 |
327 | To help analyze these programs as well as demonstrate an alternative way to debug eBPF programs, let's modify the `xdp2skb_meta_kern.c` file to print log messages after packet processing on each layer.
328 |
329 | To do so, we will use the helper function `bpf_trace_printk`. For ease of use, we can add the following macro to the file:
330 |
331 | ```c
332 | // Nicer way to call bpf_trace_printk()
333 | #define bpf_custom_printk(fmt, ...) \
334 | ({ \
335 | char ____fmt[] = fmt; \
336 | bpf_trace_printk(____fmt, sizeof(____fmt), \
337 | ##__VA_ARGS__); \
338 | })
339 | ```
340 | With this macro, we can call `bpf_trace_printk` indirectly, using a syntax similar to that of `printf`.
341 |
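To see what the macro does, here is a user-space sketch of the same pattern. The helper `fake_trace_printk` and the buffer `last_msg` are hypothetical stand-ins for the kernel helper and the trace buffer, and `demo_printk` is a renamed copy of the macro above. Like the original, it relies on GNU C extensions (statement expressions and `##__VA_ARGS__`), which both GCC and Clang accept.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the trace buffer. */
static char last_msg[128];

/*
 * Hypothetical stand-in for bpf_trace_printk(): takes the format string,
 * its size in bytes, and the values to print.
 */
static int fake_trace_printk(const char *fmt, unsigned int fmt_size, ...)
{
    va_list ap;
    int n;

    (void)fmt_size;     /* ignored by this stand-in */
    va_start(ap, fmt_size);
    n = vsnprintf(last_msg, sizeof(last_msg), fmt, ap);
    va_end(ap);
    return n;
}

/*
 * Same shape as bpf_custom_printk(): copy the format into a local array,
 * then pass the array and its size (terminating NUL included) to the helper.
 */
#define demo_printk(fmt, ...)                               \
({                                                          \
    char ____fmt[] = fmt;                                   \
    fake_trace_printk(____fmt, sizeof(____fmt),             \
                      ##__VA_ARGS__);                       \
})
```

Calling `demo_printk("[XDP] metadata = %d\n", 42)` leaves the text `[XDP] metadata = 42` followed by a newline in `last_msg`. The caller only writes the format and the arguments, and the macro supplies the size argument on its own.
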
342 | Having added the macro, we can now use it to print metadata values on TC and XDP layers.
343 |
344 | Add the marked line to the end of function *_xdp_mark()*:
345 |
346 | ```c
347 | SEC("xdp_mark")
348 | int _xdp_mark(struct xdp_md *ctx)
349 | {
350 | struct meta_info *meta;
351 | void *data, *data_end;
352 | int ret;
353 |
354 | <...> // code omitted
355 |
356 | meta->mark = 42;
357 |
358 | bpf_custom_printk("[XDP] metadata = %d\n",meta->mark); // <-- Add this line
359 |
360 | return XDP_PASS;
361 | }
362 | ```
363 |
364 | Add the marked line to the end of function *_tc_mark()*:
365 |
366 | ```c
367 | SEC("tc_mark")
368 | int _tc_mark(struct __sk_buff *ctx)
369 | {
370 | void *data = (void *)(unsigned long)ctx->data;
371 | void *data_end = (void *)(unsigned long)ctx->data_end;
372 | void *data_meta = (void *)(unsigned long)ctx->data_meta;
373 | struct meta_info *meta = data_meta;
374 |
375 | <...> // code omitted
376 |
377 | ctx->mark = meta->mark; /* Transfer XDP-mark to SKB-mark */
378 |
379 | bpf_custom_printk("[TC] metadata = %d\n",meta->mark); // <-- Add this line
380 |
381 | return TC_ACT_OK;
382 | }
383 | ```
384 |
385 | The `bpf_trace_printk` helper can only be called from programs that declare a GPL-compatible license. Otherwise, the verifier rejects the program while it is being loaded into the kernel, with an error message like the following:
386 |
387 | ebpf@ebpf-vm:~/net-next/samples/bpf$ sudo ./xdp2skb_meta.sh --dev eth0
388 |
389 | Prog section 'tc_mark' rejected: Invalid argument (22)!
390 | - Type: 3
391 | - Instructions: 25 (0 over limit)
392 | - License:
393 |
394 | Verifier analysis:
395 |
396 | 0: (61) r3 = *(u32 *)(r1 +76)
397 | 1: (61) r2 = *(u32 *)(r1 +140)
398 | 2: (bf) r4 = r2
399 | 3: (07) r4 += 4
400 | 4: (3d) if r3 >= r4 goto pc+3
401 | R1=ctx(id=0,off=0,imm=0) R2=pkt_meta(id=0,off=0,r=0,imm=0) R3=pkt(id=0,off=0,r=0,imm=0) R4=pkt_meta(id=0,off=4,r=0,imm=0) R10=fp0,call_-1
402 | 5: (b7) r2 = 41
403 | 6: (63) *(u32 *)(r1 +8) = r2
404 | 7: (05) goto pc+15
405 | 23: (b7) r0 = 0
406 | 24: (95) exit
407 |
408 | from 4 to 8: R1=ctx(id=0,off=0,imm=0) R2=pkt_meta(id=0,off=0,r=4,imm=0) R3=pkt(id=0,off=0,r=0,imm=0) R4=pkt_meta(id=0,off=4,r=4,imm=0) R10=fp0,call_-1
409 | 8: (61) r3 = *(u32 *)(r2 +0)
410 | 9: (63) *(u32 *)(r1 +8) = r3
411 | 10: (b7) r1 = 680997
412 | 11: (63) *(u32 *)(r10 -16) = r1
413 | 12: (18) r1 = 0x203d206f64616461
414 | 14: (7b) *(u64 *)(r10 -24) = r1
415 | 15: (18) r1 = 0x74656d205d43545b
416 | 17: (7b) *(u64 *)(r10 -32) = r1
417 | 18: (61) r3 = *(u32 *)(r2 +0)
418 | 19: (bf) r1 = r10
419 | 20: (07) r1 += -32
420 | 21: (b7) r2 = 20
421 | 22: (85) call bpf_trace_printk#6
422 | cannot call GPL-restricted function from non-GPL compatible program
423 |
424 | Error fetching program/map!
425 | Unable to load program
426 | ERROR: Exec error(1) occurred cmd: "tc filter add dev eth0 ingress prio 1 handle 1 bpf da obj ./xdp2skb_meta_kern.o sec tc_mark"
427 |
428 | To overcome this limitation, it is necessary to declare a special global variable in the `license` ELF section with this information. This can be done by adding the following line at the end of the `xdp2skb_meta_kern.c` file:
429 |
430 | ```c
431 | char _license[] SEC("license") = "GPL";
432 | ```
433 |
434 | Finally, recompile the example:
435 |
436 | cd ~/net-next
437 | make samples/bpf/
438 |
439 | Next, execute the script `xdp2skb_meta.sh` to load the programs into the kernel:
440 |
441 | ebpf@ebpf-vm:~/net-next/samples/bpf$ sudo ./xdp2skb_meta.sh
442 |
443 | Usage: ./xdp2skb_meta.sh [-vfh] --dev ethX
444 | -d | --dev : Network device (required)
445 | --flush : Cleanup flush TC and XDP progs
446 | --list : ($LIST) List TC and XDP progs
447 | -v | --verbose : ($VERBOSE) Verbose
448 | --dry-run : ($DRYRUN) Dry-run only (echo commands)
449 |
450 | ERROR: Please specify network device -- required option --dev
451 |
452 | Load the programs in interface `eth0`:
453 |
454 | sudo ./xdp2skb_meta.sh --dev eth0
455 |
456 | We can also load the programs directly using tools such as `ip` for the XDP program (`sudo ip -force link set dev eth0 xdp obj xdp2skb_meta_kern.o sec xdp_mark`), just as before, and `tc` for the TC hook program. In the latter case, it is necessary to create a special `qdisc` in the Linux traffic controller, called `clsact`. The whole process can be done using the following commands:
457 |
458 | sudo tc qdisc add dev eth0 clsact
459 | sudo tc filter add dev eth0 ingress bpf da obj xdp2skb_meta_kern.o sec tc_mark
460 |
461 | For more information about eBPF on the TC hook, see the manual page with `man tc-bpf`.
462 |
463 | Once the programs have been loaded on their respective hooks, we can analyze the log messages generated by each one in the file `/sys/kernel/debug/tracing/trace`:
464 |
465 | sudo cat /sys/kernel/debug/tracing/trace
466 |
467 | For continuous reading, use the file `trace_pipe`:
468 |
469 | sudo cat /sys/kernel/debug/tracing/trace_pipe
470 |
471 | With the eBPF programs loaded in the kernel and some traffic flowing through the interface, we can observe the generated messages:
472 |
473 | ebpf@ebpf-vm:~/net-next/samples/bpf$ sudo cat /sys/kernel/debug/tracing/trace
474 | # tracer: nop
475 | #
476 | # entries-in-buffer/entries-written: 40/40 #P:1
477 | #
478 | # _-----=> irqs-off
479 | # / _----=> need-resched
480 | # | / _---=> hardirq/softirq
481 | # || / _--=> preempt-depth
482 | # ||| / delay
483 | # TASK-PID CPU# |||| TIMESTAMP FUNCTION
484 | # | | | |||| | |
485 | -0 [000] ..s. 13699.213984: 0: [XDP] metadata = 42
486 | -0 [000] ..s. 13699.214009: 0: [TC] metadata = 42
487 | -0 [000] ..s. 13699.421529: 0: [XDP] metadata = 42
488 | -0 [000] ..s. 13699.421542: 0: [TC] metadata = 42
489 | -0 [000] ..s. 13704.450195: 0: [XDP] metadata = 42
490 | -0 [000] ..s. 13704.450205: 0: [TC] metadata = 42
491 | -0 [000] ..s. 13704.450216: 0: [XDP] metadata = 42
492 |
493 | By looking at the messages, we can see that the metadata added on the XDP hook could be successfully received by the program on the TC hook, effectively sharing information between the two kernel stack layers.
494 |
--------------------------------------------------------------------------------
/ansible/README.md:
--------------------------------------------------------------------------------
1 | # Environment setup script for eBPF
2 |
3 | Instead of using the VM provided for the tutorial, it is also possible to provision another machine with all the required dependencies. For that purpose we provide the `provision.yml` script, which can be used with the [ansible](https://github.com/ansible/ansible) tool.
4 |
5 | **WARNING**: This script installs version 5.0 of the Linux kernel, along with several other dependencies. We recommend using it only on test/development machines.
6 |
7 | Ansible connects over SSH to a remote machine and performs all the installation steps specified in the script. On your local machine, run:
8 |
9 | ansible-playbook -i "," -u -k -K -e "ansible_python_interpreter=/usr/bin/python3" provision.yml
10 |
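11 | The address of the target machine (the value placed before the comma in the `-i` argument) and the SSH user name (the value of `-u`) must be filled in with the values for the target machine. Ansible will ask for the SSH password and for the password needed to run operations as superuser on the remote machine.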
--------------------------------------------------------------------------------
/ansible/provision.yml:
--------------------------------------------------------------------------------
1 | ---
2 | - name: Provision environment to play with BPF
3 | hosts: all
4 | gather_facts: yes
5 | become: yes
6 | become_user: root
7 | tasks:
8 | - name: Add user ebpf
9 | user:
10 | name: ebpf
11 | groups: sudo
12 | append: yes
13 | - name: Create dir for kernel files
14 | file:
15 | path: /usr/src/kernel-v5.0
16 | state: directory
17 | - name: Download kernel v5.0
18 | get_url:
19 | url: "{{ item }}"
20 | dest: /usr/src/kernel-v5.0/
21 | loop:
22 | - "https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/linux-headers-5.0.0-050000_5.0.0-050000.201903032031_all.deb"
23 | - "https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/linux-headers-5.0.0-050000-generic_5.0.0-050000.201903032031_amd64.deb"
24 | - "https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/linux-image-unsigned-5.0.0-050000-generic_5.0.0-050000.201903032031_amd64.deb"
25 | - "https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/linux-modules-5.0.0-050000-generic_5.0.0-050000.201903032031_amd64.deb"
26 | when: ansible_kernel is not match("5\.0.*")
27 | - name: Install kernel v5.0
28 | shell: dpkg -i *.deb
29 | args:
30 | chdir: /usr/src/kernel-v5.0/
31 | register: install_result
32 | when: ansible_kernel is not match("5\.0.*")
33 | - name: Reboot and wait
34 | reboot:
35 | when: install_result is changed
36 | - name: Get kernel version
37 | shell: uname -r
38 | register: result
39 | - name: Fail if kernel 5.0 is not loaded
40 | fail:
41 | msg: "Wrong kernel loaded ({{ kernel_version }}). Expected 5.0"
42 | vars:
43 | kernel_version: "{{ result.stdout }}"
44 | when: kernel_version is not match("5\.0.*")
45 | - name: Install dependencies
46 | apt:
47 | name: "{{ item }}"
48 | update_cache: yes # Run 'apt update' before install
49 | loop:
50 | - make
51 | - gcc
52 | - g++
53 | - pkg-config
54 | - libssl-dev
55 | - bc
56 | - libelf-dev
57 | - libcap-dev
58 | - gcc-multilib
59 | - libncurses5-dev
60 | - git
61 | - pkg-config
62 | - graphviz
63 | - llvm
64 | - clang
65 | - elfutils
66 | - libmnl-dev
67 | - bison
68 | - flex
69 | - ifupdown
70 | - python-scapy
71 | - python-netifaces
72 | - binutils-dev
73 | - hping3
74 | - net-tools
75 | - python-pip
76 | - autoconf
77 | - automake
78 | - libtool
79 | - unzip
80 | - curl
81 | - python-twisted
82 | tags: install
83 | - name: Clone newest iproute2 repo
84 | git:
85 | repo: https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git
86 | dest: /home/ebpf/iproute2/
87 | tags: [clone, iproute]
88 | ignore_errors: yes
89 | - name: Install iproute2
90 | shell: ./configure && make && make install
91 | args:
92 | chdir: /home/ebpf/iproute2
93 | ignore_errors: yes
94 | tags: iproute
95 | - name: Clone prototype-kernel project
96 | git:
97 | repo: https://github.com/netoptimizer/prototype-kernel.git
98 | dest: /home/ebpf/prototype-kernel
99 | ignore_errors: yes
100 | tags: clone
101 | - name: Clone linux kernel net-next tree
102 | git:
103 | repo: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
104 | dest: /home/ebpf/net-next
105 | version: v5.0
106 | depth: 1 # Reduce commit history to download faster
107 | ignore_errors: yes
108 | tags: clone
109 | - name: Install bpftool
110 | shell: make && make install
111 | args:
112 | chdir: /home/ebpf/net-next/tools/bpf/bpftool
113 | ignore_errors: yes
114 | tags: kernel
115 | - name: Clone protobuf
116 | git:
117 | repo: https://github.com/protocolbuffers/protobuf.git
118 | dest: /usr/src/protobuf-3.5.0
119 | version: v3.5.0
120 | - name: Compile protobuf
121 | shell: ./autogen.sh && ./configure && make && make check && make install && ldconfig
122 | args:
123 | chdir: /usr/src/protobuf-3.5.0
124 | - name: Clone protobuf-c
125 | git:
126 | repo: https://github.com/protobuf-c/protobuf-c.git
127 | dest: /usr/src/protobuf-c
128 | version: v1.3.0
129 | - name: Compile protobuf-c
130 | shell: ./autogen.sh && ./configure && make && make install
131 | args:
132 | chdir: /usr/src/protobuf-c
133 | - name: Clone BPFabric
134 | git:
135 | repo: https://github.com/UofG-netlab/BPFabric
136 | dest: /home/ebpf/BPFabric
137 | tags: clone
138 | - name: Patch BPFabric to use latest clang instead of clang-3.9
139 | replace:
140 | path: /home/ebpf/BPFabric/examples/Makefile
141 | regexp: 'clang-3\.9'
142 | replace: 'clang'
143 | tags: bpfabric
144 | - name: Compile BPFabric
145 | shell: make
146 | args:
147 | chdir: /home/ebpf/BPFabric
148 | tags: bpfabric
--------------------------------------------------------------------------------
/examples/Makefile:
--------------------------------------------------------------------------------
1 | # SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
2 |
3 | LLVM_VERSION ?=
4 | LLVM := $(shell clang$(LLVM_VERSION) --version)
5 | CLANG_FLAGS ?= -W -Wall -Wno-compare-distinct-pointer-types -g
6 |
7 | SRCS=$(wildcard *.c)
8 | OBJS=$(patsubst %.c,%.o,$(SRCS))
9 | Q ?= @
10 |
11 | INCLUDE_DIRS ?= -I../headers/
12 |
13 | %.o: %.c
14 | @printf '\tLLVM CC %s\n' "$@"
15 | $(Q) clang$(LLVM_VERSION) $(INCLUDE_DIRS) -O2 -emit-llvm -c $< $(CLANG_FLAGS) -o $(patsubst %.o,%.llvm,$@)
16 | $(Q) llc$(LLVM_VERSION) -march=bpf -filetype=obj -o $@ $(patsubst %.o,%.llvm,$@)
17 | $(Q) rm $(patsubst %.o,%.llvm,$@)
18 |
19 | ifeq ($(LLVM),)
20 | all:
21 | $(warning Install LLVM to compile BPF sources)
22 | else
23 | all: $(OBJS)
24 | endif
25 |
26 | clean:
27 | rm -f *.llvm
28 | rm -f *.o
29 |
30 | .PHONY: all clean
31 |
--------------------------------------------------------------------------------
/examples/chacha8.c:
--------------------------------------------------------------------------------
1 |
2 |
3 | //------------------------------------------------------------------
4 | // Includes.
5 | //------------------------------------------------------------------
6 | #include
7 | #include
8 | #include
9 | #include
10 | #include
11 | #include
12 | #include
13 | #include
14 | #include
15 | #include
16 | #include
17 | #include "bpf_endian.h"
18 | #include "bpf_helpers.h"
19 |
20 | #ifndef IP_FRAGMENTED
21 | #define IP_FRAGMENTED 65343
22 | #endif
23 |
24 | #ifndef __inline
25 | #define __inline inline __attribute__((always_inline))
26 | #endif
27 |
28 | #ifndef CHACHA_ROUNDS
29 | #define CHACHA_ROUNDS 8
30 | #endif
31 |
32 | #ifndef memset
33 | #define memset(dest, c_int, n) __builtin_memset((dest), (c_int), (n))
34 | #endif
35 |
36 | //------------------------------------------------------------------
37 | // Types.
38 | //------------------------------------------------------------------
39 | // The chacha state context.
40 | typedef struct
41 | {
42 | uint32_t state[16];
43 | uint8_t rounds;
44 | } chacha_ctx;
45 |
46 |
47 | //------------------------------------------------------------------
48 | // Macros.
49 | //------------------------------------------------------------------
50 | // Basic 32-bit operators.
51 | #define ROTATE(v,c) ((uint32_t)((v) << (c)) | ((v) >> (32 - (c))))
52 | #define XOR(v,w) ((v) ^ (w))
53 | #define PLUS(v,w) ((uint32_t)((v) + (w)))
54 | #define PLUSONE(v) (PLUS((v), 1))
55 |
56 | // Little endian machine assumed (x86-64).
57 | #define U32TO8_LITTLE(p, v) (((uint32_t*)(p))[0] = v)
58 | #define U8TO32_LITTLE(p) (((uint32_t*)(p))[0])
59 |
60 | #define QUARTERROUND(a, b, c, d) \
61 | x[a] = PLUS(x[a],x[b]); x[d] = ROTATE(XOR(x[d],x[a]),16); \
62 | x[c] = PLUS(x[c],x[d]); x[b] = ROTATE(XOR(x[b],x[c]),12); \
63 | x[a] = PLUS(x[a],x[b]); x[d] = ROTATE(XOR(x[d],x[a]), 8); \
64 | x[c] = PLUS(x[c],x[d]); x[b] = ROTATE(XOR(x[b],x[c]), 7);
65 |
66 |
67 | //------------------------------------------------------------------
68 | // Constants.
69 | //------------------------------------------------------------------
70 | //static const uint8_t SIGMA[16] = "expand 32-byte k";
71 | static const uint8_t TAU[16] = "expand 16-byte k";
72 |
73 |
74 | //------------------------------------------------------------------
75 | // doublerounds()
76 | //
77 | // Perform rounds/2 number of doublerounds.
78 | // TODO: Change output format to 16 words.
79 | //------------------------------------------------------------------
80 | static __inline void doublerounds(uint8_t output[64], const uint32_t input[16], uint8_t rounds)
81 | {
82 | uint32_t x[16];
83 | int32_t i;
84 |
85 | #pragma clang loop unroll (full)
86 | for (i = 0;i < 16;++i) {
87 | x[i] = input[i];
88 | }
89 |
90 | #pragma clang loop unroll (full)
91 | for (i = rounds ; i > 0 ; i -= 2) {
92 | QUARTERROUND( 0, 4, 8,12)
93 | QUARTERROUND( 1, 5, 9,13)
94 | QUARTERROUND( 2, 6,10,14)
95 | QUARTERROUND( 3, 7,11,15)
96 |
97 | QUARTERROUND( 0, 5,10,15)
98 | QUARTERROUND( 1, 6,11,12)
99 | QUARTERROUND( 2, 7, 8,13)
100 | QUARTERROUND( 3, 4, 9,14)
101 | }
102 |
103 | #pragma clang loop unroll (full)
104 | for (i = 0;i < 16;++i) {
105 | x[i] = PLUS(x[i], input[i]);
106 | }
107 |
108 | #pragma clang loop unroll (full)
109 | for (i = 0;i < 16;++i) {
110 | U32TO8_LITTLE(output + 4 * i, x[i]);
111 | }
112 | }
113 |
114 |
115 |
116 | //------------------------------------------------------------------
117 | // init()
118 | //
119 | // Initializes the given cipher context with key, iv and constants.
120 | // This also resets the block counter.
121 | //------------------------------------------------------------------
122 | static __inline void init(chacha_ctx *x, uint8_t *key, uint32_t keylen, uint8_t *iv)
123 | {
124 | /*
125 | if (keylen == 256) {
126 | // 256 bit key.
127 | x->state[0] = U8TO32_LITTLE(SIGMA + 0);
128 | x->state[1] = U8TO32_LITTLE(SIGMA + 4);
129 | x->state[2] = U8TO32_LITTLE(SIGMA + 8);
130 | x->state[3] = U8TO32_LITTLE(SIGMA + 12);
131 | x->state[4] = U8TO32_LITTLE(key + 0);
132 | x->state[5] = U8TO32_LITTLE(key + 4);
133 | x->state[6] = U8TO32_LITTLE(key + 8);
134 | x->state[7] = U8TO32_LITTLE(key + 12);
135 | x->state[8] = U8TO32_LITTLE(key + 16);
136 | x->state[9] = U8TO32_LITTLE(key + 20);
137 | x->state[10] = U8TO32_LITTLE(key + 24);
138 | x->state[11] = U8TO32_LITTLE(key + 28);
139 | }
140 |
141 | else {
142 | // 128 bit key.
143 | x->state[0] = U8TO32_LITTLE(TAU + 0);
144 | x->state[1] = U8TO32_LITTLE(TAU + 4);
145 | x->state[2] = U8TO32_LITTLE(TAU + 8);
146 | x->state[3] = U8TO32_LITTLE(TAU + 12);
147 | x->state[4] = U8TO32_LITTLE(key + 0);
148 | x->state[5] = U8TO32_LITTLE(key + 4);
149 | x->state[6] = U8TO32_LITTLE(key + 8);
150 | x->state[7] = U8TO32_LITTLE(key + 12);
151 | x->state[8] = U8TO32_LITTLE(key + 0);
152 | x->state[9] = U8TO32_LITTLE(key + 4);
153 | x->state[10] = U8TO32_LITTLE(key + 8);
154 | x->state[11] = U8TO32_LITTLE(key + 12);
155 | }
156 | */
157 | // 128 bit key.
158 | x->state[0] = U8TO32_LITTLE(TAU + 0);
159 | x->state[1] = U8TO32_LITTLE(TAU + 4);
160 | x->state[2] = U8TO32_LITTLE(TAU + 8);
161 | x->state[3] = U8TO32_LITTLE(TAU + 12);
162 | x->state[4] = U8TO32_LITTLE(key + 0);
163 | x->state[5] = U8TO32_LITTLE(key + 4);
164 | x->state[6] = U8TO32_LITTLE(key + 8);
165 | x->state[7] = U8TO32_LITTLE(key + 12);
166 | x->state[8] = U8TO32_LITTLE(key + 0);
167 | x->state[9] = U8TO32_LITTLE(key + 4);
168 | x->state[10] = U8TO32_LITTLE(key + 8);
169 | x->state[11] = U8TO32_LITTLE(key + 12);
170 |
171 | // Reset block counter and add IV to state.
172 | x->state[12] = 0;
173 | x->state[13] = 0;
174 | x->state[14] = U8TO32_LITTLE(iv + 0);
175 | x->state[15] = U8TO32_LITTLE(iv + 4);
176 | }
177 |
178 |
179 | //------------------------------------------------------------------
180 | // next()
181 | //
182 | // Given a pointer m to the next block of 64 cleartext bytes, uses
183 | // the given context to transform (encrypt/decrypt) the block in
184 | // place. m_end is currently unused; the caller checks the bounds.
185 | //------------------------------------------------------------------
186 | static __inline void next(chacha_ctx *ctx, uint8_t *m, const uint8_t *m_end)
187 | {
188 | // Temporary internal state x.
189 | uint8_t x[64];
190 | uint8_t i;
191 |
192 | // Update the internal state and increase the block counter.
193 | doublerounds(x, ctx->state, ctx->rounds);
194 | ctx->state[12] = PLUSONE(ctx->state[12]);
195 | if (!ctx->state[12]) {
196 | ctx->state[13] = PLUSONE(ctx->state[13]);
197 | }
198 |
199 | // XOR the input block with the keystream block x, eight 64-bit
200 | // words at a time, to transform the block in place.
201 | /*
202 | if (m+64 > m_end) {
203 | return;
204 | }
205 |
206 | #pragma clang loop unroll (full)
207 | for (i = 0 ; i < 64 ; ++i) {
208 | //c[i] = m[i] ^ x[i];
209 | m[i] ^= x[i];
210 | }
211 | */
212 | uint64_t * m_pos;
213 | uint64_t * x_pos;
214 | #pragma clang loop unroll (full)
215 | for (i = 0 ; i < 8 ; ++i) {
216 | //c[i] = m[i] ^ x[i];
217 | m_pos = (uint64_t*)(m) + i;
218 | x_pos = (uint64_t*)(x) + i;
219 | *m_pos ^= *x_pos;
220 | }
221 | }
222 |
223 |
224 | //------------------------------------------------------------------
225 | // init_ctx()
226 | //
227 | // Init a given ChaCha context by setting state to zero and
228 | // setting the given number of rounds.
229 | //------------------------------------------------------------------
230 | static __inline void init_ctx(chacha_ctx *ctx, uint8_t rounds)
231 | {
232 | uint8_t i;
233 |
234 | #pragma clang loop unroll (full)
235 | for (i = 0 ; i < 16 ; i++) {
236 | ctx->state[i] = 0;
237 | }
238 | ctx->rounds = rounds;
239 | }
240 |
241 | /*
242 | static __inline int parse_tcp(struct xdp_md *ctx, __u64 nf_off) {
243 | void *data_end = (void*)(long)ctx->data_end;
244 | void *data = (void*)(long)ctx->data;
245 | chacha_ctx cha_ctx;
246 | //struct tcphdr *tcph;
247 | uint8_t *pkt_data;
248 | __u32 tcp_off;
249 | //tcph = data + nf_off;
250 | // pkt_data is the tcp payload
251 | tcp_off = sizeof(struct tcphdr);
252 | nf_off += tcp_off;
253 | if (data + nf_off > data_end) {
254 | return XDP_DROP;
255 | }
256 | pkt_data = data + nf_off;
257 | return XDP_PASS;
258 | }*/
259 |
260 | static __always_inline bool parse_transport(void *data, __u64 off, void *data_end) {
261 | struct udphdr *tudp;
262 | tudp = data + off;
263 | if (tudp + 1 > data_end) {
264 | return false;
265 | }
266 | else {
267 | return true;
268 | }
269 | }
270 |
271 |
272 | static __inline int parse_ip(struct xdp_md *ctx, __u64 nf_off) {
273 | void *data_end = (void*)(long)ctx->data_end;
274 | void *data = (void*)(long)ctx->data;
275 | struct iphdr *iph;
276 | __u32 ip_off;
277 | __u8 ip_protocol;
278 |
279 | iph = data + nf_off;
280 | if (iph + 1 > data_end) {
281 | return XDP_DROP;
282 | }
283 | if (iph->ihl != 5) {
284 | return XDP_DROP;
285 | }
286 | ip_protocol = iph->protocol;
287 | ip_off = sizeof(struct iphdr);
288 | nf_off += ip_off;
289 |
290 | if (iph->frag_off & IP_FRAGMENTED) {
291 | return XDP_DROP;
292 | }
293 |
294 | if (ip_protocol == IPPROTO_TCP) {
295 | if (!parse_transport(data, nf_off, data_end)) {
296 | return XDP_DROP;
297 | }
298 | else {
299 | nf_off += sizeof(struct tcphdr);
300 | }
301 | }
302 | else if (ip_protocol == IPPROTO_UDP) {
303 | if (!parse_transport(data, nf_off, data_end)) {
304 | return XDP_DROP;
305 | }
306 | else {
307 | nf_off += sizeof(struct udphdr);
308 | }
309 | }
310 | else {
311 | return XDP_PASS;
312 | }
313 |
314 | chacha_ctx cha_ctx;
315 | uint8_t *pkt_data = data + nf_off;
316 | /*
317 | uint8_t t_result[64] = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
318 | 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
319 | 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
320 | 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
321 | 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
322 | 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
323 | 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
324 | 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
325 | */
326 | uint8_t t_key[32] = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
327 | 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
328 | 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
329 | 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
330 |
331 | uint8_t t_iv[8] = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
332 | init_ctx(&cha_ctx, CHACHA_ROUNDS);
333 | init(&cha_ctx, t_key, 128, t_iv);
334 |
335 | // loop here
336 | int32_t i;
337 | #pragma clang loop unroll (full)
338 | for(i=0; i <= 23; i++) {
339 | if (pkt_data + 64 > data_end) {
340 | break;
341 | }
342 | //next(&cha_ctx, pkt_data, data_end, t_result);
343 | next(&cha_ctx, pkt_data, data_end);
344 | //memcpy(pkt_data, t_result, sizeof(t_result));
345 | pkt_data += (__u64)64;
346 | }
347 | return XDP_TX;
348 | }
349 |
350 |
351 | SEC("chacha")
352 | int cha(struct xdp_md *ctx){
353 | void *data_end = (void *)(long)ctx->data_end;
354 | void *data = (void *)(long)ctx->data;
355 | struct ethhdr *eth = data;
356 | __u32 eth_proto;
357 | __u32 nh_off;
358 | nh_off = sizeof(struct ethhdr);
359 | if (data + nh_off > data_end)
360 | return XDP_PASS;
361 | eth_proto = eth->h_proto;
362 |
363 | // the demo program only accepts IPv4 packets.
364 | if (eth_proto == bpf_htons(ETH_P_IP)) {
365 | return parse_ip(ctx, nh_off);
366 | }
367 | else {
368 | return XDP_PASS;
369 | }
370 | }
371 |
372 |
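
Note: the `QUARTERROUND` macro above is the core of the cipher. The following is a minimal host-side sketch of the same step as plain functions, so it can be compiled and checked outside the kernel. The names `rotl32` and `quarter_round` are hypothetical and do not appear in `chacha8.c`; the sketch assumes a regular C compiler rather than the BPF target.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by c bits; only valid for 0 < c < 32. */
static inline uint32_t rotl32(uint32_t v, unsigned c)
{
    assert(c > 0 && c < 32);
    return (v << c) | (v >> (32 - c));
}

/* One ChaCha quarter round on words a, b, c, d of the 16-word state x.
 * Same add/xor/rotate sequence (16, 12, 8, 7) as the QUARTERROUND macro. */
static inline void quarter_round(uint32_t x[16], int a, int b, int c, int d)
{
    x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 16);
    x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 12);
    x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 8);
    x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 7);
}
```

Feeding the inputs `0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567` through `quarter_round` should yield `0xea2a92f4, 0xcb1cf8ce, 0x4581472e, 0x5881c4bb`, which is the quarter-round test vector published for ChaCha20 in RFC 7539, section 2.1.1.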
--------------------------------------------------------------------------------
/examples/dropworld.c:
--------------------------------------------------------------------------------
1 | #include
2 |
3 | int prog(struct xdp_md *ctx){
4 | return XDP_DROP;
5 | }
--------------------------------------------------------------------------------
/examples/layercoop.c:
--------------------------------------------------------------------------------
1 | #include
2 | #include
3 | #include
4 | #include
5 | #include
6 | #include
7 | #include
8 | #include
9 | #include
10 | #include
11 |
12 | #include "bpf_endian.h"
13 | #include "bpf_helpers.h"
14 |
15 | struct pair {
16 | uint32_t lip; // local IP
17 | uint32_t rip; // remote IP
18 | };
19 |
20 | struct stats {
21 | uint64_t tx_cnt;
22 | uint64_t rx_cnt;
23 | uint64_t tx_bytes;
24 | uint64_t rx_bytes;
25 | };
26 |
27 | struct bpf_elf_map SEC("maps") trackers = {
28 | .type = BPF_MAP_TYPE_HASH,
29 | .size_key = sizeof(struct pair),
30 | .size_value = sizeof(struct stats),
31 | .max_elem = 2048,
32 | .pinning = 2, // PIN_GLOBAL_NS
33 | };
34 |
35 | static bool parse_ipv4(bool is_rx, void* data, void* data_end, struct pair *pair){
36 | struct ethhdr *eth = data;
37 | struct iphdr *ip;
38 |
39 | if(data + sizeof(struct ethhdr) > data_end)
40 | return false;
41 |
42 | if(bpf_ntohs(eth->h_proto) != ETH_P_IP)
43 | return false;
44 |
45 | ip = data + sizeof(struct ethhdr);
46 |
47 | if ((void*) ip + sizeof(struct iphdr) > data_end)
48 | return false;
49 |
50 | pair->lip = is_rx ? ip->daddr : ip->saddr;
51 | pair->rip = is_rx ? ip->saddr : ip->daddr;
52 |
53 | return true;
54 | }
55 |
56 | static void update_stats(bool is_rx, struct pair *key, long long bytes){
57 | struct stats *stats, newstats = {0,0,0,0};
58 |
59 | stats = bpf_map_lookup_elem(&trackers, key);
60 | if(stats){
61 | if(is_rx){
62 | stats->rx_cnt++;
63 | stats->rx_bytes += bytes;
64 | }else{
65 | stats->tx_cnt++;
66 | stats->tx_bytes += bytes;
67 | }
68 | }else{
69 | if(is_rx){
70 | newstats.rx_cnt = 1;
71 | newstats.rx_bytes = bytes;
72 | }else{
73 | newstats.tx_cnt = 1;
74 | newstats.tx_bytes = bytes;
75 | }
76 |
77 | bpf_map_update_elem(&trackers, key, &newstats, BPF_NOEXIST);
78 | }
79 | }
80 |
81 | SEC("rx")
82 | int track_rx(struct xdp_md *ctx)
83 | {
84 | void *data_end = (void *)(long)ctx->data_end;
85 | void *data = (void *)(long)ctx->data;
86 | struct pair pair;
87 |
88 | if(!parse_ipv4(true,data,data_end,&pair))
89 | return XDP_PASS;
90 |
91 | // Update RX statistics
92 | update_stats(true,&pair,data_end-data);
93 |
94 | return XDP_PASS;
95 | }
96 |
97 | SEC("tx")
98 | int track_tx(struct __sk_buff *skb)
99 | {
100 | void *data_end = (void *)(long)skb->data_end;
101 | void *data = (void *)(long)skb->data;
102 | struct pair pair;
103 |
104 | if(!parse_ipv4(false,data,data_end,&pair))
105 | return TC_ACT_OK;
106 |
107 | // Update TX statistics
108 | update_stats(false,&pair,data_end-data);
109 |
110 | return TC_ACT_OK;
111 | }
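
Note: the RX (XDP) and TX (TC) programs above cooperate because both build the map key from the local host's point of view, so a flow's received and sent packets land in the same `trackers` entry. The sketch below reproduces that keying and accounting logic on the host, without any map. `make_key` and `account` are hypothetical names used only for illustration; the structs are copied from `layercoop.c`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pair  { uint32_t lip; uint32_t rip; };   /* local IP, remote IP */
struct stats { uint64_t tx_cnt, rx_cnt, tx_bytes, rx_bytes; };

/* Build the key as parse_ipv4() does: on RX the local address is the
 * destination, on TX it is the source. */
static inline struct pair make_key(bool is_rx, uint32_t saddr, uint32_t daddr)
{
    struct pair key;
    key.lip = is_rx ? daddr : saddr;
    key.rip = is_rx ? saddr : daddr;
    return key;
}

/* Update one entry the way update_stats() does for an existing key. */
static inline void account(struct stats *s, bool is_rx, uint64_t bytes)
{
    assert(s != NULL);
    if (is_rx) {
        s->rx_cnt++;
        s->rx_bytes += bytes;
    } else {
        s->tx_cnt++;
        s->tx_bytes += bytes;
    }
}
```

With a local address L and a remote address R, `make_key(true, R, L)` and `make_key(false, L, R)` produce the same key, which is why a single hash map entry can hold both directions.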
--------------------------------------------------------------------------------
/examples/portfilter.c:
--------------------------------------------------------------------------------
1 | #include
2 | #include
3 | #include
4 | #include
5 | #include
6 | #include
7 | #include
8 | #include
9 | #include
10 | #include
11 | #include "bpf_endian.h"
12 | #include "bpf_helpers.h"
13 |
14 | /* htons(0x3FFF) on a little-endian host: masks the MF flag and the 13-bit fragment offset */
15 | #define IP_FRAGMENTED 65343
16 |
17 | /* Port number to be dropped */
18 | #define PORT_DROP 80
19 |
20 | static __always_inline int process_packet(struct xdp_md *ctx, __u64 off){
21 |
22 | void *data_end = (void *)(long)ctx->data_end;
23 | void *data = (void *)(long)ctx->data;
24 | struct iphdr *iph;
25 | struct tcphdr *tcp;
26 | __u16 payload_len;
27 | __u8 protocol;
28 |
29 | iph = data + off;
30 | if (iph + 1 > data_end)
31 | return XDP_PASS;
32 | if (iph->ihl != 5)
33 | return XDP_PASS;
34 |
35 | protocol = iph->protocol;
36 | payload_len = bpf_ntohs(iph->tot_len);
37 | off += sizeof(struct iphdr);
38 |
39 | /* do not support fragmented packets as L4 headers may be missing */
40 | if (iph->frag_off & IP_FRAGMENTED)
41 | return XDP_PASS;
42 |
43 | if (protocol == IPPROTO_TCP) {
44 | tcp = data + off;
45 | if(tcp + 1 > data_end)
46 | return XDP_PASS;
47 |
48 | /* Drop if using port PORT_DROP */
49 | if(tcp->source == bpf_htons(PORT_DROP) || tcp->dest == bpf_htons(PORT_DROP))
50 | return XDP_DROP;
51 | else
52 | return XDP_PASS;
53 |
54 | } else if (protocol == IPPROTO_UDP) {
55 | return XDP_PASS;
56 | }
57 |
58 | return XDP_PASS;
59 | }
60 |
61 |
62 | SEC("filter")
63 | int pfilter(struct xdp_md *ctx){
64 |
65 | void *data_end = (void *)(long)ctx->data_end;
66 | void *data = (void *)(long)ctx->data;
67 | struct ethhdr *eth = data;
68 | __u32 eth_proto;
69 | __u32 nh_off;
70 |
71 | nh_off = sizeof(struct ethhdr);
72 | if (data + nh_off > data_end)
73 | return XDP_PASS;
74 | eth_proto = eth->h_proto;
75 |
76 | /* demo program only accepts ipv4 packets */
77 | if (eth_proto == bpf_htons(ETH_P_IP))
78 | return process_packet(ctx, nh_off);
79 | else
80 | return XDP_PASS;
81 | }
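
Note: the value 65343 used for `IP_FRAGMENTED` is 0xFF3F, i.e. 0x3FFF with its two bytes swapped, so it can be applied directly to `iph->frag_off` (network byte order) on a little-endian machine. The sketch below shows the same check in a byte-order-independent form. `swap16`, `is_fragment`, `FRAG_MF` and `FRAG_OFFMASK` are hypothetical names introduced for this illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define FRAG_MF      0x2000u  /* "more fragments" flag         */
#define FRAG_OFFMASK 0x1FFFu  /* 13-bit fragment offset field  */

/* Swap the two bytes of a 16-bit value. */
static inline uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* wire[] holds the two frag_off bytes exactly as they appear in the packet.
 * A packet is part of a fragmented datagram if MF is set or the offset is
 * non-zero; the DF bit (0x4000) alone does not count. */
static inline int is_fragment(const uint8_t wire[2])
{
    assert(wire != 0);
    uint16_t host = (uint16_t)((wire[0] << 8) | wire[1]);
    return (host & (FRAG_MF | FRAG_OFFMASK)) != 0;
}
```

Reading the field byte by byte avoids hard-coding a little-endian constant, at the cost of one extra shift.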
--------------------------------------------------------------------------------
/examples/tcpfilter.c:
--------------------------------------------------------------------------------
1 | #include
2 | #include
3 | #include
4 | #include
5 | #include
6 | #include "bpf_endian.h"
7 |
8 | int isTCP( struct xdp_md *ctx ) {
9 | void *data_end = (void *)(long) ctx->data_end;
10 | void *data_begin = (void *)(long) ctx->data;
11 | struct ethhdr* eth = data_begin;
12 |
13 | // Bounds check: the frame must hold a full Ethernet header
14 | if(eth + 1 > data_end)
15 | return XDP_PASS;
16 |
17 | // Check if Ethernet frame has IPv4 packet
18 | if (eth->h_proto == bpf_htons( ETH_P_IP )) {
19 | struct iphdr *ipv4 = (struct iphdr *)( ((void*)eth) + ETH_HLEN );
20 |
21 | if(ipv4 + 1 > data_end)
22 | return XDP_PASS;
23 |
24 | // Check if IPv4 packet contains a TCP segment
25 | if (ipv4->protocol == IPPROTO_TCP)
26 | return XDP_PASS;
27 | }
28 | return XDP_DROP;
29 | }
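
Note: `isTCP()` follows the pattern the eBPF verifier expects: every header is bounds-checked against `data_end` before any of its fields is read. The sketch below mirrors that decision logic on an ordinary byte buffer so it can be exercised on the host. The function `classify`, the `verdict` enum and the offset constants are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum verdict { VERDICT_DROP = 0, VERDICT_PASS = 1 };

#define ETH_HDR_LEN      14      /* dst MAC + src MAC + EtherType     */
#define IPV4_MIN_HDR_LEN 20      /* IPv4 header without options       */
#define ETHERTYPE_IPV4   0x0800
#define PROTO_TCP        6

/* Mirror of isTCP(): truncated frames pass, IPv4/TCP passes,
 * everything else is dropped. */
static inline enum verdict classify(const uint8_t *data, size_t len)
{
    assert(data != NULL);

    /* Bounds check before touching the Ethernet header. */
    if (len < ETH_HDR_LEN)
        return VERDICT_PASS;

    uint16_t ethertype = (uint16_t)((data[12] << 8) | data[13]);
    if (ethertype == ETHERTYPE_IPV4) {
        /* Bounds check before touching the IPv4 header. */
        if (len < ETH_HDR_LEN + IPV4_MIN_HDR_LEN)
            return VERDICT_PASS;

        /* The protocol field is byte 9 of the IPv4 header. */
        if (data[ETH_HDR_LEN + 9] == PROTO_TCP)
            return VERDICT_PASS;
    }
    return VERDICT_DROP;
}
```

As in the original program, non-IPv4 traffic such as ARP is dropped, which is worth keeping in mind before attaching such a filter to an interface that is in use.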
--------------------------------------------------------------------------------
/headers/bpf_endian.h:
--------------------------------------------------------------------------------
1 | /* SPDX-License-Identifier: GPL-2.0 */
2 | /* Copied from $(LINUX)/tools/testing/selftests/bpf/bpf_endian.h */
3 | #ifndef __BPF_ENDIAN__
4 | #define __BPF_ENDIAN__
5 |
6 | #include
7 |
8 | /* LLVM's BPF target selects the endianness of the CPU
9 | * it compiles on, or the user specifies (bpfel/bpfeb),
10 | * respectively. The used __BYTE_ORDER__ is defined by
11 | * the compiler, we cannot rely on __BYTE_ORDER from
12 | * libc headers, since it doesn't reflect the actual
13 | * requested byte order.
14 | *
15 | * Note, LLVM's BPF target has different __builtin_bswapX()
16 | * semantics. It does map to BPF_ALU | BPF_END | BPF_TO_BE
17 | * in bpfel and bpfeb case, which means below, that we map
18 | * to cpu_to_be16(). We could use it unconditionally in BPF
19 | * case, but better not rely on it, so that this header here
20 | * can be used from application and BPF program side, which
21 | * use different targets.
22 | */
23 | #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
24 | # define __bpf_ntohs(x) __builtin_bswap16(x)
25 | # define __bpf_htons(x) __builtin_bswap16(x)
26 | # define __bpf_constant_ntohs(x) ___constant_swab16(x)
27 | # define __bpf_constant_htons(x) ___constant_swab16(x)
28 | # define __bpf_ntohl(x) __builtin_bswap32(x)
29 | # define __bpf_htonl(x) __builtin_bswap32(x)
30 | # define __bpf_constant_ntohl(x) ___constant_swab32(x)
31 | # define __bpf_constant_htonl(x) ___constant_swab32(x)
32 | #elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
33 | # define __bpf_ntohs(x) (x)
34 | # define __bpf_htons(x) (x)
35 | # define __bpf_constant_ntohs(x) (x)
36 | # define __bpf_constant_htons(x) (x)
37 | # define __bpf_ntohl(x) (x)
38 | # define __bpf_htonl(x) (x)
39 | # define __bpf_constant_ntohl(x) (x)
40 | # define __bpf_constant_htonl(x) (x)
41 | #else
42 | # error "Fix your compiler's __BYTE_ORDER__?!"
43 | #endif
44 |
45 | #define bpf_htons(x) \
46 | (__builtin_constant_p(x) ? \
47 | __bpf_constant_htons(x) : __bpf_htons(x))
48 | #define bpf_ntohs(x) \
49 | (__builtin_constant_p(x) ? \
50 | __bpf_constant_ntohs(x) : __bpf_ntohs(x))
51 | #define bpf_htonl(x) \
52 | (__builtin_constant_p(x) ? \
53 | __bpf_constant_htonl(x) : __bpf_htonl(x))
54 | #define bpf_ntohl(x) \
55 | (__builtin_constant_p(x) ? \
56 | __bpf_constant_ntohl(x) : __bpf_ntohl(x))
57 |
58 | #endif /* __BPF_ENDIAN__ */
59 |
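
Note: the `bpf_htons()` family above picks a constant-folding expression when the argument is a compile-time constant and a byte-swap builtin otherwise. The sketch below reproduces that dispatch for 16-bit values in host code, assuming GCC or Clang (both provide `__builtin_constant_p`, `__builtin_bswap16` and `__BYTE_ORDER__`). `SWAB16_CONST`, `my_htons` and `store_net16` are hypothetical names, not part of the header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pure-expression byte swap that the compiler can fold for constants. */
#define SWAB16_CONST(x) \
    ((uint16_t)((((uint16_t)(x) & 0x00ffu) << 8) | \
                (((uint16_t)(x) & 0xff00u) >> 8)))

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
/* Constant argument: fold at compile time; otherwise use the builtin. */
# define my_htons(x) \
    (__builtin_constant_p(x) ? SWAB16_CONST(x) : __builtin_bswap16(x))
#else
/* Big-endian hosts already store values in network byte order. */
# define my_htons(x) ((uint16_t)(x))
#endif

/* Write a host-order 16-bit value to out[] in network byte order. */
static inline void store_net16(uint8_t out[2], uint16_t host_value)
{
    assert(out != 0);
    uint16_t net = my_htons(host_value);
    memcpy(out, &net, sizeof(net));
}
```

Whatever the host byte order, `store_net16(buf, 0x0800)` leaves the bytes `0x08, 0x00` in `buf`, which is the on-wire form of the IPv4 EtherType compared against in the examples.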
--------------------------------------------------------------------------------
/headers/bpf_helpers.h:
--------------------------------------------------------------------------------
1 | /* SPDX-License-Identifier: GPL-2.0 */
2 | /* Copied from $(LINUX)/tools/testing/selftests/bpf/bpf_helpers.h */
3 | #ifndef __BPF_HELPERS_H
4 | #define __BPF_HELPERS_H
5 |
6 | /* helper macro to place programs, maps, license in
7 | * different sections in elf_bpf file. Section names
8 | * are interpreted by elf_bpf loader
9 | */
10 | #define SEC(NAME) __attribute__((section(NAME), used))
11 |
12 | /* helper functions called from eBPF programs written in C */
13 | static void *(*bpf_map_lookup_elem)(void *map, void *key) =
14 | (void *) BPF_FUNC_map_lookup_elem;
15 | static int (*bpf_map_update_elem)(void *map, void *key, void *value,
16 | unsigned long long flags) =
17 | (void *) BPF_FUNC_map_update_elem;
18 | static int (*bpf_map_delete_elem)(void *map, void *key) =
19 | (void *) BPF_FUNC_map_delete_elem;
20 | static int (*bpf_probe_read)(void *dst, int size, void *unsafe_ptr) =
21 | (void *) BPF_FUNC_probe_read;
22 | static unsigned long long (*bpf_ktime_get_ns)(void) =
23 | (void *) BPF_FUNC_ktime_get_ns;
24 | static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
25 | (void *) BPF_FUNC_trace_printk;
26 | static void (*bpf_tail_call)(void *ctx, void *map, int index) =
27 | (void *) BPF_FUNC_tail_call;
28 | static unsigned long long (*bpf_get_smp_processor_id)(void) =
29 | (void *) BPF_FUNC_get_smp_processor_id;
30 | static unsigned long long (*bpf_get_current_pid_tgid)(void) =
31 | (void *) BPF_FUNC_get_current_pid_tgid;
32 | static unsigned long long (*bpf_get_current_uid_gid)(void) =
33 | (void *) BPF_FUNC_get_current_uid_gid;
34 | static int (*bpf_get_current_comm)(void *buf, int buf_size) =
35 | (void *) BPF_FUNC_get_current_comm;
36 | static unsigned long long (*bpf_perf_event_read)(void *map,
37 | unsigned long long flags) =
38 | (void *) BPF_FUNC_perf_event_read;
39 | static int (*bpf_clone_redirect)(void *ctx, int ifindex, int flags) =
40 | (void *) BPF_FUNC_clone_redirect;
41 | static int (*bpf_redirect)(int ifindex, int flags) =
42 | (void *) BPF_FUNC_redirect;
43 | static int (*bpf_perf_event_output)(void *ctx, void *map,
44 | unsigned long long flags, void *data,
45 | int size) =
46 | (void *) BPF_FUNC_perf_event_output;
47 | static int (*bpf_get_stackid)(void *ctx, void *map, int flags) =
48 | (void *) BPF_FUNC_get_stackid;
49 | static int (*bpf_probe_write_user)(void *dst, void *src, int size) =
50 | (void *) BPF_FUNC_probe_write_user;
51 | static int (*bpf_current_task_under_cgroup)(void *map, int index) =
52 | (void *) BPF_FUNC_current_task_under_cgroup;
53 | static int (*bpf_skb_get_tunnel_key)(void *ctx, void *key, int size, int flags) =
54 | (void *) BPF_FUNC_skb_get_tunnel_key;
55 | static int (*bpf_skb_set_tunnel_key)(void *ctx, void *key, int size, int flags) =
56 | (void *) BPF_FUNC_skb_set_tunnel_key;
57 | static int (*bpf_skb_get_tunnel_opt)(void *ctx, void *md, int size) =
58 | (void *) BPF_FUNC_skb_get_tunnel_opt;
59 | static int (*bpf_skb_set_tunnel_opt)(void *ctx, void *md, int size) =
60 | (void *) BPF_FUNC_skb_set_tunnel_opt;
61 | static unsigned long long (*bpf_get_prandom_u32)(void) =
62 | (void *) BPF_FUNC_get_prandom_u32;
63 | static int (*bpf_xdp_adjust_head)(void *ctx, int offset) =
64 | (void *) BPF_FUNC_xdp_adjust_head;
65 |
66 | /* llvm builtin functions that eBPF C program may use to
67 | * emit BPF_LD_ABS and BPF_LD_IND instructions
68 | */
69 | struct sk_buff;
70 | unsigned long long load_byte(void *skb,
71 | unsigned long long off) asm("llvm.bpf.load.byte");
72 | unsigned long long load_half(void *skb,
73 | unsigned long long off) asm("llvm.bpf.load.half");
74 | unsigned long long load_word(void *skb,
75 | unsigned long long off) asm("llvm.bpf.load.word");
76 |
77 | /* a helper structure used by eBPF C program
78 | * to describe map attributes to elf_bpf loader
79 | */
80 | struct bpf_map_def {
81 | unsigned int type;
82 | unsigned int key_size;
83 | unsigned int value_size;
84 | unsigned int max_entries;
85 | unsigned int map_flags;
86 | unsigned int inner_map_idx;
87 | };
88 |
89 | static int (*bpf_skb_load_bytes)(void *ctx, int off, void *to, int len) =
90 | (void *) BPF_FUNC_skb_load_bytes;
91 | static int (*bpf_skb_store_bytes)(void *ctx, int off, void *from, int len, int flags) =
92 | (void *) BPF_FUNC_skb_store_bytes;
93 | static int (*bpf_l3_csum_replace)(void *ctx, int off, int from, int to, int flags) =
94 | (void *) BPF_FUNC_l3_csum_replace;
95 | static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flags) =
96 | (void *) BPF_FUNC_l4_csum_replace;
97 | static int (*bpf_skb_under_cgroup)(void *ctx, void *map, int index) =
98 | (void *) BPF_FUNC_skb_under_cgroup;
99 | static int (*bpf_skb_change_head)(void *, int len, int flags) =
100 | (void *) BPF_FUNC_skb_change_head;
101 |
102 | #if defined(__x86_64__)
103 |
104 | #define PT_REGS_PARM1(x) ((x)->di)
105 | #define PT_REGS_PARM2(x) ((x)->si)
106 | #define PT_REGS_PARM3(x) ((x)->dx)
107 | #define PT_REGS_PARM4(x) ((x)->cx)
108 | #define PT_REGS_PARM5(x) ((x)->r8)
109 | #define PT_REGS_RET(x) ((x)->sp)
110 | #define PT_REGS_FP(x) ((x)->bp)
111 | #define PT_REGS_RC(x) ((x)->ax)
112 | #define PT_REGS_SP(x) ((x)->sp)
113 | #define PT_REGS_IP(x) ((x)->ip)
114 |
115 | #elif defined(__s390x__)
116 |
117 | #define PT_REGS_PARM1(x) ((x)->gprs[2])
118 | #define PT_REGS_PARM2(x) ((x)->gprs[3])
119 | #define PT_REGS_PARM3(x) ((x)->gprs[4])
120 | #define PT_REGS_PARM4(x) ((x)->gprs[5])
121 | #define PT_REGS_PARM5(x) ((x)->gprs[6])
122 | #define PT_REGS_RET(x) ((x)->gprs[14])
123 | #define PT_REGS_FP(x) ((x)->gprs[11]) /* Works only with CONFIG_FRAME_POINTER */
124 | #define PT_REGS_RC(x) ((x)->gprs[2])
125 | #define PT_REGS_SP(x) ((x)->gprs[15])
126 | #define PT_REGS_IP(x) ((x)->psw.addr)
127 |
128 | #elif defined(__aarch64__)
129 |
130 | #define PT_REGS_PARM1(x) ((x)->regs[0])
131 | #define PT_REGS_PARM2(x) ((x)->regs[1])
132 | #define PT_REGS_PARM3(x) ((x)->regs[2])
133 | #define PT_REGS_PARM4(x) ((x)->regs[3])
134 | #define PT_REGS_PARM5(x) ((x)->regs[4])
135 | #define PT_REGS_RET(x) ((x)->regs[30])
136 | #define PT_REGS_FP(x) ((x)->regs[29]) /* Works only with CONFIG_FRAME_POINTER */
137 | #define PT_REGS_RC(x) ((x)->regs[0])
138 | #define PT_REGS_SP(x) ((x)->sp)
139 | #define PT_REGS_IP(x) ((x)->pc)
140 |
141 | #elif defined(__powerpc__)
142 |
143 | #define PT_REGS_PARM1(x) ((x)->gpr[3])
144 | #define PT_REGS_PARM2(x) ((x)->gpr[4])
145 | #define PT_REGS_PARM3(x) ((x)->gpr[5])
146 | #define PT_REGS_PARM4(x) ((x)->gpr[6])
147 | #define PT_REGS_PARM5(x) ((x)->gpr[7])
148 | #define PT_REGS_RC(x) ((x)->gpr[3])
149 | #define PT_REGS_SP(x) ((x)->sp)
150 | #define PT_REGS_IP(x) ((x)->nip)
151 |
152 | #elif defined(__sparc__)
153 |
154 | #define PT_REGS_PARM1(x) ((x)->u_regs[UREG_I0])
155 | #define PT_REGS_PARM2(x) ((x)->u_regs[UREG_I1])
156 | #define PT_REGS_PARM3(x) ((x)->u_regs[UREG_I2])
157 | #define PT_REGS_PARM4(x) ((x)->u_regs[UREG_I3])
158 | #define PT_REGS_PARM5(x) ((x)->u_regs[UREG_I4])
159 | #define PT_REGS_RET(x) ((x)->u_regs[UREG_I7])
160 | #define PT_REGS_RC(x) ((x)->u_regs[UREG_I0])
161 | #define PT_REGS_SP(x) ((x)->u_regs[UREG_FP])
162 | #if defined(__arch64__)
163 | #define PT_REGS_IP(x) ((x)->tpc)
164 | #else
165 | #define PT_REGS_IP(x) ((x)->pc)
166 | #endif
167 |
168 | #endif
169 |
170 | #ifdef __powerpc__
171 | #define BPF_KPROBE_READ_RET_IP(ip, ctx) ({ (ip) = (ctx)->link; })
172 | #define BPF_KRETPROBE_READ_RET_IP BPF_KPROBE_READ_RET_IP
173 | #elif defined(__sparc__)
174 | #define BPF_KPROBE_READ_RET_IP(ip, ctx) ({ (ip) = PT_REGS_RET(ctx); })
175 | #define BPF_KRETPROBE_READ_RET_IP BPF_KPROBE_READ_RET_IP
176 | #else
177 | #define BPF_KPROBE_READ_RET_IP(ip, ctx) ({ \
178 | bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); })
179 | #define BPF_KRETPROBE_READ_RET_IP(ip, ctx) ({ \
180 | bpf_probe_read(&(ip), sizeof(ip), \
181 | (void *)(PT_REGS_FP(ctx) + sizeof(ip))); })
182 | #endif
183 |
184 | #endif
185 |
--------------------------------------------------------------------------------
/headers/common.h:
--------------------------------------------------------------------------------
1 | #ifndef COMMON_H_
2 | #define COMMON_H_
3 |
4 | // Nicer way to call bpf_trace_printk()
5 | #define bpf_custom_printk(fmt, ...) \
6 | ({ \
7 | char ____fmt[] = fmt; \
8 | bpf_trace_printk(____fmt, sizeof(____fmt), \
9 | ##__VA_ARGS__); \
10 | })
11 |
12 | #endif /* COMMON_H_ */
--------------------------------------------------------------------------------
/images/vbox-create.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/racyusdelanoo/bpf-tutorial/d2c78e7abe416ae91efbe296b1a85384df94baed/images/vbox-create.png
--------------------------------------------------------------------------------
/images/vbox-disk.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/racyusdelanoo/bpf-tutorial/d2c78e7abe416ae91efbe296b1a85384df94baed/images/vbox-disk.png
--------------------------------------------------------------------------------
/images/vbox-hostonly.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/racyusdelanoo/bpf-tutorial/d2c78e7abe416ae91efbe296b1a85384df94baed/images/vbox-hostonly.png
--------------------------------------------------------------------------------
/images/vbox-memory.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/racyusdelanoo/bpf-tutorial/d2c78e7abe416ae91efbe296b1a85384df94baed/images/vbox-memory.png
--------------------------------------------------------------------------------
/images/vbox-nat.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/racyusdelanoo/bpf-tutorial/d2c78e7abe416ae91efbe296b1a85384df94baed/images/vbox-nat.png
--------------------------------------------------------------------------------